Thursday, August 05, 2004

Feedster: An engineer's personal hell

Scott Johnson (VP of Engineering at Feedster) has a desperate-sounding post about dealing with quality, reliability, and scaling issues at Feedster:
    I don't think anyone out there will deny that Feedster is, sadly, not delivering the best possible quality these days. Although our complexity, features and traffic have all grown dramatically -- our QA resources have not ... I no longer have the ability to reliably predict that "If I add feature X, feature Y will stil function correctly". More likely its like "feature Z will decide to take the weekend off, feature Q will go on a diet and feature X.12 will turn around, moon me and then give me the bird". Now Feedster is a highly interlinked system and the levels of isolation that perhaps should be there just aren't.
Feedster and Technorati are fantastic blog search engines, more targeted and useful than Google for finding weblogs and weblog articles. Lately, because of Technorati's own scaling issues, I've been tending to use Feedster a lot more. I haven't noticed quality or performance issues with Feedster, but it does sound like they're struggling.

My advice to Scott is basically the standard stuff: add automated tests, constantly refactor the code and your architecture, and keep your code as simple and easy to maintain as possible. Unit and other automated tests allow you to quickly check for unexpected behavior after making a change. Use them in addition to, not instead of, manual QA. Constant refactoring means that any time you touch the code, you redesign the code around the change: clean up the interfaces, reorganize the components, and reduce dependencies. Generally plan on spending half your time refactoring any time you go in. Beyond using constant refactoring to avoid balls of mud, the other part of keeping your code easy to maintain is avoiding undocumented complexities like lengthy regular expressions, weird special cases, or cryptic algorithms. If it's not immediately obvious what the code does and why it's there, stick a comment by it that explains both.
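To make that concrete, here is a minimal sketch in Python of a commented regular expression guarded by a few automated tests. Everything in it is made up for illustration: parse_date, the TIMESTAMP pattern, and the timestamp format are assumptions of mine, not anything from Feedster's code.

```python
import re

# Hypothetical pattern for a timestamp like "2004-08-05T10:30:00Z".
# re.VERBOSE lets the pattern carry its own explanation.
TIMESTAMP = re.compile(r"""
    (?P<year>\d{4}) -     # four-digit year
    (?P<month>\d{2}) -    # two-digit month
    (?P<day>\d{2})        # two-digit day
    (?: T.* )?            # optional time part, ignored here
    $
""", re.VERBOSE)


def parse_date(text):
    """Return (year, month, day) as ints, or None if text does not match."""
    match = TIMESTAMP.match(text)
    if match is None:
        return None
    return (int(match["year"]), int(match["month"]), int(match["day"]))


# Small automated checks. A runner such as pytest would collect these,
# but they are plain functions and can also be called by hand.
def test_plain_date():
    assert parse_date("2004-08-05") == (2004, 8, 5)


def test_date_with_time():
    assert parse_date("2004-08-05T10:30:00Z") == (2004, 8, 5)


def test_garbage_is_rejected():
    assert parse_date("August 5") is None
```

The tests are tiny, but they are the kind of thing that tells you within seconds whether a change to the pattern broke a case you had forgotten about.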

Some might recommend taking time for a big rearchitecture project. "Stop doing anything, freeze the code, and rewrite everything," they'll say. I'd recommend against that approach. I've never seen anyone successfully deliver a rearchitecture project that had no other purpose but "cleaning up the code." You can't optimize without a goal to optimize toward. Rearchitecture should always be done as part of a larger goal. Want better performance? Rearchitect the code while seeking better performance. Adding feature X.12? Rearchitect the code to fit the new feature in gracefully.
