Friday, July 16, 2004

InfoWorld on RSS growing pains

Chad Dickerson at InfoWorld talks about scalability issues with RSS:
    InfoWorld.com sees a massive surge of RSS newsreader activity at the top of every hour, presumably because most people configure their newsreaders to wake up at that time to pull their feeds. If I didn't know how RSS worked, I would think we were being slammed by a bunch of zombies sitting on compromised home PCs. Our hourly RSS surge has all the characteristics of a distributed DoS attack, and although the requests are legitimate and small, the sheer number of requests in that short time period creates some aggravating scaling issues.
I commented earlier on a Wired article that expressed similar concerns about the scalability of the polling architecture of RSS.

There are several ways to deal with this issue. The solution Chad seems to be suggesting is to randomize request times so that traffic doesn't spike at the top of every hour. That's certainly a good idea. Clients should also respect the ttl (polling no more often than the interval listed in the feed), support conditional GET, and handle 304 (not modified) responses so that they download the full feed only when it has actually changed.
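To make those client-side suggestions concrete, here is a rough sketch of how a polite newsreader might behave. It is only an illustration, not how any particular reader works: the function names and the 25% jitter figure are my own assumptions, and it presumes the server sends ETag or Last-Modified headers.

```python
import random
import urllib.error
import urllib.request


def next_poll_delay(ttl_minutes, max_jitter_fraction=0.25, rng=random):
    """Seconds to wait before the next poll: the feed's ttl plus random jitter.

    The jitter only ever adds time, so the client never polls more often
    than the ttl allows. The 25% default is an arbitrary assumption.
    """
    base = ttl_minutes * 60
    return base + rng.uniform(0, base * max_jitter_fraction)


def conditional_headers(etag=None, last_modified=None):
    """Build conditional GET headers from validators saved on the last fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None on a 304 response."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError; keep the old copy.
        if err.code == 304:
            return (None, etag, last_modified)
        raise
```

A reader would call fetch_if_changed with the validators it saved last time, then sleep for next_poll_delay(ttl) seconds, so that thousands of clients with the same ttl drift apart instead of all arriving on the hour.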

But the primary solution will end up being caching. With the exception of personalized feeds, RSS feeds can easily be cached. Web-based RSS readers like Bloglines and My Yahoo already fetch each feed once, cache it, and display it to multiple readers. Popular RSS feeds can also be cached by proxies just like web pages, reducing the load on the original source servers.
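The fetch-once, serve-many idea can be sketched in a few lines. I have no knowledge of how Bloglines or My Yahoo actually implement it; the FeedCache class, its one-hour default, and the injected fetch function are all hypothetical names chosen for illustration.

```python
import time


class FeedCache:
    """Serve many readers from one copy of each feed.

    `fetch` is any callable that takes a URL and returns the feed body;
    it is injected so the cache itself does no network I/O.
    """

    def __init__(self, fetch, max_age_seconds=3600, clock=time.monotonic):
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries = {}  # url -> (fetched_at, body)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        # Reuse the cached copy while it is younger than max_age_seconds.
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]
        # Otherwise hit the origin server once and remember the result.
        body = self._fetch(url)
        self._entries[url] = (now, body)
        return body
```

However many readers ask for the same feed within the hour, the origin server sees a single request. A personalized feed would need its own cache key per user, which is why those are the exception.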

Thanks, RSS Weblog, for pointing out the InfoWorld article.

Update: An interesting discussion of RSS scaling over on Jeremy Zawodny's blog.

Update: Chad Dickerson follows up with an article summarizing all the suggestions he received. It's mostly the same as my suggestions above, but still probably worth skimming.
