Friday, April 30, 2004

Google Cluster Architecture

Another good paper describing Google's cluster in a fair amount of detail.

One fascinating part of the paper describes the "docservers" that generate the snippet shown with each search result, the excerpt of the page that contains your keyword. These docservers require "access to an online, low-latency copy of the entire Web. In fact, because of replication ..., Google stores dozens of copies of the Web on its servers."

While the docservers probably only need to store text copies of each page (no images, Flash, HTML, etc.) and can compress the data, it's still a staggeringly large amount of data. Google quite literally stores a copy of the entire web to implement its snippet feature.
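
To make the snippet idea concrete, here is a minimal Python sketch of pulling a keyword-in-context excerpt out of a stored text copy of a page. This is only an illustration of why the full text has to be on hand at query time, not a description of how Google's docservers actually work; the function name `make_snippet` and the `radius` parameter are my own assumptions.

```python
import re
from typing import Optional


def make_snippet(text: str, keyword: str, radius: int = 40) -> Optional[str]:
    """Return an excerpt of `text` around the first match of `keyword`.

    Hypothetical helper for illustration only. `radius` is the number of
    characters of context kept on each side of the match.
    """
    # Case-insensitive literal search; re.escape keeps the keyword from
    # being interpreted as a regular expression.
    match = re.search(re.escape(keyword), text, flags=re.IGNORECASE)
    if match is None:
        return None

    # Clamp the context window to the bounds of the document.
    start = max(0, match.start() - radius)
    end = min(len(text), match.end() + radius)

    # Mark truncation with an ellipsis on whichever side was cut.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix


if __name__ == "__main__":
    page = "The quick brown fox jumps over the lazy dog"
    print(make_snippet(page, "fox", radius=6))
```

Even this toy version needs the page text itself, not just an index entry, which is why the docservers end up holding a copy of everything.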
