Thursday, February 21, 2008

Yahoo deploys large-scale Hadoop cluster

Yahoo's Eric Baldeschwieler reports that Yahoo is now running a 10k+ core Hadoop cluster that holds over 5 petabytes of data.

Very cool. It appears Yahoo has now achieved most of the Hadoop design requirements they laid out back in July 2006.

Note, however, that 10k cores is quite a bit smaller than Google's cluster appears to be. Google used 11k machine years of computation in Sept 2007 alone. Fitting that much work into a single 30-day month takes at least 133k machines running around the clock across their clusters, and likely substantially more, since no machine is busy 100% of the time.
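For anyone who wants to check the 133k figure, here is a rough back-of-the-envelope sketch. It assumes the 11k machine years were all consumed within the 30 days of September and that every machine was fully busy for the whole month, which is why the result is a lower bound. The function name and constants are my own, purely for illustration.

```python
# Back-of-the-envelope estimate: how many machines must run nonstop
# to deliver a given number of machine-years within a fixed period.
DAYS_PER_YEAR = 365
DAYS_IN_SEPTEMBER = 30  # assumption: all the work happened inside one month


def min_machines(machine_years: float, days_in_period: int) -> float:
    """Lower bound on machine count, assuming 100% utilization for the period."""
    # One machine contributes days_in_period / DAYS_PER_YEAR machine-years.
    return machine_years * DAYS_PER_YEAR / days_in_period


estimate = min_machines(11_000, DAYS_IN_SEPTEMBER)
print(f"{estimate:,.0f} machines")  # prints "133,833 machines"
```

Any idle time, or any machines not counted in that workload, pushes the real number higher.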

Please see also my July 2006 post, "Yahoo building Google FS clone?", and my April 2007 post, "Yahoo Pig and Google Sawzall".

1 comment:

Anonymous said...

So 2500 quad core (or 2x dual core) machines, each with 4x 500GB drives, most likely.

133k servers is a pretty serious underestimate of Google's scale, but their servers aren't all in one cluster, obviously.