Friday, November 12, 2010

An update on Google's infrastructure

Google Fellow Jeff Dean gave a talk for Stanford's EE380 class with fascinating details on Google's systems and infrastructure. Krishna Sankar has a good summary along with a link to the video of the talk (Windows Media).

Some fun statistics in there. Most amazing are the improvements in data processing: in May 2010 alone, 4.4M MapReduce jobs consumed 39k machine years of computation and processed nearly an exabyte (1,000 petabytes) of data. That's a remarkable amount of data munching going on at Google.
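
For a sense of scale, here is a quick back-of-envelope sketch of what those published aggregates work out to per job and per second. The exabyte figure is rounded, so treat these as rough numbers:

```python
# Back-of-envelope arithmetic on the May 2010 MapReduce statistics from the talk.
jobs = 4.4e6               # MapReduce jobs run in May 2010
machine_years = 39e3       # machine years of computation consumed that month
data_bytes = 1e18          # ~1 exabyte (1,000 petabytes) of data processed that month

print(f"Average data per job:    {data_bytes / jobs / 1e9:,.0f} GB")                    # ~227 GB
print(f"Average compute per job: {machine_years / jobs * 365 * 24:.0f} machine-hours")  # ~78
print(f"Sustained throughput:    {data_bytes / (30 * 86400) / 1e9:,.0f} GB/s")          # ~386 GB/s
```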

The talk is an updated version of Jeff's other recent talks such as his LADIS 2009 keynote, WSDM 2009 keynote, and 2008 UW CS lecture.

[HT: High Scalability for the pointer to Krishna Sankar's notes]

4 comments:

Alice said...

39k machine years in one month...doesn't that mean almost half a million MR machines (assuming an unrealistic 100% utilization)? That's a lot of hardware.
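
As a quick check of that arithmetic, using the figures quoted in the post:

```python
machine_years = 39e3   # machine years of MapReduce computation in May 2010

# 39k machine years of work done in a single month is equivalent to
# 39k * 12 machines running flat out for that month.
machines_at_100_percent = machine_years * 12
print(f"{machines_at_100_percent:,.0f} machines")   # 468,000
```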

Greg Linden said...

Well, on the one hand, the machines probably have four cores (so 1/4 the machines), but, on the other, the average utilization rate is probably a lot lower than 100%, more like 20-30%. So, I'd guess that 500k+ machines is a decent rough estimate for machines dedicated to MapReduce in the Google data centers, based on the data they released.

What do others think? Roughly 500k physical machines a good estimate?
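
One way to make that rough estimate concrete. The core count and utilization here are just the guesses from this thread, not published Google numbers:

```python
machine_equivalents = 39e3 * 12   # ~468k machines at 100% utilization (see above)

# Guesses from the comment thread, not published Google numbers.
cores_per_machine = 4             # if "machine year" really means core year
avg_utilization = 0.25            # assume MapReduce machines are ~25% busy on average

physical_machines = machine_equivalents / cores_per_machine / avg_utilization
print(f"~{physical_machines:,.0f} physical machines")   # 468,000, i.e. roughly 500k
```

The divide-by-cores and divide-by-utilization adjustments roughly cancel, which is why the estimate lands back near half a million machines.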

Alice said...

I would guess they have 8 to 16 cores... Is a machine year the same as a core year? It's an odd unit of measurement.

Anonymous said...

Let's say 8 to 24 cores per system, and more like 1 million motherboards.