A recent article from Microsoft in IEEE Micro, "Server Engineering Insights for Large-Scale Online Services" (PDF), has surprisingly detailed information about the systems running Hotmail, Cosmos (Microsoft's MapReduce/Hadoop), and Bing.
For example, the article describes the scale of Hotmail's data as being "several petabytes ... [in] tens of thousands of servers" and the typical Hotmail server as "dual CPU ... two attached disks and an additional storage enclosure containing up to 40 SATA drives". The typical Cosmos server apparently is "dual CPU ... 16 to 24 Gbytes of memory and up to four SATA disks". Bing uses "several tens of thousands of servers" and "the main memory of thousands of servers" where a typical server is "dual CPU ... 2 to 3 Gbytes per core ... and two to four SATA disks".
Aside from revealing what appear to be previously undisclosed details about Microsoft's clusters, the article could be interesting for its insights into the performance of these clusters on the Hotmail, Bing, and Cosmos workloads. Unfortunately, the article suffers from taking too much as given, from not exploring the complexity of the interactions between CPU, memory, flash memory, and disk in these clusters on these workloads, and from not attempting to explain the many oddities in the data.
Those oddities are fun to think about, though. To take a few that caught my attention:
- Why are Bing servers CPU bound? Is it because, as the authors describe, Bing uses "data compression on memory and disk data ... causing extra processing"? Should Bing be doing so much data compression that it becomes CPU bound (when Google, by comparison, uses fast compression; see the sketch after this list)? If something else is causing Bing servers to be CPU bound, what is it? In any case, does it make sense for the Bing "back-end tier servers used for index lookup" to be CPU bound?
- Why do Bing servers ~~have only 4-6G RAM each~~ not have more memory when they mostly want to keep indexes in memory, appear to be hitting disk, and are "not bound by memory bandwidth"? Even if the boxes are CPU bound, and even if it somehow makes sense for them to be CPU bound, would more memory across the cluster allow them to do things (like faster but weaker compression) that would relieve the pressure on the CPUs?
- Why is Cosmos (the batch-based log processing system) CPU bound instead of I/O bound? Does that make sense?
- Why do Cosmos boxes have ~~more~~ the same memory ~~than~~ as Bing boxes when Cosmos is designed for sequential data access? What is the reason that Cosmos "services maintain much of their data in [random access] memory" if they, like Hadoop and MapReduce, are intended for sequential log processing?
- If Hotmail is mostly "random requests" with "insignificant" locality, why is it designed around sequential data access (many disks) rather than random access (DRAM + flash memory)? Perhaps the reason that Hotmail is "storage bound under peak loads" is that it uses sequential storage for its randomly accessed data? The back-of-envelope sketch after this list suggests how large that penalty can be.
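To make the compression trade-off in the first question concrete: the article doesn't say which codec Bing uses, but the choice between a fast, weak codec and a slow, strong one is easy to demonstrate. Here is a minimal sketch using Python's standard zlib; the synthetic index-like data and the levels compared are my assumptions for illustration, not anything from the article:

```python
import time
import zlib

# Synthetic stand-in for index-like data (an assumption for illustration):
# repetitive structure with varying numbers, so it compresses, but not trivially.
records = [b"doc=%08d tf=%03d " % (i, i % 97) for i in range(200_000)]
data = b"".join(records)

for level in (1, 6, 9):  # zlib level 1 = fastest/weakest, 9 = slowest/strongest
    start = time.perf_counter()
    out = zlib.compress(data, level)
    secs = time.perf_counter() - start
    print(f"level {level}: ratio {len(data) / len(out):.2f}x, "
          f"throughput {len(data) / secs / 1e6:.0f} MB/s")
```

On typical hardware, the fastest level compresses several times faster than the strongest for a modest loss in ratio; that is the trade Google-style fast compression makes, and whether Bing could make the same trade to relieve its CPUs is exactly the open question.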
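And to put rough numbers behind the Hotmail question, a back-of-envelope sketch. All the hardware figures here are my assumptions (round numbers for circa-2010 commodity SATA disks and early SSDs), not numbers from the article:

```python
# Assumed (not from the article) round numbers for ~2010 commodity hardware.
DISK_RANDOM_IOPS = 100       # ~10 ms per seek on a 7200 RPM SATA disk
DISK_SEQ_MB_PER_S = 100      # the same disk streaming sequentially
FLASH_RANDOM_IOPS = 10_000   # rough figure for an early SSD
READ_KB = 4                  # assumed size of one random mail-data read

disk_random = DISK_RANDOM_IOPS * READ_KB / 1024
flash_random = FLASH_RANDOM_IOPS * READ_KB / 1024
print(f"disk, random 4 KB reads:  {disk_random:.1f} MB/s")    # ~0.4 MB/s
print(f"disk, sequential:         {DISK_SEQ_MB_PER_S} MB/s")  # ~250x more
print(f"flash, random 4 KB reads: {flash_random:.0f} MB/s")
```

A spinning disk serving small random requests delivers well under 1% of its sequential bandwidth, which is why a workload of "random requests" with "insignificant" locality on racks of SATA drives could plausibly end up "storage bound under peak loads".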
Thoughts?
Update: An anonymous commenter points out that the Bing servers probably are two quad-core CPUs -- eight cores total -- so, although there is only 2-3G per core, there likely is a total of 16-24G of RAM per box. That makes more sense and would make them similar to the Cosmos boxes.
Even with the larger amount of memory per Bing box, the questions about the machines still hold. Why are the Bing boxes CPU bound and should they be? Should Cosmos boxes, which are intended for sequential log processing, have the same memory as Bing boxes and be holding much of their data in memory? Why are Cosmos machines CPU bound rather than I/O bound and should they be?
Update: Interesting discussion going on in the comments to this post.