Bigtable is a massive, clustered, robust, distributed database system that is custom built to support many products at Google. From the paper:
Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers.The paper is quite detailed in its description of the system, APIs, performance, and challenges.
Bigtable is used by more than sixty Google products and projects, including Google Analytics, Google Finance, Orkut, Personalized Search, Writely, and Google Earth.
A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.
On the challenges, I found this description of some of the real world issues faced particularly interesting:
One lesson we learned is that large distributed systems are vulnerable to many types of failures, not just the standard network partitions and fail-stop failures assumed in many distributed protocols.Make sure also to read the related work section that compares Bigtable to other distributed database systems.
For example, we have seen problems due to all of the following causes: memory and network corruption, large clock skew, hung machines, extended and asymmetric network partitions, bugs in other systems that we are using (Chubby for example), overflow of GFS quotas, and planned and unplanned hardware maintenance.
See also my previous posts, "Google's BigTable", "C-Store and Google BigTable", and "I want a big, virtual database".