BigTable is a system for storing and managing very large amounts of structured data. The system is designed to manage several petabytes of data distributed across thousands of machines, with very high update and read request rates coming from thousands of simultaneous clients. Sounds very interesting. The talk will be broadcast live on the internet.
See also my previous posts on Google's cool distributed architecture including their file system, cluster, and the distributed data processing tools MapReduce and Sawzall.
By the way, if anyone can find a paper on BigTable, please let me know. I couldn't find one.
Update: The talk was interesting but a little different than I expected.
BigTable stores a distributed, replicated sparse matrix of data. For example, for their crawler, you might have a BigTable matrix with a row "com.cnn.www:WORLD/:http" that contains information about the world news page from CNN. A column for that row might be labeled "content:" and contain the content for that page. Another column might be "language:" and contain "EN" for English. BigTable allows each cell in the matrix to have timestamped data, so a history of changes for the cell can be maintained easily.
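To make the data model concrete, here is a toy sketch in Python of that idea: a sparse table where each (row, column) cell holds a list of timestamped versions. This is purely illustrative (the class and method names are my own invention, not Google's API), ignoring all the distribution and replication machinery.

```python
import time
from collections import defaultdict

class SparseTable:
    """Toy sparse table: cells keyed by (row, column), each cell
    holding a list of (timestamp, value) versions, newest first."""

    def __init__(self):
        # Only cells that have been written exist -- the table is sparse.
        self._cells = defaultdict(list)  # (row, col) -> [(ts, value), ...]

    def put(self, row, col, value, ts=None):
        versions = self._cells[(row, col)]
        versions.append((ts if ts is not None else time.time(), value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest first

    def get(self, row, col):
        """Return the most recent value for a cell, or None if empty."""
        versions = self._cells.get((row, col))
        return versions[0][1] if versions else None

    def history(self, row, col):
        """Return all (timestamp, value) versions, newest first."""
        return list(self._cells.get((row, col), []))

# The crawler example from the talk: the row key is the reversed URL.
t = SparseTable()
t.put("com.cnn.www:WORLD/:http", "language:", "EN", ts=1)
t.put("com.cnn.www:WORLD/:http", "content:", "<html>old</html>", ts=1)
t.put("com.cnn.www:WORLD/:http", "content:", "<html>new</html>", ts=2)

print(t.get("com.cnn.www:WORLD/:http", "language:"))          # EN
print(t.get("com.cnn.www:WORLD/:http", "content:"))           # <html>new</html>
print(len(t.history("com.cnn.www:WORLD/:http", "content:")))  # 2
```

Reading a cell returns the newest version by default, while the full timestamped history stays available, which is the property that makes keeping a history of changes cheap.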
It is not, as I was first expecting, a structured distributed database like some Googlized version of MySQL Cluster. That's not what Google needs.
The kinds of data processing tasks that Google has to do every day require extremely high performance and reliability, but only weak guarantees on data consistency. No existing database offers that combination, so Google had to build their own, BigTable.
Looking at BigTable and Google's other tools, I think Brian Dennis was right when he called them "major force multipliers." Tools like these enable Google to move faster, build more, and learn more than their competitors.
Update: Andrew Hitchcock posted a nice summary of the talk.
Update: The talk is available on Google Video.
Update: Eleven months later, Google has published a paper on Bigtable.