Wednesday, May 10, 2006

MySQL Cluster and my big virtual database

MySQL Cluster is an in-memory database running across a cluster of machines that tries to be robust to failures of nodes.

I had been looking at it off and on for a while. My curiosity was peaked when I watched Stewart Smith's talk on Google Video, "A Googly MySQL Cluster Talk", and investigated a few questions I had after that talk.

Unsurprisingly, MySQL Cluster is not a giant, simple, virtual database that runs transparently over a cluster, nodes dropping in an out of service at will, with read-write replication and data migration all done automatically and robustly. MySQL Cluster is cool, but it is not quite that cool.

The core design behind MySQL Cluster seems to have started with one thought: Instead of storing tables in the local filesystem, what if we stored them in the memory of nearby machines on the network?

This is a really neat idea. I remember some fun research work a while back that tried to exploit the fact that it is an order of magnitude faster for a process to access data from RAM on a remote machine than it is to access data from local disk. Yes, disks are that slow.

Most of this research work focused on experiments with paging out to the free memory of other machines on the network instead of local disk, but the idea still seems similar to what the MySQL folks are doing with MySQL Cluster.

So, I am guessing MySQL folks started by deciding to experiment with a new type of in-memory storage engine, one that would say, this data isn't in this machine's memory, but in that machine's memory over there.

Then, it looks like they added a bunch of stuff on top to try to make this robust. Logging of transactions out to the local filesystem, replicas of the table fragments, and goodies like that. MySQL Cluster was born.

There's a lot of hard work left to be done though. According to their limitations page, new data nodes cannot be added without a restart of the entire cluster, so you cannot add new boxes in to increase your capacity with the system live.

Other serious issues appear to include performance issues with some queries, windows of opportunity for data loss, potential for stray locks, temporary transaction failures when nodes are dropped or rebooted, complicated configuration and maintenance of the cluster, and other problems.

Even so, it is impressive what the MySQL folks have built. It may not be a big, virtual database, but it is a big step in the right direction.

Update: There are more details on how MySQL handles failures and recovery in a 2005 VLDB paper, "Recovery Principles of MySQL Cluster 5.1" (PDF).

5 comments:

Anonymous said...

Glad you found my talk useful. (suggestions and comments are welcome)

We're aware that people want to add storage nodes to a running cluster and we're working on it.

You can already add SQL nodes to a running cluster (with a rolling restart).

You can even see some bk commits on the subject :) (that will actually let you add nodes to a running cluster).

However, the devil is always in the detail. Namely, making sure this works with Cluster replication (replicating from one cluster to another), making sure we can deal with node and system failure. On top of that, being able to add nodes without a rolling restart would be great too.

The redistribution of data so that the new nodes are used (and doing this in a way that allows queries to run while we're doing it as well as being resilient to node and system failure) is also non-trivial.

It's a good thing we're a clever bunch of people :)

- stewart
http://www.flamingspork.com/blog/

Mike Kruckenberg said...

"So, I am guessing MySQL folks started by deciding to experiment with a new type of in-memory storage engine..."

To be clear, it wasn't MySQL folks that started this (although some of them are at MySQL now). The research and development of the clustering technology was done at Ericsson over many years and acquired in 2004 by MySQL AB who integrated the cluster into MySQL and have since been making improvements.

Greg Linden said...

Hi, Stewart! Thanks for commenting on this post!

I appreciate that you folks are hard at work on many improvements to MySQL Cluster, including adding data nodes without a restart.

This post was mostly intended to explore how close MySQL Cluster is to the holy grail of a big clustered database.

MySQL Cluster is very cool, a big step forward. Remarkable what you folks have built.

Markbnj said...

I'm quite impressed too, if I understand some of the implied implications, this could help to obliviate some of the need for those "big iron" servers, I used to run.

It would also seem that it is also a great leap forward for reliability, and especially since you say there is cluster replication that works in that!
what a great idea...
Now if I only had a big enough database to run it on!

James said...

i'll be working on setting up a cluster in the next couple of weeks! i'll keep an eye at that video