Monday, November 16, 2009

Put that database in memory

An upcoming paper, "The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM" (PDF), makes some interesting new arguments for shifting most databases to serving entirely out of memory rather than off disk.

The paper looks at Facebook as an example and points out that, due to aggressive use of memcached and caches in MySQL, the memory they already use is about "75% of the total size of the data (excluding images)." They go on to argue that a system designed around in-memory storage, with disk used only for archival purposes, would be much simpler, more efficient, and faster. They also look at examples of smaller databases and note that, with servers reaching 64GB of RAM and higher and most databases only a couple of terabytes, it doesn't take that many servers to get everything in memory.

An excerpt from the paper:
Developers are finding it increasingly difficult to scale disk-based systems to meet the needs of large-scale Web applications. Many people have proposed new approaches to disk-based storage as a solution to this problem; others have suggested replacing disks with flash memory devices.

In contrast, we believe that the solution is to shift the primary locus of online data from disk to random access memory, with disk relegated to a backup/archival role ... [With] all data ... in DRAM ... [we] can provide 100-1000x lower latency than disk-based systems and 100-1000x greater throughput .... [while] eliminating many of the scalability issues that sap developer productivity today.
One subtle but important point the paper makes is that the slow speed of current databases has made web applications both more complicated and more limited than they should be. From the paper:
Traditional applications expect and get latency significantly less than 5-10 μs ... Because of high data latency, Web applications typically cannot afford to make complex unpredictable explorations of their data, and this constrains the functionality they can provide. If Web applications are to replace traditional applications, as has been widely predicted, then they will need access to data with latency much closer to what traditional applications enjoy.

Random access with very low latency to very large datasets ... will not only simplify the development of existing applications, but they will also enable new applications that access large amounts of data more intensively than has ever been possible. One example is ... algorithms that must traverse large irregular graph structures, where the access patterns are ... unpredictable.
The authors point out that data access currently needs to be heavily optimized and carefully ordered, and that applications must conservatively fetch extra data in case it is needed later, all things that mostly go away if you are using a database where access has microsecond latency.
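A quick back-of-envelope sketch shows why latency dominates here. The latency figures below are illustrative round numbers of my own choosing, not the paper's, but the orders of magnitude are right:

```python
# Time for one million dependent random reads (e.g. pointer-chasing
# through a large graph), where each read must finish before the next
# address is known, so reads cannot be pipelined or batched.

DISK_SEEK_S = 5e-3      # ~5 ms per random disk access (seek + rotation)
DRAM_ACCESS_S = 100e-9  # ~100 ns per random DRAM access

def traversal_time_s(n_reads, latency_s):
    """Dependent reads serialize, so total time is simply n * latency."""
    return n_reads * latency_s

n = 1_000_000
print(f"disk: {traversal_time_s(n, DISK_SEEK_S):.0f} s")   # over an hour
print(f"dram: {traversal_time_s(n, DRAM_ACCESS_S):.2f} s") # a tenth of a second
```

With numbers like these, a traversal that is hopeless on disk becomes interactive in DRAM, which is exactly the class of "complex unpredictable explorations" the paper says disk latency rules out.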

While the authors do not go as far as to argue that memory-based databases are cheaper, they do argue that they are cost competitive, especially once developer time is taken into account. It seems to me that you could go a step further here and argue that very-low-latency databases bring such large productivity gains to developers and benefits to application users that they are in fact cheaper, but the paper does not try to do that.

If you don't have time to read the paper, slides (PDF) from a talk by one of the authors are also available and are very quick to skim.

If you can't get enough of this topic, please see my older post, "Replication, caching, and partitioning", which argues that big caching layers, such as memcached, are overdone compared to having each database shard serve most data out of memory.

HT, James Hamilton, for first pointing to the RAMClouds slides.


Daniel Lemire said...

I am not sure I understand their proposal, or rather, the novelty. Many web services run entirely on RAM already, except for a "backup/archival role" as they put it. It seems to me that they are presenting a newly funded research project (hence the very long list of authors), rather than the results of their research. But maybe I misunderstand.

Nevertheless, this is representative of an interesting trend in academia. For years, researchers were focused on external memory databases. The assumptions were that RAM was expensive and most databases exceeded by far the available RAM. This created many fun problems. This has changed to the point that external-memory algorithms appear "old school".

Greg Linden said...

Hi, Daniel. Absolutely, not much novelty in the general idea of in-memory databases. I'm sure we can both think of many off the top of our heads, and the related work section in the paper lists many in-memory databases from decades back to now.

I found a couple things to be new, at least in my experience. First, the argument that disk-based databases are obsolete is a little more extreme here than I've seen before. They argue that almost all databases will be in-memory in the coming years.

Second, the argument for the benefits of lower latency is much broader than I have seen before. Normally, I see people thinking of exclusively in-memory databases for specialized purposes (such as very small data sets or graph algorithms), otherwise sticking to databases with caching layers (e.g. Facebook). This paper argues that the productivity and performance benefits are widespread and apply to all web applications.

So, right, no novelty in the idea of in-memory databases, but some novelty in the argument that just about everything should be an in-memory database.

Ghalib said...

Maybe I'm wrong (and please do correct me if I am), but I was always under the impression that Google has been operating in this manner (i.e. putting everything in memory) for most of its existence.

Greg Linden said...

Hi, Ghalib. Yep, that's pretty much right for Google search. For example, Jeff Dean said Google switched to keeping indexes entirely in memory a while back. And they also keep a snapshot of the entire web available in memory to do the snippets in the search results.

Google's Bigtable, a more general purpose database used by many Google applications, also allows data to be marked so that it is kept in memory across the cluster. Bigtable is referenced in the paper.

In-memory databases are not novel. What might be novel is the claim that almost everything eventually will switch to in-memory databases.

Anonymous said...

I wonder how an in-memory DB can be cheaper than a disk-based one. A typical commodity server has 8GB of RAM; let's bump it up to 16GB. A typical HDD is around 400GB. To store an equivalent amount of data you need 25 hosts, not taking into account that with RAM storage replication is a must, so you most likely need to triple this number. You can lower 400GB to 160GB or raise 16GB to 32GB; it does not make that big a difference.
With the reliability of a single host around 0.995, a cluster of 25 hosts (no replication) has reliability of about 0.88.
Administration, monitoring, and management suddenly become a nightmare. And this is a recurring operational cost, as opposed to a one-time development cost.
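To make the arithmetic above concrete, here is a quick sketch using the same figures (16GB hosts, a 400GB dataset, 0.995 per-host reliability, failures assumed independent):

```python
# Hosts needed to hold the dataset entirely in RAM, and the
# probability that every host in the cluster is up at once.

data_gb = 400
ram_per_host_gb = 16
host_reliability = 0.995

hosts = -(-data_gb // ram_per_host_gb)           # ceiling division -> 25
cluster_reliability = host_reliability ** hosts  # 0.995^25 ~= 0.88
```

So the quoted 0.88 checks out: with 25 hosts, the chance that the full cluster is healthy drops to roughly 88%.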

Peter Evans-Greenwood said...

Given that the move to in-memory DBs is driven by economic factors, it's hard to predict where it will go.

A couple of years ago there was a strong shift to in-memory DBs in the enterprise as the cost of DRAM plummeted, driving many high-performance sites (like Google) to move the normative data set into memory, with disk for failover. This shift is being tempered today by plummeting prices for SRAM/DRAM disks. While rotating storage might be in decline, external storage as a whole seems to have a bit of life left in it.

In this instance, the academic community seems to be out of touch with (and well behind) what is happening at the coal face.

Unknown said...

Interesting thoughts!
For large graphs with unpredictable traversal patterns, that might normally be a valid approach. However, using e.g. a Graph Database, the data will get cached as it is accessed, and no more than that. Given enough RAM, that approaches the performance of an in-memory structure without requiring any special configuration.

Disclaimer - I'm part of the Neo4j project

Anonymous said...

TimesTen anyone?

Ghalib said...

Understood; thanks for the clarification.

mightybyte said...

The Haskell web framework Happstack is designed around this idea. It came out several years ago and is still under active development. There are websites running today built on this framework and philosophy.

Anonymous said...

In-memory DBs aren't novel, and I definitely see advantages for OLTP-type systems. But what about heavy decision support systems? Improvements in caching on disk I/O subsystems seem to negate a lot of the disk latency. Where I really see performance issues is in logical I/Os, where the design doesn't support getting the data quickly, or where the Oracle explain plan could be improved by optimizing the code.

Unknown said...

As always, it comes down to using the right tool for the right job. And to simply say "all databases should live in memory" is just as shortsighted as saying "no databases should live in memory".

The first thing you need to do is understand your data and its usage.

The second thing is to define your requirements (response times, dataset size, number of users, etc.).

Once you have those two you can make a better informed decision about how you need to design your architecture.

Greg Linden said...

Hi, Igor. I think the paper is looking at the future, when servers will have 64GB of RAM or more. The argument essentially comes down to this: the datasets needed by web applications grow more slowly than the memory available in servers, so, at some point soon, most web application databases will fit in the memory of a small cluster of servers.

Anonymous, right, they're only talking about web application databases. MapReduce-style data mining of petabyte data sets like weblogs will still require disk-based solutions.

Jonathan, absolutely, I agree that we should use the right tool for the right job. The point the authors are trying to make, if I have it correct, is that in-memory databases will soon be the right tool for many more jobs than before.

Daniel Lemire said...

I definitely see advantages with OLTP type systems. But what about heavy decision support systems?

QlikTech does just fine selling RAM-based DSS.

(Non-disclaimer: I am not affiliated or related with them in any way.)

burtonator said...

I think this is pretty conventional industry wisdom by now.... 2-4 years ago this wasn't the case.

Also, to people arguing price here. You shouldn't price this out as $ per GB... you should price it as $ per IOPS.

Memory crushes disk subsystems in $ per IOPS.
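A rough sketch of that comparison, with illustrative 2009-era ballpark prices and rates (my assumptions, not figures from this thread):

```python
# $ per IOPS: a commodity 7200 rpm disk manages on the order of
# 150 random IOPS, while DRAM serves millions of small random
# reads per second. All four numbers below are rough assumptions.

disk_cost, disk_iops = 100.0, 150        # ~$100 drive, ~150 IOPS
ram_cost, ram_iops = 400.0, 5_000_000    # ~$400 of DRAM, ~5M reads/s

dollars_per_iops_disk = disk_cost / disk_iops  # ~$0.67 per IOPS
dollars_per_iops_ram = ram_cost / ram_iops     # ~$0.00008 per IOPS
```

On $/GB disk wins by a wide margin; on $/IOPS the ordering reverses by several orders of magnitude, which is the point being made here.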

Ironically, we're headed back to disk I think (in the form of SSD/flash).

At least for the short term, the main interface to flash is going to be SSD.

I suspect that with the new 6Gbit SATA standard we won't need PCIe as badly, so the Intel SSD (and other SSD vendors) won't see as much pressure from FusionIO...

SSD isn't perfect for everyone.... it's not perfectly awesome if you're doing condition pushdown. If you're scanning data locally at 1GB/s then SSD will be maxed out..... memory can do this without a problem.

Anonymous said...

There are Microsoft Velocity and Oracle Coherence, too.