Friday, January 27, 2006

Early Amazon: Inventory cache

As with the projects at Google that have come out of 20% time, what people were supposed to be working on at Amazon was sometimes less important than what they played with on the side.

I was working on a few projects, but I wanted to step out and learn more parts of the code. Just reading code gets to be dull. I needed a specific task in mind, a purpose that forced me to shine my light into the back corners of the source.

So, in idle moments, I wandered off looking for performance optimizations. Focusing on the high traffic pages -- home page, book detail pages, search results -- I asked, where was big bad obidos spending its time?

I turned up some interesting tidbits. The first thing I found had to do with shopping carts.

When you walk into a grocery store, the first thing you probably do is grab a shopping cart. Similarly, the first thing Amazon did when someone appeared at our store was hand them a shopping cart, reserving a little bit of space in our database for them to store all their virtual loot.

However, grocery stores don't have to contend with robot hordes or other window shoppers. If they did, they would have to have a lot more shopping carts around, and almost all those shopping carts would be empty.

Given all the looky-loos, it makes more sense for Amazon to wait a bit and quickly slip a shopping cart into your hands when you grab the first thing you want to buy.
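A minimal sketch of that lazy allocation, in Python, assuming a plain in-memory dictionary stands in for the database. The class and method names (`LazyCartStore`, `view_cart`, `add_item`) are hypothetical and not Amazon's code. Browsing never creates a cart; the first `add_item` call does.

```python
class LazyCartStore:
    """Hypothetical cart store that reserves space only on the first add."""

    def __init__(self) -> None:
        # session_id -> list of item ids; stands in for the database table
        self._carts: dict[str, list[str]] = {}

    def view_cart(self, session_id: str) -> list[str]:
        # Browsing or viewing never allocates; a missing cart just looks empty.
        return list(self._carts.get(session_id, []))

    def add_item(self, session_id: str, item_id: str) -> None:
        # The cart is created here, when the shopper first picks something up.
        self._carts.setdefault(session_id, []).append(item_id)

    def allocated_carts(self) -> int:
        return len(self._carts)


store = LazyCartStore()
for visitor in ("robot-1", "robot-2", "browser-3"):
    store.view_cart(visitor)  # window shoppers and robots: no cart allocated
store.add_item("buyer-4", "book-123")  # hypothetical item id
print(store.allocated_carts())  # 1
```

With this shape, robots and window shoppers cost nothing in cart storage; only sessions that actually pick something up get a cart.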

This little change helped more than you might think. All those shopping carts add up.

But a bigger issue was the real-time availability lookups. When you looked at a book at Amazon, the site went off to the warehouse, rummaged among the shelves, and checked if we had any copies. If it turned up bupkis, it checked how quickly we could order the book. All in real time.

This turned out to be the single most expensive operation on a book detail page. Ugly business, checking availability.

But, do you really need to know the availability right now? Maybe knowing what it was N minutes ago is okay. Huh, right, cache the data. It's okay if it is a little stale.
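The idea of tolerating N-minute-old data can be sketched as a simple time-to-live cache. This is a hedged Python illustration, not the original inventory cache: `StaleOkCache`, `max_age_s`, and the `lookup_availability` stand-in for the warehouse check are all hypothetical, and the clock is injectable so the staleness window is easy to test.

```python
import time
from typing import Callable


class StaleOkCache:
    """Serves a cached value for up to max_age_s seconds before re-checking."""

    def __init__(self, lookup: Callable[[str], str], max_age_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup          # the expensive real-time check
        self._max_age_s = max_age_s    # the "N minutes" of tolerated staleness
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}

    def get(self, item_id: str) -> str:
        now = self._clock()
        entry = self._entries.get(item_id)
        if entry is not None and now - entry[0] < self._max_age_s:
            return entry[1]            # fresh enough: skip the warehouse
        value = self._lookup(item_id)  # stale or missing: do the slow lookup
        self._entries[item_id] = (now, value)
        return value


calls = []


def lookup_availability(item_id: str) -> str:
    # Hypothetical stand-in for the slow warehouse check; records each call.
    calls.append(item_id)
    return "Usually ships in 24 hours"


fake_now = [0.0]
cache = StaleOkCache(lookup_availability, max_age_s=600, clock=lambda: fake_now[0])
cache.get("book-123")  # miss: one slow lookup
cache.get("book-123")  # hit: served from the cache
fake_now[0] = 601.0    # ten minutes and a second later
cache.get("book-123")  # expired: a second slow lookup
print(len(calls))      # 2
```

One limitation of this simple version is that a shopper who arrives just after an entry expires still waits on the slow lookup, which is the delay that loading the cache preemptively is meant to avoid.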

Because I was doing this on my own time, I started playing with some less obvious ways of caching availability. Given how hard the site would hammer this cache, I thought I might try to find a way to minimize locking. I also thought I might be able to load the cache preemptively, so shoppers on the site would see no delays when parts of the cache were refreshed.
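One common way to get both properties, lock-free reads and preemptive loading, is to build each new copy of the cache off to the side and then swap it in whole. The post does not describe how the original cache was built, so treat this Python sketch as an illustration of the general technique only. `SnapshotCache`, `load_all`, and the refresh interval are hypothetical, and the sketch assumes CPython, where rebinding an attribute is an atomic reference swap.

```python
import threading
from typing import Callable, Mapping


class SnapshotCache:
    """Read-mostly cache: readers take no lock; a refresher swaps in whole snapshots."""

    def __init__(self, load_all: Callable[[], Mapping[str, str]]) -> None:
        self._load_all = load_all  # hypothetical bulk availability load
        # Preload before serving, so the first shopper never waits on a miss.
        self._snapshot: Mapping[str, str] = dict(load_all())

    def get(self, item_id: str, default: str = "unknown") -> str:
        # No lock: read the current snapshot reference once and use it.
        return self._snapshot.get(item_id, default)

    def refresh(self) -> None:
        # Build the replacement entirely off to the side, so readers never wait...
        fresh = dict(self._load_all())
        # ...then publish it with a single reference assignment (atomic in CPython).
        self._snapshot = fresh

    def start_background_refresh(self, interval_s: float) -> threading.Event:
        stop = threading.Event()

        def loop() -> None:
            # Event.wait returns False on timeout and True once stop is set.
            while not stop.wait(interval_s):
                self.refresh()

        threading.Thread(target=loop, daemon=True).start()
        return stop  # call stop.set() to end the refresher


inventory = {"book-123": "In stock"}  # hypothetical backing data
cache = SnapshotCache(lambda: dict(inventory))
inventory["book-123"] = "Ships in 2-3 days"
print(cache.get("book-123"))  # In stock (stale until the next refresh)
cache.refresh()
print(cache.get("book-123"))  # Ships in 2-3 days
```

Because readers only ever see a complete old snapshot or a complete new one, they never block on a refresh; the cost is holding two copies of the data in memory while the new one is being built.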

I hacked up something that seemed to work well. In tests, latency to a shopper on the website dropped from entirely too long to very near zero. I was starting to talk to a few other people about the prototype, asking what they thought, seeing how it could be improved.

Right about then, several other people were working on a major redesign of the Amazon site, some combination of an extreme makeover and new features. I was approached by someone who wanted to show book availability on search result pages, something that was completely impossible without caching, but would be possible if my quick prototype could be dressed up and pushed out the door. And out it went.

Of course, all of this is obsolete at this point. Back when I built the inventory cache, it was designed for one small Seattle warehouse and a single big honkin' iron webserver. The massive inventories across several huge distribution centers -- some of which can swallow thirteen football fields and come back for seconds -- combined with a switch to a cluster of commodity webservers eventually made the old cache inappropriate. It lasted well beyond its time, so long that the heroics of its youth lay forgotten under the problems of its senility.

Today, I look back at the inventory cache as just one of many examples of the benefits of time to wander. 20% time has value well beyond its proportions.

2 comments:

Anonymous said...

Your blog posts are extremely interesting. Looking forward to reading more about Amazon.

Anonymous said...

Hi Greg!

I am a graduate student from India and find your posts on Amazon *very* interesting. The 'free hand' given at the startup to anyone and everyone willing to contribute must indeed have been amazing. Your write-ups about how infectious the enthusiasm for trying to 'help the world' (buy books) was are indeed riveting. I log on to Bloglines several times a day to check whether you have posted another of your wonderful adventures at Amazon.

On a completely different note, I find the user interface at findory.com not very inviting... it is simple and minimalist, but it does not really invite the user to click this or that...

Nevertheless, I wish you all the very best with Findory.com.