Monday, December 18, 2006

Talk on eBay architecture

Randy Shoup and Dan Pritchett gave a talk on scaling eBay, "The eBay Architecture", at SD Forum 2006. The slides are available (PDF).

The parallels with Amazon are remarkable. Like Amazon, eBay started with a two-tiered architecture. Like Amazon, they split the website across a cluster of servers in the late 1990s, and soon after that they partitioned the databases.

Like Amazon, they soon ran into poor performance and had trouble even compiling their massive, monolithic binary (150MB in eBay's case, according to Randy and Dan). Like Amazon, they started a major rewrite of that monolithic binary around 2001, eventually building a services architecture on top of partitioned databases.

They even built their own search engine because "no off-the-shelf search engine met [their] needs." Amazon did that as well.

It is interesting that their new architecture basically gives up on database transactions. They say eBay has "absolutely no client side transactions", "no distributed transactions", and "auto-commit for [the] vast majority of DB writes". Instead, they apparently rely on "careful ordering of DB operations". Inconsistencies can still creep into this system, it seems, because they mention running "asynchronous recovery events" and "reconciliation batch" jobs, which, I assume, means asynchronous processes that run over the database and repair inconsistencies.
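The slides do not show code, so here is only a rough sketch of what "careful ordering" plus a "reconciliation batch" might look like, assuming a made-up auction schema (the `items` and `bids` tables, `place_bid`, and `reconcile` are all hypothetical, not eBay's). Every write auto-commits on its own. The authoritative record (the bid) is written first and the derived summary (the item's high bid) second, so a crash between the two writes leaves a stale summary that a later batch pass can recompute from the source of truth, rather than a summary pointing at a bid that does not exist.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    # isolation_level=None puts Python's sqlite3 module in autocommit mode:
    # each statement commits on its own, with no enclosing transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    # Hypothetical schema: items carries a denormalized high_bid summary.
    conn.execute(
        "CREATE TABLE items (item_id INTEGER PRIMARY KEY, "
        "high_bid INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute(
        "CREATE TABLE bids (bid_id INTEGER PRIMARY KEY, "
        "item_id INTEGER NOT NULL, amount INTEGER NOT NULL)"
    )
    return conn


def place_bid(conn: sqlite3.Connection, item_id: int, amount: int,
              crash_after_first_write: bool = False) -> None:
    # Step 1: write the authoritative record first (auto-committed).
    conn.execute("INSERT INTO bids (item_id, amount) VALUES (?, ?)",
                 (item_id, amount))
    if crash_after_first_write:
        # Simulate the process dying between two independent commits.
        raise RuntimeError("simulated crash")
    # Step 2: update the derived summary second (also auto-committed).
    # SQLite's two-argument MAX() is a scalar function here.
    conn.execute("UPDATE items SET high_bid = MAX(high_bid, ?) WHERE item_id = ?",
                 (amount, item_id))


def reconcile(conn: sqlite3.Connection) -> int:
    # Batch job: recompute each item's high bid from the bids table and
    # repair any summary that disagrees. Returns the number of rows fixed.
    rows = conn.execute(
        "SELECT i.item_id, i.high_bid, COALESCE(MAX(b.amount), 0) "
        "FROM items i LEFT JOIN bids b ON b.item_id = i.item_id "
        "GROUP BY i.item_id, i.high_bid"
    ).fetchall()
    repaired = 0
    for item_id, stored, actual in rows:
        if stored != actual:
            conn.execute("UPDATE items SET high_bid = ? WHERE item_id = ?",
                         (actual, item_id))
            repaired += 1
    return repaired


if __name__ == "__main__":
    db = open_db()
    db.execute("INSERT INTO items (item_id) VALUES (1)")
    place_bid(db, 1, 100)
    try:
        place_bid(db, 1, 150, crash_after_first_write=True)
    except RuntimeError:
        pass
    # The summary is stale (100) until the reconciliation pass runs.
    print(db.execute("SELECT high_bid FROM items WHERE item_id = 1").fetchone()[0])
    print(reconcile(db))
    print(db.execute("SELECT high_bid FROM items WHERE item_id = 1").fetchone()[0])
```

The ordering is the point of the design: because the summary can always be rebuilt from the bids, a failure only ever leaves the system in a state the reconciliation job knows how to repair.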

In all, a very interesting talk for anyone who is working or wants to work on big websites and big data. As Tim Bray said, "This ought to be required reading for everyone in this business whose title contains the words 'Web' or 'Architect'."

See also Dan Pritchett's weblog post, "You Scaled Your What?", where he mentions his talk and these slides at the end.

See also some other interesting commentary ([1] [2] [3] [4]) on this talk.

For more on the early work I did scaling Amazon's systems, see my older post, "Early Amazon: Splitting the website". If you liked that, you might also like the rest of my Early Amazon series.

For more on what big companies like eBay, Amazon, and Google need and are not getting from databases, see my previous posts, "C-store and Google BigTable" and "I want a big, virtual database".

[Slides found via Rich Skrenta]
