Thursday, November 23, 2006

Amazon crashes itself with promotion

Amazon ran a special promotion today that offered an Xbox system for $100, about 1/3 of the normal price, starting at 11am.

Broadband Reports posts about what happened:
So many people were waiting for the promotion that the entire Amazon website - not just the promotion page - sank without a trace from just before 2pm, to at least 2:12pm. The home page, the product pages, everything, were unavailable.
Sounds familiar. When I was at Amazon, every year we in engineering would try to avoid spikes in traffic, especially around peak holiday loads, and every year marketing folks would want to run some promotion specifically designed to create a mad frenzy on the site. Usually, we convinced them to change the promotion, but apparently engineering lost (or was asleep at the switch) this year.

Broadband Reports goes on to point out that this reflects badly on Amazon:
We wonder how many amazon shoppers elsewhere in the site abandoned their purchases halfway through after they found their experience destroyed by the vote rush going on in the next room ... Some people got quite irate.

The poor performance of the amazon site during the giveaway also reflects badly on the Amazon "elastic compute cloud" offering (Amazon EC2) which is designed, supposedly, to offer instant capacity to companies which need to deal with exactly this kind of sudden rush.
I don't think it quite works that way. A DDoS attack, which this effectively was, can generate far more than the 10x peak load for which the website would be designed. Even so, it is still pretty lame for Amazon to DDoS itself.
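
As a rough illustration of that capacity argument, here is a minimal sketch in Python. It is not a description of anything Amazon actually runs. The function names, the request rates, and the rule of shedding promotion-page requests first are all hypothetical; the only figure taken from the post is the 10x design multiple.

```python
def overload_factor(offered_rps: float, normal_rps: float,
                    design_multiple: float = 10.0) -> float:
    """Return offered load as a multiple of designed capacity.

    Designed capacity is assumed to be normal load times design_multiple
    (the 10x figure from the post). A result above 1.0 means the site is
    being asked to serve more than it was built for.
    """
    capacity_rps = normal_rps * design_multiple
    return offered_rps / capacity_rps


def should_shed(is_promo_request: bool, offered_rps: float, normal_rps: float,
                design_multiple: float = 10.0) -> bool:
    """Hypothetical policy: once over capacity, reject promotion-page
    requests so that the rest of the site keeps responding."""
    over = overload_factor(offered_rps, normal_rps, design_multiple) > 1.0
    return is_promo_request and over


# Made-up numbers: 1,000 requests/sec normally, 30,000 during the rush.
factor = overload_factor(offered_rps=30_000, normal_rps=1_000)
print(f"Offered load is {factor:.1f}x designed capacity")
print("Shed promo request:", should_shed(True, 30_000, 1_000))
print("Shed checkout request:", should_shed(False, 30_000, 1_000))
```

With these invented numbers the rush is three times what a 10x design could absorb, which is the sense in which a flash promotion behaves like a DDoS. Shedding only the promotion traffic is one possible way to keep shoppers elsewhere on the site unaffected.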

It appears the contest is running again next week with the same structure. I wonder if Amazon will crash itself again?

Update: It appears Amazon is looking at changing the structure of this promotion to prevent another brownout. Currently, there is a message up that says, "Due to the popularity of Amazon Customers Vote, we are extending the Week 2 voting period. Customers who cast a vote will be sent an e-mail notification of the new sale date."

Update: Mike at TechDirt reports that "Amazon Cries 'Uncle' On Promotion Traffic" by changing the rules to prevent another outage.


Anonymous said...

How else could Amazon get people to stress-test their network (and their entire system) except by engineering a massive overload at a predictable time?

At a relatively small cost they can see exactly how the system reacts to high loads and where the points of failure are.

burtonator said...

Does this mean that S3 doesn't scale? :)

Anonymous said...

Any word on whether this affected Amazon's S3 or EC2 services?

Anonymous said...

Amazon doesn't use EC2 and S3 for its website. Also, you don't know where the bottleneck was - could have been an Oracle bug for all we know.

Anonymous said...

I was one of the many people who tried to get at the Xbox360 deal. What amazed me was that other parts of the site were down as well as the promotion page. I would have expected better from them.

One thing I have noticed is that, when the website is down, Amazon's web services might still be responding. Last year, when I was doing a project with ECS, and the website was down, I was still able to query ECS. With regard to EC2, I have no idea...

Anonymous said...

I'm really curious to know what happened to EC2 and S3. I recently asked one of the leaders of the web services team why developers and web site owners should trust Amazon with their sites, data, etc. when Amazon was unwilling to provide any details on how the data was stored/served/etc or any guarantees in terms of uptime and responsiveness. His response was a smug "Trust us" and he pointed out that the main site ran on identical architecture. Well, clearly the site is not invulnerable. Furthermore, I'm curious to know whether Amazon starts flipping its "instant" capacity from those services over to the main site when they need it.

AdamTest said...

Forgive me for being perhaps overly blunt, but this is a frustratingly lame promotion for reasons way beyond the (obviously also important) DDOS'ing.

- It rewards speed over anything else. Dialup users are hosed. People with slower computers that can't refresh a browser page a zillion times a minute are SOL.

- Two out of two weeks, geeks are winning the vote. Hey, I *am* a geek, but I'm hella annoyed that no matter what the other three items are, the geek item is gonna win every time. Amazon Prime for 50 cents? Probably would still lose out to a geek toy. And you can't tell me that this accurately reflects Amazon's current or desired shopping demographic.

- 1000 items. How lame is that! That is a tiny, tiny drop in the bucket of goodwill, as someone else noted.

* * *

Seriously, is this all Amazon's marketing folks got up their sleeves for holiday time? Methinks there needs to be some more creative ideas available out there...