Sunday, January 21, 2007

I want differential pricing for Amazon EC2

One thing mystifies me about Amazon EC2. Why doesn't the service charge different prices for low priority usage or usage at non-peak times?

The Amazon Elastic Compute Cloud (Amazon EC2) allows people to rent virtual servers by the hour, charging $0.10 per hour and $0.20 per gigabyte transferred.

As Amazon says in their description of EC2:
Amazon EC2 enables you to increase or decrease capacity within minutes, not hours or days. You can commission one, hundreds or even thousands of server instances simultaneously.

Amazon EC2 passes on to you the financial benefits of Amazon's scale. You pay a very low rate for the compute capacity you actually consume.

This frees you from many of the complexities of capacity planning, transforms what are commonly large fixed costs into much smaller variable costs, and removes the need to over-buy "safety net" capacity to handle periodic traffic spikes.
For this to work out for Amazon, I would think Amazon also needs to avoid coordinated "periodic traffic spikes" in usage of EC2. Otherwise, Amazon itself will need to over-buy "safety net" capacity and will see low utilization rates on its cluster.

With the current pricing structure, there is no incentive to avoid peak load times. In fact, if I were using Amazon EC2 for a batch job, I probably would request my servers during US work hours, the same time EC2 is under heavy load from web servers or other real-time tasks. There is no reason to do otherwise.

I think EC2 should offer a lower rate for low priority requests for servers. Servers at this rate could be pulled from the client at any time for higher priority jobs.

Pricing could be very low because idle servers are worthless to Amazon. If the price point is near the marginal cost of the server time, this service would be attractive to many.

The benefits for Amazon are also apparent. There would be less need to over-buy capacity since capacity could be regained from low priority requests. Utilization would increase, Amazon would get paid for what would otherwise be idle time, and the economics of EC2 would improve.
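
To make the utilization argument concrete, here is a rough Python sketch. The fleet size, the hourly demand curve, and the batch backlog are all made-up numbers for illustration, not anything Amazon has published, and `utilization` is a hypothetical helper. It assumes high priority requests always win and low priority work only runs on whatever is idle in a given hour.

```python
def utilization(capacity, high_priority_demand, low_priority_backlog=0):
    """Return (fleet utilization, low priority server-hours served).

    capacity: servers available each hour (hypothetical fleet size)
    high_priority_demand: servers requested per hour at the normal rate
    low_priority_backlog: server-hours of batch work willing to run when idle
    """
    used = 0
    served_low = 0
    for demand in high_priority_demand:
        high = min(demand, capacity)   # high priority always gets servers first
        idle = capacity - high         # whatever is left over this hour
        low = min(idle, low_priority_backlog - served_low)
        served_low += low
        used += high + low
    total = capacity * len(high_priority_demand)
    return used / total, served_low


# Invented demand curve: quiet overnight, busy during US work hours.
demand = [20] * 8 + [90] * 10 + [40] * 6
capacity = 100  # sized for the peak plus a safety margin

base, _ = utilization(capacity, demand)
filled, batch_hours = utilization(capacity, demand, low_priority_backlog=1000)

print(f"normal-rate usage only:  {base:.1%}")
print(f"with low priority fill:  {filled:.1%} ({batch_hours} batch server-hours)")
```

With these invented numbers, utilization goes from roughly 54% to roughly 96%, and all 1,000 server-hours of batch work fit into capacity that would otherwise sit idle.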

I have a lot of big data processing tasks -- both for Findory and for side interests -- that fit a batch profile. I am sure other potential EC2 users do as well.

Amazon itself has many batch jobs that fit this profile, including web server log processing, personalization builds, search indexing, and data mining. All of these could be done on borrowed EC2 servers rather than using more expensive dedicated hardware.

Going a step further, I suspect many of these low priority batch jobs could benefit from a different API to Amazon EC2.

Rather than request servers and then manually configure them myself, what I really want is to be able to request a MapReduce job and kick off hundreds or thousands of servers at low priority. Processing on servers that go down or are pulled from me during the job should be restarted elsewhere. At the end, I should get the completed data file.

It should be something like: "Here's my data file, some MapReduce code, and $10. Let me know when you're done."
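
As a rough illustration of the behavior I have in mind, here is a toy, single-process Python sketch. It is not how Google's MapReduce or any real EC2 service works. The `Preempted` exception, `run_on_borrowed_server`, and the preemption probability are all hypothetical stand-ins for a low priority server being pulled away mid-task. The one idea it shows is that each map or reduce task is simply restarted until it finishes somewhere, so the final output does not depend on how often servers were reclaimed.

```python
import random
from collections import defaultdict


class Preempted(Exception):
    """Raised when a (simulated) low priority server is reclaimed mid-task."""


def run_on_borrowed_server(task, data, rng, preempt_prob):
    # Hypothetical stand-in for running one task on a low priority server.
    if rng.random() < preempt_prob:
        raise Preempted()
    return task(data)


def run_with_restarts(task, data, rng, preempt_prob):
    # Restart the task until it completes; tasks must be safe to re-run.
    attempts = 0
    while True:
        attempts += 1
        try:
            return run_on_borrowed_server(task, data, rng, preempt_prob), attempts
        except Preempted:
            continue


def map_reduce(chunks, mapper, reducer, preempt_prob=0.3, seed=0):
    rng = random.Random(seed)
    attempts = 0
    groups = defaultdict(list)
    for chunk in chunks:  # map phase: one task per input chunk
        pairs, tries = run_with_restarts(mapper, chunk, rng, preempt_prob)
        attempts += tries
        for key, value in pairs:
            groups[key].append(value)
    results = {}
    for key, values in groups.items():  # reduce phase: one task per key
        reduced, tries = run_with_restarts(
            lambda kv: reducer(kv[0], kv[1]), (key, values), rng, preempt_prob
        )
        attempts += tries
        results[key] = reduced
    return results, attempts


def word_mapper(text):
    return [(word, 1) for word in text.split()]


def sum_reducer(key, values):
    return sum(values)


counts, attempts = map_reduce(["a b a", "b c", "a"], word_mapper, sum_reducer)
print(counts, "after", attempts, "task attempts")
```

The word counts come out the same whatever the preemption rate is; only the number of attempts, and so the time to finish, changes. That is the property that makes this kind of job a good fit for cheap, interruptible servers.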

Powerset is already running a MapReduce clone on Amazon EC2, which shows both that this service is possible and that there is demand for it.

It would be MapReduce for the masses. No longer would you have to be at Google to do easy data processing on a massive cluster. You could borrow Amazon's EC2 cluster any time you want.

Update: Nearly three years later, Amazon launches differential pricing for EC2.

6 comments:

fishzle said...

If I were Amazon (which I'm not), I'd have a plan to introduce differential pricing at some stage. But not now.

I doubt there are enough users on EC2 to justify the complexity. Differential pricing would (as you've noted) distribute the load more evenly around the clock, but when the peak load at the moment is 5% (a guess), it's not worth the effort.

I expect that Amazon have a huge amount of CPU/disk/bandwidth available, and that EC2 is still but a minor consumer in the grand scheme.

Wait and see. If EC2 is still around and growing in 12 months' time, then we may see differential pricing emerge.

Face it, you just want a price break on $0.10/h, don't you?

Eran Sandler said...

Greg, what you're saying here is actually an interesting work plan for a company to start and run.

One could take the implementation of Hadoop (http://lucene.apache.org/hadoop/about.html), create an image for EC2, take payment from other people, and run their jobs for them, notifying them by email or something similar when the operation is done and making the result available for download. The result would not need to be served from the same EC2 images that run the job: you can optimize by putting it back on S3 and keeping it there for a week or so, since transfers between EC2 and S3 are not charged.

It's actually an interesting and feasible business model :-)

Greg Linden said...

Good point, Eran. However, I think Amazon would have a huge price advantage if they implemented this themselves.

An external provider of the service would have to pay Amazon at the normal rate.

Amazon has the information needed to run these jobs only when there is idle capacity on the EC2 cluster, so the pricing could be much lower.

Eran Sandler said...

It seems to me that at this point Amazon is more about providing the basic infrastructure and not the applications on top of it.

They could expose some of these APIs or have a deluxe partner program that will get extra data such as idle time so they can better utilize the EC2 grid.

Anonymous said...

Hi Greg, good post. I will make sure that the development team sees it.

Anonymous said...

Note that this is exactly the model that allowed electric utilities to scale profitably and to offer low-cost service to consumers as well as large industrial users, given their huge capital costs and inability to inventory generated power. Likewise with cellphone providers and "evening and weekend" pricing.

N-tier pricing with relaxed timing constraints could allow AWS to offer N instances guaranteed to be available for M hours sometime over the next 24, 36, or 48 hours. I might not care when this job completes, given a low enough price point for usage.
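
A minimal Python sketch of that kind of deadline-based placement follows. It assumes the provider has an hourly forecast of idle servers, which is a made-up input here, and that the M hours do not need to be contiguous. `schedule_flexible_job` is a hypothetical name, not an AWS API.

```python
def schedule_flexible_job(idle_forecast, n_instances, m_hours, window_hours):
    """Pick m_hours hours inside the window with at least n_instances idle.

    Prefers the hours with the most spare capacity. Returns a sorted list of
    hour offsets from now, or None if the window cannot fit the job.
    """
    window = idle_forecast[:window_hours]
    candidates = [h for h, idle in enumerate(window) if idle >= n_instances]
    if len(candidates) < m_hours:
        return None
    # Stable sort: among equally idle hours, earlier ones are kept first.
    candidates.sort(key=lambda h: window[h], reverse=True)
    return sorted(candidates[:m_hours])


# Invented forecast of idle servers for the next 24 hours.
forecast = [80] * 8 + [10] * 10 + [60] * 6

print(schedule_flexible_job(forecast, n_instances=50, m_hours=10, window_hours=24))
print(schedule_flexible_job(forecast, n_instances=50, m_hours=10, window_hours=12))
```

With this forecast, the 24-hour window places the job in hours 0 through 7 plus hours 18 and 19, while the 12-hour window returns None because only eight hours have 50 idle servers. A longer window gives the provider more room to fit the job into idle time, which is the case for pricing it lower.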