Friday, April 10, 2009

MapReduce using Amazon's cluster and differential pricing

Amazon recently launched Elastic MapReduce, a web service that lets people run MapReduce jobs on Amazon's cluster.

Elastic MapReduce appears to handle almost all the details for you. You upload data to S3, then run a MapReduce job. All the work of firing up EC2 instances, getting Hadoop on them, getting the data out of S3, and putting the results back in S3 appears to be done for you. Pretty cool.

Even so, I have a big problem with this new service: the pricing. MapReduce jobs are batch jobs that could run at idle times on the cluster, but there appears to be no effort to run them during idle times, nor is there any discount on the pricing. In fact, you actually pay a premium for MapReduce jobs above the cost of the EC2 instances used during the job.

It is a huge missed opportunity. Smoothing out peaks and troughs in cluster load improves efficiency. Using the idle time of machines in Amazon's EC2 cluster should be essentially free. The hardware and infrastructure costs are all sunk. In a non-peak time, only the marginal cost of the additional electricity used by a busy box over an idle box is a true cost.

What Amazon should be doing is offering a steep discount on EC2 pricing for interruptible batch jobs like MapReduce jobs, then running those jobs only in the idle capacity of non-peak times. This would allow Amazon to smooth the load on their cluster and improve utilization while passing on the savings to others.

For more on this topic, please see also my Jan 2007 post, "I want differential pricing for Amazon EC2".

Please see also Amazon VP James Hamilton's recent post, "Resource Consumption Shaping", which also talks about smoothing load on a cluster. Note that James argues that the marginal cost of making an idle box busy is near zero because of the way power and network use are billed (at the 95th percentile).
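To make the 95th-percentile point concrete, here is a minimal Python sketch. The load numbers, the batch ceiling, and the nearest-rank percentile method are all assumptions chosen for illustration, not any provider's actual billing data or formula. It suggests that filling off-peak troughs with batch work can leave the billed 95th-percentile level unchanged.

```python
def percentile_95(samples):
    """Nearest-rank 95th percentile of a list of usage samples."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) using integer math
    return ordered[rank - 1]

# Hypothetical hourly load for one day (arbitrary units):
# 8 quiet hours, 12 busy hours, 4 peak hours.
baseline = [30] * 8 + [90] * 12 + [100] * 4

# Assumption: batch jobs are only allowed to raise load up to this level.
BATCH_CEILING = 90
with_batch = [max(load, BATCH_CEILING) for load in baseline]

billed_before = percentile_95(baseline)
billed_after = percentile_95(with_batch)
extra_work = sum(with_batch) - sum(baseline)

print(billed_before, billed_after, extra_work)
```

In this made-up day, the billed level is the same with and without the batch work, while the quiet hours absorb 480 additional unit-hours of load.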

For some history on past efforts to run Hadoop on EC2, please see my Nov 2006 post, "Hadoop on Amazon EC2".

Update: Eight months later, Amazon launches differential pricing for EC2.


Matt said...

We were actually getting InsufficientInstanceCapacity errors when trying to start additional instances on April 8th. We were nowhere near our 100 instance limit, but rather, tech support indicated that the data center ran out of instances. Here's their solution:

"There are essentially three things you can do. Wait a little while and try again, try another availability zone, or buy reserved instance space from Amazon."

Your idea is right on... I can understand if they want to keep things simple now and introduce the lower prices a year from now, but to have the prices be higher for batch jobs? That's weird.

Unless they're already running their own interruptible batch jobs on the unused capacity of individual instances that we're already paying for. The individual Hadoop servers are likely to have no leftover capacity, hence the price premium... but I have no clue really, just speculation.

Peter said...

Good points Greg. I expect this was probably the simplest way to launch the initial version of the service, but it would be great to have a lower cost version that ran at low priority on idle EC2 capacity.

Note that with the current approach you are spinning up a private Hadoop cluster for each job, so you have full access to each underlying instance for any custom installs, mounting EBS volumes, etc. Check out the "Grep the Web" series of articles and you will notice some similarities between the design of the two services.

Andrew Hitchcock said...

Hi Greg,

I'd just like to say thanks for the blog post and feedback. Before launch I was hoping we'd get written up on your blog.


Anonymous said...

Right on. I was initially excited by the news because I thought the incremental price for M/R is the whole price.

There is absolutely no incentive to use the new offering for people who actually want to use M/R on EC2. They are just charging the incremental price for an integrated web UI.

Paco Nathan said...

Good points, Greg -

Pricing for off-peak makes sense, except that estimating run-time costs for a MR job with very large data is a hard problem. A job which starts during off-peak can easily overlap into peak-time. IMHO, the issue is that Hadoop itself is unaware of run-time costs from a workflow perspective, and requires some higher level abstraction to be effective, e.g. Cascading -- if that could estimate MR job costs vs. cluster utilization trends, and schedule intelligently.

Another issue is with large-scale, commercial Hadoop apps on EC2 which have a critical batch window. There's a risk unless one can touch the metal. Our shop ran large Hadoop jobs on EC2 (100+ m1.xl, several hours), while most of our peers kept MR on their own hardware. Potential issues include instrumentation, cost of troubleshooting by d/l all the logs involved, rack locality, block level replication on name nodes, high variance of S3 write times, etc.

Even more direct was the problem (and dollar value) of launching a large count of EC2 nodes at once -- a very different use case than a web farm spinning up or down based on traffic. You pay for node_1 in hourly increments while waiting for node_N to complete its launch. Then after the long MR workflow completes, there are problems trying to dump resulting data out to S3, requiring many retries and a longer time window for the full cluster. With large N used in daily jobs, the costs for setup and tear-down become significant.
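A rough sketch of that setup and tear-down overhead, in Python. Everything here is a hypothetical assumption for illustration: the staggered launch times, the job and tear-down durations, and the rule that each node is billed per started hour from its own launch until the whole cluster shuts down.

```python
def billed_node_hours(start_minutes, job_minutes, teardown_minutes=0):
    """Total billed node-hours for a cluster whose job cannot start
    until the last node is up. Each node is billed from its own start
    until cluster shutdown, rounded up to whole hours (assumed model)."""
    last_ready = max(start_minutes)
    end = last_ready + job_minutes + teardown_minutes
    # -(-x // 60) is ceiling division for non-negative integers
    return sum(-(-(end - start) // 60) for start in start_minutes)

# Hypothetical: 100 nodes coming up five per minute over 20 minutes.
staggered = [i // 5 for i in range(100)]

# Hypothetical: a 170-minute job plus 15 minutes of S3 write retries.
actual = billed_node_hours(staggered, job_minutes=170, teardown_minutes=15)

# Idealized case: all nodes ready instantly, no tear-down delay.
ideal = billed_node_hours([0] * 100, job_minutes=170)

print(actual, ideal, actual - ideal)
```

With these made-up numbers, the launch stagger and tear-down push every node into a fourth billed hour, adding 100 node-hours to a job that would otherwise cost 300.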

Using special AMIs with restricted access, AWS has opportunities to address some issues: setup cost, tear-down cost, instrumentation, name-node redundancy, rack locality. Not that they *have* resolved those issues completely, but AWS is in a position to "touch the metal" and mitigate risk for large, mission-critical batch apps based on MapReduce -- whereas an AWS customer alone cannot. Those issues make the difference for an architecture being cost-effective or not. I find that to be the core value of Elastic MapReduce pricing.

IMHO, a big win for EMR will be more along the lines of Hadoop On Demand. Consider the case of N nodes running an MR job, and say 1000 reduce tasks, where 997 reduce tasks finish in 2 minutes, but the last few take 2 hours. That's relatively common. Even with speculative execution, much of the cluster has nearly 0 utilization for those last 2 hourly increments of cost.
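The arithmetic behind that straggler case can be sketched in a few lines of Python. This assumes, purely for illustration, that there is one reduce slot per task, that all tasks start together, and that exactly 3 of the 1000 tasks are the slow ones.

```python
def slot_utilization(task_minutes, slots):
    """Fraction of slot-time spent doing work, assuming every task
    starts at time zero in its own slot (requires slots >= tasks)."""
    if slots < len(task_minutes):
        raise ValueError("this simplified model needs one slot per task")
    wall_clock = max(task_minutes)   # the cluster is held until the last task ends
    busy = sum(task_minutes)         # slot-minutes actually spent working
    return busy / (slots * wall_clock)

# 997 reduce tasks finish in 2 minutes; the last 3 take 2 hours.
tasks = [2] * 997 + [120] * 3

utilization = slot_utilization(tasks, slots=1000)
print(f"{utilization:.1%}")
```

Under these assumptions the reduce slots are busy only about 2% of the time they are paid for, which is the overhead a multi-tenant service could try to reclaim.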

A service based on pay-as-you-use HOD could leverage the multi-tenancy of MR jobs to reduce that overhead.

Wytse said...

Cheaper runtime costs make sense if you don't want guarantees about start/end times. This is probably not the case for everybody?

Isn't the cost increase also due to no longer needing a certain expertise or familiarity with Hadoop? I know of at least one company with Hadoop specialists for the cloud, the most cost-effective ways of doing this, etc. So instead of just getting machines, you get machines preconfigured with Hadoop, saving you this overhead. I don't see why this couldn't cost more.

It does make me wonder if you can actually save money by using your own efficient way of using Hadoop/EC2 versus Amazon's ready-made Hadoop. I.e., is it worth getting a specialist