Friday, October 13, 2006

Talk on Google AdWords and AdSense

Shiva Shivakumar (Director, Google Kirkland) who "led AdSense through beta, launch and hypergrowth" gave a talk at University of Washington Computer Science called "Google Ad Systems" (video available).

The talk is a light introduction to AdWords and AdSense. It mostly covers the history of the products with some brief discussions of the challenges around relevance, scale, optimal auction pricing, and click fraud.

The talk is worthwhile, a good overview of the work involved in building a system like AdSense and AdWords.

I tend to follow this stuff fairly closely, so the only thing that was new to me in the talk was hearing that AdWords is still running on top of a massive MySQL deployment. AdSense, in contrast, is running on top of the Google data infrastructure, GFS and Bigtable. Shiva was not clear about whether the difference was merely due to legacy issues or whether there is something special about the AdWords data access patterns that makes MySQL preferable.

Shiva's talk unfortunately only touches briefly on auction theory and click fraud issues. If you are interested in more details there, you might dive into Jan Pedersen's talk and a related paper I discussed in an earlier post.

5 comments:

burtonator said...

From everyone I've talked to at Google they plan on moving away from MySQL. It just isn't meant for huge datasets.

If you have a somewhat reasonable dataset (less than 50G) and decent hardware and are mostly READ based then you might be ok.....

burtonator said...

I still want an OSS bigtable though :)

John K said...

Kevin -

Memcached distributed across Amazon EC2 instances is amazingly powerful and affordable. It's close to an open source bigtable...

Greg Linden said...

Is it, John? I'm not sure EC2 is that great of a deal.

Amazon EC2 is ($.10 * 24 * 30) = $72/month not including data transfer at $.20/G for a shared virtual box equivalent to a 1.7 GHz machine with 1.75G RAM.

I can lease a dedicated 3 GHz server with 1G RAM for $80/month or dual 2.2 GHz with 4G RAM for $265/month. And that box can be on the same LAN as my other servers, unlike the EC2 boxes.

I'm not saying Amazon EC2 is a bad deal, but it doesn't seem "amazingly powerful and affordable" compared to alternatives.

Am I wrong on this? Is EC2 a better deal than it looks?
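As a rough check on the arithmetic in this comment, here is a minimal Python sketch. The prices are the ones quoted above; the 30-day month, the function name, and the 100 GB transfer volume are assumptions made only for illustration.

```python
HOURLY_RATE = 0.10      # EC2 price per instance-hour, as quoted above
TRANSFER_PER_GB = 0.20  # EC2 data transfer price per GB, as quoted above
LEASED_MONTHLY = 80.00  # the cheaper leased dedicated server quoted above


def ec2_monthly_cost(transfer_gb=0.0, days=30):
    """Dollars per month for one EC2 instance running around the clock."""
    return HOURLY_RATE * 24 * days + TRANSFER_PER_GB * transfer_gb


base = ec2_monthly_cost()
with_transfer = ec2_monthly_cost(transfer_gb=100)  # 100 GB is a made-up volume

print(f"EC2, no transfer:       ${base:.2f}/month")
print(f"EC2, 100 GB transfer:   ${with_transfer:.2f}/month")
print(f"Leased dedicated box:   ${LEASED_MONTHLY:.2f}/month")
```

Under these assumptions the instance alone comes to about $72 a month, and any meaningful transfer volume pushes it past the $80 lease.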

John K said...

Greg,

No, you are right, but that wasn't really what I meant.

Memcached was more the object of my superlatives rather than EC2.

Memcached provides the bigtable-like features.

EC2 is a good deal - not a great one. It's a good deal mainly because of the high amount of memory (1.7GB), especially when using Memcached.

I agree that EC2's pricing (once you include the bandwidth) isn't revolutionary or even much cheaper than bargain dedicated places. (BTW where do you get that $80/month deal?)

The real power of EC2 is the ease of expansion.

I can clone my EC2 box and start running up to 20 instances of the same thing in under 1/2 day. THAT'S NOT REPLICATABLE under the lease model, in my experience.

That's the real benefit of EC2.

And when you imagine 20 boxes x 1.5 GB dedicated to Memcached, you have a 30 GB in-memory cache... for only $2/hr :) Not bad...
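The cluster numbers in this comment can be reproduced with a short Python sketch. The instance count, per-box cache size, and hourly rate come from the thread; the function name is hypothetical, and the calculation ignores data transfer charges.

```python
def cluster_summary(instances, cache_gb_per_instance, hourly_rate):
    """Return (total cache in GB, total dollars per hour) for a cache cluster."""
    total_cache_gb = instances * cache_gb_per_instance
    hourly_cost = instances * hourly_rate
    return total_cache_gb, hourly_cost


# 20 instances, 1.5 GB of each given to Memcached, $0.10 per instance-hour
cache_gb, cost_per_hour = cluster_summary(20, 1.5, 0.10)

print(f"Total cache: {cache_gb:.0f} GB")
print(f"Cost: ${cost_per_hour:.2f}/hour")
```

Left running all month, the same cluster would cost 20 times the single-instance monthly figure, so the hourly number is most attractive when the extra boxes are only needed for short bursts.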