Friday, November 12, 2010

An update on Google's infrastructure

Google Fellow Jeff Dean gave a talk to Stanford's EE380 class that was full of fascinating details on Google's systems and infrastructure. Krishna Sankar has a good summary along with a link to video (Windows Media) of the talk.

There are some fun statistics in there. Most amazing are the improvements in data processing, which have gotten Google to the point that, in May 2010 alone, 4.4M MapReduce jobs consumed 39k machine-years of computation and processed nearly an exabyte (1k petabytes) of data. That is a remarkable amount of data munching going on at Google.

The talk is an updated version of Jeff's other recent talks such as his LADIS 2009 keynote, WSDM 2009 keynote, and 2008 UW CS lecture.

[HT, High Scalability, for the pointer to Krishna Sankar's notes]

Monday, November 08, 2010

More on why paywalls fail

Felix Salmon on why paywalls fail:
It’s not just that readers don’t see the value in paying for content when something “similar” can be found elsewhere. It’s also that there is positive extra value in reading free content, since it becomes much easier to share that content via email or blogs or Facebook or Twitter, you don’t need to worry about following links or running into paywalls, and in general you know that the site will play well with others on the open web.

If Newsday puts up a paywall and it fails, is that because readers can find content similar to Newsday’s elsewhere for free? Yes, in part. But it’s also because the people who would otherwise visit have lots of other things they also like to do. They like to spend time in Farmville, or they want to watch a video of a dog skateboarding, or they want to see their house on Google Earth, or they want to go walk their dog. These aren’t people who need certain information and are going to seek it out at the lowest cost; they’re just people who would visit Newsday’s website if it was free, but won’t if it isn’t.

That’s why gateways and paywalls are such problematic things, online: they’re a bit like that crappy VIP room in the back of the nightclub which is much less pleasant than the big main space. You might wander in there from time to time if it’s free, but if you need to buy an expensive bottle of Champagne to do so, forget it. There’s lots of other stuff to do, both online and off. And so the walled-off areas of the internet simply get ignored.

It is not just that content has to be uniquely valuable for readers to think it worth clearing the hurdle of a paywall. It is also that the paywall itself makes the content less valuable, because it is a hassle for every reader, including anyone who wants to share an article with a colleague who cannot get past it.

I would add that, even ignoring the value of sharing, the hurdle of a paywall often seems to be underestimated. As described in Dan Ariely's Predictably Irrational ([1] [2]), among other places ([3]), free has no transaction costs, no risk of loss, and great appeal. Charging anything, anything at all, creates transaction costs and risk, to the point that the vast majority of people will not pay unless the perceived value is obvious and obviously high.