When Netflix announced its prize in October, [CEO Reed] Hastings said he didn't necessarily expect contestants to make a lot of quick progress. While the prize money adds excitement, I think most of the enthusiasm from the research community is simply from having access to such a massive data set.
Computer scientists say that Cinematch, along with Amazon's recommendation system, was already one of the most sophisticated. "We thought we built the best darn thing ever," Mr. Hastings said.
But Mr. Hastings underestimated the power of an open competition. Within days, many of the top people in a field known as machine learning were downloading the 100 million movie ratings Netflix had made public.
The experts have since been locked in a Darwinian competition to build a better Cinematch, with the latest results posted on a leader board at Netflix's Web site.
With four and a half years to go in the contest, [the lead team] is already 6.75 percent better than Cinematch. And Netflix hasn't had to pay for their time.
In effect, the company "has recruited a large fraction of the machine learning community for almost no money," as [Geoffrey] Hinton, [a University of] Toronto [Computer Science] professor, put it.
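The excerpt does not say how "better" is measured. The contest scored entries by root mean squared error (RMSE) on held-out ratings, so a figure like 6.75 percent is a relative reduction in RMSE against Cinematch. Here is a minimal sketch of that arithmetic; the function names and the toy ratings are my own illustration, not contest data or Netflix's code.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def percent_improvement(baseline_rmse, candidate_rmse):
    """Relative reduction in RMSE versus the baseline, in percent."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse


# Toy held-out ratings, made up for illustration only.
actual = [4, 3, 5, 2, 4]
baseline = [3.5, 3.5, 4.0, 3.0, 3.5]   # stand-in for a Cinematch-like baseline
candidate = [3.8, 3.2, 4.5, 2.5, 3.9]  # stand-in for a contestant's model

base_err = rmse(baseline, actual)
cand_err = rmse(candidate, actual)
print(f"baseline RMSE:  {base_err:.4f}")
print(f"candidate RMSE: {cand_err:.4f}")
print(f"improvement:    {percent_improvement(base_err, cand_err):.2f}%")
```

Because the improvement is relative to the baseline's error, small absolute drops in RMSE can translate into percentages like the one quoted above.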
Until Netflix released its movie ratings data for this contest, the largest data sets available for experimenting with and evaluating recommender systems were the MovieLens and EachMovie data sets. Those data sets are two orders of magnitude smaller.
Netflix made 100M ratings by 480k customers over roughly 18k titles available to researchers. A data set of that size simply was not available until now.
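To put the size gap in rough numbers, here is a back-of-the-envelope sketch. The Netflix figures are the rounded ones quoted above; the ratings counts for the older data sets are my own approximate assumptions (about 1M for MovieLens and about 2.8M for EachMovie) and are not taken from the contest materials.

```python
import math

NETFLIX_RATINGS = 100_000_000  # "100M ratings", rounded
NETFLIX_CUSTOMERS = 480_000    # "480k customers", rounded

# Assumption: approximate ratings counts for the older public data sets.
OLDER_DATA_SETS = {
    "MovieLens": 1_000_000,
    "EachMovie": 2_800_000,
}


def orders_of_magnitude(larger, smaller):
    """How many powers of ten separate two counts."""
    return math.log10(larger / smaller)


ratings_per_customer = NETFLIX_RATINGS / NETFLIX_CUSTOMERS
print(f"average ratings per customer: {ratings_per_customer:.0f}")

for name, ratings in OLDER_DATA_SETS.items():
    gap = orders_of_magnitude(NETFLIX_RATINGS, ratings)
    print(f"Netflix vs. {name}: about {gap:.1f} orders of magnitude more ratings")
```

Under these assumed counts the gap is between one and a half and two orders of magnitude, depending on which older data set is used for the comparison.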
This opens up new opportunities for research on recommender algorithms. Not only are there considerable challenges in scaling recommender algorithms to big data, but also, as Googler Peter Norvig points out, we may have more to learn from improving our ability to work with massive training data than we do from twiddling algorithms running over small data.
Yes, the money and visibility of the Netflix Prize are motivators, I am sure. But there is also excitement from getting access to big data that previously was available only inside companies like Amazon, Netflix, Yahoo, or Google.
See also my original post on the Netflix contest, "Netflix offers $1M prize for improved recs".
See also a Sept 2006 talk (PDF) by Netflix's VP of Recommendation Systems Jim Bennett that has details on their Cinematch recommender system.