Monday, October 02, 2006

Netflix offers $1M prize for improved recs

This is interesting. Netflix is offering $1M in a contest to anyone who can improve the predictive accuracy of their recommendation engine by 10%.

From their Rules page:
We're quite curious, really. To the tune of one million dollars.

We've developed our world-class movie recommendation system: Cinematch. Its job is to predict whether someone will enjoy a movie based on how much they liked or disliked other movies. We use those predictions to make personal movie recommendations based on each customer's unique tastes. And while Cinematch is doing pretty well, it can always be made better.

Now there are a lot of interesting alternative approaches to how Cinematch works that we haven't tried. Some are described in the literature, some aren't. We're curious whether any of these can beat Cinematch by making better predictions. Because, frankly, if there is a much better approach it could make a big difference to our customers and our business.

So, we thought we'd make a contest out of finding the answer. It's "easy" really. We provide you with a lot of anonymous rating data, and a prediction accuracy bar that is 10% better than what Cinematch can do on the same training data set. (Accuracy is a measurement of how closely predicted ratings of movies match subsequent actual ratings.)

If you develop a system that we judge most beats that bar on the qualifying test set we provide, you get serious money and the bragging rights. But (and you knew there would be a catch, right?) only if you share your method with us and describe to the world how you did it and why it works.

Serious money demands a serious bar. We suspect the 10% improvement is pretty tough, but we also think there is a good chance it can be achieved. It may take months; it might take years.

There is no cost to enter, no purchase required, and you need not be a Netflix subscriber. So if you know (or want to learn) something about machine learning and recommendation systems, give it a shot. We could make it really worth your while.
Sounds like fun. I wonder how much time I'll end up wasting on this one.

If you are thinking of entering the contest, you might be interested to know that much of the Internet Movie Database (IMDb) database is available for download. Another good source for movie content is Amazon Web Services.

[Contest found via Pete Abilla and John Krystynak]

Update: I should explicitly point out that this Netflix data is by far the largest ratings data set available to the research community. Most work on recommender systems outside of companies like Amazon or Netflix has had to make do with the relatively small 1M rating MovieLens data or the 3M EachMovie data set. This Netflix data set is 100M ratings. It will be enormously useful for recommender system research.

Update: The comments on this post are starting to get pretty interesting.

Update: On the idea of using external movie data, Ilya Grigorik published data linking the Netflix movie ids to features extracted from IMDb data.


John K said...

You know what I think is the major problem with the current system - and the reason it will be hard to improve:

"The ratings are on a scale from 1 to 5 (integral) stars."

I don't think there's enough granularity in there. I save 5 stars for only my very favorite movies of all time. And I rarely rent those.

1 or 2 stars is for a really bad movie, and I rarely rent or rate those.

That leaves, basically 3 or 4 stars. I think about the consequences in my future recommendations before giving 4 stars...

Greg, I wonder if the recommendation literature deals with this issue of low dimensionality in the ratings? Is it a well understood problem?

Greg Linden said...

Good point, John. The problem is that more granularity requires more work from users.

In fact, in most data I have seen, most people tend to only use the extremes, 1 and 5 ratings, and avoid the middle. It will be interesting to see if that is true in the Netflix data.

I have seen this issue mentioned in the literature, but I cannot recall the specific papers at this moment. Sorry, I know that is not very helpful.

Anonymous said...

Not sure you are allowed to bring outside data to the system, how would that help netflix if you could bring in proprietary data that they don't have access to?

Aron said...

I've spent a couple hours working up some brain dead approaches and it's becoming clear to me that you can get darn close to the Netflix RMSE score without even trying.

I suspect that the complexity to payoff is hyperbolic right around that score. I mean they are claiming just under 1 star average deviation. Doing nothing but predicting 3.5 stars for everybody gets you pretty close to that.

and to anonymous, I have to assume that you can use outside data otherwise this is probably completely unattainable. Chances are that if you can get your hands on the data, Netflix can (through you if necessary).

Additionally, if you step back and look at the whole process there is the element of capturing data (such as the 5 star issue, but also what movies you ask for them to rate, etc.) and then what you do after you have a prediction set. I would wager that the system is improved more successfully on the ends of this process, not in the middle.

Nevertheless, I plan on wasting some hours on this.

Greg Linden said...

Thanks for sharing some of the results of your initial analysis, Aron.

I suspect it is pretty easy to get close to 1 on the predictions. For example, if the ratings are uniformly random, just predicting 3 all the time would get a RMSE of sqrt((4 + 1 + 0 + 1 + 4)/5) or sqrt(2), which is approximately 1.4. The distribution is almost certainly skewed, so getting to around 1.3 should be trivial.

The lower you go, though, the more difficult things get. I wouldn't use the difficulty of getting from 1.4 to 1.3 as an indicator of the difficulty of getting from .95 to .85.

Anonymous said...

Goods news and promising comments on the subject... sounds fun for the end of the year

Greg Linden said...

Following up on my comment to Aron, I have been playing with some simple analyses of the data set.

Simply predicting a 3 rating for everything gets an RMSE of 1.31 on the probe.txt test data.

Predicting the overall average rating of approximately 3.6 -- the distribution of the ratings is badly skewed -- for everything gets an RMSE of 1.13.

Predicting the median rating for each movie for all customers gets an RMSE of 1.10.

Predicting the average rating for each movie for all customers gets an RMSE of 1.05.

As I said, do not take this to mean it will be easy to get down to 0.95 and then again down to 0.85. Each further reduction in RMSE becomes exponentially more difficult to acquire.

Aron said...

I've been able to get down to .9577 thus far with an approach only mildly more complicated than the ones you just described. I am using the training file as my probe file, which of course introduces a favorable bias for me. How much, I don't know..

Performance is already getting a bit thorny. I am curious what the odds of missing the boat entirely are if one were to focus on say, 1M of the ratings rather than the full 100M. I'm kind of anticipating that the extra 2 orders of magnitude is a 1-2% difference at most.

Some other stats of the training set:
Mean: 3.60429
# of 1's: 4.6%
# of 2's: 10.1%
# of 3's: 28.7%
# of 4's: 33.6%
# of 5's: 23.1%
Total ratings: 100,480,507

Greg Linden said...

"... if one were to focus on say, 1M of the ratings ..."

Oh, that's no fun at all! Don't do that!

Seriously, though, this is a beautifully large data set. Take advantage of it. If you do some sampling or thresholding for performance, I'd recommend you be fairly selective about where and how you do that.

Thanks for sharing your numbers on the distribution. I got similar numbers. Sorry, I should have shared them earlier.

Greg Linden said...

By the way, cool that you are down to .9577 already! The performance of Netflix's Cinematch was only slightly lower (.9514, .9524, and .9474, depending on the data set used).

Anonymous said...

"I am using the training file as my probe file, which of course introduces a favorable bias for me. How much, I don't know.."

--- You really need to remove the probe set from the training set. Otherwise you really don't know where you are. The differences in score I see are sometimes large; depending on the method.