Wednesday, January 18, 2006

Recommender systems and toy problems

In a recent literature search, I came across a paper by Guy Shani, Ronen Brafman, and David Heckerman called "An MDP-Based Recommender System".

This paper starts off by recasting the recommendation problem:
Typical Recommender systems adopt a static view of the recommendation process and treat it as a prediction problem. We argue that it is more appropriate to view the problem of generating recommendations as a sequential decision problem.

We suggest the use of Markov Decision Processes (MDP) ... [that] can take into account the utility of a particular recommendation [and] suggest an item whose immediate reward is lower, but leads to more likely or more profitable rewards in the future.
The paper goes on to say that, in their tests, "all the MC [Markov Chain] models ... yielded better results than every one of the Predictor-k models."

Sounds good so far, but the first sign of trouble is in the definition of their model:
The states in our MC model ... contains all possible sequences of user selections.

Of course, this formulation leads to an unmanageable state space with the usual associated problems -- data sparsity and MDP solution complexity. To reduce the size of the state space, we consider only sequences of at most k items, for some relatively small value of k.
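To make the clipping concrete, here is a minimal sketch (my own illustration, not the paper's implementation) of a k-history Markov chain recommender: each state is the last k items a user selected, and the recommendation is the most frequently observed next item for that state. With |items| distinct items, the full state space grows like |items|^k, which is why k has to be kept small.

```python
from collections import defaultdict

K = 2  # history length; the paper keeps k "relatively small" for exactly this reason

def build_model(sequences, k=K):
    """Count next-item transitions for each history of up to k items."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(len(seq) - 1):
            state = tuple(seq[max(0, i - k + 1):i + 1])  # last k selections
            counts[state][seq[i + 1]] += 1
    return counts

def recommend(counts, history, k=K):
    """Recommend the most common follower of the user's last k selections."""
    state = tuple(history[-k:])
    if state not in counts:
        return None  # unseen state: the data sparsity problem noted above
    return max(counts[state], key=counts[state].get)

# Toy usage with made-up selection sequences:
sequences = [["a", "b", "c"], ["a", "b", "d"], ["x", "b", "c"]]
model = build_model(sequences)
recommend(model, ["x", "b"])  # most common item seen after the state ("x", "b")
```

Even this toy version shows the tension: shrink k and the model forgets useful context; grow k and most states are never observed in the training data.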
As you might expect, the state space for the model has to be harshly clipped to make it manageable. But it gets worse. Look at the data sets used in the experiments:
We used two data sets, one containing user transactions (purchases) and the other containing user browsing paths obtained from web logs. We filtered out items that were bought/visited less than 100 times and users who bought/browsed no more than one item. We were left with 116 items and 10820 users in the transactions data set, and 65 items and 6678 users in the browsing data set.
116 items and roughly 11,000 users in the larger data set. In contrast, Amazon has 7M+ items in its catalog and 40M+ users.

It is not hard to beat an algorithm designed to work on massive data sets with an algorithm that can only work on toy problems. The challenge is to find techniques that outperform existing solutions on realistic data sets.

I can't tell you how many times this happens. So many papers I come across propose an exciting new technique with claims of higher accuracy, but the technique is impractical, unable to scale to real-world problems.

In this case, the paper claims its contribution is an MDP recommender system. But applying a well-known technique like MDP to product recommendations is not difficult. Finding a way to make that technique scale is difficult.

In the end, papers like this leave me with the feeling that everyone just wasted their time. The hard problems were ducked. The researchers left us tinkering with toys.

I only hope that no readers are misled into believing that their experience with small toys prepares them for the crushing data sets of the real world.
