Tuesday, July 15, 2008

Learning diversity when learning to rank

Filip Radlinski, Robert Kleinberg, and Thorsten Joachims have an ICML 2008 paper, "Learning Diverse Rankings with Multi-Armed Bandits" (PDF), that attempts to "directly learn a diverse ranking of documents based on users' clicking behavior."

An excerpt:
[We] show how clickthrough data can be used to learn rankings maximizing the probability that any new user will find at least one relevant document high in the ranking .... [even though] web queries often have different meanings for different users.

We propose an online learning approach for learning from usage data. As training data is being collected, it immediately impacts the rankings shown ... The goal is to minimize the total number of poor rankings displayed over all time.
The work appears to be largely theoretical due to very long convergence times -- sadly, investigating "how prior knowledge can be incorporated ... to improve the speed of convergence" is left to future work -- but it is still a worthwhile and enjoyable read.
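
To make the excerpt a bit more concrete, here is a rough Python sketch of the general idea as I read it: run one bandit per rank position, let each one propose a document, and reward a position's bandit only when the user clicks the document it proposed. This is a toy under stated assumptions, not the authors' implementation. The epsilon-greedy bandit is a simple stand-in I picked for brevity, and the class names, the two simulated user intents, and the "click the first relevant result" user model are all hypothetical.

```python
import random


class EpsilonGreedyBandit:
    """One bandit per rank position; the arms are document ids.
    Epsilon-greedy is an assumed stand-in for whatever bandit one prefers."""

    def __init__(self, n_docs, epsilon, rng):
        self.epsilon = epsilon
        self.rng = rng
        self.pulls = [0] * n_docs
        self.value = [0.0] * n_docs  # running mean click reward per document

    def select(self):
        # Explore with probability epsilon, otherwise pick a best-valued arm.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.value))
        best = max(self.value)
        return self.rng.choice([d for d, v in enumerate(self.value) if v == best])

    def update(self, doc, reward):
        self.pulls[doc] += 1
        self.value[doc] += (reward - self.value[doc]) / self.pulls[doc]


class RankedBandits:
    """Hypothetical wrapper: k bandits, one for each slot in the ranking."""

    def __init__(self, n_docs, k, epsilon=0.1, seed=0):
        assert k <= n_docs, "need at least as many documents as rank slots"
        self.n_docs = n_docs
        self.rng = random.Random(seed)
        self.bandits = [EpsilonGreedyBandit(n_docs, epsilon, self.rng)
                        for _ in range(k)]

    def rank(self):
        proposed = [b.select() for b in self.bandits]
        shown, used = [], set()
        for doc in proposed:
            if doc in used:
                # Duplicate proposal: show an arbitrary unused document instead.
                doc = self.rng.choice(
                    [d for d in range(self.n_docs) if d not in used])
            shown.append(doc)
            used.add(doc)
        return proposed, shown

    def feedback(self, proposed, shown, clicked_rank):
        # A slot's bandit is rewarded only if its own proposal was shown
        # in that slot and that slot was the one clicked.
        for i, bandit in enumerate(self.bandits):
            hit = (i == clicked_rank and proposed[i] == shown[i])
            bandit.update(proposed[i], 1.0 if hit else 0.0)


def simulate(rounds=2000, seed=0):
    rng = random.Random(seed)
    # Made-up population for one ambiguous query: two intents, each with
    # its own set of relevant documents.
    intents = [{0, 1}, {2}]
    weights = [0.7, 0.3]
    learner = RankedBandits(n_docs=5, k=2, seed=seed)
    clicks = 0
    for _ in range(rounds):
        relevant = rng.choices(intents, weights=weights)[0]
        proposed, shown = learner.rank()
        # Assumed user model: click the first relevant result, if any.
        clicked_rank = next(
            (i for i, d in enumerate(shown) if d in relevant), None)
        learner.feedback(proposed, shown, clicked_rank)
        clicks += clicked_rank is not None
    return learner, clicks / rounds
```

Calling `simulate()` returns the trained learner and the fraction of simulated users who clicked something. Because a later slot earns reward only from users the earlier slots failed to satisfy, it is nudged toward documents for the other intent, which is where the diversity comes from. Even in a toy like this the learning is slow, which fits the convergence concern above.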

Please see my previous post, "Actively learning to rank", which discusses a fun KDD 2007 paper, also by Filip Radlinski and Thorsten Joachims, on learning to rank from click data.

Update: Two years later, Filip published a paper that appears to have a more practical and scalable technique for learning diversity from click data, "Learning optimally diverse rankings over large document collections". More nice work there from Filip.

3 comments:

Anonymous said...

Greg - What, in your opinion, is the relationship between diversity and personalization? Are there interesting opportunities in that direction? Does personalization decrease diversity? Or does personalization enable the chance to create a (paradoxically?) greater, while simultaneously more targeted, diversity? Or none of the above?

Greg Linden said...

Offhand, I'd say that personalization tends to increase overall diversity. It shows different results to different people, so it improves the overall spread of results shown and reduces the winner-takes-all effect.

But, by itself, I'd say personalization does nothing to increase the diversity of the results shown to each person. It probably would even tend to make the results shown to each person less diverse unless a specific effort was made to avoid that.

Is that what you would also predict, Jeremy?

Anonymous said...

Yes, the second part of what you say is what I am after. The "per person" diversity. My feeling is that, left alone, personalization would actually hurt per-person diversity.

But I was also having the brainstorm that, within the personalization context, there might be opportunities for increased diversity as well. Call it micro-diversity or something. Like, you might see Jaguar the automobile, and I might see jaguar the animal. But you might then see more types of pages containing the automobile: Dealerships, hobbyists, parts/service, uses or references inside of popular media (Jaguars driven in movies, sung about in songs, etc. "I've been driving in my car.. it's not quite a Jag-u-ar..") Instead of just all dealerships.

So, counterintuitively, personalization might allow you to increase diversity. But I guess it depends on your technique, and how well personalization sorted things out. If not done properly, personalization might not be able to "do" microdiversity. I dunno. Your thoughts?