Tuesday, October 28, 2008

Rakesh Agrawal at CIKM 2008

Rakesh Agrawal from Microsoft Research gave a keynote talk yesterday morning at CIKM 2008 on Humane Data Mining. Much of the talk was on the potential of data mining in health care, but I am going to highlight only the part on web search, particularly the talk's notable focus on serendipity, discovery, and diversity in web search.

When talking about web search, Rakesh first mentioned making orthogonal query suggestions to support better discovery. The idea here is that searchers may not always know exactly what they want. The initial query may just be a starting point to explore the information, learn what is out there, and figure out what they really want to do. Suggesting queries that are related but a bit further afield than simple refinements may help people who don't know quite what they need to get to what they need.

Rakesh then talked briefly about result diversification. This is particularly important on ambiguous queries, but also is important for ambiguous tasks, where a searcher doesn't quite know what he wants and needs more information about what information is out there. Rakesh mentioned the long tail of search results as part of improving diversity. He seemed surprisingly pessimistic about the ability of recommender system approaches to surface items from the tail, either in products or in search, but did not elaborate.

Finally, learning from click data came up repeatedly, once in learning to classify queries using similarities in click behavior, again in creating implicit judgments as a supplement or replacement for explicit human judges, and finally when talking about a virtuous cycle between search and data where better search results attract more data on how people use the search results which lets us improve the results which gives us more data.

The talk was filmed by the good people at videolectures.net and should be available there in a couple weeks.

Please see also some of my past posts on related topics, including "Evaluating search result pages", "Learning diversity when learning to rank", "Personalization, Google, and discovery", and "Recommender systems and diversity".

No comments: