Friday, April 24, 2009

Serendipity, diversity, and personalized search

Aside from the amusing double entendre in its title, a recent paper out of Microsoft Research, "From X-Rays to Silly Putty via Uranus: Serendipity and its Role in Web Search" (PDF), is notable for its take on two topics that seem to be attracting increasing attention lately: personalized search and improving the diversity of search results.

Some excerpts:
Partially-relevant search results, identified as "containing multiple concepts, [or] on target but too narrow," play an important role in a user's information seeking process and problem definition.

By studying Web search query logs and the results people judge relevant and interesting, we find many of the queries people perform return interesting (potentially serendipitous) results that are not directly relevant .... More than a fifth of all search results were judged interesting but not highly relevant to the search task.

Serendipity was more likely to occur in diverse result sets .... Personalization scores correlate with both relevance and also with interestingness, suggesting that information about personal interests and behaviour may be used to support serendipity.
So, the paper suggests that there may be multiple benefits to personalized search. Not only do we get the benefits of improved understanding of query intent and increased relevance, but we can also improve diversity and discovery.

For a discussion of yet another benefit, reducing the payoff to web spammers, please see also my July 2006 post, "Combating web spam with personalization".


Daniel Tunkelang said...

I see how personalization increases diversity across users (and thus helps disincent spam), but I don't understand why personalization would increase diversity within a single user's results. Shouldn't the effect be the opposite, assuming that my personal interests have less variance than those of the population as a whole?

Greg Linden said...

Sorry, Daniel, I wasn't clear. I meant that click entropy for the query overall (which is, I think, how diversity is defined in the paper) would increase with personalization.

Interesting question whether diversity for a single user's results would also increase.

On the one hand, we should have a narrower interpretation of query intent, so the results displayed should be from a smaller area of the document space.

On the other hand, we should be more accurate about the intent, so more of the results we show should be likely to be viewed as relevant and clicked.

I don't really know, but I'd guess that whether a single user's results are more diverse (as measured by click entropy) with personalization depends on what the narrowing usually does. If we are usually narrowing the intent down to one definitive result, diversity would go down. If we are instead narrowing a group that included some irrelevant results down to a group where all the results are relevant, diversity would go up.
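To make the two cases concrete, here is a rough sketch in Python. It assumes click entropy for a query is the Shannon entropy of the distribution of clicks over result URLs. The function name, the URLs, and the click lists are all made up for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def click_entropy(clicked_urls):
    """Shannon entropy (in bits) of the clicks on results for one query.

    Assumption: entropy is taken over the share of clicks each URL receives.
    """
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy += p * math.log2(1 / p)
    return entropy


# Hypothetical clicks by one user on one query, before personalization:
# three results out of a mixed page were worth clicking.
baseline = ["url_a", "url_b", "url_c"]

# Case 1: personalization narrows to one definitive result,
# so every click lands on the same URL.
one_definitive = ["url_a", "url_a", "url_a"]

# Case 2: personalization fills the page with relevant results,
# so clicks spread over more URLs than before.
all_relevant = ["url_a", "url_b", "url_c", "url_d", "url_e", "url_f"]

for name, clicks in [
    ("baseline", baseline),
    ("one definitive result", one_definitive),
    ("all relevant results", all_relevant),
]:
    print(f"{name}: {click_entropy(clicks):.3f} bits")
```

With these invented numbers, the first case drives entropy to zero and the second pushes it above the baseline, which is the direction of the guess above. Real click data could of course behave differently.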

What do you think, Daniel?