Monday, August 04, 2008

To personalize or not to personalize

Jaime Teevan, Sue Dumais, and Dan Liebling had a paper at SIGIR 2008, "To Personalize or Not to Personalize: Modeling Queries with Variation in User Intent" (PDF), that looks at how and when different people want different things from the same search query.

An excerpt:
For some queries, everyone ... is looking for the same thing. For other queries, different people want very different results.

We characterize queries using features of the query, the results returned for the query, and people's interaction history with the query ... Using these features we build predictive models to identify queries that can benefit from personalized ranking.

We found that several click-based measures (click entropy and potential for personalization curves) reliably indicate when different people will find different results relevant for the same query .... We [also] found that features of the query string alone were able to help us predict variation in clicks.

Click entropy is a measure of variation in the clicks on the search results. Potential for personalization is a measure of how well any one ordering of results can match the ordering each individual searcher would most prefer. The query features that worked best for predicting ambiguity of the query were query length, the number of query suggestions offered for the query, and whether the query contained a URL fragment.
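
To make click entropy a little more concrete, here is a minimal sketch in Python. It assumes click entropy is the Shannon entropy (log base 2) of the distribution of clicked URLs for a single query. The function name and the input format, a flat list of clicked URLs, are my own choices for illustration and are not taken from the paper.

```python
import math
from collections import Counter

def click_entropy(clicked_urls):
    """Shannon entropy (in bits) of the clicks observed for one query.

    clicked_urls: one entry per click, holding the URL that was clicked.
    This input format is an assumption made for the sketch.
    """
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no clicks observed, so no variation to measure
    # Sum of p * log2(1 / p) over the clicked URLs, where p = n / total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())
```

If everyone clicks the same result, the entropy is zero. If clicks are spread evenly over many results, the entropy is high, which suggests that different people want different things from the query.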

One tidbit I found interesting in the paper was that average click position alone was a strong predictor of ambiguity of the query and was well correlated with both click entropy and potential for personalization. That's convenient. Average click position seems like it should be much easier to calculate accurately when facing sparse data.
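
For comparison, average click position needs nothing more than the rank of each click. A minimal sketch, assuming clicks are logged as 1-based result positions (the function name and input format are hypothetical):

```python
def average_click_position(click_positions):
    """Mean rank of the clicked results for one query.

    click_positions: 1-based rank of each clicked result (an assumed
    log format). Returns None when there are no clicks to average.
    """
    if not click_positions:
        return None
    return sum(click_positions) / len(click_positions)
```

It is a single running sum and a count per query, with no per-URL distribution to estimate.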

Please see also my older posts, "Effectiveness of personalized search", "Potential of web search personalization", and "Characterizing the value of personalized search".
