Friday, August 01, 2008

Modeling how searchers look at search results

Georges Dupret and Benjamin Piwowarski from Yahoo Research had a great paper at SIGIR 2008, "A User Browsing Model to Predict Search Engine Click Data from Past Observations" (ACM page). It nicely extends earlier models of searcher behavior to allow for people skipping results after hitting a lot of uninteresting results.

An extended excerpt:
Click data seems the perfect source of information when deciding which documents (or ads) to show in answer to a query. It can be thought of as the results of users voting in favor of the documents they find interesting.

Nevertheless, [click data] cannot be used without further processing: A fundamental problem is the position bias. The probability of a document being clicked depends not only on its relevance, but on other factors as its position on the result page ... Eye-tracking experiments show that a user is less likely to examine results near the bottom of the list ... [and] that a document is not clicked with the same frequency if situated after a highly relevant or mediocre document.

The Cascade Model assumes that users view search results from top to bottom, deciding whether to click each result before moving to the next ... The cascade model is based on a simplistic behavior model: Users examine all the documents sequentially until they find a relevant document and then abandon the search.

We generalize this and allow for the possibility that a user skips a document without examining it .... We propose that the probability of examination is dependent on the distance d from the last click as well as on the position r in the ranking. The intuition behind using the distance is that a user tends to abandon the search after seeing a long sequence of unattractive snippets on the page.

Our solution outperforms very significantly all previous models ... Our findings confirm that [users] almost always see the document directly after the clicked document .. and explain why documents situated just after a very relevant document are clicked more often.
The paper also discusses a "multiple browsing model" that attempts to capture that different people browse differently on different types of queries (e.g. navigational vs. informational), but that model, surprisingly, failed to yield an improvement.

Please see also Craswell et al., "An Experimental Comparison of Click-Position Bias Models" (PDF), a fun WSDM 2008 paper that proposes a Cascade Model for how searchers review search results.


Barry Kelly said...

I no longer follow this model, however - now, I middle-click on multiple promising results, and then iterate through tabs quickly sorting the wheat from the chaff. Under this model, clicks on e.g. the second or third result do not mean that the first result was most relevant.

I wonder if my pattern will become more prevalent, as users get more used to tabs?

Nick said...

second to the first :-)