Thursday, November 20, 2008

Finding task boundaries in search logs

There has been more talk lately, it seems to me, about moving away from stateless search, where each search is independent, and toward a search engine that pays attention to your previous searches when it tries to help you find the information you seek.

All of which makes a paper by Rosie Jones and Kristina Klinkner from Yahoo Research at CIKM 2008, "Beyond the Session Timeout: Automatic Hierarchical Segmentation of Search Topics in Query Logs" (PDF), that much more relevant.

Rosie and Kristina looked at how to accurately determine when a searcher stops working on one task and starts looking for something new. The standard technique people have used in the past for finding task boundaries is to simply assume that all searches within a fixed period of time are part of the same task. But, in their experiments, they find that "timeouts, whatever their length, are of limited utility in identifying task boundaries, achieving a maximum precision of only 70%."
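To make the timeout baseline concrete, here is a rough sketch of the heuristic (my own illustration, not code from the paper): cut a task boundary wherever the gap between consecutive queries exceeds some fixed timeout.

```python
from datetime import datetime, timedelta

def timeout_segments(timestamps, timeout=timedelta(minutes=30)):
    """Split a chronologically ordered list of query timestamps into
    segments, cutting wherever the inter-query gap exceeds `timeout`."""
    segments = []
    current = []
    for ts in timestamps:
        # Gap longer than the timeout? Assume the old task ended.
        if current and ts - current[-1] > timeout:
            segments.append(current)
            current = []
        current.append(ts)
    if current:
        segments.append(current)
    return segments

queries = [datetime(2008, 11, 20, 9, 0),
           datetime(2008, 11, 20, 9, 5),    # 5 minute gap: same "task"
           datetime(2008, 11, 20, 10, 30)]  # 85 minute gap: new "task"
print(len(timeout_segments(queries)))  # prints 2
```

The weakness is obvious once you see interleaved tasks: a searcher can fire off queries for two different tasks within minutes of each other, and no choice of timeout will separate them.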

Looking at the Yahoo query logs more closely to explain this low accuracy, they find some surprises, such as the high number of searchers who work on multiple tasks simultaneously, even interleaving the searches for one task with those for another.

So, when the simple stuff fails, what do most people do? Think up a bunch of features and train a classifier. And, there you go, that's what Rosie and Kristina did. They trained a classifier using a set of features that combined characteristics of searcher behavior (e.g., people searching for [tribeca salon] after [new york hairdresser]) with characteristics of the queries (e.g., whether they are lexically similar or return similar results from a search engine), eventually achieving much higher accuracy rates on finding task boundaries.
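As a toy illustration of the kind of pairwise features such a classifier might consume (these particular features are my own sketch; the paper's actual feature set is richer and includes search-result overlap):

```python
def jaccard(a, b):
    """Jaccard similarity between two collections treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def trigrams(s):
    """Character trigrams of a lowercased string."""
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

def boundary_features(q1, q2, gap_seconds):
    """Features for deciding whether q2 starts a new task after q1."""
    return {
        "word_jaccard": jaccard(q1.split(), q2.split()),
        "char_trigram_jaccard": jaccard(trigrams(q1), trigrams(q2)),
        "time_gap_minutes": gap_seconds / 60.0,
    }

feats = boundary_features("new york hairdresser", "tribeca salon", 120)
# word_jaccard comes out 0.0 here even though the two queries belong to
# the same task, which is exactly why behavioral and result-based
# features matter on top of lexical ones.
```

Feed vectors like these, labeled with human-judged task boundaries, to any off-the-shelf classifier and you have the basic setup.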

As the authors say, being able to accurately segment tasks could improve our ability to evaluate search engines. In particular, we could seek to minimize the amount of time needed by searchers "to satisfy an information need or fulfill a more complex objective" rather than just looking at click and usage data for one query at a time. Judging search engines by how well they help people get things done is something that, in my opinion, is long overdue.

Please see also my earlier post, "Tasks, not search, at DEMOfall2008", where Head of Yahoo Research Prabhakar Raghavan said that people really don't want to search; what they really want is to fulfill their tasks and get things done.

3 comments:

Daniel Tunkelang said...

I'm a big fan of the paper, for precisely the reasons you cite. It also reminds me of a paper presented at ECIR '08 on "Discounted Cumulated Gain based Evaluation of Multiple-Query IR Sessions":

http://www.info.uta.fi/tutkimus/fire/archive/2008/SessionDCG-ECIR-08.pdf

More examples of work in the area: http://thenoisychannel.com/2008/04/10/multiple-query-sessions/

jeremy said...

What's wrong with just having a single button right next to the search bar, that says "start a new session"? Or having a dual-halved search button, where the left half says "Search in a New Session", and the right half "Search in Current Session"?

It's no extra work for the user. They only have a single button to click before, and a single button to click after. But by being able to explicitly announce their own intent, the user can get much better results from the search engine than if the search engine has to guess (sorry, I mean "infer") it from the logs.

To me, this seems like a domain that is ripe/perfect for HCIR. Log analysis will always have errors. An intelligently designed interface that not only allows the user to better communicate intent to the engine, but lets the engine communicate to the user whether it has understood that intent, seems much more valuable than some hidden log analysis in which the user really has no idea what is happening, if anything, and what he or she can do about it, if anything.

Hmm.. maybe I shouldn't have written all this. I see an HCIR 2009 paper in the making. Daniel? ;-)

Anonymous said...

Hi Greg,

This fragment made me laugh:
"So, when the simple stuff fails, what do most people do? Think up a bunch of features and train a classifier..."

That is so frequent these days in research on almost everything...