Tuesday, June 06, 2006

Standing interests, search, and recommendations

Googlers Beverly Yang and Glen Jeh presented a paper, "Retroactive Answering of Search Queries" (PDF), at WWW 2006 that discusses an interesting idea around recommendations for web search.

The authors built a prototype that "retroactively answers queries from a user's history as new results arise." It is a little like automatically constructing Google Alerts for searchers and informing each person of any interesting, new results. Yang and Jeh say that they want to "focus on and address known, specific user needs" by "making web page recommendations corresponding to specific past queries."

The system itself is pretty straightforward. "Standing interests" are determined by queries that have a lot of activity (e.g. > 8 clicks on search results, > 3 refinements of the query, and repeated searches on the query). For each standing interest, the prototype tries to find new results with high PageRank for the query, then presents those results as recommendations.
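
To make those thresholds concrete, here is a rough sketch in Python. It is not the paper's code: the `QueryActivity` record, its field names, and the choice to require all three signals at once are my assumptions for illustration, and the example history is made up.

```python
from dataclasses import dataclass


@dataclass
class QueryActivity:
    """Hypothetical per-query summary of one user's search history."""
    query: str
    clicks: int       # clicks on search results for this query
    refinements: int  # times the user refined the query
    searches: int     # times the user issued the query


def is_standing_interest(activity, min_clicks=8, min_refinements=3):
    # Assumption: all three signals must hold together. The post only lists
    # them as examples, so the real combination rule may differ.
    return (
        activity.clicks > min_clicks
        and activity.refinements > min_refinements
        and activity.searches > 1  # "repeated searches" read as more than one
    )


# Made-up history for illustration only.
history = [
    QueryActivity("hybrid car reviews", clicks=12, refinements=5, searches=4),
    QueryActivity("weather seattle", clicks=1, refinements=0, searches=9),
    QueryActivity("perl regex lookahead", clicks=9, refinements=4, searches=1),
]

standing = [a.query for a in history if is_standing_interest(a)]
```

Under these assumptions only "hybrid car reviews" qualifies: the weather query has too few clicks and refinements, and the Perl query was issued only once.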

One interesting result from the quick user study described in the paper was that rank -- how high the search result appears on the search result page -- was inversely correlated with perceived recommendation quality.

I think this may be best explained by people wanting the recommendations to help them discover things they would not have found on their own. The best recommendations are new, surprising, and useful. If I can find it easily myself because it shows up high on search results, that is not new, surprising, or useful.

The paper does not explore using data about what other people have found for the recommendations. That is unfortunate. In my experience, the most surprising and useful recommendations come from other people.

The paper does focus quite a bit on the problem of identifying these "standing interests", queries that a particular user does again and again. This seems similar to building a user subject interest profile (e.g. I like geeky computer and business stuff), but much more specific. As the authors briefly point out, the "automatic identification of standing interests in the form of specific queries can be especially valuable in ads targeting."


Greg Linden said...

Haven't read it yet. Sounds interesting! I'll have to get to it soon.

Anonymous said...

So let me get this straight: Standing interests are defined as "> 8 clicks on search results, > 3 refinements of the query, and repeated searches on the query"?

Actually, that does not sound like a standing interest so much as it sounds like someone's query failed. They were looking for a particular piece of information, and either (1) they did not know the proper query terms to uniquely identify their information need (the classic vocabulary mismatch problem), or else (2) the top of the ranked list was too filled with spam, and they were trying to filter out the pages they didn't want in order to find the pages they did want.

So it seems to me that the solution to this is not to set up a standing interest, and feed the user with more spam and/or more slightly irrelevant, vocabulary mismatched results. The solution would be to give the user intuitive query refinement tools, to be able to better express what they are looking for. Then they wouldn't have to manually issue so many tweaked queries over and over again.

What you say further on confirms my suspicion: "One interesting result from the quick user study described in the paper was that rank -- how high the search result appears on the search result page -- was inversely correlated with perceived recommendation quality."

This tells me that the user's queries are failing. So giving the user more information on an already poorly-defined query seems like the absolute worst thing to do, no?

Greg Linden said...

Great point, Jeremy. I found their definition of "standing interests" unsatisfying as well.

Anonymous said...

Wait, wait.. but this is more than just an unsatisfying definition. It sounds almost as if they are solving completely the wrong problem. Give the observed behavior whatever name you want to, but if I see a user click 8 of the 10 top-ranked documents, and re-issue a query 3 whole times, each time slightly modified, I personally would not draw the conclusion that the user wants to see more of the same type of documents, when these documents become available.

No, to me that sounds like a textbook example of query failure (no relevant docs found) due to vocabulary mismatch. So to solve this problem by recommending more of the same types of non-relevant documents seems.. well.. silly.

I could very well be wrong. Maybe it really is an example of the user having a perfectly-formed query syntax, and there just not being any relevant documents yet available. But it really does not sound like it. Especially given that finding about relevance being inversely correlated with rank. That sounds like a classic case of the search engine giving the user exactly what the user asked for, but the user not really asking the right question, to begin with. That is why, further down the list, you would find more relevant documents...you get further away from what the user actually asked, and more into the documents on things the user should have asked for. Know what I mean?

Greg Linden said...

Right, it seems like the wrong criteria for standing interests. As you said, I would think that repeated refinements mean I am failing to find what I want right now, not that I am permanently interested in the results of the query.

The other criteria are a little more intuitive to me, repeated searches on the query and multiple clicks on the query search results.

Thinking about it for a moment, rather than query refinements, I might look at related queries on similar topics as reinforcing a standing interest.

Generally, their criteria for standing interests seem ad hoc to me. I suspect better criteria might be found with a little more effort and experimentation. That's why I said it was unsatisfying.

Anonymous said...

To add to the "unsatisfactoriness" of the definition of standing interest, I think the element of time has been completely disregarded. I can think of at least two ways that it can be incorporated in defining standing interest:
1. Look at the interval between related queries. An interval of a few minutes between multiple similar queries implies that (most likely) the user is thrashing, while a sustained interest in a topic will likely show up as related queries over a longer period of time.
2. Factor in the "temporality" of the queries, both from the query itself and the results for that query. For example, any query about an event (a concert, a TV program, World Cup Soccer) is usually uninteresting after the event has happened. Similarly, if I am querying for "perl for dummies" today, that doesn't mean I'll still be a dummy 3 months down the line.
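
The first idea could be sketched roughly like this in Python. The function name, the labels, and the one-day cutoff are all hypothetical choices for illustration, and the sketch assumes the related queries have already been grouped together somehow.

```python
from datetime import datetime, timedelta


def classify_query_series(timestamps, min_span=timedelta(days=1)):
    """Label a group of related queries by how spread out in time they are.

    The one-day cutoff is an arbitrary assumption, not a tested value.
    """
    if len(timestamps) < 2:
        return "one-off"
    span = max(timestamps) - min(timestamps)
    # Queries bunched within a short span look like thrashing; queries
    # spread over a longer period look like a sustained interest.
    return "sustained" if span >= min_span else "thrashing"


# Three similar queries within a few minutes of each other.
burst = [
    datetime(2006, 6, 6, 10, 0),
    datetime(2006, 6, 6, 10, 3),
    datetime(2006, 6, 6, 10, 7),
]

# Related queries spread over several weeks.
spread = [
    datetime(2006, 5, 1),
    datetime(2006, 5, 20),
    datetime(2006, 6, 6),
]

burst_label = classify_query_series(burst)
spread_label = classify_query_series(spread)
```

Here the burst would be labeled as thrashing and the spread-out series as sustained. The second idea, expiring event-style queries, would need some way to detect that a query is about a dated event, which this sketch does not attempt.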

Anonymous said...

Agreed with the others that this is a meaningless way to define standing interest.

How, then, would you do so from a search engine? Do we think that standing interest actually exists, or that searches are almost entirely one-offs (except for Googling yourself)?

Anonymous said...

Ah, Greg, sorry.. I had not realized that you did understand what I was saying.

I like the idea of using related queries from similar topics as a way of reinforcing the standing interest. Almost like an LSI/pLSI/LDA for queries, rather than for documents, eh?

Pranav also has some excellent points about the temporality of the queries. Actually, that same sort of issue exists in personalized recommender systems like Amazon, too. I have a friend who bought a book for his daughter when she was 7. It was a 1- or 2-off purchase, with no repetition over time. Now, the daughter is 14, and the system still recommends books from that 7-year-old age-level genre. It seems to me that the system should "expire" some of those recommendations if the purchases are not repeated over time.

And to answer anonymous, I do actually think standing interests exist. Correct me if I am wrong, but all we are talking about is the old TREC task of document routing and filtering, no? I think that has long been acknowledged as useful.

[BTW: I hope this post goes through.. I've been having trouble posting to blogger for the past two days]