Wednesday, September 27, 2006

Potential of web search personalization

There are some great data points on the potential of personalized web search in a KDD 2006 paper, "A Large-Scale Analysis of Query Logs for Assessing Personalization Opportunities", by Steve Wedig and Omid Madani from Yahoo Research.

The paper starts with what should by now be a familiar-sounding motivation for personalized search:
Interacting with search engines has traditionally been an impersonal affair, with the returned results a function only of the query entered.

Unfortunately the average query length is consistently reported to be around two, so many queries are too short to disambiguate the user's information need. Moreover, users often view only the first page of results, which makes precision critically important.

These limitations have motivated researchers to look beyond the query and consider how a search's context can provide further evidence about the user's information need.
To determine the potential for personalized search, the researchers analyzed "six months of query logs from the Yahoo! search engine" that "contained about 1.35 million cookies, 26 million searches, and 20 million clicks." Their goal was to determine "the extent of short and long term history available" and the "consistency and convergence rate" of users' interests.

Right at the beginning, the authors distinguish between using a searcher's short-term history to change search results, which they call "adjustment", and modifying search results using a profile built from the searcher's long-term history, which they call "personalization".

Frequent readers of this weblog will know that I would call the first personalization and the second "probably not worth doing". But this paper does a good job quantifying the potential impact of both the short-term and long-term approaches to personalized search.

In particular, the authors looked at the number of searchers who had enough information for profiles built from long-term history. In their analysis, 50% of queries to Yahoo Search came from "users who performed at least 100 queries over the 6 month period." That seems promising.
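The "share of queries from heavy users" statistic is simple to compute from a log. Here is a minimal sketch, assuming a hypothetical log format where each query is recorded only as the anonymous user (cookie) ID that issued it; the function name `heavy_user_share` is made up for illustration and is not from the paper.

```python
from collections import Counter

def heavy_user_share(query_log, min_queries):
    """Return (share of queries, share of users) attributable to users
    who issued at least `min_queries` queries.

    `query_log` is assumed (hypothetically) to be an iterable with one
    user identifier per query, e.g. an anonymous cookie ID.
    """
    per_user = Counter(query_log)            # number of queries per user
    total_queries = sum(per_user.values())
    heavy = {u: n for u, n in per_user.items() if n >= min_queries}
    query_share = sum(heavy.values()) / total_queries
    user_share = len(heavy) / len(per_user)
    return query_share, user_share

# Toy log: one heavy user ("a") with 100 queries, nine light users with 10 each.
log = ["a"] * 100 + [f"u{i}" for i in range(9) for _ in range(10)]
print(heavy_user_share(log, 100))
```

In the toy log, one user in ten accounts for just over half of all queries, which is the same kind of skew that lets a small fraction of heavy searchers produce a large fraction of the traffic.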

However, later in the paper, they analyze the number of queries necessary for a user's interests to clearly converge and become distinct from the population as a whole. They determined it required "a few hundred queries". Less than 25% of queries came from users with that much history, and less than 3% of users had it.
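One way to picture "convergence" is to compare a profile built from a user's first k queries with the profile built from their whole history, and to compare that whole-history profile with the population. The sketch below does this under stated assumptions: each query is assumed to already carry a topic label (say, from a classifier), and total variation distance is my choice of metric for illustration, not necessarily what Wedig and Madani used.

```python
from collections import Counter

def distribution(topics):
    """Normalize a list of topic labels into a probability distribution."""
    counts = Counter(topics)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def total_variation(p, q):
    """Total variation distance: 0 means identical, 1 means disjoint."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def convergence_curve(user_topics, checkpoints):
    """For each k in `checkpoints`, the distance between the profile built
    from the first k queries and the profile built from the full history.
    Values near 0 mean the short prefix already resembles the long-term profile."""
    final = distribution(user_topics)
    return [total_variation(distribution(user_topics[:k]), final)
            for k in checkpoints]

# Hypothetical user: ten sports queries followed by ten news queries.
history = ["sports"] * 10 + ["news"] * 10
population = {"news": 0.8, "sports": 0.2}   # made-up population mix
print(convergence_curve(history, [10, 20]))
print(total_variation(distribution(history), population))
```

With this toy user, a profile built from only the first half of the history is still far from the final one, which illustrates why it can take many queries before a long-term profile is reliable.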

This does not mean that a long-term, profile-based approach to personalization is not worth doing, but it does mean that it would only impact a minority of the queries and users.

The short-term approach, which they call "adjustment", appears to have potential to influence many queries. The researchers talk a bit about some promising approaches for that in the last part of the paper, including focusing on less common clickthroughs, clickthroughs that users tend to return to, and related clickthroughs. They claim that "with short-term adjustment, a single click ... could dramatically improve results for the rest of your search, even without any prior user history."
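To make the idea of short-term adjustment concrete, here is a toy sketch of how one click in a session might change the ranking of later results. It is not the method from the paper: the (url, score, topic) tuples, the `adjust_ranking` function, and the flat `boost` are all hypothetical, and a real system would use far richer signals such as the related and revisited clickthroughs the authors mention.

```python
def adjust_ranking(results, clicked_topic, boost=1.0):
    """Re-rank results using one click from earlier in the session.

    `results` is a list of (url, base_score, topic) tuples (a hypothetical
    format). Results sharing the clicked result's topic get `boost` added
    to their score; everything else keeps its base score.
    """
    def adjusted(item):
        _, score, topic = item
        return score + (boost if topic == clicked_topic else 0.0)
    # sorted() is stable even with reverse=True, so ties keep their order.
    return sorted(results, key=adjusted, reverse=True)

# Ambiguous query "jaguar" after the user clicked an animal-related result.
results = [
    ("jaguar-cars.example", 0.9, "autos"),
    ("jaguar-animal.example", 0.8, "animals"),
    ("jaguar-os.example", 0.7, "computers"),
]
print(adjust_ranking(results, "animals"))
```

The point of the sketch is that no prior user history is needed: the single session click alone is enough to move the animal result to the top.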

In the end, it is probably worth doing both approaches, but this paper is useful for understanding some of the limitations of each. Well worth reading.

For more on personalized web search, please also see some of my previous posts: "Beyond the commons: Personalized web search", "Google Personalized Search and Bigtable", "More on Google personalized search", and "New personalized web search at Findory".

By the way, if you like this post, you may also be interested in my post, "Recommending advertisements", on another of Omid Madani's papers.

Update: If you have trouble downloading the paper from Yahoo Research, you can also get it from the ACM.


Anonymous said...

Greg, isn't there a considerable number of people who would say that personalization = spying? You would look into the query logs to personalize, to learn the user behaviour, but I don't think everybody would welcome it. What do you think about these social issues? I like Findory because it doesn't need a login, but what about others like Google?

Greg Linden said...

It's a good point. There are some things that can help here.

One is to keep people anonymous. That is what Findory does. To Findory, you are some random number, just a unique identifier. Findory does not know who you are, where you live, or any other personal information.

Another approach is to give users the ability to easily disable the personalization. The most careful form of this is to make the personalization opt-in only, as Google Personalized Search currently does.

A third approach is to give people the ability to edit their histories. Findory, Google, and other sites with personalization usually allow this, not just to give users more control, but also because the edits can improve the quality of the personalization.

I think the most important thing here is to make the personalization genuinely helpful. For example, at, the personalization and recommendations are genuinely useful for discovering items in their massive catalog that would be hard to find on your own. Your history is not used to show you yet more types of annoying advertising; it is used to help you find what you need.

Anonymous said...

Hey Greg - you've undoubtedly seen this, but has an article where Netflix puts some coin behind improving personalization/recommendations...