Friday, September 22, 2006

Winner takes all, relevancy, and personalized search

Eric Goldman has an interesting article in InformIT that talks about personalization and its coming impact on search. Some excerpts:
Currently, search engines principally use "one size fits all" ranking algorithms to deliver homogeneous search results to searchers with heterogeneous search objectives.

Personalized algorithms produce search results that are custom-tailored to each searcher's interests, so different searchers will see different results.

Personalized ranking algorithms represent the next major advance in search relevancy ... Improvements in one-size-fits-all algorithms will yield progressively smaller relevancy benefits. Personalized algorithms transcend those limits [by] optimizing relevancy for each searcher.

Personalized ranking algorithms also reduce the effects of search engine bias. Personalized algorithms mean that there are multiple "top" search results for a particular search term, instead of a single "winner," so web publishers won't compete against each other in a zero-sum game ... Also, personalized algorithms necessarily will diminish the weight given to popularity-based metrics (to give more weight for searcher-specific factors), reducing the structural biases due to popularity.
See also my March 2005 post, "The key challenge is personalization", where I said:
With only one generalized relevance rank, further improvements to search quality become increasingly difficult because people disagree on how relevant a particular page is to a particular search.

At some point, to get further improvements, relevance rank will have to be customized to each person's definition of relevance.
See also my July 2006 post, "Combating web spam with personalization", where I said:
Another way to reduce the value [of web spam] is to reduce the maximum payoff. If different people see different search results, spamming becomes much less attractive. The jackpot from getting to the top of the page disappears.

Personalized search shows different search results to different people based on their history and their interests. Not only does this increase the relevance of the search results, but also it makes the search results harder to spam.
See also my August 2006 post, "Web spam, AIRWeb, and SIGIR", where I said:
"Winner takes all" encourages spam. When spam succeeds in getting the top slot, everyone sees the spam. It is like winning the jackpot.

If different people saw different search results -- perhaps using personalization based on history to generate individualized relevance ranks -- this winner takes all effect should fade and the incentive to spam decline.
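
To make the "multiple winners" idea concrete, here is a minimal sketch of one way a personalized rank could blend a global popularity score with a per-searcher interest score. It is not how any particular search engine works; the function name `personalized_rank`, the `alpha` blending weight, the topic labels, and the example pages are all assumptions made up for illustration.

```python
from typing import Dict, List


def personalized_rank(
    popularity: Dict[str, float],
    page_topics: Dict[str, str],
    user_interests: Dict[str, float],
    alpha: float = 0.5,
) -> List[str]:
    """Rank pages by a blend of global popularity and per-user topic interest.

    alpha = 0.0 reproduces the one-size-fits-all ranking; larger values give
    more weight to searcher-specific factors (hypothetical weighting scheme).
    """
    def score(page: str) -> float:
        interest = user_interests.get(page_topics[page], 0.0)
        return (1.0 - alpha) * popularity[page] + alpha * interest

    return sorted(popularity, key=score, reverse=True)


# Hypothetical results for the ambiguous query "jaguar".
popularity = {
    "jaguar-cars.example": 0.9,
    "jaguar-cats.example": 0.7,
    "jaguar-os.example": 0.5,
}
page_topics = {
    "jaguar-cars.example": "autos",
    "jaguar-cats.example": "wildlife",
    "jaguar-os.example": "software",
}

# Interest profiles, assumed here to be derived from search history.
car_fan = {"autos": 1.0}
biologist = {"wildlife": 1.0}

car_fan_results = personalized_rank(popularity, page_topics, car_fan)
biologist_results = personalized_rank(popularity, page_topics, biologist)
generic_results = personalized_rank(popularity, page_topics, biologist, alpha=0.0)
```

In this toy setup, the two searchers get different top results for the same query, so there is no single top slot for a spammer to capture. Setting `alpha` to zero collapses everything back to the one popularity-ordered list.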


Kimoon said...

Personalization is still a one-shot approach. It tries to give a good answer by reading the user's mind. If the answer works, great. Otherwise, nothing is left for the user.

A better approach is category clustering with PageRank-level precision. It would allow the user to divide and conquer.

Greg Linden said...

Hi, Kimoon. What do you mean exactly? Clustering like Clusty? Or something different?

I do like Clusty and similar clustering approaches. It does give a lot of power to the searcher. The problem with it is that it requires effort from the user to evaluate and understand all the choices.

With personalization, I agree that you want to design the interface so that the cost of an incorrect guess is low. You certainly don't want to be putting the searcher into a dead end by overly constraining their choices. "First, do no harm" should be the motto of anyone working on personalization of search results.
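
One way to keep the cost of a wrong guess low, sketched under my own assumptions: never drop results, only reorder within the first page, and leave the baseline ranking untouched when the confidence in the searcher's profile is low. The name `safe_rerank`, the `confidence` value, and the thresholds are hypothetical, not taken from any real system.

```python
from typing import Dict, List


def safe_rerank(
    baseline: List[str],
    boost: Dict[str, float],
    confidence: float,
    min_confidence: float = 0.6,
    window: int = 10,
) -> List[str]:
    """Reorder only the top `window` results, and only when confident.

    No result is ever removed, so a bad guess costs the searcher at most a
    little scrolling within the first page.
    """
    if confidence < min_confidence:
        # Not sure enough about this searcher: show the unpersonalized list.
        return list(baseline)

    # sorted() is stable, so pages with equal boost keep their baseline order.
    head = sorted(baseline[:window], key=lambda page: boost.get(page, 0.0), reverse=True)
    return head + list(baseline[window:])


baseline = ["a", "b", "c", "d"]
boost = {"c": 1.0}

confident = safe_rerank(baseline, boost, confidence=0.9, window=3)
unsure = safe_rerank(baseline, boost, confidence=0.2, window=3)
```

Here the confident case moves "c" to the top of the three-result window, while the unsure case returns the baseline list unchanged.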

jeremy said...

"The problem with it is that it requires effort from the user to evaluate and understand all the choices."

But does it require more effort than the user having to read through the top 5 or 10 or 20 results, manually, and come up with additional query terms himself, type those in, and see if the resulting "cluster" (or set) of returned documents is more along the lines of his information need?

Like I've said in the past, why not do both? Why not have personalized clusters of personalized ranked list results?

And the beautiful thing about clustering is that, by its very nature, it will "first do no harm", as you say. The user still sees the "original" ranked list, with the clusters on the side. He does not have to use cluster refinements any more than he has to click on a Google ad.
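
A rough sketch of that layout, assuming each result already carries a category label from some clustering step (the labels and the helper name `cluster_alongside` are made up for illustration): the ranked list is returned untouched, and the clusters are just an optional side view over the same results.

```python
from typing import Dict, List, Tuple


def cluster_alongside(
    ranked: List[str], category_of: Dict[str, str]
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return the original ranked list plus side clusters of the same results.

    Within each cluster, pages keep their order from the ranked list.
    Pages without a known category fall into an "other" cluster.
    """
    clusters: Dict[str, List[str]] = {}
    for page in ranked:
        clusters.setdefault(category_of.get(page, "other"), []).append(page)
    return list(ranked), clusters


ranked = ["jaguar-cars.example", "jaguar-cats.example", "jaguar-dealers.example", "jaguar-os.example"]
category_of = {
    "jaguar-cars.example": "autos",
    "jaguar-dealers.example": "autos",
    "jaguar-cats.example": "wildlife",
}

main_list, side_clusters = cluster_alongside(ranked, category_of)
```

The ranked input could just as well be a personalized list, which would give personalized clusters of personalized results.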