Monday, November 22, 2004

Personalized search vs. clustering

Raul Valdes-Perez (CEO of Vivisimo) has a CNet article attacking personalized web search and touting the virtues of document clustering.

Raul makes some excellent points on the difficulties of doing personalized web search well. He says people's interests are fleeting and noisy. Raul says it's difficult to accurately infer interests from clickstream data, which is also noisy and imprecise.

It's true that personalized search is challenging. But Raul's criticism is overstated. If personalized search learns immediately in response to new data, it can react to people's immediate goals and interests, even if they differ from their long-term behavior. If the personalization helps in more cases than it hurts, then the personalization has value, even if the data is noisy and the assumptions made from the data are speculative.

Raul's solution is to give up on personalized search and do document clustering instead. Vivisimo's Clusty is an excellent clustering web search -- if you haven't tried it, go try it, it's great -- but it requires effort. Users have to refine their query repeatedly using the clusters to find what they want.

People are lazy. They want what they want and they want it now. Google recognizes this, providing an "I'm feeling lucky" button that just sends you to the top search result immediately. They recognize that it's better to just find what the searcher wants on the first try, no refining, no effort.

Personalized search offers improvements to relevance rank by recognizing that relevance differs from individual to individual. Personalized search makes it more likely that you find what you need on the first try.

4 comments:

Seun Osewa said...

Some thoughts on personalization in news:
* What if people read news to be informed about what other people are reading or what's popular in particular categories (e.g. tech) and have completely unpredictable interests?
* What if tests need to be performed to determine if personalization algorithms in use (for example, on Findory) increase satisfaction scores compared to a news site that just shows the most popular stories in each category?
* What if non-AI approaches to personalization offer the greatest benefit (save my searches, store my history, alert me when news on this topic comes up)?
* What if personalization techniques that give the user more control will yield better results? (e.g. infer my interests in the form of keywords from my clickstream and let me reject irrelevant ones, allow me to indicate both interest and disinterest in a particular news item, and of course clustering).

Concerning this article:
It appears that it'll be easy to 'personalize' the results of a clustering search engine. In a clustering search engine, use the user's clickstream to choose which cluster of results will be displayed by default for various searches (while still allowing the user to choose a different cluster).
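
A minimal sketch of that idea, assuming the search engine already returns a list of cluster labels for each query and that clicks on clusters are logged per user. The class name, method names, and the example labels are all hypothetical and are not taken from Clusty or any other real engine:

```python
from collections import Counter, defaultdict


class DefaultClusterPicker:
    """Hypothetical helper: picks a default cluster from a user's past clicks."""

    def __init__(self):
        # user_id -> Counter mapping cluster label to number of clicks
        self.clicks = defaultdict(Counter)

    def record_click(self, user_id, cluster_label):
        """Log that the user opened a result from the given cluster."""
        self.clicks[user_id][cluster_label] += 1

    def default_cluster(self, user_id, cluster_labels):
        """Return the cluster to display first; the others stay selectable."""
        if not cluster_labels:
            return None
        # .get avoids creating empty entries for users with no history
        history = self.clicks.get(user_id, Counter())
        # max() keeps the first label on ties, so a user with no history
        # simply gets the engine's own first cluster.
        return max(cluster_labels, key=lambda label: history[label])


picker = DefaultClusterPicker()
picker.record_click("u1", "jaguar car")
picker.record_click("u1", "jaguar car")
picker.record_click("u1", "jaguar animal")

labels = ["jaguar animal", "jaguar car", "jaguar os"]
print(picker.default_cluster("u1", labels))  # jaguar car
```

Counting clicks by exact label is the simplest possible choice. It only helps when the same labels recur across searches, which is an assumption a real system would probably need to relax.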

Greg Linden said...

Hi, Seun! That's a lot of questions!

I'm not sure anyone has completely unpredictable interests. People who read news in no pattern at all? People who read news in a way unlike any other person reads news? I'd be very surprised to see this happen.

Saving search history is popular. My Yahoo Search, Ask Jeeves, A9, and a few others started offering this feature. It does make it easier to search for something again that you found once before. But it doesn't help you find something new that you're trying to find.

Providing control over personalization, as you suggest, is valuable. Findory does allow readers to delete articles from their history and will explain why an article was recommended. Rating articles is an interesting idea; unfortunately, most people won't bother with it, but it is useful for power users.

Many people have tried personalization based on clustering. It does work okay. In my experience, it fails to capture fine-grained interests, but your mileage may vary.

Rob said...

Greg hit the nail on the head. People are lazy. I think of time as my investment and I demand as much return on that investment as I can possibly generate. People want what they want now and with as little effort as possible. I remember a knowledge management seminar about how perfectly good collaborative efforts fail because employees in general saw no gain for themselves in adding knowledge to the pump.

Anonymous said...

Hi, I am trying to find personalization algorithms but I have not found any yet. I saw your article and thought that you might have something. Could you help me?