Tuesday, November 29, 2005

Is personalized search a dead end?

Raul Valdes-Perez, CEO of the excellent clustering search engine Vivisimo, wrote a one-page paper called "Why Search Personalization is a Dead End" (PDF).

He lists five reasons why personalized search is doomed:
People are not static; they have many fleeting and seasonal interests.

The surfing data used for personalizing search is weak [compared to purchase data].

The user's decision to visit the page is based on the title and brief excerpt (snippet) that are shown in the search results, not the whole page.

Home computers are often shared among family members.

Queries tend to be short.

The criticisms might be summarized as a claim that clickstream data is too dynamic, noisy, and sparse to support personalization.

There are two problems with this argument. First, Amazon.com's personalization works just fine from similar clickstream data. Sure, it's true that the data is dynamic, noisy, and sparse, but Amazon deals with that by using algorithms that adapt rapidly, are tolerant to errors, and work from very little data.

Second, personalization doesn't have to be perfect. It just has to be better than the alternative. In Amazon's case, the alternative to a personalized front page is a generic front page with a top sellers list or a bunch of marketing goo. It's easy to be more useful to shoppers than that. Mistakes are just fine. The guesses just need to be right more often than the alternative.

Personalized search is no different. The algorithms need to adapt rapidly, be tolerant to noise, and work from little data. Mistakes are fine. Personalized search just needs to be more useful than unpersonalized search.
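
To make that concrete, here is a minimal sketch in Python of one way to meet those three requirements. It is an illustration only, not how Amazon or any search engine actually does it; the names build_profile and rerank, the half-life of three clicks, and the 0.3 blend weight are all assumptions made up for the example. Recent clicks are weighted more heavily (adapts rapidly), the personal signal is only blended into the engine's own score (tolerant of noise), and a single click is enough to have an effect (works from little data).

```python
from collections import Counter


def build_profile(recent_clicks, half_life=3.0):
    """Build a term-weight profile from recently clicked titles, oldest first.

    Each click's weight halves every `half_life` clicks of age, so the
    profile follows what the searcher is doing right now.
    """
    profile = Counter()
    newest = len(recent_clicks) - 1
    for position, title in enumerate(recent_clicks):
        age = newest - position  # 0 for the most recent click
        weight = 0.5 ** (age / half_life)
        # Crude whitespace tokenization; a real system would do much better.
        for term in set(title.lower().split()):
            profile[term] += weight
    return profile


def rerank(results, profile, blend=0.3):
    """Reorder (title, base_score) pairs, with base_score in [0, 1].

    The final score mixes the engine's own score with the overlap between
    the title and the profile, so a wrong guess has limited influence.
    With an empty profile, results are ranked by base score alone.
    """
    overlaps = [
        sum(profile[term] for term in set(title.lower().split()))
        for title, _ in results
    ]
    top = max(overlaps, default=0.0)
    scored = []
    for (title, base), overlap in zip(results, overlaps):
        # Scale the overlap to [0, 1] relative to the best-matching result.
        personal = overlap / top if top > 0 else 0.0
        scored.append(((1 - blend) * base + blend * personal, title))
    # list.sort() is stable, so ties keep the engine's original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]
```

For example, someone who searches for "jaguar" right after clicking on a couple of pages about rainforest animals would see the habitat result move above the car dealers, while someone with no click history would see the unpersonalized ranking. If the guess is wrong, the cost is small: the unpersonalized results are still there, just slightly reordered.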

See also my earlier posts, "Perfect Search and the clickstream" and "Personalized search vs. clustering".

[Valdes-Perez paper via John Battelle]

9 comments:

Jay said...

Nice post. I think personalization will take a long time to evolve: from trying to show personalized stuff that may not be based on a person's current preferences (they change, as you said), and instead filtering out information that may be relevant to the person, to adapting to a more general yet accurate algorithm that predicts the trend of a person's preferences over time and shows information accordingly. It's more like adaptive artificial intelligence, which always takes time to train. There is no shortcut for it.

Pete Cashmore said...

Greg,

I agree with you - personalization doesn't need to be perfect, just better than the alternative. Over time we'll develop algorithms which can deal with the irregularities in our behaviour. I certainly don't think personalization in search is a dead end - in fact, it's an exciting opportunity. Google, for one, understands this.

Andrew Goodman said...

Greg, I side with your leaning here. Those caveats don't mean it's a "dead end." Home computers shared? That would be like a blanket statement that just because husbands and wives sometimes "share razors," the Mach III Turbo Juicepower Plus will have no market. (OK bad analogy.) Someone sharing my laptop? Not if I can help it. So, I guess personalized search might make sense and serve me better. Fleeting interests? I'm in my hometown 70+% of the time. I have a postal code. An age and a gender. There are lots of things that are far from fleeting, but are in fact recurring elements of behavior. I'm surprised to hear otherwise from a "search" person.

Brad said...

I think people are thinking too narrowly about how to create personalized search. Creating an individual's online "DNA" is just not an effective way of organizing content, because people's actions are too widely dispersed and untrackable. There are solutions waiting to be developed to make this type of search and media consumption more effective, but people are thinking too narrowly in terms of the user and how their preferences and information can be defined.

Greg Linden said...

Hi, Brad. Great point. I think personalized search needs to focus less on building a long-term profile and more on using what I'm doing right now to help me find what I need.

That's what the alpha of personalized web search on Findory tries to do.

Xuehua said...

The article written by Mr. Raul Valdes-Perez has been around for quite a while. Another online article with a negative opinion of personalization can be found at http://blog.outer-court.com/archive/2005-03-24-n33.html.

I agree more with personalization that uses short-term context than with personalization that uses long-term context. A publication about using short-term context can be found at http://sifaka.cs.uiuc.edu/xshen/research_files/cikm05_ium.pdf.
Some research shows that with long-term context the results are indeed not good (http://research.microsoft.com/~sdumais/SIGIR2005-PersonalizedSearch.pdf).

Greg Linden said...

Thanks, Xuehua. I was aware that the Valdes-Perez article had been out in some form -- I have an old post with some comments on a version of it -- but the one-page version was new to me. You may be right that it might be old as well.

Thanks for the references on the papers. I had skimmed your paper (Shen et al.) before. I'm not sure I like the focus on query expansion -- I think the changes can be too drastic and unintuitive to the user -- but it is an approach many have tried.

The Teevan et al. paper is quite good, I agree. A few months ago, I wrote up my thoughts on it if you're interested.

Thanks again, Xuehua.

Christian Langreiter said...

There's an alpha of personalized web search on Findory? Who'd have known! ;-)

Xuehua said...

Hi, Greg.
I read the SIGIR 2005 personalized search paper by Jaime Teevan, Susan Dumais and Eric Horvitz again. It is very solid work, and the paper is very well written. I have a couple of thoughts about it.

a) In Section 5.2, on the performance comparison, the authors found that "Web search personalization also performed somewhat better than ideal relevance feedback". Here, ideal relevance feedback means that we know all the highly relevant and relevant documents and use all of them for relevance feedback. I find this result really counterintuitive. Here is the question: if we know all the relevant documents in the document set, should a good ranking formula put all of them at the top of the search result list? Or can we find a ranking formula that guarantees all relevant documents are put at the top when we use all of them for relevance feedback?

b) The authors found that "no one parameter setting consistently returned better results than the original Web ranking, but there was always some parameter setting that led to improvements". Here, the parameter setting means the choice of corpus representation, user profile representation, and document representation. I am a little disappointed to see that NO personalized search algorithm among the roughly 67 combinations of corpus, user, and document representations tried can beat the original Web ranking, although the authors found that combining personalized search with the original Web ranking yields a small but statistically significant improvement over the original Web ranking. I wish we could find a principled way to do personalized search that improves on the original Web ranking consistently and significantly (and, of course, statistically significantly).