Some selected excerpts:
Search engines in the future will become better for a lot of different reasons, but one of the reasons will be that we understand the user better ... Search engines will have become more personalized.
At the end of the interview, Marissa also indicated that they will not be personalizing advertising any time soon; the focus is on using individual searcher history to improve relevance, not to improve revenue directly.
We've been working on personalized search now for almost 4 years. It goes back to the Kaltix acquisition .... They were on the cutting edge of how personalization would be done on the web, and they were capable of looking at things like a searcher’s history and their past clicks, their past searches, the websites that matter to them.
We acquired them in 2003 and we've worked for some time since to outfit our production system to be capable of doing that [personalization] computation and holding a vector for each user in parallel to the base [vector].
Our standards are really high. We only want to offer personalized search if it offers a huge amount of end user benefit ... We're very comfortable and confident in the relevance seen from those technologies.
Overall, we really feel that personalized search is something that holds a lot of promise, and we're not exactly sure of the signals that will yield the best results. We know that search history, your clicks and your searches together provide a really rich set of signals ... It's a matter of understanding how... The more signals that you have and the more data you have about the user, the better it gets.
See also some of the papers -- "Scaling Personalized Web Search" and "An Analytical Comparison of Approaches to Personalizing PageRank" -- by the Kaltix folks.
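To make the "vector for each user in parallel to the base [vector]" idea a little more concrete, here is a minimal sketch of personalized PageRank in the spirit of those paper titles. It is only an illustration under my own assumptions, not a description of what Google actually runs: the four-page graph, the `pagerank` function, the `gamer` teleport vector, and the damping value are all hypothetical.

```python
def pagerank(links, teleport, damping=0.85, iterations=50):
    """Power iteration; `teleport` is the distribution the random surfer jumps to."""
    pages = sorted(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Every page starts with its share of the teleport mass.
        new_rank = {p: (1.0 - damping) * teleport.get(p, 0.0) for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: hand its rank back through the teleport vector.
                for p in pages:
                    new_rank[p] += damping * rank[page] * teleport.get(p, 0.0)
        rank = new_rank
    return rank


# Hypothetical toy web graph: page -> pages it links to.
LINKS = {
    "news": ["games", "movies"],
    "games": ["news"],
    "movies": ["news", "reviews"],
    "reviews": ["movies"],
}

# Base vector: the surfer teleports to any page with equal probability.
uniform = {page: 1.0 / len(LINKS) for page in LINKS}
# Per-user vector (assumed): this user's history points only at game pages.
gamer = {"games": 1.0}

base_rank = pagerank(LINKS, uniform)
gamer_rank = pagerank(LINKS, gamer)

for page in sorted(LINKS):
    print(f"{page:8s} base={base_rank[page]:.3f} gamer={gamer_rank[page]:.3f}")
```

The only thing that changes between the two runs is the teleport vector, which is why storing one small vector per user alongside the shared base computation is at least plausible at scale.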
See also my Feb 2007 post, "Google expands personalization", and my June 2005 post, "More on Google personalized search".
2 comments:
Overall, we really feel that personalized search is something that holds a lot of promise, and we're not exactly sure of the signals that will yield the best results. We know that search history, your clicks and your searches together provide a really rich set of signals ... It's a matter of understanding how...
Question: Is what she is saying related to what I was talking about a few days ago, how it is difficult to know what parts of one's search history ("signals") to tie to the current query? I.e. if I have searched for both games and movies in the past, and I now search for "pirates of the caribbean", how does personalization know whether I am in my own personal "game" history mode or my own personal "movie" history mode?
Is this the problem that Marissa is describing, here?
Hmm. No thoughts or responses from anyone here, eh?
Well, I have a follow-up to my concern. If the problem is indeed which "signals" (search history, clicks, etc.) to tie to one's current query, then it seems like the problem only gets worse the more data you have.
The more I click things, the more I search for things, the more my profile starts to spread out, into all sorts of different areas, interests, opinions, patterns, etc. The more data or information the system has about me, the more difficult it becomes to know which parts of my search history ("signals") to tie to the current query, simply because there will be even more diversity in the search history, more signals to be confused by.
Conventional wisdom is that more data is better, because one can just look at the relative frequency of signals in my profile. Frequent past signals indicate frequent future directions, and vice versa.
But I don't buy this maximum likelihood approach. I.e. just because 90% of my search history contains searches for games, and 10% contains searches for movies, does not mean that there is a 90% chance I mean the game when I type "pirates of the caribbean".
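To put rough numbers on that objection, here is a small sketch that contrasts the raw history frequencies with a Bayes-rule estimate that also looks at the query itself. Everything in it is an assumption made up for illustration: the 90/10 split comes from the example above, but the per-topic query likelihoods, the names `history_share` and `query_likelihood`, and the function `posterior` are hypothetical, and nothing here is known to be how any real search engine works.

```python
# Share of past searches per topic (the 90/10 split from the example above).
history_share = {"games": 0.9, "movies": 0.1}

# Hypothetical: how likely someone in each "mode" is to type this exact query.
# These two numbers are invented purely for illustration.
query_likelihood = {"games": 0.001, "movies": 0.02}


def posterior(prior, likelihood):
    """Bayes rule: P(topic | query) is proportional to P(query | topic) * P(topic)."""
    unnormalized = {topic: prior[topic] * likelihood[topic] for topic in prior}
    total = sum(unnormalized.values())
    return {topic: weight / total for topic, weight in unnormalized.items()}


intent = posterior(history_share, query_likelihood)

for topic in history_share:
    print(f"{topic:7s} history={history_share[topic]:.2f} given_query={intent[topic]:.2f}")
```

With these invented numbers, the estimate for "games" drops from 0.90 to roughly 0.31 once the query is taken into account, so the frequency in the history acts as a prior rather than as the answer. Whether good per-topic likelihoods can actually be estimated from a sprawling profile is exactly the open question.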
But I confess; recommender systems and user modeling are not my specialty. There must be something that I am missing here, because obviously Google loves the idea and thinks it is getting a lot of relevance boost with the idea.
I really would be curious to hear comments from Greg's other readers - why should I not be concerned about diversity and confusion among signals, as one's personal data profile grows?