Monday, December 17, 2007

Eric Enge interviews Sep Kamvar

Eric Enge posted an interview with Google personalization guru Sep Kamvar.

Some highlights of Sep's answers below:

The two signals that we use right now are the search history and the location. We constantly experiment with other signals, but the two signals that have worked best for us are location and search history.

Some signals that you expect would be good signals, turn out not to be that good. So for example, we did one experiment with Orkut, and we tried to personalize search results based on the community that users had joined. It turns out that while people were interested in the Orkut communities, they didn't necessarily search in line with those Orkut communities.

It actually harkened back to another experiment that we did, where in our first data launch of personalized search we allowed everybody to just check off categories that were of interest to them. People did that, and people would check off categories like literature. Well, they were interested in literature, but they actually didn't do any searching in literature. So, what we thought would be a very clean signal, actually turned out to be a noisy signal.

When I think about what I am interested in, I don't necessarily think about what I am interested in that I search for and what I am interested in that I don't search for. That's something that we found was better learned algorithmically rather than directly.

A signal should be very closely aligned with search and what you are searching for in order for it to be useful to personalizing search ... In addition, we've found that your more recent searches are much more important than searches from a long time ago.

For more on the problems with explicitly extracting preferences -- as Sep found when explicitly asking for each user's category interests in an early version of Google Personalized Search -- please see my post, "Explicit vs. implicit data for news personalization", and the links from that post.

For more on trying to use signals not closely aligned with search, please see my earlier post, "Personalizing search using your desktop files".

For more on the importance of focusing on recent searches for personalized search, please see also my past posts, "The effectiveness of personalized search" and "The many paths of personalization".
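
To make the recency point concrete, here is a minimal sketch in Python of how recent searches might be weighted more heavily than old ones when building an interest profile. This is not Google's method, which the interview does not describe. The exponential decay, the `half_life_days` parameter, the topic labels, and the function name are all assumptions made for illustration.

```python
from collections import defaultdict


def recency_weighted_interests(history, now, half_life_days=7.0):
    """Build a normalized interest profile from a search history.

    history: list of (day, topic) pairs, where day is a number of days
             on some shared clock (hypothetical format).
    now: the current day on the same clock.
    half_life_days: assumed decay rate; a search this many days old
                    counts half as much as one made today.
    """
    scores = defaultdict(float)
    for day, topic in history:
        age = max(0.0, now - day)  # treat future-dated entries as age 0
        scores[topic] += 0.5 ** (age / half_life_days)

    total = sum(scores.values())
    if total == 0:
        return {}
    # Normalize so the weights sum to 1.
    return {topic: score / total for topic, score in scores.items()}


# Three old literature searches versus two recent surfing searches.
history = [
    (0, "literature"),
    (0, "literature"),
    (0, "literature"),
    (99, "surfing"),
    (100, "surfing"),
]
profile = recency_weighted_interests(history, now=100)
```

With a one-week half-life, the two recent surfing searches dominate the profile even though literature has more searches overall, which is the behavior Sep describes.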

1 comment:

jeremy said...

"In addition, we've found that your more recent searches are much more important than searches from a long time ago."

To me, that sounds more like "search as a dialogue" than "personalized search".

As far as the "explicit" vs. "implicit" thread, I've already hashed that out with you, in some of the previous threads you mention. But I do want to point out that there is still a difference between the type of explicit interest group clustering that Google has done ("I like literature") and the explicit search feedback that I am so keen on.

The explicit behavior Sep is talking about is a priori behavior: interest declaration before the search has started. I very much believe him that declaring your interest in literature is not going to help you a year later.

But when I declare, in the moment, my interest in, let's say, "surfing vacations", then I see no problems with the system asking me for a refinement: "Did you mean California surfing? Hawaiian surfing? Tavaruan surfing? Costa Rican surfing? Irish surfing? French surfing?" etc.

I might never have even known that French surfing existed. Especially since it does not appear in the top 10 of a Google search. But in that moment, the query refinement tool has made that information known to me, and I can click and explore the wonderful world of French surfing.

This is something that I would never get from advanced query operators, because I would have never even known to ask about surfing in France to begin with, no matter how advanced the operators are. It is also something that I would never get from implicit, "personalized" search, either, since the chances are that I probably was not very recently searching for anything in France.