Comments on Geeking with Greg: Explicit vs. implicit data for news personalization

Thanks, Peter, I was reading that paper a few days...

2007-05-07T05:43:00.000-07:00

Thanks, Peter, I was reading that paper a few days ago.

It is a remarkable piece of work -- an impressive and even intimidating example of what Google can do with its cluster and data -- but not without at least some flaws.

I have an analysis of the paper drafted, but have not had a chance to convert the notes into a post. As I am sure you can guess, that takes a fair amount of time, and, unfortunately, I am traveling right now. I hope to have something up next week.

Greg, Since you are discussing www2007, did you cat...

2007-05-06T19:02:00.000-07:00

Greg,

Since you are discussing www2007, did you catch the Google paper "Google News Personalization: Scalable Online Collaborative Filtering"? It gives some explicit details on using MapReduce for large-scale EM algorithm implementations and collaborative filtering. Any thoughts on the approach they describe?

-Pete

I think the seeming paradox in the recommendations...

2007-05-03T09:11:00.000-07:00

"I think the seeming paradox in the recommendations paper can be resolved in a similar way. Users like the feeling of being in control, but they turn out to often have a hard time using that control to get what they want."

Apologies for being dense, but I still do not see how this resolves the seeming paradox in the recommendations paper. Users are lazy, right? But when it comes to the recommendations ("push") arena, their desire for control overrides their laziness, so that they actually prefer the explicitization, even if they suck at using it. At the same time, when it comes to the ad hoc search ("pull") arena, users' desire for control does not override their laziness. Users end up not wanting to give "add/remove term" feedback, even though the experiments show that they would get better results if they did.

So this is what is confusing me. Here you have the same sort of explicitization being done, essentially the same type of data, with the same type of "add/remove term" user interface. And in one case, desire for control overrides laziness, and in the other laziness overrides desire for control. And again, the only real difference between the two cases is that one is "push" and the other is "pull". What gives, here?

Greg, how right of you to call me out if I am maki...

2007-05-03T08:57:00.000-07:00

Greg, how right of you to call me out if I am making too big a leap there. But I'm not (or the K&B paper is not) talking about any sort of advanced query syntax. The K&B paper is very similar to this WWW'07 paper; it is about showing the user the dozen terms that the system would automatically add to the query, and giving the user the power to veto or add to that list. "Add or remove" a word is hardly advanced query syntax, and hardly takes training to understand. It is just taking implicitly added data, and making it explicit and manipulatable by the user.

So again, given that both papers seem to be doing something very similar, i.e. "explicitizing" a dozen or so automatically-generated query terms and letting the user manipulate those terms, why does the nature ("pull" versus "push") of the information gathering have such an impact on the effectiveness of the explicitization? (Ad hoc search is "pull", recommendation is "push".) Explicitization seems to work for "pull", but not work for "push". Why?
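To make concrete what I mean by "explicitizing" the expansion terms, here is a minimal sketch in Python. It is only my illustration of the veto/add idea, not code from either paper; the function name `apply_user_feedback` and the example terms are hypothetical.

```python
def apply_user_feedback(query_terms, suggested_terms, vetoed=(), added=()):
    """Combine the user's query with expansion terms the user can veto or add to.

    All names here are hypothetical; this only illustrates the interaction,
    not any system's actual implementation.
    """
    vetoed_set = set(vetoed)
    final_terms = list(query_terms)
    # Keep suggested and user-added terms, skipping vetoed ones and duplicates.
    for term in list(suggested_terms) + list(added):
        if term not in vetoed_set and term not in final_terms:
            final_terms.append(term)
    return final_terms


# Hypothetical example: the system suggests three terms, the user
# removes one and adds one of their own.
final_terms = apply_user_feedback(
    query_terms=["jaguar"],
    suggested_terms=["car", "cat", "speed"],
    vetoed=["car"],
    added=["habitat"],
)
print(final_terms)
```

The point is that the user never touches query syntax; they only strike or add words on a list the system was going to use anyway.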

Thanks for the link to that paper. We have already...

2007-05-02T20:05:00.000-07:00

Thanks for the link to that paper. We have already taken the position that implicit information may be more useful than explicit, and we are busily working on gathering, summarizing, and using tons of implicit information about our members.
That being said, we could also use some expertise in this area so if you know of someone who may be a good fit, please have them contact me.

Hi, Jeremy. You make good points as always. Howeve...

2007-05-02T19:06:00.000-07:00

Hi, Jeremy. You make good points as always.

However, I think you might be making a leap from seeing fewer iterations when searching with a different interface to finding a reduction in the total amount of work.

Not only does a more complicated interface take time to learn and understand, but it may take more time per search to formulate a good query. It easily could be more iterations with less work per iteration versus fewer iterations with more work on each iteration.
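To put rough numbers on that, here is a small sketch in Python. The figures are made up purely for illustration (they are not measurements from either paper), and `total_effort` is a hypothetical helper that assumes every iteration costs the same.

```python
def total_effort(iterations: int, effort_per_iteration: float) -> float:
    """Total work under the simplifying assumption of equal cost per iteration."""
    return iterations * effort_per_iteration


# Hypothetical numbers: a simple search box needs more tries, but each is cheap.
simple_interface = total_effort(iterations=5, effort_per_iteration=10.0)

# Hypothetical numbers: a richer interface needs fewer tries, but each costs more.
rich_interface = total_effort(iterations=3, effort_per_iteration=20.0)

print(simple_interface, rich_interface)
```

With these assumed numbers, the richer interface uses fewer iterations yet still costs more in total, which is why fewer iterations alone does not show less work.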

I think an example of this is advanced search syntax, which is rarely used, but does allow more complicated queries. Why is it rarely used? Because it is faster to iterate with a simple interface.

I think the seeming paradox in the recommendations paper can be resolved in a similar way. Users like the feeling of being in control, but they turn out to often have a hard time using that control to get what they want.

Ok, so manipulating keywords is not the best appro...

2007-05-02T17:57:00.000-07:00

Ok, so manipulating keywords is not the best approach for news recommendation. But I do find one of the other studies that the authors cite quite interesting:

"Koenemann and Belkin [15] on different levels of the user’s control on the expansion terms show that the increasing openness of the expansion terms and the higher level of the user’s control on the query expansion improve search effectiveness. This includes the findings that 1) participants performed 15% better when they were able to view and manipulate the terms, 2) they used less iterations to develop equally good or better queries [snip]"

So while word-level manipulation is not good for recommendation, it is good for ad hoc (web style) search. In the latter task, not only is precision improved, but fewer iterations are required.

In other words, users are lazy, and allowing them to directly manipulate explicit ad hoc search data cuts down the total amount of work that they have to do! And yet I constantly hear from you, from Danny Sullivan, from Google, that users are unwilling to do this type of explicit work, even if it means less work overall. Users, paradoxically, would rather take the long route, and type in more queries, over and over.

I think this recommendation paper you cite suffers from a similar paradox. With ad hoc search, users can get better results with less total work by manipulating explicit data, but they do NOT like to do it. With recommendation, users actually get worse results with more total work by manipulating explicit data, but users actually DO like to do it.

Do you see these eerie reverse parallels? Any thoughts on the origin or nature of these paradoxes?