Sunday, April 25, 2010

Google News hybrid recommendations

Three Googlers published a paper, "Personalized News Recommendation Based on Click Behavior" (ACM), at the recent IUI 2010 conference that describes a hybrid recommender system combining user-based and content-based recommendations. This new hybrid recommender now appears to be deployed on Google News.

Some excerpts from the paper:
[The] previous Google News recommendation system was developed using a collaborative filtering method. It recommends news stories that were read by users with similar click history. This method has two major drawbacks ... First, the system cannot recommend stories that have not yet been read by other users ... Second ... news stories [that] are generally very popular ... are constantly recommended to most of the users, even for those users who never [are interested because] ... there are always enough clicks ... to make the recommendation.

A solution to these two problems would be to build profiles of user's genuine interests and use them to make news recommendations. The profiles ... filter out the stories that are not of interest ... [and recommend stories] even if [they have] not been clicked on by other users ... Based on a user's news reading history, the recommender predicts the topic categories of interest ... News articles in those categories are ranked higher in the candidate list.

On average, the hybrid method ... improves the CTR [of] the existing collaborative method by 30.9% ... [and increased] the frequency of website visits in the test group [by] 14.1%.
Hybrid recommenders are not that new. In the past, as in this paper, they usually were motivated by trying to deal with the sparsity and cold start problems that challenge collaborative filtering recommenders. Hybrid systems also have been used to deal with the so-called Harry Potter problem -- recommendations that focus too much on popular items -- by constraining the collaborative recommendations to the interests expressed in the profile, though that often can be better dealt with by tuning a collaborative recommender to discourage correlations between unpopular and popular items.

One thing that is surprising in this paper is the use of high-level topics rather than fine-grained topics. I would think that you would be better off getting as specific as possible on the profile, then branching out to related topics. The paper briefly addresses this, arguing that "specializing the user profile may limit the recommendations to news that the user already knew", but that seems like it would only happen if you rather foolishly only used read topics rather than including topics that appear to be related to read topics.

By the way, when you have as much data as Google should have, it is not at all clear you want to fall back on a content approach like they did in this paper here. Yehuda Koren, for example, has convincingly argued that, when you have big data, latent factor models extract these content-based relationships automatically in much more detail and much more accurately than you could hope to do with a manually constructed model.

Finally, I cannot quite let this one go by without mentioning that Findory was a hybrid news recommender, launched in January 2004, that dealt with the cold start and sparsity problems of a collaborative recommender, the same problems the Google News team apparently is still struggling with six years later. Findory is not mentioned in this paper in the related work, but I know the Google team is quite aware of Findory.


Unknown said...

DailyMe built its Newstogram platform to do profile-based personalized news recommendations for news sites. We're seeing similar or even better results on some of the sites testing it now, and we're rolling it out to other sites rapidly.

We agree that a profile-based approach coupled with other forms of recommendations, especially for light users, is the best approach for news content.

Greg Linden said...

Hi, Neil. There have been many attempts to do profile-based versions of personalized newspapers where articles are selected based on keywords or categories. One of the early ones was the Krakatoa Chronicle (co-created by Krishna Bharat, who also created Google News). That approach has problems with the quality of the recommendations, which is why the many attempts to do that in the past have had limited success.

What is notable about this work at Google (and Findory well before it) is that it uses a hybrid recommender that combines content-based recommendations with collaborative filtering.