Friday, October 19, 2007

Google News, Krishna Bharat, and RecSys 2007

Krishna Bharat, a Google researcher and the creator of Google News, just gave the keynote talk at the Recommender Systems 2007 conference.

The talk had a pleasantly idealistic focus on increasing access to knowledge. Krishna clearly sees helping people find news information as a noble and important mission.

Krishna devoted most of the early part of the talk to the history of writing and information broadcast. He ended with the claim that the Web was bringing about a change in news consumption and access as revolutionary as radio and television.

This revolution comes from universal and easy access to news and lower costs of producing news. Krishna saw this as having a very broad impact, saying, "The Internet can (and will) do a lot for democracy."

Even so, Krishna warned of challenges, saying, "Technology enables free speech but doesn't guarantee it," and expressing concerns about censorship. He said the goal of Google News was to ensure the spread of knowledge, multiple perspectives, and differing opinions.

To provide multiple perspectives, Google News crawls a broad list of sources, ranks and clusters the articles it finds, and then explicitly exposes the clusters to readers. That makes it easy for people to see the difference in, for example, how a hostage crisis is covered in South Korea and Pakistan.

The clustering attempts to group stories on the same event together. Krishna made the interesting comment that the clusters change over time, with old and new stories shifting between clusters as follow-up stories on an event appear. They use a technique Krishna only broadly described as an agglomerative hierarchical clustering algorithm.
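Krishna gave no details beyond the name, so what follows is only a generic, textbook sketch of agglomerative hierarchical clustering, not Google's implementation. It assumes bag-of-words vectors, cosine similarity, average linkage, and a hypothetical stopping threshold of 0.3. Every story starts in its own cluster, and the two most similar clusters are merged repeatedly until no pair is similar enough.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a story (assumption: lowercase, split on whitespace)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def average_link(c1, c2, vecs):
    """Average pairwise similarity between two clusters of story indices."""
    total = sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_stories(stories, threshold=0.3):
    """Bottom-up clustering: start with singletons and keep merging the most
    similar pair of clusters until no pair is more similar than threshold.
    The threshold value is a hypothetical choice for illustration."""
    vecs = [vectorize(s) for s in stories]
    clusters = [[i] for i in range(len(stories))]
    while len(clusters) > 1:
        best, pair = threshold, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = average_link(clusters[x], clusters[y], vecs)
                if sim > best:
                    best, pair = sim, (x, y)
        if pair is None:
            break  # nothing left is similar enough to merge
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because y > x
    return clusters


stories = [
    "hostages released in afghanistan",
    "afghanistan hostages released by taliban",
    "stock market falls sharply",
    "market falls on earnings",
]
print(cluster_stories(stories))  # [[0, 1], [2, 3]]
```

Because the result depends on the whole set of stories, rerunning the clustering as follow-up stories arrive can regroup older articles, which fits Krishna's remark that clusters shift over time.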

Krishna provided more details on how Google determines the relevance of stories and the authority of sources. He started by describing the criteria human editors use to judge the relevance of stories, a long list that included scope/impact, urgency, lack of negativity, unexpectedness, lack of ambiguity, the "human element", ability of the audience to identify with the story, elite (e.g. celebrity) references, consonance, continuity, market forces, local bias, and ideological bias.

Krishna then said that Google determines article relevance by looking at the authority of the source, timeliness of the article, whether it is an original piece, placement by the editors on the source page, the apparent scope and impact, and the popularity of the article.
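The talk listed the signals but not how they are combined. One simple way to picture it is a weighted linear score. In this sketch the `Article` fields, the weights, and the 12-hour freshness half-life are all hypothetical assumptions, not anything Krishna described. Every signal except age is assumed to be pre-normalized to [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical signal fields, assumed pre-normalized to [0, 1]
    # except age_hours, which is turned into a freshness score below.
    source_authority: float
    age_hours: float
    is_original: bool
    placement: float      # e.g. 1.0 = top of the source's front page
    scope_impact: float
    popularity: float


# Illustrative weights only (they sum to 1.0); not from the talk.
WEIGHTS = {
    "authority": 0.30,
    "freshness": 0.25,
    "original": 0.10,
    "placement": 0.10,
    "scope": 0.15,
    "popularity": 0.10,
}


def freshness(age_hours, half_life_hours=12.0):
    """Exponential decay: the score halves every half_life_hours."""
    return 0.5 ** (age_hours / half_life_hours)


def relevance(a: Article) -> float:
    """Weighted linear combination of the relevance signals from the talk."""
    return (
        WEIGHTS["authority"] * a.source_authority
        + WEIGHTS["freshness"] * freshness(a.age_hours)
        + WEIGHTS["original"] * (1.0 if a.is_original else 0.0)
        + WEIGHTS["placement"] * a.placement
        + WEIGHTS["scope"] * a.scope_impact
        + WEIGHTS["popularity"] * a.popularity
    )


fresh = Article(0.8, 1.0, True, 0.9, 0.7, 0.5)
stale = Article(0.8, 48.0, True, 0.9, 0.7, 0.5)
print(relevance(fresh) > relevance(stale))  # True
```

A linear score keeps each signal's contribution easy to inspect. A production system could just as plausibly learn the weights or use a nonlinear model.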

A big piece of determining the relevance of a story is determining the authority of the source. Google estimates authority by looking at the characteristics of all the articles produced by the source (including the number of non-duplicate stories, length of the articles, breadth of the articles, number of important/breaking stories, click rate by Google News readers, and the average quality of the writing), the PageRank of the news website, and real-world data on the news company (e.g. number of employees).
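As with article relevance, the talk did not say how these source signals are combined. The sketch below is one hypothetical approach: log-scale the count-like features, min-max normalize each feature across all sources, and take a weighted average. The feature names, the log scaling, and the default equal weights are assumptions made for illustration.

```python
import math

# Hypothetical feature names for the source signals mentioned in the talk.
FEATURES = [
    "original_stories",   # number of non-duplicate stories
    "avg_length",         # average article length in words
    "breadth",            # number of distinct topics covered
    "breaking_stories",   # number of important/breaking stories
    "click_rate",         # click rate from news readers
    "writing_quality",    # average writing-quality score
    "pagerank",           # PageRank of the news website
    "employees",          # real-world size of the news organization
]

# Count-like features span orders of magnitude, so they are log-scaled first.
LOG_SCALED = {"original_stories", "avg_length", "breadth", "breaking_stories", "employees"}


def authority_scores(sources, weights=None):
    """Min-max normalize each feature across sources, then return a weighted
    average in [0, 1] per source. Equal weights unless others are supplied."""
    weights = weights or {f: 1.0 for f in FEATURES}
    names = list(sources)
    totals = {n: 0.0 for n in names}
    for f in FEATURES:
        raw = [sources[n][f] for n in names]
        if f in LOG_SCALED:
            raw = [math.log1p(v) for v in raw]
        lo, hi = min(raw), max(raw)
        for n, v in zip(names, raw):
            scaled = (v - lo) / (hi - lo) if hi > lo else 0.0
            totals[n] += weights[f] * scaled
    weight_sum = sum(weights[f] for f in FEATURES)
    return {n: s / weight_sum for n, s in totals.items()}


sources = {
    "wire": {"original_stories": 5000, "avg_length": 600, "breadth": 40,
             "breaking_stories": 300, "click_rate": 0.05, "writing_quality": 0.9,
             "pagerank": 8, "employees": 3000},
    "blog": {"original_stories": 50, "avg_length": 300, "breadth": 3,
             "breaking_stories": 1, "click_rate": 0.01, "writing_quality": 0.6,
             "pagerank": 4, "employees": 2},
}
print(authority_scores(sources))  # {'wire': 1.0, 'blog': 0.0}
```

Normalizing across sources makes the score relative: a source's authority depends on who else is in the index, which is one reasonable reading of "authority" but not the only one.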

Krishna did see personalization and recommendations for news as a long term goal, saying we want to "get the right news to the right audience." And, Krishna has been interested in this for a long time, all the way back to his 1995 work on the Krakatoa Chronicle. As for more recent work, Krishna summarized the WWW 2007 paper, "Google News Personalization".

Overall, Krishna focused on Google's mission of making information universally accessible and useful. He clearly wants to help people find news and be informed about world events, using whatever tools, personalization or otherwise, serve that mission.
