Monday, August 02, 2004

Topix.net and a new NewsRank

Rich Skrenta announces that Topix.net has launched "a next-gen version of our NewsRank(tm) story technology [which] powers the relevance, accuracy and magnitude of the stories categorized on Topix.net."

In many ways, Topix.net and Findory News are trying to solve the same problem: surfacing the news you need from thousands of different news sources. But they take very different approaches. Topix.net provides fine-grained categorization of news articles (e.g. a news category for paleontology, not just for science), so users can dive into specific categories to get what they want. Findory uses personalization to learn your interests and automatically surface the news you need. Different, perhaps even complementary, approaches to the same problem.

Topix does have some impressive technology. It's not at all easy to categorize and prioritize articles using content analysis, but Topix.net does a pretty good job. How do they do it?
    Not with human editing, source tagging, or keyword scanning. The Topix.net NewsRank engine is reading each story individually, determining locality and subject information based on the content of the article.

    Categorizing sources in order to produce topic aggregations doesn't work. Susan Mernit writes a great blog about online media, but she also writes about food and other personal topics. Blindly adding her entries to a food or media industry aggregation would result in inappropriate posts showing up.

    Source-based categorization doesn't work for local, either. The San Francisco Chronicle runs stories that aren't about San Francisco. Conversely, there are many stories about events in SF that show up in news sources based outside of San Francisco. These stories would be missed with source-based tagging.

    Keyword-driven filters are also a poor solution. Pulling every story out of the news stream with "San Francisco" in it will not make a good SF rollup, but instead will yield a random jumble of posts, most of which merely mention "San Francisco", but overall have nothing to do with it:

    ... on a business trip to San Francisco, ...
    ... an unrestricted free agent from San Francisco, ...
    ... was bound from Alaska to San Francisco in the winter of 1860 ...
    ... moved, with her family to San Francisco in 1960, ...

    The situation is even worse if the keyword is ambiguous ("Kerry", "Bush", "Springfield").

    Our solution is to disambiguate references to people, places and subjects, and match them against our Knowledge Base of 150,000 topics. The result lets our algorithmic story editing technology leverage a much finer-grained idea of what a story is about than simply using the big 7 news categories (US, World, Business, Sci/Tech, Sports, Entertainment, Health.)
Clever, and it seems to work quite well. Nice work.
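
To make the contrast concrete, here is a toy sketch of the difference between a keyword filter and matching a story against a knowledge base of topics. This is not how NewsRank works (Topix has not published that); the tiny KNOWLEDGE_BASE, the function names, and the "at least two distinct context terms" rule are all assumptions made up for illustration.

```python
import re

# Hypothetical miniature knowledge base: each topic maps to context terms
# that tend to appear in stories that are actually about that topic.
# (Topix describes a Knowledge Base of 150,000 topics; this is a toy stand-in.)
KNOWLEDGE_BASE = {
    "San Francisco, CA": {
        "san francisco", "bay area", "muni", "golden gate",
        "mission district", "board of supervisors",
    },
    "John Kerry": {"kerry", "senator", "democratic", "campaign", "massachusetts"},
    "Kerry, Ireland": {"kerry", "ireland", "county", "killarney", "tralee"},
}


def keyword_match(text, keyword):
    """Naive filter: true if the keyword appears anywhere in the text."""
    return keyword.lower() in text.lower()


def topic_scores(text):
    """Count how many distinct context terms of each topic occur in the text."""
    lowered = text.lower()
    scores = {}
    for topic, terms in KNOWLEDGE_BASE.items():
        scores[topic] = sum(
            1 for term in terms
            if re.search(r"\b" + re.escape(term) + r"\b", lowered)
        )
    return scores


def categorize(text, min_hits=2):
    """Assign a topic only when several of its context terms co-occur.

    The min_hits threshold is an assumed heuristic: a single mention
    ("on a business trip to San Francisco") is not enough evidence.
    """
    return sorted(t for t, s in topic_scores(text).items() if s >= min_hits)


passing = "He flew out on a business trip to San Francisco, then returned home."
local = ("The San Francisco Board of Supervisors voted to expand Muni "
         "service in the Mission District.")
politics = "Kerry spoke to Democratic campaign volunteers in Massachusetts."
travel = "Tourists visiting County Kerry often stay in Killarney."

print(keyword_match(passing, "San Francisco"), categorize(passing))
print(keyword_match(local, "San Francisco"), categorize(local))
print(categorize(politics))
print(categorize(travel))
```

The keyword filter accepts both San Francisco snippets, while the context-based version only tags the one that is actually about the city, and it separates the two senses of "Kerry" by the company the word keeps. A real system would need far richer signals than term counts, but the sketch shows why looking at the whole story beats looking for a single string.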

1 comment:

Anonymous said...

There seem to be dozens of news aggregation sites offering similar services. Must admit Topix is resourceful. But are news aggregators that can rip news from a few hundred sites and do simple keyword tagging or categorization available for webmasters? I can find a lot of services but not any software for sale. Any hints?