Wednesday, September 20, 2006

The Daily You paper

Lawrence Kai Shih and David Karger at MIT wrote an interesting WWW2004 paper, "Using URLs and Table Layout for Web Classification Tasks", about a news recommender system they called "The Daily You".

The recommender system is unusual in that it uses proximity on a page and similarities in the URLs to find related articles. From the paper:
In recommendation systems ... typically, the Web is treated as a large text corpus: the numerous features used are the words in the documents, and standard machine learning algorithms such as Naive Bayes or support vector machines are applied.

The Web is more than just text, however: it contains rich, human-oriented structure suitable for learning. In this paper, we argue that two features particular to Web documents, URLs and the visual placement of links on a page, can be of great value in document classification. We show that machine-learning classifiers based on these features can be simultaneously more efficient and more accurate than those based on the document text.

Our motivating example for these classification problems is The Daily You, a tool providing personalized news recommendations from the Web. The Daily You uses URLs and table layout to solve two important classification problems: the blocking of Web advertisements and the page regions and outbound hyper-links predicted to be "interesting" to its user.

In other words, Shih and Karger are saying that human editors already signal which articles are related, either by placing them close together on a web page or by giving them similar URLs on the site. The Daily You tries to extract and exploit that signal to generate good news recommendations. It is a cute idea.
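
To make the URL half of the idea concrete, here is a minimal sketch. It is not the classifier from the paper; it only illustrates the intuition that articles sharing a long URL prefix are likely to be related. The function names and the example.com URLs are made up for illustration.

```python
from urllib.parse import urlparse


def url_tokens(url):
    """Split a URL into its host followed by its non-empty path segments."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    return [parsed.netloc] + segments


def shared_prefix_length(url_a, url_b):
    """Count how many leading tokens (host, then path segments) two URLs share."""
    count = 0
    for a, b in zip(url_tokens(url_a), url_tokens(url_b)):
        if a != b:
            break
        count += 1
    return count


def rank_by_url_similarity(liked_url, candidates):
    """Order candidate URLs so those sharing the longest prefix with liked_url come first."""
    return sorted(
        candidates,
        key=lambda u: shared_prefix_length(liked_url, u),
        reverse=True,
    )


# Hypothetical URLs: one article the reader liked, and three candidate links.
liked = "http://news.example.com/sports/baseball/a.html"
candidates = [
    "http://ads.example.net/banner",
    "http://news.example.com/politics/c.html",
    "http://news.example.com/sports/baseball/b.html",
]
ranked = rank_by_url_similarity(liked, candidates)
```

In this toy setup the other baseball story ranks first, the politics story second, and the link on the unrelated ad host last. A real system would treat URL tokens as features for a learned classifier rather than rely on a hand-written prefix count.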
