Tuesday, June 06, 2006

Beyond PageRank, RankNet, and fRank

Matthew Richardson, Amit Prakash, and Eric Brill from Microsoft Research presented a paper, "Beyond PageRank: Machine Learning for Static Ranking" (PDF), at WWW 2006.

The paper describes fRank, a neural network-based ranking function that they trained to determine how much various features matter for the relevance ranking of web search results.

The authors show that naive PageRank is no longer effective for relevance ranking: it produced the worst results in their tests and did no better than a ranking based solely on features of the page's domain.

That result is not surprising. Naive PageRank has been under assault by spammers for many years and is no longer effective.
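To make "naive PageRank" concrete, here is a minimal sketch of the textbook power-iteration version, with no spam defenses at all. The function name, the damping factor of 0.85, the iteration count, and the toy link graphs are all assumptions chosen for illustration; none of them come from the paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative defaults.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Each page splits its damped rank equally among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graphs: a three-page cycle, then the same cycle
# with two throwaway pages that exist only to link to "c".
clean = {"a": ["b"], "b": ["c"], "c": ["a"]}
spammed = {"a": ["b"], "b": ["c"], "c": ["a"], "spam1": ["c"], "spam2": ["c"]}

clean_ranks = pagerank(clean)
spammed_ranks = pagerank(spammed)
```

In the clean cycle all three pages end up with equal rank. In the second graph, the two throwaway pages are enough to push "c" above "a" and "b", which is the kind of cheap manipulation that an unadjusted link-count score is exposed to.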

This is not to say that all information extracted from the link graph is useless, but that it must be adapted to the deluge of spam using techniques such as those explored in TrustRank.

Unfortunately, this paper does not explore that issue, which makes it hard to interpret their findings. I am left uncertain whether they have found a significant improvement in relevance rank, as they seem to claim, or if they merely knocked over a strawman.

By the way, despite having no authors in common, this paper is closely related to the MSR RankNet work described in a 2005 paper "Learning to Rank using Gradient Descent" (PDF). I talked about that paper in painful detail in a previous post.
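For a rough sense of what the RankNet paper means by learning to rank with gradient descent, here is a minimal sketch of its pairwise cost: the difference between two documents' scores is passed through a logistic function to give the probability that the first should rank above the second, and the cost is the cross entropy against the desired ordering. The function names and the example scores are assumptions for illustration, and the neural network that would produce the scores is left out.

```python
import math


def pairwise_probability(score_i, score_j):
    """Modeled probability that document i should rank above document j."""
    return 1.0 / (1.0 + math.exp(-(score_i - score_j)))


def pairwise_loss(score_i, score_j, target=1.0):
    """Cross-entropy cost for one pair of documents.

    `target` is the desired probability that i ranks above j
    (1.0 means i should be higher, 0.5 means no preference).
    Algebraically equal to -t*log(p) - (1-t)*log(1-p) with
    p = pairwise_probability(score_i, score_j).
    """
    diff = score_i - score_j
    return math.log1p(math.exp(diff)) - target * diff


# Hypothetical scores: the cost is small when the preferred document
# already scores higher, and large when the order is wrong.
loss_correct_order = pairwise_loss(2.0, 0.0)
loss_wrong_order = pairwise_loss(0.0, 2.0)
```

Training then amounts to adjusting the scoring function's weights to reduce this cost over many labeled pairs, so the training data only has to say which of two pages is better, not assign an absolute score to each.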


Anonymous said...

Greg, algorithms like neural networks require an independent variable, a set of dependent variables, and a training set to work... so how do they determine the relevance of a page so that they can train the algorithm in the first place? (a novice)

Greg Linden said...

Hi, Anonymous. That is described in the paper in section 5.1. Do you have a question about it?