Thursday, April 20, 2006

New relevance rank in Google Scholar

Dejan Perkovic has a post on the official Google blog about an impressive new relevance rank feature in Google Scholar:
"We try to rank recent papers the way researchers do, by looking at the prominence of the author's and journal's previous papers, how many citations it already has, when it was written, and so on."
On a search for [personalization], it was neat to see that the fifth result was a paper by Udi Manber and Ash Patel about personalization features on Yahoo.

On a search for [collaborative filtering], the first results appropriately are GroupLens papers, followed by an excellent Breese and Heckerman paper.

On a search for [web search], the Sergey Brin and Larry Page paper gets the top spot, followed by a paper from Steve Lawrence.

Great stuff. Very cool and very useful.

Update: As Dejan wrote in an update to his post and confirmed to me by e-mail, and as OR pointed out in the comments, the new feature is the "Recent articles" relevance rank. The "All articles" relevance rank (which I linked to in my examples above) is unchanged.

Hearing this, I think it is likely this feature is a response to the recent launch of Windows Live Academic and the "order by date" features in that service. The Google Scholar "Recent articles" ranking is more complicated than just ordering by date, but it is unclear to me whether the rank is as simple as biasing the original "All articles" rank toward more recent documents or if there is more involved here. Understandably, given competitive issues, Dejan did not want to go into the details.
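To make the simpler of those two possibilities concrete, here is a minimal sketch of what "biasing a relevance rank toward more recent documents" could look like. This is purely illustrative and is not Google's method, which is not public. The `Paper` type, the `relevance` score, the exponential decay, and the three-year half-life are all assumptions of mine.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    relevance: float  # hypothetical "All articles" score, higher is better
    year: int         # publication year


def recency_biased_score(paper, current_year=2006, half_life_years=3.0):
    """Scale the base relevance by an exponential decay on the paper's age.

    Assumption: a paper loses half its weight every `half_life_years`.
    """
    age = max(0, current_year - paper.year)
    return paper.relevance * 0.5 ** (age / half_life_years)


def rank_recent(papers, current_year=2006, half_life_years=3.0):
    """Return papers sorted by the recency-biased score, best first."""
    return sorted(
        papers,
        key=lambda p: recency_biased_score(p, current_year, half_life_years),
        reverse=True,
    )
```

With a scheme like this, a strong paper from last year can outrank both an older classic and a weak paper from this year, which is different from a plain "order by date".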

In any case, let me add a couple of comments to my original post. First, I had not noticed that the "All articles" Google Scholar relevance rank is as good as it now seems to be until I played with the service again after seeing Dejan's post. The ordering is much better than I recalled. Some of what I think of as the best recent papers in the field were at or near the top of many searches I tried.

As for the new "Recent articles" feature that Dejan posted about, it is interesting to try the searches again ([personalization], [collaborative filtering], and [web search]) with that enabled.

On the [web search] query, I see that the Jeff Dean 2003 IEEE "Google Cluster Architecture" paper now is in the top results, a good find. And it is flattering to see that one of my recent papers is near the top for a "Recent articles" query for [collaborative filtering].

I have tended to prefer the old Citeseer over Google Scholar, mostly because Citeseer makes it particularly easy to download a copy of the papers, but I often get frustrated with the slowness and poor relevance rank in Citeseer. I will have to turn to Google Scholar more frequently.

2 comments:

or said...

Greg, did you use the sort by "Recent Articles", which is the new feature announced on the Google Blog? That's where it really starts to shine, as not only are recent research papers shown, but they are ordered by date and relevance. You may actually find recent research articles you have not heard about or read yet. Try your examples, sort by recent, and see what happens. It reminds me of what Scoble always asked for: a sort by date based on relevance, not just "sort by date" or "sort by relevance".

Shane said...

Google Scholar is a huge improvement over CiteSeer. The BibTeX export functionality is great (except for authors= instead of author=). The only other problem is that sometimes you have to do a web search to find the PDF, but usually it is pretty good.