Friday, October 15, 2004

Relevance rank and desktop search

Don Park doubts that Google Desktop Search, or any desktop search for that matter, can ever be compelling because page rank doesn't work for files on your desktop:
    The problem with desktop search is that, while the file system, email archives, and browser cache offers extra metadata, there are no hyperlinks among desktop documents. Without hyperlinks, you can't do page ranking Google is famous for.

    The core problem here is that search engines like Google throws everything into one pot. For web search, all the web pages on the Net gets thrown into that pot. Thankfully, hyperlink-based pageranking pulls the good stuff to surface with minimal hassle. With desktop search, all of your documents gets thrown into the pot without an equivalent of page ranking to measure relevance. IMHO, there aren't enough metadata on the desktop to achieve the same level of utility Google web search offers.
Page rank is great, but I think this overstates its importance. All is not lost without page rank.

An approximate understanding of a document's content and context can be enough to determine its importance and relevance. Full natural language understanding isn't necessary; statistical analysis of the text and structure of documents can be sufficient.
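To make "statistical analysis of the text" a bit more concrete, here is a minimal sketch using TF-IDF weighting, a standard term-statistics technique. It is only an illustration, not a description of how Google Desktop Search or any other product ranks files. The function names `tokenize` and `rank` and the sample file names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, docs):
    """Rank documents against a query by summed TF-IDF of the query terms.

    docs maps a document name to its text (hypothetical input shape).
    Returns a list of (name, score) pairs, best match first.
    """
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n = len(docs)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for words in tokens.values():
        df.update(set(words))

    scores = {}
    for name, words in tokens.items():
        tf = Counter(words)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                # Smoothed inverse document frequency: rarer terms weigh more.
                idf = math.log((1 + n) / (1 + df[term])) + 1
                score += (tf[term] / len(words)) * idf
        scores[name] = score

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {
    "a.txt": "budget report for the quarterly budget",
    "b.txt": "holiday photos from the beach",
    "c.txt": "meeting notes about the budget",
}
results = rank("budget", docs)
```

No hyperlinks are involved: the ordering comes entirely from how often the query terms occur in each document and how rare they are across the collection. Structural signals such as titles or file names could be weighted in the same way.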

1 comment:

Hjalmar said...

I would guess that the desktop equivalent of PageRank would be how often a particular file, web page or email message has been accessed by the user. It gives an indication of the importance of the item to the user.

We use the number of accesses with good results on Spurl.net to measure the importance of given links to the user.