The problem with desktop search is that, while the file system, email archives, and browser cache offer extra metadata, there are no hyperlinks among desktop documents. Without hyperlinks, you can't do the page ranking Google is famous for.
The core problem here is that search engines like Google throw everything into one pot. For web search, all the web pages on the Net get thrown into that pot. Thankfully, hyperlink-based page ranking pulls the good stuff to the surface with minimal hassle. With desktop search, all of your documents get thrown into the pot without an equivalent of page ranking to measure relevance. IMHO, there isn't enough metadata on the desktop to achieve the same level of utility Google web search offers.
Approximate understanding of the content and context of document text can determine importance and relevance. Full natural language understanding isn't necessary; statistical analysis of the text and structure of documents can be sufficient.
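To make "statistical analysis of the text" a bit more concrete, here is a minimal sketch of one common approach, TF-IDF scoring. This is only an illustration, not something any particular desktop search tool is known to do; the function names (`tokenize`, `tfidf_rank`), the smoothed IDF formula, and the sample file names are all my own assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_rank(documents, query):
    """Rank documents against a query with a simple TF-IDF score.

    documents: dict mapping a (hypothetical) file name to its text.
    Returns a list of (name, score) pairs, highest score first.
    """
    tokens = {name: tokenize(text) for name, text in documents.items()}
    n_docs = len(tokens)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))

    query_terms = tokenize(query)
    scores = {}
    for name, words in tokens.items():
        counts = Counter(words)
        score = 0.0
        for term in query_terms:
            if counts[term] == 0:
                continue  # term absent; also avoids dividing by zero on empty docs
            tf = counts[term] / len(words)
            # Smoothed inverse document frequency (an assumed variant).
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            score += tf * idf
        scores[name] = score

    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "a.txt": "budget report for the quarterly budget",
    "b.txt": "holiday photos from the beach",
    "c.txt": "notes on the budget meeting",
}
ranked = tfidf_rank(docs, "budget")
```

Here `a.txt` mentions the query term most densely, so it should come out on top, while `b.txt` never mentions it and scores zero. No links are needed, only the words in the documents themselves.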
1 comment:
I would guess that the desktop equivalent of PageRank would be how often a particular file, web page or email message has been accessed by the user. It gives an indication of the importance of the item to the user.
We use the number of accesses with good results on Spurl.net to measure the importance of given links to the user.
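As a rough sketch of the access-count idea (not Spurl.net's actual implementation; the `AccessRanker` class and the item names are hypothetical), you could count each access and sort search hits by that count:

```python
from collections import Counter


class AccessRanker:
    """Orders items by how often the user has opened them."""

    def __init__(self):
        self.access_counts = Counter()

    def record_access(self, item):
        """Call this each time the user opens a file, page, or message."""
        self.access_counts[item] += 1

    def rank(self, candidates):
        """Return candidates with the most frequently accessed first.

        Items never accessed count as zero; ties keep their input order.
        """
        return sorted(
            candidates,
            key=lambda item: self.access_counts[item],
            reverse=True,
        )


ranker = AccessRanker()
for item in ["report.doc", "todo.txt", "report.doc", "report.doc", "todo.txt"]:
    ranker.record_access(item)

results = ranker.rank(["old-notes.txt", "todo.txt", "report.doc"])
```

In practice you would probably combine this count with a text-match score rather than use it alone, since the most-opened file is not always the one that matches the query.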