Friday, September 17, 2004

Search as a dialogue

Microsoft Research's text mining group says:
    A search engine should be more helpful than merely delivering a large list of documents to a user. We are building prototypes that give users a better search experience by allowing users to more effectively navigate results and have a dialog with the search engine to find what they’re looking for.
The focus on search as a dialogue is interesting. A major flaw of current search engines is that each search is treated as independent.

For example, let's say I'm trying to find discussions of some of the topics covered at Foo Camp. I might start by searching for "foo camp". Not satisfied with those results, I might change it to "foo camp blogs". That doesn't get me what I want. I try "foo camp web feeds". And so on. I'm repeatedly refining my search query, trying to find the information I need.

But current search engines ignore this stream of related queries, this dialogue, instead treating each search as independent. There is an opportunity for techniques that focus explicitly on this kind of refinement process, using all the information to help you find what you need more efficiently and reliably. Personalized search is one of these techniques.
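As a rough sketch of what using that query stream might look like (the search_fn backend, the scoring, and the weight here are all hypothetical, just to illustrate the idea): keep the session's earlier queries around and use them to boost results that also match what the user was asking for a moment ago.

    from collections import Counter

    class SearchSession:
        """Rerank each new search using terms from earlier queries in the session."""

        def __init__(self, search_fn, history_weight=0.3):
            self.search_fn = search_fn      # hypothetical backend: query -> [(doc_id, score, text)]
            self.history_weight = history_weight
            self.term_history = Counter()   # terms seen in earlier queries this session

        def search(self, query):
            results = self.search_fn(query)
            reranked = []
            for doc_id, score, text in results:
                doc_terms = set(text.lower().split())
                # Boost documents that also match terms the user searched for earlier.
                boost = sum(count for term, count in self.term_history.items()
                            if term in doc_terms)
                reranked.append((score + self.history_weight * boost, doc_id))
            self.term_history.update(query.lower().split())
            return [doc_id for _, doc_id in sorted(reranked, reverse=True)]

In the Foo Camp example above, a session like search("foo camp") followed by search("foo camp web feeds") would favor results that also match the terms accumulated from the earlier refinements, instead of starting from scratch on every query.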

[Thanks, Mary Jo Foley, for pointing out the quote on the MSR page]

1 comment:

Greg Linden said...

Hi, Brendon. Interesting comment. A P2P search engine is an interesting idea, but it faces some serious hurdles.

Bootstrapping: The system is useless until you achieve critical mass, and critical mass is likely quite large. You'll need some mechanism for making the system usable and useful when there are only tens of thousands of users.

The network: You're talking about sending a lot of data around to a lot of peers on every search request. Latency would certainly be an issue, and bandwidth may be as well.

Coverage: Even if you send your query to thousands of users on the P2P network, the pages they have in their caches will be a very small sample of the pages available. You'll fail to index much of the web. Querying more peers on each request would help less than you might think. You'd rapidly hit diminishing returns since there's overlap in what people browse. And you'd rapidly encounter scaling issues as you ask more and more of the network to do work on each search request.
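To put rough numbers on that diminishing-returns point, here's a quick simulation with entirely made-up parameters (the web size, cache size, and popularity skew are assumptions, not measurements). Because browsing concentrates on popular pages, much of each additional peer's cache duplicates pages already seen from earlier peers:

    import math
    import random

    WEB_SIZE = 1_000_000   # assumed size of the indexable web
    CACHE_SIZE = 1_000     # assumed pages cached per peer

    def cached_page():
        # Log-uniform rank: heavily skewed toward a small set of popular pages,
        # a crude stand-in for real (Zipf-like) browsing behavior.
        return int(math.exp(random.uniform(0, math.log(WEB_SIZE))))

    covered = set()
    for n_peers in range(1, 1_001):
        before = len(covered)
        covered.update(cached_page() for _ in range(CACHE_SIZE))
        if n_peers in (1, 10, 100, 1_000):
            print(f"peer {n_peers:>5}: {len(covered):>7} pages covered "
                  f"({len(covered) - before} new from this peer)")

Under those assumptions, each additional peer contributes fewer and fewer pages that earlier peers haven't already covered, which is the scaling problem in a nutshell.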

In the end, it's an interesting idea, but I'm not convinced it would produce a compelling product. I think the quality of the search results would be low compared to the deeper crawls and more extensive data analysis done by Google and other dedicated search engines.

All that being said, there are people working on P2P search. Napster, Gnutella, and FreeNet are interesting examples. Grub is an example specifically focused on web search (though just the crawl is distributed, I believe).