Friday, September 17, 2004

Search as a dialogue

Microsoft Research's text mining group says:
    A search engine should be more helpful than merely delivering a large list of documents to a user. We are building prototypes that give users a better search experience by allowing users to more effectively navigate results and have a dialog with the search engine to find what they’re looking for.
The focus on search as a dialogue is interesting. A major flaw of current search engines is that each search is treated as independent.

For example, let's say I'm trying to find discussions of some of the topics covered at Foo Camp. I might start by searching for "foo camp". Not satisfied with those results, I might change it to "foo camp blogs". That doesn't get me what I want. I try "foo camp web feeds". And so on. I'm repeatedly refining my search query, trying to find the information I need.

But current search engines ignore this stream of related queries, this dialogue, instead treating each search as independent. There is an opportunity for techniques that focus explicitly on this kind of refinement process, using all the information to help you find what you need more efficiently and reliably. Personalized search is one of these techniques.
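As a rough illustration of what using that stream of related queries might look like, here's a toy sketch. Everything in it — the decay weights, the scoring function, the example documents — is invented for illustration, not any real engine's method:

```python
# Toy sketch: blend the current query with earlier queries from the same
# session, so refinements like "foo camp" -> "foo camp blogs" ->
# "foo camp web feeds" all contribute to ranking.

def session_terms(session_queries, decay=0.5):
    """Weight terms from the session's query stream, most recent first."""
    weights = {}
    for age, query in enumerate(reversed(session_queries)):
        for term in query.lower().split():
            # Older queries contribute less; repeated terms accumulate.
            weights[term] = weights.get(term, 0.0) + decay ** age
    return weights

def score(doc_text, weights):
    """Toy relevance: sum the session weights of terms the doc contains."""
    doc_terms = set(doc_text.lower().split())
    return sum(w for term, w in weights.items() if term in doc_terms)

session = ["foo camp", "foo camp blogs", "foo camp web feeds"]
weights = session_terms(session)

docs = [
    "foo camp attendee list",
    "blogs and web feeds from foo camp sessions",
]
ranked = sorted(docs, key=lambda d: score(d, weights), reverse=True)
```

Because "foo camp" recurs across the session, those terms accumulate the most weight, but the refinements "blogs" and "web feeds" still tilt the ranking toward the second document.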

[Thanks, Mary Jo Foley, for pointing out the quote on the MSR page]


Brendon J. Wilson said...

[Mirroring a comment I made on Jeremy Zawodny's web site]

An interesting observation. I was thinking of something similar to address the inability of search engines to incorporate direct user feedback. For example: Did the user skip some of the search results - perhaps those results are less relevant? Did the user stay at one of the results for a prolonged period of time, actively reading the content (indicated by mouse/keyboard activity) - perhaps that result is more relevant? Shouldn't that information be propagated back into the search indices, along with all the keyword and inbound and outbound link reference counts?

There is a strong need to incorporate these indicators and others to enable search engines to understand the user before they even enter a search term. This is especially true given the exponential growth in available information - growth which is quickly rendering most search results useless (search engines, one must add, that fail to reach the deep web that stretches into corporate databases with localized web search interfaces). I wonder: could a user's social network, or ethnic, cultural, or educational background, help the search engine understand the context surrounding a set of search terms? If, for example, a user is searching for the term "Java" and the search engine knew that the user's friends had all executed searches for Java and eventually settled on results with a limited set of terms related to the Java programming language, wouldn't it make sense to use that information in some way?

The distributed nature of the information required by a search engine to truly understand the context behind a user's search query, coupled with the need to diffuse the risk of having such a critical tool in the hands of a few companies, suggests to me that there might be a need to shift towards a P2P-based search engine approach.

Greg Linden said...

Hi, Brendon. Interesting comment. A P2P search engine is an interesting idea, but it faces some serious hurdles.

Bootstrapping: The system is useless until you achieve critical mass, and critical mass is likely quite large. You'll need some mechanism for making the system usable and useful when there are only tens of thousands of users.

The network: You're talking about sending a lot of data around to a lot of peers on every search request. Latency would certainly be an issue and bandwidth may be an issue.

Coverage: Even if you send your query to thousands of users on the P2P network, the pages they have in their caches will be a very small sample of the pages available. You'll fail to index much of the web. Querying more peers on each request would help less than you might think. You'd rapidly hit diminishing returns since there's overlap in what people browse. And you'd rapidly encounter scaling issues as you ask more and more of the network to do work on each search request.
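The diminishing-returns point can be made concrete with a toy simulation. All the numbers here are invented (a 100,000-page "web", each peer caching a random 1% of it); real browsing is even more skewed toward popular pages, so the overlap would be worse:

```python
import random

# Toy model of the coverage argument: each peer caches a random sample of
# pages. Because caches overlap, querying twice as many peers adds far
# fewer than twice as many unique pages.

random.seed(0)
WEB_SIZE = 100_000
CACHE_SIZE = 1_000  # pages cached per peer (1% of the web)

def coverage(num_peers):
    """Fraction of the web covered by the union of num_peers random caches."""
    seen = set()
    for _ in range(num_peers):
        seen.update(random.sample(range(WEB_SIZE), CACHE_SIZE))
    return len(seen) / WEB_SIZE

c100 = coverage(100)   # roughly 1 - 0.99**100, about 63%
c200 = coverage(200)   # roughly 1 - 0.99**200, about 87%
gain_first_100 = c100
gain_second_100 = c200 - c100
```

The second hundred peers buy noticeably less new coverage than the first hundred did, and each additional batch buys less still.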

In the end, it's an interesting idea, but I'm not convinced it would produce a compelling product. I think the quality of the search results would be low compared to the deeper crawls and more extensive data analysis done by Google and other dedicated search engines.

All that being said, there are people working on P2P search. Napster, Gnutella, and FreeNet are interesting examples. Grub is an example specifically focused on web search (though just the crawl is distributed, I believe).