Wednesday, November 10, 2004

Does size matter?

In his announcement about Google's index size, Bill Coughran makes a good counterargument against my claim that index size doesn't matter.

His basic point is that there are large classes of searches that return few or no useful results. The bar is pretty low in these cases. If you only return three results for a query, relevance rank isn't going to matter much. If you increase the number of results to five, that's probably helpful if the results you added are at all relevant. So, it's true. If you can increase the number of results in these cases without reducing relevance on other queries, you're helping people find what they need.

Nevertheless, is expanding a general crawl really the right approach? When you already crawl 4B pages, any additional pages you crawl will be deep in the crufty back alleys of the web. These kinds of documents can not only be useless, but can hurt overall search quality if they get surfaced inappropriately. Perhaps it would be more useful to do directed crawls of high-quality or specialized data sources? Target specific holes in your coverage?

And there are other ways to be helpful. For example, perhaps the query is just the wrong query. Maybe the searcher needs help with query refinement (replacing search terms) or query expansion (broadening a search with synonymous or related terms).
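
To make query expansion concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the tiny SYNONYMS table, the expand_query and matches helpers, and the two sample documents are all made up, and a real search engine would derive related terms from data rather than a hand-written dictionary.

```python
# Hypothetical synonym table (illustration only; not from any real engine).
SYNONYMS = {
    "car": ["auto", "automobile"],
    "cheap": ["inexpensive", "budget"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Return one group of alternatives per query term: the term, then its synonyms."""
    return [[term] + synonyms.get(term, []) for term in query.lower().split()]

def matches(document, expanded):
    """A document matches if it contains at least one alternative from every group."""
    words = set(document.lower().split())
    return all(any(alt in words for alt in group) for group in expanded)

# Made-up documents standing in for an index.
docs = [
    "budget automobile rental",
    "cheap flights to Paris",
]

expanded = expand_query("cheap car")
hits = [d for d in docs if matches(d, expanded)]
```

The literal query "cheap car" matches neither document word for word, but the expanded query picks up the first one. That is the kind of help that can turn a no-result search into a useful one without crawling a single extra page.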

Again, it's all about relevance. If you can improve relevance in a minority of cases by expanding the general crawl without hurting the common cases, you'll improve the overall usefulness of the search engine. But increasing index size isn't the only way to improve relevance.

Update: Interesting post from Danny Sullivan on index size and on the depth to which pages are indexed by the various search engines.
