I was playing with a few broad search queries recently and the results surprised me.
I was expecting any query that returns most of the Web (such as [the]) to yield links to the sites with the highest PageRank. Such an indiscriminate query would seem to provide very little basis for doing much else.
So, I would expect Google search results for the most popular English words ([the], [of], [to], [and]) to be the same as one another, and the same as the results for [* *] (any two words separated by a space), [the * the] ("the" followed by some words followed by "the"), and [1..1000] (any number between 1 and 1000).
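That expectation can be written down as a toy model. If every page matches the query equally well, the query-dependent part of the score is a constant, and the ordering falls back to whatever query-independent score each page carries. The page names, the scores, and the `rank` function below are all made up for illustration; this is a sketch of my assumption, not of how Google actually ranks.

```python
# Toy model: hypothetical pages with made-up query-independent scores
# (a stand-in for something like PageRank).
PAGES = {
    "site-a.example": 9.0,
    "site-b.example": 7.5,
    "site-c.example": 8.2,
}

def rank(query_score, pages=PAGES):
    """Order pages by (query relevance * static score), best first.

    query_score maps a page name to a relevance number for the query.
    """
    return sorted(pages, key=lambda p: query_score(p) * pages[p], reverse=True)

# A query such as [the] or [of] matches every page, so model its
# relevance as the same constant for all pages.
broad_the = rank(lambda page: 1.0)
broad_of = rank(lambda page: 1.0)

print(broad_the)              # ordering is driven by the static score alone
print(broad_the == broad_of)  # both broad queries give the same ordering
```

Under this model, any two queries that match everything equally produce identical result lists, which is what I had expected to see.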
As you can see by clicking on those links, all of those queries return most of the Web, between 5B pages (for [the * the]) and 20B pages (for [and] and [to]). But they return very different results.
The top result for [the] is "The Onion". The top result for [of] is "The Library of Congress". For [to], "Welcome to the White House". For [and], NASA.
The results for [1..1000] seem to be the closest to what I was expecting. They show links to Netscape, Mozilla, Microsoft, IE, Macromedia Flash, and Apple QuickTime. Perhaps these are the most linked-to sites on the Web?
But, no, even those do not appear to be in PageRank order. For example, the first search result, the Netscape site, only has a PageRank of 8/10 and the Macromedia Flash site a PageRank of 5/10. Huh, again, not quite what I was expecting.
Curious. Why would these results differ so wildly? All of these pages contain these words. Why the difference in what makes it to the top?
Perhaps we should look at what other sites do as well. What do the results look like for [the], [of], [to], and [and] on Yahoo Search? They also differ from each other and from Google's. In fact, these results seem even stranger, with Crate and Barrel coming out on top for [and] and some site called To-Done coming out on top for [to]. Hmm...
Only MSN Search comes even remotely close to what I was expecting. Its results for [of], [to], [the], and [and] are fairly similar to one another.
Looking at the Google results again, it may be the case that page titles and the text in links count for a fair amount. "The Onion" may get its prime spot on a search for [the] because a lot of people link to it as The Onion. But does that explain why the White House gets top billing for [to] when only the page title ("Welcome to the White House") has that word?
Perhaps this is just a spot where small weightings deep in the Google guts make a nonsensical difference. When there's so little information about searcher intent -- a search for [the] or [and] -- it matters little what you show. Small tweaks here and there that might make a sensible difference when intentions are clearer are probably just revealing themselves in odd ways for these broad queries.
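One way to picture this guess: suppose a page's score is a static, PageRank-like number plus small bonuses when the query word appears in the page title or in the text of links pointing to it. When the static scores are close together and the query carries almost no information, those small bonuses decide the winner. The pages, numbers, and weights below are invented for illustration and say nothing about Google's real formula.

```python
from dataclasses import dataclass

@dataclass
class Page:
    name: str
    static_score: float  # stand-in for a PageRank-like value (made up)
    title: str
    anchor_text: str     # text other sites use when linking here (made up)

# Hypothetical weights, deliberately small next to static_score.
TITLE_WEIGHT = 0.3
ANCHOR_WEIGHT = 0.5

def score(page, word):
    """Static score plus small bonuses for title and link-text matches."""
    s = page.static_score
    if word in page.title.lower().split():
        s += TITLE_WEIGHT
    if word in page.anchor_text.lower().split():
        s += ANCHOR_WEIGHT
    return s

def top_result(pages, word):
    """Name of the highest-scoring page for a one-word query."""
    return max(pages, key=lambda p: score(p, word)).name

pages = [
    Page("news-parody", 8.0, "The Parody Paper", "the parody paper"),
    Page("gov-site", 8.2, "Welcome to the Agency", "agency"),
    Page("big-portal", 8.4, "Portal", "portal"),
]

for word in ("the", "to", "and"):
    print(word, "->", top_result(pages, word))
```

In this sketch the page with the lowest static score wins for [the] because both bonuses apply, a title-only match is enough to win [to], and [and] falls back to the static ordering. Three equally uninformative queries end up with three different top results.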
Nevertheless, I thought it was curious. Not what I expected to see.