Tuesday, February 23, 2010

How we all teach Google to Google

Steven Levy at Wired just posted an article, "How Google's Algorithm Rules the Web", with some fun details on how Google uses constant experimentation, logs of searches and clicks, and many small tweaks to keep improving their search results.

Well worth reading. Some excerpts as a teaser:
[Google Fellow Amit] Singhal notes that the engineers in Building 43 are exploiting ... the hundreds of millions who search on Google. The data people generate when they search -- what results they click on, what words they replace in the query when they're unsatisfied, how their queries match with their physical locations -- turns out to be an invaluable resource in discovering new signals and improving the relevance of results.

"On most Google queries, you're actually in multiple control or experimental groups simultaneously," says search quality engineer Patrick Riley. Then he corrects himself. "Essentially," he says, "all the queries are involved in some test." In other words, just about every time you search on Google, you're a lab rat.

This flexibility -- the ability to add signals, tweak the underlying code, and instantly test the results -- is why Googlers say they can withstand any competition from Bing or Twitter or Facebook. Indeed, in the last six months, Google has [found and] made more than 200 improvements.

Even so, this raises the question of where the point of diminishing returns lies with more data and more users. While startups lack Google's heft, Yahoo and Bing are big enough that -- if they continuously experiment, tweak, and learn from their data as much as Google does -- any differences in search quality would likely be confined to an imperceptibly small share of long tail queries.

3 comments:

Mike said...

My experience, when I was working at Yahoo, was that tail queries are a huge percentage of the queries you get, so being able to learn from them is really important. Google's enormous number of users, even relative to Yahoo and Microsoft, means that they have a lot more of that sparse tail data to learn from. I would also guess that there are many queries that are tail terms for the smaller search engines that aren't in the tail for Google, simply because of its size.

dinesh said...

I thought the two comments from Microsoft were interesting:

“The algorithm is extremely important in search, but it’s not the only thing,” says Brian MacDonald, Microsoft’s VP of core search.

"If we don’t have a paradigm shift, it’s going to be very, very difficult to compete with the current winners," says Harry Shum, Microsoft’s head of core search development. "But our view is that there will be a paradigm shift."

Parsing the two statements, is the implication that a paradigm shift is coming in the UI but not in the algorithm, or that one part of MS doesn't know what the other part is doing, or "we are praying as hard as we can for something, anything, to happen to change the status quo", or something else?

Chris said...

Tail queries, not tail users. If I make 500 equally important queries a week, and Yahoo or Bing turns up complete garbage for 5% of them, then the entire search engine is useless for me.