Wednesday, September 27, 2006

Humans and algorithms and humans

John Battelle interviews Googler Matt Cutts. Some interesting excerpts from Matt on algorithms based on user data:
When savvy people think about Google, they think about algorithms, and algorithms are an important part of Google. But algorithms aren't magic ... quite often ... [they] are based on human contributions in some way.

The simplest example is that hyperlinks on the web are created by people ... Google News ranks based on which stories human editors around the web choose to highlight. Most of the successful web companies benefit from human input, from eBay's trust ratings to Amazon's product reviews and usage data. Or take Netflix's star ratings ... [they] are done by people, and they converge to pretty trustworthy values after only a few votes.
Findory is similar in that its recommendations are based on what humans find and discover. The knowledge of what is good and what is not comes from readers; it is people sharing what they found with each other.

Findory's personalization is like what happens on social networking sites, but all the sharing happens anonymously and implicitly. Findory's algorithms quietly do all the work behind the scenes so that everyone in the Findory community can recommend articles to each other.
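
As a rough illustration of anonymous, implicit sharing, here is a minimal sketch of recommending articles from co-read counts. This is not Findory's actual algorithm, which is not described here. The function names (`build_co_reads`, `recommend`), the scoring rule, and the sample reading histories are all hypothetical, chosen only to show the idea that readers recommend to each other simply by reading.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_reads(histories):
    """Count how often each pair of articles appears in the same reading history.

    Histories carry no reader identity, so the sharing stays anonymous.
    """
    co_reads = defaultdict(Counter)
    for articles in histories:
        for a, b in combinations(sorted(set(articles)), 2):
            co_reads[a][b] += 1
            co_reads[b][a] += 1
    return co_reads


def recommend(co_reads, read_so_far, k=3):
    """Rank unread articles by how often they were co-read with this reader's articles."""
    seen = set(read_so_far)
    scores = Counter()
    for article in seen:
        for other, count in co_reads.get(article, {}).items():
            if other not in seen:
                scores[other] += count
    # Highest score first; break ties by name so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [article for article, _ in ranked[:k]]


# Hypothetical anonymous reading histories (assumed data, for illustration only).
histories = [
    ["search-ranking", "spam-filtering", "meta-tags"],
    ["search-ranking", "spam-filtering"],
    ["star-ratings", "product-reviews"],
    ["search-ranking", "meta-tags"],
]
co_reads = build_co_reads(histories)
print(recommend(co_reads, ["search-ranking"]))
```

Nobody in this sketch explicitly rates or tags anything; the recommendations fall out of what people chose to read.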

Matt also has a quick warning about abuse and spam:
The flip side is that someone has to pay attention to potential abuse by bad actors. Maybe it's cynical of me, but any time people are involved, I tend to think about how someone could abuse the system. We've seen the whole tagging idea in Web 1.0 when they were called meta tags, and some people abused them so badly with deceptive words that to this day, most search engines give little or no scoring weight to keywords in meta tags.
See also my previous post, "Community, content, and the lessons of the Web", where I said, "We cannot expect the crowds to selflessly combine their skills and knowledge to deliver wisdom, not once the sites attract the mainstream. Profit motive combined with indifference will swamp the good under a pool of muck ... At scale, it is no longer about aggregating knowledge, it is about filtering crap."

See also my previous post, "Getting the crap out of user-generated content".
