Monday, July 03, 2006

Combating web spam with personalization

In a BBC article by Darren Waters, Google Engineering VP Douglas Merrill says of web spam:
"Spam is an arms race," said Mr Merrill, adding it was a multi-million dollar industry which was trying to fool search engines.

Spammers exploit the way search engines work by bombarding blogs and comments pages with links to their websites. Google prioritizes websites in their search results if a particular page is linked to by other sites.

Mr Merrill said: "Spammers are highly motivated. There is a lot of money at stake."
There is a huge amount of value in getting to the top of the search results. If spammers get their links to the top of the page, billions will see them.

One way to reduce that value is to catch as much spam as possible and eliminate it. This reduces the average payoff from spamming, but the lucky few who get through the filter continue to receive a massive payoff.

Another way to reduce the value is to reduce the maximum payoff. If different people see different search results, spamming becomes much less attractive. The jackpot from getting to the top of the page disappears. A successful spam link will be shown to millions, not billions.
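To make the arithmetic concrete, here is a rough sketch in Python. The function name, the audience size, the catch rate, and the number of segments are all made up for illustration. The model also assumes that a spam link which slips through the filter tops only one of many equally sized personalized result lists, which is a simplification rather than a description of how any real search engine works.

```python
def spam_payoff(audience, catch_rate, segments=1):
    """Return (expected, maximum) viewers for a single spam attempt.

    audience   -- total number of searchers (hypothetical figure)
    catch_rate -- fraction of spam attempts the filter removes
    segments   -- number of distinct personalized result lists; a spam
                  link that gets through is assumed to top only one
    """
    # Best case for the spammer: the link survives and tops one list.
    max_payoff = audience / segments
    # Average case: only the uncaught fraction of attempts pays off.
    expected = (1 - catch_rate) * max_payoff
    return expected, max_payoff


# Filtering alone: the average drops, the jackpot stays the same.
print(spam_payoff(1_000_000, catch_rate=0.75))

# Filtering plus personalization: the jackpot itself shrinks.
print(spam_payoff(1_000_000, catch_rate=0.75, segments=1000))
```

Raising the catch rate only lowers the first number. Raising the number of segments lowers both, which is the point: the jackpot itself gets smaller.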

Personalized search shows different search results to different people based on their history and their interests. Not only does this increase the relevance of the search results, but it also makes the search results harder to spam.

This problem is not unique to web search. For example, Digg, a site that produces a list of popular weblog articles, offers an attractive target for spammers. If Digg could offer different lists for different people, perhaps showing "most popular for people like me", it would reduce the incentive to try to manipulate the site to get to the top of the page.
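As a toy illustration of what "most popular for people like me" might look like, the sketch below weights each vote by how much the voter's history overlaps with mine. The data, the function names, and the choice of Jaccard similarity are my own assumptions for the example. It is not how Digg or any other site actually ranks articles.

```python
from collections import Counter

# Hypothetical vote history: user -> set of articles they voted for.
votes = {
    "alice": {"python-tips", "search-paper"},
    "bob": {"python-tips", "search-paper", "ranking-talk"},
    "spam1": {"cheap-pills"},
    "spam2": {"cheap-pills"},
    "spam3": {"cheap-pills"},
}


def global_top(votes, n=3):
    """One list for everyone: rank articles by raw vote count."""
    counts = Counter(item for items in votes.values() for item in items)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


def jaccard(a, b):
    """Overlap between two vote histories, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_for_user(votes, user, n=3):
    """Rank unseen articles by votes weighted by similarity to `user`."""
    mine = votes[user]
    scores = Counter()
    for other, items in votes.items():
        if other == user:
            continue
        weight = jaccard(mine, items)
        for item in items - mine:
            scores[item] += weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    # Drop articles backed only by voters with nothing in common with `user`.
    return [item for item, score in ranked[:n] if score > 0]


print(global_top(votes, 1))          # the spam article wins on raw votes
print(top_for_user(votes, "alice"))  # the spam votes carry no weight here
```

Three throwaway accounts are enough to top the single shared list, but they have nothing in common with alice, so their votes count for nothing in her list. A spammer could still try to imitate the histories of real users, but that has to be done again for each group of people, and each success reaches fewer of them.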

In general, anywhere we show the same list to millions of people, we create an incentive to manipulate the list. We can reduce that incentive by filtering spam. We can also reduce that incentive by not showing the same list to millions of people, instead showing different lists to different people based on their interests. The incentive to spam fades, fragmented into a complex nest of personalized choices.
