Friday, October 16, 2009

An arms race in spamming social software

Security guru Bruce Schneier has a great post up, "The Commercial Speech Arms Race", on the difficulty of eliminating spam in social software. An excerpt:
When Google started penalising a site's search engine rankings for having ... link farms ... ­[then] people engaged in sabotage: they built link farms and left blog comment spam to their competitors' sites.

The same sort of thing is happening on Yahoo Answers. Initially, companies would leave answers pushing their products, but Yahoo started policing this. So people have written bots to report abuse on all their competitors. There are Facebook bots doing the same sort of thing.

Last month, Google introduced Sidewiki, a browser feature that lets you read and post comments on virtually any webpage ... I'm sure Google has sophisticated systems ready to detect commercial interests that try to take advantage of the system, but are they ready to deal with commercial interests that try to frame their competitors?

This is the arms race. Build a detection system, and the bad guys try to frame someone else. Build a detection system to detect framing, and the bad guys try to frame someone else framing someone else. Build a detection system to detect framing of framing, and well, there's no end, really.

Commercial speech is on the internet to stay; we can only hope that they don't pollute the social systems we use so badly that they're no longer useful.
An example that Bruce did not mention is shill reviews on Amazon and elsewhere, something that appears to have become quite a problem nowadays. The most egregious example of this is paying people using Amazon MTurk to write reviews, as CMU professor Luis voh Ahn detailed a few months ago.

Some of the spam can be detected using algorithms, looking for atypical behaviors in text or actions, and using community feedback, but even community feedback can be manipulated. It is common, for example, to see negative reviews get a lot of "not helpful" votes on, which, at least in some cases, appears to be the work of people who might gain from suppressing those reviews. An arms race indeed.

An alternative to detection is to go after the incentive to spam, trying to reduce the reward from spamming. The winner-takes-all effect of search engine optimization -- where being the top result for a query has enormous value because everyone sees it -- could be countered, for example, by showing different results to different people. For more on that, please see my old July 2006 post, "Combating web spam with personalization".

No comments: