Thursday, October 27, 2005

Getting the crap out of user-generated content

Xeni Jardin has an article up on Wired called "Web 2.0 Cracks Start to Show". Some excerpts:
Web 2.0 is very open, but all that openness has its downside: When you invite the whole world to your party, inevitably someone pees in the beer.

These days, peed-in beer is everywhere. Blogs begat splogs -- junk diaries filled with keyword-rich text to lure traffic for ad revenue ... Experiments in participatory media attract goatses as quickly as they do legitimate entries, like the Los Angeles Times' experimental wiki, which was pulled after it was defaced.
Websites hosting user-generated content need to be designed with the idea that much of the content will be crap or spam.

Small sites work just dandy when they're only used by early adopters. Early adopters are dedicated, so the quality of the content is high. Traffic is low, so spammers don't care about them.

As these sites grow, as traffic increases and the products start to attract a mainstream audience, the incentive for spam goes up. Suddenly, there's a profit motive and an ability to reach a wide audience at low cost. The spam floods in.

Captchas -- the "Are you human?" tests -- are one approach to dealing with spam discussed in the article, but they don't help shovel through all the uninteresting crap generated by real humans.
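
To make the idea concrete, here's a toy sketch of the captcha gate in Python. Real captchas use distorted images or audio rather than a simple arithmetic question, and the function names here are just made up for illustration:

    import random

    def make_challenge():
        # Build a simple "are you human?" question and its expected answer.
        # Real captchas use distorted images or audio; this is a toy stand-in.
        a, b = random.randint(1, 9), random.randint(1, 9)
        return "What is %d + %d?" % (a, b), a + b

    def check_answer(expected, response):
        # Accept the submission only if the typed answer matches.
        try:
            return int(response.strip()) == expected
        except ValueError:
            return False

    question, expected = make_challenge()
    # The question goes in the comment form; the expected answer (or a signed
    # token for it) stays server-side and is checked when the form comes back.
    print(question)
    print(check_answer(expected, " 7 "))  # True only if 7 happens to be the sum

A gate like this raises the cost of automated spam, but a bored human who passes the test can still post garbage, which is the point above.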

Other techniques for getting the crap out include editor review of recent changes (Wikipedia, Craigslist), asking your users to report abuse (Craigslist, MSN Spaces, Blogger), user moderation (Slashdot), relevance rank to suppress poor content (Google Blog Search, all web search engines), and personalization to elevate content that interests you (Findory).
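
Most of these boil down to the same thing: gather signals -- votes, abuse reports, editor flags -- and suppress or demote content that scores badly. Here's a rough sketch in Python of report-abuse plus user-moderation filtering; the weights and threshold are invented for illustration, not how any of these sites actually work:

    # Toy example of report-abuse plus user-moderation filtering.
    # The threshold and weight below are made up for illustration.

    HIDE_THRESHOLD = -3      # hide items scored at or below this
    ABUSE_REPORT_WEIGHT = 2  # an abuse report counts more than a downvote

    def score(item):
        # Net moderation score, with abuse reports weighted more heavily.
        return (item["upvotes"] - item["downvotes"]
                - ABUSE_REPORT_WEIGHT * item["abuse_reports"])

    def visible_items(items):
        # Suppress likely crap, show the rest best-first.
        kept = [it for it in items if score(it) > HIDE_THRESHOLD]
        return sorted(kept, key=score, reverse=True)

    posts = [
        {"id": 1, "upvotes": 5, "downvotes": 1, "abuse_reports": 0},
        {"id": 2, "upvotes": 0, "downvotes": 2, "abuse_reports": 3},  # spammy
    ]
    print([p["id"] for p in visible_items(posts)])   # -> [1]

The real systems are much fancier -- search engines fold hundreds of signals into relevance rank, personalization weights the scores per reader -- but the shape is the same: score the content, then bury the crap.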

Any site dealing with user-generated content should be using these techniques. There is wisdom in that crowd, but you're going to have to dig to find it.
