Thursday, March 09, 2006

Growth, crap, and spam

There seems to be a repeating pattern with Web 2.0 sites. They start with great buzz and joy from an enthusiastic group of early adopters, then fill with crud and crap as they attract a wider, less idealistic, more mainstream audience.

Memeorandum appears to be the latest example of this. After gushing about Memeorandum in hundreds of posts and sending huge numbers of people to check out the site, Robert Scoble suddenly said he is signing off of Memeorandum, explaining that he's tired of what he sees now: little useful information and a lot of snarky articles from people seeking traffic.

Similarly, in the early days of Digg, it attracted praise as more interesting and useful than Slashdot. Traffic grew and, only a short time later, Digg started to fill with spam and crap. Russell Beattie had a good post about this where he said that Digg is "really full of crap" and that "posts that have more quality" are "getting lost in a continual din of rumor mongering [and] grandstanding."

How many times does this cycle have to repeat before people start building systems designed from the start to deal with bad behavior, crap, and spam?

Sure, you can get away without it while you're small. Spammers don't care about you when you're small; there's no profit motive. But, if you ever hope to build anything that works for the mainstream, you can't assume everyone will play nice.

If your product tries to help people find stuff they need, you need to design from the start to surface the good stuff and filter out the crap.

See also Xeni Jardin's article in Wired, "Web 2.0 Cracks Start to Show", where she says, "When you invite the whole world to your party, inevitably someone pees in the beer."

See also my earlier posts, "Getting the crap out of user-generated content" and "Digg, spam, and most popular lists".

Update: In the comments to this post, Gabe Rivera, founder of Memeorandum, takes issue with my claims and says I mischaracterized Scoble's issues. He makes good points, and it is well worth reading his thoughts.

Update: Six months later, the problem with spam on Digg gets worse.

Update: Nine months later, it appears my prediction about Memeorandum and spam did not come true. TechMeme appears to be focusing mainly on high-quality, more popular weblogs. These weblogs already have a lot of traffic and little incentive to manipulate TechMeme. This approach may exclude some of the long tail of blogs, but is effective at increasing quality and reducing spam.

12 comments:

Anonymous said...

It looks like the personal opinion of Robert Scoble that content on memeorandum is crap. If the articles written by him are not appearing with the regularity that they were appearing in the past then it does not translate into "crap".

I bet you his article about memeorandum being crap will make it to tech.memeorandum. So he is contributing to the crap he is complaining about.

Alex Bosworth said...

"Sure, you can get away without it while your small. Spammers don't care about you when you're small; there's no profit motive. But, if you ever hope of building anything that works for the mainstream, you need can't assume everyone will play nice."

This is the same deal with people not designing their systems to be massively scalable from day one, it's just a different kind of scaling issue.

But for every 1 site that grows suddenly to hyper-popularity, there are 1000 sites that don't.

Why waste time building things for a condition that is likely to never happen? It's better just to try not to paint yourself into a corner and deal with scaling issues as they come.

This is an especially hard one to deal with ahead of time, and doesn't ever really seem to be a fatal error, rather an annoyance.

Digg's growth is still strong. I think it's more like the phenomenon of people saying, 'oh no one goes there anymore, it's way too popular'.

If you spend a lot of time making Findory air-tight against the potential that people might eventually try to game the recommendations, it's probably a mistake.

As we saw with Koolaidguy's antics, Digg had so little protection that a single person with a few bookmarklets could push arbitrary stories in front of 100k people.

It caused them problems to be sure, but they spent time developing the safeguards and in the end they're still going strong, at least according to Alexa: http://www.alexa.com/data/details/traffic_details?&range=6m&size=medium&y=r&url=http://findory.com/#top

Gabe said...

Greg, do you have any data, any shred of evidence, that spam, or "crap" has increased on memeorandum? Scoble's problem was that he wasn't reading his own favorite feeds, because he was so addicted (his words) to memeorandum. That problem will always arise when you exclusively read something that isn't designed to be comprehensive. Memeorandum isn't.

Memeorandum was indeed designed from the start to deal with crap. Doesn't mean it's a 100% success, but I think I've managed the tradeoffs pretty well, and things are always improving.

Greg, I don't want to seem condescending, but when you're trashing other people's stuff to tout your own product, it's doubly important to make sure your arguments seem reasonable. You don't want your credibility to turn to, uh, crap.

Greg Linden said...

Hi, Alex. You make a good point that it may not make a lot of sense to do a huge amount of work on any part of a system before it is really necessary.

On the other hand, making these systems robust to crap is the hard part of building them. Building a simple but unsustainable system sets you up in a trap. Once you get big, if you have never thought carefully about it, you may not be able to dig yourself out of the hole.

Greg Linden said...

Hi, Gabe. Good to hear from you!

You make a great point. To be honest, I'm going off of what Robert said in his post, nothing more. I saw he turned from Memeorandum's biggest fan to disillusioned with the site, and took that as evidence that there is a problem.

I have to admit I don't use Memeorandum regularly. But, when I have used it, I have been impressed with your ability to filter out spam and crap. You seem to include only top weblogs -- a good filter to start, though at the cost of the long tail -- and your algorithms generally appear to surface and prioritize good content.

So, I was basing my conclusions on Robert's post and his change of heart. If I misunderstood his reasons, I apologize.

I also apologize if my post gave the appearance of trashing your product to promote Findory. That was not my intent. I did not mention Findory in my post. I did not intend the post to be a comparison of Digg, Memeorandum, and Findory. I consider Findory to be different than Memeorandum and Digg, not a direct competitor.

Anonymous said...

Thoughtful response to Gabe's comments...

Gabe said...

Thanks Greg.

isb said...

I think Scoble's problem is with the "echo chamber" effect that is propagated by memeorandum. And that is not memeorandum's fault - as Gabe rightly pointed out, memeorandum is designed to capture *interesting* conversations across the blogosphere. It is not personalized (is it because you get lots of small clusters, Gabe? :) - and even that approach would lead to the problem of reinforcement in the long run.

Danny said...

Well, I'm still a fan, Gabe -- whether or not you show our stuff.

To be honest, I don't have Scoble's problem. I read my feeds AND Memeorandum both, which of course puts out its own feeds. Gives me the best of both worlds, stuff that Memeorandum might miss but also stuff I might miss in my feeds. Of course, I take the Topix search engine feeds and some other news sources, as well.

jeff.dalton said...

I read Memeorandum and my feeds via Findory favorites.

As Greg said, they are less competitors than two services that complement each other well. Although, they do compete for my time. Right now I would say they are about equal on my time scale.

Arnie said...

Design for the detail (even if you don't implement it in Version 1.0). You can't imagine every outcome, but one thing you can imagine is your web site being flooded by SPAM and JUNK.

The devil is in the details.

Paul Baclace said...

This idea of a social group lifecycle and how it applies to the Web is interesting.

Is there a kind of Woodstock versus Carnegie Hall tradeoff? Or is this more like urban neighborhood planning and transitions but with a faster timescale?

Does anyone have pointers to a long term discussion or search terms that identify current research in social group lifecycles, Internet crowd control, or group incentive design?