Thursday, March 31, 2005

A relevance rank for news and weblogs

If you use a feed reader like Bloglines, there must have been at least a few times you've looked at the overwhelming pile of unread articles with a sigh. So much to read.

All feed readers organize the articles in the same way. They group the articles by feed and sort the articles by date. So, you go through, click on each feed, skim the articles, and slog on through.

"Wouldn't it be nice," you've probably thought, "if these articles were sorted by relevance? Maybe the most important articles at the top and least important at the bottom? Then I could just read the articles from top to bottom, stopping when I get bored or run out of time."

That would be nice. But, what does it mean? What's the most relevant news?

Let's explore it. What if all the articles from the news and weblog feeds were sorted by how many people read them? The more people who have read an article, the higher it appears in your list of unread articles.
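To make that concrete, here is a minimal sketch of a popularity sort. The `read_events` list of (reader, article) pairs and the article names are made up for illustration; a real feed reader would pull this from its own logs.

```python
def rank_by_popularity(read_events, unread):
    """Sort unread article ids by number of distinct readers, most read first."""
    readers = {}
    for reader, article in read_events:
        # A set per article, so a reader who opens it twice counts once.
        readers.setdefault(article, set()).add(reader)
    return sorted(unread, key=lambda a: len(readers.get(a, ())), reverse=True)


# Hypothetical read log: (reader, article) pairs.
read_events = [
    ("ann", "britney"), ("bob", "britney"), ("cat", "britney"),
    ("dan", "mysql5"), ("ann", "mysql5"),
    ("bob", "rss-tips"),
]
ranked = rank_by_popularity(read_events, ["mysql5", "rss-tips", "britney"])
print(ranked)
```

Every reader gets the same ordering here, no matter what they care about.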

Hmm... That might help, but it'd be ordered by popularity, not relevance. Yahoo News has an example of ordering news by popularity. You can see that it tends toward the sensationalistic and tabloid. It pulls you toward the mainstream and away from the long tail. That's the wrong direction, folks. You want interesting and useful, not bland and mediocre.

Okay, if it's not most popular, what is the most relevant news?

Maybe the problem is that we're defining popularity too broadly. Does it matter to me if a teenage surfer chick thought an article with rumors of Britney Spears' pregnancy was really awesome? Not in the slightest. Does it matter if one of my computer geek friends really enjoyed an article on the upcoming MySQL 5 release? Yes, that does matter.

So, perhaps relevance is what people like me like. Okay, so I'll just list hundreds of people I know who are like me, get them all to use the same feed reader, and then... oh, shucks, that's never going to happen, is it?

Fortunately, it doesn't have to. We can find people like me, people I don't even know, automatically and anonymously using some clever algorithms. Put that computer to work, I say.
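One simple way to do that, sketched below, is to weight each reader's vote by how much their reading history overlaps with mine. This is only an illustration under my own assumptions: the Jaccard overlap measure, the `history` dictionary, and the names in it are all hypothetical, and a production system would likely use something more sophisticated.

```python
def jaccard(a, b):
    """Overlap between two sets of article ids, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_for_reader(me, history, unread):
    """Sort unread articles by similarity-weighted votes from other readers."""
    mine = history.get(me, set())
    scores = {article: 0.0 for article in unread}
    for other, theirs in history.items():
        if other == me:
            continue
        weight = jaccard(mine, theirs)  # how much this reader is "like me"
        for article in theirs:
            if article in scores:
                scores[article] += weight
    return sorted(unread, key=lambda a: scores[a], reverse=True)


# Hypothetical reading histories, keyed by anonymous reader id.
history = {
    "me": {"mysql-tuning", "python-tips"},
    "geek_friend": {"mysql-tuning", "python-tips", "mysql5-release"},
    "surfer_1": {"celebrity-gossip", "britney-rumors"},
    "surfer_2": {"celebrity-gossip", "britney-rumors"},
}
ranked = rank_for_reader("me", history, ["britney-rumors", "mysql5-release"])
print(ranked)
```

Two surfers read the Britney article and only one geek read the MySQL 5 article, but the geek's history looks like mine, so the MySQL 5 article ends up on top.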

Great! Now we know how to sort news by relevance. We take all the news and sort by what people like me like. So, why isn't anyone doing this? Well, someone is doing it -- and doing it quite well, I might add -- but why isn't anyone else?

Well, it's hard. Really hard. Maybe I made it sound easy, but the devil is in the details. For example, the most interesting articles for a subgroup aren't actually the same as the most popular ones in that subgroup; it's a little different, and that's just one of dozens of spots where you can trip up and hork the quality of the relevance rank. These "clever algorithms" I mentioned can be really expensive; doing this at scale for millions of readers requires a lot of careful thought. News is perishable -- old news is no news -- so you'd better find a good solution to the cold start problem. And the list goes on. It's not easy.
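To give a feel for two of those details, here is a rough sketch. It scores an article by how much more a subgroup reads it than everyone else does (often called lift) and then decays the score as the article ages. The function name, the smoothing prior, the 24-hour half-life, and all the numbers are assumptions for illustration, not a description of how any real system does it.

```python
def subgroup_score(group_reads, group_size, all_reads, all_size,
                   age_hours, half_life_hours=24.0, prior=1.0):
    """Lift of an article within a subgroup, discounted by its age."""
    # Smoothed read rates; the prior keeps a brand-new article with a
    # handful of reads from producing an extreme ratio.
    group_rate = (group_reads + prior) / (group_size + 2 * prior)
    overall_rate = (all_reads + prior) / (all_size + 2 * prior)
    lift = group_rate / overall_rate
    # Old news is no news: halve the score every half_life_hours.
    decay = 0.5 ** (age_hours / half_life_hours)
    return lift * decay


# Hypothetical numbers: 100 readers like me, 10,000 readers overall.
gossip = subgroup_score(group_reads=30, group_size=100,
                        all_reads=6000, all_size=10000, age_hours=2)
mysql5 = subgroup_score(group_reads=20, group_size=100,
                        all_reads=50, all_size=10000, age_hours=2)
stale_mysql5 = subgroup_score(group_reads=20, group_size=100,
                              all_reads=50, all_size=10000, age_hours=72)
print(gossip, mysql5, stale_mysql5)
```

In this made-up data, more of my subgroup read the gossip article than the MySQL 5 article, yet the MySQL 5 article scores higher because my subgroup reads it far more than the general population does. The same article at three days old scores well below its fresh self.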

But it's got to be done. It takes too long and too much effort to use the current generation of feed readers. To break into the mainstream, next generation feed readers will have to sort articles by relevance.

2 comments:

Andy Harbick said...

I've got Bloglines pointed at my personal feed of technology blogs. It's great. I often discover things first on my Findory feed that I would've seen eventually anyway.

Anonymous said...

Hi Greg.

You might find outbrain.com interesting. They have a Firefox extension that lets readers rate blog posts on the fly, and they have some clever algorithms to decide what's interesting, current, and highly rated.

I can imagine that this, combined with clever tag- or index-based filtering, could be the thing you're talking about.