Saturday, October 21, 2006

Reddit, Digg, and personalized news

Emre Sokullu and Richard MacManus at Read/WriteWeb wrote an article, "Personalized News: A Market Overview", that covers several startups trying to recommend news articles and build personalized front pages for readers.

Some excerpts:
Our guess is that personalized content will become a more popular paradigm in about 1 to 2 years.

Personalized news has a couple of main attractions. Theoretically, if your news is personalized then it's not as vulnerable to gaming as [Digg's] power of masses approach. Plus people are getting busier every day, so personalized news has a strong appeal as a potential solution for information overload.

The article mostly talks about Reddit. I found this a bit odd, since Reddit strikes me as closer to Digg than to a personalized news site, but Reddit does have a "recommended articles" page linked from its front page.

As the article explains, Reddit appears to use a keyword-based approach like the Bayesian filter Paul Graham developed for spam filtering. I doubt that this simple content-based approach can be made to work well. And, unfortunately, as Emre said, "many [Reddit] users still complain about not receiving relevant news recommendations."
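
Reddit's actual code isn't described in the article, so what follows is only a minimal sketch of what a Graham-style keyword filter for links might look like. It assumes that each link is represented by the words in its title and that a user's up and down votes are the training labels. It borrows three ideas loosely from Graham's spam filter (clamped per-word probabilities, keeping only the most "interesting" 15 words, and combining them with the naive Bayes formula) and omits his other tuning. `KeywordRecommender` and `tokens` are hypothetical names.

```python
import re
from collections import Counter


def tokens(title):
    """Crude tokenizer: the set of lowercase words in a link title."""
    return set(re.findall(r"[a-z0-9']+", title.lower()))


class KeywordRecommender:
    """Hypothetical per-user keyword filter trained on up/down votes."""

    def __init__(self):
        self.liked = Counter()     # word -> number of upvoted titles containing it
        self.disliked = Counter()  # word -> number of downvoted titles containing it
        self.n_liked = 0
        self.n_disliked = 0

    def vote(self, title, up):
        words = tokens(title)
        if up:
            self.liked.update(words)
            self.n_liked += 1
        else:
            self.disliked.update(words)
            self.n_disliked += 1

    def word_prob(self, word):
        """Estimated P(liked | word), clamped to [0.01, 0.99].

        Returns a neutral 0.5 until the user has cast both up and down
        votes, and for words never seen in a voted title.
        """
        if self.n_liked == 0 or self.n_disliked == 0:
            return 0.5
        good = self.liked[word] / self.n_liked
        bad = self.disliked[word] / self.n_disliked
        if good + bad == 0:
            return 0.5
        return min(0.99, max(0.01, good / (good + bad)))

    def score(self, title, k=15):
        """Combine the k most 'interesting' word probabilities (farthest from 0.5)."""
        probs = sorted((self.word_prob(w) for w in tokens(title)),
                       key=lambda p: abs(p - 0.5), reverse=True)[:k]
        like, dislike = 1.0, 1.0
        for p in probs:
            like *= p
            dislike *= 1.0 - p
        return like / (like + dislike)


r = KeywordRecommender()
r.vote("Google MapReduce paper", up=True)
r.vote("Google BigTable design", up=True)
r.vote("Celebrity gossip roundup", up=False)
print(r.score("New Google Sawzall paper"))  # close to 1
print(r.score("Celebrity gossip"))          # close to 0
```

Note that a filter like this only knows which words a user has voted on. It scores any title containing "google" or "paper" highly whether or not anyone found that article worth reading, and it has nothing to say until the user has voted both up and down, which illustrates how much data such a content-based approach needs before it becomes useful.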

I went back to try Reddit's recommendations again (I hadn't looked at them in a while) and, after I rated a few articles, it still showed the message "no links have been recommended for you yet. keep telling reddit what you like and dislike by voting on links, and check back here later for recommended links." At first, I never got to the point where I actually received recommendations. That is a problem. Recommender systems need to work from sparse data in real time. They need to react immediately to new data.

After I rated ten articles on Reddit, mostly about Google (GDrive, MapReduce, Sawzall, BigTable), I finally received one recommendation, "Fun with Javascript". A few minutes later, a few more recommendations appeared: a political article about stem cells, one on messing with telemarketers, and the "Father and Son story of the century". So, yes, I agree: not relevant.

Emre suggests that all personalized news sites are like Reddit, but they are not. Findory, for example, does not use Bayesian analysis over keywords. Instead, Findory uses a form of social filtering in which Findory readers anonymously and implicitly share the articles they find and enjoy with other Findory readers.

Oversimplifying a little, Findory works a bit like Digg, except that rather than seeing a front page of the generally most popular articles, you see a front page of the articles that are most popular among readers like you. As Emre said, showing different lists to different people reduces the incentive to game the system by eliminating the winner-takes-all effect.
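
The post doesn't describe Findory's algorithm beyond "social filtering," so the following is not Findory's code. It is a minimal sketch of one textbook form of social filtering (user-based collaborative filtering), assuming that each reader is an anonymous id, that a reader's history is just the set of article ids they read, and that "readers like you" means readers whose histories overlap yours, measured by Jaccard similarity. A global `most_popular` list stands in for a Digg-style front page for comparison. All names and data are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Overlap between two reading histories (sets of article ids)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_popular(histories, n=3):
    """Digg-style front page: the same list for everyone, ranked by total reads."""
    counts = defaultdict(int)
    for articles in histories.values():
        for article in articles:
            counts[article] += 1
    return sorted(counts, key=lambda a: (-counts[a], a))[:n]


def personalized(histories, reader, n=3):
    """Articles popular among readers whose histories resemble `reader`'s."""
    mine = histories[reader]
    scores = defaultdict(float)
    for other, articles in histories.items():
        if other == reader:
            continue
        weight = jaccard(mine, articles)  # how "like you" this other reader is
        if weight == 0:
            continue
        for article in articles - mine:   # only recommend unread articles
            scores[article] += weight
    return sorted(scores, key=lambda a: (-scores[a], a))[:n]


histories = {
    "alice": {"mapreduce", "bigtable"},
    "bob": {"mapreduce", "bigtable", "sawzall"},
    "carol": {"celebrity", "sports"},
    "dave": {"celebrity", "sports", "gossip"},
    "erin": {"celebrity", "gossip"},
}
print(most_popular(histories))           # ['celebrity', 'bigtable', 'gossip']
print(personalized(histories, "alice"))  # ['sawzall']
print(personalized(histories, "carol"))  # ['gossip']
```

With this toy data, the Digg-style page shows everyone the same three stories, while alice, who read about MapReduce and BigTable, is pointed to Sawzall because the one reader most like her read it. Since each reader gets a different list, there is no single page a spammer can push a story onto.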

In general, the power of the masses approach, epitomized by Digg, has two problems with relevance. First, a most popular list is generic and untargeted; it is only relevant as long as your interests match those of the entire community. Second, as power of masses sites reach a mainstream audience, the incentive to spam grows and relevance drops. Personalized news has neither of these issues.

Finally, it is worth noting that personalized news is not limited to crazy little startups. Both Google News and Microsoft's MSN NewsBot have a small widget on their front pages that recommends news stories based on the articles you read.

4 comments:

Meme chose said...

There is another side to personalization that I rarely see mentioned: a downside.

When I see results on Google, for example, there is information in the fact that the page I am seeing is the same page almost everybody else sees in response to the same query. The presence of a link on the first page tells me something about the distribution and suggests something about the popularity of that link. All of this is lost the moment I am shown a personalized page, because it is hard to see how personalization, if complex enough to work effectively, can ever be transparent. The way search instantly connects us with an entire shared culture is novel, valuable, and under-appreciated; personalized search throws this away.

The window that generic search offers onto what other people think is important is also something I think ordinary, non-technical people intuitively use and want. Consequently, I expect the future of personalized search will be as a selectable add-on feature to generic search.

Greg Linden said...

That's a good point, Meme Chose. I have also heard some talk about how it may be desirable to be able to e-mail your search terms to friends and have them see the same search results you saw.

However, if people really did want to see the popularity of links in search results, I would expect search engines that order results by popularity, as DirectHit did, to be much more popular. DirectHit failed to attract a big following, which I think raises some questions about how effective generically most popular search results are and how much searchers actually want to see them.

Anonymous said...

"When I see results on Google for example there is information contained in the fact that the page I am seeing is the same page almost everybody else sees in response to the same query."

But it is, in fact, NOT the same page. It's not even typically the same wording for the article. While most (many) news articles seem to be based on a newswire story, there are subtle differences; views and content can differ. Ten people telling the "same" story need not tell the same story.
I, for one, am not satisfied reading one view, one approach, one angle on a story but want to see the larger picture.


"The presence of a link on the first page tells me something about the distribution, and suggests something about the popularity of that link."

Popularity is not, and should never be, a guide to the news. I don't think citations and links are terribly good measures of data quality, trust, or even importance. It's like holding a microphone up to its loudspeaker: it squeals. People link to top-ranked sites. On the Web, this has demonstrably led to an overrepresentation of some marginal positions, since computer programs can't distinguish between positive links ("this site is good") and negative ones ("look at the lies").

Link counts are popular because they work well for finding anything about something and are computationally very cheap. They let one pre-sort results, which saves a lot of effort and processing power. They also let one use simpler search algorithms that are very fast (B-trees). Their social costs, however, are high.

Link counts not only misrepresent relevance, they also appear to have a significant impact on content and available information. Since most people are directed to the few highly linked, well-known sites, the rest remain invisible and unsustainable. The reliance on links in popular search ranking has limited discussion and concentrated media control in the hands of a few. The information may still be there, but it's hidden, because those voices are drowned out by the squawk of the popular, highly linked sites.

If all you are interested in is the top popular stories as told by a single voice, then you don't need the Internet; you could limit yourself to a single popular tabloid newspaper.

But reading a "top story" in a popular tabloid newspaper is not enough to be informed.

My approach at http://www.ibu.de has been to view the predicate "personal", as in "personalized" news, as routing and filtering. It's about finding and discovering.


E. Zimmermann
IBU News Germany/
NONMONOTONIC Lab of BSn

Unknown said...

You are correct, Greg, as far as reddit's recommendations go: there is not much to behold (the recommendation engine is notoriously crappy). But did you try the sub-reddits? You can add and remove categories from your front page. For example, I've added politics, funny, science, etc. to my front page, but I blocked religion, so I see nothing in that category. There are tons of sub-reddits as well, from broad topics to more specific subjects. In my opinion, this is plenty of personalization to get just the news you want.
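
A minimal sketch of the category filtering this commenter describes, assuming a hypothetical set of default categories that a reader can add to or remove from, with the remaining links ranked by votes. This is not reddit's code; `Link`, `front_page`, and the `general` category are illustrative. Unlike the recommendation engines discussed in the post, nothing here is learned: the reader chooses the categories explicitly.

```python
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    subreddit: str
    votes: int


def front_page(links, defaults, added, removed, n=25):
    """Show links from (default + added) categories minus removed ones, top-voted first."""
    shown = (defaults | added) - removed
    visible = [link for link in links if link.subreddit in shown]
    return sorted(visible, key=lambda link: link.votes, reverse=True)[:n]


links = [
    Link("Senate vote", "politics", 120),
    Link("Cat photo", "funny", 300),
    Link("Sermon", "religion", 500),
    Link("New exoplanet", "science", 80),
    Link("Site news", "general", 50),
]
page = front_page(links, defaults={"general", "religion"},
                  added={"politics", "funny", "science"}, removed={"religion"})
print([link.title for link in page])
# ['Cat photo', 'Senate vote', 'New exoplanet', 'Site news']
```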