Wednesday, August 10, 2005

Mining the peanut gallery

I flew back from SES yesterday, so that means I got a chance to catch up on more of my reading on the plane.

Of the papers I plowed through, one was particularly fun: "Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews" by Kushal Dave, Steve Lawrence, and David Pennock.

The goal of the paper is pretty ambitious: Take all the reviews out there for each product and summarize them. Tough problem. Nasty natural language issues here.

But the payoff is big. This is something that would be quite useful, especially if some method of determining the credibility or the authority of each review was part of the process. People need help differentiating between the vast number of products out there. Summarizing reviews could be a way of providing useful information quickly, much more easily than reading each individual review.

One thing that's great about this paper is that they detail their search through many different approaches to the problem, some simple, some more complicated. It is interesting that some of the most effective methods turned out to be fairly simple.

Another fun thing about this paper is the authors. Steve Lawrence was one of the authors of Citeseer. Kushal Dave and Steve Lawrence are now both at Google. David Pennock was at Overture and now is at Yahoo Research.

By the way, this summarizing reviews idea reminds me a bit of Newsblaster, the research project at Columbia that tries to automatically summarize news articles from many sources. If you haven't seen that yet, it's worth checking out.

Update: Gary Price wrote me to let me know about NewsInEssence, a news clustering and summarization research project out of U of Michigan.


mb said...

I thought Google was doing something like this already with their movie search. Search for "movie: ishtar" and you'll see that Google's calculated an average rating of 1.5 out of 5. Search for "movie: gandhi" and you'll get 4.3 out of 5.

Movie ratings may be more structured than other reviews because the critic often gives stars or thumbs-up or something -- but if Google's not having humans summarize the reviews, then they must employ some kind of "fuzzy semantic typing" to make sense of it all.

There are other hints as well -- Google's internally-developed language translation (not yet available to the public), local search results with traffic-light ratings of restaurants, etc.

Greg Linden said...

Thanks, Mahlon. That's a great point.

I think the current movie search implementation is probably fairly simple -- just averaging star ratings for reviews that have them -- but you're absolutely right that it's an interesting first step.

Thanks again for pointing it out.
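
For what it's worth, the simple approach guessed at above, averaging star ratings only over the reviews that have them, might look something like this. It is just a rough sketch of that guess, not Google's actual implementation; the function name and the sample data are made up for illustration.

```python
from typing import Optional, Sequence


def average_star_rating(ratings: Sequence[Optional[float]]) -> Optional[float]:
    """Average the star ratings that are present.

    None stands for a review that has no star rating (an assumption
    made for this sketch); such reviews are simply skipped.
    """
    rated = [r for r in ratings if r is not None]
    if not rated:
        return None  # no review carried a star rating
    return round(sum(rated) / len(rated), 1)


# Hypothetical reviews: three with stars (out of 5), one with text only.
reviews = [1.0, 2.0, None, 1.5]
print(average_star_rating(reviews))  # prints 1.5
```

Reviews without stars contribute nothing here, which is exactly where the harder opinion-extraction methods from the paper would have to take over.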

mb said...

I'm sure you're right -- Google's probably using the star ratings to summarize movie reviews.

And they may be doing something similar with restaurant reviews, but it's not quite as obvious. If you search for this sushi place, Google grades each review with a green/yellow/red traffic light -- but some of the reviews don't have a star rating, or if they do, the rating doesn't correspond to Google's traffic light.

Granted, it looks pretty simple, but there might be a little semantic magic behind the review summaries.

In any event, these unfiltered reviews are much more useful to me than bought-and-paid-for recommendations. I look forward to seeing more structured mining of the peanut gallery.