Monday, March 14, 2005

Amazon's Statistically Improbable Phrases

Amazon.com seems to be doing an experiment with a feature they call "Statistically Improbable Phrases". From their help page:
    "SIPs" show you the interesting, distinctive, or unlikely phrases that occur in the text of books in Search Inside the Book. Our computers scan the text of all books in the Search Inside program. If they find a phrase that occurs a large number of times in a particular book relative to how many times it occurs across all Search Inside books, that phrase is a SIP in that book.
For example, for the business book "The Human Equation", the SIPs are "high performance work arrangements", "profits through people", "high performance management practices", "high commitment work practices", and "more cooperative labor relations".
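Amazon's help text only describes the general idea: a phrase is a SIP when it is much more frequent in one book than across the whole Search Inside collection. Here is a minimal sketch of that idea, assuming a score equal to the phrase's rate in the book divided by its rate in the corpus. The names `phrases` and `improbable_phrases`, the fixed phrase length `n`, the `min_count` cutoff, and the add-one smoothing are all hypothetical choices for illustration, not Amazon's actual method.

```python
import re
from collections import Counter


def phrases(text, n):
    """Return every run of n consecutive lowercase words in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def improbable_phrases(book, corpus, n=3, min_count=2, top_k=5):
    """Rank n-word phrases in `book` by how over-represented they are
    relative to `corpus` (a list of book texts, normally including `book`).

    Hypothetical scoring: (rate in book) / (smoothed rate in corpus).
    """
    book_counts = Counter(phrases(book, n))
    corpus_counts = Counter()
    for text in corpus:
        corpus_counts.update(phrases(text, n))

    book_total = sum(book_counts.values()) or 1
    corpus_total = sum(corpus_counts.values()) or 1

    scores = {}
    for phrase, count in book_counts.items():
        if count < min_count:
            continue  # skip phrases that are rare even within this book
        book_rate = count / book_total
        # Add-one smoothing keeps the denominator non-zero even if
        # `corpus` happens not to contain the phrase at all.
        corpus_rate = (corpus_counts[phrase] + 1) / (corpus_total + 1)
        scores[phrase] = book_rate / corpus_rate

    # Highest ratio first: most "statistically improbable" phrases.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

For example, with `book_a = "profits through people. profits through people. the cat sat. the cat sat."` and `book_b = "the cat sat. the cat sat. the cat sat."`, calling `improbable_phrases(book_a, [book_a, book_b])` ranks "profits through people" above "the cat sat", because the latter is also common in the other book. A real system would also need to handle variable-length phrases, stopwords, and (as the update below shows) a blocklist.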

Cute idea. Data mining in action.

Unfortunately, I'm not sure it is all that useful. The idea seems to be to help people discover other interesting titles that contain the same phrases. In my experiments with it, getting to those titles took many clicks, turned up too many spurious results, and was too much work. But it is a clever way of trying to expose more of the features of Search Inside the Book.

The feature may be in weblab (Amazon's internal A/B testing system), so not everyone may be able to see it. For those who can, it appears at the very top of book detail pages, under the title and author.

Update: Mike at TechDirt reports that Amazon apparently forgot to implement a filter for naughty words and gives some examples of the amusing consequences.
