Comments on Geeking with Greg: Testing rankers by interleaving search results
Greg Linden (http://www.blogger.com/profile/09216403000599463072)

Anonymous — 2008-11-12, 10:03 AM:
Behavioral measures, in my opinion, almost always beat less specific "opinion" measures. A good deal of psychological literature suggests that reported intentions are only moderately correlated with future behavior, and social performance pressures can affect how you rate something. Additionally, from a statistical-power point of view, a within-subjects design (the same user sees both versions) is superior to a between-subjects design. When the difference between the A and B versions is not overtly disruptive to the user experience (as with reordering results), it makes sense to exploit that power advantage. It's a more elegant experimental design.

Anonymous — 2008-11-12, 6:14 AM:
There seems to be a trend moving away from "<I>rate how much you like something</I>" toward "<I>pick which one you prefer</I>." Machine translation evaluation has been moving in this direction too. I wonder if we'll see a variant of Netflix where we don't rate movies 1-5 but instead do "hot or not" between pairs of movies. Is someone already doing this (aside from <I>Hot or Not</I>, of course)?
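The power advantage of a within-subjects design claimed in the first comment can be sketched with a small simulation. This is an illustrative toy model, not anything from the post: each user is assumed to have a per-user baseline engagement level shared by both rankers, and the variable names (`baseline`, `effect`, `se_between`, `se_within`) are made up for this example. Measuring both rankers on the same user lets the shared baseline cancel out, shrinking the standard error of the comparison.

```python
import random
import statistics

random.seed(0)

n = 2000
# Hypothetical per-user baseline engagement; within-subjects, both
# rankers are measured on the same user, so this term is shared.
baseline = [random.gauss(0, 1.0) for _ in range(n)]
effect = 0.1  # assumed true advantage of ranker B, for illustration

a_scores = [b + random.gauss(0, 0.3) for b in baseline]
b_scores = [b + effect + random.gauss(0, 0.3) for b in baseline]

# Between-subjects comparison: user-to-user variance stays in the noise.
se_between = (statistics.variance(a_scores) / n
              + statistics.variance(b_scores) / n) ** 0.5

# Within-subjects comparison: per-user differencing cancels the baseline.
diffs = [y - x for x, y in zip(a_scores, b_scores)]
se_within = (statistics.variance(diffs) / n) ** 0.5

print(se_between, se_within)  # the paired standard error is much smaller
```

With these (assumed) noise levels, the paired standard error is several times smaller than the between-subjects one, which is exactly why interleaving — a within-subjects comparison at the level of a single results page — needs far less traffic to detect the same effect.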