Comments on Geeking with Greg: Book review: Introduction to Information Retrieval

Anonymous (2009-02-10 16:56):

I see that Blogger mistakenly truncated some of my reference/PDF links above. Thanks, Blogger.

If anyone wants, I can repost those links wrapped inside an anchor tag. Otherwise, I won't worry about it.

The point is, lots of early IR writers talk about all sorts of evaluation beyond just recall and precision.

And some of the evaluations they mention still aren't being done by web search companies; or at least that's how it appears from the outside.

Anonymous (2009-02-09 16:40):

Here is one more early IR paper by Karen Sparck-Jones, discussing a number of evaluation measures beyond just precision and recall:

http://www.sigir.org/museum/pdfs/Information_Retrieval_Experiment/pdfs/p256-jones.pdf

I'd like to point out that on page 3 of that PDF, "overall efficiency" is one of the proposed metrics, and "number of searches made" is one of the ways of evaluating overall efficiency.

Currently, it seems that most web IR systems have a built-in bias, or expectation, that the correct way to handle more comprehensive information needs is to have the user manually enter query after query after query.

However, the time spent searching, Karen Sparck-Jones suggests, should be measured not only in terms of how quickly the engine responds to any one query, but in the total amount of time (and effort) it takes to issue all the queries, until the user is satisfied.

Maybe someone with more firsthand inside knowledge can debunk me, but I see very little being done, comparison-wise, on the tradeoff between single-query, single-iteration response time and the total number of search queries issued. It seems like most search engines model these as separate factors, when in fact they are joint factors.

Anonymous (2009-02-09 15:26):

Here is another good reference. See page 6 of the PDF (page 110 of the document):

http://www.sigir.org/museum/pdfs/Information_Retrieval_Experiment/pdfs/p105-lancaster.pdf

Also check out the two or three paragraphs following the long list of user evaluation criteria. I find it particularly interesting that the "ease of making one's needs known" is an important factor. Also interesting is the discussion of the various types of information needs, ranging from known-item and fact-finding needs (1 and 2) all the way to literature searches and alerting services (3 and 4). In the comprehensive literature search department (which I think also covers information needs as diverse as planning a vacation and exploring a genre of music), there is an acknowledgment that the user is willing to make a speed vs. quality tradeoff, i.e. that the user will usually sacrifice a little response time in exchange for results of markedly better quality.

One of my ongoing frustrations with web search engines is that they do not allow me to make that tradeoff. Every single search is done with the same focus on speed, with no way for me to specify that I want "deeper," higher-quality links and am willing to wait 2 seconds, 10 seconds, or even a whole minute to get them.

Realistically, I find it very hard to believe that every single search has the exact same speed requirements. It's very limiting to design your system to only that one end goal.

Anonymous (2009-02-09 14:16):

Greg: Here is one of the reports that I had mentioned. It's Cranfield III. I know there are other mentions of evaluation measures beyond just recall and precision in the early days of IR, but it's been about a decade since I read those papers, so I haven't been able to re-find the best ones yet. I know there is another paper out there that goes into much more detail than the one I am about to cite.

So check out page 4 of this PDF. Note that in addition to relevance, Cranfield mentions time/speed, presentation/layout, and the amount of effort required of the user to construct a useful query.

http://www.sigir.org/museum/pdfs/Factors%20Determining%20the%20Performace%20of%20Indexing%20Systems%20Vol%201%20-%20Part%201%20Text/pdfs/p1-chapter_1.pdf

So these factors do go back to the early days of IR.

The factor that I personally have been interested in for quite some time is the "amount of user effort" metric. Some philosophical approaches say it is up to the user to repeatedly reformulate their own queries when things don't work; the user then has to make the additional mental effort to come up with various strategies. Other approaches try to assist the user with query reformulation, results understanding and summarization, etc. These are attempts to reduce the amount of user effort.

How one measures that effort, and how one compares two different information-seeking approaches, is very much an open problem, I think. But the recognition was there from early on that these other factors needed to be considered.

Anonymous (2009-02-09 11:40):

@matthewhurst: The book is titled "Introduction" to Information Retrieval. I think that temporal retrieval is a more advanced topic. For that matter, the book probably also does not cover other, more advanced non-textual types of information retrieval, such as image retrieval (whether content-based or textual context-based) and music retrieval.

It's an important thing to keep in mind, and one I think most people tend to forget: information retrieval does not just mean "text" retrieval or "web" retrieval.

But those are, again, more advanced topics, and probably not covered by this book.

Anonymous (2009-02-08 17:08):

"The key utility measure is user happiness... This is a point that is often missed in information retrieval."

In the original Cranfield experiments (1960s), didn't Cleverdon propose six different metrics of information retrieval effectiveness? Relevance, with its associated measures of recall and precision, was just one of those six. The other five included things like speed, coverage, etc. It's been a long time since I read that report, so I don't remember offhand. But all these measures have been part of IR from the beginning.

Historically, however, getting good relevance was relatively much harder than getting a good user interface. So if what you are saying is that most IR research has concentrated on algorithmic relevance rather than UI, you're probably right. But nowadays the more interesting UI research is not just about "layout" and "clarity" of the results; it's on the input side of things. Often a single-line input box with two buttons (search, and "lucky") does not elicit the best query from the user. Interactive search, query as a dialogue, etc. are becoming more necessary to ensure user happiness these days.

Stateless search (i.e. typical web search), or state-filled but non-transparent search (i.e. recent history personalization), might yield decent results in terms of relevance. But I often find myself frustrated by my inability to get insight into why the results are being returned and, more importantly, what I can do to overcome or change a poor result. I have no idea how that sense of user frustration is measured in a web environment, but I'll bet it is there. Interfaces that offer more transparency into the search process are more important than relevance alone.

Take a look at the recent HCIR workshops, which are trying a little harder to explore the overlap between IR and HCI, and to get at some of these issues.

Mark (2009-02-08 08:07):

Thanks for the great post. I recognise a lot of wisdom there, and learned a few things too!

Anonymous (2009-02-07 18:59):

Greg - This is a no-brainer to buy. One question before I do: does it address temporal issues of IR (i.e. what happens when static relevance collides with the timely nature of corpora like the blogosphere and other pieces of social content)?

Brad (2009-02-07 13:52):

Thanks for the great writeup with useful pointers from the book. I really liked Manning and Schütze's "Foundations of Statistical Natural Language Processing", and I'm happy I bought this latest book. I have no formal IR training, but I thought this book was very approachable from a general CS and computational linguistics background.

Anonymous (2009-02-07 12:35):

Thanks for taking the time to do the writeup; the book looks quite interesting!
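[Editor's note: the session-cost idea in the Sparck-Jones comment above — that per-query engine latency and the number of reformulations are joint, not separate, factors — can be made concrete with a small sketch. This is a minimal illustration, not anything from the cited papers; all names (`Query`, `session_cost`, `mean_engine_latency`) and the timing numbers are hypothetical.]

```python
from dataclasses import dataclass

@dataclass
class Query:
    latency_s: float      # engine response time for this one query
    user_think_s: float   # hypothetical user effort: formulating the query, scanning results

def session_cost(queries):
    """Total cost to satisfy one information need across all reformulations:
    engine time plus user effort summed over the whole session, rather than
    per-query latency in isolation."""
    return sum(q.latency_s + q.user_think_s for q in queries)

def mean_engine_latency(queries):
    """The per-query metric that, per the comment, engines optimize separately."""
    return sum(q.latency_s for q in queries) / len(queries)

# A fast engine that forces three reformulations can cost the user more
# overall than a slower engine that satisfies the need in one query.
fast = [Query(0.2, 20.0), Query(0.2, 20.0), Query(0.2, 20.0)]
slow = [Query(2.0, 20.0)]
assert mean_engine_latency(fast) < mean_engine_latency(slow)
assert session_cost(fast) > session_cost(slow)
```

Under these made-up numbers, the "fast" engine wins on the per-query metric but loses on total session cost, which is exactly the tradeoff the comment says engines rarely model jointly.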