Thursday, April 27, 2006

Finding and discovering

A Jan 2006 CACM article out of Microsoft Research offers a nice overview of the Stuff I've Seen project.

One particularly interesting part of the article explores how personalization complements search. Some excerpts:
Implicit Query ... analyzes the email message the user is looking at and extracts important words from the body, subject, sender, and recipient fields. These words are automatically used in a query ... and the results are shown as a side panel attached to the current message.

We thought IQ would be helpful in sparing users the effort of generating queries, and indeed it is.

But many people have also reported an unanticipated benefit of finding information, especially when they completely forgot they had anything related and would never have generated an explicit search on their own.
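
As a rough illustration of the Implicit Query idea in the excerpt, here is a minimal Python sketch. The excerpt does not say how "important words" are chosen or how results are ranked, so the term-frequency weighting, the tiny stopword list, the overlap scoring, and the names `implicit_query` and `search` are all assumptions made for illustration, not the Stuff I've Seen implementation.

```python
import re
from collections import Counter

# Hypothetical stopword list (an assumption); a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
             "is", "are", "was", "be", "it", "this", "that", "with", "at",
             "you", "i", "we", "re", "fwd"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def implicit_query(message, max_terms=5):
    """Build a query from a message without the user typing anything.

    Assumption: "important" means most frequent after stopword removal.
    """
    counts = Counter()
    # Free-text fields contribute individual words.
    for field in ("subject", "body"):
        for token in tokenize(message.get(field, "")):
            if token not in STOPWORDS and len(token) > 1:
                counts[token] += 1
    # Sender and recipient are kept as whole addresses, one term each.
    for field in ("sender", "recipient"):
        address = message.get(field, "").strip().lower()
        if address:
            counts[address] += 1
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:max_terms]]


def search(query_terms, collection):
    """Rank stored items by how many distinct query terms they contain."""
    scored = []
    for title, text in collection.items():
        overlap = len(set(query_terms) & set(tokenize(text)))
        if overlap > 0:
            scored.append((overlap, title))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


# Made-up example data standing in for the current email and the user's own items.
message = {
    "subject": "Budget review for the Denver offsite",
    "body": "Can you send the Denver budget spreadsheet before the review?",
    "sender": "alice@example.com",
    "recipient": "bob@example.com",
}
collection = {
    "old-budget-notes": "Notes on the Denver budget from last year",
    "lunch-menu": "Cafeteria lunch menu for the week",
    "offsite-agenda": "Agenda for the offsite review meeting",
}

query = implicit_query(message, max_terms=3)
print(query)                      # ['budget', 'denver', 'review']
print(search(query, collection))  # ['old-budget-notes', 'offsite-agenda']
```

The second result is the kind of item the excerpt has in mind: something related that the user may have forgotten and would not have searched for explicitly.
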
Search helps when you know what is out there and can easily say what you want.

Personalization helps when you do not know what is out there. Personalization surfaces interesting items you did not know about and may not have found on your own.

Search is finding. Personalization is discovering.

3 comments:

jeremy said...

Woah woah woah. I do apologize for being such a stickler.. but personalization is about the last thing that those authors are talking about in this CACM article. Take a step back and look at the theme of the rest of the CACM issue. It is all about exploratory search, revealing hidden meaning and structure in collections that one is searching, so that the user can gain a better understanding of what is being searched, and choose for themselves the information they want to find out more about.

In other words, this entire CACM issue is about tools.

In this MSR article, they talk not about personalization, but about approaching the searching and organization of one's personal data using the tools-based mindset. Look at the last sentence of the 2nd paragraph:

"Since personal collections are often stored locally, highly dynamic interfaces can be used to help users quickly iterate queries and explore their content"

What are "highly dynamic interfaces" used to "explore ... content", if not tools?

With all respect, I think you're mixing the understanding of personalization with the searching of (and query extraction from) personal data. The former is an algorithmic approach. The latter is the type of content to which something is applied.

Correct me if I am wrong, but personalization is defined as automatically selecting or preferring certain types of content over others, based on past user action and behavior...and doing so in a way that is opaque to the user...or at least requires zero additional effort from the user, as you and Marissa Mayer are fond of saying.

This MSR article is talking about the exact opposite of this. It is talking about coming up with tools that allow the user insight into the way the collection is being searched, and letting the users choose for themselves. Tools.

Or as the MSR authors themselves put it: "In many ways this [system that we are describing] is similar to the category interfaces described by Hearst in this section, providing an organizing context for results and future queries".

Now go read Marti's article: "Information seekers often express desire for a user interface that organizes search results into meaningful groups, in order to help make sense of the results, and to help decide what to do next." Bingo, tools!

Reminds you of Ask or Vivisimo. Tools. Not personalization.

Greg Linden said...

Jeremy, the Implicit Query work is personalization. It generates searches automatically and implicitly based on user behavior.

You are right that much of the rest of the work in the Stuff I've Seen and related projects is on experimental user interfaces to help people sift and sort through their data. That is cool stuff, I agree, but it is different than Implicit Query.

By the way, Jeremy, I still do not know who you are. I would appreciate it if you could use your full name in your comments so I could know who I am talking to.

jeremy said...

Greg,

Whups, let me first start off by saying that I initially responded to the wrong article. As you clearly said (and I clearly spaced on), you were talking about the Jan 2006 CACM article. I had mistakenly skipped to the Apr 2006 CACM article, also from MSR. So my responses above were somewhat off-base.

That said, I still don't think that "Implicit Query formulation" is, by itself, "personalization". If I understand it correctly, implicit querying is a two-part process. Part (1) simply means using the text of whatever page/document/etc one is currently visiting in order to retrieve more information related to that page. Part (2) is that the querying itself is done not against the entire web, but against a user's own SIS (Stuff I've Seen) collection.

Right?

Well, let's look at the two halves of this equation. What makes this "personalization"? Is it the fact that there is automatic query extraction from one's current context? Not necessarily. One can do this without knowing anything about the user at all. In fact, the automatic query extraction bit is simply shorthand for full-document to full-document similarity, which can be done by any number of TDT (Topic Detection and Tracking) systems, for example. And TDT is not "personalization". Querylessness by itself does not make it personalized.

How about the second half.. the collection against which one is querying. If it is one's personal browsing history that is being searched, does that make it personalization? Not necessarily. Again, I draw a big distinction between retrieval on a collection of personal information, versus using personal information to automatically and without user intervention filter the results of stuff the user has not yet seen. That is the sense of the word that I get from Google. When Google talks about personalization, they're talking not about searching one's own history, but about using one's history to filter and/or bias NEW things that one has not yet seen. Natch?

Now, it is true that one could use one's search history to bias the term vectors in one's current document context, and thus bias the queryless formulation of the query, which then gets used in searching SIS. But of those three steps, the only "personalization" is the history-based biasing. Neither the queryless querying, nor the personal history searching, nor even the combination of those two, is itself personalization.

And along those lines, I don't think that you can say that "search is finding; personalization is discovering", as you mention above. Personalization is an automated technique for information filtering or biasing, one over which the user has little to no control. "Exploratory search", on the other hand (see CACM April 2006), is about discovering. It is all about coming up with the tools that allow a user to develop intelligent insights into collections with which they have contact.

Ultimately, as we've discussed before, I think the best methods are those that will combine personalization with tools. There is room for both. It's just my own personal bee-in-a-bonnet to be absolutely clear about what each one is, when we might want to use it, and what it is doing. So as usual, I think we very much agree on where this should all be going.. it's just a matter of coming to an understanding on what will get us there.