Friday, February 01, 2008

The brain as a spam filter

Over at the Scientific American Mind Matters blog, Andrew McCollough and Ed Vogel from the University of Oregon posted a thought-provoking article, "Working Memory: They Found Your Brain's Spam Filter".

Some excerpts:
Our mental 'inbox' of working memory is ... constrained ... Several decades of research have indicated that our capacity to hold information "in mind" for immediate use is limited to a mere three or four items.

There are at least two primary explanations for this severe limitation in working memory capacity. First, it could be that working memory capacity is essentially determined by storage space, and that some people have larger "hard drives" than others do.

The alternative explanation is that capacity depends not on the amount of storage space but on how efficiently that space is used. Thus high-capacity individuals might simply be better at keeping irrelevant information out of mind, whereas low capacity individuals may allow more irrelevant information to clutter up the mental inbox. High-capacity individuals may just have better spam filters.

Some of our recent work has provided evidence favoring this mental spam filtering idea. In one experiment... high-capacity people were excellent at controlling what information was represented in working memory: They let in information about relevant objects but completely filtered out irrelevant objects. Low-capacity individuals, by contrast, had much weaker control over what information entered the mental "in box"; they let in information about both relevant and irrelevant objects roughly equally.

Surprisingly, these results mean that we found that low capacity people were actually holding more total information in mind than high capacity individuals were -- but much of the information they held was irrelevant to the task.
More information is not better. To be most productive on a task, we want to maximize our ability to filter for relevant information, not maximize our ability to acquire information.

This result is not surprising, but it does nicely frame the problem for those of us working in information retrieval. Not only is precision more important than recall, not only should we help people filter data and focus their attention, but also we may want to explicitly help people form and retain a working set of knowledge to apply to their task.

For example, search engines are increasing their support for re-finding, helping people remember and return to information found in the past. This usually is done as a web history, not an explicit representation of a past working set of knowledge, but it does help people build a working set, get back to items that may have dropped out of their working set, and swap back in the working set for an old task to which they are returning.

More generally, I think it is useful to think of searchers as skimming and filtering the information on pages as they try to build a small set of relevant information for their task. This may suggest methods we might consider to filter and help focus attention, such as highlighting parts of a page that are particularly likely to be useful, explicitly attempting to determine what may be distracting on a page for specific types of tasks and reducing those distractions, and carrying information and history across pages as people work on their tasks.

[SciAm article found via Mark Thoma]

6 comments:

Anonymous said...

Wow, that's really interesting.

Did you ever meet people who take great offense at what we would generally think of as normal or even trivial kinds of day to day issues and problems? Quite often they can recount the entire chain of events which led up to these things in excruciating detail while you, as a listener, are thinking "You got all worked up about that? I'd barely remember it!"

Perhaps there is a reason that some people are thicker-skinned than others.

Anonymous said...

Hmm.. I think I agree with your conclusion, but not the route you took to get there :-)

More information is not better. To be most productive on a task, we want to maximize our ability to filter for relevant information, not maximize our ability to acquire information.

Well, actually, it's a mixture of these two things. We want to maximize our ability to acquire relevant information, right?

Not only is precision more important than recall, not only should we help people filter data and focus their attention

I don't quite see how you get to this conclusion from the previous idea. Both precision and recall are still important. E.g. suppose you only have 4 brain slots to fill with information, and you do a search.

Would you rather have a search system that returned only 2 relevant documents, and nothing else? Or would you rather have a search system that returned 3 relevant documents and 1 non-relevant document? The former is more precise, but has lower recall. And vice versa.

Given that you can always ignore the non-relevant document returned by the search engine, I would prefer the engine that gave me 3 relevant documents. Especially since I probably have a better chance from that point forward in finding even more relevant information, because I now have 3 examples of relevant information rather than 2. Precision is not necessarily more important than recall.
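To put numbers on the two hypothetical engines above, here is a minimal Python sketch. Recall needs the total number of relevant documents in the collection, which the example does not state, so the value 10 below is purely an assumption for illustration; the function name is also made up for this sketch.

```python
def precision_recall(relevant_returned, total_returned, total_relevant):
    """Return (precision, recall) for a single result list."""
    precision = relevant_returned / total_returned
    recall = relevant_returned / total_relevant
    return precision, recall


# Assumption: 10 relevant documents exist in total (not stated in the example).
TOTAL_RELEVANT = 10

# Engine A: 2 relevant documents and nothing else.
engine_a = precision_recall(relevant_returned=2, total_returned=2,
                            total_relevant=TOTAL_RELEVANT)

# Engine B: 3 relevant documents plus 1 non-relevant document.
engine_b = precision_recall(relevant_returned=3, total_returned=4,
                            total_relevant=TOTAL_RELEVANT)

for name, (p, r) in (("A", engine_a), ("B", engine_b)):
    print(f"Engine {name}: precision={p:.2f} recall={r:.2f}")
```

Under that assumption, engine A is perfectly precise but finds fewer of the relevant documents, while engine B gives up some precision to find one more. Both result lists fit in the 4 brain slots.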

but also we may want to explicitly help people form and retain a working set of knowledge to apply to their task.

This is the crux of what you are saying, I think. Can I ask you: Should this working set of knowledge be implicitly represented, i.e. embedded in the search engine as some sort of parameter weight, or explicitly represented, i.e. shown to the user in a way that the user can interact with, manipulate, and give feedback on that working set of knowledge?

The former is the personalization approach that you often advocate. The latter is the "tools-based" approach I'm always blathering on about. You say:

More generally, I think it is useful to think of searchers as skimming and filtering the information on pages as they try to build a small set of relevant information for their task. This may suggest methods we might consider to filter and help focus attention, such as highlighting parts of a page that are particularly likely to be useful, explicitly attempting to determine what may be distracting on a page for specific types of tasks and reducing those distractions, and carrying information and history across pages as people work on their tasks.

This seems to favor the explicit approach, the "tools" approach, rather than the personalization approach. Am I correctly characterizing what you are saying?

Greg Linden said...

Hi, Jeremy! Absolutely, I think this favors a combination of the tools approach you advocate and the personalization approach I'm always blathering on about.

In particular, the tools will have to provide substantial search functionality beyond existing interfaces, but also pay attention to individual searchers' interests and history.

Do you agree?

Anonymous said...

Oh, I absolutely agree. I think without the algorithmic intelligence on the back end, the whole point of search is lost. And I see nothing wrong with tying it to searcher history. Frankly, I believe that recent searcher history is more important than long term searcher history, but I don't have the numbers to back that up, off-hand. But either way, I'm definitely not against what you're proposing.

I'm just saying that your sort of blathering will probably be most effective when coupled with my sort of blathering, so as to give the searcher an overall sense of what the back end algorithm understands. And allow the searcher to modify the algorithm's understanding. It is the black box that I object to, not the algorithmic back end.

The first part of that is key to me, though. Offering explicit tools doesn't just give the searcher an opportunity to give feedback to the algorithm. It also gives the algorithm a chance to give feedback to the searcher! The searcher can actually see how well the algorithm has understood his/her intent, by seeing the information actually populating the tools.

Anonymous said...

Just a quick followup: This is something that I think gets lost in a lot of the A/B testing, click-stream only testing, that happens on the web. There is this feeling that something on the web is only useful if someone clicks on it, or otherwise performs some sort of action.

But tools can be just as useful even if they never get used, because they serve as view ports into the black box, as feedback from the algorithm that the algorithm is understanding the user's intent.

Greg Linden said...

Great point, Jeremy. I think instant answers and snippets are a good example of that. In some cases, people can get the answer they need right from text on the search result page, no clicks, no actions.