Comments on Geeking with Greg: KDD talk on the Future of Image Search

Greg, I'd love to hear your thoughts on the face-r...

2008-09-04T17:56:00.000-07:00

Greg, I'd love to hear your thoughts on the face-recognition feature Google released this week in their PicasaWeb app.

Works amazingly well for me, distinguishing between members of my family with similar facial features. Works regardless of lighting, hairstyle, hair color, hats, 3/4 profile, etc.

Yes, my mistake. The model has 30k, not every sin...

2008-09-02T10:46:00.000-07:00

Yes, my mistake. The model has 30k, not every single image.

But yes, every single image still requires more than the couple of dozen tags that you get using the ESP game to label an image. Especially if you want any sort of granularity at all.

And you absolutely need this granularity. I just did a search for "fish" on Flickr. There are 2.29 million images with this tag. And this isn't even the whole web!

I would agree with Ricardo save for one point: whi...

2008-09-01T08:09:00.000-07:00

I would agree with Ricardo save for one point: while Numenta appears to have a handle on the "partonomy" part of the equation (with their concept of "invariant representations"), the query model for an HTM is a big mismatch with current Web search query interfaces. To query an HTM, you need to give it some form of the thing you're looking for (in part, in whole but deformed, etc.). This would seem to be at odds with the keyword-based search that is most popular today, and it would be a significant hurdle to introducing HTM-based image search for general-purpose consumption on a site like Flickr or Google. I remain hopeful, however, as some of the results people are getting with Numenta's stuff are encouraging.

Hi, Jeremy. Just to clarify, I meant that Jitendr...

2008-09-01T07:45:00.000-07:00

Hi, Jeremy. Just to clarify, I meant that Jitendra said 30k parts in the visual model (across all possible images), not per image, but I don't think that detracts from your point.

I'm totally pro-content-based methods. In princip...

2008-08-31T18:43:00.000-07:00

I'm totally pro-content-based methods.

In principle, tagging can solve many of these issues. In practice it never will. That's because the effort required to tag images with the intentionality or the granularity that you need will never outpace the rate at which images are created.

Suppose for example that someone does tag a photo with the tag "fish", and you are looking for photos of "fish". Well, what if you also want to specify type of fish? Or color of fish? Or orientation of the fish within the image? Or the thinness or the fatness of the fish?

All of these things are possible to describe in tags. But I seriously doubt that most, if not asymptotically all, photos in the world will ever get that many tags applied to them.

I disagree with Jitendra that more tagging will not solve this problem. I think that more tagging *would* actually solve this problem. The issue, though, is whether you'll ever get enough effort to actually annotate every single image with all these tags.

Again, the answer is no, simply because all those attributes of the image are too far down the "long tail of effort", meaning that you'll never actually get enough people describing enough things about the image, for enough images.

What does Jitendra estimate, 30k attributes PER IMAGE? Let's suppose it is even 500, rather than 30,000. You'll never get that many tags for most images.

So the net effect is the same: tags won't get you there.

The rate at which new photographs appear in the world far exceeds the rate at which new taggers appear, or even the rate at which old taggers continue to play tag labeling games.
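To make the rate argument concrete, here is a rough back-of-envelope sketch. Only the 500-tags-per-image figure and the 2.29 million "fish" count come from this thread; the upload rate, the number of taggers, and the tags each tagger produces per day are made-up placeholders, and `tag_backlog` is a hypothetical helper, not anyone's real model.

```python
def tag_backlog(photos_per_day, tags_needed_per_photo,
                taggers, tags_per_tagger_per_day, days):
    """Return how many needed tags are still missing after `days` days."""
    backlog = 0
    for _ in range(days):
        # New photos arrive, each needing a full set of descriptive tags.
        backlog += photos_per_day * tags_needed_per_photo
        # Taggers work off as much of the backlog as they can that day.
        backlog -= min(backlog, taggers * tags_per_tagger_per_day)
    return backlog


# Tags needed just for the existing "fish" photos at 500 tags per image.
fish_tags_needed = 2_290_000 * 500

# Assumed (hypothetical) rates: 1M new photos/day, 100k active taggers,
# each contributing 200 tags/day.
missing_after_year = tag_backlog(
    photos_per_day=1_000_000,
    tags_needed_per_photo=500,
    taggers=100_000,
    tags_per_tagger_per_day=200,
    days=365,
)

print(f"Tags needed for the 'fish' photos alone: {fish_tags_needed:,}")
print(f"Missing tags after one year: {missing_after_year:,}")
```

Under these assumed numbers the backlog grows every single day, because daily demand (photos times tags per photo) is far larger than daily supply (taggers times tags per tagger). The exact figures don't matter much; the backlog only stops growing if supply matches demand.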

You need content-based methods.

Stuff that Numenta (http://www.numenta.com/) has b...

2008-08-31T07:01:00.000-07:00

The stuff that Numenta (http://www.numenta.com/) has been doing with hierarchical temporal memory seems to me the way to go for human-like vision/object recognition.