Friday, August 17, 2007

Image search to solve hard problems

Alyosha Efros from CMU gave a fun Google Tech Talk, "Using Data to 'Brute Force' Hard Problems in Vision and Graphics," with some clever ideas on using large image databases to solve hard problems.

The examples I enjoyed the most started about 20 minutes into the talk. The first, object insertion, looks at the problem of adding people or objects to a picture.

Rather than attempting to model the object to be inserted and then adjust the perspective, scale, and lighting, Alyosha suggests we change the problem to finding an appropriate object that already has the right perspective, scale, and lighting. That is, rather than take a specific picture of a man and try to adjust it, search a massive image database to find some picture of a man that is already properly adjusted.

This only works if you have a massive image database and good odds of finding a strong match. We now do have massive image databases, and they keep growing, which improves the odds of finding a good match and makes this brute force approach look even more promising.
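At its core, this brute force idea is nearest-neighbor search: compare what you need against every image in the collection and keep the closest one. Here is a minimal Python sketch, assuming each image has already been reduced to a hypothetical numeric feature vector describing things like camera height, scale, and lighting. The talk does not specify such features, and `find_best_match` is an illustrative name, not code from the talk.

```python
import math
import random


def squared_distance(a, b):
    """Sum of squared differences between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find_best_match(query, database):
    """Brute-force nearest neighbor: compare the query with every entry.

    `database` maps an image id to its feature vector.
    Returns (best_id, best_squared_distance).
    """
    best_id, best_dist = None, math.inf
    for image_id, features in database.items():
        dist = squared_distance(query, features)
        if dist < best_dist:
            best_id, best_dist = image_id, dist
    return best_id, best_dist


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 3-d features (e.g. camera height, scale, light angle) in [0, 1).
    big = {f"img{i}": [rng.random() for _ in range(3)] for i in range(10_000)}
    small = dict(list(big.items())[:100])  # the first 100 images only
    query = [0.5, 0.5, 0.5]
    # The big collection contains the small one, so its best match is never worse.
    print("100 images:   ", find_best_match(query, small))
    print("10,000 images:", find_best_match(query, big))
```

Because a larger collection contains the smaller one, its best match can only be as close or closer, which is the sense in which growing databases make the approach more promising. The price is that every query touches every image, so a real system would want compact features and probably an index rather than a linear scan.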

Another example discussed in the talk, scene insertion, is similar to object insertion but at a larger scale. We are no longer just adding objects, but taking whole chunks out of pictures (e.g. construction equipment, roads, buildings) and creating a new picture by filling in the deleted region. Tools available today typically do this with texture fills, but that works poorly for large deletions.

Alyosha proposes attacking the scene insertion problem by searching a large database for similar images. If they are trying to replace part of a picture that is mostly a beach scene, for example, they search the massive database for similar beach scenes, then steal the missing chunk from those related scenes.
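A toy version of that search-then-steal step is sketched below in Python. It is only an illustration under simplifying assumptions: images are small, equal-sized grids of grayscale values, similarity is a plain sum of squared pixel differences over the pixels that were not deleted, and the hole is filled by copying pixels straight from the single best match. The function names are hypothetical, and a practical system would likely compare compact scene descriptors and blend the seam rather than paste pixels verbatim.

```python
def masked_distance(query, candidate, hole):
    """Sum of squared differences over the pixels that are NOT in the hole."""
    total = 0.0
    for q_row, c_row, h_row in zip(query, candidate, hole):
        for q, c, in_hole in zip(q_row, c_row, h_row):
            if not in_hole:
                total += (q - c) ** 2
    return total


def complete_scene(query, hole, database):
    """Fill the hole in `query` with pixels from the most similar database image."""
    # Brute force: score every candidate only on the pixels we still have.
    best = min(database, key=lambda cand: masked_distance(query, cand, hole))
    # Keep known pixels from the query; take hole pixels from the best match.
    return [
        [c if in_hole else q for q, c, in_hole in zip(q_row, c_row, h_row)]
        for q_row, c_row, h_row in zip(query, best, hole)
    ]


if __name__ == "__main__":
    photo = [[1, 2], [3, 0]]           # bottom-right pixel was deleted
    hole = [[False, False], [False, True]]
    database = [
        [[1, 2], [3, 9]],              # matches the known pixels exactly
        [[5, 5], [5, 5]],
    ]
    print(complete_scene(photo, hole, database))  # [[1, 2], [3, 9]]
```

Because the distance ignores the hole, each candidate is judged only on how well it matches what remains of the original photo, which is what makes it reasonable to borrow the missing chunk from the winner.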

The talk is enjoyable and light with plenty of pictures of both good and bad examples of what happens when you try to apply this technique. Well worth watching.

See also my Oct 2006 post, "The advantages of big data and big clusters", where I quoted Googler Peter Norvig as saying, "Worry about the data first before you worry about the algorithm."

2 comments:

Ricardo N. Cabral said...

Pretty interesting indeed. After watching it, my mind wandered, trying to come up with another problem domain where such a technique would make more sense and have more end-user appeal. After all, it's not every day that you decide to remove a piece of your photo and want to know what best fits in its place.

Greg Linden said...

I'm not so sure, Ricardo. I found the examples in the talk to be useful and compelling.

I suspect this would be a pretty popular feature in Photoshop, not only for people editing and cleaning up personal photos, but also for advertisers and others creating images for commercial purposes.