Friday, June 02, 2006

Cyc talk at Google

I have been following the Cyc project, an attempt to build and use a massive database of common sense knowledge, off and on for a decade or so.

So, I was excited when I saw a video of Douglas Lenat's recent talk on Cyc at Google, "Computers versus Common Sense". It is long, but it is full of interesting examples, definitely worthwhile if you have any love for geeky AI stuff.

If you are already familiar with the Cyc project, you still might want to check out the talk at 31:30 and 41:00.

At 31:30, Douglas talks about how they deal with partially true statements and conflicting assertions in the database. They do not use probabilities, but instead allow statements to be consistent within local contexts while inconsistent globally. They also almost never use a formal theorem prover, instead preferring a large set of much faster heuristics for reasoning.
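To make the "consistent locally, inconsistent globally" idea concrete, here is a minimal sketch in Python. It is not Cyc's actual representation or API; the class `ContextKB`, its methods, and the context names are all hypothetical, invented only to illustrate the point. Each assertion is stored under a named context, contradictions are rejected only within a single context, and two contexts are free to disagree.

```python
from collections import defaultdict


class ContextKB:
    """Toy knowledge base where every assertion lives in a named context.

    Hypothetical illustration only; not how Cyc stores knowledge.
    """

    def __init__(self):
        # context name -> {statement: truth value}
        self._facts = defaultdict(dict)

    def assert_fact(self, context, statement, value=True):
        """Record a statement in one context, rejecting local contradictions."""
        existing = self._facts[context].get(statement)
        if existing is not None and existing != value:
            raise ValueError(
                f"contradiction inside context {context!r}: {statement!r}"
            )
        self._facts[context][statement] = value

    def ask(self, context, statement):
        """Return True or False if the context says so, None if it is silent."""
        return self._facts[context].get(statement)

    def global_conflicts(self):
        """List statements that different contexts disagree about."""
        seen = defaultdict(set)
        for facts in self._facts.values():
            for statement, value in facts.items():
                seen[statement].add(value)
        return sorted(s for s, values in seen.items() if len(values) > 1)


# Made-up contexts and statement, chosen only for illustration.
kb = ContextKB()
kb.assert_fact("modern_physics", "vampires exist", False)
kb.assert_fact("dracula_fiction", "vampires exist", True)
```

Asking within one context always gives a single answer, while `kb.global_conflicts()` reports the statement the two contexts disagree on. Asserting the opposite value inside the same context raises an error, which is the only place this toy enforces consistency.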

At 41:00, Douglas talks about automatically learning knowledge from the Web. Douglas argues that understanding the natural language text on the Web requires starting with a large handcoded database of common sense knowledge. After building that seed database manually, it can be used to automatically extract additional knowledge from the Web.

I suspect fans of statistical NLP might be quick to disagree that a manually constructed seed database of knowledge is needed. But Douglas did have some compelling cases that would trip up statistical techniques. For example, no one on the Web ever writes that water runs downhill, but they do write that water runs uphill (as a metaphor).
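The water example can be sketched with a toy frequency count. This is a deliberately naive stand-in for "statistical techniques", not a fair picture of what statistical NLP systems actually do, and the three sentences below are invented for illustration rather than taken from the Web. The functions `direction_counts` and `naive_guess` are hypothetical names.

```python
from collections import Counter

# Invented stand-in for Web text; not real data.
corpus = [
    "getting this bill passed is like making water run uphill",
    "convincing him is like trying to make water run uphill",
    "the trail runs uphill for two miles",
]


def direction_counts(sentences):
    """Count how often 'uphill' and 'downhill' co-occur with 'water'."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        if "water" in words:
            for direction in ("uphill", "downhill"):
                if direction in words:
                    counts[direction] += 1
    return counts


def naive_guess(sentences):
    """Pick the direction most often mentioned with water, or None."""
    counts = direction_counts(sentences)
    return counts.most_common(1)[0][0] if counts else None
```

On this toy corpus, `naive_guess(corpus)` picks "uphill", because the obvious fact is never written down and only the metaphor is. A counter has no way to know the metaphor works precisely because the literal claim is false.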

If you are interested in more, you might check out OpenCyc and the Cyc publications.

See also the Verbosity project (PDF) that I mentioned in an earlier post.

See also my previous post, "AI and the future of search".


Anonymous said...

The IT Conversations podcast/show had a great debate a month or two ago between Ray Kurzweil and Susan Greenfield. Toward the end of the debate, Greenfield makes the excellent point that modeling really tells you nothing new about the problems you are working on. You have to know what it is you are looking for, before you see what it is you are going to find. I am doing a horrible job paraphrasing her, but that is the basic gist.

I think this is related to the statistical NLP issues raised here. You can't just count things. You have to know why it is you are counting them, and how it all fits together, before you've even done the counts. And that is where something like Cyc or some other type of knowledge modeling comes in.

Anyway, I'll give it a watch, and I'd recommend you give that debate a listen, too.

Anonymous said...

Excellent - thanks for pointing this out. I lost track of the Cyc project in the last 5-10 years. I remember first reading about Lenat in the 1980s - I figured that by now (post 2000) my flying car would be powered by an interactive computer running a relative of Cyc. Alas, we continue to underestimate the complexity of intelligence.