Thursday, May 10, 2007

Powerset CEO talk at UW CS

Barney Pell, the CEO of Powerset, gave a talk at UW CS on natural language search. Video of the talk is now available.

I attended the talk. I was hoping for details on Powerset's technology and a live demo, but, unfortunately, the talk was much higher level than that. It mostly covered motivation for natural language search and why the market timing was right. I have to say, it had the feel of an investor pitch.

The most compelling part of the talk for me was when Barney was talking about the value of NLP for extracting additional information from a small data set. For example, Barney compared the performance of Powerset's alpha product running over Wikipedia with Google limited to searching over Wikipedia on several questions (e.g. "Who did IBM acquire in 2003?" and "When did Katrina strike Biloxi?").

On the one hand, these examples might not be fair to Google, since Google gains its power from its massive index; restricting it to Wikipedia cripples it by preventing it from reaching far and wide to answer questions. On the other hand, there are many applications where all that is available is a small data set (e.g. newspapers, health, product catalogs), and in those problems there is considerable value in maximizing your understanding of that data.

The least compelling part for me was the hyping of the technology Powerset licensed from Xerox PARC, especially when Barney appeared to suggest that this technology means NLP is largely a solved problem:
"The fundamental problems we were really worried about -- you know, problems like how do you deal with ambiguity, how do you deal with open vocabulary, how can you be robust in the face of noise and erroneous things, how can you be applied to multiple languages and these kind of things, how can you be computationally efficient at all -- took a really long time and, while they are not just all completely done, the fundamental challenges that they had seen for all that time were basically resolved."
It would be nice if the fundamental challenges in NLP were basically resolved, but I do not believe that is the case.

I do agree with the motivation behind Powerset. Especially for verticals, better understanding of smaller data sets would be useful.

I also agree that bloating indexes with data summarizing NLP extractions is a promising approach, despite the 100x longer index build times and 10x larger index sizes that Barney said may be required. Computers are more powerful and massive clusters are becoming cheaper to acquire. The computational power to do these tasks is at hand.

I am not sure I agree with Barney when he said a linguistics approach to NLP is more likely to bear fruit than a statistical approach. More thoughts on that in my previous post, "Better understanding through big data".

I also have to say I was confused at several points in Barney's talk about whether Powerset was seeking better question answering or trying to do something bigger. Some of his examples seemed like they would not only require understanding query intent and the information on a single web page, but also might require understanding, synthesizing, and combining noisy and possibly conflicting data from multiple sources. The latter is a much harder problem, but Barney seemed to be suggesting that Powerset was taking it on.

In the end, the talk did not address my concern that Powerset is overpromising in the press and is likely to underdeliver. What I would really like to do is play with a live Powerset demo, perhaps Powerset powering Wikipedia search or the search for a major newspaper, and see more details behind the technology. For now, I remain worried that the pitch is running far ahead of the product.

Update: Six months later, Powerset has a management shakeup, losing its COO and having its CEO, Barney Pell, step down to CTO due to a "slip in the company's delivery date of its product."

4 comments:

Anonymous said...

Thanks for the pointer. I really want NLP to prove to be useful, but I'm frankly skeptical.

It's useful to be sitting on the net while Barney's talking. Try googling "IBM 2003 acquisition". The first result is the one you want.

Plus, when have you ever not been able to find what you're looking for in Wikipedia? Just go read the article on Katrina. Or, if you're searching for apartments, go to Craigslist and search for apartments in Berkeley.

Barney also argues that search optimization is a bad thing. I'd disagree with that. Search optimization is simply the act of making your site most suitable to be found, whatever the tools. And that's a good thing. Ad agencies know how to optimize a print ad or a billboard to get the most readers; this is no different.

Anonymous said...

If you have any question that this is more sizzle than steak, check out the video on VentureBeat.

http://venturebeat.com/2007/02/12/powerset-raising-more-money/

These guys think it's 1999 again. Google killer? Uh, not quite.

Anonymous said...

I was triggered by the comments of Greg Linden, saying that he fears the pitch is running far ahead of the product and the actual possibilities of natural language.

To start with, I agree in general with Greg’s scepticism that NLP is a bit like the everlasting promise. For some reason, many more companies have failed to come up with a good speech or text solution based on natural language than have succeeded. At this moment I cannot comment on Powerset. However, I do know that the actual possibilities are there. Have a look at Q-go Natural Language Search, www.q-go.com.

This European software company, headquartered just outside Amsterdam, is not on par with Powerset’s marketing. But this is more than compensated for by the fact that Q-go has been around for quite some years with actual solutions. At companies like ING, KLM, Telefonica, Deutsche Telekom, DHL, the largest European pension fund, and other large European organizations, the solution works on the website or on the intranet.

Based on pure NLP, Q-go answers a variety of questions. One of the live customers is a part of the Dutch Ministry of Social Affairs. On the website people can ask questions (in Dutch) like "Show me the money". And believe me, quite a number of users are asking questions in this way. And with a typical search engine they would never get a relevant answer back.
By analyzing this sentence in different ways, Q-go understands the user's intent and returns just a handful of relevant possibilities for the end user. This type of natural language search solution accounted for a verifiably large cost saving at this organization. But more importantly, it is telling the organization exactly how users ask their questions. It is building insight.

The solution is 100% SaaS. And this type of natural language search supports quite a number of languages, including ones with more complex grammatical structures, like German, and even so-called secondary languages, like Catalan, the second most spoken language in Spain.

The live systems are in a variety of languages. Q-go can be tested in English, with in-domain questions, on KLM: http://www.klm.com/travel/gb_en/index_default.html

And Greg: with an NDA, we are happy to give you insight into the NLP engine that we built...

Q-go
Marcel E. Smit
CEO

Anonymous said...

Greg, you said "I also have to say I was confused at several points in Barney's talk about whether Powerset was seeking better question answering or trying to do something bigger. Some of his examples seemed like they would not only require understanding query intent and the information on a single web page, but also might require understanding, synthesizing, and combining noisy and possibly conflicting data from multiple sources."

I think that's the clue -- the something bigger is "enterprise search." I'm betting that Mechanical Turk is going to play a part in Powerset's NLP and that it's not all VC smoke and mirrors. They're going after enterprise search and that's why they need NLP. Here's my theory.