Saturday, August 19, 2023

AI letter signers not worried about doomsday AI

An article in Wired, "A Letter Prompted Talk of AI Doomsday. Many Who Signed Weren't Actually AI Doomers":
A significant number of those who signed were, it seems, primarily concerned with ... disinformation ... [or] harmful or biased advice ... [But] their concerns were barely audible amid the furor the letter prompted around doomsday scenarios about AI.
Relatedly, one of the sources for that article was a blog post over at Communications of the ACM, "Why They're Worried":
Undesirable model behaviors, whether unintentional or caused by human manipulation ... highly convincing falsehoods that could lead many to believe AI-generated misinformation ... highly susceptible to manipulation ... false content by AI recommendation engines ... can be abused by bad actors.
Despite the hype from some over AI existential risks, these AI experts are worried about practical issues with how LLMs and ML systems are deployed today, like flooding the zone with propaganda or making it harder to find reliable information in Google search, problems that are getting worse over time.

Challenges using LLMs for startups

Someone (not an AI expert) was asking me about applying large language models (LLMs, like ChatGPT) to a particular product. In case it's useful to others, here's an edited version of what I said.

When thinking about LLMs for a task, it's important to consider how LLMs actually work.

Essentially, LLMs try to produce plausible text given the context. It's roughly advanced next word prediction: given the previous words and phrases, and an enormous amount of data on what writing usually looks like around similar words and phrases, predict the next words.
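
To make "next word prediction" concrete, here's a minimal sketch using the open source Hugging Face transformers library and the small GPT-2 model. The model and prompt are just my illustrative choices; production chatbots use far larger models trained on the same basic objective. All it does is ask the model for its probability distribution over the next token given a prompt.

```python
# Minimal next-word-prediction sketch with Hugging Face transformers and GPT-2.
# The model ("gpt2") and prompt are illustrative choices, not anyone's product setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # a score for every vocabulary token at every position

# Probabilities for the *next* token only, given everything written so far
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, 5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id)):>10s}  {prob.item():.3f}")
```

Everything the model does, including long chatbot-style answers, is built by repeating this one step: pick a likely next token, append it, and predict again.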

The models are trained by having human judges quickly rate a bunch of output for how plausible it looks. This means the models are optimized for producing output that, at first glance, looks pretty good to people.
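
To give a rough sense of what "optimized for looking good to people" means, here's a toy sketch, entirely my own illustration and not any lab's actual training code, of the pairwise preference loss at the heart of reinforcement learning from human feedback. A reward model is nudged to score the output the human rater preferred above the one they didn't, and the LLM is then tuned to chase that score.

```python
# Toy sketch of a pairwise preference loss (my illustration only).
import torch
import torch.nn.functional as F

# Hypothetical reward-model scores for pairs of outputs, where human raters
# preferred the first output of each pair over the second.
score_preferred = torch.tensor([2.1, 0.7, 1.5])
score_rejected  = torch.tensor([1.3, -0.2, 1.6])

# Bradley-Terry style pairwise loss: minimizing it pushes the preferred output's
# score above the rejected one's, so the system learns "what a quick human rating
# likes", not "what is true".
loss = -F.logsigmoid(score_preferred - score_rejected).mean()
print(loss.item())
```

Note that nothing in that objective checks correctness; it only rewards whatever the quick human ratings favored.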

The models can imitate writing style, especially over short stretches of text, but not particularly well or closely. If you ask an LLM to produce what a specific person would say, including a user of the product, you'll get a plausible-looking but unreliable answer.

The models also have no understanding of what is correct or accurate, so they will produce output that is wrong, potentially in problematic ways.

The models have no reasoning ability either, but they can often produce plausible answers to questions that appear to require reasoning, by rephrasing memorized answers to similar questions.

Unfortunately, these issues mean you may struggle to get LLMs to produce high quality output for many of the features and products you might be thinking about, even with considerable effort spent tuning and optimizing the LLMs.

If you're prepared for that, you can try to make customers forgiving of the inevitable errors by putting a lot of effort into the UI and by limiting their expectations, but that's not easy!