Someone (not an AI expert) was asking me about applying large language models (LLMs, like ChatGPT) to a particular product. In case it's useful to others, here's an edited version of what I said.
When thinking about LLMs for a task, I think it's important to consider how LLMs work.
Essentially, LLMs are trying to produce plausible text given the context. They are, roughly, advanced next-word predictors: given the previous words and phrases, and an enormous amount of data on what writing usually looks like around similar words and phrases, they predict the words most likely to come next.
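To make "next word prediction" concrete, here is a deliberately tiny sketch. It counts which word tends to follow each word in a toy corpus and predicts the most frequent continuation. Real LLMs use neural networks over subword tokens and vastly more data, but the core objective, predicting the next token from context, is the same. The corpus and function names here are made up for illustration.

```python
from collections import Counter, defaultdict

# Toy corpus; real models train on trillions of words.
corpus = "the cat sat on the mat and the cat ate and the cat slept".split()

# Build a table: word -> counts of the words seen immediately after it.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word`, or None if unseen."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" -- it follows "the" most often here
```

Note what this model does and doesn't know: it will happily predict "cat" after "the" regardless of whether a cat makes sense, because it only tracks what usually comes next, not what is true.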
The models are then fine-tuned by having human judges quickly rate batches of output for how plausible they look. This means the models are optimized for producing output that, at first glance, looks pretty good to people.
The models can imitate a writing style, especially over short passages, but not particularly well or closely. If you ask an LLM to produce what a specific person would say, including a user of the product, you'll get a plausible-looking but unreliable answer.
The models also have no understanding of what is correct or accurate, so they will produce output that is wrong, potentially in problematic ways.
The models have no real reasoning ability either, but they can often produce plausible answers to questions that appear to require reasoning, by rephrasing memorized answers to similar questions.
Unfortunately, these issues mean you may struggle to get LLMs to produce high-quality output for many of the features and products you might be considering, even with considerable effort spent optimizing the LLMs.
If you're prepared for that, you can try to make customers forgiving of the inevitable errors by investing heavily in the UI and by carefully managing expectations, but that's not easy!