In the Era of Large Language Models, What's Next for AI Chatbots?
New chatbots powered by large language models are much more capable than those that came before, but it'll be a while before they make their way into commercial applications.
Recent advances in large language models (LLMs) will probably power the next generation of chatbots. However, as Meta’s BlenderBot 3 public demo shows, making AI chatbots powerful enough to be useful while avoiding harmful responses is a nontrivial task. While LLMs will be used in many commercial applications in the near future, an open-world LLM-powered chatbot is probably not one of them.
Meta’s BlenderBot 3
As we covered in the last newsletter, Meta’s new AI chatbot demo, BlenderBot 3, was spewing conspiracy theories, among other offensive remarks. For what it’s worth, Meta seems to have “cleaned up” most of the initially reported problems (it no longer repeats election conspiracies, for example), and Meta was very upfront about the chatbot’s potential flaws in its press release:
Since all conversational AI chatbots are known to sometimes mimic and generate unsafe, biased or offensive remarks, we’ve conducted large-scale studies, co-organized workshops and developed new techniques to create safeguards for BlenderBot 3. Despite this work, BlenderBot can still make rude or offensive comments, which is why we are collecting feedback that will help make future chatbots better.
BlenderBot 3’s real impact, however, probably lies not in the live demo website from which these news articles obtained their controversial contents. Along with the demo, Meta has publicly released the AI model behind BlenderBot 3, as well as the dataset used to train it. This is a big deal for other researchers working in conversational AI. Curating large-scale datasets and training models with 175 billion parameters are very expensive undertakings, ones that most researchers outside of big tech companies cannot afford. This release goes a long way toward democratizing such AI research, and the field at large should encourage similar open-science endeavors from other big tech companies.
Chatbots Before and After Large Language Models
But let’s back up a little and think about where BlenderBot 3 places us in the world of chatbots. Chatbots are not new, and they have a long and multifaceted history. A very early example of a natural-language chatbot is ELIZA, built by Joseph Weizenbaum at MIT in 1966. What made ELIZA special was that it was specifically designed to showcase the “superficiality of communication between humans and machines.” All it did was insert key phrases from the user’s inputs into predefined sentence templates (somewhat like a Mad Lib), but the result was surprisingly convincing to many who interacted with the system.
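To make the mechanism concrete, here’s a toy Python sketch of the template-matching idea (a loose approximation for illustration; Weizenbaum’s original was written in MAD-SLIP and was considerably more elaborate):

```python
import re

# Toy ELIZA-style rules: each pattern captures a key phrase from the
# user's input and splices it into a canned response template.
RULES = [
    (r"i need (.*)", "Why do you need {0}?"),
    (r"i am (.*)", "How long have you been {0}?"),
    (r"my (.*)", "Tell me more about your {0}."),
]

def respond(user_input: str) -> str:
    text = user_input.lower().strip(".!?")
    for pattern, template in RULES:
        match = re.match(pattern, text)
        if match:
            return template.format(*match.groups())
    return "Please, go on."  # fallback when no rule matches

print(respond("I need a vacation"))        # Why do you need a vacation?
print(respond("I am worried about work"))  # How long have you been worried about work?
```

There is no understanding anywhere in this loop, just string surgery, which is exactly the superficiality Weizenbaum wanted to expose.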
The story goes that some folks actually thought ELIZA was sentient, not unlike the recent report of an ex-Google engineer who came to believe the company’s chatbot, LaMDA, was sentient.
There is a qualitative difference between ELIZA and LaMDA, though, and, for that matter, between all the chatbots that came before large language models (LLMs) and those that came after.
Before LLMs, chatbots relied on manually engineered pattern-matching and identification algorithms to parse and construct sentences. These algorithms explicitly analyze a sentence’s syntax tree (what the noun phrases and verb phrases are, and how they relate to each other) as well as its semantics (what do words refer to and mean in the context of the sentence?). Chatbots from ELIZA onward, even recent voice assistants like Siri, Alexa, and Google Assistant, all rely on similar techniques.
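For a rough illustration of this kind of explicit analysis, here’s a sketch using spaCy as a stand-in (this is not a claim about what any particular assistant runs internally):

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("Set an alarm for seven tomorrow morning")

# Noun phrases a rule-based system might slot into its intent templates
print([chunk.text for chunk in doc.noun_chunks])

# Dependency relations: how each word attaches to the rest of the sentence
for token in doc:
    print(f"{token.text:10} {token.dep_:10} head={token.head.text}")
```

A hand-written rule might then say: if the verb is “set” and its object is “alarm,” extract the attached time expression and schedule it.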

However, as one can imagine, having software engineers and linguists sit down and write out all possible ways to parse all possible sentences in a language is an infeasible task. As such, these rule-based chatbots tend to be either very brittle or very limited, handling only a narrow range of common language expressions.
Things changed with the advent of LLMs (this really started with OpenAI’s GPT-2 but took off with GPT-3), which are large deep neural nets trained on very large datasets. The training objective of such models is to predict the next “word” based on all the words that came before it. This objective is simple, but coupled with Internet-scale data and very large models, it enables LLMs to respond to and generate long, coherent texts in ways that previous rule-based systems could not.
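To see what that objective looks like in practice, here’s a brief sketch using the Hugging Face transformers library and the small, publicly available GPT-2 model (chosen for illustration; GPT-3-scale models work the same way with far more parameters):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# The model assigns a probability to every possible next token,
# conditioned on all the tokens that came before it.
ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits
probs = torch.softmax(logits[0, -1], dim=-1)

# Inspect the five most likely continuations
top = torch.topk(probs, k=5)
for p, i in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(i))!r}: {p.item():.3f}")
```

Generating text is just this step applied repeatedly: sample a next token, append it to the input, and predict again.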
To get an LLM to “converse,” one simply gives it a sequence of text that resembles a conversation and asks it to predict what text should come next. Note that the LLM isn’t trying to express a particular meaning or proactively convey a thought; it is just producing the most likely response given the current conversation and all the conversations the model has seen in its training data.
This is not to say that the chatbot will always give the same responses to the same questions. One can easily prompt the chatbot to “converse” in different styles, much like how one can prompt text-to-image AI models to generate the same image in different styles. For example, prompting GPT-3 with “the following is a conversation between 5-year-olds” will probably give very different results from “the following is a conversation between 70-year-olds.” Google probably does something similar with its chatbot LaMDA; the company has demonstrated a user chatting with LaMDA as if it were Pluto.
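A sketch of such persona prompting, again using the small GPT-2 model as a stand-in (GPT-3 and LaMDA are not openly downloadable, but the prompting pattern is the same), might look like this:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The same question, framed with two different persona prefixes
for persona in ("5-year-olds", "70-year-olds"):
    prompt = (
        f"The following is a conversation between {persona}.\n"
        "A: What do you think about space travel?\n"
        "B:"
    )
    out = generator(prompt, max_new_tokens=30, do_sample=True)
    print(out[0]["generated_text"], "\n---")
```

Nothing about the model changes between the two runs; only the conditioning text does.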
What does this all mean?
The point is that LLM-based chatbots can be far more flexible and capable than previous rule-based chatbots, but that doesn’t mean the former are always better than, or preferable to, the latter. As the BlenderBot 3 episode shows, LLM-based chatbots can be difficult to verify and make safe, and there are always scenarios where canned responses are more appropriate than creative ones. Rule-based chatbots are very common these days in applications like customer service, and to my knowledge, no pure LLM-based chatbot, text or voice, is being used in a commercial product. It’s likely that LLMs are already used by companies to understand user inputs (e.g., figuring out whether a user’s question is about sales or technical support), but it’s unlikely that an LLM is directly generating the responses seen by the user.
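For a flavor of that input-understanding step, here’s a sketch using an off-the-shelf zero-shot classifier from Hugging Face (an assumption about tooling, not a description of any specific company’s stack):

```python
from transformers import pipeline

# Classify a user's message into routing intents before any reply is
# written; a rule-based flow with vetted, canned responses takes over from here.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier(
    "My new laptop won't turn on after the latest update.",
    candidate_labels=["sales", "technical support", "billing"],
)
print(result["labels"][0])  # highest-scoring intent, likely 'technical support'
```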
The fact that BlenderBot 3 was released as a research project rather than a product (as OpenAI’s GPT-3 is) suggests that LLM-based “open-language” chatbots are not yet ready for public and commercial use. This isn’t a bad thing; as many expected, “reining in” these models and getting them to stay on topic while avoiding harmful responses will require additional research. The more interesting question to me is whether there actually is a convincing and useful application for a truly open-ended AI chatbot, or whether popular uses will center on the different specializations AI chatbots can take on (e.g., customer service, video game NPCs, education). Regardless, it is a pretty exciting time for AI and language, and we should expect many more advanced chatbots in the near future.
Copyright © 2022 Skynet Today, All rights reserved.