The Impact and Future of ChatGPT
What really powerful foundation models and generative models mean for AI researchers, tech, and the world at large.
It’s been a couple of months since OpenAI released ChatGPT, the large language model chatbot that not only blew the minds of many AI researchers but also introduced the power of AI to the general public. Briefly, ChatGPT is a chatbot that can follow human instructions to do tasks ranging from writing essays and poems to explaining and debugging code. It displays impressive reasoning capabilities and performs significantly better than prior language models.
In this editorial, I will discuss my personal takes on ChatGPT’s impact on three groups of people: AI researchers, Tech developers, and the general public. Throughout the piece, I will speculate on the implications of technologies like ChatGPT and outline some scenarios that I believe are likely to happen. This article leans more toward an op-ed than a fact-based report, so take these views with a grain of salt. With that, let’s dive in…
ChatGPT for AI Researchers
To me, an AI researcher, the most important lesson of ChatGPT is that curating human feedback is crucial for improving the performance of large language models (LLMs). ChatGPT changed my view, and I suspect that of many other researchers, on the problem of AI alignment for LLMs, which I’ll explain now.
Before ChatGPT, I intuitively thought that we had two different problems when it comes to LLMs: 1) improving LLMs’ performance on certain language-based tasks (e.g. summarization, question answering, multi-step reasoning), while 2) avoiding harmful/toxic/biased text generations. I thought of these two objectives as related but separate, and called the second problem the alignment problem. ChatGPT taught me that alignment and task performance are really the same problem: aligning LLMs’ outputs with human intent both reduces harmful content and improves task performance.
For a bit of context, on a very high level, one can think of separating modern LLM training into 2 steps:
Step 1: Self-Supervised Learning (SSL) of a neural network model to predict the next word (token) given a sequence of previous words (tokens) - this is trained on a very large, Internet-scale dataset.
Step 2: Aligning the LLMs’ generations with human preferences through various techniques, like fine-tuning the LLM on a small dataset of high-quality instruction-following texts and using Reinforcement Learning to fine-tune the LLM with a learned reward model that predicts human preferences.
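To make step 1 concrete, here is a deliberately tiny sketch of the self-supervised next-token objective. Real LLMs train a Transformer with a cross-entropy loss over billions of tokens; this toy bigram counter only illustrates the core idea that the training signal is "predict the next token," extracted for free from raw text:

```python
from collections import Counter, defaultdict

def train_next_token(corpus_tokens):
    """Count next-token frequencies: a toy stand-in for the
    self-supervised next-token objective LLMs are trained on."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus_tokens, corpus_tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent continuation seen in training."""
    return counts[token].most_common(1)[0][0]

tokens = "the cat sat on the mat and the cat slept".split()
model = train_next_token(tokens)
print(predict_next(model, "the"))  # "cat" - seen twice after "the", vs. "mat" once
```

The labels come from the data itself (the next word), which is why this step can scale to Internet-sized corpora without human annotation.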
For ChatGPT, OpenAI likely used many different techniques in tandem with each other to produce the final model. It also seemed like OpenAI was able to quickly respond to online complaints of misaligned behaviors (e.g. generating harmful texts), sometimes in days if not hours, so the company must also have ways of modifying/filtering model generations without actually retraining/fine-tuning the model.
ChatGPT marks a quiet comeback for Reinforcement Learning (RL). Briefly, Reinforcement Learning from Human Feedback (RLHF) first trains a reward model that predicts how highly a human would score a particular LLM generation, and then uses this reward model to improve the LLM through RL.
I won’t go into too much in detail about RL here, but OpenAI has traditionally been known for its RL prowess, having authored OpenAI gym which jump-started RL research, trained RL agents to play DoTA, and famously trained robots to play the Rubik’s cube using RL on millions of years of simulation data. After OpenAI disbanded its robotics team, it seemed like RL was fading into the background for OpenAI, as its achievements in generative models came mostly from Self-Supervised Learning. The success of ChatGPT, which hinges upon RLHF, is bringing new attention to RL as a practical method for improving LLMs.
ChatGPT also shows how it will be increasingly difficult for academia to develop scale-enabled AI capabilities moving forward. While this trend has been visible throughout the era of deep learning, ChatGPT makes it all the more entrenched. Not only is training the base GPT-3-type model out of reach for smaller labs (it’s no coincidence that GPT-3 and subsequent OpenAI developments happened after Microsoft threw the full weight of Azure behind OpenAI, building out dedicated server farms and supercomputers), but the data collection and RL fine-tuning pipeline that led to ChatGPT is probably also too systems/engineering heavy for the appetite of academic labs.
Making ChatGPT freely available to the general public allows OpenAI to collect additional invaluable training data, which are pivotal to its future LLM improvements. In this way, publicly hosting ChatGPT is essentially a large-scale data collection exercise for OpenAI, and it’s not something smaller organizations can afford.
Open-source efforts and large-scale collaborative academic partnerships with companies like HuggingFace and Stability may be how academia moves forward at this point, but such organizations will always move slower than smaller teams with bigger budgets. I speculate that when it comes to state-of-the-art language models, open-source will typically lag behind companies by a few months to a year.
I believe the only way we may see the tide turning in favor of academia is if there are national-level computing clouds dedicated to academic AI research. This will no doubt cost billions of dollars and require dedicated administrative and engineering staff. Such an endeavor isn’t too far out of left field - it would be analogous to the James Webb Space Telescope and the Large Hadron Collider. In the U.S., some are already calling for national AI clouds that do LLM inference, but the ability to train and fine-tune LLMs and other foundation models is just as important. Given the national strategic importance of AI, we may actually see developments in this direction in the near future.
At the same time, AI researchers don’t always have to train big models to make big impacts. In my opinion, instead of racing for the next biggest and best LLM, smaller academic labs could focus on improving the use of existing LLMs, analyzing their strengths and weaknesses, and taking advantage of the fact that companies are hosting these very powerful LLMs at very low cost. For example, research on LLM alignment can be conducted using the LLM APIs available from OpenAI and other companies, without an academic lab needing to train these models from scratch. Cheap, public access to powerful LLMs is enabling a whole suite of research that discovers new capabilities and applications of LLMs.
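As one illustration of API-only alignment research, a lab could probe how often a hosted model refuses certain prompts. The sketch below works against any `generate(prompt) -> str` callable (e.g. a thin wrapper around a provider's API); the stub model and the keyword-based refusal detector are illustrative assumptions, not any provider's actual behavior:

```python
def audit_refusals(prompts, generate):
    """Probe a hosted LLM (any generate(prompt) -> str callable) and
    tally how often it refuses. Keyword matching is a crude,
    illustrative refusal detector - real studies use better classifiers."""
    refusal_markers = ("i can't", "i cannot", "i'm sorry")
    refused = 0
    for p in prompts:
        response = generate(p).lower()  # one API call per probe prompt
        if any(marker in response for marker in refusal_markers):
            refused += 1
    return refused / len(prompts)

# Stub model so the sketch runs offline; swap in a real API wrapper.
def stub_model(prompt):
    return "I cannot help with that." if "bypass" in prompt else "Sure, here's how..."

rate = audit_refusals(["How do I bypass a paywall?", "How do I bake bread?"], stub_model)
print(rate)  # 0.5 - one of the two probes was refused
```

Nothing here requires training a model: the entire experiment runs against inference APIs, which is exactly the kind of research that stays within an academic budget.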
ChatGPT for Tech
For those working in and developing products for tech, ChatGPT and similar code-writing models present significant first-order and second-order effects. For coders, using AI-based code completion and ChatGPT-style question answering for learning to code and understanding an existing codebase will become indispensable parts of software engineering workflows. I speculate that within a year from now, many universities will offer Computer Science courses that teach best practices for leveraging AI in software engineering, among other applications.
ChatGPT and more capable AI code assistants will force a fundamental reformulation of the abstraction levels that software engineers operate on. Most software engineers do not need to reason about low-level machine code, because we have very powerful compilers that turn human-readable code, like C++, into code that machines read. Software engineers may learn the inner workings of such compilers, as well as how to write code that best leverages their features and advantages, but they do not write machine code themselves, nor do they write their own compilers.
It’s likely that coding AIs will act as the new “compilers” that translate high-level human instructions into low-level code, just at a higher abstraction level. Future software engineers may write high-level documentation, requirements, and pseudocode, and ask AI coders to write the middle-level code that people write today. In this way, I don’t see software engineers being replaced by AI so much as pushed up the value chain. Skilled software engineers of the future may need to understand the strengths and weaknesses of different coding AIs and how best to structure and adapt them for a particular application domain.
The above are first-order effects where ChatGPT directly affects how tech people, particularly software engineers, work. The second-order effects on what tech products can offer are likely to be more profound. ChatGPT and similar LLMs enable new products by 1) unlocking completely new capabilities and 2) lowering the cost of existing capabilities so much that they suddenly make economic sense.
An example of 1) is that we can now add a natural-language user interface to any piece of software by simply letting an AI coder translate language instructions into code that calls that software’s APIs (our latest paper is one such example). Doing this in a trustworthy and generalizable way will require a lot of effort, and as always with launching real products, the devil is in the details. Still, this is a fundamentally new capability, and I expect an explosion of natural-language software UIs across all software platforms, especially those where traditional UIs feel clunky and inconvenient (e.g. mobile, voice assistants, VR/AR). It’s honestly hard to imagine developing a new app in the era of LLMs without incorporating a language-based UI. The bar for entry is so low (you just need to call a publicly accessible LLM API), and if you don’t do it, your competitor will, and will deliver a superior user experience.
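The glue code for such a language UI can be surprisingly thin. A minimal sketch, assuming (purely for illustration) that the LLM was prompted to translate the user's instruction into a JSON call against a whitelisted API - the `set_volume`/`play_song` functions and the JSON schema are made up for this example:

```python
import json

# Whitelisted software API the language UI is allowed to drive (illustrative).
API = {
    "set_volume": lambda level: f"volume set to {level}",
    "play_song": lambda title: f"now playing {title}",
}

def dispatch(model_output):
    """Validate and execute the API call an LLM proposed for a user's
    natural-language instruction. Assumes the LLM was prompted to answer
    with JSON of the form {"call": ..., "args": {...}}."""
    request = json.loads(model_output)
    if request["call"] not in API:
        raise ValueError(f"unknown API call: {request['call']}")
    return API[request["call"]](**request["args"])

# e.g. the model translated "turn it down a bit" into:
print(dispatch('{"call": "set_volume", "args": {"level": 30}}'))  # volume set to 30
```

The whitelist-and-validate step is where the "trustworthy" part lives: the model proposes, but only vetted calls execute.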
Lowering the cost of existing capabilities doesn’t sound as sexy as unlocking fundamentally new ones, but it’s just as important. There may have been many promising applications for LLMs where the cost of fine-tuning for the downstream task was too high to be worth the investment. With ChatGPT and improved instruction following, developers may no longer need to collect large datasets for fine-tuning and can instead rely on zero-shot performance. Expect a plethora of “small-scale” LLM deployments adding text classification, summarization, and in-line prediction capabilities across many existing apps that deal with text inputs. These marginal improvements in UX may not have been worth the ROI before, but all of a sudden they are now.
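The zero-shot recipe really is as lightweight as it sounds: instead of collecting a labeled dataset and fine-tuning, a developer writes a prompt template. A sketch, with wording that is illustrative rather than tuned - real deployments iterate on the template and parse the model's reply:

```python
def zero_shot_classify_prompt(text, labels):
    """Build a zero-shot classification prompt - no fine-tuning dataset
    needed; the instruction itself specifies the task and label set."""
    label_list = ", ".join(labels)
    return (
        f"Classify the text into exactly one of: {label_list}.\n"
        f"Text: {text}\n"
        f"Label:"
    )

prompt = zero_shot_classify_prompt(
    "The battery died after a day.", ["positive", "negative"]
)
print(prompt)
```

Sending this prompt to an instruction-following LLM and reading the completion replaces what used to be a data collection and fine-tuning project, which is precisely why so many marginal features now clear the ROI bar.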
Low cost also means there is a lot of low-hanging fruit in the business of applying LLMs and other foundation models: creating value for consumers through good UI/UX, integrations with existing software products, and efficient go-to-market and monetization strategies. Lensa is an example that checks all of these boxes. These more practical aspects of LLM deployment will often outweigh the absolute performance of the underlying model, and successful startups can always swap an older LLM for a new and improved version. This also means that those who apply LLMs should not tie their tech stack too closely to the peculiarities of a specific LLM. The fast improvement cycle of LLMs, coupled with publicly accessible APIs and the fact that the key business differentiators are not the models themselves, will likely mean that LLMs become commoditized.
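One way to avoid tying a product to a specific LLM is to write features against a thin, provider-agnostic interface. A minimal sketch - the vendor classes here are placeholders standing in for real API client wrappers, not actual libraries:

```python
class LLMBackend:
    """Minimal provider-agnostic interface. Concrete backends would wrap a
    vendor's API client; the names below are placeholders for illustration."""
    def complete(self, prompt: str) -> str:
        raise NotImplementedError

class VendorA(LLMBackend):
    def complete(self, prompt):
        return f"[vendor-a] answer to: {prompt}"

class VendorB(LLMBackend):
    def complete(self, prompt):
        return f"[vendor-b] answer to: {prompt}"

def summarize(text: str, llm: LLMBackend) -> str:
    """Product feature written against the interface, not the vendor, so a
    newer or cheaper model can be swapped in without touching app code."""
    return llm.complete(f"Summarize: {text}")

print(summarize("a long article...", VendorA()))
```

Swapping `VendorA()` for `VendorB()` changes nothing else in the application, which is exactly the position you want to be in if the underlying models are commoditized.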
There will be two types of tech companies moving forward - those that can afford to train and run their own foundation models, and those that cannot, with the latter paying a foundation-model tax to the former. This sounds dramatic, but it isn’t so different from what we have today, where tech companies either host their own servers or pay a tax to AWS/Azure/GCP. The AI cloud business will be a key battleground for the future of cloud platforms and will give competitors opportunities to overtake incumbents. For example, there’s a high probability that Azure, with Microsoft’s experience with and integration of OpenAI, will overtake the others with its AI cloud offerings (the company has already released OpenAI’s models on Azure, well ahead of its competitors at Amazon and Google).
Lastly, on more speculative ground, foundation models built on deep learning may allow us to avoid the negative consequences of a slowing Moore’s law for quite a bit longer. As these models get more capable, they will take over more and more tasks that were done by traditional software, meaning more and more software can be improved merely by optimizing the performance of neural networks. Neural nets run on GPUs and application-specific chips, and improvements in their performance do not show the slowdowns apparent in traditional CPUs, which is roughly what a slowing Moore’s law captures. We’re really lucky that a single neural network architecture, the Transformer (used by ChatGPT and other foundation models), can represent general-purpose computation and be trained to perform so many different tasks so well. We’re nowhere near the end of optimizing Transformer performance, so I expect computers to generally get faster as LLMs become more powerful and replace more complicated traditional software stacks.
ChatGPT for the General Public
ChatGPT was the first AI technology that many members of the general public directly interacted with. Sure, before ChatGPT there were Siri and Alexa, and deep learning was already ubiquitous in many commercial applications. The difference is that previously, deployed AI tech often worked in the background, “filtered” through layers of traditional software and limited UIs. With ChatGPT, the public has a much more direct experience of AI: a user gives inputs to an LLM and directly sees its outputs (OpenAI filters for harmful content and wraps user inputs in its own prompt, so it’s not quite direct interaction with the underlying model, but close enough). ChatGPT is also significantly more powerful than previous chatbots. Coupled with the fact that the service has so far been free, these factors propelled ChatGPT into the mainstream conversation.
This directness makes the AI novelty/hype much more real for the general public than prior news about AI. I can imagine that, all of a sudden, the claim that chatbots could be conscious doesn’t sound too far-fetched to those unfamiliar with how LLMs work. This also points to a painfully lacking aspect of science communication when it comes to AI - I would argue the AI community is doing a very poor job of reaching out to and educating the broader public about how LLMs work, what they can and can’t do, and how to think about using such AI technologies responsibly. Heck, we can’t even claim that people in tech know the basics about LLMs, let alone the general public, who will make up the vast majority of the end users impacted by this technology. A continued failure to educate and communicate about AI over the next couple of years may prove calamitous as ChatGPT-like models make their way into mission-critical applications without the proper precautions.
Or… it may not… in the sense that maybe the best way to educate folks about a new piece of technology is by letting the public openly experiment with the technology and its applications, experience its failures, and iteratively debate and refine the popular view. The accessibility of this wave of foundation models, especially the free-use precedent set by ChatGPT, can keep the public more informed about AI through hands-on experience, in turn leading to more informed understanding and discourse.
Merely months after the release of DALL-E 2, the first really good text-to-image generative model, we’re already seeing a diverse set of policy reactions from companies and communities trying to adapt to this new reality, from outright banning AI art to incorporating the sale of AI-generated stock images. As for ChatGPT, some academic conferences (as well as some schools) are banning its use, while other academics are listing it as a co-author. There are also quite a few ongoing lawsuits surrounding generative AI. It’s unclear at this moment what the legal and ethical ways of using these models are, but it is clear that these small-scale experiments with AI-use policy are really important for the public to figure things out. Personally, I think this is a good direction, as I believe public policy should be determined by public discourse and not by obscure committees at whichever tech company hosts these models.
One last thought on applications of ChatGPT and similar foundation models: tech deployment always takes longer than tech innovation (although adoption is speeding up), and while one can build an impressive LLM demo over a weekend, it still takes a lot of work and trial and error to build reliable, scalable products that bring value to consumers. Within tech, we may see a tsunami of generative AI apps in 2023, but I expect them to diffuse through the general public much more slowly. Many factors will slow down mass-scale generative AI adoption: the inertia of existing systems and products, cultural barriers stemming from the perception of AI replacing human workers, running costs that don’t make economic sense in a lot of applications, the low reliability and trustworthiness of LLM outputs, and the challenge of scaling up LLM compute infrastructure to serve billions of queries in real time. None of these challenges will be overcome overnight, or even over months. But they will be, eventually, and the world five years from now will look very different.
What about the future?
If there’s anything we learned in the last 10 years of deep learning, it’s that it’s really hard to make accurate predictions about AI, both its development and its deployment. However, I can say with confidence that ChatGPT is merely a small preview of what’s to come. For the future of foundation models, there are two directions in which I’ve seen promising progress and that I think will see breakthroughs this year or next: 1) a ChatGPT-level foundation model that is truly multimodal (e.g. text, audio, image, 3D, motions, video, files) and 2) foundation models that are designed to take actions in an environment.
For 1), imagine a ChatGPT-like interface but you can upload not just text but also audio, images, videos, 3D models, as well as other structured files, and have it “understand”, analyze, manipulate, and generate such content. Bits and pieces of such technology already exist today, and it seems straightforward to incorporate all these modalities into one model.
For a more extended overview of recent advances in text-conditioned generative models, see our previous editorial.
For 2), in the near future it seems plausible that a foundation model could reliably interact with a computer through a keyboard and mouse to perform many everyday tasks that humans perform today. We have some evidence that this is doable, from startups targeting robotic process automation to researchers trying to train AI agents that complete open-ended objectives in Minecraft. Developing such action-oriented foundation models for physical robots instead of virtual agents will be more difficult, but progress is already underway.
With regard to commercialization, on the one hand, big tech has the ability to leverage its vast compute resources to train really powerful models. On the other hand, public/open-source models will also become really popular and easy to use, so I’m not sure having your own model is a big advantage for a lot of applications. As previously mentioned, foundation models will likely be commoditized. As such, it may be natural for big tech companies that already own devices/OSes to develop LLM-suitable platforms that let others build new applications on top of foundation models, instead of directly competing to build those applications (imagine an OS for mobile/AR/VR/desktop/web that’s specially tailored for multimodal or action-oriented foundation models).
Lastly, looking ahead, within the next 5 years we may leave the “get free data from the Internet” regime that really propelled recent progress in foundation models. While custom data will always be needed for domain-specific fine-tuning/alignment (whether through traditional supervised learning or RLHF), pretraining powerful models on large-scale “free” data undoubtedly led to the success of GPT and similar models. It will be interesting to see how the community pivots beyond merely scraping existing digital data to improve foundation model performance. We will surely keep improving models through better training and alignment techniques, but what’s the next frontier for large-scale self-supervised learning? Where will the next 10 or 100 trillion data points come from? I’m excited to find out.
Copyright © 2023 Skynet Today, All rights reserved.