GPT-3, Foundation Models, and AI Nationalism
Geopolitical Implications of the 'Year of Monster Models'
A few weeks ago, Andrey Kurenkov wrote “GPT-3 is No Longer the Only Game in Town,” in which he discussed large pre-trained language models that have arisen since the release of OpenAI’s model. In December, the MIT Technology Review called 2021 “the year of monster AI models.” Among those are language and multi-modal models that have come from China, South Korea, and Israel. In this piece I will discuss these models’ role in a world where AI is becoming an increasingly important national priority for governments around the world.
In his 2018 piece “AI Nationalism,” angel investor Ian Hogarth described a trend where “continued rapid progress in machine learning will drive the emergence of a new kind of geopolitics.” The rush of large language models from different countries following GPT-3’s release feels almost like a geopolitically driven race in which “national champions” – highly concentrated centers of AI talent like OpenAI, Huawei, or DeepMind – develop ever more powerful models. While these large models have thus far been developed largely by private organizations without government involvement, their development within a particular country can still contribute greatly to that nation’s standing in the global AI space. Furthermore, the development of these models is both a product of the AI ecosystems nations have set up and a potential motivator for further investment at the national level. I will focus my discussion on these two aspects – the ambitions of various countries to develop their own ‘monster models’, and why that is important on a national level beyond those models’ direct applications.
Model Diffusion and Model Development
Viewed from the perspective presented in the “AI Nationalism” article, GPT-3 was an incredible statement to the world: the most capable deep learning model developed so far had its home in the United States of America. Developers and the public quickly took note of GPT-3’s myriad capabilities – it could make solving language tasks far easier for developers without deep expertise in training models – and multiple startups even began to build products on top of GPT-3. And language was not the only domain to make strides: GPT-3 was followed by DALL-E, which applied a similar architecture to image generation, and CLIP, which connects images with natural language. It is no surprise that other nations, especially China, would see this as a challenge at best and a threat at worst; a report from Stanford University on “foundation models” – models such as GPT-3, which are “trained on broad data at scale such that they can be adapted to a wide range of downstream tasks” – described them as having potential homogenizing effects, meaning that they would consolidate methodologies for building machine learning systems across different applications and introduce single points of failure.
Thus, the creators of a foundation model would exert considerable influence and power by virtue of their model’s impacts on downstream users–and those users would likely be incentivized to use these foundation models in their applications due to the ease of adapting such a model to a specific task rather than developing a new model from scratch. Should too many foundation models reside in the United States, other nations might be at the US’s mercy: the decisions and mistakes of developers in the US would echo around the world. The phenomenon of multiple state and private actors developing GPT-3-style AI models was dubbed model diffusion by former OpenAI policy director Jack Clark, following what he sees as a “general trend over the course of history… of the diffusion of general-purpose tools.”
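The “ease of adapting such a model to a specific task” often amounts to nothing more than prompting: instead of training a new model, a downstream user writes a handful of labeled examples into the model’s input and lets the pretrained model complete the pattern. A minimal sketch of this few-shot setup (the sentiment task, examples, and helper function here are illustrative assumptions, not from the article or any particular API):

```python
# Sketch: adapting a foundation model to a downstream task via few-shot
# prompting rather than training from scratch. The completed prompt would
# be sent to a hosted model (e.g. GPT-3), which fills in the final label.

def build_few_shot_prompt(examples, query):
    """Concatenate labeled examples and a new query into a single prompt.

    The language model completes the trailing 'Sentiment:' line, so the
    same pretrained model can handle any task expressed in this format.
    """
    lines = []
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

examples = [
    ("The plot was gripping from start to finish.", "positive"),
    ("I walked out halfway through.", "negative"),
]
prompt = build_few_shot_prompt(examples, "A forgettable film.")
print(prompt)
```

This dependence is exactly the leverage point described above: every downstream application built this way inherits the behavior, and the failure modes, of whoever trained the underlying model.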
[ Timeline of existing foundation models ]
A year and a half after the release of GPT-3, we’ve seen actors from multiple large countries join the race, particularly in the latter half of 2021:
Beijing Academy of Artificial Intelligence (BAAI) releases CPM
Huawei (China) announces Pangu-α
Naver (South Korea) unveils HyperCLOVA
BAAI announces WuDao 2.0
AI21 Labs (Israel) announces Jurassic-1
Microsoft Research (USA) announces Florence
Facebook AI Research (USA) introduces FLAVA
DeepMind (UK) announces Gopher
Peng Cheng Laboratory and Baidu (China) release PCL-BAIDU Wenxin