GPT-3, Foundation Models, and AI Nationalism
Geopolitical Implications of the 'Year of Monster Models'
A few weeks ago, Andrey Kurenkov wrote “GPT-3 is No Longer the Only Game in Town,” in which he discussed large pre-trained language models that have arisen since the release of OpenAI’s model. In December, the MIT Technology Review called 2021 “the year of monster AI models.” Among those are language and multi-modal models that have come from China, South Korea, and Israel. In this piece I will discuss these models’ role in a world where AI is becoming an increasingly important national priority for governments around the world.
In his 2018 piece “AI Nationalism,” angel investor Ian Hogarth described a trend where “continued rapid progress in machine learning will drive the emergence of a new kind of geopolitics.” The piling on of large language models from different countries following GPT-3’s release feels almost like a geopolitically driven race in which “national champions” – highly concentrated centers of AI talent like OpenAI, Huawei, or DeepMind – develop ever more powerful models. While these large models have thus far been developed largely by private organizations without government involvement, their development within a particular country can still contribute greatly to that nation’s standing in the global AI space. Furthermore, the development of these models is at least in part a product of the AI ecosystems nations have set up, and it may in turn motivate investment at the national level. I will focus my discussion on these two aspects – the ambitions of various countries to develop their own ‘monster models’, and why that is important on a national level beyond those models’ direct applications.
Model Diffusion and Model Development
Viewed from the perspective presented in the “AI Nationalism” article, GPT-3 was an incredible statement to the world: the most capable deep learning model developed so far had its home in the United States of America. Developers and the public quickly took note of GPT-3’s myriad capabilities–it could make training models to solve language tasks far easier for developers without deep expertise–and multiple startups even began to build products on top of GPT-3. Language was not the only domain to make strides: GPT-3 was followed by CLIP and DALL-E, the latter of which adapted GPT-3’s architecture to power image generation. It is no surprise that other nations, especially China, would see this as a challenge at best and a threat at worst. A report from Stanford University on “foundation models” – models such as GPT-3, which are “trained on broad data at scale such that they can be adapted to a wide range of downstream tasks” – described them as having potential homogenizing effects, meaning that they would consolidate methodologies for building machine learning systems across different applications and introduce single points of failure.
Thus, the creators of a foundation model would exert considerable influence and power by virtue of their model’s impacts on downstream users–and those users would likely be incentivized to use these foundation models in their applications due to the ease of adapting such a model to a specific task rather than developing a new model from scratch. Should too many foundation models reside in the United States, other nations might be at the US’s mercy: the decisions and mistakes of developers in the US would echo around the world. The phenomenon of multiple state and private actors developing GPT-3-style AI models was dubbed model diffusion by former OpenAI policy director Jack Clark, following what he sees as a “general trend over the course of history… of the diffusion of general-purpose tools.”
[ Timeline of existing foundation models ]
A year and a half after the release of GPT-3, we’ve seen actors from multiple large countries join the race, particularly in the latter half of 2021:
Beijing Academy of Artificial Intelligence (BAAI) releases CPM
Huawei (China) announces Pangu-α
Naver (South Korea) unveils HyperCLOVA
BAAI announces WuDao 2.0
AI21 Labs (Israel) announces Jurassic-1
Microsoft Research (USA) announces Florence
Facebook AI Research (USA) introduces FLAVA
DeepMind (UK) announces Gopher
Peng Cheng Laboratory and Baidu (China) release PCL-BAIDU Wenxin
Let us examine this trend more closely. Two Chinese organizations have already answered GPT-3 with models of their own. In April 2021, Huawei announced the “Chinese equivalent” of GPT-3: the 200 billion parameter Pangu-α. In June 2021, the Beijing Academy of Artificial Intelligence announced the release of WuDao 2.0, a model that is not only far larger than GPT-3, but also multi-modal: it can perform natural language processing, text generation, image recognition, and image generation.
Of course, there was more. South Korea’s Naver released the 204 billion parameter HyperCLOVA, which the company said it plans to develop into a multi-modal model that can understand videos and images in the future. The latest in this line of large transformer-based language models is Gopher, developed by DeepMind, an interesting case because, while the lab resides in London, it was acquired by Google years ago.
Lastly, as CLIP, DALL-E, WuDao 2.0, and Naver’s plans for HyperCLOVA indicate, text isn’t the only area that is likely to be affected by foundation models. In November 2021, Microsoft Research released Florence, dubbed “A New Foundation for Computer Vision.” Expanding on what WuDao 2.0, CLIP, and DALL-E had already done, Florence is designed to handle an even larger variety of visual data: images at different scales, both images and videos, and additional signals such as depth.
The development of all these models in so short a time span indicates that, while national competition might not yet be a reality, technological powerhouses from different countries have joined the fray, and that this trend will likely continue. The flourishing of national AI strategies, with their focus on “national champions” and support of R&D, only adds further reason to believe that nations may support the development of these models and the organizations that create them.
Power as a Service
A new foundation model might be developed within one country or another, but what does that actually mean? A key aspect of foundation models is their downstream capabilities. In “On the Opportunities and Risks of Foundation Models,” Stanford researchers highlighted that foundation models can be adapted to a range of downstream tasks; for instance, GPT-3 could be leveraged for tasks from text summarization to translation. As I mentioned earlier, this can make it easier for developers to create new AI systems. But it has other implications as well: the decisions made by those who developed the foundation model will manifest in those downstream applications, and the use of just a few foundation models to power a variety of AI systems implies a homogenization effect.
This is not to say that a single foundation model will rule the day for each AI domain, but there might just be a market analogy here. Countries and research groups may wrestle for dominance with ever-better foundation models that they offer to developers who would like to use them.
This has already begun to occur. When GPT-3 launched, OpenAI offered it through an API in limited release; it has since been made generally available. Since then, a number of other foundation models have been offered through similar means. EleutherAI’s reproductions of GPT-3 are available as open source through Hugging Face. Following its collaboration with Microsoft to build the 530 billion parameter Megatron model, NVIDIA has announced its own enterprise solution for large language model pre-training: NVIDIA NeMo. Other vendors, such as SambaNova Systems and Cohere, also offer their own language model APIs.
It is notable that most of these examples hail from the US, but that may not remain the case. China’s WuDao 2.0 remains mysterious, but Inspur, which developed the “Chinese GPT-3 equivalent” Yuan 1.0, plans to open the model as an API as well. Cedille, a French language model based on EleutherAI’s GPT-J, is also available through an API. Another researcher has open-sourced a number of language models, including GPT and BERT, in Korean.
There may be reasons to develop a foundation model besides displaying AI prowess. As I have begun to point out, different nations might seek “market dominance” through offering APIs, hoping that their models will be adopted in many different downstream uses. Of course, the fact that different nations are developing models in their native languages does not entirely support this: a nation or vendor seeking to establish market power internationally would seek to develop language models in multiple languages rather than only in its own native language.
This doesn’t preclude such a scenario, but it’s not exactly clear if and how language barriers might affect the possibilities. First, it has not even been two years since GPT-3 was released, and we are still in the early stages of seeing what auxiliary developments come out of it. Second, foundation models in other domains such as vision would not be restricted by language barriers insofar as they are isolated from language. However, multimodal foundation models such as DALL-E would be subject to such a barrier. Furthermore, given the ubiquity of English text on the internet, it would not be much of a problem to find English text to train a model on as well–WuDao 2.0 was trained on both English and Chinese text.
A secondary effect of developing a foundation model is the statement it makes about the AI talent and capabilities of the country where it was built. Training such a model requires great AI and engineering talent, not just to develop a high-performing model but to build the infrastructure to train it in the first place. Therefore, these developments could also be taken as a statement about a nation’s R&D capabilities and investment in its AI ecosystem. As I noted in my previous piece, establishing talent pipelines is an important piece of a national AI strategy. Displaying the ability and inclination to build state-of-the-art models is one potential way to attract AI talent, which might be drawn towards a nation with prospects for a strong AI ecosystem in the future.
Going Big–or Small?
Insofar as foundation models represent a demonstration of the current capabilities and future prospects of a nation’s AI ecosystem, they do so in more ways than one. As noted in “On the Opportunities and Risks of Foundation Models,” these models are powerful not just because of novel architectures like the Transformer, but because of the scale provided by improvements in hardware and the availability of more training data. If growing model size remains the main route to state-of-the-art results, national governments may feel compelled to provide support so that more actors can create these models. Even if the trend does not continue, foundation models may still influence future AI developments and how nations think about developing their ecosystems.
[ Growth in language model size since 2018’s BERT ]
Even with improved hardware, massive computational power (and data) as well as science and engineering talent are necessary to realize these models. Understandably, then, only a select number of private actors have been able to develop foundation models. While foundation models might democratize the development of AI systems more broadly, they do make many AI developers “customers” of the original developers of a foundation model.
While large models–and the necessary compute and resources associated with them–might imply a sort of AI nationalism in their pursuit, we do not yet know if large models will become the norm. Model compression and the development of new architectures like RETRO (which achieves performance comparable to GPT-3 with 25x fewer parameters) are likely to combat and may even reverse the trend.
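To put the “25x fewer parameters” figure in perspective, here is a rough back-of-the-envelope sketch. It assumes GPT-3’s widely reported size of 175 billion parameters, which is not stated in this piece; the other sizes are the ones quoted above, and the variable and function names are mine, purely for illustration.

```python
# Back-of-the-envelope comparison of model sizes, in billions of parameters.
# ASSUMPTION: GPT-3's size (175B) is the widely reported figure; it is not
# stated in the article itself. The other sizes are quoted from the article.
GPT3_PARAMS_B = 175.0

ARTICLE_MODELS_B = {
    "Pangu-α": 200.0,
    "HyperCLOVA": 204.0,
    "Megatron (Microsoft/NVIDIA)": 530.0,
}

# "25x fewer parameters" is the article's figure for RETRO relative to GPT-3.
RETRO_REDUCTION_FACTOR = 25.0


def size_relative_to_gpt3(params_b: float) -> float:
    """Return a model's parameter count as a multiple of GPT-3's."""
    return params_b / GPT3_PARAMS_B


def implied_size(params_b: float, reduction_factor: float) -> float:
    """Parameter count (in billions) after shrinking by `reduction_factor`."""
    return params_b / reduction_factor


relative_sizes = {
    name: size_relative_to_gpt3(params) for name, params in ARTICLE_MODELS_B.items()
}
retro_scale_b = implied_size(GPT3_PARAMS_B, RETRO_REDUCTION_FACTOR)

for name, ratio in relative_sizes.items():
    print(f"{name}: {ratio:.2f}x GPT-3")
print(f"A model 25x smaller than GPT-3: about {retro_scale_b:.0f}B parameters")
```

Under that assumption, the models discussed earlier sit at or above GPT-3’s scale, while a 25x reduction lands at roughly 7 billion parameters. This says nothing about the data or retrieval infrastructure such a model needs, but it illustrates why smaller architectures could change who is able to compete.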
But researchers like Joelle Pineau think that regardless of the large model trend, AI research will continue to be dominated by large companies, at least in the US. Given the continued hearings involving big tech companies on Capitol Hill, politicians might view these companies’ goals as at odds with their own and with the US’s. If this attitude persists, the US might make an effort to avoid the concentration of technological power in the hands of companies already capable of developing foundation models; one idea along this front was the National Research Cloud. That is to say nothing of other nations: the problem of supporting national champions or other organizations in the development of powerful AI systems comes down to the incentives and goals in place for various stakeholders.
Whichever direction AI research takes in the future, the influence of foundation models is likely to linger. In a world where potential uses of AI abound, allowing more people to develop AI systems with less expertise seems like a positive development. At present, foundation models appear to be one of the best ways of achieving this democratization.
But foundation models do have pitfalls and potential implications for the global AI landscape. Given the surprising effectiveness these models have displayed, many people outside the AI research community know GPT-3 by name and would like to leverage it. The business and research interest in these models presents a compelling case for nations to invest in building up foundation models that can bolster their domestic AI ecosystems by providing a centerpiece on which to develop a wide range of tools.
But the implications of foundation models seem to go beyond business and research interests. The development of powerful models seems to be clearly tied to questions of AI nationalism and an “AI arms race.” Whatever direction AI takes, national governments are likely to pay more attention after the developments of the past year.
About the Author:
Daniel Bashir is a machine learning engineer at an AI startup in Palo Alto, CA. He graduated with his Bachelor’s in Computer Science and Mathematics from Harvey Mudd College in 2020. He is interested in computer vision, ML infrastructure, and information theory.