

How In-Context Learning Emerges
In-context learning is the most exciting capability exhibited by Large Language Models. How does it work and where does it come from?
TL;DR In-Context Learning (ICL) is an emergent capability of Large Language Models (LLMs) that allows them to learn new tasks on the fly without further training. This ability was first observed in GPT-3 and has subsequently been observed in other LLMs as well. Although the origins of ICL were initially mysterious, recent research has shed light on its ingredients and shown how the model architecture, model scale, and training data distribution all play important roles in allowing ICL to emerge.
What is In-Context Learning (ICL)?
AI has come a long way in recent years (and months!), with systems like ChatGPT demonstrating impressive abilities in solving a wide variety of language-based tasks. Before LLMs, most AI models were limited by the data they were trained on - they could only perform tasks they had been explicitly optimized for through training. GPT-3 and subsequent LLMs have been able to do something more powerful: learn new tasks and skills simply from new examples in the input, without any gradient updates or changes to the pretrained model. This ability is known as in-context learning (ICL).
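To make this concrete, here is a minimal sketch of what an ICL prompt looks like. The toy task and the commented-out generate() call are purely illustrative placeholders, not any particular model or API:

```python
# A minimal sketch of an in-context learning prompt: the "training data"
# for the new task lives entirely in the input text, and the model's
# weights are never updated. The task (mapping animals to sounds) and
# the generate() call are illustrative placeholders only.

few_shot_prompt = """\
Input: cow   -> Output: moo
Input: cat   -> Output: meow
Input: dog   -> Output: woof
Input: duck  -> Output:"""

# response = some_pretrained_llm.generate(few_shot_prompt)
# A sufficiently large model completes the pattern (" quack") by
# inferring the rule from the examples, not from any gradient update.
print(few_shot_prompt)
```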

Why is ICL exciting?
In-context learning has enormous potential to unlock more flexible, general, and human-like intelligence in AI systems. Some reasons it is generating so much interest:
Versatility - With ICL, a single model can learn a wide variety of skills at the same time, instead of needing separate training for each one.
Generalization - ICL allows models to learn underlying rules and patterns from just a few examples, and generalize them to new situations.
Efficiency - No lengthy or costly re-training of models is needed. Skills can be acquired instantly.
Accessibility - ICL enables AI systems that can be taught by everyday users through simple demonstrations of the task.
In short, ICL enables LLMs to become powerful systems that can continually learn, reason, and adapt to new tasks. But how does ICL work and where does it come from?
How Does In-Context Learning Work?
Recent research has revealed 3 key factors that enable and enhance in-context learning abilities in large language models, and we’ll go through each one.
1) Model Architecture
The basic model architecture plays a critical role in enabling ICL. In particular, Transformers, a type of neural network architecture that uses a learned attention mechanism, can exhibit ICL, while models using prior architectures like Recurrent Neural Networks or Multilayer Perceptrons do not.

What’s so special about Transformers that leads to the emergence of ICL? Recent research discovered that Transformers can implement learning algorithms, like linear regression and gradient descent, in their forward pass when the right data and scale conditions are met (see below). In other words, Transformers can be trained to learn how to solve new tasks, instead of just learning how to solve a particular task at training time. This ability to “learn to learn” is what enables in-context learning in LLMs that are powered by Transformers.
How is this possible? Recent theoretical analyses have shown that the Transformer’s attention mechanism has a dual form of gradient descent. In simple terms, under certain conditions the mathematics of the attention computation is equivalent to the mathematics of performing gradient descent, the general optimization algorithm that trains neural nets. Calculating attention over the in-context examples is, in this view, the same as performing a step of gradient-based learning on them. So, in theory, it is not surprising that Transformers can learn to learn.
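As a rough illustration of this dual-form argument (in the spirit of the theoretical work described above, not a proof), here is a toy numerical check: unnormalized linear attention over in-context (key, value) pairs computes exactly the same quantity as applying a one-step gradient-descent weight update to the query. The dimensions and random data are arbitrary, and real softmax attention only approximately matches this picture:

```python
# Toy check of the "dual form" idea: linear attention over in-context
# (key, value) pairs equals applying a one-step weight update to the query.
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 8                      # feature dim, number of in-context examples
K = rng.normal(size=(n, d))      # keys   ~ in-context inputs x_i
V = rng.normal(size=(n, d))      # values ~ per-example update signals e_i
q = rng.normal(size=d)           # query  ~ test input

# Linear attention: sum_i v_i * (k_i . q)
attn_out = V.T @ (K @ q)

# Gradient-descent view: a weight update dW = sum_i e_i x_i^T,
# applied to the query: dW @ q
dW = V.T @ K
gd_out = dW @ q

print(np.allclose(attn_out, gd_out))   # True: the two computations coincide
```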
But having the right model architecture is not enough - the scale of the Transformer and the type of data it’s trained on also matter.
2) Model Scale
Recent research has shown that bigger models (in terms of how many parameters the neural network has) learn faster, more robustly, and can handle more complex tasks when learning in context. Beyond just observing that bigger LLMs perform better at various ML benchmarks, researchers have observed 4 concrete benefits of increasing model size:

Bigger models give more rule-based generalization instead of exemplar-based generalization. What does this mean? Recall that ICL works by a user giving the model a bunch of example questions and answers at runtime in order for the model to learn the task. Smaller Transformers can still perform ICL, but they tend to answer new questions by finding the closest question(s) among the provided examples and predicting the answer from the answers of those matching questions. This works, but it is a fairly shallow form of learning. By contrast, large Transformers can infer the underlying rule that maps questions to answers. Given a new question, they apply this learned rule to predict the answer, which is much more capable and leads to better performance.
Bigger models can override prior semantic knowledge during ICL. LLMs pick up a lot of knowledge about the world from the data they were trained on. After training, LLMs can leverage their learned semantics of language (i.e. the widely accepted meanings of words) to perform tasks. However, for some tasks, it may be preferable for the LLM to learn updated or new semantics from the in-context examples. Experiments have shown that using in-context examples to override existing semantic knowledge is an emergent property that only exists in bigger models, not smaller ones. This means that bigger models are much more versatile than smaller models when it comes to learning new tasks.
Bigger models allow more complex ICL algorithms. Researchers have “reverse-engineered” the specific machine learning algorithms that Transformers learn to perform in-context. What they found is that bigger models implement more complex learning algorithms in-context, and can thereby solve more complex tasks. In a linear regression setting, Transformers with 1-2 layers can learn to perform one step of gradient descent. With 4-8 layers they can perform a regularized variant of linear regression called Ridge Regression. With 12+ layers they can learn to perform Ordinary Least Squares (see the sketch after this list). I won’t go into these algorithms in detail, but suffice it to say that we have ways of understanding how exactly Transformers do ICL, and concrete evidence that how they do it becomes more sophisticated as model size scales.
Bigger models are more robust to noise. Bigger Transformers, when trained on noisy data, automatically learn learning algorithms that are robust to that noise. In the same work that studied which algorithms Transformers implement for linear regression, the authors found that when bigger models are trained on noisy data, the Transformers learn to behave like Bayesian estimators, which take the uncertainty of the underlying data distribution into account when making predictions. This robustness to noise is much stronger in larger Transformers.
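To give a flavor of the setup behind these reverse-engineering results, here is a sketch of the in-context linear regression task and the three reference algorithms mentioned above. The learning rate, ridge penalty, and problem sizes are arbitrary illustrative choices, and the code only computes the reference solutions - it does not train a Transformer:

```python
# Sketch of the in-context linear regression setting: each prompt is a
# fresh random linear task given as (x_i, y_i) pairs, and the model must
# predict y for a query x. The three reference solutions below are the
# algorithms that small, medium, and large Transformers have been found
# to approximate.
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 20
w_true = rng.normal(size=d)                  # task weights, resampled per prompt
X = rng.normal(size=(n, d))                  # in-context inputs
y = X @ w_true + 0.1 * rng.normal(size=n)    # noisy in-context targets
x_query = rng.normal(size=d)

# 1-2 layers ~ one step of gradient descent from w = 0 (learning rate eta)
eta = 0.01
w_gd = eta * X.T @ (y - X @ np.zeros(d))

# 4-8 layers ~ ridge regression (penalty lambda chosen arbitrarily here)
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# 12+ layers ~ ordinary least squares
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

for name, w in [("1-step GD", w_gd), ("ridge", w_ridge), ("OLS", w_ols)]:
    print(name, float(w @ x_query))
```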
In short, large-scale Transformers can be trained to do ICL. However, this training requires data, and it turns out it can’t be just any data: the distributional properties of the training data are also critical in unlocking these ICL capabilities.
3) Data Distribution
Finally, the model's training data itself needs certain properties for in-context learning to emerge. This is sort of hinted at in the previous point, where training on noisy data enabled ICL that is robust to noise. In this paper, the authors discover 3 specific properties of the underlying data distribution necessary for Transformers to learn ICL:

Data distribution needs to be long-tailed - Specifically, this means the dataset should contain a lot of rare tokens. LLMs work by ingesting and outputting tokens. You can think of each token as a word, although this is not quite accurate. The input to the Transformers in LLMs is a sequence of such tokens, and the output is the predicted most likely next token (e.g. given the start of a sentence, predict the next word). Using this analogy, having a long-tailed distribution of tokens means that the dataset used to train LLMs should contain many words that appear infrequently (the technical term is that the data distribution should be Zipfian). This is easily satisfied by natural language datasets, where there are many rare words that are used very infrequently.
Rare tokens need to appear in clusters - Having a lot of rare tokens is not enough. When these tokens appear in the training data, they need to do so in clusters. The paper calls this property “burstiness.” The opposite of burstiness is having rare tokens uniformly sprinkled throughout the dataset, which is not as helpful in encouraging the model to learn ICL. Again, we got lucky here with natural language datasets, where rare words tend to be used together (the sketch after this list illustrates both the long tail and burstiness).
Tokens need to be dynamic - The authors use “dynamic” to mean that tokens need to have different meanings given different contexts. This is also directly satisfied with natural language data, where words can have different meanings depending on the words that come before. Having dynamic tokens seems very important in forcing the model to not merely memorize the meaning of each token and instead learn to infer from context - hence in-context learning.
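For intuition, here is a rough sketch of what a long-tailed, bursty token stream can look like when generated synthetically. The vocabulary size, Zipf exponent (1 here), burst length, and the "top-100" cutoff for rare tokens are arbitrary choices, not taken from any specific paper:

```python
# Rough sketch of long-tailed ("Zipfian") and bursty token data.
import numpy as np

rng = np.random.default_rng(2)
vocab_size, seq_len, burst_len = 1000, 512, 4

# Long-tailed token frequencies: a few common tokens, many rare ones
ranks = np.arange(1, vocab_size + 1)
probs = 1.0 / ranks          # Zipf with exponent 1
probs /= probs.sum()

# "Bursty" sampling: instead of drawing each token independently, draw a
# token and repeat it for a short burst, so rare tokens cluster together
tokens = []
while len(tokens) < seq_len:
    t = rng.choice(vocab_size, p=probs)
    tokens.extend([t] * burst_len)
tokens = np.array(tokens[:seq_len])

# Token ids are ordered by rank, so ids >= 100 fall outside the top-100
rare = tokens[tokens >= 100]
print(f"share of rare tokens: {len(rare) / seq_len:.2f}")
```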
You can think of these data requirements as requirements on the fundamental complexity of the task being used to train Transformers. If the data distribution is not complex enough, Transformers don’t have to learn ICL to do a good job of predicting that data. It’s only when the underlying data is rich and complex enough that we “force” Transformers to learn ICL, where richness and complexity are expressed through the above properties.
What’s notable is that ICL was never the goal of the body of language modeling research that led to it. We simply got lucky with natural language data, which just happened to exhibit the distributional properties that enable large Transformers to learn ICL. However, now that we understand these fundamental principles, we can look for and apply other types of data to train Transformers, maybe even entirely synthetic data, that could also lead to emergent ICL behaviors.
Conclusion
The ability to learn in context has profound implications for the future of AI. It provides the prospect of designing models that can adapt to new tasks with minimal explicit retraining, greatly expanding the potential applications and effectiveness of AI systems. While the concept of in-context learning is still relatively new, ongoing research continues to deepen our understanding of this fascinating ability, revealing the necessary ingredients in model architecture, model scale, and data distributions for ICL to emerge. The most exciting opportunity now is that given what we’ve recently discovered about ICL, the field can leverage this knowledge to build more capable ICL systems in the future.