Robots That Write Their Own Code
Code-writing language models enable robots to follow language instructions and perform diverse tasks without task-specific learning
TL;DR
Beyond generic code completion, code-writing language models can also write domain-specific code according to natural language instructions. In our recent work, Code as Policies (CaP), we explore this idea in the context of robotics and show how we can prompt language models to directly write code that controls robots to perform tasks according to language instructions. Code allows the language model to perform precise arithmetic and spatial-geometric reasoning, and to express a degree of behavioral common sense. We deployed CaP on various robot platforms and showed it can perform diverse tasks, from tabletop object manipulation to 2D drawing, all without any additional model training. CaP represents a new way of programming robots and points to a promising future in which language models write the code that carries out our tasks. For more, see our website, coverage by TechCrunch, and our Twitter thread.
Background
Large language models (LLMs) have shown impressive capabilities not just in natural language understanding, but also in “reasoning” tasks, from reading comprehension to answering difficult math questions. Recent advances in robotics leverage this capability by using LLMs for robot planning (see video above and gif below). Essentially, we want to build a system that takes as input a high-level natural language instruction for a task (e.g. cleaning up a coffee spill) and outputs actions the robot can execute. In PaLM-SayCan, this is done by having the LLM generate a sequence of low-level actions, described in natural language, that the robot knows how to do (e.g. go to the trash can, pick up the sponge, close the cabinet drawer, etc.).
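To make that pipeline concrete, here is a minimal sketch of the planning step, assuming a hypothetical list of skill names and a generic `complete` text-completion function; these are illustrative placeholders, not the actual PaLM-SayCan primitives or API:

```python
# Hypothetical sketch of an LLM-based planner: map a high-level instruction
# to a sequence of natural-language steps drawn only from skills the robot
# already knows how to execute.

ROBOT_SKILLS = [
    "go to the counter",
    "pick up the sponge",
    "wipe up the spill",
    "go to the trash can",
    "put down the sponge",
]

def plan_with_llm(instruction, complete):
    """Ask a text-completion model for a plan made only of known skills."""
    prompt = (
        "Robot skills: " + "; ".join(ROBOT_SKILLS) + "\n"
        "Task: " + instruction + "\n"
        "Plan (one skill per line):\n"
    )
    response = complete(prompt)  # any text-completion LLM call
    # Keep only lines that exactly match a known skill, so every step is executable.
    return [line.strip() for line in response.splitlines()
            if line.strip() in ROBOT_SKILLS]

# Usage: plan_with_llm("clean up the coffee spill", complete=my_llm)
```

Filtering the model's output down to known skills is what keeps every step of the plan executable by the robot.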
This is a powerful paradigm that lets us build robot systems that 1) interface with users through language, something non-experts can do, 2) leverage LLMs’ reasoning capabilities for task planning, and 3) plan new tasks without any task-specific model training (everything is done through prompting).
While natural language is an incredibly expressive way to specify robot tasks, there are limitations to using it as the action output of a robot task planner. It’s difficult for LLMs to reliably reason about spatial-geometric relationships, perform vector arithmetic, and use control structures like if statements and loops. It’s also hard to provide visual feedback to the language model directly through natural language (e.g. describing object coordinates, bounding boxes, and segmentation masks), and feedback is crucial for robot decision-making. These types of operations are all naturally expressed in code, which led to the question I explored during my internship at Google this past summer: what if we just get language models to write robot code? This seems like a promising direction, but how exactly do we do that, and how well does it actually work in practice?
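As a taste of what that could look like, here is a hedged sketch of the kind of code a language model might write for an instruction like “put the blocks to the left of the bowl.” The perception and control functions (`detect_objects`, `pick_place`) are illustrative placeholders I am assuming for this example, not the actual Code as Policies interface:

```python
import numpy as np

def put_blocks_left_of_bowl(detect_objects, pick_place):
    """Sketch of LLM-written code for 'put the blocks to the left of the bowl'.

    detect_objects(name) is assumed to return a list of (x, y) positions in meters;
    pick_place(src, dst) is assumed to pick an object at src and place it at dst.
    """
    bowl_pos = np.array(detect_objects("bowl")[0])
    block_positions = detect_objects("block")
    for i, block_pos in enumerate(block_positions):
        # Spatial reasoning and vector arithmetic are exact in code:
        # 10 cm to the left of the bowl, with 5 cm spacing between blocks.
        target = bowl_pos + np.array([-0.10, 0.05 * i])
        pick_place(np.array(block_pos), target)
```

Note how perception feedback (object positions) enters the program as plain numbers the code can manipulate directly, rather than having to be described back to the model in natural language.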
Language Model Programs