Why we don't have Robot Butlers yet

AI is necessary for the autonomous robots of the future; its development and deployment are full of incredible challenges and promising opportunities.

Welcome to the third editorial from Last Week in AI!

As explained in our recent announcement, we want to expand to broader and more in-depth explorations of recent AI developments. In particular, we’ll start by writing weekly editorials meant to address and comment on AI beyond last week’s news. The first few editorials will be free, but later ones will be exclusive to paying subscribers, so if you like this one, consider subscribing to have access to all the future ones.

This is the first of a series on the challenges and opportunities of applying AI to robotics, specifically in the setting of autonomous service robots that can assist humans in everyday tasks.

This article gives a high-level overview of the challenges, while later articles will dive deeper into each topic.

TL;DR: To realize the vision of robot butlers commonly seen in popular media, significant advances in hardware and software are required. Data-driven learning has a big role to play in these areas, but new methods are needed to handle the diversity of everything in the real world.

What are Robot Butlers?

In the movie Robot & Frank, a robot butler helps Frank, a retiree living by himself, with everyday tasks like preparing food, cleaning rooms, and tending the garden. The robot converses easily with Frank and other people, understands instructions and asks informative questions, and it is expected to perform all tasks a human caregiver is capable of doing.

This is the level of autonomy most of us have in mind when we see the label “robot butler.” We can more concretely characterize a system’s level of autonomy by its expected inputs and outputs. For the robot in Robot & Frank, the inputs are visual, auditory, and tactile perception streams from its environment, and the outputs are reconfigurations of various objects in that environment.

Importantly, Frank does not have to tell the robot exactly what to do - there are no explicit instructions on how to move its feet and hands, how to grasp and use tools, or even how to cook a decent meal. The robot does not have a precise map of where everything is in the house at all times and what everything there looks like, yet it is expected to autonomously navigate and find the objects it needs.

Current robotic capabilities are far from achieving this “robot butler” level of autonomy, despite what Hollywood movies or sensationalized headlines would have you believe. Achieving these capabilities is not just important for home robots - they also have broad and important applications in many sectors like manufacturing, logistics, agriculture, and healthcare. We’ll give a high-level overview of the key bottlenecks for autonomous assistive robots below, and each topic will be explored in more detail in future editorials.

Challenge 1: Hardware

The utility of robots is only realized through interactions with the physical world. In this sense, robots “start” and “stop” with hardware - they need robust, long-lasting, and high-quality sensors to perceive the world, and they also need precise, powerful, and easily controllable actuators to affect the world. For a broadly applicable robot butler, these sensors and actuators should also work across a wide variety of conditions, like perceiving indoor and outdoor scenes, and dexterously manipulating rigid and deformable objects.

No robot system today, commercial or research, satisfies these requirements. Current vision sensors all have inconvenient trade-offs among resolution, framerate, and form factor. There are no reliable sensors for tactile perception that can cover the entire surface area of a robot. Our robot hands are far from the combined dexterity, compliance, and strength of the human hand.

Additional challenges include high-capacity batteries that can sustain extended operations, and relatedly, high-performance low-power onboard computing to support the complex software requirements for robot perception and control.

There is a lot of innovation needed for robot hardware to enable a “robot butler” level of autonomy, and AI methods can help with its design, optimization, and control. While there has been a lot of backlash against Tesla’s humanoid robot, with its overpromised capabilities and overly optimistic timeline, more investment and R&D in robot hardware should be welcomed by the community.

Challenge 2: Software

Hardware forms the immediate inputs and outputs of a robotic system. Software is what processes the inputs and decides what the appropriate outputs should be. Software design for a reliable robotic system is fundamentally complex, encompassing almost all fields in AI. One challenge is in perception, or building a useful understanding of the world through data streams of a robot’s sensors. We’ll discuss this and others in more detail in later editorials. For this article, we’ll emphasize two related and perhaps under-discussed problems: hierarchical and sequential decision making.

Hierarchical refers to how a robot system consists of many different layers of software, all operating at different “frequencies” and somehow also seamlessly communicating with each other. At the lowest level are robot joint controllers that directly deal with motor torques or voltages - these commands can be sent at a rate of 1000 times per second. At the highest level we may have a language-based interface that takes instructions from humans at a rate of once every 30 minutes. In between are all the processes that make a robot “autonomous,” from processing vision inputs at 30 times per second, to deciding hand/body/feet placement goals at 10 times per second, to planning what subtasks to do once every 10 seconds. Such a hierarchical system gets complicated fast.
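To get a feel for how far apart these layers run, here is a minimal sketch that steps a 1000 Hz base clock and counts how often each layer would fire. The layer names and the scheduling scheme are assumptions made for illustration, and the rates are just the rough figures quoted above; this is not how any particular robot stack is implemented.

```python
from fractions import Fraction

# Hypothetical layer names; rates (in Hz) are the illustrative figures from the text.
LAYER_RATES_HZ = {
    "joint_control": Fraction(1000),
    "vision": Fraction(30),
    "placement_goals": Fraction(10),
    "subtask_planning": Fraction(1, 10),      # once every 10 seconds
    "human_instruction": Fraction(1, 1800),   # once every 30 minutes
}
BASE_HZ = 1000  # the base clock ticks as fast as the fastest layer


def run_layers(duration_s, rates=LAYER_RATES_HZ):
    """Step the base clock for duration_s seconds and count each layer's updates."""
    counts = {name: 0 for name in rates}
    for tick in range(duration_s * BASE_HZ):
        for name, rate in rates.items():
            # A layer fires on this tick if the number of updates it is
            # owed goes up between the start and end of the tick.
            due_before = (tick * rate) // BASE_HZ
            due_after = ((tick + 1) * rate) // BASE_HZ
            if due_after > due_before:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    for name, n in run_layers(10).items():
        print(f"{name}: {n} updates in 10 s")
```

In a 10-second window the joint controller updates ten thousand times while the subtask planner updates once and the language interface not at all, which is one way to see why keeping the layers in sync is hard.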

Sequential refers to how a robot needs to reason about the long-term impacts of its actions. These sequential decisions are made at all levels of the hierarchy mentioned above - what low-level commands give stable behavior, where the vision system should pay attention to help navigation, what questions to ask the human to receive a clearer instruction, and where to place the foot so the robot can keep a door open. Making sequential decisions is often referred to as “planning,” but planning too many steps ahead can be both inefficient and ineffective. The field of Reinforcement Learning (RL) tackles this problem head-on, but it is an immensely challenging domain. The graphic above depicts a robot using RL to learn a simple pick and place task with the robot system implemented in the popular Robot Operating System software framework.
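The tension between short-sighted and far-sighted planning can be sketched with a toy example. The corridor world, reward values, and function names below are all made up for illustration, and the planner is a plain brute-force search over action sequences rather than RL: a short horizon settles for the small nearby reward, a longer horizon finds the larger one, and the number of candidate plans doubles with every extra step.

```python
from itertools import product

# Hypothetical 1-D corridor with cells 0..6: a small reward sits close to
# the start and a large reward sits farther away.
REWARDS = {0: 1.0, 6: 10.0}
START = 2
ACTIONS = (-1, +1)  # step left or step right


def rollout(start, plan):
    """Return the total reward collected by following plan from start."""
    pos, total, collected = start, 0.0, set()
    for step in plan:
        pos = min(max(pos + step, 0), 6)  # stay inside the corridor
        if pos in REWARDS and pos not in collected:
            total += REWARDS[pos]
            collected.add(pos)
    return total


def best_plan(start, horizon):
    """Exhaustively score every action sequence of length horizon."""
    plans = list(product(ACTIONS, repeat=horizon))
    best = max(plans, key=lambda p: rollout(start, p))
    return best, rollout(start, best), len(plans)


if __name__ == "__main__":
    for horizon in (2, 4, 10):
        plan, reward, n_plans = best_plan(START, horizon)
        print(f"horizon={horizon}: reward={reward}, plans searched={n_plans}")
```

With a horizon of 2 the planner walks left for a reward of 1; with a horizon of 4 it walks right for a reward of 10; with a horizon of 10 it collects both, but only after scoring 1,024 plans instead of 4. A real robot has far more than two actions per step, so this blow-up arrives much sooner.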

Challenge 3: Data

We have all seen recent successes in computer vision, natural language processing, and various other domains where deep learning plus big datasets yielded extremely impressive results. Folks in the robotics community have been trying to leverage the latest learning-based approaches in robots for many years. Some applications, like bin-picking, have seen great success and are being widely commercialized in warehouses and fulfillment centers. However, large gaps still remain for AI-powered autonomous robots. The key problem is data.

There are two problems with collecting data for robots: one is the large diversity that exists in the real world, and the other is the fact that we need data of physical interactions. For diversity, there are countless types of everyday objects like plates, mugs, and screwdrivers, and no two living rooms or offices or grocery stores are exactly alike. The sheer variation of data is overwhelming, and collecting data at such scale in the real world is prohibitively expensive, both in terms of time and money. Unlike vision and language, where billions of data points can be scraped online, there is no way to do so for physical interaction data, at least not yet (maybe one day we can extract this data from YouTube videos).

While we can collect data in simulation where robots are interacting with a virtual world, this actually leads to three challenges of its own: 1) simulation does not 100% track with the real world both in terms of physics and sensing (e.g. graphics, sound, tactile), 2) we now need to replicate the diversity of real-world objects and environments in simulation, which is no small feat, and 3) having a simulator does not solve the fundamental problem of physical interactions - a simulator does not tell us what the robot should do to get useful data; this aspect still has to be engineered or learned over time.

A relevant case study is OpenAI’s Rubik’s Cube project, where they used simulations to train a dexterous robot hand to learn how to turn the Rubik’s Cube’s sides in the real world. The project is technically impressive and produced many valuable research insights, but it also has many limitations. It collected 13,000 years’ worth of simulation experience over the course of many months, and the real robot hand only achieved a 20% overall success rate, even with a Bluetooth-enabled Rubik’s Cube that gave perfect state estimations. Data-driven robot learning is far from solved.

Challenge 4: Cost

Last but not least is the cost of autonomous robots. Robot hardware costs have actually decreased quite a bit over the past decade. Right now, one can preorder a fairly capable robot arm for $1.5k, a far cry from the $400k PR2 robot that was a staple of robot research labs 10 years ago (the PR2 is more capable, but surely not >200x more capable). The cost of vision sensors, especially depth sensors, has also gone down in the last decade due to commoditized hardware produced originally for video games (Microsoft Kinects) and then smartphones.

However, this is not to say that the cost of an overall autonomous robot butler, should one be built today, would be accessible and affordable. Robot arms and depth sensors are just small parts of a much bigger equation. The Shadow Hand that OpenAI used for its Rubik’s Cube project reportedly costs $60k. We’re also missing the costs of other sensors and a mobile platform. We’d also have to account for maintenance, repair, and monitoring costs, especially at the beginning of a robot butler’s development when it’s not reliable enough.

Robot costs will likely be dominated by hardware production costs, since robot software, while expensive to develop, can be “scaled for free” like all other software products. Hardware production costs will of course benefit from economies of scale, but this is a bit of a chicken-and-egg problem - until there are autonomous robots that can leverage hardware at scale, there is little incentive to produce such hardware at scale, and vice versa. Maturity of one category of robots, like drones or autonomous vehicles, may help bootstrap other types of robots by lowering hardware production costs. The exact path we’ll take to enable affordable (say <$10k) robot butlers remains to be seen.


So far we’ve covered four bottlenecks of developing robot butlers, or more generally, autonomous assistive robots that can operate in human environments. There are many challenges ahead, but also many exciting opportunities. The video above showcases Toyota Research Institute’s in-home robot platform, which hints at some potential answers to these challenges. We will dive into each topic in more detail in future editorials. If you like what you’ve read so far and are interested in reading more, please become a subscriber to receive exclusive future content and support us in our mission to demystify and disseminate AI news without the hype.

About the Author:

Jacky Liang (@jackyliang42) is a PhD student at Carnegie Mellon University’s Robotics Institute. His research interests are in using learning-based methods to enable robust and generalizable robot manipulation.

Copyright © 2021 Skynet Today, All rights reserved.