How AI-Powered Robots Fulfill Your Online Orders
AI-powered robots are slowly taking over tedious tasks in warehouses, and it's just the beginning.
This editorial is part of a series on AI + robotics.
TL;DR Deep learning enabled the recent wave of warehouse automation where robots are increasingly used to pick objects to fulfill orders. This is one of the first applications where data-driven learning + robotics is making a real difference; due to the large variety of objects, it is impossible for human engineers to manually write software to perform these tasks. Given the growing maturity of these technologies and the economic headwinds created by Covid-19, there will likely be rapid growth in logistics automation in the near future.
What is Picking and Why is it Important?
We increasingly depend on e-commerce in our everyday lives. When we place an order on Amazon, no matter the number or diversity of items in the order, we expect them to show up on our doorsteps within just two days of shipping time. There is enormous complexity in the logistics systems operating behind the scenes to make this a reality. One of the key bottlenecks in this sequence of movements of physical goods, involving manufacturers, trucks, container ships, airplanes, trains, and warehouses, is the task of picking.
Consider this simplified example of order fulfillment for an e-commerce store. A customer places an order for N items on the store’s website. This order is sent to the store’s warehouse (assuming the store has only 1 warehouse) for fulfillment. Then, workers in the warehouse need to find the N items, typically stored in boxes or bins, pick them, pack them in a shipping box, and put the box on a truck for shipment. The picking task is retrieving an object from some container of objects and then typically placing it into another container.
Picking and sorting (which often involves some picking as well) are two of the most common tasks in the entire logistics chain. Picking matters in particular because it is a key bottleneck in logistics that has traditionally been very difficult to automate.

According to the book Warehouse & Distribution Science:
Order-picking is the most important process in most warehouses because it consumes the most labor and it determines the level of service experienced by the downstream customers. […] it is the warehouse activity most resistant to automation.
The complexity and diversity of the shapes and appearances of objects, as well as the configurations of objects in a container, mean that an automated picking system needs to perceive the objects from camera sensors and make decisions accordingly. There are millions of types of objects and countless variations in visual appearance. As such, a system that can reliably pick any object is very difficult for humans to engineer by hand.
Recognizing this obstacle, as well as the enormous economic benefits automated picking can bring, Amazon ran the annual Amazon Picking Challenge from 2015 to 2017 to incentivize and encourage research in this area. And it worked. Applying the latest advances in deep learning and computer vision, teams built impressive systems that could pick objects they had never seen before.
Since then, numerous startups have been deploying this type of technology at scale and bringing it to warehouses across the world. This trend has only started in the last two to three years. With e-commerce still on the rise (less than 15% of U.S. retail is online) and the public expecting faster and more reliable logistics operations, automation in logistics will only become more prevalent.
How does Deep Learning Enable Robots to Pick?
Before Deep Learning:
The classical, non-data-driven approach to picking is object-centric and model-based. Before a robot can pick an object, a 3D model of the object needs to be obtained, and its potential pick points (where to grasp the object or where to place a suction gripper) computed by an algorithm or manually labeled by a human. Then, during deployment, the system first recognizes the object it needs to pick from camera images, then matches the object's known 3D model to the position and orientation of the perceived object in the real world. Lastly, once this matching is done, the robot selects one of the predefined pick points and commands its arm to execute the corresponding picking motion.
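To make the structure of this pipeline concrete, here is a minimal sketch in Python. Every function, data structure, and parameter name below is an illustrative placeholder for this sketch, not a real library's API; classical systems implemented these steps with hand-engineered vision and geometry code.

```python
import numpy as np

def recognize_object(color_image, object_db):
    """Placeholder: identify which known object is in view
    (e.g. via hand-tuned features or template matching)."""
    raise NotImplementedError

def estimate_pose(mesh, color_image, depth_image):
    """Placeholder: return a 4x4 transform matching the stored 3D model
    to the observed object (its position and orientation in the world)."""
    raise NotImplementedError

def classical_pick(color_image, depth_image, object_db, robot):
    # 1. Recognize which known object is in the images.
    object_id = recognize_object(color_image, object_db)
    model = object_db[object_id]  # stored 3D mesh + labeled pick points

    # 2. Match the stored 3D model to the observed scene to recover the
    #    object's position and orientation (its 6-DoF pose).
    pose = estimate_pose(model["mesh"], color_image, depth_image)

    # 3. Transform a pre-defined pick point from the model frame into the
    #    world frame using the estimated pose.
    pick_point = (pose @ np.append(model["pick_points"][0], 1.0))[:3]

    # 4. Command the arm to execute the corresponding picking motion.
    robot.move_to(pick_point)
    robot.close_gripper()
```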
This process is tedious to build, difficult to scale, and very brittle. First, it requires 3D models of the objects to be picked, and it is very difficult for such systems to generalize to novel objects that are not in the database. Second, finding the precise position and orientation of a 3D model from images is very difficult - even the deep-learning-driven algorithms that do this today struggle to reliably achieve sub-centimeter accuracy. Lastly, it is very hard for this type of method to work in more complex scenarios, like picking objects from a pile in a bin.
After Deep Learning:
The data-driven picking approach is object-agnostic and model-free. Most of the popular deep learning picking methods do not care about the identity of an object, and consequently they do not need a 3D model of the specific object beforehand. These methods sidestep the object-identification phase and directly ask the question: given the current observation of the world, which picks will be successful?
Readers who are familiar with machine learning will recognize this as essentially a supervised classification problem. At a high level, data-driven methods take as input a representation of the world (typically top-down color and depth images of a bin containing objects) and a representation of a candidate pick, and predict whether or not that pick will be successful (i.e., whether the object will be held stably by the robot's gripper or suction cup). During deployment, the system is queried with a list of potential picks, and it chooses the one that is most likely to succeed.
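To make this classification framing concrete, here is a minimal sketch of what such a pick-success network could look like, written in PyTorch. The architecture, the pick parameterization (an image-plane pick location plus a gripper angle), and all names are illustrative assumptions for this sketch, not the actual method used by any particular system.

```python
import torch
import torch.nn as nn

class PickSuccessNet(nn.Module):
    """Illustrative sketch: predicts the probability that a candidate pick
    succeeds, given a top-down depth image of the bin and a pick
    parameterization (here: pixel location + gripper angle)."""

    def __init__(self):
        super().__init__()
        # Small CNN over the depth image (1 channel).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Combine image features with the pick parameters (x, y, angle).
        self.head = nn.Sequential(
            nn.Linear(64 + 3, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, depth, pick):
        # depth: (B, 1, H, W) depth images; pick: (B, 3) candidate picks.
        features = self.encoder(depth)
        logits = self.head(torch.cat([features, pick], dim=1))
        return torch.sigmoid(logits).squeeze(1)  # probability of success

def choose_best_pick(model, depth, candidate_picks):
    """At deployment: score a list of candidate picks for one (1, 1, H, W)
    depth image and return the pick most likely to succeed."""
    with torch.no_grad():
        scores = model(depth.expand(len(candidate_picks), -1, -1, -1),
                       candidate_picks)
    return candidate_picks[scores.argmax()]
```

Deployed systems use more sophisticated architectures, pick representations (e.g. suction points or full 6-DoF grasps), and training schemes, but the core idea of scoring candidate picks directly from image observations is the same.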
Since the input to the system is image observations, not object models, the system does not need to know the precise object model beforehand. Trained on enough data, it will be able to pick novel objects in novel configurations without additional manual adjustments. The key to these systems is convolutional neural networks, a type of neural network that can efficiently learn very complex image-based functions from very large datasets.
Data Sources
As we discussed in the previous editorial in this series, there are many ways to obtain the data needed for robot learning. Three types of data are needed to build a data-driven picking system: image observations, a list of potential picks, and labels indicating whether each pick succeeded. Researchers in the field have experimented with all kinds of ways of obtaining such data, from using analytical models to compute pick success, to large-scale physics simulations, to collecting data directly in the real world.
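As an illustration, a single training example for such a system might be structured roughly as follows; the field names and the simulated-labeling helper are assumptions for this sketch, not a real dataset format.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PickExample:
    """One training example for a pick-success classifier (illustrative)."""
    depth_image: np.ndarray  # top-down depth image of the bin (H x W)
    pick: np.ndarray         # candidate pick, e.g. (x, y, gripper angle)
    success: bool            # did the pick succeed? (the supervision label)

# The success labels can come from several sources:
#  1. Analytical models: compute a grasp-quality metric on a known 3D model
#     and threshold it.
#  2. Physics simulation: execute the pick on simulated objects and record
#     whether the object stays attached to the gripper.
#  3. Real-world trials: have the robot attempt picks and log the outcomes
#     (e.g. from a gripper-width or weight sensor).
def label_with_simulator(sim, depth_image, pick) -> PickExample:
    """Illustrative example of source 2; `sim.execute_pick` is hypothetical."""
    success = sim.execute_pick(pick)
    return PickExample(depth_image, pick, bool(success))
```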
Like any machine learning system, the performance of a deep learning picking algorithm is directly tied to the quality and relevance of the data it was trained on. For startups that deploy this technology in the real world, it is better for their systems to be trained on datasets of objects similar to those found in their customers' warehouses.
An advantage that data-driven approaches have over the classical one is the ability to adapt to new objects by learning from experience. When faced with an especially challenging object, a learning-based picking system can automatically learn successful picks through intelligent trial-and-error, so the system can become better over time.
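Under the same illustrative assumptions as the sketches above (the camera, robot, pick proposer, and trainer APIs here are all hypothetical), such a feedback loop might look roughly like this: log every real-world pick attempt and its outcome, and periodically fine-tune the classifier on the growing dataset.

```python
# Illustrative deployment loop: the robot's own successes and failures become
# new training data, so the system can improve on hard objects over time.
# The camera, robot, proposer, and trainer APIs below are hypothetical.
def deployment_loop(model, robot, camera, dataset, retrain_every=1000):
    attempts = 0
    while True:
        depth = camera.capture_depth()                     # observe the bin
        candidates = propose_candidate_picks(depth)        # hypothetical proposer
        pick = choose_best_pick(model, depth, candidates)  # see earlier sketch
        success = robot.execute_pick(pick)                 # attempt the pick

        # Record this trial as a new labeled example.
        dataset.append(PickExample(depth, pick, success))
        attempts += 1

        # Periodically fine-tune the classifier on the accumulated data.
        if attempts % retrain_every == 0:
            finetune(model, dataset)                       # hypothetical trainer
```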
Many Startups are Deploying this Tech Today
The number of startups deploying deep-learning-based automated picking systems has skyrocketed in the last couple of years: Ambi Robotics, Berkshire Grey, Covariant, Nimble, Pickle, RightHand Robotics, and XYZ Robotics, just to name a few. The deep learning approach we outlined above, although the key piece of technology, is really just one part of a larger warehouse automation system. There are many types of warehouses and fulfillment centers, with different sizes, inventories, object types, workflows, and customer requirements. While there are many startups in this field, they can differentiate themselves by targeting specific types of warehouses and excelling at specific workflows.
An important component of the success of automation systems that we did not discuss in this piece is hardware. There are many choices regarding the camera sensors, robot arms, gripper attachments, the surrounding workstations, the shape of the bins that contain the objects, as well as the conveyor belts and moving carts that bring objects to a robot arm for picking and take them away afterward. Clever hardware design can make the picking task a lot simpler for the learning algorithm and improve the performance of the overall fulfillment process.
Lastly, the long tail of novel scenarios that haunts almost all AI startups also applies to robot picking. It is not an exaggeration to say that with open-source software and access to a robot arm, one robotics engineer can probably build an automated bin-picking station that achieves an 80% picking success rate with common household objects in less than a week, and 90% in less than a month. It is the last 10% of edge cases that is really difficult to handle, and a picking startup's competitive advantage will also hinge on how it approaches this tail.
The Future of Automated Picking
Industry Outlooks
Automated robot systems in warehouses are still at a nascent stage, and there is plenty of room to grow. Beyond deploying robots to existing warehouses, this new technology opens up opportunities to design and build new kinds of warehouses and micro-fulfillment centers with operating models that are very different from those of today's warehouses.
There are also opportunities to apply this learning-based approach to automate warehouse tasks that are adjacent to picking, such as packing, palletizing, shelving, truck loading/unloading, and more. In some sense, warehousing is almost the perfect place to apply the current wave of AI-enabled robots: it has enough “variety” that it is impossible for humans to write traditional software for every case, but also enough “structure” that deep learning algorithms can learn to exploit. Given the economic headwinds and increasing demand for better-performing logistics chains, more intelligent automation in this field seems inevitable.
Research Outlooks
Despite the number of startups in this field, autonomous picking as a research problem is not solved. The type of object-agnostic pick-from-a-pile task found in warehouses is just one slice of what we want intelligent robots to be able to do. There are many open problems related to picking for which we do not yet have good solutions. Some examples include finding and picking specific objects from structured clutter like shelves, cabinets, and boxes; picking tools in specific ways for use in downstream tasks (e.g. hammers, screwdrivers, spatulas, keys); and perceiving and picking objects that can move (e.g. fruit on a swaying tree branch). There is still a lot to be done here in developing both new software and hardware capabilities, and it is an exciting time for robotics research.
About the Author
Jacky Liang (@jackyliang42) is a Ph.D. candidate at Carnegie Mellon University’s Robotics Institute. His research interests are in using learning-based methods to enable robust and generalizable robot manipulation.
Copyright © 2022 Skynet Today, All rights reserved.