Jacky Liang, Andrey Kurenkov
On October 15th, OpenAI released a pre-print paper and an accompanying blog post both titled “Solving Rubik’s Cube with a Robot Hand”, describing how the team was able to train a 5-fingered robot hand to manipulate a Rubik’s cube with Deep Reinforcement Learning, or Deep RL.
OpenAI emphasized the “robustness” of the learned algorithm, exemplified by how the robot hand was able to manipulate the Rubik’s cube even when two of the fingers were tied together, or when the cube was pushed around by external forces. The algorithm encountered neither of these scenarios during training.
Along with the paper and blog post, OpenAI also released a slickly produced video about the project.
As stated in the blog post and paper, the project is not about solving a Rubik’s Cube in the sense of deriving a sequence of face rotation steps to return the cube to the “solved state.” Rather, the project focuses on learning how to control a robot hand to manipulate a Rubik’s cube to execute these rotations, with the rotations themselves being solved by a pre-existing Rubik’s cube solver.
The approach OpenAI takes is largely identical to the one used last year to enable the same hand to reorient a cube, except for a technique called “Automatic Domain Randomization,” or ADR. ADR is a new approach to domain randomization (DR), which is the technique of varying many visual and physical parameters of a simulation, with the idea that an algorithm trained to work in all of these different simulations has a better chance of working in the real world.
Normal DR requires manually setting the range of each randomized parameter. By contrast, with ADR these ranges are “defined automatically and allowed to change” in ways that progressively increase the variety of environments the algorithm needs to work in. Thus, ADR removes the “significant amount of manual tuning” the team had to do for their previous cube reorientation project, while making learning more efficient. This idea is “strongly related” to prior work such as POET and other recent works that suggest improvements upon DR.
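To make the contrast with manual DR concrete, here is a minimal Python sketch of the general idea of ranges that grow as the policy improves. It is not OpenAI's implementation: the class `AdaptiveRange`, the function `update_ranges`, the parameter names, and the thresholds are all hypothetical, and the update rule is deliberately simplified (every range moves together based on a single success rate).

```python
import random


class AdaptiveRange:
    """One randomized simulation parameter whose sampling range can change.

    Hypothetical illustration: the range starts collapsed at a nominal value
    and is widened or narrowed in fixed steps, within hard limits.
    """

    def __init__(self, nominal, step, hard_min, hard_max):
        self.nominal = nominal
        self.low = nominal
        self.high = nominal
        self.step = step
        self.hard_min = hard_min
        self.hard_max = hard_max

    def sample(self, rng):
        # Manual DR would draw from a fixed, hand-tuned [low, high];
        # here the bounds themselves move over the course of training.
        return rng.uniform(self.low, self.high)

    def widen(self):
        self.low = max(self.hard_min, self.low - self.step)
        self.high = min(self.hard_max, self.high + self.step)

    def narrow(self):
        # Never shrink past the nominal value.
        self.low = min(self.nominal, self.low + self.step)
        self.high = max(self.nominal, self.high - self.step)


def update_ranges(ranges, success_rate, widen_above=0.8, narrow_below=0.4):
    """Widen all ranges when the policy does well, narrow them when it struggles.

    The thresholds are assumed values chosen for illustration only.
    """
    for r in ranges.values():
        if success_rate >= widen_above:
            r.widen()
        elif success_rate <= narrow_below:
            r.narrow()


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical parameters; real simulations randomize many more.
    ranges = {
        "friction": AdaptiveRange(1.0, 0.1, 0.5, 1.5),
        "cube_size_cm": AdaptiveRange(5.7, 0.1, 5.0, 6.5),
    }
    for success_rate in (0.9, 0.9, 0.3, 0.85):  # stand-in evaluation results
        update_ranges(ranges, success_rate)
        env_params = {name: r.sample(rng) for name, r in ranges.items()}
        print(success_rate, env_params)
```

With manual DR, the `low` and `high` values would be fixed constants chosen by hand. In this sketch they start at a single nominal value and expand only once the policy succeeds often enough, which is one simple way to read “defined automatically and allowed to change.”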
A number of articles covering this news (e.g. TechCrunch, The Verge, IEEE Spectrum, VentureBeat) were released on the same day, in keeping with OpenAI’s tendency to share its work with journalists before most other researchers so that coverage can appear on the day the research is announced. Some articles focused on how the robot learned to perform the task “without needing to be specifically programmed” and instead “approaches new tasks much like a human would.” Others emphasized the simulation aspect: how the technique didn’t “need any real-world training at all, as long as [the] simulations are diverse enough.”
On Twitter, most of the OpenAI researchers highlighted the “general-purpose” take on the work: