A team of AI researchers has created a way for humans to train robots using cheap, readily available materials, with the goal of scaling up the collection of training data and enabling widespread adoption. The experiment uses a reacher-grabber, the pole-mounted gripping tool that people with disabilities, senior citizens, and others commonly use to grab things that are out of reach. The authors mounted a GoPro camera on the device to record the gripper fingers at work, then used the recorded data to train a robot to manipulate or stack previously unseen objects in previously unseen environments.
The authors call their approach the Demonstrations Using Assistive Tools (DAT) framework and say it’s novel because it allows training data to be collected virtually anywhere, not just in a lab setting, as with earlier methods. The preprint, titled “Visual Imitation Made Easy,” was published on Tuesday by authors from Carnegie Mellon University, Facebook AI Research, New York University, and UC Berkeley.
Researchers chose to use the low-tech reacher-grabber in a high-tech robotics study, they said, because it is a cheap and commonly available “robot” that requires no training to use. Reacher-grabber tools cost about $10 online today and can be used to do a lot around the home, like unlock a door, open a drawer, or grab and move various objects. The Handi-Grip reacher-grabber used in the study cost $18.
The idea of using a pole with a grabbing instrument on the end came from researchers at Google and Columbia University, whose work was accepted for publication in June.
To train the model, the researchers attached a GoPro camera to the reacher pole via a 3D-printed mount and recorded 1,000 attempts to move objects or complete tasks. They then used the videos to train a convolutional neural network, which was deployed on a robotic arm fitted with a camera and the same kind of two-finger grasping clamp as a reacher-grabber. They also applied data augmentation, such as random jitter, crops, and rotations, to the training data, which led to higher success rates when the system was tested in a lab setting. The model’s policy was trained with behavioral cloning, a supervised learning approach.
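As a rough illustration of the kind of augmentation described above, here is a minimal NumPy sketch. It is not the authors' code: the function name `augment`, the default parameter values, and the reading of "jitter" as a random brightness change are all assumptions made for this example.

```python
import numpy as np

def augment(frame, rng, crop=0.9, max_jitter=0.1, max_rot_deg=10.0):
    """Return a jittered, cropped, rotated copy of an HxWx3 float image in [0, 1].

    All defaults are illustrative guesses, not values from the paper.
    """
    h, w, _ = frame.shape

    # Jitter (assumed here to mean brightness): scale by a random factor near 1.
    out = np.clip(frame * (1.0 + rng.uniform(-max_jitter, max_jitter)), 0.0, 1.0)

    # Random crop: keep a window covering `crop` of each side.
    ch, cw = int(h * crop), int(w * crop)
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    out = out[top:top + ch, left:left + cw]

    # Random rotation about the crop center with nearest-neighbor sampling.
    # Source coordinates that fall outside the crop are clamped to the edge.
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    ys, xs = np.mgrid[0:ch, 0:cw]
    cy, cx = (ch - 1) / 2.0, (cw - 1) / 2.0
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    src_x = np.clip(np.rint(src_x).astype(int), 0, cw - 1)
    src_y = np.clip(np.rint(src_y).astype(int), 0, ch - 1)
    return out[src_y, src_x]

# Usage: a random array stands in for one video frame.
rng = np.random.default_rng(0)
frame = rng.random((40, 60, 3))
augmented = augment(frame, rng)
```

Applying a fresh random transform each time a frame is drawn during training means the network rarely sees exactly the same image twice, which is the usual motivation for this kind of augmentation.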
“Given these visual demonstrations, we extract tool trajectories using off-the-shelf Structure from Motion (SfM) methods and the gripper configuration using a trained finger detector. Once we have extracted tool trajectories, corresponding skills can be learned using standard imitation learning techniques,” the paper reads.
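The two steps in that passage can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the paper's pipeline: it assumes the SfM tool outputs one 4x4 camera-to-world pose per frame, treats the gripper state as a 0/1 flag, ignores rotation in the action label, and swaps in a linear least-squares policy for the convolutional network. The function names are hypothetical.

```python
import numpy as np

def relative_motions(poses):
    """For camera-to-world poses T_0..T_n (4x4), return inv(T_i) @ T_{i+1} per step."""
    poses = np.asarray(poses, dtype=float)
    return np.stack([np.linalg.inv(a) @ b for a, b in zip(poses[:-1], poses[1:])])

def action_labels(poses, gripper_open):
    """Pair each step's translation, expressed in the tool's own frame,
    with the gripper flag observed at the end of that step."""
    steps = relative_motions(poses)
    grip = np.asarray(gripper_open, dtype=float)[1:, None]
    return np.hstack([steps[:, :3, 3], grip])

def fit_linear_policy(features, actions):
    """Behavioral cloning as least squares: actions ~ [features, 1] @ W.
    A linear model is a stand-in here; the study used a neural network."""
    X = np.hstack([features, np.ones((len(features), 1))])
    W, *_ = np.linalg.lstsq(X, actions, rcond=None)
    return W

def make_pose(x, y, z):
    """Hypothetical helper: a pose with identity rotation at position (x, y, z)."""
    T = np.eye(4)
    T[:3, 3] = (x, y, z)
    return T

# Usage: three frames of a made-up trajectory; the gripper closes on the last one.
# Monocular SfM recovers translation only up to scale, so units are arbitrary.
poses = [make_pose(0, 0, 0), make_pose(0.1, 0, 0), make_pose(0.1, 0.05, 0)]
labels = action_labels(poses, gripper_open=[1, 1, 0])

# Random vectors stand in for per-frame image features.
features = np.random.default_rng(0).normal(size=(2, 5))
W = fit_linear_policy(features, labels)
```

Expressing each step relative to the previous pose, rather than in world coordinates, keeps the labels independent of where the demonstration happened to be recorded.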
At the end of the process, the system achieved success rates of 87.5% in pushing objects across a table to a target spot and 62.5% in stacking objects. In some tests, humans intervened to try to trick the robot into failing at its task.
Visual imitation learning trains robots to perform tasks from visual demonstrations, such as photos or videos of a task being completed. Existing ways of collecting demonstrations include kinesthetic teaching, in which a person physically guides the robot through the task, but that can be slow and difficult and requires more human labor.
The authors argue these approaches can also be expensive and produce models that are less useful than AI trained on data from real-world settings. Using the cheapest form of robotics available is generally seen as a way to democratize access and allow for wider adoption of AI systems when they do become available in production.
The DAT framework could be valuable in training the kinds of robots that operate in people’s homes and grasp a unique set of objects, such as those from Hello Robot, a startup launched out of stealth last month by former Google robotics director Aaron Edsinger and Georgia Tech professor Charlie Kemp. That company’s first robot, Stretch, also uses a simple gripper that resembles a reacher-grabber to grab or move objects in the real world.
This is the latest work from UC Berkeley that utilizes commodity hardware to improve robotic systems. In June, UC Berkeley AI researchers introduced Dex-Net AR, which uses two-minute scans of physical objects with Apple’s ARKit to train a robotic arm using a handlike grasp to pick up unique objects. In 2018, UC Berkeley researchers shared work that trained AI agents to do unique movements like backflips and the “Gangnam Style” dance from YouTube videos.