Google has developed a “robot constitution” for AI robots to keep them from harming humans.
On January 4, Google’s DeepMind robotics team announced three advances that it says will help robots make faster, better, and safer decisions in natural environments. One of them is AutoRT, a data-collection system that comes with a built-in “robot constitution.”
Google’s data collection system, AutoRT, leverages large foundation models to help develop robots that can understand the actual goals of humans. It expands robot learning by collecting data that better trains robots for the real world.
AutoRT combines a visual language model (VLM), a large language model (LLM), and a robot control model (RT-1 or RT-2) so that robots can collect training data in new environments. AutoRT can safely direct up to 20 robots at once, each equipped with just a camera, a robotic arm, and a mobile base. Each robot uses the visual language model to understand its surroundings and the objects in its line of sight. The large language model then proposes a series of creative tasks the robot could perform, such as putting a snack on the counter, and also acts as a decision maker, choosing an appropriate task for the robot to carry out.
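The loop described above can be sketched in a few lines. This is a minimal illustration of the VLM-describes, LLM-proposes, LLM-selects structure; the function names and the canned outputs are hypothetical stand-ins, not Google's actual models or APIs.

```python
def describe_scene(camera_image):
    """Stand-in for the visual language model (VLM): returns a text
    description of the objects the robot can currently see."""
    return "a counter with a bag of chips and a sponge"

def propose_tasks(scene_description):
    """Stand-in for the LLM task proposer: suggests candidate tasks
    grounded in the described scene."""
    return [
        "put the bag of chips on the counter",
        "pick up the sponge",
    ]

def pick_task(tasks):
    """Stand-in for the LLM decision maker: selects one task for the
    robot control model (e.g. RT-1 or RT-2) to execute."""
    return tasks[0]

def autort_step(camera_image):
    # One data-collection step: see, propose, decide.
    scene = describe_scene(camera_image)
    candidates = propose_tasks(scene)
    return pick_task(candidates)

print(autort_step(camera_image=None))
```

In the real system each stand-in is a learned model and the chosen task is handed to the control policy; here the point is only the division of labor between the three components.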
Although AutoRT is a data collection system, it has safety guardrails, one of which is the “robot constitution” itself. Google’s robot constitution uses “safety-focused prompts” that instruct the large language model to avoid choosing tasks involving humans, animals, sharp objects, and even appliances. The constitution was inspired by science fiction writer Isaac Asimov’s “Three Laws of Robotics”: a robot may not harm a human being or, through inaction, allow a human being to come to harm; a robot must obey orders given by humans, except where doing so would violate the first law; and a robot must protect its own existence, as long as doing so does not violate the first or second law. To further improve safety, DeepMind programmed the robots to stop automatically if the force on their joints exceeded a certain threshold, and installed a physical kill switch that lets human operators shut them down.
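The two programmatic guardrails above can be illustrated with a short sketch: a constitution-style filter over proposed tasks and a joint-force cutoff. The keyword list and the force threshold are illustrative assumptions, not DeepMind's published values.

```python
# Illustrative forbidden categories, mirroring the article's list.
FORBIDDEN = ("human", "animal", "sharp", "appliance")
FORCE_LIMIT_N = 30.0  # assumed threshold in newtons, not a real spec

def task_allowed(task: str) -> bool:
    """Reject tasks whose description mentions a forbidden category,
    mimicking the 'safety-focused prompts' at task-selection time."""
    return not any(word in task.lower() for word in FORBIDDEN)

def check_joint_forces(forces_newtons) -> bool:
    """Return False (i.e. stop the robot) if any joint force
    exceeds the configured limit."""
    return all(f <= FORCE_LIMIT_N for f in forces_newtons)

print(task_allowed("put the snack on the counter"))  # True
print(task_allowed("pick up the sharp knife"))       # False
print(check_joint_forces([5.0, 12.0, 45.0]))         # False
```

In the actual system the first guardrail operates as natural-language prompting of the LLM rather than keyword matching, and the final backstop is the physical kill switch, which no software check replaces.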
Over a period of seven months, Google deployed 52 unique robots across different office buildings, collecting a diverse dataset of 77,000 robot trials covering 6,650 unique tasks. According to The Verge, some robots were controlled remotely by human operators, while others operated from a script or ran fully autonomously using Google’s Robotic Transformer AI learning model.
DeepMind’s other new technologies include SARA-RT, a neural network architecture designed to make its existing Robotic Transformer models faster and more accurate, and RT-Trajectory, a model announced to help robots better perform specific physical tasks such as wiping tables.
For humans, understanding how to wipe a table is intuitive, but a robot can translate that instruction into physical action in many different ways. Traditionally, training robotic arms has relied on mapping abstract natural language (“wipe the table”) to specific motions, such as closing the gripper, moving left, or moving right, which makes it difficult for the model to generalize to new tasks.
RT-Trajectory automatically adds visual contours describing robot motion to training videos: it takes each video in a training dataset and overlays it with a 2D trajectory sketch of the robot arm’s gripper performing the task. These trajectories give the model low-level, practical visual cues from which to learn robot control policies.
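The overlay step described above can be sketched as drawing a gripper path into the pixels of each frame. This toy version uses a tiny grayscale array and straight-line segments between waypoints; the real system renders trajectories onto RGB video frames, and the helper here is an illustrative assumption, not DeepMind's code.

```python
import numpy as np

def overlay_trajectory(frame: np.ndarray, waypoints, value=255):
    """Draw a 2D gripper path onto a grayscale frame by marking the
    straight segments between consecutive (row, col) waypoints."""
    out = frame.copy()
    for (r0, c0), (r1, c1) in zip(waypoints, waypoints[1:]):
        steps = max(abs(r1 - r0), abs(c1 - c0), 1)
        for t in range(steps + 1):
            r = round(r0 + (r1 - r0) * t / steps)
            c = round(c0 + (c1 - c0) * t / steps)
            out[r, c] = value
    return out

# Toy 8x8 "frame" with a path: right along row 1, then down column 6.
frame = np.zeros((8, 8), dtype=np.uint8)
annotated = overlay_trajectory(frame, [(1, 1), (1, 6), (6, 6)])
print(annotated)
```

The annotated frames, not the raw ones, are what the policy trains on, which is how the sketch becomes a low-level visual hint about the motion being demonstrated.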
By interpreting the specific robot motions shown in videos or sketches, DeepMind says, RT-Trajectory gives a robot an understanding of “how to do” a task. The system is versatile: it can also create trajectories by watching human demonstrations, accept hand-drawn sketches, and adapt easily to different robot platforms.