Researchers help robots navigate crowded spaces with new visual perception method
A team of researchers at the University of Toronto has found a way to enhance the visual perception of robotic systems by coupling two different types of neural networks.
The innovation could help autonomous vehicles navigate busy streets or enable medical robots to work effectively in crowded hospital hallways.
“What tends to happen in our field is that when systems don’t perform as expected, the designers make the networks bigger – they add more parameters,” says Jonathan Kelly, an assistant professor in the Faculty of Applied Science & Engineering.
“What we’ve done instead is to carefully study how the pieces should fit together. Specifically, we investigated how two pieces of the motion estimation problem – accurate perception of depth and motion – can be joined together in a robust way.”
Researchers in Kelly’s lab aim to build reliable systems that can help humans accomplish a variety of tasks. For example, they’ve designed such as navigating through doorways.
More recently, they’ve focused on techniques that will help robots move out of the carefully controlled environments in which they are commonly used today and into the less predictable world humans are accustomed to navigating.
“Ultimately, we are looking to develop situational awareness for highly dynamic environments where people operate, whether it’s a crowded hospital hallway, a busy public square or a city street full of traffic and pedestrians,” says Kelly.
One challenging problem that robots must solve in all of these spaces is known to the robotics community as “structure from motion.” This is the process by which robots stitch together a set of images taken from a moving camera to build a 3D model of the environment they are in. The process is analogous to the way humans use their eyes to perceive the world around them.
In today’s robotic systems, structure from motion is typically achieved in two steps, each of which uses different information from a set of monocular images. One is depth perception, which tells the robot how far away the objects in its field of vision are. The other, known as egomotion, describes the 3D movement of the robot in relation to its environment.
“Any robot navigating within a space needs to know how far static and dynamic objects are in relation to itself, as well as how its motion changes a scene,” says Kelly. “For example, when a train moves along a track, a passenger looking out a window can observe that objects at a distance appear to move slowly, while objects nearby zoom past.”
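The train-window effect can be put into numbers with a small sketch. It assumes an idealized pinhole camera that slides sideways without rotating; the function name `image_shift_px` and the focal length, step size and depths are illustrative values chosen here, not figures from the study.

```python
def image_shift_px(focal_px, sideways_move_m, depth_m):
    """Apparent image shift, in pixels, of a static point seen by a pinhole
    camera that slides sideways (parallel to the image plane) without rotating.

    Assumed model: shift = focal length * camera displacement / depth.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * sideways_move_m / depth_m


# Illustrative values only (assumptions, not data from the article).
FOCAL_PX = 500.0  # focal length expressed in pixels
MOVE_M = 1.0      # sideways camera movement between two frames, in metres

for label, depth_m in [("nearby fence post", 5.0), ("distant hill", 500.0)]:
    shift = image_shift_px(FOCAL_PX, MOVE_M, depth_m)
    print(f"{label}: depth {depth_m} m, image shift {shift} px")
```

Under these assumptions the nearby point moves 100 pixels across the image while the distant one moves a single pixel, which is why image motion carries information about depth.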
The challenge is that in many current systems, depth estimation is separated from motion estimation – there is no explicit sharing of information between the two neural networks. Joining depth and motion estimation together ensures that each is consistent with the other.
“There are constraints on depth that are defined by motion, and there are constraints on motion that are defined by depth,” says Kelly. “If the system doesn’t couple these two neural network components, then the end result is an inaccurate estimate of where everything is in the world and where the robot is in relation.”
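One common way to express this mutual constraint is a reprojection check: take a pixel from one image, place it in 3D using the estimated depth, move the camera by the estimated motion, and see whether the point lands where it was actually observed in the next image. The sketch below is a simplified illustration of that idea, not the researchers' network. It assumes a pinhole camera and a camera motion that is a pure translation, and the names `reproject` and `reprojection_error_px` and all numeric values are hypothetical.

```python
import math


def reproject(pixel, depth_m, translation_m, focal_px, centre_px):
    """Predict where a frame-1 pixel appears in frame 2.

    Assumes a pinhole camera and a camera motion that is a pure translation
    (no rotation), expressed in the frame-1 camera coordinates.
    """
    u, v = pixel
    cx, cy = centre_px
    # Back-project the pixel to a 3D point using the depth estimate.
    x = (u - cx) * depth_m / focal_px
    y = (v - cy) * depth_m / focal_px
    z = depth_m
    # Express the same point relative to the camera after it has moved.
    tx, ty, tz = translation_m
    x2, y2, z2 = x - tx, y - ty, z - tz
    if z2 <= 0:
        raise ValueError("point is behind the camera after the motion")
    # Project back onto the image plane of frame 2.
    return (focal_px * x2 / z2 + cx, focal_px * y2 / z2 + cy)


def reprojection_error_px(pixel_1, pixel_2, depth_m, translation_m,
                          focal_px, centre_px):
    """Distance in pixels between the predicted and the observed frame-2 pixel."""
    u2, v2 = reproject(pixel_1, depth_m, translation_m, focal_px, centre_px)
    return math.hypot(u2 - pixel_2[0], v2 - pixel_2[1])


# Illustrative values only (assumptions, not data from the study).
FOCAL_PX = 500.0
CENTRE_PX = (320.0, 240.0)
PIXEL_1 = (420.0, 240.0)  # where the point is seen in frame 1
PIXEL_2 = (370.0, 240.0)  # where the same point is seen in frame 2

for depth_m, translation_m in [
    (5.0, (0.5, 0.0, 0.0)),   # depth and motion agree with the observation
    (10.0, (0.5, 0.0, 0.0)),  # same motion, but the depth is overestimated
]:
    error = reprojection_error_px(PIXEL_1, PIXEL_2, depth_m, translation_m,
                                  FOCAL_PX, CENTRE_PX)
    print(f"depth {depth_m} m, motion {translation_m}: error {error:.1f} px")
```

With these numbers the first pair reprojects exactly onto the observed pixel, while the second misses by 25 pixels. A depth estimate and a motion estimate produced independently have no such check tying them together, which is the inconsistency Kelly describes.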
In a recent study, two of Kelly’s students – Brandon Wagstaff, a PhD candidate, and former PhD student Valentin Peretroukhin – investigated and improved on existing structure from motion methods.
Their new system makes the egomotion prediction a function of depth, increasing the system’s overall accuracy and reliability. The work was presented at the International Conference on Intelligent Robots and Systems (IROS) in Kyoto, Japan.
“Compared with existing learning-based methods, our new system was able to reduce the motion estimation error by approximately 50 per cent,” says Wagstaff.
“This improvement in motion estimation accuracy was demonstrated not only on data similar to that used to train the network, but also on significantly different forms of data, indicating that the proposed method was able to generalize across many different environments.”
Maintaining accuracy when operating within novel environments is challenging for neural networks. The team has since expanded their research beyond visual motion estimation to include inertial sensing – an extra sensor that is akin to the vestibular system in the human ear.
“We are now working on robotic applications that can mimic a human’s eyes and inner ears, which provides information about balance, motion and acceleration,” says Kelly.
“This will enable even more accurate motion estimation to handle situations like dramatic scene changes – such as an environment suddenly getting darker when a car enters a tunnel, or a camera failing when it looks directly into the sun.”
The potential applications for such new approaches are diverse, from improving the handling of self-driving vehicles to enabling aerial drones to fly safely through crowded environments to deliver goods or carry out environmental monitoring.
“We are not building machines that are left in cages,” says Kelly. “We want to design robust robots that can move safely around people and environments.”