230 likes | 345 Views
http://ai.stanford.edu/~asaxena/rccar/. High Speed Obstacle Avoidance using Monocular Vision and Reinforcement Learning Jeff Michels Ashutosh Saxena Andrew Y. Ng Stanford University. ICML 2005. Problem. Drive a remote control car at high speeds Unstructured outdoor environments
E N D
http://ai.stanford.edu/~asaxena/rccar/ High Speed Obstacle Avoidance using Monocular Vision and Reinforcement LearningJeff MichelsAshutosh SaxenaAndrew Y. NgStanford University ICML 2005.
Problem • Drive a remote control car at high speeds • Unstructured outdoor environments • Off the shelf hardware, inexpensive cameras and little processing power • Vision and Driving Control Jeff Michels, Ashutosh Saxena & Andrew Y. Ng
Prior Work: Vision • Estimating depth from multiple images: • Stereovision (e.g., Scharstein & Szeliski, 2002) • Depth from Defocus (e.g., Klarquist et al., 1995) • Optical Flow/Structure from motion (e.g., Barron et al., 1994) • Motivation #1: Monocular vision. • Stereo vision has limits • baseline distance between cameras • vibration and blur • We would like to explore the use of monocular cues. Jeff Michels, Ashutosh Saxena & Andrew Y. Ng
Prior Work: Driving Control • Driving • Stereo-vision for driving (LeCun, 2003) • Highways with clear lane markings (Pomerleau, 1989) • Single camera for indoor robot, but known color and texture of ground (Gini & Marchi, 2002) • Motivation #2: Reinforcement learning • Many past successes used model-based RL. • Does model-based RL still make sense even for tasks requiring complex perception? • (To simulate vision input, we need to use computer graphics!) Jeff Michels, Ashutosh Saxena & Andrew Y. Ng
Approach Vision System • Estimate distance to nearest obstacle in each possible steering direction. Driving Control • Map from the output of the vision system into steering commands for the car. • Use reinforcement learning to learn the policy. Jeff Michels, Ashutosh Saxena & Andrew Y. Ng
Vision System: Training Data • Image divided into vertical columns corresponding to possible steering directions. • Image labeled with depth for each vertical column • Laser range finder -- ground truth distances Jeff Michels, Ashutosh Saxena & Andrew Y. Ng
Vision System: Monocular Cues • Monocular Cues used by humans for depth perception • Texture Variations - Laws’ • Texture Gradient (Linear Perspective) - Radon, Harris • Haze - Color • Occlusion • Known Object Size (Loomis, Nature 2001) Jeff Michels, Ashutosh Saxena & Andrew Y. Ng
Feature Vector: Monocular Cues • Texture Variation • Texture Gradient • Occlusion, Object Size, Global structure • Overlapping windows • Appending adjacent stripe’s vectors • The feature vector size is 858 dimension Jeff Michels, Ashutosh Saxena & Andrew Y. Ng
Learning Algorithm • Supervised learning to estimate the distance d in each column of the image. • Learn weights w via ordinary least squares with quadratic cost. arg minw ∑i (di - wT xi )2 • Other regression methods (SVR, robust regression) gave similar results weights depth i = columns, images features Jeff Michels, Ashutosh Saxena & Andrew Y. Ng
Results: Learning Depth Estimates • Errors on a log scale E = ∑ | log10(d) – log10(destimated)| • Able to predict depth with a average error of 0.26 orders of magnitude. Jeff Michels, Ashutosh Saxena & Andrew Y. Ng
Synthetic Graphics Data • Graphics images for training the vision system. • Variable degree of graphical realism • Can a system trained on synthetic images predict distances on real images? Jeff Michels, Ashutosh Saxena & Andrew Y. Ng
Results: Combined Vision System • When the distance to nearest obstacle in the chosen direction is less than 5 m, then it is a hazard. • Hazard rate improves by combining the real and synthetic trained system. 24% hazard rate reduction over using only real images. Jeff Michels, Ashutosh Saxena & Andrew Y. Ng
Control: Reinforcement Learning • Model based RL -- hard perception problem • Randomly generated environment in graphics simulator • Pegasus (Ng & Jordan, 2000) to learn control policy • Car initialized at (0,0) and ran for fixed time horizon. • Learning algorithm converged after 1674 iterations of policy search. Jeff Michels, Ashutosh Saxena & Andrew Y. Ng
Reinforcement Learning: Parameters • 1: spatial smoothing of predicted distances • 2: threshold distance for evasive action • 3: steering angle parameter • 4, 5: evasive action parameters • 6: throttle parameter Jeff Michels, Ashutosh Saxena & Andrew Y. Ng
Results: Actual Driving Experiments Jeff Michels, Ashutosh Saxena & Andrew Y. Ng
Results: Driving Times Jeff Michels, Ashutosh Saxena & Andrew Y. Ng
Summary • Monocular depth estimation is an interesting and important problem. • Supervised learning for depth estimation. • Model-based RL, using computer graphics simulator, to learn controller. Jeff Michels, Ashutosh Saxena & Andrew Y. Ng
Extensions/Future Work • Learn complete depth maps • Markov Random Field (MRF) to estimate depths. Learning depth from single monocular images, Ashutosh Saxena, Sung H. Chung, Andrew Y. Ng. In NIPS 2005. Image Ground Truth Predicted Jeff Michels, Ashutosh Saxena & Andrew Y. Ng [also with Sung Chung.]
Contact: Ashutosh Saxena, asaxena@cs.stanford.edu http://ai.stanford.edu/~asaxena/rccar/ http://ai.stanford.edu/~asaxena/learningdepth/ Jeff Michels, Ashutosh Saxena & Andrew Y. Ng