Place Recognition and Lifelong Mapping

Kurt Konolige, James Bowman, JD Chen, Patrick Mihelich (Willow Garage)
Michael Calonder, Vincent Lepetit, Pascal Fua (Ecole Polytechnique Fédérale de Lausanne)
Konolige et al., View-Based Maps, RSS, 2009
Konolige and Bowman, Lifelong Visual Maps, IROS, 2009
Konolige et al., Mapping, Navigation and Learning for Off-road Traversal, JFR, 2008
Konolige and Agrawal, FrameSLAM: from Bundle Adjustment to Realtime Visual Mapping, TRO, 2008

Willow Garage • PR2 Mobile Manipulation Platform • Open-source robotics software • ROS • OpenCV • Robotics and vision algorithms

From 2D laser maps to VIEW MAPS • Locally metric • Global manifold [figure labels: p1, p2, p3]

VSLAM by VIEW MAPS • View Maps: set of stereo views connected by nonlinear Gaussian constraints • Continuous recognition • Continuous detection • Locally metric • Global manifold • TORO [Grisetti et al.] [figure labels: p1, p2, p3]

Crusher Visual Odometry • Stereo: [Matthies, Lacroix, Agrawal, Comport, …] • Monocular: [Nister05, …] • Multi-frame: [Engels06, Mouragnon06, …] • CRUSHER: Carnegie Mellon NREC vehicle • 5 km autonomous traverse • Rough terrain • Log file data

Place Recognition: Vocabulary Trees [Nister and Stewenius CVPR06] • “Bag of words” retrieval • Vocab tree created offline • For recognition: - Image keypoints extracted - Tree encodes approximate NN search - Inverted index of images at leaves • Related work: [Cummins and Newman ICRA07, Callmer et al. ACRA08, Fraundorfer et al. IROS07, Eade and Drummond BMVC08, Williams et al. ICCV07] [Image from Nister and Stewenius CVPR06]
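The inverted-index retrieval step named on this slide can be sketched in a few lines. This is a minimal illustration, not the Nister and Stewenius implementation: it assumes keypoint descriptors have already been quantized to integer visual-word ids (the job of the vocabulary tree), and it uses a simple unnormalized tf-idf score. The class `InvertedIndex` and the image names are hypothetical.

```python
import math
from collections import Counter, defaultdict


class InvertedIndex:
    """Toy bag-of-words image index keyed by visual-word id (hypothetical helper)."""

    def __init__(self):
        self.postings = defaultdict(dict)  # word id -> {image id: term count}
        self.num_images = 0

    def add(self, image_id, words):
        """Register an image described by a list of quantized visual-word ids."""
        for word, count in Counter(words).items():
            self.postings[word][image_id] = count
        self.num_images += 1

    def query(self, words, top_k=3):
        """Rank database images by summed tf-idf weights of the words they share."""
        scores = defaultdict(float)
        for word, q_count in Counter(words).items():
            posting = self.postings.get(word)
            if not posting:
                continue  # word never seen in the database
            idf = math.log(self.num_images / len(posting))
            for image_id, d_count in posting.items():
                scores[image_id] += q_count * d_count * idf * idf
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:top_k]


# Three database images and one query that shares most words with "hall".
index = InvertedIndex()
index.add("hall", [1, 2, 2, 7])
index.add("lab", [3, 4, 4, 9])
index.add("door", [1, 5, 6, 9])
best = index.query([2, 2, 7, 1])
```

Only images that share at least one word with the query are ever touched, which is why the lookup cost stays low as the database grows.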

Place Recognition: Vocabulary Trees [Nister and Stewenius] • “Bag of words” • Vocab tree created offline • New images queried and added online [figure: performance on indoor dataset]

Geometric Check How good a rejection filter is the geometric check?

Kidnapped Robot / Relocalization

Trajectory synthesis

Indoor VSLAM with View Maps

Place Recognition after 1 week

Challenges • Robust place recognition • Use more stable features, e.g., lines [Jana Kosecka] • Learn discriminating features with their geometry • Relax the geometry • Sub-parts: chairs, tables can move • No geometry, e.g., FAB-MAP [Cummins and Newman] • Map repair: how to integrate new information • Update local metric maps with changes • What happens when PR fails?

Visual environment change • Challenges for lifelong maps: • Map stitching • Map repair • View deletion • Robust recognition

View deletion strategy • View clusters - Distance measure between views - c(v,v’) = k/m – 1, k inliers in m matches - A cluster of set S is a maximally connected subset of S - Neighborhood of v is a set S reachable from v within a distance n_d and angle n_a • LRU algorithm - Max size Q for any neighborhood - Preferentially thin clusters - Delete oldest clusters if necessary
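One possible reading of the LRU bullets above, as a rough sketch rather than the authors' algorithm: while a neighborhood holds more than Q views, remove the least recently used view from the largest multi-view cluster, and once every cluster is a single view, remove the oldest view overall. The function names, the `linked` predicate (standing in for "close enough under the distance measure") and the `age` table are all assumptions made for illustration.

```python
def clusters(views, linked):
    """Connected components of `views` under the symmetric relation `linked`."""
    remaining, result = set(views), []
    while remaining:
        stack = [remaining.pop()]
        component = set(stack)
        while stack:
            v = stack.pop()
            near = {u for u in remaining if linked(v, u)}
            remaining -= near
            component |= near
            stack.extend(near)
        result.append(component)
    return result


def thin_neighborhood(views, age, linked, max_size):
    """Delete views until at most `max_size` remain; returns the survivors."""
    kept = set(views)
    while len(kept) > max_size:
        groups = clusters(kept, linked)
        largest = max(groups, key=len)
        # Thin a multi-view cluster first; otherwise drop the oldest single view.
        pool = largest if len(largest) > 1 else kept
        kept.remove(max(pool, key=lambda v: age[v]))
    return kept


# Views 1-2-3 form one chain-linked cluster; 4 and 5 are singleton clusters.
pairs = {frozenset((1, 2)), frozenset((2, 3))}
linked = lambda a, b: frozenset((a, b)) in pairs
age = {1: 10, 2: 5, 3: 1, 4: 20, 5: 2}  # time since last use, hypothetical units
survivors = thin_neighborhood([1, 2, 3, 4, 5], age, linked, max_size=3)
```

In this example the crowded cluster is thinned down to its freshest view before any singleton is touched, so the old but unique view 4 survives at Q = 3.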

Visual Odometry • Stereo: [Matthies, Lacroix, Agrawal, Comport, …] • Monocular: [Nister05, …] • Multi-frame: [Engels06, Mouragnon06, …] - no registration - high precision • Indoor: Willow Garage PR2 • 1 km indoor trajectories • Online

Urban Scenes [images courtesy Andrew Comport, INRIA] • Outdoor sequence in Versailles • 1 m stereo baseline, narrow FOV • ~400 m sequence • Average frame distance: 0.6 m • Max frame distance: 1.1 m • 30 - 88 Hz implementation

LAGR [Learning Applied to Ground Robots]: Autonomous Off-Road Terrain Traversal • 200 m autonomous traverse • Off-road terrain • 15 Hz implementation

Visual SLAM Optimal solution: Bundle Adjustment • ~1000 camera poses • ~1M 3D points • Several days to solve • NxN image matching

Visual SLAM • Landmarks: - EKF Visual SLAM [Davison02, Sim03, Solá05, …]: small-scale, O(n^2); robustness? - FastSLAM [Se03, Eade07, Howard07]: large-scale, O(log n) - Hybrid (PTAM, Submaps, SWF) [Klein07, Eade07, Sibley07]: small scale • Frames: - Frame-based SLAM [Lu+Milios97, Gutmann99, Grisetti07, Konolige07/08]: large-scale, O(n); robustness • ~1000 camera poses • ~1M 3D points • Several days to solve • NxN image matching

Vision Tasks, realtime • Local Maps – Long-range motion estimation • Global Maps – Place recognition and local map re-use [Andrew Comport ICRA 2007]

Visual Odometry for Motion Estimation • Stereo: [Matthies, Lacroix, Agrawal, Comport, …] • Monocular: [Nister05, …] • Multi-frame: [Engels06, Mouragnon06, …] - no registration - precision? • Local Maps: no registration • Long-range motion estimation • GPS-less estimation [figure labels: p1, p2, p3, q1, q2, q3]

6DOF Visual Odometry Principle (SfM)

Visual Odometry [figure: left and right images at times T and T+1] • Extract features - Harris, FAST, SIFT, CenSurE • Match features - DETECTION, not TRACKING - Across successive left images - Stereo: across left/right stereo images • Find largest consistent subset of matches - Stereo: 3 non-collinear matches yield motion estimate - Monocular: 5 matches yield motion estimate* - RANSAC method • Bundle adjust last N frames and their feature tracks
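The "largest consistent subset" step for the stereo case can be sketched with NumPy as follows. This is a simplified illustration under stated assumptions: matched features are already triangulated to 3D points in both frames, inliers are judged by 3D distance rather than image reprojection error, and the function names, tolerance and iteration count are made up for the example.

```python
import numpy as np


def rigid_transform(p, q):
    """Least-squares R, t with q ~ R @ p + t (Kabsch method); p, q are (n, 3)."""
    pc, qc = p.mean(axis=0), q.mean(axis=0)
    u, _, vt = np.linalg.svd((p - pc).T @ (q - qc))
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, qc - r @ pc


def ransac_motion(p, q, iters=200, tol=0.05, seed=0):
    """Hypothesize motion from 3 random matches, keep the largest inlier set."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(p), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(p), size=3, replace=False)
        r, t = rigid_transform(p[idx], q[idx])
        inliers = np.linalg.norm(p @ r.T + t - q, axis=1) < tol
        if inliers.sum() > best.sum():
            best = inliers
    r, t = rigid_transform(p[best], q[best])  # refit on all inliers
    return r, t, best


# Synthetic check: 40 matches, the first 10 corrupted to mimic bad matches.
rng = np.random.default_rng(1)
p = rng.uniform(-5, 5, size=(40, 3))
c, s = np.cos(0.1), np.sin(0.1)
r_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.5, 0.0, 0.1])
q = p @ r_true.T + t_true
q[:10] += rng.uniform(1, 3, size=(10, 3))
r_est, t_est, inliers = ransac_motion(p, q)
```

The final refit over all inliers plays the role that bundle adjustment over the last N frames plays on the slide, in a much reduced form.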

Challenge of Outdoor Environments 5 Datasets - 3 km to 6 km trajectories (autonomous) - 10 Hz stereo, 1 m baseline - Max movement typically 0.8 m - RTK GPS for ground truth

Solutions • Goal: 5 m error in 5 km (0.1%); 1 mrad ~ 0.06 deg • 1. Minimize local drift - Center-surround features for detection stability - Incremental BA - Calibration (remove bias) • 2. Minimize global angular drift - Lever-arm problem - IMU accelerometers give global tilt/roll - Low-drift IMU for yaw drift - Visual SLAM for loop closure
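The lever-arm figures on this slide follow from the small-angle approximation. As a worked check, assuming a constant heading error θ held over a straight traverse of length L (an idealization, not a claim about the actual runs):

```latex
e \approx L\,\theta = 5000\ \mathrm{m} \times 10^{-3}\ \mathrm{rad} = 5\ \mathrm{m},
\qquad
10^{-3}\ \mathrm{rad} \times \frac{180^\circ}{\pi} \approx 0.057^\circ \approx 0.06^\circ
```

So meeting the 0.1% goal over 5 km leaves a budget of roughly 1 mrad of accumulated angular error, which is why global angular drift gets its own set of countermeasures.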

Stable Feature Detection: Corners vs. Center-surround
Harris, FAST: ~8 ms
Scaled: SIFT, SURF: ~300 ms, ~150 ms
CenSurE: ~15 ms
Agrawal, Blas, Konolige, CenSurE: Center-surround extrema for realtime feature detection and matching, ECCV 2008

Error and Calibration • Camera to vehicle transform T misalignment • Stereo system miscalibration => bias [figures: camera and vehicle frames related by T; trajectory plots, axes in m]

Results, VO • 5 km runs • RTK GPS ground truth [figure panels: Run 1, Run 2]

IMU vs. VO • IMU: - High XYZ drift from accelerometers (grows as t^2) - Global gravity normal (noisy), corrects tilt/roll - Low drift yaw angle (~1 deg/hr, tactical grade IMU)

VO + IMU
Dataset               Length   RMS error        MAX error
course1-DTED4-run2    3129 m   5.70 m (0.18%)   10.06 m (0.32%)
course2B-DTED4-run4   6440 m   5.10 m (0.08%)    8.19 m (0.13%)
course2B-DTED5-run1   4712 m   6.09 m (0.13%)   10.70 m (0.23%)
course3-DTED5-run1    5293 m   4.85 m (0.09%)    8.58 m (0.16%)
course3-DTED4-run1    4920 m   9.16 m (0.19%)   15.30 m (0.31%)
[diagram labels: VO, IMU, Filter (EKF), predict, update; movie: movieIMU.mov]
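A much reduced sketch of the predict/update split, reading the diagram as VO driving the filter's predict step and the IMU driving its update step (that reading is an assumption). The state here is a single yaw angle, which makes the filter a linear scalar Kalman filter rather than a full EKF over pose; all noise values and the simulated VO bias are invented for illustration.

```python
def predict(yaw, var, vo_delta, vo_var):
    """Propagate the yaw estimate with a VO increment; uncertainty grows."""
    return yaw + vo_delta, var + vo_var


def update(yaw, var, imu_yaw, imu_var):
    """Correct the estimate with an IMU yaw reading; uncertainty shrinks."""
    gain = var / (var + imu_var)
    return yaw + gain * (imu_yaw - yaw), (1.0 - gain) * var


true_yaw = vo_yaw = yaw = 0.0
var = 0.0
for _ in range(100):
    true_yaw += 0.010  # ground-truth yaw increment (rad)
    vo_yaw += 0.011    # VO-only dead reckoning with a small simulated bias
    yaw, var = predict(yaw, var, 0.011, 1e-6)
    yaw, var = update(yaw, var, true_yaw, 1e-4)  # idealized, noise-free IMU reading

vo_error = abs(vo_yaw - true_yaw)
fused_error = abs(yaw - true_yaw)
```

In this toy run the VO-only yaw error grows linearly with the number of steps, while the fused estimate settles to a small bounded error, which mirrors the slide's point that a low-drift yaw reference limits angular drift.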

VO Conclusion • 1. Visual Odometry can provide precise trajectories in GPS-less environments - Good features have high frame match rates - Incremental bundle adjustment improves accuracy - ~5 cm / √m, ~0.15 deg / √m • 2. Integration with IMU is necessary for large-scale precision - Noisy gravity normal corrects tilt/roll - High-quality IMU for yaw correction

Visual SLAM using Skeletons • Local registration is a small optimization problem (VO) • Loop closure is a larger but reducible optimization problem

Marginalization [figure labels: c, q, z]
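Marginalization of this kind is commonly carried out with a Schur complement on the normal equations. The sketch below assumes the figure's c denotes camera-frame variables and q feature variables (an interpretation of the labels, not stated on the slide), and shows that eliminating the q block leaves a smaller system with the same solution for c. The helper name `marginalize` and the random test system are hypothetical.

```python
import numpy as np


def marginalize(h, b, keep, drop):
    """Schur complement: eliminate `drop` variables from the symmetric system h x = b."""
    h_kk = h[np.ix_(keep, keep)]
    h_kd = h[np.ix_(keep, drop)]
    h_dd = h[np.ix_(drop, drop)]
    gain = h_kd @ np.linalg.inv(h_dd)
    return h_kk - gain @ h_kd.T, b[keep] - gain @ b[drop]


# Random symmetric positive-definite system: 2 "frame" and 3 "feature" variables.
rng = np.random.default_rng(0)
a = rng.normal(size=(5, 5))
h = a @ a.T + 5.0 * np.eye(5)
b = rng.normal(size=5)
keep, drop = [0, 1], [2, 3, 4]

x_full = np.linalg.solve(h, b)
h_red, b_red = marginalize(h, b, keep, drop)
x_red = np.linalg.solve(h_red, b_red)
```

The reduced 2x2 system reproduces the frame part of the full 5x5 solution, which is the sense in which feature variables can be removed without changing the frame estimate at the linearization point.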

Long-Baseline Matching • Match using CenSurE features • Good matches up to 10 m baseline • High sensitivity • High selectivity • High accuracy • Not invariant to Z-axis rotation [figure: Frame 9 vs. Frame 463, 6.42 m distance, 866 features, 315 matched, 101 inliers]

FrameSLAM Results, Versailles Rond • 133 frames, 29 links • 35 ms PCG [figure panels: VO result, FrameSLAM result]

FrameSLAM Results, Indoor Lab [courtesy Robert Sim] • Indoor lab sequence • 12 cm stereo baseline, wide FOV • ~100 m sequence, ~8200 key frames • 17 tack points in the VSLAM graph

FrameSLAM Results, Indoor Lab [courtesy Robert Sim] • Indoor lab sequence • 12 cm stereo baseline, wide FOV • ~100 m sequence, ~8200 key frames • Green crosses are uncorrected VO; cyan environment points • Red segments are VSLAM-corrected poses; blue environment points


FrameSLAM Results, Crusher 5K x 2 • 42K key frames • 2.2K link frames • 286 links • 3.3 s PCG [figure legend: VO run 1, VO run 2, RTK GPS run 1]

Small-area 3D Reconstruction Leaving Flatland Morisset, Subramanian [SRI] Rusu [TUM]

3D Reconstruction Pipeline [diagram labels: VSLAM Maps; IMU, Odometry; Stereo images; Hokuyo point cloud; 3D Pose estimation; Place recognition; Octree voxels; Meshes; Registered Point Clouds; Planes]

FrameSLAM Conclusion • VO provides accurate local registration • Reduction to frame-frame constraints eliminates all feature variables • => approximation • Further reduction of frames to skeletons gives a compact system • => Large systems can be solved quickly • Some method of place recognition is required for closing loops • In small areas, realtime 3D reconstruction • Place recognition methods: Many … [Ishiguro01, Ulrich00, Barbosa02, …] Recent: [Cummins07, Pronobis06, …]
