350 likes | 575 Views
Acquiring 3D Indoor Environments with Variability and Repetition. Young Min Kim Stanford University. Niloy J. Mitra UCL / KAUST. Dong-Ming Yan KAUST. Leonidas Guibas Stanford University. Data Acquisition via Microsoft Kinect. Our tool: Microsoft Kinect. Real-time
E N D
Acquiring 3D Indoor Environments with Variability and Repetition Young Min Kim Stanford University Niloy J. Mitra UCL/ KAUST Dong-Ming Yan KAUST LeonidasGuibas Stanford University
Data Acquisition via Microsoft Kinect Our tool: Microsoft Kinect • Real-time • Provides depth and color • Small and inexpensive Raw data: • Noisy point clouds • Unsegmented • Occlusion issues
Dealing with Pointcloud Data • Object-level reconstruction • Scene-level reconstruction [Chang and Zwicker 2011] [Xiao et. al. 2012]
Mapping Indoor Environments • Mapping outdoor environments • Roads to drive vehicles • Flat surfaces • General indoor environments contain both objects and flat surfaces • Diversity of objects of interest • Objects are often cluttered • Objects deform and move Solution: Utilize semantic information
Nature of Indoor Environments • Man-made objects can often be well-approximated by simple building blocks • Geometric primitives • Low DOF joints • Many repeating elements • Chairs, desks, tables, etc. • Relations between objects give good recognition cues
Indoor Scene Understanding with Pointcloud Data • Patch-based approach • Object-level understanding [Koppula et. al. 2011] [Silberman et. al. 2012] [Shao et. al. 2012] [Nan et. al. 2012]
Comparisons [1] An Interactive Approach to Semantic Modeling of Indoor Scenes with an RGBD Camera [2] A Search-Classify Approach for Cluttered Indoor Scene Understanding
Contributions • Novel approach based on learning stage • Learning stage builds the model that is specific to the environment • Build an abstract model composed of simple parts and relationship between parts • Uniquely explain possible low DOF deformation • Recognition stage can quickly acquire large-scale environments • About 200ms per object
Approach • Learning: Build a high-level model of the repeating elements translational rotational • Recognition: Use the model and relationship to recognize the objects
Approach • Learning • Build a high-level model of the repeating elements
Output Model: Simple, Light-Weighted Abstraction • Primitives • Observable faces • Connectivity • Rigid • Rotational • Translational • Attachment • Relationship • Placement information rotational translational contact
Joint Matching and Fitting • Individual segmentation • Group by similar normals • Initial matching • Focus on large parts • Use size, height, relative positions • Keep consistent match • Joint primitive fitting • Add joints if necessary • Incrementally complete the model
Approach • Learning • Build a high-level model of the repeating elements
Approach • Learning • Build a high-level model of the repeating elements • Recognition • Use the model and relationship to recognize the objects
Hierarchy • Ground plane and desk • Objects • Isolated clusters • Parts • Group by normals • The segmentation is approximate and to be corrected later
Bottom-Up Approach • Initial assignment for parts vs. primitives • Simple comparison of height, normal, size • Robust to deformation • Low false-negatives • Refined assignment for objects vs. models • Iteratively solve for position, deformation and segmentation • Low false-positives parts
Bottom-Up Approach • Initial assignment for parts vs. primitive nodes • Refined assignment for objects vs. models Input points Models matched Initial objects Refined objects objects parts matched
Results Data available: http://www0.cs.ucl.ac.uk/staff/n.mitra/research/acquire_indoor/paper_docs/data_learning.zip http://www0.cs.ucl.ac.uk/staff/n.mitra/research/acquire_indoor/paper_docs/data_recognition.zip
Synthetic Scene Recognition speed: about 200ms per object
Similar pair Different pair
Similar pair Different pair
Office 1 2 monitors 4 chairs trash bin 2 whiteboards
Deformations missed monitor laptop monitor chair drawer deformations
Auditorium 1 Open table
Auditorium 2 Open table Open chairs
Seminar Room 1 missed chairs
Seminar Room 2 missed chairs
Limitations • Missing data • Occlusion, material, … • Error in initial segmentation • Cluttered objects are merged as a single segment • View-point sometimes separate single object into pieces
Conclusion • We present a system that can recognize repeating objects in cluttered 3D indoor environments. • We used purely geometric approach based on learned attributes and deformation modes. • The recognized objects provide high-level scene understanding and can be replaced with high-quality CAD models for visualization (as shown in the previous talks!)
Thank You • Qualcomm Corporation • Max Planck Center for Visual Computing and Communications • NSF grants 0914833 and 1011228 • a KAUST AEA grant • Marie Curie Career Integration Grant 303541 • Stanford Bio-X travel Subsidy