3D Object Recognition Using Computer Vision. VanGogh Imaging, Inc. Kenneth Lee. CEO/Founder klee@vangoghimaging.com. Corporate Overview. Founded in 2007, located in McLean VA
3D Object Recognition Using Computer Vision VanGogh Imaging, Inc.
Kenneth Lee CEO/Founder klee@vangoghimaging.com
Corporate Overview
• Founded in 2007, located in McLean, VA
• Mission: “Provide easy to use, real-time 3D computer vision (CV) technology for embedded and mobile applications”
  • 2D to 3D for better visualization, higher reliability, and accuracy
  • Solve problems that require spatial measurements (e.g., parts inspection)
• Target customer: Application and System Developers
  • Enhance existing product or develop new products
• Product: ‘Starry Night’ 3D-CV Middleware (Unity Plugin)
  • Operating Systems: Android and Linux
  • 3D Sensor: Occipital Structure and Intel RealSense
  • Processors: ARM and Xilinx Zynq
• Our focus
  • Object recognition
  • Feature detection
  • Analysis (e.g., measurements)
Potential Applications 3D Printing Robotics Parts Inspection Security Entertainment Medical Imaging Automotive Safety
Challenges for Implementing Real-Time 3D Computer Vision
• Busy uncontrolled real-world environment
• Limited processing power and memory
• Noisy and uncalibrated low-cost scanners
• Difficult to use libraries
• Hard to find proficient computer vision engineers
• Lack of standards
• Large development investment
Starry Night Unity Plugin (patent pending)
Starry Night Video: https://www.youtube.com/watch?v=IZX-9PH7Erw&feature=youtu.be
The ‘Starry Night’ Template-Based 3D Model Reconstruction
• Reliable – The output is always a fully-formed 3D model with known feature points despite noisy or partial scans
• Easy to use – Fully automated process
• Powerful – Known data structure for easy analysis and measurement
• Fast – Real-time modeling
Input Scan (Partial) + Reference Model = Full 3D Model
3D Object Recognition Algorithm for mobile and embedded Devices
Challenges - Scene • Busy scene, object orientation, and occlusion
Challenges - Platform
• Mobile and Embedded Devices
  • ARM – A9 or A15, <2 GB RAM
  • Existing libraries were built for laptop/desktop platforms
  • GPU processing is not always available
Previous Approaches
• (2D) Texture-Based Methods
  • Color-based → depends heavily on lighting or color of the object
  • Machine learning → robust, but requires training for each object
  • Neither method provides the transform (i.e., orientation)
• (3D) Methods
  • Hough transform → slow
  • Geometric hashing → even slower
  • Tensor matching → not good for noisy and sparse scenes
  • Correspondence-based methods using rigid geometric descriptors → the models must have distinctive feature points, which is not true for most models (e.g., a cylinder)
Tried
General Concept for CV-Based Object Recognition
[Diagram labels: Reference Object → Descriptor (Distance & Normal); Scene → Distance & Normal of Random Sample Points; Compare; Match Criteria; Orientation, Location, Transpose; Fine-Tune]
Model Descriptor (Pre-Processed)
• Sample all point pairs in the model that are separated by the same distance D
  • Note: In the bear example, D = 5 cm, which resulted in 1,000 pairs
• Use the surface normals of each pair to group the pairs into the hash table
  • Note: The keys are angles derived from the normals of the points
  • alpha (α) = first normal to second point
  • beta (β) = second normal to first point
  • omega (Ω) = angle of the plane between the two points
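The descriptor construction on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not VanGogh Imaging's implementation: the names `angle_deg`, `pair_key`, and `build_model_table`, the distance tolerance `tol`, and the angle bin width `bin_deg` are all hypothetical. Ω is interpreted here as the angle between the two normals, which is an assumption because the slide does not define it precisely.

```python
import math
from collections import defaultdict
from itertools import permutations


def angle_deg(u, v):
    """Angle between two non-zero 3D vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    # Clamp to [-1, 1] to guard against rounding error before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def pair_key(p1, n1, p2, n2, bin_deg=12.0):
    """Quantized (alpha, beta, omega) key for an oriented point pair."""
    d12 = tuple(b - a for a, b in zip(p1, p2))  # first point -> second point
    d21 = tuple(-c for c in d12)                # second point -> first point
    alpha = angle_deg(n1, d12)  # first normal to second point
    beta = angle_deg(n2, d21)   # second normal to first point
    omega = angle_deg(n1, n2)   # assumption: angle between the two normals
    return tuple(int(a // bin_deg) for a in (alpha, beta, omega))


def build_model_table(points, normals, d, tol, bin_deg=12.0):
    """Hash every ordered model pair whose separation is within tol of d."""
    table = defaultdict(list)
    for i, j in permutations(range(len(points)), 2):
        if abs(math.dist(points[i], points[j]) - d) <= tol:
            key = pair_key(points[i], normals[i], points[j], normals[j], bin_deg)
            table[key].append((i, j))
    return table


# Tiny hypothetical model: three points on a plane, all normals pointing up.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
normals = [(0.0, 0.0, 1.0)] * 3
table = build_model_table(points, normals, d=1.0, tol=0.01)
```

Both orderings of each pair are stored because α and β swap when the pair is reversed. The all-pairs loop is quadratic in the number of model points, which is acceptable only because the table is built once, offline.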
Object Recognition Workflow
• Grab the scene
  • Note: The example scene has around 16K points
• Sample point pairs with distance D using RANSAC
  • Note: We iterated this sampling process 100 times
• Generate the key using the same hash function
  • Note: The entire process can be easily parallelized
• Use the key to retrieve similarly oriented points in the model and a rough transform
  • Very important: Multiple models can be found using a single hash table (for example, from a single sampled point pair in the scene)
• Apply the match criteria to find the best match
• Use ICP to refine the transform
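The sampling and lookup steps of this workflow can be sketched as follows. This is a simplified illustration under stated assumptions, not the presented implementation: it only counts hash-table hits per model, and it omits the rough transform, the match criteria, and the ICP refinement. All names (`add_model`, `vote_for_models`, `tol`, `bin_deg`, `seed`) are hypothetical, and the third key angle is assumed to be the angle between the two normals.

```python
import math
import random
from collections import Counter, defaultdict


def angle_deg(u, v):
    """Angle between two non-zero 3D vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def pair_key(p1, n1, p2, n2, bin_deg=12.0):
    """Quantized angle key; the same function is used for model and scene."""
    d12 = tuple(b - a for a, b in zip(p1, p2))
    d21 = tuple(-c for c in d12)
    angles = (angle_deg(n1, d12), angle_deg(n2, d21), angle_deg(n1, n2))
    return tuple(int(a // bin_deg) for a in angles)


def add_model(table, name, points, normals, d, tol):
    """Insert all ordered pairs of one model into the shared hash table."""
    for i in range(len(points)):
        for j in range(len(points)):
            if i != j and abs(math.dist(points[i], points[j]) - d) <= tol:
                key = pair_key(points[i], normals[i], points[j], normals[j])
                table[key].append((name, i, j))


def vote_for_models(table, scene_pts, scene_nrm, d, tol, iterations=100, seed=0):
    """Randomly sample scene pairs at distance d and count model hits."""
    rng = random.Random(seed)
    votes = Counter()
    n = len(scene_pts)
    for _ in range(iterations):
        i = rng.randrange(n)
        partners = [j for j in range(n) if j != i
                    and abs(math.dist(scene_pts[i], scene_pts[j]) - d) <= tol]
        if not partners:
            continue  # no point at distance d from this sample
        j = rng.choice(partners)
        key = pair_key(scene_pts[i], scene_nrm[i], scene_pts[j], scene_nrm[j])
        for name, _, _ in table.get(key, ()):
            votes[name] += 1  # one lookup can hit several models
    return votes


# Two hypothetical models share one table; the scene is a shifted copy of "flat".
table = defaultdict(list)
add_model(table, "flat", [(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (0, 0, 1)], d=1.0, tol=0.01)
add_model(table, "corner", [(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 0)], d=1.0, tol=0.01)
scene_pts = [(10, 5, 0), (11, 5, 0)]
scene_nrm = [(0, 0, 1), (0, 0, 1)]
votes = vote_for_models(table, scene_pts, scene_nrm, d=1.0, tol=0.01)
```

Because every table entry carries its model name, a single sampled scene pair can produce candidates for several models in one lookup. Each iteration is independent of the others, so the loop is straightforward to parallelize.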
Implementation • Result Object Recognition Video: https://www.youtube.com/watch?v=h7whfei0fTw&feature=youtu.be
Object Recognition Examples
Adaptive 3D Object Recognition Algorithm Resize and Reshape
Object Recognition for Different Sizes & Shapes
• Objects in the real world are not always identical
• Similarity Factor, S%, denotes the percentage of shape difference
  • This allows recognition of an object that is similar to, but does not have the exact shape of, the reference model
• Size Factor, Z%, denotes the percentage of size difference that can still be recognized
  • This allows recognition of an object whose size differs from that of the reference model
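One way to read the Size Factor is as a tolerance on the relative scale between the reference model and the object found in the scene. The sketch below is an assumption-laden illustration, not the method used in Starry Night: it estimates scale from the RMS distance of the points to their centroid, and the names `rms_radius` and `within_size_factor` are hypothetical.

```python
import math


def rms_radius(points):
    """Root-mean-square distance of the points from their centroid."""
    n = len(points)
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    return math.sqrt(sum(math.dist(p, centroid) ** 2 for p in points) / n)


def within_size_factor(model_pts, object_pts, z_percent):
    """Return (estimated scale, True if the size difference is within Z%)."""
    scale = rms_radius(object_pts) / rms_radius(model_pts)
    return scale, abs(scale - 1.0) * 100.0 <= z_percent


# Hypothetical data: a unit cube as the model, and the same cube scaled by 1.2.
model = [(x, y, z) for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
scene_object = [(1.2 * x, 1.2 * y, 1.2 * z) for x, y, z in model]

scale, accepted_25 = within_size_factor(model, scene_object, z_percent=25.0)
_, accepted_10 = within_size_factor(model, scene_object, z_percent=10.0)
```

With these inputs the object is accepted at Z = 25% and rejected at Z = 10%, since it is about 20% larger than the reference model.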
General Approach • Dynamically resizes the reference model • Dynamically reshapes the reference model • Uses our ‘Shape-based Registration’ technique • Hence, the reference model is ‘deformed’ to match the object in the scene • Results in very robust object recognition • The end reference model best represents the object in the scene both in size and shape
Block Diagram – Adaptive Object Recognition with feedback • Reference model is iteratively modified with every new frame until it converges into the same object in the scene Note: Currently in the process of being implemented and will be available in Version 1.2 later this year
Object Recognition Performance Numbers
Reliability (w/ bear model)
• % false positives – depends on the scene
  • Clean scene: <1%
  • Noisy scene: 5% (1 out of 20 frames)
• % negative results (cannot find the object)
  • Clean scene: <1%
  • Noisy scene: 10% (also takes longer)
• Effect of orientation on success ratio
  • Model facing front: >99%
  • Model facing backwards: >99%
  • Model facing sideways (narrower): 85%
Performance - Mobile
• Performance on a 2 GHz ARM Cortex-A15 (on Android mobile)
• Amount of time it takes to find one object
  • Single thread: 2 seconds
  • Multi-thread & NEON: 0.3 second
• Amount of time it takes to find two objects
  • Single thread: 2.5 seconds
  • Multi-thread & NEON: 0.5 second
• Note: Effective use of NEON led to significant performance gains of 2.5× for certain functions
Hardware Acceleration Using FPGA
• Xilinx Zynq SoC provides 20 to 1,000 parallel voxel processors depending on the size of the FPGA
[Diagram: Zynq SoC in which the ARM processor feeds scan voxels to parallel voxel processors (1 to 20+) in the FPGA]
Hardware Acceleration: FPGA (Xilinx Zynq)
• Select Functions to Be Implemented in Zynq
  • FPGA: Matrix operations
  • Dual-core ARM: Data management + Floating point
• Entire implementation done in C++ (Xilinx Vivado-HLS)
Performance: Embedded Using FPGA
• Note: Currently, only 30% of the computationally intensive functions are implemented on the FPGA, with the rest still running on the ARM Cortex-A9. Speed will be much improved once the remaining high-intensity functions are transferred to the FPGA.
• Performance on Xilinx Zynq (Cortex-A9 800 MHz + FPGA)
• Amount of time it takes to find one object
  • Zynq 7020: 0.7 second
  • Zynq 7045 (est.): 0.1 second
• No test results for two objects, but the time should scale the same way as on the ARM
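The single-object timings reported on the mobile and embedded performance slides can be put side by side with a few lines of arithmetic. The sketch below only restates those reported numbers; the `summarize` helper and the choice of the single-threaded Cortex-A15 run as the baseline are assumptions made for illustration. The platforms differ in clock speed and core type, so the ratios are indicative only.

```python
# Time to find one object, in seconds, as reported on the performance slides.
timings = {
    "Cortex-A15, single thread": 2.0,
    "Cortex-A15, multi-thread + NEON": 0.3,
    "Zynq 7020": 0.7,
    "Zynq 7045 (estimated)": 0.1,
}


def summarize(timings, baseline_name):
    """Return {name: (detections per second, speedup vs. baseline)}."""
    baseline = timings[baseline_name]
    return {
        name: (1.0 / seconds, baseline / seconds)
        for name, seconds in timings.items()
    }


summary = summarize(timings, "Cortex-A15, single thread")
for name, (rate, speedup) in summary.items():
    print(f"{name}: {rate:.1f} detections/s, {speedup:.1f}x vs. single thread")
```

The same arithmetic applied to the two-object case on the Cortex-A15 (2.5 seconds single-threaded, 0.5 second with multi-threading and NEON) gives a 5× speedup.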
Future
• The chosen algorithm works well in most real-world conditions
• The chosen algorithm is tolerant of size and shape differences with respect to the reference model
• The chosen algorithm can find multiple objects at the same time with minimal additional processing power
• Additional improvements in performance are needed
  • Algorithm
  • Application-specific parameters (e.g., size of the model descriptor)
  • ARM - NEON
  • Optimize the use of the FPGA core
Summary
• Key implementation issues
  • Model descriptor
  • Data structure
  • Sampling technique
  • Platform
• IMPORTANT
  • Both ARM & FPGA provide the scalability
• Therefore
  • Real-time 3D object recognition was very difficult but successfully implemented on both mobile and embedded platforms!
• LIVE DEMO AT THE Xilinx BOOTH!
Resources
• www.vangoghimaging.com
• Android 3D printing: http://www.youtube.com/watch?v=7yCAVCGvvso
• “Challenges and Techniques in Using CPUs and GPUs for Embedded Vision” by Ken Lee, VanGogh Imaging: http://www.embedded-vision.com/platinum-members/vangogh-imaging/embedded-vision-training/videos/pages/september-2012-embedded-vision-summit
• “Using FPGAs to Accelerate Embedded Vision Applications” by Kamalina Srikant, National Instruments: http://www.embedded-vision.com/platinum-members/national-instruments/embedded-vision-training/videos/pages/september-2012-embedded-vision-summit
• “Demonstration of Optical Flow algorithm on an FPGA”: http://www.embedded-vision.com/platinum-members/bdti/embedded-vision-training/videos/pages/demonstration-optical-flow-algorithm-fpg
• * Reference: “An Efficient RANSAC for 3D Object Recognition in Noisy and Occluded Scenes” by Chavdar Papazov and Darius Burschka, Technische Universität München (TUM), Germany.