How does Kinect work? • Po-Hsiang Chen • Advisor: Sheng-Jyh Wang
Major References • Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation • CVPR 2011 Best Paper • Freedman, B., A. Shpunt, et al. (2008). Depth mapping using projected patterns, US2010/0118123A1 • PrimeSense Patent
Outline • What is Kinect? • Kinect Architecture • From IR to depth image • History of Structured Light • PrimeSense Invented Structured Light • From depth image to joint positions • Body Part Inference • Joint Proposals • Experiments and Results • Conclusion • References
What is Kinect? • Motion sensing input device by Microsoft • Depth camera tech. developed by PrimeSense • Invented in 2005 • Software tech. developed by Rare • First announced at E3 2009 as “Project Natal” • Windows SDK Releases: http://www.microsoft.com/en-us/kinectforwindows/discover/features.aspx
Kinect Architecture IR Structured Light Mean Shift Random Decision Forest
Triangulation • Main Problem • To recover shape from multiple views, we need correspondences between the images • The matching/correspondence problem is hard • Occlusions, texture, colors, etc. • Solution: Structured light • Idea: Simplify matching • Strategy: Use illumination to create your own correspondences
Structured Light • Basic Principle • Use a projector to create unambiguous correspondences • Light projection • If we project a single point, matching is unique
Structured Light • Line projection ( Line Scan ) • For calibrated cameras, the epipolar geometry is known • Project a line instead of a single point
Structured Light • Project Multiple Stripes or Grids • Which stripe matches which? • Correspondence Again
Structured Light • Answer 1: Assume Surface Continuity • Ordering Constraint
Structured Light • Answer 2: Coloured stripes (De Bruijn) • Difficult to use for coloured surfaces
Structured Light • Answer 2: Coloured dots (M-array) • Difficult to use for coloured surfaces
Structured Light • Answer 3: Pattern dots (M-array) • Difficult for industrial manufacturing
Structured Light • Answer 4: Time-coded light patterns (Time multiplexing) • Use a sequence of binary patterns → only ⌈log2 N⌉ images are needed for N stripes • Each stripe has a unique binary illumination code
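The time-multiplexing idea can be sketched in a few lines. This is a minimal illustration assuming plain binary codes (no Gray coding, no noise handling); the function names `make_patterns` and `decode` are hypothetical and not taken from any cited work.

```python
def make_patterns(num_stripes):
    """Return binary patterns; pattern k holds bit k of every stripe index."""
    # Number of bits needed to give each of num_stripes a distinct code.
    num_patterns = max(1, (num_stripes - 1).bit_length())
    return [[(stripe >> k) & 1 for stripe in range(num_stripes)]
            for k in range(num_patterns)]


def decode(bits_seen):
    """Recover the stripe index from the on/off sequence seen at one pixel."""
    return sum(bit << k for k, bit in enumerate(bits_seen))


patterns = make_patterns(8)
# A camera pixel lit by stripe 5 observes bit k of 5 in pattern k.
observed = [pattern[5] for pattern in patterns]
stripe = decode(observed)
```

With 8 stripes only 3 projected images are needed, and every stripe decodes to its own index, which is what makes the correspondence unambiguous.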
Structured Light • All of the above are categorized as Discrete Methods • There are many more Continuous Structured Light Methods, such as phase shifting • Salvi, J., S. Fernandez, et al. (2010). "A state of the art in structured light patterns for surface profilometry." Pattern Recognition 43(8): 2666-2680
Structured Light • All of the above are human designed patterns. • Random Speckle • Structured light using randomly generated patterns • May obtain denser depth information by solving correspondence problem
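How a random pattern makes correspondence solvable can be shown with a one-dimensional toy example. The sketch below assumes a noise-free binary pattern and a sum-of-absolute-differences match; `find_shift` and all parameter values are hypothetical, chosen only for illustration.

```python
import random


def find_shift(reference, observed, start, window, max_shift):
    """Find the shift whose observed window best matches the reference window."""
    ref_block = reference[start:start + window]

    def cost(shift):
        obs_block = observed[start + shift:start + shift + window]
        # Sum of absolute differences: 0 means a perfect match.
        return sum(abs(r - o) for r, o in zip(ref_block, obs_block))

    return min(range(max_shift + 1), key=cost)


rng = random.Random(0)
reference = [rng.randint(0, 1) for _ in range(128)]  # random speckle row
true_shift = 7
# The observed row is the reference pattern displaced by true_shift pixels.
observed = [0] * true_shift + reference[:-true_shift]
estimated_shift = find_shift(reference, observed, start=40, window=24, max_shift=15)
```

Because a long random window is very unlikely to repeat, the best match identifies the displacement at that pixel, and the same search can be run at every pixel to obtain dense shifts.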
What can we do better? • A projector is just the inverse of a camera • One projector and one camera are enough for triangulation • Calibration is needed
PrimeSense Patents • US2010/0118123 • Projector-Camera system • Already calibrated structure • δZ results in δX in 32
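The relation between a depth change and the resulting lateral pattern shift can be sketched with the usual reference-plane triangulation model. This is an assumption for illustration, not a formula quoted from the patent: the sign convention, the function names, and the numeric values (focal length in pixels, baseline, reference distance) are all hypothetical.

```python
def shift_from_depth(z_m, z_ref_m, focal_px, baseline_m):
    """Pattern shift (pixels) of a point at depth z_m relative to the reference plane."""
    return focal_px * baseline_m * (1.0 / z_m - 1.0 / z_ref_m)


def depth_from_shift(shift_px, z_ref_m, focal_px, baseline_m):
    """Invert the relation above: recover depth from the observed pattern shift."""
    return z_ref_m / (1.0 + z_ref_m * shift_px / (focal_px * baseline_m))


# Illustrative values only: 580 px focal length, 7.5 cm baseline, 2 m reference plane.
FOCAL_PX, BASELINE_M, Z_REF_M = 580.0, 0.075, 2.0

shift = shift_from_depth(1.5, Z_REF_M, FOCAL_PX, BASELINE_M)
recovered = depth_from_shift(shift, Z_REF_M, FOCAL_PX, BASELINE_M)
```

A point on the reference plane produces zero shift, and under this convention points closer than the reference plane produce a positive shift.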
PrimeSense Patents • US 2010/0118123 • Structured Light-1 • Pseudo-random distribution • Local: Random • Global: Gray level decreases • Can make a rough estimate in a low resolution image
PrimeSense Patents • US 2010/0118123 • Structured Light-2 • Quasi-periodic pattern • Five-fold symmetry • Results in distinct peaks in the frequency domain • Contains no unit cell that repeats over the spatial domain • Used to reduce the effect of noise and ambient light in the environment
PrimeSense Patents • US 2010/0290698
PrimeSense Patents • US2010/0290698 • Uses a special (“astigmatic”) lens with different focal length in x- and y- directions • Orientation of the circle indicates depth
From depth to joints • Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation • Treat body segmentation as a per-pixel classification task (no pairwise term or CRF is used) • The algorithm runs at 5 ms per frame on the Xbox GPU • Novelty: intermediate body part representation
Body Part Inference • Body part labeling • 31 body parts • Distinct parts for left and right allow classifier to disambiguate the left and right sides of the body
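Body Part Inference • Depth image features • fθ(I, x) = dI(x + u/dI(x)) - dI(x + v/dI(x)) • dI(x) is the depth at pixel x in image I • θ=(u,v) describes the offsets u and v • Dividing the offsets by dI(x) makes the feature depth invariant • Each feature needs to read at most 3 image pixels and perform at most 5 arithmetic operations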
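The depth comparison feature from Shotton et al., fθ(I, x) = dI(x + u/dI(x)) - dI(x + v/dI(x)), can be sketched as below. The image is a plain list of rows, the pixel is assumed to be a foreground pixel with positive depth, and the `BACKGROUND` constant and helper names are assumptions made for this illustration.

```python
BACKGROUND = 1e6  # assumed large depth for background or off-image probes


def probe(depth, x, y):
    """Read the depth at (x, y), or a large constant if the probe leaves the image."""
    if 0 <= y < len(depth) and 0 <= x < len(depth[0]):
        return depth[y][x]
    return BACKGROUND


def depth_feature(depth, pixel, u, v):
    """Difference of two depth probes whose offsets are scaled by 1/depth."""
    x, y = pixel
    d = depth[y][x]  # first pixel read; must be positive
    ux, uy = x + round(u[0] / d), y + round(u[1] / d)
    vx, vy = x + round(v[0] / d), y + round(v[1] / d)
    return probe(depth, ux, uy) - probe(depth, vx, vy)  # two more reads


depth_image = [
    [2.0, 2.0, 2.0],
    [2.0, 2.0, 3.0],
    [2.0, 2.0, 2.0],
]
# Offset u = (2, 0) is scaled to one pixel at depth 2.0; v = (0, 0) probes the pixel itself.
value = depth_feature(depth_image, (1, 1), u=(2, 0), v=(0, 0))
```

Each evaluation reads three pixels (the centre and two probes), which is why the feature is cheap enough for per-pixel classification.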
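Randomized Decision Forests • Fast and effective multi-class classifier • Each split node consists of a feature fθ and a threshold τ • The leaf node reached in tree t stores a learned distribution Pt(c | I, x) over body part labels c • Final classification: average the distributions Pt(c | I, x) over all T trees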
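Forest classification as described above can be sketched as follows: descend each tree by comparing a feature to a threshold, then average the leaf distributions. The dictionary-based tree layout, the toy feature lookup, and all names are assumptions for illustration only.

```python
def classify_tree(node, feature_fn):
    """Walk from the root to a leaf and return the leaf's label distribution."""
    while "leaf" not in node:
        value = feature_fn(node["theta"])
        node = node["left"] if value < node["tau"] else node["right"]
    return node["leaf"]


def classify_forest(trees, feature_fn):
    """Average the leaf distributions of all trees."""
    total = {}
    for tree in trees:
        for label, p in classify_tree(tree, feature_fn).items():
            total[label] = total.get(label, 0.0) + p / len(trees)
    return total


# Two toy trees of depth 1; "theta" names a feature, "tau" is its threshold.
tree_a = {"theta": "a", "tau": 0.0,
          "left": {"leaf": {"hand": 0.9, "head": 0.1}},
          "right": {"leaf": {"hand": 0.2, "head": 0.8}}}
tree_b = {"theta": "b", "tau": 0.0,
          "left": {"leaf": {"hand": 0.6, "head": 0.4}},
          "right": {"leaf": {"hand": 0.1, "head": 0.9}}}

# Stand-in for evaluating the depth feature at one pixel.
feature_values = {"a": 0.5, "b": -1.0}
posterior = classify_forest([tree_a, tree_b], feature_values.get)
best_label = max(posterior, key=posterior.get)
```

Here the first tree goes right and the second goes left, so the averaged distribution is 0.4 for "hand" and 0.6 for "head".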
Combining Models • Multiple classifiers work together • Committees • E.g. Averaging the predictions of a set of individual models • E.g. Majority votes • Boosting • Classifiers trained in sequence • E.g. AdaBoost • Decision Tree • Binary selection corresponding to the traversal of a tree
Decision Tree • Three major aspects • A splitting criterion • A stop-splitting rule • A rule to assign each leaf to a specific class • Decision Forests • A Decision Tree Committee
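Randomized Decision Forests • Fast and effective multi-class classifier • Each split node consists of a feature fθ and a threshold τ • The leaf node reached in tree t stores a learned distribution Pt(c | I, x) over body part labels c • Final classification: average the distributions Pt(c | I, x) over all T trees • How to train?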
Randomized Decision Forests • Training • Each tree is trained on a different set of images • 2000 example pixels are picked from each image • Algorithm
Randomized Decision Forests • Algorithm(cont.) • Shannon entropy given Z on Y
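Choosing a split by Shannon entropy can be sketched as below: each candidate threshold partitions the samples, and the threshold with the largest information gain wins. The sample data and function names are hypothetical; real training evaluates many random (θ, τ) candidates over a far larger pixel set.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the label histogram."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(samples, tau):
    """Entropy reduction from splitting (feature_value, label) samples at tau."""
    left = [label for value, label in samples if value < tau]
    right = [label for value, label in samples if value >= tau]
    gain = entropy([label for _, label in samples])
    for side in (left, right):
        if side:  # an empty side contributes nothing
            gain -= len(side) / len(samples) * entropy(side)
    return gain


def best_threshold(samples, candidates):
    """Pick the candidate threshold with the highest information gain."""
    return max(candidates, key=lambda tau: information_gain(samples, tau))


samples = [(-2.0, "hand"), (-1.0, "hand"), (1.0, "head"), (2.0, "head")]
chosen = best_threshold(samples, [-1.5, 0.0, 1.5])
```

The threshold 0.0 separates the two labels perfectly, so it removes the full 1 bit of entropy and is selected.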
Randomized Decision Forests • Algorithm (cont.) • Training is computationally expensive • Training 3 trees to depth 20 from 1 million images takes about a day on a 1000-core cluster • Where does the training data come from?
Training Data • Depth imaging • Simplifies the task of background subtraction • Most important: depth images are easy to synthesize!
Joint Position Proposals • From the previous section, each pixel has a body part label • Use Mean Shift with a weighted Gaussian kernel to find the mode of each body part as a joint proposal
Mean Shift • Kernel density estimator • Discrete points -> Continuous function • Calculate the gradient at the initial point and shift along it • Iterate until convergence
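The mean shift iteration with a weighted Gaussian kernel can be sketched as below. Each step moves the estimate to the kernel-weighted mean of the points, which is equivalent to following the density gradient. The function name, bandwidth, and sample points are assumptions for illustration; in the pose setting the weights would come from the per-pixel classifier.

```python
import math


def mean_shift(points, weights, start, bandwidth, tol=1e-6, max_iter=100):
    """Climb to a mode of the weighted Gaussian kernel density estimate."""
    current = list(start)
    for _ in range(max_iter):
        numerator = [0.0] * len(current)
        denominator = 0.0
        for point, weight in zip(points, weights):
            dist2 = sum((a - b) ** 2 for a, b in zip(point, current))
            k = weight * math.exp(-dist2 / bandwidth ** 2)
            denominator += k
            for i, a in enumerate(point):
                numerator[i] += k * a
        if denominator == 0.0:  # no point is close enough to contribute
            return current
        shifted = [n / denominator for n in numerator]
        step = math.sqrt(sum((a - b) ** 2 for a, b in zip(shifted, current)))
        current = shifted
        if step < tol:
            break
    return current


# Three points clustered near (0, 0, 2) plus one distant outlier.
points = [(0.0, 0.0, 2.0), (0.1, 0.0, 2.0), (-0.1, 0.0, 2.0), (5.0, 5.0, 5.0)]
weights = [1.0, 1.0, 1.0, 1.0]
mode = mean_shift(points, weights, start=(0.3, 0.0, 2.0), bandwidth=0.5)
```

The outlier has almost no influence because its kernel weight is vanishingly small, so the iteration settles at the centre of the cluster rather than at the global mean.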
Experiments and Results • Synthetic • Real
Experiments and Results • Failure
Experiments and Results • Training parameters vs. classification accuracy
Experiments and Results • Comparisons
Conclusion • Depth images may contain enough information to solve human pose estimation problems • Depth images are invariant to color and texture, which greatly simplifies the correspondence problem • A deep combined model with sufficient training data can become a good classifier even with simple features • Buy a Kinect for the lab