How Kinect works?

How Kinect works? Po-Hsiang Chen Advisor: Sheng-Jyh Wang

Major References • Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation • CVPR 2011 Best Paper • Freedman, B., A. Shpunt, et al. (2008). Depth mapping using projected patterns, US2010/0118123A1 • PrimeSense Patent

Outline • What is Kinect? • Kinect Architecture • From IR to depth image • History of Structured Light • PrimeSense Invented Structured Light • From depth image to joint positions • Body Part Inference • Joint Proposals • Experiments and Results • Conclusion • References

What is Kinect? • Motion-sensing input device by Microsoft • Depth camera tech. developed by PrimeSense • Invented in 2005 • Software tech. developed by Rare • First announced at E3 2009 as “Project Natal” • Windows SDK releases: http://www.microsoft.com/en-us/kinectforwindows/discover/features.aspx

Kinect IR Structured Light

Kinect Architecture • IR Structured Light • Mean Shift • Random Decision Forest

3D Imaging of surface

Triangulation • Main Problem • To recover shape from multiple views, we need CORRESPONDENCES between the images • The matching/correspondence problem is hard • Occlusions, texture, colors, etc. • Solution: Structured light • Idea: Simplify matching • Strategy: Use illumination to create your own correspondences

Structured Light • Basic Principle • Use a projector to create unambiguous correspondences • Light projection • If we project a single point, matching is unique

Structured Light • Line projection ( Line Scan ) • For calibrated cameras, the epipolar geometry is known • Project a line instead of a single point

Structured Light • Project Multiple Stripes or Grids • Which stripe matches which? • Correspondence Again

Structured Light • Answer 1: Assume Surface Continuity • Ordering Constraint

Structured Light • Answer 2: Coloured stripes (De Bruijn) • Difficult to use for coloured surfaces

Structured Light • Answer 2: Coloured dots (M-array) • Difficult to use for coloured surfaces

Structured Light • Answer 3: Pattern dots (M-array) • Difficult for industrial manufacturing

Structured Light • Answer 4: Time-coded light patterns (Time multiplexing) • Use a sequence of binary patterns → (log N) images • Each stripe has a unique binary illumination code
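The time-multiplexing idea above can be sketched in a few lines. This is only an illustration, assuming plain binary coding (practical systems often prefer Gray codes) and hypothetical helper names `make_patterns` and `decode` that do not come from the slides.

```python
import math

def make_patterns(num_stripes):
    """Return binary patterns; each pattern holds one bit of every stripe index."""
    num_bits = max(1, math.ceil(math.log2(num_stripes)))
    patterns = []
    for bit in range(num_bits - 1, -1, -1):  # most significant bit first
        patterns.append([(s >> bit) & 1 for s in range(num_stripes)])
    return patterns

def decode(observed_bits):
    """Recover the stripe index from the on/off sequence seen at one camera pixel."""
    index = 0
    for b in observed_bits:
        index = (index << 1) | b
    return index

patterns = make_patterns(8)          # 8 stripes need only 3 projected patterns
observed = [p[5] for p in patterns]  # what a pixel lit by stripe 5 sees over time
stripe = decode(observed)
```

Each camera pixel identifies its stripe from its own temporal code, so no spatial neighbourhood is needed. The cost is that the scene must stay still while all patterns are projected.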

Structured Light • All of the above are categorized as Discrete Methods • There are many more Continuous Structured Light Methods, such as phase shifting • Salvi, J., S. Fernandez, et al. (2010). "A state of the art in structured light patterns for surface profilometry." Pattern Recognition 43(8): 2666-2680

Structured Light • All of the above are human-designed patterns • Random Speckle • Structured light using randomly generated patterns • May obtain denser depth information by solving the correspondence problem
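One way to picture how a random pattern creates correspondences is one-dimensional block matching: a window of the observed pattern is compared with the stored reference pattern at several shifts, and the shift with the lowest cost wins. The sketch below assumes a sum-of-absolute-differences cost and a hypothetical function `best_shift`. It illustrates the principle and is not the matching method used inside Kinect.

```python
import random

def best_shift(reference, observed, start, window, max_shift):
    """Find the shift that best aligns a window of `observed` with `reference`."""
    patch = observed[start:start + window]
    best, best_cost = 0, float("inf")
    for shift in range(max_shift + 1):
        ref = reference[start + shift:start + shift + window]
        if len(ref) < window:  # ran off the end of the reference row
            break
        cost = sum(abs(a - b) for a, b in zip(patch, ref))  # SAD cost
        if cost < best_cost:
            best, best_cost = shift, cost
    return best

random.seed(0)
reference = [random.randint(0, 1) for _ in range(200)]  # one row of random speckle
true_shift = 7
observed = reference[true_shift:] + [0] * true_shift     # same row, shifted by 7 pixels
estimated = best_shift(reference, observed, start=50, window=31, max_shift=20)
```

Because the pattern is random, a sufficiently large window is very unlikely to match anywhere except at the true shift, which is what makes the correspondence unambiguous.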

What can we do better? • A Projector is just an inverse of a camera • One projector and one camera is enough for triangulation • Need Calibration

PrimeSense Patents • US 2010/0118123 • Projector-camera system • Already calibrated structure • A depth change δZ results in a transverse shift δX in the image captured by 32 (reference numeral in the patent figure)
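The link between a depth change δZ and an image shift δX follows from triangulation. Below is a minimal sketch, assuming the textbook rectified pinhole relation Z = f·b/d between depth Z, focal length f (in pixels), baseline b and disparity d. The function names and the numeric values are illustrative assumptions, not Kinect specifications or formulas quoted from the patent.

```python
def disparity_from_depth(focal_px, baseline_m, depth_m):
    """Image shift (pixels) of a point at the given depth (metres)."""
    return focal_px * baseline_m / depth_m

def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Invert the relation above: depth (metres) from image shift (pixels)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

FOCAL_PX = 580.0     # assumed focal length in pixels (illustrative only)
BASELINE_M = 0.075   # assumed projector-camera baseline in metres (illustrative only)

d_near = disparity_from_depth(FOCAL_PX, BASELINE_M, 2.0)  # surface at 2.0 m
d_far = disparity_from_depth(FOCAL_PX, BASELINE_M, 2.1)   # surface moved back by 0.1 m
delta_x = d_near - d_far                                   # resulting image shift in pixels
```

Since d is proportional to 1/Z, the same δZ produces a smaller δX at larger distances, so depth resolution degrades with range.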

PrimeSense Patents • US 2010/0118123 • Structured Light-1 • Pseudo-random distribution • Local: Random • Global: Gray level decreases • Can make a rough estimate in a low resolution image

PrimeSense Patents • US 2010/0118123 • Structured Light-2 • Quasi-periodic pattern • Five-fold symmetry • Results in distinct peaks in the frequency domain • Contains no unit cell that repeats over the spatial domain • Used to reduce noise and ambient light from the environment

Kinect IR Structured Light

PrimeSense Patents • US 2010/0290698

PrimeSense Patents • US2010/0290698 • Uses a special (“astigmatic”) lens with different focal length in x- and y- directions • Orientation of the circle indicates depth

From depth to joints • Shotton, J., A. Fitzgibbon, et al. (2011). "Real-Time Human Pose Recognition in Parts from Single Depth Images." Microsoft Research Cambridge & Xbox Incubation • Treat body segmentation as a per-pixel classification task (no pairwise term or CRF is used) • The algorithm runs in about 5 ms per frame on the Xbox GPU • Novelty: Intermediate body parts representation

Body Part Inference • Body part labeling • 31 body parts • Distinct parts for left and right allow classifier to disambiguate the left and right sides of the body

Body Part Inference • Depth image features • d_I(x) is the depth at pixel x in image I • θ = (u, v) describes offsets u and v • Each feature needs to read at most 3 image pixels and perform at most 5 arithmetic operations
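In Shotton et al. (2011) the depth comparison feature is f_θ(I, x) = d_I(x + u/d_I(x)) − d_I(x + v/d_I(x)), and probes that land on the background or outside the image are given a large positive constant. The sketch below follows that definition. The nested-list image layout, the constant `LARGE_DEPTH`, the rounding of offsets and the helper names are assumptions made for illustration.

```python
LARGE_DEPTH = 1e6  # assumed stand-in for the "large positive constant" used for invalid probes

def depth_at(image, x, y):
    """Depth at pixel (x, y), or LARGE_DEPTH when the probe leaves the image."""
    if 0 <= y < len(image) and 0 <= x < len(image[0]):
        return image[y][x]
    return LARGE_DEPTH

def depth_feature(image, x, y, u, v):
    """Difference of two depth probes whose offsets are scaled by 1 / depth."""
    d = depth_at(image, x, y)  # read 1
    ux, uy = x + round(u[0] / d), y + round(u[1] / d)
    vx, vy = x + round(v[0] / d), y + round(v[1] / d)
    return depth_at(image, ux, uy) - depth_at(image, vx, vy)  # reads 2 and 3

image = [[2.0] * 5 for _ in range(5)]  # flat surface 2 m away
image[2][4] = 4.0                       # one pixel that is farther away
response = depth_feature(image, 2, 2, u=(4, 0), v=(0, 0))
```

Dividing the offsets by the depth at x makes the feature depth invariant: the same body point probes the same relative locations whether the person stands near the camera or far from it.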

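Randomized Decision Forests • Fast and effective multi-class classifier • Each split node consists of a feature f_θ and a threshold τ • At the leaf node reached in tree t, a learned distribution P_t(c | I, x) over body part labels c is given • Final classification: average the distributions over all trees in the forest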
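Classification with a forest can be sketched as follows: each tree routes the pixel from the root to a leaf by comparing a feature response with the node threshold, and the leaf distributions are averaged over the trees, P(c | I, x) = (1/T) Σ_t P_t(c | I, x), as in Shotton et al. (2011). The dictionary-based tree layout, the precomputed feature responses and the left/right convention below are assumptions for illustration.

```python
def classify_tree(node, features):
    """Walk one tree from the root to a leaf and return the leaf distribution."""
    while "dist" not in node:  # split nodes carry a feature index and a threshold
        go_left = features[node["feature"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["dist"]

def classify_forest(trees, features):
    """Average the per-tree leaf distributions over all trees."""
    dists = [classify_tree(tree, features) for tree in trees]
    return [sum(column) / len(dists) for column in zip(*dists)]

# Two toy trees over two classes; all numbers are made up for the example.
tree_a = {"feature": 0, "threshold": 0.5,
          "left": {"dist": [0.9, 0.1]}, "right": {"dist": [0.2, 0.8]}}
tree_b = {"feature": 1, "threshold": 1.0,
          "left": {"dist": [0.6, 0.4]}, "right": {"dist": [0.1, 0.9]}}

posterior = classify_forest([tree_a, tree_b], features=[0.3, 2.0])
```

Here the two trees disagree, and averaging their leaf distributions yields an even split between the two classes.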

Combining Models • Multiple classifiers work together • Committees • E.g. Averaging the predictions of a set of individual models • E.g. Majority votes • Boosting • Classifiers trained in sequence • E.g. AdaBoost • Decision Tree • Binary selection corresponding to the traversal of a tree

Decision Tree • Three major aspects • A splitting criterion • A stop-splitting rule • A rule to assign each leaf to a specific class • Decision Forests • A Decision Tree Committee

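Randomized Decision Forests • Fast and effective multi-class classifier • Each split node consists of a feature f_θ and a threshold τ • At the leaf node reached in tree t, a learned distribution P_t(c | I, x) over body part labels c is given • Final classification: average the distributions over all trees in the forest • How to train?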

Randomized Decision Forests • Training • Each tree is trained on a different set of images • 2000 example pixels are picked from each image • Algorithm

Randomized Decision Forests • Algorithm(cont.) • Shannon entropy given Z on Y
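The split selection step can be sketched with Shannon entropy: for every candidate (feature, threshold) pair, the training pixels are partitioned into a left and a right set, and the candidate with the largest information gain is kept. The data layout and the function names below are assumptions for illustration, and the tiny data set is made up.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of the label histogram."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(samples, feature_index, threshold):
    """Entropy reduction obtained by splitting the samples at the threshold."""
    left = [label for feats, label in samples if feats[feature_index] < threshold]
    right = [label for feats, label in samples if feats[feature_index] >= threshold]
    if not left or not right:  # a split that separates nothing gains nothing
        return 0.0
    n = len(samples)
    parent = entropy([label for _, label in samples])
    return parent - len(left) / n * entropy(left) - len(right) / n * entropy(right)

def best_split(samples, candidates):
    """Pick the (feature_index, threshold) candidate with the largest gain."""
    return max(candidates, key=lambda c: information_gain(samples, c[0], c[1]))

samples = [([0.1], "hand"), ([0.2], "hand"), ([0.8], "head"), ([0.9], "head")]
chosen = best_split(samples, candidates=[(0, 0.15), (0, 0.5)])
```

Training then recurses on the left and right subsets until the gain becomes too small or the tree reaches its maximum depth.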

Randomized Decision Forests • Algorithm (cont.) • Training takes a lot of effort • Training 3 trees to depth 20 from 1 million images takes about a day on a 1000-core cluster • Where does the training data come from?

Training Data • Depth imaging • Simplify the task of background subtraction • Most important: easy to synthesize!!!

Kinect Architecture • IR Structured Light • Mean Shift • Random Decision Forest

Joint Position Proposals • From the previous section, • Use Mean Shift with a weighted Gaussian kernel

Mean Shift • Kernel density estimator • Discrete points -> Continuous function • Calculate the gradient at the initial point and shift • Iterate until convergence
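The mean shift iteration can be sketched as follows: the current estimate is repeatedly moved to the kernel-weighted mean of the points until it stops moving, which climbs to a mode of the density estimate. The sketch assumes a standard Gaussian kernel exp(−‖p − x‖² / (2b²)), per-point weights and a hypothetical function `mean_shift`. The exact kernel and weights used for the joint proposals are defined in Shotton et al. (2011) and are not reproduced here.

```python
import math

def mean_shift(points, weights, start, bandwidth, max_iters=50, tol=1e-6):
    """Move `start` uphill to a mode of the weighted Gaussian kernel density."""
    pos = list(start)
    for _ in range(max_iters):
        # Kernel response of every point at the current position.
        k = [w * math.exp(-sum((p - q) ** 2 for p, q in zip(pt, pos))
                          / (2 * bandwidth ** 2))
             for pt, w in zip(points, weights)]
        total = sum(k)
        # The weighted mean of the points is the next position.
        new = [sum(ki * pt[d] for ki, pt in zip(k, points)) / total
               for d in range(len(pos))]
        step = math.sqrt(sum((a - b) ** 2 for a, b in zip(new, pos)))
        pos = new
        if step < tol:  # converged: the shift is negligible
            break
    return pos

# Four points clustered around (1, 1) plus one distant outlier.
points = [(0.9, 1.0), (1.1, 1.0), (1.0, 0.9), (1.0, 1.1), (5.0, 5.0)]
weights = [1.0] * len(points)
mode = mean_shift(points, weights, start=(1.5, 1.5), bandwidth=0.5)
```

The distant outlier barely affects the result because its kernel response near the cluster is negligible. This robustness is one reason to prefer a mode finder over a plain average of the points.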

Experiments and Results • Synthetic • Real

Experiments and Results • Failure

Experiments and Results • Training parameters vs. classification accuracy

Experiments and Results • Comparisons

Conclusion • Depth images may contain enough information to solve human pose problems • Depth images are color and texture invariant, which simplifies the problem considerably • A deep combined model with sufficient training data can become a good classifier even with simple features • Buy a Kinect for the lab
