Real-Time Human Pose Recognition in Parts from Single Depth Images

Speaker: DengLei, I-VisionGroup

Outline • Introduction • Data • Algorithm • Experiments

Introduction: Human Body Tracking • Applications • Games, HCI, security, telepresence, health care, etc. • Depth camera • Kinect • Must handle the full range of body shapes, sizes and motions • Existing systems • Fast tracking but slow re-initialization • Our system • Per-frame initialization • Complements tracking that exploits temporal and kinematic coherence • Supports initialization and recovery from failure

Show • Final Video Show

General Description • Goals • Fast • Robust • Steps • Single depth image => dense probabilistic body part labeling, with parts defined near joints • Reproject labeled pixels into 3D and generate confidence-weighted 3D joint proposals • Strategy • Per-pixel classification, each pixel evaluated independently • Synthetic depth images for training • Deep randomized decision forest classifier trained on 10^5 samples • Simple depth-comparison features, invariant to 3D translation, suited to the GPU • Mean shift for 3D joint proposals
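The reprojection step on this slide can be illustrated with a pinhole back-projection. This is a minimal sketch, not the presented system's code: the function name and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions, and real values would come from the depth camera's calibration.

```python
def reproject(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth in metres to camera-space (X, Y, Z).

    fx, fy are focal lengths in pixels and (cx, cy) is the principal point;
    all four are hypothetical calibration values supplied by the caller.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


# Example with made-up intrinsics for a 640x480 image.
center = reproject(320, 240, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
offset = reproject(420, 240, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

A pixel at the principal point maps onto the optical axis, and a pixel 100 columns to the right at the same depth lands 0.4 m to the side under these example intrinsics.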

Testing Environment • Xbox 360 GPU • 5 ms per frame • About an order of magnitude faster than existing approaches • Evaluated on real and synthetic samples

Contributions • Main contribution • Treat pose estimation as object recognition, using a BODY PARTS REPRESENTATION defined near joints • Low computational cost and high accuracy • Insights • Synthetic data is a good proxy for real data • Scaling up training with synthetic data is important

Data Description • Lack of training data • Computer-graphics synthesis of color images is hampered by color and texture variability caused by clothing, hair and skin • Limitations of mocap • Depth imaging • Kinect depth camera: 640x480 at 30 fps, error on the order of 10^-2 m • Works in low light • Invariant to color and texture • Resolves silhouette ambiguities in pose • Simplifies background subtraction • Realistic depth images of people can be synthesized to build a large dataset cheaply

Motion Capture Data • Capture real data • 500k frames • Variations that need not be recorded (added synthetically) • Rotation about vertical axis • Left-right mirroring • Scene position and camera pose • Body shape and size • Furthest-neighbor clustering • Removes redundant poses • Result • 100k-pose subset in which no two poses are closer than 5 cm
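The redundancy reduction on this slide can be sketched as a greedy furthest-point selection. This is an illustrative sketch under stated assumptions: the pose distance (maximum Euclidean distance over corresponding joints) and the function names are assumptions, not taken from the slides, and the quadratic loop is meant for small examples rather than 500k frames.

```python
import math


def pose_distance(p, q):
    """Assumed metric: largest Euclidean distance between corresponding joints."""
    return max(math.dist(a, b) for a, b in zip(p, q))


def furthest_point_subset(poses, min_dist):
    """Return indices of a subset in which no two poses are closer than min_dist.

    Greedy rule: repeatedly keep the pose that is furthest from everything
    kept so far, and stop once that furthest pose is closer than min_dist.
    """
    if min_dist <= 0:
        raise ValueError("min_dist must be positive")
    if not poses:
        return []
    kept = [0]
    # nearest[i] is the distance from pose i to its closest kept pose.
    nearest = [pose_distance(poses[0], p) for p in poses]
    while True:
        i = max(range(len(poses)), key=nearest.__getitem__)
        if nearest[i] < min_dist:
            break
        kept.append(i)
        for k, p in enumerate(poses):
            nearest[k] = min(nearest[k], pose_distance(poses[i], p))
    return kept


# Four single-joint poses along the x axis (metres), 5 cm threshold.
poses = [[(0.0, 0.0, 0.0)], [(0.01, 0.0, 0.0)], [(0.10, 0.0, 0.0)], [(0.12, 0.0, 0.0)]]
kept = furthest_point_subset(poses, min_dist=0.05)
```

In the example, the poses at 1 cm and 10 cm are dropped because each lies within 5 cm of a kept pose.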

Generating Synthetic Data • Randomized rendering pipeline • Goal: realism and variety • Cg + random parameters • 15 base body meshes spanning shapes and sizes • Camera pose & noise, clothing & hair style, etc. • Compare

Body Part Inference • Labeling • Key contribution: body part representation • Color-coded parts covering joints and the gaps between them • 31 body parts: small enough to localize joints accurately, not so numerous that classifier capacity is wasted • Can be adjusted to the application • Features • Simple • At most 5 arithmetic operations • At most 3 image reads
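A feature with this cost profile (three image reads, a handful of arithmetic operations) can be sketched as a depth comparison between two probe pixels whose offsets are scaled by the depth at the reference pixel. The sketch below is an assumption-laden illustration: the function names, the `LARGE_DEPTH` constant, and the convention that a depth of 0 marks background are all hypothetical.

```python
LARGE_DEPTH = 1e6  # assumed value returned for background or off-image probes


def depth_at(depth, x, y):
    """Read depth[y][x]; background (encoded as 0 here) and off-image reads return LARGE_DEPTH."""
    h, w = len(depth), len(depth[0])
    if 0 <= y < h and 0 <= x < w and depth[y][x] > 0:
        return depth[y][x]
    return LARGE_DEPTH


def depth_feature(depth, x, y, u, v):
    """Difference of depths at two probes offset from (x, y).

    Offsets u and v are divided by the depth at (x, y), so the probes cover
    the same physical extent whether the body is near or far.
    """
    d = depth_at(depth, x, y)
    ux, uy = x + round(u[0] / d), y + round(u[1] / d)
    vx, vy = x + round(v[0] / d), y + round(v[1] / d)
    return depth_at(depth, ux, uy) - depth_at(depth, vx, vy)


# 3x3 depth image in metres; the pixel to the right of the centre is further away.
depth = [[2.0, 2.0, 2.0],
         [2.0, 2.0, 4.0],
         [2.0, 2.0, 2.0]]
inside = depth_feature(depth, 1, 1, u=(2.0, 0.0), v=(0.0, 0.0))
outside = depth_feature(depth, 1, 1, u=(10.0, 0.0), v=(0.0, 0.0))
```

The first call reads the centre pixel plus two probes, and the second shows a probe that falls off the image and therefore yields a large positive response.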

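Random Forest • Classification • Training • Each tree trained on a different set of synthesized images, 2000 pixels sampled from each • Propose a random set of splitting candidates • Partition examples into left and right subsets • Keep the candidate with the largest information gain • Recursion condition • Gain is large enough and depth is below the maximum • Recurse on left and right subsets • 3 trees, depth 20, 10^6 images, 1 day on a 1000-core cluster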
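The split-selection step of tree training can be sketched as follows: draw random (feature, threshold) candidates, partition the examples, and keep the candidate with the largest information gain. This is a simplified sketch, assuming feature responses are precomputed per example; `best_split`, its parameters, and the way thresholds are drawn are illustrative choices rather than the presented implementation.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(features, labels, n_candidates=50, rng=None):
    """Return (gain, (feature_index, threshold)) for the best random candidate.

    features[i] is a tuple of precomputed feature responses for example i.
    Returns (0.0, None) when no candidate separates the examples.
    """
    rng = rng or random.Random(0)
    n_dims = len(features[0])
    parent_entropy = entropy(labels)
    best = (0.0, None)
    for _ in range(n_candidates):
        dim = rng.randrange(n_dims)
        tau = rng.choice(features)[dim]  # threshold drawn from observed responses
        left = [l for f, l in zip(features, labels) if f[dim] < tau]
        right = [l for f, l in zip(features, labels) if f[dim] >= tau]
        if not left or not right:
            continue  # degenerate split, nothing to gain
        children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = parent_entropy - children
        if gain > best[0]:
            best = (gain, (dim, tau))
    return best


# Feature 0 separates the two classes; feature 1 is constant and useless.
features = [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0), (3.0, 5.0)]
labels = ["a", "a", "b", "b"]
gain, split = best_split(features, labels, n_candidates=200)
```

A full trainer would call this at each node and recurse on the left and right subsets while the gain stays large enough and the depth stays below the maximum.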

Joint Proposals • Mean shift with Gaussian kernel • Discards outlying pixels • Smooth • Threshold • Mean shift starts from pixels above the threshold for part c • Parameter selection • Grid search on 5000 images
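The mode-finding step can be sketched as weighted mean shift with a Gaussian kernel over reprojected 3D points. This is a minimal sketch under stated assumptions: the function name, the `bandwidth`, `n_iter` and `tol` parameters, and the uniform weights in the example are hypothetical; in practice the weights would reflect per-pixel part confidence.

```python
import math


def mean_shift_mode(points, weights, start, bandwidth, n_iter=50, tol=1e-6):
    """Climb from `start` to a local mode of a weighted Gaussian kernel density.

    points are 3D tuples, weights are non-negative per-point confidences.
    """
    x = tuple(start)
    for _ in range(n_iter):
        # Kernel response of every point at the current estimate.
        k = [w * math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / bandwidth ** 2)
             for p, w in zip(points, weights)]
        total = sum(k)
        if total == 0.0:
            break  # no point close enough to pull the estimate
        new_x = tuple(sum(ki * p[j] for ki, p in zip(k, points)) / total
                      for j in range(3))
        shift = math.dist(x, new_x)
        x = new_x
        if shift < tol:
            break
    return x


# Three points clustered around (0, 0, 1) and one distant outlier.
points = [(0.0, 0.0, 1.0), (0.02, 0.0, 1.0), (-0.02, 0.0, 1.0), (5.0, 5.0, 5.0)]
weights = [1.0, 1.0, 1.0, 1.0]
mode = mean_shift_mode(points, weights, start=(0.01, 0.0, 1.0), bandwidth=0.1)
```

Because the kernel response falls off quickly with distance, the outlier has no practical influence on the mode, which is the property that makes a local mode finder preferable to a global weighted average here.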

Experiments • Parameters • 3 trees, depth 20, 300k images per tree, 2000 pixels per image • 2000 candidate features, 50 candidate thresholds per feature

Thanks
