Learning to estimate human pose with data driven belief propagation. Gang Hua, Ming-Hsuan Yang, Ying Wu. CVPR 05.
Project Goal • Learning human motion: need to know the human body configurations • Detect human body parts from a single image: where are the head, arms, legs, and torso? • Estimate human body configurations: what are the size, location, and orientation of each part? • First step (i.e., initialization) for full human body tracking • Statistical inference: the input is an image; the output is the location, size, and orientation of the head, arms, and legs
Challenges • Large variation in pose • Occlusion: some parts are not visible • Lighting variation: affects appearance • Cluttered background: noisy visual cues • High dimensional state variables
Main Idea • Analysis by synthesis (i.e., hypothesize and test) • Statistical inference • Locate body parts using cues • Importance sampling • Learn the shapes of human body parts • Intelligently guess some possible answers, i.e., assemblies of body parts • Match each guessed answer with the image observation using the shape prior and geometry constraints • Figure: image → potential body parts (visual cues & importance sampling) → assembly of body parts (local observation & belief propagation) → best assembly; samples shown for head, lower arm, torso, and upper leg • Which observed assembly looks most likely to be a human?
In Plain English • Learning shape: collect prior knowledge of body parts • Importance sampling: intelligent guess of the answer • Observation: what is seen in the image, such as appearance, color, and edges • Belief: local evidence • Belief propagation: inference using all relevant local evidence • Potential functions: encode constraints
Markov Network • Xi: pose state of each limb • Zi: image observation of each limb • Ψij(Xi, Xj): each undirected link represents a potential function • Φi(Zi|Xi): each directed link represents an observation likelihood • Goal: infer P(Xi|Z), i.e., P(state variables | image observations)
Learning Body Shapes • For each body part, normalize the labeled shape and learn a low-dimensional representation, psi, using probabilistic Principal Component Analysis (PCA) • Figure: (1) normalized shape, (2) originally labeled shape, and (3) reconstructed shape • Pose parameters: Xi = {psi, sx, sy, θ, tx, ty}, i.e., shape representation, scale in x and y, rotation, and translation in x and y
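The role of the pose parameters can be illustrated with a minimal sketch that places a normalized part outline in the image. It assumes the pose includes a rotation angle (`theta` in the code) and that the parameters are applied as scale, then rotation, then translation; that order, the function name `apply_pose`, and the unit-square outline are assumptions made for illustration, not details taken from the slides.

```python
import math

def apply_pose(shape, sx, sy, theta, tx, ty):
    """Map normalized shape points into image coordinates.

    Assumed order (not stated in the slides): scale, rotate, translate.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    placed = []
    for x, y in shape:
        xs, ys = sx * x, sy * y              # anisotropic scaling
        xr = cos_t * xs - sin_t * ys         # rotation by theta
        yr = sin_t * xs + cos_t * ys
        placed.append((xr + tx, yr + ty))    # translation
    return placed

# Hypothetical normalized limb outline: a unit square centered at the origin.
unit_square = [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)]
limb = apply_pose(unit_square, sx=20.0, sy=60.0,
                  theta=math.pi / 2, tx=100.0, ty=50.0)
```

In the slides the outline itself would come from the learned low-dimensional shape representation rather than a fixed square.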
Face Detection for Head Pose • AdaBoost-based face detector • Detection results are good but not precise • Two-class k-means clustering of skin color pixels • The head pose hypothesis Ixh is obtained by re-centering the face rectangle on the centroid of the skin color cluster and then projecting it into the head PCA space • Gaussian importance function
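The re-centering step can be sketched as follows. This is a simplified illustration under stated assumptions: the clustering starts from the darkest and brightest colors, and the cluster with the larger mean red value is treated as skin. Both choices, and the names `two_class_kmeans`, `nearest`, and `recenter_face_box`, are introduced here for illustration only; the projection into the head PCA space is omitted.

```python
def nearest(color, centers):
    """Index of the center closest to the color (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(color, ctr)) for ctr in centers]
    return dists.index(min(dists))

def two_class_kmeans(colors, iters=20):
    """Split RGB colors into two clusters; returns (centers, labels)."""
    # Deterministic start (an assumption): darkest and brightest color.
    centers = [min(colors, key=sum), max(colors, key=sum)]
    labels = [0] * len(colors)
    for _ in range(iters):
        labels = [nearest(c, centers) for c in colors]
        for k in (0, 1):
            members = [c for c, l in zip(colors, labels) if l == k]
            if members:
                centers[k] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return centers, labels

def recenter_face_box(box, coords, colors):
    """Move a face box (x, y, w, h) so it is centered on the skin centroid."""
    x, y, w, h = box
    centers, labels = two_class_kmeans(colors)
    # Heuristic assumption: the redder cluster is the skin cluster.
    skin = 0 if centers[0][0] >= centers[1][0] else 1
    pts = [p for p, l in zip(coords, labels) if l == skin]
    if not pts:
        return box
    cx = sum(px for px, _ in pts) / len(pts)
    cy = sum(py for _, py in pts) / len(pts)
    return (cx - w / 2, cy - h / 2, w, h)

# Hypothetical pixels: two background samples and four skin-like samples.
coords = [(0, 0), (1, 0), (12, 10), (14, 10), (12, 12), (14, 12)]
colors = [(30, 30, 40), (35, 30, 40), (200, 150, 130),
          (210, 160, 140), (190, 140, 120), (205, 155, 135)]
new_box = recenter_face_box((0, 0, 10, 10), coords, colors)
```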
Arm/Leg Importance Functions • Image-specific skin color segmentation • Least-squares rectangle fitting for lower-arm and upper-leg hypotheses • Upper-arm and lower-leg hypotheses from a constrained local search • Gaussian mixture importance function • Figure panels: skin color segmentation; rectangle fitting; upper-arm and lower-leg search
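One simple way to fit a rectangle to a segmented skin region is to take the principal axis of the pixel coordinates, which minimizes the summed squared perpendicular distances to the axis, and then measure the extents along and across it. The sketch below uses that approach as a stand-in; the slides do not specify the fitting procedure, and `fit_rectangle` is a hypothetical name.

```python
import math

def fit_rectangle(points):
    """Fit an oriented rectangle to 2-D points via their principal axis."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Orientation of the major axis from the second-order central moments.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    # Coordinates of each point along (u) and across (v) the major axis.
    u = [(x - cx) * c + (y - cy) * s for x, y in points]
    v = [-(x - cx) * s + (y - cy) * c for x, y in points]
    return (cx, cy), theta, max(u) - min(u), max(v) - min(v)

# Hypothetical segmented region: a 10 x 3 block of pixels.
region = [(x, y) for x in range(10) for y in range(3)]
center, angle, length, width = fit_rectangle(region)
```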
Torso Pose Importance Function • Probabilistic Hough transform to detect line segments • Lines are assembled into quad shapes and pruned • A Canny-edge-masked likelihood is evaluated for each good hypothesis Ixt(n) • Gaussian mixture importance function • Figure panels: results from the Hough transform; torso hypothesis
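A Gaussian mixture importance function can be sketched in one dimension: each hypothesis becomes a mixture component, weighted here in proportion to its likelihood, and samples drawn from the mixture are re-weighted by target over proposal. The weighting rule, the shared `sigma`, the hypothetical `target` density, and all numbers are assumptions for illustration; the real state is the full pose vector, not a scalar.

```python
import math
import random

def gaussian_pdf(x, mean, sigma):
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, centers, weights, sigma):
    """Density of the Gaussian mixture importance function at x."""
    return sum(w * gaussian_pdf(x, c, sigma) for c, w in zip(centers, weights))

def sample_mixture(centers, weights, sigma, n, rng):
    """Pick a component per sample, then perturb it with Gaussian noise."""
    picks = rng.choices(centers, weights=weights, k=n)
    return [rng.gauss(c, sigma) for c in picks]

# Hypothetical torso x-positions and their evaluated likelihoods.
hypotheses = [40.0, 55.0, 90.0]
likelihoods = [0.2, 0.7, 0.1]
weights = [l / sum(likelihoods) for l in likelihoods]

rng = random.Random(0)
samples = sample_mixture(hypotheses, weights, sigma=3.0, n=200, rng=rng)
proposal = [mixture_pdf(s, hypotheses, weights, 3.0) for s in samples]

def target(x):
    """Hypothetical unnormalized target density, used only for this demo."""
    return gaussian_pdf(x, 57.0, 5.0)

raw = [target(s) / q for s, q in zip(samples, proposal)]
norm_weights = [r / sum(raw) for r in raw]
```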
Potential Constraint • Encodes physical constraints between human body parts • Link points are defined between two adjacent body parts • The potential function is defined by a Gaussian radial basis function • Figure: defined link points
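A Gaussian radial basis function over link points can be written in a few lines. The sketch below assumes the potential depends only on the Euclidean distance between the two link points predicted by adjacent parts; the bandwidth `sigma` and the name `link_potential` are illustrative assumptions.

```python
import math

def link_potential(link_a, link_b, sigma=5.0):
    """Gaussian RBF of the distance between two 2-D link points."""
    d2 = (link_a[0] - link_b[0]) ** 2 + (link_a[1] - link_b[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))

# Coinciding link points give the maximum potential of 1.0;
# the value decays as the parts drift apart.
aligned = link_potential((10.0, 20.0), (10.0, 20.0))
apart = link_potential((0.0, 0.0), (3.0, 4.0), sigma=5.0)
```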
Likelihood Model • Average normalized steered edge response in R, G, B bands • Likelihood is the maximum of the three
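The two bullets above can be sketched as follows, assuming the steered response at a contour point is the image gradient projected onto the contour normal and normalized by a known maximum magnitude. That definition, the function names, and the sample values are assumptions for illustration rather than the exact formulation used in the slides.

```python
def steered_response(gradient, normal, max_magnitude):
    """Gradient projected onto the unit contour normal, scaled to [0, 1]."""
    gx, gy = gradient
    nx, ny = normal
    return min(abs(gx * nx + gy * ny) / max_magnitude, 1.0)

def band_average(responses):
    """Average normalized response along the hypothesized part outline."""
    return sum(responses) / len(responses)

def part_likelihood(red, green, blue):
    """Likelihood taken as the maximum over the three color bands."""
    return max(band_average(red), band_average(green), band_average(blue))

# Hypothetical normalized responses at two contour points per band.
likelihood = part_likelihood([0.2, 0.4], [0.5, 0.7], [0.1, 0.1])
single = steered_response((3.0, 4.0), (1.0, 0.0), max_magnitude=10.0)
```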
Experiment: Likelihood Model • Figure panels: translation of the left lower leg; curve of the likelihood value
Joint Posterior Distribution • The joint posterior distribution of the Markov network is P(X|Z) ∝ ∏(i,j)∈E Ψij(Xi, Xj) · ∏i Φi(Zi|Xi), where E is the set of links between adjacent body parts and X = {X1, X2, …, X9} • The goal is to infer the marginal posterior P(Xi|Z), i.e., P(configuration of body part i | image observation)
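Assuming the usual pairwise Markov network factorization (a product of link potentials and per-part observation likelihoods), the unnormalized log posterior of one full assembly can be evaluated as in the sketch below. The part names, the scalar states, and the quadratic log terms are hypothetical placeholders chosen to keep the example small.

```python
def log_joint(states, observations, edges, log_potential, log_likelihood):
    """Unnormalized log posterior: link terms plus observation terms."""
    total = 0.0
    for i, j in edges:
        total += log_potential(i, j, states[i], states[j])
    for i, x in states.items():
        total += log_likelihood(i, observations[i], x)
    return total

def quad_potential(i, j, xi, xj):
    """Hypothetical log potential: adjacent parts should stay close."""
    return -0.5 * (xi - xj) ** 2

def quad_likelihood(i, z, x):
    """Hypothetical log likelihood: state should match the observation."""
    return -0.5 * (z - x) ** 2

states = {"torso": 0.0, "head": 1.0}
observations = {"torso": 0.0, "head": 1.5}
edges = [("torso", "head")]
score = log_joint(states, observations, edges, quad_potential, quad_likelihood)
```

Comparing `score` across candidate assemblies ranks them; computing the marginals P(Xi|Z) requires inference over all parts, which is what belief propagation provides.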
Belief Propagation • Message passing • Non-Gaussian distributions make a closed-form implementation intractable • Belief propagation Monte Carlo: evidence from neighboring nodes is combined with local evidence from the observation
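The message-passing idea can be sketched with sum-product belief propagation over a fixed, finite set of candidate states per node. This is a discrete stand-in for the Monte Carlo variant named above, not the authors' algorithm: the candidates play the role of samples, and the three-node tree, scalar states, and the `phi` and `psi` functions are assumptions for illustration. A brute-force enumeration is included so the beliefs can be checked on this small tree.

```python
import math
from itertools import product

def bp_marginals(candidates, edges, psi, phi):
    """Sum-product beliefs on a tree; candidates maps node -> list of states."""
    neighbors = {n: [] for n in candidates}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    cache = {}

    def message(src, dst):
        """Message from src to dst: one value per candidate state of dst."""
        if (src, dst) not in cache:
            incoming = [message(k, src) for k in neighbors[src] if k != dst]
            out = []
            for xd in candidates[dst]:
                total = 0.0
                for a, xs in enumerate(candidates[src]):
                    term = phi(src, xs) * psi(src, dst, xs, xd)
                    for m in incoming:
                        term *= m[a]
                    total += term
                out.append(total)
            cache[(src, dst)] = out
        return cache[(src, dst)]

    beliefs = {}
    for n in candidates:
        raw = []
        for a, x in enumerate(candidates[n]):
            b = phi(n, x)  # local evidence
            for k in neighbors[n]:
                b *= message(k, n)[a]  # evidence from neighbors
            raw.append(b)
        beliefs[n] = [r / sum(raw) for r in raw]
    return beliefs

def brute_force_marginals(candidates, edges, psi, phi):
    """Exact marginals by enumerating every joint assignment (check only)."""
    nodes = list(candidates)
    totals = {n: [0.0] * len(candidates[n]) for n in nodes}
    for combo in product(*(range(len(candidates[n])) for n in nodes)):
        idx = dict(zip(nodes, combo))
        p = 1.0
        for n in nodes:
            p *= phi(n, candidates[n][idx[n]])
        for i, j in edges:
            p *= psi(i, j, candidates[i][idx[i]], candidates[j][idx[j]])
        for n in nodes:
            totals[n][idx[n]] += p
    return {n: [t / sum(totals[n]) for t in totals[n]] for n in nodes}

# Hypothetical 1-D example with three parts.
candidates = {"torso": [0.0, 1.0, 2.0], "head": [0.5, 3.0], "arm": [-1.0, 1.5]}
observed = {"torso": 1.0, "head": 2.0, "arm": 1.0}
edges = [("torso", "head"), ("torso", "arm")]

def phi(n, x):
    return math.exp(-0.5 * (x - observed[n]) ** 2)

def psi(i, j, xi, xj):
    return math.exp(-0.125 * (xi - xj) ** 2)

beliefs = bp_marginals(candidates, edges, psi, phi)
exact = brute_force_marginals(candidates, edges, psi, phi)
```

On a tree, the beliefs agree with the exact marginals; with continuous, non-Gaussian states the sums are replaced by weighted samples, which is where the importance functions come in.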
Limitations of Current Work • Requires some visible skin color regions • Requires the face in a frontal pose • Requires reasonable contrast (visible edges) • Handles only a low degree of occlusion
Concluding Remarks • A novel algorithm for pose estimation • Principled statistical formulation for recovering human pose in 2-D • A working prototype • Work towards full human body tracking