Recovering Human Body Configurations: Combining Segmentation and Recognition Greg Mori, Xiaofeng Ren, and Jitendra Malik (UC Berkeley) Alexei A. Efros (Oxford)
The goal • Given an image: • Detect a human figure • Localize joints and limbs • Create a skeleton of their pose • Create a segmentation mask of the person
Other approaches: Simple features • Model people as generalized cylinders (1980s) • Easy to implement bottom-up • Often use a tree to express relations between parts • Problems: • Cylinder-like shapes are common in images, not just on people • Body parts are often interdependent • Context is really needed
Other approaches: Probable pose • Rely on probable poses • Template matching • Top-down constraints on pose • But even highly improbable poses are still possible
Other approaches: Frequent simplifications • Nude models • Limited poses • Background subtraction or limited clutter
“Arguably the most difficult recognition problem in computer vision” • Variation in clothing • Variation in limbs • Variation in pose
Solution: “Islands of Saliency” • Use low-level features that are informative independent of context • From these islands, fill in the gaps using context
Segmentation • Combine boundary finder (Martin et al., 2002) with Normalized Cuts (Malik, Belongie, et al., 2001) • Groups similar pixels into regions
Segmentation: Regions • 40 regions • The most salient body parts become regions • Limbs are usually split into two “half-limbs”
Segmentation: Superpixels • 200 regions (oversegmentation) • Retains virtually all structure in the original image • Still reduces complexity from 400,000 pixels to 200 superpixels
Finding limbs • Candidates: all 40 regions • Four cues for half-limb detection • Contour: probability of the boundary • Average boundary probability along the region’s boundary, as measured by Martin’s boundary finder • Shape: how close the region is to a rectangle • Area of overlap with the reconstructed rectangle
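The slide only says the shape cue is the “area of overlap with the reconstructed rectangle”. A minimal sketch, assuming the rectangle is reconstructed from the region’s second moments (centroid, principal axis, and per-axis variance) and that overlap is scored as intersection-over-union; both the moment-based fit and the IoU score are illustrative choices, and `rectangle_fit_score` is a hypothetical name.

```python
import math

def rectangle_fit_score(pixels):
    """Shape cue sketch: IoU between a region and a moment-fitted rectangle.

    pixels: list of (x, y) integer pixel coordinates belonging to one region.
    """
    if not pixels:
        raise ValueError("region has no pixels")
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    cxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    cyy = sum((y - cy) ** 2 for _, y in pixels) / n
    cxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    # Principal axis angle and the two eigenvalues of the 2x2 covariance.
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    mid = (cxx + cyy) / 2
    diff = math.hypot((cxx - cyy) / 2, cxy)
    lam1, lam2 = mid + diff, max(mid - diff, 0.0)
    # A run of w pixels has variance (w^2 - 1) / 12, so sqrt(3*lam + 1/4) = w / 2.
    h1 = math.sqrt(3 * lam1 + 0.25)
    h2 = math.sqrt(3 * lam2 + 0.25)
    ux, uy = math.cos(theta), math.sin(theta)
    inside = 0
    for x, y in pixels:
        dx, dy = x - cx, y - cy
        a = dx * ux + dy * uy       # coordinate along the major axis
        b = -dx * uy + dy * ux      # coordinate along the minor axis
        if abs(a) <= h1 and abs(b) <= h2:
            inside += 1
    rect_area = 4 * h1 * h2
    # Intersection over union of the region and the reconstructed rectangle.
    return inside / (n + rect_area - inside)
```

A solid rectangle of pixels scores 1.0, while a hollow ring of the same extent scores far lower, because most of its fitted rectangle is empty.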
Finding limbs • Shading • Limbs are roughly cylindrical, so shading should make them “pop out” in 3D • Compare Ix-, Ix+, Iy-, Iy+ for the region to the mean of Ix-, Ix+, Iy-, Iy+ over the training set • Focus cue • The background is often out of focus • C_focus = E_high / (a·E_low + b), where E_high and E_low are the region’s high- and low-frequency energy
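A minimal sketch of the focus ratio C_focus = E_high / (a·E_low + b), assuming E_low is the energy of the region after a 3x3 box blur and E_high is the energy of the residual the blur removes. The blur-based frequency split and the default values of a and b are hypothetical choices, not taken from the slides.

```python
def box_blur(img):
    """3x3 mean filter with edge clamping; used here as a crude low-pass filter."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9
    return out

def focus_cue(img, a=1.0, b=1e-6):
    """C_focus = E_high / (a * E_low + b) for a grayscale patch (list of rows)."""
    low = box_blur(img)
    e_low = sum(v * v for row in low for v in row)
    # High-frequency energy: what the low-pass filter removed.
    e_high = sum(
        (img[y][x] - low[y][x]) ** 2
        for y in range(len(img))
        for x in range(len(img[0]))
    )
    return e_high / (a * e_low + b)
```

A sharp, textured patch scores higher than the same patch after blurring, which is the behavior the cue relies on to separate in-focus limbs from a defocused background.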
Finding limbs • Cues are combined by a weighted sum • Weights are learned with logistic regression (training set of hand-labeled half-limbs)
Evaluation: Cues • [Plot: number of hits vs. number of candidates generated]
Evaluation summary • Individual cues are not very good detectors • Boundary strength is the best single cue • Combining cues yields better performance • On average, 4.08 of the top 8 candidates were hits • In 89% of cases, at least 3 of the top 8 candidates were hits • This motivates searching for 3 half-limbs combined with a head and torso
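The two summary statistics above (mean hits among the top 8, and the fraction of cases with at least 3 hits) can be computed as in this sketch. It assumes each test case is given as a list of booleans, one per candidate, already sorted by score; `topk_stats` is a hypothetical helper, and the data in the usage line is made up.

```python
def topk_stats(ranked_hits, k=8, min_hits=3):
    """ranked_hits: per test case, booleans (True = candidate is a real half-limb),
    ordered from highest to lowest detector score."""
    counts = [sum(hits[:k]) for hits in ranked_hits]
    mean_hits = sum(counts) / len(counts)
    frac_with_min = sum(c >= min_hits for c in counts) / len(counts)
    return mean_hits, frac_with_min

# Toy usage with three made-up cases.
T, F = True, False
mean_hits, frac = topk_stats([[T, T, F, T, F, F, F, F, T],
                              [F, F, F, F, F, F, F, F],
                              [T, T, T, T, T, F, F, F]])
```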
Finding torsos • Unlike a half-limb, a torso typically spans several regions • Consider all sets of adjacent regions within a range of total sizes • Cues: • Contour • Shape • Focus • (No shading)
Finding torsos • Find the orientation of the torso • Find the best matching head • Again uses contour, shape, and focus cues, with a disk as the shape model • The torso score, the head score, and a score for the head’s position relative to the torso are multiplied to give the score of the oriented torso
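The multiplicative score for an oriented torso can be sketched as follows. This assumes the candidate heads and their disk-shape scores are already computed and that the caller supplies a function scoring each head’s position relative to the torso; `best_oriented_torso` and `rel_pos_score` are hypothetical names.

```python
def best_oriented_torso(torso_score, heads, rel_pos_score):
    """Pick the head that maximizes torso * head * relative-position score.

    heads: list of (head_id, head_score) pairs.
    rel_pos_score: callable mapping head_id to a score in [0, 1].
    Returns (head_id, combined_score), or None if there are no heads.
    """
    best = None
    for head_id, head_score in heads:
        combined = torso_score * head_score * rel_pos_score(head_id)
        if best is None or combined > best[1]:
            best = (head_id, combined)
    return best
```

Because the scores are multiplied, a head that matches the disk shape well but sits in an implausible position relative to the torso is still heavily penalized.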
Evaluation • A torso detection counts as a success if all four torso points are within 60 pixels of ground truth
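The success criterion is a per-point distance test. A minimal sketch, assuming predicted and ground-truth torso points are given in corresponding order and that “within 60 pixels” means Euclidean distance of at most 60; `torso_hit` is a hypothetical name.

```python
import math

def torso_hit(pred, truth, tol=60.0):
    """True if each of the four predicted torso points lies within tol pixels
    (Euclidean distance) of its corresponding ground-truth point."""
    if len(pred) != 4 or len(truth) != 4:
        raise ValueError("expected four torso points")
    return all(math.dist(p, t) <= tol for p, t in zip(pred, truth))
```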
Body building • From 5-7 half-limb candidates and ~50 candidate oriented torsos, form partial configurations consisting of: • One torso • Three half-limbs, each assigned to: • One of 8 half-limb body parts • One of two polarities • 2-3 million partial configurations!
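A quick count shows where the “2-3 million” comes from. This sketch assumes each partial configuration picks one torso, three distinct half-limb candidates, assigns them to three different body parts out of 8, and gives each one of two polarities, with no pruning yet; these counting conventions are assumptions, since the slide does not spell them out.

```python
from math import comb, perm

def count_partial_configs(n_torsos, n_half_limbs, n_parts=8, n_polarities=2, k=3):
    """Number of (torso, 3 half-limbs, part labels, polarities) combinations."""
    return (
        n_torsos
        * comb(n_half_limbs, k)      # which k half-limb candidates are used
        * perm(n_parts, k)           # ordered assignment to k distinct body parts
        * n_polarities ** k          # polarity of each chosen half-limb
    )
```

With 50 torsos and 6 half-limb candidates this gives 50 · 20 · 336 · 8 = 2,688,000, in line with the slide; 5 or 7 candidates give about 1.3 or 4.7 million under the same conventions.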
Enforce constraints • Relative widths • Foreshortening doesn’t affect limb width much • Use anthropometric data to rule out limbs more than 4 standard deviations wider than expected • Length of limbs relative to the torso • Assume the torso is not too foreshortened • No more than ±40° out of the image plane • Again, prune limbs more than 4 standard deviations from the mean length, relative to the torso • These constraints seem to build in some assumptions about probable pose
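The two pruning tests can be sketched as below: a one-sided test on width (reject only if too wide) and a two-sided test on length, both at 4 standard deviations. This assumes widths and lengths are normalized by the torso’s width and length and that the (mean, std) statistics come from anthropometric tables; the normalization, the `stats` layout, and the function name are hypothetical.

```python
def plausible_half_limb(limb_w, limb_len, torso_w, torso_len, stats, k=4.0):
    """stats: {"width_ratio": (mean, std), "length_ratio": (mean, std)},
    both ratios taken relative to the torso (hypothetical layout)."""
    w_mean, w_std = stats["width_ratio"]
    l_mean, l_std = stats["length_ratio"]
    width_ratio = limb_w / torso_w
    length_ratio = limb_len / torso_len
    # One-sided: only limbs much wider than expected are ruled out.
    too_wide = width_ratio > w_mean + k * w_std
    # Two-sided: lengths far from the mean in either direction are pruned.
    bad_length = abs(length_ratio - l_mean) > k * l_std
    return not (too_wide or bad_length)
```

The width test is one-sided because the slide only rules out limbs that are too wide; the length test is two-sided because it prunes anything far from the mean.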
Enforce constraints • Adjacency • Upper limbs must be adjacent to the torso • Lower limbs must be adjacent to upper limbs • Symmetry in clothing: color histograms of corresponding segments must not be too dissimilar • E.g. the right and left upper arms should look similar • This makes mild assumptions about variation in clothing
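Both constraints reduce to simple predicates. A minimal sketch, assuming region adjacency is stored as a set of unordered region-id pairs and that histogram dissimilarity is measured with the chi-squared distance under a threshold; the distance measure, the threshold value, and all function names are hypothetical.

```python
def is_adjacent(a, b, adjacency):
    """adjacency: set of frozenset({region_a, region_b}) pairs."""
    return frozenset((a, b)) in adjacency

def chain_ok(torso, upper, lower, adjacency):
    """Upper half-limb must touch the torso; lower must touch the upper (when both exist)."""
    if upper is not None and not is_adjacent(torso, upper, adjacency):
        return False
    if upper is not None and lower is not None and not is_adjacent(upper, lower, adjacency):
        return False
    return True

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two non-empty histograms, normalized to sum 1."""
    s1, s2 = sum(h1), sum(h2)
    p = [v / s1 for v in h1]
    q = [v / s2 for v in h2]
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(p, q))

def clothing_consistent(hist_left, hist_right, max_dist=0.3):
    """Reject pairs (e.g. left/right upper arm) whose colors differ too much."""
    return chi2_distance(hist_left, hist_right) <= max_dist
```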
Body building: slimming down • Pruning reduces this to ~1000 partial configurations • Sorted by a linear combination of the torso score and the three half-limb scores • (This score can also be used to improve torso detection)
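Ranking the surviving configurations by a linear combination of scores can be sketched as follows. This assumes one weight per term (torso plus three half-limbs) and a cutoff of 1000; the weights, the tuple layout, and the function names are hypothetical.

```python
def config_score(torso_score, limb_scores, weights=(1.0, 1.0, 1.0, 1.0)):
    """Linear combination of the torso score and the three half-limb scores."""
    terms = (torso_score, *limb_scores)
    return sum(w * s for w, s in zip(weights, terms))

def prune_and_rank(configs, weights=(1.0, 1.0, 1.0, 1.0), keep=1000):
    """configs: list of (torso_score, (s1, s2, s3), payload); best first."""
    ranked = sorted(configs, key=lambda c: config_score(c[0], c[1], weights),
                    reverse=True)
    return ranked[:keep]
```

Since every configuration carries its torso, the torso of the top-ranked configuration can serve as a re-scored torso detection.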
Extending to full limbs • Add further rectangles at empty limb joints, evaluated on adjacent superpixels • Want high internal similarity and high dissimilarity to the surroundings
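One simple way to express “high internal similarity and high dissimilarity to surroundings” is a contrast-over-spread ratio. This sketch assumes grayscale intensities sampled inside a candidate rectangle and in a band of surrounding pixels; the ratio itself and the name `extension_score` are hypothetical, not the scoring used in the slides.

```python
from statistics import fmean, pstdev

def extension_score(inside, outside, eps=1e-6):
    """Higher when the rectangle is uniform inside and differs from its surroundings.

    inside / outside: intensity values inside the candidate rectangle and in a
    surrounding band (hypothetical sampling scheme).
    """
    contrast = abs(fmean(inside) - fmean(outside))   # dissimilarity to surroundings
    spread = pstdev(inside)                          # low spread = internal similarity
    return contrast / (spread + eps)
```

Among several candidate rectangles attached to the same empty joint, the one with the highest score would be kept as the limb extension.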
Summary • “Arguably the most difficult recognition problem in computer vision” • Not solved here • The method here is appealing: • No need to store exemplars • The islands-of-saliency approach seems useful in many contexts • Uses some configural knowledge to make reasonable guesses • A good illustration of integrating recognition and segmentation