
Recovering Human Body Configurations: Combining Segmentation and Recognition

Recovering Human Body Configurations: Combining Segmentation and Recognition Greg Mori, Xiaofeng Ren, and Jitendra Malik (UC Berkeley) Alexei A. Efros (Oxford) The goal Given an image: Detect a human figure Localize joints and limbs Create a skeleton of their pose



Presentation Transcript


  1. Recovering Human Body Configurations: Combining Segmentation and Recognition Greg Mori, Xiaofeng Ren, and Jitendra Malik (UC Berkeley) Alexei A. Efros (Oxford)

  2. The goal • Given an image: • Detect a human figure • Localize joints and limbs • Create a skeleton of their pose • Create a segmentation mask of the person

  3. Other approaches: Simple features • Model people as generalized cylinders (1980’s) • Easily implemented bottom up • Often use tree to express relations • Problems: • Cylinders are common • Often dependencies between body parts • Really need context

  4. Other approaches: Probable pose • Often use probable pose • Template matching • Top down constraints on pose • But even highly improbable poses are still possible

  5. Other approaches: Frequent simplifications • Nude models • Limited poses • Background subtraction or limited clutter

  6. “Arguably the most difficult recognition problem in computer vision” • Variation in clothing • Variation in limbs • Variation in pose

  7. Solution: “Islands of Saliency” • Use low-level features that are informative independent of context • Based on these islands, one is able to fill in gaps with context

  8. Algorithm

  9. Algorithm: Segmenting into regions and superpixels

  10. Segmentation • Combine boundary finder (Martin et al., 2002) with Normalized Cuts (Malik, Belongie, et al., 2001) • Groups similar pixels into regions

  11. Segmentation: Regions • 40 regions • Most salient parts of body become regions • Limbs usually split into two “half-limbs”

  12. Segmentation: Superpixels • 200 regions (oversegmentation) • Retains virtually all structures in original • Still reduces complexity from 400,000 pixels to 200 superpixels

  13. Algorithm: Finding salient limbs and torsos

  14. Finding limbs • Candidates: all 40 regions • Four cues for half-limb detection • Contour: Probability of the boundary • Average probability of the region’s boundary, as measured by Martin’s boundary finder • Shape: How close to a rectangle • Area of overlap with reconstructed rectangle,

  15. Finding limbs • Shading • Limbs are roughly cylindrical, so they should show 3D pop-out due to shading • Compare Ix-, Ix+, Iy-, Iy+ for the region to the mean of Ix-, Ix+, Iy-, Iy+ for the training set • Focus cue • Background is often not in focus • C_focus = E_high / (a E_low + b)
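
As a rough illustration of the focus cue on slide 15, the sketch below evaluates the ratio for two regions. It assumes E_high and E_low are a region's high- and low-frequency energies (the slide does not define them), and the values chosen for `a` and `b` are placeholders, not the authors' constants.

```python
def focus_cue(e_high, e_low, a=1.0, b=1.0):
    """Focus cue as the ratio e_high / (a * e_low + b).

    a and b are hypothetical constants for this sketch. With b > 0 the
    ratio stays finite when a region has almost no low-frequency energy.
    """
    return e_high / (a * e_low + b)


# Hypothetical energies: an in-focus limb keeps fine detail, while a
# defocused background region loses most of its high-frequency energy.
sharp_region = focus_cue(e_high=4.0, e_low=2.0)
blurred_region = focus_cue(e_high=0.5, e_low=2.0)
print(sharp_region > blurred_region)
```

Under these assumptions, a sharper region receives a higher cue value, which is the behaviour the slide relies on to separate the figure from an out-of-focus background.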

  16. Finding limbs • Cues are combined by summing • Use logistic regression to learn weights (training set of hand-labeled half-limbs)
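
To make the cue combination concrete, here is a minimal sketch of logistic regression over four cue values (contour, shape, shading, focus), trained with plain gradient descent. The toy samples, learning rate, and epoch count are invented for illustration; the slides do not give the authors' training data or optimizer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def combine_cues(cues, weights, bias):
    """Weighted sum of cue values passed through a logistic function."""
    return sigmoid(bias + sum(w * c for w, c in zip(weights, cues)))


def fit_logistic(samples, labels, lr=1.0, epochs=5000):
    """Learn cue weights by batch gradient descent on the log loss."""
    n = len(samples[0])
    m = len(samples)
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(samples, labels):
            err = combine_cues(x, weights, bias) - y
            for i in range(n):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


# Hypothetical hand-labeled regions: (contour, shape, shading, focus).
samples = [(0.9, 0.8, 0.7, 0.9), (0.8, 0.9, 0.6, 0.7),
           (0.2, 0.3, 0.4, 0.1), (0.1, 0.2, 0.3, 0.2)]
labels = [1, 1, 0, 0]  # 1 = half-limb, 0 = not a half-limb

weights, bias = fit_logistic(samples, labels)
print(combine_cues((0.85, 0.8, 0.6, 0.8), weights, bias))
```

The learned probability can then be used to rank the 40 candidate regions, which is how the "top 8 candidates" in the evaluation slides would be selected under this reading.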

  17. Evaluation: Cues • [Plot: number of hits vs. number of candidates generated]

  18. Evaluation: Performance

  19. Evaluation summary • Not very good detectors • Strength of boundary best cue • Combining cues yields better performance • On average 4.08 of top 8 candidates produced were hits • 89% have at least 3 hits among top 8 • Motivates search for 3 half-limbs combined with head and torso

  20. Finding torsos • Unlike half-limbs, typically several regions • Consider all sets of adjacent regions within some range of total sizes • Set of cues: • Contour • Shape • Focus • (No shading)

  21. Finding torsos • Find orientation of torso • Find best matching head • Again use contour, shape, and focus cues, with a disk as the shape model • Score for torso, score for head, and score for relative position of head to torso are multiplied to give the score for the oriented torso

  22. Evaluation • Success if all four torso points are within 60 pixels of ground truth

  23. Algorithm: Pruning to form partial configurations

  24. Body building • From 5-7 half-limbs and ~50 candidate oriented torsos, form partial configurations consisting of: • One torso • Three half-limbs, each assigned to: • One of 8 half-limb body parts • One of two polarities • 2-3 million partial configurations!
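
The size of this search space can be checked with a short count. The sketch below is one plausible way to reach the quoted order of magnitude, assuming the three half-limbs are chosen without regard to order and are assigned to three distinct body parts; the slide does not spell out the exact counting, so treat both assumptions as guesses.

```python
from math import comb, perm


def count_partial_configurations(n_torsos, n_half_limbs,
                                 n_parts=8, n_polarities=2, k=3):
    """Count torso + k half-limb combinations under the stated assumptions."""
    limb_subsets = comb(n_half_limbs, k)   # which candidates are used
    part_assignments = perm(n_parts, k)    # distinct body part per half-limb
    polarity_choices = n_polarities ** k   # orientation of each half-limb
    return n_torsos * limb_subsets * part_assignments * polarity_choices


for n_half_limbs in (5, 6, 7):
    print(n_half_limbs, count_partial_configurations(50, n_half_limbs))
```

With 50 torsos this gives roughly 1.3 million, 2.7 million, and 4.7 million configurations for 5, 6, and 7 half-limbs, which is in line with the figure on the slide and shows why pruning is needed before any further scoring.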

  25. Enforce constraints • Relative widths • Foreshortening doesn’t affect width of limbs much • Use anthropometric data to rule out limbs more than 4 standard deviations wider than expected • Length of limbs relative to torso • Assume torso not too foreshortened • No more than +/- 40° angle with image plane • Again, prune limbs more than 4 standard deviations away from mean length, relative to torso • Seems to be making some assumptions about probable pose
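
The two size constraints can be sketched as simple threshold tests. In the code below, the means and standard deviations are made-up placeholders rather than real body-measurement statistics, and expressing both width and length as fractions of torso length is an assumption of this sketch.

```python
def within_upper_bound(value, mean, std, k=4.0):
    """One-sided test: reject only values more than k std above the mean."""
    return value <= mean + k * std


def within_band(value, mean, std, k=4.0):
    """Two-sided test: reject values more than k std from the mean."""
    return abs(value - mean) <= k * std


# Hypothetical (mean, std) of half-limb width and length,
# both expressed as a fraction of torso length.
WIDTH_STATS = (0.20, 0.03)
LENGTH_STATS = (0.55, 0.08)


def keep_half_limb(width, length, torso_length):
    """Return True if a candidate half-limb passes both size constraints."""
    rel_width = width / torso_length
    rel_length = length / torso_length
    return (within_upper_bound(rel_width, *WIDTH_STATS)
            and within_band(rel_length, *LENGTH_STATS))


print(keep_half_limb(width=20, length=55, torso_length=100))  # typical limb
print(keep_half_limb(width=40, length=55, torso_length=100))  # far too wide
```

The width test is one-sided because the slide only rules out limbs that are too wide, while the length test is two-sided to match "away from mean length".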

  26. Enforce constraints • Adjacency • Upper limbs must be adjacent to torso • Lower limbs must be adjacent to upper limbs • Symmetry in clothing: color histograms must not be overly dissimilar for corresponding segments • E.g. right and left upper arms should be similar • Makes some small assumptions about variation in clothing
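
A minimal sketch of the clothing-symmetry test follows. The slide does not say which histogram distance is used, so the chi-squared distance and the `max_distance` threshold here are assumptions chosen for illustration.

```python
def normalize(hist):
    """Scale a histogram of bin counts so that it sums to 1."""
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    return [h / total for h in hist]


def chi_squared_distance(hist_a, hist_b):
    """Chi-squared distance between two histograms, in the range [0, 1]."""
    p, q = normalize(hist_a), normalize(hist_b)
    return 0.5 * sum((x - y) ** 2 / (x + y)
                     for x, y in zip(p, q) if x + y > 0)


def clothing_symmetric(hist_left, hist_right, max_distance=0.3):
    """Accept a left/right pair unless their colors are overly dissimilar.

    max_distance is a hypothetical threshold, not a value from the slides.
    """
    return chi_squared_distance(hist_left, hist_right) <= max_distance


# Hypothetical 3-bin color histograms for left and right upper arms.
print(clothing_symmetric([8, 2, 0], [7, 3, 0]))    # similar sleeves
print(clothing_symmetric([10, 0, 0], [0, 0, 10]))  # completely different
```

A loose threshold keeps the constraint a weak one, in the spirit of the slide's remark that it only makes small assumptions about clothing.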

  27. Body building: slimming down • Reduces to ~1000 partial configurations • Sorted by linear combination of the torso and the three half-limb scores • (This score can be used to improve torso detection)
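
The ranking step can be sketched as a weighted sum followed by a sort. The weights and the dictionary layout below are hypothetical; the slides only say that the score is a linear combination of the torso score and the three half-limb scores.

```python
def configuration_score(torso_score, limb_scores, w_torso=1.0, w_limb=1.0):
    """Linear combination of one torso score and the half-limb scores."""
    return w_torso * torso_score + w_limb * sum(limb_scores)


def top_configurations(configs, k):
    """Return the k highest-scoring partial configurations."""
    return sorted(
        configs,
        key=lambda c: configuration_score(c["torso"], c["limbs"]),
        reverse=True,
    )[:k]


# Hypothetical partial configurations that survived pruning.
configs = [
    {"name": "A", "torso": 0.9, "limbs": (0.8, 0.7, 0.6)},
    {"name": "B", "torso": 0.5, "limbs": (0.9, 0.9, 0.9)},
    {"name": "C", "torso": 0.2, "limbs": (0.3, 0.3, 0.3)},
]
print([c["name"] for c in top_configurations(configs, k=2)])
```

In this toy data, configuration B outranks A even though its torso score is lower, which illustrates how strong limb evidence can change which torso candidate looks best.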

  28. Algorithm

  29. Extending to full limbs • Adding additional rectangles evaluated on adjacent superpixels to empty limb joints • Want high internal similarity and high dissimilarity to surroundings

  30. Algorithm

  31. Summary • “Arguably the most difficult problem in computer vision” • Not solved here • Method here is appealing: • Don’t need to store exemplars • Island of saliency approach seems useful in many contexts • Use some configural knowledge to make reasonable guesses • Good illustration of integrating recognition and segmentation
