Recovering Human Body Configurations: Combining Segmentation and Recognition

Greg Mori, Xiaofeng Ren, and Jitendra Malik (UC Berkeley); Alexei A. Efros (Oxford)

The goal
• Given an image:
  • Detect a human figure
  • Localize joints and limbs
  • Create a skeleton of their pose
  • Create a segmentation mask of the person

Other approaches: Simple features
• Model people as generalized cylinders (1980s)
• Easily implemented bottom-up
• Often use a tree to express relations between parts
• Problems:
  • Cylinder-like shapes are common in cluttered scenes
  • Body parts often have dependencies beyond what a tree expresses
  • Context is really needed

Other approaches: Probable pose
• Often rely on priors over probable poses
• Template matching
• Top-down constraints on pose
• But even highly improbable poses are still possible

Other approaches: Frequent simplifications
• Nude models
• Limited poses
• Background subtraction or limited clutter

“Arguably the most difficult recognition problem in computer vision”
• Variation in clothing
• Variation in limbs
• Variation in pose

Solution: “Islands of Saliency”
• Use low-level features that are informative independent of context
• Starting from these islands, fill in the gaps using context

Algorithm

Algorithm: Segmenting into regions and superpixels

Segmentation
• Combine a boundary finder (Martin et al., 2002) with Normalized Cuts (Malik, Belongie, et al., 2001)
• Groups similar pixels into regions
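Normalized Cuts treats segmentation as partitioning a weighted pixel graph, and it relaxes the cut criterion to an eigenvector problem. The sketch below computes a single two-way cut under our own assumptions: a tiny dense affinity matrix, power iteration in place of a sparse eigensolver, and a split of the eigenvector at zero. It omits the boundary-finder affinities and the recursive or multiway partitioning that a 40- or 200-region segmentation would need. `ncut_bipartition` is a hypothetical name.

```python
import math


def ncut_bipartition(W, iters=500):
    """Two-way Normalized Cut of a small graph (pure-Python sketch).

    W: symmetric affinity matrix as a list of lists, every row sum > 0.
    Returns a list of 0/1 labels, one per node.
    """
    n = len(W)
    d = [sum(row) for row in W]
    inv_sqrt_d = [1.0 / math.sqrt(x) for x in d]
    # A = I + D^{-1/2} W D^{-1/2}; the shift keeps all eigenvalues non-negative.
    A = [[(1.0 if i == j else 0.0) + inv_sqrt_d[i] * W[i][j] * inv_sqrt_d[j]
          for j in range(n)] for i in range(n)]
    # The top eigenvector of A is proportional to D^{1/2} * 1 (the trivial cut).
    v1 = [math.sqrt(x) for x in d]
    norm1 = math.sqrt(sum(x * x for x in v1))
    v1 = [x / norm1 for x in v1]

    def deflate(v):
        # Remove the component along the trivial eigenvector.
        dot = sum(a * b for a, b in zip(v, v1))
        return [a - dot * b for a, b in zip(v, v1)]

    # Power iteration orthogonal to v1 converges to the second eigenvector.
    z = deflate([float(i + 1) for i in range(n)])
    for _ in range(iters):
        z = deflate([sum(A[i][j] * z[j] for j in range(n)) for i in range(n)])
        norm = math.sqrt(sum(x * x for x in z))
        z = [x / norm for x in z]
    # Generalized eigenvector of (D - W) y = lambda D y is y = D^{-1/2} z; split at 0.
    y = [zi * s for zi, s in zip(z, inv_sqrt_d)]
    return [1 if yi > 0 else 0 for yi in y]
```

For example, six 1-D points at 0, 0.1, 0.2 and 5, 5.1, 5.2 with Gaussian affinities split into the two obvious groups.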

Segmentation: Regions
• 40 regions
• Most salient parts of the body become regions
• Limbs usually split into two “half-limbs” (upper and lower)

Segmentation: Superpixels
• 200 regions (oversegmentation)
• Retains virtually all structures in the original image
• Still reduces complexity from 400,000 pixels to 200 superpixels
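Several later steps ask which regions or superpixels touch: torso grouping, the adjacency constraints, and limb extension. The hypothetical helper below builds the region adjacency graph. It assumes the segmentation is available as a 2-D integer label map and that 4-connectivity defines adjacency.

```python
def region_adjacency(labels):
    """Build a region adjacency graph from a 2-D label map.

    labels: list of rows, each a list of integer region ids.
    Returns {region_id: set of neighbouring region ids} (4-connectivity).
    """
    adj = {}
    h, w = len(labels), len(labels[0])
    for y in range(h):
        for x in range(w):
            a = labels[y][x]
            adj.setdefault(a, set())
            # Look right and down only, so each neighbouring pixel pair is visited once.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w:
                    b = labels[ny][nx]
                    if b != a:
                        adj[a].add(b)
                        adj.setdefault(b, set()).add(a)
    return adj
```

For scale, 400,000 pixels over 200 superpixels is 2,000 pixels per superpixel on average. The graph therefore has a few hundred nodes rather than hundreds of thousands.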

Algorithm: Finding salient limbs and torsos

Finding limbs
• Candidates: all 40 regions
• Four cues for half-limb detection:
  • Contour: boundary probability
    • Average probability of the region’s boundary, as measured by Martin’s boundary finder
  • Shape: how close the region is to a rectangle
    • Area of overlap with a reconstructed rectangle
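The slides define the contour cue as the average boundary probability along the region's boundary, and the shape cue as the overlap with a reconstructed rectangle. They do not say how the rectangle is reconstructed. The sketch below makes two assumptions, both our own: the rectangle has the same centroid and second moments as the region, and the overlap score is intersection over union. The function names are hypothetical, and the boundary-probability map is taken as a given input.

```python
import math


def contour_cue(mask, pb):
    """Mean boundary probability over the region's boundary pixels.

    mask: 2-D list of bools; pb: 2-D list of floats in [0, 1].
    A region pixel counts as boundary if a 4-neighbour is outside the region or image.
    """
    h, w = len(mask), len(mask[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            nbrs = ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            if any(not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]
                   for ny, nx in nbrs):
                total += pb[y][x]
                count += 1
    return total / count if count else 0.0


def shape_cue(mask):
    """Intersection over union between the region and a moment-matched rectangle."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    n = len(pts)
    cx = sum(p[0] for p in pts) / n
    cy = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - cx) ** 2 for p in pts) / n
    syy = sum((p[1] - cy) ** 2 for p in pts) / n
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in pts) / n
    # Orientation and principal variances of the region's covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    l1, l2 = tr / 2 + disc, tr / 2 - disc
    # A uniform bar of length L has variance L^2 / 12, so the half-length is sqrt(3 * var).
    half_len, half_wid = math.sqrt(3 * l1), math.sqrt(3 * l2)
    c, s = math.cos(theta), math.sin(theta)
    inter = union = 0
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            u = (x - cx) * c + (y - cy) * s
            t = -(x - cx) * s + (y - cy) * c
            in_rect = abs(u) <= half_len and abs(t) <= half_wid
            inter += bool(v and in_rect)
            union += bool(v or in_rect)
    return inter / union
```

A perfectly rectangular region scores 1.0 on the shape cue. Irregular blobs score lower.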

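Finding limbs
• Shading: limbs are roughly cylindrical, so they should “pop out” in 3D due to shading
  • Compare Ix-, Ix+, Iy-, Iy+ for the region to the mean of Ix-, Ix+, Iy-, Iy+ over the training set
• Focus: the background is often out of focus
  • $C_{\text{focus}} = E_{\text{high}} / (a\,E_{\text{low}} + b)$, where $E_{\text{high}}$ and $E_{\text{low}}$ are the energies in the high- and low-frequency bands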
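The slides compare a region's (Ix-, Ix+, Iy-, Iy+) to the training-set mean but do not define the terms or the comparison. The sketch below assumes they are the mean negative and positive parts of forward-difference derivatives, compared by Euclidean distance. For the focus cue, E_high and E_low are taken as given, and the constants a and b are placeholders rather than values from the paper. All function names are hypothetical.

```python
import math


def shading_descriptor(img, mask):
    """Mean negative and positive parts of the x and y derivatives over a region.

    img: 2-D list of intensities; mask: 2-D list of bools of the same size.
    Returns (Ix-, Ix+, Iy-, Iy+) using forward differences.
    """
    h, w = len(img), len(img[0])
    sums = [0.0, 0.0, 0.0, 0.0]
    n = 0
    for y in range(h - 1):
        for x in range(w - 1):
            if not mask[y][x]:
                continue
            ix = img[y][x + 1] - img[y][x]
            iy = img[y + 1][x] - img[y][x]
            sums[0] += max(-ix, 0.0)
            sums[1] += max(ix, 0.0)
            sums[2] += max(-iy, 0.0)
            sums[3] += max(iy, 0.0)
            n += 1
    return tuple(s / n for s in sums) if n else (0.0, 0.0, 0.0, 0.0)


def shading_cue(desc, train_mean):
    """Negative Euclidean distance to the training-set mean (higher = more limb-like)."""
    return -math.dist(desc, train_mean)


def focus_cue(e_high, e_low, a=1.0, b=1e-6):
    """C_focus = E_high / (a * E_low + b); a and b are placeholder constants."""
    return e_high / (a * e_low + b)
```
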

Finding limbs
• Cues are combined by a weighted sum
• Logistic regression learns the weights (training set of hand-labeled half-limbs)
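Learning the cue weights with logistic regression can be sketched with plain batch gradient descent. The learning rate, the epoch count, the function names, and the cue ordering are our assumptions. The resulting weighted sum (the logit) ranks candidates the same way the logistic probability does.

```python
import math


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit P(half-limb | cues) = sigmoid(w . x + b) by batch gradient descent.

    X: list of cue vectors (e.g. contour, shape, shading, focus); y: list of 0/1 labels.
    """
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # gradient of the log-loss w.r.t. z
            for j in range(dim):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def limb_score(w, b, x):
    """Weighted sum of cues; the logistic probability is monotone in this score."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b
```
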

Evaluation: Cues
[Plot: number of hits vs. number of candidates generated]

Evaluation: Performance

Evaluation summary
• Not very good detectors
• Boundary strength is the best cue
• Combining cues yields better performance
• On average, 4.08 of the top 8 candidates are hits
• 89% of images have at least 3 hits among the top 8
• Motivates searching for 3 half-limbs combined with a head and torso
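The two top-8 statistics in this summary can be computed from ranked candidate lists, as in the sketch below. The input format is an assumption: one list of 0/1 hit flags per image, sorted by detector score. The function name is also hypothetical.

```python
def topk_stats(ranked_hits_per_image, k=8, min_hits=3):
    """Average hits among the top-k candidates, and the fraction of images
    with at least `min_hits` hits in their top k.

    ranked_hits_per_image: one list of 0/1 flags per image, best-scored first.
    """
    counts = [sum(flags[:k]) for flags in ranked_hits_per_image]
    mean_hits = sum(counts) / len(counts)
    frac = sum(c >= min_hits for c in counts) / len(counts)
    return mean_hits, frac
```
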

Finding torsos
• Unlike a half-limb, a torso typically spans several regions
• Consider all sets of adjacent regions within some range of total size
• Cues:
  • Contour
  • Shape
  • Focus
  • (No shading)
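Enumerating "all sets of adjacent regions within some range of total size" amounts to growing connected region groups one neighbour at a time. The sketch below assumes a region adjacency dictionary and per-region pixel areas as inputs. The cap on group size and the area bounds are placeholders, since the slides only give "some range". All names are hypothetical.

```python
def torso_candidates(adj, area, min_area, max_area, max_regions=3):
    """Enumerate connected groups of regions whose total area is in range.

    adj: {region: set of neighbours}; area: {region: positive pixel count}.
    Returns a set of frozensets of region ids.
    """
    found = set()
    frontier = {frozenset([r]) for r in adj}
    seen = set(frontier)
    while frontier:
        nxt = set()
        for group in frontier:
            total = sum(area[r] for r in group)
            if min_area <= total <= max_area:
                found.add(group)
            # Areas are positive, so growing an oversized group cannot help.
            if total > max_area or len(group) == max_regions:
                continue
            # Add one region adjacent to the group, so every group stays connected.
            for r in group:
                for nb in adj[r]:
                    if nb not in group:
                        g2 = group | {nb}
                        if g2 not in seen:
                            seen.add(g2)
                            nxt.add(g2)
        frontier = nxt
    return found
```
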

Finding torsos
• Find the orientation of the torso
• Find the best-matching head
  • Again use contour, shape, and focus cues, with a disk as the shape
• Multiply the torso score, the head score, and a score for the head’s position relative to the torso to get the score for an oriented torso
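The multiplicative oriented-torso score can be sketched as below. The slides do not say how the head-placement score is computed, so this sketch assumes an isotropic Gaussian around a mean head offset in torso-aligned coordinates. The parameter names are hypothetical.

```python
import math


def oriented_torso_score(torso_score, head_score, head_offset, mean_offset, sigma):
    """Product of torso, head, and head-placement scores.

    head_offset / mean_offset: (x, y) head centre relative to the torso, in
    torso-aligned coordinates. The Gaussian placement term is an assumption.
    """
    d2 = ((head_offset[0] - mean_offset[0]) ** 2
          + (head_offset[1] - mean_offset[1]) ** 2)
    placement = math.exp(-d2 / (2 * sigma ** 2))
    return torso_score * head_score * placement
```

Because the terms are multiplied, a poor score on any one of them pulls the whole oriented torso down.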

Evaluation
• Success if all four torso points are within 60 pixels of the ground truth
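The success criterion above reduces to a per-point distance check. The sketch below makes two assumptions: predicted and ground-truth points are matched by their order in the lists, and distance is Euclidean in pixels. `torso_hit` is a hypothetical name.

```python
import math


def torso_hit(pred, gt, tol=60.0):
    """True if each of the four predicted torso points lies within `tol`
    pixels of the corresponding ground-truth point (matched by index)."""
    return len(pred) == len(gt) == 4 and all(
        math.dist(p, g) <= tol for p, g in zip(pred, gt))
```
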

Algorithm: Pruning to form partial configurations

Body building
• From 5-7 half-limbs and ~50 candidate oriented torsos, form partial configurations consisting of:
  • A torso
  • Three half-limbs, each assigned to:
    • One of 8 half-limb body parts
    • One of two polarities
• 2-3 million partial configurations!
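A back-of-envelope count shows how quickly these combinations grow. The counting convention is our assumption, since the slides do not give one. For each torso, choose 3 of the 8 part slots, assign 3 distinct half-limb candidates to them in order, and give each one of 2 polarities.

```python
from math import comb, perm


def count_partial_configs(n_torsos, n_half_limbs, n_parts=8, k=3, n_polarities=2):
    """Torsos x (part slots chosen) x (ordered distinct half-limbs) x (polarities)."""
    return n_torsos * comb(n_parts, k) * perm(n_half_limbs, k) * n_polarities ** k
```

With 50 torsos and 6 half-limb candidates, this gives 50 × 56 × 120 × 8 = 2,688,000. That is in line with the 2-3 million quoted on the slide.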

Enforce constraints
• Relative widths
  • Foreshortening doesn’t affect the width of limbs much
  • Use anthropometric data to rule out limbs more than 4 standard deviations wider than expected
• Length of limbs relative to the torso
  • Assume the torso is not too foreshortened
    • No more than ±40% angle with the image plane
  • Again, prune limbs more than 4 standard deviations from the mean length, relative to the torso
• This seems to make some assumptions of probable pose
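Both pruning rules are k-standard-deviation tests. The sketch below follows the slide: it rejects limbs that are too wide (a one-sided test) and limbs whose length relative to the torso is too far from the mean (a two-sided test). It assumes that widths are also normalized by the torso. The statistics used in any example are placeholders, not anthropometric data, and the function name is hypothetical.

```python
def passes_size_constraints(limb_w, limb_len, torso_w, torso_len, stats, k=4.0):
    """Keep a half-limb unless it is more than k std devs wider than expected, or
    its torso-relative length is more than k std devs from the mean.

    stats: {"rel_width": (mean, std), "rel_length": (mean, std)} for one body part.
    """
    rel_w = limb_w / torso_w
    rel_len = limb_len / torso_len
    w_mean, w_std = stats["rel_width"]
    l_mean, l_std = stats["rel_length"]
    too_wide = rel_w > w_mean + k * w_std            # one-sided, as on the slide
    bad_length = abs(rel_len - l_mean) > k * l_std   # two-sided
    return not (too_wide or bad_length)
```
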

Enforce constraints
• Adjacency
  • Upper limbs must be adjacent to the torso
  • Lower limbs must be adjacent to upper limbs
• Symmetry in clothing: color histograms of corresponding segments must not be overly dissimilar
  • E.g., right and left upper arms should be similar
  • Makes some small assumptions about variation in clothing
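The slides do not name a histogram comparison. The sketch below therefore assumes a chi-squared distance between normalized color histograms with a placeholder threshold. The adjacency test assumes a region adjacency dictionary of the form {region: set of neighbours}. All names are hypothetical.

```python
def chi2_distance(h1, h2, eps=1e-12):
    """Chi-squared distance between two normalized histograms (0 = identical)."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))


def normalise(hist):
    total = sum(hist)
    return [v / total for v in hist]


def consistent_pair(left_hist, right_hist, max_dist=0.25):
    """Clothing symmetry: e.g. left and right upper-arm histograms must be close."""
    return chi2_distance(normalise(left_hist), normalise(right_hist)) <= max_dist


def adjacency_ok(region, parent_region, adj):
    """Upper limb must touch the torso; lower limb must touch its upper limb."""
    return parent_region in adj.get(region, set())
```
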

Body building: slimming down
• Reduces to ~1000 partial configurations
• Sorted by a linear combination of the torso score and the three half-limb scores
• (This score can be used to improve torso detection)
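Keeping the best ~1000 configurations by a linear score is a top-k selection. The weights below are placeholders, because the slides do not give them. For simplicity, the sketch shares one weight across all three half-limbs. `top_configurations` and the tuple layout are hypothetical.

```python
import heapq


def top_configurations(configs, torso_weight=1.0, limb_weight=1.0, keep=1000):
    """Return the `keep` highest-scoring partial configurations.

    configs: iterable of (torso_score, (limb1, limb2, limb3), payload) tuples.
    """
    def score(c):
        torso, limbs, _ = c
        return torso_weight * torso + limb_weight * sum(limbs)
    return heapq.nlargest(keep, configs, key=score)
```

`heapq.nlargest` avoids fully sorting millions of candidates when only the top 1000 are needed.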

Algorithm

Extending to full limbs
• Add further rectangles at unfilled limb joints, evaluated on adjacent superpixels
• Want high internal similarity and high dissimilarity to the surroundings
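One simple proxy for "high internal similarity and high dissimilarity to the surroundings" is the contrast between the mean intensity inside a candidate rectangle and in a band around it, minus the spread inside. This scoring rule is our assumption, not the paper's formula. The inputs are plain lists of grayscale values, and `extension_score` is a hypothetical name.

```python
import statistics


def extension_score(inside, surround):
    """Higher when the rectangle's pixels are uniform and differ from the
    surrounding band. inside / surround: lists of grayscale intensities."""
    contrast = abs(statistics.fmean(inside) - statistics.fmean(surround))
    spread = statistics.pstdev(inside)
    return contrast - spread
```
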

Algorithm

Summary
• “Arguably the most difficult recognition problem in computer vision”
  • Not solved here
• The method here is appealing:
  • No need to store exemplars
  • The islands-of-saliency approach seems useful in many contexts
  • Uses some configural knowledge to make reasonable guesses
  • A good illustration of integrating recognition and segmentation
