Learning Shared Body Plans. Ian Endres, University of Illinois. Work with Derek Hoiem, Vivek Srikumar, and Ming-Wei Chang. How should we represent multiple related object categories?
How should we represent multiple related object categories? Goal: detect, localize, and estimate the pose of a broad range of objects, including new ones
One option: independent detectors • Basic-level categories: Cat Detector, Dog Detector, … • Broad categories: 4-Legged Animal Detector • Parts: Head Detector
Our previous work: train separate detectors, joint spatial model (Farhadi, Endres, Hoiem, 2010) [figure labels: Wheel, Vehicle, Animal, Four-legged, Mammal, Head, Leg, Can run, Can jump, Facing right, Moves on road]
Jointly trained multi-category models • Train part/category detectors to jointly predict object structure • Only need to perform well in context defined by others • Spatial model encodes likely part positions, number of parts, likely categories, etc. • Generalizes Felzenszwalb et al.: cross-category sharing, multiple parts with one model, variable size
Deformable Part Models From Felzenszwalb et al.
Detection with Deformable Part Models From Felzenszwalb et al.
Shared mixture of deformable parts: Body Plans Include a body plan for background patches: No appearance models, just a bias
Body Plan Overview [figure labels: High Scoring Detections, Object Center, Head Anchors]
Anchor Point Score • HOG-based deformable part model (Felzenszwalb et al.) • Quadratic penalty in position and scale • Sa = bias + appearance score - deformation cost • Overall score must be greater than 0 to be detected
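The anchor score above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Anchor` class, the (dx, dy, dlog_scale) parameterization of a placement, and the per-dimension quadratic weights are all assumptions made for the example. Only the formula Sa = bias + appearance score - deformation cost and the "greater than 0" detection rule come from the slide.

```python
from dataclasses import dataclass
from typing import Tuple

Placement = Tuple[float, float, float]  # (dx, dy, dlog_scale), hypothetical layout


@dataclass
class Anchor:
    """Hypothetical container for one anchor point of a body plan."""
    bias: float
    ideal: Placement    # expected offset from the object center
    weights: Placement  # non-negative quadratic penalty weights


def deformation_cost(anchor: Anchor, placement: Placement) -> float:
    # Quadratic penalty in position and scale, one weight per dimension.
    return sum(w * (p - i) ** 2
               for w, p, i in zip(anchor.weights, placement, anchor.ideal))


def anchor_score(anchor: Anchor, appearance_score: float,
                 placement: Placement) -> float:
    # Sa = bias + appearance score - deformation cost
    return anchor.bias + appearance_score - deformation_cost(anchor, placement)


def is_detected(score: float) -> bool:
    # The slide's rule: the overall score must be greater than 0.
    return score > 0.0


head = Anchor(bias=-1.0, ideal=(0.0, 0.0, 0.0), weights=(0.5, 0.5, 1.0))
near = anchor_score(head, appearance_score=3.0, placement=(1.0, 0.0, 0.0))
far = anchor_score(head, appearance_score=3.0, placement=(2.0, 2.0, 0.0))
```

With these made-up numbers, the same appearance score yields a detection when the part sits close to its expected position and no detection when it drifts far from it.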
Inference: Head [figure: candidate head anchors marked +, selected detection marked ✓]
Inference: Leg [figure sequence: candidate leg anchors marked +, detections marked ✓ as they are selected one at a time] Search constraints: count, pairwise exclusion
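The leg slides show detections being accepted one after another under two search constraints, count and pairwise exclusion. One simple way to realize that is a greedy pass over candidates sorted by score. The sketch below assumes this greedy strategy, assumes that "pairwise exclusion" means two selected boxes may not overlap too much, and uses hypothetical names (`select_parts`, `max_count`, `max_overlap`); the slides do not specify any of these details.

```python
def box_overlap(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def select_parts(candidates, max_count, max_overlap=0.5):
    """Greedily pick part detections from (score, box) candidates.

    Count constraint: keep at most max_count detections.
    Pairwise exclusion (assumed form): skip a candidate that overlaps
    an already selected box by more than max_overlap.
    """
    chosen = []
    for score, box in sorted(candidates, key=lambda c: c[0], reverse=True):
        # Candidates are sorted, so once one fails either test we can stop.
        if score <= 0 or len(chosen) >= max_count:
            break
        if all(box_overlap(box, other) <= max_overlap for _, other in chosen):
            chosen.append((score, box))
    return chosen


legs = [
    (2.0, (0, 0, 10, 10)),
    (1.5, (1, 1, 11, 11)),   # heavily overlaps the first box
    (1.0, (20, 0, 30, 10)),
    (-0.5, (40, 0, 50, 10)),  # score not above 0
]
picked = select_parts(legs, max_count=4)
```

Here the second candidate is excluded by the first, and the last one is dropped because its score is not positive, so two legs are selected.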
Inference Score for each body plan: Overall score for an object hypothesis:
Benefits of Joint Learning Only consider structures with:
Benefits of Joint Learning No structures have
(Latent) Max Margin Structured Learning Loss Highest Scoring Valid Structure Invalid Structure Soft margin slack
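The labels on this slide (loss, highest scoring valid structure, invalid structure, soft margin slack) match the usual form of a latent max-margin structured objective. The block below writes out that standard form as a reading aid. The notation is an assumption, not taken from the slide: w is the weight vector, Phi the joint feature map, V_i and I_i the sets of valid and invalid structures for example i, Delta the loss, xi_i the slack, and C the regularization trade-off.

```latex
\min_{w,\; \xi \ge 0} \quad \frac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i
\qquad \text{subject to, for every example } i \text{ and every } \hat{y} \in \mathcal{I}_i :
\qquad
\underbrace{\max_{y \in \mathcal{V}_i} w^\top \Phi(x_i, y)}_{\text{highest scoring valid structure}}
\;\ge\;
\underbrace{w^\top \Phi(x_i, \hat{y})}_{\text{invalid structure}}
+ \underbrace{\Delta_i(\hat{y})}_{\text{loss}}
- \underbrace{\xi_i}_{\text{soft margin slack}}
```

The max over valid structures on the left-hand side is what makes the problem latent and non-convex.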
Valid Structures • Positive examples [figure labels: Four-legged, Elk, Head, Leg × 4] • Negative examples: must select BG body plan • Object detectors: 50% overlap with ground truth • Part detectors: 25% overlap with ground truth
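The two overlap thresholds can be illustrated with a small check. The slide does not define "overlap"; the sketch assumes the common intersection-over-union measure and inclusive thresholds, and the function and constant names are hypothetical.

```python
OBJECT_MIN_OVERLAP = 0.50  # object detectors, from the slide
PART_MIN_OVERLAP = 0.25    # part detectors, from the slide


def iou(a, b):
    """Intersection-over-union of boxes (x1, y1, x2, y2); assumed measure."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def matches_ground_truth(detection_box, truth_box, is_part):
    """True if the detection overlaps the ground truth box enough."""
    threshold = PART_MIN_OVERLAP if is_part else OBJECT_MIN_OVERLAP
    return iou(detection_box, truth_box) >= threshold


truth = (0, 0, 10, 10)
loose = (0, 0, 10, 3)  # covers 30% of the ground truth box
```

The looser part threshold means a box like `loose` would count as a valid part placement but not as a valid object placement.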
Loss • Positive examples [figure labels: Four-legged, Elk, Head, Leg]: false positives +1, duplicate detections +1, missed detections +1 • Negative examples: non-BG body plan +1, false positives +1
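The loss on this slide is a sum of unit penalties. A minimal sketch follows, assuming the penalties simply add up and assuming the grouping suggested by the slide layout: false positives, duplicates, and misses for positive examples; a non-BG body plan and false positives for negative examples. The function name and arguments are hypothetical.

```python
def structure_loss(is_positive, chose_background, false_positives,
                   duplicates=0, missed=0):
    """Count unit penalties for one predicted structure (assumed additive)."""
    if is_positive:
        # Positive example: +1 per false positive, duplicate, and miss.
        return false_positives + duplicates + missed
    # Negative example: +1 for picking a non-background body plan,
    # plus +1 per false positive detection.
    return (0 if chose_background else 1) + false_positives


loss_pos = structure_loss(True, False, false_positives=1, duplicates=1, missed=2)
loss_neg = structure_loss(False, False, false_positives=1)
loss_bg = structure_loss(False, True, false_positives=0)
```

Under these assumptions, a negative example incurs zero loss only when the background body plan is selected with no part detections.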
Optimization • Latent Structured SVM • Non-convex - CCCP • Stochastic gradient descent based cutting plane optimization
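The overall shape of CCCP for a latent structured SVM can be sketched as an alternation between fixing the best valid structure per example and taking stochastic subgradient steps on the resulting convex problem. This is a simplified illustration only: it uses plain stochastic subgradient descent and omits the cutting-plane constraint cache the slide mentions. The callbacks `features`, `best_valid`, and `most_violated`, and all hyperparameters, are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_latent_ssvm(examples, features, best_valid, most_violated, dim,
                      outer_iters=5, epochs=10, lr=0.01, reg=1e-3, seed=0):
    """Simplified CCCP loop with stochastic subgradient inner steps."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(outer_iters):
        # CCCP step 1: with w fixed, pick the highest-scoring valid
        # structure for each example (this linearizes the concave part).
        targets = [best_valid(w, x) for x in examples]
        order = list(range(len(examples)))
        # CCCP step 2: approximately solve the now-convex problem.
        for _ in range(epochs):
            rng.shuffle(order)
            for i in order:
                x = examples[i]
                f_valid = features(x, targets[i])
                y_bad, loss = most_violated(w, x)  # loss-augmented search
                f_bad = features(x, y_bad)
                violated = loss + dot(w, f_bad) - dot(w, f_valid) > 0
                for k in range(dim):
                    g = reg * w[k]
                    if violated:
                        g += f_bad[k] - f_valid[k]
                    w[k] -= lr * g
    return w


# Toy problem: each example carries the feature vectors of its single
# valid and single invalid structure.
toy = [([1.0, 0.0], [0.0, 1.0])]
w = train_latent_ssvm(
    toy,
    features=lambda x, y: x[0] if y == "valid" else x[1],
    best_valid=lambda w, x: "valid",
    most_violated=lambda w, x: ("invalid", 1.0),
    dim=2,
)
```

On the toy problem the learned weights score the valid structure above the invalid one. In a real system the two search callbacks would run the body-plan inference, which is the expensive step.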
Optimization Challenges • Expensive search for violated constraints • Mine many violated constraints at once • Speeds convergence • Large feature vectors (100k+) • Can’t store every mined violated constraint • Requires careful caching
Experimental Setup • CORE: Train + Test • Familiar Categories: Camel, Dog, Elephant, Elk • Parts: Head, Leg, Torso • Unfamiliar Categories: Cat, Cow • Pascal 2008: Test • Unfamiliar Categories: Cat, Cow, Horse, Sheep
[Figure panels: Familiar Objects, Unfamiliar Objects]
Mixed Supervision [figure labels: Four-legged, Dog, Head, Leg; Learning]
Mixed Supervision - Learning • Unlabeled boxes become latent variables • Compute most likely position • No loss for missed detections [equation labels: Loss, Highest Scoring Valid Structure]
Conclusions • Jointly representing related categories leads to better performance and generalization to unfamiliar categories • Joint training important to get full benefit of spatial model