Image Parsing: Unifying Segmentation and Detection. Z. Tu, X. Chen, A.L. Yuille and S.-C. Zhu. ICCV 2003 (Marr Prize) & IJCV 2005. Sanketh Shetty
Outline • Why Image Parsing? • Introduction to Concepts in DDMCMC • DDMCMC applied to Image Parsing • Combining Discriminative and Generative Models for Parsing • Results • Comments
Image Parsing • Input: image I • Output: parse structure W • Goal: optimize p(W|I)
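In symbols, under the standard Bayesian reading of this slide, the parse is the posterior mode, and the posterior factors into the generative likelihood p(I|W) and the prior p(W) that reappear in the DDMCMC motivation slides:

```latex
W^{*} = \arg\max_{W} \, p(W \mid I), \qquad
p(W \mid I) = \frac{p(I \mid W)\, p(W)}{p(I)} \;\propto\; p(I \mid W)\, p(W)
```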
Properties of Parse Structure • Dynamic and reconfigurable • Variable number of nodes and node types • Constructed by a Markov chain • Data-Driven Markov Chain Monte Carlo (DDMCMC), building on earlier work in segmentation, grouping and recognition
Key Concepts • Joint model for Segmentation & Recognition • Combine different modules to obtain cues • Fully generative explanation of image generation • Uses Generative and Discriminative Models + DDMCMC framework • Concurrent Top-Down & Bottom-Up Parsing
Pattern Classes: 62 characters, faces, regions
MCMC: A Quick Tour • Key Concepts: • Markov Chains • Markov Chain Monte Carlo • Metropolis-Hastings [Metropolis 1953, Hastings 1970] • Reversible Jump [Green 1995] • Data Driven Markov Chain Monte Carlo
Markov Chains Notes: Slides by Zhu, Dellaert and Tu at ICCV 2005
Markov Chain Monte Carlo
Metropolis-Hastings Algorithm • Invariant distribution • Proposal distribution
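The Metropolis-Hastings slides name its two ingredients: an invariant (target) distribution and a proposal distribution. A minimal Python sketch of the generic algorithm follows, assuming the caller supplies a log-density, a proposal sampler and the proposal's log-density; the function and parameter names are hypothetical, not taken from the slides or the paper.

```python
import math
import random


def metropolis_hastings(log_target, propose, log_q, x0, n_steps, seed=0):
    """Generic Metropolis-Hastings sampler (illustrative sketch).

    log_target(x): log of the unnormalized invariant distribution p(x).
    propose(x, rng): draws a candidate x' from the proposal q(x' | x).
    log_q(a, b): log q(a | b), used in the Hastings correction.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    accepted = 0
    for _ in range(n_steps):
        x_new = propose(x, rng)
        # log of  p(x') q(x | x') / (p(x) q(x' | x))
        log_alpha = (log_target(x_new) + log_q(x, x_new)
                     - log_target(x) - log_q(x_new, x))
        # Accept with probability min(1, alpha); min() avoids overflow in exp.
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = x_new
            accepted += 1
        samples.append(x)  # on rejection the current state is repeated
    return samples, accepted / n_steps


# Usage: sample a standard normal with a symmetric Gaussian random walk,
# for which q(a | b) = q(b | a), so the log_q terms cancel.
samples, acc_rate = metropolis_hastings(
    log_target=lambda x: -0.5 * x * x,
    propose=lambda x, rng: x + rng.gauss(0.0, 1.0),
    log_q=lambda a, b: 0.0,
    x0=3.0,
    n_steps=50_000,
)
kept = samples[5_000:]  # discard burn-in
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(f"acceptance={acc_rate:.2f} mean={mean:.3f} var={var:.3f}")
```

The Hastings term only matters for asymmetric proposals. Data-driven proposals are generally asymmetric, so in a DDMCMC setting it cannot be dropped.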
Reversible Jump MCMC • Many competing models to explain the data • Need to explore this complicated state space
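Reversible jump MCMC extends Metropolis-Hastings to moves between models of different dimension. The sketch below is a hypothetical two-model toy (one parameter versus two), not the paper's parse-graph moves. A birth move appends an auxiliary variable u drawn from g(u), a death move drops the second parameter, and the acceptance ratio divides by g(u); the map (theta_1, u) -> (theta_1, theta_2) is the identity, so its Jacobian is 1. All names and numbers are illustrative assumptions.

```python
import math
import random

# Hypothetical toy target over two models of different dimension:
#   pi(k, theta) = WEIGHTS[k] * prod_i N(theta_i; MEANS[i], 1)
WEIGHTS = {1: 0.3, 2: 0.7}
MEANS = [0.0, 1.0]   # mean of theta_1, and of theta_2 in model k=2
BIRTH_SD = 1.5       # auxiliary variable for births: u ~ g(u) = N(0, BIRTH_SD^2)


def log_normal(x, mu, sd):
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def log_target(theta):
    k = len(theta)
    return math.log(WEIGHTS[k]) + sum(log_normal(t, m, 1.0) for t, m in zip(theta, MEANS))


def rjmcmc(n_steps, seed=0):
    rng = random.Random(seed)
    theta = [0.0]                 # start in model k=1
    visits = {1: 0, 2: 0}
    for _ in range(n_steps):
        if rng.random() < 0.5:
            # Dimension-changing move: birth from k=1, death from k=2.
            # Each is selected with probability 1/2, so those factors cancel.
            if len(theta) == 1:
                u = rng.gauss(0.0, BIRTH_SD)
                new = theta + [u]  # (theta_1, u) -> (theta_1, theta_2); Jacobian = 1
                log_alpha = log_target(new) - log_target(theta) - log_normal(u, 0.0, BIRTH_SD)
            else:
                new = theta[:1]    # drop theta_2, which plays the role of u in reverse
                log_alpha = (log_target(new) + log_normal(theta[1], 0.0, BIRTH_SD)
                             - log_target(theta))
        else:
            # Fixed-dimension random-walk Metropolis update of one coordinate.
            i = rng.randrange(len(theta))
            new = list(theta)
            new[i] += rng.gauss(0.0, 0.8)
            log_alpha = log_target(new) - log_target(theta)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            theta = new
        visits[len(theta)] += 1
    return visits


n = 100_000
visits = rjmcmc(n)
print("fraction of time in model 2:", visits[2] / n)
```

Because each model's parameter density integrates to 1, the fraction of time spent in model 2 should approach WEIGHTS[2] = 0.7.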
DDMCMC Motivation: Unifies
DDMCMC Motivation • Generative model: p(I|W)p(W) over the state space • Discriminative model: q(w_j | I) • Discriminative proposals dramatically reduce the search space by focusing sampling on highly probable states.
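The claim that discriminative proposals focus sampling on highly probable states can be illustrated with a hypothetical toy: a discrete posterior concentrated on three states, sampled by independence Metropolis-Hastings with either a uniform (blind) proposal or a detector-like proposal q(w | I) that favours a few candidates, one of them a false positive. The states, numbers and function names are assumptions for illustration; the paper's proposals act on parse-graph moves, not on a flat list of states.

```python
import random

N_STATES = 1000
GOOD = {17, 402, 733}           # hypothetical high-probability parses
CANDIDATES = [17, 402, 733, 5]  # hypothetical detector output; state 5 is a false positive


def target(w):
    """Unnormalized hypothetical posterior p(W|I) over N_STATES discrete states."""
    return 100.0 if w in GOOD else 0.01


def q_uniform(w):
    return 1.0 / N_STATES


def sample_uniform(rng):
    return rng.randrange(N_STATES)


def q_data(w):
    """Data-driven proposal: 80% mass on detector candidates, 20% spread uniformly."""
    return 0.2 / N_STATES + (0.8 / len(CANDIDATES) if w in CANDIDATES else 0.0)


def sample_data(rng):
    if rng.random() < 0.8:
        return rng.choice(CANDIDATES)
    return rng.randrange(N_STATES)


def independence_mh(sample_q, q, n_steps, seed=0):
    """Metropolis-Hastings with a proposal q(w') that ignores the current state."""
    rng = random.Random(seed)
    w = rng.randrange(N_STATES)  # arbitrary start
    time_in_good = 0
    mode_switches = 0            # accepted jumps between two distinct GOOD states
    for _ in range(n_steps):
        w_new = sample_q(rng)
        # Independence-sampler ratio: p(w') q(w) / (p(w) q(w'))
        alpha = (target(w_new) * q(w)) / (target(w) * q(w_new))
        if rng.random() < min(1.0, alpha):
            if w in GOOD and w_new in GOOD and w_new != w:
                mode_switches += 1
            w = w_new
        time_in_good += w in GOOD
    return time_in_good / n_steps, mode_switches


n = 100_000
blind = independence_mh(sample_uniform, q_uniform, n)
driven = independence_mh(sample_data, q_data, n)
print("blind : time in GOOD = %.3f, mode switches = %d" % blind)
print("driven: time in GOOD = %.3f, mode switches = %d" % driven)
```

Both chains target the same distribution (the exact mass on GOOD is 300 / (300 + 997 * 0.01), about 0.968) because the acceptance ratio corrects for q. The data-driven proposal mainly speeds up movement between the high-probability states, while the false positive is almost always rejected.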
DDMCMC Framework • Moves: • Node Creation • Node Deletion • Change Node Attributes
Transition Kernel • Satisfies the detailed balance equation • Full transition kernel
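The slide's equations are not recoverable here. The standard construction they appear to summarize is detailed balance for each move type a, p(W|I) K_a(W'|W) = p(W'|I) K_a(W|W'), and a full kernel K(W'|W) = Σ_a ρ(a) K_a(W'|W) with Σ_a ρ(a) = 1. A small numerical check on a hypothetical 4-state target: two Metropolis-Hastings sub-kernels built from different proposals each satisfy detailed balance, and so does their mixture, which therefore leaves p invariant. The target, proposals and weights ρ are assumptions for illustration.

```python
def mh_kernel(p, Q):
    """Metropolis-Hastings transition matrix for target p and proposal matrix Q.

    Assumes Q[i][j] > 0 exactly when Q[j][i] > 0, so every move can be reversed.
    """
    n = len(p)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and Q[i][j] > 0:
                K[i][j] = Q[i][j] * min(1.0, (p[j] * Q[j][i]) / (p[i] * Q[i][j]))
        # Rejected proposals (and self-proposals) keep the chain at state i.
        K[i][i] = 1.0 - sum(K[i][j] for j in range(n) if j != i)
    return K


def mixture(kernels, rho):
    """Full kernel K = sum_a rho[a] * K_a, with rho summing to 1."""
    n = len(kernels[0])
    return [[sum(r * Ka[i][j] for r, Ka in zip(rho, kernels)) for j in range(n)]
            for i in range(n)]


def detailed_balance_error(p, K):
    """Largest violation of p_i K_ij = p_j K_ji."""
    n = len(p)
    return max(abs(p[i] * K[i][j] - p[j] * K[j][i]) for i in range(n) for j in range(n))


def stationarity_error(p, K):
    """Largest violation of sum_i p_i K_ij = p_j."""
    n = len(p)
    return max(abs(sum(p[i] * K[i][j] for i in range(n)) - p[j]) for j in range(n))


# Hypothetical 4-state target and two proposal "move types".
p = [0.1, 0.2, 0.3, 0.4]
Q_ring = [[0.5 if (j - i) % 4 in (1, 3) else 0.0 for j in range(4)] for i in range(4)]
Q_uniform = [[0.25] * 4 for _ in range(4)]
K1, K2 = mh_kernel(p, Q_ring), mh_kernel(p, Q_uniform)
K = mixture([K1, K2], rho=[0.7, 0.3])

for name, kernel in [("K1", K1), ("K2", K2), ("K", K)]:
    print(name, detailed_balance_error(p, kernel), stationarity_error(p, kernel))
```

Mixing kernels this way lets each move type be designed and checked on its own, while the weights ρ only control how often each move is tried.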
Convergence to p(W|I): monotonic, at a geometric rate
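A toy illustration of monotonic, geometric convergence, measured here in total variation distance. The choice of distance and the 3-state chain are assumptions; the paper's result concerns its full parsing chain. The lazy random walk below has eigenvalues 1, 0.5 and 0, so after the first step the distance to the stationary distribution halves at every step.

```python
def step(mu, K):
    """One step of the chain: (mu K)_j = sum_i mu_i K_ij."""
    n = len(mu)
    return [sum(mu[i] * K[i][j] for i in range(n)) for j in range(n)]


def tv_distance(mu, pi):
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, pi))


# Hypothetical lazy random walk on a 3-node path; pi is its stationary distribution.
K = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]

mu = [1.0, 0.0, 0.0]  # start deterministically in state 0
tvs = []
for t in range(10):
    tvs.append(tv_distance(mu, pi))
    mu = step(mu, K)

print([round(d, 6) for d in tvs])
print([round(b / a, 6) for a, b in zip(tvs, tvs[1:])])
```

The monotonic decrease holds for any kernel that leaves pi invariant, since such a kernel cannot increase the total variation distance to pi; the geometric factor here is the second-largest eigenvalue, 0.5.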
Image Generation Model • Region models: constant intensity, textures, shading • State of the parse graph
62 characters • Faces • 3 region types
Uniform • Designed to penalize high model complexity
Shape Prior: faces, 3 region types
Discriminative Cues Used • AdaBoost-trained face detector and text detector • Adaptive binarization cues • Edge cues (Canny at 3 scales) • Shape affinity cues • Region affinity cues
Transition Kernel Design • Remember
Possible Transitions • Birth/Death of a Face Node • Birth/Death of a Text Node • Boundary Evolution • Split/Merge Region • Change Node Attributes
Comments • Well motivated but very complicated approach to THE HOLY GRAIL problem in vision • Good global convergence results for inference, with only minor dependence on the initial W • Extensible to a larger set of primitives and pattern types • Many details of the algorithm are missing, and the motivation for some parameter values is hard to understand • Unclear whether p(W|I) values for configurations with different class compositions are comparable • Derek's comment on AdaBoost false positives, and the authors' failure to report exactly how much the full model improves on them • No quantitative results or comparison to other algorithms and approaches • It should be possible to design a simple experiment to measure performance on recognition/detection/localization tasks