Grammars in computer vision
Presented by: Thomas Kollar
Slides courtesy of Song-Chun Zhu
Context in computer vision
Features range over increasing spatial support (object size): pixels, parts, and global appearance inside the object (intrinsic features), then local context and global context outside the object (contextual features). Work cited on the slide: Kruppa & Schiele (03); Fink & Perona (03); Carbonetto, Freitas, Barnard (03); Kumar, Hebert (03); He, Zemel, Carreira-Perpinan (04); Moore, Essa, Monson, Hayes (99); Strat & Fischler (91); Torralba (03); Murphy, Torralba & Freeman (03); Agarwal & Roth (02); Moghaddam, Pentland (97); Turk, Pentland (91); Vidal-Naquet, Ullman (03); Heisele et al. (01); Krempp, Geman, Amit (02); Dorko, Schmid (03); Fergus, Perona, Zisserman (03); Fei-Fei, Fergus, Perona (03); Schneiderman, Kanade (00); Lowe (99); etc.
Why grammars? [Ohta & Kanade 1978]
Guzman (SEE), 1968; Noton and Stark, 1971; Yakimovsky & Feldman, 1973; Hansen & Riseman (VISIONS), 1978; Barrow & Tenenbaum, 1978; Ohta & Kanade, 1978; Brooks (ACRONYM), 1979; Marr, 1982
Which papers? • F. Han and S.C. Zhu, Bottom-up/Top-down Image Parsing with Attribute Grammar, 2005. • Zijian Xu, A hierarchical compositional model for representation and sketching of high-resolution human images, PhD Thesis, 2007. • Song-Chun Zhu and David Mumford, A stochastic grammar of images, 2007. • L. Lin, S. Peng, J. Porway, S.C. Zhu, and Y. Wang, An empirical study of object category recognition: sequential testing with generalized samples, 2007.
Three projects using And-Or graphs • Modeling an environment with rectangles • Creating sketches of human images • Recognizing object categories
Commonalities
• Use context-sensitive grammars
  • Called And-Or graphs in these papers
  • Provide both top-down and bottom-up influence
  • Most are generative all the way to the pixel level
• Configuration matters
  • E.g., they don't assume children are independent given the parent
  • These dependencies can take the form of an MRF
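For concreteness, here is one minimal way to encode an And-Or graph and sample from it top-down. This is a generic sketch, not the papers' data structure; the node names and fields are invented, and the sibling relations (the MRF mentioned above) are omitted.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    kind: str                     # "AND" = composition, "OR" = switch, "LEAF"
    children: list = field(default_factory=list)

def sample(node):
    """Top-down generation: an AND node expands all of its children, while
    an OR node picks one alternative, which is where configuration
    variability enters."""
    if node.kind == "LEAF":
        return node.name
    if node.kind == "AND":
        return {node.name: [sample(child) for child in node.children]}
    return {node.name: sample(random.choice(node.children))}

# Toy example: a face composes sub-parts; eyes switch between two styles.
eyes = Node("eyes", "OR", [Node("open-eyes", "LEAF"), Node("closed-eyes", "LEAF")])
face = Node("face", "AND", [eyes, Node("nose", "LEAF"), Node("mouth", "LEAF")])
print(sample(face))
```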
Challenges • Objects have large within-category variations • Scene layouts and content also vary widely
Challenges • Describing people involves wide variation (e.g., in clothing and appearance)
Three phases
• Bottom-up detection
  • Compute edge segments and a set of vanishing points. The vanishing points group the segments into line sets, and rectangle hypotheses are fit to these lines with RANSAC, yielding a pool of bottom-up rectangle proposals.
• Initialize the terminal nodes greedily
  • Pick the most promising hypotheses, i.e., those with the heaviest weight as measured by the increase in posterior probability.
• Incorporate top-down influence
  • Each step of the algorithm picks the most promising proposal among the 5 candidate rules, again by increase in posterior probability.
  • When a new non-terminal node is accepted: (1) insert it and create new proposals, (2) reweight the proposals, (3) pass attributes between the node and its parent.
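The last two phases share one control structure: re-score the open proposals, accept the best one if it raises the posterior, repeat. A minimal sketch of that loop, where `gain` is a hypothetical stand-in for the posterior-ratio weights, not the authors' actual API:

```python
def greedy_parse(proposals, gain, max_steps=100):
    """Greedy MAP parsing loop shared by phases 2 and 3.

    gain(p, accepted) should return the increase in log-posterior from
    accepting proposal p given everything accepted so far."""
    accepted, pending = [], list(proposals)
    for _ in range(max_steps):
        if not pending:
            break
        best = max(pending, key=lambda p: gain(p, accepted))
        if gain(best, accepted) <= 0:   # nothing improves the posterior: stop
            break
        accepted.append(best)
        pending.remove(best)
        # The full algorithm would also insert new top-down proposals here
        # and pass attributes between the new node and its parent.
    return accepted
```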
Probability Models
• p(C_free) follows the primal sketch model.
• p(G) is the probability of the parse tree.
• p(I | G) is the reconstruction likelihood.
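Reading the three bullets together, the posterior plausibly factors as follows; this is a reconstruction from the slide, with C_free denoting the free curves not explained by any rectangle (the slide abbreviates the likelihood as p(I | G)):

```latex
p(G, C_{\mathrm{free}} \mid I) \;\propto\;
  \underbrace{p(I \mid G, C_{\mathrm{free}})}_{\text{reconstruction likelihood}}\;
  \underbrace{p(G)}_{\text{parse-tree prior}}\;
  \underbrace{p(C_{\mathrm{free}})}_{\text{primal sketch}}
```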
Probability Models
• p(l) is the probability of choosing rule l.
• p(n | l) is the probability of the number of components given the rule type.
• p(X | l, n) is the probability of the geometry of node A, e.g., that each rectangle looks reasonable.
• p(X(B) | X(A)) enforces regularities between the geometries of related nodes, e.g., that aligned rectangles have almost the same shape, or, for the line rule, that everything lines up.
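Composed over the whole parse tree, these factors would give a prior of roughly the following form. This is a hedged reconstruction: the node set V and the relation set E are my notation, not the slide's.

```latex
p(G) \;=\; \prod_{A \in V} p(\ell_A)\, p(n_A \mid \ell_A)\, p(X_A \mid \ell_A, n_A)
       \prod_{(A,B) \in E} p(X(B) \mid X(A))
```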
Probability Models • Primal sketch model
Inference: bottom-up detection of rectangles • RANSAC is run to propose a number of rectangles using vanishing points
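A minimal sketch of this RANSAC step under my own assumptions about the representation (segments as point pairs, homogeneous coordinates for lines, an angular inlier test); the paper's actual thresholds and parameterization may differ:

```python
import random
import numpy as np

def line_homog(p, q):
    """Homogeneous line through two image points."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def consistent(seg, vp, thresh_deg=2.0):
    """A segment supports a vanishing point if the direction from its
    midpoint to the VP agrees with the segment's own direction."""
    p, q = seg
    mid = (np.asarray(p) + np.asarray(q)) / 2.0
    d_seg = np.asarray(q) - np.asarray(p)
    d_vp = vp[:2] / vp[2] - mid                 # assumes a finite VP
    cos = abs(np.dot(d_seg, d_vp)) / (
        np.linalg.norm(d_seg) * np.linalg.norm(d_vp) + 1e-9)
    return np.degrees(np.arccos(np.clip(cos, 0, 1))) < thresh_deg

def ransac_vp(segments, iters=500):
    best_vp, best_inliers = None, []
    for _ in range(iters):
        a, b = random.sample(segments, 2)
        vp = np.cross(line_homog(*a), line_homog(*b))  # line intersection
        if abs(vp[2]) < 1e-9:
            continue                                   # near-parallel pair
        inliers = [s for s in segments if consistent(s, vp)]
        if len(inliers) > len(best_inliers):
            best_vp, best_inliers = vp, inliers
    return best_vp, best_inliers
```

Running `ransac_vp` repeatedly, removing inliers each time, would yield the several vanishing points needed to hypothesize rectangles.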
Inference: initialize terminal nodes
• Input: candidate set of rectangles from the previous phase
• Output: a set of non-terminal nodes representing rectangles
• While not done:
  • Re-compute the weights
  • Greedily select the rectangle with the highest weight
  • Create a new non-terminal node in the grammar
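Why re-compute the weights inside the loop? Accepting one rectangle changes the value of overlapping hypotheses. A toy illustration with invented boxes and detection strengths, penalizing overlap with already-accepted rectangles:

```python
def overlap(a, b):
    """Intersection area of two axis-aligned boxes (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def weight(box, strength, accepted):
    # Detection strength discounted by overlap with rectangles already in
    # the parse: evidence that is already explained counts for less.
    return strength - sum(overlap(box, other) for other, _ in accepted)

# Invented candidates: (box, detection strength).
candidates = [((0, 0, 10, 10), 80.0),
              ((1, 1, 11, 11), 75.0),   # near-duplicate of the first box
              ((20, 0, 28, 6), 40.0)]
accepted = []
while candidates:
    best = max(candidates, key=lambda c: weight(*c, accepted))
    if weight(*best, accepted) <= 0:    # best remaining proposal hurts: stop
        break
    accepted.append(best)
    candidates.remove(best)

print([box for box, _ in accepted])    # the near-duplicate is rejected
```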
Inference: incorporate top-down influence
• Input: non-terminal rectangles from the previous step
• Output: a parse graph
• While not done:
  • Re-compute the weights
  • Greedily select the highest-weight candidate rule
  • Add the rule to the parse graph along with any top-down predictions
• Weights are computed similarly to before.
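"Similarly to before" suggests each candidate rule r is weighted by its posterior gain; under that assumption,

```latex
w(r) \;=\; \log \frac{p(G \oplus r \mid I)}{p(G \mid I)}
```

where G ⊕ r denotes the parse graph after applying rule r (my notation, not the paper's).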
Generating sketches • Additional semantics
Challenges
• Geometric deformations: clothes are very flexible.
• Photometric variabilities: a large variety of colors, shading, and texture.
• Topological configurations: a combinatorial number of clothing designs.
And-Or graph • “In a computing and recognition phase, we first activate some sub-templates in a bottom-up step. For example, we can detect the face and skin color to locate the coarse position of some components, which help to predict the positions of other components by context.”
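A toy version of the prediction step described in the quote: a detected face anchors the coordinate frame, and hypothetical mean offsets (all numbers invented for illustration) propose where to look for the other components:

```python
# Hypothetical mean offsets (in pixels) from the face to other components.
MEAN_OFFSET = {"hair": (0, -40), "collar": (0, 55), "torso": (0, 120)}

def predict_components(face_xy):
    """Given a bottom-up face detection, predict where to search for the
    other components, i.e., top-down prediction from context."""
    fx, fy = face_xy
    return {part: (fx + dx, fy + dy) for part, (dx, dy) in MEAN_OFFSET.items()}

print(predict_components((128, 64)))   # e.g. {'hair': (128, 24), ...}
```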
Conclusions
• A grammar-based model was presented for generating sketches.
• Markov random fields operate at the lowest level.
• Inference combines top-down and bottom-up passes.