Grammars in computer vision
Presented by: Thomas Kollar
Slides courtesy of Song-Chun Zhu
Context in computer vision
Features range over increasing spatial support (object size): pixels, parts, and global appearance inside the object (intrinsic features), then local context and global context outside the object (contextual features). Work cited on the slide: Kruppa & Schiele (03); Fink & Perona (03); Carbonetto, Freitas, Barnard (03); Kumar, Hebert (03); He, Zemel, Carreira-Perpinan (04); Moore, Essa, Monson, Hayes (99); Strat & Fischler (91); Torralba (03); Murphy, Torralba & Freeman (03); Agarwal & Roth (02); Moghaddam, Pentland (97); Turk, Pentland (91); Vidal-Naquet, Ullman (03); Heisele et al. (01); Krempp, Geman, Amit (02); Dorko, Schmid (03); Fergus, Perona, Zisserman (03); Fei-Fei, Fergus, Perona (03); Schneiderman, Kanade (00); Lowe (99); etc.
Why grammars? [Ohta & Kanade 1978]
Guzman (SEE), 1968; Noton and Stark, 1971; Yakimovsky & Feldman, 1973; Hansen & Riseman (VISIONS), 1978; Barrow & Tenenbaum, 1978; Ohta & Kanade, 1978; Brooks (ACRONYM), 1979; Marr, 1982
Which papers? • F. Han and S.C. Zhu, Bottom-up/Top-down Image Parsing with Attribute Grammar, 2005. • Zijian Xu, A hierarchical compositional model for representation and sketching of high-resolution human images, PhD Thesis, 2007. • Song-Chun Zhu and David Mumford, A stochastic grammar of images, 2007. • L. Lin, S. Peng, J. Porway, S.C. Zhu, and Y. Wang, An empirical study of object category recognition: sequential testing with generalized samples, 2007.
Three projects using And-Or graphs • Modeling an environment with rectangles • Creating sketches of human images • Recognizing object categories
Commonalities
• Use context-sensitive grammars
  • Called And-Or graphs in these papers
  • Provide both top-down and bottom-up influence
  • Most are generative all the way to the pixel level
• Configuration matters
  • E.g., they don't assume children are independent given the parent
  • These dependencies can take the form of an MRF
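For concreteness, here is one minimal way to encode an And-Or graph and sample from it top-down. This is a generic sketch, not the papers' data structure; the node names and fields are invented, and the sibling relations (the MRF mentioned above) are omitted.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    kind: str                     # "AND" = composition, "OR" = switch, "LEAF"
    children: list = field(default_factory=list)

def sample(node):
    """Top-down generation: an AND node expands all of its children, while
    an OR node picks one alternative, which is where configuration
    variability enters."""
    if node.kind == "LEAF":
        return node.name
    if node.kind == "AND":
        return {node.name: [sample(child) for child in node.children]}
    return {node.name: sample(random.choice(node.children))}

# Toy example: a face composes sub-parts; eyes switch between two styles.
eyes = Node("eyes", "OR", [Node("open-eyes", "LEAF"), Node("closed-eyes", "LEAF")])
face = Node("face", "AND", [eyes, Node("nose", "LEAF"), Node("mouth", "LEAF")])
print(sample(face))
```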
Challenges • Objects have large within-category variations • Scene layouts and content also vary widely
Challenges • Describing people involves wide variation (e.g., in clothing and appearance)
Three phases
• Bottom-up detection
  • Compute edge segments and a set of vanishing points. The vanishing points group the segments into line sets, and rectangle hypotheses are fit to these lines with RANSAC, yielding a pool of bottom-up rectangle proposals.
• Initialize the terminal nodes greedily
  • Pick the most promising hypotheses, i.e., those with the heaviest weight as measured by the increase in posterior probability.
• Incorporate top-down influence
  • Each step of the algorithm picks the most promising proposal among the 5 candidate rules, again by increase in posterior probability.
  • When a new non-terminal node is accepted: (1) insert it and create new proposals, (2) reweight the proposals, (3) pass attributes between the node and its parent.
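The last two phases share one control structure: re-score the open proposals, accept the best one if it raises the posterior, repeat. A minimal sketch of that loop, where `gain` is a hypothetical stand-in for the posterior-ratio weights, not the authors' actual API:

```python
def greedy_parse(proposals, gain, max_steps=100):
    """Greedy MAP parsing loop shared by phases 2 and 3.

    gain(p, accepted) should return the increase in log-posterior from
    accepting proposal p given everything accepted so far."""
    accepted, pending = [], list(proposals)
    for _ in range(max_steps):
        if not pending:
            break
        best = max(pending, key=lambda p: gain(p, accepted))
        if gain(best, accepted) <= 0:   # nothing improves the posterior: stop
            break
        accepted.append(best)
        pending.remove(best)
        # The full algorithm would also insert new top-down proposals here
        # and pass attributes between the new node and its parent.
    return accepted
```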
Probability Models
• p(C_free) follows the primal sketch model.
• p(G) is the probability of the parse tree.
• p(I | G) is the reconstruction likelihood.
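Reading the three bullets together, the posterior plausibly factors as follows; this is a reconstruction from the slide, with C_free denoting the free curves not explained by any rectangle (the slide abbreviates the likelihood as p(I | G)):

```latex
p(G, C_{\mathrm{free}} \mid I) \;\propto\;
  \underbrace{p(I \mid G, C_{\mathrm{free}})}_{\text{reconstruction likelihood}}\;
  \underbrace{p(G)}_{\text{parse-tree prior}}\;
  \underbrace{p(C_{\mathrm{free}})}_{\text{primal sketch}}
```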
Probability Models
• p(l) is the probability of choosing rule l.
• p(n | l) is the probability of the number of components given the rule type.
• p(X | l, n) is the probability of the geometry of node A, e.g., that each rectangle looks reasonable.
• p(X(B) | X(A)) enforces regularities between the geometries of related nodes, e.g., that aligned rectangles have almost the same shape, or, for the line rule, that everything lines up.
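Composed over the whole parse tree, these factors would give a prior of roughly the following form. This is a hedged reconstruction: the node set V and the relation set E are my notation, not the slide's.

```latex
p(G) \;=\; \prod_{A \in V} p(\ell_A)\, p(n_A \mid \ell_A)\, p(X_A \mid \ell_A, n_A)
       \prod_{(A,B) \in E} p(X(B) \mid X(A))
```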
Probability Models • Primal sketch model
Inference: bottom-up detection of rectangles • RANSAC is run to propose a number of rectangles using vanishing points
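A minimal sketch of this RANSAC step under my own assumptions about the representation (segments as point pairs, homogeneous coordinates for lines, an angular inlier test); the paper's actual thresholds and parameterization may differ:

```python
import random
import numpy as np

def line_homog(p, q):
    """Homogeneous line through two image points."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def consistent(seg, vp, thresh_deg=2.0):
    """A segment supports a vanishing point if the direction from its
    midpoint to the VP agrees with the segment's own direction."""
    p, q = seg
    mid = (np.asarray(p) + np.asarray(q)) / 2.0
    d_seg = np.asarray(q) - np.asarray(p)
    d_vp = vp[:2] / vp[2] - mid                 # assumes a finite VP
    cos = abs(np.dot(d_seg, d_vp)) / (
        np.linalg.norm(d_seg) * np.linalg.norm(d_vp) + 1e-9)
    return np.degrees(np.arccos(np.clip(cos, 0, 1))) < thresh_deg

def ransac_vp(segments, iters=500):
    best_vp, best_inliers = None, []
    for _ in range(iters):
        a, b = random.sample(segments, 2)
        vp = np.cross(line_homog(*a), line_homog(*b))  # line intersection
        if abs(vp[2]) < 1e-9:
            continue                                   # near-parallel pair
        inliers = [s for s in segments if consistent(s, vp)]
        if len(inliers) > len(best_inliers):
            best_vp, best_inliers = vp, inliers
    return best_vp, best_inliers
```

Running `ransac_vp` repeatedly, removing inliers each time, would yield the several vanishing points needed to hypothesize rectangles.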
Inference: initialize terminal nodes
• Input: candidate set of rectangles from the previous phase
• Output: a set of non-terminal nodes representing rectangles
• While not done:
  • Re-compute the weights
  • Greedily select the rectangle with the highest weight
  • Create a new non-terminal node in the grammar
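Why re-compute the weights inside the loop? Accepting one rectangle changes the value of overlapping hypotheses. A toy illustration with invented boxes and detection strengths, penalizing overlap with already-accepted rectangles:

```python
def overlap(a, b):
    """Intersection area of two axis-aligned boxes (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def weight(box, strength, accepted):
    # Detection strength discounted by overlap with rectangles already in
    # the parse: evidence that is already explained counts for less.
    return strength - sum(overlap(box, other) for other, _ in accepted)

# Invented candidates: (box, detection strength).
candidates = [((0, 0, 10, 10), 80.0),
              ((1, 1, 11, 11), 75.0),   # near-duplicate of the first box
              ((20, 0, 28, 6), 40.0)]
accepted = []
while candidates:
    best = max(candidates, key=lambda c: weight(*c, accepted))
    if weight(*best, accepted) <= 0:    # best remaining proposal hurts: stop
        break
    accepted.append(best)
    candidates.remove(best)

print([box for box, _ in accepted])    # the near-duplicate is rejected
```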
Inference: incorporate top-down influence
• Input: non-terminal rectangles from the previous step
• Output: a parse graph
• While not done:
  • Re-compute the weights
  • Greedily select the highest-weight candidate rule
  • Add the rule to the parse graph along with any top-down predictions
• Weights are computed similarly to before.
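"Similarly to before" suggests each candidate rule r is weighted by its posterior gain; under that assumption,

```latex
w(r) \;=\; \log \frac{p(G \oplus r \mid I)}{p(G \mid I)}
```

where G ⊕ r denotes the parse graph after applying rule r (my notation, not the paper's).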
Generating sketches • Additional semantics
Challenges
• Geometric deformations: clothes are very flexible.
• Photometric variabilities: a large variety of colors, shading, and texture.
• Topological configurations: a combinatorial number of clothing designs.
And-Or graph • “In a computing and recognition phase, we first activate some sub-templates in a bottom-up step. For example, we can detect the face and skin color to locate the coarse position of some components, which help to predict the positions of other components by context.”
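A toy version of the prediction step described in the quote: a detected face anchors the coordinate frame, and hypothetical mean offsets (all numbers invented for illustration) propose where to look for the other components:

```python
# Hypothetical mean offsets (in pixels) from the face to other components.
MEAN_OFFSET = {"hair": (0, -40), "collar": (0, 55), "torso": (0, 120)}

def predict_components(face_xy):
    """Given a bottom-up face detection, predict where to search for the
    other components, i.e., top-down prediction from context."""
    fx, fy = face_xy
    return {part: (fx + dx, fy + dy) for part, (dx, dy) in MEAN_OFFSET.items()}

print(predict_components((128, 64)))   # e.g. {'hair': (128, 24), ...}
```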
Conclusions
• A grammar-based model was presented for generating sketches.
• Markov random fields operate at the lowest level.
• Inference combines top-down and bottom-up passes.