This paper examines the challenges of image parsing and proposes a framework that uses And-Or Graphs and Stochastic Context-Free Grammar to bridge the semantic gap. It aims to generalize from small samples by synthesizing configurations with Monte Carlo simulation, and to address overlapping parts and ambiguity in images. It introduces image grammar and discusses its formulation, learning, and testing. It draws on the large Lotus Hill Institute dataset, which provides benchmarks for various objects and scenes. Through a visual vocabulary and relations, it explores how texture and structure are combined in primal sketches to create high-level image representations.
Grammar of Image Zhaoyin Jia, 03-30-2009
Problems • Enormous amount of vision knowledge: • Computational complexity • Semantic gap …… Classification, Recognition
Objectives in this paper • Framework for vision • And-Or Graph • Algorithm for this framework • Top-down/bottom-up computation • Generalization from small samples • Use Monte Carlo simulation to synthesize more configurations • Fill the semantic gap
Grammar • Language: co-occurrence is more than chance • Image: Parallel; T-junction CONSTANTINOPLE
Formulation of grammar • Start symbol: S • Non-terminal nodes: VN • Reproduction Rule: R • Terminal nodes: VT
Formulation of grammar • Start symbol: S • Non-terminal nodes: VN • Reproduction Rule: R • Terminal nodes: VT • Example rules: S → NP VP; VP → VP PP; VP → V NP; ……
Image grammar • Start symbol: S • Reproduction Rules • Non-terminal nodes: VN • Terminal nodes: VT
Overlapping parts/Ambiguity • Similar color, occlusion, etc.
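Stochastic Context Free Grammar • For each non-terminal A in VN, we have reproduction rules: A → β1 | β2 | … | βn • with a probability associated with each one: p(A → βi), where Σi p(A → βi) = 1 • Probability of parsing tree: p(pt) = ∏ p(Aj → βj), the product over all rules used in pt • Probability of sentence: p(ω) = Σ p(pt), the sum over all parsing trees pt that produce ω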
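The parse-tree probability of a stochastic context-free grammar can be sketched in a few lines. This is a minimal illustration, not the presenter's implementation: the toy grammar `RULES`, its probabilities, and the tree encoding `(symbol, [children])` are all assumptions made for the example. Symbols that have no rule are treated as terminals.

```python
from math import prod

# Hypothetical toy grammar: non-terminal -> {right-hand side: probability}.
RULES = {
    "S": {("NP", "VP"): 1.0},
    "VP": {("V", "NP"): 0.7, ("VP", "PP"): 0.3},
}


def is_normalized(rules, tol=1e-9):
    """Check that the rule probabilities of every non-terminal sum to 1."""
    return all(abs(sum(alts.values()) - 1.0) < tol for alts in rules.values())


def tree_probability(tree, rules):
    """Multiply the probabilities of all rules used in the tree.

    A tree is (symbol, [children]); a symbol without rules is a terminal
    and contributes a factor of 1.
    """
    symbol, children = tree
    if symbol not in rules:
        return 1.0
    rhs = tuple(child[0] for child in children)
    return rules[symbol][rhs] * prod(tree_probability(c, rules) for c in children)


# S -> NP VP, VP -> VP PP, VP -> V NP
example_tree = (
    "S",
    [("NP", []), ("VP", [("VP", [("V", []), ("NP", [])]), ("PP", [])])],
)
p_tree = tree_probability(example_tree, RULES)
```

The probability of a sentence would then be the sum of `tree_probability` over every parse tree that yields it; enumerating those trees requires a parser, which is left out of this sketch.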
Stochastic Grammar with Context • From left to right: bi-gram model (Markov chain) • A sentence with n words: p(w1, …, wn) = p(w1) ∏ p(wi+1 | wi), for i = 1, …, n−1 • Non-local relations: tree model
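The bi-gram (first-order Markov chain) model can be illustrated with a small count-based estimator. The corpus below and the function names `train_bigram` and `sentence_probability` are invented for this sketch, and it uses plain maximum-likelihood counts without smoothing.

```python
from collections import Counter


def train_bigram(sentences):
    """Count first words and adjacent word pairs in a list of tokenized sentences."""
    first = Counter(s[0] for s in sentences)
    pair = Counter()
    prev = Counter()
    for s in sentences:
        for a, b in zip(s, s[1:]):
            pair[(a, b)] += 1
            prev[a] += 1
    return first, pair, prev, len(sentences)


def sentence_probability(words, model):
    """p(w1..wn) = p(w1) * product of p(w_{i+1} | w_i), estimated from counts."""
    first, pair, prev, n_sentences = model
    p = first[words[0]] / n_sentences
    for a, b in zip(words, words[1:]):
        if prev[a] == 0:
            return 0.0  # the context word was never seen followed by anything
        p *= pair[(a, b)] / prev[a]
    return p


# Hypothetical three-sentence corpus.
corpus = [
    ["the", "dog", "runs"],
    ["the", "cat", "runs"],
    ["a", "dog", "sleeps"],
]
model = train_bigram(corpus)
p_seen = sentence_probability(["the", "dog", "runs"], model)
p_unseen = sentence_probability(["a", "cat", "runs"], model)
```

Because each word depends only on its left neighbor, this model cannot express the non-local relations that the tree model is meant to capture.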
New issues in Image Grammar • Loss of “left to right” order: region adjacency graph
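Since image regions have no left-to-right order, neighborhood has to be read off the segmentation itself. The following is one simple way to build a region adjacency graph from a label map, assuming 4-connectivity and integer region ids; the function name and the tiny label map are hypothetical.

```python
def region_adjacency(labels):
    """Return the set of undirected edges between regions that share a pixel border.

    `labels` is a list of equal-length rows of integer region ids.
    Only right and down neighbors are checked, which covers 4-connectivity.
    """
    edges = set()
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    b = labels[rr][cc]
                    if a != b:
                        edges.add((min(a, b), max(a, b)))
    return edges


# Hypothetical 3x3 segmentation with three regions.
label_map = [
    [0, 0, 1],
    [0, 2, 1],
    [2, 2, 1],
]
rag_edges = region_adjacency(label_map)
```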
New issues in Image Grammar • Scaling makes different terminal in parsing tree
New issues in Image Grammar • Switch between texture and structure
Building the image grammar • Visual Vocabulary: primitives, sketch graph, textons… • Relations and configurations: co-occurrence, attached, hinged, supported, occluded… • And-Or Graph representation embedding the image grammar • Learning/testing: find the possible parse graphs by inference
Database • Lotus Hill Institute Dataset • 636,748 images, 3,927,130 physical objects • A few hundred are free • Benjamin Yao, Xiong Yang, and Song-Chun Zhu, “Introduction to a large scale general purpose ground truth dataset: methodology, annotation tool, and benchmarks,” EMMCVPR, 2007 • http://www.imageparsing.com/
Free Data http://yoshi.cs.ucla.edu/yao/data/ • 6 categories, 145 subsets: Manmade Object (75), Nature Object (40), Objects in Scene (6), Transportation (9), UCLA Aerial Image (5), UIUC Sport Activity (10) • Annotation examples: outline & segmentation of the object; segmentation of a scene (street); physical parts of the object
Visual Vocabulary • The “Lego Land” • Language
Visual Vocabulary • Each element is a function of image primitives, with attributes for: a) geometric transformation b) appearance • Bonds between primitives
Visual Vocabulary • Sketch and Texture S. C. Zhu, Y. N. Wu, and D. B. Mumford, “Minimax entropy principle and its applications to texture modeling,” Neural Computation, vol. 9, no. 8, pp. 1627–1660, November 1997
Primal sketch model • Figure panels: input image, sketch graph, texture pixels • C. E. Guo, S. C. Zhu, and Y. N. Wu, “Primal sketch: Integrating texture and structure,” in Proceedings of International Conference on Computer Vision, 2003.
High level visual vocabulary • Cloth: collar, left/right sleeves, hands H. Chen, Z. J. Xu, Z. Q. Liu, and S. C. Zhu, “Composite templates for cloth modeling and sketching,” in Proceedings of IEEE Conference on Pattern Recognition and Computer Vision, New York, June 2006
Relations and configurations • Definition of relation: a relation is defined on bonds and has two parts, a structure and a compatibility • Three types of relations • Bonds and connections • Joints and junctions • Object interactions/semantics • Definition of configurations:
Relations • Bonds and connections: connect primitives into bigger graphs • Intensity/color compatibility
Relations • Joint and junctions
Relations • Object interactions
Configuration • Spatial layout of entities at a certain level • Levels: primal sketch – parts – object – scene
Reconfigurable graphs • Treat bonds as random variables: address nodes
Inference of the configuration • Have the primal sketch of the image • Detect the ‘T-junction’ • Simulated annealing to infer the Gestalt Law • Figure legend: red dot = connected region; black line = known edge; green line = inferred connection • R. X. Gao and S. C. Zhu, “From primal sketch to 2.1D sketch,” Technical Report, Lotus Hill Institute, 2006
Reconfigurable graphs • Layer extraction • Figure panels: source image, T-junction, inferred connection • Ru-Xin Gao, Tian-Fu Wu, Song-Chun Zhu, and Nong Sang, “Bayesian Inference for Layer Representation with Mixed Markov Random Field”
And-Or Graph • Parse graph of the image: pt is the parse tree over the vocabulary, E is the set of relations • Inferring the parse graph: Z. J. Xu, L. Lin, T. F. Wu, and S. C. Zhu, “Recursive top-down/bottom up algorithm for object recognition,” Technical Report, Lotus Hill Research Institute, 2007.
And-Or Graph • Contains all the valid parse graphs • And-node, Or-node, leaf node • Relations between the children of an And-node • Parse tree: assigning a label at each Or-node • Z. J. Xu, L. Lin, T. F. Wu, and S. C. Zhu, “Recursive top-down/bottom up algorithm for object recognition,” Technical Report, Lotus Hill Research Institute, 2007.
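To make the idea concrete, the sketch below stores a tiny And-Or graph as a dictionary and enumerates every leaf configuration it can produce by choosing one child at each Or-node and all children at each And-node. The graph contents and the names `GRAPH` and `configurations` are hypothetical, and relations between children are ignored here.

```python
from itertools import product

# Hypothetical toy And-Or graph: node -> (kind, children); nodes not listed are leaves.
GRAPH = {
    "clock": ("and", ["frame", "hands"]),
    "frame": ("or", ["round", "square"]),
    "hands": ("or", ["two_hands", "three_hands"]),
}


def configurations(node, graph):
    """Yield every tuple of leaf nodes that can be derived from `node`."""
    if node not in graph:
        yield (node,)
        return
    kind, children = graph[node]
    if kind == "or":
        # An Or-node selects exactly one of its children.
        for child in children:
            yield from configurations(child, graph)
    else:
        # An And-node combines one derivation from each of its children.
        per_child = [list(configurations(c, graph)) for c in children]
        for combo in product(*per_child):
            yield tuple(leaf for part in combo for leaf in part)


all_configs = list(configurations("clock", GRAPH))
```

With two binary Or-nodes under one And-node, the graph yields four configurations, which illustrates how a compact graph can represent many more instances than it has nodes.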
And-Or Graph • Definition, with components including: • the image primitives • the relations at all levels • the probability model defined on the And-Or graph • the valid configurations of terminal nodes
Stochastic Model on And-Or graph • Model variables: terminal (leaf) nodes, And-Or nodes, the set of links, the switch variable at each Or-node, and the attributes of primitives • Three kinds of terms: • SCFG term: weighs the frequency of the children at Or-nodes • Local term: weighs the local compatibility of primitives (geometric and appearance) • Relation term: spatial and appearance relations between primitives (parts or objects)
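The three kinds of terms can be combined into a single energy for a parse graph. The sketch below assumes a Gibbs form, p(pg) ∝ exp(−E(pg)), which is an assumption of this example rather than something stated on the slides; the cost functions, the scalar "attributes", and all names are hypothetical.

```python
import math


def parse_graph_energy(switches, attributes, links, switch_cost, unary_cost, pair_cost):
    """Sum the three kinds of terms for one parse graph.

    switches:    {or_node: chosen_child}
    attributes:  {node: attribute value}
    links:       list of (node_i, node_j) relations
    """
    or_term = sum(switch_cost[node][choice] for node, choice in switches.items())
    unary_term = sum(unary_cost(a) for a in attributes.values())
    pair_term = sum(pair_cost(attributes[i], attributes[j]) for i, j in links)
    return or_term + unary_term + pair_term


def gibbs_probabilities(energies):
    """Normalize exp(-E) over a finite list of candidate parse graphs."""
    weights = [math.exp(-e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical example: one Or-node, two parts with scalar size attributes.
switch_cost = {"frame": {"round": -math.log(0.8), "square": -math.log(0.2)}}
attributes = {"frame": 1.0, "hands": 1.5}
links = [("frame", "hands")]

e_round = parse_graph_energy(
    {"frame": "round"}, attributes, links,
    switch_cost, unary_cost=lambda a: 0.0, pair_cost=lambda a, b: (a - b) ** 2,
)
e_square = parse_graph_energy(
    {"frame": "square"}, attributes, links,
    switch_cost, unary_cost=lambda a: 0.0, pair_cost=lambda a, b: (a - b) ** 2,
)
probs = gibbs_probabilities([e_round, e_square])
```

Using the negative log of the branch frequency as the switch cost means that, when the other terms are equal, the Gibbs probabilities reduce to the SCFG branching probabilities.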
Learning And-Or Graph • Learning the vocabulary and the hierarchic And-Or Graph • Learning the relation set R, given the vocabulary • Learning the parameters, given R and the vocabulary • Discussed in the paper
Learning And-Or Graph • Learning and Pursuing Relation Set R: • Start from the Stochastic Context Free Grammar (a) • Learn the relations that maximally reduce the KL divergence to the observation (b-e) • Figure rows: observation; learning model • J. Porway, Z. Y. Yao, and S. C. Zhu, “Learning an And–Or graph for modeling and recognizing object categories,” Technical Report, Department of Statistics, 2007
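One greedy step of relation pursuit can be sketched as follows: for each candidate relation, compare the histogram of its statistic in the observed data with the histogram in samples from the current model, and pick the relation with the largest KL divergence. This is a simplified illustration under stated assumptions; the relation names, the histograms, and the function names are made up, and the actual procedure in the cited report may differ.

```python
import math


def kl_divergence(f, p, eps=1e-12):
    """KL(f || p) for two discrete distributions given as equal-length lists."""
    return sum(fi * math.log((fi + eps) / (pi + eps)) for fi, pi in zip(f, p) if fi > 0)


def pick_relation(observed, synthesized):
    """Return the candidate relation whose statistics the current model matches worst."""
    gains = {name: kl_divergence(observed[name], synthesized[name]) for name in observed}
    best = max(gains, key=gains.get)
    return best, gains


# Hypothetical histograms of two candidate relations.
observed = {
    "relative_position": [0.7, 0.2, 0.1],
    "relative_scale": [0.4, 0.3, 0.3],
}
synthesized = {
    "relative_position": [1 / 3, 1 / 3, 1 / 3],
    "relative_scale": [0.35, 0.35, 0.3],
}
best_relation, gains = pick_relation(observed, synthesized)
```

After the selected relation is added and the model is refit, the same comparison would be repeated on fresh samples until no candidate gives a meaningful gain.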
Learning And-Or Graph • Learning the graph parameters • Fitting the model distribution so that it approximates the observed one • Similar to texture synthesis • S. C. Zhu, Y. N. Wu, and D. B. Mumford, “Minimax entropy principle and its applications to texture modeling,” Neural Computation, vol. 9, no. 8, pp. 1627–1660, November 1997
Case I: Rectangle • Nodes: Rectangle • Two vanishing points, four edge directions • Rules: F. Han and S. C. Zhu, “Bottom-up/top-down image parsing by attribute graph grammar,” Proceedings of International Conference on Computer Vision, Beijing, China, 2005.