Recursive Composition in Computer Vision

Recursive Composition in Computer Vision • Leo Zhu, CSAIL MIT • Joint work with Chen, Yuille, Freeman and Torralba

Ideas behind Recursive Composition • Pattern Theory (Grenander 94) • Compositionality (Geman 02, 06) • Stochastic Grammar (Zhu and Mumford 06) • How to deal with image complexity • A general framework for different vision tasks • Rich representation and tractable computation

Recursive Composition • Representation • Recursive Compositional Models (RCMs) • Inference • Recursive Optimization • Learning • Supervised Parameter Estimation • Unsupervised Recursive Dictionary Learning • RCM-1: Deformable Object • RCM-2: Articulated Object • RCM-3: Scene (Entire Image)

Model Deformable Object • Flat MRF • Nodes: object parts • Edges: spatial relations • Limitations: • Short range interaction • Sparse

Recursive Composition

Recursive Compositional Models: RCM-1 • x: image • y: (position, scale, orientation) • graph = (nodes, edges) • a: index of node • b: child of a • f: appearances on node a • g: potentials on edges (a,b)

RCM-1: The Recursive Formula • Recursion • x: image • y: (position, scale, orientation) • Vertical independence • Self-similarity
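The formula on this slide did not survive in the transcript. As a hedged sketch only, using the symbols defined on the previous slide (f for appearance on node a, g for potentials on edges (a,b)), a recursive score with the two stated properties could be written as below. The name E, the child set Ch(a), and the grouping of terms are assumptions made for illustration, not the slide's own formula.

```latex
% Assumed notation: E is a score for the subtree rooted at node a,
% Ch(a) is the set of children of a. Leaf nodes have Ch(a) = \emptyset.
E(y_a \mid x) \;=\; f(x, y_a) \;+\; \sum_{b \in \mathrm{Ch}(a)} \Big[\, g(y_a, y_b) \;+\; E(y_b \mid x) \,\Big]
```

In this form, each child subtree depends on the rest of the model only through its parent's state (vertical independence), and the same expression is reused at every level (self-similarity).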

Recursive Composition • Representation • Recursive Compositional Models (RCMs) • Inference • Recursive Optimization • Learning • Supervised Parameter Estimation • Unsupervised Recursive Dictionary Learning

Polynomial-time Inference Recursion • Polynomial-time Complexity: Inference task: Recursive Optimization:
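The recursive optimization named on this slide can be illustrated with a bottom-up dynamic program on a tree. This is a minimal sketch under stated assumptions: each node has a small finite list of candidate states, potentials are pairwise between parent and child, and the names `best_parse`, `tree`, `states`, `f`, and `g` are hypothetical. It is not the authors' implementation.

```python
def best_parse(tree, root, states, f, g):
    """Maximize the sum of node scores f and parent-child scores g over a tree.

    tree:   dict mapping a node to the list of its children (assumed format)
    states: dict mapping a node to its list of candidate states
    f(a, s):       appearance score of node a in state s
    g(a, s, b, t): score of edge (a, b) with states s and t
    """
    table, back = {}, {}  # table[a][s] = best score of the subtree under a

    def solve(a):
        for b in tree.get(a, []):
            solve(b)  # children first (bottom-up)
        table[a], back[a] = {}, {}
        for s in states[a]:
            total, choice = f(a, s), {}
            for b in tree.get(a, []):
                # given s, each child subtree is optimized independently
                t = max(states[b], key=lambda u: g(a, s, b, u) + table[b][u])
                total += g(a, s, b, t) + table[b][t]
                choice[b] = t
            table[a][s], back[a][s] = total, choice

    solve(root)
    s_root = max(states[root], key=lambda s: table[root][s])

    assignment = {}

    def trace(a, s):  # follow back-pointers from the root down
        assignment[a] = s
        for b, t in back[a][s].items():
            trace(b, t)

    trace(root, s_root)
    return table[root][s_root], assignment


# Toy example: one root with two children, two candidate states per node.
tree = {"root": ["l", "r"]}
states = {"root": [0, 1], "l": [0, 1], "r": [0, 1]}
unary = {"root": [0, 2], "l": [0, 1], "r": [5, 0]}
f = lambda a, s: unary[a][s]
g = lambda a, s, b, t: 2 if s == t else 0  # reward agreeing states
score, assignment = best_parse(tree, "root", states, f, g)
```

With K candidate states per node and pairwise potentials, each edge costs on the order of K squared operations, so the total work grows polynomially with the number of nodes rather than exponentially with the number of joint configurations.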

Supervised Learning (Collins 02; Taskar et al. 04) • Supervised learning • Perceptron algorithm (MLE, max margin – SVM) • Parameter estimation needs fast inference.

Supervised Learning by Perceptron Algorithm • Inference is critical for learning • Goal: • Input: a set of training images with ground truth. Initialize parameter vector. • Training algorithm (Collins 02): Loop over training samples: i = 1 to N • Step 1: find the best y using inference • Step 2: update the parameters • End of loop.
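The two-step loop above can be sketched as a structured perceptron in the style of Collins 02. This is an illustrative sketch, not the talk's code: the names `perceptron_train`, `candidates`, and `phi` are hypothetical, the brute-force `max` over candidates stands in for the recursive inference step, and the toy feature map is an assumption chosen only to make the example runnable.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def perceptron_train(samples, candidates, phi, dim, epochs=5):
    """samples: list of (x, y_true); candidates(x): possible outputs for x;
    phi(x, y): feature vector of length dim (all assumed interfaces)."""
    w = [0.0] * dim  # initialize the parameter vector
    for _ in range(epochs):
        for x, y_true in samples:
            # Step 1: find the best output under the current parameters
            y_hat = max(candidates(x), key=lambda y: dot(w, phi(x, y)))
            # Step 2: on a mistake, move toward the ground-truth features
            if y_hat != y_true:
                good, bad = phi(x, y_true), phi(x, y_hat)
                w = [wi + gi - bi for wi, gi, bi in zip(w, good, bad)]
    return w


# Toy problem: the label is the sign of x, with outputs in {-1, +1}.
samples = [(2.0, 1), (-1.0, -1), (3.0, 1), (-2.0, -1)]
candidates = lambda x: [-1, 1]
phi = lambda x, y: [x * y, float(y)]
w = perceptron_train(samples, candidates, phi, dim=2)
predict = lambda x: max(candidates(x), key=lambda y: dot(w, phi(x, y)))
```

Step 1 runs once per training sample per pass, which is why the slides stress that parameter estimation needs fast inference.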

Recursive Composition • Representation • Recursive Compositional Models (RCMs) • Inference • Recursive Optimization (Polynomial-time) • Learning • Supervised Parameter Estimation • RCM-1: Deformable Object

RCM-1: Multi-level Potentials • Potentials for appearance: [Gabor, Edge, …]

RCM-1: Multi-level Potentials (position, scale, orientation) • Potentials for shape: triplet descriptors

The Inference Results after Supervised Learning

Segmentation Results

Evaluations: Segmentation and Parsing • Segmentation (Accuracy of pixel labeling) • The proportion of the correct pixel labels (object or non-object) • Parsing (Average Position Error of matching) • The average distance between the positions of leaf nodes of the ground truth and those estimated in the parse tree
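The two measures described above can be computed as follows. This is a small sketch assuming that label masks are nested lists of equal shape and that leaf positions are (x, y) pairs listed in corresponding order; the function names are hypothetical.

```python
import math


def pixel_accuracy(pred_mask, true_mask):
    """Proportion of pixels whose object/non-object label is correct."""
    total = correct = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            total += 1
            correct += int(p == t)
    return correct / total


def average_position_error(pred_leaves, true_leaves):
    """Mean Euclidean distance between matched leaf-node positions."""
    dists = [math.hypot(p[0] - t[0], p[1] - t[1])
             for p, t in zip(pred_leaves, true_leaves)]
    return sum(dists) / len(dists)


# Tiny example: one wrong pixel out of four, one leaf displaced by 5 pixels.
acc = pixel_accuracy([[1, 0], [1, 1]], [[1, 0], [0, 1]])
ape = average_position_error([(0, 0), (3, 4)], [(0, 0), (0, 0)])
```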

Recursive Composition • Modeling: (Representation) • Recursive Compositional Models (RCMs) • Inference: (Computing) • Recursive Optimization (Polynomial-time) • Learning: • Supervised Parameter Estimation • Unsupervised Recursive Learning • RCM-1: deformable object

Unsupervised Learning • Correspondence is unknown -> combinatorial explosion problem • Task: given 10 training images, no labeling, no alignment, highly ambiguous features. • Induce the structure (nodes and edges) • Estimate the parameters.

Recursive Dictionary Learning Recursion Barlow 94. • Multi-level dictionary (layer-wise greedy) • Bottom-Up and Top-Down recursive procedure • Three Principles: • Recursive Composition • Suspicious Coincidence • Competitive Exclusion

10 images for training

Bottom-up Learning • Suspicious Coincidence • Clustering • Composition • Competitive Exclusion
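One common reading of the suspicious coincidence principle is that two parts are worth composing when they co-occur more often than independence would predict. The sketch below scores part pairs by the ratio P(a, b) / (P(a) P(b)). It is a simplified illustration under assumptions: it ignores spatial relations, the threshold value is arbitrary, and the function name and input format are hypothetical rather than taken from the talk.

```python
from collections import Counter
from itertools import combinations


def suspicious_pairs(observations, threshold=1.5):
    """observations: list of collections of part labels, one per image.
    Returns pairs whose co-occurrence ratio exceeds the threshold."""
    n = len(observations)
    single, joint = Counter(), Counter()
    for obs in observations:
        parts = sorted(set(obs))          # count each part once per image
        single.update(parts)
        joint.update(combinations(parts, 2))
    scores = {}
    for (a, b), count in joint.items():
        # observed joint frequency relative to the independence prediction
        ratio = (count / n) / ((single[a] / n) * (single[b] / n))
        if ratio > threshold:
            scores[(a, b)] = ratio
    return scores


# Parts A and B always appear together; C appears mostly on its own.
observations = [["A", "B"], ["A", "B", "C"], ["C"], ["C"]]
pairs = suspicious_pairs(observations)
```

In this toy input only the pair (A, B) passes the threshold, so it would be the candidate proposed as a composition at the next dictionary level.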

The Dictionary: From Generic Parts to Object Structures • Unified representation (RCMs) and learning • Bridge the gap between the generic features and specific object structures

Dictionary Size, Part Sharing and Computational Complexity More Sharing

Top-down Refinement • Fill in missing parts • Examine every node from top to bottom

Evaluations of Unsupervised Learning

Scale up the System: Issue I More classes/viewpoints -> more training/detection cost

Scale up the System: Issue II Not enough data for rare viewpoints/classes

Our Strategy • Joint multi-class multi-view learning • Appearance sharing • Part sharing

Joint Multi-Class Multi-View Learning 120 templates: 5 viewpoints & 26 classes

Different Viewpoints Share same appearance

Different Classes Share Common Parts

Compact Hierarchical Dictionary

Dense Part Sharing at Low Levels: Layer-2

Less Part Sharing: Layer-3

Sparse Part Sharing at High Levels: Layer-4

Re-usable Parts: All Layers

The more classes/viewpoints, the more part sharing

Multi-View Single Class Performance

Recursive Composition • Representation • Recursive Compositional Models (RCMs) • Inference • Recursive Optimization (Polynomial-time) • Learning • Supervised Parameter Estimation • RCM-1: Deformable Object • RCM-2: Articulated Object

RCM-2 for Articulated Object: Horses • Multiple poses • y = (switch, position, scale, orientation) • Composition • Switch

RCM-2 for Human Body

Recursive Composition • Representation • Recursive Compositional Models (RCMs) • Inference • Recursive Optimization (Polynomial-time) • Learning • Supervised Parameter Estimation • RCM-1: Deformable Object • RCM-2: Articulated Object • RCM-3: Scene (Entire Image)

Image Scene Parsing Task: Image Segmentation and Labeling
