CS 636 Computer Vision Statistical Object Recognition Nathan Jacobs Slides adapted from Lazebnik
Administrivia • Project 4 • Final Project • Next Class is Cancelled
Overview • Statistical Recognition • generative vs. discriminative learning • Bag of Features Models
Statistical Recognition Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and Kristen Grauman
Steps for statistical recognition • Representation • Specify the model for an object category • Bag of features, part-based, global, etc. • Learning • Given a training set, find the parameters of the model • Generative vs. discriminative • Recognition • Apply the model to a new test image
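Object categorization: the statistical viewpoint • MAP decision: $p(\text{zebra} \mid \text{image})$ vs. $p(\text{no zebra} \mid \text{image})$ • Bayes rule: $\underbrace{\dfrac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}}_{\text{posterior ratio}} = \underbrace{\dfrac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})}}_{\text{likelihood ratio}} \cdot \underbrace{\dfrac{p(\text{zebra})}{p(\text{no zebra})}}_{\text{prior ratio}}$
Object categorization: the statistical viewpoint • Discriminative methods: model the posterior • Generative methods: model the likelihood and the prior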
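A minimal sketch of the MAP decision for a two-class problem (object present vs. absent), assuming the likelihoods and the prior are already known. The function names and the numbers are hypothetical, chosen only to show that a high likelihood ratio can still lose to a small prior.

```python
def posterior(likelihood_pos, likelihood_neg, prior_pos):
    """Posterior p(object | image) from Bayes rule for two classes."""
    prior_neg = 1.0 - prior_pos
    evidence = likelihood_pos * prior_pos + likelihood_neg * prior_neg
    return likelihood_pos * prior_pos / evidence


def map_decision(likelihood_pos, likelihood_neg, prior_pos):
    """MAP decision: True if 'object present' has the larger posterior."""
    return posterior(likelihood_pos, likelihood_neg, prior_pos) > 0.5


# Hypothetical values: the image is 4x more likely under "object",
# but the object class is rare (prior 0.1).
p_rare = posterior(0.02, 0.005, 0.1)
decision_rare = map_decision(0.02, 0.005, 0.1)

# Same likelihoods with equal priors.
decision_equal = map_decision(0.02, 0.005, 0.5)

print(round(p_rare, 4), decision_rare, decision_equal)
```

With the rare prior the posterior stays below 0.5, so the MAP decision is "absent"; with equal priors the same likelihoods give "present".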
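Discriminative methods • Direct modeling of the posterior ratio $\dfrac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}$ • [Figure: decision boundary separating zebra from non-zebra examples]
Generative methods • Model the likelihoods $p(\text{image} \mid \text{zebra})$ and $p(\text{image} \mid \text{no zebra})$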
Generative vs. discriminative learning • Generative: class densities • Discriminative: posterior probabilities
Generative vs. discriminative methods • Generative methods + Can sample from them / compute how probable any given model instance is + Can be learned using images from just a single category – Sometimes we don’t need to model the likelihood when all we want is to make a decision • Discriminative methods + Efficient + Often produce better classification rates – Require positive and negative training data – Can be hard to interpret
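To make the two advantages of generative methods concrete, here is a small sketch that fits one 1-D Gaussian class density per category, draws samples from it, and scores a new value. The Gaussian form, the scalar feature, and all numbers are assumptions for illustration; real image models are far richer.

```python
import math
import random
import statistics


def fit_gaussian(values):
    """Estimate (mean, std) of one class from that class's samples only."""
    return statistics.mean(values), statistics.pstdev(values)


def density(x, mean, std):
    """Gaussian class density p(x | class)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical scalar feature values, one list per category.
zebra = [4.8, 5.1, 5.3, 4.9, 5.4]
other = [1.9, 2.4, 2.0, 2.6, 2.1]

# Each model is learned from images of a single category.
zebra_model = fit_gaussian(zebra)
other_model = fit_gaussian(other)

# We can sample new instances from a class model ...
rng = random.Random(0)
samples = [rng.gauss(*zebra_model) for _ in range(3)]

# ... and compute how probable any given instance is under each class.
x = 4.0
likelihood_ratio = density(x, *zebra_model) / density(x, *other_model)
print(samples, likelihood_ratio > 1.0)
```

A discriminative method would skip the densities and fit the boundary between the two lists directly, which needs both positive and negative examples.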
Generalization • How well does a learned model generalize from the data it was trained on to a new test set? • Underfitting: model is too “simple” to represent all the relevant class characteristics • High training error and high test error • Overfitting: model is too “complex” and fits irrelevant characteristics (noise) in the data • Low training error and high test error • Occam’s razor: given two models that represent the data equally well, the simpler one should be preferred
Occam’s razor: why is it a useful heuristic? [Figure panels: 1-NN; 5-NN; logistic regression on (x, y, x², y²); logistic regression on (x, y, sqrt(x² + y²))]
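The 1-NN versus 5-NN comparison can be reproduced on a toy problem. The sketch below is an illustration under assumed data: two well-separated 2-D clusters plus one deliberately mislabeled training point. The more complex 1-NN rule fits that noisy label perfectly, while 5-NN smooths it out.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        train,
        key=lambda item: (item[0][0] - query[0]) ** 2 + (item[0][1] - query[1]) ** 2,
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def error_rate(train, data, k):
    """Fraction of (point, label) pairs in `data` that k-NN gets wrong."""
    wrong = sum(knn_predict(train, point, k) != label for point, label in data)
    return wrong / len(data)


# Hypothetical training set: class 0 near the origin, class 1 near (3, 3),
# plus one noisy point at (0.2, 0.2) that carries the wrong label.
train = [
    ((0.0, 0.0), 0), ((0.5, 0.0), 0), ((0.0, 0.5), 0), ((0.5, 0.5), 0), ((0.3, 0.8), 0),
    ((0.2, 0.2), 1),  # label noise
    ((3.0, 3.0), 1), ((3.5, 3.0), 1), ((3.0, 3.5), 1), ((3.5, 3.5), 1), ((2.7, 3.2), 1),
]
test = [((0.25, 0.25), 0), ((0.1, 0.3), 0), ((3.2, 3.2), 1), ((2.9, 3.4), 1)]

for k in (1, 5):
    print(k, error_rate(train, train, k), error_rate(train, test, k))
```

On this data 1-NN has zero training error but misclassifies the test points that fall next to the noisy label; 5-NN makes one training error (the noisy point itself) and classifies the test points correctly.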
Supervision • Images in the training set must be annotated with the “correct answer” that the model is expected to produce • Example annotation: “Contains a motorbike”
Supervision • Fully supervised • “Weakly” supervised • Unsupervised • Definition depends on task
Face Recognition • Attributes for training • Similes for training • Results on the Labeled Faces in the Wild dataset • N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, "Attribute and Simile Classifiers for Face Verification," ICCV 2009.
What task? • Classification • Object present/absent in image • Background may be correlated with object • Localization / Detection • Localize object within the frame • Bounding box or pixel-level segmentation
Datasets • Circa 2001: 5 categories, 100s of images per category • Circa 2004: 101 categories • Today: thousands of categories, tens of thousands of images
Caltech 101 & 256 • Caltech 101: http://www.vision.caltech.edu/Image_Datasets/Caltech101/ (Fei-Fei, Fergus, Perona, 2004) • Caltech 256: http://www.vision.caltech.edu/Image_Datasets/Caltech256/ (Griffin, Holub, Perona, 2007)
The PASCAL Visual Object Classes Challenge (2005-2009) http://pascallin.ecs.soton.ac.uk/challenges/VOC/ 2008 Challenge classes: • Person: person • Animal: bird, cat, cow, dog, horse, sheep • Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train • Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor
The PASCAL Visual Object Classes Challenge (2005-2009) • Main competitions • Classification: For each of the twenty classes, predicting presence/absence of an example of that class in the test image • Detection: Predicting the bounding box and label of each object from the twenty target classes in the test image http://pascallin.ecs.soton.ac.uk/challenges/VOC/
The PASCAL Visual Object Classes Challenge (2005-2009) • “Taster” challenges • Segmentation: Generating pixel-wise segmentations giving the class of the object visible at each pixel, or "background" otherwise • Person layout: Predicting the bounding box and label of each part of a person (head, hands, feet) http://pascallin.ecs.soton.ac.uk/challenges/VOC/
Lotus Hill Research Institute image corpus http://www.imageparsing.com/ Z.Y. Yao, X. Yang, and S.C. Zhu, 2007
Labeling with games http://www.gwap.com/gwap/ L. von Ahn, L. Dabbish, 2004; L. von Ahn, R. Liu and M. Blum, 2006
LabelMe http://labelme.csail.mit.edu/ Russell, Torralba, Murphy, Freeman, 2008
80 Million Tiny Images http://people.csail.mit.edu/torralba/tinyimages/
Dataset issues • How large is the degree of intra-class variability? • How “confusable” are the classes? • Is there bias introduced by the background? I.e., can we “cheat” just by looking at the background and not the object?
Steps for statistical recognition • Representation • Specify the model for an object category • Bag of features, part-based, global, etc. • Learning • Given a training set, find the parameters of the model • Generative vs. discriminative • Recognition • Apply the model to a new test image
Bag-of-features models Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba
Overview: Bag-of-features models • Origins and motivation • Image representation • Discriminative methods • Nearest-neighbor classification • Support vector machines • Generative methods • Naïve Bayes • Probabilistic Latent Semantic Analysis • Extensions: incorporating spatial information
Origin 1: Texture recognition • Texture is characterized by the repetition of basic elements or textons • For stochastic textures, it is the identity of the textons, not their spatial arrangement, that matters Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
Origin 1: Texture recognition • Textures are represented by histograms over a universal texton dictionary
Origin 2: Bag-of-words models • Unordered document representation: frequencies of words from a dictionary Salton & McGill (1983) • Example: US Presidential Speeches Tag Cloud http://chir.ag/phernalia/preztags/
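A short sketch of the unordered document representation: each document becomes a vector of word frequencies over a fixed dictionary, and word order is thrown away. The dictionary, the tokenizer, and both sentences are hypothetical.

```python
import re
from collections import Counter


def bag_of_words(text, dictionary):
    """Count how often each dictionary word occurs; word order is discarded."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(token for token in tokens if token in dictionary)
    return [counts[word] for word in dictionary]


dictionary = ["economy", "freedom", "war", "peace"]  # hypothetical dictionary
doc_a = "Peace and freedom. Freedom, not war."
doc_b = "War? Not freedom and peace; freedom!"

vec_a = bag_of_words(doc_a, dictionary)
vec_b = bag_of_words(doc_b, dictionary)
print(vec_a, vec_b)
```

The two documents order their words differently (and arguably mean different things), yet they map to the same frequency vector. That is exactly the information a bag-of-words model gives up.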
Bags of features for image classification • Extract features • Learn “visual vocabulary” • Quantize features using visual vocabulary • Represent images by frequencies of “visual words”
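The four steps can be sketched end to end on toy data. Everything here is an assumption made for illustration: descriptors are 2-D tuples standing in for real local features, the vocabulary is learned with plain k-means (the steps listed above do not name a clustering method), and the initial centers are simply the first k descriptors.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers, feature):
    """Quantize: index of the closest visual word."""
    return min(range(len(centers)), key=lambda i: sq_dist(centers[i], feature))


def learn_vocabulary(features, k, iterations=20):
    """Plain k-means (Lloyd's algorithm), naively seeded with the first k features."""
    centers = list(features[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for f in features:
            clusters[nearest(centers, f)].append(f)
        # Move each center to the mean of its members; leave empty clusters in place.
        centers = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else center
            for members, center in zip(clusters, centers)
        ]
    return centers


def bag_of_features(features, centers):
    """Represent one image by normalized visual-word frequencies."""
    counts = [0] * len(centers)
    for f in features:
        counts[nearest(centers, f)] += 1
    total = sum(counts)
    return [c / total for c in counts]


# Step 1 (extraction) is assumed done: hypothetical descriptors per image.
image_a = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1)]
image_b = [(5.1, 5.0), (4.9, 5.0), (0.0, 0.0)]

vocabulary = learn_vocabulary(image_a + image_b, k=2)   # step 2
hist_a = bag_of_features(image_a, vocabulary)           # steps 3 and 4
hist_b = bag_of_features(image_b, vocabulary)
print(hist_a, hist_b)
```

The resulting histograms are fixed-length vectors regardless of how many descriptors each image produced, which is what lets standard classifiers operate on them.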
1. Feature extraction • Regular grid • Vogel & Schiele, 2003 • Fei-Fei & Perona, 2005 • Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 • Other methods • Random sampling (Vidal-Naquet & Ullman, 2002) • Segmentation-based patches (Barnard et al. 2003)
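Two of the sampling strategies listed above, regular grid and random sampling, are simple enough to sketch directly. The image here is a hypothetical 6x6 list of lists, and the patch size and stride are arbitrary choices; interest point detectors and segmentation-based patches are not covered by this sketch.

```python
import random


def crop(image, top, left, patch):
    """Return the patch x patch block whose top-left corner is (top, left)."""
    return [row[left:left + patch] for row in image[top:top + patch]]


def grid_patches(image, patch, stride):
    """Regular grid: cut patches at evenly spaced positions."""
    height, width = len(image), len(image[0])
    return [
        crop(image, top, left, patch)
        for top in range(0, height - patch + 1, stride)
        for left in range(0, width - patch + 1, stride)
    ]


def random_patches(image, patch, count, seed=0):
    """Random sampling: cut patches at uniformly random positions."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    return [
        crop(image, rng.randrange(height - patch + 1), rng.randrange(width - patch + 1), patch)
        for _ in range(count)
    ]


# Hypothetical 6x6 "image" whose pixel value encodes its position.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
grid = grid_patches(image, patch=2, stride=2)
sampled = random_patches(image, patch=2, count=5)
print(len(grid), grid[0], grid[-1])
```

In a full pipeline each patch would then be normalized and turned into a descriptor before quantization.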
1. Feature extraction • Detect patches [Mikolajczyk and Schmid ’02] [Matas, Chum, Urban & Pajdla, ’02] [Sivic & Zisserman, ’03] • Normalize patch • Compute SIFT descriptor [Lowe ’99] Slide credit: Josef Sivic
2. Learning the visual vocabulary