
Statistical Object Recognition

CS 636 Computer Vision. Statistical Object Recognition. Nathan Jacobs. Slides adapted from Lazebnik. Administrivia. Project 4 Final Project Next Class is Cancelled. Overview. Statistical Recognition generative vs. discriminative learning Bag of Features Models. Statistical Recognition.


Presentation Transcript


  1. CS 636 Computer Vision Statistical Object Recognition Nathan Jacobs Slides adapted from Lazebnik

  2. Administrivia • Project 4 • Final Project • Next Class is Cancelled

  3. Overview • Statistical Recognition • generative vs. discriminative learning • Bag of Features Models

  4. Statistical Recognition Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and Kristen Grauman

  5. Steps for statistical recognition • Representation • Specify the model for an object category • Bag of features, part-based, global, etc. • Learning • Given a training set, find the parameters of the model • Generative vs. discriminative • Recognition • Apply the model to a new test image

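  6. Object categorization: the statistical viewpoint • MAP decision: $p(\text{zebra} \mid \text{image})$ vs. $p(\text{no zebra} \mid \text{image})$

  7. Object categorization: the statistical viewpoint • MAP decision: $p(\text{zebra} \mid \text{image})$ vs. $p(\text{no zebra} \mid \text{image})$ • Bayes rule: $\underbrace{p(\text{zebra} \mid \text{image})}_{\text{posterior}} \propto \underbrace{p(\text{image} \mid \text{zebra})}_{\text{likelihood}} \, \underbrace{p(\text{zebra})}_{\text{prior}}$

  8. Object categorization: the statistical viewpoint • Bayes rule: posterior $\propto$ likelihood $\times$ prior • Discriminative methods: model the posterior • Generative methods: model the likelihood and the prior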
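
The MAP decision on the slides above can be sketched in a few lines. This is only an illustration: the function names `posterior` and `map_decision` and all the probabilities are made up for the example, not taken from the lecture.

```python
def posterior(likelihoods, priors):
    """Return p(class | image) for every class via Bayes rule.

    likelihoods: dict mapping class -> p(image | class)
    priors:      dict mapping class -> p(class)
    """
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())  # p(image), the normalizing constant
    return {c: joint[c] / evidence for c in joint}


def map_decision(likelihoods, priors):
    """Pick the class with the largest posterior (the MAP decision)."""
    post = posterior(likelihoods, priors)
    return max(post, key=post.get)


# Hypothetical numbers, chosen only for illustration.
likelihoods = {"zebra": 0.30, "no zebra": 0.02}
priors = {"zebra": 0.01, "no zebra": 0.99}
print(posterior(likelihoods, priors))
print(map_decision(likelihoods, priors))  # "no zebra": the small prior outweighs the likelihood
```

With equal priors the same likelihoods would give "zebra", which shows why a generative method has to model both the likelihood and the prior.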

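  9. Discriminative methods • Direct modeling of $p(\text{zebra} \mid \text{image})$ • Figure labels: decision boundary; zebra; non-zebra

  10. Generative methods • Model $p(\text{image} \mid \text{zebra})$ and $p(\text{image} \mid \text{no zebra})$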

  11. Generative vs. discriminative learning • Generative: class densities • Discriminative: posterior probabilities

  12. Generative vs. discriminative methods • Generative methods + Can sample from them / compute how probable any given model instance is + Can be learned using images from just a single category – Sometimes we don’t need to model the likelihood when all we want is to make a decision • Discriminative methods + Efficient + Often produce better classification rates – Require positive and negative training data – Can be hard to interpret
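
To make the contrast concrete, here is a minimal sketch on one-dimensional toy data. The choice of a Gaussian class density for the generative model and logistic regression for the discriminative one is an assumption made for illustration; the slides do not prescribe either, and all names and numbers below are hypothetical.

```python
import math


def fit_gaussian(xs):
    """Maximum-likelihood mean and variance of a 1-D sample."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var


def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_generative(pos, neg):
    """Generative: model the class densities p(x | class) and the priors p(class)."""
    n = len(pos) + len(neg)
    return {"pos": (fit_gaussian(pos), len(pos) / n),
            "neg": (fit_gaussian(neg), len(neg) / n)}


def generative_posterior(model, x):
    """Turn likelihood and prior into p(pos | x) with Bayes rule."""
    (mu_p, var_p), prior_p = model["pos"]
    (mu_n, var_n), prior_n = model["neg"]
    joint_p = gaussian_pdf(x, mu_p, var_p) * prior_p
    joint_n = gaussian_pdf(x, mu_n, var_n) * prior_n
    return joint_p / (joint_p + joint_n)


def fit_discriminative(pos, neg, lr=0.1, steps=2000):
    """Discriminative: model p(pos | x) = sigmoid(w * x + b) directly.

    Plain batch gradient descent on the logistic loss; it needs both
    positive and negative examples.
    """
    data = [(x, 1.0) for x in pos] + [(x, 0.0) for x in neg]
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return w, b


def discriminative_posterior(params, x):
    w, b = params
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


# Hypothetical 1-D feature values for the two classes.
pos = [2.0, 2.5, 3.0, 3.5]
neg = [-1.0, 0.0, 0.5, 1.0]
gen = fit_generative(pos, neg)
dis = fit_discriminative(pos, neg)
print(generative_posterior(gen, 3.0), discriminative_posterior(dis, 3.0))
```

The generative model could be fitted for the positive class alone and could be sampled from; the discriminative model only ever learns where the boundary between the two classes lies.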

  13. Generalization • How well does a learned model generalize from the data it was trained on to a new test set? • Underfitting: model is too “simple” to represent all the relevant class characteristics • High training error and high test error • Overfitting: model is too “complex” and fits irrelevant characteristics (noise) in the data • Low training error and high test error • Occam’s razor: given two models that represent the data equally well, the simpler one should be preferred
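
One way to see the gap between training and test error is to vary the complexity of a k-nearest-neighbor classifier. The sketch below uses made-up toy data (points labeled by whether they fall inside the unit circle, with some labels flipped); the data generator, the noise level, and the choice of k values are all assumptions for illustration.

```python
import random


def knn_predict(train, query, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - query[0]) ** 2 + (item[0][1] - query[1]) ** 2,
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def error_rate(train, data, k):
    """Fraction of points in `data` that k-NN (fitted on `train`) misclassifies."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in data)
    return wrong / len(data)


def make_data(n, rng, noise=0.2):
    """Toy data: label 1 inside the unit circle, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = (rng.uniform(-2, 2), rng.uniform(-2, 2))
        y = 1 if x[0] ** 2 + x[1] ** 2 < 1 else 0
        if rng.random() < noise:
            y = 1 - y  # label noise: an "irrelevant characteristic" of the data
        data.append((x, y))
    return data


rng = random.Random(0)
train, test = make_data(200, rng), make_data(200, rng)
for k in (1, 5, 15):
    print(k, error_rate(train, train, k), error_rate(train, test, k))
```

With k = 1 every training point is its own nearest neighbor, so the training error is zero even though the labels are noisy; that memorization is the overfitting case described above, and the test error is the number that actually measures generalization.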

  14. Occam’s razor: why is it a useful heuristic? • Figure panels: 1NN • 5NN • logistic regression on $(x, y, x^2, y^2)$ • logistic regression on $(x, y, \sqrt{x^2 + y^2})$

  15. Supervision • Images in the training set must be annotated with the “correct answer” that the model is expected to produce • Example annotation: “Contains a motorbike”

  16. Supervision levels • Fully supervised • “Weakly” supervised • Unsupervised • Definition depends on task

  17. Face Recognition • N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, "Attribute and Simile Classifiers for Face Verification," ICCV 2009.

  18. Face Recognition • Attributes for training • Similes for training • N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, "Attribute and Simile Classifiers for Face Verification," ICCV 2009.

  19. Face Recognition • Results on Labeled Faces in the Wild Dataset • N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, "Attribute and Simile Classifiers for Face Verification," ICCV 2009.

  20. What task? • Classification • Object present/absent in image • Background may be correlated with object • Localization / Detection • Localize object within the frame • Bounding box or pixel-level segmentation

  21. Datasets • Circa 2001: 5 categories, 100s of images per category • Circa 2004: 101 categories • Today: thousands of categories, tens of thousands of images

  22. Caltech 101 & 256 • Caltech 101: http://www.vision.caltech.edu/Image_Datasets/Caltech101/ (Fei-Fei, Fergus, Perona, 2004) • Caltech 256: http://www.vision.caltech.edu/Image_Datasets/Caltech256/ (Griffin, Holub, Perona, 2007)

  23. The PASCAL Visual Object Classes Challenge (2005-2009) http://pascallin.ecs.soton.ac.uk/challenges/VOC/ • 2008 Challenge classes: • Person: person • Animal: bird, cat, cow, dog, horse, sheep • Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train • Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

  24. The PASCAL Visual Object Classes Challenge (2005-2009) • Main competitions • Classification: For each of the twenty classes, predicting presence/absence of an example of that class in the test image • Detection: Predicting the bounding box and label of each object from the twenty target classes in the test image http://pascallin.ecs.soton.ac.uk/challenges/VOC/
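
The slide does not say how a predicted bounding box is compared with the ground truth. A common criterion for detection is the area overlap ratio (intersection over union), with a detection counted as correct above some threshold such as 0.5; treat both the criterion and the threshold as assumptions here. A minimal sketch, using continuous coordinates and hypothetical function names:

```python
def box_area(box):
    """Area of an axis-aligned box given as (x_min, y_min, x_max, y_max).

    Returns 0 for an empty box (x_max <= x_min or y_max <= y_min).
    """
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def intersection_over_union(a, b):
    """Overlap ratio area(a ∩ b) / area(a ∪ b) of two boxes."""
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    inter_area = box_area(inter)
    union_area = box_area(a) + box_area(b) - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


predicted = (0, 0, 10, 10)
ground_truth = (5, 0, 15, 10)
iou = intersection_over_union(predicted, ground_truth)
print(iou, iou > 0.5)  # overlap 50 / union 150, so this detection would not count
```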

  25. The PASCAL Visual Object Classes Challenge (2005-2009) • “Taster” challenges • Segmentation: Generating pixel-wise segmentations giving the class of the object visible at each pixel, or "background" otherwise • Person layout: Predicting the bounding box and label of each part of a person (head, hands, feet) http://pascallin.ecs.soton.ac.uk/challenges/VOC/

  26. Lotus Hill Research Institute image corpus http://www.imageparsing.com/ Z.Y. Yao, X. Yang, and S.C. Zhu, 2007

  27. Labeling with games http://www.gwap.com/gwap/ L. von Ahn, L. Dabbish, 2004; L. von Ahn, R. Liu and M. Blum, 2006

  28. LabelMe http://labelme.csail.mit.edu/ Russell, Torralba, Murphy, Freeman, 2008

  29. 80 Million Tiny Images http://people.csail.mit.edu/torralba/tinyimages/

  30. Dataset issues • How large is the degree of intra-class variability? • How “confusable” are the classes? • Is there bias introduced by the background? I.e., can we “cheat” just by looking at the background and not the object?

  31. Caltech-101

  32. Steps for statistical recognition • Representation • Specify the model for an object category • Bag of features, part-based, global, etc. • Learning • Given a training set, find the parameters of the model • Generative vs. discriminative • Recognition • Apply the model to a new test image

  33. Bag-of-features models Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba

  34. Overview: Bag-of-features models • Origins and motivation • Image representation • Discriminative methods • Nearest-neighbor classification • Support vector machines • Generative methods • Naïve Bayes • Probabilistic Latent Semantic Analysis • Extensions: incorporating spatial information

  35. Origin 1: Texture recognition • Texture is characterized by the repetition of basic elements or textons • For stochastic textures, it is the identity of the textons, not their spatial arrangement, that matters Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

  36. Origin 1: Texture recognition • Figure labels: histogram; universal texton dictionary • Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

  37. Origin 2: Bag-of-words models • Unordered document representation: frequencies of words from a dictionary Salton & McGill (1983)

  38. Origin 2: Bag-of-words models • Unordered document representation: frequencies of words from a dictionary Salton & McGill (1983) • Example: US Presidential Speeches Tag Cloud, http://chir.ag/phernalia/preztags/

  39. Origin 2: Bag-of-words models • Unordered document representation: frequencies of words from a dictionary Salton & McGill (1983) • Example: US Presidential Speeches Tag Cloud, http://chir.ag/phernalia/preztags/

  40. Origin 2: Bag-of-words models • Unordered document representation: frequencies of words from a dictionary Salton & McGill (1983) • Example: US Presidential Speeches Tag Cloud, http://chir.ag/phernalia/preztags/
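
The unordered document representation amounts to counting dictionary words and discarding their order. A minimal sketch, with a made-up dictionary and document and a deliberately crude tokenizer (both are assumptions for illustration):

```python
import re
from collections import Counter


def bag_of_words(text, dictionary):
    """Count how often each dictionary word occurs; word order is discarded."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t in dictionary)
    return [counts[w] for w in dictionary]


dictionary = ["zebra", "stripe", "horse", "grass"]
doc = "The zebra ate grass. Another zebra watched the horse."
print(bag_of_words(doc, dictionary))  # [2, 0, 1, 1]
```

Shuffling the words of the document leaves the vector unchanged, which is exactly the property that carries over to images when words are replaced by local features.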

  41. Bags of features for image classification • Extract features

  42. Bags of features for image classification • Extract features • Learn “visual vocabulary”

  43. Bags of features for image classification • Extract features • Learn “visual vocabulary” • Quantize features using visual vocabulary

  44. Bags of features for image classification • Extract features • Learn “visual vocabulary” • Quantize features using visual vocabulary • Represent images by frequencies of “visual words”
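
The last two steps (quantize, then count) can be sketched as follows, assuming the visual vocabulary has already been learned and that each feature is a plain tuple of numbers. Nearest-word assignment under Euclidean distance and normalizing the histogram to frequencies are assumptions for this sketch; the function names and the tiny 2-D "descriptors" are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(descriptor, vocabulary):
    """Index of the visual word closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: squared_distance(descriptor, vocabulary[i]))


def bag_of_features(descriptors, vocabulary):
    """Represent an image by the frequency of each visual word."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[quantize(d, vocabulary)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy 2-D "descriptors"; real ones (e.g. SIFT) have many more dimensions.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.2), (0.9, 1.2), (4.8, 5.1), (5.2, 4.9)]
print(bag_of_features(descriptors, vocabulary))  # [0.25, 0.25, 0.5]
```

The resulting fixed-length vector is what a classifier then consumes, regardless of how many features the image produced.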

  45. 1. Feature extraction • Regular grid • Vogel & Schiele, 2003 • Fei-Fei & Perona, 2005

  46. 1. Feature extraction • Regular grid • Vogel & Schiele, 2003 • Fei-Fei & Perona, 2005 • Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005

  47. 1. Feature extraction • Regular grid • Vogel & Schiele, 2003 • Fei-Fei & Perona, 2005 • Interest point detector • Csurka et al. 2004 • Fei-Fei & Perona, 2005 • Sivic et al. 2005 • Other methods • Random sampling (Vidal-Naquet & Ullman, 2002) • Segmentation-based patches (Barnard et al. 2003)
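
Of the sampling strategies listed, the regular grid is the simplest to write down. The sketch below covers only that option and assumes an image stored as a list of pixel rows; the function names, patch size, and stride are hypothetical.

```python
def grid_patches(width, height, patch_size, stride):
    """Top-left corners (x, y) of all patches that fit fully inside the image."""
    return [(x, y)
            for y in range(0, height - patch_size + 1, stride)
            for x in range(0, width - patch_size + 1, stride)]


def extract_patch(image, x, y, patch_size):
    """Cut a square patch out of an image stored as a list of rows."""
    return [row[x:x + patch_size] for row in image[y:y + patch_size]]


# An 8x6 toy image whose pixel value encodes its position.
image = [[10 * r + c for c in range(8)] for r in range(6)]
corners = grid_patches(width=8, height=6, patch_size=4, stride=2)
print(corners)  # [(0, 0), (2, 0), (4, 0), (0, 2), (2, 2), (4, 2)]
print(extract_patch(image, 2, 2, 4)[0])  # first row of that patch: [22, 23, 24, 25]
```

An interest point detector would replace `grid_patches` with locations chosen from image content; everything downstream of patch extraction stays the same.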

  48. 1. Feature extraction • Detect patches [Mikolajczyk and Schmid ’02] [Matas, Chum, Urban & Pajdla, ’02] [Sivic & Zisserman, ’03] • Normalize patch • Compute SIFT descriptor [Lowe ’99] • Slide credit: Josef Sivic

  49. 1. Feature extraction

  50. 2. Learning the visual vocabulary
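
The transcript stops before saying how the vocabulary is learned. A common choice is to cluster the training descriptors with k-means and use the cluster centers as visual words; that choice is an assumption here, as are the function names, the fixed iteration count, and the seeded random initialization in this minimal sketch.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns k cluster centers (the visual words)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize with k distinct data points
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster went empty
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers


# Two well-separated groups of toy 2-D "descriptors".
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(sorted(kmeans(points, k=2)))  # [(0.5, 0.5), (10.5, 10.5)]
```

In practice the descriptors are high-dimensional and the vocabulary holds hundreds or thousands of words, so a library implementation would normally replace this loop.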
