Part 1: Classical Image Classification Methods • Kai Yu, Dept. of Media Analytics, NEC Laboratories America • Andrew Ng, Computer Science Dept., Stanford University
Outline of Part 2 • Local Features, Sampling, Visual Words • Discriminative Methods • Bag-of-Words (BoW) representation • Spatial pyramid matching (SPM) • Generative Methods • Part-based methods • Topic models
Local features • Distinctive descriptors of local image patches • Invariant to local translation, scale, … • and sometimes rotation or general affine transformations • The most famous choice is the SIFT feature
Sampling local features from images • Each image yields a set of points. Image credits: F-F. Li, E. Nowak, J. Sivic
Visual words • Similar points are grouped into one visual word • Algorithms: k-means, agglomerative clustering, … • Points from different images are then more easily compared. Slide credit: Kristen Grauman
Bag-of-words (BoW) representation Analogy to documents Adapted from tutorial slides by Fei-Fei et al.
BoW for object categorization • Works pretty well for whole-image classification Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005) Slide credit: Svetlana Lazebnik
Unsupervised Dictionary Learning • Sample local features from images • Run k-means or another clustering algorithm to get the dictionary • The dictionary is also called a “codebook” [Figure: features from an image database plotted in SIFT space, partitioned into regions R1, R2, R3]
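The dictionary-learning step can be sketched with plain Lloyd's k-means. This is a minimal illustration, not the authors' implementation: the 2-D toy points stand in for 128-dimensional SIFT descriptors, and the evenly spaced initialization is an assumption made only to keep the run deterministic (practical systems usually use random or k-means++ seeding).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def kmeans(descriptors, k, iterations=20):
    """Lloyd's k-means; returns k cluster centers (the codebook)."""
    # Assumption: deterministic init from evenly spaced samples.
    step = len(descriptors) // k
    centers = [list(descriptors[i * step]) for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each descriptor to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in descriptors:
            clusters[nearest_index(p, centers)].append(p)
        # Update step: move each center to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[i] = [sum(m[d] for m in members) / len(members)
                              for d in range(dim)]
    return centers


# Toy 2-D "descriptors" forming three well-separated groups.
descriptors = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
    (9.0, 0.1), (9.1, 0.0), (8.9, 0.2),
]
codebook = kmeans(descriptors, k=3)
```

Each returned center plays the role of one visual word. The codebook size `k` is the main design choice here, which the Issues slide raises as an open question.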
Compute BoW histogram for each image • Assign SIFT features to clusters (regions R1, R2, R3) • Compute the frequency of each cluster within an image • Result: BoW histogram representations
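The two steps on this slide, cluster assignment and frequency counting, can be sketched as follows. The codebook and the toy 2-D descriptors are made-up values for illustration. Normalizing the counts to sum to 1 is an assumption; raw counts are also used in practice.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(descriptor, codebook):
    """Index of the nearest codeword (hard vector quantization)."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(descriptor, codebook[i]))


def bow_histogram(descriptors, codebook):
    """Normalized visual-word frequencies for one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[quantize(d, codebook)] += 1
    total = sum(counts)
    if total == 0:  # image with no descriptors
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Hypothetical 3-word codebook and two images with different feature counts.
codebook = [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)]
image_a = [(0.1, 0.2), (0.3, 0.1), (5.1, 4.8), (8.7, 0.2)]
image_b = [(4.9, 5.2), (5.3, 5.1)]

hist_a = bow_histogram(image_a, codebook)  # [0.5, 0.25, 0.25]
hist_b = bow_histogram(image_b, codebook)  # [0.0, 1.0, 0.0]
```

Both images map to vectors of length 3 even though one has four descriptors and the other has two.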
What the BoW histogram provides • Summarizes the entire image by its distribution of visual word occurrences • Turns bags of different sizes into a fixed-length vector • Analogous to the bag-of-words representation commonly used for text categorization.
Image classification based on BoW histogram • Learn a classification model to determine the decision boundary • Nonlinear SVMs are commonly applied. [Figure: BoW histogram vector space with a decision boundary separating “bird” from “dog”]
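A nonlinear SVM on histograms needs a kernel that compares two BoW vectors. The slide does not name one, so the histogram intersection kernel below is an assumption, chosen because it is a common option for histogram features. The three histograms are made-up values.

```python
def intersection_kernel(h1, h2):
    """Histogram intersection: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def gram_matrix(histograms):
    """Pairwise kernel values between all histograms."""
    return [[intersection_kernel(a, b) for b in histograms]
            for a in histograms]


# Hypothetical normalized BoW histograms for three training images.
bird_1 = [0.5, 0.25, 0.25]
bird_2 = [0.5, 0.5, 0.0]
dog_1 = [0.0, 0.25, 0.75]

K = gram_matrix([bird_1, bird_2, dog_1])
# The two "bird" images are more similar to each other (0.75)
# than either is to the "dog" image (0.5 and 0.25).
```

A matrix like `K` can be passed to an SVM solver that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Training the classifier itself is left out of this sketch.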
Issues • Sampling strategy • Learning the codebook: size? supervised? … • Classification: which method? • Scalability: how to handle millions of data points? • How to use spatial information?
Spatial information • BoW discards spatial layout. • This increases invariance to scale, translation, and deformation, • but sacrifices discriminative power, especially when the spatial layout is important. Slide adapted from Bill Freeman
Spatial pyramid matching • Compute BoW histograms for image regions at different locations and at various scales. Figure credit: Svetlana Lazebnik
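A rough sketch of the pooling idea: at each pyramid level the image is split into a finer grid, a BoW histogram is computed per cell, and all histograms are concatenated. The sketch assumes each feature already carries a visual-word index and a position normalized to [0, 1). It uses raw counts and omits the per-level weights that some formulations of spatial pyramid matching apply.

```python
def spatial_pyramid(features, vocab_size, levels=2):
    """Concatenate per-cell BoW counts over grids 1x1, 2x2, ..., 2^levels x 2^levels.

    `features` is a list of (x, y, word) with x, y in [0, 1).
    """
    pyramid = []
    for level in range(levels + 1):
        cells = 2 ** level  # grid is cells x cells at this level
        hists = [[0] * vocab_size for _ in range(cells * cells)]
        for x, y, word in features:
            # Clamp so that a coordinate of exactly 1.0 stays in the last cell.
            col = min(int(x * cells), cells - 1)
            row = min(int(y * cells), cells - 1)
            hists[row * cells + col][word] += 1
        for cell_hist in hists:
            pyramid.extend(cell_hist)
    return pyramid


# Hypothetical features: (x, y, visual-word index) with a 3-word vocabulary.
features = [(0.1, 0.1, 0), (0.2, 0.3, 1), (0.8, 0.2, 0), (0.6, 0.9, 2)]
vector = spatial_pyramid(features, vocab_size=3, levels=1)
# Level 0 (whole image): [2, 1, 1]
# Level 1 (2x2 grid):    [1, 1, 0], [1, 0, 0], [0, 0, 0], [0, 0, 1]
```

The first `vocab_size` entries are the ordinary whole-image BoW counts. The remaining entries add the coarse layout information that plain BoW discards.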
A common pipeline for discriminative image classification using BoW • Dictionary learning: dense/sparse SIFT → k-means → dictionary • Image classification: dense/sparse SIFT → VQ coding → spatial pyramid pooling → nonlinear SVM
Combining multiple descriptors • Multiple feature detectors → multiple descriptors (SIFT, shape, color, …) → VQ coding and spatial pooling → nonlinear SVM. Diagram from SurreyUVA_SRKDA, the winning team in PASCAL VOC 2008
Topic models for images • Latent Dirichlet Allocation (LDA) [Figure: graphical model with variables c, z, w and plates N, D; example category “beach”] Fei-Fei et al. ICCV 2005. Slide credit: Fei-Fei Li
Part-based Model Rob Fergus ICCV09 Tutorial Fischler & Elschlager 1973
For comprehensive coverage of object categorization models, see the short course Recognizing and Learning Object Categories by Li Fei-Fei (Stanford), Rob Fergus (NYU), and Antonio Torralba (MIT): http://people.csail.mit.edu/torralba/shortCourseRLOC/