Recognizing Natural Scenes with SIFT and Spatial Pyramid Matching

SIFT (Lowe 99) & Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories (Lazebnik et al. 2006) (various slides stolen from the web)

Scale-Invariant Feature Transform SIFT [Lowe] • Generates image features, “keypoints” • invariant to image scaling and rotation • partially invariant to change in illumination and 3D camera viewpoint • many can be extracted from typical images • highly distinctive

Algorithm Stages • Scale-space Extrema Detection • Uses difference-of-Gaussian function • Keypoint Localization • Sub-pixel location and scale fit to a model • Orientation assignment • 1 or more for each keypoint • Keypoint descriptor • Created from local image gradients

Scale Space

Difference of Gaussian Pyramid • Blur & Resample • [Figure: blurred images labeled A and B]

Difference of Gaussian Pyramid • A - B
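The blur-and-subtract step can be sketched in plain Python as follows. This is a minimal illustration, not Lowe's implementation: the function names, the default values `sigma=1.6` and `k=sqrt(2)`, the border clamping, and the choice of which blur level plays the role of A are all assumptions made for the example, and the resampling between octaves is left out.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel, truncated at roughly 3 sigma (assumed cutoff)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(image, sigma):
    """Separable Gaussian blur of a 2D list of floats; borders are clamped."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])

    def clamp(v, size):
        return min(max(v, 0), size - 1)

    # Horizontal pass.
    rows = [[sum(kernel[k + radius] * image[y][clamp(x + k, width)]
                 for k in range(-radius, radius + 1))
             for x in range(width)] for y in range(height)]
    # Vertical pass on the horizontally blurred rows.
    return [[sum(kernel[k + radius] * rows[clamp(y + k, height)][x]
                 for k in range(-radius, radius + 1))
             for x in range(width)] for y in range(height)]

def difference_of_gaussians(image, sigma=1.6, k=2 ** 0.5):
    """Blur at two nearby scales and subtract pixel by pixel (A - B).

    Here A is the less blurred image and B the more blurred one; that
    ordering is an assumption of this sketch.
    """
    a = blur(image, sigma)
    b = blur(image, k * sigma)
    return [[pa - pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]
```

A flat image gives a response of zero everywhere, while an isolated bright pixel gives a positive response at its location under this sign convention.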

Extrema Detection • A keypoint must be a minimum or maximum relative to its 8 neighbors at its own scale and the 9 neighbors in each of the scales above and below (26 neighbors in total).
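The 26-neighbor test can be written down directly. The sketch below assumes the difference-of-Gaussian images are stored as a nested list `dog[scale][y][x]`; the function names and the use of a strict comparison are choices made for this example.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly larger or strictly smaller than all
    26 neighbors: 8 at the same scale, 9 at the scale above, 9 below."""
    center = dog[s][y][x]
    neighbors = [dog[s + ds][y + dy][x + dx]
                 for ds in (-1, 0, 1)
                 for dy in (-1, 0, 1)
                 for dx in (-1, 0, 1)
                 if (ds, dy, dx) != (0, 0, 0)]
    return center > max(neighbors) or center < min(neighbors)

def find_extrema(dog):
    """Scan all interior positions of the stack and collect (scale, y, x) extrema."""
    found = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[s]) - 1):
            for x in range(1, len(dog[s][y]) - 1):
                if is_extremum(dog, s, y, x):
                    found.append((s, y, x))
    return found
```

Only interior positions are tested, since points on the border of the image or at the first and last scale do not have a full set of neighbors.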

Extrema Detection

Keypoint Localization and Refinement • Refine the keypoint/extremum position by fitting a 3D quadratic model to get subpixel accuracy in x, y position and scale. • Throw out points that have low contrast. • Remove points that are too “edgy” (strong response along an edge, but poorly localized).
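The two rejection tests can be sketched as below. This is a simplified illustration: it skips the 3D quadratic fit and tests the raw sample value rather than the interpolated one. The function name and the default thresholds (`contrast_threshold=0.03` for intensities in [0, 1], `edge_ratio=10`) are assumptions for the example, and the edge test uses the ratio of the squared trace to the determinant of the 2x2 Hessian, estimated with finite differences.

```python
def passes_contrast_and_edge_tests(dog, y, x,
                                   contrast_threshold=0.03, edge_ratio=10.0):
    """dog is one difference-of-Gaussian image (2D list); (y, x) is interior."""
    value = dog[y][x]
    # Low-contrast rejection (simplified: uses the raw sample value).
    if abs(value) < contrast_threshold:
        return False
    # Finite-difference estimate of the 2x2 Hessian at (y, x).
    dxx = dog[y][x + 1] - 2.0 * value + dog[y][x - 1]
    dyy = dog[y + 1][x] - 2.0 * value + dog[y - 1][x]
    dxy = (dog[y + 1][x + 1] - dog[y + 1][x - 1]
           - dog[y - 1][x + 1] + dog[y - 1][x - 1]) / 4.0
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    # Curvatures of opposite sign (or a flat direction) are rejected outright.
    if det <= 0:
        return False
    # Edge rejection: large ratio of principal curvatures means "too edgy".
    return trace * trace / det < (edge_ratio + 1.0) ** 2 / edge_ratio
```

An isolated blob has similar curvature in both directions and passes; a ridge has curvature in only one direction and is rejected.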

Keypoint Localization and Refinement

Orientation Assignment • Create histogram of local gradient directions computed at selected scale • Assign canonical orientation at peak of smoothed histogram • Each keypoint specifies stable 2D coordinates (x, y, scale, orientation)
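A rough sketch of the orientation step is shown below. It assumes a small grayscale patch already taken at the keypoint's scale, uses 36 bins of 10 degrees, and smooths the histogram with a circular (1, 2, 1) filter. The bin count, the smoothing filter, and the function name are assumptions of this example; the Gaussian weighting of samples and the option of assigning more than one orientation per keypoint are omitted.

```python
import math

def dominant_orientation(patch, num_bins=36):
    """Return the center (in degrees) of the peak bin of the smoothed
    gradient-direction histogram of a 2D list of gray values."""
    bin_width = 360.0 / num_bins
    hist = [0.0] * num_bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            # Central-difference gradient at (y, x).
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            # Each sample votes for its direction bin, weighted by magnitude.
            hist[int(angle / bin_width) % num_bins] += magnitude
    # Circular smoothing with weights (1, 2, 1) / 4.
    smoothed = [(hist[i - 1] + 2.0 * hist[i] + hist[(i + 1) % num_bins]) / 4.0
                for i in range(num_bins)]
    peak = max(range(num_bins), key=lambda i: smoothed[i])
    return (peak + 0.5) * bin_width
```

For a patch whose intensity increases from left to right, all gradients point along the x axis and the peak falls in the first bin.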

Example from paper

SIFT Descriptor • Try to mimic complex cells in the visual cortex • Selective to spatial frequency and orientation but allows for shifts in position • Be robust to small affine transformations • Local affine transformations affect positions more than orientation and spatial frequency.

SIFT Descriptor • Thresholded image gradients are sampled over 16x16 array of locations at keypoint scale • Create array of orientation histograms rotated relative to orientation of keypoint. • 8 orientations x 4x4 histogram array = 128 dimensions • Distribute each sample to adjacent bins by trilinear interpolation (avoids boundary effects)
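The 4x4x8 layout can be illustrated with a simplified descriptor. The sketch assumes the 16x16 gradient magnitudes and angles have already been sampled at the keypoint scale and rotated relative to the keypoint orientation. It uses hard binning instead of trilinear interpolation, and it reads "thresholded" as clipping the normalized vector at 0.2 and renormalizing; both of these, along with the function name, are assumptions of the example.

```python
import math

def descriptor_from_gradients(magnitudes, angles_deg, clip=0.2):
    """Build a 128-dimensional descriptor from 16x16 gradient samples.

    magnitudes, angles_deg: 16x16 nested lists; angles are in degrees,
    measured relative to the keypoint orientation.
    """
    desc = [0.0] * 128
    for y in range(16):
        for x in range(16):
            cell = (y // 4) * 4 + (x // 4)          # which of the 4x4 cells
            obin = int((angles_deg[y][x] % 360.0) / 45.0) % 8  # 8 orientations
            desc[cell * 8 + obin] += magnitudes[y][x]
    # Normalize, clip large entries, and normalize again.
    norm = math.sqrt(sum(v * v for v in desc)) or 1.0
    desc = [min(v / norm, clip) for v in desc]
    norm = math.sqrt(sum(v * v for v in desc)) or 1.0
    return [v / norm for v in desc]
```

With hard binning, a sample near a cell or orientation boundary can jump between bins under a small shift, which is the boundary effect the interpolation in the slide is meant to avoid.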

3D object recognition example from paper

SIFT Review • Generates image features, “keypoints” • invariant to image scaling and rotation • partially invariant to change in illumination and 3D camera viewpoint • many can be extracted from typical images • Each “keypoint” has an associated descriptor that is • relative to keypoint orientation and scale • robust to small affine transformations.

SIFT Review • Note: • We can skip the keypoint detection. • Pick a grid over the image and make a descriptor for each grid point. • Fei-Fei and Perona (CVPR 2005) showed this works better for scene classification.
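Generating the grid points is straightforward. In the sketch below, the function name and the default 8-pixel spacing and 16-pixel patch size are assumptions chosen for illustration; points are kept far enough from the border that a full patch fits around each one.

```python
def dense_grid(width, height, step=8, patch_size=16):
    """Return (x, y) centers of a regular grid at which descriptors would be
    computed, skipping positions whose patch would leave the image."""
    half = patch_size // 2
    return [(x, y)
            for y in range(half, height - half + 1, step)
            for x in range(half, width - half + 1, step)]
```

Each returned point would then get a descriptor computed from the patch around it, with no scale-space extrema detection involved.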

Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories (Lazebnik et al. 2006) • Many slides borrowed from http://www.ima.umn.edu/2005-2006/W5.22-26.06/activities/Lazebnik-Svetlana/ima_poster.pdf and http://people.csail.mit.edu/kgrauman/slides/pyr_match_iccv2005.ppt

Overview • Adds “approximate global geometric correspondence” to “bag of features” techniques for scene recognition • Spatial pyramid matching partitions the image into multiscale subregions and computes feature histograms for each subregion. • Uses “weak features” (oriented edges at multiple scales) and “strong features” (vocabulary formed from gridded SIFT descriptors)

Motivation • A “pre-attentive” approach: Recognize scene as whole without examining its constituent objects.

Images as collections of features • Image as unordered set of d-dimensional feature vectors • Varying number of vectors per instance

Classifiers (hand wavy) • Training data: multiple images for each class • Image is represented by unordered set of features • We need some way to compare feature set X to feature set Y. • Some similarity function K(X,Y).

Classifiers (hand wavy) • Nearest neighbor: Input X, • find Y that maximizes K(X,Y) for all Y in the training set. • Label X with the class label for Y. • SVM: use K(X,Y) as kernel function • Inner product • Mercer Kernel
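The nearest-neighbor rule can be sketched in a few lines for any similarity function K. The function names and the toy kernel (number of shared elements between two sets) are made up for the example; any function that scores two feature sets could be passed in its place.

```python
def classify_nearest_neighbor(x, training_set, kernel):
    """training_set: list of (feature_set, label) pairs.
    Returns the label of the training example Y that maximizes kernel(x, Y)."""
    best_features, best_label = max(training_set,
                                    key=lambda pair: kernel(x, pair[0]))
    return best_label

def shared_element_kernel(x, y):
    """Toy similarity for illustration only: size of the set intersection."""
    return len(set(x) & set(y))
```

For the SVM case, the same K(X,Y) would instead fill the kernel (Gram) matrix, which is why it needs to be a Mercer kernel.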

Partial matching • Compare sets by computing a partial matching between their features.

Computing the partial matching • Earth Mover’s Distance [Rubner, Tomasi, Guibas 1998] • Hungarian method [Kuhn, 1955] • Greedy matching … • Pyramid match [Grauman and Darrell, ICCV 2005] for sets with features of dimension

Pyramid match overview • Place multi-dimensional, multi-resolution grid over point sets • Consider points matched at finest resolution where they fall into same grid cell • Approximate optimal similarity with worst-case similarity within pyramid cell • Pyramid match measures similarity of a partial matching between two sets: no explicit search for matches!

Pyramid match overview optimal partial matching

Pyramid Match • d-dimensional feature vectors • A sequence of grids at resolutions 0 … L • At level l, the grid has $2^l$ cells along each dimension, $2^{dl}$ cells in total • [Figure example: d=2, L=2]

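Pyramid match Kernel • Matches at level l include matches at level l+1 • New matches at level l (for l = 0 … L-1): $I^l - I^{l+1}$, where $I^l$ is the number of matches (histogram intersection) at level l • Penalize easy matches at larger scales with weight: $\frac{1}{2^{L-l}}$ • Match kernel: $\kappa^L(X,Y) = I^L + \sum_{l=0}^{L-1} \frac{1}{2^{L-l}} \left(I^l - I^{l+1}\right)$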
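A small sketch of the kernel for point sets, under the assumption that all coordinates are scaled to [0, 1). The number of matches at a level is taken as the histogram intersection of the two sets' cell counts, a grid at level l has 2^l cells along each dimension, and new matches found at level l are weighted by 1/2^(L-l). The function names are chosen for this example.

```python
from collections import Counter

def level_histogram(points, level):
    """Count points per cell of a grid with 2**level cells along each dimension.
    Coordinates are assumed to lie in [0, 1)."""
    cells = 2 ** level
    return Counter(tuple(min(int(c * cells), cells - 1) for c in p)
                   for p in points)

def intersection(hist_x, hist_y):
    """Histogram intersection: number of points matched within the same cell."""
    return sum(min(count, hist_y[cell]) for cell, count in hist_x.items())

def pyramid_match(points_x, points_y, num_levels):
    """Pyramid match score over levels 0..num_levels (num_levels plays the role of L)."""
    L = num_levels
    matches = [intersection(level_histogram(points_x, l),
                            level_histogram(points_y, l))
               for l in range(L + 1)]
    score = float(matches[L])            # matches at the finest level
    for l in range(L):
        # New matches at level l, down-weighted for coarser cells.
        score += (matches[l] - matches[l + 1]) / 2 ** (L - l)
    return score
```

For X = {(0.1, 0.1), (0.9, 0.9)} and Y = {(0.2, 0.2), (0.6, 0.6)} with L = 2, one pair matches at the finest level and the second pair first matches at level 1, so the score is 1 + 1/2 = 1.5.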

Vocabulary of M features • Only features of the same type can be matched. • Each channel m treated separately
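Splitting features into channels can be sketched as a nearest-codeword assignment. The example assumes a vocabulary of M codewords is already available (how it is built is not shown here), and the function name and data layout are hypothetical: each feature is a pair of an image position and a descriptor.

```python
def quantize_into_channels(features, vocabulary):
    """features: list of ((x, y), descriptor) pairs.
    vocabulary: list of M codeword vectors with the same length as a descriptor.
    Returns {m: [(x, y), ...]}: the positions of the features assigned to word m."""
    channels = {m: [] for m in range(len(vocabulary))}
    for position, descriptor in features:
        # Assign the descriptor to the closest codeword (squared Euclidean distance).
        word = min(range(len(vocabulary)),
                   key=lambda m: sum((a - b) ** 2
                                     for a, b in zip(descriptor, vocabulary[m])))
        channels[word].append(position)
    return channels
```

After this step each channel holds only 2D positions, so features of different types are never matched against each other.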

Vocabulary of M features

Spatial pyramid representation • d=2 (x,y) • M classes of features
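One way to realize this representation is a single long vector: for every level and every channel, a histogram over the 2D grid cells, pre-multiplied by the level weight, so that a plain histogram intersection of two such vectors gives the summed pyramid match over all channels. The sketch assumes positions are scaled to [0, 1) and uses weight 1/2^L for level 0 and 1/2^(L-l+1) for levels l >= 1; the function names and the dictionary layout of `channels` are hypothetical.

```python
def spatial_pyramid_vector(channels, num_channels, num_levels):
    """channels: {m: [(x, y), ...]} with x, y in [0, 1).
    Returns the weighted, concatenated per-level, per-channel histograms."""
    L = num_levels
    vector = []
    for level in range(L + 1):
        cells = 2 ** level
        weight = 1.0 / 2 ** L if level == 0 else 1.0 / 2 ** (L - level + 1)
        for m in range(num_channels):
            hist = [0.0] * (cells * cells)
            for x, y in channels.get(m, []):
                col = min(int(x * cells), cells - 1)
                row = min(int(y * cells), cells - 1)
                hist[row * cells + col] += weight
            vector.extend(hist)
    return vector

def histogram_intersection(u, v):
    """Sum of element-wise minima of two equal-length vectors."""
    return sum(min(a, b) for a, b in zip(u, v))
```

The vector has M(4^(L+1) - 1)/3 entries, for example 42 for M = 2 and L = 2, and most of them are zero for a typical image.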

Feature Extraction

Experimental Results

Scene Category Dataset

Scene Category Retrieval

Scene Category Confusion

Caltech 101

Caltech 101 Comparison

Caltech 101 Challenges

Graz
