Recognizing Natural Scenes with SIFT and Spatial Pyramid Matching

Explore two image recognition techniques: SIFT (Scale-Invariant Feature Transform) and Spatial Pyramid Matching. Learn how SIFT generates distinctive image keypoints that are invariant to scaling and rotation and partially invariant to illumination changes, while Spatial Pyramid Matching improves bag-of-features matching by computing feature histograms over multiscale subregions of the image. Discover how these methods improve scene recognition through robustness to transformations and approximate geometric correspondence.

griffithp
Presentation Transcript


  1. SIFT (Lowe 99) & Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories (Lazebnik et al. 2006) (various slides stolen from the web)

  2. Scale-Invariant Feature Transform (SIFT) [Lowe] • Generates image features, “keypoints” • invariant to image scaling and rotation • partially invariant to change in illumination and 3D camera viewpoint • many can be extracted from typical images • highly distinctive

  3. Algorithm Stages • Scale-space Extrema Detection • Uses difference-of-Gaussian function • Keypoint Localization • Sub-pixel location and scale fit to a model • Orientation assignment • 1 or more for each keypoint • Keypoint descriptor • Created from local image gradients

  4. Scale Space

  5. Difference of Gaussian Pyramid: Blur & Resample

  6. Difference of Gaussian Pyramid: A - B (difference of adjacent blurred images A and B)
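
A minimal NumPy sketch of one way to build the difference-of-Gaussian pyramid from slides 4 to 6: each octave holds images blurred by successively larger sigma, adjacent pairs are subtracted, and the image blurred by twice the base sigma is resampled (every second pixel) to start the next octave. The values sigma0 = 1.6 and s = 3 intervals per octave follow Lowe's 2004 paper and are assumptions here, as is treating the input image as already blurred by sigma0; all function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3 sigma, normalized to sum to 1."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(image, sigma):
    """Separable Gaussian blur; edge padding keeps the output shape equal to the input."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(image, r, mode="edge")
    conv = lambda v: np.convolve(v, k, mode="valid")
    rows = np.apply_along_axis(conv, 1, padded)   # blur along x
    return np.apply_along_axis(conv, 0, rows)     # blur along y

def dog_pyramid(image, num_octaves=3, s=3, sigma0=1.6):
    """Return a list of octaves; each octave is a list of s + 2 DoG images."""
    k = 2 ** (1.0 / s)
    base = np.asarray(image, dtype=float)   # assumption: input already has blur sigma0
    pyramid = []
    for _ in range(num_octaves):
        gaussians = [base]
        for i in range(1, s + 3):
            # Extra blur that takes the base from sigma0 to sigma0 * k**i
            gaussians.append(blur(base, sigma0 * np.sqrt(k ** (2 * i) - 1)))
        # Difference of adjacent blurred images (the later one is blurred more)
        pyramid.append([b - a for a, b in zip(gaussians, gaussians[1:])])
        # gaussians[s] has blur 2 * sigma0; halving the resolution brings it back to sigma0
        base = gaussians[s][::2, ::2]
    return pyramid
```

Blurring the octave base incrementally (rather than blurring already-blurred images again) keeps each level's total blur exactly sigma0 * k**i.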

  7. Extrema Detection • A keypoint must be a minimum or maximum relative to its 8 neighbors at its own scale and the 9 neighbors in each of the adjacent scales above and below (26 neighbors in total).
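
The 26-neighbor test can be sketched as follows, assuming `dogs` is one octave of DoG images stored as equally sized NumPy arrays; the function names are hypothetical. A sample is kept only if it is strictly larger, or strictly smaller, than every neighbor in the 3x3x3 cube around it.

```python
import numpy as np

def is_extremum(dogs, s, y, x):
    """True if dogs[s][y, x] is strictly above or strictly below all 26 neighbors."""
    cube = np.stack([d[y - 1:y + 2, x - 1:x + 2] for d in dogs[s - 1:s + 2]])  # 3x3x3
    center = cube[1, 1, 1]
    neighbors = np.delete(cube.ravel(), 13)   # flat index 13 is the cube's center
    return bool(np.all(center > neighbors) or np.all(center < neighbors))

def find_extrema(dogs):
    """Scan one octave of DoG images; the first and last scales only serve as neighbors."""
    h, w = dogs[0].shape
    return [(s, y, x)
            for s in range(1, len(dogs) - 1)
            for y in range(1, h - 1)
            for x in range(1, w - 1)
            if is_extremum(dogs, s, y, x)]
```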

  8. Extrema Detection

  9. Keypoint Localization and Refinement • Refine each keypoint/extremum position by fitting a 3D quadratic model to get subpixel accuracy in x, y position and scale. • Throw out points that have low contrast. • Remove points that are too “edgy” (poorly localized along an edge).
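
A sketch of the refinement step, following the formulation in Lowe's 2004 journal paper: D(x, y, sigma) is expanded to second order around the sample, the extremum's offset is x_hat = -H^-1 grad D, and the interpolated value there is D + 0.5 * grad D · x_hat. Points with |D(x_hat)| < 0.03 are treated as low contrast, and points whose 2x2 spatial Hessian has Tr^2 / Det >= (r + 1)^2 / r with r = 10 are treated as edge-like. Those thresholds are Lowe's published defaults and assume pixel values in [0, 1]; the function name is hypothetical, and the sketch omits moving to a neighboring sample when an offset component exceeds 0.5 and does not guard against a singular Hessian.

```python
import numpy as np

def refine_keypoint(dogs, s, y, x, contrast_thresh=0.03, edge_ratio=10.0):
    """Fit a 3D quadratic to D(x, y, sigma) around sample (s, y, x) of one DoG octave.

    Returns (offset, value) with offset = (dx, dy, dscale), or None if the point
    is rejected for low contrast or for being edge-like.
    """
    def D(ds, dy, dx):
        return dogs[s + ds][y + dy, x + dx]

    # Gradient by central differences, ordered (x, y, scale)
    g = 0.5 * np.array([D(0, 0, 1) - D(0, 0, -1),
                        D(0, 1, 0) - D(0, -1, 0),
                        D(1, 0, 0) - D(-1, 0, 0)])
    c = D(0, 0, 0)
    dxx = D(0, 0, 1) - 2 * c + D(0, 0, -1)
    dyy = D(0, 1, 0) - 2 * c + D(0, -1, 0)
    dss = D(1, 0, 0) - 2 * c + D(-1, 0, 0)
    dxy = 0.25 * (D(0, 1, 1) - D(0, 1, -1) - D(0, -1, 1) + D(0, -1, -1))
    dxs = 0.25 * (D(1, 0, 1) - D(1, 0, -1) - D(-1, 0, 1) + D(-1, 0, -1))
    dys = 0.25 * (D(1, 1, 0) - D(1, -1, 0) - D(-1, 1, 0) + D(-1, -1, 0))
    H = np.array([[dxx, dxy, dxs],
                  [dxy, dyy, dys],
                  [dxs, dys, dss]])
    offset = -np.linalg.solve(H, g)        # extremum of the quadratic: -H^-1 * grad
    value = c + 0.5 * g.dot(offset)        # interpolated D at the extremum
    if abs(value) < contrast_thresh:       # low contrast
        return None
    tr, det = dxx + dyy, dxx * dyy - dxy ** 2   # trace/determinant of 2x2 spatial Hessian
    if det <= 0 or tr ** 2 / det >= (edge_ratio + 1) ** 2 / edge_ratio:
        return None                        # too "edgy": principal curvatures too unequal
    return offset, value
```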

  10. Keypoint Localization and Refinement

  11. Keypoint Localization and Refinement

  12. Orientation Assignment • Create histogram of local gradient directions computed at selected scale • Assign canonical orientation at peak of smoothed histogram • Each keypoint specifies stable 2D coordinates (x, y, scale, orientation)
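
A simplified sketch of orientation assignment, assuming `patch` is a small NumPy array taken from the Gaussian-blurred image around the keypoint at its selected scale. Following Lowe's 2004 paper it uses 36 bins of 10 degrees, weights each vote by gradient magnitude, and keeps every local peak within 80% of the highest one, so a keypoint can receive more than one orientation. Unlike Lowe, it skips the Gaussian weighting window and parabolic peak interpolation, and the [1, 2, 1] smoothing kernel is an arbitrary choice; the function name is hypothetical.

```python
import numpy as np

def dominant_orientations(patch, num_bins=36, peak_ratio=0.8):
    """Angles (radians, in [0, 2*pi)) of orientation-histogram peaks for one keypoint."""
    gy, gx = np.gradient(np.asarray(patch, dtype=float))   # derivatives along rows, cols
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    bins = (ang / (2 * np.pi) * num_bins).astype(int) % num_bins
    # Each sample votes for its direction bin, weighted by gradient magnitude
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=num_bins)
    # Circular smoothing with a [1, 2, 1] / 4 kernel
    hist = (np.roll(hist, 1) + 2 * hist + np.roll(hist, -1)) / 4.0
    peaks = [i for i in range(num_bins)
             if hist[i] >= peak_ratio * hist.max()
             and hist[i] > hist[i - 1]
             and hist[i] > hist[(i + 1) % num_bins]]
    # Report the center angle of each peak bin
    return [(i + 0.5) * 2 * np.pi / num_bins for i in peaks]
```

For a patch whose intensity increases left to right, every gradient points along +x, so the single peak is the first bin (centered at 5 degrees).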

  13. Example from paper

  14. SIFT Descriptor • Try to mimic complex cells in the visual cortex • Selective for spatial frequency and orientation, but allow for shifts in position • Be robust to small affine transformations • Local affine transformations affect positions more than orientation and spatial frequency.

  15. SIFT Descriptor • Thresholded image gradients are sampled over 16x16 array of locations at keypoint scale • Create array of orientation histograms rotated relative to orientation of keypoint. • 8 orientations x 4x4 histogram array = 128 dimensions • Distribute each sample to adjacent bins by trilinear interpolation (avoids boundary effects)
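
A simplified sketch of the 128-dimensional descriptor, assuming a 16x16 NumPy patch centered on the keypoint at its scale. Gradient magnitudes are accumulated into a 4x4 grid of cells with 8 orientation bins each, orientations are measured relative to the keypoint orientation, and the vector is normalized, clipped at 0.2 and renormalized (the 0.2 threshold is the value from Lowe's 2004 paper). For brevity it uses hard binning instead of trilinear interpolation, omits Gaussian weighting, and does not rotate the sampling grid itself; the function name is hypothetical.

```python
import numpy as np

def sift_descriptor(patch, keypoint_angle=0.0):
    """Simplified SIFT-style descriptor: 4x4 cells x 8 orientations = 128 values."""
    patch = np.asarray(patch, dtype=float)
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    # Orientation relative to the keypoint's canonical orientation
    ang = np.mod(np.arctan2(gy, gx) - keypoint_angle, 2 * np.pi)
    obin = (ang / (2 * np.pi) * 8).astype(int) % 8
    desc = np.zeros((4, 4, 8))
    for y in range(16):
        for x in range(16):
            # Hard binning: each sample goes to one spatial cell and one orientation bin
            desc[y // 4, x // 4, obin[y, x]] += mag[y, x]
    v = desc.ravel()
    v = v / (np.linalg.norm(v) + 1e-12)      # normalize (illumination contrast)
    v = np.minimum(v, 0.2)                   # threshold large gradient contributions
    return v / (np.linalg.norm(v) + 1e-12)   # renormalize
```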

  16. 3D object recognition example from paper

  17. SIFT Review • Generates image features, “keypoints” • invariant to image scaling and rotation • partially invariant to change in illumination and 3D camera viewpoint • many can be extracted from typical images • Each “keypoint” has an associated descriptor that is • relative to keypoint orientation and scale • robust to small affine transformations.

  18. SIFT Review • Note: • We can skip the keypoint detection. • Pick a grid over the image and compute a descriptor for each grid point. • Fei-Fei and Perona (CVPR 2005) showed this works better for scene classification.
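
Dense sampling replaces keypoint detection with a regular grid of patch positions; each position then gets a descriptor computed from its patch. A minimal sketch: the defaults (16x16 patches every 8 pixels) are assumed for illustration, and the function name is hypothetical.

```python
def dense_grid(height, width, patch_size=16, step=8):
    """Top-left (y, x) corners of patch_size x patch_size patches on a regular grid
    that fit entirely inside a height x width image."""
    return [(y, x)
            for y in range(0, height - patch_size + 1, step)
            for x in range(0, width - patch_size + 1, step)]
```

With overlapping patches (step smaller than patch size), every image region is covered by several descriptors regardless of whether it contains a detectable keypoint.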

  19. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories (Lazebnik et al. 2006). Many slides borrowed from http://www.ima.umn.edu/2005-2006/W5.22-26.06/activities/Lazebnik-Svetlana/ima_poster.pdf and http://people.csail.mit.edu/kgrauman/slides/pyr_match_iccv2005.ppt

  20. Overview • Adds “approximate global geometric correspondence” to “bag of features” techniques for scene recognition • Spatial pyramid matching partitions the image into multiscale subregions and computes feature histograms. • Uses “weak features” (oriented edges at multiple scales) and “strong features” (a vocabulary formed by gridded SIFT descriptors)
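
The “strong features” need a visual vocabulary: descriptors are clustered, and each descriptor is replaced by the index of its nearest cluster center (its visual word). A minimal sketch assuming plain k-means with a fixed number of iterations; Lazebnik et al. report building their vocabulary with k-means on a random subset of training patches, but the function names, initialization and iteration count here are illustrative assumptions.

```python
import numpy as np

def quantize(descriptors, centers):
    """Index of the nearest visual word (squared Euclidean distance) for each descriptor."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(data, k, iters=20, seed=0):
    """Plain Lloyd iterations starting from k distinct random data points."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = quantize(data, centers)
        for j in range(k):
            members = data[labels == j]
            if len(members):          # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers
```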

  21. Motivation • A “pre-attentive” approach: Recognize scene as whole without examining its constituent objects.

  22. Images as collections of features • Image as unordered set of d-dimensional feature vectors • Varying number of vectors per instance

  23. Classifiers (hand wavy) • Training data: multiple images for each class • Image is represented by unordered set of features • We need some way to compare feature set X to feature set Y. • Some similarity function K(X,Y).

  24. Classifiers (hand wavy) • Nearest neighbor: given input X, • find the Y in the training set that maximizes K(X,Y), • label X with the class label of Y. • SVM: use K(X,Y) as the kernel function • Inner product • Mercer kernel
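
A minimal sketch of the nearest-neighbor rule with a pluggable similarity K(X, Y), where each image is an unordered list of visual-word indices. The toy similarity `shared_words` (histogram intersection of word counts) and both function names are hypothetical illustrations, not the kernel used in the papers. Histogram intersection is also a Mercer kernel, so the same K could be passed to an SVM as a precomputed kernel.

```python
from collections import Counter

def shared_words(X, Y):
    """Toy similarity: histogram intersection of visual-word counts."""
    cx, cy = Counter(X), Counter(Y)
    return sum(min(n, cy[w]) for w, n in cx.items())

def nearest_neighbor_label(X, training_set, K):
    """training_set holds (feature_set, label) pairs; return the label of the
    training set Y that maximizes K(X, Y)."""
    _, label = max(training_set, key=lambda item: K(X, item[0]))
    return label
```

For example, with training sets `[1, 1, 2]` labeled "forest" and `[5, 6, 6]` labeled "coast", the query `[6, 6, 7]` shares two words with the coast image and none with the forest image, so it is labeled "coast".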

  25. Partial matching • Compare sets by computing a partial matching between their features.

  26. Computing the partial matching • Earth Mover’s Distance [Rubner, Tomasi, Guibas 1998] • Hungarian method [Kuhn, 1955] • Greedy matching … • Pyramid match [Grauman and Darrell, ICCV 2005] for sets with features of dimension
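
Greedy matching can be sketched as repeatedly pairing the closest remaining points. This sketch assumes L1 distance and that X has no more points than Y, so every point of X is matched and the leftover points of Y stay unmatched (a partial matching). Unlike the Hungarian method, greedy matching is not guaranteed to find the minimum-cost matching; the function name is hypothetical.

```python
import numpy as np

def greedy_partial_match(X, Y):
    """Return (total L1 cost, list of (i, j) pairs) for a greedy partial matching."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    assert len(X) <= len(Y), "sketch assumes X is the smaller set"
    dist = np.abs(X[:, None, :] - Y[None, :, :]).sum(axis=2)   # pairwise L1 distances
    pairs, cost = [], 0.0
    for _ in range(len(X)):
        i, j = np.unravel_index(np.argmin(dist), dist.shape)   # closest remaining pair
        pairs.append((int(i), int(j)))
        cost += dist[i, j]
        dist[i, :] = np.inf   # X[i] is used
        dist[:, j] = np.inf   # Y[j] is used
    return cost, pairs
```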

  27. Pyramid match overview • Place a multi-dimensional, multi-resolution grid over the point sets • Consider points matched at the finest resolution at which they fall into the same grid cell • Approximate the optimal similarity with the worst-case similarity within a pyramid cell • The pyramid match measures the similarity of a partial matching between two sets: no explicit search for matches!

  28. Pyramid match overview (figure: optimal partial matching)

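  29. Pyramid Match • d-dimensional feature vectors • A sequence of grids at resolutions $0, \dots, L$ • At level $l$, the grid has $2^l$ cells along each dimension, for a total of $D = 2^{dl}$ cells • Example shown: d=2, L=2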

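  30. Pyramid match Kernel • Matches at level $l$ are counted by histogram intersection: $I^l = \sum_i \min(H_X^l(i), H_Y^l(i))$ • Matches at level $l$ include matches at level $l+1$ • New matches at level $l$ (for $l = 0, \dots, L-1$): $I^l - I^{l+1}$ • Penalize easy matches at larger scales with weight $\frac{1}{2^{L-l}}$ • Match kernel: $\kappa^L(X,Y) = I^L + \sum_{l=0}^{L-1} \frac{1}{2^{L-l}}\left(I^l - I^{l+1}\right) = \frac{1}{2^L} I^0 + \sum_{l=1}^{L} \frac{1}{2^{L-l+1}} I^l$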
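
A sketch of the pyramid match kernel for d-dimensional feature vectors, assuming every coordinate has been scaled into [0, 1) so that level l is a grid with 2^l cells per dimension. Matches at each level are counted with histogram intersection, the new matches at level l are I^l - I^(l+1), and they are weighted by 1/2^(L-l) as in Lazebnik et al.; the function names are hypothetical. Sparse `Counter` histograms avoid allocating all 2^(dl) cells.

```python
import numpy as np
from collections import Counter

def level_histogram(points, level):
    """Sparse histogram over [0, 1)^d: number of points per cell, 2**level cells per dimension."""
    cells = np.floor(np.asarray(points, dtype=float) * 2 ** level).astype(int)
    return Counter(map(tuple, cells))

def intersection(hx, hy):
    """Histogram intersection: sum over cells of min(count in X, count in Y)."""
    return sum(min(c, hy[cell]) for cell, c in hx.items())

def pyramid_match_kernel(X, Y, L):
    """kappa^L = I^L + sum_{l=0}^{L-1} (I^l - I^(l+1)) / 2^(L-l)."""
    I = [intersection(level_histogram(X, l), level_histogram(Y, l)) for l in range(L + 1)]
    return I[L] + sum((I[l] - I[l + 1]) / 2 ** (L - l) for l in range(L))
```

For example, with L = 2 the single points (0.1, 0.1) and (0.4, 0.4) share a cell at levels 0 and 1 but not at level 2, so they first match at level 1 and contribute 1/2; the points (0.1, 0.1) and (0.9, 0.9) only match at level 0 and contribute 1/4.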

  31. Vocabulary of M features • Only features of the same type can be matched. • Each channel m treated separately

  32. Vocabulary of M features

  33. Spatial pyramid representation • d = 2 (image coordinates x, y) • M classes (channels) of features
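
A sketch of the spatial pyramid representation: feature positions (d = 2) are binned separately for each of the M visual words, each level's histograms are scaled by the pyramid match weights (1/2^L at level 0 and 1/2^(L-l+1) at level l >= 1), and everything is concatenated into one vector of length M(4^(L+1) - 1)/3. Lazebnik et al. note that histogram intersection of these weighted vectors equals the sum over channels of the per-channel pyramid match kernels. The function names and argument layout are hypothetical, and positions are assumed to be non-negative pixel coordinates.

```python
import numpy as np

def spatial_pyramid(xs, ys, words, width, height, M, L=2):
    """Weighted, concatenated spatial-pyramid histogram.
    xs, ys: feature positions in pixels; words: visual-word index (channel) of each feature."""
    xs, ys, words = (np.asarray(a) for a in (xs, ys, words))
    parts = []
    for l in range(L + 1):
        n = 2 ** l                                    # n x n grid of subregions at level l
        weight = 1.0 / 2 ** L if l == 0 else 1.0 / 2 ** (L - l + 1)
        cx = np.minimum((xs * n / width).astype(int), n - 1)
        cy = np.minimum((ys * n / height).astype(int), n - 1)
        hist = np.zeros((M, n, n))
        np.add.at(hist, (words, cy, cx), 1)           # count features per channel and cell
        parts.append(weight * hist.ravel())
    return np.concatenate(parts)

def histogram_intersection(a, b):
    """Kernel value between two images' spatial-pyramid vectors."""
    return np.minimum(a, b).sum()
```

Because the weights over all levels sum to 1 per matched feature, the self-similarity of an image equals its number of features.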

  34. Feature Extraction

  35. Experimental Results

  36. Scene Category Dataset

  37. Scene Category Retrieval

  38. Scene Category Confusion

  39. Caltech 101

  40. Caltech 101 Comparison

  41. Caltech 101 Challenges

  42. Graz
