Explore the powerful image recognition techniques of SIFT (Scale-Invariant Feature Transform) and Spatial Pyramid Matching. Learn how SIFT generates distinct image keypoints invariant to scaling, rotation, and illumination changes, while Spatial Pyramid Matching enhances feature matching with multiscale subregion histograms. Discover how these methods revolutionize scene recognition with their robustness to transformations and geometric correspondence.
SIFT (Lowe 99) & Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories (Lazebnik et al. 2006) (various slides stolen from the web)
Scale-Invariant Feature Transform SIFT [Lowe] • Generates image features, “keypoints” • invariant to image scaling and rotation • partially invariant to change in illumination and 3D camera viewpoint • many can be extracted from typical images • highly distinctive
Algorithm Stages • Scale-space Extrema Detection • Uses difference-of-Gaussian function • Keypoint Localization • Sub-pixel location and scale fit to a model • Orientation assignment • 1 or more for each keypoint • Keypoint descriptor • Created from local image gradients
Difference of Gaussian Pyramid • Blur & resample • (Figure: difference-of-Gaussian pyramid, with panels labeled A and B)
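The blur-and-subtract step can be sketched on a 1D signal to keep the code short. This is an illustration only, not Lowe's implementation: the function names are hypothetical, and the defaults (`sigma0 = 1.6`, three scales per octave, `s + 3` blurred copies) are assumptions that do not come from these slides.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel, truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, sigma):
    """Convolve with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

def dog_octave(signal, sigma0=1.6, scales_per_octave=3):
    """Difference-of-Gaussian levels for one octave (assumed parameters)."""
    k = 2 ** (1.0 / scales_per_octave)
    blurred = [blur(signal, sigma0 * k ** i)
               for i in range(scales_per_octave + 3)]
    # Each DoG level is the difference of two adjacent blur levels.
    return [[b - a for a, b in zip(lo, hi)]
            for lo, hi in zip(blurred, blurred[1:])]

def downsample(signal):
    """Resample step: keep every second sample for the next octave."""
    return signal[::2]
```

A constant signal produces DoG levels that are zero up to rounding, which is a quick sanity check that the kernels are normalized.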
Extrema Detection • A keypoint must be a minimum or maximum among its 8 neighbors at its own scale and the 9 neighbors in the scale above and the 9 in the scale below (26 neighbors in total).
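The 26-neighbor test can be sketched as follows, assuming the DoG stack is stored as nested lists indexed `dog[scale][y][x]` (a layout chosen here for illustration). Border samples and the first and last scale are skipped because they lack a full neighborhood.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly above or below all 26 neighbors."""
    center = dog[s][y][x]
    neighbors = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return center > max(neighbors) or center < min(neighbors)

def find_extrema(dog):
    """Return (scale, y, x) for every interior extremum of the stack."""
    found = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[s]) - 1):
            for x in range(1, len(dog[s][y]) - 1):
                if is_extremum(dog, s, y, x):
                    found.append((s, y, x))
    return found
```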
Keypoint Localization and Refinement • Refine the keypoint/extremum position by fitting a 3D quadratic model, giving sub-pixel accuracy in x, y position and scale. • Throw out points that have low contrast • Remove points that are too “edgy” (strong response along an edge but poorly localized along it).
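A simplified sketch of the three checks is given below. The refinement is shown in 1D (a parabola through three samples) as a stand-in for the 3D quadratic fit, and the edge test compares the two principal curvatures through the trace and determinant of the 2x2 Hessian. The thresholds (`contrast_threshold = 0.03`, `r = 10`) are the values usually quoted from Lowe's paper and should be treated as assumptions here; all function names are hypothetical.

```python
def parabola_offset(prev, center, nxt):
    """Offset of the vertex of a parabola through samples at -1, 0, +1."""
    denom = prev - 2.0 * center + nxt
    if denom == 0:
        return 0.0  # flat neighborhood: no refinement possible
    return 0.5 * (prev - nxt) / denom

def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Reject points whose principal curvature ratio exceeds r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        return False  # curvatures of opposite sign: not a stable extremum
    return trace * trace / det < (r + 1) ** 2 / r

def keep_keypoint(value, dxx, dyy, dxy, contrast_threshold=0.03, r=10.0):
    """Keep a candidate only if it has enough contrast and is not edge-like."""
    return abs(value) >= contrast_threshold and passes_edge_test(dxx, dyy, dxy, r)
```

For samples of f(x) = -(x - 0.25)^2 taken at x = -1, 0, 1, `parabola_offset` recovers the offset 0.25.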
Orientation Assignment • Create histogram of local gradient directions computed at selected scale • Assign canonical orientation at peak of smoothed histogram • Each keypoint specifies stable 2D coordinates (x, y, scale, orientation)
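The histogram-and-peak step might look like the sketch below. It assumes 36 bins and accepts every local peak within 80% of the highest one, which is how a keypoint can receive more than one orientation; both numbers are assumptions not stated on the slide. The Gaussian weighting of samples around the keypoint is left out for brevity.

```python
import math

def orientation_histogram(gradients, num_bins=36):
    """Magnitude-weighted histogram of gradient directions.

    gradients: iterable of (dx, dy) pairs sampled around the keypoint.
    """
    hist = [0.0] * num_bins
    for dx, dy in gradients:
        magnitude = math.hypot(dx, dy)
        angle = math.degrees(math.atan2(dy, dx)) % 360.0
        hist[int(angle / 360.0 * num_bins) % num_bins] += magnitude
    return hist

def smooth(hist):
    """Circular [1, 2, 1] / 4 smoothing of the histogram."""
    n = len(hist)
    return [0.25 * hist[i - 1] + 0.5 * hist[i] + 0.25 * hist[(i + 1) % n]
            for i in range(n)]

def dominant_orientations(hist, ratio=0.8):
    """Bin-center angles (degrees) of local peaks within `ratio` of the max."""
    n = len(hist)
    peak = max(hist)
    if peak == 0:
        return []
    width = 360.0 / n
    return [(i + 0.5) * width
            for i in range(n)
            if hist[i] >= ratio * peak
            and hist[i] > hist[i - 1]
            and hist[i] > hist[(i + 1) % n]]
```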
SIFT Descriptor • Try to mimic complex cells in the visual cortex • Selective to spatial frequency and orientation but allows for shifts in position • Be robust to small affine transformations • Local affine transformations affect positions more than orientation and spatial frequency.
SIFT Descriptor • Thresholded image gradients are sampled over 16x16 array of locations at keypoint scale • Create array of orientation histograms rotated relative to orientation of keypoint. • 8 orientations x 4x4 histogram array = 128 dimensions • Distribute each sample to adjacent bins by trilinear interpolation (avoids boundary effects)
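The 8 x 4 x 4 layout can be sketched as below. The input is assumed to be a 16x16 grid of gradient magnitudes and angles already rotated relative to the keypoint orientation. Two simplifications are made: each sample goes to a single bin (no trilinear interpolation), and the thresholding is done by clamping the normalized vector at 0.2, a value that is an assumption here rather than something the slide states.

```python
import math

def normalize(vec, clamp=0.2):
    """Unit-normalize, clamp large entries, then renormalize."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    vec = [min(v / norm, clamp) for v in vec]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def sift_like_descriptor(magnitudes, angles):
    """Build a 128-dimensional descriptor from a 16x16 gradient patch.

    magnitudes, angles: 16x16 nested lists; angles are in degrees,
    measured relative to the keypoint orientation.
    """
    hist = [0.0] * 128
    for y in range(16):
        for x in range(16):
            cell = (y // 4) * 4 + (x // 4)            # one of 4x4 spatial cells
            obin = int((angles[y][x] % 360.0) / 45.0) % 8  # one of 8 orientations
            hist[cell * 8 + obin] += magnitudes[y][x]
    return normalize(hist)
```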
SIFT Review • Generates image features, “keypoints” • invariant to image scaling and rotation • partially invariant to change in illumination and 3D camera viewpoint • many can be extracted from typical images • Each “keypoint” has an associated descriptor that is • Relative to keypoint orientation and scale • Is robust to small affine transformations.
SIFT Review • Note: • We can skip the keypoint detection. • Pick a grid over the image and compute a descriptor at each grid point. • Fei-Fei and Perona (CVPR 2005) showed this works better for scene classification.
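Dense sampling replaces the detector with a fixed lattice of patch centers. A minimal sketch, where the 8-pixel spacing and 16-pixel patch size are assumed values for illustration:

```python
def dense_grid(width, height, step=8, patch=16):
    """Centers (x, y) of patches that fit entirely inside the image."""
    half = patch // 2
    return [(x, y)
            for y in range(half, height - half + 1, step)
            for x in range(half, width - half + 1, step)]
```

For a 32x32 image this yields a 3x3 lattice of centers at 8, 16 and 24 along each axis; a descriptor would then be computed at every center.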
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories (Lazebnik et al. 2006) • Many slides borrowed from http://www.ima.umn.edu/2005-2006/W5.22-26.06/activities/Lazebnik-Svetlana/ima_poster.pdf and http://people.csail.mit.edu/kgrauman/slides/pyr_match_iccv2005.ppt
Overview • Adds “approximate global geometric correspondence” to “bag of features” techniques for scene recognition • Spatial pyramid matching partitions the image into multiscale subregions and computes feature histograms in each. • Uses “weak features” (oriented edges at multiple scales) and “strong features” (vocabulary formed from gridded SIFT descriptors)
Motivation • A “pre-attentive” approach: Recognize scene as whole without examining its constituent objects.
Images as collections of features • Image as unordered set of d-dimensional feature vectors • Varying number of vectors per instance
Classifiers (hand wavy) • Training data: multiple images for each class • Image is represented by unordered set of features • We need some way to compare feature set X to feature set Y. • Some similarity function K(X,Y).
Classifiers (hand wavy) • Nearest neighbor: given input X, • find the Y in the training set that maximizes K(X,Y). • Label X with the class label for Y. • SVM: use K(X,Y) as kernel function • Inner product • Mercer Kernel
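The nearest-neighbor rule only needs a similarity function K. In the sketch below, `overlap_kernel` is a hypothetical toy similarity (number of shared features) that stands in for K so the example runs; any other set similarity could be passed instead.

```python
def overlap_kernel(X, Y):
    """Toy similarity: number of distinct features shared by X and Y."""
    return len(set(X) & set(Y))

def nearest_neighbor_label(X, training_set, kernel):
    """Label X with the class of the training example most similar to it.

    training_set: list of (feature_set, label) pairs.
    """
    _, best_label = max(training_set, key=lambda item: kernel(X, item[0]))
    return best_label
```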
Partial matching • Compare sets by computing a partial matching between their features.
Computing the partial matching • Earth Mover’s Distance [Rubner, Tomasi, Guibas 1998] • Hungarian method [Kuhn, 1955] • Greedy matching … • Pyramid match [Grauman and Darrell, ICCV 2005] for sets with features of dimension
Pyramid match overview • Place multi-dimensional, multi-resolution grid over point sets • Consider points matched at finest resolution where they fall into same grid cell • Approximate optimal similarity with worst case similarity within pyramid cell • Pyramid match measures similarity of a partial matching between two sets: no explicit search for matches!
Pyramid match overview optimal partial matching
Pyramid Match • d-dimensional feature vectors • A sequence of grids at resolutions 0 … L • At level l, the grid has $2^l$ cells along each dimension, $2^{dl}$ cells in total • (Figure: example with d=2, L=2)
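Pyramid match Kernel • Matches at level l include matches at level l+1 • New matches at level l (for l = 0…L-1): $I^l - I^{l+1}$, where $I^l$ is the histogram intersection of the two sets at level l • Penalize easy matches at larger scales with weight: $\frac{1}{2^{L-l}}$ • Match kernel: $\kappa^L(X,Y) = I^L + \sum_{l=0}^{L-1} \frac{1}{2^{L-l}} \left(I^l - I^{l+1}\right)$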
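A small sketch of the kernel, following the formulation in Lazebnik et al.: the number of matches at level l is the histogram intersection I^l, the new matches are I^l - I^(l+1), and they are weighted by 1 / 2^(L-l). Points are assumed to have coordinates in [0, 1), and the function names are chosen for illustration.

```python
from collections import Counter

def histogram(points, level):
    """Count points per grid cell; the grid has 2**level cells per dimension."""
    cells = 2 ** level
    return Counter(
        tuple(min(int(c * cells), cells - 1) for c in p) for p in points
    )

def intersection(h1, h2):
    """Histogram intersection: number of matches at one level."""
    return sum(min(count, h2[cell]) for cell, count in h1.items())

def pyramid_match(X, Y, L=2):
    """Pyramid match kernel between two point sets in [0, 1)^d."""
    I = [intersection(histogram(X, l), histogram(Y, l)) for l in range(L + 1)]
    score = float(I[L])  # matches at the finest level count fully
    for l in range(L):
        # New matches found only at the coarser level l, down-weighted.
        score += (I[l] - I[l + 1]) / 2 ** (L - l)
    return score
```

With `X = [(0.1, 0.1), (0.9, 0.9)]` and `Y = [(0.15, 0.15), (0.6, 0.6)]`, one pair matches at the finest level and the other only from level 1 upward, so the score is 1 + 1/2 = 1.5.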
Vocabulary of M features • Only features of the same type can be matched. • Each channel m treated separately
Spatial pyramid representation • d=2: the pyramid is built over image coordinates (x, y) • M classes (channels) of features
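One way to realize this representation is a single long vector: for each level and each of the M channels, a histogram over the 2^l x 2^l image cells, scaled by the level weight (1 / 2^L at level 0 and 1 / 2^(L-l+1) at level l >= 1, as in Lazebnik et al.). The histogram intersection of two such vectors then equals the sum of the per-channel pyramid match kernels. The sketch assumes features come as `(x, y, m)` with x, y in [0, 1) and m the visual-word index; the function names are hypothetical.

```python
def spatial_pyramid_vector(features, M, L=2):
    """Weighted, concatenated histograms over all levels and channels.

    features: list of (x, y, m) with x, y in [0, 1) and m in range(M).
    The result has M * (4**(L + 1) - 1) // 3 entries.
    """
    vector = []
    for level in range(L + 1):
        cells = 2 ** level
        weight = 1.0 / 2 ** L if level == 0 else 1.0 / 2 ** (L - level + 1)
        hist = [0.0] * (M * cells * cells)
        for x, y, m in features:
            cx = min(int(x * cells), cells - 1)
            cy = min(int(y * cells), cells - 1)
            hist[(m * cells + cy) * cells + cx] += weight
        vector.extend(hist)
    return vector

def intersection_kernel(u, v):
    """Histogram intersection of two spatial pyramid vectors."""
    return sum(min(a, b) for a, b in zip(u, v))
```

Features in different channels never contribute to the same bin, which is how "only features of the same type can be matched" shows up in the code.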