
Representation, Description and Matching

Explore the Scale Invariant Feature Transform (SIFT) algorithm for matching corresponding image regions. Learn about keypoint detection, feature extraction, and descriptor computation techniques for robust matching.

acoughlin


Presentation Transcript


  1. Representation, Description and Matching CS485/685 Computer Vision Dr. George Bebis

  2. Matching • Given covariant region descriptors, what information should we use for matching corresponding regions in different images?

  3. Simplest approach: correlation • Directly compare intensities using “sum of squared differences” • or “normalized cross-correlation”
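The two correlation measures named on this slide can be sketched as follows. This is a minimal illustration, assuming patches are given as flat lists of intensities of equal length; the function names `ssd` and `ncc` and the toy patch values are hypothetical, not from the slides.

```python
import math

def ssd(a, b):
    """Sum of squared differences between two equal-length intensity lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ncc(a, b):
    """Normalized cross-correlation in [-1, 1]; each patch is mean-centered first."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat patch: correlation is undefined, treat as "no match"
    return sum(x * y for x, y in zip(da, db)) / denom

# Toy data (assumption): the same pattern under a gain and offset change.
patch = [10, 20, 30, 40]
brighter = [2 * p + 5 for p in patch]

ssd_same = ssd(patch, patch)
ssd_brighter = ssd(patch, brighter)
ncc_brighter = ncc(patch, brighter)
```

In this toy case SSD penalizes the brightness change while NCC does not, which is one reason the normalized form is often preferred when lighting differs between images.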

  4. Simplest approach: correlation (cont’d) • Works satisfactorily when matching corresponding regions related mostly by translation, e.g., stereo pairs or video sequences, assuming small camera motion.

  5. Simplest approach: correlation (cont’d) • Sensitive to small variations with respect to: • Location • Pose • Scale • Intra-class variability • Poorly distinctive! • We will discuss a powerful descriptor called SIFT

  6. Region Detection Steps: Review (Feature Extraction) • Extract affine regions • Normalize regions • Eliminate rotational ambiguity • Compute appearance descriptors: SIFT (Lowe ’04)

  7. Scale Invariant Feature Transform (SIFT) • Remember how we resolved the orientation ambiguity? • Steps: Extract affine regions • Normalize regions • Eliminate rotational ambiguity • Compute appearance descriptors: SIFT (Lowe ’04)

  8. Scale Invariant Feature Transform (SIFT) • Find the dominant gradient direction using the histogram of gradient directions (36 bins over 0 to 2π).

  9. Scale Invariant Feature Transform (SIFT) • Same theory, except that we use 16 histograms (8 bins each). • Main idea: • Take a 16 x 16 window around the detected interest point • Divide it into a 4 x 4 grid of cells • Compute a histogram in each cell • 16 histograms x 8 orientations = 128 features

  10. Properties of SIFT • Highly distinctive! • A single feature can be correctly matched with high probability against a large database of features from many images. • Scale and rotation invariant. • Partially invariant to 3D camera viewpoint • Can tolerate up to about 60 degree out of plane rotation • Can be computed fast and efficiently

  11. Properties of SIFT (cont’d) • Partially invariant to changes in illumination • Known implementations: http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT

  12. SIFT – Main Steps (1) Scale-space extrema detection • Extract scale and rotation invariant interest points (i.e., keypoints). (2) Keypoint localization • Determine location and scale for each interest point. (3) Orientation assignment • Assign one or more orientations to each keypoint. (4) Keypoint descriptor • Use local image gradients at the selected scale. D. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, International Journal of Computer Vision, 60(2):91-110, 2004. Cited 9589 times (as of 3/7/2011)

  13. Scale-space Extrema Detection • Harris-Laplacian: find local maxima of the Harris detector in space and of LoG in scale. • SIFT: find local maxima of DoG in space and of DoG in scale. • $\sigma_n = k^n \sigma_0$ (k = 2) • [Figure: scale-space stacks over (x, y, scale), labeled Harris/LoG and Hessian/DoG]

  14. 1. Scale-space Extrema Detection (cont’d) • DoG images are grouped by octaves. • An octave corresponds to doubling the value of σ. • Fixed number of scales (i.e., levels) per octave. • [Figure: octaves starting at σ0, 2σ0, 2²σ0, with down-sampling between octaves]

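  15. 1. Scale-space Extrema Detection (cont’d) • Images within an octave are separated by a constant factor k. • If each octave is divided into s intervals, the levels σ0, kσ0, k²σ0, …, k^s σ0 = 2σ0 give s+1 scales per octave, where $k^s = 2$ or $k = 2^{1/s}$. • Note: need (s+1)+2 = s+3 blurred images per octave (i.e., s+2 DoG images), so that extrema can be detected at every scale of the octave.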
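The scale spacing on this slide can be worked through numerically. The sketch below assumes s = 3 intervals and σ0 = 1.6, the values the later slides settle on; the function name `octave_sigmas` is hypothetical.

```python
def octave_sigmas(sigma0, s):
    """Return k and the blur levels of the s+3 blurred images in one octave.

    k is chosen so that k**s == 2, i.e. sigma doubles after s intervals.
    """
    k = 2.0 ** (1.0 / s)
    sigmas = [sigma0 * k ** i for i in range(s + 3)]
    return k, sigmas

# Assumed parameters: s = 3 intervals per octave, sigma0 = 1.6.
k, sigmas = octave_sigmas(1.6, 3)
num_blurred = len(sigmas)      # s + 3 blurred images
num_dog = num_blurred - 1      # adjacent blurred images are subtracted pairwise
```

With these values the fourth level (index 3) reaches 2σ0, which is where the next octave would start after down-sampling.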

  16. Choosing SIFT parameters • Experimentally using a matching task: • 32 real images (outdoor, faces, aerial etc.) • Images subjected to a wide range of transformations (i.e., rotation, scaling, shear, change in brightness, noise). • Keypoints are detected in each image. • Parameters are chosen based on keypoint repeatability, localization, and matching accuracy.

  17. 1. Scale-space Extrema Detection (cont’d) • How many scales should be sampled per octave? 3 scales. • With more scales per octave, the # of keypoints increases but they are not stable!

  18. 1. Scale-space Extrema Detection (cont’d) • Smoothing is applied to the first level of each octave. • How to choose σ? (i.e., integration scale) σ =1.6

  19. 1. Scale-space Extrema Detection (cont’d) • Pre-smoothing discards high frequencies. • Double the size of the input image (i.e., using linear interpolation) prior to building the first level of the DoG pyramid. • Increases the number of stable keypoints by a factor of 4.

  20. 1. Scale-space Extrema Detection (cont’d) • Extract local extrema (i.e., minima or maxima) in the DoG pyramid. • Compare each point to its 8 neighbors at the same level, 9 neighbors in the level above, and 9 neighbors in the level below (i.e., 26 total).
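The 26-neighbor comparison can be sketched as below. This is a minimal illustration, assuming the DoG pyramid for one octave is stored as nested lists indexed `dog[level][y][x]` and that the tested point is not on a border; the function name and the strict-inequality choice are assumptions.

```python
def is_extremum(dog, level, y, x):
    """True if dog[level][y][x] is strictly larger or strictly smaller
    than all 26 neighbors (8 in its own level, 9 above, 9 below)."""
    value = dog[level][y][x]
    neighbors = [
        dog[level + dl][y + dy][x + dx]
        for dl in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dl, dy, dx) != (0, 0, 0)
    ]
    return all(value > n for n in neighbors) or all(value < n for n in neighbors)

# Toy 3x3x3 stacks (assumption): a flat one, and one with a single peak.
flat = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
peak = [[row[:] for row in level] for level in flat]
peak[1][1][1] = 1.0

peak_found = is_extremum(peak, 1, 1, 1)
flat_found = is_extremum(flat, 1, 1, 1)
```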

  21. 2. Keypoint Localization Determine the location and scale of keypoints to sub-pixel and sub-scale accuracy by fitting a 3D quadratic function at each keypoint. Substantial improvement to matching and stability!

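  22. 2. Keypoint Localization • Use the Taylor expansion of D(x,y,σ) (i.e., the DoG function) around the sample point $X_i = (x_i, y_i, \sigma_i)^T$: $D(\Delta X) = D(X_i) + \frac{\partial D^T}{\partial X}\Delta X + \frac{1}{2}\Delta X^T \frac{\partial^2 D}{\partial X^2}\Delta X$, where $\Delta X = X - X_i$ is the offset from this point.

  23. 2. Keypoint Localization • To find the extrema of D(ΔX), set $\frac{\partial D(\Delta X)}{\partial \Delta X} = 0$. • ΔX can be computed by solving a 3x3 linear system: $\frac{\partial^2 D}{\partial X^2}\Delta X = -\frac{\partial D}{\partial X}$ • The derivatives are approximated using finite differences of neighboring DoG samples, e.g., $\frac{\partial D}{\partial x} \approx \frac{D(x+1,y,\sigma) - D(x-1,y,\sigma)}{2}$ and $\frac{\partial^2 D}{\partial x^2} \approx D(x+1,y,\sigma) - 2D(x,y,\sigma) + D(x-1,y,\sigma)$.

  24. 2. Keypoint Localization (cont’d) • If $|\Delta X| > 0.5$ in any dimension, move to the neighboring sample point and repeat. • Sub-pixel, sub-scale interpolated estimate: $\hat{X} = X_i + \Delta X$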

  25. 2. Keypoint Localization (cont’d) • Reject keypoints having low contrast (i.e., sensitive to noise). • If $|D(\hat{X})| < 0.03$, where $\hat{X}$ is the interpolated extremum, reject the keypoint (i.e., assumes that image values have been normalized in [0,1]).
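The localization and low-contrast steps of slides 21 to 25 can be sketched together. This is an illustration under stated assumptions: the input is a 3x3x3 DoG neighborhood `D[scale][y][x]` centered on the sample point, derivatives use central finite differences, the 3x3 system is solved with Cramer's rule, and the threshold 0.03 is the value from Lowe's 2004 paper (the transcript lost the number). All function names and the toy quadratic are hypothetical.

```python
LOW_CONTRAST = 0.03  # assumption: threshold from Lowe (2004), image values in [0, 1]

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve the 3x3 linear system a x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("singular Hessian")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(det3(m) / d)
    return solution

def refine(D):
    """Return the offset (dx, dy, dscale) and the interpolated DoG value."""
    c = D[1][1][1]
    # First derivatives (central differences), ordered x, y, scale.
    g = [(D[1][1][2] - D[1][1][0]) / 2.0,
         (D[1][2][1] - D[1][0][1]) / 2.0,
         (D[2][1][1] - D[0][1][1]) / 2.0]
    # Second derivatives.
    dxx = D[1][1][2] - 2.0 * c + D[1][1][0]
    dyy = D[1][2][1] - 2.0 * c + D[1][0][1]
    dss = D[2][1][1] - 2.0 * c + D[0][1][1]
    dxy = (D[1][2][2] - D[1][2][0] - D[1][0][2] + D[1][0][0]) / 4.0
    dxs = (D[2][1][2] - D[2][1][0] - D[0][1][2] + D[0][1][0]) / 4.0
    dys = (D[2][2][1] - D[2][0][1] - D[0][2][1] + D[0][0][1]) / 4.0
    hessian = [[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]]
    offset = solve3(hessian, [-v for v in g])
    value = c + 0.5 * sum(gi * oi for gi, oi in zip(g, offset))
    return offset, value

def quadratic(x, y, s):
    """Toy DoG response (assumption) with its maximum of 1.0 at (0.2, -0.1, 0.3)."""
    return 1.0 - (x - 0.2) ** 2 - (y + 0.1) ** 2 - (s - 0.3) ** 2

D = [[[quadratic(x, y, s) for x in (-1, 0, 1)] for y in (-1, 0, 1)] for s in (-1, 0, 1)]
offset, value = refine(D)
needs_resampling = any(abs(o) > 0.5 for o in offset)   # the "repeat" case
keep = abs(value) >= LOW_CONTRAST                      # low-contrast rejection
```

Because the toy response is exactly quadratic, the finite differences are exact and the recovered offset matches the true extremum; on real DoG data the result is only an approximation.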

  26. 2. Keypoint Localization (cont’d) • Reject points lying on edges (or being close to edges). • Harris uses the 2nd order moment matrix $A_W$: $R(A_W) = \det(A_W) - \alpha\,\mathrm{trace}^2(A_W)$ or $R(A_W) = \lambda_1\lambda_2 - \alpha(\lambda_1 + \lambda_2)^2$

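  27. 2. Keypoint Localization (cont’d) • SIFT uses the Hessian matrix for efficiency (i.e., it encodes principal curvatures): $H = \begin{pmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{pmatrix}$ • α: largest eigenvalue (λmax), β: smallest eigenvalue (λmin) (proportional to principal curvatures) • With r = α/β: $\frac{\mathrm{trace}^2(H)}{\det(H)} = \frac{(\alpha+\beta)^2}{\alpha\beta} = \frac{(r+1)^2}{r}$ • Reject the keypoint if $\frac{\mathrm{trace}^2(H)}{\det(H)} \geq \frac{(r+1)^2}{r}$ for the threshold ratio r (SIFT uses r = 10).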
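The edge test can be sketched without computing eigenvalues, since the ratio of principal curvatures is bounded through the trace and determinant of the 2x2 Hessian. The sketch assumes the second derivatives `dxx`, `dyy`, `dxy` of the DoG image at the keypoint are already available; the function name is hypothetical, and r = 10 is the value given on the slide.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint only if trace(H)^2 / det(H) < (r + 1)^2 / r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0.0:
        # Curvatures have opposite signs (or one is zero): not a stable extremum.
        return False
    return trace * trace / det < (r + 1.0) ** 2 / r

blob_like = passes_edge_test(-2.0, -2.0, 0.0)    # equal curvatures, ratio 1
edge_like = passes_edge_test(-20.0, -1.0, 0.0)   # curvature ratio 20
saddle = passes_edge_test(1.0, -1.0, 0.0)        # negative determinant
```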

  28. 2. Keypoint Localization (cont’d) • (a) 233x189 image • (b) 832 DoG extrema • (c) 729 left after low contrast threshold • (d) 536 left after testing ratio based on Hessian

  29. 3. Orientation Assignment • Create a histogram of gradient directions (over 0 to 2π), within a region around the keypoint, at the selected scale (i.e., scale invariance): 36 bins (i.e., 10° per bin). • Histogram entries are weighted by (i) gradient magnitude and (ii) a Gaussian function with σ equal to 1.5 times the scale of the keypoint.

  30. 3. Orientation Assignment (cont’d) • Assign the canonical orientation at the peak of the smoothed histogram (fit a parabola to better localize the peak). • If other peaks are within 80% of the highest peak, multiple orientations are assigned to the keypoint. • About 15% of keypoints have multiple orientations assigned. • Significantly improves stability of matching.
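Orientation assignment can be sketched as a 36-bin histogram followed by peak picking. This is a simplified illustration: each sample is assumed to arrive as an (angle in radians, weight) pair whose weight already combines gradient magnitude and the Gaussian window, histogram smoothing is omitted, and the function names are hypothetical.

```python
import math

def orientation_histogram(samples, num_bins=36):
    """Accumulate (angle, weight) samples into num_bins bins over [0, 2*pi)."""
    hist = [0.0] * num_bins
    width = 2.0 * math.pi / num_bins
    for angle, weight in samples:
        b = int((angle % (2.0 * math.pi)) / width) % num_bins
        hist[b] += weight
    return hist

def dominant_orientations(hist, ratio=0.8):
    """Return refined angles of all local peaks within `ratio` of the highest peak."""
    n = len(hist)
    top = max(hist)
    width = 2.0 * math.pi / n
    peaks = []
    for i, h in enumerate(hist):
        left, right = hist[(i - 1) % n], hist[(i + 1) % n]
        if h > 0.0 and h >= ratio * top and h > left and h > right:
            # Parabola through the three bins around the peak gives a sub-bin shift.
            shift = 0.5 * (left - right) / (left - 2.0 * h + right)
            peaks.append(((i + 0.5 + shift) * width) % (2.0 * math.pi))
    return peaks

# Toy samples (assumption): a strong peak, a second peak at 90% of it, a weak one.
samples = [(0.05, 10.0), (math.pi / 2 + 0.05, 9.0), (math.pi + 0.05, 2.0)]
hist = orientation_histogram(samples)
peaks = dominant_orientations(hist)
```

Here two orientations are returned, so the keypoint would be duplicated once per orientation, as the slide describes.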

  31. 3. Orientation Assignment (cont’d) • Stability of location, scale, and orientation (within 15 degrees) under noise.

  32. 4. Keypoint Descriptor • Have achieved invariance to location, scale, and orientation. • Next, tolerate illumination and viewpoint changes. Orientation histogram of gradient magnitudes 8 bins

  33. 4. Keypoint Descriptor (cont’d) • Take a 16 x16 window around detected interest point. • Divide into a 4x4 grid of cells. • Compute histogram in each cell. (8 bins) 16 histograms x 8 orientations = 128 features
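The 4x4x8 layout can be sketched as below. This is a bare-bones illustration of the indexing only: it assumes the 16x16 window is given as two nested lists of gradient magnitudes and orientations (radians, already rotated relative to the keypoint orientation), and it leaves out the Gaussian weighting and partial voting covered on the following slides. The function name is hypothetical.

```python
import math

def raw_descriptor(mag, ori, cells=4, bins=8):
    """Build a cells*cells*bins vector of orientation histograms from a square window."""
    size = len(mag)              # 16 for the standard window
    cell = size // cells         # 4 pixels per cell side
    bin_width = 2.0 * math.pi / bins
    desc = [0.0] * (cells * cells * bins)
    for y in range(size):
        for x in range(size):
            b = int((ori[y][x] % (2.0 * math.pi)) / bin_width) % bins
            index = ((y // cell) * cells + (x // cell)) * bins + b
            desc[index] += mag[y][x]
    return desc

# Toy window (assumption): unit gradients all pointing in direction 0.
mag = [[1.0] * 16 for _ in range(16)]
ori = [[0.0] * 16 for _ in range(16)]
desc = raw_descriptor(mag, ori)
```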

  34. 4. Keypoint Descriptor (cont’d) • Each histogram entry is weighted by (i) gradient magnitude and (ii) a Gaussian function with σ equal to 0.5 times the width of the descriptor window.

  35. 4. Keypoint Descriptor (cont’d) • Partial voting: distribute histogram entries into adjacent bins (i.e., additional robustness to shifts). • Each entry is added to the adjacent bins, multiplied by a weight of 1-d, where d is the distance of the sample from the center of that bin (in units of bin spacing).
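Partial voting along a single dimension can be sketched as follows. This is one reading of the slide, assuming the vote is shared between the two nearest bins of a circular orientation histogram, with bin centers at i + 0.5 in bin units; the function name `soft_vote` is hypothetical, and the full descriptor would apply the same weighting along the two spatial dimensions as well.

```python
import math

def soft_vote(hist, position, weight):
    """Split `weight` between the two bins nearest to `position` (in bin units).

    Each bin receives weight * (1 - d), where d is the distance from the
    sample to that bin's center. The histogram is treated as circular.
    """
    n = len(hist)
    p = position - 0.5            # shift so that bin centers sit on integers
    lower = math.floor(p)
    d = p - lower                 # distance to the lower bin center, in [0, 1)
    hist[lower % n] += weight * (1.0 - d)
    hist[(lower + 1) % n] += weight * d

hist = [0.0] * 8
soft_vote(hist, 1.0, 2.0)    # on the boundary between bins 0 and 1: split evenly
soft_vote(hist, 0.25, 1.0)   # near the start: wraps around to the last bin
```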

  36. 4. Keypoint Descriptor (cont’d) • Descriptor depends on two parameters: • (1) number of orientations r • (2) n x n array of orientation histograms • This gives r·n² features; SIFT: r = 8, n = 4, i.e., 128 features

  37. 4. Keypoint Descriptor (cont’d) • Invariance to affine (linear) illumination changes: • Normalization to unit length is sufficient. 128 features

  38. 4. Keypoint Descriptor (cont’d) • Non-linear illumination changes: • Saturation affects gradient magnitudes more than orientations. • Threshold gradient magnitudes to be no larger than 0.2 and renormalize to unit length (i.e., emphasizes gradient orientations rather than magnitudes). • 128 features
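The two normalization steps from slides 37 and 38 can be sketched together: normalize to unit length, clip at 0.2, renormalize. The sketch assumes the clipping is applied to the entries of the unit-length descriptor; the function names and the toy descriptor are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean length (returned unchanged if all zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else list(v)

def illumination_normalize(desc, clip=0.2):
    """Unit-normalize, clip large entries at `clip`, then renormalize."""
    unit = normalize(desc)
    clipped = [min(x, clip) for x in unit]
    return normalize(clipped)

# Toy descriptor (assumption): one dominant entry among 128.
desc = [10.0] + [1.0] * 127
out = illumination_normalize(desc)
out_scaled = illumination_normalize([3.0 * x for x in desc])  # contrast change
```

Multiplying the input by a constant leaves the output unchanged, and the dominant entry ends up with a smaller share of the vector than it had after plain unit normalization.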

  39. Robustness to viewpoint changes • Match features after random change in image scale and orientation, with 2% image noise, and affine distortion. • Find nearest neighbor in database of 30,000 features. Additional robustness can be achieved using affine invariant region detectors.

  40. Distinctiveness • Vary size of database of features, with 30 degree affine change, 2% image noise. • Measure % correct for single nearest neighbor match.

  41. Matching SIFT features I2 I1 • Given a feature in I1, how to find the best match in I2? • Define distance function that compares two descriptors. • Test all the features in I2, find the one with min distance.

  42. Matching SIFT features (cont’d) f1 f2 I1 I2 • How to define the distance between two features f1, f2? • Simple approach: SSD(f1, f2) (i.e., sum of squared differences) • Can give good scores to very ambiguous (bad) matches

  43. Matching SIFT features (cont’d) • SSD(f1,f2) < t How to set t?

  44. Matching SIFT features (cont’d) f1 f2' f2 I1 I2 • How to define the difference between two features f1, f2? • Better approach: SSD(f1, f2) / SSD(f1, f2’) • f2 is best SSD match to f1 in I2 • f2’ is 2nd best SSD match to f1 in I2

  45. Matching SIFT features (cont’d) • Accept a match if SSD(f1, f2) / SSD(f1, f2’) < t • t=0.8 has given good results in object recognition. • Eliminated 90% of false matches. • Discarded less than 5% of correct matches
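The ratio test can be sketched as below, following the slide's formulation with an SSD ratio and t = 0.8. Descriptors are assumed to be plain lists of floats, the search is brute force, and the function names and toy features are hypothetical.

```python
def ssd(f, g):
    """Sum of squared differences between two descriptors."""
    return sum((a - b) ** 2 for a, b in zip(f, g))

def match_features(feats1, feats2, t=0.8):
    """Return (i, j) pairs where feats2[j] is the best match of feats1[i]
    and SSD(best) / SSD(second best) < t."""
    matches = []
    for i, f1 in enumerate(feats1):
        scored = sorted((ssd(f1, f2), j) for j, f2 in enumerate(feats2))
        if len(scored) < 2:
            continue  # the ratio needs a second-best candidate
        (best, j), (second, _) = scored[0], scored[1]
        if second > 0.0 and best / second < t:
            matches.append((i, j))
    return matches

# Toy features (assumption): the first is distinctive, the second is ambiguous.
feats1 = [[0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
feats2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
matches = match_features(feats1, feats2)
```

The second feature is equally close to two candidates, so its ratio is 1 and it is dropped, even though its best SSD alone might have passed an absolute threshold.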

  46. Matching SIFT features (cont’d) • How to evaluate the performance of a feature matcher? • [Figure: candidate matches labeled 50, 75, and 200]

  47. Matching SIFT features (cont’d) • True positives (TP) = # of detected matches that are correct • False positives (FP) = # of detected matches that are incorrect • Threshold t affects the # of correct/false matches • [Figure: candidate matches labeled 50, 75, and 200, marked as true or false matches]

  48. Matching SIFT features (cont’d) • ROC curve: generated by computing (FP rate, TP rate) for different thresholds. • Need to maximize the area under the curve (AUC). • [Figure: ROC curve, TP rate vs. FP rate, both from 0 to 1, with a marked point at FP rate 0.1, TP rate 0.7] http://en.wikipedia.org/wiki/Receiver_operating_characteristic
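An ROC curve and its AUC can be sketched by sweeping the acceptance threshold over the scores of candidate matches. The sketch assumes each candidate has a score where lower means more confident (e.g., the SSD ratio) and a ground-truth label, and that both correct and incorrect candidates are present; the function names and toy data are hypothetical.

```python
def roc_points(scores, labels):
    """Return (FP rate, TP rate) points as the threshold sweeps over all scores."""
    positives = sum(1 for is_correct in labels if is_correct)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, is_correct in sorted(zip(scores, labels), key=lambda pair: pair[0]):
        if is_correct:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy data (assumption): four candidate matches with known correctness.
scores = [0.1, 0.2, 0.3, 0.4]
perfect = auc(roc_points(scores, [True, True, False, False]))
mixed = auc(roc_points(scores, [True, False, True, False]))
```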

  49. Applications of SIFT • Object recognition • Object categorization • Location recognition • Robot localization • Image retrieval • Image panoramas

  50. Object Recognition Object Models
