Automatic Matching of Multi-View Images
Ed Bremer
University of Rochester

References
• [1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, October 2004, http://lear.inrialpes.fr/pubs/2004/MS04a
• [2] Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L., 2004, A comparison of affine region detectors, Submitted to International Journal of Computer Vision, August 2004, http://lear.inrialpes.fr/pubs/2004/MTSZMSKG04
• [3] Lowe, D., 2004, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, 60, 2 (2004), pages 91-118.
• [4] Matas, J., Chum, O., Urban, M., Pajdla, T., 2002, Robust Wide Baseline Stereo From Maximally Stable Extremal Regions, Proc. British Machine Vision Conference BMVC2002, pages 384-393.
• [5] Zisserman, A., Schaffalitzky, F., 2002, Multi-view matching for unordered image sets, or "How do I organize my holiday snaps?", Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, pages 414-431, vol 1.
• [6] Baumberg, A., 2000, Reliable Feature Matching Across Widely Separated Views, In Proc. CVPR, pages 774-781.
• [7] Mikolajczyk, K., Schmid, C., 2001, Indexing based on scale invariant interest points, In Proc. 8th ICCV, pages 525-531.

Outline
• Motivation
• Applications
• Process Components
• Region Detectors
• Descriptors
• Matching Criteria
• Performance Evaluation
• Conclusion & Next Steps

Motivation
• Multi-view/multi-image matching
  • Multiple images of a 3D scene taken by a single camera or by multiple cameras, with different rotation, scale, viewpoint and illumination

Motivation
• Applications: detecting matching regions is used in all of the following
  • Image registration
  • Super-resolution
  • Stereo vision
  • Object detection and recognition
  • Object and motion tracking
  • Indexing and retrieval of objects
  • 3D scene reconstruction
  • Scene recognition

Examples of Multi-view Images [2]

Process Components [1]
• Covariant region detection
  • Detect image regions covariant to the class of transformations between the reference image and the transformed image
• Invariant descriptor
  • Compute invariant descriptors from the covariant regions
• Descriptor matching
  • Compute the distance between descriptors in the reference image and the transformed image

Region Detectors [1]
• Support regions for computation of descriptors
• Determined independently in each image
• Scale invariant or affine invariant
• Can be points (feature points) or regions (covariant)
• Provide dense (local) coverage: robust to occlusion
• Need to be stable and repeatable
• Five region detectors
  • Harris points -> invariant to rotation
  • Harris-Laplace -> invariant to rotation and scale
  • Hessian-Laplace -> invariant to rotation and scale
  • Harris-Affine -> invariant to affine image transformations
  • Hessian-Affine -> invariant to affine image transformations

Region Detectors [1]
• Harris points
  • Maxima of the Harris function are used to locate interest points
  • Support region is fixed in size: a 41x41 neighborhood centered at the interest point
• Harris-Laplace regions
  • Scale-adapted Harris function locates the interest points
  • Scale is selected at a local extremum (minimum or maximum) of the Laplacian-of-Gaussian across scale-space

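The slide names the Harris function without writing it out. A minimal sketch of the usual corner measure, det(M) - k·trace(M)², is given here for orientation. The function name, the argument layout and the constant k = 0.04 are assumptions chosen for illustration, not values taken from the slides.

```python
def harris_response(a, b, c, k=0.04):
    """Harris corner measure for the second-moment matrix [[a, b], [b, c]].

    a, b, c are the windowed sums of Ix*Ix, Ix*Iy and Iy*Iy around a pixel.
    k = 0.04 is a commonly used constant (an assumption here).
    """
    det = a * c - b * b
    trace = a + c
    return det - k * trace * trace


# Hypothetical second-moment entries for three kinds of neighborhood.
corner = harris_response(10.0, 0.0, 10.0)  # strong gradients in both directions
edge = harris_response(10.0, 0.0, 0.0)     # gradient in one direction only
flat = harris_response(0.0, 0.0, 0.0)      # no gradient at all
```

Interest points would then be taken at local maxima of this response; corners score positive, edges negative and flat areas near zero.
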
Region Detectors [7]
• Harris-Laplace performance
  • Approximately 10% better than Laplacian, Lowe or gradient methods
  • The standard Harris detector performs very poorly under scale changes

Region Detectors [1] [3]
• Hessian-Laplace regions
  • Interest points are at local maxima of the Hessian determinant
  • Location in scale-space is found using maxima of the Laplacian-of-Gaussian (Difference-of-Gaussians can also be used)

Region Detectors [2]
• Harris-Affine regions
  • Find regions using the Harris-Laplace detector
  • Region is affine adapted based on the second moment matrix
• Hessian-Affine regions
  • Find regions using the Hessian-Laplace detector
  • Region is affine adapted based on the second moment matrix

Region Detectors [2]
• Regions produced by Harris-Affine and Hessian-Affine detectors

Region Detectors [2]
• Affine normalization using the second moment matrix for regions L and R

Region Detectors [1]
• Region normalization
  • Detectors produce circular or elliptical regions
  • Size is dependent on the detection scale
  • Map regions to a circular region with constant radius
  • Rotate regions in the direction of the dominant gradient orientation
• Illumination normalization
  • Model the illumination change as an affine transformation -> aI(x) + b
  • Normalize using the mean and standard deviation of the pixel intensities

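The illumination normalization step can be sketched as follows: subtracting the mean and dividing by the standard deviation cancels any change of the form aI(x) + b with a > 0. The function name and the flat-list patch representation are assumptions made for this example.

```python
import math


def normalize_patch(pixels):
    """Return the patch shifted to zero mean and scaled to unit standard deviation."""
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    if std == 0.0:
        # A constant patch carries no structure; return zeros instead of dividing by 0.
        return [0.0] * n
    return [(p - mean) / std for p in pixels]


patch = [10.0, 20.0, 30.0, 40.0]
brighter = [2.5 * p + 7.0 for p in patch]  # same patch under a = 2.5, b = 7.0

normalized = normalize_patch(patch)
normalized_brighter = normalize_patch(brighter)
```

Both normalized patches agree up to rounding, which is what makes a descriptor computed afterwards insensitive to affine illumination changes.
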
Descriptors [1]
• Descriptors -> feature vector
  • Invariant to changes in scale, rotation, affine transformation and affine illumination
  • Need to be distinct, stable and repeatable
  • Distribution (histogram) type or covariance type
• Ten descriptor types
  • Scale-Invariant Feature Transform (SIFT)
  • Gradient Location and Orientation Histogram (GLOH)
  • Shape Context
  • Principal Component Analysis (PCA)-SIFT
  • Steerable Filters
  • Differential Invariants
  • Complex Filters
  • Moment Invariants
  • Cross-Correlation
  • Spin Image

Descriptors [1]
• SIFT and GLOH 3D descriptors
  • SIFT -> 4 x 4 x 8 = 128 dimension descriptor
  • GLOH -> log-polar [(2 x 8) + 1] x 16 = 272 dimension descriptor

Matching Criteria
• Distance measure
  • Find putative matches between images
  • Mahalanobis distance: used for covariant descriptors
  • Euclidean distance: used for distribution (histogram) descriptors
  • Direct distance comparison is not suitable for indexing or database searching
• Simple threshold
  • Descriptors match if the distance between them is below a threshold t
  • A descriptor in the reference image can have many matches to descriptors in the transformed image
• Nearest Neighbor (NN)
  • Find the closest match between descriptors in the reference and transformed image
  • A descriptor in the reference image can have only 1 match to a descriptor in the transformed image

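The difference between the two matching strategies can be sketched with Euclidean distance on toy descriptors. The function names and the sample vectors are hypothetical, and applying the threshold t to the nearest neighbor as well is an assumption of this sketch.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def threshold_matches(ref, other, t):
    """All (i, j) pairs whose distance is below t; one i may match several j."""
    return [(i, j)
            for i, u in enumerate(ref)
            for j, v in enumerate(other)
            if euclidean(u, v) < t]


def nearest_neighbor_matches(ref, other, t):
    """At most one match per reference descriptor: its closest neighbor, if below t."""
    matches = []
    for i, u in enumerate(ref):
        j = min(range(len(other)), key=lambda idx: euclidean(u, other[idx]))
        if euclidean(u, other[j]) < t:
            matches.append((i, j))
    return matches


ref = [[0.0, 0.0], [5.0, 5.0]]
other = [[0.1, 0.0], [0.0, 0.3], [5.0, 5.2]]

by_threshold = threshold_matches(ref, other, 0.5)
by_nearest = nearest_neighbor_matches(ref, other, 0.5)
```

With these toy vectors the first reference descriptor gets two matches under the simple threshold but only one under nearest-neighbor matching.
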
Performance Evaluation [1]
• Criterion basis
  • Recall = #correct matches / #correspondences
  • 1-precision = #false matches / (#correct matches + #false matches)
• Ideal descriptor -> recall = 1 for any precision, given no overlap error

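The two evaluation measures reduce to simple ratios. The sketch below computes them for made-up counts; the function names and the numbers are illustrative only and are not results from [1].

```python
def recall(correct_matches, correspondences):
    """Fraction of ground-truth correspondences recovered as correct matches."""
    return correct_matches / correspondences


def one_minus_precision(correct_matches, false_matches):
    """Fraction of all returned matches that are false."""
    return false_matches / (correct_matches + false_matches)


# Hypothetical counts for one image pair at one distance threshold.
r = recall(correct_matches=60, correspondences=100)
fp = one_minus_precision(correct_matches=60, false_matches=20)
```

Sweeping the matching threshold and plotting recall against 1-precision gives the kind of curve used to compare descriptors.
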
SIFT - Scale Invariant Feature Transform
• Scale Invariant Feature Transform (SIFT), Lowe [3]
• Features
  • Invariant to image scale and rotation
  • Invariant to small changes in illumination and 3D camera viewpoint
  • Extracts a large number of highly distinctive features
  • Enables detection of small objects
  • Improved performance in cluttered scenes
  • Algorithms are efficient: complex operations are applied to local regions or features rather than the whole image
• Procedure
  • Scale-space extrema detection
  • Keypoint localization
  • Orientation assignment
  • Keypoint vector (descriptor)

SIFT - Scale Invariant Feature Transform [3]
• Scale-space blob detector
  • Search for stable features over all scales and image locations
  • Scale-space kernel -> Gaussian function
  • Difference of Gaussian

SIFT - Scale Invariant Feature Transform [3]
• Difference of Gaussian (DoG)
  • Simple subtraction of adjacent blurred images L
  • Approximation to the scale-normalized Laplacian of Gaussian
• Maxima and minima of the scale-normalized Laplacian produce the most stable image features compared to the gradient, Hessian, or Harris corner function (Mikolajczyk 2002)

SIFT - Scale Invariant Feature Transform [3]
• Scale-space image set
  • Divide each octave into s intervals
  • Compute s + 3 filtered (increasingly blurred) images, with k = 2^(1/s)
  • Example: s = 3, k ≈ 1.26
    • 1st -> 1.00σ
    • 2nd -> 1.26σ
    • 3rd -> 1.59σ
    • 4th -> 2.00σ
    • 5th -> 2.52σ
    • 6th -> 3.17σ
  • Subtract adjacent images to produce the DoG images
  • Repeat for the next octave, starting from the image with twice the initial σ (two images down from the top of the stack) and decimating it by 2

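The blur schedule for one octave follows directly from k = 2^(1/s). This sketch lists the σ multipliers for the s + 3 images; the function name and the base value sigma0 = 1.0 are assumptions for illustration.

```python
def octave_sigmas(s, sigma0=1.0):
    """Blur levels of the s + 3 Gaussian images in one octave, with k = 2 ** (1 / s)."""
    k = 2.0 ** (1.0 / s)
    return [sigma0 * k ** i for i in range(s + 3)]


sigmas = octave_sigmas(3)

# Subtracting adjacent Gaussian images leaves one fewer DoG image: s + 2.
num_dog_images = len(sigmas) - 1

# The image at index s has exactly twice the base blur and seeds the next octave.
seed_for_next_octave = sigmas[3]
```

For s = 3 this yields six blur levels and five DoG images per octave.
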
SIFT - Scale Invariant Feature Transform [3]
• Scale-space pyramid (from Lowe)

SIFT - Scale Invariant Feature Transform [3]
• Locating scale-space extrema
  • Detect local maxima or minima of D(x, y, σ)
  • Compare each sample point to its 8 neighbors in the same scale image and to the 9 neighbors in each of the scale images above and below (26 neighbors in total)
  • Mark the sample if it is greater than or less than all of its neighbors
  • Extrema are detected at s scales per octave, since the top and bottom DoG images have only one adjacent scale

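The neighbor comparison can be sketched as a check over a 3x3x3 block of DoG samples. The nested-list layout dog[scale][row][col] and the function name are assumptions made for this example.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly above or strictly below all 26 neighbors.

    dog is a stack of DoG images indexed as dog[scale][row][col]; the sample
    must not lie on the border of the stack or of the image.
    """
    value = dog[s][y][x]
    neighbors = [dog[s + ds][y + dy][x + dx]
                 for ds in (-1, 0, 1)
                 for dy in (-1, 0, 1)
                 for dx in (-1, 0, 1)
                 if (ds, dy, dx) != (0, 0, 0)]
    return (all(value > n for n in neighbors)
            or all(value < n for n in neighbors))


# Toy 3x3x3 stacks: a flat block, one with a peak and one with a valley at the centre.
flat = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
peak = [[row[:] for row in image] for image in flat]
peak[1][1][1] = 1.0
valley = [[row[:] for row in image] for image in flat]
valley[1][1][1] = -1.0
```

Only the centre samples of the peak and valley stacks qualify; the flat block is rejected because the comparison is strict.
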
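SIFT - Scale Invariant Feature Transform [3]
• Improving localization
  • Reject points that have low contrast using: $|D(\hat{\mathbf{x}})| < \text{threshold}$
  • Where -> $D(\hat{\mathbf{x}}) = D + \frac{1}{2}\frac{\partial D}{\partial \mathbf{x}}^{T}\hat{\mathbf{x}}$
  • Setting the derivative of the Taylor expansion $D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T}\mathbf{x} + \frac{1}{2}\mathbf{x}^{T}\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\mathbf{x}$ to zero gives the offset of the extremum -> $\hat{\mathbf{x}} = -\left(\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\right)^{-1}\frac{\partial D}{\partial \mathbf{x}}$
  • The Hessian and derivative of D(x, y, σ) are computed using differences of neighboring sample points; $\mathbf{x} = (x, y, \sigma)^{T}$ is the offset from the sample point
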
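A minimal sketch of the localization step, assuming the second-order Taylor fit described by Lowe [3]: the gradient and Hessian of D are estimated by finite differences on a 3x3x3 block, the offset solves H·x = -g, and the interpolated value is D + ½·gᵀx. The helper names, the cube[scale][row][col] layout and the synthetic test surface are all hypothetical.

```python
def solve3(a, b):
    """Solve the 3x3 linear system a * x = b by Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(a)
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(det(m) / d)
    return solution


def refine(cube):
    """Return ((dx, dy, ds), value) for a 3x3x3 block cube[scale][row][col]."""
    c = cube[1][1][1]
    # Gradient by central differences.
    grad = [(cube[1][1][2] - cube[1][1][0]) / 2.0,
            (cube[1][2][1] - cube[1][0][1]) / 2.0,
            (cube[2][1][1] - cube[0][1][1]) / 2.0]
    # Hessian by second differences.
    hxx = cube[1][1][2] - 2.0 * c + cube[1][1][0]
    hyy = cube[1][2][1] - 2.0 * c + cube[1][0][1]
    hss = cube[2][1][1] - 2.0 * c + cube[0][1][1]
    hxy = (cube[1][2][2] - cube[1][2][0] - cube[1][0][2] + cube[1][0][0]) / 4.0
    hxs = (cube[2][1][2] - cube[2][1][0] - cube[0][1][2] + cube[0][1][0]) / 4.0
    hys = (cube[2][2][1] - cube[2][0][1] - cube[0][2][1] + cube[0][0][1]) / 4.0
    hessian = [[hxx, hxy, hxs], [hxy, hyy, hys], [hxs, hys, hss]]
    offset = solve3(hessian, [-g for g in grad])
    value = c + 0.5 * sum(g * o for g, o in zip(grad, offset))
    return offset, value


def d(x, y, s):
    """Synthetic quadratic surface with its peak of 5.0 at (0.2, -0.1, 0.3)."""
    return 5.0 - (x - 0.2) ** 2 - (y + 0.1) ** 2 - (s - 0.3) ** 2


cube = [[[d(x, y, s) for x in (-1, 0, 1)] for y in (-1, 0, 1)] for s in (-1, 0, 1)]
offset, value = refine(cube)
low_contrast = abs(value) < 0.03  # 0.03 is the threshold Lowe reports for pixel values in [0, 1]
```

On this synthetic surface the fit recovers the true peak position and height; on real DoG data the same steps give a sub-sample estimate that is then compared against the contrast threshold.
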
SIFT - Scale Invariant Feature Transform [3]
• Edge rejection
  • Eliminate poorly defined peaks (edges) using the Hessian matrix
  • Verify that the ratio of principal curvatures is less than the threshold, r < 10
  • Efficient to compute -> fewer than 20 floating point operations

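The curvature-ratio test does not need the eigenvalues themselves. Following Lowe [3], it can be written with the trace and determinant of the 2x2 Hessian as Tr(H)² / Det(H) < (r + 1)² / r. The function name and the sample Hessian entries in this sketch are hypothetical.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint only if its principal-curvature ratio is below r.

    dxx, dyy, dxy are the entries of the 2x2 spatial Hessian of the DoG image
    at the keypoint, estimated from differences of neighboring samples.
    """
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0.0:
        # Curvatures of opposite sign (or a zero curvature): not a well-defined peak.
        return False
    return trace * trace / det < (r + 1.0) ** 2 / r


blob = passes_edge_test(-2.0, -2.0, 0.0)    # equal curvatures, ratio 1
edge = passes_edge_test(-20.0, -1.0, 0.0)   # curvature ratio 20
saddle = passes_edge_test(2.0, -2.0, 0.0)   # curvatures of opposite sign
```

Only the blob-like point is kept; the edge-like and saddle points are rejected.
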
SIFT - Scale Invariant Feature Transform [3]
• Results from Lowe [3]: 832 keypoints reduced to 536 (233x189 image)

SIFT - Scale Invariant Feature Transform
• Results from Lowe [3]: performance measures

SIFT - Scale Invariant Feature Transform [3]
• Orientation: rotational invariance
  • Use the scale of the keypoint to select the image L(x, y, σ)
  • Compute the gradient magnitude m(x, y) and orientation θ(x, y) at each image sample using pixel differences
  • Build an orientation histogram of the sample points: entries are weighted by gradient magnitude and by a Gaussian window around the keypoint, and the bins cover the 360° range
  • Peaks in the histogram correspond to dominant directions of the local gradients

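A simplified sketch of the orientation histogram: gradient magnitude and orientation come from pixel differences, and each sample votes into a bin with its magnitude as weight. The Gaussian window is omitted for brevity, and the function name, the 36-bin default and the toy image are assumptions made for this example.

```python
import math


def orientation_histogram(image, bins=36):
    """Magnitude-weighted histogram of gradient orientations over interior pixels.

    image is a list of rows of intensities. The Gaussian weighting around the
    keypoint is left out of this sketch.
    """
    hist = [0.0] * bins
    bin_width = 360.0 / bins
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            dx = image[y][x + 1] - image[y][x - 1]
            dy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(dx, dy)
            theta = math.degrees(math.atan2(dy, dx)) % 360.0
            hist[int(theta // bin_width) % bins] += magnitude
    return hist


# Toy 5x5 image whose intensity increases from left to right.
ramp = [[float(x) for x in range(5)] for _ in range(5)]
hist = orientation_histogram(ramp)
dominant_bin = hist.index(max(hist))
```

The index of the largest bin gives the dominant direction; for the left-to-right ramp every vote lands in the bin that contains 0°.
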
SIFT - Scale Invariant Feature Transform [3]
• Descriptor: the feature vector
  • 8x8 sub-region histograms allow shifts in gradient positions
  • 128 element feature vector -> 4x4 array of histograms with 8 orientations each (the figure from Lowe shows a 2x2x8 example)
  • Feature vectors are matched by nearest neighbor (Euclidean distance)

SIFT - Scale Invariant Feature Transform [3]
• Results from Lowe [3]
  • Two training objects recognized in a cluttered image
  • Small squares show point matches
  • Large rectangles show the border of each training image after affine transformation

Conclusions
• Conclusions
  • The Harris-Laplace region detector performs better than the Laplacian, DoG and gradient scale-space operators
  • Scale-space detectors provide invariance to rotation, scale and small changes in illumination and viewpoint
  • Affine adaptation provides invariance to affine transformations
  • GLOH and SIFT descriptors provide the best performance
  • Dense, localized descriptors perform well under occlusions
• Next steps
  • Coding and testing of region detectors, descriptors and matching…