Learning Local Image Descriptors

Matthew Brown, University of British Columbia (previously Microsoft Research) [ Collaborators: †Simon Winder, *Gang Hua, †Rick Szeliski; † = MS Research, * = MS Live Labs ]

Applications @MSFT • Panoramic Stitching: Digital Image Pro, Windows Live Photogallery, Expression, HDView • 3D Modelling: Photosynth, Virtual Earth • Location Recognition • Image Search: Lincoln [ slide colour key: yellow = product, white = technology preview, grey = research ]

Photosynth [ http://labs.live.com/photosynth ]

Photo Tourism [ http://photour.cs.washington.edu ] Input photographs → Scene reconstruction (sparse correspondence, relative camera positions and orientations, point cloud) → Photo Explorer [ Slide credit: Noah Snavely ] • Photosynth is based on Photo Tourism [ Snavely, Seitz, Szeliski, SIGGRAPH 2006 ] • Photo Tourism uses SIFT for correspondence

multiview stereo = training data [ Seitz et al CVPR 2006, Goesele et al ICCV 2007 ]

Learning Image Features: 3D Point Cloud [ Photo Tourism – Snavely, Seitz, Szeliski, SIGGRAPH 2006 ]

Problem Statement: Find a function of a local image patch, descriptor = f(patch), such that a nearest neighbour classifier† is optimal* († = for simplicity + efficiency, * = measured by ROC curve). Q: What form should the descriptor function f(.) take?

Descriptor Algorithms: Normalized Image Patch → Gradients Quantized to k Orientations → Summation → Normalize → Descriptor Vector [ SIFT – Lowe, ICCV 1999 ]

Descriptor Algorithms: Normalized Image Patch → Gradients Quantized to k Orientations → Summation → Normalize (plus PCA) → Descriptor Vector [ GLOH – Mikolajczyk, Schmid, PAMI 2005 ]

Descriptor Algorithms: Normalized Image Patch → Create Edge Map → Summation → Normalize → Descriptor Vector [ Shape Context – Belongie, Malik, Puzicha, NIPS 2000 ]

Descriptor Algorithms: Normalized Image Patch → T (Feature Detector) → S (Summation) → N (Normalize) → Descriptor Vector [ Geometric Blur – Berg, Malik, CVPR 2001 ]

Our Contribution: Normalized Image Patch → T → S → N → Descriptor Vector, with Parameters for each block • Propose a framework for descriptor algorithms • Learn parameters to find best performance • Train on a ground truth data set based on accurate 3D matches

T-blocks: Normalized Image Patch (w x h) → T → (w x h x k) → S → N → Descriptor Vector • Transformation block • Output: one length-k vector per source pixel • Local gradients • Steerable filters • Isotropic filters • Haar wavelets • Local classifier • Quantized intensities
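As a rough illustration of one T-block option from the list above (local gradients quantized to k orientations), the following sketch maps a patch to one length-k vector per pixel. The function name `gradient_t_block`, the central-difference gradients, and the hard nearest-bin quantization are assumptions made for illustration, not the exact scheme used in the talk.

```python
import math

def gradient_t_block(patch, k=4):
    """Map an h x w patch (list of rows) to an h x w x k nested list.

    Each pixel's gradient magnitude is placed in the orientation bin
    nearest to its gradient direction (hard quantization, an assumption).
    """
    h, w = len(patch), len(patch[0])
    out = [[[0.0] * k for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences, clamping indices at the patch border.
            gx = patch[y][min(x + 1, w - 1)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, h - 1)][x] - patch[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue  # Flat region: leave all k bins at zero.
            angle = math.atan2(gy, gx) % (2 * math.pi)
            bin_index = int(round(angle / (2 * math.pi / k))) % k
            out[y][x][bin_index] = mag
    return out
```

Calling `gradient_t_block(patch, k=4)` on a 3 x 3 patch returns a 3 x 3 x 4 structure. A soft assignment that splits each magnitude between the two neighbouring bins would be a natural variation.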

S-Blocks: Normalized Image Patch (w x h) → T → (w x h x k) → S → (m x k) → N → Descriptor Vector • Spatial summation block with m regions (variants S1, S2, S3, S4) • Output: m length-k vectors
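A minimal sketch of a spatial summation step, assuming a plain rectangular lattice with hard region boundaries. The function name `grid_s_block` and the `grid` parameter are hypothetical. The polar and Gaussian-weighted layouts mentioned elsewhere in the talk are not reproduced here.

```python
def grid_s_block(features, grid=2):
    """Sum an h x w x k feature array over a grid x grid lattice.

    Returns m = grid * grid vectors of length k, one per rectangular region.
    """
    h, w, k = len(features), len(features[0]), len(features[0][0])
    regions = [[0.0] * k for _ in range(grid * grid)]
    for y in range(h):
        for x in range(w):
            # Index of the lattice cell that contains pixel (x, y).
            cell = (y * grid // h) * grid + (x * grid // w)
            for c in range(k):
                regions[cell][c] += features[y][x][c]
    return regions
```

With `grid=2` this gives m = 4 regions, so a w x h x k input becomes a 4 x k output.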

N-Blocks: Normalized Image Patch (w x h) → T → (w x h x k) → S → (m x k) → N → (m x k) Descriptor Vector • Normalization block • Unit normalization • SIFT normalization with clipping
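The two normalization options can be sketched as follows for a flattened descriptor. The function names are hypothetical, and the default `clip=0.2` is the value commonly quoted for SIFT, used here as an assumed default rather than a parameter taken from the talk.

```python
import math

def unit_normalize(vec):
    """Scale vec to unit Euclidean length (all-zero vectors are returned as-is)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def sift_normalize(vec, clip=0.2):
    """Unit-normalize, clip large entries at `clip`, then unit-normalize again."""
    clipped = [min(v, clip) for v in unit_normalize(vec)]
    return unit_normalize(clipped)
```

Clipping limits the influence of a few very large entries, so the second normalization spreads weight more evenly across the descriptor.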

Learning Descriptors T S N

Learning Descriptors: Training Pairs → T1a → S2 → N2 → Descriptor Distances → ROC curve (Correct Match % vs Incorrect Match %) → Update Parameters (Powell) → Parameters
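The learning loop is derivative-free: parameters are adjusted, the descriptors are recomputed, and the resulting error rate is used as the objective. The sketch below is a simplified coordinate search standing in for Powell's method, not Powell's method itself. The function name `coordinate_search` and the toy quadratic objective are assumptions for illustration; in the setting of the talk the objective would be the error rate measured on the training pairs.

```python
def coordinate_search(objective, params, step=1.0, tol=1e-3):
    """Minimise objective(params) by trying +/- step along each axis.

    A simplified derivative-free stand-in for Powell's method.
    """
    params = list(params)
    best = objective(params)
    while step > tol:
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                value = objective(trial)
                if value < best:
                    params, best, improved = trial, value, True
        if not improved:
            step /= 2.0  # Shrink the step once no axis move helps.
    return params, best

# Toy objective with its minimum at (2, -1); a placeholder for an error rate.
found, value = coordinate_search(lambda p: (p[0] - 2) ** 2 + (p[1] + 1) ** 2, [0.0, 0.0])
```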

Testing Descriptors: Test Pairs → T1a → S2 → N2 (using the learned Parameters) → Descriptor Distances → ROC curve (Correct Match % vs Incorrect Match %) → Final Error Rate, read off at 95% correct matches
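One way to read a single error figure off the ROC curve is to fix the correct-match rate and report the fraction of non-matching pairs accepted at that threshold. The sketch below assumes pairs are declared matches when their descriptor distance is at or below a threshold. The function name `error_rate_at_recall` and its arguments are hypothetical.

```python
import math

def error_rate_at_recall(match_dists, nonmatch_dists, recall=0.95):
    """Incorrect-match fraction at the threshold accepting `recall` of true matches."""
    ordered = sorted(match_dists)
    # Smallest threshold that accepts at least `recall` of the true matches.
    index = max(0, math.ceil(recall * len(ordered)) - 1)
    threshold = ordered[index]
    false_accepts = sum(1 for d in nonmatch_dists if d <= threshold)
    return false_accepts / len(nonmatch_dists)
```

Sweeping `recall` from 0 to 1 and plotting the returned value against it would trace out the full ROC curve.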

Example of Parameter Learning

Results: Changing T-Blocks (k=4) • Polar lattice S2 always has a lower error rate than rectangular S1 • Gradient and DoG with S2 beat our SIFT reference (4% vs 6% error)

Results: Changing T-Blocks (k=8)

Results: Changing T-Blocks (k=16) • Steerable filters produce great results if phase information is kept

Results: Changing S-Blocks

Results • SIFT normalization is important • Best result: 4th order steerable filters with phase information, combined with the polar S4-25 Gaussian summation block (2% error vs SIFT at 6%) • Very large numbers of dimensions

Dimension Reduction: PCA [ figure label: w_PCA ]

Dimension Reduction: LDA [ figure label: w_LDA ]
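For readers who want the general shape of a discriminative projection objective, one common form for matching and non-matching pairs is written out below. The symbols are introduced here for illustration ($\mathbf{w}$ is a projection direction, $\mathbf{x}_i$ an input vector, $\mathcal{M}$ and $\mathcal{N}$ the sets of matching and non-matching pairs), and this is not necessarily the exact objective used in the talk.

```latex
J(\mathbf{w}) =
\frac{\sum_{(i,j) \in \mathcal{N}} \left( \mathbf{w}^{\top} (\mathbf{x}_i - \mathbf{x}_j) \right)^2}
     {\sum_{(i,j) \in \mathcal{M}} \left( \mathbf{w}^{\top} (\mathbf{x}_i - \mathbf{x}_j) \right)^2}
```

Maximizing $J$ favours directions along which non-matching pairs are far apart and matching pairs stay close. PCA, by contrast, maximizes projected variance without using the match labels.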

Results: LDA on patches (panels: normalised patches, gradient patches) • LDA on pixels ≈ SIFT (6%) • PCA gave small improvement

Results: LDA on patches Effect of # of Training Pairs • LDA on pixels ≈ SIFT (6%) • PCA gave small improvement • Need ~100,000 training examples

Results: LDA on T blocks (panels: T1, T3) • LDA on T1-T3 < 4.5% • Optimal #dimensions ~20-30 • Post-normalisation important

Results: LDA on T blocks LDA using T blocks T1–T4 • LDA on T1-T3 < 4.5% • Optimal #dimensions ~20-30 • Post-normalisation important

Results: LDA on descriptors (LDA using CVPR 07 descriptors) • Overall best results • #dimensions reduced from 100s to 10s • Need more challenging dataset!

Discussion: Image Descriptors: Normalized Image Patch → T (Feature Detector, “simple”) → S (Summation, “complex”) → N (Normalize) → Descriptor Vector

Conclusions • Used learning to obtain good descriptors • Achieved error rates 1/3 of SIFT • Produced useful ground truth data set. Future Work • Use multi-view stereo ground truth • Multi-level simple-complex architecture + non-parametric T blocks • Learn interest point detectors [ refs: 1) Winder, Brown, CVPR 2007; 2) Hua, Brown, Winder, ICCV 2007 ] mbrown@cs.ubc.ca

HDView [http://research.microsoft.com/ivm/hdview.htm ]
