
Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification

Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik





Presentation Transcript


  1. Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik

  2. Goal

  3. Nearest neighbor classification with a distance function D(·, ·) between images
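The nearest-neighbor scheme above can be sketched with a pluggable distance function; all names here are illustrative, and plain Euclidean distance stands in for the learned D(·, ·):

```python
import numpy as np

def nearest_neighbor_label(query, gallery, labels, dist):
    """Classify `query` with the label of its nearest gallery image."""
    d = np.array([dist(query, g) for g in gallery])
    return labels[int(np.argmin(d))]

# Toy gallery with two classes; Euclidean distance stands in for D(., .).
gallery = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
labels = ["faces", "cars"]
euclid = lambda a, b: float(np.linalg.norm(a - b))
print(nearest_neighbor_label(np.array([0.5, 0.2]), gallery, labels, euclid))  # prints "faces"
```

Swapping `euclid` for a learned distance changes the classifier without touching the retrieval loop.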

  4. Learning a Distance Metric from Relative Comparisons [Schultz &amp; Joachims, NIPS ’03]: given comparisons of the form D(a, b) < D(a, c), learn a Mahalanobis-style distance D(x, y) = (x − y)^T A (x − y)
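A minimal sketch of such a parameterized distance and the relative comparisons it must satisfy; the matrix A and all values are illustrative, not learned:

```python
import numpy as np

def mahalanobis(x, y, A):
    """D_A(x, y) = (x - y)^T A (x - y); A should be PSD for a valid (pseudo)metric."""
    d = x - y
    return float(d @ A @ d)

# A relative comparison "x is closer to y than to z" holds when
# D_A(x, y) < D_A(x, z). A learned A re-weights (or rotates) dimensions.
A = np.diag([1.0, 10.0])
x = np.array([0.0, 0.0])
y = np.array([1.0, 0.1])
z = np.array([0.2, 1.0])
print(mahalanobis(x, y, A) < mahalanobis(x, z, A))  # prints True
```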

  5. Approach image i image j

  6. Approach image i dji,m image j

  7. Approach: D_ji = Σ_m w_j,m · d_ji,m, the weighted distance from image j to image i

  8. Approach: require D_ji < D_ki for same-class image j and different-class image k

  9. Core question: how do we learn the weights w_j,m so that D_ji < D_ki holds?

  10. Derivations • Notation • Large-margin formulation • Dual problem • Solution

  11. Notation for triplet (i, j, k):
  D_ji = Σ_m w_j,m · d_ji,m, i.e. D_ji = w_j · d_ji
  The constraint D_ki > D_ji becomes w_k · d_ki > w_j · d_ji, enforced with a unit margin: w_k · d_ki − w_j · d_ji ≥ 1
  Stacking the per-image weights as W = [w_1 w_2 … w_k … w_j …] and setting X_ijk = [0 0 … d_ki … −d_ji …], the constraint reads W · X_ijk ≥ 1
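The stacked-vector trick above can be checked numerically; the weights and elementary distances below are made-up illustrative values (three features per image):

```python
import numpy as np

# Illustrative per-image weights and elementary distances (3 features each).
w_j = np.array([0.5, 1.0, 0.2]); d_ji = np.array([0.1, 0.2, 0.3])
w_k = np.array([1.0, 1.0, 1.0]); d_ki = np.array([0.9, 0.8, 0.7])

D_ji = w_j @ d_ji          # D_ji = sum_m w_j,m * d_ji,m
D_ki = w_k @ d_ki

# Stack per-image weights into one long W; X_ijk is zero everywhere
# except d_ki in image k's slot and -d_ji in image j's slot.
W = np.concatenate([w_k, w_j])
X_ijk = np.concatenate([d_ki, -d_ji])
margin = W @ X_ijk         # equals D_ki - D_ji; the constraint asks margin >= 1
```

Writing every triplet constraint as W · X_ijk ≥ 1 is what turns the problem into a standard large-margin (SVM-style) formulation over the single long vector W.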

  12. Large-margin formulation

  13.–16. SVM

  17. Soft-margin SVM

  18. Derivation

  19. Dual

  20. Details – Features and descriptors • Find ~400 features per image • Compute geometric blur descriptor

  21. Descriptors • Geometric blur

  22. Descriptors • Two sizes of geometric blur (42 pixels and 70 pixels) • Each is 204 dimensions (4 orientations and 51 samples each) • HSV histograms of 42-pixel patches
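The descriptor sizes on the slide imply the following array shapes; this is a shape-only sketch, and the actual sampling pattern and blur computation are not reproduced:

```python
import numpy as np

n_orientations, n_samples = 4, 51
descriptor_dim = n_orientations * n_samples   # 204, matching the slide

# One geometric blur descriptor per feature point, at two patch sizes.
desc_42px = np.zeros(descriptor_dim)
desc_70px = np.zeros(descriptor_dim)
```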

  23. Choosing triplets • Caltech101 at 15 images per class gives 31.8 million triplets • Many are easy to satisfy • For each image j, for each feature • Find the N images I with the closest features • For each negative example i in I, form triplets (j, k, i) • Eliminates ~half of the triplets
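The selection step above can be sketched as follows. This is a simplified stand-in: it assumes one feature vector per image (the paper uses ~400 features per image) and illustrative function and parameter names:

```python
import numpy as np

def choose_triplets(features, labels, n_close=2):
    """For each image j, keep only triplets whose different-class negative i
    is among the images with the closest features, dropping triplets that
    are already easy to satisfy."""
    triplets = []
    n = len(features)
    for j in range(n):
        d = np.linalg.norm(features - features[j], axis=1)
        d[j] = np.inf                       # never pick j itself
        close = np.argsort(d)[:n_close]     # hardest candidate negatives
        negatives = [i for i in close if labels[i] != labels[j]]
        same = [k for k in range(n) if k != j and labels[k] == labels[j]]
        for i in negatives:
            for k in same:
                triplets.append((j, k, i))  # j anchors, k same class, i negative
    return triplets
```

Only negatives that are already close in feature space survive, which is what discards roughly half of the 31.8 million raw triplets.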

  24. Choosing C

  25. Choosing C • Train with multiple values of C, testing on a held-out part of the training set • For each C, run an online version of the training algorithm • Make one sweep through the training triplets • For each misclassified triplet (i, j, k), update the weights for the three images • Choose the C that gets the most triplets right
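The online sweep can be sketched as below. The update rule is a hypothetical perceptron-style stand-in, since the slide does not show the actual update, and C itself enters the real soft-margin training (omitted here); the sweep just counts violations per candidate C:

```python
import numpy as np

def online_sweep(triplets, d, n_images, n_feats, lr=0.1):
    """One pass over triplets (i, j, k): count violations of
    w_k . d_ki - w_j . d_ji >= 1 and nudge the weights of the images
    involved (hypothetical update rule). d[(a, b)] holds the elementary
    feature distances between images a and b."""
    W = np.ones((n_images, n_feats))
    mistakes = 0
    for (i, j, k) in triplets:
        margin = W[k] @ d[(k, i)] - W[j] @ d[(j, i)]
        if margin < 1.0:                # triplet misclassified
            mistakes += 1
            W[k] += lr * d[(k, i)]      # push the dissimilar pair apart
            W[j] = np.maximum(W[j] - lr * d[(j, i)], 0.0)  # keep weights >= 0
    return W, mistakes
```

Model selection then repeats this sweep on held-out triplets for each candidate C and keeps the C with the fewest mistakes.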

  26. Results • At 15 training examples per class: 63.2% (~3% improvement) • At 20 training examples per class: 66.6% (~5% improvement)

  27. Results • Confusion matrix Hardest categories: crocodile, cougar_body, cannon, bass

  28. Questions • Is there any disadvantage to a non-metric distance function? • Could the images be embedded in a metric space? • Why not learn everything? • Include a feature for each image pixel • Include multiple types of descriptors • Could this be used to do unsupervised learning on sets of tagged images (e.g., for image segmentation)? • Can you learn a single distance per class?
