
Fast and Compact Retrieval Methods in Computer Vision Part II

A. Torralba, R. Fergus and Y. Weiss. Small Codes and Large Image Databases for Recognition. CVPR 2008. A. Torralba, R. Fergus, W. Freeman. 80 million tiny images: a large dataset for non-parametric object and scene recognition. TR.


Presentation Transcript


  1. Fast and Compact Retrieval Methods in Computer Vision Part II • A. Torralba, R. Fergus and Y. Weiss. Small Codes and Large Image Databases for Recognition. CVPR 2008 • A. Torralba, R. Fergus, W. Freeman. 80 million tiny images: a large dataset for non-parametric object and scene recognition. TR • Presented by Ken and Ryan

  2. Outline • Large Datasets of Images • Searching Large Datasets • Nearest Neighbor • ANN: Locality Sensitive Hashing • Dimensionality Reduction • Boosting • Restricted Boltzmann Machines (RBM) • Results

  3. Goal • Develop efficient image search and scene matching techniques that are fast and require very little memory • Particularly on VERY large image sets

  4. Motivation • Image sets • Vogel & Schiele: 702 natural scenes in 6 categories • Oliva & Torralba: 2688 • Caltech 101: ~50 images/category ~ 5000 • Caltech 256: 80-800 images/category ~ 30608 • Why do we want larger datasets?

  5. Motivation • Classify any image • Complex classification methods don’t extend well • Can we use a simple classification method?

  6. Thumbnail Collection Project • Collect images for ALL objects • List obtained from WordNet • 75,378 non-abstract nouns in English

  7. Thumbnail Collection Project • Collected 80M images • http://people.csail.mit.edu/torralba/tinyimages

  8. How Much is 80M Images? • One feature-length movie: • 105 min = 151K frames @ 24 FPS • For 80M images, watch 530 movies • How do we store this? • 1k * 80M = 80 GB • Actual storage: 760GB
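
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: it reads the slide's "1k" as 1,000 bytes per image, which is an assumption, and the constant names are illustrative.

```python
# Back-of-the-envelope check of the numbers on the slide.
FPS = 24
MOVIE_MINUTES = 105
NUM_IMAGES = 80_000_000
BYTES_PER_IMAGE = 1_000  # assumption: the slide's "1k" means 1,000 bytes

frames_per_movie = MOVIE_MINUTES * 60 * FPS            # 151,200 frames
movies_needed = NUM_IMAGES / frames_per_movie          # about 529 movies
naive_storage_gb = NUM_IMAGES * BYTES_PER_IMAGE / 1e9  # 80 GB

print(frames_per_movie, round(movies_needed), naive_storage_gb)
```

The movie count comes out at roughly 529, which the slide rounds to 530.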

  9. First Attempt • Store each image as 32x32 color thumbnail • Based on human visual perception • Information: 32*32*3 channels = 3072 entries

  10. First Attempt • Used SSD++ to find nearest neighbors of query image • Used first 19 principal components
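
A minimal sketch of this kind of search, assuming plain sum of squared differences (SSD) rather than the SSD++ variant named on the slide, and using synthetic random vectors in place of real thumbnails. The function names `pca_project` and `ssd_nearest` are illustrative, not from the papers.

```python
import numpy as np

def pca_project(X, k):
    """Project rows of X onto their first k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    return Xc @ components.T, mean, components

def ssd_nearest(query_code, db_codes, num_neighbors=5):
    """Return indices and SSD values of the closest database codes."""
    ssd = ((db_codes - query_code) ** 2).sum(axis=1)
    order = np.argsort(ssd)[:num_neighbors]
    return order, ssd[order]

rng = np.random.default_rng(0)
images = rng.random((200, 3072))      # synthetic stand-in for 32x32x3 thumbnails
codes, mean, components = pca_project(images, k=19)

query = images[7]                     # query with an image already in the set
query_code = (query - mean) @ components.T
idx, dists = ssd_nearest(query_code, codes)
print(idx[0], dists[0])               # the query should match itself first
```

Comparing 19 coefficients per image instead of 3072 pixel values is what makes the brute-force scan affordable.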

  11. Motivation Part 2 • Is this good enough? • SSD is naïve • Still too much storage required • How can we fix this? • Traditional methods of searching large datasets • Binary reduction

  12. Locality-Sensitive Hash Families

  13. LSH Example

  14. Binary Reduction • Lots of pixels → Gist vector (512 values) → Binary reduction (32 bits) • 80 million images? 164 GB → 320 MB
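
The two storage figures on this slide are consistent with 512 values per image stored as 4-byte floats versus a 32-bit code per image. The 4-byte float size is an assumption, since the slide does not state it.

```python
NUM_IMAGES = 80_000_000
GIST_VALUES = 512
BYTES_PER_FLOAT = 4   # assumption: Gist values stored as 32-bit floats
CODE_BITS = 32

gist_gb = NUM_IMAGES * GIST_VALUES * BYTES_PER_FLOAT / 1e9  # about 164 GB
code_mb = NUM_IMAGES * CODE_BITS / 8 / 1e6                  # 320 MB

print(gist_gb, code_mb)
```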

  15. Gist “The ‘gist’ is an abstract representation of the scene that spontaneously activates memory representations of scene categories (a city, a mountain, etc.)” A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. Journal of Computer Vision, 42(3):145–175, 2001.

  16. Gist

  17. Gist vector • http://ilab.usc.edu/siagian/Research/Gist/Gist.html

  18. Querying • Dataset • Query Image

  19. Querying • ≈ ? 1

  20. Querying • ≈ ? 6

  21. Querying

  22. Boosting • Positive and negative image pairs are used to train the binary reduction. • 150K pairs • 80% negatives • Labels: 1 for positive pairs, -1 for negative pairs

  23. BoostSSC • Similarity Sensitive Coding • Each training example xi (N values) has a weight • Weights start uniformly

  24. BoostSSC • Feature vector x from image i (N values) • Binary reduction h(x) (M bits) • For each bit m: • Choose the index n that minimizes a weighted error across the entire training set

  25. BoostSSC • Weak classifications on a pair xi, xj are evaluated via regression stumps on feature index n. • If xi and xj are similar, we should get 1 for most n’s. • We need to figure out a, b, and T for each n.

  26. BoostSSC • Try a range of thresholds T: • Regress f across the entire training set to find each a and b. • Keep the T that fits the best. • Then, keep the n that causes the least weighted error.
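
The search over T, a, b and n described above can be sketched as follows. The stump form used here, f = a·[xi and xj fall on the same side of T] + b, is an assumption (the slide's equation is not in the transcript), as are the function names and the toy data.

```python
import numpy as np

def fit_stump(xi_n, xj_n, z, w, thresholds):
    """Fit f = a * [same side of T] + b for one feature index n.

    Returns (weighted squared error, a, b, T) for the best threshold.
    """
    best = None
    for T in thresholds:
        same = (xi_n > T) == (xj_n > T)
        w_same, w_diff = w[same].sum(), w[~same].sum()
        # Weighted least squares for a two-level fit: each level is the
        # weighted mean of the labels in its group.
        mean_same = (w[same] * z[same]).sum() / w_same if w_same > 0 else 0.0
        mean_diff = (w[~same] * z[~same]).sum() / w_diff if w_diff > 0 else 0.0
        b = mean_diff
        a = mean_same - mean_diff
        f = np.where(same, a + b, b)
        err = (w * (z - f) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, a, b, T)
    return best

def best_feature(Xi, Xj, z, w, thresholds):
    """Pick the feature index n whose best stump has the least weighted error."""
    fits = [fit_stump(Xi[:, n], Xj[:, n], z, w, thresholds)
            for n in range(Xi.shape[1])]
    n = int(np.argmin([fit[0] for fit in fits]))
    err, a, b, T = fits[n]
    return n, a, b, T, err

# Toy pairs: similarity is decided entirely by feature 2 at threshold 0.5.
rng = np.random.default_rng(0)
Xi, Xj = rng.random((300, 4)), rng.random((300, 4))
z = np.where((Xi[:, 2] > 0.5) == (Xj[:, 2] > 0.5), 1, -1)
w = np.full(300, 1 / 300)                      # uniform starting weights
n, a, b, T, err = best_feature(Xi, Xj, z, w, np.linspace(0.1, 0.9, 9))
print(n, a, b, T, err)
```

On this toy set the search should recover the planted feature index and threshold, which is a quick sanity check for any real implementation.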

  27. BoostSSC • Figure: for the pair xi, xj, feature index n (of N values) is assigned to bit m (of M bits)

  28. BoostSSC • Update weights. • Affects future error calculations
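
The slide does not give the update rule. A common choice in boosting, assumed here purely for illustration, is exponential reweighting w ← w·exp(−z·f) followed by normalization, so that pairs the current stump gets wrong count more in the next round.

```python
import numpy as np

def update_weights(w, z, f):
    """Exponential reweighting (an assumed GentleBoost-style rule)."""
    w_new = w * np.exp(-z * f)
    return w_new / w_new.sum()

w = np.full(4, 0.25)                  # uniform starting weights
z = np.array([1, 1, -1, -1])          # pair labels
f = np.array([1.0, -1.0, -1.0, 1.0])  # stump outputs; pairs 1 and 3 are wrong
w_next = update_weights(w, z, f)
print(w_next)
```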

  29. BoostSSC • In the end, each bit has an n index and a threshold.
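
Once every bit has an index and a threshold, encoding an image is a single comparison per bit. A minimal sketch with made-up indices and thresholds (not learned values):

```python
import numpy as np

def encode(X, indices, thresholds):
    """Bit m of each code is 1 when feature indices[m] exceeds thresholds[m]."""
    return (X[:, indices] > thresholds).astype(np.uint8)

def hamming(code_a, code_b):
    """Number of bit positions where two codes differ."""
    return int(np.count_nonzero(code_a != code_b))

X = np.array([[0.9, 0.1, 0.4],
              [0.8, 0.7, 0.2]])               # two toy feature vectors, N = 3
indices = np.array([0, 1, 0, 2])              # hypothetical n for each of M = 4 bits
thresholds = np.array([0.5, 0.5, 0.85, 0.3])  # hypothetical T for each bit
codes = encode(X, indices, thresholds)
print(codes, hamming(codes[0], codes[1]))
```

Note that the same feature index may be used by more than one bit with different thresholds, as index 0 is here.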

  30. BoostSSC

  31. Restricted Boltzmann Machine (RBM) Architecture • Network of binary stochastic units • Hinton & Salakhutdinov, Science 2006 • Parameters: • w: Symmetric Weights • b: Biases • h: Hidden Units • v: Visible Units

  32. Multi-Layer RBM Architecture

  33. Training RBM Models • Two phases • Pre-training • Unsupervised • Use Contrastive Divergence to learn weights and biases • Gets parameters in the right ballpark • Fine-tuning • Supervised • No longer stochastic • Backpropagate error to update parameters • Moves parameters to a local minimum
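
The pre-training phase can be illustrated with a single-layer CD-1 update. This is a generic sketch of one-step contrastive divergence on toy binary data, not the authors' training code; the learning rate, layer sizes and data are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, rng, lr=0.1):
    """One contrastive-divergence (CD-1) update on a batch of binary vectors."""
    # Positive phase: hidden probabilities and a stochastic binary sample.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step back to the visible layer and up again.
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)
    batch = v0.shape[0]
    # Move toward data statistics and away from reconstruction statistics.
    W = W + lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / batch
    b_vis = b_vis + lr * (v0 - p_v1).mean(axis=0)
    b_hid = b_hid + lr * (p_h0 - p_h1).mean(axis=0)
    recon_error = float(((v0 - p_v1) ** 2).mean())
    return W, b_vis, b_hid, recon_error

rng = np.random.default_rng(0)
patterns = np.array([[1, 1, 1, 0, 0, 0],
                     [1, 1, 0, 0, 0, 0]], dtype=float)
data = np.tile(patterns, (10, 1))              # 20 toy binary vectors
W = 0.01 * rng.standard_normal((6, 3))         # 6 visible units, 3 hidden units
b_vis, b_hid = np.zeros(6), np.zeros(3)

errors = []
for _ in range(200):
    W, b_vis, b_hid, err = cd1_step(data, W, b_vis, b_hid, rng)
    errors.append(err)
print(errors[0], errors[-1])                   # reconstruction error should fall
```

In a multi-layer setup, the hidden activations of one trained layer would serve as the visible data for the next, and the stacked weights would then be fine-tuned deterministically with backpropagation.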

  34. Greedy Pre-training (Unsupervised)

  35. Greedy Pre-training (Unsupervised)

  36. Greedy Pre-training (Unsupervised)

  37. Neighborhood Components Analysis • Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004 • Output of RBM • W are RBM weights

  38. Neighborhood Components Analysis • Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004 • Assume K=2 classes

  39. Neighborhood Components Analysis • Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004 • Pulls nearby points of same class closer

  40. Neighborhood Components Analysis • Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004 • Pulls nearby points of same class closer • Goal is to preserve neighborhood structure of original, high-dimensional space
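
The equations on these slides did not survive in the transcript. As a sketch of the standard NCA objective, assuming a softmax over negative squared distances between network outputs, each point picks a neighbor with probability p_ij and the objective sums the probability mass that lands on same-class neighbors. The array `Z` stands in for the RBM outputs.

```python
import numpy as np

def nca_objective(Z, labels):
    """Sum over points of the probability of picking a same-class neighbor."""
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sq, np.inf)                 # a point never picks itself
    logits = -sq
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)            # row i holds p_ij over j
    same = labels[:, None] == labels[None, :]
    return float((P * same).sum()), P

# Two tight, well-separated clusters (K = 2 classes).
Z = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
good, P = nca_objective(Z, np.array([0, 0, 1, 1]))  # labels match clusters
bad, _ = nca_objective(Z, np.array([0, 1, 0, 1]))   # labels split clusters
print(good, bad)
```

Maximizing this quantity with respect to the network weights is what pulls nearby same-class points together in the code space.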

  41. Experiments and Results

  42. Searching • Bit limitations: • Hashing scheme: • Max. capacity for 13M images: 30 bits • Exhaustive search: • 256 bits possible
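
With a hashing scheme, the binary code is used directly as a memory address, and neighbors are found by probing every address within a small Hamming radius of the query. A minimal sketch using a Python dict as the table; the function names are illustrative.

```python
from itertools import combinations

def hamming_ball(code, num_bits, radius):
    """Yield every num_bits-wide integer within `radius` bit flips of `code`."""
    for r in range(radius + 1):
        for positions in combinations(range(num_bits), r):
            flipped = code
            for p in positions:
                flipped ^= 1 << p
            yield flipped

def build_table(codes):
    """Map each code (used as an address) to the image ids stored there."""
    table = {}
    for image_id, code in enumerate(codes):
        table.setdefault(code, []).append(image_id)
    return table

def lookup(table, query_code, num_bits, radius):
    """Collect image ids at every address inside the Hamming ball."""
    hits = []
    for address in hamming_ball(query_code, num_bits, radius):
        hits.extend(table.get(address, []))
    return sorted(hits)

codes = [0b0000, 0b0001, 0b0011, 0b1111, 0b0001]   # toy 4-bit codes
table = build_table(codes)
neighbors = lookup(table, 0b0000, num_bits=4, radius=1)
print(neighbors)
```

The number of probes grows quickly with both code length and radius, which is one reason a hashing scheme is limited to short codes (about 30 bits here), while exhaustive search, which simply scans every stored code, can afford longer ones.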

  43. Searching Results

  44. LabelMe Retrieval
