
The Future of Image Search


Presentation Transcript


  1. The Future of Image Search Jitendra Malik UC Berkeley

  2. The Motivation… • There are now billions of images on the web and in collections such as Flickr. • Suppose I want to find pictures of monkeys.

  3. Google Image Search -- monkey

  4. Google Image Search -- monkey

  5. Google Image Search -- monkey

  6. Google Image Search -- monkey Words are not enough…

  7. Flickr Search for tag monkey Even with humans doing the labeling, the data is extremely noisy (context, polysemy, photo sets). Tags are not enough either!

  8. Content-Based Image Retrieval circa 1990s • QBIC (IBM, 1993): color & color layout • VisualSEEk (Smith & Chang, 1996) • Walrus (Natsev et al., 1999): region color and texture • Blobworld (Carson et al., 1999) • NeTra (Ma et al., 1999)

  9. Color and Texture Models • Color features: histogram of the colors that appear in the image • Texture features: histograms of the responses of a bank of 16 filters (the image convolved with each filter)
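Neither the color quantization nor the filter bank is specified on the slide, so the sketch below uses a coarse per-channel RGB quantization and two tiny derivative filters as stand-ins for the 16-filter bank; `color_histogram`, `conv2d_valid`, and `texture_histograms` are illustrative names, not code from the talk.

```python
import numpy as np

def color_histogram(image, bins=8):
    # Color features: quantize each RGB channel into `bins` levels and
    # count how often each quantized color occurs in the image.
    quantized = (image.astype(np.int64) * bins) // 256        # per channel, in [0, bins)
    codes = (quantized[..., 0] * bins + quantized[..., 1]) * bins + quantized[..., 2]
    hist = np.bincount(codes.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()                                  # normalized histogram

def conv2d_valid(img, kernel):
    # Direct 2-D filtering (cross-correlation, valid region only);
    # slow but dependency-free.
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def texture_histograms(gray, filter_bank, bins=16):
    # Texture features: one histogram of responses per filter in the bank.
    hists = []
    for kern in filter_bank:
        resp = conv2d_valid(gray, kern)
        h, _ = np.histogram(resp, bins=bins)
        hists.append(h / max(h.sum(), 1))
    return np.array(hists)
```

A real CBIR system would use a proper filter bank (e.g. oriented Gaussian derivatives at several scales) and a perceptual color space; the structure of the features is the same.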

  10. The Semantic Gap • First-generation CBIR systems were based on color and texture; however, these do not capture what users really care about: conceptual or semantic categories. • Perception studies suggest that the most important cue to visual categorization is shape. This was ignored in earlier work (because it was hard!) • Over the last 5-10 years, we have seen rapid progress in capturing shape.

  11. The Research Program.. • Automatically generate annotations corresponding to object labels or activities in video • Combine these with other metadata such as text

  12. From Pixels to Perception (labels overlaid on an example image) Scene: outdoor, wildlife. Regions and parts: tiger, grass, water, sand; head, back, eye, legs, tail, mouth, shadow.

  13. Object Category Recognition

  14. Modeling shape variation in a category • D’Arcy Thompson: On Growth and Form, 1917 studied transformations between shapes of organisms

  15. Attneave’s Cat (1954) Line drawings convey most of the information

  16. Taxonomy and Partonomy • Taxonomy: e.g. cats are in the family Felidae, which in turn is in the class Mammalia • Recognition can be at multiple levels of categorization, or be identification at the level of specific individuals, as in faces. • Partonomy: objects have parts, which have subparts, and so on. The human body contains the head, which in turn contains the eyes. Also true of scenes. • Psychologists have argued that there is a “basic level” at which categorization is fastest (Eleanor Rosch et al). Biederman has estimated the number of basic visual categories as ~30,000 • In a partonomy, each level contributes useful information for recognition.

  17. Matching with Exemplars • Use exemplars as templates • Correspond features between query and exemplar • Evaluate similarity score Database of Templates Query Image

  18. Matching with Exemplars • Use exemplars as templates • Correspond features between query and exemplar • Evaluate similarity score Database of Templates Query Image Best matching template is a helicopter
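One minimal way to realize "evaluate similarity score" and pick the best-matching template, assuming the query and exemplars have already been reduced to descriptor histograms. The chi-squared distance is a common choice for comparing such histograms; `chi2` and `nearest_exemplar` are names invented here for illustration.

```python
import numpy as np

def chi2(p, q, eps=1e-10):
    # Chi-squared distance between two histograms, a common choice
    # for comparing descriptor histograms such as shape contexts.
    return 0.5 * float(np.sum((p - q) ** 2 / (p + q + eps)))

def nearest_exemplar(query_hist, exemplars):
    # exemplars: list of (label, histogram) templates.
    # Return the label of the template closest to the query, i.e. the
    # "best matching template" in the exemplar-matching recipe.
    label, _ = min(exemplars, key=lambda kv: chi2(query_hist, kv[1]))
    return label
```

The actual pipeline on the slide also establishes point correspondences before scoring; this sketch only shows the final compare-and-pick step.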

  19. 3D objects using multiple 2D views View selection algorithm from Belongie, Malik & Puzicha (2001)

  20. Error vs. Number of Views

  21. Three Big Ideas • Correspondence based on local shape/appearance descriptors • Deformable Template Matching • Machine learning for finding discriminative features

  22. Comparing Pointsets

  23. Shape Context • Compact representation of the distribution of points relative to each point (Belongie, Malik & Puzicha, 2001) • Count the number of points inside each log-polar bin, e.g. Count = 4, …, Count = 10
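A minimal sketch of one shape-context histogram, following the log-polar counting described in Belongie, Malik & Puzicha (2001). The bin edges and the mean-distance scale normalization below are plausible choices for illustration, not necessarily the paper's exact parameters.

```python
import numpy as np

def shape_context(points, ref_index, n_r=5, n_theta=12):
    # Log-polar histogram of where all other points lie relative to
    # points[ref_index] (after Belongie, Malik & Puzicha, 2001).
    ref = points[ref_index]
    others = np.delete(points, ref_index, axis=0)
    d = others - ref
    r = np.hypot(d[:, 0], d[:, 1])
    theta = np.arctan2(d[:, 1], d[:, 0])          # angle in (-pi, pi]
    r = r / r.mean()                              # scale normalization
    # log-spaced radial bin edges (illustrative choice of range)
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    r_bin = np.searchsorted(r_edges, r, side='right') - 1
    t_bin = ((theta + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta
    hist = np.zeros((n_r, n_theta))
    inside = (r_bin >= 0) & (r_bin < n_r)         # drop points outside radial range
    for rb, tb in zip(r_bin[inside], t_bin[inside]):
        hist[rb, tb] += 1
    return hist
```

Computing this histogram at every point of a shape gives the per-point descriptors that the matching stage puts into correspondence.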

  24. Shape Context

  25. Geometric Blur (Local Appearance Descriptor) Berg & Malik '01 • Compute sparse channels from the image • Extract a patch in each channel • Apply a spatially varying blur and sub-sample • The descriptor is robust to small affine distortions (figure: geometric blur descriptor, idealized signal)

  26. Three Big Ideas • Correspondence based on local shape/appearance descriptors • Deformable Template Matching • Machine learning for finding discriminative features

  27. Modeling shape variation in a category • D’Arcy Thompson: On Growth and Form, 1917, studied transformations between shapes of organisms

  28. Matching example (figure): model and target

  29. Handwritten Digit Recognition (test error rates) • MNIST 600,000 (with distortions): LeNet 5: 0.8%; SVM: 0.8%; Boosted LeNet 4: 0.7% • MNIST 60,000: linear: 12.0%; 40 PCA + quadratic: 3.3%; 1000 RBF + linear: 3.6%; K-NN: 5%; K-NN (deskewed): 2.4%; K-NN (tangent dist.): 1.1%; SVM: 1.1%; LeNet 5: 0.95% • MNIST 20,000: K-NN with Shape Context matching: 0.63%

  30. EZ-Gimpy Results 171 of 192 images correctly identified: 92% Example words read correctly: horse, spade, smile, join, canvas, here

  31. Three Big Ideas • Correspondence based on local shape/appearance descriptors • Deformable Template Matching • Machine learning for finding discriminative features

  32. Discriminative learning (Frome, Singer, Malik, 2006) • learn weights on patch features in training images • learn distance functions from training images to any other images • applications: browsing, retrieval, classification (figure: retrieval counts 83/400 and 79/400)

  33. Triplets • Learn from relative similarity: for a triplet (i, j, k), we want image i to be closer to image j than to image k • Compare image-to-image distances, where image-to-image distances are based on feature-to-image distances

  34. Focal-image version (figure): for focal image i, the distance vectors dij (to image j) and dik (to image k) are assembled from feature-to-image distances, and xijk is formed from their difference

  35. Large-margin formulation • slack variables, as in a soft-margin SVM • w constrained to be positive • L2 regularization
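The slide lists the ingredients without the formula; one consistent way to write the resulting optimization (my reconstruction of the Frome et al. setup, with notation matching the triplet slides) is:

```latex
\min_{w \ge 0} \;\; \frac{1}{2}\,\lVert w \rVert_2^2 \;+\; C \sum_{(i,j,k)} \xi_{ijk}
\quad \text{subject to} \quad
w^{\top} x_{ijk} \;\ge\; 1 - \xi_{ijk}, \qquad \xi_{ijk} \ge 0,
```

where $x_{ijk} = d_{ik} - d_{ij}$ is the difference of the feature-to-image distance vectors, so each constraint asks the learned distance $w^{\top} d_{ik}$ to the dissimilar image to exceed $w^{\top} d_{ij}$ to the similar image by a margin of 1; the slacks $\xi_{ijk}$ soften this exactly as in a soft-margin SVM, and $w \ge 0$ keeps the learned distance a nonnegative combination of elementary distances.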

  36. Caltech-101 [Fei-Fei et al. 04] • 102 classes, 31-300 images/class

  37. Retrieval example (figure): query image and retrieval results

  38. Caltech 101 classification results (combining classifiers does better still: Verma & Ray)

  39. With 15 training images per class: 63.2%

  40. So, what is missing? • These are isolated objects on simple backgrounds; real objects are part of scenes. • The general case has been solved for some categories, e.g. faces.

  41. Face Detection Schneiderman & Kanade (CMU), 2000… Results on various images submitted to the CMU on-line face detector

  42. Detection: Is this an X? Ask this question over and over again, varying position, scale, multiple categories… Speedups: hierarchical, early reject, feature sharing, cueing but same underlying question!

  43. Detection: Is this an X? Ask this question over and over again, varying position, scale, multiple categories… Speedups: hierarchical, early reject, feature sharing, but same underlying question!

  44. Detection: Is this an X? Ask this question over and over again, varying position, scale, multiple categories… Speedups: hierarchical, early reject, feature sharing, but same underlying question! • Boosted decision trees, cascades: + very fast evaluation; - slow training (especially multi-class) • Linear SVM: + fast evaluation; + fast training; - need to find good features • Non-linear kernelized SVM: + better classification accuracy than linear; ~ medium training; - slow evaluation

  45. Detection: Is this an X? Ask this question over and over again, varying position, scale, multiple categories… Speedups: hierarchical, early reject, feature sharing, but same underlying question! • Boosted decision trees, cascades: + very fast evaluation; - slow training (especially multi-class) • Linear SVM: + fast evaluation; + fast training; - need to find good features • Non-linear kernelized SVM (this work): + better classification accuracy than linear; ~ medium training; - slow evaluation
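The repeated "is this an X?" question is a sliding-window loop over positions and scales. A minimal sketch, where the window size, stride, scale set, and nearest-neighbor rescale are arbitrary illustrative choices, and `classifier` is any function that scores a patch (positive = detection):

```python
import numpy as np

def sliding_window_detect(image, classifier, window=(32, 32), stride=8,
                          scales=(1.0, 0.5)):
    # Ask "is this an X?" at every position and scale: rescale the image,
    # slide a fixed-size window over it, and keep windows the classifier
    # scores above zero.  Returns (y, x, scale, score) in original coords.
    detections = []
    for s in scales:
        H, W = int(image.shape[0] * s), int(image.shape[1] * s)
        if H < window[0] or W < window[1]:
            continue
        # nearest-neighbor rescale (stand-in for a proper image pyramid)
        rows = (np.arange(H) / s).astype(int)
        cols = (np.arange(W) / s).astype(int)
        scaled = image[np.ix_(rows, cols)]
        for y in range(0, H - window[0] + 1, stride):
            for x in range(0, W - window[1] + 1, stride):
                patch = scaled[y:y + window[0], x:x + window[1]]
                score = classifier(patch)
                if score > 0:
                    detections.append((y / s, x / s, s, score))
    return detections
```

The speedups listed on the slide (cascades, early reject, feature sharing) all attack the cost of the inner `classifier(patch)` call, since it runs thousands of times per image.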

  46. Classification Using Intersection Kernel Support Vector Machines is Efficient. Subhransu Maji, Alexander C. Berg, and Jitendra Malik. Proceedings of CVPR 2008, Anchorage, Alaska, June 2008. Software and more results available at http://www.cs.berkeley.edu/~smaji/projects/fiksvm/
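The efficiency result rests on reorganizing the intersection-kernel decision function: because min decomposes over dimensions, the sum over support vectors becomes one 1-D function per dimension, evaluable by binary search over pre-sorted support values plus two prefix sums. A sketch that checks the decomposition numerically; the alpha_i * y_i terms are folded into `coef`, and the paper's further piecewise-linear approximation (which makes evaluation constant time per dimension) is omitted.

```python
import numpy as np

def ik_decision_naive(x, sv, coef):
    # Naive IKSVM decision value: sum over support vectors of
    # coef_i * sum_d min(x_d, sv_id) -- O(#sv * dims) per query.
    return float(np.sum(coef * np.minimum(x, sv).sum(axis=1)))

def ik_decision_per_dim(x, sv, coef):
    # Same value, reorganized per dimension:
    #   h_d(x_d) = sum_{sv_id <= x_d} coef_i * sv_id  +  x_d * sum_{sv_id > x_d} coef_i
    # With support values sorted once per dimension, each h_d needs only a
    # binary search and two prefix sums -- the idea exploited by
    # Maji, Berg & Malik (CVPR 2008).
    total = 0.0
    for d in range(sv.shape[1]):
        order = np.argsort(sv[:, d])
        v, c = sv[order, d], coef[order]
        csum_cv = np.concatenate(([0.0], np.cumsum(c * v)))  # prefix of coef*val
        csum_c = np.concatenate(([0.0], np.cumsum(c)))       # prefix of coef
        k = np.searchsorted(v, x[d], side='right')           # support values <= x_d
        total += csum_cv[k] + (csum_c[-1] - csum_c[k]) * x[d]
    return float(total)
```

The per-dimension form turns evaluation cost from linear in the number of support vectors into logarithmic, which is what makes kernelized detection competitive with linear SVMs in speed.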

  47. Linear Separators (a.k.a. Perceptrons) Support Vector Machines

  48. Other possible solutions Support Vector Machines

  49. Which one is better? B1 or B2? How do you define better? Support Vector Machines
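The SVM answer to "how do you define better?" is the geometric margin: prefer the hyperplane that separates the data with the most room to spare. A toy check comparing two candidate hyperplanes; the points and the hyperplanes standing in for B1 and B2 are made up for illustration.

```python
import numpy as np

def margin_width(w, b, X, y):
    # Geometric margin of the hyperplane w.x + b = 0 on labeled points:
    # the smallest signed distance y_i * (w.x_i + b) / ||w||.
    # An SVM picks the separator that maximizes this quantity.
    return float(np.min(y * (X @ w + b)) / np.linalg.norm(w))

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
b1 = margin_width(np.array([1.0, 1.0]), 0.0, X, y)   # candidate hyperplane "B1"
b2 = margin_width(np.array([1.0, 0.0]), 0.0, X, y)   # candidate hyperplane "B2"
```

Both hyperplanes separate the points (both margins are positive), but one does so with a larger margin, and that is the one the SVM objective prefers.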
