
Introduction to Object Recognition

CS 636 Computer Vision. Introduction to Object Recognition. Nathan Jacobs. Slides adapted from Lazebnik. Object Recognition: History and Overview. Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce. ~10,000 to 30,000 visual categories. OBJECTS. INANIMATE.


Presentation Transcript


  1. CS 636 Computer Vision Introduction to Object Recognition Nathan Jacobs Slides adapted from Lazebnik

  2. Object Recognition: History and Overview Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce

  3. ~10,000 to 30,000 visual categories

  4. OBJECTS • ANIMALS → … → VERTEBRATE → MAMMALS (TAPIR, BOAR), BIRDS (GROUSE) • PLANTS • INANIMATE → NATURAL, MAN-MADE (CAMERA)

  5. Popular Object Recognition Datasets • Caltech-256 (http://www.vision.caltech.edu/Image_Datasets/Caltech256/): 30,607 images, 256 categories • ImageNet (http://www.image-net.org/): 12,184,113 images, 17,624 synsets (categories) • 80 Million Tiny Images (http://groups.csail.mit.edu/vision/TinyImages/): 79,302,017 images, 75,062 nouns • SUN (http://groups.csail.mit.edu/vision/SUN/): 130,519 images, 899 categories

  6. So what does object recognition involve?

  7. Scene categorization • outdoor • city • …

  8. Image-level annotation: are there people? • outdoor • city • …

  9. Object detection: where are the people?

  10. Image parsing • mountain • tree • building • banner • street lamp • vendor • people

  11. Modeling variability Variability: • Camera position • Illumination • Shape parameters • Within-class variations?

  12. Within-class variations

  13. manifold representations

  14. Alignment: What if the shape is known? Variability: • Camera position • Illumination Roberts (1965); Lowe (1987); Faugeras & Hebert (1986); Grimson & Lozano-Perez (1986); Huttenlocher & Ullman (1987)

  15. Recall: Alignment • Alignment: fitting a model to a transformation between pairs of features (matches) x_i, x_i' in two images • Find the transformation T that minimizes Σ_i residual(T(x_i), x_i')
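As a rough illustration of the alignment objective on this slide, here is a minimal sketch that assumes T is a 2D affine transform and that the residual is squared Euclidean distance. Neither choice is stated on the slide. The function names and the sample matches are hypothetical.

```python
import numpy as np

def fit_affine(x, x_prime):
    """Least-squares 2D affine transform T mapping x[i] to x_prime[i].

    x, x_prime: (N, 2) arrays of matched points, N >= 3 and not all collinear.
    Returns a 2x3 matrix [A | t] so that T(p) = A @ p + t.
    """
    x = np.asarray(x, dtype=float)
    x_prime = np.asarray(x_prime, dtype=float)
    # Homogeneous coordinates: each row is (x, y, 1).
    X = np.hstack([x, np.ones((len(x), 1))])
    # Solve X @ M ~= x_prime in the least-squares sense; M is 3x2.
    M, *_ = np.linalg.lstsq(X, x_prime, rcond=None)
    return M.T

def residual(T, x, x_prime):
    """Sum over matches of the squared distance between T(x_i) and x_i'."""
    x = np.asarray(x, dtype=float)
    mapped = x @ T[:, :2].T + T[:, 2]
    return float(np.sum((mapped - np.asarray(x_prime, dtype=float)) ** 2))

# Hypothetical matches generated from a known transform (illustration only).
true_T = np.array([[0.9, -0.2, 5.0],
                   [0.2,  0.9, -3.0]])
pts = np.array([[0, 0], [1, 0], [0, 1], [2, 3], [4, 1]], dtype=float)
pts_prime = pts @ true_T[:, :2].T + true_T[:, 2]

est_T = fit_affine(pts, pts_prime)
print(est_T)
print(residual(est_T, pts, pts_prime))
```

With noise-free matches the estimate recovers the generating transform. Real matches contain outliers, so a robust wrapper such as RANSAC is normally placed around a fit like this one.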

  16. Alignment: Huttenlocher & Ullman (1987)

  17. Hashing Invariance to variability: • Camera position • Illumination • Internal parameters Duda & Hart (1972); Weiss (1987); Mundy et al. (1992-94); Rothwell et al. (1992); Burns et al. (1993)

  18. Projective invariants (Rothwell et al., 1992) [Figure: four points labeled A, B, C, D] http://astrometry.net/: “If you have astronomical imaging of the sky with celestial coordinates you do not know—or do not trust—then Astrometry.net is for you. Input an image and we'll give you back astrometric calibration meta-data, plus lists of known objects falling inside the field of view.”
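The slide names projective invariants without writing one out. A standard textbook example is the cross-ratio of four collinear points, which is unchanged by any projective transformation of the line. The sketch below checks this numerically. It is not necessarily the invariant used by Rothwell et al. or by Astrometry.net, and the point coordinates and transform coefficients are made up for illustration.

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points given by their 1D coordinates."""
    return ((a - c) * (b - d)) / ((a - d) * (b - c))

def projective_map(x, h):
    """Apply the 1D projective transform x -> (h00*x + h01) / (h10*x + h11)."""
    (h00, h01), (h10, h11) = h
    return (h00 * x + h01) / (h10 * x + h11)

# Hypothetical points on a line and a non-degenerate transform
# (determinant 2*3 - 1*0.5 is nonzero).
points = [0.0, 1.0, 3.0, 7.0]
H = ((2.0, 1.0), (0.5, 3.0))

before = cross_ratio(*points)
after = cross_ratio(*(projective_map(p, H) for p in points))
print(before, after)  # the two values agree up to rounding
```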

  19. Representing and recognizing object categories is harder... ACRONYM (Brooks and Binford, 1981) Binford (1971), Nevatia & Binford (1972), Marr & Nishihara (1978)

  20. Recognition by components Geons (Biederman 1987) ???

  21. General shape primitives? Generalized cylinders Ponce et al. (1989); Forsyth (2000); Zisserman et al. (1995)

  22. Empirical models of image variability Appearance-based techniques Turk & Pentland (1991); Murase & Nayar (1995); etc.

  23. Eigenfaces (Turk & Pentland, 1991)
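The eigenface idea can be sketched as principal component analysis on flattened face images: subtract the mean face, keep the top k principal directions, and describe each face by its k coefficients. The code below is a minimal sketch under that reading, with random arrays standing in for real face images. The function names are hypothetical and the original paper's pipeline has more steps.

```python
import numpy as np

def fit_eigenfaces(images, k):
    """images: (N, D) array, one flattened image per row.

    Returns the mean image and a (k, D) basis of principal directions.
    """
    mean = images.mean(axis=0)
    centered = images - mean
    # Rows of Vt are orthonormal principal directions, sorted by variance.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return mean, Vt[:k]

def project(image, mean, basis):
    """Coefficients of an image in the eigenface basis."""
    return basis @ (image - mean)

def reconstruct(coeffs, mean, basis):
    """Approximate image rebuilt from its coefficients."""
    return mean + basis.T @ coeffs

rng = np.random.default_rng(0)
faces = rng.random((20, 64))  # stand-in for 20 flattened 8x8 images
mean, basis = fit_eigenfaces(faces, k=5)
coeffs = project(faces[0], mean, basis)
approx = reconstruct(coeffs, mean, basis)
print(coeffs.shape, np.linalg.norm(faces[0] - approx))
```

Recognition in this scheme typically compares coefficient vectors, for example by nearest neighbor, instead of comparing raw pixels.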

  24. Color Histograms Swain and Ballard, Color Indexing, IJCV 1991.
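Color indexing compares images by their color histograms, and the match score associated with Swain and Ballard is histogram intersection: the sum of bin-wise minima, normalized by the model histogram. The sketch below assumes 8 bins per RGB channel and uses random pixels as a stand-in image. Both choices, and the function names, are illustrative.

```python
import numpy as np

def color_histogram(image, bins=8):
    """image: (H, W, 3) uint8 RGB array. Returns a normalized bins**3 histogram."""
    pixels = image.reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist.ravel() / hist.sum()

def histogram_intersection(h_image, h_model):
    """Sum of bin-wise minima, normalized by the model histogram's mass."""
    return float(np.minimum(h_image, h_model).sum() / h_model.sum())

rng = np.random.default_rng(0)
photo = rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8)
# Shuffling pixel positions keeps the color histogram unchanged.
shuffled = rng.permutation(photo.reshape(-1, 3)).reshape(photo.shape)

score_same = histogram_intersection(color_histogram(shuffled), color_histogram(photo))
print(score_same)
```

Because the histogram ignores where pixels are, the shuffled image matches the original perfectly, which is both the strength (robustness to pose) and the weakness (no spatial layout) of the representation.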

  25. Appearance manifolds H. Murase and S. Nayar, Visual learning and recognition of 3-d objects from appearance, IJCV 1995

  26. Limitations of global appearance models • Requires global registration of patterns • Not robust to clutter, occlusion, geometric transformations

  27. Sliding window approaches • Turk and Pentland, 1991 • Belhumeur, Hespanha, & Kriegman, 1997 • Schneiderman & Kanade, 2004 • Viola and Jones, 2000 • Agarwal and Roth, 2002 • Poggio et al., 1993

  28. Sliding window approaches • Scale / orientation range to search over • Speed • Context
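To make the search-range and speed concerns concrete, the sketch below enumerates the square windows a detector would have to score across several scales. The base size, scale step, and stride are hypothetical parameters chosen for illustration, and orientation is not searched at all.

```python
def sliding_windows(img_w, img_h, base=24, scale_step=1.25, stride_frac=0.25):
    """Yield (x, y, size) for every square window at every scale."""
    size = base
    while size <= min(img_w, img_h):
        # Stride grows with the window so coverage stays proportional.
        stride = max(1, int(size * stride_frac))
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                yield x, y, size
        # Always grow by at least one pixel so the loop terminates.
        size = max(size + 1, int(round(size * scale_step)))

windows = list(sliding_windows(64, 48))
print(len(windows), "windows for a 64x48 image")
```

Each window is a separate classifier evaluation, so the count grows quickly with image size and with finer scale or stride settings. This is the reason fast classifiers and cascades matter for this family of methods.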

  29. Local features Combining local appearance, spatial constraints, invariants, and classification techniques from machine learning. Lowe’02 Schmid & Mohr’97 Mahamud & Hebert’03

  30. Local features for recognition of object instances

  31. Local features for recognition of object instances • Lowe, et al. 1999, 2003 • Mahamud and Hebert, 2000 • Ferrari, Tuytelaars, and Van Gool, 2004 • Rothganger, Lazebnik, and Ponce, 2004 • Moreels and Perona, 2005 • …

  32. Representing categories: Parts and Structure Weber, Welling & Perona (2000), Fergus, Perona & Zisserman (2003)

  33. Parts-and-shape representation • Model: • Object as a set of parts • Relative locations between parts • Appearance of parts Figure from [Fischler & Elschlager 73]

  34. Bag-of-features models: Object → Bag of ‘words’
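A bag-of-features representation can be sketched as follows: each local descriptor is assigned to its nearest entry in a visual vocabulary, and the image is summarized by the normalized histogram of those assignments. The tiny 2D vocabulary and descriptors below are made up for illustration. In practice the vocabulary is usually learned by clustering (for example k-means) over descriptors from many images.

```python
import numpy as np

def bag_of_words(descriptors, vocabulary):
    """descriptors: (N, D); vocabulary: (K, D). Returns a length-K histogram."""
    # Squared distance from every descriptor to every visual word: (N, K).
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the nearest word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

# Hypothetical vocabulary of two visual words and four local descriptors.
vocabulary = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[0.0, 1.0], [1.0, 0.0], [9.0, 9.0], [0.5, 0.5]])

hist = bag_of_words(descriptors, vocabulary)
print(hist)  # three descriptors fall on word 0, one on word 1
```

The histogram discards where each feature occurred, so any reordering of the descriptors gives the same result.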

  35. Objects as texture • All of these are treated as being the same • No distinction between foreground and background: scene recognition?

  36. Timeline of recognition • 1965-late 1980s: alignment, geometric primitives • Early 1990s: invariants, appearance-based methods • Mid-late 1990s: sliding window approaches • Late 1990s: feature-based methods • Early 2000s: parts-and-shape models • 2003 – present: bags of features • Present trends: combination of local and global methods, modeling context, emphasis on “image parsing”

  37. Geometric context D. Hoiem, A. Efros, and M. Hebert. Putting Objects in Perspective. CVPR 2006.

  38. Global scene context • The “gist” of a scene: Oliva & Torralba (2001) http://people.csail.mit.edu/torralba/code/spatialenvelope/

  39. Scene-level context for image parsing J. Tighe and S. Lazebnik, ECCV 2010 http://www.cs.unc.edu/~jtighe/Papers/ECCV10/

  40. J. Hays and A. Efros, Scene Completion using Millions of Photographs, SIGGRAPH 2007

  41. What “works” today • Reading license plates, zip codes, checks

  42. What “works” today • Reading license plates, zip codes, checks • Fingerprint recognition

  43. What “works” today • Reading license plates, zip codes, checks • Fingerprint recognition • Face detection

  44. What “works” today • Reading license plates, zip codes, checks • Fingerprint recognition • Face detection • Recognition of flat textured objects (CD covers, book covers, etc.)
