Visual Object Recognition

Rob Fergus, Courant Institute, New York University, http://cs.nyu.edu/~fergus/icml_tutorial/

Agenda • Introduction • Bag-of-words models • Visual words with spatial location • Part-based models • Discriminative methods • Segmentation and recognition • Recognition-based image retrieval • Datasets & Conclusions

Recognizing and Learning Object Categories: Year 2007. Li Fei-Fei (Princeton), Rob Fergus (NYU), Antonio Torralba (MIT). http://people.csail.mit.edu/torralba/shortCourseRLOC

So what does object recognition involve?

Classification: are there street-lights in the image?

Detection: localize the street-lights in the image
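
The difference between the two tasks shows up in what each one returns. The sketch below assumes a hypothetical detector has already scored a few candidate windows; the `Window` tuple layout, the scores, and the threshold are illustrative assumptions, not part of the tutorial.

```python
from typing import List, Tuple

# Hypothetical detector output: (x, y, width, height, score) per candidate window.
Window = Tuple[int, int, int, int, float]
Box = Tuple[int, int, int, int]


def classify(windows: List[Window], threshold: float) -> bool:
    """Classification: is the object present anywhere in the image?"""
    return any(score >= threshold for *_box, score in windows)


def detect(windows: List[Window], threshold: float) -> List[Box]:
    """Detection: where is each instance? Returns the boxes that pass the threshold."""
    return [(x, y, w, h) for x, y, w, h, score in windows if score >= threshold]


# Made-up scores for three candidate windows in one image.
windows = [(10, 20, 30, 90, 0.91), (200, 15, 28, 85, 0.78), (120, 140, 40, 40, 0.12)]
present = classify(windows, threshold=0.5)  # a single yes/no answer for the image
boxes = detect(windows, threshold=0.5)      # one box per localized instance
```

Classification collapses the image to one answer, while detection keeps a location for every instance. A practical detector would also merge overlapping boxes, which this sketch omits.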

Object categorization: mountain, tree, building, banner, street lamp, vendor, people

Scene and context categorization • outdoor • city • …

Application: Assisted driving • Pedestrian and car detection • Lane detection • Collision warning systems with adaptive cruise control • Lane departure warning systems • Rear object detection systems
[Figure: pedestrian (Ped) and car detections; axes in meters]

Application: Computational photography

Application: Improving online search • Query: STREET • Organizing photo collections

Challenges 1: view point variation (image: Michelangelo, 1475-1564)

Challenges 2: scale

Challenges 3: illumination (slide credit: S. Ullman)

Challenges 4: background clutter (image: Bruegel, 1564)

Challenges 5: occlusion (image: http://lh5.ggpht.com/_wJc6t2hDl2M/RrL7Gh6sS7I/AAAAAAAAAYY/n3xaHc2opls/DSC00633.JPG)

Challenges 6: deformation (images: http://img.timeinc.net/time/asia/magazine/2007/1112/racehorse_1112.jpg; Xu, Beihong, 1943)

History: single object recognition (Object 1, Object 2, Object 3)

Single object recognition history: Geometric methods • David Lowe [1985] • Rothwell et al. [1992]

Single object recognition history: Appearance-based methods • Murase & Nayar 1995 • Schmid & Mohr 1997 • Lowe et al. 1999, 2003 • Mahamud and Hebert, 2000 • Ferrari et al. 2004 • Rothganger et al. 2004 • Moreels and Perona, 2005 • …

Challenges 7: intra-class variation (shoe class: Instance 1, Instance 2, Instance 3)

History: early object categorization

• Fischler, Elschlager, 1973 • Turk and Pentland, 1991 • Belhumeur, Hespanha, & Kriegman, 1997 • Rowley & Kanade, 1998 • Schneiderman & Kanade, 2004 • Viola and Jones, 2000 • Heisele et al., 2001 • Amit and Geman, 1999 • LeCun et al. 1998 • Belongie and Malik, 2002 • DeCoste and Scholkopf, 2002 • Simard et al. 2003 • Poggio et al. 1993 • Agarwal and Roth, 2002 • …

~10,000 to 30,000

Three main issues
• Representation
  • How to represent an object category
• Learning
  • How to form the classifier, given training data
• Recognition
  • How the classifier is to be used on novel data

Representation
• Generative / discriminative / hybrid
• Appearance only or location and appearance
• Invariances
  • View point
  • Illumination
  • Occlusion
  • Scale
  • Deformation
  • Clutter
  • etc.
• Part-based or global with sub-window
• Use set of features or each pixel in image

Learning
• Unclear how to model categories, so learn rather than manually specify
• Methods of training: generative vs. discriminative
• Level of supervision
  • Manual segmentation; bounding box; image labels (e.g. "Contains a motorbike"); noisy labels
• Training images
  • Issue of over-fitting (typically limited training data)
  • Negative images for discriminative methods
• Priors
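
To make the generative vs. discriminative distinction concrete, here is a minimal sketch on a single made-up scalar feature. The generative route fits a class-conditional Gaussian per class and applies Bayes' rule; the discriminative route fits logistic regression directly on the class label and therefore needs negative examples. The data, the function names, and the choice of these two particular models are assumptions for illustration, not methods taken from the tutorial.

```python
import math
from typing import List, Tuple


def fit_gaussian(xs: List[float]) -> Tuple[float, float]:
    """Maximum-likelihood mean and variance of a 1-D sample."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, max(var, 1e-6)  # floor the variance to avoid division by zero


def log_gaussian(x: float, mean: float, var: float) -> float:
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def generative_predict(x: float, pos: List[float], neg: List[float]) -> bool:
    """Generative: model p(x | class) and p(class), then compare posteriors."""
    n = len(pos) + len(neg)
    log_post_pos = log_gaussian(x, *fit_gaussian(pos)) + math.log(len(pos) / n)
    log_post_neg = log_gaussian(x, *fit_gaussian(neg)) + math.log(len(neg) / n)
    return log_post_pos > log_post_neg


def fit_logistic(pos: List[float], neg: List[float],
                 steps: int = 2000, lr: float = 0.1) -> Tuple[float, float]:
    """Discriminative: model p(class | x) directly by gradient ascent."""
    data = [(x, 1.0) for x in pos] + [(x, 0.0) for x in neg]
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (y - p) * x
            grad_b += y - p
        w += lr * grad_w / len(data)
        b += lr * grad_b / len(data)
    return w, b


def discriminative_predict(x: float, w: float, b: float) -> bool:
    return w * x + b > 0.0


# Made-up feature values for object (pos) and background (neg) images.
pos = [2.0, 2.5, 3.0, 3.5]
neg = [-1.0, -0.5, 0.0, 0.5]
w, b = fit_logistic(pos, neg)
```

With only four examples per class, both models are estimated from very little data, which is the over-fitting concern raised above; a prior on the parameters is one way to constrain them.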

Recognition • Scale / orientation range to search over • Speed • Context
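
The link between search range and speed can be sketched by enumerating the windows a sliding-window detector would have to score. The base window size, the scale set, and the stride below are assumed values chosen for illustration; orientation is left out to keep the sketch short.

```python
from typing import Iterator, Tuple


def sliding_windows(img_w: int, img_h: int,
                    base: Tuple[int, int] = (64, 128),
                    scales: Tuple[float, ...] = (1.0, 1.5, 2.0),
                    stride: int = 8) -> Iterator[Tuple[int, int, int, int]]:
    """Yield every (x, y, width, height) window that fits inside the image."""
    for s in scales:
        w, h = int(base[0] * s), int(base[1] * s)
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, w, h)


# Number of classifier evaluations for one 640x480 image.
n_windows = sum(1 for _ in sliding_windows(640, 480))
# Halving the stride roughly quadruples the work.
n_dense = sum(1 for _ in sliding_windows(640, 480, stride=4))
```

Every extra scale, orientation, or finer stride multiplies the number of classifier evaluations, which is why the search range and speed have to be considered together.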

Recognition • Context enables pruning of detector output (Hoiem, Efros, Hebert, 2006)
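
One simple way context can prune detections is a ground-plane consistency check. This is a simplified illustration and not the method of the cited work. It assumes the horizon row and camera height are known, that objects stand on a flat ground plane, and that image rows increase downward. Under those assumptions an object's real height is roughly the camera height times its pixel height divided by the distance of its base below the horizon. All names and numbers are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Detection(NamedTuple):
    top: int      # image row of the box top (rows increase downward)
    bottom: int   # image row of the box bottom
    score: float


def implied_height_m(det: Detection, horizon_row: int,
                     camera_height_m: float) -> Optional[float]:
    """Approximate real-world height implied by a box resting on the ground plane."""
    below_horizon = det.bottom - horizon_row
    if below_horizon <= 0:
        return None  # base at or above the horizon: cannot be on the ground
    return camera_height_m * (det.bottom - det.top) / below_horizon


def prune_by_context(dets: List[Detection], horizon_row: int,
                     camera_height_m: float,
                     height_range_m: Tuple[float, float]) -> List[Detection]:
    """Keep only detections whose implied height is plausible for the category."""
    lo, hi = height_range_m
    kept = []
    for det in dets:
        height = implied_height_m(det, horizon_row, camera_height_m)
        if height is not None and lo <= height <= hi:
            kept.append(det)
    return kept


# Made-up pedestrian detections; horizon at row 200, camera 1.6 m above ground.
dets = [
    Detection(top=180, bottom=290, score=0.8),  # plausible pedestrian height
    Detection(top=50, bottom=120, score=0.9),   # base above the horizon
    Detection(top=210, bottom=400, score=0.6),  # plausible pedestrian height
    Detection(top=100, bottom=400, score=0.7),  # implies an implausibly tall person
]
kept = prune_by_context(dets, horizon_row=200, camera_height_m=1.6,
                        height_range_m=(1.4, 2.0))
```

Note that the highest-scoring detection is discarded here: appearance alone rated it highly, but its position in the scene is inconsistent with an object standing on the ground.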
