Learning Hierarchical Models of Scenes, Objects and Parts

E. Sudderth et al.

Introduction
• Generative models for detecting and recognizing objects in natural, cluttered scenes
• Exploit similarities between object categories to improve performance
• Use contextual knowledge about the objects found in a scene and the common spatial relationships between them
(Figure: hierarchy of scene, object, part, features)

The Generative Model
(Figure: graphical model, with nodes labeled as multinomial or normal distributions)

• SIFT descriptors and K-means clustering are used to create visual words
• Information is shared in 2 ways:
  • Parts combine the same features in different spatial configurations
  • Objects reuse the same parts in different proportions
• The multinomial distributions have Dirichlet priors, while the Gaussian distributions have inverse-Wishart priors
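
As an illustration of the visual-word step, here is a minimal sketch of clustering descriptors with K-means and mapping each descriptor to its nearest cluster center. It assumes 128-dimensional descriptors and uses random vectors as a stand-in for real SIFT output; the function names `kmeans` and `assign_words` are hypothetical and not taken from the paper.

```python
import numpy as np

def assign_words(descriptors, centers):
    """Map each descriptor to the index of its nearest center (its visual word)."""
    # Squared Euclidean distance from every descriptor to every center.
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(descriptors, num_words, num_iters=20, seed=0):
    """Plain Lloyd's K-means. Returns (centers, word index per descriptor)."""
    rng = np.random.default_rng(seed)
    # Initialise the centers from randomly chosen descriptors.
    idx = rng.choice(len(descriptors), size=num_words, replace=False)
    centers = descriptors[idx].copy()
    for _ in range(num_iters):
        labels = assign_words(descriptors, centers)
        for k in range(num_words):
            members = descriptors[labels == k]
            if len(members) > 0:  # leave empty clusters where they are
                centers[k] = members.mean(axis=0)
    return centers, assign_words(descriptors, centers)

# Stand-in data (assumption): random vectors instead of real SIFT descriptors.
rng = np.random.default_rng(1)
fake_descriptors = rng.normal(size=(500, 128))
centers, words = kmeans(fake_descriptors, num_words=10)
```

Each entry of `words` is then the discrete appearance feature that a multinomial distribution over visual words can model.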

Related Models
• Similar to the author-topic model
• Differs from previous models in that it incorporates x (the geometry, or position, of parts), solved using EM
• Can be trained from few examples
• Sharing parts leads to simple learning algorithms that scale linearly with the number of parts

Learning Objects with Shared Parts
1)
2) Not considering the reference position
3) The total cost of a Gibbs sampling update for M images, each containing N features, with P parts is O(NMP)
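
To make the O(NMP) cost concrete, the sketch below resamples the part assignment of every feature once and counts how many per-part evaluations that takes. It is a simplification under stated assumptions: the part parameters (`pi`, `eta`, `mu`, `var`) are held fixed, the Gaussians are diagonal, and there is a single object category without a reference position. None of these names come from the paper, and the authors' sampler may be organized differently.

```python
import numpy as np

def gibbs_sweep(words, positions, pi, eta, mu, var, rng):
    """Resample the part assignment of every feature in every image once.

    words[j]     : (N,) visual-word indices for image j
    positions[j] : (N, 2) feature positions for image j
    pi  : (P,) part proportions     eta : (P, W) word probabilities per part
    mu  : (P, 2) part means         var : (P, 2) diagonal part variances
    """
    P = len(pi)
    assignments, evaluations = [], 0
    for w_j, x_j in zip(words, positions):            # M images
        z_j = np.empty(len(w_j), dtype=int)
        for i, (w, x) in enumerate(zip(w_j, x_j)):    # N features per image
            # Log weight of each of the P parts for this feature.
            log_gauss = -0.5 * ((x - mu) ** 2 / var
                                + np.log(2 * np.pi * var)).sum(axis=1)
            log_p = np.log(pi) + np.log(eta[:, w]) + log_gauss
            p = np.exp(log_p - log_p.max())
            z_j[i] = rng.choice(P, p=p / p.sum())
            evaluations += P                          # P evaluations per feature
        assignments.append(z_j)
    return assignments, evaluations

# Toy problem (all sizes are assumptions chosen for illustration).
M, N, P, W = 3, 50, 4, 20
rng = np.random.default_rng(0)
words = [rng.integers(0, W, size=N) for _ in range(M)]
positions = [rng.normal(size=(N, 2)) for _ in range(M)]
pi = np.full(P, 1.0 / P)
eta = rng.dirichlet(np.ones(W), size=P)
mu = rng.normal(size=(P, 2))
var = np.ones((P, 2))
assignments, evaluations = gibbs_sweep(words, positions, pi, eta, mu, var, rng)
```

The counter `evaluations` equals N·M·P for one sweep, which is the linear scaling in the number of parts stated above.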

The reference position r is unobserved, so EM is used to compute the mode of these parameters:
1) E-step: compute the expected r_m
2) M-step: compute the maximum likelihood estimates of the parameters
EM updates are applied after every Gibbs sampling operation.

Object Detection and Recognition
• Compute the likelihood that the features in test image t were generated by object category o
• The likelihood is approximated from S samples
• The parameters used are the approximate modes of the posterior distribution
• The reference position is not used
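
The likelihood computation can be sketched as follows for the case without a reference position. The sketch assumes each feature is drawn from a mixture over parts, with a multinomial over visual words and a diagonal Gaussian over position for each part, and it uses a single parameter set per category; how the S samples enter the approximation is not shown here. The function names and the toy parameters are hypothetical.

```python
import numpy as np

def log_likelihood(words, positions, pi_o, eta, mu, var):
    """log p(w, x | o) = sum_i log sum_k pi_o[k] * eta[k, w_i] * N(x_i; mu_k, var_k)."""
    diff = positions[:, None, :] - mu[None, :, :]                  # (N, P, 2)
    log_gauss = -0.5 * (diff ** 2 / var
                        + np.log(2 * np.pi * var)).sum(axis=2)     # (N, P)
    log_joint = np.log(pi_o)[None, :] + np.log(eta[:, words]).T + log_gauss
    # Log-sum-exp over parts for numerical stability, then sum over features.
    m = log_joint.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))).sum())

def classify(words, positions, pis, eta, mu, var):
    """Return the index of the category with the highest likelihood."""
    scores = [log_likelihood(words, positions, pi_o, eta, mu, var) for pi_o in pis]
    return int(np.argmax(scores))

# Toy setup (assumption): 4 shared parts, 5 visual words, 2 object categories.
P, W = 4, 5
eta = np.full((P, W), 1.0 / W)
mu = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0], [5.0, -5.0]])
var = np.ones((P, 2))
pis = [np.array([0.97, 0.01, 0.01, 0.01]),   # category 0 mostly uses part 0
       np.array([0.01, 0.01, 0.01, 0.97])]   # category 1 mostly uses part 3

# Test image whose features all sit at the mean of part 0.
test_words = np.zeros(20, dtype=int)
test_positions = np.zeros((20, 2))
predicted = classify(test_words, test_positions, pis, eta, mu, var)
```

Both categories share the same parts (`eta`, `mu`, `var`) and differ only in their part proportions, mirroring the idea that objects reuse the same parts in different proportions.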

Object Categorization Experiments
• 16 categories: 7 animal faces, 5 animals and 4 vehicle types
• 30 training examples from each of the 16 categories, with 300 visual words and 32 shared parts
• The images were already aligned, so there was no need to infer the reference position
• The learning procedure was found not to be very sensitive to hyperparameter values

Shared parts

Detection and Recognition
• 100 training images are used to learn the background parts; the likelihood is then used to classify a test image as background or object
• The shared model was compared with unshared models
• The model was also compared with a bag-of-words-only model
• α affects the level of sharing
• A large α increases sharing and hence improves detection, while a small α reduces sharing but slightly improves recognition
• A good value is α = 1/P, where P is the number of parts
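
The effect of α can be illustrated by sampling part proportions from a symmetric Dirichlet and measuring how many parts each object effectively uses. This sketch assumes α is the concentration parameter of a symmetric Dirichlet prior over each object's part proportions; the function name and the "effective number of parts" measure (the inverse of the sum of squared proportions) are choices made for this illustration only.

```python
import numpy as np

def mean_effective_parts(alpha, num_parts, num_objects=1000, seed=0):
    """Average effective number of parts per object, 1 / sum_k pi_k^2.

    Equals 1 when an object uses a single part and num_parts when it
    uses all parts equally.
    """
    rng = np.random.default_rng(seed)
    # One row of part proportions per simulated object.
    pi = rng.dirichlet(np.full(num_parts, alpha), size=num_objects)
    return float((1.0 / (pi ** 2).sum(axis=1)).mean())

P = 32
small = mean_effective_parts(alpha=1.0 / P, num_parts=P)  # alpha = 1/P
large = mean_effective_parts(alpha=10.0, num_parts=P)     # much larger alpha
```

With a small α each object concentrates on a few parts, while a large α spreads every object over most of the parts, so more parts end up shared between objects.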

Hierarchical models for scenes

Single car, multiple cars problem
• 72 images: 26 fully labeled, the remainder labeled for cars
• 40 images used for training, 6 shared parts
Lighting problem

Conclusion
• The authors describe a hierarchical model for scenes, objects and parts
• They show the importance of spatial information
• They show that sharing parts helps when learning from few examples and improves performance

References
• Learning Hierarchical Models of Scenes, Objects, and Parts, http://ssg.mit.edu/~esuddert/papers/iccv05.pdf
• Slides of Fei-Fei
