A Thousand Words in a Scene

A Thousand Words in a Scene P. Quelhas, F. Monay, J. Odobez, D. Gatica-Perez and T. Tuytelaars PAMI, Sept. 2006

Outline • Introduction • Image Representation • Bag-of-Visterms (BOV) Representation • Probabilistic Latent Semantic Analysis (PLSA) • Scene Classification • Experiments • Classification • Image Ranking • Conclusion

Introduction • Main work • Scene modeling and classification • What’s new? • Combine text modeling methods and local invariant features to represent an image. • A text-like bag-of-visterms representation (histogram of quantized local visual features) • Probabilistic Latent Semantic Analysis (PLSA) • Scene classification is based on the image representation • Scenes can be ranked via PLSA

Introduction • Framework • Feature extraction: an image → interest point detector → local descriptors (low-level feature extraction) • Approach to a text-like representation: quantization → BOV • Text-modeling methods: PLSA • Classification (SVM) / ranking

Image Representation • Local invariant features • Interest point detection • Extract characteristic points, and more generally regions, from the images. • Invariant to geometric and photometric transformations: given an image and its transformed versions, the same points are extracted. • Employ the Difference of Gaussians (DoG) point detector: • Compare a point with its eight neighbors to find a minimum/maximum (the standard DoG detector also compares against the nine neighbors in each adjacent scale). • Invariant to translation, scale, rotation and illumination variations.
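The eight-neighbor comparison can be sketched in a few lines. This is a minimal illustration on a single, already computed DoG response map; the function names and the toy values are assumptions made for the example, and the scale-space construction of the full detector is left out.

```python
def is_local_extremum(dog, r, c):
    """True if dog[r][c] is strictly above or strictly below all eight neighbors."""
    center = dog[r][c]
    neighbors = [
        dog[r + dr][c + dc]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if not (dr == 0 and dc == 0)
    ]
    return all(center > n for n in neighbors) or all(center < n for n in neighbors)


def find_extrema(dog):
    """Scan the interior pixels of one response map and collect extremum coordinates."""
    rows, cols = len(dog), len(dog[0])
    return [
        (r, c)
        for r in range(1, rows - 1)
        for c in range(1, cols - 1)
        if is_local_extremum(dog, r, c)
    ]


# Toy 4x4 response map (made-up values) with one peak and one valley.
toy = [
    [0.0, 0.1, 0.0, 0.2],
    [0.1, 0.9, 0.1, 0.0],
    [0.0, 0.1, -0.8, 0.1],
    [0.2, 0.0, 0.1, 0.0],
]
print(find_extrema(toy))  # [(1, 1), (2, 2)]
```

Border pixels are skipped because they do not have a full set of eight neighbors.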

Image Representation • Local descriptors • Compute the descriptor on the region around each interest point. • Use the Scale Invariant Feature Transform (SIFT) as the local descriptor. • Low-level feature extraction example: each point has a 128-D feature vector.

Image Representation • Quantization • Quantize each local descriptor into a symbol (a visterm) via K-means • Bag-of-visterms representation • Histogram of the visterms • Cons: spatial relationships between visterms are discarded.
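The quantization and histogram steps can be sketched as follows, assuming the K-means vocabulary (the cluster centers) has already been learned. The helper names and the 2-D toy descriptors are illustrative assumptions; real SIFT descriptors would be 128-D.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(descriptor, vocabulary):
    """Map a descriptor to the index of its nearest cluster center (its visterm)."""
    return min(
        range(len(vocabulary)),
        key=lambda k: squared_distance(descriptor, vocabulary[k]),
    )


def bag_of_visterms(descriptors, vocabulary):
    """Histogram of visterm counts for one image; descriptor positions are ignored."""
    counts = Counter(quantize(d, vocabulary) for d in descriptors)
    return [counts.get(k, 0) for k in range(len(vocabulary))]


# Toy vocabulary of three centers and four descriptors from one image.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 1.1), (0.2, 0.1), (0.1, 0.9)]
print(bag_of_visterms(descriptors, vocabulary))  # [2, 1, 1]
```

The histogram depends only on how often each visterm occurs, which is why the representation carries no spatial information.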

Image Representation • Probabilistic Latent Semantic Analysis (PLSA) • Introduce latent variables z_l, called aspects, and associate a z_l with each observation (visterm). • Build a joint probability model over images and visterms. • Estimate the model parameters by maximizing their likelihood. • Image representation: the distribution over aspects for each image.
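The formulas for this slide are not reproduced in the text. In the standard PLSA formulation they would take roughly the following form; the notation (d_i, v_j, N_A, n(d_i, v_j), a(d_i)) is assumed here rather than taken from the slides.

```latex
% Assumed notation: d_i = image, v_j = visterm, z_l = aspect,
% N_A = number of aspects, n(d_i, v_j) = count of visterm v_j in image d_i.

% Joint model over images and visterms
P(v_j, d_i) = P(d_i) \sum_{l=1}^{N_A} P(z_l \mid d_i)\, P(v_j \mid z_l)

% Likelihood of the model parameters
\mathcal{L} = \prod_{i} \prod_{j} P(v_j, d_i)^{\,n(d_i, v_j)}

% Aspect-based image representation
a(d_i) = \bigl( P(z_1 \mid d_i), \dots, P(z_{N_A} \mid d_i) \bigr)
```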

Image Representation • Polysemy and synonymy with visterms • Polysemy: a single visterm may represent different scene content. • Synonymy: several visterms may characterize the same image content. • Example: • Samples from 3 randomly selected visterms from a vocabulary of size 1000. • Not all visterms have a clear semantic interpretation. • Pros of PLSA • Introduces aspects that capture visterm co-occurrence, and can thus handle polysemy and synonymy.
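To make the co-occurrence idea concrete, here is a minimal EM sketch for PLSA on a small image-by-visterm count matrix. It is a toy illustration of the standard algorithm, not the authors' implementation; the function name, the random initialization, the iteration count and the toy counts are all assumptions, and every image is assumed to contain at least one visterm.

```python
import random


def plsa(counts, n_aspects, n_iter=50, seed=0):
    """Fit PLSA by EM. Returns (P(z|d) per image, P(v|z) per aspect) as nested lists."""
    rng = random.Random(seed)
    n_docs, n_terms = len(counts), len(counts[0])

    def normalized(row):
        total = sum(row) or 1.0  # guard against an all-zero row
        return [x / total for x in row]

    # Random positive initialization of both conditional distributions.
    p_z_d = [normalized([rng.random() + 0.1 for _ in range(n_aspects)])
             for _ in range(n_docs)]
    p_v_z = [normalized([rng.random() + 0.1 for _ in range(n_terms)])
             for _ in range(n_aspects)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_aspects for _ in range(n_docs)]
        new_v_z = [[0.0] * n_terms for _ in range(n_aspects)]
        for i in range(n_docs):
            for j in range(n_terms):
                if counts[i][j] == 0:
                    continue
                # E-step: posterior P(z_l | d_i, v_j) up to normalization.
                joint = [p_z_d[i][l] * p_v_z[l][j] for l in range(n_aspects)]
                total = sum(joint)
                for l in range(n_aspects):
                    weight = counts[i][j] * joint[l] / total
                    new_z_d[i][l] += weight
                    new_v_z[l][j] += weight
        # M-step: renormalize the expected counts.
        p_z_d = [normalized(row) for row in new_z_d]
        p_v_z = [normalized(row) for row in new_v_z]
    return p_z_d, p_v_z


# Toy counts: images 0-1 mostly use visterms 0-1, images 2-3 use visterms 2-3.
counts = [
    [5, 4, 0, 0],
    [4, 6, 0, 0],
    [0, 0, 5, 5],
    [0, 0, 6, 3],
]
p_z_d, p_v_z = plsa(counts, n_aspects=2)
for row in p_z_d:
    print([round(p, 3) for p in row])
```

Visterms that tend to occur in the same images end up with high probability under the same aspect, which is how synonymous visterms can be grouped and a polysemous visterm can be shared between aspects.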

Experiments • Classification • BOV classification (three-class) • Dataset: indoor, city, landscape • Training and testing: the whole dataset is split into 10 parts, one for training and the other nine for testing. • Baseline methods: histograms of low-level features

Experiments • PLSA classification (three-class) • PLSA-I: use the same part of the data to train the SVM and to learn the aspect models. • PLSA-O: use an auxiliary dataset to learn the aspect models.

Experiments • Aspect-based image ranking • Given an aspect z, images can be ranked according to how strongly they express that aspect. • Dataset: landscape/city
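The ranking criterion itself is not preserved in the text. The sketch below assumes images are ranked by P(z | d), which is proportional to P(d | z) when the prior over images is uniform; the function name and the toy probabilities are illustrative.

```python
def rank_images_by_aspect(p_z_given_d, aspect):
    """Return image indices sorted by decreasing P(z = aspect | d)."""
    return sorted(
        range(len(p_z_given_d)),
        key=lambda i: p_z_given_d[i][aspect],
        reverse=True,
    )


# Toy aspect distributions for four images and two aspects (made-up values).
p_z_given_d = [
    [0.7, 0.3],
    [0.2, 0.8],
    [0.9, 0.1],
    [0.5, 0.5],
]
print(rank_images_by_aspect(p_z_given_d, 0))  # [2, 0, 3, 1]
print(rank_images_by_aspect(p_z_given_d, 1))  # [1, 3, 0, 2]
```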

Conclusion • The proposed scene modeling method is effective for scene classification. • A visual scene is represented as a mixture of aspects in PLSA modeling.
