
A Thousand Words in a Scene

P. Quelhas, F. Monay, J. Odobez, D. Gatica-Perez and T. Tuytelaars. PAMI, Sept. 2006.



  1. A Thousand Words in a Scene P. Quelhas, F. Monay, J. Odobez, D. Gatica-Perez and T. Tuytelaars PAMI, Sept. 2006

  2. Outline • Introduction • Image Representation • Bag-of-Visterms (BOV) Representation • Probabilistic Latent Semantic Analysis (PLSA) • Scene Classification • Experiments • Classification • Image Ranking • Conclusion

  3. Introduction • Main work • Scene modeling and classification • What’s new? • Combines text modeling methods with local invariant features to represent an image • A text-like bag-of-visterms representation (histogram of quantized local visual features) • Probabilistic Latent Semantic Analysis (PLSA) • Scene classification is based on this image representation • Images can be ranked by scene aspect via PLSA

  4. Introduction • Framework (pipeline): image → interest point detection → local descriptor extraction (low-level features) → quantization → bag-of-visterms (BOV) → PLSA → classification (SVM) / ranking

  5. Image Representation • Local invariant features • Interest point detection • Extracts characteristic points, and more generally regions, from the images. • Invariant to geometric and photometric transformations: given an image and a transformed version of it, the same points are extracted. • Employs the Difference of Gaussians (DOG) point detector: • Compares each point with its neighbors in scale space (its 8 neighbors at the same scale and 9 at each adjacent scale) to find local minima/maxima. • Invariant to translation, scale, rotation and illumination variations.
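
The extremum test above can be sketched in a few lines of numpy. This is a minimal illustration of the scale-space neighbor comparison, not the authors' implementation; the function name and the `(scales, H, W)` layout of the DoG pyramid are my own assumptions.

```python
import numpy as np

def is_scale_space_extremum(dog, s, y, x):
    """Check whether pixel (y, x) at scale index s of a DoG pyramid is a
    local minimum or maximum among its 26 neighbors (8 at the same scale,
    9 at each adjacent scale). `dog` is a (scales, H, W) array of
    Difference-of-Gaussians responses; (s, y, x) must be interior."""
    patch = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]  # 3x3x3 neighborhood
    center = dog[s, y, x]
    return bool(center == patch.max() or center == patch.min())
```

A full DoG detector would additionally discard low-contrast and edge responses; this sketch only shows the core min/max comparison.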

  6. Image Representation • Local descriptors • Compute a descriptor on the region around each interest point. • Use the Scale Invariant Feature Transform (SIFT) as the local descriptor. • Low-level feature extraction example: each interest point is described by a 128-dimensional feature vector.

  7. Image Representation • Quantization • Quantize each local descriptor into a discrete symbol (a visterm) via K-means clustering • Bag-of-visterms representation • Histogram of the visterms in the image • Con: discards the spatial relations between visterms.
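
The quantization-plus-histogram step can be sketched as follows. This is a hedged illustration, not the paper's code: it assumes the K-means vocabulary (the cluster centroids) has already been learned and is passed in as a `(K, D)` array; the function name is hypothetical.

```python
import numpy as np

def bag_of_visterms(descriptors, vocabulary):
    """Map each local descriptor to its nearest vocabulary centroid
    (its 'visterm') and return the normalized histogram of visterms.
    descriptors: (N, D) local descriptors (e.g. D=128 for SIFT);
    vocabulary: (K, D) K-means centroids."""
    # Squared Euclidean distance from every descriptor to every centroid.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    labels = d2.argmin(axis=1)                      # visterm index per point
    hist = np.bincount(labels, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()                        # normalize to sum to 1
```

Note that the resulting histogram, like any bag-of-words representation, keeps only visterm frequencies and no spatial layout.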

  8. Image Representation • Probabilistic Latent Semantic Analysis (PLSA) • Introduce latent variables zl, called aspects, and associate a zl with each observation (an occurrence of visterm vj in image di). • Build a joint probability model over images and visterms: P(vj, di) = P(di) Σl P(zl | di) P(vj | zl) • The likelihood of the model parameters is L = Πi Πj P(vj, di)^n(di, vj), where n(di, vj) is the count of visterm vj in image di; it is maximized by Expectation-Maximization (EM). • Image representation: the aspect mixture vector P(zl | di) over all aspects l.
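
The EM fit of a PLSA model on a visterm count matrix can be sketched as below. This is a compact textbook-style EM loop under my own naming, not the authors' implementation; for clarity it materializes the full docs × aspects × visterms posterior, which only suits small vocabularies.

```python
import numpy as np

def plsa(counts, n_aspects, n_iter=50, seed=0):
    """Fit PLSA by EM on a (docs x visterms) count matrix n(d, v).
    Returns p_z_d (docs x aspects), the aspect mixture P(z|d) used as the
    image representation, and p_v_z (aspects x visterms), i.e. P(v|z)."""
    rng = np.random.default_rng(seed)
    n_docs, n_visterms = counts.shape
    p_z_d = rng.random((n_docs, n_aspects))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_v_z = rng.random((n_aspects, n_visterms))
    p_v_z /= p_v_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior P(z | d, v), shape (docs, aspects, visterms).
        joint = p_z_d[:, :, None] * p_v_z[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate parameters from expected counts n(d,v)*P(z|d,v).
        expected = counts[:, None, :] * post
        p_v_z = expected.sum(axis=0)
        p_v_z /= p_v_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_v_z
```

Each row of `p_z_d` is the low-dimensional aspect mixture that replaces (or complements) the raw BOV histogram as the image representation.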

  9. Image Representation • Polysemy and synonymy with visterms • Polysemy: a single visterm may represent different scene content. • Synonymy: several visterms may characterize the same image content. • Example: samples from 3 randomly selected visterms from a vocabulary of size 1000; not all visterms have a clear semantic interpretation. • Pros of PLSA • Aspects capture visterm co-occurrence, and can thus handle the polysemy and synonymy issues.

  10. Experiments • Classification • BOV classification (three-class) • Dataset: indoor, city, landscape • Training & testing: the whole dataset is split into 10 parts; one part is used for training and the other 9 for testing. • Baseline methods: histograms of low-level features.
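
The split protocol described above (10 parts, train on one, test on the remaining nine) can be sketched as follows; the function name and the shuffling seed are my own assumptions, not details from the paper.

```python
import numpy as np

def one_in_ten_splits(n_samples, seed=0):
    """Yield 10 (train_idx, test_idx) pairs: shuffle the sample indices,
    cut them into 10 parts, and in turn train on one part while testing
    on the concatenation of the other nine."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    parts = np.array_split(order, 10)
    for i in range(10):
        train = parts[i]
        test = np.concatenate([p for j, p in enumerate(parts) if j != i])
        yield train, test
```

Note this is the reverse of the usual 10-fold cross-validation (which trains on nine parts and tests on one); the small training portion makes the task harder.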

  11. Experiments • PLSA classification (three-class) • PLSA-I: use the same part of the data to train the SVM and to learn the aspect models. • PLSA-O: use an auxiliary dataset to learn the aspect models.

  12. Experiments • Aspect-based image ranking • Given an aspect zl, images di can be ranked according to P(zl | di). • Dataset: landscape/city
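
Given the learned aspect mixtures P(z | d), ranking reduces to a sort. A minimal sketch (function name mine, assuming a `(docs x aspects)` matrix of PLSA mixture weights):

```python
import numpy as np

def rank_images_by_aspect(p_z_d, aspect):
    """Return image indices sorted by decreasing P(z = aspect | d),
    i.e. the images most strongly expressing the given aspect first.
    p_z_d: (docs x aspects) mixture weights from a fitted PLSA model."""
    return np.argsort(-p_z_d[:, aspect])
```

The top-ranked images for an aspect can then be inspected visually to see what scene content (e.g. landscape vs. city) that aspect has captured.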

  13. Conclusion • The proposed scene modeling method is effective for scene classification. • A visual scene is represented as a mixture of aspects in PLSA modeling.
