
Unsupervised Image Clustering using Probabilistic Continuous Models and Information Theoretic Principles. Shiri Gordon, Electrical Engineering – Systems, Faculty of Engineering, Tel-Aviv University. Under the supervision of Dr. Hayit Greenspan.


  1. Unsupervised Image Clustering using Probabilistic Continuous Models and Information Theoretic Principles. Shiri Gordon, Electrical Engineering – Systems, Faculty of Engineering, Tel-Aviv University. Under the supervision of Dr. Hayit Greenspan

  2. Introduction: Content-Based Image Retrieval (CBIR)
  • The interest in Content-Based Image Retrieval (CBIR) and efficient image-search algorithms has grown out of the need to manage large image databases
  • Most CBIR systems are based on search-by-query: the user provides an example image, and the database is searched exhaustively for the images most similar to the query
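The exhaustive search-by-query step above can be sketched in a few lines. This is a minimal illustration that assumes each image is reduced to a plain feature vector compared by Euclidean distance; the system described in these slides instead compares GMMs with a KL distance.

```python
import numpy as np

def search_by_query(query, database, k=3):
    """Rank database images by Euclidean distance of their feature vectors
    to the query; return indices and distances of the k best matches."""
    dists = np.linalg.norm(database - query, axis=1)
    order = np.argsort(dists)          # exhaustive: every image is scored
    return order[:k], dists[order[:k]]

# Toy database: five images with hypothetical 4-dimensional feature vectors.
db = np.array([[0., 0., 0., 0.],
               [1., 1., 1., 1.],
               [0., 0., 0., 1.],
               [5., 5., 5., 5.],
               [1., 0., 0., 0.]])
best, dist = search_by_query(np.zeros(4), db, k=2)
```

Every query scans the full database, which is exactly the cost that the clustering in the following slides is meant to reduce.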

  3. CBIR: Issues
  • Image representation
  • Distance measure between images
  • Image search algorithms
  Example systems: QBIC (IBM), Blobworld (Berkeley), Photobook (MIT), VisualSEEk (Columbia)

  4. What is Image Clustering?
  • A supervised or unsupervised mapping of the archive images into classes
  • The classes should provide the same information about the image archive as the entire image collection

  5. Why do we need Clustering?
  • Faster search-by-query algorithms
  • Browsing environment
  • Image categorization
  [Figure: a query image compared against cluster centers instead of all images]

  6. Why do we need Clustering?
  • Faster search-by-query algorithms
  • Browsing environment
  • Image categorization
  [Figure: images grouped around cluster centers]

  7. Why do we need Clustering?
  • Faster search-by-query algorithms
  • Browsing environment
  • Image categorization
  [Figure: images around cluster centers, with clusters labeled “Yellow”, “Blue”, “Green”]

  8. GMM-IB System Block-Diagram
  [Block diagram: Images → per-image GMMs → clustering via the Information-Bottleneck (IB) method → image clusters → per-cluster GMMs]

  9. Image Representation [“Blobworld”: Carson, Belongie, Greenspan, Malik, PAMI 2002]
  Pixels → feature vectors → regions
  • Feature space: color (CIE-Lab); spatial (x, y); …
  • The feature vectors are grouped in the resulting 5-dimensional space
  • Each image is modeled as a Gaussian mixture distribution in feature space

  10. Image Representation via Gaussian Mixture Modeling (GMM)
  • Feature space → GMM, fitted via EM
  • Parameter set: the weights, means, and covariances of the k Gaussian components
  • The Expectation-Maximization (EM) algorithm determines the maximum-likelihood parameters of a mixture of k Gaussians
  • The EM algorithm is initialized via K-means
  • Model selection (the choice of k) via MDL (Minimum Description Length)
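A minimal sketch of the EM pipeline described above, assuming a diagonal-covariance mixture and a deterministic farthest-point K-means seeding (simplifications for illustration; the thesis additionally chooses k via MDL, which would amount to fitting several values of k and comparing description lengths):

```python
import numpy as np

def kmeans_init(X, k):
    # Deterministic farthest-point seeding followed by a few Lloyd iterations.
    centers = [X[0]]
    for _ in range(k - 1):
        d2 = ((X[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(10):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.array([X[labels == j].mean(0) for j in range(k)])
    return centers, labels

def em_gmm(X, k, iters=50):
    """Maximum-likelihood fit of a k-component diagonal-covariance GMM
    by EM, initialized with K-means as on the slide."""
    n, d = X.shape
    mu, labels = kmeans_init(X, k)
    pi = np.bincount(labels, minlength=k) / n
    var = np.array([X[labels == j].var(0) + 1e-6 for j in range(k)])
    for _ in range(iters):
        # E-step: responsibilities r[t, j] ∝ pi_j * N(x_t | mu_j, var_j)
        logp = (np.log(pi)
                - 0.5 * np.log(2 * np.pi * var).sum(-1)
                - 0.5 * (((X[:, None] - mu[None]) ** 2) / var[None]).sum(-1))
        logp -= logp.max(1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(1, keepdims=True)
        # M-step: re-estimate weights, means, and variances from r
        nk = r.sum(0)
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        var = (r.T @ X ** 2) / nk[:, None] - mu ** 2 + 1e-6
    return pi, mu, var

# Two well-separated synthetic "feature clouds" in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
               rng.normal(5.0, 1.0, (200, 2))])
pi, mu, var = em_gmm(X, k=2)
```

On this toy data the fitted component means land near (0, 0) and (5, 5), with weights near 0.5 each.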

  11. GMM in the 5-dimensional space: color (L*a*b) & spatial (x, y)

  12. Image Models → Category Model
  • Variability in color per spatial location
  • Variability in location per color
  [Figure: images combined into a category GMM]

  13. GMM – KL Framework [Greenspan, Goldberger, Ridel, CVIU 2001]
  • Kullback-Leibler (KL) distance between distributions, approximated empirically:
  D(f ‖ g) ≈ (1/n) Σ_{t=1..n} log( f(x_t) / g(x_t) )
  where {x_t} is the feature set extracted from the image, n is the data-set size, f is the image distribution, and g is the category distribution.
  KL distance between image models and category models:
  Image \ Category    (1)     (2)     (3)     (4)
  (1) monkey          6.5    32.5    34.8    16.4
  (2) snow           29.6    10.4    42.1    30.4
  (3) sunset         30.2    36.3    14.2    27.7
  (4) flowers        14.4    28.7    29.1     8.5
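The empirical KL approximation on this slide can be sketched directly: draw samples from the image model f and average the log-likelihood ratio against the category model g. This is an illustrative Monte-Carlo version for diagonal-covariance GMMs, not the thesis code:

```python
import numpy as np

def gmm_logpdf(x, pi, mu, var):
    # Log-density of a diagonal-covariance Gaussian mixture at points x (n, d).
    logp = (np.log(pi)
            - 0.5 * np.log(2 * np.pi * var).sum(-1)
            - 0.5 * (((x[:, None] - mu[None]) ** 2) / var[None]).sum(-1))
    m = logp.max(1, keepdims=True)
    return (m + np.log(np.exp(logp - m).sum(1, keepdims=True))).ravel()

def gmm_sample(n, pi, mu, var, rng):
    # Draw n points: pick a component, then sample from its Gaussian.
    comp = rng.choice(len(pi), size=n, p=pi)
    return rng.normal(mu[comp], np.sqrt(var[comp]))

def kl_monte_carlo(f, g, n=20000, seed=0):
    """D(f || g) ≈ (1/n) Σ_t [log f(x_t) − log g(x_t)], x_t drawn from f,
    mirroring the empirical KL approximation on the slide."""
    rng = np.random.default_rng(seed)
    x = gmm_sample(n, *f, rng)
    return float(np.mean(gmm_logpdf(x, *f) - gmm_logpdf(x, *g)))

# Two single-component 1-D "image models": N(0,1) and N(1,1).
f = (np.array([1.0]), np.array([[0.0]]), np.array([[1.0]]))
g = (np.array([1.0]), np.array([[1.0]]), np.array([[1.0]]))
```

For this pair the closed-form value is 0.5, so the Monte-Carlo estimate should land close to it, while D(f ‖ f) is exactly zero.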

  14. Unsupervised Clustering using the Information-Bottleneck (IB) principle
  • The desired clustering is the one that minimizes the loss of mutual information between the objects and the features extracted from them
  • The information the objects contain about the features is ‘squeezed’ through a compact ‘bottleneck’ of clusters
  [N. Slonim, N. Tishby, in Proc. of NIPS 1999]

  15. Information Bottleneck Principle: Motivation
  [Figure: objects compressed into clusters that preserve information about the features; the number of required clusters]

  16. Information Bottleneck Principle: Greedy Criterion
  • The minimization problem posed by the IB principle can be approximated by various algorithms using a greedy merging criterion
  • The merge cost combines the clusters’ prior probabilities with a KL-based distance between their conditional distributions:
  d(c_i, c_j) = (p(c_i) + p(c_j)) · JS[p(y|c_i), p(y|c_j)]
  where JS is the prior-weighted Jensen-Shannon divergence, an average of KL distances to the merged distribution
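For discrete conditional distributions, the agglomerative-IB merge cost of Slonim & Tishby (prior probabilities combined with KL distances to the merged distribution) can be sketched as follows; this is an illustrative simplification, since the thesis applies the criterion to GMM representations:

```python
import numpy as np

def kl(p, q):
    # KL distance between two discrete distributions (0·log0 treated as 0).
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def merge_cost(p_i, p_j, py_i, py_j):
    """d(ci, cj) = (p(ci) + p(cj)) * JS[p(y|ci), p(y|cj)], where the
    Jensen-Shannon term is the prior-weighted average of the KL distances
    of the two conditionals to their merged distribution."""
    w = p_i + p_j
    a, b = p_i / w, p_j / w
    merged = a * py_i + b * py_j           # p(y | ci ∪ cj)
    return w * (a * kl(py_i, merged) + b * kl(py_j, merged))

y1 = np.array([1.0, 0.0])                  # conditional of cluster 1
y2 = np.array([0.0, 1.0])                  # conditional of cluster 2
```

Merging two clusters with identical conditionals costs nothing, while merging two disjoint equal-prior clusters costs log 2 nats, i.e. one full bit of mutual information.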

  17. GMM-IB Framework
  [Block diagram: images → feature vectors → image GMMs → image clusters, merged by prior probability and KL distance]

  18. Example
  [Figure: agglomerative clustering of sample images; merge levels 0–8]

  19. Results: AIB – Optimal Number of Clusters
  [Figure: loss of mutual information during the clustering process]
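The behaviour shown here, the cumulative loss of mutual information as clusters are merged, can be illustrated with a small agglomerative-IB loop over discrete conditionals p(y|x). This is a toy stand-in for the GMM-based version in the thesis; a sharp jump in the loss curve suggests the natural number of clusters:

```python
import numpy as np

def kl(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def merge_cost(pri_i, pri_j, c_i, c_j):
    # (p(ci)+p(cj)) times the prior-weighted JS divergence of the conditionals.
    w = pri_i + pri_j
    a, b = pri_i / w, pri_j / w
    m = a * c_i + b * c_j
    return w * (a * kl(c_i, m) + b * kl(c_j, m))

def aib(py_x, px):
    """Agglomerative IB over discrete conditionals py_x[i] = p(y | x_i):
    repeatedly merge the cheapest pair, recording
    (clusters remaining, cumulative loss of I(C;Y)) after each merge."""
    priors = list(px)
    conds = [np.asarray(p, float) for p in py_x]
    history, loss = [], 0.0
    while len(priors) > 1:
        best = None
        for i in range(len(priors)):
            for j in range(i + 1, len(priors)):
                c = merge_cost(priors[i], priors[j], conds[i], conds[j])
                if best is None or c < best[0]:
                    best = (c, i, j)
        c, i, j = best
        w = priors[i] + priors[j]
        merged = (priors[i] * conds[i] + priors[j] * conds[j]) / w
        for idx in (j, i):                 # pop the higher index first
            priors.pop(idx)
            conds.pop(idx)
        priors.append(w)
        conds.append(merged)
        loss += c
        history.append((len(priors), loss))
    return history

# Four "images" forming two natural groups according to p(y | x).
py_x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
hist = aib(py_x, [0.25, 0.25, 0.25, 0.25])
```

The two within-group merges are nearly free, while the final merge down to one cluster loses most of the information: the curve's knee at two clusters is the AIB signal for the optimal cluster count.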

  20. Results: AIB – Generated Tree
  [Figure: hierarchical tree generated by the AIB algorithm]

  21. Mutual Information as a Quality Measure
  • The reduction in the uncertainty of X given knowledge of Y:
  I(X;Y) = H(X) − H(X|Y)
  • There is no closed-form expression for a mixture of Gaussian distributions
  • The greedy criterion derived from the IB principle provides a tool for approximating this measure
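For discrete distributions the measure is direct. A small sketch computing I(X;Y) = H(X) − H(X|Y) from a joint probability table (unlike the Gaussian-mixture case on this slide, which has no such closed form):

```python
import numpy as np

def mutual_information(pxy):
    """I(X;Y) = H(X) − H(X|Y) = Σ p(x,y) log[ p(x,y) / (p(x) p(y)) ],
    computed from a joint probability table pxy (rows: x, columns: y)."""
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y)
    mask = pxy > 0                        # 0·log0 treated as 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

independent = np.outer([0.5, 0.5], [0.5, 0.5])   # X and Y independent
identical = np.eye(2) / 2                        # Y determines X
```

Independent variables give I = 0, while a deterministic relation between two binary variables gives I = log 2 nats (one bit), the two extremes of the quality measure.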

  22. Mutual Information as a Quality Measure: Example
  [Figure: three example clusters C1, C2, C3]

  23. Results
  • Image database of 1460 images, hand-picked from the COREL database to create 16 labeled categories
  • A GMM model is built for each image
  • The various algorithms, using various image representations, are applied to the database

  24. Results: Retrieval Experiments
  • Clustering for efficient retrieval
  • Comparison between clustering methodologies

  25. Results: Mutual Information as a Quality Measure
  • Comparison between clustering algorithms:
  Clustering method        I(C;Y)
  AIB                       1.63
  K-means + reduced GMM     1.68
  SIB + average GMM         1.67
  • Comparison between image representations

  26. Summary
  • Image clustering is performed using the IB method
  • The IB method is applied to continuous representations of images and categories via Gaussian mixture models
  • From the AIB algorithm:
  • The optimal number of clusters in the database is determined
  • A “built-in” distance measure is obtained
  • The database is arranged in a tree structure that provides a browsing environment and more efficient search algorithms
  • The tree can be refined with algorithms such as SIB and K-means to reach a more stable solution

  27. Future Work
  • Making the current framework more feasible for large databases:
  • A simpler approximation of the KL distance
  • Incorporating the reduced category GMM into the clustering algorithms
  • Performing relaxation on the hierarchical tree structure
  • Using the tree structure to create a “user-friendly” browsing environment
  • Extending the feature space
