Computer Engineering Department 21/03/2010 Local versus Global Interactions in Clustering Algorithms Wesam M. Ashour

Outline
• Clustering?
  - K-means Clustering Algorithm
• New algorithms
  - Weighted K-means (WKM)
  - Inverse Weighted K-means (IWKM)
• Topology-Preserving mappings
  - Generative Topographic Mapping (GTM)
  - Inverse-Weighted K-means Topology-Preserving Map (IKToM)

Clustering?
• Cluster: a collection of data objects
  - Objects are similar to objects in the same cluster
  - Objects are dissimilar to objects in other clusters
• Cluster analysis
  - Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups
• Clustering is unsupervised learning: no predefined classes

Clustering?
• Partitioning Algorithms
• Hierarchical Algorithms
• Density based Algorithms
• Grid based Algorithms
• Graph based Algorithms
• Model based Algorithms

Clustering?
• Pattern Recognition
• Compression
• Web documents
• Biology
• Marketing

Background
K-means
The algorithm tries to locate K prototypes throughout a data set in such a way that the K prototypes in some way best represent the data.
Disadvantages
• The number of clusters must be specified in advance
• Sensitivity to prototype initialization
• Dead prototypes
• Convergence to a local optimum
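A minimal sketch of batch K-means, written to illustrate two of the disadvantages listed above (sensitivity to initialization and dead prototypes). The function names, the toy data, and the fixed iteration count are illustrative assumptions, not taken from the slides.

```python
import math

def closest(x, prototypes):
    """Index of the prototype nearest to point x (Euclidean distance)."""
    return min(range(len(prototypes)), key=lambda k: math.dist(x, prototypes[k]))

def kmeans(data, prototypes, iterations=20):
    """Batch K-means: assign each point to its closest prototype, then
    move each prototype to the mean of the points assigned to it."""
    prototypes = [list(m) for m in prototypes]
    for _ in range(iterations):
        groups = [[] for _ in prototypes]
        for x in data:
            groups[closest(x, prototypes)].append(x)
        for k, members in enumerate(groups):
            if members:  # a prototype that wins no points is "dead" and never moves
                prototypes[k] = [sum(c) / len(members) for c in zip(*members)]
    return prototypes

# Toy data (assumed for illustration): two well-separated groups of three points.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

good = kmeans(data, [(0.0, 0.0), (10.0, 10.0)])   # one prototype near each group
bad = kmeans(data, [(0.0, 0.0), (100.0, 100.0)])  # second prototype starts far away
```

With the second initialization, every point is closer to the first prototype, so the second one is never updated: it stays where it started while the first settles at the mean of the whole data set.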

Weighted K-Means (WKM)
• The performance function for K-means may be written as (1)
• Optimization
[Figure: data points x1, x2, x3 and prototypes m1, m2, m3]
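As a hedged illustration, the usual K-means criterion sums, over all data points, the squared distance from each point to its closest prototype. The sketch below evaluates that standard form; it is assumed to correspond to equation (1), and the function name and toy data are hypothetical.

```python
import math

def kmeans_performance(data, prototypes):
    """Standard K-means criterion: for every data point, take the squared
    Euclidean distance to its closest prototype, and sum over all points."""
    return sum(min(math.dist(x, m) ** 2 for m in prototypes) for x in data)

# Toy data (assumed for illustration).
data = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]

J = kmeans_performance(data, [(1.0, 0.0), (10.0, 0.0)])

# Adding a prototype that is closest to no data point leaves the value unchanged:
# each point interacts only with its nearest prototype.
J_extra = kmeans_performance(data, [(1.0, 0.0), (10.0, 0.0), (100.0, 0.0)])
```

Because of the minimum, the criterion is local: a prototype that is closest to no data point contributes nothing and therefore receives no update.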

Weighted K-Means (cont.)
• Consider the following performance function: (2)
• Optimization
[Figure: data points x1, x2, x3 and prototypes m1, m2, m3]

Weighted K-Means (cont.)
• We wish to form a performance function with the following properties:
  - Minimum performance gives good clustering
  - Creates a relationship between all data points and all prototypes
(3)

Weighted K-Means (cont.)
Batch mode: all data points are presented together

Weighted K-Means (cont.)
• Optimization: generate two sets of updates
Let mr be the closest prototype to xi; then, in batch mode, the updates are (4) and (5),
where Vk is the set of indices of the data points that are closest to mk and Vj is the set of indices of the other points.
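The two index sets can be sketched as follows. This only shows how the data are split for a given prototype mk; it does not reproduce the update equations (4) and (5). The function name and toy data are illustrative assumptions.

```python
import math

def index_sets(data, prototypes, k):
    """Return (V_k, V_j): indices of the points whose closest prototype is m_k,
    and indices of all remaining points."""
    V_k, V_j = [], []
    for i, x in enumerate(data):
        # r is the index of the prototype closest to x_i
        r = min(range(len(prototypes)), key=lambda j: math.dist(x, prototypes[j]))
        (V_k if r == k else V_j).append(i)
    return V_k, V_j

# Toy one-dimensional data (assumed for illustration).
data = [(0.0,), (1.0,), (9.0,), (10.0,)]
prototypes = [(0.5,), (9.5,)]

V_k, V_j = index_sets(data, prototypes, 0)
```

Every data index lands in exactly one of the two sets, so a prototype can receive contributions from all points, not only from those it is closest to.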

Weighted K-Means (cont.)
• A problem that still needs to be solved: (7)

Inverse-Weighted K-Means (IWKM)
(10)
• Optimization
• Batch mode
• Find the partial derivative of the performance function with respect to mk, set it to zero, and then solve for mk (11)
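To show the shape of this derivative-and-solve step, here is the same recipe applied to the plain K-means criterion with the assignments held fixed. This is a worked illustration on the standard K-means objective only, not a reconstruction of equations (10) or (11); V_k denotes the indices of the points closest to m_k.

```latex
\begin{align*}
J_k &= \sum_{i \in V_k} \lVert x_i - m_k \rVert^2 \\
\frac{\partial J_k}{\partial m_k} &= -2 \sum_{i \in V_k} (x_i - m_k) = 0 \\
\Rightarrow\quad m_k &= \frac{1}{\lvert V_k \rvert} \sum_{i \in V_k} x_i
\end{align*}
```

For plain K-means the stationary point is the mean of the assigned points, which is the familiar batch update.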

Simulation
[Figure: Example 1 and Example 2, each with a K-means panel and an IWKM panel]

Simulation
[Figure: Example 3, with a K-means panel and an IWKM panel]

Simulation
[Figure: Example 4 and Example 5; panels labelled IWKM, IWKM, and KHMO]

Inverse-weighted K-means Topology-Preserving Map (IKToM)
• Has the same structure as GTM
  - K latent points in a latent space with some structure
  - Mapped through M basis functions to feature space
  - Then mapped to data space to K points using weights W, mk = Φk W
• Use IWKM to find mk
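The mapping mk = Φk W can be sketched as below. The Gaussian form of the basis functions, the one-dimensional latent grid, the width, and all names are assumptions made for illustration; the slides state only that M basis functions are used. The step that fits W with IWKM is not shown.

```python
import math

def basis_matrix(latent_points, centres, width):
    """Phi[k][m]: response of basis function m at latent point k.
    Gaussian basis functions are an assumption of this sketch."""
    return [[math.exp(-((t - c) ** 2) / (2 * width ** 2)) for c in centres]
            for t in latent_points]

def prototypes_from_latent(Phi, W):
    """m_k = Phi_k W: each prototype is a weighted sum of the rows of W."""
    dims = len(W[0])
    return [[sum(phi_m * w_row[d] for phi_m, w_row in zip(phi_k, W))
             for d in range(dims)]
            for phi_k in Phi]

latent = [0.0, 0.5, 1.0]        # K = 3 latent points on a 1-D grid
centres = [0.0, 1.0]            # M = 2 basis-function centres
W = [[1.0, 0.0], [0.0, 1.0]]    # M x D weight matrix, D = 2 data dimensions

Phi = basis_matrix(latent, centres, width=0.5)
protos = prototypes_from_latent(Phi, W)  # K prototypes in data space
```

Because neighbouring latent points produce similar rows of Φ, their prototypes stay close in data space, which is what lets the map preserve topology.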

Simulation
• Example 1: Genes data set (40 samples, 3036 dimensions, 3 types)
• Example 2: Algae data set (72 samples, 18 dimensions, 9 types)
• Example 3: Glass data set (218 samples, 10 dimensions, 6 types)

Conclusion
• WKM and IWKM
  - Solve the problem of sensitivity to initial conditions in K-means
  - Provide two sets of updates
  - Work well in high dimensional data sets
  - Can be extended for visualization
• Visualization
  - Extension of IWKM
  - Has the same structure as GTM

Thank You
Any questions, please?