390 likes | 636 Views
PARTITIONAL CLUSTERING. Deniz ÜSTÜN. CONTENT. WHAT IS CLUSTERING ? WHAT IS PARTITIONAL CLUSTERING ? THE USED ALGORITHMS IN PARTITIONAL CLUSTERING. What is Clustering ?.
E N D
PARTITIONAL CLUSTERING Deniz ÜSTÜN
CONTENT • WHAT IS CLUSTERING ? • WHAT IS PARTITIONAL CLUSTERING ? • THE USED ALGORITHMS IN PARTITIONAL CLUSTERING
What is Clustering ? • A process of clustering is classification of the objects which are similar among them, and organizing of data into groups. • The techniques for Clustering are among the unsupervised methods.
What is Partitional Clustering ? • The Partitional Clustering Algorithms separate the similar objects to the Clusters. • The Partitional Clustering Algorithms are succesful to determine center based Cluster. • The Partitional Clustering Algorithms divide n objects to k cluster by using k parameter. • The techniques of the Partitional Clustering start with a randomly chosen clustering and then optimize the clustering according to some accuracy measurement.
The Used Algorithms in Partitional Clustering • K-MEANS ALGORITHM • K-MEDOIDS ALGORITHM • FUZZY C-MEANS ALGORITHM
K-MEANS ALGORITHM • K-MEANS algorithm is introduced as one of the simplest unsupervised learning algorithms that resolve the clustering problems by J.B. MacQueen in 1967 (MacQueen, 1967). • K-MEANS algorithm allows that one of the data belong to only a cluster. • Therefore, this algorithm is a definite clustering algorithm. • Given the N-sample of the clusters in the N-dimensional space.
K-MEANS ALGORITHM • This space is separated, {C1,C2,…,Ck} the K clusters. • The vector mean (Mk) of the Ck cluster is given (Kantardzic, 2003) : where the value of Xk is i.sample belong to Ck. The square-error formula for the Ck is given :
K-MEANS ALGORITHM • The square-error formula for the Ck is called the changing in cluster. • The square-error for all the clusters is the sum of the changing in clusters. • The aim of the square-error method is to find the K clusters that minimize the value of the Ek2 according to the value of the given K
K-MEDOIDS ALGORITHM • The aim of the K-MEDOIDS algorithm is to find the K representative objects (Kaufman and Rousseeuw, 1987). • Each cluster in K-MEDOIDS algorithm is represented by the object in cluster. • K-MEANS algorithm determine the clusters by the mean process. However, K-MEDOIDS algorithm find the cluster by using mid-point.
K-MEDOIDS ALGORITHMEXAMPLE-1 Select the Randomly K-Medoids
K-MEDOIDS ALGORITHMEXAMPLE-1 Allocate to Each Point to Closest Medoid
K-MEDOIDS ALGORITHMEXAMPLE-1 Allocate to Each Point to Closest Medoid
K-MEDOIDS ALGORITHMEXAMPLE-1 Allocate to Each Point to Closest Medoid
K-MEDOIDS ALGORITHMEXAMPLE-1 Determine New Medoid for Each Cluster
K-MEDOIDS ALGORITHMEXAMPLE-1 Determine New Medoid for Each Cluster
K-MEDOIDS ALGORITHMEXAMPLE-1 Allocate to Each Point to Closest Medoid
K-MEDOIDS ALGORITHMEXAMPLE-1 Stop Process
FUZZY C-MEANS ALGORITHM • Fuzzy C-MEANS algorithm is the best known and widely used a method. • Fuzzy C-MEANS algorithm is introduced by DUNN in 1973 and improved by BEZDEK in 1981 [Höppner vd, 2000]. • Fuzzy C-MEANS lets that objects are belonging to two and more cluster. • The total value of the membership of a data for all the classes is equal to one. • However, the value of the memebership of the cluster that contain this object is high than other clusters. • This Algorithm is used the least squares method [Höppner vd, 2000].
FUZZY C-MEANS ALGORITHM The algorithm start by using randomly membership matrix (U) and then the center vector calculate [Höppner vd, 2000].
FUZZY C-MEANS ALGORITHM According to the calculated center vector, the membership matrix (u) is computed by using the given as: The new membership matrix (unew) is compared with the old membership matrix (uold) and the the process continues until the difference is smaller than the value of the ε
FUZZY C-MEANS ALGORITHMEXAMPLE C=3 m=5 ε=1e-6
Results • K-MEDOIDS is the best algorithm according to K-MEANS and FUZZY C-MEANS. • However, K-MEDOIDS algorithm is suitable for small datasets. • K-MEANS algorithm is the best appropriate in terms of time. • In FUZZY C-MEANS algorithm, a object can belong to one or more cluster. • However, a object can belong to only a cluster in the other two algorithms.
References • [MacQueen, 1967] J.B., MacQueen, “Some Methods for Classification and Analysis of Multivariate Observations”, Proc. Symp. Math. Statist.and Probability (5th), 281-297,(1967). • [Kantardzic, 2003] M., Kantardzic, “Data Mining: Concepts, Methods and Algorithms”, Wiley, (2003). • [Kaufman and Rousseeuw, 1987] L., Kaufman, P. J., Rousseeuw, “Clustering by Means of Medoids,” Statistical Data Analysis Based on The L1–Norm and Related Methods, edited by Y. Dodge, North-Holland, 405–416, (1987). • [Kaufman and Rousseeuw, 1990] L., Kaufman, P. J., Rousseeuw, “Finding Groups in Data: An Introduction to Cluster Analysis”, John Wiley and Sons., (1990). • [Höppner vd, 2000] F., Höppner, F., Klawonn, R., Kruse, T., Runkler, “Fuzzy Cluster Analysis”, John Wiley&Sons, Chichester, (2000). • [Işık and Çamurcu, 2007] M., Işık, A.Y., Çamurcu, “K-MEANS, K-MEDOIDS ve Bulanık C-MEANS Algoritmalarının Uygulamalı olarak Performanslarının Tespiti”, İstanbul Ticaret Üniversitesi Fen Bilimleri Dergisi, Sayı :11, 31-45, (2007).