Clustering. Definition. Clustering is “the process of organizing objects into groups whose members are similar in some way”. A cluster is therefore a collection of objects that are “similar” to one another and “dissimilar” to the objects belonging to other clusters.
Clustering is the grouping of records or observations: examining them and forming classes of objects that are similar. • Clustering algorithms include EM and Fuzzy C-Means.
Clustering Main Features • Clustering – a data mining technique • Usage: • Statistical Data Analysis • Machine Learning • Data Mining • Pattern Recognition • Image Analysis • Bioinformatics
Notion of a Cluster can be Ambiguous • How many clusters? [Figure panels: Two Clusters; Four Clusters; Six Clusters]
Distance based method • In this case we easily identify the 4 clusters into which the data can be divided; the similarity criterion is distance: two or more objects belong to the same cluster if they are “close” according to a given distance. This is called distance-based clustering.
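As a rough illustration of the "close according to a given distance" idea, the sketch below puts two points in the same cluster whenever they lie within a chosen threshold of each other, directly or through a chain of neighbours. The Euclidean metric, the function names `euclidean` and `threshold_clusters`, and the `max_dist` parameter are all assumptions made for this example; the slide does not name a specific metric or procedure.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def threshold_clusters(points, max_dist):
    """Label points so that points within max_dist of each other share a label.

    Closeness is applied transitively: if A is close to B and B is close
    to C, all three end up in the same cluster.
    """
    labels = [None] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue  # already assigned to an earlier cluster
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] is None and euclidean(points[i], points[j]) <= max_dist:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(threshold_clusters(points, max_dist=2))
```

The result depends heavily on the threshold: too small and every point is its own cluster, too large and everything merges into one.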
Limitations of K-means: Non-globular Shapes [Figure panels: Original Points; K-means (2 Clusters)]
Limitations of K-means: Differing Sizes [Figure panels: Original Points; K-means (3 Clusters)]
Types of Clustering • Hierarchical • Finding new clusters using previously found ones • Partitional • Finding all clusters at once
Partitional Clustering [Figure panels: Original Points; A Partitional Clustering]
Hierarchical Clustering [Figure panels: Traditional Hierarchical Clustering; Traditional Dendrogram; Non-traditional Hierarchical Clustering; Non-traditional Dendrogram]
K-Means Clustering Algorithm. Steps of the K-Means algorithm: • Decide how many clusters to form, namely k clusters. • Arbitrarily pick k of the existing records as the initial cluster centers. • For each record, determine its nearest cluster center. • Update the cluster centers. • Determine the nearest cluster center for each record again, and repeat until the ratio value no longer increases.
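The steps above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the slide author's implementation: it uses Euclidean distance, and it stops when the assignments no longer change (a common stopping rule swapped in here) rather than tracking the ratio value mentioned in the last step. The names `kmeans`, `seed`, and `max_iter` are hypothetical.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two points (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(records, k, seed=0, max_iter=100):
    """Return (labels, centers) for a simple K-Means run."""
    rng = random.Random(seed)
    # Step 2: arbitrarily pick k existing records as the initial centers.
    centers = [list(r) for r in rng.sample(records, k)]
    labels = None
    for _ in range(max_iter):
        # Steps 3 and 5: assign every record to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: euclidean(r, centers[c]))
            for r in records
        ]
        if new_labels == labels:
            break  # assumed stopping rule: assignments are stable
        labels = new_labels
        # Step 4: move each center to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(records, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


records = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centers = kmeans(records, k=2)
print(labels)
print(centers)
```

Because the initial centers are chosen at random, which cluster gets label 0 depends on the seed; only the grouping itself is meaningful.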
Formula for the distance between two points: [formula missing in source] Between Cluster Variation (BCV): BCV = d(m1,m2) + d(m1,m3) + d(m2,m3), where d(mi,mj) denotes the distance from mi to mj. Within Cluster Variation (WCV): WCV = (smallest distance to a cluster center)^2
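The two quantities can be computed with a few lines of Python. Several things here are assumptions rather than statements from the slide: the distance d is taken to be Euclidean, BCV is generalized to the sum of distances over all pairs of centers (which reduces to the three-term formula when there are three clusters), WCV is summed over all records, and the ratio is taken as BCV / WCV. The function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two points (assumed form of d)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def bcv(centers):
    """Between Cluster Variation: sum of distances over all pairs of centers."""
    return sum(euclidean(a, b) for a, b in combinations(centers, 2))


def wcv(records, centers):
    """Within Cluster Variation: for each record, the squared distance to its
    nearest center, summed over all records (the summation is an assumption)."""
    return sum(min(euclidean(r, m) for m in centers) ** 2 for r in records)


centers = [(0, 0), (3, 4), (0, 4)]
records = [(0, 1), (3, 4), (0, 3)]
b = bcv(centers)
w = wcv(records, centers)
print(b, w, b / w)  # assumed ratio: BCV / WCV
```

Under this reading, a larger ratio means centers that are far apart relative to how tightly records sit around them, which matches the stopping rule of iterating until the ratio no longer increases.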