
Chapter 4 Clustering


molly

Presentation Transcript


  1. Chapter 4 Clustering

  2. What is Clustering? • The process of organizing objects into groups whose members are similar in some way • Statistics, machine learning, and database researchers have studied data clustering • Recent emphasis on large datasets

  3. Approaches to Clustering • Two main approaches to clustering: • Partitional clustering • A division of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset • Hierarchical clustering • A set of nested clusters organized as a hierarchical tree

  4. Problem Statement • N objects to be grouped in k clusters • Different possibilities • If we have 5 objects to be classified into 2 clusters, what is the number of possibilities? 2^5 / 2! = 32/2 = 16 (this count includes the grouping that leaves one cluster empty; 15 if both clusters must be non-empty) • The objective is to find a grouping such that the distances between objects within a group are minimal
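The count on this slide can be checked by brute force. The sketch below is only an illustration: the function name `count_groupings` is hypothetical, and it enumerates every labeling of the objects, treating two labelings as the same grouping when they differ only in which cluster is called first.

```python
from itertools import product

def count_groupings(n_objects, n_clusters=2):
    """Count groupings of labeled objects into unlabeled clusters.

    Returns (count allowing empty clusters, count with every cluster non-empty).
    """
    allowing_empty = set()
    all_nonempty = set()
    for labels in product(range(n_clusters), repeat=n_objects):
        # A grouping is a set of clusters, so the order of the clusters is ignored.
        groups = frozenset(
            frozenset(i for i, lab in enumerate(labels) if lab == c)
            for c in range(n_clusters)
        )
        allowing_empty.add(groups)
        # An empty frozenset is falsy, so all(groups) rejects empty clusters.
        if len(groups) == n_clusters and all(groups):
            all_nonempty.add(groups)
    return len(allowing_empty), len(all_nonempty)

print(count_groupings(5, 2))  # (16, 15)
```

For 5 objects and 2 clusters this gives 16 groupings when an empty cluster is allowed, matching 2^5 / 2!, and 15 when both clusters must contain at least one object.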

  5. Types • Statistical methods • K-means algorithm • Probabilistic clustering • The agglomerative algorithm • Neural-network-based approaches • Kohonen’s self-organizing maps (SOM) • Evolutionary computing (GA) • Text clustering

  6. K-means Algorithm • Step 1: Randomly select k points as the starting centroids of the k clusters. • Step 2: Assign each object to its closest centroid, forming k exclusive clusters. • Step 3: Calculate the new centroid of each cluster by averaging the attribute values of the objects belonging to it. • Step 4: Check whether the cluster centroids have changed their coordinates. If yes, repeat from Step 2. • If no, cluster detection is finished, and all objects have their cluster memberships defined.
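The steps on this slide can be sketched in a few lines of Python for one-dimensional data. This is a minimal illustration, not the presentation's own code: the function name `kmeans_1d`, its parameters, and the sample values are all assumptions, ties are broken in favor of the lower-numbered centroid, and an empty cluster simply keeps its previous centroid.

```python
import random

def kmeans_1d(values, k, initial=None, seed=0, max_iter=100):
    """Cluster one-dimensional values into k groups (illustrative sketch)."""
    # Step 1: pick k starting centroids, randomly unless given explicitly.
    if initial is None:
        picks = random.Random(seed).sample(list(values), k)
    else:
        picks = initial
    centroids = [float(c) for c in picks]
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign each object to its closest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Step 3: the new centroid is the mean of the cluster's members.
        new_centroids = [
            sum(members) / len(members) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data, not taken from the slides.
centroids, clusters = kmeans_1d([1, 3, 5, 20, 22, 24], k=2, initial=(1, 3))
print(centroids)  # [3.0, 22.0]
print(clusters)   # [[1, 3, 5], [20, 22, 24]]
```

Passing `initial` makes the run reproducible; leaving it out falls back to the random selection the slide describes.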

  7. K-Means Flowchart

  8. Numerical Example • One-dimensional database with N = 9 • Objects labeled z1…z9 • Let k = 2 • Let us start with z1 and z2 as the initial centroids: z1 = 2, z2 = 4 • Compute the distance from each object to the centroids.
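The distance computation for this first pass might look like the sketch below. Only the two starting centroids (2 and 4) come from the slide; the transcript does not preserve the values of z3…z9, so the remaining seven numbers here are placeholders chosen purely for illustration, and `distance_table` is a hypothetical helper name.

```python
def distance_table(values, centroids):
    """Return (value, distances to each centroid, 1-based nearest cluster) rows."""
    rows = []
    for v in values:
        dists = [abs(v - c) for c in centroids]
        # index() returns the first minimum, so ties go to the lower-numbered cluster.
        rows.append((v, dists, dists.index(min(dists)) + 1))
    return rows

# z1 = 2 and z2 = 4 are from the slide; the other seven values are hypothetical.
values = [2, 4, 7, 9, 13, 15, 21, 24, 30]
centroids = [2, 4]

rows = distance_table(values, centroids)
for value, dists, cluster in rows:
    print(f"z={value:>2}  d1={dists[0]:>2}  d2={dists[1]:>2}  -> cluster {cluster}")
```

With these placeholder values, only the object at 2 stays with the first centroid on the first pass; every other object is closer to 4.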

  9. Example - Clustering

  10. Example - Re-compute the Means

  11. Example • Reassign each object to the two clusters based on the new calculations: Centroid-1 = 2.5, Centroid-2 = 16

  12. Clustering - iteration 2

  13. Example - Re-compute the Means

  14. Clustering - iteration 3 • Reassign each object to the two clusters based on the new calculations: Centroid-1 = 3, Centroid-2 = 18

  15. Example • No change in clusters, so the algorithm stops. • The means have converged. Note that k-means guarantees only a locally optimal grouping, which can depend on the initial centroids.
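One way to judge a final grouping is the within-cluster sum of squared distances to each centroid, which is the quantity k-means reduces from one iteration to the next. The sketch below compares two groupings of the same objects; the function name `within_cluster_sse` and the data are hypothetical and are not taken from the slides.

```python
def within_cluster_sse(clusters):
    """Sum of squared distances from each object to its own cluster mean."""
    total = 0.0
    for members in clusters:
        if not members:
            continue  # an empty cluster contributes nothing
        centroid = sum(members) / len(members)
        total += sum((x - centroid) ** 2 for x in members)
    return total

# Two candidate groupings of the same hypothetical objects.
tight = [[1, 3, 5], [20, 22, 24]]
loose = [[1, 3], [5, 20, 22, 24]]

print(within_cluster_sse(tight))  # 16.0
print(within_cluster_sse(loose))  # 226.75
```

A lower value means the objects sit closer to their cluster means, so the first grouping is the better of the two under this measure.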
