
Introduction to Hierarchical Clustering Analysis



Presentation Transcript


  1. Introduction to Hierarchical Clustering Analysis Dinh Dong Luong

  2. Introduction • Data clustering concerns how to group a set of objects based on the similarity of their attributes and/or their proximity in the vector space. • Main methods • Partitioning: K-Means, … • Hierarchical: BIRCH, ROCK, … • Density-based: DBSCAN, … • A good clustering method will produce high-quality clusters with • high intra-class similarity: cohesive within clusters • low inter-class similarity: distinctive between clusters
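
The two quality criteria on this slide can be checked numerically. The sketch below is only an illustration, not a method from the presentation: it assumes Euclidean distance as an inverse proxy for similarity, and the helper names (`mean_intra_distance`, `mean_inter_distance`) and the toy points are hypothetical.

```python
import math
from itertools import combinations

def euclidean(x, y):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mean_intra_distance(clusters):
    """Average distance between pairs of points in the same cluster.
    Assumes at least one cluster holds two or more points."""
    dists = [euclidean(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(dists) / len(dists)

def mean_inter_distance(clusters):
    """Average distance between pairs of points in different clusters.
    Assumes at least two non-empty clusters."""
    dists = [euclidean(p, q)
             for a, b in combinations(clusters, 2)
             for p in a for q in b]
    return sum(dists) / len(dists)

# Toy data (made up): the same four points grouped in two different ways.
good = [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 5.0), (5.0, 6.0)]]
bad = [[(0.0, 0.0), (5.0, 5.0)], [(0.0, 1.0), (5.0, 6.0)]]

for name, clusters in (("good", good), ("bad", bad)):
    print(name, mean_intra_distance(clusters), mean_inter_distance(clusters))
```

For the cohesive grouping the within-cluster average is much smaller than the between-cluster average; for the mixed grouping the relation is reversed.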

  3. Stages in clustering

  4. Clustering Algorithms • A. Distance and Similarity Measures • B. Hierarchical Clustering • Agglomerative: single linkage, complete linkage, group average linkage, median linkage, centroid linkage, balanced iterative reducing and clustering using hierarchies (BIRCH), clustering using representatives (CURE), robust clustering using links (ROCK) • Divisive: divisive analysis (DIANA), monothetic analysis (MONA)

  5. Distance and Similarity Measures

  6. Similarity Measurements • Pearson Correlation between two profiles (vectors): +1 ≥ Pearson Correlation ≥ –1

  7. Similarity Measurements • Pearson Correlation: Trend Similarity
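
A minimal sketch of why Pearson correlation measures trend similarity, using the standard textbook definition (covariance divided by the product of the standard deviations). The function name and the example profiles are assumptions made for illustration; the slide's own formula did not survive in this transcript.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]          # deviations from the mean
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if den == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return num / den

profile_a = [1.0, 2.0, 3.0, 4.0]
profile_b = [12.0, 14.0, 16.0, 18.0]   # same trend: 2 * profile_a + 10
profile_c = [4.0, 3.0, 2.0, 1.0]       # opposite trend

print(pearson(profile_a, profile_b))   # close to +1
print(pearson(profile_a, profile_c))   # close to -1
```

Because both profiles are mean-centered and normalized, shifting or rescaling a profile leaves the coefficient unchanged; only the shape of the trend matters.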

  8. Similarity Measurements • Euclidean Distance

  9. Similarity Measurements • Euclidean Distance: Absolute difference
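
By contrast, Euclidean distance reacts to absolute differences. The sketch below uses the standard definition; the function name and the example profiles are illustrative assumptions.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

profile_a = [1.0, 2.0, 3.0, 4.0]
profile_b = [11.0, 12.0, 13.0, 14.0]   # identical trend, shifted up by 10

print(euclidean(profile_a, profile_a))  # 0.0
print(euclidean(profile_a, profile_b))  # 20.0
```

The two profiles have exactly the same trend, yet the distance is large because every component differs by 10.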

  10. Similarity Measurements • Cosine Correlation: +1 ≥ Cosine Correlation ≥ –1

  11. Similarity Measurements • Cosine Correlation: Trend + Mean Distance
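
A rough sketch of the "trend + mean distance" behaviour, assuming the usual definition of cosine similarity (dot product divided by the product of the vector norms). The names `cosine` and `center` and the example profiles are hypothetical.

```python
import math

def cosine(x, y):
    """Cosine of the angle between two equal-length profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine is undefined for an all-zero profile")
    return dot / (norm_x * norm_y)

def center(x):
    """Subtract the mean from every component."""
    mean = sum(x) / len(x)
    return [a - mean for a in x]

profile_a = [1.0, 2.0, 3.0, 4.0]
scaled = [2.0, 4.0, 6.0, 8.0]        # profile_a times 2
shifted = [11.0, 12.0, 13.0, 14.0]   # profile_a plus 10

print(cosine(profile_a, scaled))                    # close to 1: scaling is ignored
print(cosine(profile_a, shifted))                   # below 1: the mean shift matters
print(cosine(center(profile_a), center(shifted)))   # close to 1 again
```

The last line shows the link to the earlier measure: cosine similarity of mean-centered profiles equals their Pearson correlation.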

  12. Similarity Measurements

  13. Similarity Measurements Similar?

  14. Taxonomy of Clustering Approaches

  15. Hierarchical Clustering • Agglomerative clustering treats each data point as a singleton cluster, and then successively merges clusters until all points have been merged into a single remaining cluster. • Divisive clustering works the other way around: it starts with all points in a single cluster and successively splits clusters until each point is a singleton.

  16. Hierarchical Clustering • Keys: similarity, clustering • Calculate the similarity between all possible pairs of profiles. • Group the two most similar clusters together to form a new cluster. • Calculate the similarity between the new cluster and all remaining clusters.
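
The steps above can be sketched as a naive loop. This is an illustration under stated assumptions, not the presenter's implementation: it uses Euclidean distance as the dissimilarity, single linkage as the default cluster distance, and recomputes all cluster distances on every pass instead of updating them incrementally. The names `agglomerate` and `single_linkage` and the sample points are made up.

```python
import math
from itertools import combinations

def euclidean(x, y):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def single_linkage(c1, c2):
    """Smallest distance between a member of c1 and a member of c2."""
    return min(euclidean(p, q) for p in c1 for q in c2)

def agglomerate(points, linkage=single_linkage):
    """Merge the two closest clusters until one remains.
    Returns the merge history as (left, right, dissimilarity) tuples."""
    clusters = [[p] for p in points]          # start with singleton clusters
    history = []
    while len(clusters) > 1:
        # Compare every pair of clusters and pick the least dissimilar one.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        dist = linkage(clusters[i], clusters[j])
        history.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with the new one.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return history

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
for left, right, dist in agglomerate(points):
    print(left, "+", right, "at", round(dist, 3))
```

The merge history is the information a dendrogram would draw. Passing a different `linkage` function changes which pair counts as "most similar" without touching the loop.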

  17. General agglomerative clustering

  18. Clustering • Merge which pair of clusters? [Figure: three clusters C1, C2, C3]

  19. Clustering • Single Linkage: dissimilarity between two clusters = minimum dissimilarity between the members of the two clusters [Figure: clusters C1 and C2]

  20. Clustering • Complete Linkage: dissimilarity between two clusters = maximum dissimilarity between the members of the two clusters [Figure: clusters C1 and C2]

  21. Clustering • Average Linkage: dissimilarity between two clusters = average distance over all pairs of objects (one from each cluster) [Figure: clusters C1 and C2]

  22. Clustering • Average Group Linkage: dissimilarity between two clusters = distance between the two cluster means, i.e. the centroid linkage listed on slide 4 [Figure: clusters C1 and C2]
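
The four cluster dissimilarities defined on the preceding slides can be compared on one small example. This is a sketch assuming Euclidean distance between members; the function names and the two toy clusters are hypothetical, and `centroid_linkage` stands for the "distance between two cluster means" rule.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def single_linkage(c1, c2):
    """Minimum dissimilarity between members of the two clusters."""
    return min(euclidean(p, q) for p in c1 for q in c2)

def complete_linkage(c1, c2):
    """Maximum dissimilarity between members of the two clusters."""
    return max(euclidean(p, q) for p in c1 for q in c2)

def average_linkage(c1, c2):
    """Average distance over all pairs, one object from each cluster."""
    total = sum(euclidean(p, q) for p in c1 for q in c2)
    return total / (len(c1) * len(c2))

def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def centroid_linkage(c1, c2):
    """Distance between the two cluster means."""
    return euclidean(centroid(c1), centroid(c2))

# Toy clusters (made up for illustration).
c1 = [(0.0, 0.0), (0.0, 2.0)]
c2 = [(3.0, 0.0), (5.0, 0.0)]

for rule in (single_linkage, complete_linkage, average_linkage, centroid_linkage):
    print(rule.__name__, round(rule(c1, c2), 3))
```

Single and complete linkage bracket the average linkage value, since the minimum and maximum of the pairwise distances bound their mean.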

  23. My Idea Presentation

  24. Future Work • Step 1: Run a simple hierarchical algorithm with moment features and evaluate the clustering results. • Step 2: Find good features for clustering on our dataset by trying some feature variants (Haar-like, shape quantization, …). • Step 3: Choose an optimal hierarchical clustering algorithm.
