Learn about the main methods and stages in clustering algorithms, including hierarchical clustering techniques, similarity measurements, and a taxonomy of clustering approaches. Discover how to calculate similarities, merge clusters, and choose an optimal hierarchical algorithm. Explore future work ideas for applying clustering to datasets effectively.
Introduction to Hierarchical Clustering Analysis
Dinh Dong Luong
Introduction
• Data clustering concerns how to group a set of objects based on the similarity of their attributes and/or their proximity in the vector space.
• Main methods
  • Partitioning: K-Means, …
  • Hierarchical: BIRCH, ROCK, …
  • Density-based: DBSCAN, …
• A good clustering method will produce high-quality clusters with
  • high intra-class similarity: cohesive within clusters
  • low inter-class similarity: distinctive between clusters
Clustering Algorithms
A. Distance and Similarity Measures
B. Hierarchical Clustering
  — Agglomerative: single linkage, complete linkage, group average linkage, median linkage, centroid linkage, balanced iterative reducing and clustering using hierarchies (BIRCH), clustering using representatives (CURE), robust clustering using links (ROCK)
  — Divisive: divisive analysis (DIANA), monothetic analysis (MONA)
Similarity Measurements
• Pearson Correlation
  The Pearson correlation between two profiles (vectors) ranges from +1 to –1.
Similarity Measurements
• Pearson Correlation: Trend Similarity
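For reference, the standard definition of the Pearson correlation between two profiles x and y of length n is:

```latex
r(x, y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}
               {\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}
```

Because each profile is centered by its own mean, only the shape (trend) of the profiles matters, not their absolute levels.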
Similarity Measurements
• Euclidean Distance
Similarity Measurements
• Euclidean Distance: Absolute Difference
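For reference, the Euclidean distance between two profiles x and y of length n is:

```latex
d(x, y) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2}
```

Unlike the Pearson correlation, this measure reflects absolute differences: two profiles with the same shape but different levels are still far apart.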
Similarity Measurements
• Cosine Correlation
  The cosine correlation between two profiles (vectors) also ranges from +1 to –1.
Similarity Measurements
• Cosine Correlation: Trend + Mean Distance
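For reference, the cosine correlation (cosine similarity) between two profiles x and y of length n is:

```latex
\cos(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}
                  {\sqrt{\sum_{i=1}^{n} x_i^2}\;\sqrt{\sum_{i=1}^{n} y_i^2}}
```

Because the profiles are not mean-centered, the result is influenced by both the trend and the overall level of the profiles.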
Similarity Measurements
• Similar? Whether two profiles are judged similar depends on the measure: Pearson and cosine correlation emphasize the trend, while Euclidean distance reflects the absolute difference.
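A small numeric check of this point, using NumPy and two made-up profiles that share the same trend but differ by a constant offset:

```python
import numpy as np

# Two hypothetical profiles with identical shape but a constant offset.
x = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
y = x + 5.0

pearson = np.corrcoef(x, y)[0, 1]                           # trend only
euclid = np.linalg.norm(x - y)                              # absolute difference
cos_sim = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))   # trend + level

print(pearson)   # 1.0   -> identical trend
print(euclid)    # ~11.2 -> far apart in absolute terms
print(cos_sim)   # ~0.96 -> similar, but penalized by the offset
```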
Hierarchical Clustering
• Agglomerative clustering treats each data point as a singleton cluster and then successively merges clusters until all points have been merged into a single remaining cluster.
• Divisive clustering works the other way around: it starts with all points in one cluster and recursively splits it.
Hierarchical Clustering
• Keys: similarity measurement and clustering (merging) strategy
1. Calculate the similarity between all possible pairs of profiles.
2. Group the two most similar clusters together to form a new cluster.
3. Calculate the similarity between the new cluster and all remaining clusters.
4. Repeat steps 2–3 until a single cluster remains (see the sketch below).
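As an illustration, here is a minimal sketch of this agglomerative procedure using SciPy's hierarchical clustering routines; the dataset and the choice of average linkage are made up for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical dataset: six two-dimensional profiles forming two groups.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# Agglomerative clustering: each point starts as its own cluster and the two
# most similar clusters are merged repeatedly until one cluster remains.
# 'average' linkage uses the mean pairwise distance between clusters.
Z = linkage(X, method="average", metric="euclidean")

# Cut the resulting dendrogram to obtain two flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # e.g. [1 1 1 2 2 2]
```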
Clustering
• Given clusters C1, C2, and C3, which pair of clusters should be merged next? The answer depends on the linkage criterion used to measure dissimilarity between clusters.
Clustering: Single Linkage
• Dissimilarity between two clusters = minimum dissimilarity between the members of the two clusters.
Clustering: Complete Linkage
• Dissimilarity between two clusters = maximum dissimilarity between the members of the two clusters.
Clustering: Average Linkage
• Dissimilarity between two clusters = average of the distances between all pairs of objects (one from each cluster).
Clustering: Average Group Linkage
• Dissimilarity between two clusters = distance between the two cluster means.
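In standard notation, for clusters C1 and C2 with a pointwise dissimilarity d(x, y) and cluster means x̄_C1, x̄_C2, these four criteria can be written as:

```latex
\begin{aligned}
d_{\mathrm{single}}(C_1, C_2)   &= \min_{x \in C_1,\; y \in C_2} d(x, y) \\
d_{\mathrm{complete}}(C_1, C_2) &= \max_{x \in C_1,\; y \in C_2} d(x, y) \\
d_{\mathrm{average}}(C_1, C_2)  &= \frac{1}{|C_1|\,|C_2|} \sum_{x \in C_1} \sum_{y \in C_2} d(x, y) \\
d_{\mathrm{group}}(C_1, C_2)    &= d\!\left(\bar{x}_{C_1}, \bar{x}_{C_2}\right)
\end{aligned}
```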
Future Work
• Step 1: Use a simple hierarchical algorithm with moment features to run and evaluate clustering results.
• Step 2: Find good features for clustering on our dataset by trying several feature variants (Haar-like, shape quantization, …).
• Step 3: Choose an optimal hierarchical clustering algorithm.