240 likes | 252 Views
Introduction to Hierarchical Clustering Analysis. Dinh Dong Luong. Introduction. Data clustering concerns how to group a set of objects based on their similarity of attributes and/or their proximity in the vector space. Main methods Partitioning : K-Means… Hierarchical : BIRCH,ROCK,…
E N D
Introduction to Hierarchical Clustering Analysis Dinh Dong Luong
Introduction • Data clustering concerns how to group a set of objects based on their similarity of attributes and/or their proximity in the vector space. • Main methods • Partitioning : K-Means… • Hierarchical : BIRCH,ROCK,… • Density-based: DBSCAN,… • A good clustering method will produce high quality clusters with • high intra-class similarity: cohesive within clusters • low inter-class similarity: distinctive between clusters
Clustering Algorithms A. Distance and Similarity Measures B. Hierarchical Clustering — Agglomerative Single linkage, complete linkage, group average linkage, median linkage, centroid linkage, balanced iterative reducing and clustering using hierarchies (BIRCH), clustering using representatives (CURE), robust clustering using links (ROCK) — Divisive divisive analysis (DIANA), monothetic analysis (MONA)
Similarity Measurements • Pearson Correlation Two profiles (vectors) and +1 Pearson Correlation – 1
Similarity Measurements • Pearson Correlation: Trend Similarity
Similarity Measurements • Euclidean Distance
Similarity Measurements • Euclidean Distance: Absolute difference
Similarity Measurements • Cosine Correlation +1 Cosine Correlation – 1
Similarity Measurements • Cosine Correlation: Trend + Mean Distance
Similarity Measurements Similar?
Hierarchical Clustering Agglomerative clustering treats each data point as a singleton cluster, and then successively merges clusters until all points have been merged into a single remaining cluster. Divisive clustering works the other way around.
Hierarchical Clustering Calculate the similarity between all possible combinations of two profiles • Keys • Similarity • Clustering Two most similar clusters are grouped together to form a new cluster Calculate the similarity between the new cluster and all remaining clusters.
Clustering C1 Merge which pair of clusters? C2 C3
Clustering Single Linkage Dissimilarity between two clusters = Minimum dissimilarity between the members of two clusters + + C2 C1
Clustering Complete Linkage Dissimilarity between two clusters = Maximum dissimilarity between the members of two clusters + + C2 C1
Clustering Average Linkage Dissimilarity between two clusters = Averaged distances of all pairs of objects (one from each cluster). + + C2 C1
Clustering Average Group Linkage Dissimilarity between two clusters = Distance between two cluster means. + + C2 C1
Future Work • Step 1: Use a simple hierarchical algorithms with moment features to run and evaluate clustering results. • Step 2: Find out good features for clustering on our dataset by trying some feature variance (Haar-like, shape quantization,…). • Step 3: Choose an optimal hierarchical clustering algorithm