Agglomerative clustering (AC)

Clustering algorithms: Part 2c
Pasi Fränti • 25.3.2014 • Speech & Image Processing Unit • School of Computing • University of Eastern Finland • Joensuu, FINLAND

Agglomerative clustering: Categorization by cost function
Single link
• Minimize distance of nearest vectors
Complete link
• Minimize distance of two furthest vectors
Ward’s method
• Minimize mean square error
• In Vector Quantization, known as Pairwise Nearest Neighbor (PNN) method
• We focus on this

Pseudo code

PNN(X, M) → C, P
  m ← N;
  FOR i←1 TO N DO                        O(N)
    p[i]←i; c[i]←x[i];
  REPEAT                                 N times
    a,b ← FindSmallestMergeCost();       O(N²)
    MergeClusters(a,b);
    m←m-1;
  UNTIL m=M;
T(N) = O(N³)
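The pseudo code above can be sketched in Python roughly as follows. This is a minimal illustration, not the author's implementation: the function names `merge_cost` and `pnn_naive` are made up for this sketch, and the merge cost is assumed to be the standard Ward criterion (size-weighted squared distance between centroids). The double loop over all pairs is what makes each step O(N²) and the whole run O(N³).

```python
import numpy as np

def merge_cost(na, ca, nb, cb):
    """Ward/PNN merge cost: increase in squared error if the two clusters are merged."""
    diff = ca - cb
    return (na * nb) / (na + nb) * float(diff @ diff)

def pnn_naive(X, M):
    """Merge the cheapest cluster pair until M clusters remain (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    N = len(X)
    centroids = {i: X[i].copy() for i in range(N)}  # every vector starts as a cluster
    sizes = {i: 1 for i in range(N)}
    labels = np.arange(N)
    while len(centroids) > M:
        ids = list(centroids)
        best = None
        # Exhaustive O(N^2) search for the pair with the smallest merge cost.
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                d = merge_cost(sizes[a], centroids[a], sizes[b], centroids[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        na, nb = sizes[a], sizes[b]
        # Merge b into a: weighted centroid, summed size, relabelled vectors.
        centroids[a] = (na * centroids[a] + nb * centroids[b]) / (na + nb)
        sizes[a] = na + nb
        labels[labels == b] = a
        del centroids[b], sizes[b]
    ids = sorted(centroids)
    remap = {old: new for new, old in enumerate(ids)}
    compact = np.array([remap[int(l)] for l in labels])
    return np.array([centroids[i] for i in ids]), compact
```

Calling `pnn_naive(X, M)` returns the M centroids and a label per input vector, numbered 0..M-1.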

Ward’s method [Ward 1963: Journal of American Statistical Association]
Merge cost:
Local optimization strategy:
Nearest neighbor search:
• Find the cluster pair to be merged
• Update of NN pointers
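The formulas of this slide did not survive in the transcript. As a hedged illustration, the sketch below assumes the standard form of Ward's merge cost, d(Sa, Sb) = na·nb/(na+nb) · ||ca − cb||², where na, nb are the cluster sizes and ca, cb the centroids, and checks numerically that it equals the increase in total squared error caused by the merge. The helper names `sse` and `merge_cost` are chosen for this example only.

```python
import numpy as np

def sse(points):
    """Sum of squared distances of the points to their own centroid."""
    points = np.asarray(points, dtype=float)
    return float(((points - points.mean(axis=0)) ** 2).sum())

def merge_cost(A, B):
    """Assumed Ward merge cost: na*nb/(na+nb) * ||ca - cb||^2."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    na, nb = len(A), len(B)
    diff = A.mean(axis=0) - B.mean(axis=0)
    return na * nb / (na + nb) * float(diff @ diff)

A = [[0, 0], [2, 0]]
B = [[5, 4], [5, 6], [8, 5]]
# Increase in squared error when A and B are replaced by their union.
increase = sse(A + B) - sse(A) - sse(B)
print(merge_cost(A, B), increase)
```

The two printed values agree up to rounding, which is why minimizing the merge cost at each step is a locally optimal choice for the mean square error.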

Example of distance calculations

Example of the overall process: the number of clusters decreases by one at each merge, M=5000, M=4999, M=4998, ..., M=50, ..., M=16, M=15.

Detailed example of the process

Example - 25 Clusters, MSE ≈ 1.01×10^9

Storing distance matrix
• Maintain the distance matrix and update rows for the changed cluster only!
• Number of distance calculations reduces from O(N²) to O(N) for each step.
• Search of the minimum pair still requires O(N²) time → still O(N³) in total.
• It also requires O(N²) memory.
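A minimal sketch of the distance-matrix idea, assuming the standard Ward merge cost and a hypothetical function name `pnn_distance_matrix`. The full matrix is computed once; after each merge only the row and column of the merged cluster are recomputed (O(N) cost evaluations), while `argmin` over the whole matrix remains the O(N²) bottleneck.

```python
import numpy as np

def pnn_distance_matrix(X, M):
    """PNN with a stored N x N merge-cost matrix (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    N = len(X)
    C = X.copy()                      # centroids
    n = np.ones(N)                    # cluster sizes
    alive = np.ones(N, dtype=bool)    # False once a cluster has been merged away
    labels = np.arange(N)             # labels keep the original cluster indices
    diff = C[:, None, :] - C[None, :, :]
    D = 0.5 * (diff ** 2).sum(axis=2)  # na*nb/(na+nb) = 1/2 for singletons
    np.fill_diagonal(D, np.inf)
    m = N
    while m > M:
        a, b = (int(i) for i in np.unravel_index(np.argmin(D), D.shape))  # O(N^2)
        C[a] = (n[a] * C[a] + n[b] * C[b]) / (n[a] + n[b])
        n[a] += n[b]
        labels[labels == b] = a
        alive[b] = False
        D[b, :] = np.inf
        D[:, b] = np.inf
        # Only the row and column of the changed cluster are recomputed: O(N).
        others = alive.copy()
        others[a] = False
        d = n[a] * n[others] / (n[a] + n[others]) * ((C[others] - C[a]) ** 2).sum(axis=1)
        D[a, others] = d
        D[others, a] = d
        m -= 1
    return C[alive], labels
```

The matrix itself takes O(N²) memory, which is the cost noted in the last bullet.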

Heap structure for fast search [Kurita 1991: Pattern Recognition]
• Search reduces from O(N²) to O(log N).
• In total: O(N² log N)

Store nearest neighbor (NN) pointers [Fränti et al., 2000: IEEE Trans. Image Processing]
Time complexity reduces from O(N³) to Ω(N²)

Pseudo code
PNN(X, M) → C, P
  m ← N;
  FOR i←1 TO N DO                                    O(N)
    p[i]←i; c[i]←x[i];
  FOR i←1 TO N DO                                    O(N²)
    NN[i] ← FindNearestCluster(i);
  REPEAT
    a ← SmallestMergeCost(NN);                       O(N)
    b ← NN[a];
    MergeClusters(C,P,NN,a,b); UpdatePointers(C,NN); O(N)
    m←m-1;
  UNTIL m=M;
http://cs.uef.fi/pages/franti/research/pnn.txt
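The NN-pointer variant can be sketched as below. This is an illustrative Python version under stated assumptions, not a transcription of the linked pnn.txt: the names `pnn_nn_pointers` and `find_nearest` are invented here, the merge cost is assumed to be the standard Ward criterion, and the pointer update rule (recompute the merged cluster and every cluster whose pointer referred to a or b) is the usual one for PNN.

```python
import numpy as np

def pnn_nn_pointers(X, M):
    """PNN where every cluster stores its nearest neighbor and that merge cost."""
    X = np.asarray(X, dtype=float)
    N = len(X)
    C = X.copy()
    n = np.ones(N)
    alive = np.ones(N, dtype=bool)
    labels = np.arange(N)             # labels keep the original cluster indices
    nn = np.full(N, -1)               # nn[i]: index of the nearest cluster of i
    nn_cost = np.full(N, np.inf)      # merge cost between i and nn[i]

    def find_nearest(i):
        """O(N) search for the cheapest merge partner of cluster i."""
        others = np.flatnonzero(alive)
        others = others[others != i]
        if others.size == 0:
            nn[i], nn_cost[i] = -1, np.inf
            return
        costs = n[i] * n[others] / (n[i] + n[others]) * ((C[others] - C[i]) ** 2).sum(axis=1)
        k = int(np.argmin(costs))
        nn[i], nn_cost[i] = others[k], costs[k]

    for i in range(N):                # O(N^2) initialization
        find_nearest(i)
    m = N
    while m > M:
        a = int(np.argmin(nn_cost))   # O(N): smallest stored merge cost
        b = int(nn[a])
        C[a] = (n[a] * C[a] + n[b] * C[b]) / (n[a] + n[b])
        n[a] += n[b]
        labels[labels == b] = a
        alive[b] = False
        nn[b], nn_cost[b] = -1, np.inf
        m -= 1
        # Update pointers: the merged cluster and all clusters that pointed to a or b.
        stale = [int(i) for i in np.flatnonzero(alive) if i == a or nn[i] in (a, b)]
        for i in stale:
            find_nearest(i)
    return C[alive], labels
```

Pointers of other clusters can be left untouched because, under the Ward cost, merging the globally cheapest pair never brings the merged cluster closer to a third cluster than the nearer of its two parts was.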

Example with NN pointers [Virmajoki 2004: Pairwise Nearest Neighbor Method Revisited]

Example: Step 1

Example: Step 2

Example: Step 3

Example: Step 4

Example: Final

Time complexities of the variants

Number of neighbors (τ)

Processing time comparison With NN pointers

Algorithm: Lazy-PNN
T. Kaukoranta, P. Fränti and O. Nevalainen, "Vector quantization by lazy pairwise nearest neighbor method", Optical Engineering, 38 (11), 1862-1868, November 1999

Monotony property of merge cost [Kaukoranta et al., Optical Engineering, 1999]
Merge cost values are monotonically increasing:
d(Sa, Sb) ≤ d(Sa, Sc) ≤ d(Sb, Sc) ⇒ d(Sa, Sc) ≤ d(Sa+b, Sc)
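The monotony property can be checked numerically. The sketch below assumes the standard Ward merge cost and uses invented names (`ward_cost`, `monotony_holds`); it draws random triples of clusters and, for every labelling that satisfies d(Sa, Sb) ≤ d(Sa, Sc) ≤ d(Sb, Sc), verifies that merging Sa and Sb does not make the result cheaper to merge with Sc than Sa was.

```python
import itertools
import numpy as np

def ward_cost(A, B):
    """Assumed Ward merge cost between two point sets given as arrays."""
    na, nb = len(A), len(B)
    diff = A.mean(axis=0) - B.mean(axis=0)
    return na * nb / (na + nb) * float(diff @ diff)

def monotony_holds(trials=200, seed=0):
    """Return True if no counterexample to the monotony property is found."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        clusters = [
            rng.normal(size=(int(rng.integers(1, 6)), 2)) + rng.uniform(-5, 5, size=2)
            for _ in range(3)
        ]
        for Sa, Sb, Sc in itertools.permutations(clusters):
            if ward_cost(Sa, Sb) <= ward_cost(Sa, Sc) <= ward_cost(Sb, Sc):
                merged = np.vstack([Sa, Sb])
                # Small tolerance for floating-point rounding.
                if ward_cost(merged, Sc) < ward_cost(Sa, Sc) - 1e-9:
                    return False
    return True

print(monotony_holds())
```

A random test is only a sanity check; the property itself follows from the update formula of the Ward cost when the merged pair is the cheapest of the three.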

Lazy variant of the PNN
• Store merge costs in a heap.
• Update a merge cost value only when it appears at the top of the heap.
• Processing time is reduced by about 35%.
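A minimal sketch of the lazy idea using Python's `heapq`. It is an illustration under assumptions rather than the published algorithm: the names `lazy_pnn`, `nearest`, `push` and the `version` counter used to detect outdated entries are choices made for this example, and the merge cost is assumed to be the standard Ward criterion. Each cluster keeps one heap entry with its cheapest partner; an entry is re-evaluated only when it reaches the top and turns out to be outdated.

```python
import heapq
import numpy as np

def lazy_pnn(X, M):
    """PNN with a heap of merge costs that are refreshed lazily."""
    X = np.asarray(X, dtype=float)
    N = len(X)
    C = X.copy()
    n = np.ones(N)
    alive = np.ones(N, dtype=bool)
    version = np.zeros(N, dtype=int)   # incremented whenever a cluster changes
    labels = np.arange(N)              # labels keep the original cluster indices

    def nearest(i):
        """Cheapest merge partner of cluster i among the other live clusters."""
        others = np.flatnonzero(alive)
        others = others[others != i]
        costs = n[i] * n[others] / (n[i] + n[others]) * ((C[others] - C[i]) ** 2).sum(axis=1)
        k = int(np.argmin(costs))
        return float(costs[k]), int(others[k])

    heap = []

    def push(i):
        cost, j = nearest(i)
        heapq.heappush(heap, (cost, i, j, int(version[j])))

    if N > 1:
        for i in range(N):
            push(i)
    m = N
    while m > M:
        cost, a, b, vb = heapq.heappop(heap)
        if not alive[a]:
            continue                   # entry of a cluster that no longer exists
        if not alive[b] or version[b] != vb:
            push(a)                    # outdated entry: recompute only now
            continue
        # The entry is up to date, so (a, b) is the cheapest pair: merge b into a.
        C[a] = (n[a] * C[a] + n[b] * C[b]) / (n[a] + n[b])
        n[a] += n[b]
        version[a] += 1
        labels[labels == b] = a
        alive[b] = False
        m -= 1
        if m > 1:
            push(a)
    return C[alive], labels
```

The deferred update is safe because of the monotony property: an outdated cost can only be too small, so an up-to-date entry at the top of the heap is still the global minimum.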

Combining PNN and K-means

Algorithm: Iterative shrinking
P. Fränti and O. Virmajoki, "Iterative shrinking method for clustering problems", Pattern Recognition, 39 (5), 761-765, May 2006.

Agglomerative clustering based on merging

Agglomeration based on cluster removal [Fränti and Virmajoki, Pattern Recognition, 2006]

Merge versus removal

Pseudo code of iterative shrinking (IS)

Cluster removal in practice
Find secondary cluster:
Calculate removal cost for every vector:
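The formulas of this slide are missing from the transcript, so the following is only a sketch under explicit assumptions. It assumes that the secondary cluster of a vector is the other cluster that is cheapest to join, with joining cost nj/(nj+1) · ||x − cj||², and that the removal cost of a cluster is the sum, over its vectors, of that joining cost minus the vector's current squared distance to its own centroid. The function name `removal_costs` is hypothetical, and every cluster is assumed to be non-empty.

```python
import numpy as np

def removal_costs(X, labels, C):
    """Approximate cost of removing each cluster and repartitioning its vectors."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(C, dtype=float)
    labels = np.asarray(labels)
    idx = np.arange(len(X))
    n = np.bincount(labels, minlength=len(C))               # cluster sizes
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # squared distances, N x M
    join = n / (n + 1) * d2          # assumed cost of adding vector i to cluster j
    own = d2[idx, labels]            # current squared error of each vector
    join[idx, labels] = np.inf       # a vector cannot move to its own cluster
    secondary = join.argmin(axis=1)  # secondary cluster of every vector
    per_vector = join[idx, secondary] - own
    costs = np.bincount(labels, weights=per_vector, minlength=len(C))
    return costs, secondary

X = [[0, 0], [0, 2], [10, 0], [10, 2], [4, 1]]
labels = [0, 0, 1, 1, 2]
C = [[0, 1], [10, 1], [4, 1]]
costs, secondary = removal_costs(X, labels, C)
print(costs, secondary)
```

In this toy data the singleton cluster has the smallest removal cost, so it would be the one removed in the next shrinking step.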

Partition updates

Complexity analysis
Number of vectors per cluster:
If we iterate until M=1:
Adding the processing time per vector:

Algorithm: PNN with kNN-graph
P. Fränti, O. Virmajoki and V. Hautamäki, "Fast agglomerative clustering using a k-nearest neighbor graph", IEEE Trans. on Pattern Analysis and Machine Intelligence, 28 (11), 1875-1881, November 2006

Agglomerative clustering with kNN graph

Example of 2NN graph

Example of 4NN graph

Graph using double linked lists

Merging a and b
