An Impossibility Theorem for Clustering By Jon Kleinberg
Definitions • Clustering function: a function f that operates on a set S of n ≥ 2 points and the distances among them, with f(d) = Γ, where Γ is a partition of S • Distance function: d(i,j) ≥ 0, d(i,j) = d(j,i), and the distance is 0 only for d(i,i) • Does not require the triangle inequality.
Many different clustering criteria • k-center • k-median • k-means • Inter-Intra • etc
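k-Center • Minimize the maximum distance from any point to its nearest center
k-median • Minimize the average distance from each point to its nearest center • k-means: minimize the average squared distance instead
Inter-Intra • Maximize D(C) – T(C), where D(C) measures the inter-cluster distance and T(C) the intra-cluster distance of a clustering C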
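The three center-based objectives above can be written down in a few lines. This is a minimal sketch, assuming the centers are chosen among the data points and that distances are given by a function `d(i, j)`; the function names and the example points are illustrative and do not come from the slides.

```python
from typing import Callable, Sequence

Dist = Callable[[int, int], float]

def nearest_center_distance(i: int, centers: Sequence[int], d: Dist) -> float:
    """Distance from point i to the closest of the chosen centers."""
    return min(d(i, c) for c in centers)

def k_center_cost(points: Sequence[int], centers: Sequence[int], d: Dist) -> float:
    """k-center: the worst (maximum) distance to a nearest center."""
    return max(nearest_center_distance(i, centers, d) for i in points)

def k_median_cost(points: Sequence[int], centers: Sequence[int], d: Dist) -> float:
    """k-median: the average distance to a nearest center."""
    return sum(nearest_center_distance(i, centers, d) for i in points) / len(points)

def k_means_cost(points: Sequence[int], centers: Sequence[int], d: Dist) -> float:
    """k-means (centers restricted to data points): average squared distance."""
    return sum(nearest_center_distance(i, centers, d) ** 2 for i in points) / len(points)

# Illustrative data: points on a line, identified by their coordinate.
points = [0, 1, 4, 9, 10]
centers = [1, 9]
d = lambda i, j: abs(i - j)

costs = (
    k_center_cost(points, centers, d),
    k_median_cost(points, centers, d),
    k_means_cost(points, centers, d),
)
```

The same centers score differently under each criterion, which is the sense in which each criterion optimizes different features.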
Motivation • Each criterion optimizes different features • Is there one clustering criterion with phenomenal cosmic powers?
Method • Give three intuitive axioms that any criterion should satisfy • Surprise: it is not possible to satisfy all three • Reminiscent of Arrow’s Impossibility Theorem: no ranking rule can satisfy a small set of natural axioms simultaneously
Axiom 1 – Scale-Invariance • For any distance function d and any β > 0, we have that f(d) = f(β·d)
Axiom 2 – Richness • Range(f) is equal to the set of all partitions of S • i.e., every possible clustering can be generated given the right distances
Axiom 3 – Consistency • Let d and d’ be two distance functions. If f(d) = Γ, and d’ is such that the distance between any two points in the same cluster of Γ is no larger than in d and the distance between any two points in different clusters is no smaller than in d, then f(d’) = Γ
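The condition on d’ in the Consistency axiom can be checked mechanically. The following is a minimal sketch, assuming a partition is given as a list of sets and distances as functions; the name `is_gamma_transformation` and the example data are illustrative, not taken from the slides.

```python
from itertools import combinations

def cluster_labels(partition):
    """Map each point to the index of the cluster that contains it."""
    return {p: idx for idx, block in enumerate(partition) for p in block}

def is_gamma_transformation(points, partition, d, d_new):
    """True if d_new shrinks (or keeps) every intra-cluster distance
    and stretches (or keeps) every inter-cluster distance."""
    label = cluster_labels(partition)
    for i, j in combinations(points, 2):
        same_cluster = label[i] == label[j]
        if same_cluster and d_new(i, j) > d(i, j):
            return False
        if not same_cluster and d_new(i, j) < d(i, j):
            return False
    return True

# Illustrative data: four points on a line, clustered into two pairs.
points = [0, 1, 5, 6]
partition = [{0, 1}, {5, 6}]
label = cluster_labels(partition)
d = lambda i, j: float(abs(i - j))

# Halve intra-cluster distances, double inter-cluster distances.
d_good = lambda i, j: 0.5 * d(i, j) if label[i] == label[j] else 2.0 * d(i, j)
# Shrink everything, including inter-cluster distances (not allowed).
d_bad = lambda i, j: 0.5 * d(i, j)

good_ok = is_gamma_transformation(points, partition, d, d_good)
bad_ok = is_gamma_transformation(points, partition, d, d_bad)
```

Consistency then says that a clustering function which outputs `partition` on `d` must output the same partition on any `d_new` that passes this check.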
Definition • Anti-chain: a collection of partitions is an anti-chain if it does not contain two distinct partitions such that one is a refinement of the other • An anti-chain cannot satisfy Richness: the set of all partitions of S contains pairs in which one partition refines the other (for n ≥ 2, the all-singletons partition refines the single-cluster partition)
Main Result • For each n ≥ 2, there is no clustering function f that satisfies Scale-Invariance, Richness and Consistency • Implied by a proof that if f satisfies Scale-Invariance and Consistency, then Range(f) is an anti-chain
Reminder of Axioms • Scale-Invariance: For any distance function d and any β > 0, we have that f(d) = f(β·d) • Richness: Range(f) is equal to the set of all partitions of S • Consistency: Let d and d’ be two distance functions. If f(d) = Γ, and d’ is such that the distance between any two points in the same cluster of Γ is no larger than in d and the distance between any two points in different clusters is no smaller than in d, then f(d’) = Γ
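To make the anti-chain definition concrete, here is a small sketch that tests refinement between partitions and enumerates all partitions of a tiny set. The helper names are illustrative; the enumeration is a standard recursive construction and is only practical for very small sets.

```python
from itertools import combinations

def is_refinement(fine, coarse):
    """True if every block of `fine` lies inside some block of `coarse`."""
    return all(any(set(b) <= set(c) for c in coarse) for b in fine)

def normalize(partition):
    """Order-independent representation of a partition."""
    return frozenset(frozenset(block) for block in partition)

def is_antichain(partitions):
    """True if no two distinct partitions are related by refinement."""
    parts = [normalize(p) for p in partitions]
    for p, q in combinations(parts, 2):
        if p != q and (is_refinement(p, q) or is_refinement(q, p)):
            return False
    return True

def all_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in all_partitions(rest):
        # Put `first` into each existing block in turn...
        for k, block in enumerate(smaller):
            yield smaller[:k] + [[first] + block] + smaller[k + 1:]
        # ...or give it a block of its own.
        yield [[first]] + smaller

every_partition = list(all_partitions([1, 2, 3]))
two_block_only = [p for p in every_partition if len(p) == 2]

full_is_antichain = is_antichain(every_partition)   # all partitions together
pairs_is_antichain = is_antichain(two_block_only)   # only the 2-cluster ones
```

The full collection fails the test because it contains, for instance, both the single-cluster partition and the all-singletons partition, while the two-cluster partitions alone pass it.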
Single Linkage • Cluster by repeatedly linking the two closest points that are not yet in the same cluster • Example points on a line: 0, 1, 4, 9, 10, 12, 15, 19, 20
Any two axioms • For every pair of axioms, there is a stopping condition for single linkage that satisfies both • Consistency + Richness: only link if the distance is at most r • Consistency + Scale-Invariance (SI): stop when you have k connected components • Richness + SI: if x is the diameter of the graph (the largest pairwise distance), only add edges with weight at most βx, for a fixed β < 1
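The three stopping conditions can be tried out on the example points from the Single Linkage slide. This is a minimal sketch, assuming a union-find over edges sorted by weight; the parameter names `stop`, `k`, `r` and `beta` and their values are illustrative choices, and the thresholds are treated as inclusive.

```python
from itertools import combinations

def single_linkage(points, d, stop="k", k=2, r=1.0, beta=0.5):
    """Single linkage with one of three stopping conditions:
    "k": stop at k connected components,
    "r": only add edges of weight at most r,
    "scale": only add edges of weight at most beta * diameter."""
    parent = {p: p for p in points}

    def find(p):
        # Follow parent links to the component root (with path halving).
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    edges = sorted((d(i, j), i, j) for i, j in combinations(points, 2))
    diameter = edges[-1][0]
    components = len(points)
    for w, i, j in edges:
        if stop == "k" and components <= k:
            break
        if stop == "r" and w > r:
            break
        if stop == "scale" and w > beta * diameter:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1

    clusters = {}
    for p in points:
        clusters.setdefault(find(p), []).append(p)
    return sorted(sorted(c) for c in clusters.values())

points = [0, 1, 4, 9, 10, 12, 15, 19, 20]
d = lambda i, j: abs(i - j)
d_scaled = lambda i, j: 10 * abs(i - j)   # same data, different units

by_k = single_linkage(points, d, stop="k", k=3)
by_r = single_linkage(points, d, stop="r", r=2)
by_r_scaled = single_linkage(points, d_scaled, stop="r", r=2)
by_scale = single_linkage(points, d, stop="scale", beta=0.225)
by_scale_scaled = single_linkage(points, d_scaled, stop="scale", beta=0.225)
```

On this data the fixed threshold r gives a different answer once the distances are multiplied by 10, which illustrates why that condition gives up Scale-Invariance, while the diameter-relative threshold returns the same clusters in both units.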
Centroid-Based Clustering • (k,g)-centroid clustering function: choose T, a set of k centroid points in S, such that Σ_{i∈S} g(d(i,T)) is minimized, where d(i,T) is the distance from i to its nearest centroid in T • If g is the identity, we get k-median; if g(d) = d², we get k-means • Result: for every k ≥ 2, every function g, and n sufficiently large relative to k, the (k,g)-centroid clustering function does not satisfy Consistency.
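A brute-force version of the (k,g)-centroid clustering function makes the definition concrete. This is a sketch only, assuming centroids are picked among the data points and ties are broken by enumeration order; it tries every k-subset, so it is usable only on tiny inputs. The function name and example data are illustrative.

```python
from itertools import combinations

def centroid_clustering(points, d, k, g=lambda x: x):
    """Pick the k centroids T (among the points) that minimize
    the sum over i of g(distance from i to its nearest centroid),
    then assign every point to its nearest centroid."""
    best_cost, best_centers = None, None
    for centers in combinations(points, k):
        cost = sum(g(min(d(i, c) for c in centers)) for i in points)
        if best_cost is None or cost < best_cost:
            best_cost, best_centers = cost, centers

    clusters = {c: [] for c in best_centers}
    for i in points:
        nearest = min(best_centers, key=lambda c: d(i, c))
        clusters[nearest].append(i)
    return best_centers, sorted(clusters.values())

points = [0, 1, 4, 9, 10]
d = lambda i, j: abs(i - j)

# g = identity corresponds to the k-median objective.
_, median_clusters = centroid_clustering(points, d, k=2)
# g(x) = x^2 corresponds to the k-means objective.
_, means_clusters = centroid_clustering(points, d, k=2, g=lambda x: x * x)
```

On this small example both choices of g produce the same two clusters; they can differ on other inputs.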
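Proof: A contradiction • Split S into X (size m) and Y (size λm) • Distances: r between points in X, ε between points in Y, and r+δ between a point in X and a point in Y
A new distance function • Split X into X0 (size m/2) and X1 (size m/2) • Shrink the distances inside X0 and inside X1 to r’ < r • All other distances stay as before: r between X0 and X1, ε inside Y, r+δ between X and Y
Wrapping Up • If we pick λ, r, r’, ε and δ right, then under the original distances the best centers give the clusters X and Y, while under the new distances it is cheaper to put one center in X0 and one in X1 • But then our new centers are in X0 and X1 • But our new distance function followed Consistency (it only shrank distances inside the cluster X), so it should give us X and Y. • This covers the case where k is 2.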
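The k = 2 argument can be checked numerically on a small instance. The parameter values below (m = 8, λ = 0.25, r = 1, δ = 0.1, ε = r’ = 0.01) are hypothetical choices made for this sketch and are not taken from the slides; the clustering is found by brute force over all pairs of centers, with g taken to be the identity.

```python
from itertools import combinations

def two_centroid_clusters(points, d):
    """Brute-force (2, identity)-centroid clustering: returns the set of clusters."""
    best = min(
        combinations(points, 2),
        key=lambda T: sum(min(d(i, c) for c in T) for i in points),
    )
    clusters = {c: set() for c in best}
    for i in points:
        clusters[min(best, key=lambda c: d(i, c))].add(i)
    return {frozenset(c) for c in clusters.values()}

# Hypothetical parameters for the construction.
m, lam = 8, 0.25
r, delta, eps, r_new = 1.0, 0.1, 0.01, 0.01

X0 = [("x", t) for t in range(m // 2)]
X1 = [("x", t) for t in range(m // 2, m)]
X = X0 + X1
Y = [("y", t) for t in range(int(lam * m))]
points = X + Y

def d(i, j):
    """Original distances: r inside X, eps inside Y, r + delta across."""
    if i == j:
        return 0.0
    if i in Y and j in Y:
        return eps
    if i in X and j in X:
        return r
    return r + delta

def d_new(i, j):
    """New distances: shrink to r_new inside X0 and inside X1, keep the rest."""
    same_half = (i in X0 and j in X0) or (i in X1 and j in X1)
    return r_new if (same_half and i != j) else d(i, j)

expected = {frozenset(X), frozenset(Y)}
before = two_centroid_clusters(points, d)
after = two_centroid_clusters(points, d_new)

# d_new only shrinks distances inside the cluster X, so it is an allowed change.
label = {p: ("X" if p in X else "Y") for p in points}
allowed_change = all(
    (d_new(i, j) <= d(i, j)) if label[i] == label[j] else (d_new(i, j) >= d(i, j))
    for i, j in combinations(points, 2)
)
```

With these values `before` equals the clustering into X and Y, `allowed_change` holds, and yet `after` is a different clustering in which X1 is a cluster of its own, which is the violation of Consistency that the proof describes.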
Discussion: Relaxing Axioms • Refinement-consistency: if d’ is an f(d)-transformation of d, then f(d’) is a refinement of f(d) • Near-Richness: all partitions except the trivial one can be obtained • With these two relaxations in place of Consistency and Richness, there is a clustering function that satisfies all three properties together with Scale-Invariance. • What other relaxations could we have?
Discussion • Does this mean there is a law of continuous employment for clustering criterion creators? • Is the clustering function properly defined? • Allow overlaps • Allow outliers • Are these the right axioms? • All partitions possible vs. power set • Axioms for graph clustering?