Constrained Locally Weighted Clustering • Hao Cheng, Kien A. Hua, and Khanh Vu • School of Electrical Engineering and Computer Science, University of Central Florida
Contents • Introduction • Locally Weighted Clustering • Constrained Clustering • Experiments • Conclusions
Clustering • Clustering partitions a given dataset into a set of meaningful clusters, so that the data objects in each cluster share similar characteristics. • Data are generally complicated and lie in high-dimensional spaces, which makes the clustering task non-trivial.
Overview • Clusters reside in subspaces. • Locally Weighted Clustering: each cluster is associated with an independent weighting vector that captures its local correlation structure. • Pairwise instance-level constraints are usually available in clustering practice. • Constrained Clustering: data points are arranged into small groups based on the given constraints, and these groups are then assigned to the closest feasible clusters.
Conventional Clustering • Partitional: [K-Means] • Hierarchical: [Single-Link, Complete-Link, Ward's, Bisection K-Means] • Euclidean distance is used to measure the (dis)similarity between two objects; all dimensions are treated as equally important throughout the whole space.
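For reference, the plain (unweighted) Euclidean distance these methods rely on, which treats every dimension equally:

$$ d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{j=1}^{D} (x_j - y_j)^2} $$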
Challenges • Data reside in high-dimensional spaces. • Curse of dimensionality: the space becomes sparse, and objects become (almost equally) far away from each other. • Clusters reside in subspaces. • Different subsets of the data may exhibit different correlations, and within each subset the correlation may vary along different dimensions.
Related Methods • Global projections: dimension reduction and manifold learning [PCA, LPP] • Adaptive dimension selection: [CLIQUE, ProClus] • Adaptive dimension weighting: [LAC]
Different Correlation Structures • [figure: scatter plots of the data projected onto Dim 1 & 2, Dim 1 & 3, and Dim 2 & 3]
K-Means • Iteratively refine the clustering objective using Euclidean distance: start with initial centroids; S1: assign points to the closest centroids; S2: update the centroids; iterate until convergence. • [figure: K-Means result on the example data (NMI: 0.4628, Rand: 0.7373)]
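A minimal sketch of this two-step refinement in plain NumPy (illustrative only; the variable names are ours, not the authors'):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate S1 (assignment) and S2 (update)."""
    rng = np.random.default_rng(seed)
    # Start with initial centroids sampled from the data.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # S1: assign each point to its closest centroid (Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # S2: update each centroid as the mean of its assigned points.
        centroids_new = np.array([
            X[labels == j].mean(axis=0) if (labels == j).any() else centroids[j]
            for j in range(k)
        ])
        if np.allclose(centroids_new, centroids):  # converged
            break
        centroids = centroids_new
    return labels, centroids
```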
PCA and LPP • Projection directions are chosen to minimize data distortion. • [figure: LPP 2-dim projection (NMI: 0.5014, Rand: 0.7507) and PCA 2-dim projection (NMI: 0.5294, Rand: 0.7805)]
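For instance, projecting to two dimensions before clustering might look like this (a scikit-learn sketch on synthetic placeholder data; LPP has no scikit-learn implementation and is omitted):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(400, 10))  # placeholder data
X2 = PCA(n_components=2).fit_transform(X)            # project onto the top-2 principal directions
labels = KMeans(n_clusters=4, n_init=10).fit_predict(X2)
```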
Heterogeneous Correlations • Data in a cluster can be strongly correlated in some dimensions while varying greatly in the remaining ones; the correlation structures differ from cluster to cluster. • A dimension is not equally important for all clusters. • Within a cluster, the dimensions are not equally important.
Correlations and Weights • A weight vector is associated with each cluster. • [figure: projections onto Dim 1 & 2, Dim 1 & 3, and Dim 2 & 3 with the corresponding per-cluster weights]
Local Weights • A cluster is embedded in the subspace spanned by an adaptive combination of the dimensions. • In the neighborhood of a cluster, a weighted Euclidean distance is adopted, as shown below.
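Concretely, with $\mathbf{c}_k$ the centroid of cluster $k$ and $\mathbf{w}_k$ its weight vector (notation assumed from the slides' description):

$$ d_{\mathbf{w}_k}(\mathbf{x}, \mathbf{c}_k) = \sqrt{\sum_{j=1}^{D} w_{kj}\,(x_j - c_{kj})^2} $$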
Locally Weighted Clustering • Minimize the sum of weighted distances. • Get rid of zero weights via constraints. • Optimal centroids and weights have closed forms (a reconstruction follows below).
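The slide's equations were lost in extraction; one reconstruction consistent with the narrative here and on the next slide (smaller per-dimension scatter yields larger weights), with $X_{kj} = \sum_{\mathbf{x} \in C_k} (x_j - c_{kj})^2$ the scatter of cluster $k$ along dimension $j$:

$$ \min_{\{\mathbf{c}_k\},\{\mathbf{w}_k\}} \sum_{k=1}^{K} \sum_{\mathbf{x} \in C_k} \sum_{j=1}^{D} w_{kj}\,(x_j - c_{kj})^2 \quad \text{s.t.} \quad \sum_{j=1}^{D} \frac{1}{w_{kj}} = D \;\; \forall k $$

The reciprocal-sum constraint forces every weight to stay strictly positive, and the Lagrangian solution gives the cluster mean as the optimal centroid together with

$$ w_{kj} = \frac{\sum_{l=1}^{D} \sqrt{X_{kl}}}{D\,\sqrt{X_{kj}}} $$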
Locally Weighted Clustering • The weights of a cluster depend only on the data points that belong to that cluster. • Smaller pairwise distances along a dimension indicate greater correlation and yield larger weights; greater pairwise distances indicate smaller correlation and yield smaller weights. • Start with initial centroids and weights; S1: assign points to the closest centroids; S2: update the centroids and weights; iterate until convergence.
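A sketch of the corresponding update step under the reconstruction above (`X`, `labels`, and `centroids` as in the earlier K-Means sketch; again illustrative, not the authors' code):

```python
import numpy as np

def update_weights(X, labels, centroids, eps=1e-12):
    """Weights inversely proportional to the square root of the per-dimension
    scatter, normalized so that sum_j 1/w_kj = D for each cluster."""
    k, d = centroids.shape
    W = np.ones((k, d))
    for j in range(k):
        pts = X[labels == j]
        if len(pts) == 0:
            continue  # keep uniform weights for an empty cluster
        scatter = ((pts - centroids[j]) ** 2).sum(axis=0) + eps
        W[j] = np.sqrt(scatter).sum() / (d * np.sqrt(scatter))
    return W

def weighted_sq_dists(X, centroids, W):
    """Weighted squared distance of every point to every centroid."""
    diff2 = (X[:, None, :] - centroids[None, :, :]) ** 2  # shape (n, k, d)
    return np.einsum('nkd,kd->nk', diff2, W)
```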
LAC • Objective function and constraints: see the sketch below. • LAC is sensitive to its tunable parameter.
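For comparison, LAC (Domeniconi et al.) regularizes a similar weighted error with an entropy term controlled by a parameter $h$, the tunable parameter referred to above (recalled from the LAC paper; notation approximate):

$$ \min \sum_{k}\sum_{j}\left( w_{kj}\,\bar{X}_{kj} + h\,w_{kj}\log w_{kj} \right) \quad \text{s.t.} \quad \sum_{j} w_{kj} = 1, \qquad w_{kj} \propto e^{-\bar{X}_{kj}/h} $$

Different choices of $h$ can produce very different weightings, hence the sensitivity.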
LWC • [figure: LWC result projected onto Dim 1 & 2, Dim 1 & 3, and Dim 2 & 3] • NMI: 1, Rand: 1
Constrained Clustering • A pairwise instance-level constraint tells whether two points belong to the same cluster: • Must-link • Cannot-link • This form of partial knowledge is usually accessible and valuable in clustering practice. • Constrained clustering utilizes a given set of constraints to derive better data partitions.
Related Methods • Learn a suitable distance metric [RCA, DCA] • Guide the clustering process: • Enforce the constraints [Constrained K-Means] • Penalize constraint violations [CVQE] • Unified method: [MPCK-Means]
Chunklet • Chunklet: 'a subset of points that are known to belong to the same although unknown class'. • Data objects that are inferred to be similar can be placed into the same chunklet. • A set of pairwise constraints can be represented as a chunklet graph.
Chunklet Graph • [figure: graph whose nodes are chunklets, each labeled with the number of points in the chunklet (e.g., 4, 3, 1); must-links merge nodes, cannot-links become edges]
Graph Construction • Initially, each point is a chunklet node. • Two nodes are merged if they are inferred to be similar. • An edge is added between two chunklet nodes if they are inferred to be dissimilar. • Cluster assignment is then done per chunklet (a sketch follows below).
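A sketch of this construction: union-find over the must-links, then cannot-link edges between the resulting chunklet representatives (illustrative code):

```python
def build_chunklet_graph(n, must_links, cannot_links):
    """Merge must-linked points into chunklets and connect chunklets
    that contain cannot-linked points with a graph edge."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in must_links:                # merging is transitive
        parent[find(a)] = find(b)

    chunklets = {}                         # representative -> member points
    for i in range(n):
        chunklets.setdefault(find(i), []).append(i)

    edges = {frozenset((find(a), find(b)))
             for a, b in cannot_links if find(a) != find(b)}
    return chunklets, edges
```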
Gaussian Data • Assume the data in each cluster follow a Gaussian distribution. • Two clusters: C1 and C2. • [equation with callouts: a chunklet, a point x belonging to the chunklet, and the size of the chunklet]
Chunklet Assignment • A chunklet can be assigned to a cluster in bulk. • Two neighboring chunklets can be assigned to two different clusters.
Probability of Assignment • In the case of two clusters: a single chunklet is assigned correctly with a probability that grows with its size (a sketch follows below). • Two neighboring chunklets are analyzed similarly.
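The probability expressions did not survive extraction; a standard one-dimensional sketch under the two-Gaussian assumption (ours, not necessarily the paper's exact form): with means $\mu_1, \mu_2$, common variance $\sigma^2$, and separation $\Delta = |\mu_1 - \mu_2|$, a single point from $C_1$ falls nearer to $\mu_1$ with probability

$$ p = \Phi\!\left(\frac{\Delta}{2\sigma}\right), $$

while a chunklet of size $m$ assigned in bulk via its mean (whose standard deviation is $\sigma/\sqrt{m}$) is assigned correctly with probability

$$ p_m = \Phi\!\left(\frac{\sqrt{m}\,\Delta}{2\sigma}\right) \;\ge\; p. $$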
K-Means • K-Means is unaware of constraints and assigns points independently. • The average number of points in a chunklet that are assigned correctly to their true cluster: each single point is an independent event, so the number of correct occurrences follows a Binomial distribution.
Constrained K-Means • Constrained K-Means enforces the constraints strictly. • The average number of correct assignments: assume the 3 points in the example belong to cluster 1; strict enforcement places the must-linked points together.
Chunklet Assignment • A chunklet is assigned in bulk. • The average number of correct assignments improves over point-by-point assignment (see the comparison below). • Similarly, we can analyze the case of two neighboring chunklets.
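Continuing the sketch above for one chunklet of $m$ points from $C_1$: independent point-by-point assignment yields $m\,p$ correct assignments on average, whereas bulk assignment via the chunklet mean yields

$$ m\,p_m = m\,\Phi\!\left(\frac{\sqrt{m}\,\Delta}{2\sigma}\right) \;>\; m\,p \quad (m > 1), $$

which is the advantage summarized on the next slide.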
Chunklet versus One-by-one • It is better to assign points in chunklets. • The bigger the chunklet, the more correct assignments. • It is better to assign two neighboring chunklets together.
CLWC • Combine the local weighting scheme with chunklet assignment. • Build the chunklet graph; start with initial centroids and weights; S1: assign points to the closest centroids (chunklet assignments); S2: update the centroids and weights; iterate until convergence.
Chunklet Assignment • Try to make the most confident assignments first. • If a node has a neighbor, assign the two together. • Assign larger chunklets first. • Chunklets are placed in the closest feasible clusters (a sketch follows below).
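One way to realize this ordering (greedy sketch; for brevity it orders by chunklet size only and omits the joint assignment of neighbor pairs; `dist_to_cluster` is a hypothetical callable, e.g. the weighted distance from a chunklet's mean to a centroid, with `chunklets` and `edges` from the earlier graph sketch):

```python
def assign_chunklets(chunklets, edges, dist_to_cluster, k):
    """Greedily place larger chunklets first, each into its closest
    cluster that violates no cannot-link edge."""
    assignment = {}
    for rep in sorted(chunklets, key=lambda r: -len(chunklets[r])):
        # Clusters already holding a cannot-linked neighbor are infeasible.
        forbidden = {assignment[other]
                     for e in edges if rep in e
                     for other in e if other != rep and other in assignment}
        feasible = [c for c in range(k) if c not in forbidden]
        if feasible:
            assignment[rep] = min(feasible, key=lambda c: dist_to_cluster(rep, c))
    return assignment
```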
Better Clustering • 4 classes of images (100 each) from the COREL DB. • [figure: ground truth vs. K-Means vs. constrained results with 50 and 300 links]
Experimental Setup • Techniques • Datasets • Evaluation metrics (NMI, Rand index) • Pairwise constraints • [table: setup details lost in extraction]
Performances • [figure: LWC compared against K-Means, hierarchical clustering, dimension reduction, manifold learning, and LAC]
Performances • [figure: CLWC compared against direct-enforcement, metric-learning, and violation-penalty methods]
Conclusions • An independent weighting vector is used to capture the local correlation structure around each cluster; the weights help define the embedding subspace of the cluster. • Data points are grouped into chunklets based on the input constraints; the points in a chunklet are treated as a whole during the assignment process, and the most confident assignments (those least likely to be incorrect) are made first.
Efficiency • The cost of each iteration is low, scaling with the numbers of points, clusters, and dimensions as in K-Means. • Local weighting generally lets the algorithm converge quickly. • The more constraints, the faster the algorithm converges.
Constraint Violations • There is no guarantee that all constraints can be satisfied. • [figure: chunklet graph with cannot-link edges for which no feasible assignment to clusters C1 and C2 exists]
Probability Constraints • Use a real value in the range [-1, 1] to denote the similarity between two points, i.e., the confidence that the two points are in the same cluster. • Clique: points that are similar (with high similarity values) to each other. • For each point, search for a clique that includes the point. • The degree of dissimilarity between two cliques can be computed. • Assignment is then done per clique.
Two Neighboring Chunklets • Number of correct assignments: