KCK-means: A Clustering Method based on Kernel Canonical Correlation Analysis Dr. Yingjie Tian
Outline • Motivation & Challenges • KCCA, Kernel Canonical Correlation Analysis • Our method: KCK-means • Experiments • Conclusions
Motivation • Previous Similarity Metrics • Euclidean distance • Squared Mahalanobis distance • Mutual neighbor distance • … • These fail when there are non-linear correlations between attributes
Motivation • In some interesting application domains, attributes can be naturally split into two subsets, either of which suffices for learning • Intuitively, there may be projections that can reveal the ground truth in these two views • KCCA is a technique that can extract common features from a pair of multivariate data • It is the most promising candidate
Outline • Motivation & Challenges • KCCA, Kernel Canonical Correlation Analysis • Our method: KCK-means • Experiments • Conclusions
Canonical Correlation Analysis (1/2) • X = {x1, x2, …, xl} and Y = {y1, y2, …, yl} denote two views • CCA finds projection vectors wx and wy that maximize the correlation coefficient between wx^T X and wy^T Y • That is: ρ = max over wx, wy of (wx^T Cxy wy) / sqrt((wx^T Cxx wx)(wy^T Cyy wy)) • Cxy is the between-sets covariance matrix of X and Y, Cxx and Cyy are the within-sets covariance matrices.
Canonical Correlation Analysis (2/2) • If Cyy is invertible, the wx's can be obtained by solving the generalized eigenproblem Cxy Cyy^-1 Cyx wx = λ^2 Cxx wx • The corresponding wy's are then found by using wy = (1/λ) Cyy^-1 Cyx wx
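A minimal NumPy/SciPy sketch of the linear CCA step just described, assuming two data matrices X and Y (one row per instance) for the two views; the function and variable names are illustrative, not from the paper.

```python
# A minimal sketch of linear CCA via the generalized eigenproblem described above.
import numpy as np
from scipy.linalg import eigh

def linear_cca(X, Y, reg=1e-6):
    # Center both views
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    l = X.shape[0]
    Cxx = X.T @ X / l + reg * np.eye(X.shape[1])   # within-set covariance of X
    Cyy = Y.T @ Y / l + reg * np.eye(Y.shape[1])   # within-set covariance of Y
    Cxy = X.T @ Y / l                              # between-sets covariance
    # Generalized eigenproblem: Cxy Cyy^-1 Cyx wx = lambda^2 Cxx wx
    M = Cxy @ np.linalg.solve(Cyy, Cxy.T)
    evals, Wx = eigh(M, Cxx)                       # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]
    evals, Wx = evals[order], Wx[:, order]
    lambdas = np.sqrt(np.clip(evals, 0, None))
    # Recover the corresponding wy's: wy = Cyy^-1 Cyx wx / lambda
    Wy = np.linalg.solve(Cyy, Cxy.T @ Wx) / np.clip(lambdas, 1e-12, None)
    return Wx, Wy, lambdas
```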
Why Kernel CCA • Why use a kernel extension of CCA? • CCA may not extract useful descriptors of the data because of its linearity • In order to find nonlinearly correlated projections, KCCA maps xi and yi to φ(xi) and φ(yi) in kernel-induced feature spaces, giving Sx = {φ(x1), …, φ(xl)} and Sy = {φ(y1), …, φ(yl)} • Then φ(xi) and φ(yi) are treated as instances to run the CCA routine.
KCCA • Objective function: ρ = max over α, β of (α^T Kx Ky β) / sqrt((α^T Kx^2 α)(β^T Ky^2 β)), where α and β are the two desirable projections and Kx = (Kx(xi, xj)) and Ky = (Ky(yi, yj)) are the two kernel matrices • We use Partial Gram-Schmidt Orthogonalisation (PGSO) to approximate the kernel matrices
How to solve KCCA • α can be solved from the eigenproblem (Kx + κI)^-1 Ky (Ky + κI)^-1 Kx α = λ^2 α, where κ is used for regularization • β can then be obtained from β = (1/λ)(Ky + κI)^-1 Kx α • A number of α and β (and corresponding λ) can be found
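A hedged sketch of solving for α and β in the kernel case, following one standard regularized KCCA formulation (cf. Hardoon et al.); the exact regularization and approximation used in the paper may differ.

```python
# A minimal sketch of solving regularized KCCA for the projection coefficients.
import numpy as np

def kcca(Kx, Ky, kappa=0.1, n_dims=10):
    l = Kx.shape[0]
    I = np.eye(l)
    # Eigenproblem: (Kx + kI)^-1 Ky (Ky + kI)^-1 Kx alpha = lambda^2 alpha
    A = np.linalg.solve(Kx + kappa * I, Ky)
    B = np.linalg.solve(Ky + kappa * I, Kx)
    evals, alphas = np.linalg.eig(A @ B)
    order = np.argsort(-evals.real)[:n_dims]       # keep the most correlated directions
    lambdas = np.sqrt(np.clip(evals.real[order], 0, None))
    alphas = alphas.real[:, order]
    # Corresponding betas: beta = (Ky + kI)^-1 Kx alpha / lambda
    betas = B @ alphas / np.clip(lambdas, 1e-12, None)
    return alphas, betas, lambdas
```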
Outline • Motivation & Challenges • KCCA, Kernel Canonical Correlation Analysis • Our method: KCK-means • Experiments • Conclusions
Project into ground truth • Two kernel functions are defined as Kx(xi, xj) = <φ(xi), φ(xj)> and Ky(yi, yj) = <φ(yi), φ(yj)> • For any x* and y*, their projections can be obtained by P(x*) = Kx(x*, X)α and P(y*) = Ky(y*, Y)β for the two views respectively
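A small sketch of how instances could be projected with the learned coefficients; the RBF kernel and its gamma parameter are illustrative assumptions, not the kernels specified in the paper.

```python
# Projecting instances onto the KCCA directions: P(x*) = Kx(x*, X) @ alpha.
import numpy as np

def rbf_kernel_vec(x_star, X_train, gamma=1.0):
    # k(x*, x_i) for every training instance x_i
    d2 = np.sum((X_train - x_star) ** 2, axis=1)
    return np.exp(-gamma * d2)

def project_x(x_star, X_train, alphas, gamma=1.0):
    # One coordinate per retained projection direction
    return rbf_kernel_vec(x_star, X_train, gamma) @ alphas

def project_y(y_star, Y_train, betas, gamma=1.0):
    return rbf_kernel_vec(y_star, Y_train, gamma) @ betas
```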
Why use other pairs of projections? • According to (Zhou, Z.H., et al.), if the two views are conditionally independent given the class label, the projections with the largest correlation (the top α and β) should be in accordance with the ground truth • However, in the real world such conditional independence rarely holds, and the information conveyed by the other pairs of correlated projections should not be omitted
Similarity measure based on KCCA • μ is a parameter which regulates the proportion between the distance of the original instances and the distance of their projections
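The exact definition of fsim is not reproduced on this slide. Below is a hedged sketch of one plausible reading: a μ-weighted combination of the distance between the original instances and the distance between their KCCA projections.

```python
# Hypothetical reading of the KCCA-based similarity measure, not the paper's exact formula.
import numpy as np

def fsim(xi, xj, pi, pj, mu=0.1):
    # xi, xj: original instances; pi, pj: their KCCA projections
    d_orig = np.linalg.norm(xi - xj)
    d_proj = np.linalg.norm(pi - pj)
    return mu * d_orig + (1.0 - mu) * d_proj
```

With μ close to zero the measure relies almost entirely on the projected space, which matches the observation in the conclusions that very small μ works best.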
KCK-means for 2 views • Our method is based on K-means • In fact, we just extend K-means by adding the process of computing fsim (a sketch follows below)
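A hedged sketch of what the 2-view KCK-means loop could look like: the standard K-means structure with the point-to-centroid distance replaced by an fsim-style combination. Centroid handling and all names are illustrative, not the paper's exact recipe.

```python
# Sketch of a K-means loop driven by a combined original/projected distance.
import numpy as np

def kck_means(X, P, k, mu=0.1, n_iter=50, seed=0):
    # X: original instances (l x d); P: their KCCA projections (l x m)
    rng = np.random.default_rng(seed)
    l = X.shape[0]
    centers = rng.choice(l, size=k, replace=False)
    cX, cP = X[centers].copy(), P[centers].copy()
    labels = np.full(l, -1, dtype=int)
    for _ in range(n_iter):
        # Assignment step: nearest centroid under the combined distance
        d_orig = np.linalg.norm(X[:, None, :] - cX[None, :, :], axis=2)
        d_proj = np.linalg.norm(P[:, None, :] - cP[None, :, :], axis=2)
        new_labels = np.argmin(mu * d_orig + (1 - mu) * d_proj, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: centroids are means in both the original and projected space
        for c in range(k):
            if np.any(labels == c):
                cX[c] = X[labels == c].mean(axis=0)
                cP[c] = P[labels == c].mean(axis=0)
    return labels
```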
KCK-means for 1 view • However, two-view data sets are rare in the real world • (Nigam, K., et al.) point out that if there is sufficient redundancy among the features, we are able to identify a fairly reasonable division of them • Similarly, we randomly split a 1-view data set into two parts and treat them as the two views of the original data set to perform KCK-means.
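A minimal sketch of the single-view variant just described: randomly split the feature set into two halves and treat them as the two views before running the 2-view procedure.

```python
# Random feature split for single-view data.
import numpy as np

def random_view_split(X, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    perm = rng.permutation(d)
    idx_a, idx_b = perm[: d // 2], perm[d // 2:]
    return X[:, idx_a], X[:, idx_b]   # the two pseudo-views
```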
Outline • Motivation & Challenges • KCCA, Kernel Canonical Correlation Analysis • Our method: KCK-means • Experiments • Conclusions
Evaluation Metrics • Pair-Precision: • Mutual Information: • Intuitive-Precision:
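The metric formulas are not reproduced on this slide. Below are hedged sketches of two common readings: pair-precision as the fraction of same-cluster pairs that share the true label, and mutual information between the cluster assignment and the class label. "Intuitive-Precision" is paper-specific and is not sketched here.

```python
# Hypothetical implementations of two of the named metrics, not the paper's exact definitions.
import numpy as np
from itertools import combinations

def pair_precision(cluster_labels, true_labels):
    same_cluster = correct = 0
    for i, j in combinations(range(len(cluster_labels)), 2):
        if cluster_labels[i] == cluster_labels[j]:
            same_cluster += 1
            correct += int(true_labels[i] == true_labels[j])
    return correct / same_cluster if same_cluster else 0.0

def mutual_information(cluster_labels, true_labels):
    cluster_labels = np.asarray(cluster_labels)
    true_labels = np.asarray(true_labels)
    mi = 0.0
    for c in np.unique(cluster_labels):
        for t in np.unique(true_labels):
            p_ct = np.mean((cluster_labels == c) & (true_labels == t))
            p_c = np.mean(cluster_labels == c)
            p_t = np.mean(true_labels == t)
            if p_ct > 0:
                mi += p_ct * np.log(p_ct / (p_c * p_t))
    return mi
```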
Influence of η • There is a precision parameter (or stopping criterion), η, in the PGSO algorithm • The dimension of the projections depends on η • We also investigate its influence on the performance of KCK-means
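A hedged sketch of PGSO viewed as an incomplete Cholesky factorisation of a kernel matrix, showing how the precision parameter η acts as the stopping criterion and therefore controls the number of dimensions of the approximation; this is a generic formulation, not the paper's exact code.

```python
# Partial Gram-Schmidt orthogonalisation / incomplete Cholesky of a kernel matrix K.
import numpy as np

def pgso(K, eta=1e-3):
    l = K.shape[0]
    G = np.zeros((l, 0))
    diag = np.diag(K).astype(float).copy()   # residual diagonal
    pivots = []
    while diag.sum() > eta and len(pivots) < l:
        j = int(np.argmax(diag))              # next pivot: largest residual
        nu = np.sqrt(diag[j])
        col = (K[:, j] - G @ G[j]) / nu       # new orthogonal direction
        G = np.hstack([G, col[:, None]])
        diag = np.maximum(diag - col ** 2, 0.0)
        pivots.append(j)
    return G, pivots                          # K is approximated by G @ G.T
```

Smaller η keeps more pivots, so the projections have more dimensions; larger η truncates earlier.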
Outline • Motivation & Challenges • KCCA, Kernel Canonical Correlation Analysis • Our method: KCK-means • Experiments • Conclusions
Conclusions (1/2) • The results show that with KCK-means, much better cluster quality can be obtained than with K-means or agglomerative hierarchical clustering • We also note that when μ is set very small, or even to zero, KCK-means performs best • This means that, using the projections obtained from KCCA, the similarity between instances can already be measured well enough
Conclusions (2/2) • However, when the number of dimensions of the projections obtained from KCCA is very small, the performance of KCK-means degrades sharply, even below that of the two traditional clustering algorithms • This means that, in real-world applications, the information conveyed by the other pairs of correlated projections should also be considered • All in all, the number of projection dimensions used in KCK-means must be large enough