KCK-means: A Clustering Method based on Kernel Canonical Correlation Analysis Dr. Yingjie Tian
Outline • Motivation & Challenges • KCCA, Kernel Canonical Correlation Analysis • Our method: KCK-means • Experiments • Conclusions
Motivation • Previous Similarity Metrics • Euclidean distance • Squared Mahalanobis distance • Mutual neighbor distance • … • These fail when there are non-linear correlations between attributes
Motivation • In some interesting application domains, attributes can be naturally split into two subsets, either of which suffices for learning • Intuitively, there may be projections that can reveal the ground truth in these two views • KCCA is a technique that can extract common features from a pair of multivariate data • It is the most promising candidate
Outline • Motivation & Challenges • KCCA, Kernel Canonical Correlation Analysis • Our method: KCK-means • Experiments • Conclusions
Canonical Correlation Analysis (1/2) • X = {x1, x2, …, xl} and Y = {y1, y2, …, yl} denote two views • CCA finds projection vectors wx and wy that maximize the correlation coefficient between wx^T X and wy^T Y • That is: ρ = max over wx, wy of (wx^T Cxy wy) / sqrt((wx^T Cxx wx)(wy^T Cyy wy)) • Cxy is the between-sets covariance matrix of X and Y, Cxx and Cyy are the within-sets covariance matrices.
Canonical Correlation Analysis (2/2) • If Cyy is invertible, the wx's can be obtained by solving the generalized eigenproblem Cxy Cyy^-1 Cyx wx = λ^2 Cxx wx • The corresponding wy's are then found by using wy = (1/λ) Cyy^-1 Cyx wx
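A minimal NumPy/SciPy sketch of the linear CCA step just described, assuming two data matrices X and Y (one row per instance) for the two views; the function and variable names are illustrative, not from the paper.

```python
# A minimal sketch of linear CCA via the generalized eigenproblem described above.
import numpy as np
from scipy.linalg import eigh

def linear_cca(X, Y, reg=1e-6):
    # Center both views
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    l = X.shape[0]
    Cxx = X.T @ X / l + reg * np.eye(X.shape[1])   # within-set covariance of X
    Cyy = Y.T @ Y / l + reg * np.eye(Y.shape[1])   # within-set covariance of Y
    Cxy = X.T @ Y / l                              # between-sets covariance
    # Generalized eigenproblem: Cxy Cyy^-1 Cyx wx = lambda^2 Cxx wx
    M = Cxy @ np.linalg.solve(Cyy, Cxy.T)
    evals, Wx = eigh(M, Cxx)                       # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]
    evals, Wx = evals[order], Wx[:, order]
    lambdas = np.sqrt(np.clip(evals, 0, None))
    # Recover the corresponding wy's: wy = Cyy^-1 Cyx wx / lambda
    Wy = np.linalg.solve(Cyy, Cxy.T @ Wx) / np.clip(lambdas, 1e-12, None)
    return Wx, Wy, lambdas
```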
Why Kernel CCA • Why use a kernel extension of CCA? • CCA may not extract useful descriptors of the data because of its linearity • In order to find nonlinearly correlated projections, KCCA maps xi and yi to φ(xi) and φ(yi) in kernel-induced feature spaces, giving Sx = {φ(x1), …, φ(xl)} and Sy = {φ(y1), …, φ(yl)} • Then φ(xi) and φ(yi) are treated as instances to run the CCA routine.
KCCA • Objective function: ρ = max over α, β of (α^T Kx Ky β) / sqrt((α^T Kx^2 α)(β^T Ky^2 β)), where α and β are the two desirable projections and Kx = (Kx(xi, xj)) and Ky = (Ky(yi, yj)) are the two kernel matrices • We use Partial Gram-Schmidt Orthogonalisation (PGSO) to approximate the kernel matrices
How to solve KCCA • α can be solved from the eigenproblem (Kx + κI)^-1 Ky (Ky + κI)^-1 Kx α = λ^2 α, where κ is used for regularization • β can then be obtained from β = (1/λ)(Ky + κI)^-1 Kx α • A number of α and β (and corresponding λ) can be found
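A hedged sketch of solving for α and β in the kernel case, following one standard regularized KCCA formulation (cf. Hardoon et al.); the exact regularization and approximation used in the paper may differ.

```python
# A minimal sketch of solving regularized KCCA for the projection coefficients.
import numpy as np

def kcca(Kx, Ky, kappa=0.1, n_dims=10):
    l = Kx.shape[0]
    I = np.eye(l)
    # Eigenproblem: (Kx + kI)^-1 Ky (Ky + kI)^-1 Kx alpha = lambda^2 alpha
    A = np.linalg.solve(Kx + kappa * I, Ky)
    B = np.linalg.solve(Ky + kappa * I, Kx)
    evals, alphas = np.linalg.eig(A @ B)
    order = np.argsort(-evals.real)[:n_dims]       # keep the most correlated directions
    lambdas = np.sqrt(np.clip(evals.real[order], 0, None))
    alphas = alphas.real[:, order]
    # Corresponding betas: beta = (Ky + kI)^-1 Kx alpha / lambda
    betas = B @ alphas / np.clip(lambdas, 1e-12, None)
    return alphas, betas, lambdas
```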
Outline • Motivation & Challenges • KCCA, Kernel Canonical Correlation Analysis • Our method: KCK-means • Experiments • Conclusions
Project into ground truth • Two kernel functions are defined as Kx(xi, xj) = <φ(xi), φ(xj)> and Ky(yi, yj) = <φ(yi), φ(yj)> • For any x* and y*, their projections can be obtained by P(x*) = Kx(x*, X)α and P(y*) = Ky(y*, Y)β for the two views respectively
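A small sketch of how instances could be projected with the learned coefficients; the RBF kernel and its gamma parameter are illustrative assumptions, not the kernels specified in the paper.

```python
# Projecting instances onto the KCCA directions: P(x*) = Kx(x*, X) @ alpha.
import numpy as np

def rbf_kernel_vec(x_star, X_train, gamma=1.0):
    # k(x*, x_i) for every training instance x_i
    d2 = np.sum((X_train - x_star) ** 2, axis=1)
    return np.exp(-gamma * d2)

def project_x(x_star, X_train, alphas, gamma=1.0):
    # One coordinate per retained projection direction
    return rbf_kernel_vec(x_star, X_train, gamma) @ alphas

def project_y(y_star, Y_train, betas, gamma=1.0):
    return rbf_kernel_vec(y_star, Y_train, gamma) @ betas
```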
Why use other pairs of projections? • According to (Zhou, Z.H., et al.), if the two views are conditionally independent given the class label, the projections with the largest correlation (the top α and β) should be in accordance with the ground truth • However, in the real world such conditional independence rarely holds, and the information conveyed by the other pairs of correlated projections should not be omitted
Similarity measure based on KCCA • μ is a parameter which regulates the proportion between the distance of the original instances and the distance of their projections
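The exact definition of fsim is not reproduced on this slide. Below is a hedged sketch of one plausible reading: a μ-weighted combination of the distance between the original instances and the distance between their KCCA projections.

```python
# Hypothetical reading of the KCCA-based similarity measure, not the paper's exact formula.
import numpy as np

def fsim(xi, xj, pi, pj, mu=0.1):
    # xi, xj: original instances; pi, pj: their KCCA projections
    d_orig = np.linalg.norm(xi - xj)
    d_proj = np.linalg.norm(pi - pj)
    return mu * d_orig + (1.0 - mu) * d_proj
```

With μ close to zero the measure relies almost entirely on the projected space, which matches the observation in the conclusions that very small μ works best.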
KCK-means for 2 views • Our method is based on K-means • In fact, we just extend K-means by adding the process of computing fsim (a sketch follows below)
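A hedged sketch of what the 2-view KCK-means loop could look like: the standard K-means structure with the point-to-centroid distance replaced by an fsim-style combination. Centroid handling and all names are illustrative, not the paper's exact recipe.

```python
# Sketch of a K-means loop driven by a combined original/projected distance.
import numpy as np

def kck_means(X, P, k, mu=0.1, n_iter=50, seed=0):
    # X: original instances (l x d); P: their KCCA projections (l x m)
    rng = np.random.default_rng(seed)
    l = X.shape[0]
    centers = rng.choice(l, size=k, replace=False)
    cX, cP = X[centers].copy(), P[centers].copy()
    labels = np.full(l, -1, dtype=int)
    for _ in range(n_iter):
        # Assignment step: nearest centroid under the combined distance
        d_orig = np.linalg.norm(X[:, None, :] - cX[None, :, :], axis=2)
        d_proj = np.linalg.norm(P[:, None, :] - cP[None, :, :], axis=2)
        new_labels = np.argmin(mu * d_orig + (1 - mu) * d_proj, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: centroids are means in both the original and projected space
        for c in range(k):
            if np.any(labels == c):
                cX[c] = X[labels == c].mean(axis=0)
                cP[c] = P[labels == c].mean(axis=0)
    return labels
```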
KCK-means for 1 view • However, two-view data sets are rare in the real world • (Nigam, K., et al.) point out that if there is sufficient redundancy among the features, we are able to identify a fairly reasonable division of them • Similarly, we randomly split a 1-view data set into two parts and treat them as the two views of the original data set to perform KCK-means.
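A minimal sketch of the single-view variant just described: randomly split the feature set into two halves and treat them as the two views before running the 2-view procedure.

```python
# Random feature split for single-view data.
import numpy as np

def random_view_split(X, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    perm = rng.permutation(d)
    idx_a, idx_b = perm[: d // 2], perm[d // 2:]
    return X[:, idx_a], X[:, idx_b]   # the two pseudo-views
```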
Outline • Motivation & Challenges • KCCA, Kernel Canonical Correlation Analysis • Our method: KCK-means • Experiments • Conclusions
Evaluation Metrics • Pair-Precision: • Mutual Information: • Intuitive-Precision:
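The metric formulas are not reproduced on this slide. Below are hedged sketches of two common readings: pair-precision as the fraction of same-cluster pairs that share the true label, and mutual information between the cluster assignment and the class label. "Intuitive-Precision" is paper-specific and is not sketched here.

```python
# Hypothetical implementations of two of the named metrics, not the paper's exact definitions.
import numpy as np
from itertools import combinations

def pair_precision(cluster_labels, true_labels):
    same_cluster = correct = 0
    for i, j in combinations(range(len(cluster_labels)), 2):
        if cluster_labels[i] == cluster_labels[j]:
            same_cluster += 1
            correct += int(true_labels[i] == true_labels[j])
    return correct / same_cluster if same_cluster else 0.0

def mutual_information(cluster_labels, true_labels):
    cluster_labels = np.asarray(cluster_labels)
    true_labels = np.asarray(true_labels)
    mi = 0.0
    for c in np.unique(cluster_labels):
        for t in np.unique(true_labels):
            p_ct = np.mean((cluster_labels == c) & (true_labels == t))
            p_c = np.mean(cluster_labels == c)
            p_t = np.mean(true_labels == t)
            if p_ct > 0:
                mi += p_ct * np.log(p_ct / (p_c * p_t))
    return mi
```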
Influence of η • There is a precision parameter (or stopping criterion), η, in the PGSO algorithm • The dimension of the projections depends on η • We also investigate its influence on the performance of KCK-means
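A hedged sketch of PGSO viewed as an incomplete Cholesky factorisation of a kernel matrix, showing how the precision parameter η acts as the stopping criterion and therefore controls the number of dimensions of the approximation; this is a generic formulation, not the paper's exact code.

```python
# Partial Gram-Schmidt orthogonalisation / incomplete Cholesky of a kernel matrix K.
import numpy as np

def pgso(K, eta=1e-3):
    l = K.shape[0]
    G = np.zeros((l, 0))
    diag = np.diag(K).astype(float).copy()   # residual diagonal
    pivots = []
    while diag.sum() > eta and len(pivots) < l:
        j = int(np.argmax(diag))              # next pivot: largest residual
        nu = np.sqrt(diag[j])
        col = (K[:, j] - G @ G[j]) / nu       # new orthogonal direction
        G = np.hstack([G, col[:, None]])
        diag = np.maximum(diag - col ** 2, 0.0)
        pivots.append(j)
    return G, pivots                          # K is approximated by G @ G.T
```

Smaller η keeps more pivots, so the projections have more dimensions; larger η truncates earlier.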
Outline • Motivation & Challenges • KCCA, Kernel Canonical Correlation Analysis • Our method: KCK-means • Experiments • Conclusions
Conclusions (1/2) • The results show that with KCK-means, much better cluster quality can be obtained than with K-means or agglomerative hierarchical clustering • We also note that when μ is set very small, or even to zero, KCK-means performs best • This means that, using the projections obtained from KCCA, the similarity between instances can already be measured well enough
Conclusions (2/2) • However, when the number of dimensions of the projections obtained from KCCA is very small, the performance of KCK-means degrades sharply, even below that of the two traditional clustering algorithms • This means that, in real-world applications, the information conveyed by the other pairs of correlated projections should also be considered • All in all, the number of projection dimensions used in KCK-means must be large enough