1/14 Machine Learning Seminar Series
Diffusion Maps and Spectral Clustering
Author: Ronald R. Coifman et al. (Yale University)
Presenter: Nilanjan Dasgupta (SIG Inc.)
2/14 Motivation
[Figure: data points in 3-D (X, Y, Z axes) lying on a low-dimensional manifold; one point labeled "Datum".]
• Data lie on a low-dimensional manifold. The shape of the manifold is not known a priori.
• PCA would fail to produce a compact representation since the manifold is not linear.
• Spectral clustering serves as a non-linear dimensionality reduction scheme.
3/14 Outline
• Non-linear dimensionality reduction and spectral clustering.
• Diffusion-based probabilistic interpretation of spectral methods.
• Eigenvectors of the normalized graph Laplacian as a discrete approximation of the continuous Fokker-Planck operator.
• Justification of the success of spectral clustering.
• Conclusions.
4/14 Spectral clustering
• Normalized graph Laplacian: given N data points $\{x_i\}_{i=1}^N$, where each $x_i \in \mathbb{R}^n$, the distance (similarity) between any two points $x_i$ and $x_j$ is given by the Gaussian kernel of width $\varepsilon$,
  $L_{ij} = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\varepsilon}\right)$,
  together with a diagonal normalization matrix $D_{ii} = \sum_j L_{ij}$.
• Solve the normalized eigenvalue problem $M\psi = \lambda\psi$, where $M = D^{-1}L$.
• Use the first few eigenvectors of M as a low-dimensional representation of the data or as good coordinates for clustering (see the sketch below).
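A minimal numerical sketch of this construction, assuming the Gaussian kernel written above and using the symmetric matrix $M_s$ (introduced on slide 6) for a stable eigendecomposition; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def diffusion_eig(X, eps, k=3):
    """Spectral decomposition of the normalized random-walk matrix M = D^{-1} L.

    X   : (N, n) array of data points
    eps : Gaussian kernel width
    k   : number of eigenpairs to return (including the trivial one)
    """
    # Pairwise squared distances and Gaussian kernel L
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    L = np.exp(-sq / (2.0 * eps))

    # Diagonal normalization D and random-walk matrix M = D^{-1} L
    d = L.sum(axis=1)
    M = L / d[:, None]

    # Eigendecompose the symmetric conjugate M_s = D^{-1/2} L D^{-1/2},
    # which shares its eigenvalues with M (see slide 6).
    Ms = L / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(Ms)
    order = np.argsort(vals)[::-1][:k]      # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]

    # Right eigenvectors of M: psi_k = D^{-1/2} v_k
    psi = vecs / np.sqrt(d)[:, None]
    return vals, psi, M

# Usage: first non-trivial eigenvectors as clustering coordinates
X = np.random.rand(200, 3)
vals, psi, M = diffusion_eig(X, eps=0.1, k=4)
coords = psi[:, 1:]   # drop the constant eigenvector psi_0
```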
5/14 Spectral clustering: previous work
• Non-linear dimensionality analysis by S. Roweis and L. Saul (published in Science, 2000).
• Belkin & Niyogi (NIPS'02) show that if data are sampled uniformly from the low-dimensional manifold, the first few eigenvectors of M = D^{-1}L are a discrete approximation of the Laplace-Beltrami operator on the manifold.
• Meila & Shi (AIStat'01) interpret M as a stochastic matrix representing a random walk on the graph.
6/14 Diffusion distance and diffusion map
• A symmetric matrix $M_s$ can be derived from M as $M_s = D^{1/2} M D^{-1/2}$.
• M and $M_s$ have the same N eigenvalues $\{\lambda_k\}_{k=0}^{N-1}$.
• Under the random-walk representation of the graph M:
  $\phi$ : left eigenvector of M
  $\psi$ : right eigenvector of M
  $\varepsilon$ : time step
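The relation between the spectra of M and $M_s$ can be made explicit; the identities below are standard linear algebra rather than quotes from the slide, with $v_k$ denoting the orthonormal eigenvectors of $M_s$:

$$M_s = D^{1/2} M D^{-1/2} = D^{-1/2} L D^{-1/2}, \qquad M_s v_k = \lambda_k v_k,$$
$$\psi_k = D^{-1/2} v_k \quad (\text{right eigenvectors of } M), \qquad \phi_k = D^{1/2} v_k \quad (\text{left eigenvectors of } M),$$

so that $M \psi_k = \lambda_k \psi_k$ and $\phi_k^{\mathsf T} M = \lambda_k \phi_k^{\mathsf T}$.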
7/14 Diffusion distance and diffusion map
• $\varepsilon$ has a dual representation (time step and kernel width).
• If one starts the random walk from location $x_i$, the probability of landing at location y after r time steps is given by $p(t = r, y \mid x_i) = e_i M^r$, where $e_i$ is a row vector of all zeros except that the ith position equals 1.
• For large $\varepsilon$, all points in the graph are connected ($M_{i,j} > 0$) and the eigenvalues of M satisfy $1 = \lambda_0 > |\lambda_1| \ge |\lambda_2| \ge \dots \ge |\lambda_{N-1}|$.
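A small self-contained sketch of this r-step propagation, with a toy row-stochastic matrix standing in for M (the names and the toy data are illustrative only):

```python
import numpy as np

def propagate(M, i, r):
    """Return p(t=r, y | x_i) = e_i M^r for a row-stochastic matrix M."""
    p = np.zeros(M.shape[0])
    p[i] = 1.0                 # delta distribution e_i at the start point
    for _ in range(r):
        p = p @ M              # one random-walk step
    return p

# Toy example: random row-stochastic matrix in place of the kernel-based M
rng = np.random.default_rng(0)
L = rng.random((5, 5))
L = (L + L.T) / 2              # symmetric "kernel"
M = L / L.sum(axis=1, keepdims=True)

print(propagate(M, i=0, r=10))  # approaches the stationary distribution
```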
8/14 Diffusion distance and diffusion map
• One can show that, regardless of the starting point $x_i$, the distribution $p(t, y \mid x_i)$ converges as $t \to \infty$ to $\phi_0$, the left eigenvector of M with eigenvalue $\lambda_0 = 1$.
• The eigenvector $\phi_0(x)$ has a dual representation:
  1. The stationary probability distribution on the curve, i.e., the probability of landing at location x after taking infinitely many steps of the random walk (independent of the start location).
  2. The density estimate at location x.
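Written out (a standard identity for this normalized random walk; the explicit formula for $\phi_0$ is an addition here, not copied from the slide):

$$\lim_{t \to \infty} p(t, y \mid x_i) = \phi_0(y), \qquad \phi_0(x_i) = \frac{D_{ii}}{\sum_j D_{jj}},$$

which is simultaneously the stationary distribution of the walk and, up to normalization, a kernel density estimate at $x_i$.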
9/14 Diffusion distance
• For any finite time r,
  $p(t = r, y \mid x_i) = \phi_0(y) + \sum_{k \ge 1} \lambda_k^r\, \psi_k(x_i)\, \phi_k(y)$.
• $\psi_k$ and $\phi_k$ are the right and left eigenvectors of the graph Laplacian M.
• $\lambda_k^r$ is the kth eigenvalue of $M^r$ (arranged in descending order).
• Given the definition of the random walk, we define the diffusion distance as a distance measure at time t between two pmfs,
  $D_t^2(x_0, x_1) = \sum_y \big(p(t, y \mid x_0) - p(t, y \mid x_1)\big)^2\, w(y)$,
  with the empirical choice $w(y) = 1/\phi_0(y)$.
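A direct numerical sketch of this definition, computing the distance from the propagated pmfs themselves rather than from the eigenvectors (toy stochastic matrix; all names illustrative):

```python
import numpy as np

def diffusion_distance(M, i, j, t):
    """D_t(x_i, x_j) computed directly from the t-step pmfs,
    with weights w(y) = 1 / phi_0(y)."""
    # Stationary distribution phi_0 of the row-stochastic matrix M
    # (left eigenvector with eigenvalue 1, normalized to sum to 1)
    vals, vecs = np.linalg.eig(M.T)
    phi0 = np.real(vecs[:, np.argmax(np.real(vals))])
    phi0 = phi0 / phi0.sum()

    # Propagate delta distributions t steps
    pi = np.linalg.matrix_power(M, t)[i]   # p(t, . | x_i)
    pj = np.linalg.matrix_power(M, t)[j]   # p(t, . | x_j)

    return np.sqrt(np.sum((pi - pj) ** 2 / phi0))

# Toy example
rng = np.random.default_rng(1)
L = rng.random((6, 6)); L = (L + L.T) / 2
M = L / L.sum(axis=1, keepdims=True)
print(diffusion_distance(M, 0, 1, t=3))
```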
10/14 Diffusion map
• Diffusion distance: $D_t^2(x_0, x_1) = \sum_{k \ge 1} \lambda_k^{2t}\, \big(\psi_k(x_0) - \psi_k(x_1)\big)^2$.
• Diffusion map: mapping between the original space and the first k eigenvectors,
  $\Psi_t(x) = \big(\lambda_1^t \psi_1(x),\ \lambda_2^t \psi_2(x),\ \ldots,\ \lambda_k^t \psi_k(x)\big)$.
• Relationship: $D_t^2(x_0, x_1) = \|\Psi_t(x_0) - \Psi_t(x_1)\|^2$.
• This relationship justifies using Euclidean distance in diffusion-map space for spectral clustering.
• Since $1 = \lambda_0 > |\lambda_1| \ge |\lambda_2| \ge \dots$, it is justified to stop at an appropriate k with a negligible error of order $O\big((\lambda_{k+1}/\lambda_k)^t\big)$.
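A short derivation sketch of this relationship, assuming the usual normalization in which $\phi_k = \phi_0\,\psi_k$ and $\sum_y \phi_0(y)\,\psi_k(y)\,\psi_l(y) = \delta_{kl}$, and using the r-step decomposition from slide 9:

$$p(t, y \mid x_0) - p(t, y \mid x_1) = \sum_{k \ge 1} \lambda_k^t \big(\psi_k(x_0) - \psi_k(x_1)\big)\,\phi_k(y).$$

Summing the square of this difference over y with weight $1/\phi_0(y)$, the cross terms vanish because $\sum_y \phi_k(y)\,\phi_l(y)/\phi_0(y) = \delta_{kl}$, giving

$$D_t^2(x_0, x_1) = \sum_{k \ge 1} \lambda_k^{2t} \big(\psi_k(x_0) - \psi_k(x_1)\big)^2 = \|\Psi_t(x_0) - \Psi_t(x_1)\|^2,$$

where the last equality is exact when all N-1 non-trivial eigenvectors are retained and approximate after truncation at k.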
11/14 Asymptotics of the diffusion map
[Figure: data sampled from a density p(x) on a low-dimensional manifold (X, Y axes).]
• Suppose $\{x_i\}$ are sampled i.i.d. from a probability density p(x) defined over a manifold $\Omega$.
• Suppose $p(x) = e^{-U(x)}$, where U(x) is the potential (energy) at location x.
• As $N \to \infty$, the random walk on the discrete graph converges to a random walk on the continuous manifold $\Omega$.
• The forward and backward operators $T_f$ and $T_b$ are given below (see the sketch after this slide).
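A continuum analogue of the discrete construction $M = D^{-1}L$ can be written as follows; the explicit normalization is an assumption for illustration (it mirrors the discrete D, with sums over data points becoming integrals against p), not a quote from the slide:

$$M_\varepsilon(x \mid y) = \frac{e^{-\|x-y\|^2/2\varepsilon}}{\displaystyle\int_\Omega e^{-\|y-\xi\|^2/2\varepsilon}\, p(\xi)\, d\xi},$$
$$T_f[\phi](x) = \int_\Omega M_\varepsilon(x \mid y)\,\phi(y)\,p(y)\,dy, \qquad
T_b[\psi](x) = \int_\Omega M_\varepsilon(y \mid x)\,\psi(y)\,p(y)\,dy.$$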
12/14 Asymptotics of the diffusion map
• $T_f[\phi](x)$ : the probability distribution after one time step $\varepsilon$, where $\phi(x)$ is the probability distribution on the graph at t = 0.
• $T_b[\psi](x)$ : the mean of the function $\psi$ after one time step $\varepsilon$, for a random walk that started at location x at time t = 0.
• Consider the limit $N \to \infty$, $\varepsilon \to 0$, i.e., when each data point has infinitely many nearby neighbors. In that limit, the random walk converges to a diffusion process whose probability density evolves continuously in time, as sketched below.
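In that limit the density evolves according to a Fokker-Planck equation; a standard way to write it, consistent with the potential 2U(x) named on the next two slides (the explicit form is an assumption here):

$$\frac{\partial \phi(x,t)}{\partial t} \;=\; \nabla \cdot \big(\nabla \phi + 2\,\phi\, \nabla U\big),$$

whose stationary solution is proportional to $e^{-2U(x)} = p^2(x)$.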
13/14 Fokker-Planck operator
• Infinitesimal generators (propagators): $H_f$ and $H_b$, given below.
• The eigenfunctions of $T_f$ and $T_b$ converge to those of $H_f$ and $H_b$, respectively.
• The backward generator $H_b$ is given by the Fokker-Planck operator, which corresponds to a diffusion process in a potential field 2U(x).
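The generators are the small-$\varepsilon$ limits of the one-step operators; the explicit form of $H_b$ written here is the standard backward Fokker-Planck operator for the potential 2U(x), reconstructed under the same assumptions as above:

$$H_f = \lim_{\varepsilon \to 0} \frac{T_f - I}{\varepsilon}, \qquad
H_b = \lim_{\varepsilon \to 0} \frac{T_b - I}{\varepsilon},$$
$$H_b \psi \;=\; \Delta \psi \;-\; 2\,\nabla \psi \cdot \nabla U.$$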
14/14 Spectral clustering and the Fokker-Planck operator
• The term $-2\,\nabla \psi \cdot \nabla U$ is interpreted as the drift term towards low potential (higher data density).
• The left and right eigenvectors of M can be viewed as discrete approximations of the eigenfunctions of $T_f$ and $T_b$, respectively.
• $T_f$ and $T_b$ can be viewed as approximations to $H_f$ and $H_b$, which in the asymptotic case ($N \to \infty$, $\varepsilon \to 0$) describe a diffusion process with potential 2U(x), where $p(x) = \exp(-U(x))$.
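A short check of the drift interpretation, under the assumption $p(x) = e^{-U(x)}$ from slide 11:

$$-\nabla\big(2U(x)\big) \;=\; 2\,\nabla \log p(x),$$

so the drift pulls the walk up the gradient of the data density, i.e., towards low potential and high density, consistent with the first bullet above.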