NIPS 2005 Review: Diffusion Maps, Spectral Clustering, and Eigenfunctions of Fokker-Planck Operators Boaz Nadler, Stéphane Lafon, Ronald R. Coifman, Ioannis G. Kevrekidis Presented by: Jonathan Huang (jch1@cs.cmu.edu) Advisor: Carlos Guestrin 1/24/2006
Main Idea • A diffusion interpretation of clustering and dimensionality reduction methods which use the spectrum of the normalized graph Laplacian.
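The Normalized Graph Laplacian • Given a point cloud $x_1, x_2, \ldots, x_n$, first form a matrix based on the heat kernel: $W_{ij} = \exp\left(-\|x_i - x_j\|^2 / (2\varepsilon)\right)$ • Let $D$ be a diagonal matrix with $D_{ii} = \sum_j W_{ij}$ • An algorithm for spectral clustering or dimensionality reduction might at this point find the first few eigenvectors of $M = D^{-1} W$, those with the largest eigenvalues (equivalently, the generalized eigenvectors of $(D - W)v = \mu D v$ with the smallest $\mu$).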
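The construction on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `markov_matrix`, the sample points, and the bandwidth value are made up, and the kernel is assumed to have the Gaussian form $\exp(-\|x_i - x_j\|^2 / (2\varepsilon))$.

```python
import numpy as np

def markov_matrix(points, eps):
    """Return (W, D, M) for a point cloud, with M = D^{-1} W.

    `eps` is the assumed kernel bandwidth (hypothetical parameter name).
    """
    # Pairwise squared Euclidean distances between all points.
    diff = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diff ** 2, axis=-1)
    # Heat (Gaussian) kernel weights.
    W = np.exp(-sq_dists / (2.0 * eps))
    # Degree of each point: the row sums of W.
    degrees = W.sum(axis=1)
    D = np.diag(degrees)
    # Row-normalise W, which is the same as computing D^{-1} W.
    M = W / degrees[:, None]
    return W, D, M

# Tiny made-up point cloud: two well-separated pairs of points.
points = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 0.0], [3.1, 0.0]])
W, D, M = markov_matrix(points, eps=0.5)
```

Dividing each row by its degree avoids forming an explicit inverse of $D$, which is the usual choice when $D$ is diagonal.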
Random Walks on a Graph • Notice that $M$ is a row-stochastic matrix: multiplying $W$ by $D^{-1}$ normalizes every row to sum to one. • We can view $M$ as the transition matrix for a random walk on a graph whose nodes are the data points, where the probabilities make it easy to jump to nearby points and difficult to jump to faraway points.
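To make the random-walk reading concrete, here is a small sketch on made-up 1-D data. The points, the bandwidth `eps`, and the Gaussian kernel form are assumptions chosen for illustration. It checks that each row of $M$ is a probability distribution, compares near and far jump probabilities, and samples a short walk.

```python
import numpy as np

# Made-up 1-D point cloud: two tight groups that are far apart.
x = np.array([0.0, 0.2, 0.4, 5.0, 5.2, 5.4])
eps = 0.5  # assumed kernel bandwidth

W = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * eps))
M = W / W.sum(axis=1, keepdims=True)

# Every row of M is a probability distribution over the next position.
row_sums = M.sum(axis=1)

# Starting from x[0], compare the chance of landing in its own group
# with the chance of jumping to the far group in a single step.
p_near = M[0, :3].sum()
p_far = M[0, 3:].sum()

# Simulate a short walk by sampling each step from the current row of M.
rng = np.random.default_rng(0)
state, path = 0, [0]
for _ in range(10):
    state = int(rng.choice(len(x), p=M[state]))
    path.append(state)
```

With groups this far apart, `p_far` is tiny, so a sampled walk is overwhelmingly likely to stay inside the group it started in.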
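The Eigenvalue Connection • Let $p(t, x_j \mid x_i)$ be the probability of being at point $x_j$ at time $t$ given that we started at point $x_i$. What does this distribution look like as $t \to \infty$? (What does $x M^t$ tend to, for an initial row distribution $x$?) • Answer: $\lim_{t \to \infty} p(t, x_j \mid x_i) = \phi_0(x_j)$, where $\phi_0(x_j) = D_{jj} / \sum_k D_{kk}$ is the stationary distribution. • The eigenvalues of $M$ look like: $\lambda_0 = 1 > \lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_{n-1} \geq 0$ • This means that no matter how we begin the random walk, we will always converge to the principal (left) eigenvector of $M$, the stationary distribution $\phi_0$.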
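The convergence claim can be checked numerically. The sketch below uses made-up points and an assumed Gaussian kernel with bandwidth `eps`. It computes the spectrum through the symmetric matrix $D^{-1/2} W D^{-1/2}$, which is similar to $M$ and so has the same eigenvalues, and compares $x M^t$ for large $t$ with the normalised degrees.

```python
import numpy as np

# Made-up 1-D point cloud and assumed kernel bandwidth.
x = np.linspace(0.0, 1.0, 8)
eps = 0.5

W = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * eps))
d = W.sum(axis=1)
M = W / d[:, None]

# M is similar to the symmetric matrix D^{-1/2} W D^{-1/2},
# so both share the same real eigenvalues.
S = W / np.sqrt(np.outer(d, d))
eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]

# Stationary distribution: the degrees normalised to sum to one.
phi0 = d / d.sum()

# Start the walk at point 0 with certainty and take many steps.
start = np.zeros(len(x))
start[0] = 1.0
limit = start @ np.linalg.matrix_power(M, 200)
```

Here `eigvals[0]` is one up to rounding, the remaining eigenvalues are below one, and `limit` agrees with `phi0` whichever starting point is chosen.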
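Main Results • Define the diffusion distance as: $D_t^2(x_0, x_1) = \sum_y \left(p(t, y \mid x_0) - p(t, y \mid x_1)\right)^2 / \phi_0(y)$, where $\phi_0$ is the stationary distribution of the random walk. • Define the diffusion map $\Psi_t(x) = \left(\lambda_1^t \psi_1(x), \ldots, \lambda_k^t \psi_k(x)\right)$ as the mapping from the original space onto the first $k$ non-trivial right eigenvectors $\psi_j$ of $M$, each scaled by its eigenvalue $\lambda_j$ raised to the power $t$ (with $\psi_j$ normalized so that $\sum_x \psi_j(x)^2 \phi_0(x) = 1$). • Theorem: Diffusion distances in the original space are the same as Euclidean distances in the image of the diffusion map (exactly so when all $n - 1$ non-trivial eigenvectors are kept). • This theorem justifies using Euclidean distances in the diffusion map space for clustering/dimensionality reduction purposes! • At large enough times $t$, this distance is well approximated by using only a few eigenvectors, because the factors $\lambda_j^t$ decay quickly for the smaller eigenvalues.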
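The theorem can be verified numerically on a small example. The following is a sketch under stated assumptions rather than the paper's implementation: the data are random made-up points, the kernel is assumed Gaussian with bandwidth `eps`, and the diffusion distance is taken to be the difference of $t$-step transition probabilities weighted by the inverse stationary distribution.

```python
import numpy as np

# Made-up point cloud; eps and t are assumed values for illustration.
rng = np.random.default_rng(1)
points = rng.normal(size=(10, 2))
eps, t = 1.0, 3

diff = points[:, None, :] - points[None, :, :]
W = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * eps))
d = W.sum(axis=1)
M = W / d[:, None]
phi0 = d / d.sum()  # stationary distribution

# Eigen-decompose the symmetric matrix D^{-1/2} W D^{-1/2},
# which shares its eigenvalues with M.
S = W / np.sqrt(np.outer(d, d))
lam, V = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

# Right eigenvectors of M, scaled so that psi_0 is constant (+1 or -1).
psi = V * np.sqrt(d.sum()) / np.sqrt(d)[:, None]

# Diffusion map at time t: drop the trivial psi_0 and weight the
# remaining coordinates by lambda_j^t.
embedding = psi[:, 1:] * lam[1:] ** t

# Diffusion distance computed directly from t-step transition probabilities.
P_t = np.linalg.matrix_power(M, t)

def diffusion_distance(i, j):
    return np.sqrt(np.sum((P_t[i] - P_t[j]) ** 2 / phi0))

i, j = 0, 5
direct = diffusion_distance(i, j)
embedded = np.linalg.norm(embedding[i] - embedding[j])
# Keeping only the first two coordinates gives a truncated approximation.
approx = np.linalg.norm(embedding[i, :2] - embedding[j, :2])
```

`direct` and `embedded` agree up to rounding, while `approx` can only underestimate the full distance, since truncation drops non-negative terms from the sum of squares.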
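More Results • It was previously known that, as the time step $\varepsilon$ gets very small and the number of points gets large, $M$ converges in probability to a certain Fokker-Planck operator, which corresponds to a diffusion PDE (continuous in time and space). • Boundary Conditions: • Assumptions: the point cloud is obtained by sampling from a probability density which is confined to a compact connected set $\Omega$ with smooth boundary $\partial\Omega$. • Result: In the limit $\varepsilon \to 0$, we have reflecting boundary conditions on $\partial\Omega$: $\partial\psi(x) / \partial\mathbf{n} = 0$ for $x \in \partial\Omega$, where $\psi$ is an eigenfunction of the limiting operator and $\mathbf{n}$ is a unit normal vector to the boundary at $x$.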