This overview covers dimensionality-reduction methods for high-dimensional data such as images, physical measurements, and statistical data. Linear techniques (PCA and Multidimensional Scaling) are reviewed first, followed by two methods for discovering non-linear structure: Isomap, based on geodesic distances, and Locally Linear Embedding, based on local shape reconstruction. Results include Isomap-space morphing and flattening of the Swiss roll.
Non-linear dimension-reduction methods Olga Sorkine January 2006
Overview • Dimensionality reduction of high-dimensional data • Good for learning, visualization and … parameterization
Dimension reduction • Input: points in some D-dimensional space (D is large) • Images • Physical measurements • Statistical data • etc… • We want to discover some structure/correlation in the input data. Hopefully, the data lives on a d-dimensional surface (d << D). • Discover the real dimensionality d • Find a mapping from R^D to R^d that preserves something about the data • Today we’ll talk about preserving variance/distances
Discovering linear structures • PCA – finds linear subspaces that best preserve the variance of the data points
Linear is sometimes not enough • When our data points sit on a non-linear manifold • We won’t find a good linear mapping from the data points to a plane, because there isn’t any
Today • Two methods to discover such non-linear manifolds: • Isomap (a descendant of Multidimensional Scaling) • Locally Linear Embedding
Notations • Input data points: columns of X ∈ R^{D×n} • Assume that the center of mass of the points is the origin
Reminder about PCA • PCA finds a linear d-dimensional subspace of R^D along which the variance of the data is the biggest • Denote by x_i' the data points projected onto the d-dimensional subspace. PCA finds the subspace that maximizes the scatter of the projected points, Σ_i ||x_i'||^2 • When we do parallel projection of the data points, the distances between them can only get smaller. So finding a subspace which attains the maximum scatter means we get the distances somehow preserved.
Reminder about PCA • To find the principal axes: • Compute the scatter matrix S = X X^T ∈ R^{D×D} • Diagonalize S: S = V Λ V^T, Λ = diag(λ_1, …, λ_D) with λ_1 ≥ λ_2 ≥ … ≥ λ_D ≥ 0 • The eigenvectors of S (the columns of V) are the principal directions; the eigenvalues are sorted in descending order. • Take the first d eigenvectors as the “principal subspace” and project the data points onto this subspace.
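As a concrete illustration of these steps, here is a minimal NumPy sketch (an addition, not code from the talk); the D × n column-per-point convention follows the slide notation, while the function name pca_project and the use of numpy.linalg.eigh are my own choices:

```python
# Minimal PCA sketch following the recipe above (NumPy only).
import numpy as np

def pca_project(X, d):
    """X: D x n data matrix (columns are points). Returns the d x n projection."""
    # Center the data so the center of mass is at the origin (as assumed earlier).
    Xc = X - X.mean(axis=1, keepdims=True)
    # Scatter matrix S = X X^T  (D x D).
    S = Xc @ Xc.T
    # Diagonalize S; eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(S)
    # Take the d eigenvectors with the largest eigenvalues: the principal directions.
    V_d = eigvecs[:, ::-1][:, :d]          # D x d
    # Project the (centered) points onto the principal subspace.
    return V_d.T @ Xc                      # d x n
```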
Why does this work? • The eigenvectors v_i are the maxima of the quadratic form max_{||v||=1} v^T S v (each v_i maximizes it over the subspace orthogonal to v_1, …, v_{i-1}) • In fact, we get directions of maximal variance: v_i^T S v_i = Σ_j ⟨x_j, v_i⟩^2 = λ_i
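For completeness, the Lagrange-multiplier argument behind this claim, which the slide states without proof, is the standard one and goes as follows:

```latex
% Rayleigh-quotient / Lagrange-multiplier argument:
% maximize v^T S v subject to ||v||^2 = 1.
\begin{aligned}
\mathcal{L}(v,\lambda) &= v^{T} S v \;-\; \lambda\,\bigl(v^{T} v - 1\bigr),\\
\nabla_{v}\mathcal{L} = 2\,S v - 2\lambda v = 0
  \;&\Longrightarrow\; S v = \lambda v,\\
v^{T} S v &= \lambda\, v^{T} v = \lambda .
\end{aligned}
```

So the critical points are exactly the eigenvectors of S, the objective value at v_i is λ_i, and the global maximum is attained at the eigenvector with the largest eigenvalue.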
Multidimensional Scaling • J. Tenenbaum, V. de Silva, J.C. Langford, Science, December 2000
Multidimensional scaling (MDS) • The idea: compute the matrix M of squared pairwise distances between the input points, M_ij = ||x_i − x_j||^2 • Now, find n points x_1', …, x_n' in the low-dimensional space R^d, so that their distance matrix is as close as possible to M.
MDS – the math details • We look for X' = [x_1', …, x_n'] ∈ R^{d×n}, such that ||M' − M|| is as small as possible, where M' is the matrix of squared Euclidean distances between the points x_i'.
MDS – the math details • Ideally, we want M'_ij = M_ij for all i, j, i.e. ||x_i' − x_j'||^2 = ||x_i − x_j||^2 = ||x_i||^2 − 2 x_i^T x_j + ||x_j||^2 • We want to get rid of the norm terms ||x_i||^2 and ||x_j||^2 and keep only the inner products.
MDS – the math details • Trick: use the “magic” (centering) matrix J = I − (1/n) 1 1^T, which satisfies J 1 = 0.
MDS – the math details • Cleaning the system: B = −(1/2) J M J = X^T X, the Gram matrix of the (centered) data points.
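The step the slide leaves implicit, namely why the double centering recovers the Gram matrix, is the standard classical-MDS computation (c denotes the vector with entries c_i = ||x_i||^2):

```latex
\begin{aligned}
M &= c\,\mathbf{1}^{T} \;-\; 2\,X^{T}X \;+\; \mathbf{1}\,c^{T},
  \qquad J\,\mathbf{1} = 0,\;\; \mathbf{1}^{T} J = 0,\\
-\tfrac{1}{2}\,J M J
  &= -\tfrac{1}{2}\,J\bigl(c\,\mathbf{1}^{T} + \mathbf{1}\,c^{T}\bigr)J \;+\; J\,X^{T}X\,J
   \;=\; (XJ)^{T}(XJ) \;=\; X^{T}X \;=\; B,
\end{aligned}
```

where the last equality uses that the data is centered, so XJ = X.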
How to find X’ • We will use the spectral decomposition of B: B = V Λ V^T, Λ = diag(λ_1, …, λ_n), λ_1 ≥ λ_2 ≥ … ≥ λ_n
How to find X’ • So we find X’ by throwing away the last n - d eigenvalues: X’ = Λ_d^{1/2} V_d^T, where Λ_d and V_d keep only the d largest eigenvalues and their eigenvectors
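Putting the MDS slides together, a minimal NumPy sketch could look as follows (the names M, J, B mirror the slides; the clipping of tiny negative eigenvalues is an implementation detail I am assuming):

```python
# Classical MDS sketch: double centering + spectral decomposition.
import numpy as np

def classical_mds(M, d):
    """M: n x n matrix of *squared* pairwise distances. Returns a d x n embedding X'."""
    n = M.shape[0]
    # "Magic" centering matrix J = I - (1/n) 1 1^T.
    J = np.eye(n) - np.ones((n, n)) / n
    # Double centering recovers the Gram matrix B = X^T X (for centered data).
    B = -0.5 * J @ M @ J
    # Spectral decomposition B = V Lambda V^T (eigh: ascending eigenvalues).
    eigvals, eigvecs = np.linalg.eigh(B)
    # Keep the d largest eigenvalues, i.e. throw away the last n - d of them.
    idx = np.argsort(eigvals)[::-1][:d]
    lam = np.clip(eigvals[idx], 0.0, None)       # guard against tiny negative eigenvalues
    V_d = eigvecs[:, idx]
    # X' = Lambda_d^{1/2} V_d^T reproduces B (and hence the distances) as well as possible.
    return np.diag(np.sqrt(lam)) @ V_d.T         # d x n
```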
Isomap • The idea of Tenenbaum et al.: estimate geodesic distances between the data points (instead of Euclidean) • Use K nearest neighbors or ε-balls to define neighborhood graphs • Approximate the geodesics by shortest paths on the graph.
Inducing a graph • [Figure: the neighborhood graph induced on the sampled points]
Finding geodesic paths • Compute weighted shortest paths on the graph (Dijkstra)
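Combining the neighborhood graph, the Dijkstra geodesics and the MDS step, an end-to-end Isomap sketch might look like this; it reuses the classical_mds helper sketched above, and the choice k = 10 and the symmetrization of the graph are assumptions, not values given in the slides:

```python
# Hedged Isomap sketch: k-NN graph -> Dijkstra geodesics -> classical MDS.
import numpy as np
from scipy.sparse.csgraph import shortest_path

def isomap(X, d, k=10):
    """X: D x n data matrix (columns are points). Returns a d x n embedding."""
    n = X.shape[1]
    # Pairwise Euclidean distances between the data points.
    diff = X[:, :, None] - X[:, None, :]
    dist = np.sqrt((diff ** 2).sum(axis=0))          # n x n
    # Keep only edges to each point's k nearest neighbors (0 entries mean "no edge").
    W = np.zeros_like(dist)
    for i in range(n):
        nbrs = np.argsort(dist[i])[1:k + 1]          # skip the point itself
        W[i, nbrs] = dist[i, nbrs]
    W = np.maximum(W, W.T)                           # symmetrize the neighborhood graph
    # Approximate geodesic distances by weighted shortest paths (Dijkstra).
    G = shortest_path(W, method='D', directed=False)
    # Embed with classical MDS applied to the squared geodesic distances.
    return classical_mds(G ** 2, d)
```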
Locating new points in the Isomap embedding • Suppose we have a new data point p ∈ R^D • We want to find where it belongs in the R^d embedding • Compute the distances from p to all the other data points: δ_i = d(p, x_i), i = 1, …, n
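One standard way to place the new point once these distances are known is the landmark-MDS triangulation of de Silva and Tenenbaum; the slides may use a different construction, so the following is only an illustrative sketch. Here delta holds the *squared* distances δ_i^2, and lam, V_d denote the d kept eigenvalues and eigenvectors of B; in the classical_mds sketch above these are local variables, so assume a hypothetical variant that also returns them.

```python
# Landmark-MDS-style placement of a new point (illustrative sketch only).
import numpy as np

def embed_new_point(delta, M, lam, V_d):
    """delta: (n,) squared distances from p; M: n x n squared distances of the data;
    lam, V_d: the d kept eigenvalues / eigenvectors of B. Returns p's d coordinates."""
    mean_col = M.mean(axis=0)                        # mean squared distance of each point
    # Pseudo-inverse transpose of the embedding: Lambda_d^{-1/2} V_d^T.
    L_pinv = np.diag(1.0 / np.sqrt(lam)) @ V_d.T     # d x n
    return 0.5 * L_pinv @ (mean_col - delta)         # d coordinates of p
```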
Locally Linear Embedding • S.T. Roweis and L.K. Saul, Science, December 2000
The idea • Define neighborhood relations between points • K nearest neighbors • ε-balls • Find weights that reconstruct each data point from its neighbors: minimize E(W) = Σ_i ||x_i − Σ_j w_ij x_j||^2, with w_ij = 0 unless x_j is a neighbor of x_i, and Σ_j w_ij = 1 • Find low-dimensional coordinates y_i so that the same weights hold: minimize Φ(Y) = Σ_i ||y_i − Σ_j w_ij y_j||^2
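A minimal sketch of the first minimization, computing the reconstruction weights with the sum-to-one constraint handled via a Lagrange multiplier; the values of k and the regularization constant reg are assumptions:

```python
# LLE reconstruction weights: one small constrained least-squares problem per point.
import numpy as np

def lle_weights(X, k=10, reg=1e-3):
    """X: D x n data matrix (columns are points). Returns the n x n weight matrix W."""
    n = X.shape[1]
    diff = X[:, :, None] - X[:, None, :]
    dist = np.sqrt((diff ** 2).sum(axis=0))
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[1:k + 1]   # k nearest neighbors, excluding the point itself
        Z = X[:, nbrs] - X[:, [i]]            # neighbors shifted so x_i is the origin
        C = Z.T @ Z                           # local Gram matrix, k x k
        C += reg * np.trace(C) * np.eye(k)    # conditioning (cf. the next slide)
        w = np.linalg.solve(C, np.ones(k))    # Lagrange-multiplier solution, up to scale
        W[i, nbrs] = w / w.sum()              # enforce the sum-to-one constraint
    return W
```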
Local information reconstructs global one • The weights wij capture the local shape • Invariant to translation, rotation and scale of the neighborhood • If the neighborhood lies on a manifold, the local mapping from the global coordinates (RD) to the surface coordinates (Rd) is almost linear • Thus, the weights wij should hold also for manifold (Rd) coordinate system!
Solving the minimizations • The weights w_ij are found by linear least squares (using a Lagrange multiplier for the sum-to-one constraint) • To find the coordinates y_i that minimize Φ(Y), a sparse eigen-problem is solved. Additional constraints (centered, unit-covariance Y) are added for conditioning.
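And a sketch of the second minimization: the embedding coordinates are bottom eigenvectors of (I − W)^T (I − W). A dense eigensolver is used here for brevity, whereas the point of the slide is that a sparse eigensolver makes this step cheap:

```python
# LLE embedding step: bottom eigenvectors of M = (I - W)^T (I - W).
import numpy as np

def lle_embedding(W, d):
    """W: n x n LLE weight matrix. Returns a d x n embedding Y."""
    n = W.shape[0]
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    eigvals, eigvecs = np.linalg.eigh(M)    # ascending eigenvalues
    # Skip the constant eigenvector (eigenvalue ~ 0), keep the next d.
    return eigvecs[:, 1:d + 1].T            # d x n
```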
Some results • The Swiss roll