Non-linear dimension-reduction methods Olga Sorkine January 2006
Overview • Dimensionality reduction of high-dimensional data • Good for learning, visualization and … parameterization
Dimension reduction • Input: points in some D-dimensional space (D is large) • Images • Physical measurements • Statistical data • etc… • We want to discover some structure/correlation in the input data. Hopefully, the data lives on a d-dimensional surface (d << D). • Discover the real dimensionality d • Find a mapping from R^D to R^d that preserves something about the data • Today we’ll talk about preserving variance/distances
Discovering linear structures • PCA – finds linear subspaces that best preserve the variance of the data points
Linear is sometimes not enough • When our data points sit on a non-linear manifold • We won’t find a good linear mapping from the data points to a plane, because there isn’t any
Today • Two methods to discover such non-linear manifolds: • Isomap (a descendant of Multidimensional Scaling) • Locally Linear Embedding
Notations • Input data points: the columns of X ∈ R^{D×n} • Assume that the center of mass of the points is the origin
Reminder about PCA • PCA finds a linear d-dimensional subspace of R^D along which the variance of the data is the largest • Denote by x̃_i the data points projected onto the d-dimensional subspace. PCA finds the subspace that maximizes the scatter of the projections, Σ_i ||x̃_i||² (recall the center of mass is at the origin) • When we do parallel projection of the data points, the distances between them can only get smaller. So finding a subspace that attains the maximum scatter means the distances are, in a sense, preserved.
Reminder about PCA • To find the principal axes: • Compute the scatter matrix S = X Xᵀ ∈ R^{D×D} • Diagonalize S: S = V Λ Vᵀ, Λ = diag(λ_1, …, λ_D) with λ_1 ≥ λ_2 ≥ … ≥ λ_D • The eigenvectors of S are the principal directions; the eigenvalues are sorted in descending order • Take the first d eigenvectors as the “principal subspace” and project the data points onto this subspace.
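A minimal NumPy sketch of these PCA steps (the function and variable names are illustrative, not from the slides):

    import numpy as np

    def pca_project(X, d):
        # X holds one data point per column (D x n); move the center of mass to the origin
        Xc = X - X.mean(axis=1, keepdims=True)
        S = Xc @ Xc.T                            # scatter matrix, S in R^{D x D}
        eigvals, eigvecs = np.linalg.eigh(S)     # eigh returns ascending eigenvalues
        order = np.argsort(eigvals)[::-1]        # sort in descending order
        V_d = eigvecs[:, order[:d]]              # the first d principal directions
        return V_d.T @ Xc                        # d x n matrix of projected coordinates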
Why does this work? • The eigenvectors v_i are the successive maxima of the quadratic form vᵀSv over unit vectors: λ_i = max { vᵀSv : ||v|| = 1, v ⊥ v_1, …, v_{i−1} } • In fact, we get directions of maximal variance: the variance of the data projected onto v_i is (1/n) v_iᵀ S v_i = λ_i / n
Multidimensional Scaling J. Tenenbaum, V. de Silva, J.C. Langford Science, December 2000
Multidimensional scaling (MDS) • The idea: compute the matrix of pairwise distances between the input points: M ∈ R^{n×n}, M_{ij} = ||x_i − x_j||² • Now, find n points x'_1, …, x'_n in a low-dimensional space R^d, so that their distance matrix is as close as possible to M.
MDS – the math details We look for X' ∈ R^{d×n}, such that || M' − M || is as small as possible, where M' is the matrix of (squared) Euclidean distances between the points x'_i.
MDS – the math details Ideally, we want M'_{ij} = M_{ij} for all i, j, i.e. ||x'_i||² − 2⟨x'_i, x'_j⟩ + ||x'_j||² = M_{ij}. We want to get rid of the norm terms ||x'_i||², ||x'_j||² and keep only the inner products.
MDS – the math details Trick: use the “magic matrix” J = I − (1/n)·1·1ᵀ (1 is the all-ones vector). J is the centering matrix: it subtracts the mean from each row/column, and J·1 = 0.
MDS – the math details Cleaning the system: B = −½ J M J. The norm terms are annihilated by J, so B is exactly the Gram matrix of the (centered) points, B = XᵀX; ideally we want B = X'ᵀX'.
How to find X' We will use the spectral decomposition of B: B = V Λ Vᵀ, Λ = diag(λ_1, …, λ_n) with λ_1 ≥ … ≥ λ_n ≥ 0 (B is symmetric and positive semi-definite).
How to find X' Write B = (Λ^{1/2} Vᵀ)ᵀ(Λ^{1/2} Vᵀ). So we find X' by throwing away the last n − d eigenvalues: X' = Λ_d^{1/2} V_dᵀ ∈ R^{d×n}, where Λ_d and V_d keep only the d largest eigenvalues and their eigenvectors.
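Putting these MDS steps together, a small sketch (assuming the input matrix holds squared pairwise distances; names are illustrative):

    import numpy as np

    def classical_mds(M_sq, d):
        # M_sq: n x n matrix of squared pairwise distances
        n = M_sq.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n      # the centering ("magic") matrix
        B = -0.5 * J @ M_sq @ J                  # Gram matrix of the centered points
        eigvals, eigvecs = np.linalg.eigh(B)
        order = np.argsort(eigvals)[::-1][:d]    # keep the d largest eigenvalues
        L = np.maximum(eigvals[order], 0)        # clip tiny negative values (noise)
        return np.sqrt(L)[:, None] * eigvecs[:, order].T   # X' = Lambda_d^{1/2} V_d^T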
Isomap • The idea of Tenenbaum et al.: estimate the geodesic distances between the data points (instead of the Euclidean ones) • Use K nearest neighbors or ε-balls to define a neighborhood graph • Approximate the geodesics by shortest paths on the graph.
Inducing a graph [figure: neighborhood graph induced on the sampled data points]
Finding geodesic paths • Compute weighted shortest paths on the graph (Dijkstra)
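The full Isomap pipeline then chains the neighborhood graph, the all-pairs shortest paths and classical MDS. A sketch reusing classical_mds from above (the SciPy/scikit-learn helpers are an assumption about tooling, not the authors' original code; the graph must be connected for the distances to be finite):

    from scipy.sparse.csgraph import shortest_path
    from sklearn.neighbors import kneighbors_graph

    def isomap(X, d, K=10):
        # X holds one data point per column (D x n)
        G = kneighbors_graph(X.T, n_neighbors=K, mode='distance')  # weighted K-NN graph
        D_geo = shortest_path(G, method='D', directed=False)       # Dijkstra from every node
        return classical_mds(D_geo ** 2, d)      # embed the approximate geodesic distances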
Locating new points in the Isomap embedding • Suppose we have a new data point p ∈ R^D • We want to find where it belongs in the R^d embedding • Compute the distances from p to all the embedded points: δ_p = (||p − x_1||², …, ||p − x_n||²)
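The slide does not spell out the construction; one common way to place p (a Landmark-MDS-style formula, offered as an assumption rather than the talk's exact method) reuses the eigendecomposition of B from the MDS step:

    import numpy as np

    def embed_new_point(delta_p, M_sq, d):
        # delta_p: squared distances from the new point p to the n embedded points
        # M_sq:    the n x n squared-distance matrix used to build the embedding
        n = M_sq.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n
        B = -0.5 * J @ M_sq @ J
        eigvals, eigvecs = np.linalg.eigh(B)
        order = np.argsort(eigvals)[::-1][:d]
        L, V = eigvals[order], eigvecs[:, order]
        # project the centered distance vector onto the eigenvectors scaled by 1/sqrt(lambda)
        return 0.5 * (V / np.sqrt(L)).T @ (M_sq.mean(axis=1) - delta_p)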
Locally Linear Embedding S.T. Roweis and L.K. Saul Science, December 2000
The idea • Define neighborhood relations between the points • K nearest neighbors • ε-balls • Find weights that reconstruct each data point from its neighbors: minimize E(W) = Σ_i ||x_i − Σ_j w_ij x_j||², with Σ_j w_ij = 1 and w_ij = 0 if x_j is not a neighbor of x_i • Find low-dimensional coordinates y_i so that the same weights hold: minimize Φ(Y) = Σ_i ||y_i − Σ_j w_ij y_j||²
Local information reconstructs the global one • The weights w_ij capture the local shape • They are invariant to translation, rotation and scaling of the neighborhood • If the neighborhood lies on a manifold, the local mapping from the global coordinates (R^D) to the surface coordinates (R^d) is almost linear • Thus, the weights w_ij should also hold in the manifold (R^d) coordinate system!
Solving the minimizations • The weights: linear least squares per neighborhood (using Lagrange multipliers for the constraint Σ_j w_ij = 1) • To find the y_i that minimize Φ(Y), a sparse eigen-problem is solved: the y_i are given by the bottom eigenvectors of M = (I − W)ᵀ(I − W). Additional constraints are added for conditioning: Σ_i y_i = 0 and (1/n) Σ_i y_i y_iᵀ = I
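A compact sketch of the whole LLE procedure described above (dense for clarity; a real implementation would keep W and M sparse; the names and the regularization constant are illustrative):

    import numpy as np
    from scipy.spatial import cKDTree

    def lle(X, d, K=10, reg=1e-3):
        # X holds one data point per column (D x n)
        P = X.T
        n = P.shape[0]
        _, idx = cKDTree(P).query(P, k=K + 1)    # K nearest neighbors (first hit is the point itself)
        W = np.zeros((n, n))
        for i in range(n):
            nbrs = idx[i, 1:]
            Z = P[nbrs] - P[i]                   # neighbors in a local frame around x_i
            C = Z @ Z.T                          # local Gram matrix
            C += reg * np.trace(C) * np.eye(K)   # regularize in case C is singular
            w = np.linalg.solve(C, np.ones(K))
            W[i, nbrs] = w / w.sum()             # enforce sum_j w_ij = 1
        I = np.eye(n)
        M = (I - W).T @ (I - W)                  # the (sparse) eigen-problem matrix
        eigvals, eigvecs = np.linalg.eigh(M)
        return eigvecs[:, 1:d + 1].T             # bottom eigenvectors, skipping the constant one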
Some results • The Swiss roll