Non-linear dimension-reduction methods Olga Sorkine January 2006
Overview • Dimensionality reduction of high-dimensional data • Good for learning, visualization and … parameterization
Dimension reduction • Input: points in some D-dimensional space (D is large) • Images • Physical measurements • Statistical data • etc… • We want to discover some structure/correlation in the input data. Hopefully, the data lives on a d-dimensional surface (d << D). • Discover the real dimensionality d • Find a mapping from R^D to R^d that preserves something about the data • Today we’ll talk about preserving variance/distances
Discovering linear structures • PCA – finds linear subspaces that best preserve the variance of the data points
Linear is sometimes not enough • When our data points sit on a non-linear manifold • We won’t find a good linear mapping from the data points to a plane, because there isn’t any
Today • Two methods to discover such non-linear manifolds: • Isomap (a descendant of MultiDimensional Scaling) • Locally Linear Embedding
Notations • Input data points: the columns of X ∈ R^(D×n) • Assume that the center of mass of the points is the origin
Reminder about PCA • PCA finds a linear d-dimensional subspace of R^D along which the variance of the data is the biggest • Denote by x_i' the data points projected onto that d-dimensional subspace. PCA finds the subspace that maximizes the scatter of the projections, Σ_i ||x_i'||^2 • When we do parallel projection of the data points, the distances between them can only get smaller, so finding a subspace that attains the maximum scatter means the distances are, in a sense, preserved.
Reminder about PCA • To find the principal axes: • Compute the scatter matrix S ∈ R^(D×D): S = X X^T • Diagonalize S: S = V Λ V^T, with the eigenvalues sorted in descending order (λ_1 ≥ λ_2 ≥ … ≥ λ_D) • The eigenvectors of S (the columns of V) are the principal directions • Take the first d eigenvectors as the “principal subspace” and project the data points onto this subspace.
Why does this work? • The eigenvectors v_i are the maxima of the quadratic form v^T S v over unit vectors v (each one subject to orthogonality to the previously chosen directions) • In fact, we get directions of maximal variance: v^T S v = v^T X X^T v = ||X^T v||^2 = Σ_i ⟨v, x_i⟩^2
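The PCA recipe above is short enough to spell out in code. Below is a minimal NumPy sketch, assuming the data layout of these slides (one point per column of X); the function name pca and the explicit centering step are additions for the sketch, not code from the slides.

```python
import numpy as np

def pca(X, d):
    """PCA via the scatter matrix. X is (D, n), one data point per column."""
    Xc = X - X.mean(axis=1, keepdims=True)         # put the center of mass at the origin
    S = Xc @ Xc.T                                  # scatter matrix S = X X^T, shape (D, D)
    eigvals, eigvecs = np.linalg.eigh(S)           # eigenvalues in ascending order
    V = eigvecs[:, np.argsort(eigvals)[::-1][:d]]  # top-d principal directions
    return V, V.T @ Xc                             # directions (D, d) and projections (d, n)
```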
Multidimensional Scaling J. Tenenbaum, V. de Silva, J.C. Langford Science, December 2000
Multidimensional scaling (MDS) • The idea: compute the matrix of pairwise distances between the input points, M ∈ R^(n×n) with m_ij = ||x_i − x_j||^2 (squared Euclidean distances) • Now, find n points x_1', …, x_n' in a low-dimensional space R^d, so that their distance matrix M' is as close as possible to M.
MDS – the math details • We look for X' ∈ R^(d×n), such that ||M' − M|| is as small as possible, where M' is the matrix of squared Euclidean distances between the points x_i' (the columns of X').
MDS – the math details • Ideally, we want m'_ij = m_ij for all i, j: ||x_i' − x_j'||^2 = ||x_i − x_j||^2 = ||x_i||^2 − 2⟨x_i, x_j⟩ + ||x_j||^2 • We want to get rid of the norm terms ||x_i||^2 and ||x_j||^2 and keep only the inner products.
MDS – the math details • Trick: use the “magic matrix” (the centering matrix) J: J = I − (1/n) 1 1^T, where 1 ∈ R^n is the all-ones vector.
MDS – the math details • Cleaning the system: define B = −(1/2) J M J. Since the data is centered, the norm terms cancel and b_ij = ⟨x_i, x_j⟩, i.e. B = X^T X is the Gram matrix of the original points.
How to find X' • We will use the spectral decomposition of B: B = V Λ V^T, where Λ = diag(λ_1, …, λ_n) with λ_1 ≥ λ_2 ≥ … ≥ λ_n ≥ 0.
How to find X' • Write B = V Λ V^T = (Λ^(1/2) V^T)^T (Λ^(1/2) V^T), so we can take X' = Λ_d^(1/2) V_d^T • In other words, we find X' by throwing away the last n − d eigenvalues (and the corresponding eigenvectors).
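Collecting the MDS steps above into one routine, here is a hedged NumPy sketch; the helper name classical_mds and the convention that M stores squared distances are assumptions of the sketch.

```python
import numpy as np

def classical_mds(M, d):
    """Classical MDS. M is the (n, n) matrix of squared pairwise distances."""
    n = M.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # the "magic" centering matrix
    B = -0.5 * J @ M @ J                      # Gram matrix of the centered points
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:d]     # indices of the d largest eigenvalues
    L = np.maximum(eigvals[order], 0.0)       # clip small negative eigenvalues
    V = eigvecs[:, order]
    return np.diag(np.sqrt(L)) @ V.T          # X' = Lambda_d^(1/2) V_d^T, shape (d, n)
```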
Isomap • The idea of Tenenbaum et al.: estimate geodesic distances between the data points (instead of Euclidean distances) • Use K nearest neighbors or ε-balls to define a neighborhood graph • Approximate the geodesics by shortest paths on the graph.
Inducing a graph • [Figure: the neighborhood graph induced on the sampled data points]
Finding geodesic paths • Compute weighted shortest paths on the graph (Dijkstra)
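The graph construction and the shortest-path step can be combined with the MDS routine sketched earlier into a small Isomap pipeline. This is a sketch rather than the authors' code: it assumes the classical_mds helper above, a K-nearest-neighbor graph that is connected, and SciPy's Dijkstra implementation for the shortest paths.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

def isomap(X, d, k=10):
    """Isomap sketch. X is (D, n); k nearest neighbors define the graph."""
    n = X.shape[1]
    E = cdist(X.T, X.T)                       # pairwise Euclidean distances
    W = np.full((n, n), np.inf)               # inf marks "no edge"
    for i in range(n):
        nbrs = np.argsort(E[i])[1:k + 1]      # k nearest neighbors, skipping the point itself
        W[i, nbrs] = E[i, nbrs]
    W = np.minimum(W, W.T)                    # symmetrize the neighborhood graph
    G = shortest_path(W, method='D', directed=False)  # geodesic estimates via Dijkstra
    return classical_mds(G ** 2, d)           # classical MDS on squared geodesic distances
```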
Locating new points in the Isomap embedding • Suppose we have a new data point p ∈ R^D • We want to find where it belongs in the R^d embedding • Compute the (geodesic) distances from p to all the other points: δ_i = d(p, x_i)
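The slide stops after the distance computation. One common way to finish (an assumption here, borrowed from the Landmark-MDS style of out-of-sample extension, not necessarily the formula on the original slide) is to map the new distance vector through the eigen-decomposition already computed for the training data:

```python
import numpy as np

def embed_new_point(delta_sq, M_sq, V_d, L_d):
    """Hypothetical out-of-sample embedding (Landmark-MDS style).

    delta_sq : (n,) squared (geodesic) distances from the new point p to the data points
    M_sq     : (n, n) squared distance matrix used to build the embedding
    V_d, L_d : top-d eigenvectors (n, d) and eigenvalues (d,) of B = -1/2 J M_sq J
    """
    mu = M_sq.mean(axis=0)                                   # mean squared distance to each point
    return -0.5 * (V_d.T @ (delta_sq - mu)) / np.sqrt(L_d)   # (d,) coordinates of p
```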
Locally Linear Embedding S.T. Roweis and L.K. Saul Science, December 2000
The idea • Define neighborhood relations between points: • K nearest neighbors • ε-balls • Find weights w_ij that reconstruct each data point from its neighbors: minimize Σ_i ||x_i − Σ_j w_ij x_j||^2 subject to Σ_j w_ij = 1 (and w_ij = 0 when x_j is not a neighbor of x_i) • Find low-dimensional coordinates y_i ∈ R^d so that the same weights hold: minimize Σ_i ||y_i − Σ_j w_ij y_j||^2
Local information reconstructs the global structure • The weights w_ij capture the local shape of the data • They are invariant to translation, rotation and scaling of the neighborhood • If the neighborhood lies on a manifold, the local mapping from the global coordinates (R^D) to the surface coordinates (R^d) is almost linear • Thus, the weights w_ij should also hold in the manifold (R^d) coordinate system!
Solving the minimizations • The weights w_ij are found by linear least squares (using a Lagrange multiplier to enforce Σ_j w_ij = 1) • To find the coordinates y_i that minimize the embedding cost, a sparse eigen-problem is solved. Additional constraints are added for conditioning: Σ_i y_i = 0 (centered output) and (1/n) Σ_i y_i y_i^T = I (unit covariance).
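Both LLE steps fit into a short NumPy sketch. The regularization constant, the K-nearest-neighbor rule and the dense eigen-solver below are simplifications for clarity (a practical implementation would exploit the sparsity of W):

```python
import numpy as np
from scipy.spatial.distance import cdist

def lle(X, d, k=10, reg=1e-3):
    """Locally Linear Embedding sketch. X is (D, n), one data point per column."""
    n = X.shape[1]
    E = cdist(X.T, X.T)
    W = np.zeros((n, n))

    # Step 1: reconstruction weights by constrained least squares
    # (the sum-to-one constraint is enforced by solving C w = 1 and normalizing).
    for i in range(n):
        nbrs = np.argsort(E[i])[1:k + 1]
        Z = X[:, nbrs] - X[:, [i]]            # neighbors expressed relative to x_i
        C = Z.T @ Z                           # local Gram matrix, shape (k, k)
        C += reg * np.trace(C) * np.eye(k)    # regularize for conditioning
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs] = w / w.sum()

    # Step 2: minimize sum_i ||y_i - sum_j w_ij y_j||^2 -> bottom (non-constant)
    # eigenvectors of (I - W)^T (I - W), discarding the constant one with eigenvalue 0.
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, 1:d + 1].T              # (d, n) embedding coordinates
```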
Some results • The Swiss roll