
Non-linear dimension-reduction methods


Presentation Transcript


  1. Non-linear dimension-reduction methods Olga Sorkine January 2006

  2. Overview • Dimensionality reduction of high-dimensional data • Good for learning, visualization and … parameterization

  3. Dimension reduction • Input: points in some D-dimensional space (D is large) • Images • Physical measurements • Statistical data • etc… • We want to discover some structure/correlation in the input data. Hopefully, the data lives on a d-dimensional surface (d << D). • Discover the real dimensionality d • Find a mapping from R^D to R^d that preserves something about the data • Today we’ll talk about preserving variance/distances

  4. Discovering linear structures • PCA – finds linear subspaces that best preserve the variance of the data points

  5. Linear is sometimes not enough • When our data points sit on a non-linear manifold • We won’t find a good linear mapping from the data points to a plane, because there isn’t any

  6. Today • Two methods to discover such non-linear manifolds: • Isomap (a descendant of MultiDimensional Scaling) • Locally Linear Embedding

  7. Notations • Input data points: columns of X ∈ R^(D×n) • Assume that the center of mass of the points is the origin

  8. Reminder about PCA • PCA finds a linear d-dimensional subspace of R^D along which the variance of the data is the biggest • Denote by x_i' the data points projected onto the d-dimensional subspace. PCA finds the subspace that maximizes the scatter of the projections: Σ_i ||x_i'||² → max • When we do parallel projection of the data points, the distances between them can only get smaller. So finding a subspace which attains the maximum scatter means the distances are, in some sense, preserved.

  9. Reminder about PCA • To find the principal axes: • Compute the scatter matrix S = X X^T ∈ R^(D×D) • Diagonalize S: S = V Λ V^T, with the eigenvalues sorted in descending order λ_1 ≥ λ_2 ≥ … ≥ λ_D • The eigenvectors of S are the principal directions. • Take the first d eigenvectors as the “principal subspace” and project the data points onto this subspace.

  10. Why does this work? • The eigenvectors v_i are the maxima of the quadratic form v^T S v: v_1 = argmax over unit vectors v of v^T S v, and each subsequent v_i maximizes v^T S v among unit vectors orthogonal to v_1, …, v_(i-1) • In fact, we get directions of maximal variance: v_i^T S v_i = λ_i, the variance of the data along v_i
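A minimal numerical sketch of slides 8-10 (my own illustration, not code from the presentation; the function name pca and its arguments are made up), assuming the data matrix X stores one point per column as in slide 7:

import numpy as np

def pca(X, d):
    """X: D x n data matrix, one point per column. Returns the d x n projected coordinates."""
    Xc = X - X.mean(axis=1, keepdims=True)    # move the center of mass to the origin
    S = Xc @ Xc.T                             # scatter matrix, D x D
    eigvals, eigvecs = np.linalg.eigh(S)      # eigenvalues in ascending order
    V = eigvecs[:, ::-1][:, :d]               # the d principal directions (largest eigenvalues)
    return V.T @ Xc                           # coordinates in the principal subspace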

  11. Multidimensional Scaling • J. Tenenbaum, V. de Silva, J. C. Langford, Science, December 2000

  12. Multidimensional scaling (MDS) • The idea: compute the pairwise distances between the input points: M_ij = ||x_i - x_j||, i, j = 1, …, n • Now, find n points x_1', …, x_n' in a low-dimensional space R^d, so that their distance matrix is as close as possible to M.

  13. MDS – the math details • We look for X', such that ||M' - M|| is as small as possible, where M' is the matrix of Euclidean distances between the points x_i'.

  14. MDS – the math details • Ideally, we want M'_ij = M_ij for all i, j, i.e. ||x_i' - x_j'||² = M_ij². Expanding the left-hand side gives ||x_i'||² - 2⟨x_i', x_j'⟩ + ||x_j'||² = M_ij². We want to get rid of the norm terms ||x_i'||² and ||x_j'||², keeping only the inner products.

  15. MDS – the math details • Trick: use the “magic matrix” (the centering matrix) J: J = I - (1/n) 1 1^T, where 1 ∈ R^n is the all-ones vector. Multiplying by J subtracts the mean, so J annihilates constant rows and columns.

  16. MDS – the math details • Cleaning the system: write the equations of slide 14 in matrix form and multiply by J on both sides. The norm terms vanish and we are left with X'^T X' = -½ J M^(2) J =: B, where (M^(2))_ij = M_ij². So the Gram matrix of the unknown points must equal B, which is computed from the input distances alone.
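A worked version of this step (my reconstruction; the exact formulas on the original slides are not preserved in the transcript), written out in LaTeX:

\[
M^{(2)}_{ij} = \|x_i'\|^2 - 2\langle x_i', x_j'\rangle + \|x_j'\|^2
\quad\Longrightarrow\quad
M^{(2)} = a\,\mathbf{1}^T - 2\,X'^T X' + \mathbf{1}\,a^T, \qquad a_i = \|x_i'\|^2 .
\]
\[
J\,(a\,\mathbf{1}^T)\,J = 0, \qquad J\,(\mathbf{1}\,a^T)\,J = 0, \qquad J\,X'^T X'\,J = X'^T X' \ \text{(centered points)},
\]
\[
\text{so} \qquad B \;=\; -\tfrac{1}{2}\, J\, M^{(2)}\, J \;=\; X'^T X' .
\]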

  17. How to find X' • We will use the spectral decomposition of B: B = V Λ V^T, with Λ = diag(λ_1, …, λ_n), λ_1 ≥ λ_2 ≥ … ≥ λ_n (B is symmetric and, when M is a Euclidean distance matrix, positive semi-definite).

  18. How to find X' • Write B = V Λ V^T = (Λ^(1/2) V^T)^T (Λ^(1/2) V^T). So we find X' by throwing away the last n - d eigenvalues: X' = Λ_d^(1/2) V_d^T, where Λ_d contains the d largest eigenvalues and V_d the corresponding eigenvectors.
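A compact sketch of slides 12-18 (illustrative code; the function name classical_mds is my own), assuming M is the full n x n matrix of pairwise distances:

import numpy as np

def classical_mds(M, d):
    """M: n x n pairwise distance matrix. Returns the d x n embedding X'."""
    n = M.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n               # the centering ("magic") matrix
    B = -0.5 * J @ (M ** 2) @ J                       # Gram matrix of the centered points
    eigvals, eigvecs = np.linalg.eigh(B)              # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:d]               # keep the d largest eigenvalues
    lam = np.maximum(eigvals[idx], 0.0)               # clamp small negative values (numerical noise)
    return np.sqrt(lam)[:, None] * eigvecs[:, idx].T  # X' = Lambda_d^(1/2) V_d^T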

  19. Isomap • The idea of Tenenbaum et al.: estimate geodesic distances of the data points (instead of Euclidean) • Use K nearest neighbors or ε-balls to define neighborhood graphs • Approximate the geodesics by shortest paths on the graph.

  20. 6 0 4 0 2 0 1 5 0 1 5 1 0 1 0 5 5 0 0 - 5 - 5 - 1 0 - 1 0 - 1 5 - 1 5 Inducing a graph

  21. Defining neighborhood and weights

  22. Finding geodesic paths • Compute weighted shortest paths on the graph (Dijkstra)
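A sketch of the neighborhood-graph and shortest-path steps (slides 19-22), using SciPy's graph routines; this is my own illustration, not the authors' code, and the function name isomap_distances is made up. Feeding the result into classical_mds above gives the Isomap embedding.

import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import shortest_path

def isomap_distances(X, k=10):
    """X: D x n data matrix. Returns the n x n matrix of approximate geodesic distances."""
    E = squareform(pdist(X.T))                 # Euclidean distances between all points
    n = E.shape[0]
    W = np.full((n, n), np.inf)                # inf = no edge
    for i in range(n):
        nbrs = np.argsort(E[i])[1:k + 1]       # K nearest neighbors (skip the point itself)
        W[i, nbrs] = E[i, nbrs]                # weighted edges to the neighbors only
    W = np.minimum(W, W.T)                     # symmetrize the neighborhood graph
    return shortest_path(W, method='D', directed=False)   # Dijkstra: shortest paths approximate geodesics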

  23. Locating new points in the Isomap embedding • Suppose we have a new data point p ∈ R^D • Want to find where it belongs in the R^d embedding • Compute the geodesic distances from p to all other points:
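The formula itself is not preserved in the transcript. One standard way to place the new point, in the spirit of landmark MDS (my assumption, not necessarily the slide's exact expression): let δ_p be the vector of squared geodesic distances from p to the n embedded points, δ̄ the vector of column means of the squared-distance matrix used in the MDS step, and λ_k, v_k the eigenvalues and eigenvectors of B; then

\[
y_p^{(k)} \;=\; \frac{1}{2\sqrt{\lambda_k}}\; v_k^{T}\bigl(\bar{\delta} - \delta_p\bigr), \qquad k = 1, \dots, d .
\]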

  24. Some results

  25. Morph in Isomap space

  26. Flattening results (Zigelman et al.)

  27. Flattening results (Zigelman et al.)

  28. Flattening results (Zigelman et al.)

  29. Locally Linear Embedding • S. T. Roweis and L. K. Saul, Science, December 2000

  30. The idea • Define neighborhood relations between points • K nearest neighbors • ε-balls • Find weights that reconstruct each data point from its neighbors: minimize Σ_i ||x_i - Σ_j w_ij x_j||², subject to Σ_j w_ij = 1 and w_ij = 0 when x_j is not a neighbor of x_i • Find low-dimensional coordinates y_i so that the same weights hold: minimize Σ_i ||y_i - Σ_j w_ij y_j||²

  31. Local information reconstructs global one • The weights w_ij capture the local shape • Invariant to translation, rotation and scale of the neighborhood • If the neighborhood lies on a manifold, the local mapping from the global coordinates (R^D) to the surface coordinates (R^d) is almost linear • Thus, the weights w_ij should also hold for the manifold (R^d) coordinate system!

  32. Solving the minimizations • The weights: linear least squares (using Lagrange multipliers for the constraint Σ_j w_ij = 1) • To find the coordinates y_i that minimize the embedding cost, a sparse eigen-problem is solved: the bottom eigenvectors of (I - W)^T (I - W). Additional constraints are added for conditioning: Σ_i y_i = 0 and (1/n) Σ_i y_i y_i^T = I
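A sketch of both minimizations (slides 30-32); this is illustrative code under the constraints above, not the authors' implementation, and the function name lle and the regularization parameter reg are my own choices:

import numpy as np

def lle(X, d, k=10, reg=1e-3):
    """X: D x n data matrix, one point per column. Returns the d x n embedding."""
    n = X.shape[1]
    E = np.linalg.norm(X.T[:, None, :] - X.T[None, :, :], axis=2)   # pairwise distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(E[i])[1:k + 1]          # K nearest neighbors
        Z = X[:, nbrs] - X[:, [i]]                # neighbors relative to x_i
        C = Z.T @ Z                               # local Gram matrix (k x k)
        C += reg * np.trace(C) * np.eye(k)        # regularize for conditioning
        w = np.linalg.solve(C, np.ones(k))        # Lagrange-multiplier solution of the least squares
        W[i, nbrs] = w / w.sum()                  # enforce sum_j w_ij = 1
    M = (np.eye(n) - W).T @ (np.eye(n) - W)       # embedding cost matrix
    eigvals, eigvecs = np.linalg.eigh(M)          # ascending eigenvalues
    Y = eigvecs[:, 1:d + 1].T                     # bottom eigenvectors, skipping the constant one
    return Y * np.sqrt(n)                         # scale so that (1/n) Y Y^T = I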

  33. Some results • The Swiss roll

  34. Some results

  35. Some results

  36. The end
