
Problems with Isomap, LLE and other Nonlinear Dimensionality Reduction Techniques


Presentation Transcript


  1. Problems with Isomap, LLE and other Nonlinear Dimensionality Reduction Techniques Jonathan Huang (jch1@cs.cmu.edu) Advanced Perception 2006, CMU May 1st, 2006

  2. ISOMAP Algorithm Review • Connect each point to its k nearest neighbors to form a graph • Approximate pairwise geodesic distances by running Dijkstra's algorithm on this graph • Apply metric MDS to recover a low-dimensional isometric embedding
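These three steps map directly onto a few library calls. Below is a minimal sketch in Python (not the authors' code; the function name `isomap` and the defaults for `k` and `d` are illustrative assumptions), building a k-NN graph, approximating geodesics with Dijkstra, and running classical metric MDS via an eigendecomposition of the double-centered squared-distance matrix:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def isomap(X, k=10, d=2):
    # Step 1: k-nearest-neighbor graph with Euclidean edge weights.
    G = kneighbors_graph(X, n_neighbors=k, mode='distance')
    # Step 2: approximate geodesics = shortest paths on the graph
    # (assumes the graph is connected; otherwise some distances are infinite).
    D = shortest_path(G, method='D', directed=False)  # 'D' = Dijkstra
    # Step 3: classical (metric) MDS on the geodesic distances.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:d]              # top-d eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))
```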

  3. Questionable Assumptions • ISOMAP can fail in either of its steps (the geodesic approximation or the MDS step) if the assumptions that guarantee success are not met: • Geodesic approximation • Points must be sampled uniformly (and fairly densely) from a noise-free manifold • The intrinsic parameter space must be convex • MDS • An isometric embedding (or anything close to one) might not exist

  4. ISOMAP: Topological Instabilities • ISOMAP is prone to short circuits and topological instabilities • It’s not always clear how to define neighborhoods • We might get a disconnected graph (a quick connectivity check is sketched below)
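One cheap sanity check before running ISOMAP is to count the connected components of the neighborhood graph, since geodesic distances are infinite between components. A minimal sketch (the helper name and toy data are illustrative assumptions):

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import kneighbors_graph

def check_neighborhood_graph(X, k):
    """Warn if the k-NN graph is disconnected (geodesics would be infinite)."""
    G = kneighbors_graph(X, n_neighbors=k, mode='distance')
    n_comp, labels = connected_components(G, directed=False)
    if n_comp > 1:
        print(f"k={k}: graph has {n_comp} components; "
              f"increase k or embed each component separately")
    return n_comp

X = np.random.rand(200, 3)   # toy data for illustration
for k in (2, 5, 10):
    check_neighborhood_graph(X, k)
```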

  5. ISOMAP: Convex Intrinsic Geometry • Image databases (Donoho/Grimes) [Figure: input data and ISOMAP output]

  6. ISOMAP: Nonconvex Intrinsic Geometry • Problems in estimating the geodesic can arise if the parameter space is not convex • Example: S¹ topology (images of a rotating teapot) [Figure: input data, ISOMAP output, and eigenvalue spectrum; images by Lawrence Saul and Carrie Grimes]

  7. ISOMAP: Nonconvex Intrinsic Geometry • ISOMAP can fail for • Occlusions • Periodic gaits (is this guy wearing clothes???) [Images by Lawrence Saul and Carrie Grimes]

  8. ISOMAP: Complexity • For large datasets, ISOMAP can be slow • k-nearest neighbors scales as O(n²D) (for the naïve implementation, anyway) • Dijkstra scales as O(n² log n + n²k) • Metric MDS scales as O(n²d) • One solution is to use Nyström approximations (Landmark ISOMAP; see the sketch below) • But we need lots of points to get an accurate approximation to the true geodesic!
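A sketch of the Landmark (Nyström) idea follows, assuming m randomly chosen landmarks: Dijkstra is run from the landmarks only, classical MDS is applied to the m x m landmark block, and every point is then triangulated from its squared geodesic distances to the landmarks. Function and variable names are illustrative, not from the slides:

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra
from sklearn.neighbors import kneighbors_graph

def landmark_isomap(X, k=10, d=2, m=50, seed=0):
    """Landmark ISOMAP sketch: geodesics from m landmarks, then Landmark MDS."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    landmarks = rng.choice(n, size=m, replace=False)
    G = kneighbors_graph(X, n_neighbors=k, mode='distance')
    # Shortest paths from the landmarks only: O(m n log n), not O(n^2 log n).
    D2 = dijkstra(G, directed=False, indices=landmarks) ** 2  # (m, n) squared geodesics
    # Classical MDS on the m x m landmark-to-landmark block.
    D2_ll = D2[:, landmarks]
    J = np.eye(m) - np.ones((m, m)) / m
    B = -0.5 * J @ D2_ll @ J
    w, V = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:d]
    w, V = np.maximum(w[order], 1e-12), V[:, order]   # guard against tiny eigenvalues
    # Nystrom extension: triangulate every point from its landmark distances.
    mu = D2_ll.mean(axis=1)                           # mean squared distance per landmark
    Y = -0.5 * (D2 - mu[:, None]).T @ (V / np.sqrt(w))  # (n, d) embedding
    return Y
```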

  9. ISOMAP: Dimensionality Estimation • Preserving distances may hamper dimensionality reduction! • Gauss’ Theorema Egregium says that some objects simply cannot be embedded isometrically in a lower dimension • (The intrinsic curvature of a surface is invariant under local isometry) • It is sometimes possible to estimate the intrinsic dimension of a surface from the spectrum given by MDS, though this fails for a large class of surfaces [Figure: fish bowl dataset, ISOMAP embedding, and a better embedding]
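The usual heuristic, sketched below, is to double-center the squared geodesic distances and look for an elbow in the normalized eigenvalue spectrum; as the slide notes, high intrinsic curvature (as in the fish bowl) defeats this. The function name is an illustrative assumption:

```python
import numpy as np

def mds_spectrum(D):
    """Normalized eigenvalues of the MDS Gram matrix built from distances D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    w = np.sort(np.linalg.eigvalsh(B))[::-1]
    w = np.maximum(w, 0)                     # curvature can produce negative eigenvalues
    return w / w.sum()                       # look for an elbow / sharp drop-off
```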

  10. ISOMAP Weakness Summary • Sensitive to noise (short circuits) • Fails for nonconvex parameter spaces • Fails to recover correct dimension for spaces with high intrinsic curvature • Slow for large training sets

  11. LLE Algorithm Review • Compute the k nearest neighbors of each point • Solve for the weights that best reconstruct each point from a linear combination of its neighbors • Find a low-dimensional embedding that minimizes the reconstruction loss
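A minimal NumPy sketch of these three steps (the names and the regularization constant are illustrative choices, not the authors' implementation):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def lle(X, k=10, d=2, reg=1e-3):
    n = X.shape[0]
    # Step 1: k nearest neighbors (dropping each point itself).
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    idx = nbrs.kneighbors(X, return_distance=False)[:, 1:]
    # Step 2: weights that best reconstruct each point from its neighbors.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[idx[i]] - X[i]                 # neighbors centered on x_i
        C = Z @ Z.T                          # local Gram matrix (k x k)
        C += reg * np.trace(C) * np.eye(k)   # regularize when C is singular
        w = np.linalg.solve(C, np.ones(k))
        W[i, idx[i]] = w / w.sum()           # weights constrained to sum to 1
    # Step 3: embedding = bottom eigenvectors of (I - W)^T (I - W).
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)           # eigenvalues in ascending order
    return vecs[:, 1:d + 1]                  # skip the constant eigenvector
```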

  12. LLE • LLE has some nice properties • The result is globally optimal • The “hardest” part is a sparse eigenvector problem • It does not worry about distance preservation (this can be good or bad, I suppose) • But… • Dimensionality estimation is not as straightforward • There are no theoretical guarantees • Like ISOMAP, it is sensitive to noise

  13. LLE: Estimating Dimension • The eigenvalues of LLE’s cost matrix do not clearly indicate dimensionality!

  14. LLE • Dependence on the size of the neighborhood set (ISOMAP also has this problem)

  15. LLE Alternatives • In contrast to ISOMAP, LLE does not really come with many theoretical guarantees • There are LLE extensions that do have guarantees, e.g. Hessian LLE (see the usage sketch below)
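Hessian LLE is available in scikit-learn; a usage sketch on a toy Swiss roll (the parameter values here are illustrative assumptions):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)
# Hessian LLE needs n_neighbors > d * (d + 3) / 2 for a d-dimensional embedding.
hlle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, method='hessian')
Y = hlle.fit_transform(X)
```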

  16. LLE • Versus PCA on Digit Classification • Is Dimensionality Reduction really the right thing to do for this supervised learning task? • Does the space of handwritten digits have manifold geometry?
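One way to run such a comparison (a sketch under assumed settings, not the experiment on the slide): reduce the digits to the same dimension with PCA and LLE, then score a k-NN classifier on each representation.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
for name, reducer in [("PCA", PCA(n_components=10)),
                      ("LLE", LocallyLinearEmbedding(n_neighbors=10, n_components=10))]:
    Z = reducer.fit_transform(X)
    acc = cross_val_score(KNeighborsClassifier(), Z, y, cv=5).mean()
    print(f"{name}: {acc:.3f}")  # which representation serves the supervised task better?
```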

  17. General Remarks • ISOMAP and LLE share many virtues, but they are also marred by the same flaws: • Sensitivity to noise • Sensitivity to non-uniformly sampled data • No principled approach to determining intrinsic topology (or dimensionality, for that matter) • No principled way to set k, the size of the neighborhood set • Difficulty dealing with multiple clusters (connected components) • No easy out-of-sample extensions

  18. More General Remarks • Manifold learning is not always appropriate! • Example: PCA is often bad for classification • When does natural data actually lie on a manifold? • How do we reconcile nonlinear dimensionality reduction with kernel methods? • Very little work has been done on determining the intrinsic topology of high-dimensional data (and topology is important if we hope to recover natural parameterizations)

  19. References • J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319-2323, 2000. • S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323-2326, 2000. • D. L. Donoho and C. Grimes. Hessian eigenmaps: New locally-linear embedding techniques for high-dimensional data. PNAS, 100:5591-5596, 2003. • D. L. Donoho and C. Grimes. Image manifolds which are isometric to Euclidean space. TR 2002-27, Dept. of Statistics, Stanford University, 2002. • L. K. Saul and S. T. Roweis. Think globally, fit locally: Unsupervised learning of low dimensional manifolds. JMLR, 4:119-155, 2003.
