Problems with Isomap, LLE and other Nonlinear Dimensionality Reduction Techniques Jonathan Huang (jch1@cs.cmu.edu) Advanced Perception 2006, CMU May 1st, 2006
ISOMAP Algorithm Review • Connect each point to its k nearest neighbors to form a graph • Approximate pairwise geodesic distances using Dijkstra on this graph • Apply Metric MDS to recover a low dimensional isometric embedding
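The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation: the name `isomap_sketch` is hypothetical, the all-pairs shortest paths use a vectorized Floyd-Warshall update in place of Dijkstra (simpler to write, but O(n^3)), and the last step is classical MDS via an eigendecomposition.

```python
import numpy as np

def isomap_sketch(X, k=6, d=2):
    """Hypothetical minimal ISOMAP: kNN graph -> geodesics -> classical MDS."""
    n = X.shape[0]
    # Pairwise Euclidean distances in the ambient space.
    E = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    # Step 1: keep only edges to the k nearest neighbors, then symmetrize.
    G = np.full((n, n), np.inf)
    nbrs = np.argsort(E, axis=1)[:, 1:k + 1]   # column 0 is the point itself
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    G[rows, cols] = E[rows, cols]
    G = np.minimum(G, G.T)
    np.fill_diagonal(G, 0.0)
    # Step 2: all-pairs shortest paths (Floyd-Warshall stands in for Dijkstra).
    for m in range(n):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighborhood graph is disconnected; try a larger k")
    # Step 3: classical MDS on the squared geodesic distances.
    J = np.eye(n) - 1.0 / n                    # centering matrix
    B = -0.5 * J @ (G ** 2) @ J
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:d]
    Y = V[:, top] * np.sqrt(np.maximum(w[top], 0.0))
    return Y, G

# Demo: a half circle in the plane is a 1-D manifold of arc length pi.
t = np.linspace(0.0, np.pi, 40)
X = np.column_stack([np.cos(t), np.sin(t)])
Y, G = isomap_sketch(X, k=2, d=1)
```

In the demo, the graph distance between the two endpoints should be close to the arc length pi rather than the straight-line distance 2, and the 1-D embedding should track the arc parameter `t`.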
Questionable Assumptions • ISOMAP can fail at two stages, the geodesic approximation and MDS, if the assumptions that guarantee success are not met • Geodesic approximation: points must be sampled uniformly (and fairly densely) from the manifold, with no noise, and the intrinsic parameter space must be convex • MDS: there might not exist an isometric embedding (or anything close to one)
ISOMAP: Topological Instabilities • ISOMAP is prone to short circuits and topological instabilities • It’s not always clear how to define neighborhoods • We might get a disconnected graph…
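One way to see the disconnected-graph problem is to count the connected components of the neighborhood graph before running anything else. The sketch below is only an illustration: `knn_components` is a hypothetical helper, and the two-cluster data set is made up for the demo.

```python
import numpy as np

def knn_components(X, k):
    """Count connected components of the symmetrized kNN graph (union-find)."""
    n = len(X)
    E = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    nbrs = np.argsort(E, axis=1)[:, 1:k + 1]   # column 0 is the point itself
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in nbrs[i]:
            parent[find(i)] = find(int(j))     # merge the two components
    return len({find(i) for i in range(n)})

# Demo (made-up data): two well-separated groups of five points on a line.
xs = np.concatenate([np.arange(5.0), 100.0 + np.arange(5.0)])
X = np.column_stack([xs, np.zeros_like(xs)])
small_k = knn_components(X, k=2)   # neighbors stay inside each group
large_k = knn_components(X, k=5)   # the fifth neighbor must cross the gap
```

With k = 2 the graph splits into two pieces, so geodesic distances between the groups are undefined. With k = 5 the graph is connected, but only through long edges across the gap, which is the kind of shortcut that distorts geodesic estimates.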
ISOMAP: Convex Intrinsic Geometry • Image databases (Donoho/Grimes) [Figure: input data and ISOMAP output]
ISOMAP: Nonconvex Intrinsic Geometry • Problems in estimating the geodesic can arise if the parameter space is not convex • Examples: S^1 topology, images of a rotating teapot [Figure: input data, ISOMAP output, and eigenvalues; images by Lawrence Saul and Carrie Grimes]
ISOMAP: Nonconvex Intrinsic Geometry • ISOMAP can fail for • Occlusions • Periodic Gaits (Is this guy wearing clothes???) Images by Lawrence Saul and Carrie Grimes
ISOMAP: Complexity • For large datasets, ISOMAP can be slow • k-nearest neighbors scales as O(n^2 D) (for the naïve implementation, anyway) • Dijkstra scales as O(n^2 log n + n^2 k) • Metric MDS scales as O(n^2 d) • One solution is to use Nyström approximations (Landmark ISOMAP) • But we need lots of points to get an accurate approximation to the true geodesic!
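The landmark idea can be sketched as follows: run MDS only on a small set of m landmark points, then place every other point from its distances to the landmarks. This is a rough sketch of the usual Landmark MDS triangulation step, not a full Landmark ISOMAP. The names `landmark_mds` and `sq_dists` are hypothetical, and the demo feeds in plain Euclidean distances as a stand-in for graph geodesics so that the result can be checked exactly.

```python
import numpy as np

def sq_dists(A, B):
    """Squared Euclidean distances between rows of A and rows of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)

def landmark_mds(D2_land, D2_rest, d):
    """Embed landmarks by classical MDS, then triangulate the other points.

    D2_land: (m, m) squared distances among the landmarks.
    D2_rest: (p, m) squared distances from the other points to the landmarks.
    """
    m = D2_land.shape[0]
    J = np.eye(m) - 1.0 / m                   # centering matrix
    B = -0.5 * J @ D2_land @ J
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:d]
    w, V = w[top], V[:, top]
    Y_land = V * np.sqrt(w)                   # (m, d) landmark coordinates
    pinv = V / np.sqrt(w)                     # columns are v_i / sqrt(lambda_i)
    mean_d2 = D2_land.mean(axis=0)            # mean squared distance per landmark
    Y_rest = -0.5 * (D2_rest - mean_d2) @ pinv
    return Y_land, Y_rest

# Demo: 5 landmarks and 20 other points in the plane (Euclidean stand-in).
rng = np.random.default_rng(0)
P = rng.normal(size=(25, 2))
land, rest = P[:5], P[5:]
Y_land, Y_rest = landmark_mds(sq_dists(land, land), sq_dists(rest, land), d=2)
```

Only the m x m landmark matrix is eigendecomposed, and in a Landmark ISOMAP setting only m shortest-path runs would be needed. The caveat on the slide still applies: the graph itself must be dense enough for the geodesic estimates to be accurate.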
ISOMAP: Dimensionality Estimation • Preserving distances may hamper dimensionality reduction! • Gauss’ Theorema Egregium says that some objects just can’t be embedded isometrically in a lower dimension • (The intrinsic curvature of a surface is invariant under local isometry) • It is sometimes possible to figure out the intrinsic dimension of a surface by looking at the spectrum given by MDS. This will not work for a large class of surfaces though. [Figure: Fish Bowl dataset, ISOMAP embedding, and a better embedding]
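The spectrum argument can be illustrated with a small experiment. The sketch below compares the classical MDS eigenvalues for a flat 2-D sheet with those for a fish-bowl-like surface, here assumed to be a unit sphere with its top cap removed. Exact great-circle distances are used instead of graph estimates to isolate the effect of curvature. The function name `mds_spectrum` and both data sets are made up for illustration.

```python
import numpy as np

def mds_spectrum(D):
    """Eigenvalues of the double-centered squared-distance matrix, descending."""
    n = D.shape[0]
    J = np.eye(n) - 1.0 / n
    B = -0.5 * J @ (D ** 2) @ J
    return np.sort(np.linalg.eigvalsh(B))[::-1]

rng = np.random.default_rng(0)
n = 200

# Flat case: a 2-D sheet, where geodesic and Euclidean distances coincide.
flat = rng.uniform(-1.0, 1.0, size=(n, 2))
D_flat = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)

# Curved case (assumed "fish bowl"): unit sphere with the cap z > 0.8 removed.
z = rng.uniform(-1.0, 0.8, n)
phi = rng.uniform(0.0, 2.0 * np.pi, n)
r = np.sqrt(1.0 - z ** 2)
S = np.column_stack([r * np.cos(phi), r * np.sin(phi), z])
D_bowl = np.arccos(np.clip(S @ S.T, -1.0, 1.0))   # great-circle distances
np.fill_diagonal(D_bowl, 0.0)

w_flat = mds_spectrum(D_flat)
w_bowl = mds_spectrum(D_bowl)
ratio_flat = w_flat[2] / w_flat[0]   # third eigenvalue relative to the first
ratio_bowl = w_bowl[2] / w_bowl[0]
```

For the flat sheet the third eigenvalue vanishes up to rounding, so the spectrum reveals the dimension. For the bowl the third eigenvalue stays clearly nonzero even though the surface is intrinsically 2-D, so the spectrum suggests one dimension too many.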
ISOMAP Weakness Summary • Sensitive to noise (short circuits) • Fails for nonconvex parameter spaces • Fails to recover correct dimension for spaces with high intrinsic curvature • Slow for large training sets
LLE Algorithm Review • Compute the k nearest neighbors • Solve for the weights necessary to reconstruct each point using a linear combination of its neighbors • Find a low dimensional embedding which minimizes reconstruction loss
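The three LLE steps can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: `lle_sketch` is a hypothetical name, the neighbor search is brute force, the embedding uses a dense eigensolver where a sparse one would normally be used, and the local Gram matrix is regularized by a multiple of its trace (the `reg` parameter is an assumption made to keep the solve stable when k exceeds the input dimension).

```python
import numpy as np

def lle_sketch(X, k=8, d=2, reg=1e-3):
    """Hypothetical minimal LLE: neighbors -> weights -> bottom eigenvectors."""
    n = X.shape[0]
    # Step 1: k nearest neighbors by brute force.
    E = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    nbrs = np.argsort(E, axis=1)[:, 1:k + 1]   # column 0 is the point itself
    # Step 2: weights that best reconstruct each point from its neighbors,
    # constrained to sum to one.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                  # neighbors centered on x_i
        C = Z @ Z.T                            # local Gram matrix (k x k)
        C += reg * np.trace(C) * np.eye(k)     # assumed regularization
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()
    # Step 3: embedding from the bottom eigenvectors of M = (I - W)^T (I - W),
    # skipping the first one (the constant vector when the graph is connected).
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d + 1], W

# Demo (made-up data): a curled 2-D sheet in 3-D.
rng = np.random.default_rng(0)
t = rng.uniform(0.5, 3.0, 80)
h = rng.uniform(0.0, 1.0, 80)
X = np.column_stack([t * np.cos(t), h, t * np.sin(t)])
Y, W = lle_sketch(X, k=8, d=2)
```

Note that distances never enter the final step: only the reconstruction weights do, which is why LLE does not attempt to preserve distances.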
LLE • LLE has some nice properties • The result is globally optimal • The “hardest” part is a sparse eigenvector problem • Does not worry about distance preservation (this can be good or bad, I suppose) • But… • Dimensionality estimation is not as straightforward • There are no theoretical guarantees • Like ISOMAP, it is sensitive to noise
LLE: Estimating Dimension • The eigenvalues of the LLE cost matrix do not clearly indicate dimensionality!
LLE • The result depends on the size of the neighborhood set (ISOMAP also has this problem)
LLE Alternative • In contrast to ISOMAP, LLE does not really come with many theoretical guarantees • There are LLE extensions that do have guarantees: e.g. Hessian LLE
LLE • Versus PCA on Digit Classification • Is Dimensionality Reduction really the right thing to do for this supervised learning task? • Does the space of handwritten digits have manifold geometry?
General Remarks • ISOMAP and LLE share many virtues, but they are also marred by the same flaws • Sensitivity to noise • Sensitivity to non-uniformly sampled data • No principled approach to determining intrinsic topology (or dimensionality, for that matter) • No principled way to set k, the size of the neighborhood set • Difficulty dealing with different clusters (connected components) • No easy out-of-sample extensions
More General Remarks • Manifold learning is not always appropriate! • Example: PCA is often bad for classification • When does natural data actually lie on a manifold? • How do we reconcile nonlinear dimensionality reduction with kernel methods? • Very little work has been done on determining the intrinsic topology of high dimensional data (and topology is important if we hope to recover natural parameterizations)
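The PCA remark can be illustrated with a toy example. The data below is entirely made up: two classes that differ only along a low-variance axis, while most of the variance lies along an axis that carries no class information. The helper `separation` is a hypothetical score (gap between class means divided by the pooled standard deviation).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
labels = np.repeat([0, 1], n)

# High-variance, uninformative x axis; low-variance y axis that separates classes.
x = rng.normal(0.0, 10.0, size=2 * n)
y = np.where(labels == 1, 1.0, -1.0) + rng.normal(0.0, 0.1, size=2 * n)
X = np.column_stack([x, y])

# First principal component via SVD of the centered data.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]

def separation(z, labels):
    """Gap between class means in units of the pooled standard deviation."""
    a, b = z[labels == 0], z[labels == 1]
    pooled = np.sqrt(0.5 * (a.var() + b.var()))
    return abs(a.mean() - b.mean()) / pooled

sep_pca = separation(Xc @ pc1, labels)   # after reducing to the top component
sep_y = separation(X[:, 1], labels)      # along the discarded direction
```

The first principal component lines up with the uninformative x axis, so projecting onto it throws away almost all of the class separation. Unsupervised dimensionality reduction, linear or nonlinear, keeps what is large, not necessarily what is useful for the task.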
References • J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319-2323, 2000. • S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323-2326, 2000. • D. Donoho and C. Grimes. Hessian Eigenmaps: New Locally-Linear Embedding Techniques for High-Dimensional Data. PNAS, 100:5591-5596, 2003. • D. Donoho and C. Grimes. Image manifolds which are isometric to Euclidean space. TR2002-27, Dept. of Statistics, Stanford University, Stanford, CA, 2002. • L. K. Saul and S. T. Roweis. Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds. JMLR, 4:119-155, 2003.