
Nonlinear Dimensionality Reduction


Presentation Transcript


  1. Nonlinear Dimensionality Reduction Presented by Dragana Veljkovic

  2. Overview • Curse-of-dimensionality • Dimension reduction techniques • Isomap • Locally linear embedding (LLE) • Problems and improvements

  3. Problem description • The large amounts of data being collected lead to very large databases • Most problems in data mining involve data with a large number of measurements (or dimensions) • E.g. protein matching, fingerprint recognition, meteorological prediction, satellite image repositories • Reducing the number of dimensions improves our ability to extract knowledge

  4. Problem definition • Original high dimensional data: X = (x1, …, xn), where xi = (xi1, …, xip)T • Underlying low dimensional data: Y = (y1, …, yn), where yi = (yi1, …, yiq)T and q << p • Assume X forms a smooth low dimensional manifold in high dimensional space • Find the mapping that captures the important features • Determine the q that best describes the data
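
As a concrete illustration of this setup (a minimal sketch, not from the slides; it assumes Python with numpy and scikit-learn, and uses the swiss roll as a stand-in manifold):

```python
# Synthetic example: a smooth 2-D manifold (q = 2) embedded in 3-D (p = 3).
import numpy as np
from sklearn.datasets import make_swiss_roll

n = 1000                                 # number of objects
X, t = make_swiss_roll(n_samples=n, noise=0.05, random_state=0)
print(X.shape)                           # (1000, 3) -- observed high dimensional data

# The underlying low dimensional coordinates are the roll's unrolled position t
# and its height; a manifold learner should recover something equivalent to Y.
Y_true = np.column_stack([t, X[:, 1]])
print(Y_true.shape)                      # (1000, 2)
```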

  5. Different approaches • Local or shape preserving • Global or topology preserving • Local embeddings • Local methods simplify the representation of each object regardless of the rest of the data • The features selected retain most of the information • E.g. Fourier decomposition, wavelet decomposition, piecewise constant approximation

  6. Global or Topology preserving • Mostly used for visualization and classification • PCA or KL decomposition • MDS • SVD • ICA

  7. Local embeddings (LE) • Overlapping local neighborhoods, analyzed collectively, can provide information about global geometry • LE preserves the local neighborhood of each object • global distances are preserved only indirectly, through chains of non-neighboring objects • Examples: Isomap and LLE

  8. Another classification • Linear and non-linear methods

  9. Neighborhood • Two ways to select neighboring objects (a short sketch of both follows below): • k nearest neighbors (k-NN) – every object gets exactly k neighbors, but the effective neighborhood radius can be non-uniform across the dataset • ε-ball – prior knowledge of the data is needed to choose a reasonable ε, and the number of neighbors per object can vary
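
A short sketch of both neighborhood choices, assuming scikit-learn; the values of k and eps below are illustrative only:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # some high dimensional data

# k-NN: every point gets exactly k neighbors, but the distance to the
# k-th neighbor (the effective neighborhood radius) varies across the data.
k = 10
G_knn = kneighbors_graph(X, n_neighbors=k, mode='distance')

# eps-ball: the radius is fixed, so choosing a sensible eps needs prior
# knowledge of the data's scale, and the neighbor count per point can vary.
eps = 0.8
G_eps = radius_neighbors_graph(X, radius=eps, mode='distance')

print(G_knn.getnnz(axis=1).min(), G_knn.getnnz(axis=1).max())  # always k
print(G_eps.getnnz(axis=1).min(), G_eps.getnnz(axis=1).max())  # varies
```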

  10. Isomap – general idea • Only geodesic distances reflect the true low dimensional geometry of the manifold • MDS and PCA see only Euclidean distances and therefore fail to detect the intrinsic low-dimensional structure • Geodesic distances are hard to compute even if you know the manifold • In a small neighborhood the Euclidean distance is a good approximation of the geodesic distance • For faraway points, the geodesic distance is approximated by adding up a sequence of “short hops” between neighboring points

  11. Isomap algorithm • Find the neighborhood of each object by computing distances between all pairs of points and selecting the closest ones • Build a graph with a node for each object and an edge between neighboring points; the Euclidean distance between two objects is used as the edge weight • Use a shortest-path graph algorithm to fill in the distances between all non-neighboring points • Apply classical MDS to this distance matrix (a sketch follows below)
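
A minimal sketch of these four steps, assuming numpy, scipy, and scikit-learn; the parameter values are illustrative and the neighborhood graph is assumed to be connected:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def isomap_sketch(X, n_neighbors=10, q=2):
    n = X.shape[0]
    # Steps 1-2: neighborhood graph with Euclidean edge weights.
    G = kneighbors_graph(X, n_neighbors=n_neighbors, mode='distance')
    # Step 3: geodesic distances approximated by shortest paths in the graph.
    D = shortest_path(G, method='D', directed=False)
    # Step 4: classical MDS on the geodesic distance matrix:
    # double-center the squared distances, then take the top q eigenvectors.
    H = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * H @ (D ** 2) @ H
    w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:q]            # indices of the q largest
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

# Y = isomap_sketch(X, n_neighbors=10, q=2)  # embedding, shape (n, q)
```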

  12. Isomap

  13. Isomap on face images

  14. Isomap on hand images

  15. Isomap on handwritten twos

  16. Isomap - summary • Inherits features of MDS and PCA: • guaranteed asymptotic convergence to the true structure • polynomial runtime • non-iterative • Able to discover manifolds of arbitrary dimensionality • Performs well when the data comes from a single, well-sampled cluster • Few free parameters • Good theoretical basis for its metric-preserving properties

  17. Problems with Isomap • Embeddings are biased to preserve the separation of faraway points, which can distort the local geometry • Fails to project data spread among multiple clusters nicely • Well-conditioned algorithm, but computationally expensive for large datasets

  18. Improvements to Isomap • Conformal Isomap – capable of learning the structure of certain curved manifolds • Landmark Isomap – approximates large global computations with a much smaller set of calculations • Reconstruct distances using the k/2 closest objects as well as the k/2 farthest objects

  19. Locally Linear Embedding (LLE) • Isomap attempts to preserve geometry on all scales, mapping nearby points close together and distant points far away from each other • LLE attempts to preserve only the local geometry of the data, mapping nearby points on the manifold to nearby points in the low dimensional space • Advantages: computational efficiency and representational capacity

  20. LLE – general idea • Locally, on a fine enough scale, everything looks linear • Represent each object as a linear combination of its neighbors • The representation is invariant to affine transformations • Assumption: the same linear representation will hold in the low dimensional space (see the sketch below)
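
A minimal numpy sketch of this idea for a single point; the regularization constant is an illustrative choice (see slide 23):

```python
import numpy as np

def reconstruction_weights(x, neighbors, reg=1e-3):
    """Solve for weights that reconstruct x as a linear combination of its
    neighbors (rows of `neighbors`), constrained to sum to one."""
    Z = neighbors - x                           # shift the origin to x
    C = Z @ Z.T                                 # local (k x k) Gram matrix
    C = C + reg * np.trace(C) * np.eye(len(Z))  # regularize (see slide 23)
    w = np.linalg.solve(C, np.ones(len(Z)))
    return w / w.sum()                          # enforce the sum-to-one constraint
```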

  21. LLE – matrix representation • Reconstruct each point from its neighbors: xi ≈ Σj Wij xj, i.e. X ≈ W*X, where • X is the n×p matrix of original data (one object per row) • W is the n×n matrix of weights • Wij = 0 if xj is not a neighbor of xi • rows of W sum to one • W minimizes the reconstruction error ε(W) = Σi || xi − Σj Wij xj ||² • The same weights are then used to solve Y ≈ W*Y, where • Y is the n×q matrix of underlying low dimensional data • Y minimizes the embedding error Φ(Y) = Σi || yi − Σj Wij yj ||²

  22. LLE - algorithm • Find the k nearest neighbors of each point in X space • Solve for the reconstruction weights W • Compute the embedding coordinates Y using the weights W: • create the sparse matrix M = (I-W)'*(I-W) • compute the bottom q+1 eigenvectors of M • discard the bottom (constant) eigenvector; the remaining q eigenvectors give the columns of Y, i.e. the i-th embedding coordinate is the (i+1)-th smallest eigenvector (a sketch follows below)
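
An end-to-end sketch of these steps, assuming numpy, scipy, and scikit-learn; k, q, and the regularization constant are illustrative values (a dense W is used here for simplicity rather than the sparse matrices mentioned on slide 23):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from scipy.linalg import eigh

def lle_sketch(X, k=10, q=2, reg=1e-3):
    n = X.shape[0]
    # 1. Find the k nearest neighbors of each point (excluding the point itself).
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    idx = nbrs.kneighbors(X, return_distance=False)[:, 1:]

    # 2. Solve for the reconstruction weights W (rows sum to one).
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[idx[i]] - X[i]
        C = Z @ Z.T
        C += reg * np.trace(C) * np.eye(k)     # regularization (slide 23)
        w = np.linalg.solve(C, np.ones(k))
        W[i, idx[i]] = w / w.sum()

    # 3. Embedding: bottom eigenvectors of M = (I - W)' (I - W).
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = eigh(M)                       # eigenvalues in ascending order
    # Discard the smallest (constant) eigenvector; the next q give the coordinates.
    return vecs[:, 1:q + 1]

# Y = lle_sketch(X, k=10, q=2)                 # embedding, shape (n, q)
```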

  23. Numerical Issues • The covariance matrix used to compute W can be ill-conditioned, so regularization needs to be used • Small eigenvalues are subject to numerical precision errors and can get mixed up • But the sparse matrices used in this algorithm make it much faster than Isomap

  24. LLE

  25. LLE – effect of neighborhood size

  26. LLE – with face picture

  27. LLE – Lips pictures

  28. PCA vs. LLE

  29. Problems with LLE • If the data is noisy, sparse, or weakly connected, the coupling between faraway points can be attenuated • The most common failure of LLE is mapping points close together that are far apart in the original space, which often arises if the manifold is undersampled • The output strongly depends on the selection of k

  30. References • Roweis, S. T. and L. K. Saul (2000). "Nonlinear dimensionality reduction by locally linear embedding." Science 290(5500): 2323-2326. • Tenenbaum, J. B., V. de Silva, et al. (2000). "A global geometric framework for nonlinear dimensionality reduction." Science 290(5500): 2319-2323. • Vlachos, M., C. Domeniconi, et al. (2002). "Non-linear dimensionality reduction techniques for classification and visualization." Proc. of 8th SIGKDD, Edmonton, Canada. • de Silva, V. and J. Tenenbaum (2003). "Local versus global methods for nonlinear dimensionality reduction." Advances in Neural Information Processing Systems 15.

  31. Questions?
