Non-linear Dimensionality Reduction

Presentation Transcript


  1. Non-linear Dimensionality Reduction CMPUT 466/551 Nilanjan Ray Prepared from materials in the book Nonlinear Dimensionality Reduction by Lee and Verleysen, Springer, 2007

  2. Agenda • What is dimensionality reduction? • Linear methods • Principal components analysis • Metric multidimensional scaling (MDS) • Non-linear methods • Distance preserving • Topology preserving • Auto-encoders (Deep neural networks)

  3. Dimensionality Reduction • Mapping d-dimensional data points y to p-dimensional vectors x; p < d. • Purposes • Visualization • Classification/regression • Most of the time we are only interested in the forward mapping y to x. • The backward mapping is difficult in general. • If both the forward and the backward mappings are linear, the method is called linear; otherwise it is called a non-linear dimensionality reduction technique.

  4. Two Benchmark Manifolds
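
The two benchmark manifolds used here are the "Swiss roll" and the "open box" (see slide 20). As a minimal sketch, scikit-learn ships a Swiss-roll generator; the sample size and noise level below are arbitrary illustration choices:

```python
import numpy as np
from sklearn.datasets import make_swiss_roll

# Swiss roll: a 2-D sheet rolled up in 3-D; t parametrizes position
# along the roll and is handy for coloring plots.
Y, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)
print(Y.shape)  # (1000, 3)
```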

  5. Distance Preserving Methods Let's say the points yi are mapped to xi, i=1,2,…,N. Distance preserving methods try to preserve pairwise distances, i.e., d(yi, yj) = d(xi, xj), or the pairwise dot products, <yi, yj> = <xi, xj>. What is a distance? Nondegeneracy: d(a, b) = 0 if and only if a = b. Triangle inequality: for any three points a, b, and c, d(a, b) ≤ d(c, a) + d(c, b). The other two properties, nonnegativity and symmetry, follow from these two.
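
A quick NumPy check of the two defining properties on a Euclidean distance matrix (random data, purely illustrative):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

Y = np.random.default_rng(0).normal(size=(50, 3))
D = squareform(pdist(Y))  # pairwise Euclidean distances

# Nondegeneracy: d(a, b) = 0 exactly when a = b
print(np.all((D == 0) == np.eye(len(D), dtype=bool)))
# Triangle inequality in the slide's form: d(a,b) <= d(c,a) + d(c,b)
print(np.all(D[None, :, :] <= D[:, :, None] + D[:, None, :] + 1e-12))
```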

  6. Metric MDS A multidimensional scaling (MDS) method is a linear generative model like PCA: y = Wx, where y is the d-dimensional observed variable and x is the p-dimensional latent variable. W is a d-by-p matrix with the property WᵀW = I (orthonormal columns). Let yi = Wxi and yj = Wxj. Then <yi, yj> = xiᵀWᵀWxj = <xi, xj>. So, the dot product is preserved. How about Euclidean distances? ||yi − yj||² = <yi − yj, yi − yj> = <xi − xj, xi − xj> = ||xi − xj||². So, Euclidean distances are preserved too!

  7. Metric MDS Algorithm Center the data matrix Y; compute the dot-product matrix S = YᵀY. If the data matrix is not available and only the distance matrix D is, do double centering to form the scalar-product matrix: S = −(1/2) H D² H, where H = I − (1/N)11ᵀ and D² holds the squared distances. Compute the eigenvalue decomposition S = UΛUᵀ. Construct the p-dimensional representation as X = Λp^(1/2) Upᵀ, using the p largest eigenvalues Λp and corresponding eigenvectors Up. Metric MDS is actually PCA, and is a linear method.
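
A minimal NumPy sketch of this algorithm, assuming the input is a full N×N Euclidean distance matrix (function and variable names are mine):

```python
import numpy as np

def classical_mds(D, p=2):
    """Metric MDS: double centering, then top-p eigenpairs."""
    N = D.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N     # centering matrix
    S = -0.5 * H @ (D ** 2) @ H             # scalar-product matrix
    evals, evecs = np.linalg.eigh(S)        # ascending eigenvalues
    idx = np.argsort(evals)[::-1][:p]       # keep the p largest
    L = np.sqrt(np.maximum(evals[idx], 0))  # clip tiny negatives
    return evecs[:, idx] * L                # N x p embedding
```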

  8. Metric MDS Result

  9. Sammon's Nonlinear Mapping (NLM) NLM minimizes the energy function: E = (1 / Σi<j d(yi, yj)) Σi<j [d(yi, yj) − d(xi, xj)]² / d(yi, yj). Start with initial x's. Update the x's by (quasi-Newton update): xk,i ← xk,i − α (∂E/∂xk,i) / |∂²E/∂xk,i²|, where xk,i is the kth component of vector xi.
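
The slide's update is quasi-Newton; as a simplification, here is a hedged sketch that minimizes the same stress E by plain gradient descent (fixed learning rate and random initialization are my choices):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def sammon(Y, p=2, iters=500, lr=0.3, eps=1e-12):
    """Sammon mapping via gradient descent on the stress E."""
    Dy = squareform(pdist(Y)) + eps       # input-space distances
    c = Dy.sum() / 2                      # normalizer: sum_{i<j} d(yi, yj)
    X = np.random.default_rng(0).normal(size=(len(Y), p)) * 1e-2
    for _ in range(iters):
        Dx = squareform(pdist(X)) + eps   # embedding-space distances
        B = (Dy - Dx) / (Dy * Dx)
        np.fill_diagonal(B, 0)
        grad = (-2 / c) * (B.sum(1)[:, None] * X - B @ X)  # dE/dxi
        X -= lr * grad
    return X
```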

  10. Sammon’s NLM

  11. A Basic Issue with Metric Distance Preserving Methods Euclidean distance is measured straight through the ambient space, so it can "short-circuit" a curved manifold such as the Swiss roll. Geodesic distances, measured along the manifold itself, seem to be better suited.

  12. Graph Distance: Approximation to Geodesic Distance The geodesic distance between two data points can be approximated by the length of the shortest path between them in a neighborhood graph built on the data.
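
A sketch of this approximation using SciPy's shortest-path routine on a K-nearest-neighbor graph (k is an assumed tuning parameter):

```python
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def graph_distances(Y, k=10):
    """Approximate geodesics: shortest paths in a k-NN graph,
    edges weighted by Euclidean distance."""
    G = kneighbors_graph(Y, n_neighbors=k, mode='distance')
    return shortest_path(G, method='D', directed=False)  # Dijkstra
```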

  13. ISOMAP ISOMAP = MDS with graph distance. One needs to decide how the graph is constructed, i.e., who is the neighbor of whom: the K-closest rule or the ε-distance rule can build the graph.
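
Composing the two earlier sketches gives exactly this recipe; scikit-learn's sklearn.manifold.Isomap bundles the same pipeline:

```python
# ISOMAP = metric MDS on graph distances (sketches defined above).
# Y: (N, d) data array, e.g., the Swiss roll from slide 4.
D_graph = graph_distances(Y, k=10)   # K-closest neighborhood rule
X = classical_mds(D_graph, p=2)
```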

  14. KPCA Closely related to the MDS algorithm: a centered kernel matrix takes the place of the dot-product matrix S. Example: KPCA using a Gaussian kernel, Kij = exp(−||yi − yj||² / (2σ²)).
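
A sketch of that connection: the same eigendecomposition routine as MDS, run on a centered Gaussian kernel matrix (sigma is an assumed bandwidth):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def kpca_gaussian(Y, p=2, sigma=1.0):
    """Kernel PCA with a Gaussian (RBF) kernel."""
    K = np.exp(-squareform(pdist(Y, 'sqeuclidean')) / (2 * sigma**2))
    N = K.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    Kc = H @ K @ H                        # center in feature space
    evals, evecs = np.linalg.eigh(Kc)
    idx = np.argsort(evals)[::-1][:p]
    return evecs[:, idx] * np.sqrt(np.maximum(evals[idx], 0))
```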

  15. Topology Preserving Techniques • Topology ≈ neighborhood relationship • Topology preservation means two neighboring points in d dimensions should map to two neighboring points in p dimensions • Distance preservation is often too rigid; topology preservation techniques can sometimes stretch or shrink point clouds • More flexible, but algorithmically more complex

  16. TP Techniques • Can be categorized broadly into • Methods with predefined topology • SOM (Kohonen's self-organizing map) • Data-driven lattice • LLE (locally linear embedding) • Isotop…

  17. Kohonen's Self-Organizing Maps (SOM) Step 1: Define a 2D lattice indexed by (l, k): l, k = 1,…,K. Step 2: For a set of data vectors yi, i=1,2,…,N, find a set of prototypes m(l, k). Note that by this indexing (l, k), the prototypes are mapped to the 2D lattice. Step 3: Iterate for each data point yi: • Find the closest prototype (using Euclidean distance in the d-dimensional space): (l*, k*) = argmin(l,k) ||yi − m(l, k)|| • Update the prototypes, weighted by lattice proximity to the winner: m(l, k) ← m(l, k) + α h((l, k), (l*, k*)) (yi − m(l, k)) (prepared from the [HTF] book)

  18. Neighborhood Function for SOM A hard threshold function: h((l, k), (l*, k*)) = 1 if ||(l, k) − (l*, k*)|| < λ, and 0 otherwise. Or, a soft threshold function: h((l, k), (l*, k*)) = exp(−||(l, k) − (l*, k*)||² / (2σ²)).
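
A compact sketch of the whole SOM loop from slides 17–18, using the soft (Gaussian) neighborhood; the lattice size and the learning-rate and neighborhood decay schedules are my assumptions:

```python
import numpy as np

def train_som(Y, K=10, iters=20, alpha=0.5, sigma=2.0, seed=0):
    """K x K SOM with a Gaussian neighborhood function."""
    rng = np.random.default_rng(seed)
    M = rng.normal(size=(K, K, Y.shape[1]))          # prototypes m(l, k)
    grid = np.stack(np.meshgrid(np.arange(K), np.arange(K),
                                indexing='ij'), axis=-1)
    for t in range(iters):
        a = alpha * (1 - t / iters)                  # decaying learning rate
        s = sigma * (1 - t / iters) + 0.5            # shrinking radius
        for y in Y[rng.permutation(len(Y))]:
            dist = np.linalg.norm(M - y, axis=-1)    # data-space distances
            win = np.unravel_index(dist.argmin(), dist.shape)
            # soft threshold on lattice distance to the winner
            h = np.exp(-((grid - np.array(win)) ** 2).sum(-1) / (2 * s**2))
            M += a * h[..., None] * (y - M)          # pull neighbors toward y
    return M
```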

  19. Example: Simulated data

  20. SOM for “Swiss Roll” and “Open Box”

  21. Remarks • SOM is actually a constrained K-means • It constrains the K-means prototypes to lie on a smooth manifold • If only one neighbor (itself) is allowed => K-means • The learning rate (α) and the distance threshold (λ) usually decrease over the training iterations • Mostly useful as a visualization tool: typically it cannot map to more than 3 dimensions • Convergence is hard to assess

  22. Locally Linear Embedding • Data-driven lattice, unlike SOM's predefined lattice • Topology preserving: it is based on conformal mapping, a transformation that preserves angles; LLE is invariant to rotation, translation and scaling • To some extent similar to preserving dot products • A data point yi is assumed to be a linear combination of its neighbors

  23. LLE Principle Each data point is approximated by a local linear combination of its neighbors: E(W) = Σi ||yi − Σj wij yj||². Neighborhood of yi: determined by a graph. Constraints on the wij: wij = 0 if yj is not a neighbor of yi, and Σj wij = 1. LLE first computes the matrix W by minimizing E. Then it assumes that the same local linear combinations hold in the low dimensions: F(X) = Σi ||xi − Σj wij xj||². So, it minimizes F with respect to the x's and obtains the low-dimensional mapping!
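
A minimal NumPy sketch of both stages (the regularization of the local Gram matrix is a standard numerical safeguard, not on the slide):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def lle(Y, k=10, p=2, reg=1e-3):
    """LLE: local weights W, then bottom eigenvectors of (I-W)^T (I-W)."""
    N = len(Y)
    idx = (NearestNeighbors(n_neighbors=k + 1).fit(Y)
           .kneighbors(Y, return_distance=False)[:, 1:])  # drop self
    W = np.zeros((N, N))
    for i in range(N):
        Z = Y[idx[i]] - Y[i]                # neighbors, shifted to yi
        C = Z @ Z.T
        C += np.eye(k) * reg * np.trace(C)  # regularize for stability
        w = np.linalg.solve(C, np.ones(k))
        W[i, idx[i]] = w / w.sum()          # enforce sum_j wij = 1
    M = np.eye(N) - W
    evals, evecs = np.linalg.eigh(M.T @ M)
    return evecs[:, 1:p + 1]                # skip the constant eigenvector
```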

  24. LLE Results Let’s visit: http://www.cs.toronto.edu/~roweis/lle/
