
Project 11: Determining the Intrinsic Dimensionality of a Distribution



Presentation Transcript


  1. Project 11: Determining the Intrinsic Dimensionality of a Distribution Okke Formsma, Nicolas Roussis and Per Løwenborg

  2. Outline • About the project • What is intrinsic dimensionality? • How can we assess the ID? • PCA • Neural Network • Dimensionality Estimators • Experimental Results

  3. Why did we choose this Project? • We wanted to learn more about developing and experimenting with algorithms for analyzing high-dimensional data • We wanted to see how we can implement this in an application

  4. Papers N. Kambhatla, T. Leen, “Dimension Reduction by Local Principal Component Analysis” J. Bruske and G. Sommer, “Intrinsic Dimensionality Estimation with Optimally Topology Preserving Maps” P. Verveer, R. Duin, “An evaluation of intrinsic dimensionality estimators”

  5. How does dimensionality reduction influence our lives? • Compressing images, audio and video • Reducing noise • Editing • Reconstruction

  6. This is an image going through the different steps of a reconstruction

  7. Intrinsic Dimensionality The number of ‘free’ parameters needed to generate a pattern Ex: • f(x) = -x² => 1-dimensional • f(x, y) = -x² => 1-dimensional (y is unused)
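The idea of "one free parameter" can be made concrete with a small sketch (an illustration added here, not from the slides): a single parameter t generates every point of a curve, even though the curve lives in a 2-D space.

```python
import numpy as np

# One free parameter t generates every point on the curve y = -x^2,
# so the curve has intrinsic dimensionality 1 even though it sits in R^2.
t = np.linspace(-1.0, 1.0, 200)          # the single 'free' parameter
curve = np.column_stack([t, -t**2])      # points (x, y) = (t, -t^2)

# f(x, y) = -x^2 ignores y entirely, so it too varies along one dimension only.
n_free_parameters = 1
```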

  8. Local Principal Component Analysis

  9. Principal Component Analysis (PCA) • The classic technique for linear dimension reduction. • It is a vector space transformation which reduces multidimensional data sets to lower dimensions for analysis. • It is a way of identifying patterns in data, and expressing the data in such a way as to highlight their similarities and differences.

  10. Advantages of PCA • Since patterns can be hard to find in high-dimensional data, where the luxury of graphical representation is not available, PCA is a powerful tool for analysing data. • Once you have found these patterns, you can compress the data, by reducing the number of dimensions, without much loss of information.

  11. Example

  12. Problems with PCA • PCA relies on second-order statistics (correlation), so when the data has nonlinear structure and its variables are uncorrelated, PCA can fail to find the most compact description of the data.
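A classic illustration of this failure mode (an added example, not from the slides): points on a circle form a 1-D manifold, but x and y are uncorrelated, so PCA finds two equal eigenvalues and no preferred direction to keep.

```python
import numpy as np

# Points on a circle: intrinsically 1-D (one angle parameter), but the
# coordinates are uncorrelated, so PCA cannot compress them linearly.
theta = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])

cov = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)

# Both eigenvalues are (nearly) equal: no direction can be dropped.
ratio = eigvals.min() / eigvals.max()
```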

  13. Problems with PCA

  14. First eigenvector

  15. Second eigenvector

  16. A better solution?

  17. Local eigenvector

  18. Local eigenvectors

  19. Local eigenvectors

  20. Another problem

  21. Is this the principal eigenvector?

  22. Or do we need more than one?

  23. Choose

  24. The answer depends on your application Low resolution High resolution

  25. Challenges • How to partition the space? • How many partitions should we use? • How many dimensions should we retain?

  26. How to partition the space? Vector Quantization: the Lloyd Algorithm. Partition the space into k sets, then repeat until convergence: • Calculate the centroid of each set • Associate each point with the nearest centroid

  27. Lloyd Algorithm Step 1: randomly assign Set 1 Set 2

  28. Lloyd Algorithm Step 2: Calculate centroids Set 1 Set 2

  29. Lloyd Algorithm Step 3: Associate points with nearest centroid Set 1 Set 2

  30. Lloyd Algorithm Step 2: Calculate centroids Set 1 Set 2

  31. Lloyd Algorithm Step 3: Associate points with nearest centroid Set 1 Set 2

  32. Lloyd Algorithm Result after 2 iterations: Set 1 Set 2
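The steps illustrated in slides 27-32 can be sketched as follows (an added illustration; the function name and initialization choice are mine):

```python
import numpy as np

def lloyd(points, k, n_iter=20, seed=0):
    """Lloyd's algorithm: partition `points` into k sets by vector quantization."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly pick k data points as initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Associate each point with its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recalculate the centroid of each set (keep old centroid if a set is empty).
        centroids = np.array([points[labels == j].mean(axis=0)
                              if np.any(labels == j) else centroids[j]
                              for j in range(k)])
    return centroids, labels
```

On two well-separated clusters this typically converges after only a few iterations, mirroring the two-iteration result shown on slide 32.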

  33. How many partitions should we use? Bruske & Sommer: “just try them all” For k = 1 to dimension(set): • Subdivide the space into k regions • Perform PCA on each region • Retain the significant eigenvalues per region
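A rough sketch of this "try them all" loop (my own illustration: the partitioning is a few inline Lloyd iterations, and significance is a simple relative-eigenvalue cutoff standing in for the tests on the next slides):

```python
import numpy as np

def local_id_estimate(points, alpha=0.05, max_k=4, seed=0):
    """For k = 1..max_k: partition the space into k regions, run PCA in each
    region, and count eigenvalues above a fraction alpha of the largest."""
    rng = np.random.default_rng(seed)
    estimates = {}
    for k in range(1, max_k + 1):
        # Crude vector quantization: a few Lloyd iterations from random seeds.
        centroids = points[rng.choice(len(points), size=k, replace=False)]
        for _ in range(10):
            labels = np.linalg.norm(points[:, None] - centroids[None],
                                    axis=2).argmin(axis=1)
            centroids = np.array([points[labels == j].mean(axis=0)
                                  if np.any(labels == j) else centroids[j]
                                  for j in range(k)])
        # Local PCA per region: count the significant eigenvalues.
        dims = []
        for j in range(k):
            region = points[labels == j]
            if len(region) < 2:
                continue
            eigvals = np.linalg.eigvalsh(np.cov(region, rowvar=False))[::-1]
            dims.append(int(np.sum(eigvals >= alpha * eigvals[0])))
        estimates[k] = dims
    return estimates
```

On a noisy line embedded in 2-D, every region at every k should report one significant eigenvalue.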

  34. Which eigenvalues are significant? Depends on: • Intrinsic dimensionality • Curvature of the (hyper-)surface • Noise

  35. Which eigenvalues are significant? Discussed in class: • Largest-n In the papers: • Cutoff after normalization (Bruske & Sommer) • Statistical method (Verveer & Duin)

  36. Which eigenvalues are significant? (Bruske & Sommer) Cutoff after normalization: keep eigenvalue µx if µx / µ1 ≥ α %, where µx is the x-th largest eigenvalue and µ1 the largest, with α = 5, 10 or 20.
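This cutoff is small enough to sketch directly (normalizing by the largest eigenvalue is my reading of the slide; the function name is made up):

```python
import numpy as np

def significant_dims(eigvals, alpha=10.0):
    """Cutoff after normalization, Bruske & Sommer style: keep eigenvalue
    mu_x if it is at least alpha percent of the largest eigenvalue
    (alpha = 5, 10 or 20)."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending
    return int(np.sum(eigvals / eigvals[0] >= alpha / 100.0))
```

For eigenvalues (10, 0.8, 0.3) the estimate is 2 dimensions at α = 5 but only 1 at α = 10, showing how sensitive the count is to the threshold.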

  37. Which eigenvalues are significant? Statistical method (Verveer & Duin) Calculate the error rate on the reconstructed data if the lowest eigenvalue is dropped Decide whether this error rate is significant

  38. Error distance for local PCA (Kambhatla and Leen) • Euclidean Distance • Reconstruction Distance
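The two distances can be contrasted in a short sketch (my illustration of the idea: the reconstruction distance measures how far a point lies from the local PCA hyperplane, rather than from the region's centroid):

```python
import numpy as np

def euclidean_dist2(x, centroid):
    """Plain squared Euclidean distance to the region's centroid."""
    return float(np.sum((x - centroid) ** 2))

def reconstruction_dist2(x, centroid, basis):
    """Squared reconstruction distance: distance from x to the local PCA
    hyperplane through `centroid` spanned by the retained eigenvectors
    (the columns of `basis`)."""
    d = x - centroid
    in_plane = basis @ (basis.T @ d)     # component lying in the hyperplane
    return float(np.sum((d - in_plane) ** 2))
```

For x = (3, 4) with the first axis retained, the Euclidean distance to the origin is 25 while the reconstruction distance is only 16: the in-plane component is "free" under reconstruction.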

  39. Results • One-dimensional space, embedded in 256*256 = 65,536 dimensions • 180 images of a rotating cylinder • ID = 1

  40. Results

  41. Neural Network PCA

  42. Basic Computational Element - Neuron • Inputs/Outputs, Synaptic Weights, Activation Function

  43. 3-Layer Autoassociators • N input, N output and M<N hidden neurons. • Drawback of this model: the optimal solution remains the PCA projection.

  44. 5-Layer Autoassociators • Neural network approximators for principal surfaces using 5 layers of neurons. • A global, non-linear dimension reduction technique. • Successful implementations of nonlinear PCA use these networks for image and speech dimension reduction and for obtaining concise representations of color.

  45. • The third layer carries the dimension-reduced representation and has width M<N. • Linear functions are used for the representation layer. • The networks are trained to minimize the MSE training criterion. • Approximators of principal surfaces.
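The architecture on slides 44-45 can be sketched as a forward pass (an added illustration with made-up names; training to minimize MSE is omitted): N inputs, a nonlinear encoding layer, a linear representation layer of width M < N, a nonlinear decoding layer, and N outputs.

```python
import numpy as np

def autoassociator_forward(x, params):
    """Forward pass of a 5-layer autoassociator:
    input(N) -> tanh(h) -> linear representation(M < N) -> tanh(h) -> output(N).
    `params` is a list of (W, b) pairs for the four weight layers."""
    (W1, b1), (W2, b2), (W3, b3), (W4, b4) = params
    h1 = np.tanh(x @ W1 + b1)      # nonlinear encoding layer
    z = h1 @ W2 + b2               # linear representation layer, width M
    h2 = np.tanh(z @ W3 + b3)      # nonlinear decoding layer
    out = h2 @ W4 + b4             # linear output layer, width N
    return z, out

def init_params(n, h, m, seed=0):
    """Small random weights for the four layers of an N-h-M-h-N network."""
    rng = np.random.default_rng(seed)
    sizes = [(n, h), (h, m), (m, h), (h, n)]
    return [(rng.normal(scale=0.1, size=s), np.zeros(s[1])) for s in sizes]
```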

  46. Locally Linear Approach to nonlinear dimension reduction (Local PCA Algorithm) • Much faster to train than five-layer autoassociators, and provides superior solutions. • Like the 5-layer autoassociators, this algorithm attempts to minimize the MSE between the original data and its reconstruction from a low-dimensional representation (the reconstruction error).

  47. 5-layer Auto-associators vs. Local PCA (VQPCA) • 5-layer auto-associators are difficult to train; the VQPCA algorithm trains faster. (VQPCA can be accelerated using tree-structured or multistage VQ.) • 5-layer auto-associators are prone to getting trapped in poor local optima. • VQPCA is slower for encoding new data but much faster for decoding.

  48. 5-layer Auto-associators vs. Local PCA (VQPCA) • The results of the 1st paper indicate that VQPCA is not suitable for real-time applications (e.g. videoconferencing) where we need very fast encoding. • For decoding only (e.g. image retrieval from databases), VQPCA is a good choice: accurate and fast.

  49. Estimating the intrinsic dimensionality 2 algorithms proposed: • Local Eigenvalue Algorithm: based on the local eigenvalues of the covariance matrix in small regions of feature space • k-Nearest Neighbor Algorithm: based on the distribution of the distances from an arbitrary data vector to a number (k) of its neighbors. Both work, but not always!
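One simple distance-based heuristic in the spirit of the k-nearest-neighbor algorithm (an added sketch, not necessarily the exact estimator from Verveer & Duin): in a d-dimensional distribution the mean distance to the k-th nearest neighbor grows roughly like k^(1/d), so d can be read off the slope of log(mean k-NN distance) against log(k).

```python
import numpy as np

def knn_id_estimate(points, k_max=10):
    """Estimate intrinsic dimensionality from the growth of k-NN distances."""
    # Full pairwise distance matrix (small data sets only).
    diff = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=2))
    dists.sort(axis=1)                            # column 0 is self-distance
    ks = np.arange(1, k_max + 1)
    mean_tk = dists[:, 1:k_max + 1].mean(axis=0)  # mean dist to k-th neighbor
    # Slope of log(mean_tk) vs log(k) approximates 1/d.
    slope = np.polyfit(np.log(ks), np.log(mean_tk), 1)[0]
    return 1.0 / slope
```

For points on a straight line embedded in 3-D the estimate comes out near 1, not 3, illustrating that the estimator responds to the manifold, not the embedding space; like the slide says, though, such estimators work but not always.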
