
Random Projections in Dimensionality Reduction: Applications to image and text data



  1. Random Projections in Dimensionality Reduction: Applications to image and text data • Ângelo Cardoso, IST/UTL, November 2009 • Paper by Ella Bingham and Heikki Mannila

  2. Outline • Dimensionality Reduction – Motivation • Methods for dimensionality reduction • PCA • DCT • Random Projection • Results on Image Data • Results on Text Data • Conclusions

  3. Dimensionality Reduction: Motivation • Many applications have high-dimensional data • Market basket analysis • Wealth of alternative products • Text • Large vocabulary • Image • Large image window • We want to process the data • High dimensionality of data restricts the choice of data processing methods • Time needed to use processing methods is too long • Memory requirements make it impossible to use some methods

  4. Dimensionality Reduction: Motivation • We want to visualize high-dimensional data • Some features may be irrelevant • Some dimensions may be highly correlated with each other, e.g. height and foot size • The “intrinsic” dimensionality may be smaller than the number of features • The data can be best described and understood by a smaller number of dimensions

  5. Methods for dimensionality reduction • Main idea is to project the high-dimensional (d) space into a lower-dimensional (k) space • A statistically optimal way is to project into a lower-dimensional orthogonal subspace that captures as much variation of the data as possible for the chosen k • The best (in terms of mean squared error) and most widely used way to do this is PCA • How to compare different methods? • Amount of distortion caused • Computational complexity

  6. Principal Components Analysis (PCA): Intuition • [Figure: 2-D scatter of data points with the original axes and the first and second principal components marked] • Given an original space in 2-d • How can we represent the points in a k-dimensional space (k <= d) while preserving as much information as possible?

  7. Principal Components Analysis (PCA): Algorithm • Eigenvalues • A measure of how much of the data variance is explained by each eigenvector • Singular Value Decomposition (SVD) • Can be used to find the eigenvectors and eigenvalues of the covariance matrix • To project into the lower-dimensional space • Subtract the mean of X in each dimension, then multiply by the principal components (PCs) • To restore into the original space • Multiply the projection by the transposed principal components and add the mean of X in each dimension • Algorithm (see the sketch below) • X ← N x d data matrix, with one row vector xn per data point • X ← subtract the mean x̄ from each dimension in X • Σ ← covariance matrix of X • Find the eigenvectors and eigenvalues of Σ • PCs ← the k eigenvectors with the largest eigenvalues
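A minimal numpy sketch of the algorithm above, using SVD on the centred data instead of forming the covariance matrix explicitly; function and variable names are mine, not from the slides:

```python
import numpy as np

def pca_fit_project(X, k):
    """Project an N x d data matrix X onto its top-k principal components."""
    mean = X.mean(axis=0)                # mean of X in each dimension
    Xc = X - mean                        # subtract the mean from each dimension
    # Right singular vectors of the centred data are the eigenvectors of the
    # covariance matrix; S**2 / (N - 1) gives the corresponding eigenvalues.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    PCs = Vt[:k].T                       # d x k matrix of principal components
    return Xc @ PCs, PCs, mean           # k-dimensional projection

def pca_restore(Y, PCs, mean):
    """Map a k-dimensional projection back to the original d-dimensional space."""
    return Y @ PCs.T + mean
```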

  8. Random Projection (RP): Idea • PCA, even when calculated using SVD, is computationally expensive • Complexity is O(dcN) • where d is the number of dimensions, c is the average number of non-zero entries per column and N the number of points • Idea • What if we constructed the projection vectors randomly instead of computing principal components? • Johnson-Lindenstrauss lemma (a common formal statement is given below) • If points in a vector space are projected onto a randomly selected subspace of suitably high dimension, then the distances between the points are approximately preserved
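One common formal statement of the lemma, as a reference; this is my paraphrase, and the constant C and exact form vary between sources:

```latex
% Johnson-Lindenstrauss lemma (one common form): for 0 < \varepsilon < 1
% and N points in R^d, a reduced dimension
\[
  k \;\ge\; \frac{C \ln N}{\varepsilon^{2}}
\]
% suffices for the existence of a linear map f : R^d -> R^k with
\[
  (1-\varepsilon)\,\lVert u-v\rVert^{2}
  \;\le\; \lVert f(u)-f(v)\rVert^{2}
  \;\le\; (1+\varepsilon)\,\lVert u-v\rVert^{2}
  \quad\text{for all pairs } u, v \text{ among the } N \text{ points.}
\]
```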

  9. Random Projection (RP): Idea • Use a random matrix R in place of the principal components matrix • R is usually Gaussian distributed • Complexity is O(kcN) • The generated random matrix R is usually not orthogonal • Making R orthogonal is computationally expensive • However, we can rely on a result by Hecht-Nielsen: • In a high-dimensional space there exists a much larger number of almost orthogonal directions than orthogonal directions • Thus vectors with random directions are close enough to orthogonal • The Euclidean distance in the projected space can be scaled back to the original space by multiplying it by √(d / k) (see the sketch below)
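A minimal numpy sketch of the projection described above. Normalising each random direction to unit length is my assumption; with that convention the √(d / k) rescaling of projected distances matches the slide:

```python
import numpy as np

def random_projection(X, k, seed=0):
    """Project an N x d data matrix X to k dimensions with a random matrix R."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R = rng.normal(size=(d, k))          # Gaussian entries: mean 0, std 1
    R /= np.linalg.norm(R, axis=0)       # unit-length random directions (my assumption)
    return X @ R

def estimated_original_distance(yi, yj, d, k):
    """Scale a projected Euclidean distance back to the original space."""
    return np.sqrt(d / k) * np.linalg.norm(yi - yj)
```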

  10. Random Projection: Simplified Random Projection (SRP) • The random matrix is usually Gaussian distributed • mean: 0; standard deviation: 1 • Achlioptas showed that a much simpler distribution can be used (see the sketch below) • This implies further computational savings, since the matrix is sparse and the computations can be performed using integer arithmetic
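The distribution itself is not spelled out on the slide; the sketch below uses Achlioptas' sparse distribution, entries √3 · {+1, 0, −1} with probabilities 1/6, 2/3, 1/6, which is the usual form of his result:

```python
import numpy as np

def simplified_random_projection(X, k, seed=0):
    """SRP with a sparse Achlioptas-style random matrix."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Entries are +1, 0, -1 with probabilities 1/6, 2/3, 1/6; the sparse integer
    # part can use integer arithmetic, with the sqrt(3) factor applied once.
    signs = rng.choice([1, 0, -1], size=(d, k), p=[1/6, 2/3, 1/6])
    R = np.sqrt(3) * signs
    return X @ R
```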

  11. Discrete Cosine Transform (DCT) • Widely used method for image compression • Optimal for the human eye • Distortions are introduced at the highest frequencies, which humans tend to neglect as noise • DCT is not data-dependent, in contrast to PCA, which needs the eigenvalue decomposition • This makes DCT orders of magnitude cheaper to compute (a small sketch follows)
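A small sketch of DCT-based reduction on a single image, using SciPy's dctn/idctn. Keeping a square block of the lowest-frequency coefficients is my simplification; the paper's exact coefficient-selection scheme may differ:

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_reduce(img, keep):
    """Keep only the lowest keep x keep 2-D DCT coefficients of an image."""
    coeffs = dctn(img, norm='ortho')        # type-II DCT along both axes
    return coeffs[:keep, :keep].copy()      # low frequencies carry most energy

def dct_restore(reduced, shape):
    """Zero-pad the kept coefficients and invert the transform."""
    coeffs = np.zeros(shape)
    k = reduced.shape[0]
    coeffs[:k, :k] = reduced
    return idctn(coeffs, norm='ortho')
```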

  12. Results: Noiseless Images

  13. Results: Noiseless Images

  14. Results: Noiseless Images • Original space is 2500-d • (100 image pairs of 50x50-pixel images) • Error measurement • Average error of the Euclidean distance between 100 pairs of images in the original and the reduced space (see the sketch below) • Amount of distortion • RP and SRP give accurate results already for very small k (k > 10) • Distance scaling might be an explanation for this success • PCA gives accurate results for k > 600 • In PCA such scaling is not straightforward • DCT still has a significant error even for k > 600 • Computational complexity • The number of floating point operations for RP and SRP is on the order of 100 times smaller than for PCA • RP and SRP clearly outperform PCA and DCT at the smallest dimensions
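A sketch of how such an error measurement could be computed; the slide does not say whether the signed or absolute difference is averaged, so the signed difference is used here, and scale defaults to 1 so the √(d / k) factor can be passed in for RP:

```python
import numpy as np

def avg_distance_error(X, Y, pairs, scale=1.0):
    """Average difference between pairwise Euclidean distances in the
    original space (rows of X) and the reduced space (rows of Y)."""
    diffs = [scale * np.linalg.norm(Y[i] - Y[j]) - np.linalg.norm(X[i] - X[j])
             for i, j in pairs]
    return float(np.mean(diffs))
```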

  15. Results: Noisy Images • Images were corrupted by salt-and-pepper impulse noise with probability 0.2 • The error is computed in the high-dimensional noiseless space • RP, SRP, PCA and DCT all perform quite similarly to the noiseless case

  16. Results: Text Data • Data set • Newsgroups corpus • sci.crypt, sci.med, sci.space, soc.religion • Pre-processing • Term frequency vectors • Some common terms were removed, but no stemming was used • Document vectors were normalized to unit length • The data was not made zero-mean • Size • 5000 terms • 2262 newsgroup documents • Error measurement • 100 pairs of documents were randomly selected, and the error between their cosine similarity before and after the dimensionality reduction was calculated (see the sketch below)
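A sketch of the cosine-based error measurement described above, again assuming the signed difference is averaged:

```python
import numpy as np

def avg_cosine_error(X, Y, pairs):
    """Average difference between document-pair cosines before (X) and
    after (Y) dimensionality reduction."""
    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.mean([cosine(Y[i], Y[j]) - cosine(X[i], X[j])
                          for i, j in pairs]))
```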

  17. Results: Text Data

  18. Results: Text Data • The cosine was used as the similarity measure since it is more common for this task • RP is not as accurate as SVD • The Johnson-Lindenstrauss result states that Euclidean distances, not cosines, are retained well under random projection • The RP error can nevertheless be neglected in most applications • RP can be used on large document collections with less computational complexity than SVD

  19. Conclusion • Random Projection is an effective dimensionality reduction method for high-dimensional real-world data sets • RP preserves similarities well even when the data is projected onto a moderate number of dimensions • RP is beneficial in applications where the distances of the original space are meaningful • RP is a good alternative to traditional dimensionality reduction methods that are infeasible for high-dimensional data, since RP does not suffer from the curse of dimensionality

  20. Questions
