Random Projections in Dimensionality Reduction: Applications to image and text data
Ângelo Cardoso, IST/UTL, November 2009
Ella Bingham and Heikki Mannila
Outline
• Dimensionality Reduction – Motivation
• Methods for dimensionality reduction
  • PCA
  • DCT
  • Random Projection
• Results on Image Data
• Results on Text Data
• Conclusions
Dimensionality Reduction – Motivation
• Many applications have high-dimensional data
  • Market basket analysis: wealth of alternative products
  • Text: large vocabulary
  • Image: large image window
• We want to process the data
  • The high dimensionality of the data restricts the choice of data processing methods
  • The time needed to apply some processing methods is too long
  • Memory requirements make it impossible to use some methods
Dimensionality Reduction – Motivation
• We want to visualize high-dimensional data
• Some features may be irrelevant
• Some dimensions may be highly correlated with others, e.g. height and foot size
• The “intrinsic” dimensionality may be smaller than the number of features
  • The data can be best described and understood by a smaller number of dimensions
Methods for dimensionality reduction
• The main idea is to project the high-dimensional (d) space into a lower-dimensional (k) space
• A statistically optimal way is to project onto a lower-dimensional orthogonal subspace that captures as much of the variation of the data as possible for the chosen k
• The best (in terms of mean squared error) and most widely used way to do this is PCA
• How to compare different methods?
  • Amount of distortion caused
  • Computational complexity
Principal Components Analysis (PCA) – Intuition
• Given an original space in 2-d, how can we represent the points in a k-dimensional space (k ≤ d) while preserving as much information as possible?
• [Figure: data points plotted on the original axes, with the first and second principal components drawn through them]
Principal Components Analysis (PCA) – Algorithm
• Eigenvalues
  • A measure of how much of the data variance is explained by each eigenvector
• Singular Value Decomposition (SVD)
  • Can be used to find the eigenvectors and eigenvalues of the covariance matrix
• To project into the lower-dimensional space
  • Subtract the mean of X in each dimension, then multiply by the principal components (PCs)
• To restore into the original space
  • Multiply the projection by the principal components and add the mean of X in each dimension
• Algorithm (see the sketch below)
  • X ← N × d data matrix, with one row vector x_n per data point
  • X ← subtract the mean x̄ from each dimension in X
  • Σ ← covariance matrix of X
  • Find the eigenvectors and eigenvalues of Σ
  • PCs ← the k eigenvectors with the largest eigenvalues
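A minimal NumPy sketch of the steps above, assuming the eigenvectors are obtained via an SVD of the mean-centred data (function and variable names such as pca_project are illustrative, not from the paper):

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X (N x d) onto the top-k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean                       # subtract the mean from each dimension
    # Rows of Vt are the eigenvectors of the covariance matrix of X,
    # ordered by decreasing eigenvalue (eigenvalues are S**2 / (N - 1)).
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    pcs = Vt[:k].T                      # d x k matrix of principal components
    return Xc @ pcs, pcs, mean          # N x k projection

def pca_restore(X_proj, pcs, mean):
    """Map the projected points back to the original d-dimensional space."""
    return X_proj @ pcs.T + mean

# Illustrative usage (shapes only):
X = np.random.rand(100, 2500)           # e.g. 100 images of 50x50 pixels
X_proj, pcs, mean = pca_project(X, k=50)
X_rec = pca_restore(X_proj, pcs, mean)
```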
Random Projection (RP) – Idea
• PCA, even when calculated using SVD, is computationally expensive
  • Complexity is O(dcN), where d is the number of dimensions, c is the average number of non-zero entries per column and N is the number of points
• Idea: what if we randomly constructed the principal component vectors?
• Johnson–Lindenstrauss lemma
  • If points in a vector space are projected onto a randomly selected subspace of suitably high dimension, then the distances between the points are approximately preserved
Random Projection (RP) – Idea
• Use a random matrix R in place of the principal components matrix
  • R is usually Gaussian distributed
• Complexity is O(kcN)
• The generated random matrix R is usually not orthogonal
  • Making R orthogonal is computationally expensive
  • However, we can rely on a result by Hecht-Nielsen: in a high-dimensional space there exists a much larger number of almost orthogonal directions than orthogonal ones
  • Thus vectors with random directions are close enough to orthogonal
• Euclidean distances in the projected space can be scaled to the original space by the factor √(d/k) (see the sketch below)
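A minimal sketch of the projection, assuming a d × k Gaussian matrix R whose columns are normalised to unit length and the √(d/k) distance rescaling mentioned above; names and shapes are illustrative:

```python
import numpy as np

def random_projection(X, k, rng=None):
    """Project rows of X (N x d) to k dimensions with a Gaussian random matrix."""
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]
    R = rng.standard_normal((d, k))
    R /= np.linalg.norm(R, axis=0)      # unit-length random directions
    return X @ R                        # N x k

# Distances in the projected space approximate the original ones
# after multiplication by sqrt(d / k):
X = np.random.rand(100, 2500)
k = 50
Xp = random_projection(X, k)
orig = np.linalg.norm(X[0] - X[1])
approx = np.sqrt(X.shape[1] / k) * np.linalg.norm(Xp[0] - Xp[1])
print(orig, approx)                     # the two values should be close
```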
Random Projection – Simplified Random Projection (SRP)
• The random matrix is usually Gaussian distributed
  • mean: 0; standard deviation: 1
• Achlioptas showed that a much simpler distribution can be used: entries equal to +√3 with probability 1/6, 0 with probability 2/3, and −√3 with probability 1/6 (see the sketch below)
• This implies further computational savings, since the matrix is sparse and the computations can be performed using integer arithmetic
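A sketch of building an Achlioptas-style matrix with the distribution above; a real implementation would store R in a sparse format and exploit integer arithmetic rather than use a dense float array:

```python
import numpy as np

def srp_matrix(d, k, rng=None):
    """Random d x k matrix with entries +sqrt(3), 0, -sqrt(3)
    drawn with probabilities 1/6, 2/3, 1/6."""
    rng = np.random.default_rng() if rng is None else rng
    signs = rng.choice([-1.0, 0.0, 1.0], size=(d, k), p=[1/6, 2/3, 1/6])
    return np.sqrt(3.0) * signs

# Illustrative usage:
X = np.random.rand(100, 2500)
R = srp_matrix(X.shape[1], 50)
Xp = X @ R          # about two thirds of the entries of R are zero
```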
Discrete Cosine Transform (DCT)
• Widely used method for image compression (a sketch of DCT-based reduction follows below)
• Optimal for the human eye
  • Distortions are introduced at the highest frequencies, which humans tend to neglect as noise
• DCT is not data-dependent, in contrast to PCA, which needs the eigenvalue decomposition
  • This makes DCT orders of magnitude cheaper to compute
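A hedged sketch of DCT-based reduction, assuming the k retained coefficients are an m × m block of the lowest frequencies of the 2-D DCT (the paper's exact coefficient-selection scheme may differ):

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_reduce(image, m):
    """Keep only the m x m lowest-frequency 2-D DCT coefficients (k = m * m)."""
    coeffs = dctn(image, norm="ortho")
    return coeffs[:m, :m]

def dct_restore(coeffs, shape):
    """Zero-pad the retained coefficients and invert the transform."""
    full = np.zeros(shape)
    m = coeffs.shape[0]
    full[:m, :m] = coeffs
    return idctn(full, norm="ortho")

# Illustrative usage on a 50x50 image, keeping k = 100 coefficients:
image = np.random.rand(50, 50)
approx = dct_restore(dct_reduce(image, 10), image.shape)
```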
Results – Noiseless Images
• Original space: 2500-d (100 image pairs of 50×50 pixels)
• Error measurement (see the sketch below)
  • Average error of the Euclidean distance between 100 pairs of images in the original and in the reduced space
• Amount of distortion
  • RP and SRP give accurate results for very small k (k > 10); the distance scaling might be an explanation for this success
  • PCA gives accurate results for k > 600; in PCA such scaling is not straightforward
  • DCT still has a significant error even for k > 600
• Computational complexity
  • The number of floating point operations for RP and SRP is on the order of 100 times smaller than for PCA
  • RP and SRP clearly outperform PCA and DCT at the smallest dimensions
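A sketch of the kind of distance-error measurement described above, assuming the error is the average signed difference of pairwise Euclidean distances (the paper's exact definition may differ):

```python
import numpy as np

def distance_error(X, X_reduced, pairs, scale=1.0):
    """Average difference between pairwise Euclidean distances in the reduced
    space and in the original space, over the given index pairs.
    `scale` can be set to sqrt(d / k) for RP and SRP."""
    errs = []
    for i, j in pairs:
        d_orig = np.linalg.norm(X[i] - X[j])
        d_red = scale * np.linalg.norm(X_reduced[i] - X_reduced[j])
        errs.append(d_red - d_orig)
    return float(np.mean(errs))

# Illustrative usage with the random_projection() sketch from earlier:
# pairs = [(0, 1), (2, 3), ...]   # 100 randomly selected index pairs
# err = distance_error(X, random_projection(X, k), pairs, scale=np.sqrt(d / k))
```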
Results – Noisy Images
• Images were corrupted by salt-and-pepper impulse noise with probability 0.2
• The error is computed in the high-dimensional noiseless space
• RP, SRP, PCA and DCT perform quite similarly to the noiseless case
Results – Text Data
• Data set
  • Newsgroups corpus: sci.crypt, sci.med, sci.space, soc.religion
• Pre-processing
  • Term frequency vectors
  • Some common terms were removed, but no stemming was used
  • Document vectors were normalized to unit length
  • The data was not made zero mean
• Size
  • 5000 terms, 2262 newsgroup documents
• Error measurement (see the sketch below)
  • 100 pairs of documents were randomly selected, and the error between their cosine similarity before and after the dimensionality reduction was calculated
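A sketch of the cosine-error measurement described above, assuming a signed difference averaged over randomly chosen document pairs (the exact error definition is an assumption):

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def cosine_error(X, X_reduced, n_pairs=100, rng=None):
    """Average error between the cosine similarity of document pairs
    before and after dimensionality reduction."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    errs = []
    for _ in range(n_pairs):
        i, j = rng.choice(n, size=2, replace=False)
        errs.append(cosine(X_reduced[i], X_reduced[j]) - cosine(X[i], X[j]))
    return float(np.mean(errs))
```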
Results – Text Data
• The cosine was used as the similarity measure, since it is more common for this task
• RP is not as accurate as SVD
  • The Johnson–Lindenstrauss result states that Euclidean distances, not the cosine, are retained well under random projection
• The RP error may be neglected in most applications
• RP can be used on large document collections with less computational complexity than SVD
Conclusion
• Random Projection is an effective dimensionality reduction method for high-dimensional real-world data sets
• RP preserves the similarities well even when the data is projected into a moderate number of dimensions
• RP is beneficial in applications where the distances of the original space are meaningful
• RP is a good alternative to traditional dimensionality reduction methods, which are infeasible for high-dimensional data, since it does not suffer from the curse of dimensionality