PLS vs. SVD in Dimensionality Reduction
Paul Hsiung, December 3, 2002, 16-811
Problem • Curse of dimensionality • Very sparse data: a lot of 0's • Some attributes are irrelevant; others are repeated • Many machine learning algorithms are infeasible at high dimensions. • Compound dataset example…
SVD Quick Review • Find the axis with the greatest variance. • Project your data onto this axis. • Let the top n eigenvectors be the space of your new decomposed data (sketch below). [Figure: data in the x1-x2 plane with principal axes e1 and e2.]
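A minimal numpy sketch of this decomposition on hypothetical data; the matrix shapes and variable names are illustrative, not from the slides:

```python
import numpy as np

# Hypothetical data matrix: s samples x d attributes.
X = np.random.randn(500, 50)

# Center so the singular vectors line up with directions of variance.
Xc = X - X.mean(axis=0)

# Thin SVD: rows of Vt are the principal axes, ordered by decreasing variance.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Keep the top n axes and project the data onto them.
n = 10
T_svd = Xc @ Vt[:n].T    # decomposed data, shape (500, n)
```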
Partial Least Squares: Intuition 1 • SVD maximizes the variance of X; PLS maximizes the covariance of X and Y (small example below). • SVD does not factor in Y when decomposing. • A good picture would be…
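To make the contrast concrete, here is a small constructed example (the data and all names are hypothetical): it compares the leading SVD axis of X with the first PLS weight direction, which for a single output is proportional to Xᵀy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: the high-variance direction of X is irrelevant noise,
# while a low-variance attribute actually drives the output y.
noise = 10.0 * rng.standard_normal(1000)    # big variance, unrelated to y
signal = 0.5 * rng.standard_normal(1000)    # small variance, drives y
X = np.column_stack([noise, signal])
y = signal + 0.05 * rng.standard_normal(1000)

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Leading SVD axis: direction of maximum variance in X (never looks at y).
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
print("SVD axis:", np.round(Vt[0], 2))      # ~ [1, 0], points at the noise

# First PLS weight: direction of maximum covariance with y, i.e. X^T y normalized.
w1 = Xc.T @ yc
w1 /= np.linalg.norm(w1)
print("PLS axis:", np.round(w1, 2))         # ~ [0, 1], points at the signal
```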
Linear Regression • Given output data Y and input data X • Find a w such that wᵀx best approximates Y in the least-squares sense. • The magical formula for w is w = (XᵀX)⁻¹Xᵀy (sketch below). [Figure: a fitted line w in the x-y plane.]
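The closed-form solution can be checked numerically on hypothetical data; np.linalg.lstsq solves the same least-squares problem in a numerically safer way:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical regression data.
X = rng.standard_normal((200, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
y = X @ w_true + 0.1 * rng.standard_normal(200)

# The normal-equation formula: w = (X^T X)^-1 X^T y.
w_direct = np.linalg.inv(X.T @ X) @ X.T @ y

# lstsq solves the same least-squares problem more stably.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.round(w_direct, 2))   # both are close to w_true
print(np.round(w_lstsq, 2))
```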
PLS: Intuition 2 • The problem with linear regression is… • PLS does what its name says: it finds a least-squares fit, except it's partial. • As it builds B_pls, it decomposes X into T. • We can control how many dimensions T has by the number of iterations in PLS (see the sketch below).
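A sketch of one common way to carry out this iteration, a NIPALS-style PLS1 loop for a single output; the function name pls1 and its variable names are my own, not from the slides, and it assumes centered floating-point inputs. Each pass through the loop extracts one component, so the number of iterations fixes the number of columns of T.

```python
import numpy as np

def pls1(X, y, n_components):
    """PLS1 sketch: extract n_components scores from X, guided by a single output y."""
    E, f = X.copy(), y.copy()             # residuals of X and y (assumed centered)
    T, P, W, b = [], [], [], []
    for _ in range(n_components):         # one component per iteration
        w = E.T @ f                        # direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = E @ w                          # score (new coordinate) for this component
        p = E.T @ t / (t @ t)              # X loading
        bk = (t @ f) / (t @ t)             # regression coefficient for this score
        E = E - np.outer(t, p)             # deflate X
        f = f - bk * t                     # deflate y
        T.append(t); P.append(p); W.append(w); b.append(bk)
    return np.column_stack(T), np.column_stack(P), np.column_stack(W), np.array(b)
```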
PLS: Aftermath • Collect all the small t1…tn into T; same for P, B, and W. • Notice that T is s × n, and that's our decomposed dataset. • We define R = W(PᵀW)⁻¹; R will transform any X into T (T = XR). • Prediction is done by Ŷ = TBQᵀ, where Q is a column of 1's (sketch below).
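Continuing the PLS1 sketch above, the collected matrices give R and the prediction rule. The helper name pls_transform_and_predict is illustrative, and it assumes new inputs are centered the same way as the training data; with a single output, Q being a column of 1's reduces prediction to T @ b.

```python
import numpy as np

def pls_transform_and_predict(X_new, P, W, b):
    """Map new (centered) inputs into the PLS score space and predict the output."""
    R = W @ np.linalg.inv(P.T @ W)   # R = W (P^T W)^-1, so that T = X @ R
    T_new = X_new @ R                # decomposed data for the new inputs
    y_hat = T_new @ b                # single output, so Q is just a column of 1's
    return T_new, y_hat

# Usage with the pls1 sketch above (names are hypothetical):
#   T, P, W, b = pls1(X_train_centered, y_train_centered, n_components=10)
#   T_test, y_pred = pls_transform_and_predict(X_test_centered, P, W, b)
```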
Dataset • The training set is 26,000 by 6,000; the test set is 1,400 by 6,000. • Single output. • Very sparse… lots of 0's. • Used an ROC curve to rank results (example below).
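A brief sketch of ranking results with an ROC curve, using scikit-learn (the slides don't say which tool was used) on hypothetical labels and scores:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical held-out labels and predicted scores.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.60, 0.90])

fpr, tpr, thresholds = roc_curve(y_true, scores)
print("AUC:", roc_auc_score(y_true, scores))
```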
Conclusion • PLS at dim 10 is equivalent to SVD at dim 100, but SVD is slightly better at high dimensions. • PLS tends to overfit beyond dim 10. • PLS as a predictor works pretty well.