Is PCA enough? Irena Váňová
Dot product — [figure: dot products of vector B with vectors A1, A2, A3]
Perceptron algorithm
y_i ∈ {−1, +1} … labels of classes
repeat
  for i = 1...N
    if y_i (⟨w, x_i⟩ + b) ≤ 0 then
      w ← w + y_i x_i;  b ← b + y_i
    end
  end
until no sample is misclassified
[figure: two classes of points (* and o) separated by a line]
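A minimal sketch of the primal perceptron update above, assuming X is an (N, d) NumPy array, y holds labels in {−1, +1}, and the data are linearly separable (otherwise the loop only stops at max_epochs); the function name and learning-rate-free update are illustrative choices, not taken from the slides.

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    N, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(N):
            # sample i is misclassified when y_i (<w, x_i> + b) <= 0
            if y[i] * (X[i] @ w + b) <= 0:
                w = w + y[i] * X[i]
                b = b + y[i]
                mistakes += 1
        if mistakes == 0:   # repeat until no sample is misclassified
            break
    return w, b
```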
Rewritten algorithm – dual form
Finding the coefficients α_i is equivalent to finding w = Σ_i α_i y_i x_i; only the Gram matrix G_ij = ⟨x_i, x_j⟩ is needed.
repeat
  for i = 1...N
    if y_i (Σ_j α_j y_j ⟨x_j, x_i⟩ + b) ≤ 0 then
      α_i ← α_i + 1;  b ← b + y_i
    end
  end
until no sample is misclassified
• In the dual representation, the data points appear only inside dot products
• Many algorithms have a dual form
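A sketch of the dual form under the same assumptions as above: the data enter only through the precomputed Gram matrix, and α_i counts how often sample i triggered an update (perceptron_dual and alpha are my names for illustration).

```python
import numpy as np

def perceptron_dual(X, y, max_epochs=100):
    N = X.shape[0]
    G = X @ X.T            # Gram matrix of dot products G[i, j] = <x_i, x_j>
    alpha = np.zeros(N)    # dual coefficients
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(N):
            # prediction uses the data only through dot products G[:, i]
            if y[i] * (np.sum(alpha * y * G[:, i]) + b) <= 0:
                alpha[i] += 1
                b += y[i]
                mistakes += 1
        if mistakes == 0:
            break
    return alpha, b
```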
Mapping to higher dimensions
• The perceptron works only for linearly separable problems
• Mapping x → φ(x) into a higher-dimensional space can make the classes separable, but computing φ(x) explicitly is a computational problem (very large vectors)
• Kernel trick: k(x, z) = ⟨φ(x), φ(z)⟩ — compute the feature-space dot product without ever forming φ(x)
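A small numerical illustration of the computational point, assuming the degree-2 monomial feature map: the explicit map has O(d²) components, while the kernel k(x, z) = ⟨x, z⟩² gives the same value from a single d-dimensional dot product.

```python
import numpy as np

def phi2(x):
    # explicit degree-2 feature map: all products x_i * x_j
    return np.outer(x, x).ravel()

rng = np.random.default_rng(0)
x, z = rng.normal(size=5), rng.normal(size=5)
print(np.dot(phi2(x), phi2(z)))   # dot product in the high-dimensional space
print(np.dot(x, z) ** 2)          # the same number via the kernel trick
```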
Example of kernels
• Polynomial kernels: k(x, z) = (⟨x, z⟩ + c)^d
• Gaussian kernels: k(x, z) = exp(−‖x − z‖² / 2σ²) — infinite-dimensional feature space, where the data are separated by a hyperplane
• Good kernel? A kernel matrix with clear block structure between the classes
• Bad kernel! An almost diagonal kernel matrix (each point is similar only to itself)
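Hedged sketches of the two kernels named on the slide; the hyperparameters c, d and sigma are illustrative defaults, not values from the lecture.

```python
import numpy as np

def polynomial_kernel(x, z, c=1.0, d=3):
    return (np.dot(x, z) + c) ** d

def gaussian_kernel(x, z, sigma=1.0):
    # corresponds to an infinite-dimensional feature space
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))
```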
Kernel Perceptron
• We precompute the kernel matrix K_ij = k(x_i, x_j)
repeat
  for i = 1...N
    if y_i Σ_j α_j y_j K_ji ≤ 0 then
      α_i ← α_i + 1
    end
  end
until no sample is misclassified
• We are implicitly working in higher dimensions (too high?)
• Generalization problem – it is easy to overfit in high-dimensional spaces
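A sketch of the kernel perceptron: the precomputed matrix K simply replaces the Gram matrix of the dual form, so the update runs implicitly in the feature space (bias term omitted for brevity; kernel is any function like the ones sketched above).

```python
import numpy as np

def kernel_perceptron(X, y, kernel, max_epochs=100):
    N = X.shape[0]
    # precompute the kernel matrix K[i, j] = k(x_i, x_j)
    K = np.array([[kernel(X[i], X[j]) for j in range(N)] for i in range(N)])
    alpha = np.zeros(N)
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(N):
            if y[i] * np.sum(alpha * y * K[:, i]) <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha
```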
Kernel trick
• Kernel function k(x, z) = ⟨φ(x), φ(z)⟩
• Use: replacing dot products with kernels
• Implicit mapping to a feature space
• Solves the computational problem
• Can make it possible to use infinite dimensions
• Conditions: continuous, symmetric, positive definite
• The kernel matrix is an information 'bottleneck': it contains all necessary information for the learning algorithm
• It fuses information about the data AND the kernel
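A quick numerical check of the symmetry and positive-definiteness conditions on a kernel matrix built from sample data; a valid (Mercer) kernel should pass both (the function name and tolerance are illustrative).

```python
import numpy as np

def is_valid_gram(K, tol=1e-10):
    symmetric = np.allclose(K, K.T)
    # all eigenvalues of a positive (semi)definite matrix are non-negative
    eigvals = np.linalg.eigvalsh((K + K.T) / 2)
    return symmetric and eigvals.min() >= -tol
```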
PCA
• Orthogonal linear transformation
• The greatest variance goes to the first coordinate, the second greatest to the second, …
• Rotation around the mean value
• Dimensionality reduction: many dimensions usually means high correlation between them
Singular value decomposition
• X = W Σ V^T, where X is m×n, W is m×m, Σ is m×n diagonal, V is n×n
• W, V – unitary matrices (W^T W = I, V^T V = I)
• Columns of W, V? Basis vectors: eigenvectors of XX^T, resp. X^T X
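A numerical check of the claim above (the sizes m = 4, n = 6 are arbitrary): for X = W Σ V^T, the columns of W are eigenvectors of XX^T and the columns of V are eigenvectors of X^T X, with eigenvalues equal to the squared singular values.

```python
import numpy as np

X = np.random.default_rng(1).normal(size=(4, 6))       # m = 4, n = 6
W, s, Vt = np.linalg.svd(X, full_matrices=False)
print(np.allclose(X @ X.T @ W, W * s**2))              # XX^T w_k = s_k^2 w_k
print(np.allclose(X.T @ X @ Vt.T, Vt.T * s**2))        # X^T X v_k = s_k^2 v_k
```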
PCA
• Data with zero mean (X is m×n, the n samples as columns), SVD: X = W Σ V^T
• Covariance matrix (1/n) XX^T — its eigenvectors are the columns of W
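A minimal PCA-via-SVD sketch following the slide: center the data, take the SVD, and the left singular vectors ordered by singular value are the principal directions (eigenvectors of the (1/n) covariance matrix). The layout assumed here matches the slide: samples as columns of an (m, n) array.

```python
import numpy as np

def pca_svd(X, k):
    Xc = X - X.mean(axis=1, keepdims=True)     # zero-mean data
    W, s, _ = np.linalg.svd(Xc, full_matrices=False)
    components = W[:, :k]                      # first k principal directions
    projected = components.T @ Xc              # k-dimensional representation
    return components, projected
```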
Kernel PCA
• Projections of the data onto a few of the largest eigenvectors
• Equation for PCA: (1/n) XX^T w_k = λ_k w_k
• Kernel function: K_ij = k(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩
• Equation for high-dimensional PCA: K α^k = n λ_k α^k, with the eigenvector v^k = Σ_i α_i^k φ(x_i)
• We don't know the eigenvector v^k explicitly – only the vector of numbers α^k which identifies it
• Projection onto the k-th eigenvector: ⟨v^k, φ(x)⟩ = Σ_i α_i^k k(x_i, x)
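A kernel PCA sketch along these lines: we never form the feature-space eigenvector itself, only the coefficient vector α^k, and projections are read off through the kernel matrix. Centering of K is included because the feature-space data must have zero mean; the normalization gives unit-length eigenvectors in feature space.

```python
import numpy as np

def kernel_pca(K, n_components):
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one     # center the kernel matrix
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    alphas, lambdas = eigvecs[:, order], eigvals[order]
    # scale so each implicit feature-space eigenvector has unit length
    alphas = alphas / np.sqrt(np.maximum(lambdas, 1e-12))
    projections = Kc @ alphas                      # projections of the training data
    return alphas, projections
```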
When something goes wrong: PCA is blind to class labels – the directions of greatest variance need not separate the classes.
LDA
• Fundamental assumption: normal distribution
• First case: same covariance matrix for all classes, full rank
LDA
• Fundamental assumption: normal distribution
• Relaxed case: only full rank is required
• Kernel variant exists
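A two-class Fisher LDA sketch under the slide's assumptions (Gaussian classes with a shared, full-rank covariance): the discriminant direction is w = S_w⁻¹(μ₁ − μ₀), where S_w is the within-class scatter matrix. The function name and the 0/1 label encoding are illustrative.

```python
import numpy as np

def lda_direction(X, y):
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # within-class scatter matrix S_w = S_0 + S_1
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    return np.linalg.solve(Sw, mu1 - mu0)
```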
Example: LDA
• Face recognition – eigenfaces