Introduction to Kernel Principal Component Analysis (PCA) Mohammed Nasser Dept. of Statistics, RU, Bangladesh Email: mnasser.ru@gmail.com
Contents • Basics of PCA • Application of PCA in Face Recognition • Some Terms in PCA • Motivation for KPCA • Basics of KPCA • Applications of KPCA
High-dimensional Data • Gene expression • Face images • Handwritten digits
Why Feature Reduction? • Most machine learning and data mining techniques may not be effective for high-dimensional data • Curse of Dimensionality • Query accuracy and efficiency degrade rapidly as the dimension increases. • The intrinsic dimension may be small. • For example, the number of genes responsible for a certain type of disease may be small.
Why Reduce Dimensionality? • Reduces time complexity: less computation • Reduces space complexity: fewer parameters • Saves the cost of observing the features • Simpler models are more robust on small datasets • More interpretable; simpler explanation • Data visualization (structure, groups, outliers, etc.) if plotted in 2 or 3 dimensions
Feature reduction algorithms • Unsupervised • Latent Semantic Indexing (LSI): truncated SVD • Independent Component Analysis (ICA) • Principal Component Analysis (PCA) • Canonical Correlation Analysis (CCA) • Supervised • Linear Discriminant Analysis (LDA) • Semi-supervised • Research topic
Algebraic derivation of PCs • Main steps for computing PCs • Form the covariance matrix S. • Compute its eigenvectors: a1, a2, ..., ap, ordered by decreasing eigenvalue. • Use the first d eigenvectors to form the d PCs. • The transformation G is given by G = [a1, a2, ..., ad].
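As a sketch of these steps, a minimal NumPy implementation (the function name pca_transform and the row-per-observation layout are assumptions, not part of the slides):

```python
import numpy as np

def pca_transform(X, d):
    """X: (n_samples, n_features) data matrix; returns the d PC scores and the map G."""
    Xc = X - X.mean(axis=0)                    # center the data
    S = np.cov(Xc, rowvar=False)               # form the covariance matrix S
    eigvals, eigvecs = np.linalg.eigh(S)       # eigh: S is symmetric
    order = np.argsort(eigvals)[::-1]          # sort by decreasing eigenvalue
    G = eigvecs[:, order[:d]]                  # G = [a1, a2, ..., ad]
    return Xc @ G, G
```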
Optimality property of PCA • [Diagram: original data → dimension reduction → reconstruction]
Optimality property of PCA • Main theoretical result: the matrix G consisting of the first d eigenvectors of the covariance matrix S solves the minimization problem min_G Σi ||xi − G Gᵀ xi||², subject to Gᵀ G = Id (the reconstruction error). • PCA projection minimizes the reconstruction error among all linear projections of size d.
Dimensionality Reduction • One approach to dealing with high-dimensional data is to reduce its dimensionality. • Project the high-dimensional data onto a lower-dimensional sub-space using linear or non-linear transformations.
Dimensionality Reduction • Linear transformations are simple to compute and tractable. • Classical linear approaches: • Principal Component Analysis (PCA) • Fisher Discriminant Analysis (FDA) • Singular Value Decomposition (SVD) • Factor Analysis (FA) • Canonical Correlation Analysis (CCA) • A linear map y = W x reduces x (d × 1) to y (k × 1) via W (k × d), with k << d.
Principal Component Analysis (PCA) • Each dimensionality reduction technique finds an appropriate transformation by satisfying certain criteria (e.g., information loss, data discrimination, etc.) • The goal of PCA is to reduce the dimensionality of the data while retaining as much as possible of the variation present in the dataset.
Principal Component Analysis (PCA) • Find a basis in a low-dimensional sub-space: • Approximate vectors by projecting them onto a low-dimensional sub-space: • (1) Original space representation: x = b1 u1 + b2 u2 + ... + bN uN • (2) Lower-dimensional sub-space representation: x̂ = b1 u1 + b2 u2 + ... + bK uK • Note: if K = N, then x̂ = x
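A small NumPy check of these representations, using an illustrative random data set; it confirms that the K-term expansion gives an approximation x̂ and that K = N recovers x exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
x = X[0]

# Orthonormal basis u_1, ..., u_N from the covariance eigenvectors of the centered data
Xc = X - X.mean(axis=0)
_, U = np.linalg.eigh(np.cov(Xc, rowvar=False))
U = U[:, ::-1]                                 # columns ordered by decreasing eigenvalue

b = U.T @ (x - X.mean(axis=0))                 # coefficients b_1, ..., b_N

K = 2
x_hat = X.mean(axis=0) + U[:, :K] @ b[:K]      # lower-dimensional (K-term) representation
x_full = X.mean(axis=0) + U @ b                # K = N: full representation
print(np.allclose(x_full, x))                  # True: with K = N the original x is recovered
```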
Principal Component Analysis (PCA) • Example (K=N):
Principal Component Analysis (PCA) • Methodology • Suppose x1, x2, ..., xM are N × 1 vectors
Principal Component Analysis (PCA) • Methodology – cont.
Principal Component Analysis (PCA) • Linear transformation implied by PCA • The linear transformation RN → RK that performs the dimensionality reduction is y = Uᵀ(x − x̄), where U = [u1 u2 ... uK].
Principal Component Analysis (PCA) • How many principal components to keep? • To choose K, you can use the following criterion: (λ1 + λ2 + ... + λK) / (λ1 + λ2 + ... + λN) > threshold (e.g., 0.9 or 0.95). • Unfortunately, for some data sets, to meet this requirement we need K almost equal to N. That is, no effective data reduction is possible.
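A hedged helper that applies this criterion to a list of eigenvalues (the name choose_k and the 0.95 default are illustrative):

```python
import numpy as np

def choose_k(eigvals, threshold=0.95):
    """Smallest K whose leading eigenvalues explain at least `threshold` of the variance."""
    eigvals = np.sort(np.asarray(eigvals))[::-1]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(ratio, threshold) + 1)
```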
Principal Component Analysis (PCA) • Eigenvalue spectrum • [Scree plot: eigenvalues λi plotted against index i, from λ1 down to λN; K is chosen where the spectrum levels off]
Principal Component Analysis (PCA) • Standardization • The principal components are dependent on the units used to measure the original variables as well as on the range of values they assume. • We should always standardize the data prior to using PCA. • A common standardization method is to transform all the data to have zero mean and unit standard deviation: x' = (x − x̄) / σ.
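A one-line sketch of this standardization in NumPy (assuming columns are variables and no column is constant):

```python
import numpy as np

def standardize(X):
    """Transform each column of X to zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
```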
CS 479/679 Pattern Recognition – Spring 2006 • Dimensionality Reduction Using PCA/LDA • Chapter 3 (Duda et al.), Section 3.8 • Case Studies: Face Recognition Using Dimensionality Reduction • M. Turk, A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, 3(1), pp. 71-86, 1991. • D. Swets, J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8), pp. 831-836, 1996. • A. Martinez, A. Kak, "PCA versus LDA", IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2), pp. 228-233, 2001.
Principal Component Analysis (PCA) • Face Recognition • The simplest approach is to think of it as a template matching problem. • Problems arise when performing recognition in a high-dimensional space. • Significant improvements can be achieved by first mapping the data into a lower-dimensional space. • How to find this lower-dimensional space?
Principal Component Analysis (PCA) • Main idea behind eigenfaces • [Figure: the average face computed from the training images]
Principal Component Analysis (PCA) • Computation of the eigenfaces
Principal Component Analysis (PCA) • Computation of the eigenfaces – cont.
Principal Component Analysis (PCA) • Computation of the eigenfaces – cont. • Note that each eigenface ui is normalized to unit length.
Principal Component Analysis (PCA) • Computation of the eigenfaces – cont.
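A hedged NumPy sketch of the usual eigenface computation via the small M × M matrix AᵀA (the Turk-Pentland trick); the function name and array layout are assumptions rather than the slides' notation:

```python
import numpy as np

def eigenfaces(faces, K):
    """faces: (M, N) array of M flattened training faces with N pixels each, M << N."""
    mean_face = faces.mean(axis=0)
    A = (faces - mean_face).T                  # N x M matrix of mean-subtracted faces
    # Eigenvectors of the small M x M matrix A^T A instead of the huge N x N matrix A A^T
    vals, V = np.linalg.eigh(A.T @ A)
    order = np.argsort(vals)[::-1][:K]
    U = A @ V[:, order]                        # corresponding eigenvectors of A A^T (the eigenfaces u_i)
    U /= np.linalg.norm(U, axis=0)             # normalize each u_i to unit length
    return mean_face, U
```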
Principal Component Analysis (PCA) • Representing faces in this basis
Principal Component Analysis (PCA) • Representing faces in this basis – cont.
Principal Component Analysis (PCA) • Face Recognition Using Eigenfaces
Principal Component Analysis (PCA) • Face Recognition Using Eigenfaces – cont. • The distance er is called the distance within the face space (difs). • Comment: we can use the common Euclidean distance to compute er; however, it has been reported that the Mahalanobis distance performs better, weighting each coefficient by the inverse of its eigenvalue: er² = Σi (1/λi)(wi − wiᵏ)².
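A minimal sketch of recognition in face space with the eigenvalue-weighted (Mahalanobis-style) distance; the argument names (gallery_weights, eigvals) are illustrative assumptions:

```python
import numpy as np

def recognize(probe, mean_face, U, eigvals, gallery_weights):
    """Return the index of the nearest gallery face in face space and the distance e_r."""
    w = U.T @ (probe - mean_face)              # project the probe onto the eigenfaces
    diffs = gallery_weights - w                # (n_gallery, K) differences of weight vectors
    d2 = np.sum(diffs**2 / eigvals, axis=1)    # Mahalanobis-style weighting by 1/lambda_i
    best = int(np.argmin(d2))
    return best, float(np.sqrt(d2[best]))
```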
Principal Component Analysis (PCA) • Face Detection Using Eigenfaces
Principal Component Analysis (PCA) • Face Detection Using Eigenfaces – cont.
Principal Components Analysis • So, principal components are given by: • b1 = u11x1 + u12x2 + ... + u1NxN • b2 = u21x1 + u22x2 + ... + u2NxN • ... • bN = uN1x1 + uN2x2 + ... + uNNxN • xj's are standardized if the correlation matrix is used (mean 0.0, SD 1.0) • Score of ith unit on jth principal component: bi,j = uj1xi1 + uj2xi2 + ... + ujNxiN
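A short NumPy sketch that computes these scores (the function name pc_scores and the use_correlation flag are illustrative):

```python
import numpy as np

def pc_scores(X, use_correlation=True):
    """Scores b_ij = sum_k u_jk x_ik; standardize first if the correlation matrix is used."""
    Z = X - X.mean(axis=0)
    if use_correlation:
        Z = Z / X.std(axis=0, ddof=1)
    vals, U = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(vals)[::-1]
    return Z @ U[:, order]                     # column j holds the scores on the j-th PC
```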
PCA Scores • [Figure: observation i plotted in the original axes (xi1, xi2) and its coordinates (bi,1, bi,2) on the rotated principal-component axes]
Principal Components Analysis • Amount of variance accounted for by: • 1st principal component, λ1, 1st eigenvalue • 2nd principal component, λ2, 2nd eigenvalue • ... • λ1 > λ2 > λ3 > λ4 > ... • Average λj = 1 (correlation matrix)
Principal Components Analysis: Eigenvalues • [Figure: data scatter with the first principal axis u1; λ1 and λ2 are the variances along the first and second principal axes]
PCA: Terminology • jth principal component is the jth eigenvector of the correlation/covariance matrix • coefficients, ujk, are elements of the eigenvectors and relate the original variables (standardized if using the correlation matrix) to the components • scores are values of units on the components (produced using the coefficients) • amount of variance accounted for by a component is given by its eigenvalue, λj • proportion of variance accounted for by a component is given by λj / Σ λj • loading of the kth original variable on the jth component is given by ujk √λj: the correlation between the variable and the component
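A hedged sketch computing the quantities in this terminology list for correlation-matrix PCA (the function name is illustrative):

```python
import numpy as np

def pca_terminology(X):
    """Eigenvalues, proportion of variance, and loadings for correlation-matrix PCA."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, U = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, U = eigvals[order], U[:, order]
    proportion = eigvals / eigvals.sum()                  # lambda_j / sum of lambdas
    loadings = U * np.sqrt(np.clip(eigvals, 0.0, None))   # u_jk * sqrt(lambda_j)
    return eigvals, proportion, loadings
```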
Principal Components Analysis • Covariance Matrix: • Variables must be in same units • Emphasizes variables with most variance • Mean eigenvalue ≠ 1.0 • Useful in morphometrics, a few other cases • Correlation Matrix: • Variables are standardized (mean 0.0, SD 1.0) • Variables can be in different units • All variables have same impact on analysis • Mean eigenvalue = 1.0
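A small illustration of the difference, assuming synthetic variables on very different scales; the covariance eigenvalues are dominated by the large-variance variables, while the correlation eigenvalues average to 1.0:

```python
import numpy as np

rng = np.random.default_rng(0)
# Four variables measured on very different scales
X = rng.normal(size=(100, 4)) * np.array([1.0, 10.0, 0.1, 5.0])

cov_vals = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
cor_vals = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

print(cov_vals)                      # dominated by the large-variance variables
print(cor_vals, cor_vals.mean())     # mean eigenvalue is 1.0 for the correlation matrix
```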
PCA: Potential Problems • Lack of Independence • NO PROBLEM • Lack of Normality • Normality desirable but not essential • Lack of Precision • Precision desirable but not essential • Many Zeroes in Data Matrix • Problem (use Correspondence Analysis)
Principal Component Analysis (PCA) • PCA and classification (cont’d)
Motivation • Linear projections will not detect the pattern.
Limitations of linear PCA • λ1 = λ2 = λ3 = 1/3: all the eigenvalues are equal, so linear PCA finds no preferred direction and cannot reduce the dimension.
Nonlinear PCA • Three popular methods are available: • 1) Neural-network based PCA (E. Oja, 1982) • 2) Method of Principal Curves (T. J. Hastie and W. Stuetzle, 1989) • 3) Kernel based PCA (B. Schölkopf, A. Smola, and K. Müller, 1998)
[Figure: comparison of linear PCA and nonlinear PCA (NPCA) projections of the same data]
A Useful Theorem for Hilbert Space • Let H be a Hilbert space and x1, ..., xn ∈ H. Let M = span{x1, ..., xn}, and let u, v ∈ M. If ⟨xi, u⟩ = ⟨xi, v⟩ for i = 1, ..., n, then u = v. • Proof: Try it yourself. (Hint: u − v ∈ M is orthogonal to every xi, hence to all of M, so u − v = 0.)
Kernel Methods in PCA • Linear PCA solves the eigenvalue problem Cv = λv (1), where C = (1/M) Σj xj xjᵀ is the covariance matrix of the centered data X. • Since every eigenvector v with λ ≠ 0 lies in span{x1, ..., xM}, (1) is equivalent to λ⟨xk, v⟩ = ⟨xk, Cv⟩ for all k = 1, ..., M (2). • (1) and (2) are equivalent conditions (by the theorem above).
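A minimal NumPy sketch of kernel PCA with an RBF kernel, following the centered-kernel eigenproblem that conditions (1)-(2) lead to; the function name kernel_pca and the gamma parameter are illustrative assumptions:

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=1.0):
    """Project the training data onto the leading kernel principal components."""
    # RBF (Gaussian) kernel matrix K_ij = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    # Center K in feature space: Kc = K - 1n K - K 1n + 1n K 1n
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Eigendecomposition of the centered kernel matrix
    # (the leading eigenvalues are assumed to be positive)
    eigvals, alphas = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, alphas = eigvals[order], alphas[:, order]
    # Normalize the coefficient vectors so that lambda_k * <alpha_k, alpha_k> = 1
    alphas = alphas / np.sqrt(eigvals)
    # Projections of the training points onto the kernel principal components
    return Kc @ alphas
```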