Feature Extraction Speaker: 虞台文
Content • Principal Component Analysis (PCA) • Factor Analysis • Fisher’s Linear Discriminant Analysis • Multiple Discriminant Analysis
Feature Extraction Principal Component Analysis (PCA)
Principal Component Analysis • It is a linear procedure that finds the direction in input space along which most of the energy (variance) of the input lies. • Feature Extraction • Dimension Reduction • It is also called the (discrete) Karhunen-Loève transform, or the Hotelling transform.
The Basic Concept Assume the data x (a random vector) has zero mean. PCA finds a unit vector w that captures the largest amount of variance of the data. That is, it maximizes E[(w^T x)^2] = w^T E[x x^T] w subject to ||w|| = 1.
The Method Covariance matrix: C = E[x x^T] (x is zero mean). Remark: C is symmetric and positive semidefinite.
The Method Maximize w^T C w subject to w^T w = 1. The method of Lagrange multipliers: define L(w, λ) = w^T C w - λ(w^T w - 1). The extreme point, say w*, satisfies ∂L/∂w = 0.
The Method Maximize w^T C w subject to w^T w = 1. Setting ∂L/∂w = 2Cw - 2λw = 0 gives Cw = λw.
Discussion At the extreme points Cw = λw, so w is an eigenvector of C and λ is its corresponding eigenvalue. • Let w1, w2, …, wd be the eigenvectors of C whose corresponding eigenvalues are λ1 ≥ λ2 ≥ … ≥ λd. • They are called the principal components of C. • Their significance can be ordered according to their eigenvalues.
Discussion (cont.) • Since C is symmetric and positive semidefinite, its eigenvectors are orthogonal. • They therefore form a basis of the feature space. • For dimensionality reduction, keep only a few of them (see the sketch below).
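The eigendecomposition procedure above can be sketched in a few lines of numpy. The toy data, sample size, and the choice k = 2 below are illustrative, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[5.0, 2.0, 1.0],
                            cov=[[4.0, 1.5, 0.5],
                                 [1.5, 2.0, 0.3],
                                 [0.5, 0.3, 0.5]],
                            size=500)          # 500 samples, d = 3

Xc = X - X.mean(axis=0)                        # zero-mean data
C = Xc.T @ Xc / len(Xc)                        # covariance matrix C = E[x x^T]

# Eigen-decomposition; eigh returns eigenvalues in ascending order,
# so reverse to get lambda_1 >= lambda_2 >= ... >= lambda_d.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                          # keep the k most significant components
W = eigvecs[:, :k]                             # d x k projection matrix
Z = Xc @ W                                     # reduced-dimension representation
print(eigvals)                                 # variance captured by each component
```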
Applications • Image Processing • Signal Processing • Compression • Feature Extraction • Pattern Recognition
Example Projecting the data onto the most significant axis will facilitate classification. This also achieves dimensionality reduction.
Issues • PCA is effective for identifying the directions of largest variance in the multivariate signal distribution, hence it is good for signal reconstruction. • But it may be inappropriate for pattern classification: the most significant component obtained using PCA need not coincide with the most significant component for classification.
Whitening • Whitening is a process that transforms a random vector, say x = (x1, x2, …, xn)T (assumed to have zero mean), into z = (z1, z2, …, zn)T with zero mean and identity covariance. • z is said to be white or sphered. • This implies that all of its elements are uncorrelated. • However, this doesn't imply that its elements are independent.
Whitening Transform Let V be a whitening transform; then Cov(Vx) = V Cx V^T = I. Decompose Cx as Cx = E D E^T. Clearly, D is a diagonal matrix and E is an orthonormal matrix. Set V = D^(-1/2) E^T.
Whitening Transform If V is a whitening transform and U is any orthonormal matrix, show that UV, i.e., a rotation of V, is also a whitening transform. Proof) (UV) Cx (UV)^T = U (V Cx V^T) U^T = U I U^T = U U^T = I.
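A small numerical check of the whitening transform just described: it builds V = D^(-1/2) E^T from the eigendecomposition of Cx, as on the slide, and also verifies that UV whitens for an arbitrary orthonormal U. The toy covariance and sample size are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0, 0],
                            [[3.0, 1.0, 0.5],
                             [1.0, 2.0, 0.2],
                             [0.5, 0.2, 1.0]], size=2000)
Xc = X - X.mean(axis=0)
Cx = np.cov(Xc, rowvar=False)

# Decompose Cx = E D E^T and set V = D^(-1/2) E^T.
d, E = np.linalg.eigh(Cx)
V = np.diag(d ** -0.5) @ E.T

Z = Xc @ V.T                                       # whitened data
print(np.round(np.cov(Z, rowvar=False), 2))        # approximately the identity

# Any rotation of a whitening transform is still a whitening transform.
U, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # random orthonormal matrix
Z2 = Xc @ (U @ V).T
print(np.round(np.cov(Z2, rowvar=False), 2))       # also approximately the identity
```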
Why Whitening? • With PCA, we usually choose several major eigenvectors as the basis for representation. • This basis is efficient for reconstruction, but may be inappropriate for other applications, e.g., classification. • By whitening, we can rotate the basis to get more interesting features.
Feature Extraction Factor Analysis
What is a Factor? • If several variables correlate highly, they might measure aspects of a common underlying dimension. • These dimensions are called factors. • Factors are classification axes along which the measures can be plotted. • The greater the loading of variables on a factor, the more that factor explains the intercorrelations between those variables.
Graph Representation (axes: Quantitative Skill (F1) and Verbal Skill (F2), with variables plotted by their loadings between -1 and +1)
What is Factor Analysis? • A method for investigating whether a number of variables of interest Y1, Y2, …, Yn are linearly related to a smaller number of unobservable factors F1, F2, …, Fm. • For data reduction and summarization. • A statistical approach to analyze interrelationships among a large number of variables and to explain these variables in terms of their common underlying dimensions (factors).
Example What factors influence students' grades? The grades are observable data; the factors, e.g., quantitative skill and verbal skill, are unobservable.
The Model y = B f + ε, where y is the observation vector, B the factor-loading matrix, f the factor vector, and ε the Gaussian-noise vector.
The Model Cy = E[y y^T] = B B^T + Q. The right-hand side B B^T + Q can be obtained from the model; the left-hand side Cy can be estimated from data.
The Model Var(yi) = Σk bik^2 + qi. The communality Σk bik^2 is the part of the variance explained by the common factors; the specific variance qi is the unexplained part.
Example A numerical Cy decomposed as Cy = B B^T + Q.
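To make Cy = BB^T + Q concrete, here is a minimal simulation sketch of the model y = Bf + ε; the loading matrix, specific variances, and sample size are made-up values for illustration, not the slide's numbers:

```python
import numpy as np

rng = np.random.default_rng(2)
n_vars, n_factors, n_obs = 5, 2, 50_000

B = np.array([[0.9, 0.0],        # hypothetical factor-loading matrix (n x m)
              [0.8, 0.1],
              [0.7, 0.2],
              [0.1, 0.8],
              [0.0, 0.9]])
Q = np.diag([0.2, 0.3, 0.4, 0.3, 0.2])           # specific (noise) variances

f   = rng.standard_normal((n_obs, n_factors))    # factors, covariance = I
eps = rng.standard_normal((n_obs, n_vars)) * np.sqrt(np.diag(Q))
Y   = f @ B.T + eps                              # model: y = B f + eps

Cy_sample = np.cov(Y, rowvar=False)
Cy_model  = B @ B.T + Q
print(np.round(Cy_sample - Cy_model, 2))         # close to the zero matrix
```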
Goal Our goal is to minimize the discrepancy between the sample covariance Cy and the model covariance B B^T + Q. Hence, B and Q are chosen so that Cy ≈ B B^T + Q.
Uniqueness Is the solution unique? No: there are an infinite number of solutions, since if B* is a solution and T is an orthonormal transformation (rotation), then B*T is also a solution.
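The rotational indeterminacy is easy to check numerically; in this sketch the loading matrix and the rotation angle are arbitrary choices:

```python
import numpy as np

B = np.array([[0.9, 0.0],
              [0.8, 0.1],
              [0.1, 0.8]])
theta = 0.7
T = np.array([[np.cos(theta), -np.sin(theta)],   # any orthonormal (rotation) matrix
              [np.sin(theta),  np.cos(theta)]])

B_rot = B @ T
# Both loading matrices imply exactly the same common-variance part B B^T.
print(np.allclose(B @ B.T, B_rot @ B_rot.T))     # True
```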
Example Two different loading matrices B can reproduce the same Cy. Which one is better?
Example Left: each factor has nonzero loadings for all variables. Right: each factor controls different variables.
The Method • Determine the first set of loadings using the principal-component method (see the sketch below).
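A minimal sketch of that principal-component step, assuming the usual construction B = [sqrt(λ1) e1, …, sqrt(λm) em] from the leading eigenpairs of Cy; the function name and the toy covariance matrix are illustrative:

```python
import numpy as np

def pca_loadings(Cy: np.ndarray, m: int) -> np.ndarray:
    """First set of factor loadings via the principal-component method:
    columns are sqrt(lambda_k) * e_k for the m leading eigenpairs of Cy."""
    eigvals, eigvecs = np.linalg.eigh(Cy)
    order = np.argsort(eigvals)[::-1][:m]
    return eigvecs[:, order] * np.sqrt(eigvals[order])

# Toy covariance matrix (illustrative numbers only).
Cy = np.array([[1.0, 0.6, 0.5, 0.1],
               [0.6, 1.0, 0.4, 0.2],
               [0.5, 0.4, 1.0, 0.1],
               [0.1, 0.2, 0.1, 1.0]])
B0 = pca_loadings(Cy, m=2)
Q0 = np.diag(np.diag(Cy - B0 @ B0.T))   # specific variances on the diagonal
print(np.round(B0, 2))
```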
Example Principal-component loadings computed from a sample covariance matrix Cy.
Factor Rotation Given a factor-loading matrix B and an orthonormal rotation matrix T, the rotated loadings are BT.
Factor Rotation Criteria: • Varimax • Quartimax • Equimax • Orthomax • Oblimin
Varimax Criterion: maximize V = Σk [ (1/n) Σj bjk^4 - ( (1/n) Σj bjk^2 )^2 ], the variance of the squared loadings within each column, subject to T^T T = I, where the bjk are the entries of the rotated loading matrix BT and n is the number of variables.
Varimax Criterion: maximize V subject to T^T T = I. Construct the Lagrangian by adjoining the orthonormality constraint on T with Lagrange multipliers.
Varimax (derivation) Working column by column, the derivation defines auxiliary quantities dk and cjk from the loadings bjk for the kth column of the rotated loading matrix; the criterion V reaches its maximum once the resulting stationarity conditions are satisfied.
Varimax (iterative procedure) Goal: maximize V. • Initially, obtain B0 by whatever method, e.g., PCA, and set T0 as the initial approximation to the rotation matrix, e.g., T0 = I. • Iteratively execute the following: evaluate the quantities needed for the update (this requires the current rotated loadings B1), find an updated rotation such that the criterion increases, pre-multiplying each side by its transpose to keep the rotation orthonormal, and repeat; stop when the criterion no longer improves.
Varimax The criterion V is then maximized with respect to the rotation, restated in terms of the current loadings for the iteration.
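The slides derive the rotation via a Lagrangian and an iterative update; the sketch below implements a widely used SVD-based iterative varimax (Kaiser-style), which follows the same pattern of starting from B0 and T0 = I and improving T until the criterion stops increasing, but may differ in detail from the derivation above. The function name and toy loadings are illustrative:

```python
import numpy as np

def varimax(B0: np.ndarray, gamma: float = 1.0, max_iter: int = 100, tol: float = 1e-8):
    """Iterative varimax rotation (SVD-based update).

    Starts from initial loadings B0 and T = I and repeatedly updates the
    orthonormal rotation T so that the varimax criterion of B0 @ T increases.
    Returns the rotated loadings and the rotation matrix T.
    """
    n, m = B0.shape
    T = np.eye(m)
    prev = 0.0
    for _ in range(max_iter):
        B = B0 @ T
        # Gradient-like term of the varimax criterion with respect to the rotation.
        G = B0.T @ (B ** 3 - (gamma / n) * B @ np.diag(np.sum(B ** 2, axis=0)))
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                        # nearest orthonormal matrix to G
        if s.sum() < prev * (1.0 + tol):  # stop once the criterion no longer improves
            break
        prev = s.sum()
    return B0 @ T, T

# Usage: rotate a small set of initial loadings (illustrative numbers).
B0 = np.array([[0.7, 0.5],
               [0.8, 0.4],
               [0.3, 0.8],
               [0.2, 0.9]])
B_rotated, T = varimax(B0)
print(np.round(B_rotated, 2))   # loadings concentrate on one factor per variable
print(np.round(T.T @ T, 2))     # T stays orthonormal
```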
Feature Extraction Fisher’s Linear Discriminant Analysis
Main Concept • PCA seeks directions that are efficient for representation. • Discriminant analysis seeks directions that are efficient for discrimination.
Criterion (Two-Category) Project the data onto a direction w with ||w|| = 1; the two classes have means m1 and m2.
Between-Class Scatter The between-class scatter of the projected data is (w^T m1 - w^T m2)^2 = w^T SB w, where SB = (m1 - m2)(m1 - m2)^T is the between-class scatter matrix and ||w|| = 1. The larger, the better.
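A small numerical sketch of the two-category between-class scatter; the toy classes and candidate directions are illustrative. It confirms that the projected scatter w^T SB w is largest when w points along the difference of the class means:

```python
import numpy as np

rng = np.random.default_rng(3)
X1 = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 0.5]], size=200)   # class 1
X2 = rng.multivariate_normal([3, 1], [[1.0, 0.3], [0.3, 0.5]], size=200)   # class 2

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_B = np.outer(m1 - m2, m1 - m2)          # between-class scatter matrix

def projected_scatter(w):
    w = w / np.linalg.norm(w)             # enforce ||w|| = 1
    return float(w @ S_B @ w)             # (w^T m1 - w^T m2)^2

print(projected_scatter(np.array([1.0, 0.0])))   # scatter along the x-axis
print(projected_scatter(m1 - m2))                # largest: along the mean difference
```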