Discriminant Analysis • Introduction ✔ • Two-group discriminant analysis • Mahalanobis approach • Linear discriminant analysis • Discussions on LDA
Potential Applications One of the main methods for feature extraction in pattern recognition
Discriminant Analysis • Introduction ✔ • Two-group discriminant analysis ✔ • Mahalanobis approach • Linear discriminant analysis • Discussions on LDA
Ronald Aylmer Fisher (1890-1962) • Brilliant biologist • Development of methods suitable for small samples • Discovery of the precise distributions of many sample statistics • Invention of analysis of variance
Fisher’s approach (1936) Choose $k$ to maximize $L = \frac{k' C_b k}{k' C_w k}$, where $C_b = dd'$, $d = \mu_2 - \mu_1$ is the vector describing the difference between the two group means, and $C_w$ is the pooled within-group covariance matrix of $X$. Solution: $k \propto C_w^{-1} d$
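A minimal sketch of this solution in NumPy (the synthetic data and variable names are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal(loc=0.0, size=(50, 3))    # class-1 samples (N1 x p)
X2 = rng.normal(loc=1.0, size=(60, 3))    # class-2 samples (N2 x p)

d = X2.mean(axis=0) - X1.mean(axis=0)     # d = mu2 - mu1

# Pooled within-group covariance C_w
C1 = np.cov(X1, rowvar=False)
C2 = np.cov(X2, rowvar=False)
Cw = ((len(X1) - 1) * C1 + (len(X2) - 1) * C2) / (len(X1) + len(X2) - 2)

k = np.linalg.solve(Cw, d)                # k proportional to Cw^{-1} d
scores = np.concatenate([X1, X2]) @ k     # 1-D discriminant scores
```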
Discriminant Analysis • Introduction ✔ • Two-group discriminant analysis ✔ • Mahalanobis approach ✔ • Linear discriminant analysis • Discussions on LDA
Discriminant Analysis • Introduction ✔ • Two-group discriminant analysis ✔ • Mahalanobis approach ✔ • Linear discriminant analysis ✔ • Discussions on LDA
Linear discriminant analysis • Fisher vector [1936] • Linear discriminant analysis [1962] • Foley-Sammon optimal discriminant vectors [1975] • Uncorrelated optimal discriminant vectors [1999]
Dimensionality problem For a fixed sample size, the accuracy of a statistical pattern classifier increases as the number of features grows, then decreases once the number of features becomes too large; the peak defines an optimal dimensionality $d_{opt}$ (Hughes, 1968).
Feature selection and feature extraction • Reducing the number of features • Increasing the discriminant information of the features
Fisher vector In 1936, Fisher proposed to construct a 1-D feature space by projecting the high-dimensional feature vector onto a discriminant direction (vector). Fisher criterion: $J_F(\varphi) = \frac{\varphi' S_b \varphi}{\varphi' S_w \varphi}$
Fisher vector (cont’d) Fisher criterion: $J_F(\varphi) = \frac{\varphi' S_b \varphi}{\varphi' S_w \varphi}$, where $S_b$ is the between-class scatter matrix and $S_w$ is the within-class scatter matrix
Fisher vector (cont’d) Let $X_{ij}$ be the $j$th sample of class $\omega_i$ ($i=1,\dots,L$, $j=1,\dots,N_i$) and $N = \sum_i N_i$. Then $m_i = \frac{1}{N_i}\sum_{j=1}^{N_i} X_{ij}$, $m = \frac{1}{N}\sum_{i=1}^{L} N_i m_i$, $S_w = \sum_{i=1}^{L}\sum_{j=1}^{N_i} (X_{ij} - m_i)(X_{ij} - m_i)'$, $S_b = \sum_{i=1}^{L} N_i (m_i - m)(m_i - m)'$
Fisher vector (cont’d) The Fisher vector is the eigenvector corresponding to the maximum eigenvalue of the generalized eigen-equation $S_b \varphi = \lambda S_w \varphi$ • Problem • One discriminant vector is not enough.
Linear discriminant analysis (LDA) Linear transformation from $n$ dimensions to $m$ dimensions ($m < n$): $Y = W'X$, $W \in \mathbb{R}^{n \times m}$. Discriminant classifiability criterion: $J(W) = \operatorname{tr}\!\left((W' S_w W)^{-1} (W' S_b W)\right)$
Linear discriminant analysis (LDA) (cont’d) Wilks (1962): • $m = L-1$ for L-class problems • The optimal transformation matrix is composed of the $(L-1)$ eigenvectors of the matrix $S_w^{-1} S_b$ corresponding to its largest eigenvalues Question: can we extract more than $(L-1)$ discriminant vectors for L-class problems?
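A minimal sketch of this construction in Python (the function name lda_transform and the data layout are illustrative assumptions): build $S_w$ and $S_b$ as defined above, solve the generalized eigenproblem, and keep the $(L-1)$ leading eigenvectors.

```python
import numpy as np
from scipy.linalg import eigh

def lda_transform(X, y):
    """LDA sketch: X is (N x n) data, y holds integer class labels."""
    classes = np.unique(y)
    n = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((n, n))
    Sb = np.zeros((n, n))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)            # within-class scatter
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)          # between-class scatter
    # Generalized eigenproblem Sb v = lambda Sw v (eigenvalues ascending)
    vals, vecs = eigh(Sb, Sw)
    W = vecs[:, ::-1][:, :len(classes) - 1]      # (L-1) leading eigenvectors
    return X @ W, W
```

Solving the symmetric-definite pair $(S_b, S_w)$ directly is numerically preferable to forming $S_w^{-1} S_b$ explicitly; it assumes $S_w$ is nonsingular, i.e. enough samples per dimension.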
Foley-Sammon optimal discriminant vectors [1975] Let $\varphi_1$ be the Fisher vector. Suppose $r$ directions $(\varphi_1, \varphi_2, \dots, \varphi_r)$ have been obtained. We can obtain the $(r+1)$th direction $\varphi_{r+1}$, which maximizes the Fisher criterion function subject to the orthogonality constraints $\varphi_{r+1}' \varphi_i = 0$, $i = 1, \dots, r$
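One way to realize this procedure (a sketch, assuming $S_b$ and $S_w$ are available, e.g. from the lda_transform sketch above): restrict the Fisher criterion to the orthogonal complement of the directions found so far and solve the reduced eigenproblem there.

```python
import numpy as np
from scipy.linalg import eigh, null_space

def fsodv(Sb, Sw, num_vectors):
    """Foley-Sammon-style discriminant directions, computed one at a time."""
    n = Sb.shape[0]
    Phi = np.zeros((n, 0))                     # columns = directions found so far
    for _ in range(num_vectors):
        # Basis of the subspace orthogonal to all previous directions
        Q = np.eye(n) if Phi.shape[1] == 0 else null_space(Phi.T)
        # Maximize the Fisher criterion within that subspace
        vals, vecs = eigh(Q.T @ Sb @ Q, Q.T @ Sw @ Q)
        phi = Q @ vecs[:, -1]                  # map back to the original space
        Phi = np.hstack([Phi, (phi / np.linalg.norm(phi))[:, None]])
    return Phi
```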
Uncorrelated optimal discriminant vectors [1999] Let $\varphi_1$ be the Fisher vector. Suppose $r$ directions $(\varphi_1, \varphi_2, \dots, \varphi_r)$ have been obtained. We can obtain the $(r+1)$th direction $\varphi_{r+1}$, which maximizes the Fisher criterion function subject to the conjugate orthogonality constraints $\varphi_{r+1}' S_t \varphi_i = 0$, $i = 1, \dots, r$, where $S_t = S_w + S_b$ is the total scatter matrix
Uncorrelated optimal discriminant vectors [cont’d] • UODV is shown to be much more powerful than FSODV • Arguments • From 1985 to 1991, Okada, Tomita, Hamamoto, Kanaoka, et al. claimed that the orthogonal set of discriminant vectors is more powerful. • In 1977, Kittler proposed a method based on conjugate orthogonality constraints, which was shown to be more powerful than FSODV [1975].
Uncorrelated optimal discriminant vectors [cont’d] • There are (L-1) uncorrelated optimal discriminant vectors for L-class problems • UODV is equivalent to LDA (Pattern Recognition, 34(10):2041-2047, 2001)
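A quick numerical illustration of this equivalence, reusing the hypothetical lda_transform sketch above: the LDA directions should satisfy the conjugate orthogonality constraints $\varphi_i' S_t \varphi_j = 0$ ($i \ne j$), so the matrix printed below should be numerically diagonal.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 3-class data in 4 dimensions, 40 samples per class
means = np.array([[0, 0, 0, 0], [2, 0, 1, 0], [0, 2, 0, 1]], dtype=float)
X = np.vstack([m + rng.normal(size=(40, 4)) for m in means])
y = np.repeat([0, 1, 2], 40)

_, W = lda_transform(X, y)
Xc = X - X.mean(axis=0)
St = Xc.T @ Xc                       # total scatter S_t = S_w + S_b
print(np.round(W.T @ St @ W, 6))     # off-diagonal entries ~ 0
```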
Uncorrelated optimal discriminant vectors [cont’d] Significance • A better understanding of LDA • A link between two discriminant criteria: $J_F(\varphi) = \frac{\varphi' S_b \varphi}{\varphi' S_w \varphi}$ and $J(W) = \operatorname{tr}\!\left((W' S_w W)^{-1} (W' S_b W)\right)$
[Figures: research-impact statistics for the face recognition field, from http://sciencewatch.com/ana/st/face/]
Problems with LDA • When the number of classes L is large, the usable dimensionality may be much less than $(L-1)$. More work is needed to clarify the relationship among the dimensionality, the size of the database, etc. • LDA can fail even for simple class configurations. More work is needed to combine discriminant analysis with unsupervised clustering techniques.
Key assumptions for LDA • Each class has a mean vector around which its samples are distributed • All classes have similar covariance matrices • Do these assumptions always hold?
PCA mixture model + LDA Problem: estimating class distributions in a high-dimensional space using only a small number of samples.
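A simplified sketch of this combination using scikit-learn, with a single global PCA standing in for the full PCA mixture model (the data, names, and the choice of 30 components are illustrative assumptions): reducing dimensionality first keeps the scatter estimates from few samples well conditioned.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
# Hypothetical small-sample, high-dimensional data: 3 classes, 20 samples each, 200 dims
X = np.vstack([rng.normal(loc=i, size=(20, 200)) for i in range(3)])
y = np.repeat([0, 1, 2], 20)

# PCA reduces 200 dims to 30 before LDA estimates its scatter matrices
model = make_pipeline(PCA(n_components=30), LinearDiscriminantAnalysis())
model.fit(X, y)
print(model.score(X, y))   # training accuracy of the PCA+LDA pipeline
```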