
Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis



Presentation Transcript


  1. Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis Haesun Park Georgia Institute of Technology, Atlanta, GA, USA (joint work with C. Park) KAIST, Korea, June 2007

  2. Clustering

  3. Clustering • Grouping of data based on similarity measures

  4. Classification • Assign a class label to new, unseen data

  5. Data Mining • Mining or discovery of new information (patterns or rules) from large databases • Data preparation and data reduction: dimension reduction, feature selection, feature extraction (preprocessing) • Main tasks: association analysis, regression, probabilistic modeling, classification, clustering, …

  6. Feature Extraction • Optimal feature extraction: reduce the dimensionality of the data space and minimize the effects of redundant features and noise (the curse of dimensionality as the number of features grows) • After feature extraction, apply a classifier to predict the class label of new data

  7. Linear dimension reduction Maximize class separability in the reduced dimensional space

  8. Linear dimension reduction Maximize class separability in the reduced dimensional space

  9. What if the data is not linearly separable? → Nonlinear Dimension Reduction

  10. Contents • Linear Discriminant Analysis • Nonlinear Dimension Reduction based on Kernel Methods - Nonlinear Discriminant Analysis • Application to Fingerprint Classification

  11. Linear Discriminant Analysis (LDA) • For a given data set {a1, …, an} partitioned into k classes, let c(i) denote the centroid of class i and c the global centroid • Within-class scatter matrix: Sw = Σi Σ{j in class i} (aj − c(i))(aj − c(i))^T; the within-class scatter is measured by trace(Sw)

  12. Between-class scatter matrix: Sb = Σi ni (c(i) − c)(c(i) − c)^T, where ni is the size of class i; the between-class scatter is measured by trace(Sb) • A linear transformation G^T maps a1, …, an to G^T a1, …, G^T an • Goal: find G that maximizes trace(G^T Sb G) while minimizing trace(G^T Sw G)

  13. Eigenvalue problem • The columns of G are the leading eigenvectors of Sw^{-1} Sb, i.e. solutions of Sw^{-1} Sb x = λ x • Since rank(Sb) ≤ number of classes − 1, at most k − 1 eigenvectors are needed
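
To make slides 11–13 concrete, here is a minimal sketch (not the authors' implementation) of classical LDA with NumPy/SciPy: it builds Sw and Sb from labeled data and takes the leading eigenvectors of Sw^{-1} Sb as the columns of G. Names such as `X`, `y`, and `lda_transform` are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def lda_transform(X, y, dim=None):
    """Classical LDA: X is (n_samples, n_features), y holds class labels."""
    classes = np.unique(y)
    n, d = X.shape
    c = X.mean(axis=0)                      # global centroid
    Sw = np.zeros((d, d))                   # within-class scatter
    Sb = np.zeros((d, d))                   # between-class scatter
    for k in classes:
        Xk = X[y == k]
        ck = Xk.mean(axis=0)                # class centroid
        Sw += (Xk - ck).T @ (Xk - ck)
        Sb += len(Xk) * np.outer(ck - c, ck - c)
    # Solve the generalized eigenvalue problem Sb x = lambda Sw x,
    # equivalent to Sw^{-1} Sb x = lambda x when Sw is nonsingular.
    evals, evecs = eigh(Sb, Sw)
    order = np.argsort(evals)[::-1]         # largest eigenvalues first
    if dim is None:
        dim = len(classes) - 1              # rank(Sb) <= #classes - 1
    G = evecs[:, order[:dim]]
    return X @ G, G
```

When Sw is singular (the undersampled case of slide 16), the call to `eigh(Sb, Sw)` breaks down; that is exactly the situation the generalized LDA algorithms of slides 24–26 address.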

  14. Face Recognition • Dimension reduction to maximize the distances among classes • Each 92 × 112 face image is a 10304-dimensional vector that is reduced by G^T before matching

  15. Text Classification • Bag of words: each document is represented by the frequencies of the words it contains (e.g., for an Education class: faculty, student, syllabus, grade, tuition, …; for a Recreation class: movie, music, sport, Hollywood, theater, …) • The term-frequency vectors are reduced by G^T

  16. Generalized LDA Algorithms • Undersampled problems: high dimensionality and a small number of data points • Sw is singular, so Sw^{-1} Sb cannot be computed

  17. Nonlinear Dimension Reduction based on Kernel Methods

  18. Nonlinear Dimension Reduction • Map the data by a nonlinear mapping, then apply a linear dimension reduction G^T in the mapped space

  19. Kernel Method • If a kernel function k(x,y) satisfies Mercer's condition, then there exists a mapping Φ for which <Φ(x), Φ(y)> = k(x,y) holds, so inner products <x, y> in the input space A are replaced by <Φ(x), Φ(y)> = k(x,y) in the feature space Φ(A) • For a finite data set A = [a1, …, an], Mercer's condition can be rephrased as: the kernel matrix K = [k(ai, aj)], 1 ≤ i, j ≤ n, is positive semi-definite

  20. Nonlinear Dimension Reduction by Kernel Methods • Given a kernel function k(x,y), carry out the linear dimension reduction G^T implicitly in the feature space induced by k

  21. Positive Definite Kernel Functions • Gaussian kernel: k(x, y) = exp(−‖x − y‖² / σ²) • Polynomial kernel: k(x, y) = (<x, y> + c)^d
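
A minimal sketch of the two kernels on slide 21 and the kernel matrix of slide 19; the σ, degree, and offset parameters are illustrative choices, not values from the experiments.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / sigma^2), evaluated for all pairs of rows."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / sigma**2)

def polynomial_kernel(X, Y, degree=2, c=1.0):
    """k(x, y) = (<x, y> + c)^degree, evaluated for all pairs of rows."""
    return (X @ Y.T + c) ** degree

# Kernel matrix K = [k(a_i, a_j)] for a finite data set A (slide 19);
# Mercer's condition means K is positive semi-definite.
A = np.random.randn(5, 3)
K = gaussian_kernel(A, A, sigma=2.0)
assert np.all(np.linalg.eigvalsh(K) > -1e-10)   # numerically PSD
```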

  22. Nonlinear Discriminant Analysis using Kernel Methods • Map {a1, a2, …, an} to {Φ(a1), …, Φ(an)} and apply LDA in the feature space, using <Φ(x), Φ(y)> = k(x,y) • The problem becomes the generalized eigenvalue problem Sb^Φ x = λ Sw^Φ x for the scatter matrices of the mapped data

  23. Nonlinear Discriminant Analysis using Kernel Methods • The mapped data {Φ(a1), …, Φ(an)} are accessed only through the kernel matrix K = [k(ai, aj)], 1 ≤ i, j ≤ n • The feature-space problem Sb^Φ x = λ Sw^Φ x is rewritten as Sb u = λ Sw u, where the scatter matrices are now built from the columns of K • Apply generalized LDA algorithms to this problem
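
To make slides 22–23 concrete, here is a hedged sketch of kernel discriminant analysis: LDA is run on the columns of the kernel matrix K, which corresponds to running it on the mapped data Φ(a1), …, Φ(an). The small ridge term stands in for whichever generalized LDA algorithm is applied on slide 23, since the kernelized Sw is singular; it is not the authors' LDA/GSVD implementation.

```python
import numpy as np
from scipy.linalg import eigh

def kernel_discriminant(K, y, reg=1e-3, dim=None):
    """Kernel LDA on an (n, n) kernel matrix K with labels y.

    Returns U such that a column k_x = [k(a_1, x), ..., k(a_n, x)]^T
    is reduced to U.T @ k_x.
    """
    classes = np.unique(y)
    n = K.shape[0]
    c = K.mean(axis=1)                        # centroid of all kernel columns
    Sw = np.zeros((n, n))
    Sb = np.zeros((n, n))
    for lab in classes:
        Kk = K[:, y == lab]                   # kernel columns of one class
        ck = Kk.mean(axis=1)                  # class centroid
        diff = Kk - ck[:, None]
        Sw += diff @ diff.T
        Sb += Kk.shape[1] * np.outer(ck - c, ck - c)
    # Sw built from kernel columns is singular; regularize (RLDA-style stand-in).
    evals, evecs = eigh(Sb, Sw + reg * np.eye(n))
    if dim is None:
        dim = len(classes) - 1
    return evecs[:, np.argsort(evals)[::-1][:dim]]
```

A new point x is then represented by the column k_x = [k(a1, x), …, k(an, x)]^T and reduced with U^T k_x before classification.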

  24. Generalized LDA Algorithms • Minimize trace(x^T Sw x): x^T Sw x = 0 when x ∈ null(Sw) • Maximize trace(x^T Sb x): x^T Sb x ≠ 0 when x ∈ range(Sb)

  25. Generalized LDA Algorithms • RLDA: add a positive diagonal matrix λI to Sw so that Sw + λI is nonsingular • LDA/GSVD: apply the generalized singular value decomposition (GSVD) to {Hb, Hw}, where Sb = Hb Hb^T and Sw = Hw Hw^T • To-N(Sw): project onto the null space of Sw and maximize the between-class scatter in the projected space
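
A minimal sketch of the RLDA idea on this slide, in the notation of slides 11–13: the only change from classical LDA is replacing Sw with Sw + λI so that the generalized eigenproblem is solvable even in the undersampled case. The value of λ is a tuning parameter here, not one taken from the experiments.

```python
import numpy as np
from scipy.linalg import eigh

def rlda(Sw, Sb, lam=1e-2, dim=None):
    """Regularized LDA: leading eigenvectors of (Sw + lam*I)^{-1} Sb."""
    d = Sw.shape[0]
    evals, evecs = eigh(Sb, Sw + lam * np.eye(d))   # Sb x = mu (Sw + lam*I) x
    order = np.argsort(evals)[::-1]
    dim = dim if dim is not None else np.linalg.matrix_rank(Sb)
    return evecs[:, order[:dim]]
```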

  26. Generalized LDA Algorithms • To-R(Sb): transform to the range space of Sb, then diagonalize the within-class scatter matrix in the transformed space • To-NR(Sw): reduce the data dimension by PCA, then maximize the between-class scatter in both range(Sw) and null(Sw)
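
For the projection-type methods, here is a hedged sketch of the To-N(Sw) variant from the previous slide: restrict to null(Sw), then maximize the between-class scatter there. The rank tolerance and function name are illustrative.

```python
import numpy as np

def null_space_lda(Sw, Sb, dim, tol=1e-10):
    """To-N(Sw): project onto null(Sw), then take top eigenvectors of Sb there."""
    # Orthonormal basis N of null(Sw) from the eigen-decomposition of Sw.
    w, V = np.linalg.eigh(Sw)
    N = V[:, w < tol * max(w.max(), 1.0)]
    # Between-class scatter restricted to null(Sw); maximize it there.
    wb, Vb = np.linalg.eigh(N.T @ Sb @ N)
    return N @ Vb[:, np.argsort(wb)[::-1][:dim]]
```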

  27. Data Sets (from the Machine Learning Repository)

  Data      Dim   No. of data   No. of classes
  Musk      166   6599          2
  Isolet    617   7797          26
  Car       6     1728          4
  Mfeature  649   2000          10
  Bcancer   9     699           2
  Bscale    4     625           3

  28. Experimental Settings • Split the original data into training data and test data • Choose a kernel function k and compute a dimension-reducing linear transformation G^T from the training data • Predict the class labels of the test data using the training data
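
A hedged sketch of the pipeline on this slide, reusing the illustrative `gaussian_kernel` and `kernel_discriminant` helpers defined in the earlier sketches; the nearest-centroid classifier is one reasonable choice for the prediction step, not necessarily the one used in the experiments.

```python
import numpy as np

def evaluate(X, y, sigma=2.0, test_fraction=0.3, seed=0):
    """Split, reduce dimension with kernel discriminant analysis, classify."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(test_fraction * len(X))
    te, tr = idx[:n_test], idx[n_test:]

    K_tr = gaussian_kernel(X[tr], X[tr], sigma)      # kernel matrix on training data
    U = kernel_discriminant(K_tr, y[tr])             # learn the reduction
    Z_tr = K_tr @ U                                  # reduced training data
    Z_te = gaussian_kernel(X[te], X[tr], sigma) @ U  # reduced test data

    # Nearest-centroid prediction in the reduced space.
    cents = {c: Z_tr[y[tr] == c].mean(axis=0) for c in np.unique(y[tr])}
    pred = np.array([min(cents, key=lambda c: np.linalg.norm(z - cents[c]))
                     for z in Z_te])
    return np.mean(pred == y[te])                    # prediction accuracy
```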

  29. Prediction Accuracies • [Chart of prediction accuracies for the different methods; each color represents a different data set]

  30. Linear and Nonlinear Discriminant Analysis • [Chart comparing linear and nonlinear discriminant analysis across the data sets]

  31. Face Recognition

  32. Application of Nonlinear Discriminant Analysis to Fingerprint Classification

  33. Fingerprint Classification • Five classes: left loop, right loop, whorl, arch, tented arch • Images from NIST Fingerprint Database 4

  34. Previous Work in Fingerprint Classification • Feature representations: minutiae, Gabor filtering, directional partitioning • Classifiers: neural networks, support vector machines, probabilistic neural networks • Our approach: construct core directional images by DFT, then reduce the dimension by nonlinear discriminant analysis

  35. Construction of Core Directional Images • [Example directional images: left loop, right loop, whorl]

  36. Construction of Core Directional Images • [Directional image with the detected core point marked]

  37. Discrete Fourier transform (DFT)

  38. Discrete Fourier transform (DFT)

  39. Construction of Directional Images • Computation of local dominant directions by DFT and directional filtering • Core point detection • Reconstruction of core directional images • Fast computation of DFT by FFT • Reliable for low quality images
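
The first step listed above is the computation of local dominant directions by DFT; the sketch below shows one plausible way to read a dominant ridge direction off the 2-D FFT of a local block (the peak spatial frequency is perpendicular to the ridge orientation). This is an illustrative reconstruction under that assumption, not the paper's directional-filtering procedure.

```python
import numpy as np

def dominant_direction(block):
    """Estimate the dominant ridge direction (radians) of a local image block."""
    F = np.fft.fftshift(np.fft.fft2(block - block.mean()))
    mag = np.abs(F)
    mag[mag.shape[0] // 2, mag.shape[1] // 2] = 0.0    # suppress the DC term
    r, c = np.unravel_index(np.argmax(mag), mag.shape)
    fy, fx = r - mag.shape[0] // 2, c - mag.shape[1] // 2
    # Ridges run perpendicular to the dominant spatial frequency vector.
    return (np.arctan2(fy, fx) + np.pi / 2) % np.pi
```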

  40. Computation of local dominant directions by DFT and directional filtering

  41. Construction of Directional Images • [A 512 × 512 fingerprint image is reduced to a 105 × 105 core directional image]

  42. Nonlinear Discriminant Analysis • Each 105 × 105 core directional image is an 11025-dimensional vector • G^T reduces it to a 4-dimensional space (number of classes − 1 for the five classes: left loop, right loop, whorl, arch, tented arch), maximizing class separability in the reduced dimensional space

  43. Comparison of Experimental Results • NIST Database 4, prediction accuracies (%) at different rejection rates

  Rejection rate (%)          0      1.8    8.5    20.0
  Nonlinear LDA/GSVD          90.7   91.3   92.8   95.3
  PCASYS+                     89.7   90.5   92.8   95.6
  Jain et al. [1999, TPAMI]   -      90.0   91.2   93.5
  Yao et al. [2003, PR]       -      90.0   92.2   95.6

  44. Summary • Nonlinear Feature Extraction based on Kernel Methods - Nonlinear Discriminant Analysis - Kernel Orthogonal Centroid Method (KOC) • A comparison of Generalized Linear and Nonlinear Discriminant Analysis Algorithms • Application to Fingerprint Classification

  45. Dimension Reduction • Feature transformation: each new feature is a linear combination of the original features • Feature selection: select a subset of the original features (e.g., gene selection in gene expression microarray data analysis) • Visualization of high-dimensional data; visual data mining

  46. Core Point Detection • θi,j: dominant direction in the neighborhood centered at (i, j) • Consistency of the local dominant directions is measured by | Σi,j=−1,0,1 [cos(2θi,j), sin(2θi,j)] |, the distance from the starting point to the finishing point of the chained direction vectors • The location with the lowest value is taken as the core point
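
A direct sketch of the consistency measure on this slide: for each candidate location, sum the vectors [cos 2θ, sin 2θ] over the 3 × 3 neighborhood of local dominant directions and take the norm; the core point is the location where this value is lowest. `theta` is assumed to be a 2-D array of dominant directions.

```python
import numpy as np

def core_point(theta):
    """Return the (row, col) whose 3x3 neighborhood has the least consistent directions."""
    best, best_val = None, np.inf
    for i in range(1, theta.shape[0] - 1):
        for j in range(1, theta.shape[1] - 1):
            nb = theta[i - 1:i + 2, j - 1:j + 2]        # 3x3 neighborhood
            v = np.array([np.cos(2 * nb).sum(), np.sin(2 * nb).sum()])
            val = np.linalg.norm(v)                     # | sum of direction vectors |
            if val < best_val:
                best, best_val = (i, j), val
    return best
```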

  47. References • L. Chen et al., A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition, 33:1713-1726, 2000 • P. Howland et al., Structure preserving dimension reduction for clustered text data based on the generalized singular value decomposition, SIMAX, 25(1):165-179, 2003 • H. Yu and J. Yang, A direct LDA algorithm for high-dimensional data with application to face recognition, Pattern Recognition, 34:2067-2070, 2001 • J. Yang and J.-Y. Yang, Why can LDA be performed in PCA transformed space?, Pattern Recognition, 36:563-566, 2003 • H. Park et al., Lower dimensional representation of text data based on centroids and least squares, BIT Numerical Mathematics, 43(2):1-22, 2003 • S. Mika et al., Fisher discriminant analysis with kernels, Neural Networks for Signal Processing IX, J. Larsen and S. Douglas (eds.), pp. 41-48, IEEE, 1999 • B. Scholkopf et al., Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, 10:1299-1319, 1998 • G. Baudat and F. Anouar, Generalized discriminant analysis using a kernel approach, Neural Computation, 12:2385-2404, 2000 • V. Roth and V. Steinhage, Nonlinear discriminant analysis using kernel functions, Advances in Neural Information Processing Systems, 12:568-574, 2000

  48. S.A. Billings and K.L. Lee, Nonlinear Fisher discriminant analysis using a minimum squared error cost function and the orthogonal least squares algorithm, Neural Networks, 15(2):263-270, 2002 • C.H. Park and H. Park, Nonlinear discriminant analysis based on generalized singular value decomposition, SIMAX, 27(1):98-102, 2005 • A.K. Jain et al., A multichannel approach to fingerprint classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(4):348-359, 1999 • Y. Yao et al., Combining flat and structural representations for fingerprint classification with recursive neural networks and support vector machines, Pattern Recognition, 36(2):397-406, 2003 • C.H. Park and H. Park, Nonlinear feature extraction based on centroids and kernel functions, Pattern Recognition, 37(4):801-810 • C.H. Park and H. Park, A comparison of generalized LDA algorithms for undersampled problems, Pattern Recognition, to appear • C.H. Park and H. Park, Fingerprint classification using fast Fourier transform and nonlinear discriminant analysis, Pattern Recognition, 2006
