A Fast Iterative Algorithm for Fisher Discriminant using Heterogeneous Kernels Glenn Fung, Murat Dundar, Bharat Rao and Jinbo Bi Computer Aided Diagnosis, Siemens Medical Solutions
Outline • Linear Fisher’s Discriminant (LFD) • Traditional Formulation • Mathematical Programming Formulation • Kernel Fisher’s Discriminant • Automatic heterogeneous kernel selection Formulation • Automatic kernel selection KFD Algorithm • Numerical experience • Publicly available datasets • Real life CAD Colon Cancer detection dataset • Conclusions and Outlook
Linear Fisher’s Discriminant (LFD) Taken from: http://espresso.ee.sun.ac.za/~schwardt/
Notation • Given $m$ points in $n$-dimensional space, represented by an $m \times n$ matrix $A$ • Membership of each data point (row of $A$) in class $+1$ or $-1$ • We want to find a separating hyperplane $x^{\top} w = \gamma$ such that points in class $+1$ satisfy $x^{\top} w > \gamma$ and points in class $-1$ satisfy $x^{\top} w < \gamma$
LFD Classical Formulation We want to find the projection $w$ that maximizes the quotient $$J(w) = \frac{w^{\top} S_B\, w}{w^{\top} S_W\, w},$$ where $$S_B = (\mu_{+1} - \mu_{-1})(\mu_{+1} - \mu_{-1})^{\top}, \qquad S_W = \sum_{c \in \{+1,-1\}} (A_c - e\,\mu_c^{\top})^{\top}(A_c - e\,\mu_c^{\top})$$ are the between- and within-class scatter matrices, $\mu_c = \frac{1}{m_c} A_c^{\top} e$ is the mean of class $c$, and $e$ is an $m_c$-dimensional vector of ones.
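As a concrete illustration, the maximizer of the Fisher quotient has the well-known closed form $w \propto S_W^{-1}(\mu_{+1} - \mu_{-1})$, which can be sketched in a few lines of NumPy (the synthetic data and variable names here are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical 2-class data: rows of A are points, labels d in {+1, -1}.
rng = np.random.default_rng(0)
A = np.vstack([rng.normal(0.0, 1.0, (20, 2)),
               rng.normal(3.0, 1.0, (20, 2))])
d = np.array([+1] * 20 + [-1] * 20)

mu_pos = A[d == +1].mean(axis=0)
mu_neg = A[d == -1].mean(axis=0)

# Within-class scatter S_W: sum of centered cross-products per class.
Sw = sum((A[d == c] - A[d == c].mean(axis=0)).T
         @ (A[d == c] - A[d == c].mean(axis=0)) for c in (+1, -1))

# The Fisher quotient is maximized by w proportional to S_W^{-1}(mu+ - mu-).
w = np.linalg.solve(Sw, mu_pos - mu_neg)
```

Projecting the data onto `w` then separates the two class means as far as the within-class scatter allows.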
LFD Mathematical programming formulation • The LFD problem can also be formulated as a quadratic program (QP): $$\min_{w} \; w^{\top}(S_W + \lambda I)\,w \quad \text{s.t.} \quad w^{\top}(\mu_{+1} - \mu_{-1}) = 2.$$ • The variable $\lambda$ is a positive constant introduced in (Mika et al., 2000) to address the problem of ill-conditioning of the estimated covariance matrices.
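A minimal sketch of why the $\lambda I$ term matters: with more features than training points, $S_W$ is rank-deficient, so only the regularized system is solvable (the data and the value of $\lambda$ below are illustrative):

```python
import numpy as np

# Hypothetical ill-conditioned case: 10 points in 20 dimensions,
# so the within-class scatter S_W is necessarily singular.
rng = np.random.default_rng(1)
A = rng.normal(size=(10, 20))
d = np.array([+1] * 5 + [-1] * 5)

mu_diff = A[d == +1].mean(axis=0) - A[d == -1].mean(axis=0)
Sw = sum((A[d == c] - A[d == c].mean(axis=0)).T
         @ (A[d == c] - A[d == c].mean(axis=0)) for c in (+1, -1))

# Adding lambda * I (Mika et al., 2000) makes the system positive definite.
lam = 1e-2
w = np.linalg.solve(Sw + lam * np.eye(20), mu_diff)
```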
Kernel Fisher Discriminant (KFD) • From the KKT conditions of the LFD mathematical programming formulation we obtain the relation $w = A^{\top} v$, i.e., $w$ is a linear combination of the training points. • Thus, substituting $w = A^{\top} v$, the formulation depends on the data only through the inner products $A A^{\top}$. • Applying the "kernel trick", we replace $A A^{\top}$ by the kernel matrix $K = K(A, A^{\top})$.
Kernel Functions • The nonlinear separating surface is given by $K(x^{\top}, A^{\top})\, v = \gamma$. • Commonly used kernels: • Gaussian: $K(x, y) = \exp(-\mu \|x - y\|^2)$ • Polynomial: $K(x, y) = (x^{\top} y + 1)^d$ • It is well known that kernels are very powerful but difficult to choose and tune for a specific classification task.
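The two kernel forms above can be sketched as plain NumPy functions; the default parameter values (`mu`, `degree`) are illustrative, not values used in the paper:

```python
import numpy as np

def gaussian_kernel(X, Y, mu=0.5):
    # K_ij = exp(-mu * ||x_i - y_j||^2), computed without explicit loops.
    sq = (np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-mu * sq)

def polynomial_kernel(X, Y, degree=2):
    # K_ij = (x_i . y_j + 1)^degree
    return (X @ Y.T + 1.0) ** degree
```

Both return the full $m \times m$ Gram matrix when called as `kernel(A, A)`.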
Heterogeneous Kernels • Instead of pre-selecting a suitable kernel, the problem of choosing a kernel can be incorporated into the optimization problem. • In this work we consider nonnegative linear combinations of kernels belonging to a given family $S = \{K_1, \dots, K_k\}$ of positive semidefinite kernels: $$K = \sum_{j=1}^{k} a_j K_j, \qquad a_j \ge 0.$$ • The set $S$ can be seen as a predefined set of initial "guesses" of the kernel matrix. • Note that $S$ could contain very different kernel models, e.g., linear, Gaussian, polynomial, all with different parameter values.
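A nonnegative combination of precomputed PSD kernel matrices can be sketched as below; the result is again PSD because each weighted term is PSD (the helper name is illustrative):

```python
import numpy as np

def combined_kernel(kernels, a):
    """Nonnegative linear combination K = sum_j a_j * K_j of precomputed
    PSD kernel matrices; the combination is again PSD."""
    a = np.asarray(a, dtype=float)
    assert np.all(a >= 0), "kernel weights must be nonnegative"
    return sum(aj * Kj for aj, Kj in zip(a, kernels))
```

For example, mixing a linear Gram matrix and a Gaussian Gram matrix of the same data with weights `[0.5, 0.25]` yields a valid kernel matrix.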
Heterogeneous Kernels KFD Formulation • Instead of pre-selecting and tuning the kernel, we optimize the set of values $a_j$ in order to obtain a PSD linear combination of elements of $S$. • For capacity control we add an extra regularization term for the coefficients $a_j$. • The problem becomes a KFD problem in which the kernel matrix $K = \sum_j a_j K_j$ is itself part of the optimization. • The resulting formulation is a convex optimization problem in each block of variables separately ($v$ for fixed $a$, and $a$ for fixed $v$).
Heterogeneous kernels KFD as a biconvex program (I). • The optimization can be seen as a biconvex program of the form $$\min_{v,\; a \ge 0} \; f(v, a),$$ where $f$ is convex in $v$ for each fixed $a$, and convex in $a$ for each fixed $v$.
Heterogeneous kernels KFD as a biconvex program (II). • When $a$ is fixed, the problem becomes an unconstrained quadratic optimization problem in $v$, which can be solved via a simple system of linear equations.
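A sketch of this fixed-kernel step, assuming the standard KFD linear system $(N + \lambda I)\,v = M_{+1} - M_{-1}$ with kernel class means $M_c$ and within-class kernel scatter $N$ (the paper's exact matrices may differ in detail):

```python
import numpy as np

def kfd_direction(K, d, lam=1e-3):
    """Fixed-kernel KFD step: solve (N + lam*I) v = M_pos - M_neg,
    where M_c are kernel class means and N is the within-class
    kernel scatter. A sketch of the standard KFD linear system."""
    m = K.shape[0]
    pos, neg = d == +1, d == -1
    M_pos = K[:, pos].mean(axis=1)
    M_neg = K[:, neg].mean(axis=1)
    N = np.zeros((m, m))
    for mask in (pos, neg):
        Kc = K[:, mask]
        mc = int(mask.sum())
        # Centering matrix I - (1/mc) * ones removes the class mean.
        N += Kc @ (np.eye(mc) - np.full((mc, mc), 1.0 / mc)) @ Kc.T
    return np.linalg.solve(N + lam * np.eye(m), M_pos - M_neg)
```

One call replaces the whole inner optimization over $v$ with a single $m \times m$ linear solve.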
Heterogeneous kernels KFD as a biconvex program (III). • When $v$ is fixed, the problem becomes a constrained quadratic optimization problem (QP) in only the $k$ nonnegative kernel weights $a$.
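This small QP can be handled by any off-the-shelf QP solver; as an illustrative choice (not necessarily the solver used in the paper), here is a generic projected-gradient sketch for $\min_a \tfrac{1}{2} a^{\top} H a + c^{\top} a$ subject to $a \ge 0$:

```python
import numpy as np

def solve_nonneg_qp(H, c, steps=500, lr=None):
    """Minimize 0.5 * a'Ha + c'a subject to a >= 0 by projected
    gradient descent; H must be PSD. A generic sketch for the
    small k-variable QP, not the paper's specific solver."""
    k = H.shape[0]
    if lr is None:
        # Step size 1/L with L the largest eigenvalue of H.
        lr = 1.0 / (np.linalg.norm(H, 2) + 1e-12)
    a = np.full(k, 1.0 / k)
    for _ in range(steps):
        # Gradient step, then projection onto the nonnegative orthant.
        a = np.maximum(0.0, a - lr * (H @ a + c))
    return a
```

Because $k$ is tiny (the number of candidate kernels), this step is very cheap compared to the $m \times m$ linear solve.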
Heterogeneous KFD algorithm Input: training data $A$, labels $d$, kernel family $S = \{K_1, \dots, K_k\}$, initial weights $a^0$. For $i = 0, 1, 2, \dots$ until convergence: • Calculate $K^i = \sum_{j=1}^{k} a^i_j K_j$. • Solve the unconstrained convex QP (a linear system) for $v^{i+1}$. • Solve the second convex QP (in the $k$ weights) for $a^{i+1}$. Output: the pair $(v, a)$ defining the classifier.
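The loop above can be sketched end-to-end as follows. This uses an illustrative least-squares surrogate objective (a regularized fit of the labels), not the paper's exact formulation, purely to show the alternation between the linear-system step and the small nonnegative QP:

```python
import numpy as np

def alternating_heterogeneous_fit(kernels, d, lam=1e-3, iters=5):
    """Illustrative alternating-optimization sketch (surrogate objective,
    not the paper's exact one): fix a -> solve a linear system for v;
    fix v -> solve a small nonnegative QP in the k kernel weights."""
    k, m = len(kernels), kernels[0].shape[0]
    a = np.full(k, 1.0 / k)
    v = np.zeros(m)
    for _ in range(iters):
        K = sum(aj * Kj for aj, Kj in zip(a, kernels))
        # Step 1 (a fixed): unconstrained convex QP -> one linear solve.
        v = np.linalg.solve(K + lam * np.eye(m), d.astype(float))
        # Step 2 (v fixed): min_a ||P a - d||^2 with a >= 0,
        # where column j of P is K_j @ v; solved by projected gradient.
        P = np.column_stack([Kj @ v for Kj in kernels])
        H, c = P.T @ P, -P.T @ d
        lr = 1.0 / (np.linalg.norm(H, 2) + 1e-12)
        for _ in range(200):
            a = np.maximum(0.0, a - lr * (H @ a + c))
    return a, v
```

Each outer iteration costs one $m \times m$ solve plus a $k$-variable QP, which matches the cheap per-iteration structure described above.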
Heterogeneous KFD algorithm: Convergence • The Heterogeneous KFD algorithm can be seen as an Alternating Optimization (AO) procedure (fuzzy c-means clustering is another example of an AO problem). • Our algorithm inherits the convergence properties and characteristics of AO methods. • Local q-linear convergence; in practice it is very fast and converges in a few iterations. • It can converge to a saddle point (a local minimizer in a subset of the variables), but this is very unlikely to happen in practice.
Siemens Colon CAD System • Colorectal cancer is the third most common cancer in both men and women. • Recent studies (Yee et al., 2003) estimated that in 2003 nearly 150,000 cases of colon and rectal cancer would be diagnosed in the US, and more than 57,000 people would die from the disease, accounting for about 10% of all cancer deaths. • A polyp is a small tumor that projects from the inner walls of the intestine or rectum. • Early detection of polyps in the colon is critical because polyps can turn into cancerous tumors if they are not detected at the polyp stage. • We use a dataset consisting of 300 candidates: 145 labeled as polyps and 155 as non-polyps. Each candidate is represented by a vector of 14 features that have the most discriminating power according to a feature selection pre-processing stage.
Colon CAD [Figure: example colon image showing the CAD marker only]
Numerical results: Colon CAD dataset • The standard KFD ran in an average time of 122.0 seconds over ten runs, with an average test set correctness of 73.4%. • The A-KFD ran in an average time of 41.21 seconds, with an average test set correctness of 72.4%. • A paired t-test at 95% confidence indicates that there is no significant difference between the two methods on this dataset.
Conclusions • We proposed a simple procedure for generating a heterogeneous KFD classifier in which the kernel model is defined to be a linear combination of members of a potentially large pre-defined family of heterogeneous kernels. • Our proposed algorithm only requires: • solving a simple nonsingular system of linear equations of the size of the number of training points m; • solving a quadratic programming problem that is usually very small, since its size depends on the predefined number of kernels in the kernel family (5 in our experiments). • Empirical results show that, compared to the standard KFD where the kernel is selected by a cross-validation tuning procedure, the proposed method is several times faster with no significant impact on generalization performance.
Future Directions • Extension to regularized networks: SVM, LS-SVM, KFD • Generalized convergence analysis • Extension to transduction