This case study explores the use of a polyhedral classifier for detecting colorectal cancer, specifically polyps, using computer-aided diagnosis techniques. The study discusses the challenges of classifying multi-mode data and proposes a viable solution using linear classifiers and an AND framework. Experimental results show improved performance in automatic polyp detection.
Polyhedral Classifier for Target Detection
A Case Study: Colorectal Cancer
Murat Dundar, Matthias Wolf, Sarang Lakare, Marcos Salganicoff, Vikas C. Raykar
Siemens Medical Solutions, Inc. USA, Malvern, PA 19355
Computer Aided Diagnosis (CAD) for Colon Cancer • Identify suspicious regions (candidates) • Extract features for each candidate • Classify candidates as a polyp or non-polyp
Multi-mode nature of CAD data • The only ground truth available is the location of the polyp. • All candidates that do not point to a known polyp are pooled into the negative class. • Variation among the different negatives is large.
A CAD Example: Colorectal Cancer. Polyps vs. common false positives [figure: example images of a sessile polyp and a pedunculated polyp alongside common false positives: stool, noise, rectal tube, fold]
State-of-the-Art – Finite Mixture Models • Model each class distribution with a mixture model, one mode per subclass, then design a maximum a posteriori or maximum likelihood classifier • Too few positives, too many redundant features! Robust estimation of the model parameters for the positive class is very difficult, if not impossible
State-of-the-Art – Discriminative Techniques • Pool all negative candidates into a single class and learn a binary classifier, i.e. polyps vs. negatives • A kernel-based discriminative technique (SVM, RVM, KFD) can yield nonlinear decision boundaries suitable for classifying multi-mode data. • Too few positive candidates, too many features with redundancy! Data can be easily overfit by a nonlinear classifier
State-of-the-Art – One-Class Classifiers • Omit the negative class and learn a model from positive samples only. • Kernel-based and neural-network implementations yield nonlinear decision boundaries suitable for classifying multi-mode data. • Like other nonlinear classifiers, susceptible to overfitting
State-of-the-art in a Nutshell • Linear classifiers • less prone to overfitting • not enough capacity to deal with multi-mode data • Finite mixture models • Parameter estimation is an issue! • Discriminative & One-class Classifiers • good capacity • more prone to overfitting
A Viable Solution • A series of linear classifiers, one for each subclass of the negatives • More capacity than a single linear classifier, yet less prone to overfitting than a nonlinear classifier • An unseen sample is classified as positive only if all the classifiers classify it as positive (see the sketch below)
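To make the AND rule concrete, here is a minimal sketch in Python (NumPy); the weight vectors, biases, and example data are illustrative, not taken from the study.

```python
import numpy as np

def polyhedral_predict(X, alphas, biases):
    """Label a sample positive only if ALL K linear classifiers vote positive.

    X: (n, d) samples; alphas: (K, d) weight vectors; biases: (K,).
    """
    scores = X @ alphas.T + biases     # (n, K) decision values
    return np.all(scores > 0, axis=1)  # the AND rule: unanimity required

# Two hyperplanes carve a wedge-shaped (polyhedral) positive region in 2-D.
alphas = np.array([[1.0, 0.0], [0.0, 1.0]])
biases = np.array([0.0, 0.0])
X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]])
print(polyhedral_predict(X, alphas, biases))  # -> [ True False False]
```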
Training Multiple Linear Classifiers • Train each classifier independently: negative subclass k vs. positives, for k=1,…,K. • Inefficient! A misclassified positive sample can be penalized up to K times, once by each classifier
Proposed Approach • Optimize the classifiers jointly • One classifier for each subclass of the negative data • The objective function penalizes a misclassified positive sample only once • Yields a polyhedral decision surface
Hyperplane Classifiers with Hinge Loss [figure: a separating hyperplane between true positives (TP+) and false positives (FP-), with slack ξ for examples on the wrong side of the margin]
Polyhedral Classifier with the AND Framework • If the hinge loss = 0, the example is correctly classified; if the hinge loss > 0, the example is misclassified • Let ξ_ik be the hinge loss of the i-th example induced by classifier k • i-th positive example: correctly classified only if ξ_ik = 0 for every k=1,…,K ("AND"), so it incurs the single penalty max_k ξ_ik • i-th negative example of subclass k: penalized by classifier k only, i.e. ξ_ik
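A small sketch of this bookkeeping, assuming the standard hinge loss max(0, 1 − y·f(x)); the array shapes and names are illustrative.

```python
import numpy as np

def hinge_losses(X, y, alphas, biases):
    """Return the (n, K) matrix xi, where xi[i, k] is the hinge loss of
    example i (label y[i] in {+1, -1}) under linear classifier k."""
    margins = y[:, None] * (X @ alphas.T + biases)  # (n, K)
    return np.maximum(0.0, 1.0 - margins)

# AND rule: a positive example is correct only if xi[i, k] == 0 for ALL k,
# so the single penalty max_k xi[i, k] charges it once across the ensemble.
```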
Objective Function with the AND Framework • Minimize over α_1,…,α_K: J = Σ_k Σ_{i ∈ negatives of subclass k} ξ_ik (error on negative examples) + Σ_{i ∈ positives} max_k ξ_ik (error on positive examples) + λ Σ_k P(α_k) (regularization to control complexity) • Convex problem! (A sketch follows below.)
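A minimal sketch of this objective, continuing the setup above and choosing the regularizer P as a squared 2-norm for illustration; all names and the λ value are assumptions, not from the study.

```python
import numpy as np

def and_objective(alphas, biases, X_pos, X_negs, lam=1.0):
    """X_pos: (n_pos, d) positives; X_negs: list of K arrays, one per
    negative subclass. alphas: (K, d); biases: (K,)."""
    # Negatives: subclass k is penalized only by classifier k.
    neg_err = sum(
        np.maximum(0.0, 1.0 + Xk @ alphas[k] + biases[k]).sum()
        for k, Xk in enumerate(X_negs)
    )
    # Positives: charged once each, via the worst hinge loss over all K.
    pos_hinge = np.maximum(0.0, 1.0 - (X_pos @ alphas.T + biases))  # (n_pos, K)
    pos_err = pos_hinge.max(axis=1).sum()
    reg = lam * np.sum(alphas ** 2)   # P(alpha_k) taken as squared 2-norm here
    return neg_err + pos_err + reg    # a sum of convex terms: convex
```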
Incomplete Ground Truth for Subclasses • The AND algorithm assumes the subclass membership is known for all samples. Not realistic! • Annotate a small portion of the negatives • identify potential subclasses • pool training samples for each subgroup • Three different types of samples in the training data • Positives • Negatives with known subclass membership • Negatives with unknown subclass membership
Objective Function with the AND-OR Framework • J = Σ_k Σ_{i ∈ negatives with known subclass k} ξ_ik (error on negatives with known subclasses) + Σ_{i ∈ negatives with unknown subclass} min_k ξ_ik (OR operation: it suffices that at least one classifier rejects) + Σ_{i ∈ positives} max_k ξ_ik (AND operation) + λ Σ_k P(α_k) (regularization to control complexity) • Not convex!
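A sketch of the extra term for negatives of unknown subclass, continuing the illustrative setup above; the min over classifiers is what breaks convexity.

```python
import numpy as np

def or_negative_error(alphas, biases, X_unknown):
    """Negatives with unknown subclass need only be rejected by at least
    ONE classifier (OR), so each contributes its smallest hinge loss."""
    hinge = np.maximum(0.0, 1.0 + X_unknown @ alphas.T + biases)  # (n, K)
    return hinge.min(axis=1).sum()  # pointwise min of convex terms: not convex
```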
Alternating Optimization • Iterative algorithm • Each iteration contains K steps, and each step optimizes a single classifier • At the k-th step: fix all classifiers (α's) but classifier k, and minimize J(α_1,…,α_k,…,α_K) with respect to α_k (see the skeleton below)
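The per-step structure could look like the following skeleton; `solve_single_classifier` is a hypothetical placeholder for the convex solve of step k, not a function from the study.

```python
def solve_single_classifier(k, alphas, biases, data):
    """Hypothetical stand-in: with all other classifiers fixed, minimizing J
    over (alpha_k, b_k) is a convex hinge-loss problem solvable by any
    standard solver."""
    raise NotImplementedError

def alternating_optimization(alphas, biases, data, n_iters=10):
    """Cycle over the K classifiers, re-solving one while fixing the rest."""
    K = len(alphas)
    for _ in range(n_iters):   # outer iterations
        for k in range(K):     # K steps per iteration
            # Fix all alpha's except alpha_k; minimize J w.r.t. alpha_k only.
            alphas[k], biases[k] = solve_single_classifier(k, alphas, biases, data)
    return alphas, biases
```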
Cascaded Design [figure: candidates flow through a cascade of classifiers 1,…,K with tests T_1,…,T_K; at each stage, failing candidates are rejected (F_1,…,F_K) and only candidates passing all K stages are labeled TP]
Cascade Design with Sparse Linear Classifiers • Setting the regularizer P(α_k) = |α_k| (the 1-norm) yields K sparse classifiers, each with a varying number of non-zero coefficients • The order in which the classifiers are run does not change the outcome (the AND rule is order-independent) • Start with the classifier that has the fewest non-zero coefficients • Classify the sample; if negative, reject; if positive, pass it to the next classifier, chosen as the one requiring the fewest additional features to compute. Continue until all K classifiers have run (see the sketch below)
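A sketch of this run-time policy, simplified to order the stages once by sparsity (the bookkeeping of which features are already computed is omitted); names are illustrative.

```python
import numpy as np

def cascade_predict(x, alphas, biases):
    """Run the sparsest classifier first; any rejection is final (AND rule),
    so most negatives exit early and their remaining features go uncomputed."""
    order = np.argsort([np.count_nonzero(a) for a in alphas])  # sparsest first
    for k in order:
        if alphas[k] @ x + biases[k] <= 0:
            return False   # rejected: stop, saving further feature computation
    return True            # accepted by all K stages: positive
```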
Experiments – Automatic Polyp Detection • Data: 98 numerical image features are computed; out of 1249 negatives, 177 are annotated and 9 subclasses are identified
Run-time Performance • 25% gain in execution time over SVDD and RBF-SVM
Conclusions • Polyhedral classifier for multi-mode data • AND framework when subclass information is fully available • AND-OR framework when subclass information is partially available • Cascade design as a by-product to speed up online execution Thank you! Questions and Comments