Parameter Optimized Vertical, Nearest Neighbor-Vote and Boundary Based Classification
Outline
• Introduction
• Background
• Our Approach
• Experimental Results
• Conclusions
Introduction
• Computer Aided Detection (CAD):
  • An interesting class of data mining applications
• Typical medical image data sets are:
  • Large
  • Extremely unbalanced between the positive (+) and negative (-) classes
  • Burdened with a large number of "irrelevant" features
  • Noisily labeled, since the labels come from human decisions that take only a few features into consideration, due to the limits of the human mind (~5 ± 2 contexts)
• Major requirement:
  • Extremely high performance thresholds for clinical acceptance (high Negative Predictive Value, NPV)
Parameter Optimized Vertical Classification
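Because positives are rare in these data sets, overall accuracy can look good while the clinically important number, NPV, tells a different story. The short Python sketch below computes accuracy, NPV (true negatives divided by all predicted negatives) and sensitivity from a confusion matrix. The toy labels and both classifiers are hypothetical and are not drawn from the PE data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Confusion:
    tp: int  # positives predicted positive
    fp: int  # negatives predicted positive (false alarms)
    tn: int  # negatives predicted negative
    fn: int  # positives predicted negative (missed cases)


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Confusion:
    """Tally outcomes, treating label 1 as the positive (e.g. PE) class."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(t == 1 and p == 1 for t, p in pairs),
        fp=sum(t == 0 and p == 1 for t, p in pairs),
        tn=sum(t == 0 and p == 0 for t, p in pairs),
        fn=sum(t == 1 and p == 0 for t, p in pairs),
    )


def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def npv(c: Confusion) -> float:
    """Negative predictive value: share of predicted negatives that are truly negative."""
    predicted_negative = c.tn + c.fn
    return c.tn / predicted_negative if predicted_negative else float("nan")


def sensitivity(c: Confusion) -> float:
    """Share of actual positives that were detected."""
    actual_positive = c.tp + c.fn
    return c.tp / actual_positive if actual_positive else float("nan")


# Hypothetical toy data: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100
# Hypothetical classifier B: finds 4 of the 5 positives but raises 10 false alarms.
classifier_b = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85

for name, y_pred in [("always negative", always_negative), ("classifier B", classifier_b)]:
    c = confusion(y_true, y_pred)
    print(f"{name}: accuracy={accuracy(c):.2f} NPV={npv(c):.3f} sensitivity={sensitivity(c):.2f}")
```

Running it prints `always negative: accuracy=0.95 NPV=0.950 sensitivity=0.00` and `classifier B: accuracy=0.89 NPV=0.988 sensitivity=0.80`. The classifier with the lower accuracy has the higher NPV, because fewer sick cases remain among the cases it calls negative.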
Introduction (Cont.)
• Pulmonary Embolism (PE): 650,000 cases per year in the US (the root cause can be anything that stresses the cardiovascular system).
• PE occurs when:
  • thrombi (blood clots), usually from the legs,
  • move through the ever-enlarging venous system, to and through the heart,
  • into the ever-narrowing pulmonary arterial system,
  • where they lodge and block the lung arteries.
• Highly lethal condition:
  • symptoms are often detected in an emergency room setting;
  • diagnosis of a true positive has to be followed by swift treatment;
  • treatment usually involves a blood thinner (e.g., warfarin).
• False negatives are very costly:
  • PE symptoms can resemble those of a brain aneurysm, whose immediate treatment is the opposite of that for an embolism, and giving warfarin to a patient with a brain aneurysm can be fatal.
• The Holy Grail of PE CAD is fast, accurate detection of negatives (high Negative Predictive Value, NPV).
Introduction (Cont.)
• Several hundred classification attributes are automatically generated from a large number of radiological or magnetic resonance images (e.g., Computed Tomography Angiography (CTA) images).
• Objective of a PE CAD system:
  • Identify the sick patients from the available descriptive features with high accuracy (especially high NPV).
• We applied:
  • Parameter Optimized Vertical, Nearest Neighbor-Vote and Boundary Based Classification.
• The approach was successfully used in the 2006 ACM KDD Cup data mining competition, where it won the NPV task with a score twice that of the nearest competitor.
KDD 2006 PE Data
• 67 CTA cases (patients)
• 4,424 PE candidates (lung spots)
• 116 features generated from Computed Tomography Angiography
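The data are organized at two levels: 4,424 candidates from 67 patients, or about 66 candidates per patient on average. A patient-level decision therefore needs a rule that rolls candidate predictions up to the patient. The Python sketch below assumes the simplest such rule (a patient is negative only if every one of their candidates is predicted negative); the slides do not state which rule was used, and the function name `patient_predictions` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def patient_predictions(candidates: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Roll candidate-level labels (1 = PE, 0 = no PE) up to patient level.

    Hypothetical rule: a patient is flagged positive if ANY of their candidates
    is predicted positive, and negative only if ALL candidates are negative.
    """
    flags: Dict[str, int] = defaultdict(int)
    for patient_id, label in candidates:
        flags[patient_id] = max(flags[patient_id], label)
    return dict(flags)


# Hypothetical candidate predictions for three patients.
predictions = [("p1", 0), ("p1", 1), ("p1", 0), ("p2", 0), ("p2", 0), ("p3", 1)]
print(patient_predictions(predictions))
```

This prints `{'p1': 1, 'p2': 0, 'p3': 1}`. Under this rule, a sick patient becomes a false negative only when every one of their candidates is predicted negative.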
Our Approach
• Our Attribute Selection (AS) step was followed by a combination of Gaussian Nearest Neighbor (GNN) and Local Class Boundary (LCB) based classification.
• Classification parameters were optimized with a Genetic Algorithm (GA).
• The training set was structured vertically into Predicate-trees, or P-trees¹ (losslessly compressed, data-mining-ready vertical structures). Using the P-trees:
  • attribute relevance analysis was done,
  • nearest neighbor sets were created,
  • class boundary analysis was done.
• Because P-trees are compressed, processing can be done directly in compressed form (there is no need to decompress, process, and then compress again).
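The slides name a Gaussian Nearest Neighbor (GNN) vote but do not give its formula. A minimal Python sketch of one common reading follows: every training point (or only the k nearest) votes for its own class with weight exp(-d²/(2σ²)), where d is the Euclidean distance to the query, and the class with the largest total wins. The Gaussian kernel, the Euclidean distance, and the parameters `sigma` and `k` are assumptions; they are the kind of values a genetic algorithm could tune. The sketch runs on ordinary row-wise tuples rather than P-trees and leaves out the Local Class Boundary component, which the slides do not define.

```python
import math
from typing import Dict, Hashable, Optional, Sequence, Tuple

Point = Sequence[float]


def gaussian_vote(query: Point, train_x: Sequence[Point], train_y: Sequence[Hashable],
                  sigma: float, k: Optional[int] = None) -> Tuple[Hashable, Dict[Hashable, float]]:
    """Classify `query` by a Gaussian-weighted vote of training points.

    Each voter adds exp(-d^2 / (2 * sigma^2)) to its class, where d is the Euclidean
    distance to the query, so near neighbors dominate and far points barely count.
    If k is given, only the k nearest training points vote.
    """
    voters = sorted(((math.dist(query, x), y) for x, y in zip(train_x, train_y)),
                    key=lambda pair: pair[0])
    if k is not None:
        voters = voters[:k]
    scores: Dict[Hashable, float] = {}
    for d, label in voters:
        scores[label] = scores.get(label, 0.0) + math.exp(-d * d / (2 * sigma * sigma))
    winner = max(scores, key=scores.__getitem__)
    return winner, scores


# Hypothetical 2-D training data: class 0 near the origin, class 1 near (5, 5).
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6)]
train_y = [0, 0, 0, 1, 1]
print(gaussian_vote((4.5, 5.0), train_x, train_y, sigma=1.0)[0])
print(gaussian_vote((0.2, 0.2), train_x, train_y, sigma=1.0, k=3)[0])
```

Both calls print the expected class: 1 for the query near (5, 5) and 0 for the query near the origin.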
P-tree* Vertical Data Structure
• Predicate-trees (P-trees):
  • Lossless, compressed, data-mining-ready
  • Successfully used in KNN (k-nearest neighbor), ARM (association rule mining), Bayesian classification, SVM, ...
• P-tree processing speed allowed for multiple rounds of attribute relevance analysis, including:
  • information gain based rounds,
  • statistics based rounds,
  • heuristic rounds.
* Predicate Tree (P-tree) technology is patented by North Dakota State University (William Perrizo, primary inventor of record); patent number 6,941,303, issued September 6, 2005.
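To make the vertical idea concrete, here is a minimal Python sketch of a simplified, one-dimensional P-tree: each attribute is split into bit slices, each slice is stored as a tree of 1-bit counts in which pure-0 and pure-1 segments collapse into single leaves, and two trees are ANDed without expanding those leaves. The root count of the result answers a counting query such as "how many class-1 rows have this bit set", the kind of count that information gain and other relevance measures are built from. The binary split, the names `PTree`, `build`, `p_and` and `bit_slices`, and the toy data are assumptions for illustration; this is not the patented implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class PTree:
    size: int                        # number of bit positions covered by this node
    count: int                       # number of 1-bits under this node (root count at the top)
    left: Optional["PTree"] = None   # children exist only for mixed (non-pure) nodes
    right: Optional["PTree"] = None


def build(bits: Sequence[int]) -> PTree:
    """Build a binary count tree; pure-0 and pure-1 segments collapse to leaves."""
    n, ones = len(bits), sum(bits)
    if ones == 0 or ones == n:
        return PTree(n, ones)
    mid = n // 2
    return PTree(n, ones, build(bits[:mid]), build(bits[mid:]))


def p_and(a: PTree, b: PTree) -> PTree:
    """AND two trees over the same bit positions without expanding pure leaves."""
    if a.count == 0 or b.count == 0:      # pure-0 on either side gives a pure-0 result
        return PTree(a.size, 0)
    if a.count == a.size:                 # a is pure-1, so the result equals b
        return b
    if b.count == b.size:                 # b is pure-1, so the result equals a
        return a
    left, right = p_and(a.left, b.left), p_and(a.right, b.right)
    ones = left.count + right.count
    if ones == 0:                         # re-compress a mixed AND mixed that came out empty
        return PTree(a.size, 0)
    return PTree(a.size, ones, left, right)


def bit_slices(values: Sequence[int], width: int) -> List[List[int]]:
    """Vertical decomposition: slice j holds bit j (most significant first) of every value."""
    return [[(v >> (width - 1 - j)) & 1 for v in values] for j in range(width)]


# Hypothetical 3-bit attribute and binary class label for 8 training rows.
values = [5, 3, 7, 1, 6, 2, 7, 4]
labels = [1, 0, 1, 0, 1, 0, 1, 0]
msb = build(bit_slices(values, 3)[0])     # 1 where value >= 4
cls = build(labels)
# Rows with value >= 4 AND class 1, counted from the compressed trees.
print(p_and(msb, cls).count)
```

The final line prints 4: the rows with value ≥ 4 and class 1 are rows 1, 3, 5 and 7 (counting from 1), and the count is read from the root of the AND result without expanding pure segments into individual bits.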
Method Overview
[Figure: flowchart of the method. Components: Horizontal Training Data; Vertical Training Data (P-trees); Genetic Algorithm (Param., Fitness); Attribute Relevance Analysis; Relevant Attributes; Gaussian Near Neighbor; Local Class Boundary; Combination Classifier; Optimized Classifier; Test Data; Final Results.]
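The Genetic Algorithm component in the overview tunes classifier parameters through fitness feedback. The Python sketch below shows that loop in its simplest form, with tournament selection, uniform crossover, clipped Gaussian mutation and elitism. The operators, the parameter names (`sigma`, `vote_threshold`) and `toy_fitness` are hypothetical stand-ins: the slides do not specify the GA configuration, and in the actual pipeline the fitness would come from evaluating the combined GNN/LCB classifier (for example, its NPV on held-out training data).

```python
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def genetic_optimize(fitness: Callable[[List[float]], float], bounds: Bounds,
                     pop_size: int = 30, generations: int = 60,
                     mutation_rate: float = 0.2, seed: int = 0) -> Tuple[List[float], float]:
    """Maximize `fitness` over real-valued parameters inside `bounds` with a simple GA:
    tournament selection, uniform crossover, clipped Gaussian mutation and elitism."""
    rng = random.Random(seed)

    def tournament(scored: List[Tuple[float, List[float]]]) -> List[float]:
        (fa, a), (fb, b) = rng.sample(scored, 2)
        return a if fa >= fb else b

    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in population),
                        key=lambda pair: pair[0], reverse=True)
        next_population = [scored[0][1]]      # elitism: the best individual survives unchanged
        while len(next_population) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[i] = min(max(child[i] + rng.gauss(0.0, 0.1 * (hi - lo)), lo), hi)
            next_population.append(child)
        population = next_population
    best = max(population, key=fitness)
    return best, fitness(best)


# Hypothetical stand-in fitness over (sigma, vote_threshold), peaking at (1.5, 0.3).
# In the real pipeline this would be a classifier score such as NPV on held-out data.
def toy_fitness(params: List[float]) -> float:
    sigma, vote_threshold = params
    return -((sigma - 1.5) ** 2 + (vote_threshold - 0.3) ** 2)


best, score = genetic_optimize(toy_fitness, bounds=[(0.1, 5.0), (0.0, 1.0)])
print(best, score)
```

Because the best individual is copied into each new generation, the best fitness found never decreases from one generation to the next; the fixed seed makes runs repeatable.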
Results: Quality
• KDD 2006 data set: best submission in the 2006 KDD Cup NPV task, by a factor of 2 over the nearest competitor.