A Significance Test-Based Feature Selection Method for the Detection of Prostate Cancer from Proteomic Patterns
M.A.Sc. Candidate: Qianren (Tim) Xu
Supervisors: Dr. M. Kamel, Dr. M. M. A. Salama
Highlight
[Figure labels: Neural Networks, STFS, ROC analysis]
Proteomic Pattern Analysis for Prostate Cancer Detection
• Sensitivity 97.1%, specificity 96.8%
• Suggestion of mistaken label by prostatic biopsy
Significance Test-Based Feature Selection (STFS)
• STFS can be used generally for any problem of supervised pattern recognition
• Very good performance has been obtained on several benchmark datasets, especially those with a large number of features
Outline of Part I: Significance Test-Based Feature Selection (STFS) for Supervised Pattern Recognition
• Introduction
• Methodology
• Experimental Results on Benchmark Datasets
• Comparison with MIFS
Introduction: Problems with Features
• Large number
• Irrelevance
• Noise
• Correlation
Consequences: increased computational complexity and reduced recognition rate
Mutual Information Feature Selection
• One of the most important heuristic feature selection methods; it can be very useful in any classification system
• But estimating the mutual information is difficult with:
  - a large number of features and a large number of classes
  - continuous data
Problems with Feature Selection Methods
Two key issues:
• Computational complexity
• Lack of optimality
Proposed Method: Criterion of Feature Selection
Significance of feature = Significant difference × Independence
• Significant difference: pattern separability on individual candidate features
• Independence: noncorrelation between the candidate feature and the already-selected features
Measurement of Pattern Separability of Individual Features: Statistical Significant Difference
• Continuous data with normal distribution
  - Two classes: t-test
  - More than two classes: ANOVA
• Continuous data with non-normal distribution or rank data
  - Two classes: Mann-Whitney test
  - More than two classes: Kruskal-Wallis test
• Categorical data: chi-square test
Independence
• Continuous data with normal distribution: Pearson correlation
• Continuous data with non-normal distribution or rank data: Spearman rank correlation
• Categorical data: Pearson contingency coefficient
Selecting Procedure
• MSDI: Maximum Significant Difference and Independence Algorithm
• MIC: Monotonically Increasing Curve Strategy
Maximum Significant Difference and Independence (MSDI) Algorithm
• Compute the significant difference (sd) of every initial feature
• Select the feature with maximum sd as the first feature
• Compute the independence level (ind) between every candidate feature and the already-selected feature(s)
• Select the feature with maximum feature significance (sf = sd × ind) as the new feature
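The MSDI steps above can be sketched in a few lines of plain Python for the two-class, normally distributed case. This is only an illustration under stated assumptions, not the author's implementation: the function names are hypothetical, sd is taken to be the absolute Welch t-statistic (the slides name the t-test but do not say how sd is derived from it), and ind is taken to be 1 minus the largest absolute Pearson correlation with any already-selected feature (the slides do not say how ind is combined across several selected features).

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def t_statistic(a, b):
    """Absolute Welch t-statistic between two samples (assumed sd measure)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    # Degenerate zero-variance case: returned as 0.0 to keep the sketch simple.
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def msdi_select(columns, labels, k):
    """columns: list of feature columns; labels: 0/1 class labels.
    Returns the indices of up to k features in selection order."""
    def sd(col):
        a = [v for v, y in zip(col, labels) if y == 0]
        b = [v for v, y in zip(col, labels) if y == 1]
        return t_statistic(a, b)

    sds = [sd(c) for c in columns]
    # First feature: maximum significant difference.
    selected = [max(range(len(columns)), key=lambda j: sds[j])]
    while len(selected) < min(k, len(columns)):
        best, best_sf = None, -1.0
        for j in range(len(columns)):
            if j in selected:
                continue
            # Assumption: ind = 1 - max |r| against the already-selected features.
            ind = 1.0 - max(abs(pearson_r(columns[j], columns[s])) for s in selected)
            sf = sds[j] * ind  # feature significance, sf = sd x ind
            if sf > best_sf:
                best, best_sf = j, sf
        selected.append(best)
    return selected
```

With this criterion a feature that duplicates an already-selected one gets ind close to 0, so a weaker but less correlated feature is preferred over it.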
Monotonically Increasing Curve (MIC) Strategy
• Plot the performance curve of the feature subset selected by MSDI
• Delete the features that make “no good” contribution to increasing the recognition rate
• Repeat until the curve is monotonically increasing
[Figure: performance curve, rate of recognition against number of features]
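A minimal sketch of the MIC loop follows. It assumes that a feature makes “no good” contribution when the recognition rate does not rise at the point where it is added; the slides do not give a precise rule, so this threshold is an assumption. The names `performance_curve`, `mic_prune`, and the `evaluate` callback (which would train and test a classifier on a feature subset) are hypothetical.

```python
def performance_curve(features, evaluate):
    """Recognition rate after adding each feature in order."""
    return [evaluate(features[:i + 1]) for i in range(len(features))]

def mic_prune(ranked, evaluate):
    """ranked: feature indices in MSDI order.
    evaluate(subset) -> recognition rate (hypothetical callback).
    Returns the pruned feature list and its strictly increasing curve."""
    features = list(ranked)
    while True:
        curve = performance_curve(features, evaluate)
        # Assumed rule: drop a feature if the curve does not rise when it is added.
        drop = {i for i in range(1, len(curve)) if curve[i] <= curve[i - 1]}
        if not drop:
            return features, curve
        features = [f for i, f in enumerate(features) if i not in drop]
```

The loop always terminates, because every pass either removes at least one feature or returns, and the first feature is never removed.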
Example I: Handwritten Digit Recognition
• Each 32-by-32 bitmap is divided into 8 × 8 = 64 blocks
• The pixels in each block are counted
• This generates an 8 × 8 matrix, that is, 64 features
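The block-counting step can be illustrated as follows. The sketch assumes a bitmap stored as 32 rows of 32 values in {0, 1} and counts the pixels set to 1; the block size of 4 × 4 follows from dividing 32 pixels into 8 blocks per side. The function names are hypothetical.

```python
def block_counts(bitmap, block=4):
    """bitmap: 32 rows of 32 ints (0 or 1).
    Returns an 8x8 matrix holding the number of 1-pixels in each block."""
    n = len(bitmap) // block
    return [
        [
            sum(
                bitmap[r][c]
                for r in range(i * block, (i + 1) * block)
                for c in range(j * block, (j + 1) * block)
            )
            for j in range(n)
        ]
        for i in range(n)
    ]

def to_features(bitmap):
    """Flatten the 8x8 count matrix row by row into a 64-element feature vector."""
    return [v for row in block_counts(bitmap) for v in row]
```

Each feature therefore lies between 0 and 16.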
Performance Curve
[Figure: rate of recognition against number of features; curves for MSDI, MIFS (β = 0.2, 0.4, 0.6, 0.8, 1.0), and random ranking]
• Battiti’s MIFS: β needs to be determined
MSDI: Maximum Significant Difference and Independence; MIFS: Mutual Information Feature Selector
Computational Complexity
• Selecting 15 features from the original set of 64 features
• MSDI: 24 seconds
• Battiti’s MIFS: 1110 seconds (5 values of β are searched in the range 0 to 1)
Example II: Handwritten Digit Recognition
The 649 features are distributed over the following six feature sets:
• 76 Fourier coefficients of the character shapes
• 216 profile correlations
• 64 Karhunen-Loève coefficients
• 240 pixel averages in 2 × 3 windows
• 47 Zernike moments
• 6 morphological features
Performance Curve
[Figure: rate of recognition against number of features; curves for MSDI + MIC, MSDI, and random ranking]
MSDI: Maximum Significant Difference and Independence; MIC: Monotonically Increasing Curve
Comparison with MIFS
[Figure: rate of recognition against number of features; curves for MSDI, MIFS (β = 0.2), and MIFS (β = 0.5)]
• MSDI is much better with a large number of features
• MIFS is better with a small number of features
MSDI: Maximum Significant Difference and Independence; MIFS: Mutual Information Feature Selector
Summary on Comparing MSDI with MIFS
• MSDI is much more computationally efficient
  - MIFS needs to calculate the pdfs
  - The computationally efficient criterion (Battiti’s MIFS) still needs β to be determined
  - MSDI involves only simple statistical calculations
• MSDI can select a better feature subset from a large number of features, because it is based on relevant statistical models
• MIFS is more suitable for a small volume of data and a small feature subset
Outline of Part II: Mass Spectrometry-Based Proteomic Pattern Analysis for Detection of Prostate Cancer
• Problem Statement
• Methods
  - Feature selection
  - Classification
  - Optimization
• Results and Discussion
Problem Statement
[Figure label: 15154 points (features)]
• Very large number of features
• Electronic and chemical noise
• Biological variability of human disease
• Little knowledge of the proteomic mass spectrum
The System of Proteomic Pattern Analysis
[Figure: system diagram with the following blocks]
• Training dataset (initial features > 10^4)
• Most significant features selected by STFS
• Optimization of the size of the feature subset and the parameters of the classifier by minimizing ROC distance
• RBFNN / PNN learning
• Trained neural classifier
• Mature classifier
STFS: Significance Test-Based Feature Selection; PNN: Probabilistic Neural Network; RBFNN: Radial Basis Function Neural Network
Feature Selection: STFS
• MSDI: Significance of feature = Significant difference (Student's t-test) × Independence (Pearson correlation)
• MIC
STFS: Significance Test-Based Feature Selection; MSDI: Maximum Significant Difference and Independence Algorithm; MIC: Monotonically Increasing Curve Strategy
Classification: PNN / RBFNN
• PNN is a standard structure with four layers
• RBFNN is a modified four-layer structure
[Figure: network diagram; labels: inputs x1, x2, ..., xn; Pool 1 and Pool 2; S1 and S2; outputs y(1), y(2), y, and yd]
PNN: Probabilistic Neural Network; RBFNN: Radial Basis Function Neural Network
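For orientation, a textbook four-layer PNN (input, Gaussian pattern units, per-class summation, decision) can be sketched as below. This is the standard structure only, not the modified RBFNN used in this work, whose details are not given on the slide. The function name `pnn_classify` and the data layout are hypothetical.

```python
import math

def pnn_classify(x, train, sigma):
    """x: input vector; train: dict mapping class label -> list of training vectors;
    sigma: Gaussian spread.
    Returns the winning label and the per-class scores."""
    scores = {}
    for label, patterns in train.items():
        total = 0.0
        for p in patterns:
            # Pattern layer: one Gaussian kernel per training sample.
            d2 = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-d2 / (2.0 * sigma ** 2))
        # Summation layer: average kernel response of the class.
        scores[label] = total / len(patterns)
    # Decision layer: pick the class with the largest score.
    return max(scores, key=scores.get), scores
```

The only free parameter here is the Gaussian spread `sigma`; a smaller value makes the decision more local to individual training samples.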
Optimization: ROC Distance
Minimizing the ROC distance to optimize:
- Feature subset size m
- Gaussian spread σ
- RBFNN pattern decision weight λ
[Figure: ROC plot of true positive rate (sensitivity) against false positive rate (1 - specificity), showing the ROC distance dROC and the segments a and b]
ROC: Receiver Operating Characteristic
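The slide does not spell out the formula for the ROC distance. A common reading, assumed here, is the Euclidean distance from the operating point (false positive rate, true positive rate) to the ideal corner (0, 1), with a and b as the two legs. The sketch below computes that distance and picks the best of several candidate settings; the function names and the candidate tuples are hypothetical.

```python
import math

def roc_distance(sensitivity, specificity):
    """Assumed definition: distance from (FPR, TPR) to the ideal point (0, 1)."""
    fpr = 1.0 - specificity   # false positive rate
    fnr = 1.0 - sensitivity   # shortfall in true positive rate
    return math.sqrt(fpr ** 2 + fnr ** 2)

def pick_best(candidates):
    """candidates: list of (params, sensitivity, specificity) tuples.
    Returns the tuple with the smallest ROC distance."""
    return min(candidates, key=lambda c: roc_distance(c[1], c[2]))
```

Under this assumed definition, the reported operating point (sensitivity 97.1%, specificity 96.8%) gives a distance of about 0.043.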
Pattern Recognized by RBFNN
[Figure: pattern distribution histograms for samples labelled by biopsies as non-cancer and as cancer, with the cut-point marked]
• Labelled non-cancer by biopsy: true negative 96.8%, false positive 3.2%
• Labelled cancer by biopsy: true positive 97.1%, false negative 2.9%
Possible Causes of the Unrecognizable Samples
• The classifier algorithm is not able to recognize all the samples
• Proteomics is not able to provide enough information
• Prostatic biopsies mislabel the cancer
Possibility of Mistaken Diagnosis of Prostatic Biopsy
• Biopsy has limited sensitivity and specificity
• The proteomic classifier has very high sensitivity and specificity correlated with biopsy
• The results of the proteomic classifier are not exactly the same as those of biopsy
• All unrecognizable samples are outliers
[Figure: pattern distribution histograms with the cut-point marked; regions labelled true non-cancer, false non-cancer, false cancer, and true cancer]
Summary (1): Significance Test-Based Feature Selection (STFS)
• STFS selects features by maximum significant difference and independence (MSDI); it aims to determine the smallest possible feature subset that achieves the maximum recognition rate
• Feature significance (the selection criterion) is estimated with the statistical models best suited to the properties of the data
• Advantages:
  - Computational efficiency
  - Optimality
Summary (2): Proteomic Pattern Analysis for Detection of Prostate Cancer
• The system consists of three parts: feature selection by STFS, classification by PNN/RBFNN, and optimization and evaluation by minimum ROC distance
• Sensitivity 97.1%, specificity 96.8%: the system would be an asset for detecting prostate cancer early and accurately, and for sparing a large number of aging men unnecessary prostatic biopsies
• The suggestion, obtained through pattern analysis, that prostatic biopsy may mislabel some samples may lead to a novel direction in the diagnostic research of prostate cancer
