
Qianren (Tim) Xu

A Significance Test-Based Feature Selection Method for the Detection of Prostate Cancer from Proteomic Patterns. M.A.Sc. Candidate: Qianren (Tim) Xu. Supervisors: Dr. M. Kamel, Dr. M. M. A. Salama.


Presentation Transcript


  1. A Significance Test-Based Feature Selection Method for the Detection of Prostate Cancer from Proteomic Patterns M.A.Sc. Candidate: Qianren (Tim) Xu Supervisors: Dr. M. Kamel, Dr. M. M. A. Salama

  2. Highlight: Proteomic Pattern Analysis for Prostate Cancer Detection with Significance Test-Based Feature Selection (STFS) • STFS can be used for any supervised pattern recognition problem • Very good performance has been obtained on several benchmark datasets, especially those with a large number of features • Sensitivity 97.1%, specificity 96.8% • Suggests mistaken labelling by prostatic biopsy

  3. Outline of Part I Significance Test-Based Feature Selection (STFS) on Supervised Pattern Recognition • Introduction • Methodology • Experiment Results on Benchmark Datasets • Comparison with MIFS

  4. Introduction: Problems with Features • Large number of features • Irrelevant features • Noise • Correlation These problems increase computational complexity and reduce the recognition rate.

  5. Mutual Information Feature Selection • One of the most important heuristic feature selection methods; it can be very useful in any classification system. • But estimating the mutual information is difficult: • large numbers of features and classes • continuous data

  6. Problems with Feature Selection Methods Two key issues: • Computational complexity • Optimality deficiency

  7. Proposed Method Criterion of feature selection: Significance of feature = Significant difference × Independence, where significant difference measures pattern separability on individual candidate features, and independence measures the non-correlation between a candidate feature and the already-selected features.

  8. Measurement of Pattern Separability of Individual Features: Statistical Significant Difference • Continuous data with normal distribution: t-test (two classes), ANOVA (more than two classes) • Continuous data with non-normal distribution or rank data: Mann-Whitney test (two classes), Kruskal-Wallis test (more than two classes) • Categorical data: Chi-square test
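
As a minimal sketch of the two-class, normal-distribution branch above, the t statistic can be computed in pure Python (the unequal-variance Welch form is an assumption; the slide only says "t-test"):

```python
import math
from statistics import mean, variance

def t_statistic(a, b):
    """Welch's two-sample t statistic: the larger |t| is, the better
    this feature separates the two classes (higher significant
    difference)."""
    va, vb = variance(a), variance(b)   # sample variances
    return (mean(a) - mean(b)) / math.sqrt(va / len(a) + vb / len(b))
```

In the same spirit, the other branches map onto ANOVA, Mann-Whitney, Kruskal-Wallis, and chi-square statistics, chosen by the data type as in the table above.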

  9. Independence • Continuous data with normal distribution: Pearson correlation • Continuous data with non-normal distribution or rank data: Spearman rank correlation • Categorical data: Pearson contingency coefficient
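
For the continuous, normal-distribution branch, the independence term can be scored from the Pearson correlation. A sketch in pure Python (taking independence as 1 minus the strongest |r| against the already-selected features is an assumed scoring rule, not stated on the slide):

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two feature vectors."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def independence(candidate, selected):
    """1 - strongest |correlation| with any already-selected feature;
    1.0 when nothing has been selected yet."""
    if not selected:
        return 1.0
    return 1.0 - max(abs(pearson_r(candidate, s)) for s in selected)
```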

  10. Selecting Procedure • MSDI: Maximum Significant Difference and Independence algorithm • MIC: Monotonically Increasing Curve strategy

  11. Maximum Significant Difference and Independence (MSDI) Algorithm • Compute the significant difference (sd) of every initial feature • Select the feature with maximum sd as the first feature • Compute the independence level (ind) between every candidate feature and the already-selected feature(s) • Select the feature with maximum feature significance (sf = sd × ind) as the next feature, and repeat the last two steps until enough features are selected
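
The steps above amount to a greedy loop. A sketch, assuming the `sd` scores and an `independence` function are supplied by the caller (the names and the stopping size `k` are illustrative):

```python
def msdi_select(features, sd, independence, k):
    """Greedy MSDI: the first pick maximizes sd; every later pick
    maximizes sf = sd * ind against the features chosen so far."""
    candidates = set(range(len(features)))
    selected = [max(candidates, key=lambda i: sd[i])]
    candidates.discard(selected[0])
    while candidates and len(selected) < k:
        best = max(candidates, key=lambda i: sd[i] * independence(
            features[i], [features[j] for j in selected]))
        selected.append(best)
        candidates.discard(best)
    return selected
```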

  12. Monotonically Increasing Curve (MIC) Strategy [Figure: performance curve, recognition rate (0.4-1) vs. number of features (0-30)] • Plot the performance curve of the feature subset selected by MSDI • Delete the features that make no good contribution to the increase of the recognition rate • Repeat until the curve is monotonically increasing
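
The MIC strategy can be sketched as a forward pass over the MSDI ranking that keeps a feature only when it raises the recognition rate, which makes the curve over the kept features monotonically increasing by construction. Retraining and evaluating the classifier is abstracted into an `accuracy_of` callback; this simplification is an assumption:

```python
def mic_prune(ranked, accuracy_of):
    """Drop features with 'no good' contribution: keep a feature only
    if adding it to the kept set strictly raises the recognition
    rate returned by accuracy_of."""
    kept, best = [], 0.0
    for f in ranked:
        acc = accuracy_of(kept + [f])
        if acc > best:
            kept.append(f)
            best = acc
    return kept
```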

  13. Example I: Handwritten Digit Recognition • 32-by-32 bitmaps are divided into 8×8 = 64 blocks • The pixels in each block are counted • Thus an 8×8 matrix is generated, i.e., 64 features
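
The preprocessing on this slide can be sketched directly: cutting a 32×32 bitmap into 8×8 = 64 blocks means each block is 4×4 pixels, and the feature is the count of set pixels per block:

```python
def block_features(bitmap):
    """Reduce a 32x32 binary bitmap to 64 features: cut it into
    8x8 = 64 non-overlapping 4x4 blocks and count the set pixels
    in each block (values 0..16)."""
    return [sum(bitmap[bi * 4 + r][bj * 4 + c]
                for r in range(4) for c in range(4))
            for bi in range(8) for bj in range(8)]
```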

  14. Performance Curve: MSDI vs. MIFS [Figure: recognition rate (0.4-1) vs. number of features (0-60) for MSDI, MIFS (β = 0.2, 0.4, 0.6, 0.8, 1.0), and random ranking] MSDI: Maximum Significant Difference and Independence; MIFS: Mutual Information Feature Selector. Battiti's MIFS requires β to be determined.

  15. Computational Complexity • Selecting 15 features from the 64-feature original set • MSDI: 24 seconds • Battiti's MIFS: 1110 seconds (5 values of β searched in the range 0-1)

  16. Example II: Handwritten Digit Recognition The 649 features are distributed over the following six feature sets: • 76 Fourier coefficients of the character shapes • 216 profile correlations • 64 Karhunen-Loève coefficients • 240 pixel averages in 2×3 windows • 47 Zernike moments • 6 morphological features

  17. Performance Curve: MSDI + MIC [Figure: recognition rate (0.2-1) vs. number of features (0-50) for MSDI + MIC, MSDI alone, and random ranking] MSDI: Maximum Significant Difference and Independence; MIC: Monotonically Increasing Curve

  18. Comparison with MIFS [Figure: recognition rate (0.4-1) vs. number of features (0-50) for MSDI, MIFS (β = 0.2), and MIFS (β = 0.5)] MSDI is much better with a large number of features; MIFS is better with a small number of features. MSDI: Maximum Significant Difference and Independence; MIFS: Mutual Information Feature Selector

  19. Summary on Comparing MSDI with MIFS • MSDI is much more computationally effective: • MIFS needs to calculate the pdfs • even the computationally effective criterion (Battiti's MIFS) still needs β to be determined • MSDI only involves simple statistical calculations • MSDI can select a more nearly optimal feature subset from a large number of features, because it is based on the relevant statistical models • MIFS is more suitable for small volumes of data and small feature subsets

  20. Outline of Part II Mass Spectrometry-Based Proteomic Pattern Analysis for Detection of Prostate Cancer • Problem Statement • Methods • Feature selection • Classification • Optimization • Results and Discussion

  21. Problem Statement 15,154 points (features) • Very large number of features • Electronic and chemical noise • Biological variability of human disease • Little knowledge of the proteomic mass spectrum

  22. The System of Proteomic Pattern Analysis • Training dataset (initial features > 10^4) • Most significant features selected by STFS • Optimization of the size of the feature subset and the parameters of the classifier by minimizing the ROC distance • RBFNN / PNN learning • Trained neural classifier • Mature classifier STFS: Significance Test-Based Feature Selection; PNN: Probabilistic Neural Network; RBFNN: Radial Basis Function Neural Network

  23. Feature Selection: STFS Significance of feature = Significant difference (Student's t-test) × Independence (Pearson correlation), applied through the MSDI algorithm and the MIC strategy. STFS: Significance Test-Based Feature Selection; MSDI: Maximum Significant Difference and Independence algorithm; MIC: Monotonically Increasing Curve strategy

  24. Classification: PNN / RBFNN • PNN is a standard structure with four layers • RBFNN is a modified four-layer structure [Figure: PNN and RBFNN network architectures] PNN: Probabilistic Neural Network; RBFNN: Radial Basis Function Neural Network
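
A PNN behaves as a Parzen-window classifier with one Gaussian unit per training sample. A minimal sketch (averaging the kernel responses per class is the standard PNN summation layer; `spread` stands for the Gaussian spread σ mentioned on the next slide):

```python
import math

def pnn_classify(x, train, spread):
    """Score each class by the mean Gaussian kernel response of its
    training samples (pattern + summation layers), then return the
    class with the largest score (decision layer)."""
    def kernel(u, v):
        d2 = sum((a - b) ** 2 for a, b in zip(u, v))
        return math.exp(-d2 / (2.0 * spread ** 2))
    scores = {c: sum(kernel(x, v) for v in vs) / len(vs)
              for c, vs in train.items()}
    return max(scores, key=scores.get)
```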

  25. Optimization: ROC Distance Minimizing the ROC distance d_ROC to optimize: • the feature subset size m • the Gaussian spread σ • the RBFNN pattern decision weight λ [Figure: ROC curve, true positive rate (sensitivity) vs. false positive rate (1 - specificity), with d_ROC drawn from the operating point] ROC: Receiver Operating Characteristic
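
As the figure suggests, the ROC distance being minimized is the Euclidean distance from the classifier's operating point to the perfect-classifier corner of the ROC plot (false positive rate 0, true positive rate 1); reading it this way is an assumption consistent with the figure:

```python
import math

def roc_distance(sensitivity, specificity):
    """Distance from the ROC operating point (1 - specificity,
    sensitivity) to the ideal corner (0, 1)."""
    return math.hypot(1.0 - specificity, 1.0 - sensitivity)
```

With the reported sensitivity 97.1% and specificity 96.8%, this gives d_ROC ≈ 0.043.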

  26. Results: Sensitivity and Specificity

  27. Pattern Distribution [Figure: histograms of the RBFNN output for biopsy-labelled non-cancer and cancer samples, with the cut-point] • True negative 96.8% • False negative 2.9% • True positive 97.1% • False positive 3.2%

  28. The Possible Causes of the Unrecognizable Samples • The algorithm of the classifier is not able to recognize all the samples • Proteomics is not able to provide enough information • Prostatic biopsies mistakenly label the cancer

  29. Possibility of Mistaken Diagnosis by Prostatic Biopsy [Figure: pattern distributions with cut-point, showing true non-cancer, false non-cancer, false cancer, and true cancer regions] • Biopsy has limited sensitivity and specificity • The proteomic classifier has very high sensitivity and specificity correlated with biopsy • The results of the proteomic classifier are not exactly the same as those of biopsy • All unrecognizable samples are outliers

  30. Summary (1) Significance Test-Based Feature Selection (STFS): • STFS selects features by maximum significant difference and independence (MSDI); it aims to determine the minimum possible feature subset that achieves the maximum recognition rate • Feature significance (the selection criterion) is estimated with the statistical model that best matches the properties of the data • Advantages: • computationally effective • optimality

  31. Summary (2) Proteomic Pattern Analysis for Detection of Prostate Cancer • The system consists of three parts: feature selection by STFS, classification by PNN/RBFNN, and optimization and evaluation by minimum ROC distance • Sensitivity 97.1% and specificity 96.8%: an asset for detecting prostate cancer early and accurately, and for sparing a large number of aging men unnecessary prostatic biopsies • The suggestion of mistaken labelling by prostatic biopsy through pattern analysis may lead to a novel direction in prostate cancer diagnostic research

  32. Thanks for your time Questions?
