
GMDH-based feature ranking and selection for improved classification of medical data

Advisor: Dr. Hsu. Presenter: Yu-San Hsieh. Author: R.E. Abdel-Aal. 2005. BI. 456-468.


Presentation Transcript


  1. GMDH-based feature ranking and selection for improved classification of medical data • Advisor: Dr. Hsu • Presenter: Yu-San Hsieh • Author: R.E. Abdel-Aal. 2005. BI. 456-468

  2. Outline • Motivation • Objective • Method • Material • Results • Conclusions

  3. Motivation • Accuracy is very important in classifiers used for medical applications.

  4. Objective • Improve classification performance on medical data.

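  5. Method • First stage – feature ranking • GMDH algorithm: 1. Representation 2. Selection and stopping • Selection uses the squared error r² of each candidate output z on the selection data. • An increasing r_min across iterations indicates that the model is becoming too complex: 1. it overfits the estimation data; 2. it performs poorly on the new selection data. • [Figure: GMDH layer with inputs x1, x2, x3, x4, …; candidate outputs z1 … z_{m(m-1)/2} with squared errors r_1² … r_{m(m-1)/2}²; output y; plot of r_min versus iteration]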
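The bookkeeping behind this slide can be illustrated with a minimal sketch. It assumes the standard GMDH reading of the diagram: each layer forms one candidate per pair of inputs (m(m-1)/2 candidates), and growth stops once the smallest selection error r_min stops decreasing. The function names and the r_min values below are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def candidate_pairs(m):
    """Return all m(m-1)/2 input pairs (1-based) considered in one layer."""
    return list(combinations(range(1, m + 1), 2))


def stopping_layer(r_min_per_layer):
    """Return the index of the last layer before r_min stops decreasing.

    r_min_per_layer[i] is the smallest squared error on the selection
    data among the candidates of layer i (assumed to be precomputed).
    """
    best = 0
    for i in range(1, len(r_min_per_layer)):
        if r_min_per_layer[i] >= r_min_per_layer[i - 1]:
            break  # error no longer improves: the model is getting too complex
        best = i
    return best


# Four inputs x1..x4 give 4 * 3 / 2 = 6 candidate pairs.
pairs = candidate_pairs(4)

# Made-up r_min values, for illustration only.
chosen = stopping_layer([0.30, 0.21, 0.18, 0.22])
```

With the made-up values above, the error falls for three layers and rises at the fourth, so the sketch keeps the third layer (index 2).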

  6. Method • First stage – feature ranking • AIM abductive network: 1. Representation 2. Selection and stopping • Overfitting is avoided through CPM control: 1. CPM > 1: simpler models that are less accurate but generalize better. 2. CPM < 1: more complex models that may overfit the training data and degrade actual prediction performance.

  7. Method • Second stage – feature selection • As the number of selected features k increases, performance on an evaluation dataset first improves and then starts to deteriorate as the model overfits the training data. • A compact m-feature subset can be obtained by taking the first m features from the top of the ranking list. Ex: for the ranking list {2,6,7,8,1,5,3,4,9}, the selected 6-feature subset is {2,6,7,8,1,5}. • The optimum subset of features is determined by repeatedly forming subsets of k features, starting from the top of the ranking list, and selecting the best one. Ex: for the ranking list {2,6,7,8,1,5,3,4,9}, the best subset is chosen from among {2,6,7,8,1,5}, {6,7,8,1,5,3}, …
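One reading of this second stage can be sketched as follows: take growing prefixes of the ranking list and keep the one with the lowest error on the evaluation data. The helper names, the `eval_error` callback, and the error values are hypothetical. In practice `eval_error` would train a classifier on the chosen features and score it on the evaluation set.

```python
def top_m_subset(ranking, m):
    """Return the first m features from the top of the ranking list."""
    return ranking[:m]


def best_prefix_subset(ranking, eval_error):
    """Try prefixes of size 1..len(ranking); keep the lowest-error one.

    eval_error(subset) is assumed to return the classification error
    on the evaluation dataset for a model built on `subset`.
    """
    best_subset, best_err = None, float("inf")
    for m in range(1, len(ranking) + 1):
        subset = top_m_subset(ranking, m)
        err = eval_error(subset)
        if err < best_err:  # strict '<' keeps the smaller subset on ties
            best_subset, best_err = subset, err
    return best_subset, best_err


ranking = [2, 6, 7, 8, 1, 5, 3, 4, 9]

# Made-up error curve (improves, then deteriorates); not results from the paper.
toy_errors = {1: 0.12, 2: 0.09, 3: 0.06, 4: 0.05, 5: 0.05,
              6: 0.04, 7: 0.05, 8: 0.06, 9: 0.07}


def toy_eval_error(subset):
    return toy_errors[len(subset)]


subset, err = best_prefix_subset(ranking, toy_eval_error)
```

The strict comparison prefers the smaller subset when two sizes tie, which matches the goal of a compact feature set.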

  8. Material • Two standard medical diagnosis datasets from the UCI Machine Learning Repository were used for this study: • Wisconsin breast cancer dataset • Cleveland heart disease dataset • Each dataset was split 70% / 30%.
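The slide annotates the datasets with "70%" and "30%" but does not label the two portions. Assuming they are the training and evaluation parts, a split of that kind could be sketched like this. The function name and the fixed seed are illustrative choices, not the author's procedure.

```python
import random


def split_70_30(records, seed=0):
    """Shuffle the records and split them 70% / 30%.

    Assumption: the 70% part is used for training and the 30% part
    for evaluation; the slide does not say which is which.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = (7 * len(shuffled)) // 10        # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


train, evaluation = split_70_30(range(10))
```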

  9. Results • The breast cancer data • Ranking for the feature set: {2,6,7,8,1,5,3,4,9} • [Table: features selected versus features ranked; residual values: 7, 5, 9]

  10. Results • Rough set data analysis of the dataset • [Figure annotations: Overfitting (shown twice), 3% (shown twice)]

  11. Results • [Figure annotations: standard error ↓ (shown twice), AUC ↑, 3% (shown twice)]
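As a reminder of what the AUC figure measures: the area under the ROC curve equals the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison. The function name and the sample scores are illustrative and are not data from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for positive cases, 0 for negative cases.
    scores: classifier outputs, higher meaning "more likely positive".
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Illustrative scores only: 3 of the 4 positive/negative pairs are ordered correctly.
example = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of cases, which is fine for datasets of this size. Library routines typically use a sort-based equivalent.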

  12. Results • The heart disease data • Ranking for the feature set: {13,12,9,3,2,10,8,4,5,11,1,7,6} • [Table: features selected versus features ranked]

  13. Results • [Figure annotations: Overfitting, 6%, 3%]

  14. Results • [Figure annotations: AUC ↑ (shown twice)] • The reduced feature set requires less than half the number of input features. • Models using the reduced feature set will be more efficient.

  15. Conclusions • Feature ranking and selection improve the implementation and performance of classifiers for medical screening and diagnosis. • Feature reduction is particularly useful with high-dimensional data characterized by a large number of features and relatively few training examples.

  16. My opinion • Advantage: preprocessing • Disadvantage: • Applications: clustering, association rules, …
