GMDH-based feature ranking and selection for improved classification of medical data

Advisor: Dr. Hsu • Presenter: Yu-San Hsieh • Author: R.E. Abdel-Aal • 2005. BI.456-468

Outline • Motivation • Objective • Method • Material • Results • Conclusions

Motivation • Accuracy is very important in classifiers used for medical applications.

Objective • Improve classification performance on medical data.

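Method • First stage: feature ranking • GMDH algorithm 1. Representation [Figure: inputs x1, x2, x3, x4 are combined into candidate variables z1, …, z_{m(m-1)/2} for output y, with squared errors r_1^2, r_2^2, …, r_{m(m-1)/2}^2; plot of r_min against iteration.] 2. Selection and stopping: based on squared error. An increasing r_min means the model is becoming too complex: (1) overfitting the estimation data, (2) performing poorly on the new selection data.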
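The stopping rule on this slide can be sketched in a few lines. This is only an illustration, assuming the smallest squared error r_min has already been computed on the selection data at each iteration; the function name `gmdh_stop_iteration` and the r_min values are hypothetical and do not come from the paper.

```python
from typing import Sequence


def gmdh_stop_iteration(r_min_per_iteration: Sequence[float]) -> int:
    """Return the 0-based index of the last iteration before r_min first increases.

    If r_min never increases, the last available iteration is returned.
    """
    if not r_min_per_iteration:
        raise ValueError("need at least one iteration")
    for i in range(1, len(r_min_per_iteration)):
        # An increasing r_min signals that the model has become too complex.
        if r_min_per_iteration[i] > r_min_per_iteration[i - 1]:
            return i - 1
    return len(r_min_per_iteration) - 1


# Made-up r_min values, for illustration only (not results from the paper).
r_min = [0.40, 0.25, 0.18, 0.21, 0.30]
print(gmdh_stop_iteration(r_min))  # 2
```

Here r_min falls for the first three iterations and rises at the fourth, so the sketch keeps the model from the third iteration (index 2).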

Method • First stage: feature ranking • AIM abductive network 1. Representation 2. Selection and stopping: avoid overfitting by using CPM control. (1) CPM > 1: simpler models that are less accurate but generalize better. (2) CPM < 1: more complex models that overfit the training data and decrease actual prediction performance.

Method • Second stage: feature selection • As the number of selected features k increases, performance on an evaluation dataset first improves and then starts to deteriorate as the model overfits the training data. • A compact m-feature subset can be obtained by taking the first m features from the top of the ranking list. Ex: for the ranking list {2,6,7,8,1,5,3,4,9}, the selected 6-feature subset is {2,6,7,8,1,5}. • The optimum subset of features is determined by repeatedly forming subsets of k features, starting from the top of the ranking list. Ex: for the ranking list {2,6,7,8,1,5,3,4,9}, the best subset is chosen from among {2,6,7,8,1,5}, {6,7,8,1,5,3}, …
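The second stage can be sketched as a simple search over the ranking list. This is a minimal illustration, assuming candidate subsets are the first k features of the ranking for k = 1, 2, …; the helper names `top_m` and `best_prefix_subset` and the scoring function `toy_score` are hypothetical stand-ins for training a classifier and measuring it on the evaluation dataset.

```python
from typing import Callable, List, Sequence, Tuple


def top_m(ranking: Sequence[int], m: int) -> List[int]:
    """Return the first m features from the top of the ranking list."""
    return list(ranking[:m])


def best_prefix_subset(
    ranking: Sequence[int],
    evaluate: Callable[[List[int]], float],
) -> Tuple[List[int], float]:
    """Try the top-k subset for every k and keep the best-scoring one.

    `evaluate` is assumed to return a score on an evaluation dataset,
    where higher is better.
    """
    best_subset: List[int] = []
    best_score = float("-inf")
    for k in range(1, len(ranking) + 1):
        subset = top_m(ranking, k)
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


ranking = [2, 6, 7, 8, 1, 5, 3, 4, 9]  # ranking list from the slide


def toy_score(subset: List[int]) -> float:
    # Made-up score that improves up to 3 features and then deteriorates;
    # it only mimics the overfitting pattern and is not a result from the paper.
    return -abs(len(subset) - 3)


print(top_m(ranking, 6))                           # [2, 6, 7, 8, 1, 5]
print(best_prefix_subset(ranking, toy_score)[0])   # [2, 6, 7]
```

In practice `toy_score` would be replaced by a function that trains a model on the chosen features and returns its accuracy on the evaluation data.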

Material • Two standard medical diagnosis datasets from the UCI Machine Learning Repository were used for this study: • Wisconsin breast cancer dataset • Cleveland heart disease dataset • Data split: 70% / 30%

Results • The breast cancer data • Feature ranking: {2,6,7,8,1,5,3,4,9} [Figure: features ranked and features selected; figure labels: 7, 5, 9]

Results [Figure: comparison with rough set data analysis of the dataset; annotations: overfitting, 3%]

Results • Standard error decreases • AUC increases [Figure annotation: 3%]

Results • The heart disease data • Feature ranking: {13,12,9,3,2,10,8,4,5,11,1,7,6} [Figure: features ranked and features selected]

Results [Figure annotations: overfitting; 6%, 3%]

Results • AUC increases • The reduced model requires fewer than half the input features. • Models using the reduced feature set will be more efficient.

Conclusions • Improved implementation and performance of classifiers for medical screening and diagnosis. • Feature reduction is particularly useful with high-dimensional data characterized by a large number of features and relatively few training examples.

My opinion • Advantage: preprocessing • Disadvantage: • Applications: clustering, association rules, …
