Microarrays: A Comparison of Classification and Feature Selection Algorithms for Interpretation. Lynn H. Lee, Hiram Shaish, Eric A. Smith, Min C. Zhang.
Responsibilities
• Lynn Lee studied and described the classification methods, and performed all the experiments that use KNN as the classification method.
• Hiram Shaish studied and described the background of microarrays, and compiled and analyzed the experimental results.
• Eric Smith programmed, tested, and described the data parser.
• Min Zhang studied and described the feature selection methods, and performed all the experiments that use SVM as the classification method.
• Each team member contributed to the writing and editing process.
The Parser
• Written in Perl
• 100 lines of code, plus 90 lines of comments and blank lines
• 2 phases:
  - Parse the SOFT headers to generate some of the ARFF headers
  - Parse the SOFT matrix, generating the rest of the ARFF headers and the ARFF matrix
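The two phases can be illustrated with a short Python sketch. This is not the team's Perl parser: it assumes a GDS-style SOFT file in which `!subset_description` and `!subset_sample_id` lines carry the class labels, the matrix sits between `!dataset_table_begin` and `!dataset_table_end`, and missing values are written as `null`. The function name `soft_to_arff`, the dataset ID `GDS0000`, and the toy values are hypothetical.

```python
def soft_to_arff(soft_text, relation="microarray"):
    """Convert GDS-style SOFT text to ARFF text (illustrative sketch only)."""
    lines = soft_text.splitlines()

    # Phase 1: parse the SOFT headers to map each sample ID to a class label.
    sample_class = {}
    description = None
    for line in lines:
        if line.startswith("!subset_description"):
            # ARFF nominal values with spaces would need quoting, so avoid them.
            description = line.split("=", 1)[1].strip().replace(" ", "_")
        elif line.startswith("!subset_sample_id") and description:
            for sample in line.split("=", 1)[1].split(","):
                sample_class[sample.strip()] = description
    classes = sorted(set(sample_class.values()))
    header = [f"@relation {relation}", ""]

    # Phase 2: parse the matrix. SOFT stores one gene per row and one sample
    # per column; ARFF wants one sample per row, so collect values by column.
    samples, genes, columns = [], [], []
    in_table = False
    for line in lines:
        if line.startswith("!dataset_table_begin"):
            in_table = True
        elif line.startswith("!dataset_table_end"):
            in_table = False
        elif in_table and not samples:
            # First table row: ID_REF, IDENTIFIER, then the sample IDs.
            samples = line.split("\t")[2:]
            columns = [[] for _ in samples]
        elif in_table:
            fields = line.split("\t")
            genes.append(fields[0])
            for column, value in zip(columns, fields[2:]):
                column.append("?" if value == "null" else value)

    header += [f"@attribute {gene} numeric" for gene in genes]
    header.append("@attribute class {" + ",".join(classes) + "}")
    rows = [",".join(col + [sample_class.get(s, "?")])
            for s, col in zip(samples, columns)]
    return "\n".join(header + ["", "@data"] + rows) + "\n"


# Hypothetical toy input: 2 genes, 3 samples, 2 classes.
EXAMPLE = "\n".join([
    "^DATASET = GDS0000",
    "!dataset_title = toy example",
    "^SUBSET = GDS0000_1",
    "!subset_description = smoker",
    "!subset_sample_id = GSM1,GSM2",
    "^SUBSET = GDS0000_2",
    "!subset_description = non-smoker",
    "!subset_sample_id = GSM3",
    "!dataset_table_begin",
    "ID_REF\tIDENTIFIER\tGSM1\tGSM2\tGSM3",
    "1007_s_at\tDDR1\t10.5\t11.0\t7.2",
    "1053_at\tRFC2\t3.1\tnull\t3.3",
    "!dataset_table_end",
])

arff = soft_to_arff(EXAMPLE)
print(arff)
```

Collecting values by column in phase 2 is what transposes the gene-by-sample SOFT matrix into the sample-by-gene layout that ARFF expects, with the class label appended as the last attribute.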
The Data
• 75 samples
• 22,215 genes
• 3 classes: smokers, non-smokers, those who quit smoking
• Easy phenotype to verify
• Caveats?
Feature Selection
• Information gain (Info Gain)
• Chi-square
• 1, 2, 5, 10, 20, 50, 100, 200, 300, 400, 500 features selected
• Results: the two algorithms selected almost identical features
• Reflects the ‘partitionability’ of the data set
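To make the two ranking criteria concrete, here is a minimal Python sketch that scores each gene by information gain and by the chi-square statistic, then keeps the top k. It assumes each gene is discretized by a simple split at its median, which is an illustrative choice and not necessarily how the experiments discretized the data. The gene names and expression values are invented.

```python
import math
import statistics
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def binarize(values):
    """Assumed discretization: True where a value exceeds the gene's median."""
    median = statistics.median(values)
    return [v > median for v in values]


def info_gain(values, labels):
    """Class entropy minus the weighted entropy after splitting on the gene."""
    bins = binarize(values)
    gain = entropy(labels)
    for side in (True, False):
        subset = [label for b, label in zip(bins, labels) if b == side]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain


def chi_square(values, labels):
    """Chi-square statistic of the (bin, class) contingency table."""
    bins = binarize(values)
    total = len(labels)
    observed = Counter(zip(bins, labels))
    stat = 0.0
    for b, n_bin in Counter(bins).items():
        for c, n_class in Counter(labels).items():
            expected = n_bin * n_class / total
            stat += (observed[(b, c)] - expected) ** 2 / expected
    return stat


def top_k(data, labels, score, k):
    """data maps gene name -> expression values, one per sample."""
    return sorted(data, key=lambda g: score(data[g], labels), reverse=True)[:k]


# Invented toy data: 6 samples, 3 classes, 3 genes.
labels = ["smoker", "smoker", "non-smoker", "non-smoker", "former", "former"]
data = {
    "gene_noise_1": [5, 1, 4, 2, 3, 6],
    "gene_signal": [9, 8, 7, 1, 2, 1.5],
    "gene_noise_2": [1, 9, 2, 8, 3, 7],
}

print(top_k(data, labels, info_gain, 1))
print(top_k(data, labels, chi_square, 1))
```

Both criteria are computed from the same table of discretized values against class labels, so a gene that separates the classes cleanly tends to rank highly under either one.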
Classification
• ECOC (error-correcting output codes)
• KNN (k-nearest neighbors)
• Paired the 2 classification algorithms with the 2 feature selection algorithms
• Results:
  - KNN ‘out-classifies’ ECOC with fewer features (70% accuracy with 1 feature)
  - Highest accuracy as a function of feature selection algorithm
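As a reminder of how ECOC turns a 3-class problem into binary ones, the sketch below assigns each class a binary codeword, trains one binary learner per bit, and decodes a prediction to the class whose codeword is nearest in Hamming distance. The code matrix, the nearest-centroid base learner, and the two-feature toy data are all assumptions made for illustration; the slides do not say which code matrix or base learner the experiments used.

```python
# Hypothetical code matrix: one 3-bit codeword per class.
CODE = {
    "smoker": (1, 1, 1),
    "non-smoker": (0, 0, 1),
    "former": (0, 1, 0),
}


def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_ecoc(X, y, code):
    """Train one binary nearest-centroid learner per codeword bit."""
    n_bits = len(next(iter(code.values())))
    learners = []
    for bit in range(n_bits):
        # Relabel every sample by the value of this bit in its class codeword.
        zeros = [x for x, label in zip(X, y) if code[label][bit] == 0]
        ones = [x for x, label in zip(X, y) if code[label][bit] == 1]
        learners.append((centroid(zeros), centroid(ones)))
    return learners


def predict_ecoc(x, learners, code):
    """Predict each bit, then decode to the nearest codeword (Hamming)."""
    bits = tuple(int(sq_dist(x, one) < sq_dist(x, zero))
                 for zero, one in learners)

    def hamming(word):
        return sum(a != b for a, b in zip(word, bits))

    return min(code, key=lambda label: hamming(code[label]))


# Invented toy data: two features per sample, two samples per class.
X = [(10, 0), (11, 1), (0, 0), (1, 1), (5, 9), (6, 10)]
y = ["smoker", "smoker", "non-smoker", "non-smoker", "former", "former"]

learners = train_ecoc(X, y, CODE)
for point in [(10, 1), (0.5, 0.5), (5.5, 9.5)]:
    print(point, predict_ecoc(point, learners, CODE))
```

Any binary classifier could stand in for the nearest-centroid learner here; only the relabeling by bit and the Hamming decoding are specific to ECOC.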
Classification
• Accuracy does not increase beyond a maximum, regardless of the number of features
• Suggests an inherent characteristic of the data
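One way to observe such a plateau is to sweep the number of selected features and measure accuracy at each setting. The sketch below does this with a plain KNN classifier and leave-one-out evaluation. It assumes the feature columns are already sorted from most to least informative, uses k = 1, and runs on invented toy data; the evaluation protocol and value of k used in the experiments are not stated in the slides.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k=1):
    """Majority vote among the k training samples nearest to x."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, n_features, k=1):
    """Leave-one-out accuracy using only the first n_features columns."""
    hits = 0
    for i in range(len(X)):
        train_X = [row[:n_features] for j, row in enumerate(X) if j != i]
        train_y = [label for j, label in enumerate(y) if j != i]
        hits += knn_predict(train_X, train_y, X[i][:n_features], k) == y[i]
    return hits / len(X)


# Invented toy data. Columns are assumed to be pre-sorted by rank:
# the first two carry class signal, the last two are small noise.
X = [
    (20, 5, 1, 0),
    (22, 5, 0, 1),
    (0, 0, 0, 1),
    (5, 0, 1, 0),
    (2, 9, 1, 1),
    (7, 9, 0, 0),
]
y = ["smoker", "smoker", "non-smoker", "non-smoker", "former", "former"]

accuracies = [loo_accuracy(X, y, n) for n in (1, 2, 3, 4)]
print(accuracies)
```

In this toy case accuracy rises when the second informative column is added and then stays flat as the noise columns are included, which is the shape of curve the slide describes.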