A HYBRID OF GENETIC ALGORITHMS AND SUPPORT VECTOR MACHINES (GASVM) FOR GENE SELECTION

A Flowchart of GASVM
• The overall hybrid method consists of two main components: the GA and the SVM classifier.
• The GA selects subsets of features, and the SVM classifier then evaluates each subset during a classification process.
• The classification result is used as the fitness value of the GA: fitness(x) = accuracy(x), where accuracy(x) is the leave-one-out cross-validation (LOOCV) accuracy of the classifier on the feature subset represented by x.
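The loop described above (the GA proposes gene subsets, a classifier scores each one by LOOCV accuracy, and that score is the fitness) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a nearest-centroid rule stands in for the SVM so the sketch needs only the standard library, and the function names, GA operators (tournament selection, one-point crossover, bit-flip mutation, elitism), parameter values, and toy data are all hypothetical.

```python
import random

def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (NOT an SVM): predict the class with the closest centroid."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def fitness(chromosome, X, y, predict=nearest_centroid_predict):
    """LOOCV accuracy of the classifier restricted to genes whose bit is 1."""
    selected = [i for i, bit in enumerate(chromosome) if bit == 1]
    if not selected:
        return 0.0  # assumption: an empty subset gets the worst fitness
    reduced = [[row[i] for i in selected] for row in X]
    correct = 0
    for k in range(len(reduced)):  # hold out sample k, train on the rest
        train_X = reduced[:k] + reduced[k + 1:]
        train_y = y[:k] + y[k + 1:]
        if predict(train_X, train_y, reduced[k]) == y[k]:
            correct += 1
    return correct / len(reduced)

def run_ga(X, y, pop_size=20, generations=15,
           crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Hypothetical GA settings; the deck's actual parameters are not shown here."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=lambda c: fitness(c, X, y))
    for _ in range(generations):
        scores = [fitness(c, X, y) for c in pop]

        def pick():
            # binary tournament: the fitter of two random chromosomes wins
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[a] if scores[a] >= scores[b] else pop[b]

        nxt = [best[:]]  # elitism: carry the best chromosome over unchanged
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < crossover_rate and n > 1:
                cut = rng.randrange(1, n)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # bit-flip mutation
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=lambda c: fitness(c, X, y))
    return best, fitness(best, X, y)

# Toy data (made up): gene 0 separates the two classes, genes 1-3 are noise.
X = [[0.1, 5, 2, 7], [0.2, 1, 9, 3], [0.0, 8, 4, 1],
     [0.9, 6, 3, 8], [1.0, 2, 8, 2], [0.8, 7, 5, 0]]
y = [0, 0, 0, 1, 1, 1]
best_chromosome, best_score = run_ga(X, y)
```

To move closer to the method in the slides, the `predict` argument of `fitness` could be replaced by a function that trains an SVM on the training fold and classifies the held-out sample.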

GASVM for Genes Selection and Classification

Chromosome Representation in GASVM
• Let n be the total number of genes available for representing the data to be classified.
• Hence, a chromosome is represented by a binary vector of dimension n.

Chromosome Representation in GASVM
• A chromosome is a solution, i.e. a gene subset.
• If a bit is 1, the corresponding gene is selected; if a bit is 0, the gene is not selected.
• Figure: an example of chromosome representation in GASVM for gene selection.
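As a small illustration of this encoding, the sketch below converts between a binary chromosome and the gene subset it stands for. The helper names and the gene labels `g1` to `g6` are hypothetical and chosen only for the example.

```python
def decode(chromosome, gene_names):
    """Return the genes whose bit is 1 (the selected subset)."""
    if len(chromosome) != len(gene_names):
        raise ValueError("chromosome length must equal the number of genes n")
    return [name for bit, name in zip(chromosome, gene_names) if bit == 1]

def encode(selected, gene_names):
    """Build the binary vector of dimension n for a given gene subset."""
    chosen = set(selected)
    return [1 if name in chosen else 0 for name in gene_names]

gene_names = ["g1", "g2", "g3", "g4", "g5", "g6"]  # hypothetical labels, n = 6
chromosome = [1, 0, 0, 1, 0, 1]
subset = decode(chromosome, gene_names)   # ['g1', 'g4', 'g6']
round_trip = encode(subset, gene_names)   # [1, 0, 0, 1, 0, 1]
```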

Investigation of GASVM Limitation • The number of possible gene subsets grows exponentially with the number of features (genes): n genes give 2^n candidate subsets, so the search problem is NP-complete.

Drawback of GASVM
• The GASVM search space is too large because the data are high-dimensional.
• The search space is complex.
• Classification accuracy is low.
• The number of selected genes is high.

Proposed Solution • Figure: correlation between the number of subsets y and the number of selected features x out of a total of n features (x-axis marked at n/2 and n).
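One way to read this curve, assuming y counts the subsets that contain exactly x of the n features, is that y is the binomial coefficient C(n, x), which peaks at x = n/2 and sums to 2^n over all x. The sketch below checks this for a small, arbitrarily chosen n; the function names are illustrative only.

```python
from math import comb

def subsets_of_size(n, x):
    """Number of distinct subsets with exactly x of n features: C(n, x)."""
    return comb(n, x)

def total_subsets(n):
    """Number of subsets of any size, i.e. the unrestricted search space."""
    return 2 ** n

n = 10  # small value chosen for illustration
counts = [subsets_of_size(n, x) for x in range(n + 1)]
peak_x = max(range(n + 1), key=lambda x: counts[x])

print(counts)            # [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]
print(peak_x)            # 5, i.e. n / 2
print(total_subsets(n))  # 1024, equal to sum(counts)
```

Under this reading, fixing x to a small value shrinks the search space from 2^n candidates to C(n, x).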

Chromosome Representation in GASVM-II • Figure: an example of chromosome representation in GASVM-II for gene selection.

A Flowchart of GASVM-II

GASVM-II for Genes Selection and Classification

Drawback of GASVM-II
• The number of selected genes has to be set manually.
• Overfitting: high LOOCV accuracy but low test accuracy, which gives inconsistent results.

Case Study: GASVM Versus GASVM-II for Gene Selection

Leukemia Dataset
• The first benchmark gene expression microarray dataset is Leukemia Cancer. The data contain examples of human acute leukemia, originally analyzed by Golub et al.
• The dataset, which contains expression levels of 7129 genes, can be obtained at http://www.genome.wi.mit.edu/mpr.
• The bone marrow or blood samples were taken from 72 patients: 25 with acute myeloid leukemia (AML) and 47 with acute lymphoblastic leukemia (ALL).
• The training data consist of 38 samples; the remaining 34 samples were used as testing data.

Colon Dataset
• The second benchmark dataset is Colon Cancer. The data contain expression levels of 2000 genes from 40 tumor and 22 normal colon tissues.
• The dataset has only 62 samples, all used as training data. It was originally analyzed by Alon et al. [12] and downloaded from http://microarray.princeton.edu/oncology/affydata/index.html.

Experimental environment • Parameters of the GASVM and GASVM-II for the Leukemia and Colon Cancer datasets

Results analysis and discussions • Classification accuracies for different gene subsets using GASVM-II method

Results analysis and discussions • Benchmark of GASVM, GASVM-II and SVM performances and current best of previous methods on Leukemia Cancer dataset

Results analysis and discussions • Benchmark of GASVM, GASVM-II and SVM performances and current best of previous methods on the Colon Cancer dataset

Biological plausibility for informative genes in datasets • List of informative genes in the Leukemia Cancer dataset identified by both GASVM-II and previous works

Biological plausibility for informative genes in datasets • List of informative genes in the Colon Cancer dataset identified by both GASVM-II and previous works
