Gist 2.3
John H. Phan
MIBLab Summer Workshop
June 28th, 2006
Overview
• Gist 2.3 Tools
• Support Vector Machine (SVM) classification
• Kernel Principal Component Analysis (KPCA)
Gist 2.3 Overview
• Gist is a set of command line programs written in C
• Primary programs
  • SVM and KPCA
• Auxiliary programs
  • Ranking and feature selection
• Web interface for the SVM component
Support Vector Machines
• Supervised classification method
• Maximal margin hyperplane
(http://www.dtreg.com/svm.htm)
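For linearly separable training points x_i with labels y_i in {-1, +1}, the maximal margin hyperplane is commonly written as the optimization problem below. The symbols w (normal vector) and b (offset) are introduced here for illustration and do not appear on the slides.

```latex
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1, \qquad i = 1,\dots,n
```

Under this formulation the margin between the two classes has width 2 / ||w||, so minimizing ||w|| maximizes the margin.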
Primary Gist Programs
• gist-train-svm – train support vector machine
• gist-classify – classify points with a trained support vector machine
• gist-fast-classify – linear optimized classification
• gist-kpca – kernel principal component analysis
• gist-project – project points onto KPCA components
Auxiliary Gist Programs
• gist-fselect – linear feature selection
• gist-matrix – basic matrix manipulations
• gist-score-svm – performance of gist-train-svm and gist-classify
• gist-rfe – recursive feature elimination
• gist-sigmoid – classification probabilities
• gist2html – convert output to HTML
• gist-kernel – create a square kernel matrix
gist-train-svm
• Train a support vector machine
• Input file is tab delimited but transposed
• Output file contains 5 columns
  • Label, binary classification, SVM weights, predicted classification, discriminant value
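A minimal sketch of reading such a five-column output in Python. It assumes a tab-delimited file whose columns appear in the order listed above, possibly preceded by a header row or `#` comment lines; the field names in `SvmRow` and the function names are hypothetical and are not taken from Gist itself.

```python
import csv
import io
from typing import List, NamedTuple


class SvmRow(NamedTuple):
    label: str             # example name
    true_class: int        # binary classification, assumed +1 / -1
    weight: float          # SVM weight learned for this example
    predicted_class: int   # predicted classification
    discriminant: float    # discriminant value


def parse_svm_output(text: str) -> List[SvmRow]:
    """Parse five tab-delimited columns, skipping comments and non-numeric rows."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # blank line or comment line
        try:
            rows.append(SvmRow(fields[0],
                               int(float(fields[1])),
                               float(fields[2]),
                               int(float(fields[3])),
                               float(fields[4])))
        except (ValueError, IndexError):
            continue  # header row or malformed line (assumption: safe to skip)
    return rows


def support_vectors(rows: List[SvmRow]) -> List[str]:
    """Examples with a non-zero weight are the support vectors."""
    return [r.label for r in rows if r.weight != 0.0]
```

Skipping rows that fail to parse keeps the sketch tolerant of headers, at the cost of silently ignoring malformed lines; a stricter reader would raise instead.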
gist-fselect – Feature Selection
• Fisher Criterion Score
• t-test
• Welch t-test
• Mann-Whitney
• SAM (significance analysis of microarrays)
• Threshold number of mis-classifications
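To make two of these criteria concrete, here is a small Python sketch of one common form of the Fisher criterion score and of the Welch t statistic for a single feature. It uses sample variances, which is an assumption; the exact definitions used by gist-fselect may differ, and the function names are illustrative only.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence


def fisher_score(pos: Sequence[float], neg: Sequence[float]) -> float:
    """(mean difference)^2 divided by the sum of the two class variances."""
    return (mean(pos) - mean(neg)) ** 2 / (variance(pos) + variance(neg))


def welch_t(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Welch t statistic: does not assume equal variances in the two classes."""
    standard_error = sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    return (mean(pos) - mean(neg)) / standard_error


def rank_features(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[int]:
    """Return feature indices sorted from highest to lowest Fisher score.

    X has one row per example; y holds +1 / -1 labels.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == -1]
        scores.append(fisher_score(pos, neg))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

Both functions need at least two values per class and fail with a division by zero when a feature is constant within both classes, so such features should be filtered out first.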
gist-score-svm
• Compute false and true positives on training and test sets
• Compute area under the ROC curves for training and test sets
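The two kinds of score can be sketched in a few lines of Python. This is an illustration of the quantities, not of how gist-score-svm computes them: labels are assumed to be +1 / -1, and the area under the ROC curve is obtained from the equivalent pairwise-ranking definition.

```python
from typing import Dict, Sequence


def confusion_counts(labels: Sequence[int],
                     predictions: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for +1 / -1 labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(labels, predictions):
        if p == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["tn" if y == -1 else "fn"] += 1
    return counts


def roc_auc(labels: Sequence[int], discriminants: Sequence[float]) -> float:
    """Area under the ROC curve.

    Equals the fraction of (positive, negative) pairs in which the positive
    example has the larger discriminant; ties count as half.
    """
    pos = [d for y, d in zip(labels, discriminants) if y == 1]
    neg = [d for y, d in zip(labels, discriminants) if y == -1]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of examples, which is fine for small microarray sample counts; a sort-based version would scale better.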
gist-rfe
• Recursive feature elimination – SVM
  • Initialize the data to contain all features
  • Train an SVM on the data
  • Rank features according to SVM weights
  • Eliminate lower 50% of features
  • Repeat until 1 feature is left
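The loop above can be sketched in plain Python. Several things here are assumptions rather than Gist behavior: `train_linear_svm` is a small stand-in linear SVM (stochastic subgradient descent on the regularized hinge loss), features are ranked by the absolute value of their weight, and with an odd number of features the count to drop is rounded down.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def train_linear_svm(X: Matrix, y: Sequence[int], epochs: int = 200,
                     lr: float = 0.01, lam: float = 0.01) -> List[float]:
    """Stand-in linear SVM trainer; returns one weight per feature."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # L2 shrinkage is applied on every step.
            w = [(1.0 - lr * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move toward classifying xi correctly.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w


def rfe(X: Matrix, y: Sequence[int],
        train: Callable[[Matrix, Sequence[int]], List[float]] = train_linear_svm
        ) -> Tuple[int, List[int]]:
    """Return (last surviving feature index, features in elimination order)."""
    surviving = list(range(len(X[0])))   # start with all features
    eliminated: List[int] = []
    while len(surviving) > 1:
        subset = [[row[j] for j in surviving] for row in X]
        weights = train(subset, y)       # train on the surviving features
        # Rank positions from smallest to largest absolute weight.
        order = sorted(range(len(surviving)), key=lambda k: abs(weights[k]))
        n_drop = len(surviving) // 2     # lower 50%, rounded down
        dropped = [surviving[k] for k in order[:n_drop]]
        eliminated.extend(dropped)
        surviving = [j for j in surviving if j not in dropped]
    return surviving[0], eliminated
```

Because the trainer is passed in as `train`, the same loop works with any function that returns one weight per feature, so a real SVM implementation can be swapped in without changing `rfe`.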
Gist SVM Web Interface
• SVM Training and Testing
• Normalize data by mean centering or z-score
• Adjust kernel settings (linear, polynomial, or radial basis)
• Demo (http://svm.sdsc.edu/svm-intro.html)
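For reference, the three kernel families named above are usually defined as in the following Python sketch. The parameter names (`degree`, `coef0`, `width`) and the exact parameterization are assumptions chosen for illustration; the settings exposed by the Gist web interface may be scaled or named differently.

```python
from math import exp
from typing import Callable, List, Sequence

Vector = Sequence[float]


def linear_kernel(x: Vector, z: Vector) -> float:
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x: Vector, z: Vector,
                      degree: int = 2, coef0: float = 1.0) -> float:
    """(x . z + coef0) ** degree"""
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x: Vector, z: Vector, width: float = 1.0) -> float:
    """exp(-||x - z||^2 / (2 * width^2))"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return exp(-squared_distance / (2.0 * width ** 2))


def kernel_matrix(points: Sequence[Vector],
                  kernel: Callable[[Vector, Vector], float]) -> List[List[float]]:
    """Square matrix of kernel values between every pair of points."""
    return [[kernel(p, q) for q in points] for p in points]
```

For example, `kernel_matrix(points, rbf_kernel)` yields a symmetric matrix with ones on the diagonal, since every point has zero distance to itself.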
Comparison to MAGMA
MAGMA
• Normalizations
  • Row (gene) mean center
  • Row (gene) median center
  • Column mean center
  • Column median center
  • Row z-score
  • Column z-score
  • Quantile
• Handles missing values
Gist (Web)
• Normalizations
  • Column (sample) mean center
  • Column (sample) z-score
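The two column-wise normalizations listed for the Gist web interface can be sketched as follows. The sketch assumes a dense matrix given as a list of rows with no missing values, and it uses the sample standard deviation for the z-score; which estimator the web interface uses is not stated on the slide.

```python
from statistics import mean, stdev
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def column_mean_center(matrix: Matrix) -> List[List[float]]:
    """Subtract each column's mean from every value in that column."""
    means = [mean(col) for col in zip(*matrix)]
    return [[value - m for value, m in zip(row, means)] for row in matrix]


def column_z_score(matrix: Matrix) -> List[List[float]]:
    """Center each column, then divide by its sample standard deviation."""
    columns = list(zip(*matrix))
    means = [mean(col) for col in columns]
    deviations = [stdev(col) for col in columns]
    return [[(value - m) / s for value, m, s in zip(row, means, deviations)]
            for row in matrix]
```

A column with identical values has zero standard deviation and would cause a division by zero in `column_z_score`, so constant columns need to be handled before calling it.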
Comparison to MAGMA
MAGMA
• Classifiers
  • SVM
  • Fisher’s Discriminant
  • SDF
• Data Representation
  • Visualization of classifiers
  • Database storage
Gist (Web)
• Classifiers
  • SVM
• Data Representation
  • Text files
  • HTML output
Comparison to MAGMA
MAGMA
• Ranking Methods
  • Resubstitution
  • Cross validation
  • Bootstrap
  • Bolstering
Gist (Web)
• Ranking Methods
  • Fisher criterion
  • t-test
  • SAM
  • Mann-Whitney
  • Welch t-test