Classification of microarray gene expression data using support vector machines (SVM). A presentation for the CIS 595 Bioinformatics course by Despina Kontos, Spring 2003, Temple University.
Overview…
• What are microarray gene expression data?
• What are Support Vector Machines?
• How can we use them to exploit gene expression data? Classification experiments!
Microarrays…
• What are they? Measurements of gene expression levels in tissue or cell samples under varying environmental conditions
Microarrays…
• From a machine learning point of view, they support two tasks: tissue classification and gene function classification
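One way to picture the two tasks is as two orientations of the same expression matrix. The sketch below assumes a made-up matrix of 3 genes by 4 samples (all gene names, sample names and values are hypothetical, chosen only for illustration). For function classification each gene is one example; for tissue classification each sample is one example.

```python
# Hypothetical toy expression matrix: rows are genes, columns are samples
# (experiments or tissues). All names and values are invented for illustration.
genes = ["geneA", "geneB", "geneC"]
samples = ["sample1", "sample2", "sample3", "sample4"]
expression = [
    [0.8, 1.2, -0.5, -0.9],   # geneA
    [0.1, 0.3, 0.2, 0.0],     # geneB
    [-1.1, -0.7, 0.9, 1.4],   # geneC
]

# Function classification: one example per gene,
# features = expression levels across all samples.
gene_examples = {g: row for g, row in zip(genes, expression)}

# Tissue classification: one example per sample,
# features = expression levels across all genes (the transposed matrix).
sample_examples = {
    s: [row[j] for row in expression] for j, s in enumerate(samples)
}

print(len(gene_examples), "gene examples with", len(samples), "features each")
print(len(sample_examples), "sample examples with", len(genes), "features each")
```

In real microarray data the second orientation typically has far more features (genes) than examples (samples).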
Support Vector Machines (SVM)
• Linear classifiers
• Attempt to avoid overfitting by finding the optimal hyperplane that separates the data
• How? By maximizing the margin; the training points that lie on the margin are the support vectors
• Introduced by V. Vapnik and co-workers in 1995
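To make "maximizing the margin" concrete, the sketch below computes the geometric margin of two candidate hyperplanes on invented 2-D points and picks the one with the larger margin. The helper name `geometric_margin`, the data and the candidate hyperplanes are all assumptions for illustration. An actual SVM solves an optimization problem over all hyperplanes rather than comparing a fixed list.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any labeled point to the hyperplane w.x + b = 0.

    A positive value means every point lies on its correct side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Toy 2-D data (invented values): two well-separated classes.
points = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -2.0)]
labels = [1, 1, -1, -1]

# Two candidate hyperplanes; both classify every point correctly.
candidates = {
    "diagonal": ((1.0, 1.0), 0.0),
    "tilted": ((1.0, 0.2), 0.0),
}

margins = {name: geometric_margin(w, b, points, labels)
           for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins, "-> prefer", best)
```

Both candidates separate the data, but the diagonal one keeps the closest points, (1, 1) and (-1, -1), farther from the boundary, so it is the one a maximum-margin criterion prefers.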
Support Vector Machines (SVM)
• What about datasets that are not linearly separable?
• Map the data into a higher-dimensional space and perform linear classification there (theorem!)
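A small illustration of the mapping idea, under stated assumptions: the 1-D data below are invented, and the feature map `phi(x) = (x, x^2)` is just one hypothetical choice. The classes cannot be separated by a threshold on x, but they can be after mapping.

```python
# 1-D toy data (invented values): the positive class sits between the negatives,
# so no single threshold on x separates the two classes.
xs = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
ys = [-1, -1, 1, 1, 1, -1, -1]

def separable_by_threshold(values, labels):
    """True if one threshold puts all of one class below it and the other above."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == -1]
    return max(pos) < min(neg) or max(neg) < min(pos)

def phi(x):
    """Hypothetical feature map into 2-D: x -> (x, x^2)."""
    return (x, x * x)

mapped = [phi(x) for x in xs]

# In the original space the classes interleave.
print(separable_by_threshold(xs, ys))                       # False
# In the mapped space the second coordinate alone separates them,
# i.e. a horizontal line x2 = const is a separating hyperplane.
print(separable_by_threshold([m[1] for m in mapped], ys))   # True
```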
Support Vector Machines (SVM)
• Some mathematical formulations…
• Only the support vectors are needed for computations
• Kernel functions let us avoid computations in the higher-dimensional space
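For reference, a sketch of the standard textbook hard-margin formulation. The notation is an assumption and not necessarily the one used on the slide: training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, feature map $\phi$, multipliers $\alpha_i$, and kernel $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$.

```latex
\begin{align*}
\text{Primal:}\quad
  & \min_{w,\,b}\ \tfrac{1}{2}\,\lVert w \rVert^{2}
    \quad \text{s.t.}\quad y_i \bigl( w \cdot \phi(x_i) + b \bigr) \ge 1,
    \quad i = 1, \dots, n \\[4pt]
\text{Dual:}\quad
  & \max_{\alpha}\ \sum_{i=1}^{n} \alpha_i
    - \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
      \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
    \quad \text{s.t.}\quad \alpha_i \ge 0,\ \ \sum_{i=1}^{n} \alpha_i y_i = 0 \\[4pt]
\text{Decision:}\quad
  & f(x) = \operatorname{sign}\Bigl( \sum_{i\,:\,\alpha_i > 0}
      \alpha_i \, y_i \, K(x_i, x) + b \Bigr)
\end{align*}
```

Only points with $\alpha_i > 0$ (the support vectors) appear in the decision function. The data enter the dual only through $K(x_i, x_j)$, so $\phi$ never has to be computed explicitly.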
Some experiments…
• M. P. S. Brown, W. N. Grundy, D. Lin, N. Cristianini, C. W. Sugnet, T. S. Furey, M. Ares Jr. and D. Haussler, "Knowledge-based analysis of microarray gene expression data by using support vector machines", Proc. Natl. Acad. Sci. USA, 97(1), pp. 262-267, 2000.
• Classification of gene function from microarray data using SVM
• 2,467 genes, 79 DNA hybridization experiments, 6 gene function families
• Function classification: each gene is assigned to a function family (F1, F2, F3, ...)
• SVMs gave the best classification among the methods compared
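Since an SVM is a binary classifier, one common way to handle several function families is to train one "this family vs. everything else" classifier per family. The sketch below assumes that framing and only shows how the labels could be built. The family names F1 to F3 follow the slide's diagram; the gene names, annotations and the helper `one_vs_rest_labels` are hypothetical.

```python
# Hypothetical gene -> function-family annotations (placeholder names).
annotations = {
    "gene1": "F1",
    "gene2": "F2",
    "gene3": "F1",
    "gene4": "F3",
    "gene5": None,   # gene with no known membership in the families of interest
}
families = ["F1", "F2", "F3"]

def one_vs_rest_labels(annotations, family):
    """+1 for genes annotated with `family`, -1 for every other gene."""
    return {gene: (1 if fam == family else -1) for gene, fam in annotations.items()}

# One binary labeling per family; each would be used to train a separate classifier.
label_sets = {fam: one_vs_rest_labels(annotations, fam) for fam in families}

for fam, fam_labels in label_sets.items():
    positives = [g for g, y in fam_labels.items() if y == 1]
    print(fam, "positives:", positives)
```

With this setup the positive class is usually much smaller than the negative class, which is worth keeping in mind when judging accuracy figures.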
More experiments…
• T. S. Furey, N. Cristianini, N. Duffy, D. Bednarski, M. Schummer and D. Haussler, "Support Vector Machine Classification and Validation of Cancer Tissue Samples Using Microarray Expression Data", Bioinformatics, 2000.
• Gene expression data on tissue: 97,802 DNA clones, 31 tissue samples
• Tissue types: cancerous ovarian, normal ovarian, normal non-ovarian
• Cancer tissue classification: cancer vs. not cancer
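As a rough illustration of a "cancer vs. not cancer" classifier, here is a minimal linear soft-margin SVM trained by full-batch subgradient descent on the hinge loss. This is a sketch under loud assumptions, not the method of the cited paper. The six two-feature samples are synthetic, and the function names and hyperparameters (`lam`, `lr`, `epochs`) are arbitrary choices. Real microarray data would have thousands of features per sample.

```python
# Synthetic toy data (invented values, NOT real expression measurements):
# each sample has two made-up features; label +1 = cancer, -1 = not cancer.
samples = [(2.0, 2.5), (3.0, 3.0), (2.5, 3.5),
           (-2.0, -1.5), (-3.0, -2.0), (-2.5, -3.0)]
labels = [1, 1, 1, -1, -1, -1]

def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    dim = len(xs[0])
    n = len(xs)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]   # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(xs, ys):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1:             # margin violated: hinge loss is active
                for k in range(dim):
                    grad_w[k] -= y * x[k] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Classify by the side of the hyperplane w.x + b = 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

w, b = train_linear_svm(samples, labels)
predictions = [predict(w, b, x) for x in samples]
print("weights:", w, "bias:", b)
print("training predictions:", predictions)
```

Training accuracy on six separable points says nothing about generalization. With only a few dozen tissue samples, performance would normally be estimated on held-out samples.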
Conclusions
• Microarray gene expression data are a very useful form of biological information (but expensive to obtain!)
• SVM is a new and very promising classification approach
• A lot of research is still to be done on biological information processing using techniques developed in fields such as Machine Learning, Data Mining, etc.
Additional resources
• E. Osuna, R. Freund, and F. Girosi. Support vector machines: Training and applications. A.I. Memo, MIT A.I. Lab, 1996.
• N. Cristianini. ICML'01 tutorial, 2001.
• http://www.kernel-machines.org/
• http://research.microsoft.com/users/jplatt/svm.html
• http://www.isis.ecs.soton.ac.uk/resources/svminfo/