
A gene expression analysis system for medical diagnosis




  1. A gene expression analysis system for medical diagnosis. D. Maroulis, D. Iakovidis, S. Karkanis, I. Flaounas. University of Athens, Dept. of Informatics and Telecommunications

  2. Objectives • A system to support medical diagnosis using molecular-level information • Efficient classification of pathological conditions into multiple classes • A user-friendly interface for physicians and biologists

  3. DNA Microarrays • Microscope glass slides • Thousands of spots • Each spot carries a part of a cDNA

  4. DNA Microarrays • Gene expression level (feature)

  5. DNA Microarrays • Gene expression vector (feature vector)

  6. DNA Microarrays • Gene expression matrix (data set)
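To make the feature / feature vector / data set terminology concrete, here is a minimal Python sketch. It assumes a two-channel cDNA array in which a gene's expression level is the log2 ratio of the two fluorescence intensities of its spot; the intensities and sample names are made up, and the slides do not say how this system quantifies a spot.

```python
import math

# Hypothetical (red, green) channel intensities for 3 genes (spots) on each of 2 arrays.
spots = {
    "sample_1": [(1200.0, 600.0), (300.0, 300.0), (150.0, 600.0)],
    "sample_2": [(800.0, 800.0), (900.0, 225.0), (400.0, 200.0)],
}


def expression_level(red, green):
    """Feature: log2 ratio of the two channel intensities of one spot."""
    return math.log2(red / green)


def expression_vector(spot_list):
    """Feature vector: one expression level per gene for a single sample."""
    return [expression_level(red, green) for red, green in spot_list]


# Data set: one row (feature vector) per sample, one column per gene.
matrix = [expression_vector(spots[name]) for name in sorted(spots)]
print(matrix)  # [[1.0, 0.0, -2.0], [0.0, 2.0, 1.0]]
```

With real arrays the matrix would have thousands of columns (genes) but only tens or hundreds of rows (patients), which is why gene selection matters later on.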

  7. Gene expression analysis tools • Image processing & analysis for microarray spot detection • Visualization & clustering for discovery of unknown classes of pathological conditions • Gene ranking for identification of differentially expressed marker genes • Supervised classification of gene expression vectors into known classes

  8. Gene expression analysis tools • GeneClust Do et al, 2000 • dChip Li & Wong, 2001 • Clusfavor Peterson, 2002 • Genesis Sturn et al, 2002 • Snomad Colantuoni et al, 2002 • Base Saal et al, 2002 • TM4 Suite Saeed et al, 2003 • RankGene Yang et al, 2003 • Excavator Xu et al, 2003 • KnowledgeEditor Toyoda & Konagaya, 2003 • ArrayNorm Pieler et al, 2004

  9. Today’s challenge • None of the existing tools takes into account the usability profile of a physician or a biologist • Such tools can hardly be used in everyday medical practice

  10. Supervised approaches • Most well-known supervised approaches have been applied to the classification of gene expression vectors • Linear discriminant analysis • k-nearest neighbors • Parzen windows • Decision trees • Neural networks, etc. • Support Vector Machines (Brown et al, 2000; Furey et al, 2000; Ryu & Cho, 2000; Dudoit et al, 2002; Lu & Han, 2003; Aliferis et al, 2003)

  11. Support Vector Machines • Robust binary classifiers • Not easily affected by the dimensionality of the feature vectors • SVM methods for classification into multiple classes • One vs one • One vs all • Directed Acyclic Graph (DAG) • Weston & Watkins • Crammer & Singer (Weston & Watkins, 1999; Platt, 2000; Yeang et al, 2001; Crammer & Singer, 2001; Hsu & Lin, 2002)
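Of the multiclass schemes listed above, one-vs-one is the simplest to sketch: a binary classifier is trained for every pair of classes, and the predicted class is the one that collects the most pairwise votes. In the Python sketch below the pairwise classifiers are hypothetical threshold rules standing in for trained SVMs, and ties are broken by class order, which is an arbitrary choice.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(x, classes, pairwise):
    """Combine N(N-1)/2 binary classifiers by majority voting.

    pairwise[(a, b)] is a binary decision function that returns either a or b.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1
    # max() returns the first class with the highest count, so ties go to the earlier class.
    return max(classes, key=lambda c: votes[c])


# Hypothetical stand-ins for trained binary SVMs on a 2-gene input vector.
classes = ["A", "B", "C"]
pairwise = {
    ("A", "B"): lambda x: "A" if x[0] > 0 else "B",
    ("A", "C"): lambda x: "A" if x[1] > 0 else "C",
    ("B", "C"): lambda x: "B" if x[0] + x[1] > 0 else "C",
}

print(one_vs_one_predict([1.0, 1.0], classes, pairwise))   # A (votes: A=2, B=1)
print(one_vs_one_predict([1.0, -2.0], classes, pairwise))  # C (votes: A=1, C=2)
```

Note that every pairwise node here receives the same input vector, which matches the observation on the next slide that these schemes use a common gene set in each SVM node.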

  12. About multiclass SVM classifiers • They all lead to comparable results • They utilize a common, constant set of genes as input in each SVM node • They assume that the various pathological conditions correspond to separable clusters in the same gene space (Hsu et al, 2002; Lee et al, 2003; Statnikov et al, 2004)

  13. The proposed approach • We take into account that • Only a small subset of genes is differentially expressed in each type or subtype of a pathological condition • We propose • Combining SVMs in a cascading architecture that embeds gene selection in its structure

  14. Cascading architecture • Pre-processing Unit • Diagnostic Unit • Classifies input vector x into one of the classes ω1, ω2, …, ωN

  15. Cascading architecture • Poor-quality cDNA targets generate missing values (Troyanskaya et al, 2001)
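The slides do not say how this system fills in missing values. The Troyanskaya et al. paper cited here evaluates several methods, including k-nearest-neighbour imputation (KNNimpute), so the Python sketch below shows a simplified version of that idea: rows are genes, and a missing entry is replaced using the same column of the k genes whose observed values are closest. The plain (unweighted) mean and the root-mean-square distance over shared columns are simplifying assumptions.

```python
import math


def knn_impute(matrix, k=2):
    """Return a copy of matrix with None entries filled from the k nearest rows.

    Rows are genes and columns are samples. Distance between two rows is the
    root-mean-square difference over the columns observed in both rows; the
    fill value is the plain mean of the neighbours' values in the missing column.
    """
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(matrix):
                if m == i or other[j] is None:
                    continue  # a neighbour must have the value we want to borrow
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = candidates[:k]
            if nearest:  # leave the gap if no usable neighbour exists
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


genes = [
    [1.0, 2.0, None],  # missing value in the third sample
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 2.8],
    [5.0, 5.0, 9.0],
]
print(knn_impute(genes))  # first row becomes roughly [1.0, 2.0, 2.9]
```

The dissimilar fourth gene is ignored because only the two closest profiles are averaged.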

  16. Cascading architecture • Normalization facilitates comparability of samples (Zhang & Shmulevich, 2002)
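The normalization method is not specified on the slide. A common, simple choice is to rescale each sample's expression vector to zero mean and unit variance, which removes array-wide differences in scale. The Python sketch below shows that per-sample z-score purely as an illustrative assumption, not as the method of Zhang & Shmulevich or of this system.

```python
import statistics


def zscore_normalize(sample):
    """Rescale one sample's expression vector to zero mean and unit (population) variance."""
    mu = statistics.fmean(sample)
    sd = statistics.pstdev(sample)
    if sd == 0:
        return [0.0 for _ in sample]  # constant vector: nothing to rescale
    return [(v - mu) / sd for v in sample]


# Two hypothetical samples measured on different intensity scales.
sample_a = [1.0, 3.0, 2.0]
sample_b = [10.0, 30.0, 20.0]
print(zscore_normalize(sample_a))
print(zscore_normalize(sample_b))  # same normalized profile as sample_a
```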

  17. Cascading architecture • For each block, a subset of genes is selected by ranking • Three ranking criteria are available

  18. Gene ranking criteria
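The three ranking criteria themselves are not recoverable from this transcript. As one illustration of how per-gene ranking works, the Python sketch below scores each gene with the signal-to-noise ratio of Golub et al. (1999), (mu_pos - mu_neg) / (sigma_pos + sigma_neg), and keeps the top-k genes for a binary split; whether this criterion is among the three used by the system is an assumption.

```python
import math
import statistics


def snr(pos, neg):
    """Signal-to-noise ratio (mu_pos - mu_neg) / (sigma_pos + sigma_neg) for one gene."""
    diff = statistics.fmean(pos) - statistics.fmean(neg)
    spread = statistics.pstdev(pos) + statistics.pstdev(neg)
    if spread == 0:
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / spread


def rank_genes(samples, labels, top_k):
    """Return indices of the top_k genes by |SNR| between label 1 and label 0."""
    scores = []
    for g in range(len(samples[0])):
        pos = [s[g] for s, y in zip(samples, labels) if y == 1]
        neg = [s[g] for s, y in zip(samples, labels) if y == 0]
        scores.append((abs(snr(pos, neg)), g))
    scores.sort(key=lambda t: -t[0])  # stable sort: equal scores keep gene order
    return [g for _, g in scores[:top_k]]


# Hypothetical data: 4 samples x 3 genes, two samples per class.
samples = [
    [5.0, 1.0, 0.2],
    [6.0, 1.2, 0.1],
    [1.0, 1.1, 0.3],
    [0.0, 0.9, 0.0],
]
labels = [1, 1, 0, 0]
print(rank_genes(samples, labels, top_k=2))  # [0, 1]
```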

  19. Cascading architecture • The classification module Cj is trained autonomously, using a subset Xj of the available training samples

  20. Cascading architecture • A standard binary SVM classifier implements each classification module
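The slides state that the cascade uses N-1 binary SVM modules, each trained on its own subset Xj of the training samples with its own ranked gene subset, but they do not give the exact wiring. The Python sketch below assumes one plausible reading: module Cj separates class ωj from all classes after it, is trained only on samples of the classes not yet handled by earlier modules, and a test vector falls through the cascade until some module accepts it. The linear SVM is a bare-bones soft-margin model trained by subgradient descent on the hinge loss, gene selection uses a simple class-mean-difference score, and the class order, learning rate, regularization and all function names are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Soft-margin linear SVM (labels +1/-1): full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:
                # Hinge term is active for this sample.
                grad_w = [gj - yi * xj / n for gj, xj in zip(grad_w, xi)]
                grad_b -= yi / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def select_genes(X, y, top_k):
    """Stand-in ranking: indices of the top_k genes by |mean(+1) - mean(-1)|."""
    def mean(values):
        return sum(values) / len(values)
    scores = []
    for g in range(len(X[0])):
        pos = [x[g] for x, t in zip(X, y) if t == 1]
        neg = [x[g] for x, t in zip(X, y) if t == -1]
        scores.append((abs(mean(pos) - mean(neg)), g))
    scores.sort(key=lambda s: -s[0])
    return [g for _, g in scores[:top_k]]


def train_cascade(X, labels, class_order, top_k=1):
    """Train N-1 modules; module j separates class_order[j] from the classes after it."""
    modules = []
    for j, cls in enumerate(class_order[:-1]):
        remaining = set(class_order[j:])
        # X_j: only samples whose class has not been handled by an earlier module.
        Xj = [x for x, c in zip(X, labels) if c in remaining]
        yj = [1 if c == cls else -1 for c in labels if c in remaining]
        genes = select_genes(Xj, yj, top_k)  # each module gets its own gene subset
        w, b = train_linear_svm([[x[g] for g in genes] for x in Xj], yj)
        modules.append((cls, genes, w, b))
    return modules


def predict_cascade(modules, x, class_order):
    """Pass x down the cascade; the first module with a positive score decides."""
    for cls, genes, w, b in modules:
        if sum(wi * x[g] for wi, g in zip(w, genes)) + b > 0:
            return cls
    return class_order[-1]  # rejected by every module


# Hypothetical training data: 3 genes, 2 samples per class.
X = [[3.0, 0.0, 0.5], [2.5, 0.2, 0.4],   # class A: high on gene 0
     [0.0, 3.0, 0.5], [0.2, 2.5, 0.6],   # class B: high on gene 1
     [0.0, 0.0, 0.5], [0.1, 0.2, 0.4]]   # class C: low on both
labels = ["A", "A", "B", "B", "C", "C"]
order = ["A", "B", "C"]
modules = train_cascade(X, labels, order)
print(len(modules))                                        # 2 (N-1 binary classifiers)
print(predict_cascade(modules, [2.8, 0.1, 0.5], order))    # A
print(predict_cascade(modules, [0.1, 2.8, 0.5], order))    # B
print(predict_cascade(modules, [0.05, 0.05, 0.5], order))  # C
```

In this toy run the first module picks gene 0 and the second picks gene 1, illustrating how a cascade can give each node a different gene subset instead of one shared gene space.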

  21. Model selection • The best architecture is determined by leave-one-out cross-validation • Selection bias is minimized • Gene selection and parameter tuning are performed on the training samples within each leave-one-out iteration (Ambroise & McLachlan, 2002; Varma & Simon, 2006)
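The key point of this slide is that every data-dependent step, including gene selection, must be repeated inside each leave-one-out iteration using only that iteration's training samples; selecting genes on the full data set first would leak information from the held-out sample and bias the error estimate. The Python sketch below illustrates that protocol for a binary problem with a hypothetical stand-in model (single-gene selection plus nearest class centroid), not the cascading SVM itself.

```python
def fit_stand_in(X, y):
    """Hypothetical stand-in model for a binary problem: choose the gene with the
    largest class-mean difference, then store each class's mean on that gene."""
    classes = sorted(set(y))

    def class_mean(g, c):
        values = [x[g] for x, t in zip(X, y) if t == c]
        return sum(values) / len(values)

    # Gene selection sees only the samples passed in (the training fold).
    gene = max(range(len(X[0])),
               key=lambda g: abs(class_mean(g, classes[0]) - class_mean(g, classes[-1])))
    return gene, {c: class_mean(gene, c) for c in classes}


def predict_stand_in(model, x):
    """Assign x to the class whose centroid on the selected gene is closest."""
    gene, centroids = model
    return min(centroids, key=lambda c: abs(x[gene] - centroids[c]))


def loocv_error(X, y, fit, predict):
    """Leave-one-out error where fit (including gene selection) is redone per iteration."""
    errors = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        model = fit(X_train, y_train)  # the held-out sample i is never seen here
        if predict(model, X[i]) != y[i]:
            errors += 1
    return errors / len(X)


# Hypothetical data: gene 0 separates the classes, gene 1 is noise.
X = [[2.0, 0.3], [2.2, 0.1], [1.9, 0.5], [0.1, 0.4], [0.0, 0.2], [0.3, 0.0]]
y = [1, 1, 1, 0, 0, 0]
print(loocv_error(X, y, fit_stand_in, predict_stand_in))  # 0.0
```

Choosing between candidate architectures would wrap this loop: each candidate's LOO error is computed this way and the lowest one is kept.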

  22. Graphical User Interface

  23. Results • Prostate cancer data • 112 samples (patients) • Classes • 62 primary prostate tumors • 41 normal prostate specimens • 9 pelvic lymph node metastases • 44,016 gene expression values per sample • (Lapointe et al, 2004)

  24. Results Minimum error 6.3% using 1 input gene

  25. Results • Colon cancer dataset (Alon et al, 1999) • Minimum classification error 9.7% • Lung cancer dataset (Bhattacharjee et al, 2001) • Minimum classification error 1.5%

  26. Conclusions • We presented a user-friendly system that implements a cascading SVM architecture • It targets the classification of gene expression data into known classes • The cascading architecture automatically tunes its parameters and determines its optimal configuration • In most cases it achieves a diagnostic accuracy above 90%

  27. Conclusions • It usually performs better than the one-vs-one SVM combination method • It uses N-1 binary SVM classifiers, whereas one-vs-one uses N(N-1)/2 • It could be used in everyday clinical practice • Our future work includes the adoption of incremental learning approaches
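The two classifier counts quoted above grow very differently with the number of classes N. The short Python check below simply evaluates both formulas: for three classes, as in the prostate data set, they differ by one, but for ten classes the cascade needs 9 binary SVMs against 45 for one-vs-one.

```python
def binary_classifier_counts(n_classes):
    """Number of binary SVMs trained by each multiclass scheme for N classes."""
    return {
        "cascade": n_classes - 1,                       # N-1
        "one_vs_one": n_classes * (n_classes - 1) // 2,  # N(N-1)/2
    }


for n in (3, 10):
    print(n, binary_classifier_counts(n))
# 3 {'cascade': 2, 'one_vs_one': 3}
# 10 {'cascade': 9, 'one_vs_one': 45}
```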

  28. Thank you
