Case-Based Reasoning with Bayesian Model Averaging: Improved Survival Analysis on Microarray Data

This talk explores how Case-Based Reasoning (CBR) can contribute to bioinformatics research, focusing on microarray data analysis. It introduces the Bayesian Model Averaging (BMA) approach and its application in feature selection for microarray data classification and prediction. The talk also discusses the challenges and advantages of using CBR and BMA in this context.

Presentation Transcript


  1. Case Based Reasoning with Bayesian Model Averaging: an Improved Method for Survival Analysis on Microarray Data University of Washington Institute of Technology Tacoma, WA, USA Isabelle Bichindaritz Ecole des Hautes Etudes en Santé Publique Département Infobiostat Rennes, France

  2. Purpose of this Talk • Once upon a time … • There was biology (~1800), and • There were computers (~1920) • Of their common interests was born bioinformatics (~1979) … • Question: • How can CBR contribute to bioinformatics research? • An example: microarray data analysis ICCBR '10

  4. NCBI, 2004

  5. Bioinformatics Challenges • Frequent tasks in bioinformatics • Similarity search in genetic sequences • Microarray data analysis • Macromolecule shape prediction • Evolutionary tree construction • Gene regulatory network mining

  6. Bioinformatics Challenges • Microarray data analysis • Microarrays are made from a collection of purified DNAs. A drop of each type of DNA in solution is placed onto a specially prepared glass microscope slide by an arraying machine. • Please note that … • … the human genome contains about 30,000 genes. • … a microarray can contain thousands or tens of thousands of relatively short nucleotide sequences of known sequence.

  7. Bioinformatics Challenges • The end product of a comparative hybridization experiment is a scanned array image.

  8. Bioinformatics Challenges

  9. Bioinformatics Challenges • Microarray applications • Determine relative DNA levels associated with a huge number of known and predicted genes in a single experiment. • The most attractive application of microarrays is in the study of differential gene expression in disease. • The up- or down-regulation of gene activity can either be the cause of the pathophysiology or the result of the disease. • Every single gene is measured accurately. • Sensitivity: very high – detect the presence of one transcript in one-tenth of a cell.

  10. Bioinformatics Challenges • Data mining challenges • Volume of data (gigabytes, number of features) • Characteristics of data (specific constraints) • Domain-specific knowledge (expert interpretation)

  11. BMA-CBR System

  12. BMA-CBR System • The BMA-CBR system performs feature selection through BMA before using CBR for microarray data classification and prediction (survival analysis) • Introduction and motivation of variable selection • What is Bayesian Model Averaging (BMA)? • One approach: the iterative BMA algorithm • Application 1: Chronic Myeloid Leukemia (CML) • Application 2: Survival analysis • Presentation of CBR

  13. Bayesian Model Averaging • Feature selection • Used to select a subset of relevant features for building robust learning models in machine learning. • Often used in supervised learning. • Select relevant features from the training set (for which class labels are known). • Apply the selected features to the test set.

  14. Bayesian Model Averaging • Feature selection • A minimal set of relevant genes for future prediction or assay development

  15. Bayesian Model Averaging • Typical variable selection methods – one variable at a time • Examples: • T-test • Between-group to within-group sum of squares (BSS/WSS) [Dudoit et al. 2001]
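The BSS/WSS criterion on slide 15 scores one gene at a time by comparing how far the class means sit from the overall mean with how much the values spread inside each class. A minimal Python sketch follows, assuming the expression values of a single gene arrive as a plain list alongside the class labels; the function name `bss_wss_ratio` and the toy data are illustrative and do not come from the talk.

```python
from collections import defaultdict


def bss_wss_ratio(values, labels):
    """Between-group to within-group sum-of-squares ratio for one gene."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    overall_mean = sum(values) / len(values)
    bss = 0.0  # spread of the class means around the overall mean
    wss = 0.0  # spread of the samples around their own class mean
    for members in groups.values():
        class_mean = sum(members) / len(members)
        bss += len(members) * (class_mean - overall_mean) ** 2
        wss += sum((v - class_mean) ** 2 for v in members)

    # A gene that is constant within every class separates them perfectly.
    return bss / wss if wss > 0 else float("inf")


# Toy data (hypothetical): three samples per class.
labels = [0, 0, 0, 1, 1, 1]
informative = bss_wss_ratio([1, 2, 3, 7, 8, 9], labels)
noisy = bss_wss_ratio([1, 8, 2, 9, 3, 7], labels)
```

A larger ratio means the gene separates the classes better, so ranking genes by this ratio in decreasing order gives a univariate ordering of the kind the talk describes.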

  16. Bayesian Model Averaging • Multivariate gene selection • Our goal: consider multiple genes simultaneously • Exploit the interdependence between genes to reduce the number of relevant genes

  17. Bayesian Model Averaging • Bayesian Model Averaging (BMA) [Raftery 1995], [Hoeting et al. 1999] • A multivariate variable selection technique. • Typical model selection approaches select a model and then proceed as if the selected model has generated the data --> overconfident inferences • Advantages of BMA: • Fewer selected genes • Can be generalized to any number of classes • Posterior probabilities for selected genes and selected models

  18. Bayesian Model Averaging • BMA • Average over predictions from several models • What do we need? • Prediction with a given model k --> logistic regression • How to choose a set of “good” models? --> variable selection
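Averaging over predictions from several models can be sketched as a weighted mean in which each model's prediction is weighted by its posterior probability. The Python sketch below assumes the per-model class probabilities (for example from logistic regressions) and the posterior model probabilities are already available. The function names, gene names `A`, `B`, `C`, and all numbers are hypothetical.

```python
def bma_predict(model_probs, model_posteriors):
    """Posterior-weighted average of per-model predicted probabilities."""
    total = sum(model_posteriors)  # normalise in case weights do not sum to 1
    weighted = sum(p * w for p, w in zip(model_probs, model_posteriors))
    return weighted / total


def gene_posteriors(models, model_posteriors):
    """Posterior probability of each gene: the summed (normalised)
    posterior probability of every model that contains it."""
    total = sum(model_posteriors)
    result = {}
    for genes, weight in zip(models, model_posteriors):
        for gene in genes:
            result[gene] = result.get(gene, 0.0) + weight / total
    return result


# Hypothetical example: three models over genes A, B and C.
models = [{"A", "B"}, {"A"}, {"C"}]
posteriors = [0.5, 0.3, 0.2]
prediction = bma_predict([0.9, 0.6, 0.2], posteriors)
inclusion = gene_posteriors(models, posteriors)
```

With these toy numbers the averaged prediction is 0.5·0.9 + 0.3·0.6 + 0.2·0.2 = 0.67, and gene A, which appears in the two most probable models, gets a posterior probability of 0.8.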

  19. Bayesian Model Averaging • What models to average over? • All possible models --> way too many! • E.g. 2^30 ≈ 1 billion, 2^50 ≈ 10^15, etc. • The BMA solution: 1. “Leaps and bounds” [Furnival and Wilson 1974]: when the number of variables (genes) is <= 30, we can efficiently produce a reduced set of good models (branch and bound). 2. Cut down the number of models: discard models that are much less likely than the best model.
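Both points on slide 19 can be illustrated in a few lines of Python: the size of the full model space, and the rule of discarding models that are much less likely than the best one. The slide gives no threshold, so the cutoff `max_ratio=20.0` below is an assumption chosen for illustration, as are the function name and the toy posterior probabilities.

```python
def occams_window(model_posteriors, max_ratio=20.0):
    """Keep models whose posterior probability is within a factor
    `max_ratio` of the best model, then renormalise the survivors.

    `max_ratio` is an illustrative assumption, not a value from the talk.
    """
    best = max(model_posteriors.values())
    kept = {
        name: p
        for name, p in model_posteriors.items()
        if p * max_ratio >= best
    }
    total = sum(kept.values())
    return {name: p / total for name, p in kept.items()}


# With p candidate genes there are 2**p possible subsets (models).
n_models_30 = 2 ** 30
n_models_50 = 2 ** 50

# Hypothetical posterior model probabilities.
posteriors = {"m1": 0.60, "m2": 0.30, "m3": 0.09, "m4": 0.01}
kept = occams_window(posteriors)
```

Here `m4` is 60 times less likely than `m1` and is dropped, while `m3` (about 6.7 times less likely) stays, and the remaining probabilities are rescaled to sum to 1.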

  20. Bayesian Model Averaging • Iterative BMA algorithm [Yeung, Bumgarner, Raftery 2005] • Pre-processing step: Rank genes using BSS/WSS ratio. • Initial step: • Repeat until all genes are processed: • Output: selected genes and models with their posterior probabilities

  21. Bayesian Model Averaging • Application 1: Classification of progression of chronic myeloid leukemia (CML) • Motivation: new candidates for prognostic studies in CML

  22. Bayesian Model Averaging • Progression of CML • CML usually presents in chronic phase (CP), but in the absence of effective therapy, CP CML invariably transforms to accelerated phase (AP) disease, and then to an acute leukemia, blast crisis (BC). • BC is highly resistant to treatment, and all treatments are more successful when administered during CP. • Imatinib is most effective in early CP patients with excellent survival (86% at 7 years). • Currently there are limited clinical markers and no molecular tests that can predict the “clock” of CML progression for individual patients at the time of CP diagnosis, making it difficult to adapt therapy to the risk level of each patient.

  23. Bayesian Model Averaging • Why predictors for CML progression?

  24. Bayesian Model Averaging • Identification of CML progression biomarkers

  25. Bayesian Model Averaging • Genes associated with CML progression

  26. Bayesian Model Averaging • BMA selected genes using microarray data • Selected 6 genes over 21 models • Repeat CV 100 times • Average Brier Score = 0.21 • Average prediction accuracy = 99.17%
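The Brier score reported on slide 26 measures how far predicted probabilities fall from the observed 0/1 outcomes. The talk does not say which normalisation it uses (some papers report the sum over samples rather than the mean), so the Python sketch below takes the mean squared difference as an assumption. The function name and the toy predictions are illustrative.

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared difference between predicted probabilities and
    observed binary outcomes (0 or 1). Lower is better; 0 is perfect."""
    if len(predicted_probs) != len(outcomes):
        raise ValueError("predictions and outcomes must have equal length")
    squared_errors = [(p - y) ** 2 for p, y in zip(predicted_probs, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical predictions for four test samples.
score = brier_score([0.9, 0.2, 0.8, 0.1], [1, 0, 1, 0])
```

Unlike plain accuracy, this score also penalises predictions that are correct but underconfident, which is why the two figures are usually reported together.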

  27. Bayesian Model Averaging • PCR data: CP-early vs CP-late

  28. Bayesian Model Averaging • Summary: CML data • BMA applied to microarray data consisting of patient samples in different phases of CML identified 6 signature genes (ART4, DDX47, IGSF2, LTB4R, SCARB1, SLC25A3). • Results validated the gene signature using quantitative PCR: the 6-gene signature is highly predictive of CP-early vs CP-late. • What is next? • To identify biologically meaningful biomarkers for CML progression and response to therapy. • Biomarkers that are functionally related (connected in an underlying network) to known reference genes.

  29. Bayesian Model Averaging • Application 2: Survival analysis

  30. Bayesian Model Averaging • Results: Breast cancer data

  31. Bayesian Model Averaging • Results: Breast cancer data - Annest, Bumgarner, Raftery, Yeung. BMC Bioinformatics 2009

  32. CBR • Classification task • Similarity measure • Weights provided by BMA for selected features

  33. CBR • Classification task • Choose the class for which the average similarity score is highest
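The two CBR classification slides describe a similarity measure weighted by BMA and a rule that picks the class with the highest average similarity score. The transcript does not preserve the similarity formula itself, so the Python sketch below substitutes a simple weighted similarity, one minus the absolute difference per feature, with features assumed to be scaled to [0, 1]. The function names, weights, class labels and cases are all hypothetical.

```python
def weighted_similarity(a, b, weights):
    """Weighted similarity between two cases.

    Assumes features are scaled to [0, 1]; the per-feature similarity
    1 - |x - y| is an illustrative choice, not the talk's formula.
    """
    score = sum(w * (1.0 - abs(x - y)) for x, y, w in zip(a, b, weights))
    return score / sum(weights)


def classify(query, cases, weights):
    """Return the class with the highest average similarity to the query,
    together with the per-class averages."""
    scores_by_class = {}
    for features, label in cases:
        similarity = weighted_similarity(query, features, weights)
        scores_by_class.setdefault(label, []).append(similarity)
    averages = {
        label: sum(scores) / len(scores)
        for label, scores in scores_by_class.items()
    }
    return max(averages, key=averages.get), averages


# Hypothetical case base: two selected genes, two risk classes.
cases = [
    ([0.9, 0.8], "high"),
    ([0.8, 0.9], "high"),
    ([0.1, 0.2], "low"),
    ([0.2, 0.1], "low"),
]
weights = [0.7, 0.3]  # stand-ins for BMA-derived feature weights
label, averages = classify([0.85, 0.7], cases, weights)
```

In this toy case base the query lies closer to the "high" cases on the heavily weighted first gene, so that class has the higher average similarity and is selected.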

  34. CBR • Survival analysis task • Similarity measure • Weights provided by BMA for selected features

  35. CBR • Survival analysis task • Choose the class for which the average similarity score is highest

  36. Evaluation / Classification

  37. Evaluation / Prediction

  38. Conclusion • The combination of BMA and CBR provides excellent classification and prediction results. • It provides promising results for the application of CBR to bioinformatics tasks and data.

  39. Conclusion • Future developments • Refine risk classes into more than two risk groups. • Refine the CBR algorithm. • Test on additional datasets. • Provide automatic interpretation of the classification / prediction, both for gene selection and for case-based reasoning.

  40. Thank you for your attention! Questions?
