
Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC

Rainer Spang, Max Planck Institute for Molecular Genetics, Berlin; Harry Zuzan, Carrie Blanchette, Erich Huang, Holly Dressman, Jeff Marks, Joe Nevins, Mike West


Presentation Transcript


  1. Computational Diagnostics based on Large Scale Gene Expression Profiles using MCMC • Rainer Spang, Max Planck Institute for Molecular Genetics, Berlin • Harry Zuzan, Carrie Blanchette, Erich Huang, Holly Dressman, Jeff Marks, Joe Nevins, Mike West, Duke Medical Center & Duke University

  2. Estrogen Receptor Status • 7000 genes • 49 breast tumors • 25 ER+ • 24 ER-

  3. Tumor → Chip → 7000 Numbers

  4. Given: 7000 Numbers • Wanted: The probability that the tumor is ER+ (e.g. 89%)

  5. 7000 Numbers Are More Numbers Than We Need • Predict ER status based on the expression levels of super genes

  6. Singular Value Decomposition • Data = Loadings × Singular values × Expression levels of super genes (orthogonal matrix)
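A minimal sketch of this factorization, assuming NumPy and a genes-by-tumors data matrix. The function name `super_genes` and the random toy matrix are illustrative only; they stand in for the real 7000 x 49 expression data, which is not reproduced here.

```python
import numpy as np

def super_genes(X):
    """Factor a genes-by-tumors matrix X as X = A @ diag(d) @ F.

    A: loadings (genes x tumors, orthonormal columns)
    d: singular values, largest first
    F: super-gene expression levels (tumors x tumors, orthogonal);
       row k holds super gene k measured on every tumor.
    """
    A, d, F = np.linalg.svd(X, full_matrices=False)
    return A, d, F

# Toy stand-in for the real data: 200 genes x 10 tumors of random numbers.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
A, d, F = super_genes(X)
```

With 49 tumors there can be at most 49 super genes, so each tumor is summarized by 49 numbers instead of 7000.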

  7. Probit Model • Class of tumor i • Distribution function of a standard normal • Regression weight for super gene i • Expression level of super gene i
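The labels on this slide describe a standard probit regression: the probability that a tumor is ER+ is the standard normal distribution function applied to a weighted sum of its super-gene expression levels. A small sketch using only the standard library; the function names are hypothetical.

```python
from math import erf, sqrt

def std_normal_cdf(t):
    """Distribution function of a standard normal, via the error function."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def prob_er_positive(weights, super_gene_levels):
    """Probit probability of ER+ for one tumor.

    weights: regression weight for each super gene
    super_gene_levels: the tumor's expression level for each super gene
    """
    score = sum(b * x for b, x in zip(weights, super_gene_levels))
    return std_normal_cdf(score)
```

A score of zero corresponds to a probability of exactly one half; large positive scores push the probability toward 1 (ER+), large negative scores toward 0 (ER-).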

  8. Overfitting • Using only a small number of super genes is not robust • When using many (or all) super genes, the linear model is easily saturated, i.e. several models fit the training data perfectly • Consequence: for a new patient, some of these models predict that she is ER+ and others that she is ER-
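A toy illustration of the saturation problem, with made-up numbers: two training tumors, two super genes, and two weight vectors that both classify the training tumors perfectly yet disagree about a new patient. All values and names here are invented for the example.

```python
def predicts_er_positive(weights, levels):
    """Classify as ER+ when the weighted sum of super-gene levels is positive."""
    return sum(b * x for b, x in zip(weights, levels)) > 0

# Two training tumors described by two super genes (invented values).
train_levels = [(1.0, 0.0), (0.0, 1.0)]
train_is_er_positive = [True, False]

# Two different models; both reproduce the training diagnoses exactly.
model_a = (1.0, -1.0)
model_b = (3.0, -1.0)

new_patient = (1.0, 2.0)
prediction_a = predicts_er_positive(model_a, new_patient)  # score 1 - 2 = -1
prediction_b = predicts_er_positive(model_b, new_patient)  # score 3 - 2 = +1
```

The training data alone cannot tell the two models apart, which is the situation the slide describes.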

  9. Given the Few Profiles With Known Diagnosis: • The uncertainty about the right model is high • The variance of the model weights is large • The likelihood landscape is flat • We need additional model assumptions to solve the problem

  10. Informative Priors • Likelihood • Prior • Posterior
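The three quantities on this slide are tied together by Bayes' rule. Writing β for the vector of super-gene weights and y for the known diagnoses (notation assumed here, not taken from the slides):

```latex
p(\beta \mid y) \;\propto\; p(y \mid \beta)\, p(\beta)
```

When the likelihood p(y | β) is flat, the posterior on the left takes most of its shape from the prior p(β), which is why the choice of prior matters so much in this setting.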

  11. If the Prior Is Chosen Badly: • We cannot reproduce the diagnosis of the training profiles any more • We still cannot identify the model • The diagnosis is driven mostly by the additional assumptions and not by the data

  12. The Prior Needs to Be Designed in 49 Dimensions • Shape? • Center? • Orientation? • Not too narrow ... not too wide

  13. Shape • Multidimensional normal, for simplicity

  14. Center • Assumptions on the model correspond to assumptions on the diagnosis

  15. Orientation • Orthogonal super genes!

  16. Not Too Narrow ... Not Too Wide • Auto-adjusting model • Scales are hyper parameters with their own priors

  17. Prior given the hyper parameter • Rescaling by singular values • Hyper parameter • Independent super genes • Unbiased prior

  18. A prior for the hyper parameters • Conjugate prior • Flexibility for • Symmetric U-Shaped prior for k=2 or k=3

  19. Latent Variable • Albert & Chib 1993
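The latent-variable formulation of Albert & Chib rewrites the probit model in terms of an unobserved continuous score per tumor. In notation assumed here (z_i for the latent score of tumor i, x_ij for the level of super gene j in tumor i, β_j for its weight):

```latex
z_i = \sum_j \beta_j x_{ij} + \varepsilon_i, \qquad \varepsilon_i \sim N(0, 1), \qquad
y_i = \begin{cases} 1 \ (\text{ER+}) & \text{if } z_i > 0 \\ 0 \ (\text{ER-}) & \text{if } z_i \le 0 \end{cases}
```

Integrating out z_i recovers the probit probability, Pr(y_i = 1) = Φ(Σ_j β_j x_ij). Conditional on z, the model is an ordinary linear regression with unit noise variance, which is what makes the conditional distributions tractable.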

  20. MCMC • Gibbs Sampler • Sequential updates of conditional distributions • All conditional posteriors can be calculated analytically • West 2001, Albert & Chib 1993
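A simplified sketch of such a Gibbs sampler for the probit model, using only the standard library. It is not the authors' implementation: the prior variances `prior_var` are held fixed here, whereas the talk places priors on the scales and updates them as well, and the weights are updated one at a time for simplicity. All function and parameter names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def sample_negative_normal(mean, rng):
    """Draw from N(mean, 1) restricted to values below zero (inverse-CDF method)."""
    mass_below_zero = STD_NORMAL.cdf(-mean)
    # Keep the argument of inv_cdf strictly positive; assumes mean is not extreme.
    u = max(rng.random() * mass_below_zero, 1e-300)
    return mean + STD_NORMAL.inv_cdf(u)

def gibbs_probit(X, y, prior_var, n_iter=2000, burn_in=500, seed=0):
    """Sample weights beta with independent N(0, prior_var[j]) priors.

    X: list of rows, one per tumor, holding super-gene levels
    y: list of 0/1 class labels (1 = ER+)
    Returns the post-burn-in draws of beta.
    """
    rng = random.Random(seed)
    k = len(X[0])
    beta = [0.0] * k
    draws = []
    for it in range(n_iter):
        # Step 1: latent score z_i given beta and the label (truncated normal).
        z = []
        for row, label in zip(X, y):
            m = sum(b * x for b, x in zip(beta, row))
            if label == 1:
                z.append(-sample_negative_normal(-m, rng))  # z_i > 0
            else:
                z.append(sample_negative_normal(m, rng))    # z_i < 0
        # Step 2: each weight given z and the other weights (normal).
        for j in range(k):
            precision = sum(row[j] ** 2 for row in X) + 1.0 / prior_var[j]
            weighted_resid = sum(
                row[j] * (zi - sum(b * x for b, x in zip(beta, row)) + beta[j] * row[j])
                for row, zi in zip(X, z)
            )
            beta[j] = rng.gauss(weighted_resid / precision, precision ** -0.5)
        if it >= burn_in:
            draws.append(list(beta))
    return draws

def predictive_probability(draws, x_new):
    """Average the probit probability of ER+ over the posterior draws."""
    probs = [STD_NORMAL.cdf(sum(b * x for b, x in zip(beta, x_new))) for beta in draws]
    return sum(probs) / len(probs)
```

For a new tumor, `predictive_probability(draws, x_new)` averages over all sampled models rather than committing to a single one, which is how a statement such as "the probability that the tumor is ER+" can be produced.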

  21. What are the additional assumptions that came in by the prior? • The model cannot be dominated by only a few super genes (genes!) • The diagnosis is based on global changes in the expression profiles influenced by many genes • The assumptions are neutral with respect to the individual diagnosis

  22. Which Genes Have Driven the Prediction?

  23. Thank you!
