From Sequence to Expression: A Probabilistic Framework

Joint work with: Eran Segal (Stanford), Nir Friedman (Hebrew U.), Daphne Koller (Stanford), Yoseph Barash (Hebrew U.), Itamar Simon (Whitehead Inst.).


Presentation Transcript


  1. From Sequence to Expression: A Probabilistic Framework. Joint work with: Eran Segal (Stanford), Nir Friedman (Hebrew U.), Daphne Koller (Stanford), Yoseph Barash (Hebrew U.), Itamar Simon (Whitehead Inst.)

  2. Understanding Cellular Processes (figure: cell-cycle phases G1, S, G2, M)
     • Complex biological processes (e.g. cell cycle)
     • Coordination of multiple events
     • Each event requires different modules
     Can we recover the regulatory circuits that control such processes?

  3. Gene Structure (figure labels: Promoter Region, Coding Region (CTAGTAGATATCGATCAG), mRNA, Protein)

  4. Gene Regulation (figure: Genes 1 to 5; sequence motif A, AGACTTCAGA; mRNA)

  5. Gene Regulation (figure: transcription factor Swi5 and Genes 1 to 5, some of them carrying motif A; mRNA)

  6. Gene Regulation (figure: Swi5 shown at motif A on Genes 1, 2 and 4; labels: Activated, More mRNA (higher expression))

  7. Gene Regulation (figure: Swi5 shown at motif A on Genes 1, 2 and 4; a second motif B, AGTTGA, on Genes 1, 3, 4 and 5; labels: Activated, mRNA)

  8. Gene Regulation (figure: Swi5 shown at motif A on Genes 1, 2 and 4; Ndd1 shown at motif B on Genes 1, 3, 4 and 5; labels: A + B, Activated, mRNA)

  9. Goal (figure: a long promoter sequence in which the short motif GACTGC recurs; labels: G1, G2, t1 Motif, t2 Motif, R(t1), R(t2); example sequences: ACTAGTGCTGA + CTATTATTGCA, CTGATGCTAGC)

  10. Model of Gene Regulation
      Probabilistic Relational Models (PRMs): Pfeffer and Koller (1998), Friedman et al. (1999), Segal et al. (2001)
      • Gene: Sequence (promoter sequences); Regulation (by transcription factors)
      • Experiment: Context; Cluster
      • Expression: expression measurements

  11. Regulation to Expression (figure: schema with Gene: R(t1), R(t2); Experiment: Exp. type, Exp. cluster; Expression: Level)
      • R(t1) = yes: t1 regulates the gene
      • R(t1) = no: t1 does not regulate the gene

  12. Regulation to Expression: CPD (figure: schema with Gene: R(t1), R(t2); Experiment: Exp. type, Exp. cluster; Expression: Level)
      R(t1)  R(t2)  Etype  P(Level)
      0      0      I      -0.7   1.2
      0      1      II      0.8   0.6
      …
      (figure: two P(Level) curves over Level, marked at -0.7 and 0.8)

  13. Modeling Context Specificity (figure: schema with Gene: R(t1), R(t2); Experiment: Exp. type, Exp. cluster; Expression: Level)
      • Gaussian decision tree
      • T1 only relevant in G1
      • T2 only relevant in G2
      (figure: tree with the tests "Exp. type = G1", "R(t1) = yes" and "R(t2) = yes" on true/false branches; leaves hold P(Level) distributions, labeled Level 2, Level 0 and Level 3)
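
A Gaussian decision tree of the kind named on slide 13 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class names, the context keys (`exp_type`, `r_t1`, `r_t2`), the assignment of leaves to branches, and the leaf means and standard deviations are all assumptions chosen to mirror the slide's labels.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class Leaf:
    """Gaussian over the expression Level at one leaf of the tree."""
    mean: float
    std: float


@dataclass
class Split:
    """Internal node: a yes/no test on the parents of Level."""
    test: Callable[[Dict[str, object]], bool]
    if_true: "Node"
    if_false: "Node"


Node = Union[Leaf, Split]


def find_leaf(node: Node, context: Dict[str, object]) -> Leaf:
    """Follow the tests from the root until a leaf is reached."""
    while isinstance(node, Split):
        node = node.if_true if node.test(context) else node.if_false
    return node


def log_density(node: Node, context: Dict[str, object], level: float) -> float:
    """log P(Level = level | context) under the Gaussian at the selected leaf."""
    leaf = find_leaf(node, context)
    z = (level - leaf.mean) / leaf.std
    return -0.5 * z * z - math.log(leaf.std * math.sqrt(2.0 * math.pi))


# Hypothetical tree: R(t1) is consulted only when Exp. type is G1,
# R(t2) only otherwise. Leaf parameters are illustrative values.
TREE = Split(
    lambda c: c["exp_type"] == "G1",
    Split(lambda c: bool(c["r_t1"]), Leaf(2.0, 1.0), Leaf(0.0, 1.0)),
    Split(lambda c: bool(c["r_t2"]), Leaf(3.0, 1.0), Leaf(0.0, 1.0)),
)
```

Because the G1 branch never tests `r_t2`, changing R(t2) in a G1 experiment leaves the density of Level unchanged, which is the context specificity the slide describes.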

  14. From Sequence to Regulation
      • Assumptions:
        • Binding site is of length k
        • Binding may occur at any k-mer
        • TF regulates gene if binding occurs anywhere
      • PSSM:
        • Background distribution
        • Motif distribution
        • Discriminative training
      (the slide's formula, introduced by "where", is not preserved in the transcript)
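
The three assumptions on slide 14 can be illustrated with a small scoring sketch. The slide's own formula is not preserved, so everything below is an assumption made for illustration: a uniform background, a PSSM built from a hypothetical consensus, a log-odds score per k-mer, and a logistic function applied to the log of the average likelihood ratio over all k-mers. The function names and the `bias` parameter are hypothetical.

```python
import math

# Assumed uniform background distribution over nucleotides.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def make_pssm(consensus, hit=0.85):
    """Toy motif distribution: `hit` on the consensus base, the rest spread evenly."""
    miss = (1.0 - hit) / 3.0
    return [{b: (hit if b == c else miss) for b in "ACGT"} for c in consensus]


def kmer_log_odds(kmer, pssm, background=BACKGROUND):
    """Sum over positions of log(motif probability / background probability)."""
    return sum(math.log(col[b] / background[b]) for col, b in zip(pssm, kmer))


def regulation_probability(promoter, pssm, bias=0.0):
    """Squash the average k-mer likelihood ratio through a logistic function.

    Every k-mer of the promoter contributes, so a strong match anywhere
    in the promoter pushes the probability toward 1.
    """
    k = len(pssm)
    n = len(promoter) - k + 1
    if n <= 0:
        raise ValueError("promoter is shorter than the motif")
    ratios = [math.exp(kmer_log_odds(promoter[j:j + k], pssm)) for j in range(n)]
    x = bias + math.log(sum(ratios) / n)
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical usage with a length-6 motif.
pssm = make_pssm("GACTGC")
p_with = regulation_probability("TTTTGACTGCTTTT", pssm)
p_without = regulation_probability("TTTTTTTTTTTTTT", pssm)
```

In a discriminatively trained model the PSSM entries and the bias would be fitted to maximize the probability of the regulation labels given the sequence; here they are fixed by hand.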

  15. Localization Assay
      • Localization data: measure TF binding to promoter of each gene (assign binding confidence). Simon et al. (2001)

  16. Is Regulation Observed?
      • Not quite…
      • Localization is measured for specific conditions
      • Localization is measured for large DNA regions
      • Localization is noisy

  17. Localization Model (figure: Gene with R(t1) and L(t1); L(t1) is observed)
      • Localization p-value is a noisy sensor of actual regulation
      • If regulation occurs, p-value likely to be low
      • If no regulation, p-value likely to be high
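
The "noisy sensor" idea on slide 17 can be made concrete with Bayes' rule. The slide does not give the sensor distributions, so this sketch assumes Beta(alpha, 1) densities on the p-value: one concentrated near 0 when the TF regulates the gene, one concentrated near 1 when it does not. The parameter values and the prior are illustrative assumptions.

```python
def beta_alpha_1_density(p, alpha):
    """Density of Beta(alpha, 1) at p: alpha * p**(alpha - 1)."""
    return alpha * p ** (alpha - 1.0)


def posterior_regulation(p_value, prior=0.1, alpha_reg=0.2, alpha_noreg=2.0):
    """P(R = yes | localization p-value) for one TF and one gene.

    alpha_reg < 1 puts most mass on low p-values (regulation occurs);
    alpha_noreg > 1 puts most mass on high p-values (no regulation).
    All three parameters are assumed values, not taken from the slides.
    """
    p = min(max(p_value, 1e-12), 1.0)  # avoid 0 ** negative power
    like_reg = beta_alpha_1_density(p, alpha_reg)
    like_noreg = beta_alpha_1_density(p, alpha_noreg)
    numerator = prior * like_reg
    return numerator / (numerator + (1.0 - prior) * like_noreg)
```

A low p-value such as 0.001 moves the posterior well above the prior, while a high p-value such as 0.9 moves it below, so the localization measurement informs R(t1) without determining it.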

  18. Model Learning
      • Structure Learning:
        • Tree structure
      • Missing Data:
        • Experiment cluster
        • Regulation variables
      • Motif Model:
        • Parameter estimation
      Methods listed alongside: Bayesian score; heuristic search; Expectation Maximization; discriminative training (conjugate gradient)

  19. Model Learning (figure: full schema. Gene: promoter sequence s1 … sk (e.g. ACGCCTA), R(t1), R(t2), L(t1); Experiment: Exp. type, Exp. cluster; Expression: Level; inputs labeled Experimental Details + Localization Data)

  20. Resulting Bayesian Network (figure: the schema unrolled over three genes and two experiments. Per gene i: sequence nodes s1 … sk, R(t1), R(t2), L(t1), L(t2); per experiment: Exp. type, Exp. cluster; per gene i and experiment j: Level i,j)

  21. Model Learning: E-Step (figure: the unrolled Bayesian network over three genes and two experiments; label: Loopy belief propagation)

  22. Model Learning: M-Step (figure: the unrolled Bayesian network over three genes and two experiments; labels: Conjugate Gradient, Standard ML estimation)
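
The alternation between the E-step and the M-step can be illustrated on a deliberately reduced problem: one hidden binary regulation variable per gene and one observed expression level per gene. This is only a sketch of the EM pattern, not the procedure on the slides. In this toy model the genes are independent, so the posterior is computed exactly rather than by loopy belief propagation, and the M-step is plain maximum-likelihood estimation with no conjugate-gradient motif update. The function names and the fixed `std` are assumptions.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def em_two_state(levels, std=0.5, iterations=50):
    """Fit P(R = yes) and the mean Level for R = no / R = yes by EM.

    `levels` holds one expression level per gene; R is never observed.
    Returns (mean_off, mean_on, prior_on).
    """
    mean_off, mean_on, prior_on = min(levels), max(levels), 0.5
    for _ in range(iterations):
        # E-step: posterior probability that each gene is regulated.
        resp = []
        for x in levels:
            on = prior_on * gaussian_pdf(x, mean_on, std)
            off = (1.0 - prior_on) * gaussian_pdf(x, mean_off, std)
            resp.append(on / (on + off))
        # M-step: maximum-likelihood estimates from the soft assignments.
        total_on = sum(resp)
        total_off = len(levels) - total_on
        mean_on = sum(r * x for r, x in zip(resp, levels)) / total_on
        mean_off = sum((1.0 - r) * x for r, x in zip(resp, levels)) / total_off
        prior_on = total_on / len(levels)
    return mean_off, mean_on, prior_on
```

Calling `em_two_state` on levels drawn from two well-separated groups should recover means close to the two group centers and a prior close to the fraction of genes in the higher group.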

  23. Experimental Results: Yeast
      • Cell Cycle expression data (Spellman et al.)
      • Localization data for 9 TFs (Simon et al.)
      • Yeast genome (promoters)

  24. Generalization: gene log-likelihood (figure: schema with Gene: R(t1), R(t2); Experiment: Exp. Cluster; Expression: Level)
      • Clustering genes: -112.24

  25. Generalization: gene log-likelihood (figure: schema with Gene: L(t1), L(t2); Experiment: Exp. type; Expression: Level)
      • Clustering genes: -112.24
      • Localization: -121.48

  26. Generalization: gene log-likelihood (figure: schema with Gene: R(t1), R(t2), L(t1), L(t3); Experiment: Exp. type, Exp. Cluster; Expression: Level)
      • Clustering genes: -112.24
      • Localization: -121.48
      • Localization + exp. cluster: -103.76

  27. Generalization: gene log-likelihood (figure: schema with Gene: promoter s1 … sk, R(t1), R(t2), L(t1), L(t3); Experiment: Exp. type, Exp. Cluster; Expression: Level)
      • Clustering genes: -112.24
      • Localization: -121.48
      • Localization + exp. cluster: -103.76
      • + Sequence: -94.59

  28. Generating Hypotheses
      Example: Genes regulated by Swi6, but not by Mcm1 and not by Fkh2, exhibit a unique expression pattern in phase G1 of the cell cycle.
      Gene functions: DNA repair [P 3e-09], DNA synthesis [P 7e-05]

  29. Expression vs Regulation (figure: values from -1 to 1 over the cdc15, cdc28, elu and alpha time courses; series: Phase, Swi5 regulated, Swi5 expression)
      Genes predicted to be regulated by Swi5 are probably real Swi5 targets

  30. Combinatorial Effects (figure: values from -1 to 1 over the cdc15, cdc28, elu and alpha time courses; series: Phase, Mcm1 & Ndd1, Mcm1 & Ace2, Mcm1 & Swi5)

  31. Motifs Found
      • Ndd1 (figure labels: Simon et al. 17; Expanded Set 28; Remaining Genes 1)
      Expanded set identified additional genes regulated by Ndd1

  32. Conclusions
      • Unified probabilistic model explaining gene regulation using sequence, localization and expression data
      • Models complex interactions between regulators
      • Discriminative model maximizing P(Expr. | Seq.)
      • Sequence data helps explain expression patterns
