Outline
• Who regulates whom and when?
• Model
• Learning algorithm
• Evaluation
• Wet lab experiments
• Perspective: why does it work?
Gene Regulation: Simple Example
[Figure: a regulated gene under an activator and a repressor in three states (State 1, State 2, State 3). Labels: Regulators, Regulated gene, DNA Microarray.]
Regulation Tree
• Genes in the same module share the same regulation program
[Figure: regulation program drawn as a tree above the module genes. One node asks "Activator?" (activator expression, false/true) and one asks "Repressor?" (repressor expression, false/true). Labels: State 1, State 2, State 3.]
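A regulation tree of this kind can be sketched as a small data structure. The layout below (the repressor is consulted only when the activator is expressed) and the leaf names are assumptions made for illustration; the slide gives only the two questions and their false/true branches.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """A context of the regulation program (hypothetical name)."""
    name: str


@dataclass
class Split:
    """Internal node: asks whether a regulator is expressed."""
    regulator: str
    if_false: Union["Split", Leaf]
    if_true: Union["Split", Leaf]


def route(node: Union[Split, Leaf], expressed: Dict[str, bool]) -> str:
    """Follow the tree for one experiment and return the name of the leaf reached."""
    while isinstance(node, Split):
        node = node.if_true if expressed[node.regulator] else node.if_false
    return node.name


# Assumed layout: the repressor question is only asked when the activator is on.
tree = Split(
    regulator="Activator",
    if_false=Leaf("activator off"),
    if_true=Split(
        regulator="Repressor",
        if_false=Leaf("activator on, repressor off"),
        if_true=Leaf("activator on, repressor on"),
    ),
)

print(route(tree, {"Activator": True, "Repressor": False}))
```

Every gene in the module is routed through the same tree, which is what sharing a regulation program means here.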
Module Networks
Goal: Discover regulatory modules and their regulators
• Module genes: set of genes that are similarly controlled
• Regulation program: expression as a function of regulators
[Figure: modules, each with a regulation tree; the example tree splits on HAP4 and CMK1 (false/true).]
Module Network Probabilistic Model
• Module: what module does gene “g” belong to?
• Regulator1, Regulator2, Regulator3: expression level of each regulator in the experiment
• Level: P(Level | Module, Regulators)
• Expression level in each module is a function of expression of regulators
[Figure: model over Experiment and Gene with variables Module, Regulator1 to Regulator3 and Level (Expression); example regulation tree on HAP4 and CMK1; example genes BMH1 and GIC2.]
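As a rough illustration of P(Level | Module, Regulators), the sketch below routes an experiment through a module's regulation program and scores the gene's level under the Gaussian stored at the leaf. The nested-dict encoding, the thresholds and the leaf parameters are all made-up assumptions, not values from the talk.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


# Hypothetical regulation program for one module (thresholds and parameters invented).
program = {
    "regulator": "HAP4", "threshold": 0.0,
    "low": {"mean": -1.0, "std": 0.5},
    "high": {
        "regulator": "CMK1", "threshold": 0.0,
        "low": {"mean": 0.2, "std": 0.5},
        "high": {"mean": 1.5, "std": 0.5},
    },
}


def log_p_level(level, program, regulators):
    """log P(Level | Module, Regulators) for one gene in one experiment.

    `program` is the regulation program of the gene's module and
    `regulators` maps regulator name to its expression in the experiment.
    """
    node = program
    while "regulator" in node:  # descend until a leaf is reached
        above = regulators[node["regulator"]] > node["threshold"]
        node = node["high"] if above else node["low"]
    return gaussian_logpdf(level, node["mean"], node["std"])


print(log_p_level(1.4, program, {"HAP4": 0.8, "CMK1": 0.6}))
```

Because the leaf parameters belong to the module rather than to a single gene, all genes in the module are scored with the same Gaussians.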
Learning Problem
Goal: Find gene module assignments and tree structures that maximize P(M|D)
• Hard: gene module assignments, tree structures
• Genes: 5000-10000
• Regulators: ~500
[Figure: the probabilistic model (Experiment, Gene, Module, Regulator1 to Regulator3, Level, Expression) with an example regulation tree on HAP4 and CMK1.]
Learning Algorithm Overview
• Clustering gives a gene module assignment
• Learn regulation programs
• Relearn gene assignments to modules
• Result: regulatory modules
[Figure: the steps drawn as a loop; example regulation tree on HAP4 and CMK1.]
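The alternation between fitting each module and reassigning genes can be sketched as follows. This is a deliberate simplification and not the algorithm from the talk: the regulation program of each module is replaced by a per-experiment mean with unit variance, so the loop reduces to a hard-assignment clustering update. All function names are hypothetical.

```python
def fit_module_means(data, assignment, n_modules):
    """Stand-in for 'learn regulation programs': one mean per module and experiment."""
    n_exp = len(next(iter(data.values())))
    means = []
    for m in range(n_modules):
        members = [g for g, a in assignment.items() if a == m]
        if members:
            means.append([sum(data[g][e] for g in members) / len(members)
                          for e in range(n_exp)])
        else:
            means.append([0.0] * n_exp)  # empty module: neutral profile
    return means


def reassign(data, means):
    """'Relearn gene assignments': pick the module with the highest log-likelihood.

    With unit variance the log-likelihood is -0.5 * squared distance + const.
    """
    new = {}
    for gene, profile in data.items():
        scores = [-0.5 * sum((x - mu) ** 2 for x, mu in zip(profile, mod))
                  for mod in means]
        new[gene] = max(range(len(means)), key=scores.__getitem__)
    return new


def learn(data, assignment, n_modules, max_iter=20):
    """Alternate the two steps until no gene changes module."""
    for _ in range(max_iter):
        means = fit_module_means(data, assignment, n_modules)
        new = reassign(data, means)
        if new == assignment:
            break
        assignment = new
    return assignment


data = {
    "a": [1.0, 1.0, -1.0], "b": [1.2, 0.8, -1.0],
    "c": [-1.0, -1.0, 1.0], "d": [-0.9, -1.1, 1.0],
}
start = {"a": 0, "b": 0, "c": 0, "d": 1}  # deliberately imperfect starting point
print(learn(data, start, n_modules=2))
```

In the full method the first step would fit a regulation tree per module instead of a plain mean, but the control flow stays the same.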
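Learning Regulation Programs
• No split (module genes by experiments, experiments in original order): log P(M|D) = log P(D | μ, σ) + log P(μ, σ) + const
• Candidate split on HAP4 (experiments sorted by Hap4 expression and divided into a low and a high side): log P(M|D) = log P(D_{HAP4,low} | μ_{HAP4,low}, σ_{HAP4,low}) + log P(D_{HAP4,high} | μ_{HAP4,high}, σ_{HAP4,high}) + log P(μ_{HAP4,low}, σ_{HAP4,low}, μ_{HAP4,high}, σ_{HAP4,high}) + const
• Candidate split on SIP4: log P(M|D) = log P(D_{SIP4,low} | μ_{SIP4,low}, σ_{SIP4,low}) + log P(D_{SIP4,high} | μ_{SIP4,high}, σ_{SIP4,high}) + log P(μ_{SIP4,low}, σ_{SIP4,low}, μ_{SIP4,high}, σ_{SIP4,high}) + const
• Further split on CMK1 inside one HAP4 branch: log P(M|D) = log P(D_{HAP4} | μ_{HAP4}, σ_{HAP4}) + log P(D_{CMK1,low} | μ_{CMK1,low}, σ_{CMK1,low}) + log P(D_{CMK1,high} | μ_{CMK1,high}, σ_{CMK1,high}) + … + const, where D_{HAP4} is the data on the HAP4 side that is not split further
[Figure: module genes by experiments, shown in original order and sorted by Hap4 expression; candidate regulators HAP4, CMK1, SIP4.]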
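The search over split points for one candidate regulator can be sketched as below. Two assumptions are worth stating loudly: the score is a maximum-likelihood Gaussian log-likelihood gain, swapped in for the Bayesian score that the slides use (so no prior term appears), and the `min_side` parameter is a hypothetical guard on how many experiments each side must keep.

```python
import math


def gaussian_loglik(values):
    """Log-likelihood of values under a Gaussian fitted by maximum likelihood."""
    n = len(values)
    mean = sum(values) / n
    sq = sum((v - mean) ** 2 for v in values)
    var = max(sq / n, 1e-6)  # variance floor to avoid log(0)
    return -0.5 * n * math.log(2 * math.pi * var) - sq / (2 * var)


def best_split(regulator_expr, module_expr, min_side=5):
    """Find the threshold on one regulator that best separates the module's data.

    regulator_expr[e]: expression of the regulator in experiment e
    module_expr[e]:    list of module gene levels in experiment e
    Returns (gain, threshold), or None if no admissible split exists.
    """
    order = sorted(range(len(regulator_expr)), key=regulator_expr.__getitem__)
    base = gaussian_loglik([v for e in order for v in module_expr[e]])
    best = None
    for cut in range(min_side, len(order) - min_side + 1):
        lo_e, hi_e = order[cut - 1], order[cut]
        if regulator_expr[lo_e] == regulator_expr[hi_e]:
            continue  # tied values cannot be separated by a threshold
        low = [v for e in order[:cut] for v in module_expr[e]]
        high = [v for e in order[cut:] for v in module_expr[e]]
        gain = gaussian_loglik(low) + gaussian_loglik(high) - base
        threshold = 0.5 * (regulator_expr[lo_e] + regulator_expr[hi_e])
        if best is None or gain > best[0]:
            best = (gain, threshold)
    return best


# Toy data: the module is low when the regulator is low and high when it is high.
reg = [-3.0, -2.5, -2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
mod = [[-1.0 + 0.01 * i, -1.0 - 0.01 * i] if r < 0 else [1.0 + 0.01 * i, 1.0 - 0.01 * i]
       for i, r in enumerate(reg)]
print(best_split(reg, mod))
```

Running the same search for every candidate regulator and keeping the highest-scoring one gives a greedy way to grow the tree one split at a time.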
Learning Algorithm Performance
• Significant improvements across learning iterations
• Many genes (50%) change module assignment in learning
[Figure, two plots against algorithm iterations (0 to 20): Bayesian score (avg. per gene), y-axis from -131 to -128; gene module assignment changes (% of total), y-axis from 0 to 50.]
Yeast Stress Data
• Genes
  • Selected 2355 that showed activity
• Experiments (173)
  • Diverse environmental stress conditions: heat shock, nitrogen depletion,…
Comparison to Bayesian Networks
Bayesian Network (Friedman et al. ’00; Hartemink et al. ’01)
• Expression level of each gene is a function of expression of regulators
• 2355 variables (genes)
• 173 instances (experiments)
Problems
• Robustness
• Interpretability
[Figure: fragment of learned Bayesian network over Hap4, Mig1, Yap1, Cmk1, Ste12, Gic1.]
Comparison to Bayesian Networks
Bayesian Network (Friedman et al. ’00; Hartemink et al. ’01) and Module Network (SPRKF ’03, UAI)
Problems
• Robustness
• Interpretability
Solutions
• Robustness: sharing parameters
• Interpretability: module-level model
[Figure: Bayesian network fragment over Hap4, Mig1, Yap1, Cmk1, Ste12, Gic1 next to the module network model (Module, Regulator1 to Regulator3, Level).]
Comparison to Bayesian Networks
• Learn which parameters are shared (by learning which genes are in the same module)
Problems
• Robustness
• Interpretability
Solutions
• Robustness: sharing parameters
• Interpretability: module-level model
[Figure: test data log-likelihood (gain per instance), y-axis from -150 to 150, against number of modules (0 to 500); one label reads "Bayesian Network performance".]
From Model to Regulatory Modules
• Biologically relevant?
[Figure: the probabilistic model (Module, Regulator1 to Regulator3, Level) next to regulation trees on HAP4 and CMK1.]
Respiration Module
• Module genes functionally coherent? Energy production (oxid. phos. 26/55, P < 10^-30)
• Module genes known targets of predicted regulators? Hap4 + Msn4 known to regulate module genes
[Figure: regulation program with the predicted regulators above the module genes.]
Energy, Osmolarity, & cAMP Signaling
Tpk1:
• Regulation by non-TFs (Tpk1 is a catalytic subunit of cAMP-dependent protein kinase)
• Module contains known Tpk1 targets (e.g. Tps1)
• Tpk1-mediated STRE motif (50/64 genes; p < 3×10^-11)
EM: Biological Improvement
[Figure: scatter plot of negative log p-value (module network) against negative log p-value (standard clustering), both axes from 0 to 45.]
[Figure: overview of inferred regulation across modules.
Legend: Module (number); Regulator (transcription factor); Regulator (signaling molecule); Inferred regulation; Regulation supported in literature; Experimentally tested regulator; Enriched cis-regulatory motif.
Regulators: Not3, Gcn20, Bmh1, Ime4, Ypl230w, Yap6, Gac1, Tpk2, Pph3, Gis1, Lsg1, Ppt1, Cmk1, Yer184c, Tpk1, Kin82, Sip2, Xbp1, Msn4, Hap4, Gat1.
Modules: 48, 36, 47, 39, 26, 17, 14, 25, 9, 11, 8, 31, 5, 16, 30, 42, 18, 13, 15, 41, 33, 10, 4, 3, 2, 1.
Motifs: N36, N30, N26, N18, N13, N14, N41, N11, HSF, MIG1, CAT8, XBP1, HAC1, STRE, GATA, ADR1, GCR1, GCN4, MCM1, ABF_C, HAP234, CBF1_B, REPCAR.
Module groups: DNA and RNA processing; Energy and cAMP signaling; Amino acid metabolism; nuclear]
Biological Evaluation Summary
• Are the module genes functionally coherent? 46/50
  Functionally coherent = module genes enriched for GO annotations with hypergeometric p-value < 0.01 (corrected for multiple hypotheses)
• Are some module genes known targets of the predicted regulators? 30/50
  Known targets = direct biological experiments reported in the literature
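The hypergeometric enrichment test behind the "functionally coherent" criterion can be sketched with the standard library. The slide does not say which multiple-hypothesis correction was used, so the Bonferroni step below is an assumption, and the example counts for the annotation size and the number of tests are invented.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): probability of seeing at least k annotated genes in a module.

    N: genes in the background, K: background genes with the annotation,
    n: genes in the module, k: module genes with the annotation.
    """
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def bonferroni(p, n_tests):
    """Assumed correction: multiply by the number of tests, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical example: 26 of 55 module genes carry an annotation that
# 60 of 2355 background genes carry, with 1000 annotations tested.
p = hypergeom_pvalue(k=26, n=55, K=60, N=2355)
print(p, bonferroni(p, n_tests=1000) < 0.01)
```

A module would count as functionally coherent under this sketch when at least one annotation passes the corrected threshold.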
From Model to Detailed Predictions
• Prediction: Regulator ‘X’ regulates process ‘Y’
• Experiment: Knock out ‘X’ and repeat experiment
[Figure: regulation tree on HAP4 and Ypl230w, marked with “?”.]
Does ‘X’ Regulate Predicted Genes?
• Experiment: knock out Ypl230w (stationary phase)
• 1334 regulated genes (312 expected by chance), >4x
• Rank modules by regulated genes
• Ypl230w regulates computationally predicted genes
[Figure: wild-type versus mutant expression of the regulated genes; modules ranked by regulated genes, with the modules predicted to be regulated by Ypl230w marked as predicted modules.]
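The ">4x" figure follows directly from the two counts on the slide; a quick check:

```python
observed = 1334  # regulated genes reported on the slide
expected = 312   # number expected by chance, as reported on the slide

fold = observed / expected
print(f"fold over chance: {fold:.2f}x")
```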
Does ‘X’ Regulate Predicted Genes?
• Ppt1 knockout (hypo-osmotic stress)
• Kin82 knockout (heat shock)
[Figure: wild-type versus mutant panels for the two knockouts; panel labels: Regulated genes (1014), Regulated genes (1034).]
Wet Lab Experiments Summary
• 3/3 regulators regulate computationally predicted genes
New yeast biology suggested
• Ypl230w activates protein-folding, cell wall and ATP-binding genes
• Ppt1 represses phosphate metabolism and rRNA processing
• Kin82 activates energy and osmotic stress genes
Why does it work?
• Underlying assumption: regulators are transcriptionally regulated
• Regulators are part of regulatory structures in which they are themselves regulated*
• Statistical methods can detect associations between regulators and their targets
* [Shen-Orr et al., ’02] find many such structures
Regulator Chain
• Respiration module
• Nodes: Phd1 (TF); Hap4 (TF); targets Cox4, Cox6, Atp17
• Black: regulators that cannot be detected
• Red: correctly predicted regulator
• Blue: targets
[Figure: active protein level and mRNA expression level over time for Phd1, Hap4 and the targets.]
Auto Regulation
• Snf kinase regulated processes module
• Nodes: Yap6 (TF); Vid24, Tor1, Gut2
• Black: regulators that cannot be detected
• Red: correctly predicted regulator
• Blue: targets
Positive Signaling Loop
• Sporulation and cAMP pathway module
• Nodes: Sip2 (SM); Msn4 (TF); Vid24, Tor1, Gut2
• Black: regulators that cannot be detected
• Red: correctly predicted regulator
• Blue: targets
Negative Signaling Loop
• Energy and osmotic stress module
• Nodes: Tpk1 (SM); Msn4 (TF); Nth1, Tps1, Glo1
• Black: regulators that cannot be detected
• Red: correctly predicted regulator
• Blue: targets
Why Does it Work?
• Feed-forward and feedback loops
• Some transcription factors and signal transduction molecules have a detectable expression signature
• Module Networks infers their regulatory relationships
Assignment
• Download the yeast stress expression dataset
• Download the list of transcription factor regulators
• Randomly partition the dataset in a 5-fold cross-validation scheme
• For k=50:
  • Create a hard-clustering model (use code from the earlier exercise). For each array, this model has a separate Gaussian distribution for each of the 50 values of the cluster variable
  • Use the assignment of genes to clusters that you learned in the hard clustering, and for each cluster, learn a decision tree with at most: (1) one split (2) two splits (3) three splits
  • Note 1: allow only splits with >=5 arrays on each side of the split
  • Note 2: the split question is whether the expression level of the transcription factor is greater than some value
Assignment Continued
• Note 3: at each leaf of the resulting model, there is a single Gaussian distribution that is used for all arrays that map to that leaf
• Compute the log-likelihood of the test data for each model (hard clustering, and each of the three regulation models)
• Plot the avg. and std. of the test log-likelihood for each model
• For the model with two splits on each cluster, use the Gaussian distribution at each array to sample a new expression dataset with exactly the same number of genes and number of arrays. For each original gene and array, sample from the Gaussian distribution associated with that gene and that array
• Learn a model with two splits for each cluster from the sampled dataset
• Plot the number of regulation tree splits that are identical between the model that sampled the data and the new model that you learned
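The bookkeeping parts of the assignment can be sketched with the standard library: building the 5-fold partition, summarizing test log-likelihoods, sampling a synthetic dataset, and counting recovered splits. All function names are hypothetical. Two choices are assumptions: whether the folds are taken over genes or over arrays is left to the caller, and a split is treated as "identical" when the (cluster, regulator) pair matches.

```python
import random
import statistics


def k_fold_indices(n_items, k=5, seed=0):
    """Shuffle item indices and deal them into k folds of near-equal size."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_test_splits(n_items, k=5, seed=0):
    """Yield (train, test) index lists, using each fold once as the test set."""
    folds = k_fold_indices(n_items, k, seed)
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def summarize(test_logliks):
    """Average and sample standard deviation of per-fold test log-likelihoods."""
    return statistics.mean(test_logliks), statistics.stdev(test_logliks)


def sample_dataset(means, stds, seed=0):
    """Draw one value per gene and array from that entry's Gaussian.

    means[g][a] and stds[g][a] are the parameters of the leaf that
    gene g's cluster uses for array a.
    """
    rng = random.Random(seed)
    return [[rng.gauss(m, s) for m, s in zip(mrow, srow)]
            for mrow, srow in zip(means, stds)]


def identical_splits(true_splits, learned_splits):
    """Count splits shared by two models, each given as (cluster, regulator) pairs."""
    return len(set(true_splits) & set(learned_splits))


for train, test in train_test_splits(173, k=5):
    print(len(train), len(test))
```

Fixing the seed keeps the same partition across the four models, so their test log-likelihoods are compared on identical folds.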