Deciphering Gene Regulatory Networks by in silico approaches
Sridhar Hannenhalli
Penn Center for Bioinformatics, Department of Genetics, University of Pennsylvania
Transcriptional Regulation
Figure labels: transcription start site; TF-DNA binding; interactions and modules
Overview: Core promoter prediction; TF-DNA binding; TF-TF interactions; Transcriptional Modules; Applications
Overview
• Core promoter prediction
• TF-DNA binding: identification, representation, discovery (motif discovery), search, ambiguity/redundancy
• TF-TF interactions
• Transcriptional Modules
• Applications
Binding site identification
Experimental sources: SELEX, ChIP-chip, deletion/mutation
Aligned sites: ATACGGT, ATACCGT, ATCGGCA, AAAGGCT
Consensus: A T A S G S T
Weight matrix (specificity), rows A, C, G, T by site position 1 to 7:
A   +1.2    0.0   0.96   -1.6   -1.6   -1.6    0.0
C   -1.6   -1.6    0.0   0.59    0.0   0.59   -1.6
G   -1.6   -1.6   -1.6   0.59   0.96   0.59   -1.6
T   -1.6   0.96   -1.6   -1.6   -1.6   -1.6   0.96
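The slide does not say how the weights were derived from the four aligned sites. As a minimal sketch, the values are consistent with a natural-log odds score using a pseudocount of 0.25 per base and a uniform background of 0.25. Both parameters are assumptions inferred from the numbers, not something the slide states, and the function name `weight_matrix` is illustrative.

```python
import math

def weight_matrix(sites, pseudocount=0.25, background=0.25):
    """Log-odds weight matrix from equal-length aligned sites.

    Entry [base][j] = ln(((count + pseudocount) / (N + 4 * pseudocount)) / background).
    The pseudocount and background are assumed values, chosen because they
    reproduce the numbers shown on the slide.
    """
    n_sites = len(sites)
    width = len(sites[0])
    matrix = {base: [] for base in "ACGT"}
    for j in range(width):
        column = [site[j] for site in sites]
        for base in "ACGT":
            freq = (column.count(base) + pseudocount) / (n_sites + 4 * pseudocount)
            matrix[base].append(math.log(freq / background))
    return matrix

# The four example sites from the slide.
sites = ["ATACGGT", "ATACCGT", "ATCGGCA", "AAAGGCT"]
pwm = weight_matrix(sites)
for base in "ACGT":
    print(base, " ".join(f"{w:+.2f}" for w in pwm[base]))
```

A base seen in all four sites scores about +1.2, in three about 0.96, in two about 0.59, in one 0.0, and a base never seen scores about -1.6, which matches the five distinct values in the matrix.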
Binding site search
• TFs often bind to short, degenerate DNA sequences, so sequence-based search produces many false positives
• Evolutionary conservation (phylogenetic footprinting/shadowing) can help reduce the false positives
• About half of the functional binding sites are not conserved
• A combination of evolutionary conservation and binding site score can detect ~70% of the experimentally verified binding sites at a “false positive” rate of 1 per 50 kb per PWM (Levy and Hannenhalli, Mammalian Genome, 2002)
Figure labels: TRANSFAC/JASPAR PWM; human genome; multi-species conservation
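To make the score-based part of the search concrete, here is a minimal sketch of scanning a sequence with a weight matrix and a score cutoff. The matrix values are the ones shown on the binding site identification slide, with rows assumed to be A, C, G, T. The sequence, the threshold of 4.0, and the function names are hypothetical. A conservation filter, as described in the bullets above, would be a separate step applied to the hits.

```python
# Weight matrix from the identification slide; row order A, C, G, T is assumed.
PWM = {
    "A": [1.2, 0.0, 0.96, -1.6, -1.6, -1.6, 0.0],
    "C": [-1.6, -1.6, 0.0, 0.59, 0.0, 0.59, -1.6],
    "G": [-1.6, -1.6, -1.6, 0.59, 0.96, 0.59, -1.6],
    "T": [-1.6, 0.96, -1.6, -1.6, -1.6, -1.6, 0.96],
}

def score_site(pwm, site):
    """Sum the per-position weights of a candidate site."""
    return sum(pwm[base][j] for j, base in enumerate(site))

def scan(pwm, sequence, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(pwm["A"])
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = score_site(pwm, window)
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits

# Hypothetical promoter fragment containing one strong site.
sequence = "GGCATACGGTTTGACC"
hits = scan(PWM, sequence, threshold=4.0)
print(hits)
```

Lowering the threshold admits more degenerate matches, which is the source of the false positives the slide mentions.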
Non-Independence of binding site positions
• The bacteriophage Mnt repressor prefers C over the wild-type A at position 16 when the wild-type C at position 17 is changed to another base. (Man and Stormo, 2001, NAR)
• Barash, Elidan, Friedman, Kaplan, 2003, RECOMB
• Osada, Zaslavsky and Singh, 2004, Bioinformatics
Binding site representation
Sites: ATACGGT, ATACCGT, CGCGGCA, CGAGCCT
Weight matrix:
+1.2    0.0   0.96   -1.6   -1.6   -1.6    0.0
-1.6   -1.6    0.0   0.59    0.0   0.59   -1.6
-1.6   -1.6   -1.6   0.59   0.96   0.59   -1.6
-1.6   0.96   -1.6   -1.6   -1.6   -1.6   0.96
Assumption of positional independence
A PSPA or variable-length Markov model of binding sites is superior to the PWM model
• For 95 JASPAR PWMs, PSPAM is better in 48 cases and worse in 6 cases at a significance level of 0.05.
Conservation patterns in cis-elements reveal inter-position dependence
Human  ……….ACCGTGT……….ACCTTCT…………..
Chimp  ……….AGCGTGT……….ACCTTGT…………..
Mouse  ……….TCGGTGA……….TGCTTCT…………..
Rat    ……….CCCGTGA……….AGCTTGT…………..
Dog    ……….TCGGTCT……….ACCCTCT…………..
Figure labels: binding sites 1, 2, 3, …, N; positions X and Y on each tree; compensatory mutation
Pr(X) = probability of X using the standard tree Markov process
Pr(X|Y) = probability of X dependent on the corresponding Y branches
S_XY = fraction of sites for which Pr(X | Y) > Pr(X)
Scope = |X - Y|
S_X,X+1 for 79 vertebrate PWMs from JASPAR
• Control-1: randomly select i, j pairs.
• Control-2: randomly select i, then select j = i + s.
• Control-3: construct PWM Mr with the same width as M by randomly sampling columns from the 79 vertebrate PWMs in JASPAR.
• Control-4: construct PWM Mr from M by randomly shuffling the compositions at each column (position).
S_X,X+s decreases with increasing scope s. However, it remains significantly greater than the respective Control-4 value up to scope = 6.
Functional relevance of positions with compensatory mutation
Evans, Donahue, Hannenhalli, RECOMB-Comparative Genomics 2006
Binding site Ambiguity/Redundancy
• Several transcription factors each have more than one distinct PWM
• Several distinct transcription factors have very similar PWMs
Example sites: ACCGTGTTT, ACCGACTTT, ACCGTGAAT, ACCGTGTTT, TCCGTGTTT, TCAGTGTTT, TCTGTGTTT, TCGGTGTTT
Figure labels: PWM1, PWM, PWM2
Enhancing Positional Weight Matrices using Mixture models
• A mixture model allowing an arbitrary number of base PWMs
• Given the mixture, the probability of observing sequence Xi = (Xi1, …, Xin) is [equation missing from the extracted slide]
• Use the EM algorithm to estimate subclasses
• We use k=2 base class PWMs (due to lack of data and lack of knowledge of the appropriate number of classes)
Hannenhalli and Wang, Bioinformatics, 2005
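The probability expression on this slide did not survive extraction. As a hedged illustration only, the sketch below uses the standard form of a finite mixture of independent-position models: each base PWM k has a mixing weight, and a site's probability is the weighted sum of its per-PWM probabilities. This is the textbook mixture form and may differ in notation or detail from the formula in Hannenhalli and Wang (2005). The toy PWMs, weights, and function names are hypothetical.

```python
def site_prob(pwm, site):
    """Probability of a site under one base PWM.

    `pwm` is a list of columns, each a dict of base probabilities.
    """
    p = 1.0
    for j, base in enumerate(site):
        p *= pwm[j][base]
    return p

def mixture_prob(weights, pwms, site):
    """P(site) = sum_k weights[k] * prod_j pwms[k][j][site[j]] (standard mixture form)."""
    return sum(w * site_prob(m, site) for w, m in zip(weights, pwms))

def responsibilities(weights, pwms, site):
    """E-step of EM: posterior probability that each base PWM generated the site."""
    joint = [w * site_prob(m, site) for w, m in zip(weights, pwms)]
    total = sum(joint)
    return [x / total for x in joint]

# Two hypothetical width-2 base PWMs that differ only at the first position.
pwm1 = [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1}]
pwm2 = [{"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
        {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1}]
weights = [0.5, 0.5]

p_ac = mixture_prob(weights, [pwm1, pwm2], "AC")
resp_ac = responsibilities(weights, [pwm1, pwm2], "AC")
print(p_ac, resp_ac)
```

In a full EM loop the responsibilities would be used as fractional counts to re-estimate each base PWM and the mixing weights, repeating until the likelihood stops improving.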
Sequence conservation of binding sites using Mixture model
Based on 64 vertebrate TF entries in the JASPAR database
Figure values: 48, 39, 23
Subclass Dissimilarity vs Prediction Improvement
Figure: x-axis is the relative entropy between the two base PWMs (less dissimilar to more dissimilar); two rows of plotted values: 39, 36, 30, 23, 15, 13 and 64, 57, 44, 32, 20, 16
Expression Coherence of target genes using mixture model
EC of a set of genes is the fraction of gene-pairs whose expression profiles across several tissues/conditions are “very” similar
Is the intra-class EC (targets of PWM1, or targets of PWM2) higher than the inter-class EC?
• In 44 of the 55 (80%) cases, the average expression coherence within subclass-PWM targets was higher than the expression coherence across subclass targets.
• In all but one case (98%), at least one of the two subclass PWMs had a coherence score higher than the cross-coherence score.
Hannenhalli and Wang, Bioinformatics, 2005
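The definition of EC can be sketched directly: count the gene pairs whose profiles are similar and divide by the number of pairs. The slide does not say what "very" similar means, so the sketch assumes Pearson correlation with a cutoff of 0.8. Both the similarity measure and the cutoff are assumptions, and the profiles are made-up numbers.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (undefined for constant profiles)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def expression_coherence(profiles, threshold=0.8):
    """Fraction of gene pairs whose correlation is at least `threshold` (assumed cutoff)."""
    pairs = list(combinations(profiles, 2))
    similar = sum(1 for x, y in pairs if pearson(x, y) >= threshold)
    return similar / len(pairs)

# Hypothetical expression profiles of four target genes across four tissues.
profiles = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [4, 3, 2, 1],
    [1, 2, 3, 5],
]
ec = expression_coherence(profiles)
print(ec)
```

Comparing intra-class and inter-class EC then amounts to calling the same function on pairs drawn within one subclass's targets versus pairs that span the two subclasses.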
LEU3 Dataset [Liu et al., 2002]
• Free energy of binding is available for 46 observed binding sites of LEU3
• The two clusters from the EM algorithm have significantly different binding energies.
Bi-clustering based modeling
Figure panels I to V, labeled with vertical partitioning and horizontal partitioning; example site sets shown:
• ACCGTCTCAA, ACCGTGTGAA, AGCGTGCCCT, ACGGTGCCCA, TGGCCGCCGA, TCGCACTCTT, TGCCCCTGCT, TGGCCCTCTT
• ATACGGT, ATACCGT, CGCGGCA, CGAGCCT
• ACCGTGTTT, ACCGACTTT, ACCGTGAAT, ACCGTGTTT, TCCGTGTTT, TCAGTGTTT, TCTGTGTTT, TCGGTGTTT
Context-dependent binding specificity (figure labels: X, Y, Z)
Binding site Ambiguity/Redundancy
• Several transcription factors each have more than one distinct PWM
• Several distinct transcription factors have very similar PWMs
32 classes, 80 families, 117 subfamilies, 1034 factors
Figure: two identical copies of the example weight matrix shown side by side
Once upon a time a transcription factor gene was duplicated
Figure labels: promoter, DNA binding domain (DBD), interaction domain
Outcomes shown: conserved DBD, divergent nDBD, redundant paralogs, divergent expression, divergent promoter
Hypothesis: Homologous TF-pairs with similar DBD have diverged in expression.
Controls:
• Homologous nonTF-pairs
• Homologous TF-pairs with dissimilar DBD
D(X,Y) = |EX - EY|, for TF X and TF Y in each tissue Ti (T1, …, T158)
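A minimal sketch of the divergence measure, assuming EX and EY are expression levels of the two paralogous TFs in the same ordered list of tissues. The expression values and the function name are hypothetical. How the per-tissue divergences are then compared between the test pairs and the control pairs is not shown here.

```python
def expression_divergence(expr_x, expr_y):
    """Per-tissue divergence D(X, Y) = |E_X - E_Y| for two TFs.

    Both inputs are expression levels over the same tissues, in the same order.
    """
    return [abs(a - b) for a, b in zip(expr_x, expr_y)]

# Hypothetical expression levels of a TF pair in three tissues.
expr_x = [5.0, 2.0, 7.5]
expr_y = [4.0, 6.0, 7.5]
divergence = expression_divergence(expr_x, expr_y)
print(divergence)
```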
• 416 homologous TF-pairs (BLAST E-value <= E-10)
• 125 with similar binding (p-value <= 0.02)
• TFs with similar binding are more similar overall. Thus a greater expression divergence is surprising.
• In thyroid tissue the hypothesis holds (Mann-Whitney p-value = 0.00156)
• In human: 416 homologous TF-pairs, 125 with similar binding, in a total of 158 samples (Novartis)
• In yeast: 219 homologous TF-pairs, 35 with similar binding, in a total of 57 samples (Spellman)
Overview: Core promoter prediction; TF-DNA binding; TF-TF interactions; Transcriptional Modules; Applications
Transcription Factor cooperation/interaction
• Expression Coherence: Pilpel et al. (2001). Nat Genet; Banerjee and Zhang (2003). NAR
• Positional Coherence: Hannenhalli and Levy (2002). NAR
• Interaction-dependent binding
Interaction-dependent binding
• Transcription factor F, with DNA binding motif M
• ChIP-chip gives the set of gene promoters bound by F: bound promoters (P) versus unbound promoters (B)
• Can M discriminate between P and B?
• The answer is NO for a large fraction of transcription factors
• Perhaps binding of F depends (synergistically or antagonistically) on other motifs
Model terms (equation not recovered from the slide): binding probability (ChIP), PWM-based occupancy probabilities, interaction coefficient
• The ChIP-chip data for a majority of TFs is better explained using interaction-dependent binding.
• Almost all of the yeast cell cycle interactions were detected at a 10% prediction rate
• When applied to genome-wide CREB binding in rat, 15 of the 18 detected interactions have varying degrees of support.
Wang, Jensen, Hannenhalli, RECOMB-Regulation 2005
Overview: Core promoter prediction; TF-DNA binding; TF-TF interactions; Transcriptional Modules; Applications
Co-regulated genes have common binding sites in their promoters
Apoptosis Pathway
• BCL2-antagonist (BAD): 68 TFs
• B-cell CLL/lymphoma 2 (BCL2): 89 TFs
• 37 TFs in common: AP-2, CREB, E2F, cMyc, NF-Kappa-b, c-ETS, Egr-1 etc.
• Hypergeometric p-val = E-11
Venn diagram values: 374, 68, 37, 89
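The overlap test can be sketched as an upper-tail hypergeometric probability. The sketch assumes that 374 is the total number of TFs considered, which the slide does not state explicitly, and it uses the "at least as many shared TFs" tail. The exact convention behind the reported E-11 is not given, so the computed value need not match it. The function name is illustrative.

```python
from math import comb

def hypergeom_tail(population, successes, draws, observed):
    """P(overlap >= observed) when `draws` items are sampled without replacement
    from `population` items, of which `successes` are marked."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total

# Assumed reading of the Venn diagram: 374 TFs overall, 68 with sites in one
# promoter, 89 in the other, 37 shared.
p_value = hypergeom_tail(population=374, successes=68, draws=89, observed=37)
print(p_value)
```

Under independence the expected overlap would be 68 × 89 / 374, roughly 16 TFs, so 37 shared TFs is far into the tail.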
Interacting proteins have greater similarity in their promoter regions Hannenhalli and Levy (2003). Mamm Genome
Transcriptional module discovery
Input figure: binary Genes × TFs matrix (entries as extracted: 1 0 0 0 1 1 1 0 1 1 0 1 0 1 0 1 0 0 1 0 1 0 0 0 1 1 1 0 0 1 1 1 0 0 1 0 1 1 1 0 0 0 0 0 0 1 0 1)
• Singular Value Decomposition
• Distance matrix, then K-means clustering
• Clique enumeration in bipartite graphs
Output: cluster of genes and discriminating TF
Tissue-Specific Transcriptional Module
• Tissue specificity by expression level [Schug et al 2005]
• Binding prediction
• Transcriptional module specific to a tissue type
Figure labels: TF, tissue, gene
Everett, Wang, Hannenhalli, ISMB 2006
Overview: Core promoter prediction; TF-DNA binding; TF-TF interactions; Transcriptional Modules; Applications
Transcriptional Regulation in Cardiac Myocytes Frey N, Olson EN. Annu Rev Physiol. 2003;65:45-79.
Expression profiling in advanced heart failure
• Large tissue bank from Temple and Penn
• Failing explanted hearts (n=173)
• Non-failing hearts from unused donors (n=16)
• Each hybridized with an HU133A (n=189)
• Conservative analysis: RMA (Bioconductor), SAM
~3000 dysregulated genes in advanced human HF with FDR < 5%.
Is there any evidence that specific transcription factors are directing these changes?
Differentially expressed genes (G); background set (B)
Score(x) = freq(x) in G / freq(x) in B
Statistical significance is computed using 1000 random samplings of genes from the background set
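A minimal sketch of this scoring and resampling scheme follows. It assumes freq(x) is the fraction of genes in a set whose promoter has a predicted site for TF x, and that each random sample has the same size as G. Neither detail is stated on the slide. The gene names, the `targets` mapping, and the function names are hypothetical.

```python
import random

def enrichment_score(tf, gene_set, background, targets):
    """Score(x) = freq(x) in G / freq(x) in B.

    `targets[g]` is the set of TFs with a predicted site in gene g's promoter.
    Raises ZeroDivisionError if the TF has no sites in the background.
    """
    freq_g = sum(1 for g in gene_set if tf in targets[g]) / len(gene_set)
    freq_b = sum(1 for g in background if tf in targets[g]) / len(background)
    return freq_g / freq_b

def empirical_p_value(tf, gene_set, background, targets, n_samples=1000, seed=0):
    """Fraction of random gene samples scoring at least as high as the real set."""
    rng = random.Random(seed)
    observed = enrichment_score(tf, gene_set, background, targets)
    as_extreme = 0
    for _ in range(n_samples):
        sample = rng.sample(background, len(gene_set))
        if enrichment_score(tf, sample, background, targets) >= observed:
            as_extreme += 1
    return as_extreme / n_samples

# Toy data: 20 background genes, 5 of which carry a predicted GATA site.
background = [f"g{i}" for i in range(20)]
targets = {g: ({"GATA"} if i < 5 else set()) for i, g in enumerate(background)}
gene_set = ["g0", "g1", "g2", "g3", "g10"]

score = enrichment_score("GATA", gene_set, background, targets)
p_value = empirical_p_value("GATA", gene_set, background, targets)
print(score, p_value)
```

With many TFs tested at once, the empirical p-values would normally also need a multiple-testing correction, which this sketch leaves out.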
Transcription Factors enriched in differentially up-regulated genes
• The differentially upregulated genes have a greater number (32) of enriched TFs compared to downregulated genes (6).
• The ischemic and idiopathic cases are consistent
• Validation of GATA, MEF2, NKx, NFAT transcription factors in human heart failure
• Potential role for FOX factors and IRF
What about early events?
• Mice with infarcts and sham-operated controls sacrificed at varying times after surgery (1, 4, 8, 24 hrs, 8 wks)
• Analysis of differentially co-regulated gene clusters reveals a consistent set of transcription factors.
FOX factor Summary
• FOX targets change substantially in advanced human HF and in early HF in mice.
• FOX factors are present in human heart at physiologic levels: FOXP1, P4, C1, C2, J2
• FOXP1 is localized to nuclei of human cardiac myocytes.
• Do FOX factors mediate cardiac hypertrophy?
Hannenhalli et al. Circulation, 2006
Gene Regulation in Learning and Memory
Groups: Naïve (N), Conditioned Stimulus only (CS), Fear Conditioned (FC)
Brain regions: Hippocampus, Amygdala
Keeley et al. Memory and Learning, 2006
Immediate Early Gene Expression is Regulated by Many Transcription Factors http://web1.tch.harvard.edu/research/greenberg/oldsite/Pathways.html
50 Most Significantly Regulated Genes were Used for Further Analysis (figure panels: Hippocampus, Amygdala)