
Deciphering Gene Regulatory Networks by in silico approaches

Deciphering Gene Regulatory Networks by in silico approaches. Sridhar Hannenhalli, Penn Center for Bioinformatics, Department of Genetics, University of Pennsylvania. Transcriptional Regulation. Transcription Start Site. Interactions and Modules. TF-DNA binding. Overview.


Presentation Transcript


  1. Deciphering Gene Regulatory Networks by in silico approaches Sridhar Hannenhalli Penn Center for Bioinformatics Department of Genetics University of Pennsylvania

  2. Transcriptional Regulation Transcription Start Site Interactions and Modules TF-DNA binding

  3. Overview Core promoter prediction TF-DNA binding TF-TF interactions Transcriptional Modules Applications

  4. Overview Identification Representation Discovery (motif-discovery) Search Ambiguity/Redundancy Core promoter prediction TF-DNA binding TF-TF interactions Transcriptional Modules Applications

  5. Binding site identification SELEX ATACGGT ATACCGT ATCGGCA AAAGGCT CONSENSUS A T A S G S T ChIP-chip Deletion/Mutation Specificity WEIGHT MATRIX +1.2 0.0 0.96 -1.6 -1.6 -1.6 0.0 -1.6 -1.6 0.0 0.59 0.0 0.59 -1.6 -1.6 -1.6 -1.6 0.59 0.96 0.59 -1.6 -1.6 0.96 -1.6 -1.6 -1.6 -1.6 0.96
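As a rough illustration of how aligned sites become a weight matrix, the sketch below builds per-position base counts and a log-odds matrix from the four example sites on this slide. The pseudocount of 0.5, the uniform background of 0.25, and the natural logarithm are assumptions made for the example, so the resulting weights are not expected to match the matrix shown on the slide.

```python
import math

BASES = "ACGT"

def count_matrix(sites):
    """Per-position base counts for equal-length aligned sites."""
    width = len(sites[0])
    counts = [{b: 0 for b in BASES} for _ in range(width)]
    for site in sites:
        for pos, base in enumerate(site):
            counts[pos][base] += 1
    return counts

def log_odds_pwm(sites, pseudocount=0.5, background=0.25):
    """Log-odds weights ln(p / background); pseudocount and background are assumed."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for col in count_matrix(sites):
        pwm.append({b: math.log(((col[b] + pseudocount) / total) / background)
                    for b in BASES})
    return pwm

def score(pwm, seq):
    """Sum of position-specific weights, i.e. positions are treated independently."""
    return sum(col[base] for col, base in zip(pwm, seq))

sites = ["ATACGGT", "ATACCGT", "ATCGGCA", "AAAGGCT"]
pwm = log_odds_pwm(sites)
print(score(pwm, "ATACGGT"), score(pwm, "CCCCCCC"))
```

A site drawn from the alignment scores higher than an unrelated sequence, which is the property the search step relies on.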

  6. Binding site search • TFs often bind to short and degenerate DNA sequences, leading to false positives • Evolutionary conservation (phylogenetic footprinting/shadowing) can help reduce the false positives • About half of the functional binding sites are not conserved • A combination of evolutionary conservation and binding site score detects ~70% of the experimentally verified binding sites at a “False Positive” rate of 1 per 50 kb per PWM (Levy and Hannenhalli, Mammalian Genome, 2002) TRANSFAC/JASPAR PWM Human genome Multi-species conservation
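The idea of combining a PWM score with conservation can be sketched as a sliding-window scan with an optional filter. This is only a minimal illustration: the toy matrix, the threshold, and the per-base `conserved` mask are hypothetical inputs, not the scoring scheme of the cited paper.

```python
def scan(seq, pwm, threshold, conserved=None):
    """Return (start, score) for every window scoring at or above threshold.

    pwm: list of dicts mapping base -> weight, one dict per motif position.
    conserved: optional list of booleans, one per sequence position
    (hypothetical input, e.g. derived from a multi-species alignment).
    When given, a hit is kept only if its whole window is conserved.
    """
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        s = sum(col[b] for col, b in zip(pwm, window))
        if s < threshold:
            continue
        if conserved is not None and not all(conserved[start:start + width]):
            continue
        hits.append((start, s))
    return hits

# Toy 3-column matrix preferring "ACG" (illustrative weights only).
toy_pwm = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]
seq = "TTACGTTACGTT"
conserved = [True] * 6 + [False] * 6  # only the first half is conserved

all_hits = scan(seq, toy_pwm, threshold=3.0)
kept = scan(seq, toy_pwm, threshold=3.0, conserved=conserved)
print(all_hits, kept)
```

Requiring conservation removes the second match here, which mirrors the trade-off on the slide: fewer false positives, at the cost of missing functional sites that are not conserved.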

  7. Non-Independence of binding site positions • Bacteriophage Mnt prefers binding to C, instead of wild-type A, at position 16 when wild-type C at position 17 is changed to other bases. (Man and Stormo, 2001, NAR) • Barash, Elidan, Friedman, Kaplan, 2003, RECOMB • Osada, Zaslavsky and Singh, 2004, Bioinformatics

  8. Binding site representation ATACGGT ATACCGT CGCGGCA CGAGCCT WEIGHT MATRIX +1.2 0.0 0.96 -1.6 -1.6 -1.6 0.0 -1.6 -1.6 0.0 0.59 0.0 0.59 -1.6 -1.6 -1.6 -1.6 0.59 0.96 0.59 -1.6 -1.6 0.96 -1.6 -1.6 -1.6 -1.6 0.96 Assumption of positional independence ATACGGT ATACCGT CGCGGCA CGAGCCT A PSPA or Variable length Markov Model of binding sites is superior to the PWM model • For 95 JASPAR PWMs, PSPAM is better in 48 cases and worse in 6 cases at a significance level of 0.05.

  9. Conservation patterns in cis-elements reveal inter-position dependence Human ……….ACCGTGT……….ACCTTCT………….. Chimp ……….AGCGTGT……….ACCTTGT………….. Mouse ……….TCGGTGA……….TGCTTCT………….. Rat ……….CCCGTGA……….AGCTTGT………….. Dog ……….TCGGTCT……….ACCCTCT………….. G G G G C C C G C G

  10. Compensatory Mutation, assessed over N binding sites for position pairs X, Y: Pr(X) = probability of X using standard tree Markov process; Pr(X|Y) = probability of X dependent on corresponding Y branches; SXY = fraction of sites for which Pr(X | Y) > Pr(X); Scope = |X – Y|

  11. SX,X+1 for 79 vertebrate PWMs from JASPAR Control-1 Randomly select i, j pairs. Control-2 Randomly select i and then select j=i+s. Control-3 constructs PWM Mr with same width as M by randomly sampling columns from the 79 vertebrate PWMs in JASPAR. Control-4 Construct PWM Mr from M by randomly shuffling the compositions at each column (position).

  12. SX,X+s decreases with increasing scope s. However it remains significantly greater than the respective control-4 up to scope = 6

  13. Functional relevance of positions with compensatory mutation

  14. Evans, Donahue, Hannenhalli, RECOMB-Comparative Genomics 2006

  15. Binding site Ambiguity/Redundancy • Several transcription factors have distinct PWMs • Several distinct transcription factors have very similar PWMs ACCGTGTTT ACCGACTTT ACCGTGAAT ACCGTGTTT TCCGTGTTT TCAGTGTTT TCTGTGTTT TCGGTGTTT PWM1 PWM PWM2

  16. Enhancing Positional Weight Matrices using Mixture models A mixture model allowing an arbitrary number of base PWM Given mixture the probability of observing sequence Xi = (Xi1,…, Xin) is Use EM algorithm to estimate subclasses We use k=2 base class PWMs (due to lack of data and lack of knowledge of appropriate number of classes) Hannenhalli and Wang, Bioinformatics, 2005
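The formula on this slide did not survive transcription. A common way to write such a mixture, assumed here rather than taken from the paper, is P(Xi) = sum over classes c of pi_c times the product over positions j of theta_c[j][Xij], where pi_c are mixing weights and theta_c is the base PWM of class c. The sketch below runs EM on that assumed model with k=2, using the example sites from the preceding Ambiguity/Redundancy slide. The deterministic initialisation, the pseudocount, and the iteration count are illustrative choices.

```python
import math

BASES = "ACGT"

def em_mixture_pwm(sites, k=2, iters=50, pseudo=0.1):
    """EM for a mixture of k position-independent PWMs (a sketch).

    Assumed model: P(x) = sum_c pi[c] * prod_j theta[c][j][x_j].
    Initialisation is deterministic: site i starts mostly in class i % k.
    """
    n, width = len(sites), len(sites[0])
    resp = [[0.9 if i % k == c else 0.1 / (k - 1) for c in range(k)]
            for i in range(n)]
    for _ in range(iters):
        # M-step: mixing weights and per-class base probabilities.
        pi = [sum(r[c] for r in resp) / n for c in range(k)]
        theta = []
        for c in range(k):
            cols = []
            for j in range(width):
                w = {b: pseudo for b in BASES}
                for i, site in enumerate(sites):
                    w[site[j]] += resp[i][c]
                total = sum(w.values())
                cols.append({b: w[b] / total for b in BASES})
            theta.append(cols)
        # E-step: posterior class membership of each site.
        for i, site in enumerate(sites):
            joint = [pi[c] * math.prod(theta[c][j][site[j]] for j in range(width))
                     for c in range(k)]
            z = sum(joint)
            resp[i] = [p / z for p in joint]
    return pi, theta, resp

sites = ["ACCGTGTTT", "ACCGACTTT", "ACCGTGAAT", "ACCGTGTTT",
         "TCCGTGTTT", "TCAGTGTTT", "TCTGTGTTT", "TCGGTGTTT"]
pi, theta, resp = em_mixture_pwm(sites)
for site, r in zip(sites, resp):
    print(site, [round(p, 2) for p in r])
```

EM only finds a local optimum, so the subclasses it returns depend on the starting point; in practice several initialisations would be compared by likelihood.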

  17. Sequence conservation of binding sites using Mixture model Based on 64 Vertebrate TF entries in JASPAR database 48 39 23

  18. Subclass Dissimilarity vs Prediction Improvement Less dissimilar More dissimilar

  19. 39 36 30 23 15 13 64 57 44 32 20 16 Relative entropy between two base PWMs

  20. Expression Coherence of target genes using mixture model EC of a set of genes is the fraction of gene-pairs whose expressions across several tissues/conditions are “very” similar PWM1 PWM2 Is the intra-class EC higher than inter-class EC? • In 44 of the 55 (80%) cases, the average expression coherence within subclass-PWM targets was higher than the expression coherence across subclass targets. • In all but one case (98%), at least one of the two subclass PWMs had a coherence score higher than the cross coherence score. Hannenhalli and Wang, Bioinformatics, 2005
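The EC definition above can be sketched directly as a pair count. The slide does not say how "very similar" is decided, so this example assumes a Pearson correlation cutoff of 0.8; both the similarity measure and the cutoff are placeholders, as are the toy profiles.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def expression_coherence(profiles, min_corr=0.8):
    """Fraction of gene pairs within one set whose profiles are similar."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    return sum(1 for x, y in pairs if pearson(x, y) >= min_corr) / len(pairs)

def cross_coherence(set_a, set_b, min_corr=0.8):
    """Fraction of similar pairs with one gene taken from each set."""
    pairs = [(x, y) for x in set_a for y in set_b]
    if not pairs:
        return 0.0
    return sum(1 for x, y in pairs if pearson(x, y) >= min_corr) / len(pairs)

# Toy expression profiles for targets of two subclass PWMs.
class1 = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 2, 3, 5]]
class2 = [[4, 3, 2, 1], [8, 6, 4, 2]]
print(expression_coherence(class1), expression_coherence(class2),
      cross_coherence(class1, class2))
```

In this toy case both subclasses are internally coherent while no cross-class pair is, which is the pattern the slide's intra-class versus inter-class question tests for.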

  21. LEU3 Dataset [Liu et al., 2002] • Free energy of binding available for 46 observed binding sites of LEU3 [Liu et al., 2002] • The two clusters from the EM algorithm have significantly different binding energies.

  22. Bi-clustering based modeling Vertical Partitioning Vertical partitioning ACCGTCTCAA ACCGTGTGAA AGCGTGCCCT ACGGTGCCCA TGGCCGCCGA TCGCACTCTT TGCCCCTGCT TGGCCCTCTT ATACGGT ATACCGT CGCGGCA CGAGCCT III I IV Horizontal Partitioning ACCGTGTTT ACCGACTTT ACCGTGAAT ACCGTGTTT TCCGTGTTT TCAGTGTTT TCTGTGTTT TCGGTGTTT II V Horizontal partitioning

  23. Context-dependent binding specificity X Y X Z X

  24. Binding site Ambiguity/Redundancy • Several transcription factors have distinct PWMs • Several distinct transcription factors have very similar PWMs

  25. TESS

  26. 32 Class +1.2 0.0 0.96 -1.6 -1.6 -1.6 0.0 -1.6 -1.6 0.0 0.59 0.0 0.59 -1.6 -1.6 -1.6 -1.6 0.59 0.96 0.59 -1.6 -1.6 0.96 -1.6 -1.6 -1.6 -1.6 0.96 +1.2 0.0 0.96 -1.6 -1.6 -1.6 0.0 -1.6 -1.6 0.0 0.59 0.0 0.59 -1.6 -1.6 -1.6 -1.6 0.59 0.96 0.59 -1.6 -1.6 0.96 -1.6 -1.6 -1.6 -1.6 0.96 80 Family 117 Subfamily 1034 factors

  27. Once upon a time a transcription factor gene was duplicated DNA Binding Domain Interaction Domain Promoter Conserved DBD Divergent nDBD Redundant paralogs Divergent Expression Divergent Promoter

  28. Hypothesis: Homologous TF-pairs with similar DBD have diverged in expression. Control: Homologous nonTF-pairs Homologous TF-pairs with dissimilar DBD D(X,Y) = |EX – EY| Ti T158 T1 TF X TF Y
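The divergence measure D(X,Y) = |EX – EY| can be computed per sample and then averaged over a group of pairs, as in the sketch below. The function names and the toy expression values are hypothetical. Comparing the TF-pair and control distributions would additionally need a rank test such as Mann-Whitney, which is not shown here.

```python
def expression_divergence(expr_x, expr_y):
    """Per-sample divergence D_i = |E_X,i - E_Y,i| for one homologous pair."""
    if len(expr_x) != len(expr_y):
        raise ValueError("profiles must cover the same samples")
    return [abs(a - b) for a, b in zip(expr_x, expr_y)]

def mean_divergence_per_sample(pairs):
    """Average D_i over many (expr_x, expr_y) pairs, one value per sample."""
    per_pair = [expression_divergence(x, y) for x, y in pairs]
    n = len(per_pair)
    return [sum(col) / n for col in zip(*per_pair)]

# Toy data: three samples (tissues) per gene.
tf_pairs = [([5.0, 2.0, 9.0], [1.0, 2.0, 3.0]),
            ([4.0, 4.0, 8.0], [2.0, 4.0, 2.0])]
control_pairs = [([5.0, 2.0, 9.0], [5.0, 3.0, 8.0])]

print(mean_divergence_per_sample(tf_pairs))
print(mean_divergence_per_sample(control_pairs))
```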

  29. 416 homologous TF-pairs (BLAST E-value <= E-10) 125 with similar binding (p-value <= 0.02) TFs with similar binding are more similar overall. Thus a greater expression divergence is surprising. In thyroid tissue the hypothesis holds (Mann-Whitney p-value = 0.00156)

  30. In Human, 416 homologous TFs, 125 with similar binding In a total of 158 samples (Novartis) In Yeast, 219 homologous TFs, 35 with similar binding In a total of 57 samples (Spellman)

  31. Overview Core promoter prediction TF-DNA binding TF-TF interactions Transcriptional Modules Applications

  32. Transcription Factor cooperation/interaction Expression Coherence Pilpel et al. (2001). Nat Genet, Banerjee and Zhang (2003) NAR Positional Coherence Hannenhalli and Levy (2002). NAR. Interaction-dependent binding

  33. Interaction-dependent binding ChIP-chip Set of gene promoters bound by F DNA binding motif M of F Transcription Factor F Can M discriminate between P and B? Bound promoters (P) Unbound promoters (B) The answer is NO for a large fraction of transcription factors Perhaps binding of F depends (synergistic or antagonistic) on other motifs

  34. PWM based occupancy probability PWM based occupancy probability Binding probability (ChIP) Interaction coefficient • The ChIP-chip data for a majority of TFs is better explained using interaction-dependent binding. • Almost all of the yeast cell cycle interactions were detected at 10% prediction rate • When applied to genome-wide CREB binding in rat, 15 of the 18 detected interactions have varying degrees of support. • Wang, Jensen, Hannenhalli RECOMB-Regulation 2005

  35. Overview Core promoter prediction TF-DNA binding TF-TF interactions Transcriptional Modules Applications

  36. Co-regulated genes have common binding sites in their promoters Apoptosis Pathway 68 TFs BCL2-antagonist(BAD) 37 TFs in common B-cell CLL/lymphoma 2(BCL2) 89 TFs AP-2, CREB, E2F, cMyc, NF-Kappa-b, c-ETS, Egr-1 etc. 374 Hypergeometric p-val = E-11 68 37 89
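The overlap significance quoted on this slide can be checked with a hypergeometric tail probability. The sketch below assumes that 374 is the total number of TFs considered, that 68 and 89 are the TF counts for the two promoters, and that 37 is their overlap; that reading of the numbers is an assumption, since the slide does not label them.

```python
from math import comb

def hypergeom_tail(total, set_a, set_b, overlap):
    """P(X >= overlap) when set_b items are drawn without replacement
    from `total` items, of which set_a are marked."""
    denom = comb(total, set_b)
    upper = min(set_a, set_b)
    numer = sum(comb(set_a, k) * comb(total - set_a, set_b - k)
                for k in range(overlap, upper + 1))
    return numer / denom

# Assumed reading of the slide: 374 TFs in total, 68 and 89 per gene, 37 shared.
p = hypergeom_tail(374, 68, 89, 37)
print(p)
```

Exact integer arithmetic via `math.comb` avoids the underflow that a naive floating-point product of probabilities would risk at these magnitudes.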

  37. Interacting proteins have greater similarity in their promoter regions Hannenhalli and Levy (2003). Mamm Genome

  38. Genes TFs Transcriptional module discovery TFs Singular Value Decomposition 1 0 0 0 1 1 1 0 1 1 0 1 0 1 0 1 0 0 1 0 1 0 0 0 1 1 1 0 0 1 1 1 0 0 1 0 1 1 1 0 0 0 0 0 0 1 0 1 Genes Clique enumeration in bipartite graphs Cluster of genes and discriminating TF Distance Matrix K-means Clustering

  39. TF Tissue Tissue Gene Tissue-Specific Transcriptional Module Tissue specificity by expression level [Schug et al 2005] Binding prediction Transcriptional-Module specific to a tissue type Everett, Wang, Hannenhalli, ISMB 2006

  40. Overview Core promoter prediction TF-DNA binding TF-TF interactions Transcriptional Modules Applications

  41. Transcriptional Regulation in Cardiac Myocytes Frey N, Olson EN. Annu Rev Physiol. 2003;65:45-79.

  42. Expression profiling in advanced heart failure • Large tissue bank from Temple and Penn • Failing explanted hearts (n=173) • Non-failing hearts from unused donors (n=16) • Each hybridized with an HU133A (n=189) • Conservative analysis: RMA (Bioconductor), SAM ~3000 dysregulated genes in advanced human HF with FDR < 5%. Is there any evidence that specific transcription factors are directing these changes?

  43. Transcriptional Genomics

  44. Differentially expressed Genes (G) Score(x) = freq(x) in G / freq(x) in B Statistical Significance is computed using 1000 random sampling of genes from background set Background Set (B)
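The enrichment score and its sampling-based significance can be sketched as follows. The gene identifiers, the set of predicted targets, and the add-one correction in the empirical p-value are assumptions for illustration; only the score definition and the 1000 random samples come from the slide.

```python
import random

def enrichment(tf_targets, gene_set, background, n_samples=1000, seed=0):
    """Score(x) = freq(x) in gene_set / freq(x) in background, with an
    empirical p-value from random gene sets drawn from the background."""
    def freq(genes):
        return sum(1 for g in genes if g in tf_targets) / len(genes)

    bg = list(background)
    f_bg = freq(bg)
    if f_bg == 0:
        return 0.0, 1.0
    observed = freq(gene_set) / f_bg
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(bg, len(gene_set))
        if freq(sample) / f_bg >= observed:
            hits += 1
    # Add-one correction keeps the estimate above zero (an assumed convention).
    return observed, (hits + 1) / (n_samples + 1)

# Toy data: 200 background genes, 20 of them predicted targets of TF x.
background = [f"g{i}" for i in range(200)]
tf_targets = {f"g{i}" for i in range(20)}
gene_set = [f"g{i}" for i in range(10)] + [f"g{i}" for i in range(100, 110)]

score, p_value = enrichment(tf_targets, gene_set, background)
print(score, p_value)
```

With 1000 samples the smallest reportable p-value is about 0.001, so the number of samples bounds the resolution of the significance estimate.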

  45. Transcription Factors enriched in differentially up-regulated genes

  46. What about early events? The differentially upregulated genes have a greater number (32) of enriched TFs compared to downregulated genes (6). The ischemic and idiopathic cases are consistent. Validation of GATA, MEF2, NKx, NFAT transcription factors in human heart failure. Potential role for FOX factors and IRF. Mice with infarcts and sham-operated controls sacrificed at varying times after surgery (1, 4, 8, 24 hrs, 8 wks). Analysis of differentially co-regulated gene clusters reveals a consistent set of transcription factors.

  47. FOX factor Summary • FOX targets change substantially in advanced human HF and in early HF in mice. • FOX factors are present in human heart at physiologic levels: FOXP1, P4, C1, C2, J2 • FOXP1 is localized to nuclei of human cardiac myocytes. • Do FOX factors mediate cardiac hypertrophy? Hannenhalli et al. Circulation, 2006

  48. Gene Regulation in Learning and Memory Naïve (N) Conditioned Stimulus only (CS) Fear Conditioned (FC) Hippocampus Amygdala Keeley et al. Memory and Learning, 2006

  49. Immediate Early Gene Expression is Regulated by Many Transcription Factors http://web1.tch.harvard.edu/research/greenberg/oldsite/Pathways.html

  50. 50 Most Significantly Regulated Genes were Used for Further Analysis Hippocampus Amygdala
