
Regulatory Code, Networks, Long Range Interactions and Chromosome Structure

This summary explores factor co-associations, regulatory networks and long-range interactions influencing chromosome organization. It focuses on the genome-wide co-association matrix, TF-centric regulatory modules and functional annotations of those modules. Ongoing work includes bicluster discovery, discriminative models of TF co-association and prediction of differential gene expression. It also covers an integrated regulatory network, ChIA-PET chromatin models and the epigenomic features of chromatin interaction models, along with ENCODE 5C findings on looping interactions between promoters and regulatory elements.


Presentation Transcript


  1. Regulatory Code, Networks, Long Range Interactions and Chromosome Structure. Summary by Michael Snyder and Bill Noble, March 7, 2011

  2. Areas • 1) Association of factors with one another at each gene, and expression outcomes • 2) Organization into regulatory networks • 3) Long-range interactions: ChIA-PET/5C/Hi-C • 4) Higher-order organization of chromosomes

  3. Regulatory Co-associations (aka Regulatory Code). Manoj Hariharan, Anshul Kundaje

  4. Genome-wide Co-association Matrix. Manoj H; Kevin Y. has similar results

  5. TF-centric regulatory modules: find biclusters of a focus TF's peaks (e.g. GATA1) that co-associate with distinct combinations of other TFs. Figure: GATA1 peaks in K562 ordered from low to high co-occupancy with 75 TFs in K562 (co-occupancy: > 1 bp peak overlap, ~800 bp). Manoj H, Anshul K
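Before biclustering, each focus-TF peak can be encoded as one row of a binary co-occupancy matrix over the other TFs. A minimal sketch, assuming peaks are half-open (chrom, start, end) tuples and that "co-occupied" means at least `min_bp` shared bases (default 1, an assumption); the function names are hypothetical and the biclustering step itself is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical peak representation: (chromosome, start, end), half-open coordinates.
Interval = Tuple[str, int, int]


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two peaks (0 if on different chromosomes or disjoint)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def co_occupancy_matrix(focus_peaks: List[Interval],
                        partner_peaks: Dict[str, List[Interval]],
                        min_bp: int = 1) -> Tuple[List[str], List[List[int]]]:
    """Rows = focus-TF peaks, columns = partner TFs (sorted by name).

    An entry is 1 when any peak of that partner TF overlaps the focus peak by at
    least `min_bp` bases. Brute-force scan: fine for a sketch, but genome-wide
    peak sets would call for an interval tree or a sorted sweep.
    """
    tfs = sorted(partner_peaks)
    matrix = [
        [int(any(overlap_bp(fp, p) >= min_bp for p in partner_peaks[tf])) for tf in tfs]
        for fp in focus_peaks
    ]
    return tfs, matrix
```

Rows with many 1s correspond to the high co-occupancy peaks; the whole matrix is what a biclustering method would then search for coherent peak-by-TF blocks.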

  6. GATA1-centric regulatory modules. Focus TF: GATA1 in K562. Core module. Manoj H, Anshul K

  7. GATA1-centric regulatory modules. Focus TF: GATA1 in K562. Modules: Core + Ccnt2/Hmgn3, Gabp/Nrsf/Hdac2, Myc/Max, P300/Cebp/Brg1. Manoj H, Anshul K

  8. GATA1-centric regulatory modules. Focus TF: GATA1 in K562. Modules: cJun/Junb/Jund; promoter-associated TFs (Taf1/Tbp/Yy1/Hey1/E2Fs); cFos + Fosl1, cJun/Junb/Jund; Ctcf/Rad21/Smc3; Nfya/Nfyb. Manoj H, Anshul K

  9. E2F4-centric regulatory modules. Focus TF: E2F4 in K562. Module labels: Egr1; Ccnt2, Hmgn3 + E2f4, E2f6, Max, Myc + Yy1, Tbp, Taf, Hey, Pol2, Irf1, Elf1, Gabp, Chd2, Nfya, Nfyb; Ctcf++; Gata++; Fos/jun++. Manoj H, Anshul K

  10. E2F4-centric functional annotations of regulatory modules: distinct regulatory modules target genes with different functions. Subclusters shown: Gata1 subcluster; Chd2 subcluster; E2Fs + Chd2, Nfy-a,b containing subcluster; E2Fs + Chd2, Zbtb containing subcluster; E2Fs + Gata containing subcluster. Functions shown include cell cycle and hypoxia response.
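The slide does not say how functions were attached to subclusters. One common approach, assumed here, is a one-sided hypergeometric test of whether a module's target genes are over-represented for an annotation such as "cell cycle". A minimal sketch using only the standard library; the function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background, K: background genes carrying the annotation,
    n: genes targeted by the module, k: module targets carrying the annotation.
    A small p-value means the module is enriched for the annotation.
    """
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total
```

In practice one such test would be run per (module, annotation) pair, followed by a multiple-testing correction.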

  11. Regulatory modules → expression. Work in progress: • Robust bicluster discovery • Discriminative models of TF co-association (integrate motifs + OpenChrom + histone marks) • Predicting differential expression of genes across cell lines using differential binding + expression of regulatory modules. Manoj H, Anshul K

  12. Construction and Analysis of an Integrated Regulatory Network. Koon-Kiu Yan, Chao Cheng, Gerstein Lab

  13. Construction of an integrated network

  14. Hairball structure of the integrated network

  15. Building a hierarchy (to find the hidden intrinsic direction in the presence of cycles): human, worm
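The slide does not give the algorithm. One standard way to impose levels on a directed network that contains cycles, sketched here as an assumption and not necessarily the Gerstein lab's method, is to collapse each strongly connected component (SCC) into a single node, which leaves an acyclic graph, and then place every node one level below its deepest regulator.

```python
from typing import Dict, List


def strongly_connected_components(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Tarjan's algorithm. Components are emitted in reverse topological order
    (a component appears after every component reachable from it)."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack = set()
    comps: List[List[str]] = []
    counter = 0

    def visit(v: str) -> None:
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            comps.append(comp)

    nodes = set(graph) | {w for ws in graph.values() for w in ws}
    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return comps


def hierarchy_levels(graph: Dict[str, List[str]]) -> Dict[str, int]:
    """Level 0 = top regulators; nodes in the same cycle share a level."""
    comps = strongly_connected_components(graph)
    comp_of = {v: i for i, c in enumerate(comps) for v in c}
    level = [0] * len(comps)
    # Walking components from last to first visits them in topological order.
    for i in reversed(range(len(comps))):
        for v in comps[i]:
            for w in graph.get(v, []):
                j = comp_of[w]
                if j != i:
                    level[j] = max(level[j], level[i] + 1)
    return {v: level[comp_of[v]] for v in comp_of}
```

The recursive search is fine for small example networks; a genome-scale network would need an iterative version or a raised recursion limit.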

  16. Network motif analysis: some enriched motifs
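The slide does not name the enriched motifs. As an illustration, the feed-forward loop (X regulates Y, Y regulates Z, and X also regulates Z directly) is a classic regulatory-network motif; a minimal sketch that lists every instance in a directed edge set follows. Assessing enrichment would additionally require comparing these counts against randomized networks, which is not shown.

```python
from typing import Dict, List, Set, Tuple


def feed_forward_loops(edges: Set[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return sorted (X, Y, Z) triples of distinct nodes with X->Y, Y->Z and X->Z."""
    targets: Dict[str, Set[str]] = {}
    for src, dst in edges:
        targets.setdefault(src, set()).add(dst)
    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            if y == x:
                continue  # ignore self-regulation
            for z in targets.get(y, set()):
                if z not in (x, y) and z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)
```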

  17. Long-Range Associations. ChIA-PET: Guoliang Li, Kuljeet Sandhu, Wang Ping, Xiaoan Ruan, Yijun Ruan; Raymond Auerbach, Michael Snyder. 5C/Hi-C: Job Dekker

  18. Chromatin models for transcription regulation identified by ChIA-PET. [Figure: basal promoter (BP) model, enhancer (E) model and chroperon (C) model (p, promoter; g, gene; e, enhancer), with example loci chr1:173829000-173844000, chr6:28850000-28961000, chr2:220000000-220180000 and chr17:80150000-80430000; panels for the number of chromatin structures and the number of genes involved per model; frequency of interactions vs. genomic span, 10K to 10M.]
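The three model diagrams (p g; p g e; p g p g) suggest a simple rule for assigning a connected ChIA-PET structure to a model. The sketch below is a hypothetical reading of those diagrams, not the authors' published criteria: a structure is a chroperon if it links the promoters of two or more genes, an enhancer model if it links one promoter to distal non-TSS anchors, and a basal promoter model if it stays within a single gene.

```python
from typing import Sequence


def classify_structure(promoter_genes: Sequence[str], distal_anchors: int) -> str:
    """Assign a ChIA-PET chromatin structure to a model (hypothetical rule).

    promoter_genes: genes whose promoter (TSS) overlaps an anchor of the structure.
    distal_anchors: anchors that overlap no TSS and lie outside those genes.
    """
    n_genes = len(set(promoter_genes))
    if n_genes >= 2:
        return "chroperon"       # several promoters tied together (p g ... p g)
    if n_genes == 1 and distal_anchors > 0:
        return "enhancer"        # one promoter looped to distal element(s) (p g e)
    if n_genes == 1:
        return "basal promoter"  # interactions within a single gene (p g)
    return "unclassified"        # no promoter anchor at all
```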

  19. Association of TFs with chromatin architectures. [Figure: RNAPII ChIA-PET loops with RNAPII, E2F4, GATA2, STAT1, Brg1, Ini1, Rad21 and CTCF binding at chr10:3810000..3860000 and chr16:28820000..28886000; factor binding at TSS and non-TSS anchors of C, E and BP structures.]

  20. Epigenomic features of chromatin interaction models. [Figure: example locus chr2:70270000-70430000 with RNA-Seq, RNAPII ChIA-PET, H3K27me3, H3K27ac, H3K4me3 and H3K4me1 tracks; profiles of H3K4me3, H3K4me1, H3K9ac, H3K14ac, H3K9me3, H3K27me3, H3K36me3, FAIRE and DNase I around TSS and non-TSS anchors of C, BP and E structures; RNAPII binding frequency; normalized H3K4me3/H3K4me1 log2 ratio for TSS and non-TSS anchors.]
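The normalized H3K4me3/H3K4me1 log2 ratio separates promoter-like anchors (H3K4me3-dominated, positive values) from enhancer-like anchors (H3K4me1-dominated, negative values). A minimal sketch, assuming the two signals are already normalized for sequencing depth and that a pseudocount (1.0 here, a hypothetical choice) guards against zero counts.

```python
import math


def k4me3_k4me1_log2_ratio(k4me3: float, k4me1: float, pseudocount: float = 1.0) -> float:
    """log2((H3K4me3 + c) / (H3K4me1 + c)) for one anchor.

    Inputs are assumed to be depth-normalized signals; c is a pseudocount that
    keeps the ratio finite when either mark is absent.
    """
    return math.log2((k4me3 + pseudocount) / (k4me1 + pseudocount))
```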

  21. Reporter gene assays show the combinatorial property of promoters in chroperons. [Figure: RNA-Seq, H3K4me3, H3K4me1 and RNAPII ChIA-PET tracks at chr7:1535000-1582000 (INTS1, MAFK) and chr14:90750000-90880000 (C14orf102, CALM1), with luciferase (Luc) reporter assays for C14orf102, CALM1, INTS1 and MAFK plotted as relative intensity.]

  22. 5C ENCODE Pilot Project design scheme: parallel amplification and detection of looping interactions between all promoters and distal regulatory elements

  23. 5C analysis of ENCODE regions: status (Amartya Sanyal and Bryan Lajoie) • 44 regions; 731 TSS interaction profiles • Multiplexing with 6,302 5C primers to detect 2,746,340 pairwise interactions • Solexa paired-end sequencing: >40 million reads per cell line per replicate • 2 replicates per cell line • GM12878: finished (January 2010 freeze) • K562: finished (January 2010 freeze) • H1 ES: finished (January 2011 freeze) • HeLa S3: finished (January 2011 freeze)

  24. Analysis • Peak calling • Quality assessment: detection of gold standards • Alpha-globin, Beta-globin, Igf2 • Integration with other annotations and gene expression data • Network analysis

  25. The active alpha-globin genes interact with distant enhancers. [Figure: looping contact between the enhancer and the alpha-globin genes, shown against the expected interaction frequency.]

  26. The silent alpha-globin genes do not interact with distant enhancers. [Figure: loop absent between the enhancer and the alpha-globin genes, shown against the expected interaction frequency.]

  27. Looping elements are enriched in DHSs, CTCF, histone marks and TFs. [Figure: fold enrichment for CTCF, DHS, TFs and active marks; depletion for "none".]
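Fold enrichment compares how often looping elements carry a feature (a DHS, a CTCF site, a TF peak, an active mark) with how often the background carries it. A minimal sketch, assuming the background is all interrogated fragments (an assumption; the slide does not state it). Values above 1 indicate enrichment, values below 1 indicate depletion, as reported for the "none" category.

```python
def fold_enrichment(hits_in_set: int, set_size: int,
                    hits_in_background: int, background_size: int) -> float:
    """(fraction of looping elements with the feature) /
    (fraction of background fragments with the feature)."""
    observed = hits_in_set / set_size
    expected = hits_in_background / background_size
    return observed / expected
```

For example, if 30 of 100 looping elements overlap a DHS while 100 of 1,000 background fragments do, the fold enrichment is 0.3 / 0.1 = 3.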

  28. High-confidence looping interactions show tissue specificity: K562, 2884 interactions (1350 K562-specific); GM12878, 2790 interactions (1256 GM12878-specific); 1534 shared by both
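The counts on this slide are consistent with a two-set overlap: 1350 + 1534 = 2884 and 1256 + 1534 = 2790. A minimal sketch of that bookkeeping, assuming each interaction is represented by a hashable key such as a pair of fragment identifiers (hypothetical); the Jaccard index is added as a single overlap score and is not reported on the slide.

```python
from typing import Dict, Hashable, Set


def specificity_summary(a: Set[Hashable], b: Set[Hashable]) -> Dict[str, float]:
    """Interactions unique to each cell type, shared by both, and the
    Jaccard index (shared / union) as one overlap score."""
    shared = len(a & b)
    union = len(a | b)
    return {
        "a_only": len(a - b),
        "shared": shared,
        "b_only": len(b - a),
        "jaccard": shared / union if union else 0.0,
    }
```

With the slide's numbers the union is 1350 + 1534 + 1256 = 4140 interactions, so roughly 37% of all high-confidence interactions are shared.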

  29. Connectivity profile changes in different cell types. [Figure: ENCODE region ENm009 in GM12878 (HBB gene cluster OFF) and K562 (HBB gene cluster ON), with UBQLN3 and TRIM genes; legend: interrogated interaction, high-confidence interaction, non-promoter regions, gene promoters.]

  30. Summary • 5C raw data (2 replicates) for K562, GM12878, HeLa S3 and H1 ES cells was submitted for the January 2011 data freeze • Initial peak calling for K562 and GM12878 was performed • All gold standard long-range interactions are detected in the correct cell type • Long-range interacting elements have all hallmarks of “regulatory elements” • Many interactions are cell type specific • Complex networks of long-range interactions are apparent • A new round of peak calling for all 4 cell lines is ongoing

  31. Status of 5C and Hi-C data • “Our pipelines for peakcalling should be done today [Sunday] and I hope final peak calls will be available for the group during the meeting.” • “For large scale chromosome behavior we have genome-wide hi-c data for K562 (and Hela coming soon).”

  32. Large-scale segmentation • Pre-processing methods • Interpolation and smoothing (HMMSeg) • Windowed averaging (ChromHMM) • Direct methods • Hard constraints • Prior on segment length • Hierarchical model • Post-processing methods • Post hoc clustering (SOM, hierarchical clustering) • Multi-pass segmentation
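Windowed averaging is the simplest of the pre-processing options listed: coarsening the input signal before segmentation makes the segmenter produce longer segments. A minimal sketch over non-overlapping windows; the window size is a hypothetical parameter, and ChromHMM's own binning and binarization are not reproduced here.

```python
from typing import List, Sequence


def windowed_average(signal: Sequence[float], window: int) -> List[float]:
    """Average a per-bin signal over non-overlapping windows of `window` bins.

    The last window may be shorter than `window` and is averaged over the bins
    it actually contains.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    return [sum(signal[i:i + window]) / len(signal[i:i + window])
            for i in range(0, len(signal), window)]
```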

  33. Hierarchical models • Two-level model topology • Mixture of large-scale and punctate labels ("open", "intermediate", "closed")

  34. Post hoc clustering. Labels: TSS; GS, gene start; GM, gene middle; GE, gene end; E, enhancer; I, insulator; R, repression; D, dead

  35. Post hoc clustering: Active, Repressed, Dead

  36. Multi-pass segmentation • Perform an initial segmentation • While the segments are too short: perform a segmentation on the labels, then compute a new segmentation • Alternatively, could use label posteriors instead of Viterbi segmentation.
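The loop above can be illustrated with a toy version in which a windowed majority vote stands in for "perform a segmentation on the labels". In the actual approach each pass would be an HMM segmentation (Viterbi or posterior-based); the smoothing rule, `half_width` and `max_passes` here are hypothetical.

```python
from collections import Counter
from itertools import groupby
from typing import List


def run_lengths(labels: List[str]) -> List[int]:
    """Lengths of consecutive runs of identical labels."""
    return [len(list(group)) for _, group in groupby(labels)]


def relabel_by_window(labels: List[str], half_width: int) -> List[str]:
    """Stand-in for one segmentation pass: each position takes the most common
    label in a window centred on it (ties go to the label seen first)."""
    out = []
    for i in range(len(labels)):
        window = labels[max(0, i - half_width): i + half_width + 1]
        out.append(Counter(window).most_common(1)[0][0])
    return out


def multi_pass_segmentation(labels: List[str], min_length: int,
                            half_width: int = 2, max_passes: int = 10) -> List[str]:
    """Re-segment the label sequence until every segment has >= min_length bins,
    the labels stop changing, or max_passes is reached."""
    current = list(labels)
    if not current:
        return current
    for _ in range(max_passes):
        if min(run_lengths(current)) >= min_length:
            break
        new = relabel_by_window(current, half_width)
        if new == current:  # converged without reaching min_length
            break
        current = new
    return current
```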

  37. Large-scale segmentation • Pre-processing methods • Interpolation and smoothing (HMMSeg) • Windowed averaging • Direct methods • Hard constraints • Prior on segment length • Hierarchical model • Post-processing methods • Post hoc clustering (SOM, hierarchical clustering) • Multi-pass segmentation

  38. Finding domains from 3D data Avinash Sahu

  39. Finding domains from 3D data • Summarize the observed n-by-n binary interaction matrix with an m-by-m matrix, m << n. • Interactions in the compressed matrix correspond to regions of increased interaction in the original matrix. • Simultaneously find domain boundaries and interaction labels. Avinash Sahu
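The compression step can be made concrete for a fixed set of domains: each entry of the m-by-m matrix is the density of observed interactions in the corresponding block of the n-by-n matrix. The method described above chooses the boundaries and labels jointly; this sketch only shows the summary for given, hypothetical boundaries.

```python
from typing import List, Sequence


def compress_interactions(matrix: Sequence[Sequence[int]],
                          boundaries: Sequence[int]) -> List[List[float]]:
    """Summarize an n-by-n binary interaction matrix as an m-by-m matrix of
    block densities. `boundaries` holds the start index of each of the m
    domains, in increasing order, and must begin with 0."""
    n = len(matrix)
    if not boundaries or boundaries[0] != 0 or list(boundaries) != sorted(set(boundaries)):
        raise ValueError("boundaries must be strictly increasing and start at 0")
    edges = list(boundaries) + [n]
    m = len(boundaries)
    out = [[0.0] * m for _ in range(m)]
    for a in range(m):
        rows = range(edges[a], edges[a + 1])
        for b in range(m):
            cols = range(edges[b], edges[b + 1])
            total = sum(matrix[i][j] for i in rows for j in cols)
            out[a][b] = total / (len(rows) * len(cols))
    return out
```

High-density blocks on the diagonal correspond to self-interacting domains; high-density off-diagonal blocks correspond to interacting pairs of domains.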

  40. Plan • Create a series of large-scale 1D segmentations • Directly compare large- and small-scale 1D segmentations to 3D data • Enrichment of interactions among pairs of labels • Infer domain segmentations from 3D data • Compare 1D and 3D segmentations using our existing framework
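One way to measure "enrichment of interactions among pairs of labels" is to compare the observed number of interactions joining bins with labels (a, b) to the number expected if interactions joined bins at random in proportion to label frequencies. This is a minimal sketch under that assumption; it ignores the strong dependence of contact frequency on genomic distance, which a real analysis would need to control for.

```python
from collections import Counter
from itertools import combinations_with_replacement
from typing import Dict, List, Tuple


def label_pair_enrichment(labels: List[str],
                          interactions: List[Tuple[int, int]]) -> Dict[Tuple[str, str], float]:
    """Observed / expected interaction counts for each unordered label pair.

    labels[i] is the 1D segmentation label of bin i; interactions are pairs of
    bin indices. Expected counts assume random pairing by label frequency.
    """
    if not interactions:
        raise ValueError("need at least one interaction")
    freq = Counter(labels)
    n = len(labels)
    observed = Counter(tuple(sorted((labels[i], labels[j]))) for i, j in interactions)
    total = len(interactions)
    enrichment = {}
    for a, b in combinations_with_replacement(sorted(freq), 2):
        # Probability that a random bin pair has labels {a, b}.
        p = (freq[a] / n) * (freq[b] / n) * (1 if a == b else 2)
        enrichment[(a, b)] = observed[(a, b)] / (total * p)
    return enrichment
```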

  41. Acknowledgements • Regulatory Code: Manoj Hariharan, Anshul Kundaje • Networks: Koon-Kiu Yan, Chao Cheng, Mark Gerstein • ChIA-PET: Guoliang Li, Kuljeet Sandhu, Wang Ping, Xiaoan Ruan, Yijun Ruan; Raymond Auerbach, Michael Snyder • 5C/Hi-C: Job Dekker • Long Range Chromosome: Bill Noble, Avinash Sahu

  42. Extra Slides

  43. Transcriptional activities in chromatin architectures. [Figure: comparisons across C, E and BP structures (with TS and HK gene categories and a random set R): density of expression breadth (# tissues), CpG content (HCG vs. LCG) and Pearson's CC for specific vs. random pairs; reported p-values include p < 2.2e-16 and p = 0.003.]

  44. Phase 2: TF co-association centered on specific TFs: methods • Rank-normalize every peak in all datasets based on signal • Merge datasets where cell line and TF are the same • If datasets are merged, take the average of the normalized ranks • Assign peaks to gene regulatory regions and get expression values (avRPKM) • Clustering: hierarchical and biclustering
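The first three method steps can be sketched as follows. Assumptions, none stated on the slide: ranks are scaled to (0, 1] with the strongest peak at 1.0, ties receive their average rank, and peaks from datasets being merged already share identifiers (e.g. after merging overlapping peak coordinates). The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rank_normalize(signals: List[float]) -> List[float]:
    """Map each peak's signal to rank / N, so the strongest peak gets 1.0.
    Tied signals share the average of their ranks."""
    order = sorted(range(len(signals)), key=lambda i: signals[i])
    ranks = [0.0] * len(signals)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and signals[order[j + 1]] == signals[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of this tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / len(signals)
        i = j + 1
    return ranks


def merge_datasets(datasets: List[Tuple[str, str, Dict[str, float]]]
                   ) -> Dict[Tuple[str, str], Dict[str, float]]:
    """datasets: (cell_line, tf, {peak_id: signal}) per experiment.
    Returns {(cell_line, tf): {peak_id: mean normalized rank}}."""
    pooled: Dict[Tuple[str, str], Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for cell, tf, peaks in datasets:
        ids = list(peaks)
        for pid, r in zip(ids, rank_normalize([peaks[p] for p in ids])):
            pooled[(cell, tf)][pid].append(r)
    return {key: {pid: sum(rs) / len(rs) for pid, rs in by_peak.items()}
            for key, by_peak in pooled.items()}
```

Rank normalization makes signals comparable across experiments with different dynamic ranges, which is why averaging ranks rather than raw signals is the natural way to pool datasets for the same cell line and TF.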

  45. Peak calling: identify signals that are significantly above expected for a given genomic distance. [Figure: 5C signal vs. genomic distance (Kb) with a LOESS moving average.]
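The idea is that 5C signal falls off with genomic distance, so a peak is a pair whose signal exceeds the distance-dependent expectation. In this minimal sketch a running mean of neighbouring pairs (in distance order) stands in for the LOESS fit, and a residual z-score cutoff stands in for a significance test; `half_window` and `z_cutoff` are hypothetical parameters, and at least two pairs are required.

```python
import statistics
from typing import List, Tuple


def call_5c_peaks(pairs: List[Tuple[int, float]], half_window: int = 2,
                  z_cutoff: float = 2.0) -> List[int]:
    """pairs: (genomic distance in bp, 5C signal) for each interrogated interaction.

    Expected signal for a pair = mean signal of up to `half_window` neighbours on
    each side in distance order (the pair itself excluded). Returns the indices
    of pairs whose standardized residual exceeds `z_cutoff`.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two pairs")
    order = sorted(range(len(pairs)), key=lambda i: pairs[i][0])
    residuals = {}
    for pos, i in enumerate(order):
        lo, hi = max(0, pos - half_window), min(len(order), pos + half_window + 1)
        neighbours = [pairs[order[k]][1] for k in range(lo, hi) if k != pos]
        residuals[i] = pairs[i][1] - statistics.fmean(neighbours)
    values = list(residuals.values())
    mu, sd = statistics.fmean(values), statistics.pstdev(values)
    if sd == 0:
        return []
    return sorted(i for i, r in residuals.items() if (r - mu) / sd > z_cutoff)
```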

  46. 5C interaction data • Identification of distal regulatory elements of each gene • Integration of distal elements with other data types • Mapping the network connectivity between genes and distal elements • Do genes interact with multiple elements? • Do regulatory elements act on more than one gene? • Approach: 5C mapping of TSS-to-element interactions throughout the ENCODE pilot regions • Cell lines: 5C data for the January 2011 data freeze (K562, GM12878, HeLa S3, H1 ES)
