This summary covers factor co-associations, regulatory networks, and long-range interactions that influence chromosome organization. It focuses on the genome-wide co-association matrix, TF-centric modules, and functional annotations of regulatory modules. Ongoing work includes bicluster discovery, discriminative models of TF co-association, and prediction of differential gene expression. It also covers an integrated regulatory network, chromatin models identified by ChIA-PET and their epigenomic features, and ENCODE 5C findings on looping interactions between promoters and regulatory elements.
Regulatory Code, Networks, Long Range Interactions and Chromosome Structure. Summary by Michael Snyder and Bill Noble, March 7, 2011
Areas • 1) Association of factors with one another at each gene and expression outcomes • 2) Organization into Regulatory Networks • 3) Long range interactions: ChIA-PET/5C/Hi-C • 4) Higher order organization of chromosomes
Regulatory Co-associations (aka Regulatory Code) Manoj Hariharan, Anshul Kundaje
Genome-wide Co-association Matrix Manoj H; Kevin Y. has similar results
TF-centric regulatory modules: Find biclusters of a focus TF’s peaks (e.g. GATA1) that co-associate with distinct combinations of other TFs. [Figure: GATA1 peaks in K562 against 75 TFs in K562, ordered from low to high co-occupancy (> 1 bp peak overlap ~800 bp)] Manoj H, Anshul K
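The input to such a bicluster search can be pictured as a binary matrix of focus-TF peaks against other TFs. The following is a minimal sketch of building that matrix, not the authors' pipeline: the names `Peak`, `overlaps` and `cooccupancy_matrix` are hypothetical, coordinates are assumed to be half-open, and the `min_overlap` threshold is an adjustable assumption rather than the slide's exact criterion.

```python
from typing import Dict, List, Tuple

# Hypothetical peak representation: (chromosome, start, end), half-open.
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak, min_overlap: int = 1) -> bool:
    """Return True when two peaks on the same chromosome share at least
    min_overlap base pairs (the threshold is an assumption)."""
    if a[0] != b[0]:
        return False
    shared = min(a[2], b[2]) - max(a[1], b[1])
    return shared >= min_overlap


def cooccupancy_matrix(
    focus_peaks: List[Peak], other_peaks: Dict[str, List[Peak]]
) -> Tuple[List[str], List[List[int]]]:
    """Build a binary matrix: one row per focus-TF peak, one column per
    other TF (sorted by name), 1 where any peak of that TF overlaps."""
    tfs = sorted(other_peaks)
    matrix = [
        [int(any(overlaps(fp, p) for p in other_peaks[tf])) for tf in tfs]
        for fp in focus_peaks
    ]
    return tfs, matrix


if __name__ == "__main__":
    focus = [("chr1", 100, 200), ("chr1", 500, 600)]
    others = {
        "MYC": [("chr1", 150, 250)],
        "MAX": [("chr1", 199, 300), ("chr2", 500, 600)],
    }
    print(cooccupancy_matrix(focus, others))
```

Rows with many ones correspond to high co-occupancy peaks; a biclustering method would then look for groups of rows that share the same subset of columns.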
GATA1-centric regulatory modules Focus TF: GATA1 in K562 Core Module Manoj H, Anshul K
GATA1-centric regulatory modules Focus TF: GATA1 in K562. Panel labels: Core + Ccnt2/Hmgn3; Gabp/Nrsf/Hdac2; Myc/Max; P300/Cebp/Brg1. Manoj H, Anshul K
GATA1-centric regulatory modules Focus TF: GATA1 in K562. Panel labels: cJun/Junb/Jund; Promoter-associated TFs (Taf1/Tbp/Yy1/Hey1/E2Fs); cFos + Fosl1; cJun/Junb/Jund; Ctcf/Rad21/Smc3; Nfya/Nfyb. Manoj H, Anshul K
E2F4-centric regulatory modules Focus TF: E2F4 in K562. Panel labels: Egr1 Ccnt2 Hmgn3 + E2f4 E2f6 Max Myc + Yy1 Tbp Taf Hey Pol2 Irf1 Elf1 Gabp Chd2 Nfya Nfyb; Ctcf++; Gata++; Fos/jun++. Manoj H, Anshul K
E2F4-centric: Functional annotations of regulatory modules. Distinct regulatory modules target genes with different functions. [Figure: functions enriched in the E2Fs + Chd2, Nfy-a,b containing subcluster, in the E2Fs + Chd2, Zbtb containing subcluster, and in the E2Fs + Gata containing subcluster. Labels: Gata1 subcluster, Chd2 subcluster, Cell Cycle, Hypoxia response]
Regulatory modules ---> expression Work in progress: • Robust bicluster discovery • Discriminative models of TF co-association (integrate motifs + OpenChrom + Histone marks) • Predicting differential expression of genes across cell-lines using differential binding + expression of regulatory modules Manoj H, Anshul K
Construction and Analysis of an Integrated Regulatory Network Koon-Kiu Yan, Chao Cheng (Gerstein Lab)
Building hierarchy (To find the hidden intrinsic direction in the presence of cycles) human worm
Network motifs analysis Some enriched motifs
Long Range Associations • ChIA-PET: Guoliang Li, Kuljeet Sandhu, Wang Ping, Xiaoan Ruan, Yijun Ruan; Raymond Auerbach, Michael Snyder • 5C/Hi-C: Job Dekker
Chromatin models for transcription regulation identified by ChIA-PET. [Figure: panels A to C show the Basal Promoter (BP) Model, the Enhancer (E) Model and the Chroperon (C) Model, with example loci chr1:173829000-173844000, chr6:28850000-28961000, chr2:220000000-220180000 and chr17:80150000-80430000. Further panels: number of chromatin structures (930 (15%), 3967 (64%), 1329 (21%)); number of genes involved (1103 (6%), 11778 (66%), 4973 (28%)); frequency versus genomic span (10K to 10M) for C and E]
Association of TFs with chromatin architectures. [Figure: RNAPII ChIA-PET with RNAPII, E2F4, GATA2, STAT1, Brg1, Ini1, Rad21 and CTCF tracks at chr10:3810000..3860000 and chr16:28820000..28886000; profiles of the same factors around TSS and No-TSS sites for the C, E and BP models, with the x-axis spanning 1250 on either side of 0]
Epigenomic features of chromatin interaction models. [Figure: example locus chr2:70270000-70430000 with RNA-Seq, RNAPII ChIA-PET, H3K4me3 and H3K4me1 tracks and the H3K4me3/H3K4me1 ratio; profiles of H3K27ac, H3K9ac, H3K14ac, H3K4me3, H3K4me1, H3K9me3, H3K27me3, H3K36me3, FAIRE and DNase I around TSS and Non-TSS sites and from TSS to TES for the C, E and BP models; RNAPII binding frequency versus normalized H3K4me3/H3K4me1 log2 ratio]
Reporter gene assays show the combinatorial property of promoters in chroperons. [Figure: loci chr7:1535000-1582000 (INTS1, MAFK) and chr14:90750000-90880000 (C14orf102, CALM1) with RNA-Seq, RNAPII ChIA-PET, H3K4me3 and H3K4me1 tracks and the H3K4me3/H3K4me1 ratio; Luc reporter assay relative intensity for C14orf102, CALM1, INTS1 and MAFK]
5C ENCODE Pilot Project Design Scheme Parallel amplification and detection of looping interactions between all promoters and distal regulatory elements
5C analysis of ENCODE regions - status Amartya Sanyal and Bryan Lajoie • 44 regions; 731 TSS interaction profiles • Multiplexing with 6,302 5C primers to detect 2,746,340 pair-wise interactions • Solexa Paired End Sequencing: >40 million reads per cell line per repeat • 2 repeats per cell line • GM12878 – finished (January 2010 freeze) • K562 – finished (January 2010 freeze) • H1 ES – finished (January 2011 freeze) • HeLa S3 – finished (January 2011 freeze)
Analysis • Peak calling • Quality assessment: detection of gold standards • Alpha-globin, Beta-globin, Igf2 • Integration with other annotations and gene expression data • Network analysis
The active alpha-globin genes interact with distant enhancers Enhancer Alpha globin genes Looping contact Expected interaction frequency
The silent alpha-globin genes do not interact with distant enhancers Enhancer Alpha globin genes Loop absent Expected interaction frequency
Looping elements are enriched in DHSs, CTCF, Histone marks and TFs CTCF DHS TFs Depletion for “none” Active marks Fold enrichment
High confidence looping interactions show tissue specificity. [Venn diagram: K562, 2884 interactions (1350 specific to K562); GM12878, 2790 interactions (1256 specific to GM12878); 1534 shared]
Connectivity profile changes in different cell types. [Figure: ENm009 region with UBQLN3 and TRIM genes; GM12878: HBB gene cluster OFF; K562: HBB gene cluster ON. Legend: interrogated interaction, high confidence interaction, non-promoter regions, gene promoters]
Summary • 5C raw data (2 replicates) for K562, GM12878, HeLa S3 and H1 ES cells was submitted for the January 2011 data freeze • Initial peak calling for K562 and GM12878 was performed • All gold standard long-range interactions are detected in the correct cell type • Long-range interacting elements have all hallmarks of “regulatory elements” • Many interactions are cell type specific • Complex networks of long-range interactions are apparent • A new round of peak calling for all 4 cell lines is ongoing
Status of 5C and Hi-C data • “Our pipelines for peakcalling should be done today [Sunday] and I hope final peak calls will be available for the group during the meeting.” • “For large scale chromosome behavior we have genome-wide hi-c data for K562 (and Hela coming soon).”
Large-scale segmentation • Pre-processing methods • Interpolation and smoothing (HMMSeg) • Windowed averaging (ChromHMM) • Direct methods • Hard constraints • Prior on segment length • Hierarchical model • Post-processing methods • Post hoc clustering (SOM, hierarchical clustering) • Multi-pass segmentation
Hierarchical models • Two-level model topology • Mixture of large-scale and punctate labels “open” “intermediate” “closed”
Post hoc clustering. Labels: TSS; GS, gene start; GM, gene middle; GE, gene end; E, enhancer; I, insulator; R, repression; D, dead
Post hoc clustering Active Repressed Dead
Multi-pass segmentation • Perform an initial segmentation • While the segments are too short: perform a segmentation on the labels, then compute a new segmentation • Alternatively, could use label posteriors instead of Viterbi segmentation.
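The multi-pass loop can be sketched as follows. This is an illustration under stated assumptions, not the method used in the talk: the re-segmentation step is replaced by a simple majority vote in a sliding window (a stand-in for an HMM run on the label sequence), the window grows on each pass, and `max_passes` is a hypothetical safeguard to guarantee termination.

```python
from collections import Counter
from itertools import groupby
from typing import List


def segment_lengths(labels: List[str]) -> List[int]:
    """Lengths of consecutive runs of identical labels."""
    return [len(list(run)) for _, run in groupby(labels)]


def resegment(labels: List[str], window: int) -> List[str]:
    """Stand-in segmenter (assumption): relabel each bin with the most
    common label inside a window centred on it."""
    half = window // 2
    out = []
    for i in range(len(labels)):
        neighbourhood = labels[max(0, i - half): i + half + 1]
        out.append(Counter(neighbourhood).most_common(1)[0][0])
    return out


def multipass(labels: List[str], min_len: int, max_passes: int = 10) -> List[str]:
    """Re-segment the label sequence until every segment spans at least
    min_len bins, or until max_passes is reached."""
    if not labels:
        return labels
    window = 3
    for _ in range(max_passes):
        if min(segment_lengths(labels)) >= min_len:
            break
        labels = resegment(labels, window)
        window += 2  # widen the smoothing scale on every pass
    return labels


if __name__ == "__main__":
    print("".join(multipass(list("AAAABAAAACCCCC"), min_len=3)))
```

In this toy run the single "B" bin is absorbed into the surrounding "A" segment after one pass, which is the behaviour the loop is meant to produce at larger scales.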
Large-scale segmentation • Pre-processing methods • Interpolation and smoothing (HMMSeg) • Windowed averaging • Direct methods • Hard constraints • Prior on segment length • Hierarchical model • Post-processing methods • Post hoc clustering (SOM, hierarchical clustering) • Multi-pass segmentation
Finding domains from 3D data Avinash Sahu
Finding domains from 3D data • Summarize observed n-by-n binary interaction matrix with an m-by-m matrix, m << n. • Interactions in the compressed matrix correspond to regions of increased interaction in the original matrix. • Simultaneously find domain boundaries and interaction labels. Avinash Sahu
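The compression step alone can be illustrated with a short sketch. It assumes the domain boundaries are already known, whereas the approach described above finds boundaries and interaction labels simultaneously; the function name `compress` and the use of block density as the summary value are assumptions made for illustration.

```python
from typing import List, Sequence


def compress(matrix: Sequence[Sequence[int]], boundaries: Sequence[int]) -> List[List[float]]:
    """Summarize an n-by-n binary interaction matrix as an m-by-m matrix.

    boundaries holds the start index of each of the m domains, strictly
    increasing and beginning with 0. Each output cell is the fraction of
    interacting bin pairs inside the corresponding block.
    """
    n = len(matrix)
    edges = list(boundaries) + [n]
    m = len(boundaries)
    out = [[0.0] * m for _ in range(m)]
    for a in range(m):
        rows = range(edges[a], edges[a + 1])
        for b in range(m):
            cols = range(edges[b], edges[b + 1])
            total = sum(matrix[i][j] for i in rows for j in cols)
            out[a][b] = total / (len(rows) * len(cols))
    return out


if __name__ == "__main__":
    contacts = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 0],
    ]
    print(compress(contacts, [0, 2]))
```

High values in the compressed matrix mark blocks of increased interaction in the original matrix; thresholding them would give binary interaction labels between domains.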
Plan • Create a series of large-scale 1D segmentations • Directly compare large- and small-scale 1D segmentations to 3D data • Enrichment of interactions among pairs of labels • Infer domain segmentations from 3D data • Compare 1D and 3D segmentations using our existing framework
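One way to picture the "enrichment of interactions among pairs of labels" step is an observed-over-expected ratio per label pair. The sketch below is an assumption about how such a comparison could be set up, not the framework referred to above: the expected count uses label frequencies with replacement as a simple null model, and the name `label_pair_enrichment` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def label_pair_enrichment(
    labels: List[str], interactions: List[Tuple[int, int]]
) -> Dict[Tuple[str, str], float]:
    """Observed/expected ratio of interactions for each unordered label pair.

    labels[i] is the 1D segmentation label of bin i; interactions is a list
    of interacting bin index pairs taken from the 3D data.
    """
    n = len(labels)
    freq = Counter(labels)
    observed = Counter(
        tuple(sorted((labels[i], labels[j]))) for i, j in interactions
    )
    total = len(interactions)
    enrichment = {}
    for (a, b), count in observed.items():
        # Null model (assumption): both ends drawn independently by label frequency.
        p = (freq[a] / n) * (freq[b] / n)
        if a != b:
            p *= 2  # an unordered mixed pair can arise in either order
        enrichment[(a, b)] = count / (total * p)
    return enrichment


if __name__ == "__main__":
    bins = ["E", "E", "E", "R", "R", "R"]
    pairs = [(0, 1), (0, 2), (1, 2), (3, 4)]
    print(label_pair_enrichment(bins, pairs))
```

Ratios above 1 indicate label pairs that interact more often than the null model predicts; a distance-matched null would be a natural refinement.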
Acknowledgements • Regulatory Code: Manoj Hariharan, Anshul Kundaje • Networks: Koon-Kiu Yan, Chao Cheng, Mark Gerstein • ChIA-PET: Guoliang Li, Kuljeet Sandhu, Wang Ping, Xiaoan Ruan, Yijun Ruan; Raymond Auerbach, Michael Snyder • 5C/Hi-C: Job Dekker • Long Range Chromosome: Bill Noble, Avinash Sahu
Transcriptional activities in chromatin architectures. [Figure: for the C, E and BP models: density of expression breadth (# tissues, 0 to 80) with TS and HK marked; CpG content (HCG, LCG); density of Pearson’s CC for specific versus random pairs. Reported p-values: p < 2.2e-16, p = 0.003]
Phase 2: TF Coassociation centered on specific TFs (Methods) • Rank-normalize every peak in all datasets based on signal • Merge datasets where cell line and TF are the same • Take the average of the normalized rank if datasets are merged • Assign peaks to gene regulatory regions and get expression value: avRPKM • Clustering: hierarchical and biclustering
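The rank normalization and merging steps might look like the following minimal sketch. The exact normalization used in the analysis is not given, so this assumes a rank scaled to (0, 1] with tied signals sharing the highest rank, and it assumes that peaks from replicate datasets have already been matched to a common peak identifier; `rank_normalize` and `merge_average` are hypothetical names.

```python
from bisect import bisect_right
from typing import Dict, List


def rank_normalize(signals: List[float]) -> List[float]:
    """Map each peak signal to rank / n, so the strongest peak scores 1.0.
    Tied signals receive the same (highest) rank."""
    n = len(signals)
    ordered = sorted(signals)
    # bisect_right counts how many signals are <= s.
    return [bisect_right(ordered, s) / n for s in signals]


def merge_average(datasets: List[Dict[str, float]]) -> Dict[str, float]:
    """Average normalized ranks across datasets of the same cell line and TF.
    Each dataset maps a (hypothetical) peak identifier to its normalized rank."""
    collected: Dict[str, List[float]] = {}
    for dataset in datasets:
        for peak_id, rank in dataset.items():
            collected.setdefault(peak_id, []).append(rank)
    return {peak_id: sum(r) / len(r) for peak_id, r in collected.items()}


if __name__ == "__main__":
    print(rank_normalize([10.0, 2.0, 5.0, 5.0]))
    print(merge_average([{"p1": 1.0, "p2": 0.5}, {"p1": 0.5}]))
```

Rank normalization makes peak strengths comparable across experiments with different signal scales, which is what allows replicate datasets to be averaged.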
Peak calling: Identify signals that are significantly above expected for a given genomic distance. [Figure: 5C signal versus genomic distance (Kb) with LOESS moving average]
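The idea of comparing each signal with the expectation at its genomic distance can be sketched as below. This is a simplified illustration, not the 5C pipeline: a per-bin mean stands in for the LOESS fit, a fold-change cutoff stands in for a significance test, and `bin_size` and `min_fold` are hypothetical parameters.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Interaction = Tuple[int, float]  # (genomic distance in bp, 5C signal)


def expected_by_distance(interactions: List[Interaction], bin_size: int) -> Dict[int, float]:
    """Mean signal per distance bin, used here in place of a LOESS curve."""
    bins = defaultdict(list)
    for distance, signal in interactions:
        bins[distance // bin_size].append(signal)
    return {b: mean(values) for b, values in bins.items()}


def call_peaks(
    interactions: List[Interaction], bin_size: int = 10_000, min_fold: float = 2.0
) -> List[Interaction]:
    """Keep interactions whose signal is at least min_fold times the
    expected signal at the same distance."""
    expected = expected_by_distance(interactions, bin_size)
    peaks = []
    for distance, signal in interactions:
        background = expected[distance // bin_size]
        if background > 0 and signal / background >= min_fold:
            peaks.append((distance, signal))
    return peaks


if __name__ == "__main__":
    data = [(1000, 10), (2000, 10), (3000, 10), (4000, 90), (15000, 5), (16000, 5)]
    print(call_peaks(data))
```

Because interaction frequency falls with genomic distance, judging each signal against its own distance bin avoids calling every short-range contact a peak.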
5C interaction data • Identification of distal regulatory elements of each gene • Integration of distal elements with other datatypes • Mapping the network connectivity between genes and distal elements • Do genes interact with multiple elements? • Do regulatory elements act on more than one gene? • Approach • 5C mapping of TSS-to-element interactions throughout the ENCODE pilot regions • Cell lines: 5C data for January 2011 data freeze • K562, GM12878, HeLa S3, H1 ES