Discover the landscape of chromatin states using ChromHMM software applied to diverse datasets from different cell types and genomic features. Learn about chromatin marks, models, and integrative analysis with ENCODE data.
Genome-wide Discovery and Characterization of Chromatin States
Jason Ernst
Kellis Lab/Bernstein ENCODE group
CSAIL MIT and Broad Institute
Outline
• Overview of ChromHMM for discovering chromatin states
  Application to 41 chromatin marks in CD4+ T cells
• Extending ChromHMM to multiple cell types
  Application to Bernstein ENCODE Data
• Extending ChromHMM to integrate diverse data types
  Application to ENCODE Consortium Data
Cartoon Illustration of ChromHMM
[Figure: a stretch of DNA divided into 200bp intervals and spanning a transcription start site, an enhancer, and a transcribed region. Rows show the observed chromatin marks in each interval (e.g. H3K4me3, K27ac, K4me1, K36me3), the most likely hidden state for each interval, and the high-probability chromatin marks in each of states 1 to 6.]
All probabilities are learned from the data.
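The cartoon can be made concrete with a small decoding sketch. It assumes that each mark is a present/absent call per 200bp interval and that marks are independent given the hidden state, and it uses Viterbi decoding as one standard way to obtain a most likely state path. The two-state toy model, its probabilities, and the function name `viterbi` are illustrative assumptions, not values or code from ChromHMM.

```python
import math

def viterbi(obs, init, trans, emit):
    """Most likely state path for binary mark observations.

    obs   : list of tuples, one per interval; each entry is 0/1 per mark
    init  : init[k]     = probability of starting in state k
    trans : trans[j][k] = probability of moving from state j to state k
    emit  : emit[k][m]  = probability that mark m is present in state k
    All probabilities must lie strictly between 0 and 1 (logs are taken).
    """
    n_states = len(init)

    def log_emit(k, row):
        # Marks are treated as independent Bernoulli variables given the state.
        return sum(math.log(p if x else 1.0 - p) for p, x in zip(emit[k], row))

    score = [math.log(init[k]) + log_emit(k, obs[0]) for k in range(n_states)]
    back = []  # back[t][k] = best predecessor of state k at interval t + 1
    for row in obs[1:]:
        prev, score, ptr = score, [], []
        for k in range(n_states):
            j = max(range(n_states), key=lambda s: prev[s] + math.log(trans[s][k]))
            score.append(prev[j] + math.log(trans[j][k]) + log_emit(k, row))
            ptr.append(j)
        back.append(ptr)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Hypothetical toy model: marks are (K4me3, K36me3); state 0 is promoter-like,
# state 1 is transcribed-like. All numbers are made up for illustration.
init = [0.5, 0.5]
trans = [[0.8, 0.2], [0.2, 0.8]]
emit = [[0.9, 0.1], [0.1, 0.9]]
obs = [(1, 0), (1, 0), (0, 1), (0, 1), (0, 1)]
path = viterbi(obs, init, trans, emit)
```

In this toy run the first two intervals are assigned to state 0 and the remaining three to state 1. In practice the parameters would be learned from the data rather than set by hand.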
Chromatin marks from Barski et al., Cell 2007 and Wang et al., Nature Genetics 2008; DNaseI hypersensitivity from Boyle et al., Cell 2008; TF binding enrichment computed from 14 published TF binding experiments; expression data from Su et al., PNAS 2005.
Transition Matrix
[Figure: transition matrix; axes: State From, State To]
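To show what a transition matrix holds, the sketch below counts how often one state follows another in an already decoded state path and normalizes each row. This is a simplified illustration only: in an HMM the transition probabilities are learned jointly with the emission probabilities, not counted from a single path. The function name and the example path are hypothetical.

```python
def transition_matrix(path, n_states):
    """Row-normalized counts of state-to-state moves between adjacent intervals."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(path, path[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state that is never left (or never seen) gets an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Hypothetical decoded path over ten consecutive intervals.
example_path = [0, 0, 0, 1, 1, 2, 2, 2, 2, 0]
tm = transition_matrix(example_path, 3)
```

Entry `tm[i][j]` is the estimated probability of moving from state `i` to state `j` in the next interval, so each row with at least one observed transition sums to 1.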
Core Promoter States
Different promoter states show distinct functional enrichment.
[Figure axes: Proportion of State; Distance to TSS]
Transcribed Regions
[Figure panels: enrichment (increasing and decreasing) by percentage of gene length and by position relative to exon start]
Intergenic Active Regions
[Figure labels: Fold Enrichment; Percentage of Gene Length; Relative to TSS]
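Fold enrichment of a state for an annotation can be sketched as the fraction of the state's intervals that overlap the annotation, divided by the fraction of all intervals that overlap it. The exact definition used for these slides is not given, so this formula, the function name, and the toy data are assumptions for illustration.

```python
def fold_enrichment(states, annotated, state):
    """Observed over expected overlap of one state with an annotation.

    states    : state label per interval
    annotated : 0/1 per interval, 1 if the interval overlaps the annotation
    """
    in_state = [a for s, a in zip(states, annotated) if s == state]
    if not in_state or not any(annotated):
        return float("nan")  # undefined when the state or annotation is absent
    observed = sum(in_state) / len(in_state)
    expected = sum(annotated) / len(states)
    return observed / expected

# Hypothetical toy data: ten intervals, three of which overlap the annotation.
toy_states = [0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
toy_annot = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
enrich_0 = fold_enrichment(toy_states, toy_annot, 0)
enrich_2 = fold_enrichment(toy_states, toy_annot, 2)
```

A value above 1 means the state overlaps the annotation more often than expected from the annotation's genome-wide coverage; a value of 0 means no overlap at all.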
Large-scale repressed and repetitive regions
Specific repeat elements are enriched for specific states.
[Figure: transition matrix snapshot; axes: State From, State To; values: probability of transition]
Marks that have been profiled in several ENCODE cell types by the Broad Institute (PI: Bernstein)
Extending ChromHMM to multiple cell types: Application to Bernstein ENCODE Data
10-state model learned from three ENCODE cell lines
Model learned from K562, HUVEC, and NHEK data.
[Figure: enrichments for HUVEC (other cell types similar)]
Comparing chromatin states across cell types
The CTCF island state (State 9) is highly stable across cell types.
[Figure panels for K562, HUVEC, and NHEK: pairwise state fold enrichments; proportion of genome]
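A pairwise state fold enrichment between two cell types can be sketched as follows: for each pair of states (i, j), count the intervals that are in state i in one cell type and state j in the other, then divide by the count expected if the two assignments were independent. This definition, the function name, and the short toy paths are assumptions; the slides do not spell out the computation.

```python
from collections import Counter

def pairwise_state_enrichment(path_a, path_b, n_states):
    """table[i][j] = observed / expected co-occurrence of state i (cell type A)
    and state j (cell type B) over the same genomic intervals."""
    if len(path_a) != len(path_b):
        raise ValueError("both paths must cover the same intervals")
    n = len(path_a)
    joint = Counter(zip(path_a, path_b))
    freq_a = Counter(path_a)
    freq_b = Counter(path_b)
    table = []
    for i in range(n_states):
        row = []
        for j in range(n_states):
            expected = freq_a[i] * freq_b[j] / n
            row.append(joint[(i, j)] / expected if expected else 0.0)
        table.append(row)
    return table

# Hypothetical state paths for two cell types over six shared intervals.
toy_huvec = [0, 0, 1, 1, 2, 2]
toy_nhek = [0, 0, 1, 2, 2, 2]
cross = pairwise_state_enrichment(toy_huvec, toy_nhek, 3)
```

A state that is stable across cell types shows a large diagonal entry `cross[k][k]`, while large off-diagonal entries point to state pairs that frequently swap between cell types.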
Comparing chromatin states across cell types
[Figure: GO enrichment for TSS in the active promoter state (1) in NHEK and the unmodified state (7) in HUVEC; panels for K562, HUVEC, and NHEK]
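Extending ChromHMM to Multiple Cell Types
• Concatenate Genomes: the K562 genome and the Gm12878 genome are placed end to end, each with H3K4me3 and H3K27me3 tracks
• Stack Features: a single genome with H3K4me3 in K562, H3K4me3 in Gm12878, H3K27me3 in K562, and H3K27me3 in Gm12878 as separate tracks
• Independent Models: separate models for K562 and Gm12878, each with H3K4me3 and H3K27me3 tracks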
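The difference between the first two strategies comes down to how the input matrix is arranged. The sketch below assumes each cell type is represented as a list of per-interval tuples with one 0/1 entry per mark, in the order (H3K4me3, H3K27me3). The function names and toy values are hypothetical and are not part of ChromHMM.

```python
from itertools import chain

def concatenate_genomes(cell_tracks):
    """Append the intervals of every cell type end to end.

    The result has one column per mark, so a single model with shared
    state definitions is learned across all cell types."""
    return [row for tracks in cell_tracks.values() for row in tracks]

def stack_features(cell_tracks):
    """Keep one row per genomic interval and place the marks of every
    cell type side by side, giving one column per (cell type, mark) pair."""
    per_cell = list(cell_tracks.values())
    return [tuple(chain.from_iterable(rows)) for rows in zip(*per_cell)]

# Hypothetical binarized tracks over three intervals: (H3K4me3, H3K27me3).
cell_tracks = {
    "K562": [(1, 0), (0, 1), (0, 0)],
    "Gm12878": [(1, 0), (1, 0), (0, 1)],
}
concatenated = concatenate_genomes(cell_tracks)  # 6 rows x 2 columns
stacked = stack_features(cell_tracks)            # 3 rows x 4 columns
```

With concatenation, each interval gets one state per cell type and the states are directly comparable. With stacking, each interval gets a single state that describes the combined pattern across cell types. Independent models would instead train on each entry of `cell_tracks` separately, so state labels would need to be matched afterwards.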
Extending ChromHMM to integrate diverse data types: Application to ENCODE Consortium Data
Integrating Diverse Data with ChromHMM
[Figure: the same cartoon as before, a stretch of DNA divided into 200bp intervals and spanning a transcription start site, an enhancer, and a transcribed region. Rows show the observed chromatin marks (e.g. H3K4me3, K27ac, K4me1, K36me3), additional genomic datasets (e.g. cMyc, DNaseI), the most likely hidden state for each interval, and the high-probability marks and datasets in each of states 1 to 6.]
State 21 Concentrated near Transcription Termination Sites
[Figure: distribution relative to nearest RefSeq TTS]
State 0 Highly Specific to TSS
[Figure: distribution relative to nearest RefSeq TSS]
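A distribution relative to the nearest TSS or TTS can be sketched by finding, for every interval assigned to a state, the signed offset to the closest annotated site and binning those offsets. The sketch works on a single chromosome, ignores strand, and uses hypothetical coordinates; all names are illustrative assumptions.

```python
from bisect import bisect_left
from collections import Counter

def nearest_offset(position, sorted_sites):
    """Signed distance from position to the closest site (negative = upstream
    in coordinate order). sorted_sites must be sorted and non-empty."""
    i = bisect_left(sorted_sites, position)
    candidates = sorted_sites[max(i - 1, 0):i + 1]
    nearest = min(candidates, key=lambda s: abs(position - s))
    return position - nearest

def offset_histogram(interval_starts, sorted_sites, bin_size=200):
    """Count intervals per offset bin; keys are the lower edge of each bin."""
    offsets = (nearest_offset(p, sorted_sites) for p in interval_starts)
    return Counter((off // bin_size) * bin_size for off in offsets)

# Hypothetical site coordinates and start positions of one state's intervals.
toy_sites = [1000, 5000]
toy_starts = [800, 1000, 1200, 4800, 3200]
hist = offset_histogram(toy_starts, toy_sites)
```

A state that is highly specific to the annotated sites would pile up in the bins around offset 0, while a diffuse state would spread across many bins.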
TSS of Genes of Distinct Function Enriched in Different States
Summary
• ChromHMM learns de novo chromatin states from a large number of chromatin marks
• Applications:
  • Single cell type, lots of marks: chromatin states
  • Multiple cell types: chromatin dynamics
  • Diverse input tracks: data integration
• Going forward:
  • Integration with regulatory motifs
  • Sequence determinants of chromatin / expression
Acknowledgements
Members of the Kellis Lab and Bernstein ENCODE group
• Manolis Kellis
• Bradley Bernstein
• Chuck Epstein
• Pouya Kheradpour
• Michael Lin
• Tarjei Mikkelsen
• Noam Shoresh
Funding: NIH/NHGRI