Genome-wide analysis methods for cis-regulatory element identification and network linkage validation. MBL GRN course. Ellen Rothenberg, Division of Biology, California Institute of Technology. Oct. 19, 2011.
Problems
• Large genomes – long distances over which cis-elements must be assayed
• Multiple paralogs of transcription factors in vertebrates: redundant binding specificity of co-expressed paralogs can mask important biological function in a single knockout
• Multiple tissue use of the same transcription factors in different contexts: what is a “target”? What mechanism(s) control target selection, and at what level?
• For postembryonic development, depth of sequential developmental events
How can whole-genome deep sequencing approaches help?
• Some accessible questions
  • Status of genes: off/on/silenced/poised
  • Sites of cis-reg elements
  • Presence of target genes
  • Putative interacting factor complexes
• Some additional hypotheses that can be evaluated
  • Do different “epigenetic states” enforce stable expression or silencing?
  • Do different “epigenetic states” alter transition thresholds for expression or silencing?
  • If so, how?
Frontier questions for analysis of gene networks in animals with large genomes
• Using genome-wide approaches to identify relevant cell type-specific cis-regulatory modules
• Role of histone modification in time delay regulation of gene expression
• Transcription factor-eye view of the genome
• Relationship between epigenetic modification and transcription factor binding
• Transcription factor binding vs. transcription factor function: another frontier
Histone Modifications Associated with Chromatin Accessibility and Gene Activity
• Acetylated H3 (K9 & K14) (H3K(9,14)Ac)
  • Correlated with active chromatin domains (regulatory elements, including active promoters, enhancers, LCRs, and insulators) (Roh, T.Y. et al. Genes & Dev. 2005)
• Di-methylated H3K4 (H3K4me2)
  • A marker of transcription competence, often associated with inactive unmethylated high-CpG promoters (Bernstein, B.E. et al. Cell 2005; Weber, M. et al. Nat. Genet. 2007). Closely related to developmentally poised genes in hematopoiesis (Orford, K. et al. Dev Cell 2008)
• Tri-methylated H3K27 (H3K27me3)
  • A marker for Polycomb complex-mediated transcriptional silencing (Boyer, L.A. et al. Nature 2006)
Histone Modifications Associated with Chromatin Accessibility and Gene Activity
• Mono-methylated H3K4 (H3K4me1)
  • Correlated with distal regulatory elements (Heintzman et al. 2007, 2009), irrespective of function
  • Anticorrelated with promoters
  • Converted to H3K4me2 or H3K4me3 locally where further activated
• Tri-methylated H3K4 (H3K4me3)
  • Correlated with active promoters (actual TSSs)
  • Also seen at “poised” promoters, even if temporarily repressed
  • Occasionally seen at distal enhancers when they become active (looping with polymerase II loading; via transfer??)
• Tri-methylated H3K27 (H3K27me3)
  • PRC2-mediated repression as before
  • Can be “bivalent” with H3K4me3 at a class of unstably repressed TSSs
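The promoter-level rules on these two slides can be read as a small decision table. The sketch below is a deliberately simplified illustration, not a method from the talk: the function name, the boolean inputs, and the state labels are all assumptions, and real analyses work with quantitative ChIP-seq signal rather than present/absent calls.

```python
def classify_promoter(h3k4me3: bool, h3k27me3: bool, h3ac: bool) -> str:
    """Label a promoter from presence/absence of three marks (hypothetical scheme)."""
    if h3k4me3 and h3k27me3:
        # H3K4me3 together with H3K27me3: the "bivalent", unstably repressed class
        return "bivalent"
    if h3k27me3:
        # H3K27me3 alone: PRC2-mediated repression
        return "PRC2-repressed"
    if h3k4me3 and h3ac:
        # H3K4me3 plus H3 acetylation: active promoter
        return "active"
    if h3k4me3:
        # H3K4me3 without acetylation: treated here as "poised" (an assumption)
        return "poised"
    return "unmarked"


print(classify_promoter(h3k4me3=True, h3k27me3=True, h3ac=False))
```

The order of the checks matters: the bivalent case is tested before either single-mark case so that a promoter carrying both marks is not labeled simply as repressed or poised.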
Rules for association between transcription factor binding and H3K4 methylation: promoter-proximal vs. promoter-distal sites
Case: E2A (Tcf3) bHLH factor in mouse pro-B cells (Lin, …Murre 2010 Nat Immunol)
Recruitment of E2A to new sites via EBF expression in pro-B cells
EBF1: factor required for progression from pre-pro-B to pro-B cell stage
EBF1-/- cells are arrested in pre-pro-B stage
E2A binding and histone modification assessed at sites where EBF1 binds in pro-B cells
Binding of E2A with different partners across the genome in the pre-pro-B cell to pro-B cell transition: many sites of collaborative binding work as B-cell enhancers
Defining elements involved in gene expression change: Global survey of epigenetic transformations across T lineage commitment
• cis-regulatory elements “in play” vs. “inaccessible” from DN1/ETP to DN3 stage and beyond
• RNA-seq: transcriptome
• ChIP-seq: histone modifications from DN1 stage through commitment & beyond
• Mapped by genome-wide deep sequencing
Samples: FL-derived ETP, DN2a, DN2b; adult thymus-derived DN3a and DP (TCRa-/-)
Marks: AcH3 (“activation”), H3K4me2 (“accessibility”), H3K27me3 (“repression”, one form)
[Figure: RNA, AcH3 and H3K4me2 tracks, adult thymus DP]
(Jingli Zhang, Ali Mortazavi, Brian Williams, Lorian Schaeffer, Barbara Wold, Ellen Rothenberg, unpublished data)
All genes during T cell specification: Histone modifications at the transcriptional start site vs. RNA expression
[Heat map: H3K4me2, H3K(9,14)Ac, H3K27me3 and RNA; DN1, DN2a, DN2b, DN3, DP stages, left to right; log2 of integrated intensity]
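One way a “log2 of integrated intensity” value per gene could be computed is sketched below. The slide does not state the window size or how zero signal was handled, so the flank of 1000 bp, the pseudocount of 1, and the function name are assumptions made only for illustration.

```python
import math


def log2_integrated_intensity(coverage, tss, flank=1000, pseudocount=1.0):
    """Sum per-base signal in a window centred on the TSS, then take log2.

    coverage    : per-base ChIP-seq signal along one chromosome (list of numbers)
    tss         : 0-based index of the transcriptional start site
    flank       : bases included on each side of the TSS (assumed value)
    pseudocount : added before the log so that zero signal is defined (assumed)
    """
    start = max(0, tss - flank)
    end = min(len(coverage), tss + flank + 1)
    total = sum(coverage[start:end])
    return math.log2(total + pseudocount)


# Toy coverage vector, not real data: signal of 1, 2, 4 around position 5.
toy = [0, 0, 0, 1, 2, 4, 0, 0, 0, 0]
print(log2_integrated_intensity(toy, tss=5, flank=2))
```

Computing this value for each mark at each stage gives one row of the heat map per gene, with one column per stage.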
Activation and repression of key genes important for hematopoiesis: promoter-associated modifications vs. expression
Bcl11b (T-cell specific transcription factor) [Tracks: H3K9,14Ac, H3K4me2, H3K27me3, RNA; stages DN1, DN2A, DN2B, DN3, DP]
CD3 cluster (3 genes crucial for TCR signaling) [Tracks: H3Ac, H3K4me2, H3K27me3, RNA; stages DN1, DN2A, DN2B, DN3, DP]
Functional foreshadowing: GATA-3 binds fully to Cd3d enhancer before gene activation [Tracks: H3K4me2 at DN1, DN2a, DN2b, DN3, DP; GATA3 at DN1, DN2b, DP]
Hhex (Progenitor cell self-renewal factor) [Tracks: H3K9,14Ac, H3K4me2, H3K27me3, RNA]
PU.1 (Progenitor cell & myeloid factor) [Tracks: H3K9,14Ac, H3K4me2, H3K27me3, RNA]
Predicting obscure objects of desire: a long-sought T-cell cis-reg element for Gata3 as a T-lineage H3K4me2 peak
A far downstream region undergoes developmentally regulated histone modifications in parallel with the Bcl11b locus
T-cell specific enhancer activity of the far downstream “major peak” element with the Bcl11b promoter in stable transfection experiments
[Figure: relative luciferase activities (Firefly/Renilla) for pGL3 Basic, pGL3 Control, pGL3 PR3 and pGL3 PR3-MP in NFS25 (non-T), Raw264.7 (non-T) and P2C2 (T cells)]
Interpretation of histone marks
• Far greater information content at sites where histone marking changes across a defined developmental interval
• Histone mark changes: “transcription factors at work”
Transcription factors give site specificity to histone modifiers: Tbet (Tbx21) directly recruits H3K27 demethylase and H3K4 methyltransferase to target sites to activate Th1 genes
Direct binding sites for KDM and KMT on Tbet mapped by mutational phenotype analysis: Miller, … Weinmann, 2008, Genes Dev
Scanning the genome from a transcription factor’s point of view
• Even the most restrictive PWM has many occurrences in the genome
• Many realistic PWMs allow vast numbers of hits across the genome
• Many transcription factors are used for completely different roles in different cell types
• How much of the genome is accessible to a factor in any one cell type?
• How is this determined?
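A back-of-envelope calculation shows why even a restrictive motif has many chance occurrences. The sketch below replaces a full PWM with an IUPAC consensus and assumes equal base frequencies and independent positions, which real genomes violate; the function names are hypothetical. The example consensus RRRGGAAGTG is the PU.1 motif quoted later in this deck.

```python
# Number of bases (out of 4) allowed by each IUPAC nucleotide code.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def match_probability(consensus):
    """Chance that one random position matches, assuming uniform, independent bases."""
    p = 1.0
    for symbol in consensus.upper():
        p *= IUPAC_DEGENERACY[symbol] / 4.0
    return p


def expected_hits(consensus, genome_length, both_strands=True):
    """Expected number of chance matches in a random sequence of the given length."""
    positions = max(0, genome_length - len(consensus) + 1)
    strands = 2 if both_strands else 1
    return strands * positions * match_probability(consensus)


print(match_probability("RRRGGAAGTG"))        # 2**-17 under these assumptions
print(expected_hits("RRRGGAAGTG", 2_000_000_000))
```

Under these assumptions, a 10-bp consensus with three two-fold degenerate positions is expected to occur tens of thousands of times by chance in a genome of a few billion base pairs, which is why motif presence alone cannot define the bound or functional sites.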
“B cell factor” EBF1 itself needs targeting help: “BEKO” T cells, not NIH/3T3 fibroblasts, provide context for EBF1 recruitment and function at B-cell genes (Treiber, …Grosschedl, 2010, Immunity)
PU.1 ChIP-seq: PU.1 has a distinct set of genome-wide binding sites in early T-lineage cells
[Panels: DN1 vs. macrophage; DN1 vs. mature B cell; DN1 vs. prethymic progenitor (E2a ko pre-pro-B cell)]
DN1 vs. non-T (B, M, pre-pro B): distinct site preferences
PU.1 ChIP-seq: PU.1 has a consistent set of genome-wide binding sites in early T-lineage cells
Decreasing PU.1: lower occupancy, but highly similar binding site specificity within the T lineage
DN1 vs. non-T (B, M, pre-pro B): distinct site preferences
DN1 vs. DN2a or vs. DN2b: similar preference
Sites of PU.1 engagement in early T cells have “active” histone marks H3 Ac and H3K4me2
Potential PU.1 binding sites that are preferentially occupied in early T cells mark genes that are more highly expressed in early T cells
PU.1 enhances the “phase 1” state: positive regulation of another stem/progenitor-cell gene, Bcl11a
[Tracks: H3K4me2 ChIP-seq, PU.1 ChIP-seq]
Overexpression of PU.1 in fetal thymocytes upregulates Bcl11a, and forced expression of PU.1 in an immature T-cell line that has shut off Bcl11a can “reawaken” Bcl11a expression (Jingli Zhang, Marissa M. Del Real)
PU.1 binds genes with integral roles in early T-cell development, at least as well as it does in prethymic precursors… as long as they are accessible
[Comparison: prethymic lymphoid precursors]
But not at the “master” B-cell lineage gene Pax5: PU.1 can be excluded from genes that must be kept silent throughout T-cell development
[Tracks: H3Ac, H3K4me2, H3K27me3, PU.1, RNA; comparison: prethymic lymphoid precursors]
Developmental time course of PU.1 site occupancy in pro-T cells correlates with positive regulation of linked genes globally
[Figure: cumulative probability curves for genes downregulated from DN1 to DN2b and genes upregulated from DN1 to DN2b; axis runs from more PU.1 binding in DN1 (faster loss with dilution) to more PU.1 binding in DN2b (slower than average dilution); panel labeled “Cumulative probability (Prox)”]
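The cumulative-probability comparison named on this slide can be sketched as two empirical cumulative distribution functions, one per gene group. This is a minimal illustration only: the input is assumed to be a log2 (DN2b / DN1) PU.1 occupancy ratio per linked site, the numbers are made up, and the helper name `ecdf` is hypothetical.

```python
from bisect import bisect_right


def ecdf(values):
    """Return F, where F(x) is the fraction of values less than or equal to x."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


# Hypothetical log2(DN2b / DN1) occupancy ratios, NOT data from the talk.
sites_near_upregulated = [0.5, 1.0, -0.2, 0.8]
sites_near_downregulated = [-1.0, -0.5, 0.1, -0.8]

cdf_up = ecdf(sites_near_upregulated)
cdf_down = ecdf(sites_near_downregulated)

# Fraction of sites in each group with a ratio at or below 0 (no relative gain in DN2b).
print(cdf_up(0.0), cdf_down(0.0))
```

If occupancy tracks positive regulation, the curve for sites near upregulated genes sits to the right of the curve for sites near downregulated genes, so it has the lower value at any given ratio.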
The challenge: distinguishing functional from nonfunctional or redundant PU.1 sites
• Binding at >30,000 sites in DN1 cells
• Even more sites in macrophages!
• There are only ~30K genes in all in the mouse genome
• Binding occurs at genes with various expression patterns like and unlike PU.1 (upreg or stable or silent)
• H3K4me2 induced, often parallel to PU.1 occupancy
In PU.1-deficient hematopoietic progenitors, activation of a PU.1 transgene can recruit histone methylation to some PU.1 binding sites
H3K4me1 marking at distal PU.1 binding sites induced by activating PU.1 (Heinz et al., 2010, Molec. Cell.)
Can PU.1/H3K4 methylation be used to identify distal sites of PU.1 action?
Possible clues to subsets of PU.1 sites where H3K4 methylation depends on PU.1 occupancy
• At distal sites (potential enhancers) where H3K4me2 marking is dynamic
• Most useful to look at individual genes, not aggregate!!
The challenge: distinguishing functional from nonfunctional or redundant PU.1 sites
[Panels: expression like PU.1; expression unlike PU.1]
• One possible difference: promoter-associated binding is skewed to genes with “phase 1” expression like PU.1
• Causal, or effect of promoter/enhancer looping??
PU.1 take-home lessons
• Binding alone is too easy for this factor to enable us to distinguish sites where it is and is not rate limiting for function
• Relatively constant set of sites bound in pro-T cells irrespective of factor concentration
• “Gregarious” association with open promoters of stably expressed genes as well as PU.1-sensitive genes
• All have beautiful PU.1 target motifs (RRRGGAAGTG)
• Binding at distal sites is often sufficient to induce histone modification
• Function depends on collaboration with other factors
• But for given PU.1-sensitive genes, the binding pattern focuses functional dissection of potential cis-elements
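Locating candidate PU.1 motifs in a sequence can be sketched as a consensus scan on both strands. This is an illustrative stand-in, not the motif-calling method used for the data in the talk: it matches the IUPAC consensus RRRGGAAGTG exactly, whereas real analyses typically score a PWM. The function names and test sequences are hypothetical.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of every IUPAC code (e.g. R = A/G pairs with Y = T/C).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(consensus, sequence):
    """Return sorted (start, strand, matched_text) for every consensus match.

    Coordinates are 0-based on the given (plus) strand; a lookahead is used
    so that overlapping matches are all reported.
    """
    seq = sequence.upper()
    hits = []
    for strand, motif in (("+", consensus.upper()),
                          ("-", reverse_complement(consensus))):
        pattern = re.compile("(?=(" + "".join(IUPAC[s] for s in motif) + "))")
        hits.extend((m.start(), strand, m.group(1)) for m in pattern.finditer(seq))
    return sorted(hits)


print(find_sites("RRRGGAAGTG", "TTAAAGGAAGTGCC"))
```

Because every bound site in the list above already carries a good motif, a scan like this can only nominate candidates; the slide's point is that occupancy and function have to be established separately.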
In contrast… GATA-3 is expressed almost stably throughout T cell development
Multiple essential roles from DN1 to DN3 to DP to positive selection in the thymus, and in mature T cells in peripheral immune responses
~1000 binding sites detected in each stage
But in contrast… a GATA-3 site is not always a GATA-3 site: different sites bound in different stages despite stable expression
GATA-3 binding can precede activation of a cis-reg element and anticipate transcription
But: substantial shifts in relative site preference for GATA-3 between stages [Panels: DN1, DN2b, DP]
Specification of T cells in multiple layers of regulation [Figure label: TCF7]