Genomics of Erythroid Regulation: G1E and G1E-ER4. January 20, 2010. Investigators on global predictions and tests. Penn State Hardison Francesca Chiaromonte Yu Zhang Webb Miller Stephan Schuster Frank Pugh, collaborator Kateryna Makova, collaborator Anton Nekrutenko, collaborator
Genomics of Erythroid Regulation: G1E and G1E-ER4 January 20, 2010
Investigators on global predictions and tests
• Penn State: Hardison; Francesca Chiaromonte; Yu Zhang; Webb Miller; Stephan Schuster; Frank Pugh, collaborator; Kateryna Makova, collaborator; Anton Nekrutenko, collaborator
• Children's Hospital of Philadelphia: Mitch Weiss; Gerd Blobel
• Emory Univ.: James Taylor
• Duke Univ.: Greg Crawford, collaborator
• Univ. Queensland: Andrew Perkins, collaborator
• NHGRI: Laura Elnitski, collaborator
Aims of Global tests and predictions of erythroid regulation
Major erythroid transcription factors
Factor | Class | Mode of discovery
GATA1 | Zn finger | binds β-globin locus
NF-E2 | bZIP | binds β-globin locus
KLF1/EKLF | Zn finger | subtractive hybridization
SCL/TAL1 | bHLH | rearranged in leukemias
GFI1b | Zn finger | oncoviral integration site
ZBTB7a/LRF | POZ-Kruppel | proto-oncogene, lymphomas
Others
Transcription factor GATA-1 [Diagram: GATA-1 bound at a -GATA- motif near globin genes] • Founding member of a small family of proteins (GATA-1 to GATA-6) • Binds functionally important cis WGATAR motifs in regulatory regions of many hematopoietic genes • Essential for erythroid and megakaryocyte development - gene knockout studies in mice - analysis of human patients
[Figure labels: lineage-restricted factors, transcription cofactors, histone-modifying enzymes]
Gene activation by alterations in chromatin • “Regulatory signals entering the nucleus encounter chromatin, not DNA, and the rate-limiting biochemical response that leads to activation of gene expression in most cases involves alterations in chromatin structure. How are such alterations achieved?” • Gary Felsenfeld & Mark Groudine (2003) Controlling the double helix. Nature 421: 448-453 • "It is now generally argued that reorganization of these chromatin structures is a process that is mechanistically linked to many gene activation or repression events, and is initiated by the action of site-specific transcription factors, acting either through ATP-driven nucleosome remodeling machines, or via the action of enzymes that covalently modify various components of the chromatin structure.” • Sam John, … John A. Stamatoyannopoulos, and Gordon L. Hager (2008) Interaction of the Glucocorticoid Receptor with the Chromatin Landscape. Molecular Cell 29, 611–624. • “These results imply that GATA-1 is sufficient to direct chromatin structure reorganization within the beta-globin LCR and an erythroid pattern of gene expression in the absence of other hematopoietic transcription factors.” • Layon ME, Ackley CJ, West RJ, Lowrey CH. (2007) Expression of GATA-1 in a non-hematopoietic cell line induces beta-globin locus control region chromatin structure remodeling and an erythroid pattern of gene expression. J Mol Biol. 366:737-744.
Chromatin transitions before and after TF binding • "We propose four specific kinds of interaction. The classical mode for GR binding to chromatin involves receptor-dependent recruitment of the Swi/Snf complex (1), resulting in a hormone-dependent hypersensitive transition. It is now clear, however, that some hormone-dependent events must involve other remodeling species (2). Furthermore, many GR binding events are associated with pre-existing transitions, and these constitutive events fall again into two classes, Brg1 dependent (3) and Brg1 independent (4). ” • Sam John, … John A. Stamatoyannopoulos, and Gordon L. Hager (2008) Interaction of the Glucocorticoid Receptor with the Chromatin Landscape. Molecular Cell 29, 611–624.
Order of events in activation can vary • Figure 5. Models Depicting Different Orders of Action by Regulators and Chromatin- Remodeling Complexes • Regulators, HAT complexes, and ATP-dependent remodeling complexes can act in different orders (pathway A, B, or C) and still give the same end result: a template competent for transcription. Although not shown, it is also possible that binding by the general transcription factors precedes the action and recruitment of HAT complexes and ATP-dependent remodelers. • Geeta J. Narlikar, Hua-Ying Fan and Robert E. Kingston (2002) Cooperation between Complexes that Regulate Chromatin Structure and Transcription. Cell 108: 475-487
What biochemical events precede and follow GATA1 binding?
• Where should we look in the genome?
  • Segments around the transcription start site (TSS)
    • all genes
    • expressed vs nonexpressed genes
    • all GATA1-responsive genes
    • induced vs repressed genes
  • All TF-occupied segments (OSs)
  • Distal TF OSs (i.e. outside the TSS zone)
  • All mappable regions
    • Much larger computation
• Treat levels of biochemical marks as
  • continuous variables
  • discrete segments (bound or not, histones modified or not)
• Categorize genes and OSs by the order of events
  • Category 1. Co-occupancy by other TFs, histone modifications, and DHS formation occur after the TF of interest (TF1) binds or is activated
  • Category 2. Co-occupancy by other TFs, histone modifications, and DHS formation occur before the TF of interest (TF1) binds or is activated
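The two categories can be sketched for the discrete case (mark present or absent). This is only an illustration, not the analysis code used for these slides: the function name `order_category`, the threshold of 1.0, and the example signal values are all hypothetical.

```python
def order_category(signal_before, signal_after, threshold=1.0):
    """Classify one segment by when a mark (co-occupancy, histone
    modification, or DHS) appears relative to activation of the TF.

    signal_before: mark level before the TF is restored (e.g. in G1E)
    signal_after:  mark level after the TF is activated (e.g. in G1E-ER4 + E2)
    threshold:     hypothetical cutoff that turns a level into present/absent
    """
    present_before = signal_before >= threshold
    present_after = signal_after >= threshold
    if present_before:
        return "category 2"   # mark is already there before the TF acts
    if present_after:
        return "category 1"   # mark appears only after the TF acts
    return "neither"          # mark never reaches the threshold

# Made-up mark levels for three segments: (before, after)
examples = [(0.2, 3.0), (2.5, 2.7), (0.1, 0.3)]
labels = [order_category(b, a) for b, a in examples]
print(labels)
```

Treating the marks as continuous variables instead would replace the threshold with a ratio or difference between the two conditions.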
Cell-based models to study GATA1 function [Diagram: Gata1– ES cells undergo in vitro hematopoietic differentiation (erythropoietin, stem cell factor, thrombopoietin) to give the immature hematopoietic cell lines G1ME and G1E; adding back GATA-1 yields erythroid + megakaryocyte cells from G1ME and erythroid cells from G1E (G1E-ER4 + estradiol)]
Global analysis of GATA1-regulated erythroid gene expression in G1E cells (1999-2009) Stably expressing estrogen- activated GATA-1 (GATA-1-ER) + estradiol
Transcriptome analysis • Time course: 0, 3, 7, 14, 21, 30 hrs in estradiol, with morphology and hemoglobin followed at each point • Affymetrix gene chips: U74 array (12,500 probesets, 9,266 genes) and 430 2.0 array (45,000 probesets, 19,000 genes) • Blood 2004; Genome Res 2009
Kinetics of GATA1-regulated Gene Expression GATA1-induced (>2-fold) 1048 genes • known targets • new gene discovery GATA1-repressed (>2-fold) 1568 genes • stem cell/progenitor markers • proto-oncogenes (Kit/Myc/Myb) • function unknown Affy 430 2.0
Factor occupancy and GATA1 responses • 60 megabase region of chromosome 7 • identify new GATA1-regulated genes • define combinatorial TF interactions • correlate histone marks w/ TF occupancy and gene expression
Western blots show specificity of antibodies and presence of proteins in cells [Figure: Western blots of G1E-ER4, CH12, MEL and G1E cell extracts probed with α-GATA1, α-GATA2, α-TAL1 and α-CTCF; labeled bands are GATA1-ER, GATA1, GATA2, TAL1 and CTCF; size markers at 125, 101, 56.2 and 35.8] CH12 are B-lymphoid cells; others are erythroid. Cheryl Keller Capone
Major observations under investigation • GATA1 binds to a majority of the DNA segments occupied by TAL1 in G1E-ER4 cells (+E2). However, over half of these segments are occupied by TAL1 prior to restoration of GATA1. • Only a minority are at GATA2 occupied segments (OSs) • TAL1 seems to be redistributed around some target loci • Change gradient in TAL1 from HS6>HS1 to HS1>HS6 in Hbb LCR • Large changes in histone modifications are not observed after restoring and activating GATA1 • But some “small” changes are observed • Level of GATA1 occupancy is similar in mouse (G1E-ER4+E2 cells) and human (K562 cells), but only a small minority of occupied segments are shared • 15,000 GATA1 OSs in each species • 1,000 GATA1 OSs are shared
Hbb locus and surrounding OR genes • ChIPseq fits with previous data • TAL1 redistributes when GATA1 is restored • PolII and TAL1 are recruited to Hbb genes when GATA1 is restored
Zfpm1 • Induced immediately after GATA1-ER is activated • TAL1 occupancy corresponds to GATA2 OS in G1E
c-Kit • Repressed after GATA1-ER is activated • TAL1 occupancy at GATA1 OSs, may correspond to GATA2 OS in G1E • Loss of TAL1 occupancy correlates with repression
Changes in peaks of occupancy, co-occupancy • Start with peak calls from MACS for all the TF OS and from Fseq for DNase hypersensitive sites • Define overlapping segments as those sharing at least one nucleotide • Use set operations tools in Galaxy to find overlapping segments • Compare OS for each TF +/- GATA1 and find overlaps between TFs Chris Morrissey
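The overlap rule (segments sharing at least one nucleotide) can be sketched as follows. The slides use the set operations tools in Galaxy; this stand-alone version only illustrates the rule, assumes BED-style half-open coordinates, and uses made-up peak positions and hypothetical function names.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)

def overlaps(a: Interval, b: Interval) -> bool:
    """True when a and b are on the same chromosome and share >= 1 nucleotide."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def intersect(set_a: List[Interval], set_b: List[Interval]) -> List[Interval]:
    """Return the intervals of set_a that overlap at least one interval of set_b."""
    by_chrom: Dict[str, List[Interval]] = {}
    for iv in set_b:
        by_chrom.setdefault(iv[0], []).append(iv)
    return [a for a in set_a
            if any(overlaps(a, b) for b in by_chrom.get(a[0], []))]

# Hypothetical peak calls (coordinates invented for illustration)
gata1_os = [("chr7", 100, 200), ("chr7", 500, 650), ("chr7", 900, 950)]
tal1_os = [("chr7", 199, 260), ("chr7", 650, 700)]

shared = intersect(gata1_os, tal1_os)
print(len(shared), "of", len(gata1_os), "GATA1 OSs overlap a TAL1 OS")
```

With half-open coordinates, two segments that merely abut (one ends at 650, the other starts at 650) share no nucleotide and are not counted. The pairwise scan is quadratic per chromosome; genome-scale peak sets would normally be sorted first or handled by an interval tool.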
TAL1 has the most overlap with GATA1
• GATA1: 15,361 OSs
• TAL1_G1E: 6,930; TAL1_ER4: 7,449; overlap (TAL1_G1E ∩ TAL1_ER4): 2,777
• GATA1 ∩ TAL1_G1E: 4,269
• GATA1 ∩ TAL1_ER4: 4,443
• GATA1 ∩ overlap: 2,544
Chris Morrissey
CTCF expands, but doesn't move
• GATA1: 15,361 OSs
• CTCF_G1E: 15,757; CTCF_ER4: 27,909; overlap (CTCF_G1E ∩ CTCF_ER4): 14,982
• GATA1 ∩ CTCF_G1E: 555
• GATA1 ∩ CTCF_ER4: 932
• GATA1 ∩ overlap: 528
Chris Morrissey
GATA2 moves a lot (?)
• GATA1: 15,361 OSs
• GATA2_G1E: 2,077; GATA2_ER4: 10,759; overlap (GATA2_G1E ∩ GATA2_ER4): 356
• GATA1 ∩ GATA2_G1E: 465
• GATA1 ∩ GATA2_ER4: 178
• GATA1 ∩ overlap: 32
But is this just an artifact of noisy GATA2 data? Seems like this would be an ideal application for Yu Zhang and Kuan-Bei’s improvement in peak calling by using ChIP data on other proteins. - RH
Chris Morrissey
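The counts on the three overlap slides can be turned into percentages for easier comparison. This sketch assumes one reading of the Venn labels: each factor has a G1E set and an ER4 set, "Overlap" is the segments common to both, and the "GATA1 ..." entries are intersections with the 15,361 GATA1 OSs. The dictionary keys are hypothetical names; the numbers are those shown on the slides.

```python
# Counts as shown on the slides; key names are illustrative only.
counts = {
    "TAL1":  {"g1e": 6930,  "er4": 7449,  "g1e_and_er4": 2777,
              "gata1_and_er4": 4443, "gata1_and_both": 2544},
    "CTCF":  {"g1e": 15757, "er4": 27909, "g1e_and_er4": 14982,
              "gata1_and_er4": 932,  "gata1_and_both": 528},
    "GATA2": {"g1e": 2077,  "er4": 10759, "g1e_and_er4": 356,
              "gata1_and_er4": 178,  "gata1_and_both": 32},
}

def percent(part, whole):
    """Percentage of `whole` made up by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)

summary = {}
for factor, c in counts.items():
    summary[factor] = {
        # share of the factor's ER4 segments that GATA1 also occupies
        "er4_with_gata1": percent(c["gata1_and_er4"], c["er4"]),
        # share of those co-occupied segments already bound in G1E
        "cooccupied_prebound": percent(c["gata1_and_both"], c["gata1_and_er4"]),
        # share of the factor's G1E segments still occupied in ER4
        "g1e_kept_in_er4": percent(c["g1e_and_er4"], c["g1e"]),
    }
    print(factor, summary[factor])
```

Under this reading, the TAL1 numbers match the statement that GATA1 binds a majority of TAL1-occupied segments and that over half of those were occupied by TAL1 before GATA1 was restored.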
Compute ratios of signals in G1E and ER4, adjust by M vs A plot
1. For each 10 bp bin, we have tag counts for both G1E and ER4 (tagcnts_g1e and tagcnts_er4) and the total number of mapped reads, in millions, in G1E and ER4 (reads_g1e and reads_er4).
2. Calculate rpm_g1e = (tagcnts_g1e + 1)/reads_g1e and rpm_er4 = (tagcnts_er4 + 1)/reads_er4. (Adding 1 removes zeros.)
3. Calculate M = log2(rpm_er4/rpm_g1e) and A = 0.5*log2(rpm_er4*rpm_g1e).
4. Make the MA plot by plotting M versus A, and fit a lowess line through the points.
5. From the lowess regression, predict a value P for each A.
6. Calculate M' = M - P; M' is the adjusted difference between ER4 and G1E.
Weisheng Wu and F. Chiaromonte
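The six steps can be sketched in plain Python. This is not the code behind the slides: the smoother below is a simplified stand-in for lowess (tricube-weighted local linear regression without the robustness iterations of full lowess), `frac` is an assumed span, and the tag counts are invented.

```python
import math

def ma_values(tagcnts_g1e, tagcnts_er4, reads_g1e, reads_er4):
    """Steps 2-3: reads per million with a +1 pseudocount, then M and A per bin."""
    m_vals, a_vals = [], []
    for g, e in zip(tagcnts_g1e, tagcnts_er4):
        rpm_g1e = (g + 1) / reads_g1e
        rpm_er4 = (e + 1) / reads_er4
        m_vals.append(math.log2(rpm_er4 / rpm_g1e))
        a_vals.append(0.5 * math.log2(rpm_er4 * rpm_g1e))
    return m_vals, a_vals

def local_linear_fit(a_vals, m_vals, frac=0.5):
    """Steps 4-5: predict P at each A with a tricube-weighted local line.
    Simplified lowess stand-in; frac is the fraction of points in each window."""
    n = len(a_vals)
    k = max(2, math.ceil(frac * n))
    fitted = []
    for x0 in a_vals:
        dists = sorted(abs(x - x0) for x in a_vals)
        h = dists[k - 1] or 1e-12          # window half-width (avoid zero)
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in a_vals]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, a_vals)) / sw
        my = sum(wi * y for wi, y in zip(w, m_vals)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, a_vals))
        sxy = sum(wi * (x - mx) * (y - my)
                  for wi, x, y in zip(w, a_vals, m_vals))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def adjusted_m(tagcnts_g1e, tagcnts_er4, reads_g1e, reads_er4, frac=0.5):
    """Step 6: M' = M - P for every bin."""
    m_vals, a_vals = ma_values(tagcnts_g1e, tagcnts_er4, reads_g1e, reads_er4)
    p_vals = local_linear_fit(a_vals, m_vals, frac)
    return [m - p for m, p in zip(m_vals, p_vals)]

# Invented counts in which ER4 is uniformly two-fold higher after normalization
g1e_counts = [0, 1, 3, 7, 15, 31]
er4_counts = [1, 3, 7, 15, 31, 63]
m_prime = adjusted_m(g1e_counts, er4_counts, reads_g1e=1.0, reads_er4=1.0)
print([round(v, 6) for v in m_prime])
```

In this toy input every bin has M = 1, so the fitted line sits at P = 1 and the adjustment removes the uniform shift; only bins that deviate from the overall trend keep a nonzero M'.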
Effect of adjusting ratios by lowess of an M vs A plot Weisheng Wu
Correlations among chromatin features and expression in TSS segments • Examine 4kb DNA segments centered on transcription start sites (TSSs) for all genes • Determine mean signal for TF occupancy, histone modifications in each • Compute Log2 of ratios, MA lowess adjustment • Determine expression levels of genes and change between G1E and ER4 • Draw scatterplots and determine correlations for all pairwise comparisons Weisheng Wu
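A minimal sketch of the first and last steps: the mean signal in a window centered on a TSS, and the Pearson correlation between two features across TSSs. The function names, the list-based signal track, and the toy values are assumptions made for illustration; the slides do not say which correlation measure was used.

```python
import math

def window_mean(signal, tss, half_width=2000):
    """Mean per-base signal in the window [tss - half_width, tss + half_width),
    clipped to the ends of the track. half_width=2000 gives a 4 kb segment."""
    start = max(0, tss - half_width)
    end = min(len(signal), tss + half_width)
    return sum(signal[start:end]) / (end - start)

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy track with a tiny window so the arithmetic is easy to follow
track = [0, 0, 4, 4, 0, 0]
print(window_mean(track, tss=3, half_width=1))

# Invented per-TSS mean signals for two features
gata1_means = [1.0, 2.0, 3.0, 4.0]
h3k27me3_means = [8.0, 6.0, 4.0, 2.0]
r = pearson(gata1_means, h3k27me3_means)
print(round(r, 3))
```

Repeating `pearson` over every pair of features gives the matrix of pairwise comparisons that the scatterplots summarize.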
Correlations in G1E-ER4 cells +E2 Weisheng Wu Same sets of graphs for G1E and ratio of signals have been done
Notable pairwise correlations in TSSs • Co-occupancy by GATA1 and TAL1 • Positive correlation with Trx marks (H3K4me) • Negative correlation with Pc marks (H3K27me3) Weisheng Wu
Changes (if any) in biochemical features at TSS show little or no difference between induced and repressed genes black: TSSs of all genes, red: up, blue: down, green: non-responsive Weisheng Wu
Changes in histone modifications (HMs) do not differ much between induced and repressed genes at GATA1 OSs
Principal components in genomic features at TSSs PCA Dataset: Raw counts of all factors, in G1E and ER4 models, in 4kb window around TSS Swathi A. Kumar
PCA results Swathi A. Kumar
H3K4me3 is major contributor to variance Swathi A. Kumar
Linear Discriminant Analysis • Same dataset as used in pairwise correlations of features in TSS segments • Initial run: binary response of induced or repressed expression • Responsive vs Non-responsive • Induced vs Repressed • Predictor variables are raw counts of GATA1 and other associated transcription/histone factors, before and after induction • Leave-one-out cross validation Swathi A. Kumar
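The procedure (fit a linear discriminant, then estimate the misclassification rate by leaving one gene out at a time) can be sketched as below. This is an illustration under stated assumptions, not the analysis run for these slides: it handles two classes and exactly two predictor variables, assumes equal class priors, and uses invented data; all function names are hypothetical.

```python
def mean_vec(rows):
    """Mean of each of the two features over a list of [x1, x2] rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def fit_lda(x, y):
    """Two-class, two-feature LDA with a pooled covariance matrix.
    Returns (w, b) so that w.x + b > 0 predicts class 1 (equal priors assumed)."""
    groups = [[r for r, lab in zip(x, y) if lab == c] for c in (0, 1)]
    means = [mean_vec(g) for g in groups]
    s = [[0.0, 0.0], [0.0, 0.0]]
    dof = 0
    for rows, mu in zip(groups, means):
        for r in rows:
            d = [r[0] - mu[0], r[1] - mu[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        dof += len(rows) - 1
    s = [[v / dof for v in row] for row in s]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [means[1][0] - means[0][0], means[1][1] - means[0][1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    mid = [(means[0][j] + means[1][j]) / 2 for j in range(2)]
    b = -(w[0] * mid[0] + w[1] * mid[1])
    return w, b

def predict(model, row):
    w, b = model
    return 1 if w[0] * row[0] + w[1] * row[1] + b > 0 else 0

def loo_misclassification(x, y):
    """Leave-one-out cross validation: refit without each sample, then test on it."""
    errors = 0
    for i in range(len(x)):
        model = fit_lda(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        if predict(model, x[i]) != y[i]:
            errors += 1
    return errors / len(x)

# Invented feature values: class 0 = non-responsive, class 1 = responsive
features = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.8], [2.2, 2.1],
            [6.0, 7.0], [7.0, 6.0], [6.5, 7.2], [7.1, 6.4]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
rate = loo_misclassification(features, labels)
print("leave-one-out misclassification rate:", rate)
```

The real predictor set has many more variables, which calls for a general matrix inverse rather than the closed-form 2x2 inverse used here.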
Major results from LDA • Responsive vs Non-responsive • Misclassification rate = 0.007% • Induced vs repressed • Misclassification rate = 36.7% Swathi A. Kumar
Shared vs lineage-specific GATA1 OSs • ChIP-seq reads in G1E-ER4 + E2 for mouse GATA1 OSs • ChIP-seq reads in K562 for human GATA1 OSs • Map each to its respective genome (ELAND) • MACS peak calls • 15,000 in each • LiftOver each set of peaks to the other species • About 10,000 liftOver in each • Run intersection of the lifted-over peaks • 1,000 are shared in both species Yong Cheng, Kuan-Bei Chen
Shared GATA1 OSs show higher occupancy level ChIPseq signal for GATA1 in each OS Yong Cheng
Genes close to shared GATA1 OSs are enriched for well-known erythroid functions
Preserved GATA1 motifs are enriched in the shared GATA1 OSs Yong Cheng
Genome-wide turnover analysis in mouse GATA1 occupied intervals • B is under-estimated since we are using alignments instead of the mouse sequence itself • D : the compensatory motifs have a minimum distance of 10 bp Kuan-Bei Chen
Genome-wide turnover analysis in human Kuan-Bei Chen