
Genes and Regulatory Elements

Genes and Regulatory Elements. Zhiping Weng, UMass Medical School. ENCyclopedia Of DNA Elements (The ENCODE Project Consortium, Science 2004, Nature 2007).


Presentation Transcript


  1. Genes and Regulatory Elements. Zhiping Weng, UMass Medical School

  2. ENCyclopedia Of DNA Elements (The ENCODE Project Consortium, Science 2004, Nature 2007). [Figure: chromosome map (chromosomes 1 to 22, X, and Y) marking the ENCODE regions, labeled r111 to r334 and m001 to m014.] ENCODE goal: identify all functional elements in the human genome. Pilot phase: 1% of the genome is being annotated very extensively (30 Mb of sequence). Now genome-wide.

  3. The ENCODE Project Consortium (2004). The ENCODE (ENCyclopedia Of DNA Elements) Project. Science, Vol 306, 636-640.

  4. Gene RNA-seq

  5. Epigenomics

  6. Regulatory Elements

  7. The human genome: 45% repetitive DNA; 2% genes (25,000); 53% unique and segmentally duplicated DNA. Where are the gene regulatory elements? (G. Crawford)

  8. DNase hypersensitive (HS) sites identify active gene regulatory elements. DNase I HS sites: • Regions hypersensitive to DNase • Promoters • Enhancers • Silencers • Insulators • Locus control regions • Meiotic recombination hotspots. HS sites identify “open” regions of chromatin. Crawford et al., Nature Methods 2006

  9. DNase-chip to identify DNase HS sites. [Figure label: "or sequence directly."] Crawford et al., Nature Methods 2006

  10. Arrays used for DNase-chip • NimbleGen arrays • 385,000 50-mer oligos • Oligos spaced every 38 bases (12-base overlap) • Non-repetitive unique regions • 1% of the genome (44 ENCODE regions). Crawford et al., Nature Methods 2006
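
The tiling figures on this slide can be checked with a few lines of arithmetic. This is a minimal sketch, not the array design procedure: the constants come from the slide, while `oligo_starts` and the approximate tiled-base count are illustrative assumptions (the count ignores gaps left by repetitive regions).

```python
# Values taken from the slide; everything derived below is simple arithmetic.
OLIGO_LENGTH = 50     # bases per oligo (50-mer)
STEP = 38             # distance between consecutive oligo starts, in bases
N_OLIGOS = 385_000    # oligos on one array

# Neighbouring oligos share the bases that the step does not skip.
overlap = OLIGO_LENGTH - STEP

# Assumption: ignoring gaps at repeats, each oligo contributes STEP new bases.
approx_tiled_bases = N_OLIGOS * STEP


def oligo_starts(region_start, region_end, length=OLIGO_LENGTH, step=STEP):
    """Hypothetical helper: start coordinates of oligos that fit entirely
    inside the half-open interval [region_start, region_end)."""
    return list(range(region_start, region_end - length + 1, step))


print(overlap)               # overlap in bases between neighbouring oligos
print(approx_tiled_bases)    # rough number of bases tiled by one array
print(oligo_starts(0, 200))  # tiling of a toy 200-base region
```

A 50-base oligo placed every 38 bases leaves 12 bases shared with its neighbour, which matches the overlap stated on the slide.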

  11. DNase-chip Quality Assessment. Xi H., Shulha H.P., Lin J.M., Vales T.R., Fu Y., Bodine D.M., McKay R.D.J, Chenoweth J.G., Tesar P.J., Furey T.S., Ren B., Weng Z.+, Crawford G.E.+ (2007) Identification and characterization of cell type-specific and ubiquitous chromatin regulatory structures in the human genome. PLoS Genetics, 8, 8-20. (+Co-corresponding authors)

  12. Unique, common, and ubiquitous DNase HS sites. Cell types: GM, CD4, HeLa, H9, K562, IMR90. Ubiquitous HS sites: 20%. Cell-type specific and common HS sites: 80%. Collectively, the DHSs cover 8.3% of the ENCODE regions.

  13. Have we reached saturation in identifying most DNase HS sites?

  14. CpG content of DNase HS sites

  15. Ubiquitous DNase HS sites are enriched for promoters (TSS). What about ubiquitous distal DNase HS sites? Figure: locations of cell-type specific, common, and ubiquitous DNase HS sites with respect to the Transcription Start Site (TSS)

  16. Most distal (non-TSS) ubiquitous DNase HS sites are insulators bound by CTCF (ChIP)

  17. Chromatin immunoprecipitation (ChIP)-chip: antibody against CTCF, followed by a tiling array. Kim T.H. et al. (2005) Direct Isolation and Identification of Promoters in the Human Genome. Genome Research. Direct sequencing → ChIP-seq

  18. The H19/IGF2 Locus is well insulated

  19. DNase HS sites identify insulator in the Hox locus

  20. Cell culture insulator assays demonstrate that DNase I HS sites (that overlap CTCF) display enhancer-blocking activity.

  21. CTCF motif sites are conserved

  22. CTCF sites make up a greater % of ubiquitous distal DNase HS sites than enhancers

  23. Ubiquitous DNase HS sites are enriched for promoters (TSS). Figure: locations of cell-type specific, common, and ubiquitous DNase HS sites with respect to the Transcription Start Site (TSS)

  24. Ubiquitous proximal DNase HS sites

  25. Locations of cell-type specific, common, and ubiquitous DNase HS sites with respect to the Transcription Start Site (TSS)

  26. Antibody against histone modification • Tiling array • Sequencing

  27. Enrichment between tissue-specific H3K4me2 and DNase HS sites

  28. Cell type-specific DNase HS sites correlate with cell type-specific histone modifications. The same holds for H3K4me1, H3K4me3, H3ac and H4ac, for which we have experimental data.

  29. Cell type-specific DNase HS sites correlate with cell type-specific enhancers

  30. Cell type-specific DNase HS sites correlate with cell type-specific gene expression

  31. Transcriptional Motifs. Gene transcription is controlled by molecules (transcription factors, or TFs) binding to short DNA sequences (cis-elements, TF motifs) in promoters and distal elements.

  32. Finding enriched motifs in tissue-specific DNase HS sites. Screen the DHS sequences (DHS #1 to DHS #5) against a motif library, e.g., JASPAR or TRANSFAC (STAT, Myc/Max, YY1, etc.), using the Clover algorithm.

  33. JASPAR: a database of transcription factor motifs

  34. Clover: Cis-eLement OVERrepresentation. Myc/Max motif against the DHS sequences: raw score 17.3

  35. The Clover Algorithm: raw score. Frith MC, Fu Y, Yu L, Chen J-F, Hansen U, Weng Z (2004). Detection of Functional DNA Motifs Via Statistical Overrepresentation. Nucleic Acids Res. 32:1372-1381. Symbols: L_k: nucleotide at position k; W: motif width; S: a promoter sequence; M_s: number of motif locations in a sequence; A: all possibilities of choosing a subset of sequences; N: the total number of promoter sequences
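
The formula image for this slide did not survive in the transcript, so the following is only a sketch of one likelihood-ratio reading that is consistent with the symbol list (L_k, W, S, M_s, A, N). It should not be taken as the published definition. The toy matrix, background, sequences, and all function names are assumptions made for illustration, and only one strand is scanned.

```python
import math
from itertools import combinations

# Toy inputs (assumptions): a width-2 motif, uniform background, two sequences.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
SEQS = ["ATAT", "CCCC"]


def likelihood_ratio(window, pwm, background):
    """Product over k = 1..W of P(L_k | motif) / P(L_k | background)."""
    ratio = 1.0
    for k, base in enumerate(window):
        ratio *= pwm[k][base] / background[base]
    return ratio


def sequence_score(seq, pwm, background):
    """Average likelihood ratio over the M_s motif locations in one sequence."""
    w = len(pwm)
    m_s = len(seq) - w + 1
    total = sum(likelihood_ratio(seq[i:i + w], pwm, background)
                for i in range(m_s))
    return total / m_s


def raw_score(seqs, pwm, background):
    """Log of the average, over all 2^N subsets of sequences, of the product
    of per-sequence scores (the empty subset contributes 1)."""
    scores = [sequence_score(s, pwm, background) for s in seqs]
    n = len(scores)
    total = 0.0
    for size in range(n + 1):
        for subset in combinations(scores, size):
            total += math.prod(subset)
    return math.log(total / 2 ** n)


print(raw_score(SEQS, PWM, BACKGROUND))
```

Enumerating subsets is exponential in N and is shown only to mirror the symbol A. Under this reading the same value equals the log of the product of (1 + score) over sequences divided by 2^N, which is what a practical implementation would compute.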

  36. Clover: Cis-eLement OVERrepresentation. Myc/Max motif: raw score 17.3 for the DHS sequences; raw scores 9.1, 18, 4.2, and 6.6 for control DNA sequences. P-value = 1/4
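
The numbers on this slide are consistent with an empirical P-value: the fraction of control sequence sets whose raw score is at least as high as the score of the DHS sequences. The sketch below assumes that 17.3 belongs to the DHS sequences and that the other four scores belong to control sets, which is inferred from the slide layout; the function name is hypothetical.

```python
def empirical_p_value(observed, control_scores):
    """Fraction of control raw scores that are >= the observed raw score."""
    at_least_as_high = sum(1 for score in control_scores if score >= observed)
    return at_least_as_high / len(control_scores)


dhs_raw_score = 17.3                   # Myc/Max on the DHS sequences
control_raw_scores = [9.1, 18, 4.2, 6.6]  # Myc/Max on control sequence sets

p = empirical_p_value(dhs_raw_score, control_raw_scores)
print(p)  # 0.25: only the control score 18 reaches 17.3
```

With four control sets the smallest nonzero P-value is 1/4, so many more control sets are needed before a motif can be called significantly overrepresented.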

  37. Motifs enriched in cell-type specific DNase HS sites

  38. Motifs enriched in cell-type specific DNase HS sites

  39. Genome-wide DNase-chip and DNase-sequencing data • CD4 cells • 23 k proximal DNaseI HS sites • 72 k distal DNaseI HS sites

  40. Enriched transcription factor binding motifs in distal DNaseI HS sites • Hematopoietic system: • TAL1 • AML • PU.1 • C/EBPα • Immune system: • STAT1, STAT3, STAT5 • IRF1, IRF3 and IRF5

  41. Identify motif clusters (modules) Distal DHS sequences acgtcggctgacaccaggtctgcttgattcgatgagattgaattcgtaggagctggattagag ggcttggggcttgaggcttgacaccatatcgtagcgctgagttgctgagtttcgtatggcgct cgatgcttattagcggctattataggctagctaggcaatacacatcgctgatatagcggctta tgagatagcgtgctagctatatggattggaatattcggcgctgaaaggtcttagctagtcgta aatatatgcgcgtatgcgtatggcgggtatatgggggcttggtcttttttttcgcttaggtcg Find motif clusters in the human genome Enriched motifs

  42. Finding motif clusters with a hidden Markov model. Cluster-Buster: MC Frith, MC Li, Z Weng (2003). Cluster-Buster: Finding dense clusters of motifs in DNA sequences. Nucleic Acids Research, 31(13):3666-8. http://zlab.bu.edu/cluster-buster/ [Figure: motif score versus location in DNA; red = motif type 1 (e.g. TAL1), blue = motif type 2 (e.g. ETS); values 0.8, 0.1, 0.1 shown.]

  43. Overlap between predicted motif clusters and distal DNase HS sites. $$\text{Enrichment of the overlap} = \frac{\text{Overlap} \times \text{Sequence space}}{\text{DHS} \times \text{Motif clusters}}$$ [Figure: predicted motif clusters above a score cutoff compared with DNase HS sites.]
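
The enrichment ratio on this slide is the observed overlap divided by the overlap expected if DHSs and motif clusters were placed independently in the sequence space. A minimal sketch follows, assuming all four quantities are measured in base pairs (the slide does not state the unit); the function name and the example numbers are hypothetical.

```python
def overlap_enrichment(overlap_bp, dhs_bp, cluster_bp, sequence_space_bp):
    """Observed overlap divided by the overlap expected by chance.

    Expected overlap under independence = DHS * clusters / sequence space,
    so the ratio equals overlap * sequence space / (DHS * clusters).
    """
    expected_bp = dhs_bp * cluster_bp / sequence_space_bp
    return overlap_bp / expected_bp


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
enrichment = overlap_enrichment(
    overlap_bp=2_000,
    dhs_bp=10_000,
    cluster_bp=20_000,
    sequence_space_bp=1_000_000,
)
print(enrichment)  # 10.0: ten times more overlap than expected by chance
```

A value near 1 means the predicted clusters overlap DHSs no more often than chance; raising the cluster score cutoff would be expected to change both the numerator and the denominator.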

  44. Motif clusters can predict distal DNase HS sites genome-wide

  45. Summary • DNase HS sites identified from 6 cell types: cell-type specific, common, and ubiquitous (found in all cell types studied) • Ubiquitous DNase HS sites are likely to function as promoters (TSS) or insulators (CTCF) (no enhancers?) • Ubiquitous sites are indicative of housekeeping chromatin structure • Cell-type specific DNase HS sites correlate with histone modifications, gene expression, and enhancer elements in a cell type-specific manner, and contain cell type-specific motifs • Motif clusters can predict DNase HS sites genome-wide

  46. Motif Finding. Many slides by Bill Noble @ UW

  47. Outline • What is a sequence motif? • Weight matrix representation • Motif search • Motif discovery • Expectation-maximization • Gibbs sampling • Patterns-with-mismatches representation
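
Two items in this outline, the weight matrix representation and motif search, can be illustrated together. The sketch below builds a log-odds weight matrix from aligned binding sites and slides it along a sequence. The example sites, the pseudocount of 1, and the uniform background are assumptions chosen for illustration; only the forward strand and uppercase A/C/G/T are handled.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds weight matrix (log2 of frequency / background) per column."""
    width = len(sites[0])
    pwm = []
    for k in range(width):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[k]] += 1
        total = sum(counts.values())
        pwm.append({base: math.log2((counts[base] / total) / background)
                    for base in ALPHABET})
    return pwm


def scan(seq, pwm):
    """Score every window of the sequence; returns (position, score) pairs."""
    width = len(pwm)
    return [(i, sum(pwm[k][seq[i + k]] for k in range(width)))
            for i in range(len(seq) - width + 1)]


def best_hit(seq, pwm):
    """Highest-scoring window as a (position, score) pair."""
    return max(scan(seq, pwm), key=lambda hit: hit[1])


# Hypothetical aligned sites and a toy sequence containing the consensus.
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
pwm = build_pwm(sites)
position, score = best_hit("GGCTATAATGCC", pwm)
print(position, round(score, 2))
```

In practice a score threshold, rather than the single best hit, decides which windows count as motif occurrences, and both strands are usually scanned.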

  48. What is a “Motif”? • Generally, a recurring pattern, e.g. • Sequence motif • Structure motif • Network motif • More specifically, a set of similar substrings, within a family of diverged sequences. • Protein sequence motifs • DNA sequence motifs

  49. Example motif

  50. Motif in Logos Format
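
In a sequence logo, the height of each column is its information content in bits, and each letter's height is its frequency times that value. A minimal sketch for DNA follows, omitting the small-sample correction that logo software often applies; the function names and example column are illustrative.

```python
import math


def column_information(freqs):
    """Information content in bits of one DNA column: 2 - Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return 2.0 - entropy  # log2(4) = 2 bits is the maximum for DNA


def letter_heights(freqs):
    """Height of each letter in the logo column: frequency * information."""
    info = column_information(freqs)
    return {base: p * info for base, p in freqs.items()}


conserved = {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

print(column_information(conserved))  # 2.0 bits: fully conserved position
print(column_information(uniform))    # 0.0 bits: no preference
```

Fully conserved positions therefore appear as tall single letters, while uninformative positions nearly vanish from the logo.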
