Monica C. Sleumer

Presentation Transcript


  1. Monica C. Sleumer

  2. Human Genome • 3,101,804,739 base pairs • 22 chromosomes plus X and Y • 21,224 protein-coding genes • 15,952 ncRNA genes • 3–8% of bases are under selection • From comparative genomic studies • Question: What is the genome doing?
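The 3–8% estimate can be turned into an approximate base count using the genome size quoted on the same slide. The following is only a back-of-the-envelope sketch; the function name is illustrative, and integer division rounds the result down.

```python
# Genome size in base pairs, as quoted on the slide.
GENOME_SIZE_BP = 3_101_804_739

def bases_under_selection(genome_size_bp, low_pct=3, high_pct=8):
    """Return (low, high) base counts for a percentage range of the genome."""
    low = genome_size_bp * low_pct // 100
    high = genome_size_bp * high_pct // 100
    return low, high

low, high = bases_under_selection(GENOME_SIZE_BP)
print(f"{low:,} to {high:,} bp estimated to be under selection")
```

In other words, the comparative-genomics estimate corresponds to somewhere between roughly 93 million and 248 million bases.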

  3. Objectives • Find all functional elements • Bound by specific proteins • Transcribed • Histone modifications • DNA methylation • Use this information to annotate functional regions • Genes (coding and non-coding) • Promoters • Enhancers • Specific transcription factor binding sites • Silencers • Insulators • Chromatin states • Cross-reference data from other studies • Comparative genomics • 1000 Genomes Project • Genome-wide association studies (GWAS)

  4. ENCODE projects • ENCODE pilot project: 1% of the genome, 2003-2007 • modENCODE: Drosophila and C. elegans • ENCODE main project, 2007-2012 • 1,649 dataset-generating experiments • 147 cell types • 235 antibodies and assay protocols • 450 authors • 32 institutes • 31 publications on 2012-09-06 • 6 in Nature • 18 in Genome Research • 6 in Genome Biology • 1 in BMC Genetics www.nature.com/encode/category/research-papers

  5. Materials • 147 types of human cell lines, 3 priority levels • Tier 1 cell lines: top priority for all experiments • Tier 2 cell lines to be done after Tier 1 (next slide) • Tier 3: any other cell lines

  6. Tier 2 Cell Lines http://encodeproject.org/ENCODE/cellTypes.html

  7. Methods

  8. Results: RNA Sequencing • 62% of the genome is transcribed into sequences >200 bp long • 5.5% of this is exonic • 31% is intergenic (no annotated gene) • Remainder: intronic • CAGE-seq: 62,403 TSSs • 44% within 100 bp of the 5’ end of a GENCODE gene • Others: in exons and 3’ UTRs, significance unknown • Lots of short ncRNAs: tRNA, miRNA, snRNA etc. • Further description: Wu Dingming, 9:30
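The "remaining: intronic" share is not stated on the slide but follows by subtraction. The sketch below works it out, assuming (as the slide wording suggests) that the 5.5% and 31% figures are fractions of the transcribed 62%, not of the whole genome. The function and key names are illustrative.

```python
# Figures from the slide; the interpretation that the exon and intergenic
# shares are fractions of the transcribed portion is an assumption.
TRANSCRIBED_FRACTION = 0.62
EXON_SHARE = 0.055
INTERGENIC_SHARE = 0.31

def transcribed_breakdown(transcribed, exon_share, intergenic_share):
    """Split the transcribed fraction into exon, intergenic and intronic parts."""
    intronic_share = 1.0 - exon_share - intergenic_share
    shares = {
        "exon": exon_share,
        "intergenic": intergenic_share,
        "intronic": intronic_share,
    }
    # Each value is (share of transcribed sequence, share of whole genome).
    return {name: (s, s * transcribed) for name, s in shares.items()}

for name, (of_transcribed, of_genome) in transcribed_breakdown(
    TRANSCRIBED_FRACTION, EXON_SHARE, INTERGENIC_SHARE
).items():
    print(f"{name}: {of_transcribed:.1%} of transcribed, {of_genome:.1%} of genome")
```

Under this reading, a little under two thirds of the transcribed sequence is intronic.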

  9. Results: Transcribed and protein-coding regions • GENCODE reference gene set • 20,687 protein-coding genes • 6.3 alternatively spliced transcripts on average • 3.9 protein isoforms on average • Protein-coding exons: 1.22% of the genome • Still more to come: unidentified peptides in mass-spec • 18,441 ncRNA genes • 8,801 short ncRNA • 9,640 long ncRNA • 11,224 pseudogenes • 863 transcribed

  10. ChIP-Seq www.illumina.com/technology/chip_seq_assay.ilmn

  11. ChIP-Seq: Histone modifications

  12. Results: ChIP-Seq • 636,336 binding regions • 8.1% of the genome • Sequence-specific TF ChIP-seq: • 86% of the DNA segments occupied by sequence-specific transcription factors contained a strong DNA-binding motif • 55% of cases contained the expected motif • Further description: Qin Zhiyi & Ma Xiaopeng, 13:30
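As a rough sanity check, the region count and genome coverage above imply an average binding-region length. This sketch combines them with the genome size from slide 2; it assumes the 8.1% is non-overlapping coverage of that genome size, which the slides do not state, so the result is only indicative.

```python
GENOME_SIZE_BP = 3_101_804_739   # from slide 2
BINDING_REGIONS = 636_336        # from this slide
COVERED_FRACTION = 0.081         # from this slide

def mean_region_length(genome_size_bp, n_regions, covered_fraction):
    """Average region length in bp, assuming regions do not overlap."""
    covered_bp = genome_size_bp * covered_fraction
    return covered_bp / n_regions

length = mean_region_length(GENOME_SIZE_BP, BINDING_REGIONS, COVERED_FRACTION)
print(f"Mean binding region length: about {length:.0f} bp")
```

The estimate comes out at just under 400 bp per region.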

  13. DNase I hypersensitivity • 2,890,000 unique hypersensitive sites (DHSs) • 4,800,000 sites across 25 cell types • Tier 1 and tier 2 cell types: 205,109 DHSs per cell type • 98.5% of ChIP-seq TFBS within DHSs • Further description: Guo Weilong 12:30, He Chao 14:30 https://www.nationaldiagnostics.com/electrophoresis/article/dnase-i-footprinting

  14. FAIRE-seq • Like the opposite of ChIP-seq • Cross-link the nucleosomes to the DNA • But not the sequence-specific TFs • Shear the DNA into small pieces • Remove the protein-bound DNA • Sequence the non-bound DNA Gaulton KJ et al, Nature Genetics 42, 255–259 (2010) doi:10.1038/ng.530

  15. DNA methylation • CpG methylation: regulates gene expression • In promoters: gene repression • In genes: gene transcription • 1,200,000 methylated CpGs in 82 cell lines and tissues • 96% differentially methylated, especially those in genes • Unmethylated genic CpG islands associated with P300 binding, an enhancer-related histone acetyltransferase • Allele-specific methylation: genomic imprinting • Aberrant methylation in cancer cell lines • Reproducible methylation outside CpG dinucleotides http://www.diagenode.com/en/applications/bisulfite-conversion.php
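The slide links to bisulfite conversion, the chemistry behind most methylation sequencing: unmethylated cytosines are converted and read as T, while methylated cytosines are protected and still read as C. The toy sketch below simulates this on a single strand and then recovers the methylated positions. It assumes complete conversion and no sequencing errors; the function names are illustrative and are not taken from any ENCODE pipeline.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Simulate bisulfite treatment: unmethylated C is read as T, methylated C stays C."""
    converted = []
    for i, base in enumerate(sequence):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)

def call_methylation(reference, converted_read):
    """Return positions where a reference C is still read as C (i.e. methylated)."""
    return [
        i
        for i, (ref, obs) in enumerate(zip(reference, converted_read))
        if ref == "C" and obs == "C"
    ]

reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions={1})
print(read)
print(call_methylation(reference, read))
```

Comparing the converted read with the reference is what lets methylated and unmethylated cytosines be told apart.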

  16. Chromosome conformation capture Montavon and Duboule, Trends in Cell Biology (2012) 22:7, 347–354

  17. Results: Chromosome interactions • Chromosome conformation capture (3C): • 5C: 3C-carbon copy • ChIA-PET • Identified 127,417 promoter-centred chromatin interactions using ChIA-PET • 98% intra-chromosomal • 2,324 promoters involved in ‘single-gene’ enhancer–promoter interactions • 19,813 promoters involved in ‘multi-gene’ interaction complexes spanning up to several megabases • 50–60% of long-range interactions occurred in only one of the four cell lines • Further discussion: Li Yanjian, 10:40

  18. Primary Findings • 80.4% of the human genome is doing at least one of the following: • Bound by a transcription factor • Transcribed • Modified histone • 99% is within 1.7 kb of at least one of the biochemical events • 95% within 8 kb of a DNA–protein interaction or DNase I footprint • 7 chromatin states: • 399,124 enhancer-like regions • 70,292 promoter-like regions • Correlation between transcription, chromatin marks, and TF binding • Functional regions contain lots of SNPs • Disease-associated SNPs in non-coding regions tend to be in functional elements