Monica C. Sleumer
Human Genome • 3,101,804,739 base pairs • 22 chromosomes plus X and Y • 21,224 protein-coding genes • 15,952 ncRNA genes • 3–8% of bases are under selection • From comparative genomic studies • Question: What is the genome doing?
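To put the 3–8% figure in absolute terms, the arithmetic can be sketched as below. The function name is illustrative only, and the result is no more precise than the 3–8% estimate itself (roughly 93 to 248 million base pairs).

```python
# Genome size as stated on the slide.
GENOME_BP = 3_101_804_739


def bases_under_selection(fraction: float, genome_bp: int = GENOME_BP) -> int:
    """Approximate number of bases corresponding to a selected fraction."""
    return round(genome_bp * fraction)


low = bases_under_selection(0.03)   # lower bound of the 3-8% range
high = bases_under_selection(0.08)  # upper bound of the 3-8% range
print(f"{low:,} to {high:,} bp under selection")
```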
Objectives • Find all functional elements • Bound by specific proteins • Transcribed • Histone modifications • DNA methylation • Use this information to annotate functional regions • Genes (coding and non-coding) • Promoters • Enhancers • Specific transcription factor binding sites • Silencers • Insulators • Chromatin states • Cross-reference data from other studies • Comparative genomics • 1000 Genomes Project • Genome-wide association studies (GWAS)
ENCODE projects • ENCODE pilot project: 1% of the genome, 2003–2007 • modENCODE: Drosophila and C. elegans • ENCODE main project, 2007–2012 • 1,649 dataset-generating experiments • 147 cell types • 235 antibodies and assay protocols • 450 authors • 32 institutes • 31 publications on 2012-09-06 • 6 in Nature • 18 in Genome Research • 6 in Genome Biology • 1 in BMC Genetics www.nature.com/encode/category/research-papers
Materials • 147 types of human cell lines, 3 priority levels • Tier 1 cell lines: top priority for all experiments • Tier 2 cell lines to be done after Tier 1 (next slide) • Tier 3: any other cell lines
Tier 2 Cell Lines http://encodeproject.org/ENCODE/cellTypes.html
Results: RNA Sequencing • 62% of the genome is transcribed into sequences >200 bp long • 5.5% of this is exon • 31% is intergenic – no annotated gene • Remaining: intronic • CAGE-seq: 62,403 TSS • 44% within 100 bp of the 5’ end of a GENCODE gene • Others: exons and 3’ UTRs, significance unknown • Lots of short ncRNAs: tRNA, miRNA, snRNA etc. • Further description: Wu Dingming, 9:30
Results: Transcribed and protein-coding regions • GENCODE reference gene set • 20,687 protein-coding genes • 6.3 alternatively spliced transcripts on average • 3.9 protein isoforms on average • Protein-coding exons: 1.22% of the genome • Still more to come: unidentified peptides in mass-spec • 18,441 ncRNA genes • 8,801 short ncRNA • 9,640 long ncRNA • 11,224 pseudogenes • 863 transcribed
ChIP-Seq www.illumina.com/technology/chip_seq_assay.ilmn
Results: ChIP-Seq • 636,336 binding regions • 8.1% of the genome • Sequence-specific TF ChIP-seq: • 86% of the DNA segments occupied by sequence-specific transcription factors contained a strong DNA-binding motif • 55% of cases contained the expected motif • Further description: Qin Zhiyi & Ma Xiaopeng, 13:30
DNase I hypersensitivity • 2,890,000 unique hypersensitive sites (DHSs) • 4,800,000 sites across 25 cell types • Tier 1 and tier 2 cell types: 205,109 DHSs per cell type • 98.5% of ChIP-seq TFBS within DHSs • Further description: Guo Weilong 12:30, He Chao 14:30 https://www.nationaldiagnostics.com/electrophoresis/article/dnase-i-footprinting
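A statistic such as "98.5% of ChIP-seq TFBS within DHSs" comes from an interval-containment count. The sketch below shows the idea on toy coordinates. It assumes a site counts as "within" only if it lies entirely inside a DHS, which may differ from the criterion ENCODE used, and it uses a simple quadratic scan rather than an indexed search. All names and data are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end), half-open coordinates; an assumed representation.
Interval = Tuple[str, int, int]


def fraction_within(sites: List[Interval], regions: List[Interval]) -> float:
    """Fraction of sites lying entirely inside at least one region."""
    if not sites:
        return 0.0
    inside = 0
    for chrom, start, end in sites:
        # A site is counted once, even if several regions contain it.
        if any(c == chrom and s <= start and end <= e for c, s, e in regions):
            inside += 1
    return inside / len(sites)


# Toy data for illustration, not ENCODE results.
dhs = [("chr1", 100, 500), ("chr1", 900, 1200), ("chr2", 50, 300)]
tfbs = [("chr1", 150, 170), ("chr1", 950, 980), ("chr2", 400, 420), ("chr2", 60, 80)]
print(fraction_within(tfbs, dhs))  # 0.75: three of the four toy sites fall in a DHS
```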
FAIRE-seq • Like the opposite of ChIP-seq • Cross-link the nucleosomes to the DNA • But not the sequence-specific TFs • Shear the DNA into small pieces • Remove the protein-bound DNA • Sequence the non-bound DNA Gaulton KJ et al, Nature Genetics 42, 255–259 (2010) doi:10.1038/ng.530
DNA methylation • CpG methylation: regulates gene expression • In promoters: gene repression • In genes: gene transcription • 1,200,000 methylated CpGs in 82 cell lines and tissues • 96% differentially methylated, especially those in genes • Unmethylated genic CpG islands associated with binding of P300, an enhancer-related histone acetyltransferase • Allele-specific methylation: genomic imprinting • Aberrant methylation in cancer cell lines • Reproducible methylation outside CpG dinucleotides http://www.diagenode.com/en/applications/bisulfite-conversion.php
Chromosome conformation capture Montavon and Duboule, Trends in Cell Biology (2012) 22:7, 347–354
Results: Chromosome interactions • Chromosome conformation capture (3C): • 5C: 3C-carbon copy • ChIA-PET • Identified 127,417 promoter-centred chromatin interactions using ChIA-PET • 98% intra-chromosomal • 2,324 promoters were involved in ‘single-gene’ enhancer–promoter interactions • 19,813 promoters were involved in ‘multi-gene’ interaction complexes spanning up to several megabases • 50–60% of long-range interactions occurred in only one of the four cell lines • Further discussion: Li Yanjian, 10:40
Primary Findings • 80.4% of the human genome is doing at least one of the following: • Bound by a transcription factor • Transcribed • Modified histone • 99% is within 1.7 kb of at least one of the biochemical events • 95% within 8 kb of a DNA–protein interaction or DNase I footprint • 7 chromatin states: • 399,124 enhancer-like regions • 70,292 promoter-like regions • Correlation between transcription, chromatin marks, and TF binding • Functional regions contain lots of SNPs • Disease-associated SNPs in non-coding regions tend to be in functional elements
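A headline figure such as "80.4% of the genome is doing at least one of the following" is a union-coverage calculation: overlapping annotations must be merged so that no base is counted twice. A minimal sketch on a toy 1,000 bp sequence follows. The interval values and names are made up for illustration and this is not the ENCODE pipeline.

```python
from typing import Iterable, Tuple


def merged_length(intervals: Iterable[Tuple[int, int]]) -> int:
    """Total length of the union of half-open (start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the running block, start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the running block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Toy annotations on a 1,000 bp sequence (hypothetical values).
SEQUENCE_BP = 1000
tf_bound = [(0, 200), (150, 300)]
transcribed = [(250, 600)]
histone_marked = [(700, 800)]

covered = merged_length(tf_bound + transcribed + histone_marked)
print(covered / SEQUENCE_BP)  # 0.7: 700 of the 1,000 toy bases carry at least one annotation
```

Summing the four interval lengths naively would give 800 bp, so merging before counting is what keeps the percentage from being inflated by overlaps.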