University of Brawijaya, 4th December 2013. Austen Ganley, INMS. Understanding the Human Genome: Lessons from the ENCODE project.
Glossary • Non-coding RNA • Sequencing • Microarray • Transcription start site • Active/open • Inactive/repression • Genome • Genes • DNA/RNA • Protein • Cell • Transcription • Chromatin • Histones • Nucleosomes
[Figure: structure of a gene, labelled with promoter, transcriptional start site, exon, intron, and transcriptional terminator]
Introduction • ENCODE was a consortium: many individual scientists worked together • The aim was to understand 1% of the human genome (pilot phase, 2007), and then 100% (2012) • They looked at: • Transcription • Chromatin/transcription factors • Replication • Evolution
Genes • Now estimated to be about 21,000 protein-coding genes (taking up about 3% of the whole genome) • In addition, there are about 9,000 small RNAs (including microRNAs), and about 10,000 long non-coding RNAs
Transcription • Transcription was measured by two different methods: • Whole genome microarrays • RNA-sequencing • They found at least 62% of the whole genome is transcribed (remember, genes only account for about 3% of the whole genome)
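To make the "62% transcribed versus 3% genes" comparison concrete, here is a minimal sketch of how a covered fraction can be computed from genomic intervals. The function name, the 100 bp toy genome, and all coordinates are made up for illustration; this is not the ENCODE analysis pipeline.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of a genome covered by the union of half-open (start, end) intervals."""
    covered = 0
    current_start, current_end = None, None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # The previous merged block is finished: add its length.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Toy 100 bp genome with hypothetical coordinates.
transcripts = [(0, 30), (20, 50), (60, 72)]  # overlapping transcripts are merged
exons = [(10, 13)]

transcribed = covered_fraction(transcripts, 100)
coding = covered_fraction(exons, 100)
print(f"transcribed: {transcribed:.0%}, coding: {coding:.0%}")
```

Overlapping transcripts are merged before counting, so a base transcribed by several transcripts is only counted once.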
Transcriptional start sites • The goal is to identify the transcription start sites • Not easy to do! • Use a technique called CAGE (Cap Analysis of Gene Expression)
CAGE • Makes use of the 5' cap on mRNA • First, mRNA is reverse-transcribed to form cDNA (an RNA-DNA hybrid) • Then, biotin is attached to the 5' cap, and the cDNA is fragmented • The biotin-labelled fragments are isolated (these represent the 5' end of the mRNA), and these fragments are sequenced
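Once the 5' fragments are sequenced and mapped, nearby tag start positions can be grouped to suggest candidate transcription start sites. The sketch below shows one simple way to do that. The function name, the `max_gap` and `min_tags` thresholds, and the positions are assumptions chosen for illustration, not the clustering used by ENCODE.

```python
def cluster_tag_starts(positions, max_gap=20, min_tags=2):
    """Group 5' tag positions into clusters; return (start, end, tag_count) tuples."""
    clusters = []
    current = []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the current cluster.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_tags:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Flush the last cluster.
    if len(current) >= min_tags:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


# Hypothetical mapped 5' ends of CAGE tags on one chromosome strand.
tag_starts = [100, 102, 105, 400, 1000, 1003, 1010, 1012]
tss_clusters = cluster_tag_starts(tag_starts)
print(tss_clusters)
```

The lone tag at position 400 is dropped because a single tag is weak evidence for a start site under the assumed `min_tags` threshold.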
About 60,000 transcription start sites found • Only half of these match known genes • What do the other ones do? They may explain the high level of transcription • The transcription start sites are often far upstream of the gene start, and can overlap other genes
Overlapping Genes • An overlapping gene, starting far upstream • The DONSON gene is a known gene • However, some transcripts start in the ATP5O gene, and include some ATP5O exons • Two genes in between are skipped [Figure: Transcriptional start sites from the DONSON gene]
Chromatin: histones and nucleosomes • Nucleosomes are formed from DNA that is packaged around histones • Histones are a set of proteins that usually associate as an octamer • Image sources: www.mun.ca/biochem/courses/3107/Topics/supercoiling.html; www.palaeos.com/Eukarya/Eukarya.Origins.5.html
DNase I hypersensitive sites (DHS) • DNase I preferentially digests nucleosome-depleted regions (DNase I hypersensitive sites) • These are associated with gene transcription • Chromatin is digested with DNase I, which only digests nucleosome-free regions • The remaining DNA is isolated, and put on a microarray or sequenced • This finds the open, active regions of the genome • Image credits: Hebbes Lab, University of Portsmouth, UK; Gilbert, Developmental Biology, Sinauer
DNase I hypersensitive sites • In total, there are about 3 million DNase I hypersensitive sites in the genome, covering about 15% of it (versus about 40,000 genes covering about 4%) • Transcriptional start sites are regions of DNase I hypersensitivity, as expected • However, most DNase I hypersensitive sites are not associated with a transcriptional start site
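The split between hypersensitive sites at transcription start sites and those elsewhere can be sketched as a simple interval lookup. The function name, the 1,000 bp `window`, and the coordinates are hypothetical choices for illustration only.

```python
from bisect import bisect_left


def split_dhs_by_tss(dhs_sites, tss_positions, window=1000):
    """Split DHS intervals into those within `window` bp of a TSS and the rest."""
    tss_sorted = sorted(tss_positions)
    proximal, distal = [], []
    for start, end in dhs_sites:
        # Index of the first TSS at or after (start - window).
        i = bisect_left(tss_sorted, start - window)
        # That TSS is close enough if it lies before (end + window).
        if i < len(tss_sorted) and tss_sorted[i] <= end + window:
            proximal.append((start, end))
        else:
            distal.append((start, end))
    return proximal, distal


# Hypothetical DHS intervals and TSS positions on one chromosome.
dhs = [(900, 1100), (5000, 5200), (20000, 20150)]
tss = [1000, 12000]
proximal, distal = split_dhs_by_tss(dhs, tss)
print(len(proximal), "TSS-proximal,", len(distal), "distal")
```

In this toy input, most sites end up in the distal group, which mirrors the observation that most hypersensitive sites are not at a transcription start site.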
[Figure: the genome, with tracks for genes, transcribed regions, transcription start sites, and DNase I hypersensitive regions]
Histone Modification Effects • Modifications occur on the histone tails • They alter the strength of DNA-histone binding, and influence the binding of other proteins to the DNA • Thus they can activate or silence gene expression
The “Histone Code” • The combination of histone modifications determines a gene’s transcriptional status (the histone code) • Some modifications are associated with active gene expression • H3K4me2 • H3K4me3 • H3ac • H4ac • Some with repression • H3K27me3 • H3K4me1 • Image source: www.nature.com/nrm/index.html
ChIP (Chromatin immunoprecipitation) • A method to find where your protein of interest binds • You cross-link the sample, and fragment the DNA into pieces • Immunoprecipitate using an antibody to your protein of interest • Reverse the cross-links, and isolate the DNA • To find where in the genome the protein was bound: • Hybridise the DNA to a microarray (ChIP-chip) OR sequence it (ChIP-seq) • Image source: www.rndsystems.com/product_detail_objectname_exactachip_assayprinciple.aspx
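For the ChIP-seq branch, the last step amounts to finding regions where immunoprecipitated reads pile up relative to a control sample. The following is a minimal sketch of that idea using fixed-width bins. The function name, bin size, fold threshold, pseudocount, and read positions are all assumptions for illustration; real peak callers use more careful statistics.

```python
from collections import Counter


def enriched_bins(chip_reads, control_reads, bin_size=200, min_fold=3.0, pseudocount=1.0):
    """Return (bin_start, bin_end, fold) for bins where ChIP reads exceed control."""
    chip = Counter(pos // bin_size for pos in chip_reads)
    control = Counter(pos // bin_size for pos in control_reads)
    # Scale the control so both samples have the same total read count.
    scale = len(chip_reads) / len(control_reads) if control_reads else 1.0
    hits = []
    for b in sorted(chip):
        # The pseudocount avoids division by zero in bins with no control reads.
        fold = (chip[b] + pseudocount) / (control[b] * scale + pseudocount)
        if fold >= min_fold:
            hits.append((b * bin_size, (b + 1) * bin_size, round(fold, 2)))
    return hits


# Hypothetical mapped read positions on one chromosome.
chip_reads = [1010, 1020, 1050, 1100, 1150, 1180, 1190, 3000, 5000]
control_reads = [100, 1100, 2000, 3050, 4000, 5020, 6000, 7000, 9000]
hits = enriched_bins(chip_reads, control_reads)
print(hits)
```

Only the bin where ChIP reads are concentrated passes the threshold; bins with one ChIP read and one control read have a fold of 1 and are ignored.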
Histone modification profiles • They found that histone modifications associated with active transcription were found around transcription start sites • They found that histone modifications associated with gene repression were depleted around transcription start sites • This is as expected • Around DNase I hypersensitive sites not near transcription start sites, they found almost the opposite pattern
[Figure, panel 1: Enrichment of active histone marks and depletion of inactive histone marks at a transcription start site] [Figure, panel 2: Enrichment of inactive histone marks but little enrichment of active histone marks at a DNase I hypersensitive site]
Histone modification profiles • They also found other patterns • Combining all the results (plus results for transcription factor binding), they say that the human genome is divided into seven different types of chromatin states • Which state it is depends on what combination of histone modifications/transcription factor binding there is
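The idea that a chromatin state depends on the combination of marks present can be illustrated with a toy lookup. The rules below are hypothetical and deliberately simplified: they are not the method ENCODE used to derive its seven states, and the state names and the mark-to-state mapping are assumptions made only for this example.

```python
from itertools import groupby


def toy_chromatin_state(marks):
    """Assign a state label from the set of signals seen in one genomic bin.

    Hypothetical rules, for illustration only.
    """
    marks = set(marks)
    if "H3K4me3" in marks:
        return "promoter-like"
    if "H3K4me1" in marks and "DHS" in marks:
        return "enhancer-like"
    if "H3K27me3" in marks:
        return "inactive"
    return "unassigned"


def segment_states(bin_marks):
    """Merge adjacent bins with the same state into (first_bin, end_bin, state)."""
    states = [toy_chromatin_state(m) for m in bin_marks]
    segments = []
    index = 0
    for state, run in groupby(states):
        length = len(list(run))
        segments.append((index, index + length, state))
        index += length
    return segments


# Hypothetical signals observed in six consecutive bins.
bins = [
    {"H3K4me3", "DHS"},
    {"H3K4me3"},
    set(),
    {"H3K4me1", "DHS"},
    {"H3K27me3"},
    {"H3K27me3"},
]
segments = segment_states(bins)
print(segments)
```

The point of the sketch is only that the label comes from which signals co-occur in a region, and that neighbouring regions with the same label form one stretch of the genome.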
The seven chromatin states [Figure legend: Enhancer (yellow), Gene body (green), Inactive region (grey), Promoter (red)]
ENCODE Grand Summary
Transcription start sites: • Twice as many transcription start sites as traditional “genes” • Transcripts span large regions, even between genes
Transcription: • A lot of non-coding transcription (~60% of the genome is transcribed), much more than is needed just to transcribe all the genes
DNase I hypersensitive sites: • Found at more than just transcription start sites • Two types: those found at TSS, and those found at other regions (distal DHS) • These two types have different chromatin profiles
Histone modifications: • Active marks correlate with TSS/DHS • Distal DHS have a different histone modification profile
Chromatin states: • The genome can be divided into seven different types • These are determined by the combination of histone modifications and transcription factor binding that occur
Overview: • The genome can be generalised into seven different states • The function of some of these states is known, e.g. promoter • The function of others is not known, but may explain the high level of transcription and open chromatin structure