Functional Non-Coding DNA, Part I: Non-coding genes and non-coding elements of coding genes. BNFO 602/691 Biological Sequence Analysis. Mark Reimers, VIPBG
What Does ‘Functional Non-Coding DNA’ Mean? • DNA whose sequence affects transcripts made from DNA in some way • Could affect transcription levels, splicing or sequestering of RNA • Three main ways to identify functional non-coding elements • Sequence characteristics – favored bases • Genomic conservation • Epigenetic marks and open chromatin • especially outside of genes
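Genomic conservation, the second approach listed above, can be illustrated with a very small sketch. The function below scores each column of a multiple alignment by the fraction of sequences sharing the most common base. The alignment is made up for illustration, and the scoring rule (gaps counted as mismatches) is an assumption chosen for simplicity, not a method from the lecture; real conservation scores such as those shown in genome browsers use phylogenetic models.

```python
from collections import Counter


def column_conservation(aligned_seqs):
    """Return, for each alignment column, the fraction of sequences
    that carry the most common base (gaps count as mismatches)."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*aligned_seqs):
        # Count bases only; a gap never contributes to the majority base.
        counts = Counter(base.upper() for base in column if base != "-")
        top = max(counts.values()) if counts else 0
        scores.append(top / len(column))
    return scores


# Toy alignment (made-up sequences, not real orthologs).
toy_alignment = [
    "ACGTACGT",
    "ACGTTCGT",
    "ACGAACGT",
    "ACG-ACGT",
]
scores = column_conservation(toy_alignment)
print(scores)
```

Columns with scores near 1.0 are candidates for constrained positions; in this toy case only the fourth and fifth columns vary.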
Types of Non-Coding Elements • Non-coding RNAs • miRNAs, lncRNAs, etc. • Non-coding gene elements • UTRs, splice sites and splice regulatory elements, poly-adenylation sites, RNA-binding sites • DNA elements outside genes (our main focus) • Promoters • Enhancers/Silencers • Insulators
Types of Non-Coding RNA • microRNAs • Silencing RNAs • Small nuclear/nucleolar RNAs • Piwi-Interacting RNAs • Long Non-Coding RNAs • Circular RNAs • Still other RNAs??? • Comprehensive database at www.ncrna.org
Micro-RNAs • Micro-RNAs are small non-coding RNA molecules, about 21–25 nucleotides in length • They are processed from much longer genes, or from introns within mRNA, by several molecular pathways • Micro-RNAs base-pair with complementary sequences within mRNA molecules, often in the 3’ or 5’ UTR • miRNA binding usually results in gene repression, either via translational stalling or by triggering mRNA degradation (Image by Charles Mallery, U of Miami)
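The base-pairing between a miRNA and its target can be sketched as a simple string search. The example assumes the commonly used "seed" convention, in which nucleotides 2 to 8 of the miRNA pair perfectly with the target; that convention, the function names, and both sequences are assumptions introduced here for illustration and are not taken from the slides. Real target prediction also weighs site context and conservation.

```python
def reverse_complement_rna(seq):
    """Reverse-complement an RNA sequence (A<->U, G<->C)."""
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def seed_match_sites(mirna, utr):
    """Return the target-site string and every 0-based position in `utr`
    that is perfectly complementary to miRNA nucleotides 2-8."""
    seed = mirna.upper()[1:8]            # nucleotides 2-8 (7 nt)
    site = reverse_complement_rna(seed)  # what the mRNA must contain
    utr = utr.upper().replace("T", "U")  # accept DNA or RNA input
    positions = []
    i = utr.find(site)
    while i != -1:
        positions.append(i)
        i = utr.find(site, i + 1)
    return site, positions


# Hypothetical miRNA and UTR, invented for this example.
toy_mirna = "UACGGAUCCGAUUAGCCAUGGA"
toy_utr = "AAGAUCCGUUUCCGGAUCCGUAA"
site, positions = seed_match_sites(toy_mirna, toy_utr)
print(site, positions)
```

Counting such sites along a transcript is one crude way to ask how strongly a given miRNA might regulate it.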
Micro-RNAs • The human genome encodes over 1500 miRNAs, which are believed to affect more than half of human genes • miRNAs are abundant in many cell types • Thousands of copies per cell of some miRNAs • Those within gene introns share regulation with their host genes • miRNAs are well-conserved across vertebrates • No orthologs between plant and animal miRNAs • miRBase is the comprehensive repository of micro-RNAs
Other Short RNAs: siRNA • Small interfering RNAs are double-stranded with an overhang • They are processed by some of the same machinery as miRNAs and have some of the same effects
Other Short RNAs: piRNA • Piwi-Interacting RNAs are longer (26–31 bases) single-stranded RNAs • PIWI (P-element Induced Wimpy Testis) protein • Over 50,000 sequences known in mouse • They are the largest class of ncRNA • They seem to play an ancient role in defense against retroviruses and transposons
Other Short RNAs: snRNAs & snoRNAs • Small nuclear RNAs (snRNAs) are typically ~150 bases long, and associate with proteins • Many conserved copies of each snRNA gene • U1-U6 snRNAs are key parts of the splicing machinery • Small nucleolar RNAs (snoRNAs) • Guide chemical modifications of other RNAs • Prader-Willi syndrome results from deletion of a region containing 29 copies of SNORD116 on chr 15q11 (Figure: U6 snRNA)
Long Non-Coding RNAs • Many long (>200 bp) stretches of genome are transcribed and have epigenetic marks like those of protein-coding genes • Most of these are spliced RNAs with two (or more) exons • GENCODE v15 has 13.5K lncRNAs • See also: Derrien et al, Genome Research 2012; Lee, Science 2012 (From Derrien et al, Genome Res 2012)
Many lncRNAs Induce Silencing • Coat nearby gene(s) and silence them • Xist binds to gene clusters first • Xist binds disparate parts of the chromosome • Many lncRNAs are antisense to genes • Some lncRNAs maintain pluripotency of stem cells (From Jeannie Lee lab (Harvard) website)
Long Non-Coding RNAs - 2 • Most lncRNAs are expressed in only a few tissues • Most human lncRNAs are specific to the primate lineage (From Derrien et al, Genome Res 2012)
Circular RNAs • Several thousand non-coding RNAs apparently form circular structures • Many form complexes with AGO and seem to absorb attached miRNAs, blocking processing • The circular RNA CDR1as (antisense to CDR1) has about 70 conserved binding sites for miR-7
Functional Pseudo-Genes • Pseudo-genes are copies of genes that are decaying and rarely, if ever, make proteins • Some pseudo-genes act to absorb negative regulators of the original gene, e.g. SRGAP2B
How to Identify Non-Coding RNAs? • Short (and long) RNA transcriptomes • Promoter chromatin marks for independent (non-embedded) miRNAs and lncRNAs
Non-Coding Elements of Genes • TSS • 5' UTRs • Introns • Splicing regulation sites • 3' UTRs • Termination/Poly-adenylation sites
Transcription Start Sites • Transcription of most genes may initiate at several distinct clusters of locations with distinct promoters for each TSS • Two major types of metazoan TSS: CG-rich broad TSS, and narrow (often tissue-specific) TSS
Transcription Start Sites • Transcription often starts at a CG within the promoter
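CG-richness around a TSS is usually quantified with two numbers: GC content and the ratio of observed to expected CpG dinucleotides. The sketch below computes both for a window of sequence, taking the expected CpG count to be (number of C × number of G) / window length. The function name and the toy sequence are assumptions for illustration; a commonly quoted rule of thumb treats windows with GC content above 50% and an observed/expected ratio above 0.6 as CpG-island-like, but those thresholds are not from the slides.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA window.

    Expected CpG count is taken as (#C * #G) / length, so the ratio is
    (#CpG * length) / (#C * #G).
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


# Toy 10-base window, invented for this example (real windows are >= 200 bp).
gc_fraction, obs_exp = cpg_stats("CGCGTACGAT")
print(round(gc_fraction, 2), round(obs_exp, 2))
```

Sliding this function along a promoter region would separate broad CG-rich TSS regions from narrow, CG-poor ones in a rough way.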
5’ Untranslated Regions • The first exon often contains dozens to thousands of bases before the start codon (median 150) • Sometimes contains regulatory sequences, e.g. binding sites for RNA-binding proteins, and translation initiators
Splice Regulatory Sites • Splicing is achieved through binding of spliceosome to recognition sequences on nascent RNA molecule
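The best-known recognition sequences are the dinucleotides at the intron boundaries: most introns begin with GT (GU in RNA) and end with AG. The slide does not name specific sequences, so the GT-AG rule is added here as background, and the helper below is a hypothetical illustration that checks only those two boundary dinucleotides; it ignores the branch point, the polypyrimidine tract, and the regulatory sites discussed next.

```python
def has_canonical_splice_sites(intron):
    """Return True if the intron starts with GT and ends with AG
    (the canonical 5' donor and 3' acceptor dinucleotides)."""
    intron = intron.upper().replace("U", "T")  # accept RNA or DNA input
    return (
        len(intron) >= 4
        and intron.startswith("GT")
        and intron.endswith("AG")
    )


# Toy intron sequences, invented for this example.
canonical = has_canonical_splice_sites("GTAAGTTTTCTCTTACAG")
non_canonical = has_canonical_splice_sites("ATAAGTTTTCTCTTACAC")
print(canonical, non_canonical)
```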
Splice Regulatory Sites • Tissue-specific splice regulatory sites are highly conserved (From Merkin et al, Science 2012)
Splicing Patterns Evolve in All Tissues Except Brain (From Merkin et al, Science 2012)
Non-Coding Elements in Coding Exons • Many regulatory sites occur within coding exons, especially toward the 5’ end • These constrain some codons as much as protein sequence does • Many human SNPs break transcription factor binding sites (TFBS) but have little effect on protein (as far as we know) (From Stergachis et al, Science 2013)
3’ Untranslated Regions • The longest exon is usually the 3’ UTR (>1000 nt) • Typically 1/3 – 1/2 of a gene is in 5’ & 3’ UTRs • The 3’ UTR has binding sites for miRNAs and RNA-binding proteins • AU-rich elements (AREs) regulate mRNA stability, most often by promoting decay • Proteins recognize complex secondary structure (Figure: GRIK4 3’ UTR secondary structure is conserved)
RNA Binding-Protein Sites • mRNAs are usually further processed (e.g. transported or sequestered) • RNA-binding proteins recognize specific motifs within the secondary structure of the 3’ or 5’ UTR • These sites are often highly conserved (From Ray et al, Nature 2013)
Poly-adenylation/Termination Sites • Transcripts can be terminated and poly-adenylated at sites with specific sequences • Most genes have alternate poly-adenylation sites • Median lengths of 3’ UTR are 250 & 1773 bp (mouse)
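The best-characterized of these specific sequences is the hexamer AAUAAA (AATAAA in DNA), with ATTAAA as a common variant; neither is named on the slide, so both are added here as background. The sketch below scans a 3’ UTR for either hexamer. The function name and the toy sequence are assumptions for illustration, and a real analysis would also consider downstream GU-rich elements and the distance to the cleavage site.

```python
import re

# Canonical poly-adenylation signal and its most common variant (DNA alphabet).
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_signals(utr):
    """Return (0-based position, hexamer) for every poly(A) signal in `utr`.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    utr = utr.upper().replace("U", "T")  # accept RNA or DNA input
    pattern = re.compile("(?=(" + "|".join(POLYA_SIGNALS) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(utr)]


# Toy 3' UTR with two candidate signals, invented for this example.
hits = find_polya_signals("GCCAATAAAGGCTTGCATTAAACCG")
print(hits)
```

A transcript with several hits, as in this toy case, has the raw material for alternate poly-adenylation and therefore for 3’ UTR isoforms of different lengths.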
Poly-adenylation/Termination Sites • Rapidly proliferating cells express gene isoforms with short 3’ UTRs • Neurons typically have longer 3’ UTRs (Figure: types of alternate poly-adenylation; Elkon et al, NRG 2013)