RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute
Outline • I. Non-coding RNA • The genome’s dark matter • Family classification • Genome annotation • II. ncRNA genes in the human genome • Rogue’s gallery • miRNAs • Regulatory elements
Protein/RNA genes • Protein-coding gene: DNA → RNA → protein • RNA gene: DNA → RNA (no protein product)
ncRNA genes • …. code for functional RNAs • Many cellular machines contain RNA • Ribosome: rRNA • Spliceosome: snRNAs (U1, U2, U4, U5, U6) • Telomerase: telomerase RNA • SRP: SRP RNA
Gene sweep • CSHL 2000-2003 • Rules • $1 in 2000, $5 in 2001 and $20 in 2002 • A gene is a set of connected transcripts. A transcript is a set of exons connected via transcription. At least one transcript must be expressed outside of the nucleus and one transcript must encode a protein. • One bet per person, per year • Results • 165 bets • Mean 61710 • Lowest 25947 • Highest 153478 • Answer: 21000 • Winner: Lee Rowen • http://www.ensembl.org/Genesweep/
ncRNA genes • Genomic dark matter • Ignored by gene prediction methods • Not in EnsEMBL • Computational complexity • ~10% of human gene count?
The RNA World • Origin of life / central dogma paradox • DNA needs proteins to replicate • Proteins are coded for by DNA • RNA can be both code and machinery • SELEX, aptamers • RNAs are remnants • Ancient • Essential
Biological sequence analysis • Protein: easy • RNA: hard
Gene finding • Rules • ATG (start codon) • TAA, TGA, TAG (stop codons) • GT…..AG (intron boundaries) • Compositional features • Exon lengths • Intron lengths • Codon bias • General genomic properties • Homology
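The start and stop codon rules on this slide can be sketched as a naive open-reading-frame scan. This is only an illustration under strong assumptions: forward strand only, no splicing (the GT…..AG rule is ignored), and no compositional scoring. The function name `find_orfs` and the toy sequence are hypothetical and not from the talk.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}

def find_orfs(seq, min_codons=2):
    """Return (start, end) spans (0-based, end exclusive) of ATG...stop ORFs.

    Scans the three forward reading frames only; min_codons counts the
    codons from ATG up to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the ATG currently open in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

# Toy example (made-up sequence): one ORF, ATG AAA TTT TAG, starting at index 2.
print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)]
```

An ncRNA gene has no start codon, stop codon or codon bias, so a scan of this kind has nothing to find, which is the point the following slides build on.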
Protein sequence analysis
Query:   1 MKFYTIKLPKFLGGIVRAMLGSFRKD  26
           M+  TIKLPKFL  IVR   G+ + D
Sbjct: 390 MRIMTIKLPKFLAKIVRMFKGNKKSD 467
Why are families useful?
S. cerevisiae      UCCUCGUGAGAGGG
P. canadensis      GUCUC.UGAGAGAU
P. strasburgensis  CUCUC.UGAGAGAG
K. thermotolerans  UUCUCGUGAGAGAA
SS                 <<<<<....>>>>>
• Alignments of related sequences • Phylogenetic trees • Homologue detection • Genome annotation • Secondary structure prediction
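The value of a structure-annotated alignment can be illustrated with a short check: read the base pairs from the SS line, confirm that every sequence can form them, and list the pairs where the bases change but pairing is preserved (covariation). This is a minimal sketch; the helper names are hypothetical, and treating G-U wobble pairs as valid is an assumption of the sketch, not something stated on the slide.

```python
# Watson-Crick pairs plus G-U wobble (assumed acceptable for this sketch).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

ALIGNMENT = {
    "S. cerevisiae":     "UCCUCGUGAGAGGG",
    "P. canadensis":     "GUCUC.UGAGAGAU",
    "P. strasburgensis": "CUCUC.UGAGAGAG",
    "K. thermotolerans": "UUCUCGUGAGAGAA",
}
SS = "<<<<<....>>>>>"

def parse_pairs(ss):
    """Turn a '<' / '>' structure line into sorted (i, j) column pairs."""
    stack, pairs = [], []
    for i, ch in enumerate(ss):
        if ch == "<":
            stack.append(i)
        elif ch == ">":
            if not stack:
                raise ValueError("unbalanced structure line")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure line")
    return sorted(pairs)

def is_supported(seq, pairs):
    """True if every annotated pair is a canonical base pair in seq."""
    return all((seq[i], seq[j]) in CANONICAL for i, j in pairs)

def covarying_pairs(alignment, pairs):
    """Pairs whose two columns show more than one distinct base-pair type."""
    return [(i, j) for i, j in pairs
            if len({(s[i], s[j]) for s in alignment.values()}) > 1]

PAIRS = parse_pairs(SS)
print(all(is_supported(s, PAIRS) for s in ALIGNMENT.values()))  # True
print(covarying_pairs(ALIGNMENT, PAIRS))  # [(0, 13), (1, 12)]
```

In this alignment the two outermost pairs change sequence while staying paired. That is the signal a sequence-only comparison misses and a structure-aware model can use.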
RNA models • Covariance models (profile SCFGs) • Analogous to profile HMMs • Statistical representation of the alignment with structure • Homologue detection • Multiple sequence alignment • (Sean Eddy)
Protein sequence analysis - HMMs
ERELKKQKKLSNR
ERELKK..KQSNR
ERELKRQRKQSNR
KAAAQRQKMIKNR
EREKKKRKQSNR
[Figure: profile HMM with begin (B), match (M), insert (I), delete (D) and end (E) states]
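RNA sequence analysis - SCFGs
G G A A G A U C C
< < < . . . > > >
[Figure: hairpin with base pairs G–C, G–C and A–U closing a three-base loop, modelled by three MP (match-pair) and three ML (match-left) states]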
RNA models - problems • Problems • Speed • Memory • Sensitivity • Speed • 30 billion bases in DBs • O(N³) wrt model length • small model: 300 b/s • 28S rRNA: 200 b/day
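To put the speed figures in perspective, here is a back-of-envelope calculation of how long one full database scan would take at the quoted rates. It assumes a single serial search over all 30 billion bases and a 365.25-day year. These are assumptions of the sketch, not figures from the talk.

```python
DB_BASES = 30_000_000_000      # "30 billion bases in DBs"
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25         # assumption used only for the conversion

# Small model: 300 bases per second.
small_model_days = DB_BASES / 300 / SECONDS_PER_DAY
# 28S rRNA model: 200 bases per day.
rrna_model_days = DB_BASES / 200

print(f"small model: {small_model_days:,.0f} days "
      f"(~{small_model_days / DAYS_PER_YEAR:.1f} years)")
print(f"28S rRNA model: {rrna_model_days:,.0f} days "
      f"(~{rrna_model_days / DAYS_PER_YEAR:,.0f} years)")
```

Under these assumptions even a small model needs roughly three CPU-years per scan, and the 28S rRNA model needs on the order of 400,000 CPU-years, which is why heuristics and family-specific tools matter.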
Rfam 5.0 • http://www.sanger.ac.uk/Software/Rfam/ • http://rfam.wustl.edu/ • 176 ncRNA families • Structure annotated alignments • Species distributions • Keyword searches • Sequence searches • >235000 regions in EMBL 76
ncRNA families • What we have: tRNA; 5S, 5.8S rRNAs; spliceosomal RNAs; SRP, RNaseP; telomerase, tmRNA, vault; E. coli screens; some snoRNAs; some miRNAs; some UTR elements; self-splicing introns; …… more • What we don’t: 18S, 23S rRNAs; other large things (Xist etc); lots of snoRNAs; lots of miRNAs; many small families; unknowns
Genome annotation • General: one tool fits all; compute drain; automatic; eukaryotic complications; comprehensive; great for prokaryotes • Specific: heuristics; one family, one gene finder; increased speed; increased sensitivity; tRNAscan-SE, BRUCE, SRPscan, snoscan
Outline • I. Non-coding RNA • The genome’s dark matter • Family classification • Genome annotation • II. ncRNA genes in the human genome • Rogue’s gallery • miRNAs • Regulatory elements
International Human Genome Sequencing Consortium, Nature, 2001
Dosage compensation • X chromosome inactivation in mammals [Figure: sex chromosome pairs X Y and X X]
Xist – X inactive-specific transcript (Avner and Heard, Nat. Rev. Genetics 2001 2(1):59-67)
International Human Genome Sequencing Consortium, Nature, 2001
microRNAs • A novel class of ncRNA gene • Products are ~22 nt RNAs • Precursors are 70-100 nt hairpins • Gene regulation by pairing to mRNA • Unknown before 2001
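Regulation "by pairing to mRNA" can be sketched as a search for sites in a target that are complementary to the small RNA. This is a deliberately simplified illustration that only finds perfectly complementary sites; real miRNA-target pairing is often imperfect. The sequences are made-up toy data and the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def find_target_sites(mirna, mrna):
    """Start positions in mrna that pair perfectly with the whole mirna."""
    site = reverse_complement(mirna)
    hits = []
    pos = mrna.find(site)
    while pos != -1:
        hits.append(pos)
        pos = mrna.find(site, pos + 1)
    return hits

# Toy data (not real sequences): a short "miRNA" and a "UTR" built to
# contain exactly one complementary site.
toy_mirna = "AUGGCUAACCGU"
toy_utr = "GGG" + reverse_complement(toy_mirna) + "AAAA"
print(find_target_sites(toy_mirna, toy_utr))  # [3]
```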
Timeline • Late 1970s – lin-4 and let-7 regulate developmental timing in worm • 1993 – lin-4 codes for a ~22 nt RNA, complementary to 3’ UTR of lin-14 • 2000 – …. so does let-7 (stRNAs) • 2000 – let-7 is conserved in bilaterally symmetric animals • 2001 – ~100 miRNAs discovered by cloning in worm, fly and human • 2002 – miRNAs conserved in plants • 2002 – Science magazine’s breakthrough of the year • 2002 – miRNA Registry established • 2003 – miRNAs may account for 1% of total gene count in animals • 2003 – a few targets of miRNAs identified • 2004 – miRNA Registry has 719 miRNAs
miRNA biogenesis Adapted from DP Bartel, Cell 2004 116:281-297
miRNA targets DP Bartel, Cell 2004 116:281-297
miRNA Registry 3.0 • Searchable database of published miRNAs • http://www.sanger.ac.uk/Software/Rfam/mirna/ • 719 entries from human, mouse, rat, worm, fly, and plants • Naming service • Pre-publication • Unique names for distinct miRNAs • Confidentiality for unpublished data
Genomic context • 180 known miRNAs in human • 130 intergenic • 50 intronic • 60 polycistronic • 70 monocistronic
ncRNA gene contexts • tRNA, snRNAs, SRP, RNase P ….. • Xist, miRNAs • miRNAs, snoRNAs [Figure: gene context diagram; one transcript drawn with a poly(A) tail, AAAAAAA]
Inside-out genes protein
Inside-out genes • degradation • snoRNA • Gas5, UHG, U17HG, U19H
Cis-regulatory RNA elements • PrfA in Listeria • 25 °C vs 37 °C • Virulence gene expression
UTR elements in human • IRE: regulation of iron metabolism • SECIS: UGA -> SeC • Histone 3’ UTR: 3’ end formation • Vimentin 3’ UTR: mRNA localisation • CAESAR: CTGF repression • …. many more
ncRNAs in human genome • SRP RNA: 1 • RNase P RNA: 1 • Telomerase RNA: 1 • RNase MRP: 1 • Y RNA: 5 • Vault: 4 • 7SK RNA: 1 • Xist: 1 • H19: 1 • BIC: 1 • Antisense RNAs: 1000s? • Cis reg regions: 100s? • Others: ? • tRNA: 600 • 18S rRNA: 200 • 5.8S rRNA: 200 • 28S rRNA: 200 • 5S rRNA: 200 • snoRNA: 300 • miRNA: 250 • U1: 40 • U2: 30 • U4: 30 • U5: 30 • U6: 20 • U4atac: 5 • U6atac: 5 • U11: 5 • U12: 5