
RNAs in the human genome

RNAs in the human genome. Sam Griffiths-Jones The Wellcome Trust Sanger Institute. Outline. I. Non-coding RNA The genome’s dark matter Family classification Genome annotation II. ncRNA genes in the human genome Rogue’s gallery miRNAs Regulatory elements.


Presentation Transcript


  1. RNAs in the human genome Sam Griffiths-Jones The Wellcome Trust Sanger Institute

  2. Outline • I. Non-coding RNA • The genome’s dark matter • Family classification • Genome annotation • II. ncRNA genes in the human genome • Rogue’s gallery • miRNAs • Regulatory elements

  3. T. thermophilus - Ramakrishnan et al., Cell, 2002

  4. Protein/RNA genes • DNA → RNA → protein (the RNA → protein step is crossed out, X, for RNA genes)

  5. ncRNA genes • …. code for functional RNAs • Many cellular machines contain RNA • Ribosome: rRNA • Spliceosome: snRNAs (U1, U2, U4, U5, U6) • Telomerase: telomerase RNA • SRP: SRP RNA

  6. How many genes in the human genome?

  7. Gene sweep • CSHL 2000-2003 • Rules • $1 in 2000, $5 in 2001 and $20 in 2002 • A gene is a set of connected transcripts. A transcript is a set of exons connected via transcription. At least one transcript must be expressed outside of the nucleus and one transcript must encode a protein. • One bet per person, per year • Results • 165 bets • Mean 61710 • Lowest 25947 • Highest 153478 • Answer: 21000 • Winner: Lee Rowen • http://www.ensembl.org/Genesweep/

  8. ncRNA genes • Genomic dark matter • Ignored by gene prediction methods • Not in EnsEMBL • Computational complexity • ~10% of human gene count?

  9. The RNA World • Origin of life / central dogma paradox • DNA needs proteins to replicate • Proteins are coded for by DNA • RNA can be both code and machinery • SELEX, aptamers • RNAs are remnants • Ancient • Essential

  10. Biological sequence analysis • Protein: easy • RNA: hard

  11. Gene finding • Rules • Start codon: ATG • Stop codons: TAA, TGA, TAG • Intron boundaries: GT…..AG • Compositional features • Exon lengths • Intron lengths • Codon bias • General genomic properties • Homology
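The start/stop rules on this slide can be sketched as a naive open reading frame scan. This is only an illustration, not a real gene finder: the function name `find_orfs`, the `min_codons` threshold and the toy sequence are assumptions, only the forward strand is scanned, and splice sites, codon bias and homology are ignored.

```python
# Naive ORF scan: ATG ... in-frame stop codon, forward strand only.
# Hypothetical helper for illustration; real gene finders use far richer models.
STOP_CODONS = {"TAA", "TGA", "TAG"}


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) half-open coordinates of ATG...stop ORFs."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count codons before the stop, including the ATG.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


toy = "CCATGAAATTTTAGGG"  # made-up sequence
print(find_orfs(toy))
```

An ncRNA gene has no such start/stop signal to find, which is one reason rule-based protein gene finders pass over it.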

  12. Protein sequence analysis
      Query:   1 MKFYTIKLPKFLGGIVRAMLGSFRKD  26
                 M+  TIKLPKFL  IVR   G+ + D
      Sbjct: 390 MRIMTIKLPKFLAKIVRMFKGNKKSD 467
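As a small worked example of why protein comparison is the "easy" case, the identity of the ungapped alignment on this slide can be counted directly. The two sequences are copied from the slide; the helper name `identity` is an assumption for illustration.

```python
# Percent identity of the ungapped pairwise alignment shown on the slide.
query = "MKFYTIKLPKFLGGIVRAMLGSFRKD"
sbjct = "MRIMTIKLPKFLAKIVRMFKGNKKSD"


def identity(a, b):
    """Return (identical positions, alignment length) for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, len(a)


matches, length = identity(query, sbjct)
print(f"{matches}/{length} identical ({100 * matches / length:.1f}%)")
```

Primary sequence alone carries enough signal here; for RNA the conserved signal is often in the base pairing rather than in the letters themselves.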

  13. RNA sequence analysis

  14. RNA sequence analysis

  15. Why are families useful?
      S. cerevisiae       UCCUCGUGAGAGGG
      P. canadensis       GUCUC.UGAGAGAU
      P. strasburgensis   CUCUC.UGAGAGAG
      K. thermotolerans   UUCUCGUGAGAGAA
      SS                  <<<<<....>>>>>
      • Alignments of related sequences • Phylogenetic trees • Homologue detection • Genome annotation • Secondary structure prediction
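The structure line (SS) in this alignment can be read mechanically: each `<` pairs with its matching `>`. The sketch below, with hypothetical helper names, recovers the paired columns and checks that every sequence can form each pair (Watson-Crick or G-U wobble), then counts how many distinct base pairs occur in each paired column.

```python
# Read the SS line as nested brackets and check base pairing in each sequence.
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

alignment = {
    "S. cerevisiae": "UCCUCGUGAGAGGG",
    "P. canadensis": "GUCUC.UGAGAGAU",
    "P. strasburgensis": "CUCUC.UGAGAGAG",
    "K. thermotolerans": "UUCUCGUGAGAGAA",
}
ss = "<<<<<....>>>>>"


def pairs_from_structure(structure):
    """Return sorted (i, j) column pairs for a '<'/'>' structure string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "<":
            stack.append(i)
        elif ch == ">":
            if not stack:
                raise ValueError("unbalanced structure line")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure line")
    return sorted(pairs)


def unpairable(seq, pairs):
    """Return the column pairs that this sequence cannot form."""
    return [(i, j) for i, j in pairs if (seq[i], seq[j]) not in ALLOWED]


pairs = pairs_from_structure(ss)
for name, seq in alignment.items():
    print(name, unpairable(seq, pairs))

# Distinct base pairs observed per paired column: more than one means the
# two columns changed together while keeping the pair intact.
for i, j in pairs:
    print((i, j), sorted({s[i] + s[j] for s in alignment.values()}))
```

The outermost pair appears as U-G, G-U, C-G and U-A across the four species: the letters differ but the pairing is preserved, which is the signal a structure-aware family model can exploit and a sequence-only comparison cannot.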

  16. RNA models • Covariance models (profile-SCFGs) • Analogous to profile-HMMs • Statistical representation of the alignment with structure • Homologue detection • Multiple sequence alignment • (Sean Eddy)

  17. Protein sequence analysis - HMMs
      [Figure: profile HMM with begin (B), match (M), insert (I), delete (D) and end (E) states]
      ERELKKQKKLSNR
      ERELKK..KQSNR
      ERELKRQRKQSNR
      KAAAQRQKMIKNR
      EREKKKRKQSNR

  18. RNA sequence analysis - SCFGs
      [Figure: model tree with three MP and three ML states; MP states emit the pairs G – C, G – C and A – U]
      G G A A G A U C C
      < < < . . . > > >

  19. RNA models - problems • Problems • Speed • Memory • Sensitivity • Speed • 30 billion bases in DBs • O(N^3) wrt model length • Small model: 300 b/s • 28S rRNA: 200 b/day
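The figures on this slide imply the following back-of-envelope search times. This is plain arithmetic on the slide's numbers, assuming a single processor running at the quoted rates and a 365.25-day year; the variable names are illustrative.

```python
# Time to scan the whole database at the rates quoted on the slide.
DB_BASES = 30e9                 # 30 billion bases in the databases
SMALL_MODEL_RATE = 300          # bases per second (small model)
RRNA_28S_RATE = 200             # bases per day (28S rRNA model)

SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25

small_model_years = DB_BASES / SMALL_MODEL_RATE / SECONDS_PER_DAY / DAYS_PER_YEAR
rrna_28s_years = DB_BASES / RRNA_28S_RATE / DAYS_PER_YEAR

print(f"small model: about {small_model_years:.1f} CPU-years")
print(f"28S rRNA model: about {rrna_28s_years:,.0f} CPU-years")
```

At these rates even a small model needs roughly three CPU-years per full database search, and a 28S rRNA model is far out of reach, which is why large compute farms and fast family-specific heuristics both matter.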

  20. Sanger supercomputers

  21. Rfam 5.0 • http://www.sanger.ac.uk/Software/Rfam/ • http://rfam.wustl.edu/ • 176 ncRNA families • Structure annotated alignments • Species distributions • Keyword searches • Sequence searches • >235000 regions in EMBL 76

  22. ncRNA families • What we have: • tRNA • 5S, 5.8S rRNAs • Spliceosomal RNAs • SRP, RNaseP • Telomerase, tmRNA, vault • E. coli screens • Some snoRNAs • Some miRNAs • Some UTR elements • Self-splicing introns • …… more • What we don’t: • 18S, 23S rRNAs • Other large things (Xist etc) • Lots of snoRNAs • Lots of miRNAs • Many small families • Unknowns

  23. Genome annotation • General: one tool fits all; compute drain; automatic; eukaryotic complications; comprehensive; great for prokaryotes • Specific: heuristics; one family, one gene finder; increased speed; increased sensitivity; tRNAscan-SE, BRUCE, SRPscan, snoscan

  24. Outline • I. Non-coding RNA • The genome’s dark matter • Family classification • Genome annotation • II. ncRNA genes in the human genome • Rogue’s gallery • miRNAs • Regulatory elements

  25. Outline • I. Non-coding RNA • The genome’s dark matter • Family classification • Genome annotation • II. ncRNA genes in the human genome • Rogue’s gallery • miRNAs • Regulatory elements

  26. International Human Genome Sequencing Consortium, Nature, 2001

  27. Dosage compensation • X chromosome inactivation in mammals • [Figure: XY and XX chromosome complements]

  28. Xist – X inactive-specific transcript • Avner and Heard, Nat. Rev. Genetics 2001 2(1):59-67

  29. International Human Genome Sequencing Consortium, Nature, 2001

  30. microRNAs • A novel class of ncRNA gene • Products are ~22 nt RNAs • Precursors are 70-100 nt hairpins • Gene regulation by pairing to mRNA • Unknown before 2001

  31. Timeline • Late 70’s – lin-4 and let-7 regulate developmental timing in worm • 1993 – lin-4 codes for a ~22 nt RNA, complementary to 3’ UTR of lin-14 • 2000 – …. so does let-7 (stRNAs) • 2000 – let-7 is conserved in bilaterally symmetric animals • 2001 – ~100 miRNAs discovered by cloning in worm, fly and human • 2002 – miRNAs conserved in plants • 2002 – Science magazine’s breakthrough of the year • 2002 – miRNA Registry established • 2003 – miRNAs may account for 1% of total gene count in animals • 2003 – a few targets of miRNAs identified • 2004 – miRNA Registry has 719 miRNAs

  32. “miRNA” in PubMed

  33. miRNA biogenesis Adapted from DP Bartel, Cell 116:281-297(2004)

  34. miRNA targets • DP Bartel, Cell 116:281-297(2004)

  35. PNAS 99:15524-15529(2002)

  36. miRNA Registry 3.0 • Searchable database of published miRNAs • http://www.sanger.ac.uk/Software/Rfam/mirna/ • 719 entries from human, mouse, rat, worm, fly, and plants • Naming service • Pre-publication • Unique names for distinct miRNAs • Confidentiality for unpublished data

  37. Genomic context • 180 known miRNAs in human • 130 intergenic • 50 intronic • 60 polycistronic • 70 monocistronic

  38. ncRNA gene contexts • tRNA, snRNAs, SRP, RNase P ….. • Xist • miRNAs • miRNAs, snoRNAs • [Figure: gene context diagrams, one transcript drawn with a poly(A) tail, AAAAAAA]

  39. Inside-out genes protein

  40. Inside-out genes • degradation • snoRNA • Gas5, UHG, U17HG, U19H

  41. Cis-regulatory RNA elements • PrfA in Listeria • 25°C vs 37°C • Virulence gene expression

  42. UTR elements in human • IRE: regulation of iron metabolism • SECIS: UGA -> SeC • Histone 3’ UTR: 3’ end formation • Vimentin 3’ UTR: mRNA localisation • CAESAR: CTGF repression • …. many more

  43. ncRNAs in human genome • tRNA: 600 • 18S rRNA: 200 • 5.8S rRNA: 200 • 28S rRNA: 200 • 5S rRNA: 200 • snoRNA: 300 • miRNA: 250 • U1: 40 • U2: 30 • U4: 30 • U5: 30 • U6: 20 • U4atac: 5 • U6atac: 5 • U11: 5 • U12: 5 • SRP RNA: 1 • RNase P RNA: 1 • Telomerase RNA: 1 • RNase MRP: 1 • Y RNA: 5 • Vault: 4 • 7SK RNA: 1 • Xist: 1 • H19: 1 • BIC: 1 • Antisense RNAs: 1000s? • Cis reg regions: 100s? • Others: ?
