
Interrogating the transcriptome in all its diversity

Interrogating the transcriptome in all its diversity. Joel H Graber. Why were so many predictions of the number of genes in a mammalian genome wrong? Nature Genetics, June 2000, v25, n2. Mammalian genomes contain far more transcript variants than protein variants.


Presentation Transcript


  1. Interrogating the transcriptome in all its diversity Joel H Graber

  2. Why were so many predictions of the number of genes in a mammalian genome wrong? • Nature Genetics, June 2000, v25, n2.

  3. Mammalian genomes contain far more transcript variants than protein variants • Average protein products per locus = 1.7 • Average distinct transcripts per locus = 5.7 • Genome Biology (2009) 10:201.

  4. A processed, protein-coding mRNA molecule includes distinct functional regions • Genomic sequence • Protein-coding sequence • 5’-untranslated region (5’-UTR) • 3’-untranslated region (3’-UTR)

  5. Pieces of a (eukaryotic) protein-coding gene (on the genome) • Scale bars in the figure: ~1-100 Mbp and ~1-1000 kbp • Promoter (~10^3 bp) • Polyadenylation site (~10-100 bp) • Enhancers (~10-100 bp) • Other regulatory sequences (~10-100 bp) • Exons (CDS and UTR; ~10^2-10^3 bp) / introns (~10^2-10^5 bp)

  6. Alternate mRNA processing can lead to multiple transcript and/or protein products • Figure example: 3 transcripts, 1 protein product

  7. Carolyn demonstrates gene regulation • DNA = water in pipes • mRNA = water in hose • Protein = water in pool • Control points: transcription control, mRNA degradation, mRNA localization, translation control, protein degradation

  8. A somewhat more formal view of regulation in the various stages of gene expression

  9. Systematic changes to mRNA processing can significantly change the regulatory program of a cell • Changes can be in a single gene or systemic • Regulatory control during transcript generation: transcription initiation site; splicing pattern; 3’-processing (polyadenylation and cleavage) site; RNA editing • Subsequent isoform-specific regulatory control: stability; translational efficiency; localization

  10. A brief history of transcript measurement

  11. Implications of transcript variation for gene expression measurement • Most large-scale expression studies report one level per gene per sample • Microarrays: one reported expression value per probeset; duplicate probesets are either averaged or discarded • mRNA-seq: RPKM (reads per kilobase of transcript per million mapped reads) • For many genes, summarization to one expression level in a given cell type is inadequate
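The RPKM measure named on slide 11 can be sketched in a few lines. This is a minimal illustration, not the presenter's pipeline: the function name, the isoform labels, and all read counts and lengths are hypothetical, and pooling reads over the longest isoform is only one of several possible gene-level summarizations.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    # (reads / (length / 1e3)) / (total / 1e6) == reads * 1e9 / (length * total)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical gene with two isoforms (made-up numbers for illustration only).
TOTAL_READS = 10_000_000
isoforms = {
    "short_isoform": {"reads": 500, "length_bp": 1000},
    "long_isoform": {"reads": 500, "length_bp": 4000},
}

# Per-isoform expression values.
per_isoform = {
    name: rpkm(iso["reads"], iso["length_bp"], TOTAL_READS)
    for name, iso in isoforms.items()
}

# One possible gene-level summary (an assumption): pool all reads and
# normalize by the longest isoform's length.
gene_level = rpkm(
    sum(iso["reads"] for iso in isoforms.values()),
    max(iso["length_bp"] for iso in isoforms.values()),
    TOTAL_READS,
)

print(per_isoform, gene_level)
```

With these made-up inputs the two isoforms differ fourfold in RPKM, while the single gene-level number matches neither, which is the concern the slide raises about one value per gene.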

  12. Every time we find a new way to measure RNA, we find previously unknown types Mattick et al, Trends Genet 2009

  13. Classes of alternative transcripts • Alternative splicing • Alternative transcript initiation sites • Alternative cleavage and polyadenylation (3’-processing) • Combinations of one or more of these

  14. The cascade of alternative mRNA processing in gene regulation • mRNA processing selections during mRNA generation can have a profound effect on downstream regulation of the resulting transcript

  15. Processing, and specifically alternative processing, is controlled by cis-elements and trans-factors • mRNA processing signals are typically constrained in both sequence content and positioning • Activity of a specific site is a function of the strength of the local signals and the cell- and environment-specific concentrations and activities of trans-factors

  16. Alternative splicing

  17. Alternative splicing can occur in several ways http://www.wormbook.org/

  18. Splicing signals and interacting factors

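  19. Cis elements required for splicing ("|" marks the exon/intron boundary) • Yeast: 5’ss GUAUGU; BP UACUAAC; 3’ss YAG • Vertebrates: 5’ss AG|GUAAGU; BP CURAY; polypyrimidine tract YYYY(10-15); 3’ss NCAG|GU; exonic splicing enhancers (ESE) • Plants: 5’ss AG|GUAAGU; BP CURAY; 3’ss UGYAG|GU; UA-rich introns; possible exonic splicing enhancers (ESE?) • Per-position conservation values shown in the figure (%): 62 100 70 49 64 95 100 44 79 99 58 53 42 100 57 • Key: 5’ss = 5’ splice site (donor site); 3’ss = 3’ splice site (acceptor site); BP = branch point (A is the branch point base); YYYY(10-15) = polypyrimidine tract; Y = pyrimidine; R = purine; N = any base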

  20. PWM representations of splice site signals (mice)

  21. Frequency of bases in each position of the splice sites

     Donor sequences: 5’ splice site (exon | intron)

       %A    30  40  64   9 |   0   0  62  68   9  17  39  24
       %U    20   7  13  12 |   0 100   6  12   5  63  22  26
       %C    30  43  12   6 |   0   0   2   9   2  12  21  29
       %G    19   9  12  73 | 100   0  29  12  84   9  18  20
       cons   -   -   A   G |   G   U   A   A   G   U   -   -

     Acceptor sequences: 3’ splice site (intron | exon)

       %A    15  10  10  15   6  15  11  19  12   3  10  25   4 100   0 |  22  17
       %U    51  44  50  53  60  49  49  45  45  57  58  29  31   0   0 |   8  37
       %C    19  25  31  21  24  30  33  28  36  36  28  22  65   0   0 |  18  22
       %G    15  21  10  10  10   6   7   9   7   7   5  24   1   0 100 |  52  25
       cons   Y   Y   Y   Y   Y   Y   Y   Y   Y   Y   Y   N   Y   A   G |   G   -

     The run of Y positions upstream of the AG is the polypyrimidine tract (Y = U or C; N = any nucleotide)
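The donor-site frequencies on slide 21 are the raw material for the PWM representation mentioned on slide 20. The following is a minimal sketch of how such a matrix could be built and used to score candidate sites. The uniform 25% background, the pseudocount of 1, and all function names are assumptions made for illustration; the slides do not specify how the presenter's PWMs were computed.

```python
import math

# Donor-site base frequencies (%) as listed on slide 21, 12 positions in order.
DONOR_FREQ = {
    "A": [30, 40, 64, 9, 0, 0, 62, 68, 9, 17, 39, 24],
    "U": [20, 7, 13, 12, 0, 100, 6, 12, 5, 63, 22, 26],
    "C": [30, 43, 12, 6, 0, 0, 2, 9, 2, 12, 21, 29],
    "G": [19, 9, 12, 73, 100, 0, 29, 12, 84, 9, 18, 20],
}
BASES = "AUCG"


def build_pwm(freq, pseudocount=1.0, background=0.25):
    """Log2-odds matrix; pseudocount and uniform background are assumptions."""
    width = len(freq["A"])
    pwm = []
    for i in range(width):
        total = sum(freq[b][i] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((freq[b][i] + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def consensus(freq):
    """Most frequent base at each position (ties go to the first in BASES)."""
    width = len(freq["A"])
    return "".join(max(BASES, key=lambda b: freq[b][i]) for i in range(width))


def score(site, pwm):
    """Sum of per-position log-odds; accepts DNA (T) or RNA (U) letters."""
    site = site.upper().replace("T", "U")
    if len(site) != len(pwm):
        raise ValueError("site length must equal PWM width")
    return sum(column[base] for base, column in zip(site, pwm))


def best_site(seq, pwm):
    """Return (score, offset) of the highest-scoring window in seq."""
    width = len(pwm)
    return max(
        (score(seq[i:i + width], pwm), i)
        for i in range(len(seq) - width + 1)
    )


pwm = build_pwm(DONOR_FREQ)
print(consensus(DONOR_FREQ))
print(best_site("CCCCCACAGGTAAGTACCCCCC", pwm))
```

The sketch only handles the four standard bases; ambiguous letters such as N would need an explicit rule.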

  22. Example 1: Insulin-like growth factor 1 (Igf1) • AKA somatomedin C or mechano growth factor • Produced primarily by the liver as an endocrine hormone • Primary action is mediated by binding to IGF1R • Natural activator of the AKT pathway • A primary mediator of the effects of growth hormone • Expression has been negatively correlated with lifespan and positively correlated with body size • Its regulatory control remains poorly understood after 30 years

  23. IGF1 is subject to extensive alternative mRNA processing (figure scale: ~83,000 nt)

  24. IGF1 mRNA data indicate at least 15 transcript isoforms

  25. Salient features of IGF1 expression • Mature, circulating IGF1 protein is a cleavage product, coded entirely in exons 3 and 4 • Exon 5 contains an additional peptide cleavage product, with demonstrated independent functionality • Exons 1 and 2 are mutually exclusive, and likely not the only upstream, transcript initiating exons • Exon 5 can be skipped, included or 3’-terminal • Exon 6’s reading frame changes depending on whether it is spliced from exon 4 or 5

  26. IGF1 has two possible terminal exons (5 and 6) (figure scale: ~22,000 nt)

  27. IGF1 exon 6, if included, can vary in length between ~200 and ~6400 nt

  28. Alternative polyadenylation

  29. Alternative 3’-processing can arise in several ways with varying consequences • Adapted from Yan J, et al., Genome Research. 2005; 15(3):369-75.

  30. PolyA site selection depends on sequence elements and abundance/stoichiometry of trans-factors • Sequence elements (5’ to 3’): UGUA; AAUAAA (PAS); UG-rich/U-rich downstream element (DSE); G-rich • Trans-factors shown: CPSF, CSTF, PAPOL, Symplekin, hnRNP H • Subunit labels in the figure: 160, 100, 77, 73, 68, 64, 50, 30 and 25 kD • Up to >80 proteins in complex
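The cis-elements listed on slide 30 can be located with a simple motif scan. This is a rough sketch under stated assumptions: the function name, the window sizes (50 nt upstream, 40 nt downstream), and the use of U+G fraction as a stand-in for the "UG-rich/U-rich" downstream element are all hypothetical choices, and the scan ignores trans-factor abundance, which the slide identifies as an equally important determinant.

```python
import re


def find_pas_candidates(rna, upstream=50, downstream=40):
    """Locate AAUAAA hexamers and summarize the flanking sequence context.

    Window sizes are illustrative assumptions, not values from the slides.
    """
    rna = rna.upper().replace("T", "U")
    hits = []
    # Lookahead so that overlapping hexamer matches are all reported.
    for match in re.finditer(r"(?=AAUAAA)", rna):
        pos = match.start()
        up = rna[max(0, pos - upstream):pos]
        down = rna[pos + 6:pos + 6 + downstream]
        # Count UGUA motifs in the upstream window.
        ugua_count = len(re.findall(r"(?=UGUA)", up))
        # Fraction of U or G bases downstream, a crude proxy for a UG/U-rich DSE.
        ug_fraction = (
            sum(base in "UG" for base in down) / len(down) if down else 0.0
        )
        hits.append({
            "pas_start": pos,
            "upstream_ugua": ugua_count,
            "downstream_ug_fraction": ug_fraction,
        })
    return hits


# Made-up test sequence: UGUA upstream, one PAS, UG-rich stretch downstream.
example = "CCUGUACC" + "AAUAAA" + "CC" + "UGUGUUUU"
for hit in find_pas_candidates(example):
    print(hit)
```

A scan like this only nominates candidate sites; ranking them would require the kind of signal-strength and cell-specific factor information the slide describes.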

  31. NMF defines patterns of signals that control 3’-processing (cleavage and polyadenylation)

  32. Example 2: Insulin-like growth factor 2 mRNA binding protein 1 (Igf2bp1) • Contains four K homology domains and two RNA recognition motifs • Binds to the 5’-UTR of IGF2 mRNA, regulating translation • Can act as an oncogene if misregulated • Evolutionarily conserved, with critical role in mRNA localization and translational control

  33. Consequences: Igf2bp1 has transforming potential only when expressed in its truncated isoform • Figure labels: ~50,000 nt; ~6,500 nt • Mayr and Bartel, Cell 2009

  34. Inclusion (or exclusion) of regulatory sequences in the 3’-UTR fine-tunes expression and response • Spicher et al, Mol Cell Biol 1998

  35. Example 3: Regulated control of polyA site selection for antibodies during B-cell maturation

  36. Alternative transcription initiation

  37. Alternative transcription initiation can arise in several ways with varying consequences

  38. CAGE tags showed an unexpectedly high frequency in the 3’-UTR

  39. 3’-UTR CAGE tags occur in evolutionarily conserved contexts with a common local sequence

  40. The definition of a gene becomes much more fluid: Ins2-IGF2 • Two genes with a spurious connection? • One large gene with distinct, disjoint transcripts?

  41. Cleaved 3’-UTR RNA products (uaRNAs) are often tissue-specific and can localize differentially

  42. Next time: Details of measuring transcript differences at large scale
