
Alternative Splicing from ESTs

Alternative Splicing from ESTs. Eduardo Eyras Bioinformatics UPF – February 2004. Intro ESTs Prediction of Alternative Splicing from ESTs. Transcription. exons. introns. pre-mRNA. Splicing. Mature mRNA. Translation. Peptide. 5’. 3’. 3’. 5’. 5’ CAP. AAAAAAA. Different Splicing.


Presentation Transcript


  1. Alternative Splicing from ESTs Eduardo Eyras Bioinformatics UPF – February 2004

  2. Intro • ESTs • Prediction of Alternative Splicing from ESTs

  3. Transcription → pre-mRNA (exons, introns) → Splicing → Mature mRNA (5’ CAP, AAAAAAA) → Translation → Peptide

  4. Transcription → pre-mRNA (exons, introns) → Different Splicing → Mature mRNA (5’ CAP, AAAAAAA) → Translation → Different Peptide

  5. Alt splicing as a mechanism of gene regulation • Functional domains can be added/subtracted → protein diversity • Can introduce early stop codons, resulting in truncated proteins or unstable mRNAs • It can modify the activity of transcription factors, affecting the expression of genes • It is observed in nearly all metazoans • Estimated to occur in 30%-60% of human genes

  6. Forms of alternative splicing • Exon skipping / inclusion • Alternative 3’ splice site • Alternative 5’ splice site • Mutually exclusive exons • Intron retention • Legend: constitutive exon, alternatively spliced exons

  7. How to study alternative splicing?

  8. ESTs (Expressed Sequence Tags) • Single-pass sequencing of a small (end) piece of cDNA • Typically 200-500 nucleotides long • It may contain coding and/or non-coding regions

  9. ESTs • Cells from a specific organ, tissue or developmental stage • mRNA extraction (5’ ... AAAAAA 3’) • Add oligo-dT primer (TTTTTT) • Reverse transcriptase (RNA and DNA strands) • Ribonuclease H • DNA polymerase, Ribonuclease H • Double stranded cDNA

  10. ESTs • Clone cDNA into a vector • Multiple cDNA clones • Single-pass sequence reads: 5’ EST and 3’ EST

  11. Sampling the Transcriptome with ESTs • Genomic • Primary transcript • Splicing • Splice variants • oligo-dT primer, Reverse transcriptase • cDNA clones (double stranded) • EST sequences (single-pass sequence reads, 5’ and 3’)

  12. Large scale EST-sequencing coupled to Genome sequencing

  13. EST sequencing • Is fast and cheap • Gives direct information about the gene sequence • Partial information • Resulting ESTs: known gene (DB searches), similar to known gene, contaminant, novel gene

  14. dbEST release 20 February 2004 • Number of public entries: 20,039,613 • Summary by organism • Homo sapiens (human) 5,472,005 • Mus musculus + domesticus (mouse) 4,056,481 • Rattus sp. (rat) 583,841 • Triticum aestivum (wheat) 549,926 • Ciona intestinalis 492,511 • Gallus gallus (chicken) 460,385 • Danio rerio (zebrafish) 450,652 • Zea mays (maize) 391,417 • Xenopus laevis (African clawed frog) 359,901 • …

  15. EST lengths ~ 450 bp • Human EST length distribution (dbEST Sep. 2003)

  16. ESTs provide expression data • eVOC Ontologies http://www.sanbi.ac.za/evoc/ • Anatomical System: the tissue, organ or anatomical system from which the sample was prepared. Examples are digestive, lung and retina. • Cell Type: the precise cell type from which a sample was prepared. Examples are B-lymphocyte, fibroblast and oocyte. • Pathology: the pathological state of the tissue from which the sample was prepared. Examples are normal, lymphoma, and congenital. • Developmental Stage: the stage during the organism's development at which the sample was prepared. Examples are embryo, fetus, and adult. • Pooling: indicates whether the tissue used to prepare the library was derived from single or multiple samples. Examples are pooled, pooled donor and pooled tissue. • J Kelso et al. Genome Research 2002

  17. ESTs provide expression data • eVOC Ontologies http://www.sanbi.ac.za/evoc/ • Ontologies: Developmental Stage, Anatomical System, Pathology, Cell Type, Pooling • Example terms: … nervous, brain, cerebellum … • Library 1, Library 2, … each with its ESTs

  18. Linking the expression vocabulary to gene annotations • ESTs, Genes • V Curwen et al. Genome Research (2004)

  19. Gene expression vocabulary

  20. Normalized vs. non-normalized libraries

  21. The downside of ESTs • Cannot detect lowly/rarely expressed genes or non-expressed sequences (regulatory) • Random sampling: the more ESTs we sequence, the fewer new useful sequences we will get
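The diminishing return of random sampling can be illustrated with a small calculation. This is only a sketch under a simplifying assumption that is not in the slides: each EST is drawn independently, with probability proportional to transcript abundance. The function name and the abundance values are hypothetical.

```python
from typing import List

def expected_distinct(n_ests: int, abundances: List[float]) -> float:
    """Expected number of distinct transcripts seen after n_ests random draws.

    Assumes independent draws with probability proportional to abundance
    (an illustrative model, not a description of any real library).
    """
    total = sum(abundances)
    # A transcript with sampling probability p is missed by all n draws
    # with probability (1 - p) ** n.
    return sum(1.0 - (1.0 - a / total) ** n_ests for a in abundances)

# Hypothetical transcriptome: 10 abundant and 990 rare transcripts.
abundances = [100.0] * 10 + [1.0] * 990
for n in (1_000, 2_000, 4_000, 8_000):
    print(n, round(expected_distinct(n, abundances), 1))
```

Under this model each doubling of the sequencing effort adds fewer new transcripts than the previous one, and rare transcripts are the last to be seen.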

  22. Using ESTs to study Alternative Splicing

  23. ESTs aligned to the genome • It defines the location of exons and introns • We can verify the splice sites of introns → check the correct strand of spliced ESTs • It helps prevent chimeras • It can avoid putting together ESTs from paralogous genes • We can prevent including pseudogenes in our analysis • Must clip poly A tails before aligning • Figure labels: EST, Stop *, AG, GT, PolyA, Processed pseudogene, True match (best in genome), Paralog
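Clipping poly A tails before alignment could look like the sketch below. The function name, the minimum run length of 8, and the handling of a leading poly T run (on the assumption that 3’ reads may be reported on the opposite strand) are illustrative choices, not part of the slides.

```python
import re

def clip_poly_a(seq: str, min_run: int = 8) -> str:
    """Remove a trailing poly-A run or a leading poly-T run.

    Only runs of at least min_run bases are clipped; min_run=8 is an
    arbitrary illustrative threshold.
    """
    seq = seq.upper()
    seq = re.sub(r"A{%d,}$" % min_run, "", seq)   # poly-A tail at the 3' end
    seq = re.sub(r"^T{%d,}" % min_run, "", seq)   # poly-T at the start of a reversed read
    return seq

print(clip_poly_a("ACGTACGT" + "A" * 10))
print(clip_poly_a("T" * 9 + "GGCC"))
```

Short terminal runs are left alone, since a few A bases at the end of a read can be genuine genomic sequence.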

  24. Alternative Exons / 3´ PolyA sites from ESTs • ESTs can also provide information about potential alternative splicing when aligned to the genome (and when aligned to mRNA data)

  25. Aligning ESTs to the Genome • Many ESTs → Fast programs, Fast computers • Nearly exact matches • Coverage >= 97% • Percent_id >= 97% • Splice sites: GT-AG, AT-AC, GC-AG
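The thresholds on this slide can be read as a filter over EST-to-genome alignments. The sketch below is one possible reading: the `EstAlignment` record, its field names, and the exact definitions of coverage (aligned bases over EST length) and percent identity (matching bases over aligned bases) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Donor/acceptor dinucleotide pairs listed on the slide.
ALLOWED_SPLICE_SITES = {("GT", "AG"), ("AT", "AC"), ("GC", "AG")}

@dataclass
class EstAlignment:  # hypothetical record, not a real library type
    est_length: int
    aligned_bases: int
    matching_bases: int
    # (donor, acceptor) dinucleotides of each intron, read on the transcript strand
    intron_ends: List[Tuple[str, str]]

def passes_filters(aln: EstAlignment,
                   min_coverage: float = 97.0,
                   min_identity: float = 97.0) -> bool:
    """Keep only nearly exact matches whose introns have allowed splice sites."""
    if aln.est_length == 0 or aln.aligned_bases == 0:
        return False
    coverage = 100.0 * aln.aligned_bases / aln.est_length
    identity = 100.0 * aln.matching_bases / aln.aligned_bases
    sites_ok = all(pair in ALLOWED_SPLICE_SITES for pair in aln.intron_ends)
    return coverage >= min_coverage and identity >= min_identity and sites_ok

good = EstAlignment(400, 392, 385, [("GT", "AG")])
print(passes_filters(good))
```

An intron that reads CT and AC on the given strand fails this check; it may simply be a GT-AG intron seen on the opposite strand, which is how splice sites help decide the strand of a spliced EST.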

  26. Genomics as a Technology • Development of special software: fast versus accurate alignment • Development of special technology: efficient use of computer farms (~2000 CPUs)

  27. Recovering full transcripts from ESTs

  28. Recover the mRNA from the ESTs

  29. The Problem • ESTs, Genome • What are the transcripts represented in this set of mapped ESTs?

  30. Predict Transcripts from ESTs • ESTs → Transcript predictions • Merge ESTs according to splicing structure compatibility

  31. Redundant ESTs • Consider 2 ESTs in a Genomic Cluster with more ESTs • x, z → x + z: z gives redundant splicing information, we could keep only x • x, z, w → x + z, z + w: however, the relation with other ESTs in the cluster is important: a third EST, w, is compatible with z but not with x → keep all relations

  32. Extension of the exon structure • Consider 2 ESTs in a Genomic Cluster with more ESTs • x, y → x + y: y extends x, we can assume that they are from the same mRNA • x, z, w: our success will depend on the coverage of the exons. However, ESTs are 3’ and 5’ biased (ESTs like z are not so frequent), hence we will have fragmentation.

  33. Representation • For every 2 ESTs in a Genomic Cluster, we decide if they represent equivalent splicing structures • The compatibility relation is a graph: Extension (x, y), Inclusion (x, z) • E Eyras et al. Genome Research (2004)
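A pairwise comparison of this kind can be sketched as follows. This is a deliberately strict toy version, not the published ClusterMerge criteria: two ESTs count as compatible only if they have exactly the same introns inside the region where they overlap, with no tolerance for mismatches. The coordinate convention and all names are assumptions.

```python
from typing import List, Tuple

Exon = Tuple[int, int]   # (start, end), inclusive genomic coordinates
Est = List[Exon]         # exons sorted by position, all on the same strand

def span(est: Est) -> Tuple[int, int]:
    """Genomic start of the first exon and end of the last exon."""
    return est[0][0], est[-1][1]

def introns(est: Est) -> List[Tuple[int, int]]:
    """Each intron as (end of upstream exon, start of downstream exon)."""
    return [(a[1], b[0]) for a, b in zip(est, est[1:])]

def compatible(x: Est, y: Est) -> bool:
    """True if x and y overlap and agree on every intron in the overlap."""
    lo = max(span(x)[0], span(y)[0])
    hi = min(span(x)[1], span(y)[1])
    if lo > hi:
        return False  # no genomic overlap, nothing to compare
    def in_overlap(est: Est) -> List[Tuple[int, int]]:
        return [i for i in introns(est) if i[1] > lo and i[0] < hi]
    return in_overlap(x) == in_overlap(y)

def relation(x: Est, y: Est) -> str:
    """Classify a pair as 'inclusion', 'extension' or 'incompatible'."""
    if not compatible(x, y):
        return "incompatible"
    (xs, xe), (ys, ye) = span(x), span(y)
    if (xs <= ys and ye <= xe) or (ys <= xs and xe <= ye):
        return "inclusion"
    return "extension"

x = [(100, 200), (300, 400), (500, 600)]
y = [(350, 400), (500, 600), (700, 800)]   # continues past the end of x
z = [(150, 200), (300, 380)]               # lies within x
w = [(150, 200), (320, 400)]               # different acceptor site
print(relation(x, y), relation(x, z), relation(x, w))
```

With these hypothetical coordinates the three pairs come out as extension, inclusion and incompatible respectively.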

  34. Criteria of “merging” • Allow edge-exon mismatches • Allow internal mismatches • Allow intron mismatches • Is this intron real?

  35. Transitivity • Extension and Inclusion relations (figure: ESTs x, y, z, w) • This reduces the number of comparisons needed

  36. ClusterMerge graph • Each node defines an inclusion sub-tree • Extensions form acyclic graphs • (figure: ESTs x, y, z, w) • E Eyras et al. Genome Research (2004)

  37. Mergeable sets • Example (figure: ESTs 1 to 7)

  38. Mergeable sets • Example (figure: graph over ESTs 1 to 7)

  39. Mergeable sets • Example (figure: graph over ESTs 1 to 7, with Root and Leaves marked)

  40. Mergeable sets • Example (figure: graph over ESTs 1 to 7, with Root and Leaves marked) • Lists produced: (1,2,3,5,6,7), (1,2,3,4,5,7)

  41. Deriving the transcripts from the lists • Internal Splice Sites: external coordinates of the 5’ and 3’ exons are not allowed to contribute

  42. Deriving the transcripts from the lists • Splice Sites: are set to the most common coordinate • 5’ and 3’ coordinates: are set to the exon coordinate that extends the potential UTR the most
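These two rules can be sketched for a single exon of a merged list. The input format is an assumption: each observation is one EST's view of the exon, with flags saying whether each boundary is a splice site in that EST (as opposed to the external end of the EST). Forward strand is assumed, so "extends the most" means the smallest start and the largest end.

```python
from collections import Counter
from typing import List, Tuple

# (start, end, start_is_splice_site, end_is_splice_site); hypothetical format
Observation = Tuple[int, int, bool, bool]

def consensus_exon(observations: List[Observation]) -> Tuple[int, int]:
    """Derive one exon's coordinates from the ESTs that cover it."""
    # External EST ends are not allowed to vote on splice site positions.
    splice_starts = [o[0] for o in observations if o[2]]
    splice_ends = [o[1] for o in observations if o[3]]
    if splice_starts:
        start = Counter(splice_starts).most_common(1)[0][0]
    else:
        # Transcript 5' end: take the coordinate that extends the most.
        start = min(o[0] for o in observations)
    if splice_ends:
        end = Counter(splice_ends).most_common(1)[0][0]
    else:
        # Transcript 3' end: take the coordinate that extends the most.
        end = max(o[1] for o in observations)
    return start, end

internal = [(300, 400, True, True), (300, 400, True, True),
            (302, 400, True, True), (350, 400, False, True)]
first = [(90, 200, False, True), (120, 200, False, True), (100, 198, False, True)]
print(consensus_exon(internal), consensus_exon(first))
```

In the internal exon, the EST that merely starts at 350 does not influence the acceptor position; in the first exon, no EST shows a splice site on the left, so the start is the outermost one.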

  43. Single exon transcripts • Reject resulting single exon transcripts when using ESTs

  44. Alternative splicing and comparative genomics

  45. Conservation of Alternative Splicing • Degree of conservation: 30-60% • Methods: 1. Compare single events 2. Cross-alignment of full transcripts

  46. Exon Skipping Events • Introns flanking alternatively spliced (skipped) exons have high sequence conservation, higher on average than constitutive introns. • R Sorek & G Ast. Genome Research 13:1631-1637, 2003

  47. Sequences regulating the (Alternative) splicing • Figure labels: Conserved Alternative Exon, Flanking Introns, Overrepresented hexamer (downstream) • Overrepresented sequences in conserved introns (between human and mouse) may be involved in the regulation of alternative splicing. • Overrepresented: found in these introns more often than expected at random AND not found in intronic sequences flanking constitutive exons (and upstream of skipped ones) • R Sorek & G Ast. Genome Research (2003) 13:1631-1637

  48. Sequences regulating the (Alternative) splicing • Figure labels: Conserved Alternative Exon, Flanking Introns, Overrepresented hexamer • Not all types of events are equally conserved. Introns flanking alternative 5´ and 3´ exons, and retained introns, have higher sequence conservation. • Sugnet CW, Kent WJ, Ares M Jr, Haussler D. Pac Symp Biocomput. 2004:66-77

  49. Frame preservation A Resch et al. Nucleic Acids Research 2004, 32 (4) 1261-1269

  50. Predicting alternative exons
