
UNCOVER – a new tool to predict “missing” skipped exons

UNCOVER – a new tool to predict “missing” skipped exons. Uwe Ohler uwe.ohler@duke.edu Institute for Genome Sciences and Policy, Duke University Noam Shomron / Chris Burge Department of Biology, MIT. Types of alternative splicing. Conservation of AS.


Presentation Transcript


  1. UNCOVER – a new tool to predict “missing” skipped exons
     Uwe Ohler (uwe.ohler@duke.edu), Institute for Genome Sciences and Policy, Duke University
     Noam Shomron / Chris Burge, Department of Biology, MIT

  2. Types of alternative splicing

  3. Conservation of AS
     • EST alignments deliver a surprisingly small number of conserved AS events across species
     • Often, only the more frequent isoform is conserved (Modrek & Lee, Nat Genet 2003)
     • Two questions:
       • Are we still missing a lot of AS events despite the large number of ESTs?
       • Are most reported AS events due to noise? (experimental as well as cellular; cf. tiling arrays) Or: AS as a frequent shortcut in evolution?
     • Intuitively, conserved AS may be “more important”

  4. Conserved vs Non-Conserved
     • Alternative-conserved exons (ACEs) differ from non-conserved and constitutive ones
       • Surrounded by long stretches of intronic conservation (intronic splicing elements?); reading frame; length
       • These features are so far NOT used in gene prediction algorithms
     • Sorek et al (2004), Yeo et al (2005):
       • SVM classifiers to predict whether known conserved exons are skipped or not (~2,000 predictions) (often implies that skipping is the minor isoform)
       • Only a fraction of human skipping predicted to be conserved

  5. Conservation patterns in exons
     • Could such patterns be used to find new cases of alternatively spliced exons where the major isoform is skipping?
     • Idea: look locally and examine orthologous introns of already known genes

     Q: 541 gccgcagctgcagacagcccggctggaacaagaggtggcttcgtgctcaacttccatgcg
            ||||| ||    ||||||||||| || ||  | || || ||||||||||||||||| ||
     S: 541 gccgccgcc---gacagcccggccggcacccgcggcggtttcgtgctcaacttccacgca

     Q: 601 gacacggaactg---ggcaagaagaagggcggcctcttccgtcggggttcccttctcggc
            ||| | || ||    ||||||||||||||||||||||||||  |||| |||||||| ||
     S: 598 gacgctgagctagcgggcaagaagaagggcggcctcttccggaggggctcccttcttgga
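
One simple way to put a number on the conservation visible in such an alignment is to count identical columns. The following is a minimal Python sketch, not part of UNCOVER: `percent_identity` is an illustrative helper name, and treating a gap in either row as a mismatch is an assumption. The input is the first alignment block from the slide.

```python
def percent_identity(query: str, subject: str) -> float:
    """Percent of alignment columns in which both rows carry the same base.

    Assumption: a gap ('-') in either row counts as a mismatch.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    matches = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return 100.0 * matches / len(query)


# First block of the alignment shown on the slide (query positions 541-600).
Q = "gccgcagctgcagacagcccggctggaacaagaggtggcttcgtgctcaacttccatgcg"
S = "gccgccgcc---gacagcccggccggcacccgcggcggtttcgtgctcaacttccacgca"

print(f"{percent_identity(Q, S):.1f}% identity over {len(Q)} columns")
```

Counting gaps as mismatches keeps the denominator equal to the alignment length; dividing by ungapped columns only would give a higher value.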

  6. UNCOVER: a pHMM to predict AS

  7. Advantages of this approach
     • Closes the gap between EST sequencing…
       • Despite normalization and large scale, always limited coverage
       • Increasing amount of noise – only “major” variants are found to be conserved between species
     • … and (pair) gene finding
       • Known to miss and/or overpredict exons
       • Usually infers the overall best gene structure; will miss alternative 5’/3’ events
       • Will also miss skipped exons that do not keep the frame or contain a stop codon

  8. An example alignment

     human    1 AAGTGAATAATAGTTTGCGCGGTACTAATGCCTGACCGGAATTGAGATGTGTTGCCTCTG
              1 ||||||::|:|.::| |:|:|.:.|:|:|||                  |||::||||||
     mouse    1 AAGTGAGCAGTCACT-GTGTGCCCCCAGTGC------------------TGTCACCTCTG
     label      FFFFCCCCCCCCCCCCCCCCCCCCCCCCCCCIIIIIIIIIIIIIIIIIICCCCCCCCCCC

     [...]

     human  671 TTTGCTGTTAAGTGTGTGTACACTTCAAGACCAAAGTAATTTTCTTTCATTCTTTTTTAT
            841 |||||||||||||||||||:||||||||:||:::||||||||||||||||||||||||||
     mouse  669 TTTGCTGTTAAGTGTGTGTGCACTTCAAAACTGGAGTAATTTTCTTTCATTCTTTTTTAT
     label      CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCTTTTTTTTTTTTT

     human  731 CCTGTAGATCGCCAGTACCTACTGCAACATCTTTTCTCCCTACACAGCGACTCCAGCTTG
            901 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
     mouse  729 CCTGTAGATCGCCAGTACCTACTGCAACATCTTTTCTCCCTACACAGCGACTCCAGCTTG
     label      TTTTTTTTTT23123123123123123123123123123123123123123123123123

     human  791 GGAGGGCAGGGCCAGGGTTGTCACAGCTTCCCCTGTGGTGTCTGCCTGCCAAGCACAGCT
            961 ||||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||
     mouse  789 GGAGGGCAGGGCCAGGGTTGCCACAGCTTCCCCTGTGGTGTCTGCCTGCCAAGCACAGCT
     label      123123123123123123123123123123123123123123123123123123123123

     human  851 CTGGAGTTAGCCCTGGGTGTGAGGTGAGAAAGAGATTGCATGGTCTGG----TTTTTTTC
           1021 ||||||||||||||||||||||||||||||||||||||||||||||||    ||||||||
     mouse  849 CTGGAGTTAGCCCTGGGTGTGAGGTGAGAAAGAGATTGCATGGTCTGGTCTTTTTTTTTC
     label      12312312312312312312FFFFFFFFFCCCCCCCCCCCCCCCCCCCIIIICCCCCCCC

     [...]

  9. Results: Curated set
     • 241 human/mouse alternative-conserved exons
     • From <1 kb to ~100 kb; total length 1,625,789 nt
     • Five exons are masked by RepeatMasker
     • Postprocessing:
       • Remove all predictions <10 codons, <70% identity
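
The postprocessing step above amounts to a simple threshold filter. The Python sketch below makes several assumptions: the `Prediction` record and its field names are hypothetical, identity is given in percent, and a prediction is dropped if it fails either criterion (the slide does not say whether the two thresholds are combined with "and" or "or").

```python
from dataclasses import dataclass

MIN_CODONS = 10        # threshold from the slide
MIN_IDENTITY = 70.0    # percent, threshold from the slide


@dataclass
class Prediction:
    """Hypothetical record for one predicted exon (field names are illustrative)."""
    name: str
    length_nt: int     # predicted exon length in nucleotides
    identity: float    # human/mouse identity in percent


def passes_postprocessing(p: Prediction) -> bool:
    # Assumption: a prediction must satisfy BOTH thresholds to be kept.
    return p.length_nt // 3 >= MIN_CODONS and p.identity >= MIN_IDENTITY


def postprocess(predictions):
    """Return only the predictions that survive the filter."""
    return [p for p in predictions if passes_postprocessing(p)]


# Toy data, not taken from the presentation.
candidates = [
    Prediction("toy_a", 96, 88.5),
    Prediction("toy_b", 24, 95.0),    # shorter than 10 codons
    Prediction("toy_c", 120, 62.0),   # below 70% identity
]
kept = postprocess(candidates)
print([p.name for p in kept])
```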

  10. Orthologous ENCODE introns
      • 323 human ENCODE genes have annotated mouse reciprocal-best-hit (RBH) orthologs
      • BLAST all human exon junctions (30 nt on each side) against mouse orthologs
      • Keep introns for which the BLAST alignment spans the junction, i.e. is in the corresponding CDS position
      • 1,823 orthologous intron pairs (47 of them longer than 30 kb)
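
The junction-query step above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: exon sequences are assumed to be available in transcript order, the function names are hypothetical, and the `min_overhang` requirement (how far a hit must extend on both sides of the junction to count as spanning it) is an invented parameter. Running BLAST itself is left out.

```python
FLANK = 30  # nt taken from each side of the exon junction, as on the slide


def junction_queries(exons):
    """Build one query per exon junction from exon sequences in transcript order.

    Returns (junction_index, junction_pos, query) tuples, where junction_pos is
    the length of the left flank, i.e. the last query position before the junction.
    """
    queries = []
    for i in range(len(exons) - 1):
        left = exons[i][-FLANK:]       # up to 30 nt from the end of the upstream exon
        right = exons[i + 1][:FLANK]   # up to 30 nt from the start of the downstream exon
        queries.append((i, len(left), left + right))
    return queries


def spans_junction(q_start, q_end, junction_pos, min_overhang=5):
    """True if a hit (1-based, inclusive query coordinates) crosses the junction.

    Assumption: the hit must cover at least min_overhang nt on each side.
    """
    return (q_start <= junction_pos - min_overhang + 1
            and q_end >= junction_pos + min_overhang)


# Toy exons, not real sequence.
toy_exons = ["A" * 40, "C" * 10, "G" * 50]
for index, pos, query in junction_queries(toy_exons):
    print(index, pos, len(query))
```

Exons shorter than 30 nt contribute their full length, so `junction_pos` is returned per query rather than fixed at 30.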

  11. Early ENCODE evaluation (fall 2004)
      • 135 predictions in the ENCODE 1% of the genome
      • 73 out of 1,776 orthologous introns (4.1%), located in 40 out of 323 genes (12.4%)
      • 42 predictions correspond to known skipped exons in Ensembl or human ESTs in dbEST
      • 15 annotated as Ensembl genes
      • 7 annotated as EST genes / VEGA genes
      • 3 spliced EST hits were not annotated
      • Rest potentially correspond to alternative terminal exons (1 annotated)
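
The percentages above can be rechecked with a few lines of Python. One point is an inference rather than something the slides state: 1,776 equals the 1,823 orthologous intron pairs minus the 47 introns longer than 30 kb, which would be consistent with the early evaluation leaving out the long introns.

```python
# Counts as reported on the slides.
intron_pairs_total = 1823
long_introns = 47            # introns longer than 30 kb
introns_evaluated = 1776
introns_with_prediction = 73
genes_total = 323
genes_with_prediction = 40

# Inference (not stated in the slides): the early evaluation excluded long introns.
assert intron_pairs_total - long_introns == introns_evaluated

pct_introns = 100 * introns_with_prediction / introns_evaluated
pct_genes = 100 * genes_with_prediction / genes_total

print(f"introns with a prediction: {pct_introns:.1f}%")
print(f"genes with a prediction:   {pct_genes:.1f}%")
```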

  12. ENCODE regions – RT-PCRs
      • We tested 20 of the additional predictions flanked by strong splice sites (plus, later, 10 putative terminal exons)
      • Several primer pairs to flanking exons and/or flanking plus predicted exon
      • RT-PCR of several human adult tissues (brain, liver, plus 7 more; 15/20 expressed)

  13. ENCODE evaluation (recent)
      • 205 predictions (including long introns)
      • 25 were annotated as part of known genes as of Aug ’04
      • 38 hits to new HAVANA annotation
      • 37 additional with EST hits in current GenBank release
      • What about the newly validated ones?
      • 2 cases (ST7, SON) are now annotated ;-)

  14. Thanks to...
      MIT – Chris Burge Lab
      • Noam Shomron
      • Will Fairbrother (PLOS Biol., ESEs and SNPs)
      • Dirk Holste (Genome Biol., tissue specific AS)
      • Zefeng Wang (Cell, Exonic Splicing Silencers)
      • Gene Yeo (PNAS, splicing mammals vs. fish)
      Duke – Institute for Genome Sciences and Policy
      www.genome.duke.edu
