
Hitchhiker’s Guide to PASA: An RFC on Integrating an Alien Program to Brent Lab

Hitchhiker’s Guide to PASA: An RFC on Integrating an Alien Program to Brent Lab. Bob Zimmermann, 11-07-06. Again, What Is PASA? Program to Assemble Spliced Alignments. “PASA” == core algorithm: given genomic coords, what is a likely candidate for a full-length transcript?





Presentation Transcript


  1. Hitchhiker’s Guide to PASA: An RFC on Integrating an Alien Program to Brent Lab • Bob Zimmermann • 11-07-06

  2. Again, What Is PASA? • Program to Assemble Spliced Alignments • “PASA” == Core algorithm • Given genomic coords, what is a likely candidate for a full-length transcript? • “PASA Pipeline” == Heuristic suite • A mishmash of programs; the topic of today’s presentation

  3. PASA’s Original Goals • Improve annotations through ESTs • How can ESTs imply additional transcripts? • How do we define likely updates? • How do we align? • Etc., etc. • Much of this work is usually done by hand • But it doesn’t all need to be • Automating it greatly reduces overhead

  4. Preliminaries • The PASA Pipeline uses MySQL: • Annotations are loaded via adapters • All ESTs and their alignments are stored in the db • Storage problem? • A web portal displays results from the db • Annotations are compared in the db • The atom is a set of assembled EST alignments • It takes the form of a database
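To make "annotations are compared in the db" concrete, here is a minimal sketch of the idea: ESTs, their genomic alignments, and loaded annotations live in tables, and comparison becomes a coordinate-overlap join. It uses Python's built-in `sqlite3` as a stand-in for MySQL. The three-table schema and all table and column names are hypothetical and much simpler than PASA's real schema.

```python
import sqlite3

# Hypothetical, heavily simplified schema; illustrative only, NOT PASA's real MySQL schema.
SCHEMA = """
CREATE TABLE est (
    est_id   TEXT PRIMARY KEY,
    sequence TEXT NOT NULL
);
CREATE TABLE alignment (
    align_id INTEGER PRIMARY KEY,
    est_id   TEXT NOT NULL REFERENCES est(est_id),
    contig   TEXT NOT NULL,
    lend     INTEGER NOT NULL,  -- leftmost genomic coordinate
    rend     INTEGER NOT NULL,  -- rightmost genomic coordinate
    pct_id   REAL NOT NULL
);
CREATE TABLE annotation (
    gene_id TEXT PRIMARY KEY,
    contig  TEXT NOT NULL,
    lend    INTEGER NOT NULL,
    rend    INTEGER NOT NULL
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def overlapping_alignments(conn: sqlite3.Connection, gene_id: str) -> list:
    """Return ids of EST alignments whose span overlaps the given annotated gene."""
    rows = conn.execute(
        """
        SELECT a.align_id
        FROM alignment a JOIN annotation g ON a.contig = g.contig
        WHERE g.gene_id = ? AND a.lend <= g.rend AND a.rend >= g.lend
        ORDER BY a.align_id
        """,
        (gene_id,),
    ).fetchall()
    return [r[0] for r in rows]


conn = build_db()
conn.executemany("INSERT INTO est VALUES (?, ?)", [("est1", "ACGT"), ("est2", "GGCC")])
conn.executemany(
    "INSERT INTO alignment (est_id, contig, lend, rend, pct_id) VALUES (?, ?, ?, ?, ?)",
    [("est1", "chr1", 100, 900, 98.5), ("est2", "chr1", 5000, 5600, 97.0)],
)
conn.execute("INSERT INTO annotation VALUES ('gene1', 'chr1', 500, 2000)")
print(overlapping_alignments(conn, "gene1"))  # [1]
```

Keeping everything in one database lets the web portal and the comparison step share a single source of truth. It also explains why storage becomes a question once every EST and alignment is kept.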

  5. To Illustrate [diagram] • In (from the user): conf, ESTs, genome annotation • Through: the PASA Pipeline • Out: annotation updates… and assemblies (a PASA db)

  6. So What Happens? • Three Major Phases: • Alignment • Assembly • Update

  7. Alignment • Typically, BLAT is the first pass • GMAP is an alternative • If these alignments fail validation… • Reasonable intron length • Percent ID • Single exon • …then Sim4 is run • If that also fails, bail on the EST
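The fallback logic on this slide can be sketched as follows: keep the first-pass (BLAT/GMAP) alignment if it validates, otherwise try Sim4, otherwise drop the EST. The thresholds are hypothetical and are not PASA's defaults. So are the `run_sim4` hook and the reading of "single exon" as a failure and of the Sim4 result as being re-validated. The real pipeline invokes external tools, which this sketch does not do.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical thresholds for illustration only; NOT PASA's actual defaults.
MIN_INTRON_LEN = 20
MAX_INTRON_LEN = 100_000
MIN_PCT_ID = 95.0


@dataclass
class Alignment:
    est_id: str
    exons: List[Tuple[int, int]]  # (start, end) genomic coords, inclusive, sorted by start
    pct_id: float                 # percent identity of the EST to the genome


def passes_validation(aln: Alignment) -> bool:
    """Apply the slide's three checks (this interpretation is an assumption)."""
    if aln.pct_id < MIN_PCT_ID:
        return False
    # Assumption: a single-exon alignment counts as a validation failure.
    if len(aln.exons) < 2:
        return False
    # Every intron (gap between consecutive exons) must have a plausible length.
    for (_, prev_end), (next_start, _) in zip(aln.exons, aln.exons[1:]):
        intron_len = next_start - prev_end - 1
        if not MIN_INTRON_LEN <= intron_len <= MAX_INTRON_LEN:
            return False
    return True


def choose_alignment(
    first_pass: Optional[Alignment],
    run_sim4: Callable[[], Optional[Alignment]],
) -> Optional[Alignment]:
    """Keep the BLAT/GMAP hit if valid; otherwise try Sim4; otherwise bail (None)."""
    if first_pass is not None and passes_validation(first_pass):
        return first_pass
    fallback = run_sim4()  # hypothetical hook standing in for a real Sim4 run
    if fallback is not None and passes_validation(fallback):
        return fallback
    return None  # bail on this EST


good = Alignment("est1", [(100, 200), (300, 450)], 98.0)
single = Alignment("est2", [(100, 200)], 99.0)
print(choose_alignment(good, lambda: None).est_id)  # est1
print(choose_alignment(single, lambda: None))       # None
```

Passing Sim4 in as a callable means the comparatively slow spliced aligner only runs for ESTs whose cheap first-pass alignment was rejected.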

  8. Assembly • “Maximal” assemblies • Most consistent alignments…RTFP • Subject to more validation • FL-cDNAs are considered putative novel transcripts • ESTs are possible extensions • ORFs are guessed from the assembled alignments • Longest ORF: should we think about this?
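The "longest ORF" heuristic questioned on this slide is simple to state: scan each reading frame for an ATG and the next in-frame stop codon, and keep the longest such span. The sketch below assumes the assembly's orientation is already known, so it checks the forward strand only. It also assumes standard stop codons and reports complete ORFs only. It is a generic illustration, not PASA's own ORF-calling code.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def longest_orf(seq: str) -> Tuple[int, int]:
    """Return (start, end), 0-based half-open, of the longest ATG..stop ORF
    on the forward strand, stop codon included. Returns (0, 0) if none exists."""
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this open stretch
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for the next ORF in this frame
    return best


s = "CCATGAAATAGGG"
start, end = longest_orf(s)
print(start, end, s[start:end])  # 2 11 ATGAAATAG
```

One reason to "think about this": longest-ORF picks purely by length. On a chimeric or intron-retaining assembly, it can prefer a spurious long frame over the true coding sequence.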

  9. Comparison • User supplies an annotation set • Pipeline marks “good” updates by: • Percent overlap • Percent ID (non-FL-cDNA assemblies) • Min ORF size • Max UTRs • All tweakable • So: better predictions, more annotations! • Another chance for us to rule the school
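The "good update" test on this slide amounts to a conjunction of tunable thresholds. The sketch below encodes the four listed criteria. Every field name and default value is hypothetical. "Max UTRs" is read here as a cap on the number of UTR exons, which is an assumption. Per the slide, the percent-ID check is skipped for FL-cDNA assemblies.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    pct_overlap: float  # % overlap between assembly and existing annotation
    pct_id: float       # % identity used in the comparison (exact measure left open)
    is_fl_cdna: bool    # does the assembly contain a full-length cDNA?
    orf_len: int        # ORF length in nucleotides
    n_utr_exons: int    # number of UTR exons (assumed meaning of "Max UTRs")


def is_good_update(
    c: Candidate,
    min_pct_overlap: float = 80.0,  # hypothetical default
    min_pct_id: float = 70.0,       # hypothetical default
    min_orf_len: int = 300,         # hypothetical default
    max_utr_exons: int = 2,         # hypothetical default
) -> bool:
    """Return True if a candidate passes every (tweakable) threshold."""
    if c.pct_overlap < min_pct_overlap:
        return False
    # Percent-ID check applies only to non-FL-cDNA assemblies, per the slide.
    if not c.is_fl_cdna and c.pct_id < min_pct_id:
        return False
    if c.orf_len < min_orf_len:
        return False
    if c.n_utr_exons > max_utr_exons:
        return False
    return True


print(is_good_update(Candidate(95.0, 60.0, True, 900, 1)))   # True
print(is_good_update(Candidate(95.0, 60.0, False, 900, 1)))  # False
```

Exposing each threshold as a keyword argument mirrors the "all tweakable" point: a stricter or looser run is just a different call.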

  10. Visuals!

  11. What do we want with this? • I’m working on it: • Use it to augment our predictions • Long pipeline: ESTs, FL-cDNAs -> iPE -> N-SCAN -> PASA • Use it to generate EST sequences • A different fork in the pipeline: Alignment -> Assembly -> ESTSEQ -> N-SCAN • An awesome alignment tool • Can incorporate Pairagon, etc. • Ideas?

  12. Caveats! • PASA’s algorithm is cubic in the number of ESTs • Awful for human • Brian wrote a faster algorithm • Human is still running (3-4 days?) • Who trusts ESTs anyway? • The seqclean tool can get rid of some junk • Any number of criteria can be added • Maybe N-SCAN tips the scales back?
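Why "cubic" is awful for human: with constant factors held fixed, the running time of an O(n³) step scales with the cube of the input-size ratio. The helper below only does that arithmetic. The EST counts in the example are made-up round numbers, not measurements from any PASA run.

```python
def relative_cost(n_old: int, n_new: int, exponent: int = 3) -> float:
    """Factor by which an O(n**exponent) step slows down when n grows from n_old to n_new."""
    return (n_new / n_old) ** exponent


print(relative_cost(1_000, 2_000))   # 8.0    (2x the ESTs -> 8x the time)
print(relative_cost(1_000, 10_000))  # 1000.0 (10x the ESTs -> 1000x the time)
```

A tenfold increase in ESTs therefore costs a thousandfold in time, which is why a faster algorithm and aggressive EST cleaning both matter at human scale.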

  13. More Caveats • Missing pieces: • Alignment to estseq • Update GTFs • Use Pairagon • Brian is not versioning well • But he might make me a developer (good)
