Hitchhiker’s Guide to PASA: An RFC on Integrating an Alien Program to Brent Lab

Bob Zimmermann, 11-07-06

Again, What Is PASA?
• Program to Assemble Spliced Alignments
• “PASA” == Core algorithm
  • Given genomic coords, what is a likely candidate for a full length transcript?
• “PASA Pipeline” == Heuristic Suite
  • Mish mash of programs; the topic of today’s presentation

PASA’s Original Goals
• Improvement of annotations through ESTs
  • How can ESTs imply additional transcripts?
  • How do we define likely updates?
  • How do we align?
  • Etc., etc.
• A lot of this work is usually hand-done
  • But it doesn’t all need to be
• Greatly reduces overhead

Preliminaries
• PASA Pipeline uses MySQL:
  • Annotations are loaded via adapters
  • All ESTs + alignments are stored in the db
    • Storage problem?
  • Web portal displays results via db
  • Annotations are compared in the db
• Atom is a set of assembled EST alignments
  • Takes the form of a database

To Illustrate
[Diagram labels: (user) conf, ESTs, genome annotation updates… (PASA Pipeline) assemblies (a PASA db)]

So What Happens?
• Three Major Phases:
  • Alignment
  • Assembly
  • Update

Alignment
• Typically, BLAT is the first pass.
  • GMAP is an alternative
• Alignments are validated on:
  • Reasonable intron length
  • Percent ID
  • Single Exon
• Should these not pass validation, sim4 is run
  • Bail on the EST otherwise
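The fallback logic on this slide can be sketched roughly as below. This is a minimal illustration, not PASA's actual code: the `Alignment` class, the function names, and both threshold values are hypothetical, and the "Single Exon" criterion is left out because the slide does not say how it is applied.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical thresholds: the slide names the criteria, not the values.
MAX_INTRON_LEN = 100_000
MIN_PERCENT_ID = 95.0


@dataclass
class Alignment:
    exons: List[Tuple[int, int]]  # sorted (start, end) genomic coords, inclusive
    percent_id: float


def intron_lengths(aln: Alignment) -> List[int]:
    """Length of each gap between consecutive exons."""
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(aln.exons, aln.exons[1:])]


def passes_validation(aln: Alignment) -> bool:
    """Check percent identity and intron length (assumed criteria values)."""
    if aln.percent_id < MIN_PERCENT_ID:
        return False
    return all(0 < n <= MAX_INTRON_LEN for n in intron_lengths(aln))


def choose_alignment(
    first_pass: Optional[Alignment],
    second_pass: Callable[[], Optional[Alignment]],
) -> Optional[Alignment]:
    """Keep the first-pass alignment if valid, else try the second aligner.

    `second_pass` stands in for a sim4 run; returning None means
    "bail on the EST".
    """
    if first_pass is not None and passes_validation(first_pass):
        return first_pass
    retry = second_pass()
    if retry is not None and passes_validation(retry):
        return retry
    return None
```

Passing the second aligner as a callable means the more expensive run only happens for ESTs whose first-pass alignment was rejected.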

Assembly
• “Maximal” assemblies
  • Most consistent alignments…RTFP
• Subject to more validation
• FL-cDNAs are considered putative novel
• ESTs are possible extensions
• Alignment ORFs are guessed
  • Longest ORF--should we think about this?
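For discussion of the "longest ORF" guess, here is a minimal sketch of what such a heuristic could look like. It is an assumption-laden illustration rather than PASA's implementation: it scans only the forward strand, requires an ATG start and an in-frame stop, and the function name `longest_orf` is made up for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Return the longest ATG..stop ORF (stop included) over three forward frames.

    Returns an empty string when no complete ORF is found.
    """
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None
    return best
```

Whether the longest ORF is the right guess is the open question on the slide. A partial EST assembly may not contain the true start or stop at all, which a rule like this cannot detect.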

Comparison
• User supplies an annotation set
• Pipeline marks “good” updates
  • Percent overlap
  • Percent ID (non-flcDNA assemblies)
  • Min ORF size
  • Max UTRs
  • All tweakable
• SO: Better predictions, more annotations!
• Another chance for us to rule the school
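The "good update" criteria could be expressed as a tweakable filter along these lines. Everything here is a hypothetical sketch: the field names, the default thresholds, and the reading of "Max UTRs" as a cap on UTR exon count are assumptions, not PASA's actual parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # All values are placeholders; the slide only says they are tweakable.
    min_percent_overlap: float = 50.0
    min_percent_id: float = 70.0
    min_orf_len: int = 300
    max_utr_exons: int = 2


@dataclass
class UpdateCandidate:
    percent_overlap: float  # overlap between assembly and existing annotation
    percent_id: float       # identity between old and proposed gene model
    is_fl_cdna: bool        # assembly contains a full-length cDNA
    orf_len: int            # ORF length in nucleotides
    utr_exons: int          # assumed meaning of "Max UTRs"


def is_good_update(c: UpdateCandidate, t: Thresholds = Thresholds()) -> bool:
    """Apply each criterion in turn; percent ID is skipped for FL-cDNA assemblies."""
    if c.percent_overlap < t.min_percent_overlap:
        return False
    if not c.is_fl_cdna and c.percent_id < t.min_percent_id:
        return False
    if c.orf_len < t.min_orf_len:
        return False
    return c.utr_exons <= t.max_utr_exons
```

Keeping the thresholds in one object makes it easy to run the same comparison under stricter or looser settings and compare the sets of accepted updates.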

Visuals!

What do we want with this?
• I’m working on it:
  • Use to augment our predictions
    • Long pipeline: ESTs, flcDNAs -> iPE -> N-SCAN -> PASA
  • Use to generate EST sequences
    • Different fork in the pipeline: Alignment -> Assembly -> ESTSEQ -> N-SCAN
• An awesome alignment tool
  • Can incorporate Pairagon, etc.
• Ideas?

Caveats!
• PASA’s algorithm is cubic in # of ESTs
  • Awful for human
  • Brian wrote a faster algorithm
  • Still running on human (3-4 days?)
• Who trusts ESTs anyway?
  • seqclean tool can get rid of some junk
  • any number of criteria can be added
  • maybe N-SCAN tips the scales back?
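To make the cubic-scaling caveat concrete, here is a back-of-the-envelope extrapolation. It assumes runtime grows exactly as n³ with no constant overhead, which is an idealization. The function name and the numbers in the usage note are illustrative, not measurements.

```python
def cubic_runtime(base_n: int, base_seconds: float, n: int) -> float:
    """Extrapolate runtime from one measured point, assuming time ~ n**3."""
    return base_seconds * (n / base_n) ** 3
```

For example, if 1,000 ESTs took 1 second, `cubic_runtime(1000, 1.0, 2000)` gives 8 seconds: doubling the input costs eight times as much, and a tenfold increase costs a thousand times as much.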

More Caveats
• Missing pieces:
  • Alignment to estseq
  • Update gtfs
  • Use Pairagon
• Brian is not versioning well
  • But he might make me a developer (good)
