
Object-oriented Design and Programming

A walk through the object-oriented design of a program that locates mRNA sequences on the mouse genome. Sequences are read from FASTA files, and matches are found with a localization program such as BLAT, SSAHA, or MegaBLAST. Each match is defined by a (start, end) pair on the mRNA and a (chromosome, start, end) tuple on the genome.


Presentation Transcript


  1. Object-oriented Design and Programming
     Conrad Huang
     PC204, Fall 2004

  2. Requirements
     • I have a bunch of mRNA sequences and I want to know where they are located on the mouse genome.
     • I want a fast method because I plan to do this with lots of data sets.
     This is probably better than most requirement statements that you’re likely to get.

  3. Specification
     • Genomic and mRNA sequences are stored in FASTA format
     • Matches between genomic and mRNA (sub)sequences are found using a localization program, e.g., BLAT, SSAHA or MegaBLAST
     • Each match is defined by a (start, end) pair on the mRNA and a (chromosome, start, end) 3-tuple on the genomic sequence
     • The genomic location of an mRNA sequence is defined by a set of matches that maximally covers the mRNA
     Imprecise, but workable. Needs statement of what constitutes acceptable results.

  4. Design (OO)
     Objects
     • FASTA file
     • Genomic sequence
     • mRNA sequence
     • Match
     • Genomic location
     Operations
     • Read sequences from file
     • Get matches from localization program
     • Collate matches into genomic location
     Not uh-oh. O-O.

  5. Relationships
     [Diagram with boxes labeled: FASTA file, FASTA file, Genomic sequences, mRNA sequences, Matches, Genomic locations]
     Arrow direction indicates reference or composition.

  6. Classes
     • FastaFile
         • used for reading both genomic and mRNA sequences
     • Sequence
         • represents either genomic or mRNA sequence
     • Match
         • obtained from localization program output
     • GenomicLocation
         • either obtained from localization program output (BLAT) or composed from matches using our own algorithm (SSAHA or MegaBLAST)
     • Localization program output parser
     Some classes, like FastaFile, can serve as the implementation of more than one concept (file of genomic sequences and file of mRNA sequences).

  7. Class Methods
     • FastaFile
         • read(filename)
             • parse FASTA file content into a list of Sequence instances
     • Sequence
         • None
             • data derived by localization program parser
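A minimal sketch of what `FastaFile.read` could look like, assuming a plain-text FASTA file in which every record begins with a `>` header line. The attribute names (`name`, `description`, `residues`, `sequences`) and the `parse` and `_make_sequence` helpers are assumptions made for illustration; the slides do not define them.

```python
class Sequence:
    """A genomic or mRNA sequence (attribute names are assumptions)."""

    def __init__(self, name, description, residues):
        self.name = name
        self.description = description
        self.residues = residues


class FastaFile:
    """Holds the sequences parsed from one FASTA file."""

    def __init__(self):
        self.sequences = []

    def read(self, filename):
        """Parse FASTA file content into a list of Sequence instances."""
        with open(filename) as f:
            self.sequences = self.parse(f)
        return self.sequences

    @staticmethod
    def parse(lines):
        """Hypothetical helper: parse any iterable of text lines."""
        sequences = []
        header = None
        chunks = []
        for line in lines:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    sequences.append(_make_sequence(header, chunks))
                header = line[1:]
                chunks = []
            elif header is not None:
                chunks.append(line)
        if header is not None:
            sequences.append(_make_sequence(header, chunks))
        return sequences


def _make_sequence(header, chunks):
    # The first word of the header is the name; the rest is a description.
    name, _, description = header.partition(" ")
    return Sequence(name, description, "".join(chunks))
```

Separating `parse` from `read` is a design choice of this sketch: it lets the parsing logic be tested on a list of strings without touching the file system.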

  8. Class Methods (cont.)
     • LocalizationOutputParser
         • localize(sequence)
             • run localization program and parse output
             • if using BLAT, identify genomic location and matches from output
             • if using SSAHA or MegaBLAST, get list of matches from output and compute genomic location
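One way to structure `localize(sequence)` is a base class that fixes the "run, then parse" order and leaves both steps to program-specific subclasses. The sketch below is an assumption about how that could be arranged, not the course's implementation. The method names `run_program` and `parse_output` are hypothetical, and `CannedParser` reads a made-up tab-separated format; it does not invoke or parse the output of BLAT, SSAHA or MegaBLAST.

```python
from abc import ABC, abstractmethod


class LocalizationOutputParser(ABC):
    """Template: run a localization program, then parse its output."""

    def localize(self, sequence):
        output = self.run_program(sequence)
        return self.parse_output(output)

    @abstractmethod
    def run_program(self, sequence):
        """Return the program's raw text output for one sequence."""

    @abstractmethod
    def parse_output(self, output):
        """Turn raw output into a list of match tuples."""


class CannedParser(LocalizationOutputParser):
    """Hypothetical stand-in for testing; not a real program's format.

    Assumes one match per line with tab-separated fields:
    mrna_start, mrna_end, chromosome, genome_start, genome_end.
    """

    def __init__(self, canned_output):
        self.canned_output = canned_output

    def run_program(self, sequence):
        # A real subclass would launch the external program here.
        return self.canned_output

    def parse_output(self, output):
        matches = []
        for line in output.splitlines():
            if not line.strip():
                continue
            m_start, m_end, chrom, g_start, g_end = line.split("\t")
            matches.append(
                (int(m_start), int(m_end), chrom, int(g_start), int(g_end))
            )
        return matches
```

A stand-in like `CannedParser` also lets the rest of the pipeline be exercised before any real parser exists.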

  9. Class Methods (cont.)
     • Match
         • None
             • data filled in by LocalizationOutputParser
     • GenomicLocation
         • None
             • data filled in by LocalizationOutputParser
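Since `Match` and `GenomicLocation` mostly carry data, they could be sketched as dataclasses built from the specification's (start, end) pair and (chromosome, start, end) 3-tuple. The field names and the half-open `[start, end)` coordinate convention are assumptions, and `covered_length` is a hypothetical convenience method that the slides do not list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Match:
    # (start, end) on the mRNA; half-open coordinates are an assumption.
    mrna_start: int
    mrna_end: int
    # (chromosome, start, end) on the genomic sequence.
    chromosome: str
    genome_start: int
    genome_end: int


@dataclass
class GenomicLocation:
    matches: List[Match] = field(default_factory=list)

    def covered_length(self):
        """Count mRNA positions covered by at least one match."""
        intervals = sorted((m.mrna_start, m.mrna_end) for m in self.matches)
        total = 0
        current_end = None
        for start, end in intervals:
            if current_end is None or start > current_end:
                # Disjoint from everything seen so far.
                total += end - start
                current_end = end
            elif end > current_end:
                # Overlaps or touches: count only the new part.
                total += end - current_end
                current_end = end
        return total
```

For example, matches covering mRNA positions 0 to 100 and 50 to 150 overlap by 50 positions, so together they cover 150 positions rather than 200.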

  10. Instance Attributes
     • Some attributes are dictated by the class
         • FastaFile must have a list or dictionary of sequences
         • These may be accessible externally
     • Some attributes are dictated by the operation
         • Reading a FASTA file might use a variable to keep track of the last line read
         • These are often for internal use only
     • Defining actual attributes in our classes is left as an exercise for the reader
     Yeah, I ran out of steam here, and there are already too many slides.

  11. Module Organization
     • sequence.py
         • defines FastaFile and Sequence
     • location.py
         • defines Match and GenomicLocation
     • blat.py, ssaha.py, megablast.py
         • define BlatParser, SsahaParser and MegablastParser, respectively
     Keep classes that cannot stand independently in the same module.

  12. Design (finishing touches)
     • Select implementation algorithms
         • file parsers (FASTA or localization output) are sometimes available on the Internet
         • recursion and result caching for generating genomic location from list of matches
     If this task is too big, you need to partition the problem further.
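The "recursion and result caching" idea can be illustrated with a small sketch. It rests on one reading of "maximally covers": choose matches that do not overlap on the mRNA and whose total length is as large as possible. That interpretation, the function name `best_cover`, and the half-open `[start, end)` coordinates are all assumptions; the slides do not specify the algorithm.

```python
from bisect import bisect_left
from functools import lru_cache


def best_cover(matches):
    """Return (covered_length, chosen_matches) for non-overlapping matches.

    Each match is a tuple whose first two items are the mRNA start and
    end; any further items (e.g. chromosome) are carried along untouched.
    """
    ordered = sorted(matches, key=lambda m: (m[0], m[1]))
    starts = [m[0] for m in ordered]

    @lru_cache(maxsize=None)  # result caching: each index is solved once
    def solve(i):
        # Best result using only ordered[i:].
        if i >= len(ordered):
            return 0, ()
        start, end = ordered[i][0], ordered[i][1]
        # Option 1: skip match i.
        skip = solve(i + 1)
        # Option 2: take match i, then continue from the first later
        # match that starts at or after this one's end.
        nxt = bisect_left(starts, end, i + 1)
        rest_length, rest = solve(nxt)
        take = (end - start + rest_length, (ordered[i],) + rest)
        return take if take[0] > skip[0] else skip

    return solve(0)
```

Given the mRNA intervals (0, 100), (50, 150), (100, 200) and (190, 260), this sketch selects (0, 100) and (100, 200) for a covered length of 200. Because `solve` recurses once per match, a very long match list could exceed Python's default recursion limit; an iterative version of the same table would avoid that.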

  13. Implementation
     • Coding, testing and debugging
         • Start with the class skeletons
         • Write test code for each module
         • Test modules separately when possible
         • Test early and often
     Well, it’s about time! Project is due in 10 minutes.
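As an illustration of "write test code for each module", here is a small sketch using Python's standard `unittest` module. The function under test, `parse_header`, is a hypothetical helper invented for this example; in a real project the tests would import whatever the module actually defines.

```python
import unittest


def parse_header(line):
    """Split a FASTA header line into (name, description)."""
    if not line.startswith(">"):
        raise ValueError("FASTA header must start with '>'")
    name, _, description = line[1:].strip().partition(" ")
    return name, description


class ParseHeaderTest(unittest.TestCase):
    def test_name_and_description(self):
        self.assertEqual(
            parse_header(">m1 mouse mRNA\n"), ("m1", "mouse mRNA")
        )

    def test_name_only(self):
        self.assertEqual(parse_header(">m1"), ("m1", ""))

    def test_rejects_non_header(self):
        with self.assertRaises(ValueError):
            parse_header("ACGT")
```

If this were saved as a module, running `python -m unittest` against it would discover and run the three tests. Keeping each test small makes it easier to test modules separately, as the slide suggests.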

  14. Roll Out
     • Release product to user
         • include User’s Guide
             • description of options
             • example usage
             • test cases
     The code is, of course, already completely documented.
