
A Parallel Solution to Global Sequence Comparisons

A Parallel Solution to Global Sequence Comparisons. CSC 583 – Parallel Programming By: Nnamdi Ihuegbu 12/19/03. Brief Introduction. Human Genome Project (and others) -> Vast amount of biological data Venture: Computer Science and Biology (BCB) -> Genetic Databases (map,genomic,proteomic)





Presentation Transcript


  1. A Parallel Solution to Global Sequence Comparisons CSC 583 – Parallel Programming By: Nnamdi Ihuegbu 12/19/03

  2. Brief Introduction • Human Genome Project (and others) -> vast amount of biological data • Venture: Computer Science and Biology (BCB) -> genetic databases (map, genomic, proteomic) • Expected completion of the human genome map: end of 2003 • Next stage: sequence comparison and sequence-to-protein function • Useful to pharmaceutical companies (computer-aided drug design, CADD; e.g., SKB's Relenza)

  3. Results - Sequence • Current sequencing technologies • Maxam-Gilbert (uses chemicals to cleave DNA at a specific base, producing fragments of different lengths) • Sanger (uses enzymatic synthesis to produce DNA fragments that terminate at a specific base, so each fragment's length marks that base's position)
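Both methods end with a "ladder" of fragments whose lengths encode base positions. Below is a minimal sketch of reading such a ladder under an idealized assumption: every fragment ends exactly at a known base, and lengths 1..n each occur once. Real traces need base-calling and error handling. The function name and the lane data are hypothetical.

```python
def read_sanger_ladder(fragments):
    """Rebuild a sequence from (length, terminal_base) pairs.

    Idealized model: every fragment ends exactly at the base that terminated it,
    and lengths 1..n each occur once, so shortest-to-longest order reads 5' -> 3'.
    """
    ordered = sorted(fragments, key=lambda frag: frag[0])
    if [length for length, _ in ordered] != list(range(1, len(ordered) + 1)):
        raise ValueError("ladder has missing or duplicate fragment lengths")
    return "".join(base for _, base in ordered)


# Hypothetical gel readout: one lane per terminating base, listing fragment lengths.
lanes = {"A": [2, 5], "C": [1], "G": [4], "T": [3, 6]}
fragments = [(length, base) for base, lengths in lanes.items() for length in lengths]
print(read_sanger_ladder(fragments))  # CATGAT
```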

  4. Derivation of nucleotide sequence from human chromosome

  5. Sequence Comparison Methods • Types of sequence comparisons/alignments • Global (“How similar are these two sequences?”) • Finds the best overall alignment between two sequences • 1970: Needleman and Wunsch (global, dynamic programming) • Shortcoming: can miss small regions of similarity within the two sequences • Local (“Which sequences in a database are most similar to this sequence?”) • Finds the best subsequence match between two sequences • 1981: Smith and Waterman (local, dynamic programming) • Shortcoming: computationally expensive, slow
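Both global and local methods rank candidate alignments by a score. The sketch below shows how one given global alignment is scored column by column. It assumes a simple linear scheme (match +1, mismatch -1, gap -2) that does not come from the slides, and `alignment_score` is a hypothetical name. Needleman-Wunsch searches every alignment for the one with the maximum such score. Smith-Waterman scores only the best-matching subsequences.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical scoring scheme


def alignment_score(top, bottom, gap_char="-"):
    """Score two equal-length aligned strings column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        if a == gap_char and b == gap_char:
            raise ValueError("a column cannot be a gap in both strings")
        if a == gap_char or b == gap_char:
            score += GAP          # insertion/deletion
        elif a == b:
            score += MATCH        # identical characters
        else:
            score += MISMATCH     # substitution
    return score


print(alignment_score("GATTACA", "GA-TACA"))  # 4
```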

  6. Results - Sequence

  7. Results - Sequence • Heuristic search (quick, approximate) • Quickly search for short “words” that match the sequence, then repeatedly extend a local search around each matched word until no further matches are found • FASTA (1988), BLAST (1990) • Shortcomings: approximate rather than exact; hits are judged by E-value (significant if < 0.05)
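The word-then-extend idea can be sketched loosely in the spirit of BLAST/FASTA. This version uses an exact k-mer index and an ungapped extension that stops once the score drops `x_drop` below its best ("X-drop"). Real BLAST adds neighborhood words, substitution matrices, gapped extension, and E-value statistics. All names and parameter values here are hypothetical.

```python
from collections import defaultdict


def find_word_hits(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)
    return [(i, j)
            for i in range(len(query) - k + 1)
            for j in index.get(query[i:i + k], [])]


def extend_hit(query, subject, i, j, k=3, match=1, mismatch=-1, x_drop=2):
    """Extend a word hit in both directions without gaps (X-drop stopping rule)."""
    def pair(a, b):
        return match if a == b else mismatch

    seed = sum(pair(query[i + d], subject[j + d]) for d in range(k))

    def best_extension(step_q, step_s):
        # Walk away from the word one column at a time; keep the best running score
        # and stop once the score falls more than x_drop below that best.
        best, best_len, score, n = 0, 0, 0, 0
        while 0 <= step_q(n) < len(query) and 0 <= step_s(n) < len(subject):
            score += pair(query[step_q(n)], subject[step_s(n)])
            n += 1
            if score > best:
                best, best_len = score, n
            elif best - score > x_drop:
                break
        return best, best_len

    right, right_len = best_extension(lambda n: i + k + n, lambda n: j + k + n)
    left, left_len = best_extension(lambda n: i - 1 - n, lambda n: j - 1 - n)
    return seed + left + right, query[i - left_len:i + k + right_len]


query, subject = "GATTACA", "CCGATTACACC"
hits = find_word_hits(query, subject)
print(max(extend_hit(query, subject, i, j) for i, j in hits))  # (7, 'GATTACA')
```

The result is fast but approximate. A region with no exact k-length word in common is never examined.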

  8. Results – Sequence (CSC Implementation) • Sequence alignment can be represented as matrices and graphs (using scoring rules and costs) • When the alignment is converted into a directed acyclic graph, the optimal alignment is the path with the maximum total score (a max-path problem; on a DAG it can be solved like a shortest-path problem with negated weights)
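The graph view can be sketched as follows. The grid nodes (i, j) are joined by down, across, and diagonal edges, and the best alignment score is the maximum-weight path from (0, 0) to (n, m). The path is found by relaxing nodes in topological order with Python's `graphlib.TopologicalSorter` (Python 3.9+). The scoring scheme (match +1, mismatch -1, gap -2) and the function names are assumptions, not from the slides.

```python
from graphlib import TopologicalSorter

MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical scoring scheme


def alignment_dag(s, t):
    """Incoming edges of the alignment DAG: {node: [(predecessor, weight), ...]}."""
    preds = {}
    for i in range(len(s) + 1):
        for j in range(len(t) + 1):
            incoming = []
            if i > 0:
                incoming.append(((i - 1, j), GAP))      # down edge: gap in t
            if j > 0:
                incoming.append(((i, j - 1), GAP))      # across edge: gap in s
            if i > 0 and j > 0:
                w = MATCH if s[i - 1] == t[j - 1] else MISMATCH
                incoming.append(((i - 1, j - 1), w))    # diagonal edge: align two characters
            preds[(i, j)] = incoming
    return preds


def max_path_score(s, t):
    """Maximum-weight path from (0, 0) to (len(s), len(t)), relaxed in topological order."""
    preds = alignment_dag(s, t)
    graph = {v: [u for u, _ in incoming] for v, incoming in preds.items()}
    best = {}
    for v in TopologicalSorter(graph).static_order():  # predecessors always come first
        incoming = preds[v]
        best[v] = 0 if not incoming else max(best[u] + w for u, w in incoming)
    return best[(len(s), len(t))]


print(max_path_score("GATTACA", "GATACA"))  # 4
```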

  9. Sequencing (CSC Implementation) • Can be solved with dynamic programming as a ‘running max score’ (RMS) • For each cell D(i,j), best RMS = max(west + gap1, north + gap2, NW + current_score), i.e. max(D(i,j-1) + gap1, D(i-1,j) + gap2, D(i-1,j-1) + score(i,j)) • Replace D(i,j) with that max • Needleman-Wunsch dynamic program: diagonal edge = character match/mismatch; down edge = gap in string 2; across edge = gap in string 1
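The running-max recurrence can be sketched as a sequential Needleman-Wunsch with traceback. It assumes a single linear gap cost for both strings (gap1 = gap2 = -2), match +1, and mismatch -1; these values are hypothetical, not taken from the slides. `D[i][j]` follows the slide's D(i,j), with string 1 along the rows.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical scoring scheme


def needleman_wunsch(s1, s2):
    """Return (score, aligned_s1, aligned_s2) for the best global alignment."""
    n, m = len(s1), len(s2)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * GAP                      # s1 prefix against all gaps
    for j in range(1, m + 1):
        D[0][j] = j * GAP                      # s2 prefix against all gaps

    def sub(i, j):
        return MATCH if s1[i - 1] == s2[j - 1] else MISMATCH

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Running max over west (gap in s1), north (gap in s2), north-west (align).
            D[i][j] = max(D[i][j - 1] + GAP, D[i - 1][j] + GAP, D[i - 1][j - 1] + sub(i, j))

    # Trace back from D[n][m], re-deriving which neighbour produced each max.
    a1, a2, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + sub(i, j):
            a1.append(s1[i - 1]); a2.append(s2[j - 1]); i -= 1; j -= 1
        elif i > 0 and D[i][j] == D[i - 1][j] + GAP:
            a1.append(s1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(s2[j - 1]); j -= 1
    return D[n][m], "".join(reversed(a1)), "".join(reversed(a2))


print(needleman_wunsch("GATTACA", "GATACA"))  # (4, 'GATTACA', 'GA-TACA')
```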

  10. Parallel Solution • Work allocated to slave processes in stripes
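Here is a minimal sketch of stripe allocation. It assumes the stripes are contiguous blocks of columns, one thread standing in for each slave. Each thread sweeps its stripe row by row and must wait for the cell just to its left, which arrives through a `queue.Queue` acting as a message channel. The column-block layout, the scoring values, and the names are all assumptions. Python threads are serialized by the GIL, so this shows the dependency pattern between stripes rather than a speedup.

```python
import threading
from queue import Queue

MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical scoring scheme


def striped_nw_score(s1, s2, workers=3):
    """Global alignment score with columns split into stripes, one thread per stripe."""
    n, m = len(s1), len(s2)
    # Stripe k owns columns bounds[k] .. bounds[k+1]-1 (columns 1..m overall).
    bounds = [1 + (m * k) // workers for k in range(workers + 1)]
    # channels[k] carries D[i][bounds[k]-1] (the cell just left of stripe k) row by row.
    channels = [Queue() for _ in range(workers + 1)]

    def worker(k):
        lo, hi = bounds[k], bounds[k + 1]
        prev = [j * GAP for j in range(lo, hi)]   # row 0 of this stripe
        left_prev = (lo - 1) * GAP                 # D[0][lo-1]
        for i in range(1, n + 1):
            # Column 0 is known; otherwise wait for the left neighbour's boundary cell.
            left = i * GAP if k == 0 else channels[k].get()
            cur = []
            for idx, j in enumerate(range(lo, hi)):
                west = cur[-1] if cur else left
                north_west = prev[idx - 1] if idx > 0 else left_prev
                diag = north_west + (MATCH if s1[i - 1] == s2[j - 1] else MISMATCH)
                cur.append(max(west + GAP, prev[idx] + GAP, diag))
            # Hand our rightmost cell (or the pass-through boundary) to the next stripe.
            channels[k + 1].put(cur[-1] if cur else left)
            prev, left_prev = cur, left

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    result = m * GAP  # D[0][m]; stays the answer when s1 is empty
    while not channels[workers].empty():
        result = channels[workers].get()  # the last value put is D[n][m]
    return result


print(striped_nw_score("GATTACA", "GATACA", workers=3))  # 4
```

Once every stripe has started, all of them run at the same time on different rows, forming a pipeline.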

  11. Parallel Solution (Cont’d) • Allocating Strips in SubMatrix [figure: example score submatrix divided into strips]

  12. Parallel Results • [figure: strip-allocated score submatrix] • Path: T/A -1, G/T -3, _/T -6 (total -10) • Each cell in each strip computes the maximum of its NEIGHBORS (running max)
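Each cell needs only its west, north, and north-west neighbours, so all cells on one anti-diagonal (i + j constant) are independent. The sketch below exploits that wavefront ordering. It computes each diagonal's cells concurrently with `concurrent.futures.ThreadPoolExecutor` and waits for the diagonal to finish before moving on. This is a related scheme, not the slides' stripe layout, and the scoring values and names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical scoring scheme


def wavefront_nw_score(s1, s2, max_workers=4):
    """Fill the DP matrix one anti-diagonal at a time, computing each diagonal's cells concurrently."""
    n, m = len(s1), len(s2)
    D = [[0] * (m + 1) for _ in range(n + 1)]

    def fill(cell):
        i, j = cell
        if i == 0 or j == 0:
            D[i][j] = (i + j) * GAP  # first row / column: all gaps
        else:
            diag = D[i - 1][j - 1] + (MATCH if s1[i - 1] == s2[j - 1] else MISMATCH)
            # Running max over the three already-finished neighbours.
            D[i][j] = max(D[i][j - 1] + GAP, D[i - 1][j] + GAP, diag)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for d in range(n + m + 1):
            # Cells with i + j == d depend only on diagonals d-1 and d-2.
            front = [(i, d - i) for i in range(max(0, d - m), min(n, d) + 1)]
            list(pool.map(fill, front))  # waits until the whole diagonal is done
    return D[n][m]


print(wavefront_nw_score("GATTACA", "GATACA"))  # 4
```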

  13. Improvements • Parallel Smith-Waterman (local: start an alignment and continue it while the score stays > 0, then end it; e.g., BLAZE, Stanford) • Pipeline implementation on an actual mesh topology • Other data structures for traversing the data in search of the best (max-score) path (e.g., specialized trees)
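Smith-Waterman differs from the global recurrence mainly by a floor of 0. An alignment "starts" wherever the score climbs above 0 and "ends" where the traceback returns to 0, which matches the rule described above. The sketch is sequential and assumes match +1, mismatch -1, gap -2 (hypothetical values). The same stripe or wavefront schemes could parallelize it.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical scoring scheme


def smith_waterman(s1, s2):
    """Return (score, aligned_s1, aligned_s2) for the best local alignment."""
    n, m = len(s1), len(s2)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        return MATCH if s1[i - 1] == s2[j - 1] else MISMATCH

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The 0 option lets a new local alignment start at any cell.
            H[i][j] = max(0, H[i - 1][j - 1] + sub(i, j), H[i - 1][j] + GAP, H[i][j - 1] + GAP)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest cell and stop as soon as the score returns to 0.
    a1, a2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            a1.append(s1[i - 1]); a2.append(s2[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + GAP:
            a1.append(s1[i - 1]); a2.append("-"); i -= 1
        else:
            a1.append("-"); a2.append(s2[j - 1]); j -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2))


print(smith_waterman("TTGATTACACC", "AAGATTACAGG"))  # (7, 'GATTACA', 'GATTACA')
```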

  14. Improvements (Cont’d) • Faster ways of comparing and aligning multiple sequences simultaneously (e.g., comparing a novel protein sequence to its family)
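A simple step toward comparing a novel sequence to a family is to run independent pairwise global comparisons against every member concurrently and rank the results. The sketch below does this. It is one-vs-many pairwise scoring, not true multiple sequence alignment. The family contents, scoring values, and function names are hypothetical. It uses `ThreadPoolExecutor` so the example runs anywhere; for real CPU speedup, `ProcessPoolExecutor` has the same `map` interface but needs a picklable module-level function in place of the lambda and an `if __name__ == "__main__":` guard.

```python
from concurrent.futures import ThreadPoolExecutor

MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical scoring scheme


def nw_score(s1, s2):
    """Global (Needleman-Wunsch) score keeping only two rows of the DP matrix."""
    prev = [j * GAP for j in range(len(s2) + 1)]
    for i in range(1, len(s1) + 1):
        cur = [i * GAP]
        for j in range(1, len(s2) + 1):
            diag = prev[j - 1] + (MATCH if s1[i - 1] == s2[j - 1] else MISMATCH)
            cur.append(max(cur[j - 1] + GAP, prev[j] + GAP, diag))
        prev = cur
    return prev[-1]


def rank_against_family(query, family, max_workers=4):
    """Score the query against every family member concurrently; best match first."""
    names = list(family)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(lambda name: nw_score(query, family[name]), names))
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


family = {"seqA": "GATTACA", "seqB": "GATACA", "seqC": "CCCCCCC"}  # hypothetical family
print(rank_against_family("GATTACA", family))
# [('seqA', 7), ('seqB', 4), ('seqC', -5)]
```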

  15. Any Questions?
