
Performance Profiling of NGS Genome Assembly Algorithms


Presentation Transcript


  1. Performance Profiling of NGS Genome Assembly Algorithms
  Alex Ropelewski, Pittsburgh Supercomputing Center
  ropelews@psc.edu, 412-268-4960

  2. NGS Assembly Algorithm: de Bruijn Graph
  Aligned 3-mers: ATG TGG GGC GCG CGT GTG TGC GCA CAA AAT
  Genome: ATGGCGTGCAAT
  [Slide diagram: a de Bruijn graph whose nodes are the 2-mers AT, TG, GG, GC, CG, GT, CA, AA and whose numbered edges are the ten 3-mers above; the genome is assembled by following an Eulerian cycle, with reads represented as edges.]
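To make the construction on this slide concrete, here is a minimal Python sketch (illustrative only, not code from Velvet or Allpaths-LG) that breaks the example genome ATGGCGTGCAAT into 3-mers, builds the de Bruijn graph with 2-mers as nodes and reads as edges, and reassembles the sequence by walking an Eulerian cycle with Hierholzer's algorithm. All function names here are assumptions made for the example.

    from collections import defaultdict

    def kmerize(genome, k=3):
        """The slide's example reads: every overlapping k-mer of the genome."""
        return [genome[i:i + k] for i in range(len(genome) - k + 1)]

    def build_de_bruijn(kmers):
        """Nodes are (k-1)-mers; each read contributes one prefix -> suffix edge."""
        graph = defaultdict(list)
        for kmer in kmers:
            graph[kmer[:-1]].append(kmer[1:])
        return graph

    def eulerian_cycle(graph, start):
        """Hierholzer's algorithm: walk until stuck, then splice in sub-cycles."""
        adj = {node: list(targets) for node, targets in graph.items()}
        stack, cycle = [start], []
        while stack:
            node = stack[-1]
            if adj.get(node):
                stack.append(adj[node].pop(0))   # consume edges in input order
            else:
                cycle.append(stack.pop())
        return cycle[::-1]

    genome = "ATGGCGTGCAAT"
    reads = kmerize(genome)          # ATG TGG GGC GCG CGT GTG TGC GCA CAA AAT
    graph = build_de_bruijn(reads)
    cycle = eulerian_cycle(graph, "AT")
    assembled = cycle[0] + "".join(node[-1] for node in cycle[1:])
    print(assembled)                 # one valid reconstruction of the genome

Note that repeated 2-mers (here TG and GC) admit more than one Eulerian cycle, so a reconstruction that uses every read exactly once is not necessarily the original genome; real assemblers rely on coverage and read-pair information to resolve such repeats.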

  3. Program Characteristics
  • 2 codes of interest:
    • Allpaths-LG: designed for assembling large genomes (mostly C++; pipeline uses make)
    • Velvet: used frequently for small genomes (written in C; uses some OpenMP)
  • Both codes are:
    • memory intensive
    • time intensive
    • partially parallelized

  4. Desired Profile Information
  • For each program/step in the assembly pipeline:
    • time and memory consumption
    • identification of serial and parallel steps
    • quantify I/O characteristics
    • quantify how many times each step is run
  • For the most time-consuming and most frequently called programs/steps:
    • time consumed by each function
    • how many times each function is called
    • quantify I/O characteristics
    • identify parallel steps and examine scaling
    • describe the main memory consumers
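The first set of measurements on this slide (per-step time and memory, plus run counts) can be gathered without touching the assembler source by wrapping each pipeline command. Below is a minimal Python sketch along those lines; the step names and binaries are hypothetical placeholders, and on Linux ru_maxrss reports the peak resident set size of finished child processes in kilobytes.

    import resource
    import subprocess
    import time
    from collections import Counter

    run_counts = Counter()

    def profile_step(name, cmd):
        """Run one pipeline step; report wall time and peak child memory so far."""
        run_counts[name] += 1
        start = time.perf_counter()
        subprocess.run(cmd, check=True)          # the step itself (serial or parallel)
        elapsed = time.perf_counter() - start
        # ru_maxrss is the peak resident set size (KB on Linux) over all child
        # processes waited for so far -- a running maximum, not a per-step value.
        peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
        print(f"{name}: {elapsed:.1f} s wall time, peak child RSS so far {peak_kb} KB")

    # Hypothetical pipeline steps: the real Velvet / Allpaths-LG commands and
    # arguments would be substituted here.
    profile_step("build_graph", ["./build_graph", "reads.fastq"])
    profile_step("traverse_graph", ["./traverse_graph", "graph.tmp"])
    print("times each step was run:", dict(run_counts))

The second set of measurements on the slide (per-function time and call counts inside the most expensive steps) would instead come from compiler-assisted profilers such as gprof, which report exactly those quantities for C and C++ codes.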

  5. General Outcome
  Where should the optimization effort be focused?
  • Are there serial optimizations?
  • Are there additional candidates for parallelization?
  • Can the existing parallelization be improved?
  • Can the I/O be improved?
  • Are there memory performance issues to address?
  • Something else?
