Performance Profiling of NGS Genome Assembly Algorithms

Alex Ropelewski
Pittsburgh Supercomputing Center
ropelews@psc.edu
412-268-4960

NGS Assembly Algorithm: de Bruijn Graph
Aligned 3-mers: ATG TGG GGC GCG CGT GTG TGC GCA CAA AAT
[Figure: de Bruijn graph whose nodes are the 2-mers AT, TG, GG, GC, CG, GT, CA, AA and whose numbered edges are the reads 1.ATG, 2.TGG, 3.GGC, 4.GCG, 5.CGT, 6.GTG, 7.TGC, 8.GCA, 9.CAA, 10.AAT]
Genome: ATGGCGTGCAAT, assembled via an Eulerian cycle (reads represented as edges)
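To make the slide's example concrete, here is a minimal Python sketch (not code from Velvet or Allpaths-LG) that builds the de Bruijn graph from the ten aligned 3-mers, treating each read as an edge from its 2-mer prefix to its 2-mer suffix, and then walks every edge exactly once with Hierholzer's algorithm. The names build_de_bruijn, eulerian_walk, and spell are hypothetical, and the sketch assumes an error-free, fully covered set of k-mers.

```python
from collections import defaultdict


def build_de_bruijn(kmers):
    """Each k-mer becomes an edge from its (k-1)-mer prefix to its suffix."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_walk(graph, start):
    """Hierholzer's algorithm: return a node sequence that uses every edge
    exactly once, assuming such a walk exists from `start`."""
    # Reverse each successor list so pop() takes successors in insertion order;
    # copying also leaves the caller's graph untouched.
    remaining = {node: list(reversed(succ)) for node, succ in graph.items()}
    stack, walk = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            # Follow an unused outgoing edge.
            stack.append(remaining[node].pop())
        else:
            # No unused edges left here: finalize this node.
            walk.append(stack.pop())
    return walk[::-1]


def spell(walk):
    """Glue consecutive (k-1)-mers that overlap by all but one character."""
    return walk[0] + "".join(node[-1] for node in walk[1:])


if __name__ == "__main__":
    reads = "ATG TGG GGC GCG CGT GTG TGC GCA CAA AAT".split()
    walk = eulerian_walk(build_de_bruijn(reads), start="AT")
    print(spell(walk))  # ATGGCGTGCAAT
```

Because the 2-mers TG and GC each occur twice, this graph has more than one Eulerian cycle: visiting successors in a different order yields ATGCGTGGCAAT, which contains exactly the same ten 3-mers. Real assemblers must resolve such repeats, for example with longer k-mers or read-pair information.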

Program characteristics
• Two codes of interest:
  • Allpaths-LG: designed for assembling large genomes (mostly C++; the pipeline is driven by make)
  • Velvet: frequently used for small genomes (written in C; uses some OpenMP)
• Both codes are memory intensive and time intensive, and both have some parallelization

Desired Profile Information
• For each program/step in the assembly pipeline:
  • Time and memory consumption
  • Identification of serial and parallel steps
  • Quantify I/O characteristics
  • Quantify how many times each step is run
• For the most time-consuming and most frequently called programs/steps:
  • Time consumed by each function
  • How many times each function is called
  • Quantify I/O characteristics
  • Identify parallel steps and examine scaling
  • Describe the main memory consumers
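For the per-step time and memory figures, one lightweight approach on Linux is to launch each pipeline step as a child process and read that child's resource usage from os.wait4. The sketch below is an assumption rather than part of either pipeline: profile_step is a hypothetical helper, and it assumes a POSIX system where ru_maxrss is reported in kilobytes (as on Linux). A CPU-to-wall-time ratio well above 1 hints that a step runs in parallel, and the block I/O counters give a first, coarse look at I/O.

```python
import os
import sys
import time


def profile_step(name, argv):
    """Run one pipeline step (hypothetical helper) as a child process and
    return its wall time, CPU time, peak memory, and block I/O counts.
    POSIX only; assumes Linux units for ru_maxrss (kilobytes)."""
    start = time.perf_counter()
    # Launch the step; posix_spawnp searches PATH for argv[0].
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    # wait4 reaps this specific child and returns its own resource usage.
    _, status, usage = os.wait4(pid, 0)
    wall = time.perf_counter() - start
    cpu = usage.ru_utime + usage.ru_stime
    return {
        "step": name,
        "exit_code": os.waitstatus_to_exitcode(status),
        "wall_s": wall,
        "cpu_s": cpu,
        # CPU/wall well above 1 suggests the step used several cores.
        "cpu_per_wall": cpu / wall if wall > 0 else 0.0,
        # Linux reports ru_maxrss in kilobytes (macOS uses bytes).
        "peak_rss_mib": usage.ru_maxrss / 1024,
        # Filesystem block reads/writes: a coarse I/O indicator.
        "blocks_in": usage.ru_inblock,
        "blocks_out": usage.ru_oublock,
    }


if __name__ == "__main__":
    # Stand-in step for demonstration: a child that allocates about 50 MiB.
    demo = [sys.executable, "-c", "x = b'x' * (50 * 1024 * 1024)"]
    print(profile_step("demo", demo))
```

For Velvet, the argv lists passed in would be the velveth and velvetg invocations; function-level detail for the C/C++ code (time and call counts per function) would instead come from a profiler such as gprof or perf.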

General Outcome
• Where should the optimization effort be focused?
• Are there serial optimizations?
• Are there additional candidates for parallelization?
• Can the existing parallelization be improved?
• Can the I/O be improved?
• Are there memory performance issues to address?
• Something else?
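Once per-step timings exist, Amdahl's law offers a rough way to weigh the serial-optimization and parallelization questions above. The helpers below are a hypothetical sketch: amdahl_speedup bounds the speedup when a fraction p of the single-worker runtime scales perfectly over N workers, S(N) = 1 / ((1 - p) + p/N), and karp_flatt_serial_fraction estimates the serial fraction actually observed from a 1-worker and an N-worker timing (the Karp-Flatt metric). The numbers in the demo are illustrative, not measurements.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: best-case speedup when only `parallel_fraction` of the
    single-worker runtime scales perfectly across `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def karp_flatt_serial_fraction(t_one, t_many, workers):
    """Karp-Flatt metric: the serial fraction implied by measured runtimes on
    1 worker (t_one) and on `workers` workers (t_many)."""
    if workers < 2:
        raise ValueError("workers must be at least 2")
    speedup = t_one / t_many
    return (1.0 / speedup - 1.0 / workers) / (1.0 - 1.0 / workers)


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from Velvet or Allpaths-LG.
    print(amdahl_speedup(0.9, 8))  # about 4.71
    print(karp_flatt_serial_fraction(100.0, 21.25, 8))  # about 0.1
```

For example, if 10% of a step's runtime stays serial, its speedup can never exceed 10 no matter how many cores are added, so shrinking that serial part may pay off more than adding threads.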
