
Swarms & Bundles: Bioinformatics & Biostatistics on Biowulf

Explore the use of swarm and bundle techniques on the Biowulf cluster for bioinformatics and biostatistics applications. Learn about GWAS, sequence analysis, statistical modeling, protein folding, molecular docking, tomographic reconstruction, and more. Discover how to run multiple independent processes in parallel and maximize computational efficiency.


Presentation Transcript


  1. Swarms and Bundles: Bioinformatics and Biostatistics on Biowulf. David Hoover, Scientific Computing Branch, Division of Computer System Services, CIT, NIH

  2. Embarrassingly Parallel Problems
     • GWAS, with huge numbers of SNPs
     • Sequence analysis, assembly, and mapping
     • Testing and validating statistical models
     • Protein folding and threading
     • Molecular docking and compound screening
     • Tomographic reconstruction

  3. Characterization of Surface Protein 3 from Malaria Parasite P. falciparum. Protein folding calculations with Rosetta++: 100,000 CPU hours. Tsai et al., Mol. Biochem. Parasitology, online preprint 2008

  4. How to run multiple independent processes in parallel [Diagram: 16 independent processes, each a command with its own input and output]

  5. Biowulf Cluster Batch System [Diagram: job1 to job16, each with its own batch script and its own output file, job1.out to job16.out]

  6. Swarm: biowulf% swarm -f file [Diagram: job1, job2, job3, job4 on Node 1, Node 2, Node 3, Node 4, with outputs job1.out, job2.out, job3.out, job4.out]

  7. Bundled Swarm: biowulf% swarm -f file -b 4 [Diagram: job1 on Node 1, with output job1.out]

  8. Swarm Facts
     • Written and maintained by Helix Systems Staff
     • swarm introduced in late 2000
     • 82% of all batch jobs run on the cluster since 2002 are swarm jobs
     • ~60% of all wall time spent on swarm jobs
     • swarm has been shared with clusters around the world

  9. Swarm World Records
     • Largest swarm: 683,445 commands
     • Largest bundle: 24,000 commands per CPU

  10. Future Challenges
     • How to deal with larger multicore nodes?
     [Diagram: Node 1, Node 2, Node 3]
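
The swarm and bundle ideas in slides 4 to 7 can be illustrated with a small local analogy. The Python sketch below is not how swarm itself is implemented, and it does not talk to the Biowulf batch system. It only assumes that a swarm is a list of independent commands and that a bundle size of 4 (as in `-b 4`) means four commands are grouped and run one after another on the same worker. The helper names `bundle`, `run_bundle`, and `run_swarm` are hypothetical, and the 16 toy commands stand in for real analysis jobs.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def bundle(commands, bundle_size):
    """Split commands into consecutive groups of at most bundle_size."""
    return [commands[i:i + bundle_size]
            for i in range(0, len(commands), bundle_size)]


def run_bundle(group):
    """Run the commands of one bundle sequentially and collect their stdout."""
    outputs = []
    for argv in group:
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        outputs.append(result.stdout.strip())
    return outputs


def run_swarm(commands, bundle_size=1, workers=4):
    """Run bundles in parallel; workers play the role of nodes (an assumption)."""
    groups = bundle(commands, bundle_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the bundles in its results.
        return list(pool.map(run_bundle, groups))


# 16 independent toy commands, mirroring the 16 processes on slide 4.
commands = [[sys.executable, "-c", f"print({i} * {i})"] for i in range(1, 17)]

# Bundle size 4 turns 16 commands into 4 bundles, one per worker.
results = run_swarm(commands, bundle_size=4, workers=4)
for number, outputs in enumerate(results, start=1):
    print(f"bundle {number}: {outputs}")
```

With a bundle size of 1, each command becomes its own unit of work, as in the plain swarm on slide 6. Larger bundle sizes trade parallelism for fewer, longer-running units, which is the trade-off the bundled swarm on slide 7 makes.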
