High performance bioinformatics

Group May 09-06 • Bryan McCoy • Kinit Patel • Tyson Williams • Advisor/Client: Zhao Zhang

What is Bioinformatics? • Genetic sequencing • Massive amounts of data • Many simple operations • Perfect for distributed computing

Problem • Current solutions are not realistically feasible • Too expensive: supercomputers, high-powered servers • Too slow: some inputs can take several days • Need for high-speed, low-cost solutions

Our Solution • Cell Processor • Based on Phase 1 • Cluster of PlayStation 3s • MPI (Message Passing Interface)

IBM Cell Broadband Engine • 1 Power Processing Element (PPE) • 8 Synergistic Processing Elements (SPEs) • Only 6 SPEs are accessible on a PlayStation 3 • 4 high speed rings for processor communication

DNAPenny • Compares DNA sequences from different species • Score indicates the evolutionary similarity between species • Branch and bound search algorithm

Functional requirements • FR1. Ported applications shall run on the Cell B.E. • FR2. The results returned shall be the same as the original program. • FR3. The applications shall return their runtime. • FR4. The applications shall execute in parallel on multiple Cell B.E.s.

Non-Functional Requirements • NF1. The Cells shall all run on the Linux OS. • NF2. The runtimes of the ported applications shall be shorter than those of the original applications. • NF3. The ported applications shall be coded in the C language.

Market Survey • Results of the survey point to a large speedup for computationally intensive programs. • Dr. Gaurav Khanna at the University of Massachusetts Dartmouth used a cluster of 8 PS3s to replace a supercomputer. • Universitat Pompeu Fabra, in Barcelona, deployed a BOINC system called PS3GRID in 2007 for collaborative biological computing.

Risk Assessment • Slow network speed • Software support • Limited RAM • Hardware Failure • Lower quality entertainment hardware • Limited prior experience • Software development schedule

Resource Requirements • 3 PlayStation 3s • High performance network switch • Cell programming books • Front node (desktop computer) • Time

Software Environment • Use Fedora 9 as the OS, as it is currently supported by the Cell SDK 3.1. • Use the command line for the user interface. • Use the IBM XLC compiler and/or the current GCC compiler.

Hardware Environment • 3 PlayStation 3s • High speed Crossbar switch • Private network • Front Node (desktop computer) • Proxy server • Network File Store (NFS)

I/O • Input: DNA sequences stored in a text file, as a ClustalW alignment in PHYLIP format, a standard format for biological applications. • Output: the parsimony score, the best trees, and the execution time. • The score and best trees are written to the screen and to text files. • The execution time is written to a CSV (comma-separated values) file.

Work Breakdown Structure

Work Schedule • Gantt chart

Deliverables • Source Code • Compiled Executable • Runtime Comparisons • Final Report • Poster • Final Presentation

Costs • Time: approximately 555 man-hours total, freely donated. Total cost $0. • Equipment: 3 PlayStation 3s (provided by client), crossbar router (provided by client), standard desktop computer (provided by department). Total cost $0.

Development: Initial Overview • Use MPI to distribute the program to the multiple PlayStations. • Each PlayStation would search one branch of the tree. • 1 function (supplement) took 90% of the runtime • Phase 1 ported this function to the SPEs

Development: Difficulties • Found a bug in supplement. • The bug did not affect results but did affect runtime. • We contacted the original developer, Dr. Felsenstein at the University of Washington, who fixed the bug. • The fix significantly improved runtime. • However, the fix negated all work done by Phase 1 as supplement no longer took a significant amount of runtime.

Development: Reworking • After the bug fix, no single function took a significant amount of runtime. • We decided to distribute branches of the tree search to different processors.

Development: Results • Completed our goals • Divided work among 3 PlayStation 3s. • Produced code that runs faster than in a comparable sequential environment. • Due to time constraints, we were not able to port the code to the SPEs.

Testing • Used a script to test multiple inputs. • Averaged the runtimes. • Used several different code revisions and machines to provide comparisons. • Projected the speedup that could be attained if the code were ported to the SPEs.

Results: Actual • Our current code is 20.76 times faster than it was at the beginning of the semester. • Surpassed our original projections, which assumed the use of the SPEs.

Results: MPI • The speedup from MPI was 2.78 on 3 nodes. • Close to the ideal speedup of 3.

Results: Comparison • Our final code came close to the performance of a high-powered desktop (Core 2 Quad at 2.66 GHz). • Our projected results indicate a speedup of 6.4.

Results: Projected • Using all SPEs, the speedup should be 145.35 • Assuming SPEs run as fast as the PPEs • Before SPE vectorization

Conclusions • Achieved our goal of using MPI to get runtime improvement. • Contributed a major fix to a widely used application. • Surpassed our initial runtime goal. • Projected results show an even larger runtime improvement still possible.

Acknowledgements • May08-24 group (Phase 1): Kyle Byerly, Shannon McCormick, Matt Rohlf, Bryan Venteicher • DNAPenny author: Dr. Felsenstein • Advisor: Zhao Zhang • Environment help: Steve Nystrom

Questions?
