Performance Optimization of Clustal W: Parallel Clustal W, HT Clustal and MULTICLUSTAL
Arunesh Mishra, CMSC 838 Presentation
Authors: Dmitri Mikhailov, Haruna Cofer, Roberto Gomperts (SGI)
Problem Statement
• Multiple Sequence Alignment (MSA)
  • Basis for phylogenetic analysis: infer homology relationships
  • Building protein families: a conserved region may imply common function
  • Aids in function/structure prediction of new proteins
  • Global MSA: Clustal W
  • Is it computationally expensive? Yes, for 100 sequences.
• Goal: Parallelize Clustal W
  • Clustal W takes hours for 100 or more sequences
  • Parallelization is possible for the algorithm
• Contribution of the paper
  • Parallel Clustal W
    • Parallel version of basic Clustal W
  • HT Clustal
    • Parallelize heterogeneous Multiple Sequence Alignment problems
  • MULTICLUSTAL
    • Parallel version of an optimization on Clustal W
CMSC 838T – Presentation
Talk Overview
• Overview of talk
  • Motivation
  • Background
    • Sequential Clustal W
  • Parallel Clustal W
  • HT Clustal
    • Problem Statement
    • Optimizations
  • MULTICLUSTAL
    • Sequential Algorithm
    • Optimizations
  • Observations
Introduction
• Sequential Clustal W Algorithm
  • Given N sequences of length M each
  • Pairwise Alignment (PW)
    • Creates an N x N distance matrix based on pairwise alignment scores
    • Evolutionary distance
  • Guide Tree (GT) construction (phylogenetic tree)
    • Uses the neighbor-joining algorithm
  • Progressive Multiple Alignment (PA)
    • Uses the guide tree to align closely related pairs of sequences
    • Progressively aligns the next sequence to the existing alignment
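
The first stage above can be sketched as follows. This is a minimal illustration only: `toy_distance` is a hypothetical stand-in (fraction of mismatched positions between equal-length strings), whereas Clustal W derives its distances from real pairwise alignment scores.

```python
from itertools import combinations


def toy_distance(a: str, b: str) -> float:
    """Hypothetical distance: fraction of mismatched positions.

    Assumption for illustration; Clustal W uses pairwise alignment scores.
    """
    if len(a) != len(b):
        raise ValueError("this toy distance assumes equal-length sequences")
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


def distance_matrix(seqs):
    """Fill a symmetric N x N matrix from the N(N-1)/2 distinct pairs."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = toy_distance(seqs[i], seqs[j])
        dist[i][j] = dist[j][i] = d
    return dist


seqs = ["ACGT", "ACGA", "TCGA"]
dist = distance_matrix(seqs)
```

Each entry depends on only one pair of sequences, which is why this stage is a natural candidate for parallelization.
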
Parallel Clustal W
• Problem Statement
  • Parallelize the sequential Clustal W
• Execution time breakdown (figure not reproduced)
  • PW = pairwise alignment, GT = guide tree, PA = progressive alignment
Parallel Clustal W
• Pairwise Alignment Stage
  • N(N-1)/2 pairwise alignments
  • Send them randomly to different processors
    • Random, because the jobs have different loads
    • Random assignment also produces a statistically uniform distribution (over a large set of jobs)
  • 1.8X speedup achieved on a 1000 sequence MSA with 8 CPUs
• Guide Tree Stage
  • Parallelize "find closest neighbors from distance matrix"
    • Used in the neighbor-joining algorithm
    • Find the minimum element of each row concurrently
    • Use these row minima to find the minimum element of the matrix
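
The two ideas on this slide can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation: `assign_pairs` and `closest_pair` are hypothetical names, the round-robin deal after shuffling is one possible way to realize "send randomly", and a thread pool stands in for the processors.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def assign_pairs(n_seqs, n_procs, seed=0):
    """Shuffle the N(N-1)/2 pairwise jobs and deal them out to processors."""
    pairs = list(combinations(range(n_seqs), 2))
    random.Random(seed).shuffle(pairs)
    return [pairs[p::n_procs] for p in range(n_procs)]


def row_minimum(indexed_row):
    """Return (value, row, column) of the smallest off-diagonal entry in a row."""
    i, row = indexed_row
    j = min((k for k in range(len(row)) if k != i), key=lambda k: row[k])
    return row[j], i, j


def closest_pair(matrix):
    """Compute row minima concurrently, then reduce them to the matrix minimum."""
    with ThreadPoolExecutor() as pool:
        row_minima = list(pool.map(row_minimum, enumerate(matrix)))
    return min(row_minima)


buckets = assign_pairs(n_seqs=5, n_procs=2)
matrix = [[0, 5, 2],
          [5, 0, 4],
          [2, 4, 0]]
value, i, j = closest_pair(matrix)
```

The row minima are independent of one another, so only the final reduction over N values is sequential.
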
Parallel Clustal W
• Progressive Alignment Stage
  • Values of a function score(I,J) are precomputed in parallel
    • Alignment score of sequences I and J
  • Not much parallelization in the third stage
• Overall Speedup
  • Speedup of 10x for 600 MA sequences using 16 CPUs
  • Time reduced from 1 hr 7 minutes to 6.5 minutes
  • Relative scaling is better for larger inputs
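
As a quick check of the quoted figures, speedup is the serial time divided by the parallel time, and parallel efficiency is speedup divided by the CPU count. The helper names below are illustrative; the times and the CPU count are the ones on the slide.

```python
def speedup(serial_minutes: float, parallel_minutes: float) -> float:
    """Ratio of serial to parallel wall-clock time."""
    return serial_minutes / parallel_minutes


def efficiency(speedup_value: float, n_cpus: int) -> float:
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup_value / n_cpus


s = speedup(67.0, 6.5)   # 1 hr 7 min versus 6.5 min
e = efficiency(s, 16)    # 16 CPUs
print(f"speedup = {s:.1f}x, efficiency = {e:.0%}")
```

The result is a little over 10x, consistent with the slide, which corresponds to roughly two thirds of ideal linear scaling on 16 CPUs.
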
HT Clustal
• Problem Statement
  • Calculate large numbers of MSAs of various sizes (independent problems)
  • Such problems are seen in high-throughput (HT) research environments
• Representative Problem (from paper):
  • Perform independent MSAs over 100 sets of sequences
  • Each set has between 20 and 100 sequences, with an average of 60 sequences
  • Average sequence length = 390
HT Clustal - Optimizations
• Basic Idea
  • Each MSA operation (on one set of sequences) is independent of the others
  • Run Clustal W as a uniprocessor job on one MSA problem
  • Launch multiple Clustal W jobs on different processors
• Job Scheduling
  • Jobs have different durations, depending on the sequence set
  • Two scheduling options explored:
    • Schedule dynamically: if a processor is free, schedule an MSA job chosen randomly
    • Schedule dynamically: sequence sets are presorted (based on file size)
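
The two scheduling options can be compared with a small simulation. This is a sketch under stated assumptions: job durations are made-up numbers, and "presorted" is taken to mean longest job first, which the slide does not state explicitly. `makespan` is a hypothetical helper that hands each job, in list order, to whichever CPU becomes free first.

```python
import heapq
import random


def makespan(durations, n_cpus):
    """Finish time of the last CPU under dynamic (first-free-CPU) scheduling."""
    free_at = [0.0] * n_cpus          # time at which each CPU becomes idle
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)    # earliest available CPU
        heapq.heappush(free_at, start + d)
    return max(free_at)


# Tiny example: one long job arriving last versus first, on 2 CPUs.
late_long = makespan([1, 1, 1, 1, 4], 2)
early_long = makespan([4, 1, 1, 1, 1], 2)

# Larger example with hypothetical durations: 100 jobs on 32 CPUs.
rng = random.Random(1)
jobs = [rng.uniform(1.0, 10.0) for _ in range(100)]
random_order = makespan(jobs, 32)
presorted = makespan(sorted(jobs, reverse=True), 32)
print(f"random order: {random_order:.1f}, presorted: {presorted:.1f}")
```

In the tiny example, starting the long job first lets the short jobs fill in around it, while starting it last leaves one CPU idle at the end.
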
HT Clustal – Performance Numbers
• Speedups
  • Almost linear speedups
  • 31x on 32 CPUs for the representative MSA problem
  • 116X on 128 CPUs for a larger test case
    • Solution time reduced from 18.5 hours to 9.5 minutes
  • Speedup shown for the example MSA set (figure not reproduced)
HT Clustal – Effect of Presorting
• Effect of presorting
  • Figure (not reproduced) shows the effect of presorting for the example MSA set: 32 CPUs, 100 sets, ~3 jobs per CPU
  • If the average number of jobs per CPU is below 5, presorting helps
  • For a larger number of jobs per CPU, statistical averaging reduces load imbalance
MULTICLUSTAL
• MULTICLUSTAL Algorithm
  • A Perl script to generate high-quality MSAs with little user intervention
  • Searches for the best combination of Clustal W input parameters
    • To reduce gaps, increase clustering
  • Parameters to vary:
    • Scoring matrices: pairwise and multiple
    • Gap open and extension penalties (pairwise and multiple)
  • Sequential Algorithm:
    Till all parameters are sufficiently varied {
      alignment = Run Clustal W ()
      Calculate quality of alignment
      Change Parameters
    }
  • Quality of alignment
    • A numerical quantity based on
      • Identical amino acid matches
      • Conservative amino acid substitutions
      • Gap events and amino acid islands, i.e. -X-, -XX-, -XXX-, -XXXX-
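
The sequential loop above amounts to a search over a parameter grid. The sketch below is an assumption-laden illustration, not the MULTICLUSTAL script: the alignment step and the quality measure are passed in as callables, `fake_run` returns canned alignments instead of calling Clustal W, `identical_columns` covers only the "identical matches" part of the quality measure, and the parameter names and values are hypothetical.

```python
from itertools import product


def search_parameters(run_alignment, score_alignment, grid):
    """Try every parameter combination and keep the best-scoring alignment."""
    names = list(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        alignment = run_alignment(params)        # one Clustal W run in practice
        quality = score_alignment(alignment)
        if best is None or quality > best[0]:
            best = (quality, params, alignment)
    return best


def identical_columns(alignment):
    """Toy quality: number of gap-free columns where all residues are identical."""
    return sum(1 for col in zip(*alignment)
               if len(set(col)) == 1 and col[0] != "-")


def fake_run(params):
    """Hypothetical stand-in for running Clustal W with the given parameters."""
    if params["gap_open"] == 10.0 and params["gap_ext"] == 0.1:
        return ["ACGT", "ACGT"]
    return ["ACGT", "A-GT"]


grid = {"gap_open": [5.0, 10.0], "gap_ext": [0.1, 0.5]}   # illustrative values
quality, params, alignment = search_parameters(fake_run, identical_columns, grid)
```

Because every combination triggers a full alignment run, the cost grows with the product of the grid sizes.
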
MULTICLUSTAL Optimizations
• Optimization on MULTICLUSTAL
  • Run Clustal W once
  • Reuse the tree generated in the PW/GT stages
    • Guide tree calculated only once for multiple runs
    • Results in speedups from 1.5X to 3X
  • Use Parallel Clustal W for each run of Clustal W
Observations
• Parallelizability
  • First (pairwise alignment) and second (guide tree) stages are parallelizable
  • Third stage is mostly sequential, so speedup is limited
• 100 sequence MSAs possible?
  • PIR at NBRF (Georgetown University) takes a maximum of 20 sequences for MSA
  • Speedup improves user response; for 20 sequences a PC would be sufficient
• Probable applications:
  • Research environments?
  • PIR servers?
• Speedup only on shared memory SGI 3000 workstation?
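
The remark that a mostly sequential third stage limits speedup is the situation Amdahl's law describes. The law is not cited on the slide; it is brought in here only as an illustration, and the 95% parallel fraction is a hypothetical figure, not a measurement from the paper.

```python
def amdahl_speedup(parallel_fraction: float, n_cpus: int) -> float:
    """Amdahl's law: speedup when only part of the work can be parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cpus)


# Hypothetical: 95% of the run time (stages one and two) parallelizes perfectly.
for cpus in (4, 16, 64, 256):
    print(cpus, round(amdahl_speedup(0.95, cpus), 2))

limit = 1.0 / (1.0 - 0.95)   # upper bound as the CPU count grows without limit
```

Under that assumption the speedup can never exceed 20x regardless of CPU count, which illustrates why the sequential stage dominates once the first two stages are spread across many processors.
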