John W. Romein, Jaap Heringa, Henri E. Bal. A Million-Fold Speed Improvement in Genomic Repeats Detection. Vrije Universiteit, Faculty of Sciences, Department of Computer Science, Bio-Informatics Group & Computer Systems Group, Amsterdam, the Netherlands.
repeats in bio sequences • important to detect • essential for evolution • protein structure & function • diseases • hard to detect • any length • mutations • insertions/deletions: different fragment sizes • tandem and distant
repro • delineates repeats • sensitive • two phases • find top alignments (slow) • find repeats • replaced phase 1 • old algorithm • O(n^4), n < 2,000 • new algorithm • O(n^3), n < 60,000 • 3-level parallel: SIMD, SMP, cluster
sidestep: sequence alignment • superpose two sequences (TATGCAG, TCTGAG) • match symbols vertically (good: +2, bad: -1) • allow gaps (penalty: -2 - 1*length) • maximize score • compute matrix using dynamic programming
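The dynamic-programming step on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the slide's scores (match +2, mismatch -1, gap of length L costs 2 + L) and uses the standard three-matrix formulation for affine gap penalties. The function name `global_alignment_score` and its parameters are hypothetical.

```python
def global_alignment_score(a, b, match=2, mismatch=-1, gap_open=2, gap_extend=1):
    """Score of the best end-to-end alignment of a and b.

    A gap of length L costs gap_open + gap_extend * L (assumed from the slide).
    """
    NEG = float("-inf")
    n, m = len(a), len(b)
    # M: alignment ends in a symbol pair; X: ends in a gap that consumes a;
    # Y: ends in a gap that consumes b.
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + gap_extend * i)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + gap_extend * j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            X[i][j] = max(M[i - 1][j] - gap_open - gap_extend,
                          X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open - gap_extend)
            Y[i][j] = max(M[i][j - 1] - gap_open - gap_extend,
                          Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open - gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])
```

For the slide's pair, `global_alignment_score("TATGCAG", "TCTGAG")` pairs T/T, A/C, T/T, G/G, skips C with one gap, then pairs A/A and G/G, for a score of 2 - 1 + 2 + 2 - 3 + 2 + 2 = 6.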
sidestep: local alignment • Find sub-sequences that match well • Ignores non-matching values before and after the subsequence (by disallowing negative values) • Construct actual alignment: O(n^3) time • Computing only the scores: O(n^2) time • (see paper)
summary • (TATGCAG, TCTGAG) => 6 (score only) • takes O(n^2) time • (TATGCAG, TCTGAG) => the alignment itself • takes O(n^3) time • Matching <junk1> TATGCAG <junk2> with <junk3> TCTGAG <junk4> gives the same result as matching only the substrings TATGCAG and TCTGAG
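A score-only local alignment, as summarized above, can be sketched in O(n^2) time and O(n) memory. This is an illustrative sketch under assumed parameters (match +2, mismatch -1, gap of length L costs 2 + L), not the implementation from the paper; `local_alignment_score` is a hypothetical name.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap_open=2, gap_extend=1):
    """Best score over all pairs of sub-sequences of a and b (score only)."""
    NEG = float("-inf")
    m = len(b)
    h_prev = [0] * (m + 1)   # previous matrix row
    f = [NEG] * (m + 1)      # best score ending in a vertical gap, per column
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (m + 1)
        e = NEG              # best score ending in a horizontal gap
        for j in range(1, m + 1):
            e = max(e - gap_extend, h_cur[j - 1] - gap_open - gap_extend)
            f[j] = max(f[j] - gap_extend, h_prev[j] - gap_open - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Clamping at 0 is what lets the alignment start anywhere.
            h = max(0, h_prev[j - 1] + s, e, f[j])
            h_cur[j] = h
            best = max(best, h)
        h_prev = h_cur
    return best


core = local_alignment_score("TATGCAG", "TCTGAG")
# Flanking symbols that match nothing in the other sequence do not change the score.
flanked = local_alignment_score("XXXTATGCAGXX", "YYTCTGAGYYY")
```

With these parameters both `core` and `flanked` are 6, in line with the score on the slide.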
finding top alignments • red lines: top alignments • split sequence every possible way • align subsequence-pair • best is first top alignment • trick: find next best (top) alignment using O(n^2) algorithm n times; construct top alignment using O(n^3) algorithm • repeat while avoiding found top alignments • user typically wants 5-30 top alignments • ordered list, do most promising alignments first • realign 3-10%
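The first step above (split the sequence at every position, score each prefix/suffix pair, keep the best) can be sketched as follows. This is a simplified illustration: it only finds the first top alignment and does not implement the "avoid found top alignments" bookkeeping. The function names and the scoring parameters (match +2, mismatch -1, gap of length L costs 2 + L) are assumptions.

```python
def local_score(a, b, match=2, mismatch=-1, gap_open=2, gap_extend=1):
    """Score-only local alignment in O(len(a) * len(b)) time."""
    NEG = float("-inf")
    m = len(b)
    h_prev, f, best = [0] * (m + 1), [NEG] * (m + 1), 0
    for i in range(1, len(a) + 1):
        h_cur, e = [0] * (m + 1), NEG
        for j in range(1, m + 1):
            e = max(e - gap_extend, h_cur[j - 1] - gap_open - gap_extend)
            f[j] = max(f[j] - gap_extend, h_prev[j] - gap_open - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            h_cur[j] = max(0, h_prev[j - 1] + s, e, f[j])
            best = max(best, h_cur[j])
        h_prev = h_cur
    return best


def first_top_alignment(seq):
    """Try every split point; return (split, score) of the best-scoring one."""
    best_split, best_score = None, 0
    for k in range(1, len(seq)):
        score = local_score(seq[:k], seq[k:])   # prefix against suffix
        if score > best_score:
            best_split, best_score = k, score
    return best_split, best_score
```

For a sequence made of two identical halves, such as `"ABCDEFGHABCDEFGH"`, the best split is the midpoint: `first_top_alignment` returns `(8, 16)`. Running the O(n^2) scoring once per split gives O(n^3) overall for one top alignment.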
performance old vs. new • sequence: longest known protein (titin) • speed improvement increases with sequence length
parallel alignment • parallelism within alignment • loop-carried dependency • concurrent alignments • speculative parallelism • good performance • three-level parallelism • SSE/SSE2 multimedia extensions (SIMD) • shared memory MIMD • distributed memory MIMD
SIMD parallelism • multimedia extensions • 4 (SSE) or 8 (SSE2) parallel operations on consecutive 2-byte words • compiler intrinsics • compute 4 (or 8) neighboring matrices concurrently • interleaved memory layout • use fine-grained hardware for coarse-grained computation • applicable to any program that does many alignments
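The idea of computing several neighboring matrices in lockstep with an interleaved layout can be emulated in plain Python. The sketch below is only a model of the data layout: the inner `for lane` loop stands in for what a single SSE/SSE2 instruction would do on 4 or 8 values at once, and no real intrinsics or 2-byte saturating arithmetic are used. Names, the masking scheme, and the scoring parameters are assumptions, not taken from the authors' code.

```python
def lane_scores(seq, first_split, lanes=4,
                match=2, mismatch=-1, gap_open=2, gap_extend=1):
    """Local-alignment scores for splits first_split .. first_split+lanes-1.

    Cell j of lane l is stored at index j * lanes + l, so the values that
    one vector instruction would process are adjacent in memory.
    """
    NEG = float("-inf")
    n = len(seq)
    splits = [first_split + lane for lane in range(lanes)]
    max_rows = splits[-1]            # longest prefix among the lanes
    max_cols = n - first_split       # longest suffix among the lanes
    width = (max_cols + 1) * lanes
    h_prev = [0] * width
    f = [NEG] * width                # vertical-gap state per column and lane
    best = [0] * lanes
    for i in range(1, max_rows + 1):
        h_cur = [0] * width
        e = [NEG] * lanes            # horizontal-gap state per lane
        for j in range(1, max_cols + 1):
            base = j * lanes
            for lane in range(lanes):            # "one SIMD operation"
                k = splits[lane]
                if i > k or j > n - k:
                    continue                     # cell outside this lane's matrix
                idx = base + lane
                e[lane] = max(e[lane] - gap_extend,
                              h_cur[idx - lanes] - gap_open - gap_extend)
                f[idx] = max(f[idx] - gap_extend,
                             h_prev[idx] - gap_open - gap_extend)
                s = match if seq[i - 1] == seq[k + j - 1] else mismatch
                h = max(0, h_prev[idx - lanes] + s, e[lane], f[idx])
                h_cur[idx] = h
                if h > best[lane]:
                    best[lane] = h
        h_prev = h_cur
    return best
```

For example, `lane_scores("ABCDEFGHABCDEFGH", 6)` scores splits 6, 7, 8 and 9 together and yields `[12, 14, 16, 14]`. Because neighboring splits differ by one row and one column, the lanes do almost identical work, which is what makes the coarse-grained use of fine-grained hardware attractive.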
SSE/SSE2 performance • speedups w.r.t. new algorithm • superlinear speedups • MAX operator • 8 extra mmx/xmm registers • scheduling • cache-aware alignment: 4 – 6.5 times faster
MIMD parallelism • SIMD (SSE) parallelism is speculative • If a matrix (alignment) is ‘promising’, its neighbors probably also are promising • MIMD parallelism: • use dynamic task scheduling, selecting most promising tasks from a job queue • Shared memory (SMP): easy • Distributed memory: MPI, master/worker
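Selecting the most promising tasks from a job queue can be sketched with a priority queue. The sketch assumes that each split still carries a score from an earlier round and that this stale score is an upper bound on its fresh score, so the search can stop once the best fresh score is at least as high as every remaining bound. That assumption, the function name `best_split_lazy`, and the `recompute` callback are illustrative; the loop is single-threaded, whereas in a master/worker setup several workers would pop tasks concurrently.

```python
import heapq


def best_split_lazy(stale_scores, recompute):
    """Find the best split while realigning as few splits as possible.

    stale_scores: dict mapping split -> score from a previous round
                  (assumed to be an upper bound on the fresh score).
    recompute:    function split -> fresh score (the expensive alignment).
    Returns (best_split, best_score, number_of_realignments).
    """
    # heapq is a min-heap, so store negated scores to pop the largest first.
    queue = [(-score, split) for split, score in stale_scores.items()]
    heapq.heapify(queue)
    best_split, best_score, realigned = None, -1, 0
    # Stop as soon as no remaining task can beat the best fresh score.
    while queue and -queue[0][0] > best_score:
        _, split = heapq.heappop(queue)
        fresh = recompute(split)
        realigned += 1
        if fresh > best_score:
            best_split, best_score = split, fresh
    return best_split, best_score, realigned


stale = {1: 10, 2: 30, 3: 25, 4: 5}
fresh = {1: 9, 2: 12, 3: 24, 4: 5}
result = best_split_lazy(stale, fresh.__getitem__)
```

In this toy run only splits 2 and 3 are realigned before the remaining bounds (10 and 5) rule out the rest, so `result` is `(3, 24, 2)`.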
total parallel performance • SMP: 2 CPUs: 2 times faster • cluster: 64*2 CPUs: 548 – 889-fold speedup • Up to 125x faster than SSE version on 1 CPU
conclusions • new algorithm >> 100 times faster • much more for longer sequences • parallel: SSE(2), SMP, cluster • SSE(2) parallelism yields superlinear speedups • 128 CPUs: 548 – 889-fold speedup • 1,000,000-fold speed improvement