Speeding Up Algorithms for Hidden Markov Models by Exploiting Repetitions
Shay Mozes, Oren Weimann (MIT), Michal Ziv-Ukelson (Tel-Aviv U.)
Shortly:
• Hidden Markov Models are extensively used to model processes in many fields
• The runtime of HMM algorithms is usually linear in the length of the input
• We show how to exploit repetitions in the input to obtain a speedup
• First provable speedup of Viterbi’s algorithm
• Can use different compression schemes
• Applies to several decoding and training algorithms
Markov Models
• states q1, … , qk
• transition probabilities Pi←j (in the two-state example: P1←1 = 0.9, P2←1 = 0.1, P1←2 = 0.2, P2←2 = 0.8)
• emission probabilities ei(σ), σ є Σ (e.g., e1(A) = 0.3, e1(C) = 0.2, e1(G) = 0.2, e1(T) = 0.3; e2(A) = 0.2, e2(C) = 0.3, e2(G) = 0.3, e2(T) = 0.2)
• time independent, discrete, finite
Hidden Markov Models
[trellis figure: the k states replicated across time steps 1 … n, one column per observed character x1 x2 x3 … xn]
• We are only given the description of the model and the observed string
• Decoding: find the hidden sequence of states that is most likely to have generated the observed string
Decoding – Viterbi’s Algorithm
• vt[j] = probability of the best sequence of states that emits the first t characters and ends in state j
• Recurrence: vt[i] = maxj { ei(xt) · Pi←j · vt-1[j] }
• Example (6th character is c): v6[4] = maxj { e4(c) · P4←j · v5[j] }; if the maximum is attained at j = 2, this equals e4(c) · P4←2 · v5[2]
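A minimal code sketch of this recurrence, assuming NumPy; the name viterbi, the containers P, E, v0, and the dict-of-vectors layout for emissions are illustrative choices, not from the talk:

```python
import numpy as np

def viterbi(P, E, x, v0):
    """Viterbi recurrence: v_t[i] = e_i(x_t) * max_j { P[i, j] * v_{t-1}[j] }.

    P  -- k x k transition matrix, P[i, j] = P(i <- j)
    E  -- dict: symbol -> length-k vector of emission probabilities e_i(symbol)
    x  -- observed string
    v0 -- length-k vector of initial probabilities
    """
    v = np.asarray(v0, dtype=float)
    back = []                                # back[t][i] = maximizing predecessor j
    for c in x:
        scores = P * v                       # scores[i, j] = P[i, j] * v[j]
        back.append(scores.argmax(axis=1))
        v = E[c] * scores.max(axis=1)        # fold in the emission probabilities
    # trace the most likely state sequence backwards from the best final state
    path = [int(v.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    path.reverse()
    return path, v
```

Each of the n steps costs O(k²), giving the O(k²n) total noted on the matrix-notation slide.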
Outline
• Overview
• Exploiting repetitions
• Using LZ78
• Using Run-Length Encoding
• Summary of results
VA in Matrix Notation
• v1[i] = maxj { ei(x1) · Pi←j · v0[j] }
• define Mij(σ) = ei(σ) · Pi←j, so that v1[i] = maxj { Mij(x1) · v0[j] }
• define (A ⊗ B)ij = maxk { Aik · Bkj }
• then v1 = M(x1) ⊗ v0, v2 = M(x2) ⊗ M(x1) ⊗ v0, and in general
  vn = M(xn) ⊗ M(xn-1) ⊗ ··· ⊗ M(x1) ⊗ v0
• Viterbi’s algorithm evaluates this right to left with matrix-vector steps: O(k²n); evaluating the matrix-matrix products instead would cost O(k³n)
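The ⊗ operation translates directly into code. A sketch in the same NumPy style, where otimes and M are names chosen here:

```python
import numpy as np

def otimes(A, B):
    """(A (x) B)[i, j] = max_l { A[i, l] * B[l, j] }: 'max-times' matrix product, O(k^3)."""
    # the (k, k, k) array below has entry [i, l, j] = A[i, l] * B[l, j];
    # maximizing over the middle index l yields the (x) product
    return (A[:, :, None] * B[None, :, :]).max(axis=1)

def M(sigma, P, E):
    """M(sigma)[i, j] = e_i(sigma) * P[i, j], the per-symbol step matrix."""
    return E[sigma][:, None] * P
```

Folding vn = M(xn) ⊗ ··· ⊗ M(x1) ⊗ v0 right to left with matrix-vector steps (each O(k²)) recovers plain VA; this is exactly why a precomputed matrix-matrix product M(W) must be reused more than k times before the O(k³) work pays off.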
Exploiting Repetitions
X = c a t g a a c t g a a c
• vn = M(c)⊗M(a)⊗M(a)⊗M(g)⊗M(t)⊗M(c)⊗M(a)⊗M(a)⊗M(g)⊗M(t)⊗M(a)⊗M(c)⊗v0 (12 steps)
• compute M(W) = M(c)⊗M(a)⊗M(a)⊗M(g) for the repeated substring W = gaac once, use it twice!
• vn = M(W)⊗M(t)⊗M(W)⊗M(t)⊗M(a)⊗M(c)⊗v0 (6 steps)
Exploiting Repetitions
• ℓ = length of the repeated substring W, λ = number of times W repeats in the string
• computing M(W) costs (ℓ-1)k³ (matrix-matrix multiplications)
• each time W appears we save (ℓ-1)k² (matrix-vector multiplications)
• W is good if the savings exceed the cost: λ(ℓ-1)k² > (ℓ-1)k³, i.e., λ > k (the number of repeats must exceed the number of states, since matrix-matrix multiplication is a factor of k more expensive than matrix-vector multiplication)
Offline General Scheme
I. Dictionary selection: choose the set D = {Wi} of good substrings
II. Encoding: compute M(Wi) for every Wi in D
III. Parsing: partition the input X into good substrings, X = Wi1Wi2 … Win′, and let X′ = i1, i2, … , in′
IV. Propagation: run Viterbi’s Algorithm on X′ using the M(Wi)
Steps II and IV are sketched in code below; steps I and III depend on the compression scheme.
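A sketch of how steps II and IV fit together, reusing otimes and M from the previous sketch (precompute_tables and propagate are hypothetical names, not from the talk):

```python
import numpy as np

def precompute_tables(D, P, E):
    """Step II (encoding): M(W) = M(w_l) (x) ... (x) M(w_1) for each good substring W."""
    tables = {}
    for W in D:
        acc = M(W[0], P, E)
        for c in W[1:]:
            acc = otimes(M(c, P, E), acc)    # later symbols multiply on the left
        tables[W] = acc
    return tables

def propagate(parse, tables, v0):
    """Step IV (propagation): run VA on X', one matrix-vector (x) per good substring."""
    v = np.asarray(v0, dtype=float)
    for W in parse:                          # parse = the good substrings covering X, in order
        v = (tables[W] * v).max(axis=1)      # O(k^2) per word instead of per character
    return v
```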
Outline
• Overview
• Exploiting repetitions
• Using LZ78
• Using Run-Length Encoding
• Summary of results
LZ78
• The next LZ-word is the longest previously seen LZ-word plus one character
• Use a trie (e.g., aacgacg parses into a | ac | g | acg)
• The number of LZ-words is asymptotically < n ∕ log n
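A compact sketch that builds the trie and the word list in one pass (lz78_parse is a name chosen here, and the nested-dict trie is just one convenient representation):

```python
def lz78_parse(x):
    """Parse x into LZ78 words: each word = longest previously seen word + one character."""
    trie = {}                          # nested dicts; a path from the root spells a word
    words = []
    node, word = trie, ""
    for c in x:
        if c in node:                  # keep extending the current match
            node, word = node[c], word + c
        else:                          # longest match found: emit it plus one character
            node[c] = {}
            words.append(word + c)
            node, word = trie, ""
    if word:                           # leftover suffix (itself a previously seen word)
        words.append(word)
    return words

# lz78_parse("aacgacg") == ["a", "ac", "g", "acg"]
```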
Using LZ78
I. Dictionary selection: D = the words in the LZ78 parse of X (cost O(n))
II. Encoding: use the incremental nature of LZ78, M(Wσ) = M(σ) ⊗ M(W) (cost O(k³n ∕ log n))
III. Parsing: X′ = the LZ78 parse of X (cost O(n))
IV. Propagation: run VA on X′ using the M(Wi) (cost O(k²n ∕ log n))
• Speedup: k²n ∕ (k³n ∕ log n) = (log n) ∕ k
Improvement
• Remember the speedup condition: λ > k
• Use just the LZ-words that appear more than k times
• These words are represented by trie nodes with more than k descendants
• Step III must now parse X differently
• Ensures graceful degradation with increasing k; speedup: max(1, (log n) ∕ k)
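One plausible way to realize the filtered dictionary selection, under the assumption that a word's trie node has more than k descendants exactly when the word is a prefix of more than k LZ-words (select_good_words is a hypothetical helper; the modified parsing of step III is not shown):

```python
from collections import Counter

def select_good_words(x, k):
    """Step I for the improved scheme: keep only the substrings whose trie node
    has more than k descendants, i.e. prefixes of more than k LZ-words."""
    counts = Counter()
    for word in lz78_parse(x):
        for end in range(1, len(word) + 1):
            counts[word[:end]] += 1    # each prefix is a trie node on the word's path
    return {w for w, c in counts.items() if c > k}
```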
Experimental results
~5× faster on:
• Short: the 1.5 Mbp chromosome 4 of S. cerevisiae (yeast)
• Long: the 22 Mbp human Y chromosome
Outline
• Overview
• Exploiting repetitions
• Using LZ78
• Using Run-Length Encoding
• Summary of results
Run-Length Encoding
• aaaccggggg → a3c2g5
• split each run length into powers of two: aaaccggggg → a2a1c2g4g1 (3 = 2 + 1, 5 = 4 + 1); see the sketch below
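A sketch of the power-of-two splitting in the second line; rle_power_parse is a name chosen here, and each pair (σ, 2^j) stands for a run σ^(2^j), whose matrix can be built by repeated squaring, M(σ^(2^(j+1))) = M(σ^(2^j)) ⊗ M(σ^(2^j)):

```python
import itertools

def rle_power_parse(x):
    """Split each run sigma^l into power-of-two runs via the binary expansion of l."""
    words = []
    for sigma, run in itertools.groupby(x):
        l = sum(1 for _ in run)                  # length of the current run
        for j in range(l.bit_length() - 1, -1, -1):
            if (l >> j) & 1:                     # peel off the set bits of l, high to low
                words.append((sigma, 1 << j))    # a run sigma^(2^j)
    return words

# rle_power_parse("aaaccggggg") == [('a', 2), ('a', 1), ('c', 2), ('g', 4), ('g', 1)]
```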
Summary of results
• General framework
• LZ78: speedup of (log n) ∕ k
• Run-Length Encoding: speedup of r ∕ log(r)
• Byte-Pair Encoding: speedup of r
• Path reconstruction in O(n)
• Forward-Backward algorithms (with standard matrix multiplication): same speedups apply
• Viterbi training: same speedups apply
• Baum-Welch training: speedup possible, many details
• Parallelization
Thank you! Any questions?
Path traceback
• In VA this is easy to do in O(n) time, by keeping track of the maximizing states during the computation
• The problem: we run VA on X′, so we get the sequence of states for X′, not for X; we only get the states on the boundaries of the good substrings of X
• Solution: keep track of the maximizing states when computing the matrices M(W); this takes O(n) time and O(nk²) space
Training
• Estimate the unknown parameters Pi←j, ei(σ)
• Use Expectation Maximization:
  1. Decoding (the bottleneck, so our speedup applies!)
  2. Recalculate the parameters
• Viterbi Training: each iteration costs O(VA + n + k²), i.e., decoding plus path traceback plus updating Pi←j, ei(σ); one iteration is sketched below
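A hedged sketch of one such iteration, reusing the viterbi function from the earlier sketch (the pseudocount smoothing and all names are choices made here, not details from the talk):

```python
import numpy as np

def viterbi_training(P, E, x, v0, iters=10, pseudocount=1e-3):
    """Viterbi training: decode, then re-estimate P and e_i(sigma) from path counts.
    Decoding dominates each iteration, so any decoding speedup carries over."""
    k = P.shape[0]
    for _ in range(iters):
        path, _ = viterbi(P, E, x, v0)           # decoding: the bottleneck
        trans = np.full((k, k), pseudocount)
        emit = {s: np.full(k, pseudocount) for s in E}
        for t, c in enumerate(x):                # O(n) counting pass
            i, j = path[t + 1], path[t]          # transition j -> i emitted x_t = c
            trans[i, j] += 1
            emit[c][i] += 1
        P = trans / trans.sum(axis=0, keepdims=True)   # each column j sums to 1
        totals = sum(emit.values())              # per-state emission totals
        E = {s: emit[s] / totals for s in E}
    return P, E
```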
Baum-Welch Training
• Each iteration costs O(FB + nk²): decoding with the Forward-Backward algorithm, plus O(nk²) to update Pi←j, ei(σ)
• If a substring W of length ℓ that repeats λ times satisfies a suitable cost-benefit condition, then the entire process can be sped up by precalculation