Bit-parallel algorithms for computing all the runs in a string
Kazunori Hirashima (1), Hideo Bannai (1), Wataru Matsubara (2), Kazuhiko Kusano (2), Akira Ishino (2), Ayumi Shinohara (2)
(1) Kyushu University, Japan
(2) Tohoku University, Japan
Contents
• Runs
• Bit-parallel algorithms for counting runs
  • Counting prefix runs
  • Removing duplicate runs by position
  • Removing duplicates by Sieve
• Computational Experiments
• Conclusion
Runs
• run: an occurrence of a periodic factor that is
  • non-extendable (maximal)
  • of exponent at least two
  • primitive-rooted
• example: w = abbabbaccbcbcbc
• run(w): number of runs in string w
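As a reference point for the definition above, here is a minimal brute-force sketch in Python. It is not one of the bit-parallel algorithms of this talk. The function name `runs` and the 0-based inclusive (start, end) pairs are choices made here for illustration. For each period p it finds maximal stretches where w[k] = w[k+p], keeps those long enough for exponent at least two, and stores each interval once, so a run found again at a multiple of its minimum period is not counted twice.

```python
def runs(w: str) -> set[tuple[int, int]]:
    """Return the runs of w as 0-based inclusive (start, end) pairs."""
    n = len(w)
    found = set()
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if w[k] != w[k + p]:
                k += 1
                continue
            a = k                       # first position with w[k] == w[k + p]
            while k + p < n and w[k] == w[k + p]:
                k += 1
            # Matches at a..k-1 mean w[a .. k-1+p] has period p and is maximal.
            if k - a >= p:              # length >= 2p, i.e. exponent >= 2
                found.add((a, k - 1 + p))
    return found


print(len(runs("abbabbaccbcbcbc")))  # 5
```

On the example string of this slide the sketch reports five runs: bb twice, cc, abbabba and cbcbcbc.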
Calculating run(w)
• Linear time algorithm [Kolpakov & Kucherov '99]
  • requires LZ-factorization of the string
• We present 3 bit-parallel algorithms to calculate run(w)
  • do not require complicated data structures
  • very efficient for short strings
Bit-parallel algorithms for counting runs
For general alphabet:
• Counting prefix runs
For binary alphabet:
• Removing duplicate runs by position
• Removing duplicate runs by Sieve
Algorithm (counting prefix runs)
prefix repetition = a repetition that is also a prefix
prefix run = a run that is also a prefix
• Idea: for each suffix,
  • detect right-maximal prefix repetitions of each period
  • count only repetitions with exponent at least 2
  • count only left-maximal repetitions
Algorithm (counting prefix runs)
Detect right-maximal prefix repetitions of each period
example: w = aabaabaaaacaac
w[1] = w[4], w[2] = w[4]
[Figure: prefix runs of w and the ActiveArea]
Algorithm (counting prefix runs)
Detect right-maximal prefix repetitions of each period
pseudo code:
    nextChar = w[i];
    bitmask = ((occ[nextChar] >> (Length - i)) | (~0) << i);
    alive = alive & bitmask;
example: w = aabaabaaaacaac
[Figure: the alive bit vector at successive positions, with occ[nextChar] shifted by Length - i]
Algorithm (counting prefix runs)
Count only repetitions with exponent at least 2
pseudo code:
    nextChar = w[i];
    bitmask = ((occ[nextChar] >> (Length - i)) | (~0) << i);
    prevAlive = alive;
    alive = alive & bitmask;
    If ((prevAlive ^ alive) & activeArea) ≠ 0 then count++;
The active area is updated with:
    If i mod 2 = 1 then activeArea := (activeArea << 1) | 1;
example: w = aabaabaaaacaac
[Figure: prevAlive ^ alive marks the periods that stop at position i]
Algorithm (counting prefix runs)
Count only left-maximal repetitions
• w[3:8] seems to be a run, but it can be extended to the left, so w[3:8] is not a run.
example: w = aabaabaaaacaac
w[2] ≠ w[2+1], w[2] = w[2+2], w[2] = w[2+3]
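The three steps of the prefix-run idea can be sketched without any bit tricks. The following Python is a plain, non-bit-parallel illustration under assumptions made here: the function name `count_prefix_runs` is hypothetical, indices are 0-based, and a set of end positions per suffix stands in for the alive/activeArea words of the slides. It shows only what is being counted, not how the slides count it in parallel.

```python
def count_prefix_runs(w: str) -> int:
    """Count runs by inspecting, for each suffix, its prefix repetitions."""
    n = len(w)
    count = 0
    for i in range(n):                       # suffix w[i:]
        ends = set()                         # distinct end positions for this start
        for p in range(1, (n - i) // 2 + 1):
            # Left-maximal: period p must not extend one step to the left.
            if i > 0 and w[i - 1] == w[i - 1 + p]:
                continue
            # Right-maximal: extend while period p holds.
            e = i + p
            while e < n and w[e] == w[e - p]:
                e += 1
            # Exponent at least 2: the repetition w[i:e] has length >= 2p.
            if e - i >= 2 * p:
                ends.add(e)
        # Several periods ending at the same position describe one run.
        count += len(ends)
    return count


print(count_prefix_runs("aabaabaaaacaac"))  # 6
```

Collecting end positions in a set plays the role of counting once per position at which some period stops, so a run seen with periods p and 2p from the same start is counted a single time.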
Algorithm (binary strings)
Idea
• detect maximal repetitions for each period 1, 2, ..., |w|/2
• count only repetitions with exponent at least 2
• count only repetitions of minimum period
Algorithm (efficient algorithm for binary strings)
Detect maximal repetitions for each period 1, 2, ..., |w|/2
• v = w ^ ((~w) >> p)
• a maximal repetition of period p in w corresponds to a stretch of 1's in v
• Example: p = 3
[Figure: w, ~w shifted right by p, and their XOR v]
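Algorithm (binary strings)
Delete repetitions with exponent less than 2
• v = w ^ ((~w) >> p)
• Example: p = 3
• A stretch of 1's must have length at least p = 3.
• A repetition of length 5 gives a stretch of length 2 = 5 - 3, which is too short to be a run of period p = 3; a repetition of length 7 gives a stretch of length 4 = 7 - 3.
[Figure: w, ~w shifted right by p = 3, and v with the two stretches marked]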
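A small Python sketch of the match vector may help here. It assumes a convention chosen for this illustration: the binary string is read with its leftmost character as the most significant bit, so `>> p` moves bits toward the right end of the string. Because Python integers are unbounded, the complement is masked to n bits, and the top p positions (which have no partner p places to the left) are cleared. The helper names `match_vector` and `stretches` are hypothetical.

```python
import re


def match_vector(w_bits: str, p: int) -> str:
    """Bit k of the result is 1 iff k >= p and w[k] == w[k - p]."""
    n = len(w_bits)
    full = (1 << n) - 1                  # n ones
    w = int(w_bits, 2)
    not_w = ~w & full                    # complement restricted to n bits
    v = (w ^ (not_w >> p)) & (full >> p)
    return format(v, f"0{n}b")


def stretches(v_bits: str) -> list[tuple[int, int]]:
    """Maximal stretches of 1's as (start position, length) pairs."""
    return [(m.start(), len(m.group())) for m in re.finditer("1+", v_bits)]


v = match_vector("11110101010", 2)
print(v)                                          # 00110111111
print([s for s in stretches(v) if s[1] >= 2])     # [(2, 2), (5, 6)]
```

A stretch of length L stands for a repetition of length L + p, so keeping only stretches with L >= p is the exponent-at-least-2 filter described on this slide.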
Algorithm (efficient algorithm for binary strings)
Delete repetitions with exponent less than 2
    s = v;
    While (p > 1)
        p--;
        s = s & (v >> p);
    END
    v = s;
This calculation shortens each stretch of 1's by p - 1: v is ANDed with its copies shifted by p - 1, ..., 2, 1.
Algorithm (efficient algorithm for binary strings)
Delete repetitions with exponent less than 2
• Example: v = 00111111110010, p = 7
• selfAND(v, p):
    While p > 1
        s = p >> 1;
        v = v & (v >> s);
        p = p - s;
    END
• The number of shift-and-AND steps drops from O(p) to O(log p).
[Figure: values of p and s in each iteration]
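The two ways of shortening every stretch of 1's by p - 1 can be compared directly. The sketch below is a Python transcription under the assumption that the linear version ANDs v with its copies shifted by 1, ..., p - 1; the names `self_and_linear` and `self_and_log` are chosen here. The halving version works because after ANDing with a copy shifted by s = p / 2 (rounded down), the remaining job is the same problem with window size p - s.

```python
def self_and_linear(v: int, p: int) -> int:
    """AND v with its copies shifted right by 1 .. p-1 (O(p) steps)."""
    s = v
    for shift in range(1, p):
        s &= v >> shift
    return s


def self_and_log(v: int, p: int) -> int:
    """Same result with O(log p) shift-and-AND steps."""
    while p > 1:
        s = p >> 1
        v &= v >> s
        p -= s
    return v


v = 0b00111111110010
print(format(self_and_log(v, 7), "014b"))   # 00000000110000
```

In the example the stretch of eight 1's shrinks to two 1's (8 - 6), and the isolated 1 disappears.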
Algorithm (efficient algorithm for binary strings)
• Example: w = 00110011111111, p = 4
  • v = w ^ ((~w) >> p) = 00001111001111 (top p bits cleared)
  • selfAND(v, p) = 00000001000001
  • the repetition at the right end (the trailing 1's of w) is a run with minimum period 1
• We need to remove duplicates.
• 2 approaches to remove repetitions of non-minimum periods:
  • Removing duplicates by Position
  • Removing duplicates by Sieve
Algorithm (removing duplicates by Position)
Only count maximal repetitions with different begin and end positions.
    For period = 1 to length/2 do
        v = (w ^ ((~w) >> period)) & (1^length >> period);
        x = SelfAND(v, period);
        While x ≠ 0 do
            begPos = lsb(x);
            y = x + (1 << begPos);
            x = x & y;
            y = y & (-y);
            y = y << ((period - 1) << 1);
            If (runEndsByBegPos[begPos] & y) = 0 then count++;
            runEndsByBegPos[begPos] = runEndsByBegPos[begPos] | y;
        End
    End
Here 1^length denotes a word of length 1's, so the mask clears the bits that have no partner at distance period.
[Figure: begin and end positions of the same repetition found in w ^ ((~w) >> 2) and in w ^ ((~w) >> 4)]
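The position-based pseudocode can be tried out in Python. This is a sketch under assumptions made here: the binary string is read with the leftmost character as the most significant bit, the match vector uses `>> period`, the complement is masked to n bits because Python integers are unbounded, and the names `self_and` and `count_runs_by_position` are hypothetical. Under this convention the lowest bit of a stretch is the right end of the repetition, and the bit computed in y marks its left end; the slide's variable names are kept.

```python
def self_and(v: int, p: int) -> int:
    """Shorten every stretch of 1's in v by p - 1."""
    while p > 1:
        s = p >> 1
        v &= v >> s
        p -= s
    return v


def count_runs_by_position(w_bits: str) -> int:
    n = len(w_bits)
    full = (1 << n) - 1
    w = int(w_bits, 2)
    run_ends_by_beg_pos = [0] * n
    count = 0
    for period in range(1, n // 2 + 1):
        v = (w ^ ((~w & full) >> period)) & (full >> period)
        x = self_and(v, period)
        while x:
            low = x & -x                    # lowest set bit of x
            beg_pos = low.bit_length() - 1  # lsb(x)
            y = x + low                     # carry ripples through the lowest stretch
            x &= y                          # remove that stretch from x
            y &= -y                         # bit just above the removed stretch
            y <<= (period - 1) << 1         # move to the other endpoint of the repetition
            if (run_ends_by_beg_pos[beg_pos] & y) == 0:
                count += 1                  # endpoint pair not seen before
            run_ends_by_beg_pos[beg_pos] |= y
    return count


print(count_runs_by_position("11110101010"))  # 2
```

For w = 11110101010 the two runs are 1111 (period 1) and 10101010 (period 2); both are found again at a larger period and rejected because their endpoint pair is already recorded.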
Algorithm (removing duplicates by Sieve)
• Example: w = 11110101010
    For period = 1 to length/2 do
        pvec[period] = w ^ ((~w) >> period);
    End
    For period = 1 to length/2 do
        x = SelfAND(pvec[period], period);
        count = count + oneRuns(x);
        For p = 2 * period to length/2 step period do
            x = x & (x >> period);
            If x = 0 then break
            pvec[p] = pvec[p] ^ x;
        End
    End
[Figure: rows w ^ ((~w) >> 1), w ^ ((~w) >> 2), w ^ ((~w) >> 3), w ^ ((~w) >> 4), ...; runs found at one period are XORed out of the rows of larger periods]
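A Python sketch of the sieve follows, under assumptions made here: the inner loop visits only the multiples of the current period, the match vectors use `>> period` with the top bits masked off, the leftmost character is the most significant bit, and the helper names are hypothetical. Each run is counted at its minimum period; its stretch is then XORed out of the vectors of the multiples, where what is left is too short to survive SelfAND.

```python
def self_and(v: int, p: int) -> int:
    """Shorten every stretch of 1's in v by p - 1."""
    while p > 1:
        s = p >> 1
        v &= v >> s
        p -= s
    return v


def one_runs(v: int) -> int:
    """Number of maximal stretches of 1's in v."""
    count = 0
    while v:
        v &= (v | (v - 1)) + 1      # clear the lowest stretch
        count += 1
    return count


def count_runs_by_sieve(w_bits: str) -> int:
    n = len(w_bits)
    full = (1 << n) - 1
    w = int(w_bits, 2)
    half = n // 2
    pvec = [0] * (half + 1)
    for period in range(1, half + 1):
        pvec[period] = (w ^ ((~w & full) >> period)) & (full >> period)
    count = 0
    for period in range(1, half + 1):
        x = self_and(pvec[period], period)
        count += one_runs(x)
        for p in range(2 * period, half + 1, period):   # multiples only
            x &= x >> period
            if x == 0:
                break
            pvec[p] ^= x            # delete these runs from the larger period
    return count


print(count_runs_by_sieve("11110101010"))  # 2
```

With w = 11110101010, period 1 finds 1111 and removes it from the vectors of periods 2 and 3; period 2 finds 10101010 and removes it from the vector of period 4, so nothing is counted twice.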
Algorithm (removing duplicates by Sieve)
Bit operations to count the number of stretches of 1's (oneRuns):
    count = 0;
    While (v ≠ 0)
        v = v & ((v | (v - 1)) + 1);
        count++;
    END
• Example: v = 100111011
  • v | (v - 1) = 100111011
  • (v | (v - 1)) + 1 = 100111100
  • v & ((v | (v - 1)) + 1) = 100111000
  • v | (v - 1) = 100111111
  • v & ((v | (v - 1)) + 1) = 100000000
  • v & ((v | (v - 1)) + 1) = 000000000
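The stretch-counting loop translates to Python almost verbatim. In this sketch the names `one_runs` and `trace` are chosen here; `trace` additionally records the intermediate values so the steps can be inspected. The expression `v | (v - 1)` fills the zeros below the lowest 1, adding 1 carries through the lowest stretch, and the final AND clears that stretch from v.

```python
def one_runs(v: int) -> int:
    """Count maximal stretches of 1's in v, one loop iteration per stretch."""
    count = 0
    while v != 0:
        v = v & ((v | (v - 1)) + 1)
        count += 1
    return count


def trace(v: int, width: int) -> list[str]:
    """Values of v after each iteration, as fixed-width bit strings."""
    steps = []
    while v != 0:
        v = v & ((v | (v - 1)) + 1)
        steps.append(format(v, f"0{width}b"))
    return steps


print(one_runs(0b100111011))      # 3
print(trace(0b100111011, 9))      # ['100111000', '100000000', '000000000']
```

The loop runs once per stretch rather than once per bit, which is what makes it attractive when the words contain only a few stretches.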
Computational Experiments
Calculate run(w) for all binary strings of length n
• CPU: 3.2GHz dual core Xeon
• GPU: GeForce 8800GT
• Memory: 18GB
• OS: Mac OS X 10.5 Leopard
Computational Experiments
GPU: use the programming tool CUDA
    count = 0
    For period = 1 to length/2 do
        pvec[period] = w ^ ((~w) >> period);
    End
    For period = 1 to length/2 do
        x = SelfAND(pvec[period], period);
        count = count + oneRuns(x);
        For p = 2 * period to length/2 step period do
            x = x & (x >> period);
            If x = 0 then break
            pvec[p] = pvec[p] ^ x;
        End
    End
[Figure: GPU architecture, multiprocessors each containing stream processors]
Computational Experiments
Running time (seconds) for calculating run(w) for all binary strings of length n
Computational Experiments
The maximum number of runs function ρ(n) = max { run(w) : |w| = n } for binary strings, calculated for n up to 47
[Table of ρ(n) not preserved; its labels: Kolpakov & Kucherov '99, New!]
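For very small n the experiment can be reproduced by plain enumeration. The Python below is an illustrative sketch, not the bit-parallel or CUDA code used for the reported results. It counts runs with a brute-force routine and tabulates how many binary strings of length n have r runs. The names `run_count`, `distribution` and `rho` are chosen here.

```python
from collections import Counter
from itertools import product


def run_count(w: str) -> int:
    """Brute-force number of runs in w (distinct maximal repetitions)."""
    n = len(w)
    found = set()
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if w[k] != w[k + p]:
                k += 1
                continue
            a = k
            while k + p < n and w[k] == w[k + p]:
                k += 1
            if k - a >= p:                  # exponent at least 2
                found.add((a, k - 1 + p))   # same interval = same run
    return len(found)


def distribution(n: int) -> Counter:
    """Map r -> number of binary strings of length n with exactly r runs."""
    return Counter(run_count("".join(bits)) for bits in product("01", repeat=n))


def rho(n: int) -> int:
    """Maximum number of runs over all binary strings of length n."""
    return max(distribution(n))


print(dict(distribution(5)))   # 18 strings with 1 run, 14 with 2 runs
print(rho(5))                  # 2
```

The enumeration visits 2^n strings, so it is only practical for short lengths; reaching n = 47 is what calls for the word-level algorithms and the GPU.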
Lower and Upper bounds of ρ(n)
[Figure: known bounds on ρ(n) plotted against n]
• Upper bounds: cn [Kolpakov & Kucherov '99], 5n [Rytter '06], 3.48n [Puglisi et al. '08], 3.44n [Rytter '07], 1.6n [Crochemore & Ilie '08], 1.029n [Crochemore et al. '08]
• Lower bounds: 0.927n [Franek & Simpson '06], 0.944565n [Matsubara et al. '08], 0.94457571235n [Matsubara et al. '09] [Simpson '09]
Computational Experiments
f(n, r): number of binary strings of length n with r runs
 n   f(n, 1)   f(n, 2)   f(n, 3)   f(n, 4)
 2         2         0         0         0
 3         6         0         0         0
 4        14         2         0         0
 5        18        14         0         0
 6        18        38         8         0
 7        20        66        38         4
 8        20        98       102        34
 9        20       138       202       130
10        20       170       376       306
11        20       210       596       682
12        20       242       880      1314
13        20       282      1220      2296
14        20       314      1622      3736
15        20       354      2080      5686
16        20       386      2598      8260
17        20       426      3174     11562
18        20       458      3808     15642
19        20       498      4502     20626
20        20       530      5252     26574
21        20       570      6064     33590
22        20       602      6930     41754
23        20       642      7860     51184
24        20       674      8842     61898
25        20       714      9890     74070
26        20       746     10988     87732
27        20       786     12154    103000
28        20       818     13368    119922
29        20       858     14652    138664
30        20       890     15982    159216
31        20       930     17384    181764
32        20       962     18830    206308
33        20      1002     20350    233012
34        20      1034     21912    261896
35        20      1074     23550    293138
36        20      1106     25228    326696
37        20      1146     26984    362804
38        20      1178     28778    401434
39        20      1218     30652    442762
40        20      1250     32562    486776
41        20      1290     34554    533702
42        20      1322     36580    583470
f(n, 2) = f(n - 2, 2) + 72 for n ≥ 9.
f(n, 3) = 2 f(n - 2, 3) - f(n - 4, 3) + 234 for n ≥ 16.
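The two recurrences can be checked mechanically against the tabulated values. The Python sketch below copies the f(n, 2) and f(n, 3) columns for n = 2, ..., 42 from this slide and tests both formulas over the stated ranges. The variable names are chosen here, and the check says nothing about lengths beyond the table.

```python
# Columns f(n, 2) and f(n, 3) for n = 2 .. 42, copied from the table.
F2 = [0, 0, 2, 14, 38, 66, 98, 138, 170, 210, 242, 282, 314, 354, 386,
      426, 458, 498, 530, 570, 602, 642, 674, 714, 746, 786, 818, 858,
      890, 930, 962, 1002, 1034, 1074, 1106, 1146, 1178, 1218, 1250,
      1290, 1322]
F3 = [0, 0, 0, 0, 8, 38, 102, 202, 376, 596, 880, 1220, 1622, 2080,
      2598, 3174, 3808, 4502, 5252, 6064, 6930, 7860, 8842, 9890, 10988,
      12154, 13368, 14652, 15982, 17384, 18830, 20350, 21912, 23550,
      25228, 26984, 28778, 30652, 32562, 34554, 36580]

f2 = dict(zip(range(2, 43), F2))
f3 = dict(zip(range(2, 43), F3))

# f(n, 2) = f(n - 2, 2) + 72 for n >= 9
f2_ok = all(f2[n] == f2[n - 2] + 72 for n in range(9, 43))

# f(n, 3) = 2 f(n - 2, 3) - f(n - 4, 3) + 234 for n >= 16
f3_ok = all(f3[n] == 2 * f3[n - 2] - f3[n - 4] + 234 for n in range(16, 43))

print(f2_ok, f3_ok)   # True True
```

The thresholds matter: the first formula does not hold at n = 8 (98 versus 38 + 72), and the second does not hold at n = 15.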
Conclusion
• We presented 3 bit-parallel algorithms for efficiently computing all the runs in short strings.
  • O(n^2) time if n = O(word size)
  • The first algorithm can be used for strings over larger alphabets at some cost
  • The two latter algorithms are specialized for binary strings* and are very efficient
    * We recently noticed that they can be adapted to handle larger alphabets
• Calculated ρ(n) for binary strings of length up to n = 47