Modeling Delta Encoding of Compressed Files S.T. Klein, T.C. Serebro, D. Shapira
Delta Encoding
Example:
  S = The Prague Stringology Club
  T = The Prague Stringology Conference 06
  Δ = (1, 24)onferenc(3, 2)06
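The example above can be checked mechanically. The following is a minimal sketch, assuming each pair (pos, len) copies len characters of S starting at 1-based position pos and everything else is a literal; the function name apply_delta and the list representation of Δ are illustrative, not taken from the slides.

```python
def apply_delta(source, delta):
    """Rebuild the target from `source` and a list of delta items.

    Each item is either a literal string or a (pos, length) pair that
    copies `length` characters of `source` starting at 1-based `pos`.
    (This representation is an assumption made for illustration.)
    """
    parts = []
    for item in delta:
        if isinstance(item, tuple):
            pos, length = item
            parts.append(source[pos - 1:pos - 1 + length])
        else:
            parts.append(item)
    return "".join(parts)


S = "The Prague Stringology Club"
delta = [(1, 24), "onferenc", (3, 2), "06"]
print(apply_delta(S, delta))  # prints "The Prague Stringology Conference 06"
```

Here (1, 24) copies "The Prague Stringology C" and (3, 2) copies "e " (the letter e followed by a space) from S.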
Compressed Differencing
Goal: create a delta file of S and T without decompressing the compressed files.
  Semi Compressed Differencing: given E(S) and T, produce Δ(S,T)
  Full Compressed Differencing: given E(S) and E(T), produce Δ(S,T)
LZW compression
  STR = input character
  WHILE there are input characters {
    C = input character
    IF STR C is in T then
      STR = STR C
    ELSE {
      output the code for STR
      add STR C to T
      STR = C
    }
  }
  output the code for STR
Example (initial dictionary a = 1, b = 2, c = 3)
  S = abccbaaabccba
  E(S) = 1 2 3 3 2 1 9 5 7 1
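The LZW pseudocode and the example can be reproduced with a short sketch. It assumes the dictionary is initialised with the alphabet in the order a, b, c (codes 1, 2, 3) and that each new string gets the next free code; the function name lzw_encode is illustrative.

```python
def lzw_encode(text, alphabet):
    """LZW-encode `text`, returning a list of integer codewords.

    The dictionary starts with one code per alphabet character,
    numbered from 1 (an assumption matching the slide's example).
    """
    table = {ch: i + 1 for i, ch in enumerate(alphabet)}
    codes = []
    if not text:
        return codes
    cur = text[0]                      # STR in the pseudocode
    for ch in text[1:]:                # C in the pseudocode
        if cur + ch in table:
            cur += ch
        else:
            codes.append(table[cur])
            table[cur + ch] = len(table) + 1
            cur = ch
    codes.append(table[cur])           # flush the last string
    return codes


print(lzw_encode("abccbaaabccba", "abc"))     # [1, 2, 3, 3, 2, 1, 9, 5, 7, 1]
print(lzw_encode("ccbbabccbabccbba", "abc"))  # [3, 3, 2, 2, 1, 2, 4, 7, 9, 5, 7]
```

The second call gives the codewords of the target T = ccbbabccbabccbba, which the slides write as E(T) = 33221247957.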
Semi Compressed Differencing Algorithm
  construct the trie of E(S)
  i ← 1
  while i ≤ u {            // u = |T|
    P ← suffix of T starting at position i
    Starting at the root, traverse the trie using P
    When a leaf v is reached
      k ← depth of v in trie
      output the position in S corresponding to v
    i ← i + k
  }
Example
  E(S) = 1233219571, T = ccbbabccbabccbba
  Δ(S,T) = (3,2) b (5,2) (9,3) (5,2) (9,3) b (5,2)
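The semi compressed case can be sketched as follows. This is not the authors' implementation: a Python dict of strings stands in for the trie, matches of length 1 are emitted as plain characters (as in the example), and the names lzw_phrases and semi_delta are hypothetical. The dictionary strings and their positions in S are rebuilt from the codewords of E(S) alone.

```python
def lzw_phrases(codes, alphabet):
    """Map each LZW dictionary string of length >= 2 to the 1-based
    position in S where it was first created, using only the codewords."""
    table = {i + 1: ch for i, ch in enumerate(alphabet)}
    positions = {}
    pos = 1                      # start of the current codeword's string in S
    prev, prev_pos = None, None
    for code in codes:
        if code in table:
            cur = table[code]
        else:                    # codeword refers to the string being built
            cur = prev + prev[0]
        if prev is not None:
            phrase = prev + cur[0]
            table[len(table) + 1] = phrase
            positions.setdefault(phrase, prev_pos)
        prev, prev_pos = cur, pos
        pos += len(cur)
    return positions


def semi_delta(codes, alphabet, target):
    """Greedy longest-match of `target` against the dictionary of E(S)."""
    positions = lzw_phrases(codes, alphabet)
    out, i = [], 0
    while i < len(target):
        k = 1
        # LZW dictionaries are prefix-closed, so extend one character at a time.
        while i + k < len(target) and target[i:i + k + 1] in positions:
            k += 1
        if k == 1:
            out.append(target[i])                       # literal character
        else:
            out.append((positions[target[i:i + k]], k))  # (pos in S, len)
        i += k
    return out


E_S = [1, 2, 3, 3, 2, 1, 9, 5, 7, 1]
print(semi_delta(E_S, "abc", "ccbbabccbabccbba"))
# [(3, 2), 'b', (5, 2), (9, 3), (5, 2), (9, 3), 'b', (5, 2)]
```

A real trie would avoid building substrings of T on every step; the dict keeps the sketch short.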
Full Compressed Differencing Algorithm
  1 construct the trie of E(S)
  2 flag ← 0                 // output character k
  3 counter ← 1              // position in T
  4 input oldcw from E(T)
  5 while oldcw ≠ NULL       // still processing E(T)
  {
    5.1 input cw from E(T)
    5.2 node ← Dictionary[oldcw]
    5.3 if (Dictionary[cw] ≠ NULL)
      5.3.1 k ← first character of string corresponding to Dictionary[cw]
    5.4 else
      5.4.1 k ← first character of string corresponding to node
    5.5 if ((node has a child k) and (cw ≠ NULL))
      5.5.1 output (pos+flag, len-flag) corresponding to child k of node
      5.5.2 flag ← 1
    5.6 else
      5.6.1 output (pos+flag, len-flag) corresponding to node
      5.6.2 create a new child of node corresponding to k
      5.6.3 flag ← 0
    5.7 pos of child k of node ← counter
    5.8 oldcw ← cw
    5.9 counter ← counter + len - flag
  }
Example
  E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba

State of the algorithm as E(T) is processed (T is the part of the target decoded so far):

  T                   oldcw  cw    flag  k
  c                   3
  cc                  3      3           c
  cc                  3      3     1     c
  ccb                 3      2     1     b
  ccbb                2      2     0     b
  ccbba               2      1     1     a
  ccbbab              1      2     1     b
  ccbbabcc            2      4     1     c
  ccbbabccba          4      7     0     b
  ccbbabccbabc        7      9     0     b
  ccbbabccbabccb      9      5     0     c
  ccbbabccbabccbba    5      7     1     b
  ccbbabccbabccbba    7      Null  0     b

Δ(S,T) as it grows from slide to slide (pairs <pos, len> refer to S, pairs (pos, len) refer to T):

  <3,2>
  <3,2> <5,1>
  <3,2> <5,1> <b,0>
  <3,2> <5,1> <5,2>
  <3,2> <5,1> <5,2> <2,1>
  <3,2> <5,1> <5,2> <2,1> <3,1>
  <3,2> <5,1> <5,2> <2,1> <3,1> (2,1)
  <3,2> <5,1> <5,2> <2,1> <3,1> (2,1) (4,2) <9,3> (3,1) (4,2)

Nodes added to the trie while processing E(T), labelled as on the slides with code (pos, len, character):

  4 (1,2,c)    5 (2,2,c)    6 (3,2,b)    7 (4,2,b)    8 (5,2,a)
  9 (6,2,b)    10 (7,3,c)   11 (9,3,b)   12 (11,3,b)  13 (13,3,c)
Combination of Pairs
If two consecutive ordered pairs are of the form <pos, len1> and <pos + len1, len2>, we combine them into the single ordered pair <pos, len1 + len2>.

  S = abccbaaabccba
  Before: Δ(S,T) = <3,2> <5,1> <5,2> <2,1> <3,1> (2,1) (4,2) <9,3> (3,1) (4,2)
    <3,2> <5,1> become <3,3>
    <2,1> <3,1> become <2,2>
  After:  Δ(S,T) = <3,3> <5,2> <2,2> c (4,2) <9,3> b (4,2)

In the final delta, the length-1 pairs (2,1) and (3,1) appear as the characters c and b.
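The combination step can be sketched as a single pass over a run of pairs. The sketch below assumes the rule is "merge when the second pair starts exactly where the first one ends", which is what the example <3,2> <5,1> → <3,3> shows, and it is applied only to the run of pairs that refer to S; how pairs referring to T are treated is not spelled out here. The name combine_pairs is illustrative.

```python
def combine_pairs(pairs):
    """Merge consecutive (pos, length) pairs that are adjacent in the
    same file: (p, l1) followed by (p + l1, l2) becomes (p, l1 + l2)."""
    merged = []
    for pos, length in pairs:
        if merged and merged[-1][0] + merged[-1][1] == pos:
            prev_pos, prev_len = merged[-1]
            merged[-1] = (prev_pos, prev_len + length)
        else:
            merged.append((pos, length))
    return merged


# The run of S-pairs from the example: <3,2> <5,1> <5,2> <2,1> <3,1>
print(combine_pairs([(3, 2), (5, 1), (5, 2), (2, 1), (3, 1)]))
# [(3, 3), (5, 2), (2, 2)]
```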
Encoding the delta file
File consists of:
  (pos, len) in S
  (pos, len) in T
  Characters
  flags
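To see how the three kinds of items fit together, here is a minimal decoding sketch. The on-disk layout and the meaning of the flags are not given on the slide, so the in-memory representation is an assumption: a tuple ("S", pos, len) copies from S, ("T", pos, len) copies from the part of T rebuilt so far, and a plain string is a literal. Positions are 1-based, and decode_delta is a hypothetical name.

```python
def decode_delta(source, delta):
    """Rebuild T from S and a delta made of S-copies, T-copies and literals.

    The ("S"/"T", pos, length) tagging stands in for the slide's flags;
    it is an illustrative choice, not the authors' file format.
    """
    target = []                                # characters of T rebuilt so far
    for item in delta:
        if isinstance(item, str):
            target.extend(item)                # literal character(s)
            continue
        origin, pos, length = item
        buf = source if origin == "S" else target
        for i in range(pos - 1, pos - 1 + length):
            target.append(buf[i])              # char by char, so T-copies may overlap
    return "".join(target)


S = "abccbaaabccba"
delta = [("S", 3, 3), ("S", 5, 2), ("S", 2, 2), "c",
         ("T", 4, 2), ("S", 9, 3), "b", ("T", 4, 2)]
print(decode_delta(S, delta))  # ccbbabccbabccbba
```

The delta used here is the combined result <3,3> <5,2> <2,2> c (4,2) <9,3> b (4,2) from the example, and it decodes back to T.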
Experiments:
  S = xfig.3.2.1
  T = xfig.3.2.2
  |T| = 812K
  |Gzip(T)| = 325K
  |LZW(T)| = 497K
  |Δ(S,T)| ≈ 3K