HW 1 solution comments • superstring question from last week • (Patchrawat’s solution) • aa(ba)^n • (ba)^n ba • a(ba)^n bb • Comparison • 2n+6 versus 4n+5, a ratio that approaches 2 as n grows
Problem 1: Tandem Arrays • 1: Two comments about overlap examples • Definition: more than one occurrence of pattern • Overlap: • b = aaa • String: aaaaaaaaaaa • Array1: aaaaaaaaa • Array2: aaaaaaaaa • Array3: aaaaaaaaa
Tandem Arrays continued • Computing efficiently • Run the Z algorithm on b$S • This gives an array of Z-values • Occurrences of b are marked by Z-values equal to n = |b| • Z-values for the previous example • 3 3 3 3 3 3 3 3 3 2 1 • Process right to left • When an entry has a value of at least n, check the entry n positions to its left • If that entry has value n, add the current value to it • If not, and the current value is > n, output the current location and the current value divided by n (the number of copies)
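The right-to-left pass on this slide can be sketched in Python as follows. This is a minimal sketch, not the course's reference solution: the names `z_array` and `tandem_arrays` are hypothetical, positions are reported 1-based, and the separator `$` is assumed not to occur in b or S.

```python
def z_array(s):
    """Standard Z algorithm: z[i] is the length of the longest
    substring starting at i that matches a prefix of s."""
    n = len(s)
    z = [0] * n
    if n:
        z[0] = n
    left = right = 0  # current Z-box is s[left:right]
    for i in range(1, n):
        if i < right:
            z[i] = min(right - i, z[i - left])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def tandem_arrays(b, s, sep="$"):
    """Return (1-based start, number of copies) for each maximal
    tandem array of b in s, using the right-to-left pass."""
    n = len(b)
    # Keep only the Z-values that belong to s (skip b and the separator).
    total = z_array(b + sep + s)[n + 1:]
    found = []
    for i in range(len(s) - 1, -1, -1):
        if total[i] < n:
            continue  # no occurrence of b starts here
        j = i - n     # entry n positions to the left
        if j >= 0 and total[j] == n:
            total[j] += total[i]  # extend the array that starts at j
        elif total[i] > n:
            found.append((i + 1, total[i] // n))
    return sorted(found)


print(tandem_arrays("aaa", "aaaaaaaaaaa"))  # [(1, 3), (2, 3), (3, 3)]
```

On the overlap example (b = aaa, S = eleven a's) this reports three arrays of three copies each, starting at positions 1, 2, and 3.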
Problems 2 and 3 • 2: everyone did well • 3: Most did fine, but I wanted a more precise answer in some cases
Problem 3 • [Figure: Z-box diagram with labels α, α, k’, k, r] • Case Z_k’ > |β| needs no comparisons • P(r+1) != P(|α|+1), or else the current Z-box would be larger • P(|α|+1) = P(|β|+1) since Z_k’ > |β| • Therefore, P(r+1) != P(|β|+1) and Z_k = |β|
Problem 5 • Complaint: many answers had 3n character assignments and essentially read the characters 3n times in total • Better answer: FSA approach • [Figure: FSA fragment; labels: "GG,1", "GA,2", "A, output GGA to frame 1"]
Problems 6 and 8 • Most submitted programs were ok • I tried to write comments somewhere on your assignments if there were any bugs • In the future, provide • a README file or makefile • clear input instructions • a way for me to enter test cases so I can try simple values
Problems 9 and 10 • 9: Please submit using handin so that I can more easily use it to test any programs • Should be fairly comprehensive • 10: A few people wrote some comments down and maybe an example • Empirical means experimental • Design a set of tests with inputs of some type • Characterize your input set • Give me summarized statistical data on how the various algorithms did
Problem 7 • Key idea • While one shift with just the bad character rule may be worse than one shift with the max of the bad character and good suffix rules, future shifts may pay off • A couple of people had correct solutions where bad character alone was better, but I would like you to push it a little to see how much better it can be • Example • Text: a(a^(n-1)x)^k • Pattern: ba^(n-1) • n+k comparisons (bad character alone) versus kn comparisons (both rules)
Bad character example • n=4, k=4 • aaaaxaaaxaaaxaaax • baaa • baaa • baaa • baaa • baaa
Same example with both rules • n=4, k=4 • aaaaxaaaxaaaxaaax • baaa • baaa • baaa • baaa • baaa
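The comparison counts for this example can be checked with a small simulation. The sketch below makes several assumptions: it uses the simple bad character rule (rightmost occurrence anywhere in the pattern), computes the strong good suffix shift by brute force rather than by preprocessing, shifts by 1 when the mismatch is at the last pattern character, and shifts by 1 after a full match. The function names are hypothetical.

```python
def good_suffix_shift(pattern, j):
    """Strong good suffix shift for a mismatch at 0-based index j,
    found by brute force (fine for a small demonstration)."""
    m = len(pattern)
    if j == m - 1:
        return 1  # nothing matched: assume a shift of 1
    for shift in range(1, m):
        # The matched suffix must agree with the shifted pattern.
        suffix_ok = all(pattern[i - shift] == pattern[i]
                        for i in range(j + 1, m) if i - shift >= 0)
        # The character now under position j must differ from pattern[j].
        differs = j - shift < 0 or pattern[j - shift] != pattern[j]
        if suffix_ok and differs:
            return shift
    return m


def count_comparisons(text, pattern, use_good_suffix):
    """Count character comparisons made by a right-to-left scan."""
    m = len(pattern)
    rightmost = {c: i for i, c in enumerate(pattern)}
    comparisons = 0
    start = 0
    while start + m <= len(text):
        j = m - 1
        while j >= 0:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            shift = 1  # full match: simplest safe shift
        else:
            shift = max(1, j - rightmost.get(text[start + j], -1))
            if use_good_suffix:
                shift = max(shift, good_suffix_shift(pattern, j))
        start += shift
    return comparisons


n, k = 4, 4
text = "a" + ("a" * (n - 1) + "x") * k   # aaaaxaaaxaaaxaaax
pattern = "b" + "a" * (n - 1)            # baaa
print(count_comparisons(text, pattern, False))  # 8  (n + k)
print(count_comparisons(text, pattern, True))   # 16 (k * n)
```

With bad character alone, the first alignment costs n comparisons and every later alignment mismatches immediately on x. With both rules, the good suffix shift of n keeps landing the pattern where it matches n-1 characters before failing.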
Problem 4 • Hard problem • All answers had mistakes or were very vague about how to update the mapping as we changed the starting point of our z-box • Consider the following example
Example • Parameters: a, b, c, d, e • Tokens: X • P = aXXabXXbaX • T = ecXXcdXXdeXXedX • Z values for P • Position: 1 2 3 4 5 6 7 8 9 0 • Character: a X X a b X X b a X • Z-value: - 0 0 1 6 0 0 1 2 0 • copy to board
Example continued • ecXXcdXXdeXXedX • aXXabXXbaX • 08 • a maps to c • b maps to d • 001
Example continued • ecXXcdXXdeXXedX • aXXabXXbaX • 08001 • aXXabXXbaX • From P, we have 6 for the next entry, which extends beyond the Z-box window of 4 • By Problem 3, this would be just 4, but the right answer is 10 • Now the mapping is a to d, b to e, and we need to do this WITHOUT going backwards and rechecking previously checked positions. How?
Offset Array • Offset array for P • aXXabXXbaX • 1234567890 • 0003000350 • Offset array for T • ecXXcdXXdeXXedX • 123456789012345 • 000030003900350 • Matching • Match if each offset is either 0 or points to the left of the current Z-box • Otherwise, match only if the two offsets are identical
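The two offset arrays on this slide can be reproduced with a short Python sketch. It assumes an offset is the distance back to the previous occurrence of the same parameter, with 0 for a first occurrence and for every token. That reading is inferred from the arrays shown here. The name `offset_array` and its `tokens` argument are hypothetical.

```python
def offset_array(s, tokens=frozenset("X")):
    """Distance back to the previous occurrence of each parameter.
    Tokens and first occurrences get offset 0."""
    last_seen = {}
    offsets = []
    for i, c in enumerate(s):
        if c in tokens or c not in last_seen:
            offsets.append(0)
        else:
            offsets.append(i - last_seen[c])
        if c not in tokens:
            last_seen[c] = i
    return offsets


print("".join(map(str, offset_array("aXXabXXbaX"))))       # 0003000350
print("".join(map(str, offset_array("ecXXcdXXdeXXedX"))))  # 000030003900350
```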
Example with offsets • ecXXcdXXdeXXedX • aXXabXXbaX • 08001 • aXXabXXbaX • offset for e is 9 which is outside z-box • offset for b is 0 • offset for d is 5 • offset for a is 5
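The offset comparison in this example can be sketched as a direct check of one alignment. This is a naive illustration of the matching rule only, not the linear-time Z computation the problem asks for: it rescans from the start of each alignment. It assumes an offset that reaches back past the start of the alignment is treated as 0. The names `offset_array` and `p_match_length` are hypothetical.

```python
def offset_array(s, tokens=frozenset("X")):
    """Distance back to the previous occurrence of each parameter.
    Tokens and first occurrences get offset 0."""
    last_seen = {}
    offsets = []
    for i, c in enumerate(s):
        if c in tokens or c not in last_seen:
            offsets.append(0)
        else:
            offsets.append(i - last_seen[c])
        if c not in tokens:
            last_seen[c] = i
    return offsets


def p_match_length(p, t, start, tokens=frozenset("X")):
    """Length of the parameterized match of p against t[start:]
    (start is 0-based)."""
    p_off, t_off = offset_array(p, tokens), offset_array(t, tokens)
    length = 0
    for j in range(len(p)):
        i = start + j
        if i >= len(t):
            break
        if p[j] in tokens or t[i] in tokens:
            # Tokens must match exactly.
            if p[j] != t[i]:
                break
        else:
            # An offset pointing left of the alignment counts as 0.
            a = p_off[j] if p_off[j] <= j else 0
            b = t_off[i] if t_off[i] <= j else 0
            if a != b:
                break
        length += 1
    return length


P, T = "aXXabXXbaX", "ecXXcdXXdeXXedX"
print(p_match_length(P, T, 1))  # 8  (position 2 of T)
print(p_match_length(P, T, 5))  # 10 (position 6 of T)
```

At position 6 of T, the e at position 10 has offset 9, which reaches left of the alignment, so it is treated as 0 and matches b's offset of 0. The d at position 14 and the a at position 9 of P both have offset 5.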