Joint Advanced Student School 2004 Complexity Analysis of String Algorithms

Sequential Pattern Matching: Analysis of Knuth-Morris-Pratt type algorithms using the Subadditive Ergodic Theorem. 03 April 2014. Overview: Pattern Matching, Sequential Algorithms, Knuth-Morris-Pratt algorithm.

favian
Presentation Transcript


  1. Joint Advanced Student School 2004 Complexity Analysis of String Algorithms. Sequential Pattern Matching: Analysis of Knuth-Morris-Pratt type algorithms using the Subadditive Ergodic Theorem. 03 April 2014

  2. Overview • Pattern Matching • Sequential Algorithms • Knuth-Morris-Pratt-Algorithm • Probabilistic tools • Subadditive Ergodic Theorem • Martingales and Azuma's Inequality • Analysis of KMP-Algorithms • Properties of KMP • Establishing subadditivity • Analysis

  3. Pattern Matching • Text t, pattern p • Comparison: M(l,k)=1 if text letter t_l is compared with pattern letter p_k • Alignment position: a text position AP with M(AP+(k-1),k)=1 for some k. • Example: pattern p = abcde, text t = xxxxxabxxxabcxxxabcde

  4. Sequential Algorithms - Definition • Semi-sequential: (i) APs are non-decreasing. • Strongly semi-sequential: (i), and (ii) comparisons define non-decreasing text positions. • Sequential: (i), and (iii) text is compared only if it follows a prefix of the pattern. • Strongly sequential: (i), (ii) and (iii). • Example: pattern abcde, text xxxxxabxxxabcxxxabcde

  5. Example: Naive / brute force algorithm • Every text position is an alignment position. • Text is scanned until the pattern is found (then done) or a mismatch occurs (then shift by one and retry). • Sequential algorithm. • Example: pattern abcde shifted by +1 along text xxxxxabxxxabcxxxabcde
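
The brute-force procedure on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `naive_search` and the choice to count single character comparisons are assumptions made here, not part of the slides.

```python
def naive_search(text: str, pattern: str) -> tuple[int, int]:
    """Return (index of the first occurrence or -1, number of character comparisons)."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for ap in range(n - m + 1):  # every text position is an alignment position
        k = 0
        while k < m:
            comparisons += 1
            if text[ap + k] != pattern[k]:
                break  # mismatch: shift by one and retry
            k += 1
        if k == m:
            return ap, comparisons  # pattern found: done
    return -1, comparisons


if __name__ == "__main__":
    # The text and pattern used on the slides.
    print(naive_search("xxxxxabxxxabcxxxabcde", "abcde"))
```

On a text such as `aaaaaa` with pattern `aab` every alignment costs three comparisons, which shows how the count can grow towards n times m in the worst case.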

  6. Knuth-Morris-Pratt type algorithms (1) • Idea (Morris-Pratt): Disregard APs already known not to be followed by a prefix of p. • Knowledge: the already processed pattern, and pre-processing of p. • Strongly sequential algorithm. • Example: pattern ababcde shifted by +S along text xxxxxabxxxabcxxxabcde

  7. Knuth-Morris-Pratt type algorithms (2) • Morris-Pratt: • Knuth-Morris-Pratt: (KMP also skips mismatching letters) • Example: pattern ababcde, text xxxxxabxxxabcxxxabcde
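
A minimal sketch of the Morris-Pratt variant, assuming the usual border (failure) table as the pre-processing of p. The shift formulas from the slide are not preserved in this transcript, so the code below is a textbook-style formulation rather than the presenter's exact definition. The names `failure_table` and `mp_search` are hypothetical, and the pattern is assumed to be non-empty.

```python
def failure_table(pattern: str) -> list[int]:
    """border[j] = length of the longest proper border of pattern[:j]."""
    m = len(pattern)
    border = [0] * (m + 1)
    k = 0
    for j in range(1, m):
        while k > 0 and pattern[j] != pattern[k]:
            k = border[k]
        if pattern[j] == pattern[k]:
            k += 1
        border[j + 1] = k
    return border


def mp_search(text: str, pattern: str) -> tuple[list[int], int]:
    """Return (start indices of all occurrences, number of text-pattern comparisons)."""
    border = failure_table(pattern)
    m = len(pattern)
    matches, comparisons = [], 0
    j = 0  # number of pattern letters currently matched
    for i, ch in enumerate(text):
        while True:
            comparisons += 1
            if ch == pattern[j]:
                j += 1
                break
            if j == 0:
                break  # no prefix left to fall back to; move to the next text letter
            j = border[j]  # shift the pattern, keeping the longest known prefix
        if j == m:
            matches.append(i - m + 1)
            j = border[m]
    return matches, comparisons


if __name__ == "__main__":
    print(mp_search("xxxxxabxxxabcxxxabcde", "abcde"))
```

Each text letter is compared at least once and every fallback is paid for by an earlier match, so the comparison count stays between n and 2n. Knuth-Morris-Pratt would additionally refine the table so that a fallback never repeats a comparison against the same mismatching letter.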

  8. Pattern Matching - Complexity • Overall complexity: • Pattern or text is a realization of a random sequence: • Question: complexity of KMP?

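  9. Subadditivity – Deterministic Sequence (Fekete, 1923) • Subadditivity: x_{m+n} ≤ x_m + x_n implies lim x_n/n = inf_{m≥1} x_m/m • Superadditivity: x_{m+n} ≥ x_m + x_n implies lim x_n/n = sup_{m≥1} x_m/m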

  10. Example: Longest Common Subsequence • Superadditive: • Hence: • Example: for ababcafbcdabcde and abcdeabcdfabcab the LCS is "abcabcdabc" (10); splitting them into ababcafb + cdabcde and abcdeabc + dfabcab gives LCS "abcab" (5) and "dabc" (4). (Conjectured by Steele in 1982)
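
The superadditivity in this example can be checked numerically. The sketch below uses the standard dynamic-programming recurrence for the LCS length; the function name `lcs_length` and the split point 8 (taken from the strings on the slide) are the only assumptions.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence of x and y (row-by-row dynamic programming)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


x, y = "ababcafbcdabcde", "abcdeabcdfabcab"
whole = lcs_length(x, y)
left = lcs_length(x[:8], y[:8])    # ababcafb vs abcdeabc
right = lcs_length(x[8:], y[8:])   # cdabcde vs dfabcab
print(whole, left, right, whole >= left + right)
```

Concatenating a common subsequence of the left halves with one of the right halves always gives a common subsequence of the full strings, which is why the inequality can never fail.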

  11. Subadditivity – "Almost subadditive" (de Bruijn and Erdős, 1952) • positive and non-decreasing sequence • "Almost subadditive":
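
The formulas of this slide are missing from the transcript. As a hedged reconstruction, the de Bruijn-Erdős condition is usually stated as below. The second display is the special case of a constant error term a, where shifting the sequence restores plain subadditivity. The symbols c_n, a and y_n are notation assumed here for illustration.

```latex
% de Bruijn-Erdős (usual textbook form; c_n positive and non-decreasing)
\[
x_{m+n} \le x_m + x_n + c_{m+n}
\quad\text{and}\quad
\sum_{k \ge 1} \frac{c_k}{k^2} < \infty
\;\Longrightarrow\;
\lim_{n \to \infty} \frac{x_n}{n} \ \text{exists.}
\]

% Special case: constant error term a >= 0, with y_n := x_n + a
\[
x_{m+n} \le x_m + x_n + a
\;\Longrightarrow\;
y_{m+n} = x_{m+n} + a \le (x_m + a) + (x_n + a) = y_m + y_n ,
\]
\[
\text{so by Fekete's lemma}\quad
\lim_{n \to \infty} \frac{x_n}{n}
= \lim_{n \to \infty} \frac{y_n}{n}
= \inf_{m \ge 1} \frac{x_m + a}{m} .
\]
```

The constant case is the one relevant later in the talk, where the error term depends only on the pattern length and not on the text length.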

  12. Subadditive Ergodic Theorem (Kingman, 1976; Liggett, 1985) • is a stationary sequence • does not depend on m

  13. Almost Subadditive Ergodic Theorem (Derriennic, 1983) • Subadditivity can be relaxed to with • Then, too:
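
Since the hypotheses and conclusion are not preserved in the transcript, here is the subadditive ergodic theorem in the form commonly attributed to Liggett, as a reference sketch. The notation X_{m,n}, gamma and c is assumed here, and the slide may have used a slightly different set of conditions.

```latex
% Subadditive ergodic theorem, Liggett's set of hypotheses (assumed notation)
\begin{align*}
&\text{(i)}   && X_{0,n} \le X_{0,m} + X_{m,n} \quad \text{for } 0 < m < n, \\
&\text{(ii)}  && \{X_{nk,(n+1)k}\}_{n \ge 1} \text{ is a stationary sequence for each } k \ge 1, \\
&\text{(iii)} && \text{the distribution of } \{X_{m,m+k}\}_{k \ge 1} \text{ does not depend on } m, \\
&\text{(iv)}  && \mathbf{E}\,[X_{0,1}^{+}] < \infty \ \text{ and } \ \mathbf{E}\,[X_{0,n}] \ge -c\,n \text{ for some constant } c .
\end{align*}
Then
\[
\gamma = \lim_{n \to \infty} \frac{\mathbf{E}\,[X_{0,n}]}{n}
       = \inf_{n \ge 1} \frac{\mathbf{E}\,[X_{0,n}]}{n},
\qquad
X = \lim_{n \to \infty} \frac{X_{0,n}}{n} \ \text{ exists a.s. and in } L^1,
\qquad
\mathbf{E}\,[X] = \gamma ,
\]
and $X = \gamma$ almost surely if the stationary sequences in (ii) are ergodic.
```

For pattern matching, X_{m,n} would play the role of the number of comparisons made on the text segment between positions m and n.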

  14. Martingales • A sequence Y_n is a martingale with respect to the filtration F_n if for all n: E[Y_{n+1} | F_n] = Y_n • E[Y_{n+1} | F_n] defines a random variable depending on the knowledge contained in F_n.

  15. Martingale Differences • The martingale difference is defined as D_n = Y_n - Y_{n-1}, so that: Y_n = Y_0 + Σ_{i=1}^{n} D_i • Observe: E[D_{n+1} | F_n] = 0

  16. Azuma's Inequality (1) • Let Y_n be a martingale • Define the martingale difference as D_i = E[Y_n | F_i] - E[Y_n | F_{i-1}] (the mean of the same element but depending on different knowledge) • Observe: Y_n - E[Y_n] = Σ_{i=1}^{n} D_i (deviation from the mean)

  17. Hoeffding's Inequality • Let Y_n be a martingale • Let there exist constants c_i such that |Y_i - Y_{i-1}| ≤ c_i • Then: P(|Y_n - Y_0| ≥ x) ≤ 2 exp(-x² / (2 Σ_{i=1}^{n} c_i²))
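
To get a feeling for the bound, one can compare it with an exactly computable case. The sketch below assumes the usual Azuma-Hoeffding form P(|Y_n - Y_0| ≥ x) ≤ 2 exp(-x² / (2 Σ c_i²)) and applies it to a simple symmetric ±1 random walk, which is a martingale with c_i = 1. The function names are hypothetical and the random walk is an example chosen here, not one from the slides.

```python
from math import comb, exp


def walk_tail(n: int, x: float) -> float:
    """Exact P(|S_n| >= x), where S_n is a sum of n independent fair +1/-1 steps."""
    # S_n = 2*H - n with H ~ Binomial(n, 1/2), so count the outcomes far from 0.
    hits = sum(comb(n, h) for h in range(n + 1) if abs(2 * h - n) >= x)
    return hits / 2 ** n


def hoeffding_bound(n: int, x: float, c: float = 1.0) -> float:
    """Right-hand side 2*exp(-x^2 / (2*n*c^2)) for n differences bounded by c."""
    return 2 * exp(-x * x / (2 * n * c * c))


if __name__ == "__main__":
    for x in (10, 20, 30):
        print(x, walk_tail(100, x), hoeffding_bound(100, x))
```

The exact tail is always below the bound, and both decay rapidly once x exceeds a few multiples of the square root of n.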

  18. Azuma's Inequality (2) • Summary: • If the martingale difference is bounded, we know how to assess the deviation from the mean. • So now we need a bound on the martingale difference. • Trick: Let be an independent copy of . • Then:

  19. Azuma's Inequality (3) • Hence: • And we can postulate:

  20. Azuma's Inequality (4) • Let be a martingale • If there exists a constant such that where is an independent copy of • Then:

  21. KMP: Unavoidable alignment positions • A position in the text is called an unavoidable AP if, for any r, l, it is an AP when the algorithm is run on the text segment from r to l. • KMP-like algorithms have the same set of unavoidable alignment positions, where • Example: pattern abcde, text xxxxxabxxxabcxxxabcde

  22. Pattern Matching: l-convergence • An algorithm is l-convergent if there exists an increasing sequence of unavoidable alignment positions U_i satisfying U_{i+1} - U_i ≤ l. • l-convergence bounds the maximum size of the "jumps" an algorithm makes.

  23. KMP: Establishing m-convergence • Let AP be an alignment position • Define: • Hence: and so KMP-like algorithms are m-convergent.

  24. KMP: Establishing subadditivity (1) • If the number of comparisons is subadditive, we can prove linear complexity of KMP-like algorithms. • We have to show that the number of comparisons is (almost) subadditive: • Approach: An l-convergent sequential algorithm satisfies:

  25. KMP: Establishing subadditivity (2) • Proof: • : the smallest unavoidable AP greater than r. • We split into and .

  26. KMP: Establishing subadditivity (3) • Comparisons done after r with AP before r: • Comparisons with AP between r and the smallest unavoidable AP greater than r: • No more than m comparisons can be saved at • Figure: regions S1 and S2, annotated with the comparison counts they contribute to.

  27. KMP: Establishing subadditivity (4) • Comparisons with AP between r and the smallest unavoidable AP greater than r: • No more than m comparisons can be saved at • Figure: region S3, annotated with the comparison counts it contributes to.

  28. KMP: Establishing subadditivity (5) • So we are able to bound: • We have shown that the number of comparisons is (almost) subadditive: • Now we are able to apply the Subadditive Ergodic Theorem.
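
The almost-subadditivity can be observed experimentally. The sketch below counts Morris-Pratt comparisons on a whole text and on its two parts split at r, then reports the largest gap over all split points. The exact error term from the slide is not preserved; the value 2m² used for comparison is an assumption made here for illustration, as are the function names, the binary alphabet and the pattern.

```python
import random


def mp_comparisons(text: str, pattern: str) -> int:
    """Number of text-pattern comparisons made by a Morris-Pratt scan (pattern non-empty)."""
    m = len(pattern)
    border = [0] * (m + 1)  # border[j] = longest proper border of pattern[:j]
    k = 0
    for j in range(1, m):
        while k > 0 and pattern[j] != pattern[k]:
            k = border[k]
        if pattern[j] == pattern[k]:
            k += 1
        border[j + 1] = k
    count, j = 0, 0
    for ch in text:
        while True:
            count += 1
            if ch == pattern[j]:
                j += 1
                break
            if j == 0:
                break
            j = border[j]
        if j == m:
            j = border[m]
    return count


def subadditivity_gap(text: str, pattern: str, r: int) -> int:
    """c(whole text) - (c(text before r) + c(text from r on))."""
    whole = mp_comparisons(text, pattern)
    return whole - (mp_comparisons(text[:r], pattern) + mp_comparisons(text[r:], pattern))


rng = random.Random(0)
pattern = "abaab"
text = "".join(rng.choice("ab") for _ in range(300))
worst = max(abs(subadditivity_gap(text, pattern, r)) for r in range(len(text) + 1))
print("largest gap:", worst, "assumed bound 2*m^2:", 2 * len(pattern) ** 2)
```

The gap stays bounded by a quantity depending only on m because the two runs can disagree only on the few text positions just after r, where the full run still carries an alignment that started before r.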

  29. KMP: Different Modeling Assumptions • Deterministic Model: Text and pattern are non-random. • Semi-Random Model: Text is a realization of a stationary and ergodic sequence, pattern is given. • Stationary Model: Both text and pattern are realizations of a stationary and ergodic sequence.

  30. KMP: Applying the Subadditive Ergodic Theorem • We have shown that the number of comparisons is (almost) subadditive • Deterministic Model: • Semi-Random Model: • Stationary Model:

  31. KMP: Applying Azuma's Inequality • The number of comparisons satisfies: where is an independent copy of . • So, using Azuma's Inequality: • The number of comparisons is concentrated around its mean:
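
Both statements, a linear growth rate and concentration around the mean, can be illustrated by simulation in the semi-random model. The sketch below fixes a pattern, draws random texts of increasing length and prints the mean and spread of the ratio of comparisons to text length. The memoryless binary text source, the pattern `abaab` and all function names are assumptions chosen for illustration, and the printed numbers are estimates, not the constant from the theorem.

```python
import random
from statistics import mean, pstdev


def mp_comparisons(text: str, pattern: str) -> int:
    """Number of text-pattern comparisons made by a Morris-Pratt scan (pattern non-empty)."""
    m = len(pattern)
    border = [0] * (m + 1)  # border[j] = longest proper border of pattern[:j]
    k = 0
    for j in range(1, m):
        while k > 0 and pattern[j] != pattern[k]:
            k = border[k]
        if pattern[j] == pattern[k]:
            k += 1
        border[j + 1] = k
    count, j = 0, 0
    for ch in text:
        while True:
            count += 1
            if ch == pattern[j]:
                j += 1
                break
            if j == 0:
                break
            j = border[j]
        if j == m:
            j = border[m]
    return count


def comparison_ratios(pattern: str, n: int, trials: int, seed: int = 0,
                      alphabet: str = "ab") -> list[float]:
    """Ratios c_n / n for `trials` independent random texts of length n."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        text = "".join(rng.choice(alphabet) for _ in range(n))
        ratios.append(mp_comparisons(text, pattern) / n)
    return ratios


if __name__ == "__main__":
    for n in (250, 1000, 4000):
        ratios = comparison_ratios("abaab", n, trials=20)
        print(n, round(mean(ratios), 4), round(pstdev(ratios), 4))
```

The ratio always lies between 1 and 2 for this scan. How quickly the spread shrinks with n is what the concentration inequality quantifies.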

  32. Conclusion • Using the Subadditive Ergodic Theorem we can show that there exists a linearity constant for the worst and the average case, respectively: KMP has linear complexity. • The Subadditive Ergodic Theorem proves the existence of this constant but says nothing about how to compute it. • Using Azuma's Inequality we can show that the number of comparisons is well concentrated around its mean.
