Strings and Pattern Matching Algorithms

Pattern P[0..m-1], text T[0..n-1].

Brute Force Pattern Matching

Algorithm BruteForceMatch(T,P):
Input: Strings T with n characters and P with m characters
Output: String index of the first substring of T matching P, or an indication that P is not a substring of T

for i := 0 to n-m do   // for each candidate index in T //
{
    j := 0
    while (j < m and T[i+j] = P[j]) do
        j := j+1
    if j = m then return i
}
return "there is no substring of T matching P."

Time complexity: O(mn), since each of the n-m+1 candidate positions may need up to m comparisons.
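As a rough illustration, BruteForceMatch can be transcribed into Python as follows. This is a sketch: the function name brute_force_match and the use of -1 to signal "there is no substring of T matching P" are choices made here, not part of the slide.

```python
def brute_force_match(t: str, p: str) -> int:
    """Return the index of the first occurrence of p in t, or -1 if none.

    Direct transcription of BruteForceMatch(T, P); -1 is a stand-in for
    "there is no substring of T matching P".
    """
    n, m = len(t), len(p)
    for i in range(n - m + 1):          # each candidate index in T
        j = 0
        while j < m and t[i + j] == p[j]:
            j += 1                      # extend the match left to right
        if j == m:                      # all m characters matched
            return i
    return -1


print(brute_force_match("abacaabaccabacabaabb", "abacab"))  # → 10
```

If m > n, the range is empty and the function returns -1 without comparing anything.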
Boyer-Moore Algorithm

The Boyer-Moore algorithm improves the running time of the brute-force algorithm by adding two potentially time-saving heuristics:

Looking-Glass Heuristic: When testing a possible placement of P[0..m-1] against T[0..n-1], begin the comparisons from the end of P and move backward to the front of P.

Character-Jump Heuristic: Suppose that T[i] does not match P[j] and T[i] = c. If c is not contained anywhere in P, then shift P completely past T[i]; otherwise, shift P until an occurrence of character c in P gets aligned with T[i].

last(c): if c is in P, last(c) is the index of the last (rightmost) occurrence of c in P. Otherwise, define last(c) = -1.

Compute-Last-Occurrence(P,m,Σ)
    for each character c in Σ do
        last(c) := -1
    for j := 0 to m-1 do
        last(P[j]) := j

Time complexity: O(m + |Σ|)

Example: P[0..5] = abacab, Σ = {a, b, c, d}

c         a   b   c   d
last(c)   4   5   3  -1
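A minimal Python sketch of Compute-Last-Occurrence, assuming the alphabet Σ is passed as a string of characters and last is stored as a dictionary (the name compute_last_occurrence is illustrative):

```python
def compute_last_occurrence(p: str, alphabet: str) -> dict[str, int]:
    """last[c] = index of the rightmost c in p, or -1 if c does not occur in p."""
    last = {c: -1 for c in alphabet}    # O(|Σ|): default for characters not in P
    for j, c in enumerate(p):           # O(m): later positions overwrite earlier ones
        last[c] = j
    return last


print(compute_last_occurrence("abacab", "abcd"))
# → {'a': 4, 'b': 5, 'c': 3, 'd': -1}
```

The output reproduces the last(c) table for P = abacab.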
Algorithm BMMatch(T,P)
Input: Strings T with n characters and P with m characters
Output: String index of the first substring of T matching P, or an indication that P is not a substring of T

Compute-Last-Occurrence(P,m,Σ)
i := m-1
j := m-1
repeat {
    if P[j] = T[i] then
        if j = 0 then return i   // a match! //
        else
            i := i-1
            j := j-1
    else
        i := i + (m-1) - min(j-1, last(T[i]))   // jump step //
        j := m-1
} until i > n-1
return "there is no substring of T matching P."

[Figure: the two cases of the jump step. If last(T[i]) < j, P shifts so that the last occurrence of T[i] in P aligns with T[i], and i advances by m-last(T[i])-1; otherwise P shifts by one position and i advances by m-j.]

Time complexity (worst case): O(nm + |Σ|). Example: T = aaaa…aaaa, P = baa…a, where every alignment matches m-1 characters before failing at b, and P then advances by only one position. Usually it runs much faster.
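One possible Python transcription of BMMatch, keeping the jump step exactly as written above. Assumptions made here: indices are 0-based, -1 stands for "no match", characters absent from P get last = -1 through dict.get instead of a full table over Σ, and a while loop replaces repeat...until so that a pattern longer than the text is rejected without reading past the end of T. The name bm_match is illustrative.

```python
def bm_match(t: str, p: str) -> int:
    """Boyer-Moore with the looking-glass and character-jump heuristics.

    Returns the index of the first occurrence of p in t, or -1 if none.
    """
    n, m = len(t), len(p)
    if m == 0:
        return 0                        # empty pattern matches at index 0
    last = {c: j for j, c in enumerate(p)}   # rightmost index of each char in p
    i = j = m - 1                       # compare from the end of P (looking glass)
    while i <= n - 1:
        if p[j] == t[i]:
            if j == 0:
                return i                # a match!
            i -= 1
            j -= 1
        else:
            # jump step: i := i + (m-1) - min(j-1, last(T[i]))
            i += (m - 1) - min(j - 1, last.get(t[i], -1))
            j = m - 1
    return -1


print(bm_match("abacaabaccabacabaabb", "abacab"))  # → 10
```

The jump always moves P forward by at least one position: the new alignment start is i - min(j-1, last(T[i])), which exceeds the old start i - j.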
Knuth-Morris-Pratt Algorithm

[Figure: text T = bacbababaabcbab… with pattern P = ababaca shown at two successive shifts; in general, P is shifted so that a prefix of P lines up with a suffix of the portion of T matched so far.]
Algorithm KMPPrefixFunction(P)
Input: String P[1..m] with m characters
Output: The prefix function pre for P, which maps j to the length of the longest proper prefix of P[1..j] that is also a suffix of P[1..j].

k := 0
pre(1) := 0
for q := 2 to m do
    while k > 0 and P[k+1] ≠ P[q] do
        k := pre(k)
    if P[k+1] = P[q] then
        k := k+1
    pre(q) := k
return pre

k: index of the last character in the prefix matched so far

Time complexity: O(m), since k grows by at most 1 per iteration of the for loop and every iteration of the while loop decreases k.
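A 0-indexed Python sketch of KMPPrefixFunction. Here pre[q] corresponds to the slide's pre(q+1), and the slide's P[k+1] becomes p[k]; the function name kmp_prefix_function is illustrative.

```python
def kmp_prefix_function(p: str) -> list[int]:
    """pre[q] = length of the longest proper prefix of p[:q+1] that is also its suffix."""
    m = len(p)
    pre = [0] * m                       # pre[0] = 0, like pre(1) := 0
    k = 0                               # length of the prefix matched so far
    for q in range(1, m):
        while k > 0 and p[k] != p[q]:
            k = pre[k - 1]              # fall back to the next shorter border
        if p[k] == p[q]:
            k += 1
        pre[q] = k
    return pre


print(kmp_prefix_function("ababaca"))  # → [0, 0, 1, 2, 3, 0, 1]
```

The example pattern is the P = ababaca from the KMP figure.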
Algorithm KMPMatch(T,P)
Input: Strings T[1..n] with n characters and P[1..m] with m characters
Output: Every shift s such that P occurs in T starting at T[s+1]; if no shift is printed, P is not a substring of T

pre := KMPPrefixFunction(P)
j := 0
for i := 1 to n do
    while j > 0 and P[j+1] ≠ T[i] do
        j := pre(j)
    if P[j+1] = T[i] then
        j := j+1
    if j = m then
        print "Pattern occurs with shift" i-m   // a match! //
        j := pre(j)   // look for the next match //

Time complexity: O(m+n): O(m) for the prefix function plus O(n) for the scan, by the same argument applied to j.
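A sketch of KMPMatch in Python that collects every 0-based shift instead of printing it. It repeats the prefix function so the block is self-contained; kmp_match and prefix_function are illustrative names, and an empty pattern is rejected because the pseudocode reads P[j+1] and so needs m ≥ 1.

```python
def prefix_function(p: str) -> list[int]:
    """0-indexed KMPPrefixFunction: pre[q] is the slide's pre(q+1)."""
    pre = [0] * len(p)
    k = 0
    for q in range(1, len(p)):
        while k > 0 and p[k] != p[q]:
            k = pre[k - 1]
        if p[k] == p[q]:
            k += 1
        pre[q] = k
    return pre


def kmp_match(t: str, p: str) -> list[int]:
    """Return all 0-based shifts at which p occurs in t."""
    if not p:
        raise ValueError("pattern must be non-empty")
    pre = prefix_function(p)
    m = len(p)
    shifts = []
    j = 0                               # number of characters of p matched so far
    for i, c in enumerate(t):
        while j > 0 and p[j] != c:
            j = pre[j - 1]              # slide: j := pre(j)
        if p[j] == c:
            j += 1
        if j == m:
            shifts.append(i - m + 1)    # slide prints i-m with 1-based i
            j = pre[j - 1]              # look for the next match
    return shifts


print(kmp_match("abababa", "aba"))  # → [0, 2, 4]
```

Overlapping occurrences are reported because j falls back to pre(m) rather than 0 after each match.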
Assignment

(1) How many character comparisons will the Boyer-Moore algorithm make in searching for each of the following patterns in the binary text?
    Text: repeat "01110" 20 times
    Pattern: (a) 01111, (b) 01110

(2) (i) Compute the prefix function in the KMP pattern matching algorithm for the pattern ababbabbabbababbabb when the alphabet is Σ = {a,b}.
    (ii) How many character comparisons will the KMP pattern matching algorithm make in searching for each of the following patterns in the binary text?
    Text: repeat "010011" 20 times
    Pattern: (a) 010010, (b) 010110
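For checking hand-computed answers, one option is to instrument both algorithms with a comparison counter. This is a hypothetical helper, not part of the slides: bm_count and kmp_count are made-up names, and a "comparison" is assumed to be one test of a pattern character against a text character. BM stops at the first match (as BMMatch does) while KMP scans the whole text (as KMPMatch does), and the KMP version counts each distinct test of P[j] against T[i] once, so it does not re-count the if test that the pseudocode repeats after the while loop exits. Other counting conventions will give different totals.

```python
def prefix_function(p: str) -> list[int]:
    """0-indexed prefix function: pre[q] is the slide's pre(q+1)."""
    pre = [0] * len(p)
    k = 0
    for q in range(1, len(p)):
        while k > 0 and p[k] != p[q]:
            k = pre[k - 1]
        if p[k] == p[q]:
            k += 1
        pre[q] = k
    return pre


def bm_count(t: str, p: str) -> tuple[int, int]:
    """BMMatch with a counter; returns (first match index or -1, comparisons)."""
    if not p:
        raise ValueError("pattern must be non-empty")
    n, m = len(t), len(p)
    last = {c: j for j, c in enumerate(p)}   # rightmost index of each char
    comparisons = 0
    i = j = m - 1
    while i <= n - 1:
        comparisons += 1                # one test of P[j] against T[i]
        if p[j] == t[i]:
            if j == 0:
                return i, comparisons
            i, j = i - 1, j - 1
        else:
            i += (m - 1) - min(j - 1, last.get(t[i], -1))   # jump step
            j = m - 1
    return -1, comparisons


def kmp_count(t: str, p: str) -> tuple[list[int], int]:
    """KMPMatch with a counter; returns (all 0-based shifts, comparisons)."""
    if not p:
        raise ValueError("pattern must be non-empty")
    pre = prefix_function(p)
    m = len(p)
    shifts, comparisons, j = [], 0, 0
    for i, c in enumerate(t):
        while True:
            comparisons += 1            # one test of P[j+1] against T[i] (1-based)
            if p[j] == c:
                j += 1
                break
            if j == 0:
                break                   # mismatch with nothing to fall back to
            j = pre[j - 1]              # mismatch: fall back and test again
        if j == m:
            shifts.append(i - m + 1)
            j = pre[j - 1]              # look for the next match
    return shifts, comparisons


for pattern in ("01111", "01110"):
    print("BM", pattern, bm_count("01110" * 20, pattern))
for pattern in ("010010", "010110"):
    print("KMP", pattern, kmp_count("010011" * 20, pattern))
```

Comparing the printed counts with a hand trace is a quick way to spot which counting convention a given answer key uses.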