A Fast String Matching Algorithm: The Boyer-Moore Algorithm
The obvious search algorithm
• Considers each character position of str in turn and checks whether the next patlen characters of str match pat.
• In the worst case, the number of comparisons is on the order of i*patlen, where i is the position in str at which pat is found.
• Ex. pat: aab ; str: ..aaaaac
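A minimal C sketch of the obvious algorithm described above. The function name `naive_search` and the optional comparison counter are assumptions made for illustration; they do not come from the slides.

```c
#include <stddef.h>
#include <string.h>

/* Naive search: try every alignment of pat against str, comparing left to right.
 * Returns the 0-based index of the first match, or -1 if there is none.
 * If comparisons is non-NULL, it receives the number of character comparisons. */
long naive_search(const char *pat, const char *str, unsigned long *comparisons)
{
    size_t patlen = strlen(pat);
    size_t stringlen = strlen(str);
    unsigned long count = 0;
    long found = -1;

    for (size_t i = 0; i + patlen <= stringlen && found < 0; i++) {
        size_t j = 0;
        while (j < patlen) {
            count++;                      /* one comparison of str[i+j] with pat[j] */
            if (str[i + j] != pat[j])
                break;
            j++;
        }
        if (j == patlen)
            found = (long)i;
    }
    if (comparisons != NULL)
        *comparisons = count;
    return found;
}
```

For pat `aab` and str `aaaaac`, each of the four alignments fails only at the third character, so the counter reaches 12 (4 alignments times patlen 3).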
Knuth-Morris-Pratt Algorithm
• Linear search algorithm.
• Preprocesses pat in time linear in patlen and searches str in time linear in i+patlen.
• Ex. pat: EXAMPLE ; str: HERE IS A SIMPLE EXAMPLE
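The two linear phases can be sketched in C as follows. This is the textbook failure-function formulation, not code from the slides; the name `kmp_search` and the -2 return value for allocation failure are assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the 0-based index of the first occurrence of pat in str,
 * -1 if there is none, or -2 if the failure table cannot be allocated. */
long kmp_search(const char *pat, const char *str)
{
    size_t m = strlen(pat), n = strlen(str);
    if (m == 0)
        return 0;

    /* fail[q] = length of the longest proper prefix of pat[0..q]
     * that is also a suffix of pat[0..q]; built in time linear in m. */
    size_t *fail = malloc(m * sizeof *fail);
    if (fail == NULL)
        return -2;
    fail[0] = 0;
    size_t k = 0;
    for (size_t q = 1; q < m; q++) {
        while (k > 0 && pat[q] != pat[k])
            k = fail[k - 1];
        if (pat[q] == pat[k])
            k++;
        fail[q] = k;
    }

    /* Scan str once; the pointer into str never moves backwards. */
    long found = -1;
    k = 0;
    for (size_t i = 0; i < n; i++) {
        while (k > 0 && str[i] != pat[k])
            k = fail[k - 1];
        if (str[i] == pat[k])
            k++;
        if (k == m) {
            found = (long)(i + 1 - m);
            break;
        }
    }
    free(fail);
    return found;
}
```

With pat `EXAMPLE` and str `HERE IS A SIMPLE EXAMPLE`, the match starts at 0-based index 17.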
Characteristics of the Boyer-Moore Algorithm
• Basic idea: pat is compared with str from right to left, starting at the last character of pat, rather than from left to right.
• Expected cost: c*( i + patlen ) with c<1, so on average not every character of str is inspected.
• Preprocessing of pat computes two tables, delta1 & delta2, used for shifting pat & the pointer into str.
• Ex. pat: AT-THAT ; str: …WHICH-FINALLY-HALTS.--AT-THAT-POINT
Informal Description
Compare the last char of pat with the patlen-th char of str:
    pat:  AT-THAT
    str:  WHICH-FINALLY-HALTS.--AT-THAT-POINT
Observation 1: if char is known not to occur in pat, skip patlen ( = delta1(F) ) chars of str.
    str:  WHICH-FINALLY-HALTS.--AT-THAT-POINT
    pat:         AT-THAT
Informal Description
Observation 2: if char occurs in pat, slide pat down delta1(-) positions so that char is aligned with its rightmost occurrence in pat.
delta1(char) = patlen, if char does not occur in pat;
               patlen - j otherwise, where j is the maximum integer such that pat(j) = char.
    pat:         AT-THAT
    str:  WHICH-FINALLY-HALTS.--AT-THAT-POINT
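The delta1 definition above translates almost directly into C. This is a minimal sketch: the name `build_delta1` and the 256-entry byte alphabet (`ALPHABET_SIZE`) are assumptions, not names taken from the slides.

```c
#include <string.h>

#define ALPHABET_SIZE 256   /* assumption: one table entry per byte value */

/* Fill delta1[c] for every byte value c, following the definition above:
 * patlen if c does not occur in pat, otherwise patlen - j, where j is the
 * 1-based position of the rightmost occurrence of c in pat. */
void build_delta1(const char *pat, int delta1[ALPHABET_SIZE])
{
    int patlen = (int)strlen(pat);

    for (int c = 0; c < ALPHABET_SIZE; c++)
        delta1[c] = patlen;

    /* Scanning left to right lets later (rightmost) occurrences overwrite earlier ones. */
    for (int j = 1; j <= patlen; j++)
        delta1[(unsigned char)pat[j - 1]] = patlen - j;
}
```

For pat `AT-THAT` this gives delta1(A)=1, delta1(T)=0, delta1(-)=4, delta1(H)=2, and 7 for every other character, such as F.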
Informal Description
Observation 3a: str matches the last m chars of pat and then comes to a mismatch at some new char. Move strptr by delta1(L); pat is thereby shifted by delta1(L)-m.
    pat:        AT-THAT
    str:  …FINALLY-HALTS.--AT-THAT-POINT
    pat:              AT-THAT
Informal Description
Observation 3b: the final m chars of pat (a subpat) are matched; find the rightmost plausible reoccurrence of the subpat in pat and align it with the matched m chars of str (slide pat delta2(-) positions).
    pat (before):       AT-THAT
    str:          …FINALLY-HALTS.--AT-THAT-POINT
    pat (after):             AT-THAT
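The delta1 & delta2 tables
• The delta1 table has as many entries as there are chars in the alphabet.
    Ex. pat:     a  b  c  d  e             a  t  -  t  h  a  t
        delta1:  4  3  2  1  0  (else 5)   1  0  4  0  2  1  0  (else 7)
• The delta2 table has as many entries as there are chars in pat.
    delta2( j ) = ( j + 1 - rpr(j) ) + ( patlen - j ) = patlen + 1 - rpr(j)
  where rpr(j) is the position of the rightmost plausible reoccurrence of the subpat pat(j+1)...pat(patlen).
    Ex. pat:     a  b  c  d  e             a  t  -  t  h  a  t
        delta2:  9  8  7  6  1            11 10  9  8  7  4  1
  For at-that and j=6, the matched t reoccurs at position 4 preceded by - rather than a, so rpr(6)=4 and delta2(6)=7+1-4=4.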
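A brute-force C sketch of delta2( j ) = patlen + 1 - rpr(j). It assumes the usual reading of "plausible reoccurrence": the candidate starting at position k must agree with pat(j+1)...pat(patlen) wherever it overlaps pat (positions before 1 match anything), and it must not be preceded by pat(j). The names `unifies`, `rpr`, and `build_delta2` are hypothetical, and the quadratic search is chosen for clarity, not speed.

```c
#include <string.h>

/* Does pat(k)...pat(k+patlen-j-1) agree with pat(j+1)...pat(patlen)?
 * Positions are 1-based; positions below 1 act as wildcards. */
static int unifies(const char *pat, int patlen, int j, int k)
{
    for (int t = 0; t < patlen - j; t++) {
        int pos = k + t;                       /* 1-based position of the candidate char */
        if (pos >= 1 && pat[pos - 1] != pat[j + t])
            return 0;
    }
    return 1;
}

/* Rightmost plausible reoccurrence of the suffix that follows position j. */
static int rpr(const char *pat, int patlen, int j)
{
    /* Start at k = j: k = j+1 is the suffix itself (always preceded by pat(j)),
     * and any larger k would run past the end of pat.
     * The loop ends at k = j+1-patlen at the latest, where everything is a wildcard. */
    for (int k = j; ; k--) {
        if (unifies(pat, patlen, j, k) &&
            (k <= 1 || pat[k - 2] != pat[j - 1]))
            return k;
    }
}

/* delta2[j-1] holds delta2(j) for j = 1..patlen. */
void build_delta2(const char *pat, int *delta2)
{
    int patlen = (int)strlen(pat);
    for (int j = 1; j <= patlen; j++)
        delta2[j - 1] = patlen + 1 - rpr(pat, patlen, j);
}
```

For pat `abcde` this yields 9 8 7 6 1.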
The algorithm
        stringlen ← length of string.
        i ← patlen.
top:    if i > stringlen then return false.
        j ← patlen.
loop:   if j = 0 then return i+1.
        if string(i) = pat(j) then
            j ← j-1.
            i ← i-1.
            goto loop.
        close;
        i ← i + max( delta1(string(i)), delta2(j) ).
        goto top.
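The same loop can be written in C without gotos. This sketch keeps the slide's 1-based i and j but returns a 0-based index; the name `bm_search`, the table layout (delta1 indexed by byte value, delta2[j-1] holding delta2( j )), and the caller-supplied tables are assumptions made for illustration.

```c
#include <string.h>

#define ALPHABET_SIZE 256   /* assumption: one delta1 entry per byte value */

/* Returns the 0-based index of the first occurrence of pat in str, or -1.
 * delta1 has ALPHABET_SIZE entries; delta2 has patlen entries. */
long bm_search(const char *pat, const char *str,
               const int delta1[ALPHABET_SIZE], const int *delta2)
{
    long patlen = (long)strlen(pat);
    long stringlen = (long)strlen(str);
    long i = patlen;                       /* 1-based pointer into str */

    if (patlen == 0)
        return 0;

    while (i <= stringlen) {               /* "top" */
        long j = patlen;                   /* 1-based pointer into pat */

        /* "loop": compare right to left while the characters agree */
        while (j > 0 && str[i - 1] == pat[j - 1]) {
            j--;
            i--;
        }
        if (j == 0)
            return i;                      /* 1-based i+1 is 0-based i */

        /* mismatch: advance by the larger of the two table entries */
        int d1 = delta1[(unsigned char)str[i - 1]];
        int d2 = delta2[j - 1];
        i += (d1 > d2) ? d1 : d2;
    }
    return -1;
}
```

Because delta2( j ) is always at least patlen - j + 1, every mismatch moves the alignment at least one position to the right, so the loop terminates.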
The Implementation in mstring.c
• Function: make_skip(char*, int)
    • Purpose: create the skip (delta1) table
    • Function inputs: char *ptrn, int plen
    • Local variables: int *skip, *sptr
    • Return: int *skip
• Function: make_shift(char*, int)
    • Purpose: create the shift (delta2) table
    • Function inputs: char *ptrn, int plen
    • Local variables: int *shift, *sptr; char *pptr, c
    • Return: int *shift
Flowchart of make_skip()
• Allocate memory to skip
• *skip++ = plen+1 (for every entry)
• plen == 0?
    • true: return skip
    • false: skip[*ptrn++] = plen--, then test plen again
make_skip()
    int *make_skip(char *ptrn, int plen)
    {
        int *skip = (int *) malloc(256 * sizeof(int));
        int *sptr = &skip[256];

        if (skip == NULL)
            FatalPrintError("malloc");

        /* default entry for chars that do not occur in ptrn */
        while (sptr-- != skip)
            *sptr = plen + 1;

        /* chars in ptrn: later (rightmost) occurrences overwrite earlier ones */
        while (plen != 0)
            skip[(unsigned char) *ptrn++] = plen--;

        return skip;
    }
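To see what values make_skip() stores, the sketch below repeats its two loops on a caller-supplied array, so it needs neither malloc nor FatalPrintError. The name `fill_skip` is hypothetical and this is not the mstring.c code itself.

```c
/* Same table contents as make_skip(), written into skip[0..255]. */
void fill_skip(const char *ptrn, int plen, int skip[256])
{
    /* default entry: plen + 1 for chars that do not occur in ptrn */
    for (int c = 0; c < 256; c++)
        skip[c] = plen + 1;

    /* position k (0-based) stores plen - k; the rightmost occurrence wins */
    while (plen != 0)
        skip[(unsigned char)*ptrn++] = plen--;
}
```

For `AT-THAT` this stores skip[A]=2, skip[T]=1, skip[-]=5, skip[H]=3, and 8 elsewhere, so each entry equals the corresponding delta1 value plus one.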
Procedures of make_shift():
• Allocate memory to shift
• c = ptrn[plen-1];
• Look for rpr of c
• Look for two identical subpat
• Assign values to shift
• Return shift
make_shift()
    int *shift = (int *) malloc(plen * sizeof(int));
    int *sptr = shift + plen - 1;
    char *pptr = ptrn + plen - 1;
    char c;

    if (shift == NULL)
        FatalPrintError("malloc");

    c = ptrn[plen - 1];
    *sptr = 1;
make_shift() (continued)
    while (sptr-- != shift) {
        char *p1 = ptrn + plen - 2, *p2, *p3;

        do {
            while (p1 >= ptrn && *p1-- != c);
            p2 = ptrn + plen - 2;
            p3 = p1;
            while (p3 >= ptrn && *p3-- == *p2-- && p2 >= pptr);
        } while (p3 >= ptrn && p2 >= pptr);    // p2>=j, p3>=1

        *sptr = shift + plen - sptr + p2 - p3;
        pptr--;
    }
    return shift;
Ex: j=5
    j:    1 2 3 4 5 6 7
    pat:  e d b c a b c
• step 1: p1
• step 2: p3, p2
• step 3: p3, p2
∴ delta2( j ) = (p2 - p3) + (plen - j) = 5