
Efficient Pattern Matching using DFA Algorithm

Learn how to use the Deterministic Finite Automaton (DFA) algorithm for efficient pattern matching in text strings. Understand the concept of states and transitions in building a DFA and simulate its operation on different texts. Discover how DFA minimizes backtracking and accelerates pattern recognition strategies. Explore step-by-step examples to grasp the practical implementation of the DFA algorithm for pattern searching.

dstapleton

Presentation Transcript


  1. KMP algorithm

  2. KMP algorithm

      public class KMP {
          private final int R;       // alphabet size (radix)
          private final String pat;  // the pattern
          private final int[][] dfa; // dfa[c][j] = next state from state j on character c

          public KMP(String pat) {
              this.R = 256;
              this.pat = pat;
              // build DFA from pattern
              int m = pat.length();
              dfa = new int[R][m];
              dfa[pat.charAt(0)][0] = 1;
              for (int x = 0, j = 1; j < m; j++) {
                  for (int c = 0; c < R; c++)
                      dfa[c][j] = dfa[c][x];      // Copy mismatch cases.
                  dfa[pat.charAt(j)][j] = j+1;    // Set match case.
                  x = dfa[pat.charAt(j)][x];      // Update restart state.
              }
          }

          public int search(String txt) {
              // simulate operation of DFA on text
              int m = pat.length();
              int n = txt.length();
              int i, j;
              for (i = 0, j = 0; i < n && j < m; i++) {
                  j = dfa[txt.charAt(i)][j];
              }
              if (j == m) return i - m;    // found: index where the match starts
              return n;                    // not found
          }
      }

  3. General idea • Avoid backing up in the text string on a mismatch • For example • Text: 00000000000000000000000000000000000001 • Pattern: 000000001 • When we find a mismatch, how can we keep moving forward in the text? • Is there a cleverer way than brute force? • How should we analyze the pattern?
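To make the cost of backing up concrete, here is a minimal brute-force sketch for comparison. It is not part of the slides: the class name `BruteForceSketch` and the `comparisons` counter are assumptions added for illustration. After every mismatch the text position falls back to just after the previous starting offset, so the same text characters are compared repeatedly.

```java
// Illustrative sketch only; BruteForceSketch and its counter are hypothetical names.
final class BruteForceSketch {
    /** Number of character comparisons made by the most recent call to search(). */
    static long comparisons;

    /** Returns the index of the first match, or txt.length() if there is none. */
    static int search(String pat, String txt) {
        comparisons = 0;
        int m = pat.length();
        int n = txt.length();
        for (int i = 0; i + m <= n; i++) {
            int j = 0;
            while (j < m) {
                comparisons++;
                if (txt.charAt(i + j) != pat.charAt(j)) break;
                j++;
            }
            if (j == m) return i; // all m pattern characters matched at offset i
            // Mismatch: the next attempt starts at i + 1, so the text position
            // backs up from i + j to i + 1.
        }
        return n;
    }

    public static void main(String[] args) {
        String txt = "00000000000000000000000000000000000001";
        String pat = "000000001";
        int at = search(pat, txt);
        System.out.println("match at " + at + " after " + comparisons + " comparisons");
    }
}
```

On inputs like the one above, almost every starting offset costs a full pattern-length run of comparisons before the mismatch shows up, which is the behaviour the DFA approach is meant to avoid.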

  4. How ? Build a DFA • DFA – Deterministic finite-state automaton • DFA = States + Transitions • States • For a pattern with m characters, there are (m + 1) states in the DFA, numbered 0 to m • `At state j` means the first j characters of the pattern are matched (state 0: nothing matched yet) • The last state, m, indicates ACCEPT (AC), i.e., all characters in the pattern are matched • We do not allocate a table entry for this state

  5. How ? Build a DFA • DFA – Deterministic finite-state automaton • DFA = States + Transitions • Transitions • At each state there are R possible transitions, where R is the number of possible characters (the alphabet size) • Formalize transitions as dfa[next_char][current_state] = next_state

  6. How ? Build a DFA • Explanation: dfa[next_char][current_state] = next_state • Suppose we are now at current_state • If the next character is next_char, we move to next_state • Therefore dfa[R][m] is a 2-dimensional table that exhaustively enumerates all possible cases • The second dimension is m, not m + 1, because we do not allocate an entry for the accept state

  7. How ? Build a DFA • Explanation: dfa[next_char][current_state] = next_state • Pattern: ABABAC (assume R=3 and the only characters are A,B,C) • 2D array representation • Directed graph representation
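The 2D array figure for this slide did not survive in the transcript, so the following sketch rebuilds the table for ABABAC with R = 3. It uses the same construction loop as the deck's constructor; the class name `DfaTableSketch`, the `row` helper, and the mapping of characters to rows via `c - 'A'` are assumptions made here so that a three-letter alphabet fits in three rows.

```java
// Illustrative sketch; assumes the alphabet is the r consecutive letters starting at 'A'.
final class DfaTableSketch {
    /** Builds dfa[c][j]: the next state when character ('A' + c) is read in state j. */
    static int[][] build(String pat, int r) {
        int m = pat.length();
        int[][] dfa = new int[r][m];
        dfa[pat.charAt(0) - 'A'][0] = 1;
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < r; c++) {
                dfa[c][j] = dfa[c][x];            // copy mismatch cases
            }
            dfa[pat.charAt(j) - 'A'][j] = j + 1;  // set match case
            x = dfa[pat.charAt(j) - 'A'][x];      // update restart state
        }
        return dfa;
    }

    /** Formats one row of the table, e.g. the row for character 'A'. */
    static String row(int[][] dfa, int c) {
        StringBuilder sb = new StringBuilder().append((char) ('A' + c)).append(':');
        for (int next : dfa[c]) {
            sb.append(' ').append(next);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[][] dfa = build("ABABAC", 3);
        for (int c = 0; c < 3; c++) {
            System.out.println(row(dfa, c));
        }
    }
}
```

For ABABAC the rows work out to `A: 1 1 3 1 5 1`, `B: 0 2 0 4 0 4`, and `C: 0 0 0 0 0 6`, where the column index is the current state j (0 to 5).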

  8. How to use DFA ? • Example • Text: ABCABABABACA • Pattern: ABABAC

  9. How to use DFA ? (simulating search() from slide 2) • Start with state 0 • ABCABABABACA, read text[0] = A • Go to state 1

  10. How to use DFA ? • Current state 1 • ABCABABABACA, read text[1] = B • Go to state 2

  11. How to use DFA ? • Current state 2 • ABCABABABACA, read text[2] = C • Go to state 0

  12. How to use DFA ? • Current state 0 • ABCABABABACA, read text[3] = A • Go to state 1

  13. How to use DFA ? • Current state 1 • ABCABABABACA, read text[4] = B • Go to state 2

  14. How to use DFA ? • Current state 2 • ABCABABABACA, read text[5] = A • Go to state 3

  15. How to use DFA ? • Current state 3 • ABCABABABACA, read text[6] = B • Go to state 4

  16. How to use DFA ? • Current state 4 • ABCABABABACA, read text[7] = A • Go to state 5

  17. How to use DFA ? • Current state 5 • ABCABABABACA, read text[8] = B • Go to state 4

  18. How to use DFA ? • Current state 4 • ABCABABABACA, read text[9] = A • Go to state 5

  19. How to use DFA ? • Current state 5 • ABCABABABACA, read text[10] = C • Go to state 6

  20. How to use DFA ? • Current state 6 (ACCEPT) • ABCABABABACA • j == m, we are now at the (6+1)th state • search() returns i - m = 11 - 6 = 5, the index where the match starts
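The walk through the text on the preceding slides can be replayed in code. The sketch below is an illustration rather than the deck's own class: `DfaTraceSketch` and `trace` are hypothetical names, the build loop is copied from the constructor in the deck, and it assumes every character code is below R = 256.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; assumes all characters in pat and txt have codes below R.
final class DfaTraceSketch {
    static final int R = 256;

    /** Same construction as the constructor shown in the deck. */
    static int[][] build(String pat) {
        int m = pat.length();
        int[][] dfa = new int[R][m];
        dfa[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < R; c++) {
                dfa[c][j] = dfa[c][x];
            }
            dfa[pat.charAt(j)][j] = j + 1;
            x = dfa[pat.charAt(j)][x];
        }
        return dfa;
    }

    /** Records the state reached after each text character, stopping at ACCEPT. */
    static List<Integer> trace(String pat, String txt) {
        int[][] dfa = build(pat);
        int m = pat.length();
        List<Integer> states = new ArrayList<>();
        for (int i = 0, j = 0; i < txt.length() && j < m; i++) {
            j = dfa[txt.charAt(i)][j]; // one table lookup per text character, no backup
            states.add(j);
        }
        return states;
    }

    public static void main(String[] args) {
        // prints [1, 2, 0, 1, 2, 3, 4, 5, 4, 5, 6]
        System.out.println(trace("ABABAC", "ABCABABABACA"));
    }
}
```

The index `i` only ever increases, so each text character is examined once; the drop from state 5 back to state 4 on reading `B` is where the DFA reuses the already matched `ABAB` instead of restarting.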

  21. How to build DFA ? • If we can match the next character • If we see the expected character, go to the next state • Pattern: ABABAC (assume R=3 and the only characters are A,B,C) • We only need dfa[R][m] since there is no transition information for the last state • Match transitions: 0 -A-> 1 -B-> 2 -A-> 3 -B-> 4 -A-> 5 -C-> 6

  22. How to build DFA ? • If we can match the next character • In the constructor: dfa[pat.charAt(0)][0] = 1 for state 0, then dfa[pat.charAt(j)][j] = j+1 (set match case) for each later state j • Match transitions: 0 -A-> 1 -B-> 2 -A-> 3 -B-> 4 -A-> 5 -C-> 6

  23. How to build DFA ? • If we fail to match the next character • Copy data from column x: dfa[c][j] = dfa[c][x] for every character c • Mimic the transitions of state x • Similar to `I am now in state x` or `restart from state x` • x is the restart state • Update restart state x: x = dfa[pat.charAt(j)][x] • The x state • Restart state: if we fail to match the j-th character, we restart from state x • How to restart? Since we copied the entries from column x for the failed cases, a mismatch behaves exactly as if we were in state x • At the very beginning, x is one state behind the state being built (x = 0 when j = 1) • x is updated using the partially built DFA, so it extracts information from the pattern itself
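One way to read the restart state, offered here as an interpretation rather than something stated on the slides: while column j is being built, x equals the length of the longest proper prefix of the first j pattern characters that is also a suffix of them. The sketch below checks this by brute force; `RestartStateSketch`, `restartStates`, and `longestBorder` are hypothetical names introduced for the illustration.

```java
// Illustrative sketch; the prefix/suffix reading of x is this example's framing.
final class RestartStateSketch {
    /** Returns xs where xs[j] is the restart state in effect while column j is built (xs[0] is unused). */
    static int[] restartStates(String pat) {
        int r = 256;
        int m = pat.length();
        int[][] dfa = new int[r][m];
        int[] xs = new int[m];
        dfa[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < m; j++) {
            xs[j] = x;
            for (int c = 0; c < r; c++) {
                dfa[c][j] = dfa[c][x];
            }
            dfa[pat.charAt(j)][j] = j + 1;
            x = dfa[pat.charAt(j)][x];
        }
        return xs;
    }

    /** Length of the longest proper prefix of s that is also a suffix of s (brute force). */
    static int longestBorder(String s) {
        for (int len = s.length() - 1; len > 0; len--) {
            if (s.startsWith(s.substring(s.length() - len))) {
                return len;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String pat = "ABABAC";
        int[] xs = restartStates(pat);
        for (int j = 1; j < pat.length(); j++) {
            String matched = pat.substring(0, j);
            System.out.println("j=" + j + " matched=" + matched
                    + " x=" + xs[j] + " border=" + longestBorder(matched));
        }
    }
}
```

For ABABAC the restart states for j = 1 to 5 are 0, 0, 1, 2, 3. At j = 5, for example, `ABABA` is matched and its prefix `ABA` is also its suffix, which is why a mismatch there is handled as if only `ABA` had been matched.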

  24. How to build DFA ? j=0 • An example (ABABAC) • j (current state): 0 • dfa['A'][0] ← 1, set before the loop; every other entry in column 0 stays 0 • [Figure: dfa table, rows c, columns j]

  25. How to build DFA ? j=1 • An example (ABABAC) • j (current state): 1 • x (restart state): 0 • Process • Copy dfa[][0] to dfa[][1] • dfa['B'][1] ← 2 • x ← dfa['B'][0] = 0 • [Figure: dfa table, rows c, columns j]

  26. How to build DFA ? j=1 • An example (ABABAC) • Understand restart state x • You are actually at state 1, but if the next character is A or C, act as if you were at state 0. Recall the meaning of states in the DFA: state 0 means you have matched nothing. • [Figure: dfa table, rows c, columns j]

  27. How to build DFA ? j=2 • An example (ABABAC) • j (current state): 2 • x (restart state): 0 • Process • Copy dfa[][0] to dfa[][2] • dfa['A'][2] ← 3 • x ← dfa['A'][0] = 1 • [Figure: dfa table, rows c, columns j]

  28. How to build DFA ? j=2 • An example (ABABAC) • j (current state): 2 • x (restart state): 0 • Understand restart state x • At state 2 you have matched 'AB', but if the next character is 'B' or 'C', you have to start from the very beginning (state 0). • [Figure: dfa table, rows c, columns j]

  29. How to build DFA ? j=2 • An example (ABABAC) • j (current state): 2 • x (restart state): 0 • Understand restart state x • x ← dfa['A'][0] = 1, why? At current state 2 the expected character is 'A'. If we then fail to match at the next state, 3, we do not need to start from the very beginning, since at least one 'A' is already matched (x=1). • [Figure: dfa table, rows c, columns j]

  30. How to build DFA ? j=3 • An example (ABABAC) • j (current state): 3 • x (restart state): 1 • Process • Copy dfa[][1] to dfa[][3] • dfa['B'][3] ← 4 • x ← dfa['B'][1] = 2 • [Figure: dfa table, rows c, columns j]

  31. How to build DFA ? j=3 • An example (ABABAC) • j (current state): 3 • x (restart state): 1 • Understand restart state x • We restart from state 1 if we fail to match the expected 'B', because we know that at least one 'A' is already matched (x=1). Restart state x was set in the previous step. • [Figure: dfa table, rows c, columns j]

  32. How to build DFA ? j=3 • An example (ABABAC) • j (current state): 3 • x (restart state): 1 • Understand restart state x • x ← dfa['B'][1] = 2, why? At current state 3 the expected character is 'B', and restart state 1 tells us 'A' is already matched in the pattern. Thus if we fail at the next state, 4, 'AB' is already matched, i.e., we can update restart state x to 2. • [Figure: dfa table, rows c, columns j]

  33. How to build DFA ? j=4 • An example (ABABAC) • j (current state): 4 • x (restart state): 2 • Process • Copy dfa[][2] to dfa[][4] • dfa['A'][4] ← 5 • x ← dfa['A'][2] = 3 • [Figure: dfa table, rows c, columns j]

  34. How to build DFA ? j=4 • An example (ABABAC) • j (current state): 4 • x (restart state): 2 • Understand restart state x • Explanation: at state 4 you have already matched 'ABAB'. If you fail to match the next 'A', you assume you have still matched 'AB', since the restart state is 2. This assumption is realized by copying column 2 for the failed cases ('B' and 'C'). • [Figure: dfa table, rows c, columns j]

  35. How to build DFA ? j=5 • An example (ABABAC) • j (current state): 5 • x (restart state): 3 • Process • Copy dfa[][3] to dfa[][5] • dfa['C'][5] ← 6 • x ← dfa['C'][3] = 0 • [Figure: dfa table, rows c, columns j]

  36. How to build DFA ? j=5 • An example (ABABAC) • j (current state): 5 • x (restart state): 3 • Understand restart state x • Explanation: we have already matched 'ABABA'. If we fail to match the expected 'C', we assume we have matched 'ABA', since the restart state is 3. • [Figure: dfa table, rows c, columns j]
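The step-by-step construction for j = 0 to 5 can be reproduced by logging what the constructor does in each iteration. This is a sketch for checking the walkthrough, not the deck's code: `DfaBuildLogSketch`, `buildLog`, and the log format are assumptions, and character codes are assumed to be below 256.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; names and log format are hypothetical.
final class DfaBuildLogSketch {
    /** Builds the DFA for pat and returns one log line per column built. */
    static List<String> buildLog(String pat) {
        int r = 256;
        int m = pat.length();
        int[][] dfa = new int[r][m];
        List<String> log = new ArrayList<>();
        dfa[pat.charAt(0)][0] = 1;
        log.add("j=0: dfa['" + pat.charAt(0) + "'][0] <- 1");
        for (int x = 0, j = 1; j < m; j++) {
            char expected = pat.charAt(j);
            for (int c = 0; c < r; c++) {
                dfa[c][j] = dfa[c][x];    // copy mismatch cases from column x
            }
            dfa[expected][j] = j + 1;     // set match case
            int nextX = dfa[expected][x]; // restart state for the next column
            log.add("j=" + j + ": copy column " + x
                    + ", dfa['" + expected + "'][" + j + "] <- " + (j + 1)
                    + ", x <- dfa['" + expected + "'][" + x + "] = " + nextX);
            x = nextX;
        }
        return log;
    }

    public static void main(String[] args) {
        for (String line : buildLog("ABABAC")) {
            System.out.println(line);
        }
    }
}
```

For ABABAC the line for j = 3 reads `j=3: copy column 1, dfa['B'][3] <- 4, x <- dfa['B'][1] = 2`, matching the slide for that step.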

  37. Understand state x • Updating x while building the DFA (x = dfa[pat.charAt(j)][x]) is the same kind of step as the state transition while matching the pattern in the text (j = dfa[txt.charAt(i)][j])

  38. Understand state x • The transition of state x: match the pattern against itself using the partially constructed DFA table • To build the next state of the DFA, we need to know the restart state x • An example (ABABAC) • x ← 0 • x ← dfa['B'][0] = 0 • x ← dfa['A'][0] = 1 • x ← dfa['B'][1] = 2 • x ← dfa['A'][2] = 3 • x ← dfa['C'][3] = 0

  39. Conclusion • Updating x while we build the DFA works like the state transition when we match the pattern in the text • Building the DFA is the same process as matching the pattern against itself • By analyzing the pattern, we know how to keep moving forward in the text when a character fails to match
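The claim that building the DFA amounts to matching the pattern against itself can be checked directly. In the sketch below (hypothetical names `SelfMatchSketch` and `restartMatchesSelfSimulation`, character codes assumed below 256), the restart state recorded for column j is compared with the state reached by running the finished DFA over the pattern characters at indices 1 to j-1, that is, the pattern shifted by one position.

```java
// Illustrative sketch; names are hypothetical and not taken from the slides.
final class SelfMatchSketch {
    /** True if, for every j, the restart state x equals the DFA state after reading pat[1..j-1]. */
    static boolean restartMatchesSelfSimulation(String pat) {
        int r = 256;
        int m = pat.length();
        int[][] dfa = new int[r][m];
        int[] xs = new int[m];
        dfa[pat.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < m; j++) {
            xs[j] = x; // restart state used while building column j
            for (int c = 0; c < r; c++) {
                dfa[c][j] = dfa[c][x];
            }
            dfa[pat.charAt(j)][j] = j + 1;
            x = dfa[pat.charAt(j)][x];
        }
        for (int j = 1; j < m; j++) {
            int state = 0;
            for (int i = 1; i < j; i++) {
                state = dfa[pat.charAt(i)][state]; // same lookup that search() performs
            }
            if (state != xs[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(restartMatchesSelfSimulation("ABABAC"));
    }
}
```

The inner simulation never reads a column that had not yet been filled when x was computed, since after reading i - 1 characters the state is at most i - 1, which is why the constructor can get away with using a partially built table.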
