CS222 Algorithms, First Semester 2003/2004 • Dr. Sanath Jayasena, Dept. of Computer Science & Eng., University of Moratuwa • Lecture 7 (28/10/2003) • String Matching Part 2 • Greedy Approach
Overview • Previous lecture: String Matching Part 1 • Naïve Algorithm, Rabin-Karp Algorithm • This lecture • String Matching Part 2 • String Matching using Finite Automata • Knuth-Morris-Pratt (KMP) Algorithm • Greedy Approach to Algorithm Design
String Matching PART 2
Finite Automata • A finite automaton M is a 5-tuple (Q, q0, A, Σ, δ), where • Q is a finite set of states • q0 ∈ Q is the start state • A ⊆ Q is a set of accepting states • Σ is a finite input alphabet • δ is the transition function, δ: Q × Σ → Q, that gives the next state for a given current state and input character
How a Finite Automaton Works • The finite automaton M begins in state q0 • It reads the characters of the input string (over Σ) one at a time • If M is in state q and reads input character a, M moves to state δ(q, a) • If its current state q is in A, M is said to have accepted the string read so far • An input string that is not accepted is said to be rejected
Example • Q = {0, 1}, q0 = 0, A = {1}, Σ = {a, b} • δ is given by the transition table (the transition diagram on the slide shows the same function) • This accepts strings that end in an odd number of a's; e.g., abbaaa is accepted, aa is rejected
Transition table:
state | input a | input b
0     | 1       | 0
1     | 0       | 0
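The two-state example can be simulated directly. The following is a minimal sketch, not part of the original slides; the names DELTA, START, ACCEPTING and accepts are chosen here for illustration only.

```python
# Transition table of the two-state example: DELTA[state][symbol] -> next state.
# (Names are illustrative assumptions, not from the lecture.)
DELTA = {
    0: {"a": 1, "b": 0},
    1: {"a": 0, "b": 0},
}
START = 0
ACCEPTING = {1}


def accepts(s):
    """Return True if the automaton is in an accepting state after reading s."""
    q = START
    for ch in s:
        q = DELTA[q][ch]  # raises KeyError for symbols outside {a, b}
    return q in ACCEPTING


print(accepts("abbaaa"))  # True: ends in three a's
print(accepts("aa"))      # False: ends in two a's
```

Each input character costs one table lookup, which is the property the string-matching automaton relies on.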
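String-Matching Automata • Given the pattern P[1..m], build a finite automaton M • The state set is Q = {0, 1, 2, …, m} • State q records that the q-character prefix of P is the longest prefix of P that matches a suffix of the text read so far • The start state is 0 • The only accepting state is m • Time to build M can be large if Σ is large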
String-Matching Automata …contd • Scan the text string T[1..n] to find all occurrences of the pattern P[1..m] • String matching is efficient: Θ(n) • Each character is examined exactly once • Constant time for each character • But … time to compute δ is O(m |Σ|) • δ has O(m |Σ|) entries
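To make the transition function concrete, here is a sketch of one straightforward way to build it. It is a simple textbook-style construction that runs in O(m³ |Σ|) time, not the O(m |Σ|) method the slide refers to; the function name and the dictionary representation are assumptions made for this example.

```python
def compute_transition_function(pattern, alphabet):
    """Build delta as a dict mapping (state, symbol) -> next state.

    delta[(q, a)] is the length of the longest prefix of the pattern
    that is a suffix of pattern[:q] + a. Simple but slow construction.
    """
    m = len(pattern)
    delta = {}
    for q in range(m + 1):
        for a in alphabet:
            candidate = pattern[:q] + a
            k = min(m, q + 1)
            # Shrink k until the k-character prefix is a suffix of candidate.
            while k > 0 and not candidate.endswith(pattern[:k]):
                k -= 1
            delta[(q, a)] = k
    return delta


delta = compute_transition_function("ababaca", "abc")
print(len(delta))       # 24 entries: (m + 1) states times |alphabet| symbols
print(delta[(5, "b")])  # 4
```

The table has (m + 1)·|Σ| entries, which is why a large alphabet makes this approach expensive.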
Algorithm • Input: text string T[1..n], transition function δ and pattern length m • Result: all valid shifts displayed
FINITE-AUTOMATON-MATCHER(T, m, δ)
    n ← length[T]
    q ← 0
    for i ← 1 to n
        q ← δ(q, T[i])
        if q = m
            print “pattern occurs with shift” i - m
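The matcher translates almost line for line into Python. This is a sketch under two assumptions: δ is supplied as a dictionary keyed by (state, character), and shifts are collected in a list instead of printed. The table DELTA_AB is hand-built here for the hypothetical pattern "ab" over the alphabet {a, b}.

```python
def finite_automaton_matcher(text, delta, m):
    """Return all shifts s at which the pattern occurs in text.

    delta maps (state, character) -> next state; m is the pattern length.
    """
    shifts = []
    q = 0
    for i, ch in enumerate(text, start=1):  # i is 1-based, as in the pseudocode
        q = delta[(q, ch)]
        if q == m:
            shifts.append(i - m)
    return shifts


# Hand-built transition function for the illustrative pattern "ab".
DELTA_AB = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}

print(finite_automaton_matcher("abaabab", DELTA_AB, 2))  # [0, 3, 5]
```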
Knuth-Morris-Pratt (KMP) Method • Avoids computing δ (the transition function) • Instead computes a prefix function π in O(m) time • π has only m entries • The prefix function stores information about how the pattern matches against shifts of itself • Can avoid testing useless shifts
Terminology/Notations • String w is a prefix of string x if x = wy for some string y (e.g., “srilan” of “srilanka”) • String w is a suffix of string x if x = yw for some string y (e.g., “anka” of “srilanka”) • The k-character prefix of the pattern P[1..m] is denoted by Pk • E.g., P0 = ε (the empty string), Pm = P = P[1..m]
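Prefix Function for a Pattern • Given that pattern prefix P[1..q] matches text characters T[(s+1)..(s+q)], what is the least shift s’ > s such that P[1..k] = T[(s’+1)..(s’+k)], where s’+k = s+q? • The answer depends only on the pattern: π[q] is the length of the longest proper prefix of P that is also a suffix of Pq, and s’ = s + (q - π[q]) with k = π[q] • At the new shift s’, there is no need to compare the first k characters of P with the corresponding characters of T • Since we know that they match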
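Prefix Function: Example 1 • Text T = b a c b a b a b a a b c b a, pattern P = a b a b a c a • At shift s = 4, the first q = 5 characters of P (a b a b a) match T[5..9]; the next pattern character, c, does not match T[10] = a • Compare the pattern against itself: the longest proper prefix of P that is also a suffix of P5 = a b a b a is P3 = a b a, so π[5] = 3 • The next shift worth testing is s’ = s + (q - π[q]) = 4 + (5 - 3) = 6, where the first k = 3 characters of P are already known to match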
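The value π[5] = 3 in the example can be checked by computing the whole prefix function. The sketch below follows the standard linear-time construction; the function name and the choice to leave index 0 unused (so that indices match the 1-based slides) are assumptions for this illustration.

```python
def compute_prefix_function(pattern):
    """Return a list pi where pi[q] is the length of the longest proper
    prefix of the pattern that is also a suffix of its first q characters.

    pi[0] is unused so that indices match the 1-based notation of the slides.
    """
    m = len(pattern)
    pi = [0] * (m + 1)
    k = 0  # length of the currently matched prefix
    for q in range(2, m + 1):
        # pattern[k] is P[k+1] and pattern[q - 1] is P[q] in 1-based notation.
        while k > 0 and pattern[k] != pattern[q - 1]:
            k = pi[k]
        if pattern[k] == pattern[q - 1]:
            k += 1
        pi[q] = k
    return pi


pi = compute_prefix_function("ababaca")
print(pi[1:])  # [0, 0, 1, 2, 3, 0, 1]
print(pi[5])   # 3, as in the example
```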
Knuth-Morris-Pratt (KMP) Algorithm • Information stored in prefix function • Can speed up both the naïve algorithm and the finite-automaton matcher • KMP Algorithm on the board • 2 parts: KMP-MATCHER, PREFIX • Running time • PREFIX takes O(m) • KMP-MATCHER takes O(m+n) in total: O(m) for PREFIX plus O(n) for scanning the text
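Since the algorithm itself was presented on the board, the following is only a sketch of the usual two-part formulation, not a transcript of what was written in the lecture. It uses 0-based indexing, and the function names are illustrative.

```python
def prefix_function(pattern):
    """pi[q] = length of the longest proper prefix of pattern[:q + 1]
    that is also a suffix of it (0-based indexing)."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(text, pattern):
    """Return all 0-based shifts at which pattern occurs in text."""
    if not pattern:
        return []  # design choice for this sketch: no shifts for an empty pattern
    pi = prefix_function(pattern)
    shifts = []
    q = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while q > 0 and pattern[q] != ch:
            q = pi[q - 1]  # fall back to the next shorter candidate prefix
        if pattern[q] == ch:
            q += 1
        if q == len(pattern):
            shifts.append(i - len(pattern) + 1)
            q = pi[q - 1]  # keep going to find overlapping occurrences
    return shifts


print(kmp_matcher("abababab", "abab"))           # [0, 2, 4]
print(kmp_matcher("bacbababaabcba", "ababaca"))  # []
```

The text index i never moves backwards; only q falls back through π, which is what keeps the scan linear in n.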
Introduction • Greedy methods typically apply to optimization problems in which a set of choices must be made to arrive at an optimal solution • Optimization problem • There can be many solutions • Each solution has a value • We wish to find a solution with the optimal (minimum or maximum) value
Example Optimization Problems • How to give change (a balance) using the minimum number of coins? • How to allocate resources to maximize profit from your business? • A thief has a knapsack of capacity c; what items to put in it to maximize profit? • 0-1 knapsack problem (binary choice) • Fractional knapsack problem
Greedy Approach • Make each choice in a locally optimal manner • Always make the choice that looks best at the moment • We hope that this will lead to a globally optimal solution • The greedy method doesn’t always give optimal solutions, but for many problems it does
Example • A cashier gives change using coins of Rs. 10, 5, 2 and 1 • Suppose the amount is Rs. 37 • Need to minimize the number of coins • At each step, use the largest coin that does not exceed the remaining balance • So, we get 10 + 10 + 10 + 5 + 2 • Does this give the optimal solution?
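The cashier's rule can be sketched in a few lines. The function name is illustrative, coin values are assumed to be positive integers, and the second call uses a hypothetical coin system {4, 3, 1} that is not from the slides, chosen to show that the greedy rule is not optimal for every set of denominations.

```python
def greedy_change(amount, denominations):
    """Repeatedly take the largest coin that does not exceed the balance.

    Assumes positive integer denominations. Returns the list of coins used.
    """
    coins = []
    for coin in sorted(denominations, reverse=True):
        while amount >= coin:
            coins.append(coin)
            amount -= coin
    if amount != 0:
        raise ValueError("amount cannot be paid with these denominations")
    return coins


print(greedy_change(37, [10, 5, 2, 1]))  # [10, 10, 10, 5, 2]
print(greedy_change(6, [4, 3, 1]))       # [4, 1, 1], although 3 + 3 uses fewer coins
```

For Rs. 37 with coins of 10, 5, 2 and 1, five coins is the minimum: three coins reach at most 30, and four coins would require a single coin of value 7 after three 10s.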
Elements of Greedy Approach • Greedy-choice property • A globally optimal solution can be arrived at by making a locally optimal (greedy) choice • Proving this may not be trivial • Optimal substructure • Optimal solution to the problem contains within it optimal solutions to subproblems
Applications of Greedy Approach • Graph algorithms • Minimum spanning tree • Shortest path • Data compression • Huffman coding • Activity selection (scheduling) problems • Fractional knapsack problem • Not the 0-1 knapsack problem
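The contrast between the two knapsack variants can be seen on a small instance. This sketch uses a common textbook-style instance as assumed data (items given as (value, weight) pairs, capacity 50); the function names are illustrative. The greedy rule here is "best value per unit weight first".

```python
def fractional_knapsack(items, capacity):
    """Greedy by value/weight ratio; fractions of items may be taken."""
    total = 0.0
    remaining = capacity
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if remaining <= 0:
            break
        take = min(weight, remaining)  # whole item, or the part that still fits
        total += value * take / weight
        remaining -= take
    return total


def greedy_zero_one(items, capacity):
    """Same greedy order, but each item is taken whole or not at all."""
    total = 0
    remaining = capacity
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if weight <= remaining:
            total += value
            remaining -= weight
    return total


# Assumed example data: (value, weight) pairs.
items = [(60, 10), (100, 20), (120, 30)]
print(fractional_knapsack(items, 50))  # 240.0
print(greedy_zero_one(items, 50))      # 160, but taking the last two items gives 220
```

In the fractional case the greedy choice fills the knapsack completely with the best available ratio, whereas in the 0-1 case it can leave capacity unused, which is why the same rule fails there.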
Announcements • Assignment 4 • assigned today • due next week • Next 2 lectures • Topic: Graphs • By Ms Sudanthi Wijewickrema