210 likes | 363 Views
RAIK 283: Data Structures & Algorithms. Space-Time Tradeoffs: String Matching Algorithms* Dr. Ying Lu ylu@cse.unl.edu. *slides referred to http://www.aw-bc.com/info/levitin. Motivation example.
E N D
RAIK 283: Data Structures & Algorithms Space-Time Tradeoffs: String Matching Algorithms* Dr. Ying Lu ylu@cse.unl.edu *slides referred to http://www.aw-bc.com/info/levitin Design and Analysis of Algorithms – Chapter 7
Motivation example • How to sort array a[1], a[2], …., a[n], whose values are known to come from a set, e.g., {a, b, c, d, e, f, …. x, y, z}? Design and Analysis of Algorithms – Chapter 7
Space-time tradeoffs For many problems some extra space really pays off: • Extra space in tables • hashing • non comparison-based sorting (e.g., sorting by counting the array a[1], a[2], …., a[n], whose values are known to come from a set, e.g., {a, b, c, d, e, f, …. x, y, z}) • Input enhancement • auxiliary tables (shift tables for pattern matching) • indexing schemes (e.g., B-trees) • Tables of solutions for overlapping subproblems • dynamic programming Design and Analysis of Algorithms – Chapter 7
String matching • pattern: a string of m characters to search for • text: a (long) string of n characters to search in • Brute force algorithm? Design and Analysis of Algorithms – Chapter 7
String matching • pattern: a string of m characters to search for • text: a (long) string of n characters to search in • Brute force algorithm: • Align pattern at beginning of text • moving from left to right, compare each character of pattern to the corresponding character in text until • all characters are found to match (successful search); or • a mismatch is detected • while pattern is not found and the text is not yet exhausted, realign pattern one position to the right and repeat step 2. • Time efficiency: Design and Analysis of Algorithms – Chapter 7
String matching • pattern: a string of m characters to search for • text: a (long) string of n characters to search in • Brute force algorithm: • Align pattern at beginning of text • moving from left to right, compare each character of pattern to the corresponding character in text until • all characters are found to match (successful search); or • a mismatch is detected • while pattern is not found and the text is not yet exhausted, realign pattern one position to the right and repeat step 2. • Time efficiency: (nm) Design and Analysis of Algorithms – Chapter 7
Horspool’s Algorithm • A simplified version of Boyer-Moore algorithm that retains key insights: • compare pattern characters to text from right to left • given a pattern, create a shift table that determines how much to shift the pattern when a mismatch occurs (input enhancement) Design and Analysis of Algorithms – Chapter 7
How far to shift when a mismatch occurs ? Look at first (rightmost) character in text that was compared. Three cases: • The character is not in the pattern .....c...................... (c not in pattern) BAOBAB • The character is in the pattern (but not at rightmost position) .....O...................... (O occurs once in pattern) BAOBAB .....A...................... (A occurs twice in pattern) BAOBAB • The rightmost characters produced a match .....B...................... BAOBAB .....B...................... AAOAAB • Shift Table: Stores number of characters to shift depending on first character compared Design and Analysis of Algorithms – Chapter 7
Shift table • Indexed by text and pattern alphabet • If a character c is not among the first m-1 characters of the pattern, then t(c) = the pattern’s length m • Otherwise, t(c) = the distance from the rightmost c among the first m-1 characters of the pattern to its last character Design and Analysis of Algorithms – Chapter 7
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 Shift table • Constructed by scanning pattern before search begins • Indexed by text and pattern alphabet • All entries are initialized to length of pattern. E.g., BAOBAB: • For c occurring in the first m-1 characters of the pattern, update table entry to distance of rightmost c among the first m-1 characters of the pattern to its last character Design and Analysis of Algorithms – Chapter 7
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 Shift table • Constructed by scanning pattern before search begins • Indexed by text and pattern alphabet • All entries are initialized to length of pattern. E.g., BAOBAB: • For c occurring in the first m-1 characters of the pattern, update table entry to distance of rightmost c among the first m-1 characters of the pattern to its last character Design and Analysis of Algorithms – Chapter 7
Shift table • We can create the shift table by processing pattern from L→R: Algorithm ShiftTable(P[0…m-1]) { initialize all the elements of Table with m for j = 0 to m-2 do Table[P[j]] = m-1-j return Table } Design and Analysis of Algorithms – Chapter 7
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Example BARD LOVED BANANAS BAOBAB Design and Analysis of Algorithms – Chapter 7
In-class exercises • P267: 7.2.1--- Apply Horspool’s algorithm to search for the pattern BAOBAB in the text BESS_KNEW_ABOUT_BAOBABS • P268: 7.2.3--- How many character comparisons will be made by Horspool’s algorithm in searching for each of the following patterns in the binary text of 1000 zeros? • 00001 • 10000 Design and Analysis of Algorithms – Chapter 7
In-class exercises • P264: 7.2.1--- Apply Horspool’s algorithm to search for the pattern BAOBAB in the text BESS_KNEW_ABOUT_BAOBABS • P264: 7.2.3--- How many character comparisons will be made by Horspool’s algorithm in searching for each of the following patterns in the binary text of 1000 zeros? • 00001 (1*996 = 996 comparisons) • 10000 (5*996 = 4980 comparisons) Design and Analysis of Algorithms – Chapter 7
Time Efficiency • Horspool’s algorithm • (nm) in the worst-case. • (n) for random texts Design and Analysis of Algorithms – Chapter 7
In-Class Exercises • You have an array of positive integers like ar[]={1,3,2,4,5,4,2}. You need to create another array ar_low[] such that ar_low[i] = number of elements lower than or equal to ar[i]. So the output of above should be {1,4,3,6,7,6,3}. • Algorithm time complexity should be O(n) and use of extra space is allowed.
Boyer-Moore algorithm • Based on same two ideas: • compare pattern characters to text from right to left • given a pattern, create a shift table that determines how much to shift the pattern when a mismatch occurs (input enhancement) • Uses additional shift table with same idea applied to the number of matched characters Design and Analysis of Algorithms – Chapter 7
In-class exercises • P264: 7.2.7 How many character comparisons will Boyer-Moore algorithm make in searching for each of the following patterns in the binary text of 1000 zeros? • 00001 • 10000 Design and Analysis of Algorithms – Chapter 7
In-class exercises • P264: 7.2.7 How many character comparisons will Boyer-Moore algorithm make in searching for each of the following patterns in the binary text of 1000 zeros? • 00001 (1*996 = 996 comparisons) • 10000 (5*200 = 1000 comparisons) Design and Analysis of Algorithms – Chapter 7
In-class exercises • P264: 7.2.3--- How many character comparisons will be made by Horspool’s algorithm in searching for each of the following patterns in the binary text of 1000 zeros? • 01010 • P264: 7.2.7 How many character comparisons will Boyer-Moore algorithm make in searching for each of the following patterns in the binary text of 1000 zeros? • 01010 Design and Analysis of Algorithms – Chapter 7