Graph and String Matching
String Matching Problem
• Given a text string T of length n and a pattern string P of length m, the exact string matching problem is to find all occurrences of P in T.
• Example: T = “AGCTTGA”, P = “GCT”
• Applications:
  • Searching for keywords in a file
  • Search engines
  • Database searching
Terminologies
• S = “AGCTTGA”
• |S| = 7, the length of S
• Substring: S_{i,j} = S_i S_{i+1} … S_j
  • Example: S_{2,4} = “GCT”
• Subsequence of S: a string obtained by deleting zero or more characters from S
  • “ACT” and “GCTT” are subsequences.
• Prefix of S: S_{1,k}
  • “AGCT” is a prefix of S.
• Suffix of S: S_{h,|S|}
  • “CTTGA” is a suffix of S.
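The definitions above can be checked with a short Python sketch. The helper names `substring` and `is_subsequence` are illustrative only (they do not come from the slides), and the slides' 1-based, inclusive indices are converted to Python's 0-based slices.

```python
S = "AGCTTGA"

def substring(s, i, j):
    """Return S_{i,j} using the slides' 1-based, inclusive indices."""
    return s[i - 1:j]

def is_subsequence(x, s):
    """True if x can be obtained from s by deleting zero or more characters."""
    remaining = iter(s)
    # `ch in remaining` advances the iterator, so characters of x must be
    # found in s in left-to-right order.
    return all(ch in remaining for ch in x)

print(len(S))                     # 7
print(substring(S, 2, 4))         # GCT
print(substring(S, 1, 4))         # AGCT   (a prefix, S_{1,4})
print(substring(S, 3, len(S)))    # CTTGA  (a suffix, S_{3,|S|})
print(is_subsequence("ACT", S))   # True
print(is_subsequence("GCTT", S))  # True
```

Every substring is also a subsequence, but not the other way around: “ACT” is a subsequence of S without being a substring.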
String Matching
• Given a pattern P[1..m] and a text T[1..n], find all occurrences of P in T. Both P and T belong to Σ*, the set of finite strings over the alphabet Σ.
• P occurs with shift s (beginning at position s+1) if P[1]=T[s+1], P[2]=T[s+2], …, P[m]=T[s+m].
• If so, s is called a valid shift; otherwise, it is an invalid shift.
• Note: occurrences may overlap. With P=abab and T=abcabababbc, P occurs at s=3 and s=5.
Naïve string matching
• Worst-case running time: O((n-m+1)m).
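The slide gives only the running time, so here is a minimal Python sketch of one way the naïve matcher could look. The function name `naive_match` is an assumption, not something taken from the slides. Because Python strings are 0-based, the start index returned for each match is the same number as the shift s defined earlier.

```python
def naive_match(text, pattern):
    """Return every valid shift s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):         # n - m + 1 candidate shifts
        if text[s:s + m] == pattern:   # up to m character comparisons
            shifts.append(s)
    return shifts

print(naive_match("abcabababbc", "abab"))  # [3, 5]
print(naive_match("AGCTTGA", "GCT"))       # [1]
```

The two nested costs (n - m + 1 shifts, at most m comparisons each) are where the O((n-m+1)m) bound comes from.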
A Brute-Force Algorithm
• Time: O(mn), where m=|P| and n=|T|.
Rabin-Karp
• The Rabin-Karp string searching algorithm calculates a hash value for the pattern and for each M-character substring of the text to be compared.
• If the hash values are unequal, the algorithm calculates the hash value for the next M-character substring.
• If the hash values are equal, the algorithm does a brute-force comparison between the pattern and the M-character substring.
• In this way, there is only one hash comparison per text substring, and the brute-force comparison is needed only when the hash values match.
• Perhaps an example will clarify some things...
Rabin-Karp Example
• Hash value of “AAAAA” is 37
• Hash value of “AAAAH” is 100
Rabin-Karp Algorithm
pattern is M characters long
hash_p = hash value of pattern
hash_t = hash value of first M letters in body of text
do
    if (hash_p == hash_t)
        brute force comparison of pattern and selected section of text
    hash_t = hash value of next section of text, one character over
while (not end of text and brute force comparison != true)
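The pseudocode above can be sketched in Python as follows. The slides do not say which hash function produces the values 37 and 100, so this sketch assumes a polynomial rolling hash; `base` and `mod` are arbitrary illustrative choices. Unlike the pseudocode, which stops at the first match, this version collects every valid shift.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_003):
    """Return every valid shift of pattern in text.

    Assumed hash: polynomial rolling hash with the given base and modulus.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character
    hash_p = hash_t = 0
    for i in range(m):
        hash_p = (hash_p * base + ord(pattern[i])) % mod
        hash_t = (hash_t * base + ord(text[i])) % mod
    shifts = []
    for s in range(n - m + 1):
        # Brute-force check only when the hashes agree; it rules out collisions.
        if hash_p == hash_t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            # Slide the window: drop text[s], append text[s + m].
            hash_t = ((hash_t - ord(text[s]) * high) * base
                      + ord(text[s + m])) % mod
    return shifts

print(rabin_karp("abcabababbc", "abab"))  # [3, 5]
```

Updating the hash in constant time per shift is what makes the approach worthwhile; recomputing each window's hash from scratch would cost as much as the brute-force comparison.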
What is a graph?
(Figure: example graph with a vertex and an edge labeled)
• A set of vertices and edges
• Directed/Undirected
• Weighted/Unweighted
• Cyclic/Acyclic
Representation of Graphs
• Adjacency Matrix
  • A V × V array, with matrix[i][j] storing whether there is an edge between the ith vertex and the jth vertex
• Adjacency Linked List
  • One linked list per vertex, each storing the vertices directly reachable from it
• Edge List
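To make the three representations concrete, the sketch below builds each of them for one small graph. The graph itself (4 vertices numbered 0 to 3, undirected and unweighted) is a made-up example, and plain Python lists stand in for the linked lists mentioned above.

```python
# Hypothetical undirected, unweighted graph on vertices 0..3.
V = 4

# Edge list: one (u, v) pair per edge.
edge_list = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Adjacency matrix: matrix[i][j] is True when an edge joins i and j.
matrix = [[False] * V for _ in range(V)]
for u, v in edge_list:
    matrix[u][v] = matrix[v][u] = True  # symmetric because undirected

# Adjacency list: adj[u] holds the vertices directly reachable from u.
adj = [[] for _ in range(V)]
for u, v in edge_list:
    adj[u].append(v)
    adj[v].append(u)

print(matrix[2][3])  # True
print(matrix[0][3])  # False
print(adj[2])        # [0, 1, 3]
```

The matrix answers "is there an edge between i and j?" in constant time but always uses V × V space, while the adjacency list uses space proportional to the number of edges and makes it cheap to enumerate a vertex's neighbours.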
Graph Searching
• Why do we do graph searching? What do we search for?
• What information can we find from graph searching?
• How do we search the graph? Do we need to visit all vertices? In what order?
Depth-First Search (DFS)
• Strategy: Go as far as you can (to vertices you have not visited yet); otherwise, go back and try another way
Implementation
DFS (vertex u) {
    mark u as visited
    for each vertex v directly reachable from u
        if v is unvisited
            DFS (v)
}
• Initially all vertices are marked as unvisited
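A minimal Python sketch of the recursive DFS above, assuming the graph is given as a dictionary that maps each vertex to the list of vertices directly reachable from it. The example graph and the names `dfs` and `visit` are illustrative, not taken from the slides.

```python
def dfs(adj, start):
    """Return the vertices reachable from start in DFS visiting order."""
    visited = set()   # initially every vertex is unvisited
    order = []

    def visit(u):
        visited.add(u)            # mark u as visited
        order.append(u)
        for v in adj[u]:          # each vertex directly reachable from u
            if v not in visited:
                visit(v)

    visit(start)
    return order

# Hypothetical directed graph: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4
adj = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(dfs(adj, 1))  # [1, 2, 4, 3]
```

On very deep graphs the recursion can exceed Python's default recursion limit; an explicit stack is a common alternative in that case.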
Breadth-First Search (BFS)
• Instead of going as far as possible along one path, BFS explores all paths in parallel, one step at a time.
• BFS uses a queue to store vertices that have been visited but not yet expanded, and extends the paths from the earliest-visited vertices first.
Simulation of BFS
• Queue: 4 1 3 6 2 5
(Figure: graph with vertices 1 to 6)
Implementation
while queue Q not empty
    dequeue the first vertex u from Q
    for each vertex v directly reachable from u
        if v is unvisited
            enqueue v to Q
            mark v as visited
• Initially all vertices except the start vertex are marked as unvisited and the queue contains the start vertex only
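The loop above translates almost line for line into Python. This is a sketch under the assumption that the graph is a dictionary mapping each vertex to the vertices directly reachable from it; the example graph and the function name `bfs` are illustrative.

```python
from collections import deque

def bfs(adj, start):
    """Return the vertices reachable from start in BFS visiting order."""
    visited = {start}         # only the start vertex is marked at the beginning
    queue = deque([start])    # the queue contains the start vertex only
    order = []
    while queue:
        u = queue.popleft()   # dequeue the earliest-visited vertex
        order.append(u)
        for v in adj[u]:      # each vertex directly reachable from u
            if v not in visited:
                visited.add(v)    # mark when enqueuing, so v is queued once
                queue.append(v)
    return order

# Hypothetical directed graph: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4
adj = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(bfs(adj, 1))  # [1, 2, 3, 4]
```

Marking a vertex as visited at the moment it is enqueued, rather than when it is dequeued, keeps the same vertex from entering the queue twice.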