
Graph and String Matching




  1. Graph and String Matching

  2. String Matching Problem • Given a text string T of length n and a pattern string P of length m, the exact string matching problem is to find all occurrences of P in T. • Example: T=“AGCTTGA” P=“GCT” • Applications: • Searching for keywords in a file • Search engines • Database searching

  3. Terminology • S=“AGCTTGA” • |S|=7, the length of S • Substring: S_{i,j} = S_i S_{i+1} … S_j • Example: S_{2,4}=“GCT” • Subsequence of S: a string obtained by deleting zero or more characters from S • “ACT” and “GCTT” are subsequences. • Prefix of S: S_{1,k} • “AGCT” is a prefix of S. • Suffix of S: S_{h,|S|} • “CTTGA” is a suffix of S.
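The terms on this slide can be checked with a short Python sketch. Python strings are 0-indexed, so the slide's 1-indexed, inclusive S_{i,j} is assumed here to correspond to `s[i-1:j]`. The helper names `substring` and `is_subsequence` are illustrative and do not come from the slides.

```python
def substring(s: str, i: int, j: int) -> str:
    """Return S_{i,j} using the slide's 1-indexed, inclusive convention."""
    return s[i - 1:j]


def is_subsequence(x: str, s: str) -> bool:
    """True if x can be obtained from s by deleting zero or more characters."""
    it = iter(s)
    # `ch in it` consumes the iterator, so matches must appear in order.
    return all(ch in it for ch in x)


S = "AGCTTGA"
print(len(S))                     # 7
print(substring(S, 2, 4))         # GCT
print(is_subsequence("ACT", S))   # True
print(is_subsequence("GCTT", S))  # True
print(S.startswith("AGCT"))       # True: "AGCT" is a prefix
print(S.endswith("CTTGA"))        # True: "CTTGA" is a suffix
```

Every substring is also a subsequence, but not the other way round: “ACT” is a subsequence of S without being a substring, because its characters are not contiguous in S.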

  4. String Matching • Given a pattern P[1..m] and a text T[1..n], find all occurrences of P in T. Both P and T belong to Σ*, the set of all finite strings over the alphabet Σ. • P occurs with shift s (beginning at position s+1) if P[1]=T[s+1], P[2]=T[s+2], …, P[m]=T[s+m]. • If so, s is called a valid shift; otherwise, it is an invalid shift. • Note: occurrences may overlap, with one beginning inside another: for P=abab and T=abcabababbc, P occurs at s=3 and s=5.

  5. An example of string matching

  6. Naïve string matching Running time: O((n-m+1)m).

  7. A Brute-Force Algorithm Time: O(mn) where m=|P| and n=|T|.
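The code on the two slides above did not survive in this transcript, so the following is a minimal sketch of the naive (brute-force) approach, not necessarily the version the slides showed. It tries each of the n-m+1 candidate shifts and compares up to m characters at each one, which gives the O((n-m+1)m) bound. The function name `naive_match` is an assumption, and the returned shifts are 0-based offsets, which coincide with the shift s defined earlier.

```python
def naive_match(text: str, pattern: str) -> list[int]:
    """Return every valid shift s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    # Try each of the n - m + 1 candidate shifts in turn.
    for s in range(n - m + 1):
        # The slice comparison costs up to m character comparisons.
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


print(naive_match("abcabababbc", "abab"))  # [3, 5]
print(naive_match("AGCTTGA", "GCT"))       # [1]
```

The first call reproduces the overlapping occurrences at s=3 and s=5 from the earlier slide.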

  8. Rabin-Karp • The Rabin-Karp string searching algorithm computes a hash value for the pattern and for each M-character substring of the text. • If the hash values are unequal, the algorithm moves on and computes the hash value of the next M-character substring. • If the hash values are equal, the algorithm does a brute-force comparison between the pattern and that M-character substring, because different strings can share a hash value. • In this way, there is only one hash comparison per text substring, and the brute-force comparison is needed only when the hash values match. • Perhaps an example will clarify some things...

  9. Rabin-Karp Example • Hash value of “AAAAA” is 37 • Hash value of “AAAAH” is 100

  10. Rabin-Karp Algorithm
      pattern is M characters long
      hash_p = hash value of pattern
      hash_t = hash value of first M letters in body of text
      do
          if (hash_p == hash_t)
              brute force comparison of pattern and selected section of text
          hash_t = hash value of next section of text, one character over
      until (end of text or brute force comparison == true)
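The pseudocode leaves the hash function unspecified, so the sketch below assumes a polynomial rolling hash. The `base` and `mod` values are arbitrary illustrative choices and will not reproduce the hash values 37 and 100 quoted in the example slide. Unlike the pseudocode, which stops at the first match, this version collects every valid shift.

```python
def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 1_000_003) -> list[int]:
    """Return every 0-based shift at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character in a window
    hash_p = hash_t = 0
    for k in range(m):
        hash_p = (hash_p * base + ord(pattern[k])) % mod
        hash_t = (hash_t * base + ord(text[k])) % mod
    shifts = []
    for s in range(n - m + 1):
        # Equal hashes only suggest a match; confirm character by character.
        if hash_p == hash_t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            # Roll the window: drop text[s], append text[s + m].
            hash_t = ((hash_t - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return shifts


print(rabin_karp("abcabababbc", "abab"))  # [3, 5]
```

Updating the hash in constant time per shift is what makes the approach worthwhile: recomputing it from scratch for every window would cost as much as the brute-force comparison it is meant to avoid.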

  11. What is a graph? • A set of vertices and edges • Directed/Undirected • Weighted/Unweighted • Cyclic/Acyclic

  12. Representation of Graphs • Adjacency Matrix • A V x V array, with matrix[i][j] storing whether there is an edge between the ith vertex and the jth vertex • Adjacency Linked List • One linked list per vertex, each storing directly reachable vertices • Edge List

  13. Representation of Graphs
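As a rough illustration of the three representations, the sketch below builds each one for a small undirected, unweighted graph. The graph itself (four vertices numbered 0 to 3) is a made-up example, not the one pictured on the slide, and Python lists stand in for the linked lists mentioned above.

```python
# Hypothetical example graph: undirected, unweighted, vertices 0..3.
num_vertices = 4

# Edge list: one (u, v) pair per edge.
edge_list = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Adjacency matrix: a V x V array, matrix[i][j] is True if edge i-j exists.
matrix = [[False] * num_vertices for _ in range(num_vertices)]
for u, v in edge_list:
    matrix[u][v] = True
    matrix[v][u] = True  # undirected, so the matrix is symmetric

# Adjacency list: for each vertex, the vertices directly reachable from it.
adj = [[] for _ in range(num_vertices)]
for u, v in edge_list:
    adj[u].append(v)
    adj[v].append(u)

print(matrix[2][3])  # True
print(matrix[0][3])  # False
print(adj[2])        # [0, 1, 3]
```

The matrix answers "is there an edge between i and j?" in constant time but always uses V x V space, while the adjacency list uses space proportional to the number of vertices plus edges and makes it cheap to iterate over a vertex's neighbours.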

  14. Graph Searching • Why do we do graph searching? What do we search for? • What information can we find from graph searching? • How do we search the graph? Do we need to visit all vertices? In what order?

  15. Depth-First Search (DFS) • Strategy: go as far as you can along unvisited vertices; when you cannot go any further, go back and try another way

  16. Implementation
      DFS (vertex u) {
          mark u as visited
          for each vertex v directly reachable from u
              if v is unvisited
                  DFS (v)
      }
      • Initially all vertices are marked as unvisited
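The pseudocode translates almost line for line into Python. This is a minimal sketch that assumes the graph is stored as a dictionary mapping each vertex to a list of its neighbours; the `graph` used here and the `order` list that records the visiting sequence are illustrative additions, not part of the slides.

```python
def dfs(adj: dict[int, list[int]], u: int, visited: set[int], order: list[int]) -> None:
    """Recursive depth-first search from u, recording the visiting order."""
    visited.add(u)       # mark u as visited
    order.append(u)
    for v in adj[u]:     # each vertex v directly reachable from u
        if v not in visited:
            dfs(adj, v, visited, order)


# Hypothetical undirected graph with edges 1-2, 1-3, 2-4.
graph = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2]}
visited: set[int] = set()   # initially all vertices are unvisited
order: list[int] = []
dfs(graph, 1, visited, order)
print(order)  # [1, 2, 4, 3]
```

The search goes 1, 2, 4 before backtracking to try vertex 3. On very deep graphs the recursion can exceed Python's default recursion limit, in which case an explicit stack is the usual alternative.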

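  17. Breadth-First Search (BFS) • Instead of going as far as possible along one path, BFS extends all paths one step at a time, visiting vertices in order of their distance (number of edges) from the start vertex. • BFS uses a queue to store vertices that have been visited but not yet expanded, and always expands the earliest visited vertex first.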

  18. Simulation of BFS (figure: a graph on vertices 1 to 6) • Queue: 4 1 3 6 2 5

  19. Implementation
      while queue Q not empty
          dequeue the first vertex u from Q
          for each vertex v directly reachable from u
              if v is unvisited
                  enqueue v to Q
                  mark v as visited
      • Initially all vertices except the start vertex are marked as unvisited and the queue contains the start vertex only
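A minimal Python sketch of the same loop follows, assuming a dictionary-of-lists graph and using `collections.deque` as the queue. The example `graph` is hypothetical and is not the six-vertex graph from the simulation slide.

```python
from collections import deque


def bfs(adj: dict[int, list[int]], start: int) -> list[int]:
    """Breadth-first search from start, returning vertices in dequeue order."""
    visited = {start}        # only the start vertex is marked initially
    queue = deque([start])   # the queue contains the start vertex only
    order = []
    while queue:
        u = queue.popleft()  # dequeue the earliest visited vertex
        order.append(u)
        for v in adj[u]:     # each vertex v directly reachable from u
            if v not in visited:
                visited.add(v)   # mark when enqueued, so v is queued only once
                queue.append(v)
    return order


# Hypothetical undirected graph with edges 1-2, 1-3, 2-4.
graph = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2]}
print(bfs(graph, 1))  # [1, 2, 3, 4]
```

Vertices 2 and 3 (one edge from the start) are dequeued before vertex 4 (two edges away), whereas depth-first search on the same graph would reach 4 before 3.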
