Search Algorithms
Winter Semester 2004/2005
11 Oct 2004, 1st Lecture
Christian Schindelhauer
schindel@upb.de
Contents
• The many different aspects of search in computer science
• For example:
  • Searching text
  • Searching the Web
  • Searching the DNS
  • Searching for the exit of a maze (labyrinth)
  • Searching for a man overboard
• Trade-offs between time and space in search
• Searching or deciding: which one is harder?
• Language: English
  • Examinations can also be taken in German (on request)
Organisation (I)
• Lecture:
  • Monday, 11 am - 1 pm, FU 116 (Beethoven)
• Exercise Classes (Übungen)
  • Start next week
  • Participation in the exercise classes is mandatory
  • Monday, 1 pm - 2 pm, Stefan Rührup
  • Wednesday, 1 pm - 2 pm, Christian Schindelhauer
• Registration for Exercise Classes
  • By StudInfo-System
  • See web page:
    http://wwwcs.upb.de/cs/ag-madh/WWW/Teaching/2004WS/SearchAlg/
  • Find the web page from my home page:
    http://www.upb.de/cs/schindel.html
  • Register for the Exercise Classes as soon as possible!
Organisation (II)
• Material available at the web-site
  • Slides of the lectures in MS PowerPoint format and PDF
  • Lecture notes (with possible exam questions)
  • Exercises
  • Schedule of the lecture (with upcoming topics and examination dates)
  • Literature links
• Material not available at the web-site
  • Solutions for the exercises
  • Solutions for the exam questions
  • Names of students registered for exercise classes or examinations
Examinations
• Two exams:
  • 1st: written exam (45 minutes)
    • Wednesday, 8 Dec. 2004, 12 am, in F0.530
    • Contents: Lectures and Exercises in October and November 2004
  • 2nd: oral exam (25 minutes)
    • In the week from 7 Feb to 11 Feb 2005 in F2.315
  • Each exam covers one half of the lecture
  • The overall grade is the mean of both examination grades
• Exercise rebate
  • If a student does not participate in the exercise classes:
    • 1 extra examination question in the first (written) exam
    • 1 extra hour for solving an exercise prior to the 2nd (oral) exam
Exercises
• Successful participation includes:
  • Registration for one of the exercise classes
  • Regular attendance at the exercise classes
  • Solving at least two exercises (one in the first half and one in the second half)
  • Presenting these solutions in the exercise class
  • Written write-ups of these solutions (submitted before the exams)
• Reservations of exercises for presentation
  • Can be made via the StudInfo-System
Chapter I
Searching Text
10 Oct 2004
Searching Text (Overview)
• The task of string matching
  • Easy as pie
• The naive algorithm
  • How would you do it?
• The Rabin-Karp algorithm
  • Ingenious use of primes and number theory
• The Knuth-Morris-Pratt algorithm
  • Let a (finite) automaton do the job
  • This is optimal
• The Boyer-Moore algorithm
  • Bad letters allow us to jump through the text
  • This is even better than optimal (in practice)
• Literature
  • Cormen, Leiserson, Rivest, “Introduction to Algorithms”, chapter 36, string matching, The MIT Press, 1989, 853-885.
The task of string matching
• Given
  • A text T of length n over a finite alphabet $\Sigma$
  • A pattern P of length m over the same finite alphabet $\Sigma$
• Output
  • All occurrences of P in T, i.e. all shifts s with T[s+1..s+m] = P[1..m]
[Figure: the text T = manamanapatipitipi (T[1] to T[n]) and the pattern P = pati (P[1] to P[m]). The pattern is placed under the text at shift s, where T[s+1..s+m] = P[1..m]; in this example s = 8.]
The Naive Algorithm
Naive-String-Matcher(T, P)
  n ← length(T)
  m ← length(P)
  for s ← 0 to n-m do
    if P[1..m] = T[s+1..s+m] then
      output “Pattern occurs with shift s”
    fi
  od
Fact:
• The naive string matcher has worst-case running time O((n-m+1) m)
• For n = 2m this is O(n²)
• The naive string matcher is not optimal, since string matching can be done in time O(m + n)
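The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `naive_string_matcher` is an assumption, and because Python strings are 0-indexed, the slice `text[s:s+m]` plays the role of T[s+1..s+m]. It collects every valid shift, matching the stated task of reporting all occurrences.

```python
def naive_string_matcher(text: str, pattern: str) -> list[int]:
    """Return every shift s such that text[s:s+m] equals pattern.

    Hypothetical helper for illustration; shifts are the same s as in the
    slides, since T[s+1..s+m] (1-indexed) is text[s:s+m] (0-indexed).
    """
    n, m = len(text), len(pattern)
    shifts = []
    # Try each of the n-m+1 candidate shifts; each comparison costs up to m steps.
    for s in range(n - m + 1):
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


if __name__ == "__main__":
    # The example text and pattern from the slides.
    print(naive_string_matcher("manamanapatipitipi", "pati"))
```

The nested work (n-m+1 shifts, up to m character comparisons each) is where the O((n-m+1) m) bound comes from.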
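The Rabin-Karp Algorithm
• Idea: Compute
  • a checksum for the pattern P and
  • a checksum for each substring of T of length m
[Figure: the text manamanapatipitipi with the checksums of its 15 windows of length 4 written beneath it: 4 2 3 1 4 2 3 1 3 1 2 3 1 0 1. The pattern pati has checksum 3. A window with checksum 3 that differs from pati is a spurious hit; the window at shift 8 is the valid hit.]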
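The Rabin-Karp Algorithm
• Computing the checksum:
  • Choose a prime number q
  • Let $d = |\Sigma|$
  • Interpret the string as a number in base d and reduce it modulo q:
    $S_m(P[1..m]) = \left(\sum_{i=1}^{m} d^{\,m-i}\, P[i]\right) \bmod q$
• Example:
  • Let $\Sigma = \{0, 1, \dots, 9\}$
  • Then d = 10; choose q = 13
  • Let P = 0815
    $S_4(0815) = (0 \cdot 1000 + 8 \cdot 100 + 1 \cdot 10 + 5 \cdot 1) \bmod 13 = 815 \bmod 13 = 9$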
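How to Compute the Checksum: Horner's Rule
• Compute
    $S_m(P[1..m]) = \left(\sum_{i=1}^{m} d^{\,m-i}\, P[i]\right) \bmod q$
• by using
    $S_m(P[1..m]) = \bigl(P[m] + d\,(P[m-1] + d\,(P[m-2] + \dots + d\,(P[2] + d\,P[1]) \dots ))\bigr) \bmod q$
• Example:
  • Let $\Sigma = \{0, 1, \dots, 9\}$
  • Then d = 10, q = 13
  • Let P = 0815
    $S_4(0815) = (((0 \cdot 10 + 8) \cdot 10 + 1) \cdot 10 + 5) \bmod 13 = ((8 \cdot 10 + 1) \cdot 10 + 5) \bmod 13 = (3 \cdot 10 + 5) \bmod 13 = 9$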
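As a small sketch of Horner's rule, the following Python function computes the checksum of a digit sequence and reduces modulo q after every step, so intermediate values stay below d·q. The names `horner_checksum` and `direct_checksum` are assumptions made for this illustration; the second function evaluates the positional sum directly and serves only as a cross-check.

```python
def horner_checksum(symbols: list[int], d: int, q: int) -> int:
    """Checksum of a sequence of symbols (integers in 0..d-1) by Horner's rule."""
    value = 0
    for x in symbols:
        # Shift the value by one base-d position, add the next symbol, reduce mod q.
        value = (d * value + x) % q
    return value


def direct_checksum(symbols: list[int], d: int, q: int) -> int:
    """Same checksum from the positional sum, for comparison only."""
    m = len(symbols)
    return sum(x * d ** (m - 1 - i) for i, x in enumerate(symbols)) % q


if __name__ == "__main__":
    # The slide's example: P = 0815, d = 10, q = 13.
    print(horner_checksum([0, 8, 1, 5], d=10, q=13))
```

Horner's rule needs only m multiplications and never forms the large power d^(m-1) explicitly, which matters once m is large.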
How to Compute the Checksums of the Text
• Start with $S_m(T[1..m])$
[Figure: the text manamanapatipitipi with the checksums $S_m(T[1..m])$, $S_m(T[2..m+1])$, ... of successive windows written beneath it.]
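Each following checksum can be obtained from the previous one in constant time. The block below is a sketch of that update step, written to agree with the update line of the Rabin-Karp pseudocode; the digits 2 and 3 appended to the slide's example 0815 are assumptions chosen only to make the arithmetic concrete.

```latex
% Rolling update: remove the leading symbol, shift by one position, append the new symbol.
\[
  S_m(T[s+2..s+m+1])
    = \bigl( d \,( S_m(T[s+1..s+m]) - T[s+1]\, h ) + T[s+m+1] \bigr) \bmod q,
  \qquad h = d^{\,m-1} \bmod q .
\]
% Worked example with d = 10, q = 13, m = 4, text 081523 (last two digits assumed):
\[
  h = 1000 \bmod 13 = 12, \qquad S_4(0815) = 9,
\]
\[
  S_4(8152) = (10\,(9 - 0 \cdot 12) + 2) \bmod 13 = 92 \bmod 13 = 1,
\]
\[
  S_4(1523) = (10\,(1 - 8 \cdot 12) + 3) \bmod 13 = (-947) \bmod 13 = 2 .
\]
```

Both results agree with the direct computation: 8152 mod 13 = 1 and 1523 mod 13 = 2.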
The Rabin-Karp Algorithm
Rabin-Karp-Matcher(T, P, d, q)
  n ← length(T)
  m ← length(P)
  h ← d^(m-1) mod q
  p ← 0
  t_0 ← 0
  for i ← 1 to m do
    p ← (d·p + P[i]) mod q              // checksum of the pattern P
    t_0 ← (d·t_0 + T[i]) mod q          // checksum of T[1..m]
  od
  for s ← 0 to n-m do
    if p = t_s then                     // checksums match
      if P[1..m] = T[s+1..s+m] then     // now test for a false positive
        output “Pattern occurs with shift” s
      fi
    fi
    if s < n-m then
      t_(s+1) ← (d·(t_s - T[s+1]·h) + T[s+m+1]) mod q
                                        // update: checksum of T[s+2..s+m+1] from the checksum of T[s+1..s+m]
    fi
  od
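A possible Python rendering of this pseudocode is sketched below. Several choices are assumptions of the sketch rather than part of the slides: the function name `rabin_karp_matcher`, the use of `ord()` to turn characters into numbers, the defaults d = 256 and q = 101, and the decision to return an empty list for an empty pattern.

```python
def rabin_karp_matcher(text: str, pattern: str, d: int = 256, q: int = 101) -> list[int]:
    """Return all shifts s with text[s:s+m] == pattern, using rolling checksums.

    Illustrative sketch: characters are mapped to integers with ord();
    d and q are assumed defaults (q should be prime).
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []  # assumption of this sketch: empty or overlong patterns yield no shifts
    h = pow(d, m - 1, q)  # d^(m-1) mod q, weight of the leading symbol
    p = 0  # checksum of the pattern
    t = 0  # checksum of the current text window
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    shifts = []
    for s in range(n - m + 1):
        # Only compare characters when the checksums agree (rules out false positives).
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            # Drop text[s], shift by one position, append text[s+m].
            # Python's % always returns a non-negative result for positive q.
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return shifts


if __name__ == "__main__":
    print(rabin_karp_matcher("manamanapatipitipi", "pati"))
```

The explicit string comparison after a checksum match is what keeps the result correct for any q; a small q only increases the number of spurious hits and therefore the running time.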
Performance of Rabin-Karp
• The worst-case running time of the Rabin-Karp algorithm is O(m (n-m+1))
• Probabilistic analysis
  • The probability of a false positive hit for a random input is 1/q
  • The expected number of false positive hits is O(n/q)
  • The expected running time of Rabin-Karp is O(n + m (v + n/q)), where v is the number of valid shifts (hits)
• If we choose q ≥ m and have only a constant number of hits, then m·n/q ≤ n, so the expected running time of Rabin-Karp is O(n + m)
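The effect of q on the number of false positives can be observed with a small experiment. The sketch below is an illustration under assumptions: the helper `count_hits` and the digit text are not from the slides, and each window checksum is computed directly with `int(window) % q` instead of the rolling update, to keep the code short.

```python
def count_hits(text: str, pattern: str, q: int) -> tuple[int, int]:
    """Count (valid hits, spurious hits) for digit strings with d = 10.

    Hypothetical helper: a hit is a window whose checksum equals the
    pattern's checksum; it is spurious if the window differs from the pattern.
    """
    m = len(pattern)
    p = int(pattern) % q
    valid = spurious = 0
    for s in range(len(text) - m + 1):
        window = text[s:s + m]
        if int(window) % q == p:
            if window == pattern:
                valid += 1
            else:
                spurious += 1
    return valid, spurious


if __name__ == "__main__":
    text, pattern = "2359023141526739921", "31415"  # illustrative input, not from the slides
    for q in (13, 1000003):
        print(q, count_hits(text, pattern, q))
```

With q = 13 the window 67399 collides with the pattern (both are 7 mod 13); with a prime larger than every 5-digit number no collision is possible, at the price of larger intermediate values.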
Knuth-Morris-Pratt: The Principle
[Figure: the text manamamapatipit with copies of the pattern (drawn as m a m a) aligned beneath it at successive shifts.]
Thanks for your attention
End of 1st lecture
Next lecture: Mo 18 Oct 2004, 11 am, FU 116
Next exercise class: Mo 18 Oct 2004, 1 pm, F0.530 or We 20 Oct 2004, 1 pm, F1.110