CSC 212 – Data Structures. Lecture 34: Strings and Pattern Matching. Problem of the Day.
Problem of the Day • You drive a bus from Rotterdam to Delft. At the 1st stop, 33 people get in. At the 2nd stop, 7 more people get in, and 11 passengers leave. At the 3rd stop, 5 people leave and 2 get in. After one hour, the bus arrives in Delft. What is the name of the driver? • Read the question: You are the driver!
Strings • Algorithmically, a String is just a sequence of concatenated data: • “CSC212 STUDENTS IN DA HOUSE” • “I can’t believe this is a String!” • Java programs • HTML documents • Digitized images • DNA sequences
Strings In Java • Java Strings are immutable • Java maintains a pool (a Map of text to String objects) for String literals • Each time a String literal is used, the pool is checked • If the text exists, Java uses the String object to which it is mapped • Otherwise, it makes a new String & adds the text and object to the pool • Strings built at run time (e.g., with new String(...)) are separate objects unless intern() is called • Happens “under the hood” • Makes String work much like a primitive type • Also makes sharing identical text cheap
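The pooling behavior described above can be observed with a small experiment. This is a minimal sketch, not part of the original slides; the class name StringPoolDemo and the helper sameObject are made up for illustration. It relies on the fact that == on Strings compares object identity, while equals() compares text.

```java
// Illustrative only: StringPoolDemo and sameObject are hypothetical names.
class StringPoolDemo {
    // True only when both references point at the very same object.
    static boolean sameObject(String x, String y) {
        return x == y;
    }

    public static void main(String[] args) {
        String a = "CSC212";             // literal: taken from the pool
        String b = "CSC212";             // same literal: same pooled object
        String c = new String("CSC212"); // explicitly creates a new object

        System.out.println(sameObject(a, b));          // true
        System.out.println(sameObject(a, c));          // false
        System.out.println(sameObject(a, c.intern())); // true: intern() returns the pooled object
        System.out.println(a.equals(c));               // true: same text
    }
}
```

Because identity and text equality can differ, comparing Strings with equals() is the safe habit even when pooling happens to make == work.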
String Terminology • A String is drawn from elements in an alphabet • ASCII or Unicode • Bits • Pixels • DNA bases • Substring P[i ... j] contains the characters from P[i] through P[j] • A substring starting at rank 0 is called a prefix • A substring ending at the string’s last rank is called a suffix
Suffixes and Prefixes “I am the Lizard King!”
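The terminology can be tried out on the example string. This is a rough sketch under one assumption worth calling out: the slides' P[i ... j] includes both ends, while Java's substring(begin, end) excludes end, so the helper adds 1. The class SubstringDemo and its methods sub, prefix, and suffix are hypothetical names introduced here.

```java
// Illustrative only: SubstringDemo, sub, prefix and suffix are hypothetical names.
class SubstringDemo {
    // P[i ... j] with both ends inclusive, matching the slide's notation.
    static String sub(String p, int i, int j) {
        return p.substring(i, j + 1); // Java's end index is exclusive
    }

    // A prefix is a substring that starts at rank 0.
    static String prefix(String p, int j) {
        return sub(p, 0, j);
    }

    // A suffix is a substring that ends at the last rank.
    static String suffix(String p, int i) {
        return sub(p, i, p.length() - 1);
    }

    public static void main(String[] args) {
        String p = "I am the Lizard King!";
        System.out.println(sub(p, 9, 14));  // Lizard
        System.out.println(prefix(p, 3));   // I am
        System.out.println(suffix(p, 16));  // King!
    }
}
```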
Pattern Matching Problem • Given strings T & P, find first substring of T matching P • T is the “text” • P is the “pattern” • Has many, many, many applications • Search engines • Database queries • Biological research
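In Java, the problem statement above corresponds to what String.indexOf already does: it returns the rank where the first match starts, or -1 if there is none. A minimal sketch follows; the class IndexOfDemo and the wrapper firstMatch are hypothetical names, and the sample text is borrowed from an earlier slide.

```java
// Illustrative only: IndexOfDemo and firstMatch are hypothetical names.
class IndexOfDemo {
    // Rank in text where the first occurrence of pattern starts, or -1 if none.
    static int firstMatch(String text, String pattern) {
        return text.indexOf(pattern);
    }

    public static void main(String[] args) {
        String t = "CSC212 STUDENTS IN DA HOUSE";
        System.out.println(firstMatch(t, "STUDENTS"));  // 7
        System.out.println(firstMatch(t, "PROFESSOR")); // -1
    }
}
```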
Brute-Force Approach • Common method of solving problems • Easy to develop • Often requires little coding • Needs little brain power to figure out • Uses computer’s speed for analysis • Examines every possible option • Can be painfully slow and use lots of memory • Generally good only with small problems
Brute-Force Pattern Matching • Compare P with every length-|P| substring of T, until we either • find a substring of T equal to P -or- • reject all possible substrings of T • If |P| = m and |T| = n, takes O(nm) time • Worst case (every alignment matches until P’s last character): • T = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa • P = aaag • Inputs like this are common in images & DNA data
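The O(nm) bound can be checked by counting character comparisons. The sketch below is an instrumented variant written for this illustration, not the slides' own code; the names WorstCaseDemo, countComparisons, and repeat are hypothetical, and the text length n = 20 is an arbitrary choice rather than the length of T on the slide. With a text of n copies of 'a' and the pattern aaag (m = 4), each of the n - m + 1 alignments needs all m comparisons before failing.

```java
// Illustrative only: WorstCaseDemo, countComparisons and repeat are hypothetical names.
class WorstCaseDemo {
    // Counts character comparisons made by a brute-force search for p in t,
    // stopping at the first match (or after every alignment has been rejected).
    static long countComparisons(String t, String p) {
        long comparisons = 0;
        for (int i = 0; i + p.length() <= t.length(); i++) {
            int j = 0;
            while (j < p.length()) {
                comparisons++;
                if (t.charAt(i + j) != p.charAt(j)) {
                    break; // mismatch: give up on this alignment
                }
                j++;
            }
            if (j == p.length()) {
                break; // whole pattern matched: stop searching
            }
        }
        return comparisons;
    }

    // Builds a string of n copies of c.
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < n; k++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 20;                 // assumed text length, chosen for illustration
        String t = repeat('a', n);
        String p = "aaag";          // m = 4
        // (n - m + 1) * m = 17 * 4 = 68
        System.out.println(countComparisons(t, p)); // 68
    }
}
```

By contrast, a text that mismatches on the first character of every alignment costs only n - m + 1 comparisons, which is why the worst case depends on long partial matches.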
Brute-Force Pattern Matching
Algorithm BruteForceMatch(String T, String P)
    // Check if each rank of T starts a matching substring
    for i ← 0 to T.length() - P.length()
        // Compare substring starting at T[i] with P
        j ← 0
        while j < P.length() && T.charAt(i + j) == P.charAt(j)
            j ← j + 1
        if j == P.length()
            return i    // Return 1st place in T we find P
    return -1    // No matching substring exists
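One way the pseudocode might be written as runnable Java is sketched below. The class name BruteForceDemo and the method name bruteForceMatch are choices made for this example; the loop structure follows the slide directly.

```java
// Illustrative only: BruteForceDemo is a hypothetical class name.
class BruteForceDemo {
    // Returns the first rank in t where p occurs, or -1 if p never occurs.
    static int bruteForceMatch(String t, String p) {
        // Check whether each rank of t starts a matching substring.
        for (int i = 0; i <= t.length() - p.length(); i++) {
            // Compare the substring starting at t[i] with p.
            int j = 0;
            while (j < p.length() && t.charAt(i + j) == p.charAt(j)) {
                j++;
            }
            if (j == p.length()) {
                return i; // first place in t where p is found
            }
        }
        return -1; // no matching substring exists
    }

    public static void main(String[] args) {
        String t = "I am the Lizard King!";
        System.out.println(bruteForceMatch(t, "Lizard")); // 9
        System.out.println(bruteForceMatch(t, "Queen"));  // -1
    }
}
```

Two edge cases fall out of the loop bounds: a pattern longer than the text never enters the loop and yields -1, and an empty pattern matches at rank 0.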
Your Turn • Get back into groups and do activity
Before Next Lecture… • Keep up with your reading! • Cannot stress this enough • Get ready for Lab Mastery Exam • Start thinking about questions for Final