Perl Chapter 7 Pattern Matching
Introduction
• Scanning strings for substrings is useful in many applications: grep, file finders, compilers, …
• Perl's pattern matching derives from UNIX egrep and awk
• The basis is regular expressions, from the theory of computation
• A pattern match is a boolean expression: it evaluates to true or false
• A pattern can also remember the parts it matched; in list context the match returns them as a list
Syntax
• m delimiter pattern delimiter [modifiers]
• m is the match operator
• Using / … / as the delimiters makes the m optional
• Examples:
    m~pattern~    # ~ as delimiter, handy when the pattern itself contains /
    /pattern/
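A minimal sketch of why alternative delimiters are handy. The sample path and the variable names are made up for illustration, and the i modifier (case-insensitive matching) is one example of the optional modifiers:

```perl
use strict;
use warnings;

my $path = "/usr/local/bin/perl";   # hypothetical sample string

# With / as the delimiter, every / inside the pattern must be escaped.
my $with_slashes = ($path =~ /\/local\/bin/) ? 1 : 0;

# With m and another delimiter, the slashes need no escaping.
my $with_tilde  = ($path =~ m~/local/bin~) ? 1 : 0;
my $with_braces = ($path =~ m{/local/bin}) ? 1 : 0;

# The i modifier makes the match case-insensitive.
my $ignore_case = ($path =~ m{/LOCAL/BIN}i) ? 1 : 0;

print "$with_slashes $with_tilde $with_braces $ignore_case\n";   # prints 1 1 1 1
```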
Simple Patterns
• Match individual characters or character classes
• 3 categories:
• Normal characters, which match themselves
• Metacharacters, which have special meanings in patterns (for example \, $, ?, +, .)
    - A backslash turns a metacharacter into a normal character: \? matches a literal question mark
    - The period matches any character except a newline
• Escape sequences such as \t, which match the character they stand for (\t matches a tab)
Default string to match is $_
    if (/snow/) { print "snow in \$_ \n"; }
• /snow/ returns true or false
• The period matches any character except a newline
• /a../ matches an a followed by 2 non-newline characters
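The implicit match against $_ can be sketched as follows. The sample strings and variable names are invented for this example:

```perl
use strict;
use warnings;

# Hypothetical sample data, not from the slides.
my @forecast = ("heavy snow tonight", "clear skies", "snow by noon");
my @snowy;
for (@forecast) {                  # for aliases each element to $_
    push @snowy, $_ if /snow/;     # no =~ needed: the match is against $_
}

# /a../ needs an 'a' followed by two characters that are not newlines.
my @a_then_two = grep { /a../ } ("abc", "ab", "a\nbc");

print scalar(@snowy), " snowy lines\n";     # prints 2 snowy lines
print "a.. matched: @a_then_two\n";         # prints a.. matched: abc
```

Here "ab" fails because only one character follows the a, and "a\nbc" fails because the period does not match the newline.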
Matching Character Classes
• Defined by placing characters in [ ]
• [A-Za-z] any letter
• [0-7] an octal digit
• [aeiou] a lowercase vowel
• [^A-Za-z] any character NOT in the class (here, any non-letter)
Common character classes
• \d [0-9] a digit
• \D [^0-9]
• \w [A-Za-z0-9_] a word character
• \W [^A-Za-z0-9_]
• \s [ \r\t\n\f] whitespace
• \S [^ \r\t\n\f]
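A small sketch that classifies a few characters with the shorthand classes. In Perl, \w accepts digits and the underscore as well as letters, which the example makes visible. The sample characters and the labels are chosen only for illustration:

```perl
use strict;
use warnings;

# Classify a few sample characters (sample set chosen for illustration).
my @samples = ("x", "7", "_", "-", " ", "\t");
my @kinds;
for my $ch (@samples) {
    my $kind = $ch =~ /\d/ ? "digit"
             : $ch =~ /\w/ ? "word"
             : $ch =~ /\s/ ? "space"
             :               "other";
    push @kinds, $kind;
}

# 7 is caught by \d first; _ is reported as a word character.
print join(",", @kinds), "\n";   # prints word,digit,word,other,space,space
```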
• /[A-Z]"\s/ matches an uppercase letter, a double quote, and a whitespace character
• /[\dA-Fa-f]/ matches one hex digit
• A variable can be interpolated into a pattern:
    $pattern = " slkdjfsdf";
    if (/$pattern/) { …. }
Quantifiers
• {n} exactly n repetitions
• {m,} at least m repetitions
• {m,n} at least m, but not more than n
    /a{1,3}b/ matches ab, aab, aaab
    /(cats){3}/ matches catscatscats
    /[abc]{1,2}/ matches any 1 or 2 characters from the class: a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc
• * 0 or more (so it can match the empty string)
• + 1 or more
• ? 0 or 1
• With no quantifier, an item (including .) matches exactly once
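The quantifier examples can be checked with a short sketch. The anchors ^ and $ are an addition here, used so that the whole string has to match; the candidate strings are made up:

```perl
use strict;
use warnings;

# ^ and $ anchor the pattern to the whole string (an addition for this sketch).
my @a_b  = grep { /^a{1,3}b$/ }    qw(b ab aab aaab aaaab);
my @cats = grep { /^(cats){3}$/ }  qw(cats catscats catscatscats);
my @abc  = grep { /^[abc]{1,2}$/ } qw(a cc ba abc d);
my @ball = grep { /^ba(ll)*$/ }    qw(ba bal ball ballll);

print "@a_b\n";    # prints ab aab aaab
print "@cats\n";   # prints catscatscats
print "@abc\n";    # prints a cc ba
print "@ball\n";   # prints ba ball ballll
```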
• /\w+/ matches 1 or more word characters
• /\d+\.\d+/ matches 1 or more digits, a decimal point, and 1 or more digits (i.e., a decimal number). Note that \. matches a literal period, not any character.
• /\$?\d+\.\d\d/ matches a price with or without a leading $
• /ba(ll)*/ matches ba followed by 0 or more occurrences of the string ll
• /\d{3}-\d{2}-\d{4}/ matches the format of an SSN (ddd-dd-dddd)
Questions
Assume $_ = "Tommie";
• Which m in Tommie does /m/ match?
• What do these match?
    /m*/
    /m+/
    /m*i/
Answers
• /m/ matches the leftmost m
• /m*/ matches the empty string at the beginning
• /m+/ matches mm
• /m*i/ matches mmi
Matching
• .* in greedy mode (the default) matches the maximum possible number of non-newline characters
    $_ = "Bob Bobcat Bobolink";
• /.*Bob/ matches everything up to and including the Bob in Bobolink
• .* first matches the whole string, then backs up one character at a time until the rest of the pattern (Bob) matches, so it finds the rightmost occurrence
• All greedy quantifiers work this way
Matching
    $_ = "Freddie's hot dogs are really hot!";
• /Fred+/ matches Fredd
• /Fred+?/ matches Fred (a ? after a quantifier selects minimal mode)
• /.*hot/ matches through the last hot
• /.*?hot/ matches through the first hot
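To see exactly what the greedy and minimal forms match, one option is to wrap each pattern in parentheses and capture the matched text. This is only a sketch; the variable names are invented:

```perl
use strict;
use warnings;

my $birds = "Bob Bobcat Bobolink";
my ($greedy_bob)  = $birds =~ /(.*Bob)/;    # backs up from the end of the string
my ($minimal_bob) = $birds =~ /(.*?Bob)/;   # stops at the first Bob

my $ad = "Freddie's hot dogs are really hot!";
my ($fred_greedy)  = $ad =~ /(Fred+)/;
my ($fred_minimal) = $ad =~ /(Fred+?)/;
my ($hot_greedy)   = $ad =~ /(.*hot)/;
my ($hot_minimal)  = $ad =~ /(.*?hot)/;

print "$greedy_bob\n";     # prints Bob Bobcat Bob
print "$minimal_bob\n";    # prints Bob
print "$fred_greedy\n";    # prints Fredd
print "$fred_minimal\n";   # prints Fred
print "$hot_greedy\n";     # prints Freddie's hot dogs are really hot
print "$hot_minimal\n";    # prints Freddie's hot
```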
Alternation
• /a|e|i|o|u/ is equivalent to /[aeiou]/
• /Fred|Mike|Dracula/ matches any one of the three names
• Alternatives are tried left to right
• /Tom|Tommie/ never matches Tommie, because the leftmost alternative (Tom) matches first
• /to|too|two/ never matches too, for the same reason
• Parentheses can group alternatives: /t(oo?|wo)/ matches to, too, or two
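The left-to-right rule can be observed by capturing what the alternation actually matched. A minimal sketch, with the outer capturing parentheses and the ^ and $ anchors added for the demonstration:

```perl
use strict;
use warnings;

# The first alternative that succeeds wins, even if a later one is longer.
my ($name) = "Tommie" =~ /(Tom|Tommie)/;
my ($word) = "too"    =~ /(to|too|two)/;

# Grouping with an optional o accepts all three spellings as whole words.
my @spelled = grep { /^t(oo?|wo)$/ } qw(to too two tow);

print "$name\n";       # prints Tom
print "$word\n";       # prints to
print "@spelled\n";    # prints to too two
```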
Precedence
• From highest to lowest:
• ( )
• Quantifiers
• Character sequence
• Alternation
• Be careful not to confuse alternation with a character class
• [belly|belts|bells] is a character class, equivalent to [belyts|]; it matches one character, not one of the three words
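A short sketch of two consequences of these rules: a quantifier binds more tightly than a character sequence, and square brackets never perform alternation. The anchors and candidate strings are additions for illustration:

```perl
use strict;
use warnings;

# Quantifiers bind tighter than sequence: in /ab+/ the + applies to b only.
my @b_repeated  = grep { /^ab+$/ }   qw(abbb abab);
my @ab_repeated = grep { /^(ab)+$/ } qw(abbb abab);

# Inside [ ], | is just another character and only ONE character is matched.
my @candidates = ("belly", "y", "|", "x");
my @class_hits = grep { /^[belly|belts|bells]$/ } @candidates;
my @alt_hits   = grep { /^(belly|belts|bells)$/ } @candidates;

print "@b_repeated\n";    # prints abbb
print "@ab_repeated\n";   # prints abab
print "@class_hits\n";    # prints y |
print "@alt_hits\n";      # prints belly
```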
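Binding operators
• A pattern can be matched against any string, not just $_
• The binding operators connect a string to a pattern
• $stringvar =~ /[,;:]/; is true if the pattern matches somewhere in $stringvar
• $string !~ /[,;:]/; inverts the logic: it is true if the pattern does not match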
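A minimal sketch of both binding operators. The two sample strings and the yes/no labels are invented for the example:

```perl
use strict;
use warnings;

my $listing = "one, two; three";        # hypothetical: contains separators
my $plain   = "no punctuation here";    # hypothetical: contains none

my $listing_has   = ($listing =~ /[,;:]/) ? "yes" : "no";
my $plain_lacks   = ($plain   !~ /[,;:]/) ? "yes" : "no";
my $listing_lacks = ($listing !~ /[,;:]/) ? "yes" : "no";

print "$listing_has $plain_lacks $listing_lacks\n";   # prints yes yes no
```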
Remembering matches
    $s = "TD ran for 305 yards today";
    $s =~ /(\d+) (\w+) (\w+)/;
    print "$1 $2 $3 \n";
• prints 305 yards today
• Nested parentheses are numbered by their opening parenthesis, left to right
    $s =~ /((\d+) (\w+) (\w+))/;
• $1 305 yards today
• $2 305
• $3 yards
• $4 today
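The captures can also be collected by assigning the match in list context. The sketch below assumes the three groups are separated by single literal spaces in the pattern; the variable names are illustrative:

```perl
use strict;
use warnings;

my $s = "TD ran for 305 yards today";

# In list context a successful match returns the captured groups.
my ($number, $unit, $when) = $s =~ /(\d+) (\w+) (\w+)/;

# Nested groups are numbered by the position of their opening parenthesis.
my @groups = $s =~ /((\d+) (\w+) (\w+))/;

print "$number $unit $when\n";    # prints 305 yards today
print scalar(@groups), "\n";      # prints 4
print "$groups[0]\n";             # prints 305 yards today
```

Checking that the match succeeded before using $1, $2, and so on is a sensible habit, because a failed match leaves them set from an earlier successful one.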
Split with a pattern
    $s = "Betty, Bert, Bart, Bartholomew";
    @names = split /, /, $s;
    $s = "Betty:778:Bert:222:Bart:43297:Bartholomew";
    @names = split /:\d+:/, $s;
• $names[0] = Betty, $names[1] = Bert, $names[2] = Bart, $names[3] = Bartholomew
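Both splits can be run as a short sketch. split returns the pieces between the separators as a list, so the results land in an array rather than in $1, $2, and so on:

```perl
use strict;
use warnings;

my $comma_list = "Betty, Bert, Bart, Bartholomew";
my @by_comma   = split /, /, $comma_list;

# The separator here is a colon, one or more digits, and another colon.
my $id_list = "Betty:778:Bert:222:Bart:43297:Bartholomew";
my @by_id   = split /:\d+:/, $id_list;

print join("|", @by_comma), "\n";   # prints Betty|Bert|Bart|Bartholomew
print join("|", @by_id), "\n";      # prints Betty|Bert|Bart|Bartholomew
```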
Substitutions
    $x = "no more apples!";
    $x =~ s/apples/applets/;
• $x is changed to "no more applets!"
    $x = "12034005";
    $x =~ s/0//g;
• $x is changed to "12345"
• The g modifier changes every occurrence, not just the first
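A small sketch contrasting substitution with and without the g modifier. It also uses the fact that s/// returns the number of substitutions made; the variable names are invented:

```perl
use strict;
use warnings;

my $x = "no more apples!";
$x =~ s/apples/applets/;

my $first_only = "12034005";
$first_only =~ s/0//;                       # without g: only the first 0 goes

my $every = "12034005";
my $count = ($every =~ s/0//g);             # with g: every 0 goes

print "$x\n";            # prints no more applets!
print "$first_only\n";   # prints 1234005
print "$every\n";        # prints 12345
print "$count\n";        # prints 3
```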
Translating characters
• tr/search-list/replacement-list/
• tr/a-z/A-Z/; replaces all lowercase letters with uppercase and returns the number of characters replaced
• tr/\./\./; replaces every . with ., so the string is unchanged, but the return value counts the periods
    $s = "Hello";
    $s =~ tr/a-z/A-Z/;
• changes $s to HELLO and returns 4 (which is also true)
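Both uses of tr, translating and counting, can be sketched as follows. The version string is a made-up sample; inside tr the period is an ordinary character, so it needs no backslash:

```perl
use strict;
use warnings;

my $s = "Hello";
my $changed = ($s =~ tr/a-z/A-Z/);    # H is already uppercase, so 4 change

# Translating a character to itself leaves the string alone but counts it.
my $version = "3.14.15";              # hypothetical sample string
my $dots    = ($version =~ tr/././);

print "$s $changed\n";         # prints HELLO 4
print "$version $dots\n";      # prints 3.14.15 2
```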