Regular Expressions
Concepts of Regular Expressions • Allow for fast, flexible and reliable string handling • A pattern that matches (or doesn’t match) a string • A program within a program, with its own language • Can be found in sed, awk, grep, procmail, vi and others
Regular Expressions • Using Simple Patterns $_ = "yabba dabba doo"; if (/abba/) { print "match!"; }
Metacharacters . Matches a single char except newline /bet.y/ matches betty, betsy, bet=y, bet.y but not bety or betsey /bet\.y/ only matches bet.y
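The difference between the bare dot and the escaped dot can be checked with a short script. This is only a sketch: the list of candidate words is taken from the slide, and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Candidate strings from the slide.
my @candidates = ('betty', 'betsy', 'bet=y', 'bet.y', 'bety', 'betsey');

# /bet.y/ : the dot stands for any single character except newline.
my @dot_matches     = grep { /bet.y/ }  @candidates;
# /bet\.y/ : the backslash makes the dot a literal full stop.
my @literal_matches = grep { /bet\.y/ } @candidates;

print "bet.y   matches: @dot_matches\n";      # betty betsy bet=y bet.y
print "bet\\.y  matches: @literal_matches\n"; # bet.y
```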
Quantifiers * Match preceding item 0 or more times ab*a matches aa aba abba… a.*a matches aa azya a346a… + Match preceding item 1 or more times ab+a matches aba but not aa a.+a matches a1a, aza, but not aa ? Last item is optional. Will match once or not at all ab?a matches aa and aba only.
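A small sketch comparing the three quantifiers. The anchors ^ and $ are an addition here (they are not on the slide) so that each pattern has to account for the whole string; the test strings are illustrative.

```perl
use strict;
use warnings;

my @strings = ('aa', 'aba', 'abba', 'a1a');

# Anchored with ^ and $ so the pattern must cover the whole string.
my @star  = grep { /^ab*a$/ } @strings;   # b zero or more times
my @plus  = grep { /^ab+a$/ } @strings;   # b one or more times
my @maybe = grep { /^ab?a$/ } @strings;   # b zero or one time

print "ab*a: @star\n";    # aa aba abba
print "ab+a: @plus\n";    # aba abba
print "ab?a: @maybe\n";   # aa aba
```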
Parentheses • Use parentheses for grouping: /fred+/ matches freddddd /(fred)+/ matches fredfredfred /(fred)*/ matches hello world! (the group may repeat zero times, so the pattern matches any string)
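The effect of grouping on a quantifier can be sketched as follows. The anchors in the first two patterns are an assumption added to make the contrast visible; the third pattern is the slide's own.

```perl
use strict;
use warnings;

# Without parentheses, + applies only to the final "d".
my $d_only    = ('freddddd'     =~ /^fred+$/)   ? 1 : 0;
# With parentheses, + applies to the whole group "fred".
my $grouped   = ('fredfredfred' =~ /^(fred)+$/) ? 1 : 0;
# The ungrouped pattern does not accept repeated "fred".
my $ungrouped = ('fredfredfred' =~ /^fred+$/)   ? 1 : 0;
# (fred)* may match zero times, so it succeeds on any string.
my $zero_reps = ('hello world!' =~ /(fred)*/)   ? 1 : 0;

print "$d_only $grouped $ungrouped $zero_reps\n";   # 1 1 0 1
```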
Alternatives | means “or” /fred|barney|betty/ matches a string containing any of those three names. Example using whitespace: /fred( |\t)barney/
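A brief sketch of both alternation patterns from the slide, run against a few made-up input lines:

```perl
use strict;
use warnings;

my @lines = ("fred barney", "fred\tbarney", "fredbarney", "betty");

# Any line containing one of the three names.
my @named = grep { /fred|barney|betty/ } @lines;
# fred and barney separated by exactly one space or one tab.
my @separated = grep { /fred( |\t)barney/ } @lines;

print scalar(@named), " lines contain a name\n";          # 4
print scalar(@separated), " lines have a separator\n";    # 2
```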
Character Classes • A list of possible characters to match [abc]+ Matches a run of one or more characters, each of which is a, b or c. • Use “-” to specify ranges [a-zA-Z] • Generally used as part of a regular expression: if (/HAL-[0-9]+/)
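The two character-class patterns can be tried out like this. The input strings are invented for illustration, and the ^…$ anchors on the [abc]+ pattern are an assumption added so that it tests the whole string.

```perl
use strict;
use warnings;

# /HAL-[0-9]+/ is unanchored, so it also matches inside "XHAL-7".
my @ids  = ('HAL-9000', 'HAL-', 'hal-42', 'XHAL-7');
my @hits = grep { /HAL-[0-9]+/ } @ids;

# Whole string made up of a, b and c only (at least one character).
my @only_abc = grep { /^[abc]+$/ } ('abba', 'cab', 'abcd', '');

print "HAL matches: @hits\n";       # HAL-9000 XHAL-7
print "abc only:    @only_abc\n";   # abba cab
```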
Negating Character Classes • Sometimes it’s easier to specify the characters you don’t want: [^abc] matches any character except a, b or c; [^n\-z] matches any character except n, a hyphen or z
Character Class Shortcuts • For frequently used character classes: \d is [0-9] \w is [A-Za-z0-9_] \s is [\f\t\n\r ] \D is [^\d] • Examples: /HAL-\d+/ [\dA-Fa-f]+ [\d\D]
Delimiters • // is actually a shortcut for m// • As with qw//, you can use any pair of delimiters: m(fred) m<fred> m{fred} m!fred! • If you use / as the delimiter, you can omit the “m”: /fred/
Delimiters continued… • Choose a delimiter that doesn’t appear in your pattern: /^http:\/\// m#^http://#
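A quick sketch showing that the choice of delimiter changes only how the pattern is written, not what it matches. The URL is a placeholder chosen for this example.

```perl
use strict;
use warnings;

my $url = 'http://www.example.com/';   # placeholder URL

# Same pattern, three delimiter choices.
my $with_slashes = ($url =~ /^http:\/\//) ? 1 : 0;   # slashes must be escaped
my $with_hashes  = ($url =~ m#^http://#)  ? 1 : 0;   # no escaping needed
my $with_braces  = ($url =~ m{^http://})  ? 1 : 0;   # paired delimiters

print "$with_slashes $with_hashes $with_braces\n";   # 1 1 1
```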
Optional Modifiers • Case insensitive: /fred/i # matches Fred, FRED, fred, fReD • Newline matching: “.” doesn’t normally match \n; use the s modifier to change this: $_ = "fred\nbarney"; if (/fred.*barney/s) { …….
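The effect of the two modifiers can be sketched as below, using the slide's own strings plus one extra non-matching name ("wilma") added for contrast.

```perl
use strict;
use warnings;

my $text = "fred\nbarney";

# Without /s the dot stops at the newline, so the match fails.
my $without_s = ($text =~ /fred.*barney/)  ? 1 : 0;
# With /s the dot also matches newline.
my $with_s    = ($text =~ /fred.*barney/s) ? 1 : 0;

# /i ignores case.
my @any_case = grep { /fred/i } ('Fred', 'FRED', 'fReD', 'wilma');

print "without /s: $without_s, with /s: $with_s\n";   # 0, 1
print "case-insensitive hits: @any_case\n";           # Fred FRED fReD
```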
Anchors • Ensure the pattern only matches at a certain spot: ^ Start of string $ End of string, or just before a newline at the end of the string /^fred/ # matches "frederick" but not "manfred" /rock$/ # matches "rock\n" but not "rocks" /^\s*$/ # an empty or whitespace-only line
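The anchor examples can be verified with a few one-line matches. The list of blank-line candidates is invented for this sketch.

```perl
use strict;
use warnings;

my $start_ok   = ('frederick' =~ /^fred/) ? 1 : 0;   # fred at the start
my $start_fail = ('manfred'   =~ /^fred/) ? 1 : 0;   # fred not at the start
my $end_nl     = ("rock\n"    =~ /rock$/) ? 1 : 0;   # $ allows a final newline
my $end_fail   = ('rocks'     =~ /rock$/) ? 1 : 0;   # "s" follows rock

# Empty or whitespace-only strings.
my @blank = grep { /^\s*$/ } ('', "  \t", "\n", ' x ');

print "$start_ok $start_fail $end_nl $end_fail\n";   # 1 0 1 0
print scalar(@blank), " blank strings\n";            # 3
```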
Word Anchors • \b is the word boundary anchor. /\bfred\b/ • Matches once at the beginning of a word, once at the end • Words are \w words: letters, digits and underscores
Word Anchors continued • \B matches at any position that is not a word boundary • Word boundaries make sure we don’t find: cat in delicatessen, fish in selfishness
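A short sketch contrasting \b and \B on the word "cat". The phrases "the cat sat" and "concatenate" are additions for illustration.

```perl
use strict;
use warnings;

my @words = ('cat', 'delicatessen', 'the cat sat', 'concatenate');

# \b on both sides: "cat" must stand alone as a word.
my @whole  = grep { /\bcat\b/ } @words;
# \B on both sides: "cat" must be buried inside a longer word.
my @inside = grep { /\Bcat\B/ } @words;

print "whole word: ", join(', ', @whole),  "\n";   # cat, the cat sat
print "inside:     ", join(', ', @inside), "\n";   # delicatessen, concatenate
```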
The Binding Operator • By default, Perl matches regular expressions against $_ • =~ tells it to match against another variable: $var =~ /blah/ • This is not an assignment operator! The value of $var is unchanged.
Binding Operator cont my $likes_perl = (<STDIN> =~ /\byes\b/i);
Interpolating Into Patterns • Variables are interpolated much as in a double-quoted string: my $what = "fred"; while (<>) { if (/^($what)/) …
More interpolations Watch out for metacharacters. If $what contains "fred(barney": my $what = shift @ARGV; if (/^($what)/) then the regex becomes if (/^(fred(barney)/), which is invalid because of the unmatched parenthesis.
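One common way around this problem, not covered on the slide, is to escape the interpolated value with \Q…\E (the same escaping that Perl's built-in quotemeta performs). The sketch below assumes the value should be matched literally; the sample line is made up.

```perl
use strict;
use warnings;

my $what = 'fred(barney';
my $line = 'fred(barney) and wilma';   # illustrative input

# Interpolating $what as-is yields /^(fred(barney)/, which fails to
# compile; eval traps the resulting error.
my $raw_ok = defined(eval { qr/^($what)/ }) ? 1 : 0;

# \Q...\E escapes metacharacters in the interpolated value.
my $captured = '';
if ($line =~ /^(\Q$what\E)/) {
    $captured = $1;
}

print "raw pattern compiled: $raw_ok\n";   # 0
print "captured: $captured\n";             # fred(barney
```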
The Match Variables • One match variable for each pair of parentheses in pattern: $1, $2, $3… $_ = "username = alison"; if (/username = (\w+)/) { print $1; }
Persistence • Match variables stay around until the next successful pattern match • Don’t use the memory variables unless the match worked! Risky: $wilma =~ /(\w+)/; print $1; Safer: if ($wilma =~ /(\w+)/) { print $1 } • Use the variables within a few lines of the match
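Why the unchecked form is risky can be seen in a small sketch: a failed match leaves $1 holding the value from an earlier successful match. The strings and variable names are illustrative.

```perl
use strict;
use warnings;

# First match succeeds and sets $1 to "alison".
my $first_ok  = ('alison' =~ /(\w+)/) ? 1 : 0;
# Second match fails ("!!!" has no word characters); $1 is left alone.
my $second_ok = ('!!!'    =~ /(\w+)/) ? 1 : 0;
my $stale     = $1;   # still the value from the FIRST match

# Checking the result of the match avoids picking up the stale value.
my $name = 'unknown';
if ('!!!' =~ /(\w+)/) {
    $name = $1;
}

print "stale \$1: $stale\n";      # alison
print "guarded name: $name\n";    # unknown
```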
The Automatic Match Variables $& Part of string that was matched $` Part of string prior to match $' Part of string after match if ("Nine Inch Nails" =~ /\s(\w+)/) $& is " Inch" $` is "Nine" $' is " Nails"
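The three automatic variables can be captured right after the match, as in this sketch (the variable names are chosen for illustration):

```perl
use strict;
use warnings;

my ($before, $matched, $after, $word);
if ('Nine Inch Nails' =~ /\s(\w+)/) {
    # Copy the special variables straight away, before another match
    # can overwrite them.
    ($before, $matched, $after) = ($`, $&, $');
    $word = $1;
}

print "[$before][$matched][$after]\n";   # [Nine][ Inch][ Nails]
print "captured word: $word\n";          # Inch
```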
General Quantifiers • We have seen the quantifiers *, + and ? • For more control over quantifiers, use curly braces: /a{5,15}/ # matches 5 to 15 reps of a /a{2,4}/ # aa, aaa, aaaa • If you omit the second number but keep the comma, as in /a{5,}/, there is no upper limit
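A sketch of the three brace forms. The patterns are anchored (an assumption beyond the slide) so that the count applies to the whole string; {2} with no comma means exactly two.

```perl
use strict;
use warnings;

my @runs = ('a', 'aa', 'aaa', 'aaaa', 'aaaaa');

my @two_to_four = grep { /^a{2,4}$/ } @runs;   # between 2 and 4
my @three_plus  = grep { /^a{3,}$/ }  @runs;   # 3 or more, no upper limit
my @exactly_two = grep { /^a{2}$/ }   @runs;   # exactly 2

print "{2,4}: @two_to_four\n";   # aa aaa aaaa
print "{3,}:  @three_plus\n";    # aaa aaaa aaaaa
print "{2}:   @exactly_two\n";   # aa
```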
Precedence in Regular Expressions (highest to lowest) • Parentheses • Quantifiers (*, +, ?, {m,n}) • Anchors (^, $, \b, \B) • Alternatives ( | )
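Examples of Precedence /^fred|barney$/ # fred at the start, or barney at the end /^(fred|barney)$/ # exactly fred or exactly barney /(wilma|pebbles?)/ # wilma, pebble or pebbles /^(\w+)\s+(\w+)$/ # two words separated by whitespace • Use parentheses to clarify precedence, but remember that adding them renumbers the memory variables ($1, $2, …)!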
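The difference between the first two patterns is easy to miss, so here is a small sketch with made-up names. Because alternation binds more loosely than the anchors, the ungrouped form accepts more strings than intended.

```perl
use strict;
use warnings;

my @names = ('fred', 'barney', 'frederick', 'barney rubble', 'dino');

# Reads as (^fred) OR (barney$).
my @loose = grep { /^fred|barney$/ } @names;
# Reads as ^ (fred OR barney) $.
my @whole = grep { /^(fred|barney)$/ } @names;

# Two capture groups, returned in list context as ($1, $2).
my ($first, $second) = ('fred flintstone' =~ /^(\w+)\s+(\w+)$/);

print "loose: @loose\n";        # fred barney frederick
print "whole: @whole\n";        # fred barney
print "$first / $second\n";     # fred / flintstone
```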
patterntest.pl http://www.ece.curtin.edu.au/~rtos204/patterntest.pl
#!/usr/bin/perl
while (<>) {
    chomp;
    if (/REGULAR EXPRESSION/) {
        print "Matched: |$`<$&>$'|\n";
    }
}
patterntest.pl in action bauhaus# ./patterntest.pl yabba dabba doo Matched: |y<abba> dabba doo| I just love abba Matched: |I just love <abba>|