Programming in Perl: regular expressions and the m and s operators
Peter Verhás, January 2002

Pattern Matching Operator

    expression =~ m/regexp/options;

    $a = "apple";
    print "yes!" if $a =~ m/pp/;

In scalar context the result is TRUE (1) if the pattern matches and FALSE (the empty string, which is 0 numerically) if it does not.

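To make the two result values visible, here is a minimal sketch, assuming a modern perl with strict and warnings enabled ($found and $missing are illustrative names, not from the slides). Assigning the match to a scalar puts it in scalar context:

```perl
use strict;
use warnings;

my $fruit = "apple";

# Scalar assignment puts the match in scalar context.
my $found   = ($fruit =~ m/pp/);    # match: 1
my $missing = ($fruit =~ m/xyz/);   # no match: empty string

print "found=[$found] missing=[$missing]\n";
# The false value is also 0 when used as a number (no warning).
print "missing as a number: ", $missing + 0, "\n";
```

Both values behave correctly in if and unless tests; the empty string only shows up when the result is printed.
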
m operator options
• g global search
• i case-insensitive search
• m multi-line string
• s single-line string
• o evaluate (compile) the pattern only once
• x extended regular expression

Now let's see what a regular expression is; then we will return to the finer points of the m operator.

Regular Expressions
• A regular expression is a pattern string built from ordinary characters plus joker (meta) characters and joker expressions.
• We will look at examples to explain it.

Regular Expression to Verify Email (1)

    @mail = ( 'peter@verhas.com', 'hab.akukk%mikkamakka@jeno', );
    for( @mail ){
        if( /^.*\@\w+\..+$/ ){
            print "$_ seems to be a good eMail\n";
        }else{
            print "$_ bad address\n";
        }
    }

OUTPUT:
    peter@verhas.com seems to be a good eMail
    hab.akukk%mikkamakka@jeno bad address

NOTES:
• Inside for( @mail ) each element is aliased to $_, and a match written without =~ is applied to $_, so /.../ here means $_ =~ m/.../.
• The m can be omitted when / is used as the delimiter.
• @ would also work instead of \@ here, but \@ is safe: an unescaped @ followed by a name would be interpolated as an array.

Regular Expression to Verify Email (2)

    /^.*\@\w+\..+$/

• ^ start of the string
• .* zero or more characters of any kind (except newline)
• * means zero or more of what stands before it
• \@ a single @ character
• \w+ one or more word characters (letter, digit or underscore)
• + means one or more of what stands before it
• \. one . (dot) character
• a special regexp character is escaped with \
• .+ one or more characters of any kind
• $ end of the string

Search and Replace Example of Regular Expressions

    $text = 'JavaScript is not used on island Java.';
    $text =~ s/Java(?!Script)/Borneo/;
    print $text;

OUTPUT:
    JavaScript is not used on island Borneo.

NOTES:
• Operator s will be discussed later in detail.
• (?! ) is a zero-length negative look forward (lookahead): Java matches only where it is not followed by Script. It is detailed later.

Meta (joker) Characters
• . any character except newline
• ^ start of string
• $ end of string
• \ escapes the next character
• \w any word character (letter, digit or underscore)
• \W any non-word character
• \s any whitespace character
• \S any non-whitespace character

These are only examples; there are other meta characters, see the Perl manual (perlre).

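A small sketch exercising these meta characters on an illustrative sample string (the string and variable names are assumptions for the demo). Counting uses the common idiom my $n = () = $str =~ /x/g, which the slides do not cover: the empty list assignment forces list context and yields the number of matches.

```perl
use strict;
use warnings;

my $sample = "Perl 5, 2002!";

# "() = ..." forces list context; assigning that to a scalar
# gives the number of matches found by m//g.
my $word     = () = $sample =~ /\w/g;   # P e r l 5 2 0 0 2
my $nonword  = () = $sample =~ /\W/g;   # two spaces, comma, !
my $space    = () = $sample =~ /\s/g;
my $nonspace = () = $sample =~ /\S/g;

# ^ and $ anchor the match; . does not match a newline.
my $anchored = ("apple" =~ /^app/ && "apple" =~ /ple$/) ? 1 : 0;
my $dot_nl   = ("a\nb" =~ /^a.b$/) ? 1 : 0;

print "\\w=$word \\W=$nonword \\s=$space \\S=$nonspace\n";
print "anchored=$anchored dot_matches_newline=$dot_nl\n";
```
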
Parentheses (1)

    $text = 'Hook is not used on island Java.';
    $text =~ /(Ho(ok))\s(is?).*\3((l|s)(a|l))/;
    print "$1 $2 $3 $4 $5 $6\n";

    $text = 'Hook i not used on island Java.';
    $text =~ /(Ho(ok))\s(is?).*\3((l|s)(a|l))/;
    print "$1 $2 $3 $4 $5 $6\n";

OUTPUT:
    Hook ok is la l a
    Hook ok i sl s l

NOTES:
• Groups are numbered in the order of their opening parentheses.
• \3 inside the pattern refers back to the text captured by the third group.

Parentheses without $n

    $text = 'Hook is not used on island Java.';
    $text =~ /(Ho(ok))\s(is?).*\3((?:l|s)(a|l))/;
    print "$1 $2 $3 $4 $5 .$6.\n";

    $text = 'Hook i not used on island Java.';
    $text =~ /(Ho(ok))\s(is?).*\3((?:l|s)(a|l))/;
    print "$1 $2 $3 $4 $5 .$6.\n";

OUTPUT:
    Hook ok is la a ..
    Hook ok i sl l ..

NOTES:
• (?: ) groups a sub-expression without creating a $n reference.
• Only five groups capture, so $6 is undefined and prints as an empty string.

Character classes
• A list of characters between [ and ], e.g. [aeiou]
• An interval (range), e.g. [a-f]
• A negated character set, e.g. [^a-f]: any character not in the set

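A sketch of the three kinds of character class; is_hex is a hypothetical helper written for this example only, and the sample strings are illustrative:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string consists only of
# characters from the intervals 0-9 and a-f.
sub is_hex {
    my ($s) = @_;
    return $s =~ /^[0-9a-f]+$/ ? 1 : 0;
}

# A list class: collect every vowel.
my @vowels = "jamaica" =~ /[aeiou]/g;      # a a i a

# A negated class: delete every character that is NOT in a-f.
(my $only_af = "cafe babe 42 zoo") =~ s/[^a-f]//g;

print "is_hex: ", is_hex("cafe"), is_hex("42"), is_hex("zoo"), "\n";
print "vowels: @vowels\n";
print "only a-f: $only_af\n";
```
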
Repetitions
• * zero or more times
• + one or more times
• ? zero or one time
• {n} exactly n times
• {n,} at least n times
• {n,m} at least n times, at most m times

NOTES: There is {n,} but there is no {,m}. Why? (Hint: {0,m} works, but what would you write as the upper limit in {n,???}?) Newer Perls (5.34 and later) do accept {,m} as a shorthand for {0,m}.

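The quantifiers can be checked against a few sample strings. In this sketch the test strings and hash keys are made up for illustration; each pattern is anchored with ^ and $ so the repetition has to cover the whole string:

```perl
use strict;
use warnings;

# Each entry is 1 if the string matches, 0 otherwise.
my %result = (
    exactly_4   => ("2002"  =~ /^[0-9]{4}$/)   ? 1 : 0,  # {n}
    too_short   => ("02"    =~ /^[0-9]{4}$/)   ? 1 : 0,
    at_least_3  => ("aaaaa" =~ /^a{3,}$/)      ? 1 : 0,  # {n,}
    two_to_four => ("abc"   =~ /^[a-z]{2,4}$/) ? 1 : 0,  # {n,m}
    too_long    => ("abcde" =~ /^[a-z]{2,4}$/) ? 1 : 0,
    optional_u  => ("color" =~ /^colou?r$/)    ? 1 : 0,  # ?
    at_most_2   => ("bbb"   =~ /^b{0,2}$/)     ? 1 : 0,  # {0,m}
);

print "$_=$result{$_}\n" for sort keys %result;
```
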
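Greedy repetition
• Repetitions are greedy by default: they eat as many characters as possible.
• A ? after a repetition (*?, +?, ??, {n,m}?) makes it non-greedy: it eats as few characters as possible.

    $text = 'Hook is not used on island Java.';
    $text =~ /(.*)is/;       #1
    print "$1.\n";
    $text =~ /(.*?)is/;      #2
    print "$1.\n";
    $text =~ /(.*?)is.*n/;   #3
    print "$1.\n";

OUTPUT:
    Hook is not used on .
    Hook .
    Hook .

NOTES: #1 stops at the last "is" (in "island"); #2 stops at the first one, and #3 shows that the non-greedy (.*?) still stops there even though more pattern follows.
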
Other extensions
• Other UNIX tools also use similar, but simpler, regular expressions.
• Perl regular expressions are more powerful.

Some of the extensions are listed on the next slides.

Regular expression comment

    (?# comment comes here)

• The comment text is ignored when matching. It ends at the first ), so it cannot itself contain a ).
• Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments!

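As an illustration, the email pattern from the earlier slide can carry (?# ...) comments without changing what it matches; the comment texts are added here for the example only:

```perl
use strict;
use warnings;

my @mail = ('peter@verhas.com', 'hab.akukk%mikkamakka@jeno');

# Same pattern as /^.*\@\w+\..+$/; each (?# ...) is ignored by the matcher.
my @good = grep {
    /^.*(?# local part)\@(?# at sign)\w+(?# domain name)\.(?# dot).+(?# rest)$/
} @mail;

print "good: @good\n";
```
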
Regular Expression Parentheses
• (?: sub-expression ) groups without creating a $n variable.

We have already met this construct in an earlier example, but this is the proper place to discuss it.

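A minimal sketch of (?: ) used for repetition: the group is repeated as a unit, but because it does not capture, the only captured value is the one from (c+). The sample string is illustrative:

```perl
use strict;
use warnings;

# (?:ab)+ repeats "ab" as a unit without capturing it;
# only (c+) creates a capture, so it becomes $1.
my @parts = "ababccc" =~ /^(?:ab)+(c+)$/;
my $count = @parts;

print "captures: $count, first: $parts[0]\n";
```
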
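Positive look forward (?= subregexp)

    $t = 'jamaica rum rum kingston rum';
    $t =~ s/([aeoui])(?=\w)/uc($1)/ge;
    print $t;

OUTPUT:
    jAmAIca rUm rUm kIngstOn rUm

Example: uppercase every vowel that is followed by another word character, i.e. every vowel not at the end of a word. The look forward (lookahead) checks the next character without consuming it, so in "ai" both vowels are converted.
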
Negative look forward (?! subregexp)

    $t = 'jamaica rum rum kingston rum';
    $t =~ s/([aeoui])(?!\w)/uc($1)/ge;
    print $t;

OUTPUT:
    jamaicA rum rum kingston rum

Example: uppercase every vowel standing at the end of a word (not followed by a word character).

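Option change inside the regular expression (?imsx)
• (?imsx) switches the listed options on from that point in the pattern; (?-imsx) switches them off.
• This can be used inside the m/ and s/ operators.
• The pattern options i, m, s and x can be set this way; the operator options g, e and o cannot.

Now we go back to operator m/ and discuss some details.
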
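A short sketch of the inline switches from the slide above, run against made-up sample strings. The (?s) line shows that the s option can be switched on the same way:

```perl
use strict;
use warnings;

my $text = "PERL rules";

# (?i) turns on case-insensitive matching from this point on.
my $whole = ($text =~ /(?i)perl rules/) ? 1 : 0;

# (?-i) turns it off again, so RULES must now match exactly.
my $part = ($text =~ /(?i)perl(?-i) RULES/) ? 1 : 0;

# (?s) lets . match a newline, like the s option.
my $dot_nl = ("a\nb" =~ /(?s)a.b/) ? 1 : 0;

print "whole=$whole part=$part dot_nl=$dot_nl\n";
```
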
m operator array result

    @k = "abbabaa" =~ m/(bb).+(a.)/;
    print $#k;
    print ' ',$k[0],' ',$k[1],"\n";

OUTPUT:
    1 bb aa

NOTES:
• Parts of the expression are enclosed in ( ).
• In list context m// returns the captured substrings; they are also available in the default variables $1, $2, ...

m operator option g (1)

    @k = "abbabaa" =~ m/(b)(a)/g;
    print $#k,' ',$k[0],' ',$k[1],' ',$k[2],' ',$k[3],"\n";

OUTPUT:
    3 b a b a

NOTES: With g in list context, m// returns the captured substrings of every match, one after the other. Here "ba" matches twice, giving four elements.

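m operator option g (2)

    $t = "abbabaa";
    while( $t =~ m/(ab)(b|a)/g ){
        print pos($t)," $1 $2\n";
    }

OUTPUT:
    3 ab b
    6 ab a

NOTES: In scalar context each m//g continues where the previous match ended; pos($t) returns that position (the offset just after the last match). When no further match is found, the loop ends and the position is reset.
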
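m operator option i
• Case-insensitive match

    print '.',"apple" =~ /AppLe/,".\n";
    print '.',"apple" =~ /AppLe/i,".\n";

prints
    ..
    .1.

In list context a failed match returns an empty list, so nothing is printed between the first two dots.
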
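m operator options m and s

    $t = "mah\na\nb";
    while( $t =~ /(.?.)$/mg ){ print '.',$1; } print ".\n";
    while( $t =~ /(.?.)$/sg ){ print '.',$1; } print ".\n";
    while( $t =~ /(.?.)$/g  ){ print '.',$1; } print ".\n";

OUTPUT:
    .ah.a.b.
    .
    b.
    .b.

• m: $ matches before every \n in the string (and ^ after every \n), not only at the end.
• s: . also matches \n (otherwise . is any character but \n). In the second loop the match is "\nb", which is why a line break appears in the output.
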
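m operator option o
• Evaluate (compile) the regular expression only once to save processor time.

    $t = "al brab";
    $a = 'al'; $b = 'rab';
    &q; &p;
    $b = 'fe';
    &q; &p;
    sub q { print ' q', $t =~ /$a\sb$b/o }
    sub p { print ' p', $t =~ /$a\sb$b/ }

prints
    q1 p1 q1 p

NOTES: q keeps the pattern compiled on its first call (al\sbrab), so it still matches after $b changes; p interpolates the new value (al\sbfe) and fails. Perl does not recompile a pattern whose interpolated variables have not changed, so o matters mainly when they do change, as here.
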
m operator option x

    @k = "abbabaa" =~ m/(bb)   # exactly two 'b' characters, first group
                        .+     # one or more of any character
                        (a.)   # a letter 'a' and exactly one more character
                       /x;     # space and comments allowed
    print $#k;
    print ' ',$k[0],' ',$k[1],"\n";

OUTPUT:
    1 bb aa

This option allows whitespace (write "\ " for a literal space) and # comments inside the pattern to ease readability. Do not put the delimiter (here /) inside such a comment, because it would end the pattern.

Operator s

    $text =~ s/regexp/replace/egimosx

Options:
• e the replacement is evaluated as a Perl expression
• g global search and replace
• i case-insensitive search
• m string is treated as multi-line
• o regular expression is compiled only once
• s string is treated as single-line
• x extended syntax for the regexp

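Two of these options in action, as a sketch with illustrative strings: e turns the replacement into code (here doubling every number), and i combined with g replaces every spelling of a word:

```perl
use strict;
use warnings;

# e: the replacement part is Perl code; its value replaces the match.
(my $doubled = "apple 3, pear 12") =~ s/([0-9]+)/$1 * 2/ge;

# i and g together: replace every spelling of "java".
(my $renamed = "Java java JAVA") =~ s/java/Perl/gi;

print "$doubled\n";
print "$renamed\n";
```

The (my $copy = $original) =~ s/// form used here changes a copy and leaves the original string untouched.
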
Global Search and Replace

    $t = "abbab";
    $t =~ s/ab/aa/g;
    print $t;

OUTPUT:
    aabaa

With g, every (non-overlapping) match of the regular expression is replaced with the replacement string, not just the first one.

m and s operators with different delimiters
• / is the default, but you can use
• ' to have a non-interpolated pattern
• other non-alphanumeric characters
• () {} [] <> as matching character pairs; in this case write s{search}{replace}

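A sketch of why other delimiters help, using an illustrative file path: with / as the delimiter every slash in the pattern needs a backslash, with braces it does not:

```perl
use strict;
use warnings;

my $path = "/usr/local/bin/perl";

# With / as delimiter every slash in the pattern must be escaped...
my $slashes = ($path =~ /^\/usr\/local\//) ? 1 : 0;

# ...with braces it can be written as is.
my $braces = ($path =~ m{^/usr/local/}) ? 1 : 0;

# s with matching pairs: s{search}{replace}
(my $moved = $path) =~ s{^/usr/local}{/opt};

print "$slashes $braces $moved\n";
```
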
m and s operators with different delimiters example

    $text = 'a@bba@bbabb';
    @b = ('bba');
    $text =~ s{@b}{q}g;
    print "$text\n";
    $text = 'a@bba@bbabb';
    $text =~ s'@b'q'g;
    print "$text\n";

OUTPUT:
    a@q@qbb
    aqbaqbabb

The array @b is interpolated in the first search (the pattern becomes bba) but not in the second, where the pattern is the literal text @b.