Programming in Perl regular expressions and m,s operators

Peter Verhás, January 2002.

Pattern Matching Operator
    expression =~ m/regexp/options;
    $a = "apple";
    print "yes!" if $a =~ m/pp/;
The result is TRUE (1) if the pattern matches and FALSE (the empty string) if it does not.
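
The following is a minimal sketch of the values m// returns in scalar context. It is written as a standalone script under use strict and use warnings, which the slides themselves do not use, and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $fruit = "apple";

# m// in scalar context: 1 when the pattern matches, "" when it does not
my $hit  = ($fruit =~ m/pp/);
my $miss = ($fruit =~ m/qq/);

print "hit=[$hit] miss=[$miss]\n";    # prints hit=[1] miss=[]
print "no qq\n" unless $miss;         # the empty string counts as FALSE
```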

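M operator options
• g global search
• i case-insensitive search
• m multi-line string: ^ and $ also match at embedded newlines
• s single-line string: . also matches a newline
• o compile the pattern only once
• x extended regular expression: whitespace and comments are allowed in the pattern
Now let's see what a regular expression is, and then we will return to the fine points of the m operator.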

Regular Expressions
• A regular expression is a pattern string built from ordinary characters plus joker (meta) characters and joker expressions.
• We will look at examples to explain it.

Regular Expression to Verify Email (1)
    @mail = ( 'peter@verhas.com',
              'hab.akukk%mikkamakka@jeno',
            );
    for( @mail ){
      if( /^.*\@\w+\..+$/ ){
        print "$_ seems to be a good eMail\n";
      }else{
        print "$_ bad address\n";
      }
    }
OUTPUT:
    peter@verhas.com seems to be a good eMail
    hab.akukk%mikkamakka@jeno bad address
NOTES:
• $_ is used as the default, and m can be left out when / is the delimiter, so the condition is short for $_ =~ m/^.*\@\w+\..+$/
• @ would also work instead of \@, but \@ is safe

Regular Expression to Verify Email (2)
/^.*\@\w+\..+$/
• ^ at the start of the string
• .* zero or more arbitrary characters (* means zero or more of what stands before it)
• \@ a single @ character
• \w+ one or more word characters: letters, digits or underscore (+ means one or more of what stands before it)
• \. one . (dot) character (a special regexp character is escaped with \)
• .+ one or more arbitrary characters
• $ at the end of the string

Search and Replace Example of Regular Expressions
    $text = 'JavaScript is not used on island Java.';
    $text =~ s/Java(?!Script)/Borneo/;
    print $text;
OUTPUT:
    JavaScript is not used on island Borneo.
NOTES:
• Operator s will be discussed later in detail
• (?! ) is a zero-length negative look forward (lookahead), also detailed later

Meta (joker) Characters
• . any character except newline
• ^ start of string
• $ end of string
• \ escapes the next character
• \w any word character (letter, digit or underscore)
• \W any non-word character
• \s any whitespace character
• \S any non-whitespace character
These are only examples; there are other metacharacters, see the Perl manual.
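
Here is a small sketch, assuming a made-up test string, that shows what \w, \s and \S pick out. It also shows the difference between the joker . and the escaped \. when ^ and $ anchor both ends.

```perl
use strict;
use warnings;

my $s = "Perl 5_x, ok!";

my @words  = $s =~ /(\w+)/g;    # runs of word characters (letters, digits, _)
my @spaces = $s =~ /(\s)/g;     # every whitespace character
my @chunks = $s =~ /(\S+)/g;    # runs of non-whitespace characters

print join('|', @words),  "\n";   # Perl|5_x|ok
print scalar(@spaces),    "\n";   # 2
print join('|', @chunks), "\n";   # Perl|5_x,|ok!

# . is any character, \. is a literal dot; ^ and $ anchor both ends
my $dot_any     = ("a-b" =~ /^a.b$/)  ? 1 : 0;   # 1
my $dot_literal = ("a-b" =~ /^a\.b$/) ? 1 : 0;   # 0
```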

Parentheses (1)
    $text = 'Hook is not used on island Java.';
    $text =~ /(Ho(ok))\s(is?).*\3((l|s)(a|l))/;
    print "$1 $2 $3 $4 $5 $6\n";
    #
    $text = 'Hook i not used on island Java.';
    $text =~ /(Ho(ok))\s(is?).*\3((l|s)(a|l))/;
    print "$1 $2 $3 $4 $5 $6\n";
OUTPUT:
    Hook ok is la l a
    Hook ok i sl s l
NOTES: Numbering follows the order of the opening parentheses; \3 inside the pattern refers back to the text matched by the third group.

Parentheses without $n
    $text = 'Hook is not used on island Java.';
    $text =~ /(Ho(ok))\s(is?).*\3((?:l|s)(a|l))/;
    print "$1 $2 $3 $4 $5 .$6.\n";
    $text = 'Hook i not used on island Java.';
    $text =~ /(Ho(ok))\s(is?).*\3((?:l|s)(a|l))/;
    print "$1 $2 $3 $4 $5 .$6.\n";
OUTPUT:
    Hook ok is la a ..
    Hook ok i sl l ..
NOTES: (?: ) groups a sub-expression without creating a reference ($n). There is therefore no sixth group, and $6 is undefined and prints as an empty string.

Character classes
• A list of characters between [ and ] matches any one of them, e.g. [aeiou]
• Interval, e.g. [a-f]
• Negative character set, e.g. [^a-f]
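
Here is a short sketch using a hypothetical list of test strings. A character class with intervals checks for hexadecimal digits, and the negated class finds strings that contain anything else.

```perl
use strict;
use warnings;

my @tests = ('1f', 'CAFE', 'cafe', 'xyz');

# [...] matches exactly ONE character from the list; 0-9, a-f, A-F are intervals
my @hex = grep { /^[0-9a-fA-F]+$/ } @tests;

# [^...] matches one character that is NOT in the list
my @has_non_hex = grep { /[^0-9a-fA-F]/ } @tests;

print "@hex\n";            # 1f CAFE cafe
print "@has_non_hex\n";    # xyz
```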

Repetitions
• * zero or more times
• + one or more times
• ? zero or one time
• {n} exactly n times
• {n,} at least n times
• {n,m} at least n times, at most m times
NOTES: There is {n,} but there is no {,m}. Why? (Hint: {0,m} already says it, but an open upper bound {n,} cannot be written any other way.) Perl 5.34 and later do accept {,m} as a synonym for {0,m}.
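
The following sketch runs each repetition form against a hypothetical list of words. The ^ and $ anchors force the whole word to consist of the repeated character.

```perl
use strict;
use warnings;

my @words = ('', 'a', 'aa', 'aaa', 'aaaa');

my @star    = grep { /^a*$/ }     @words;   # zero or more: all five
my @plus    = grep { /^a+$/ }     @words;   # one or more: all but ''
my @exact2  = grep { /^a{2}$/ }   @words;   # exactly two: aa
my @min3    = grep { /^a{3,}$/ }  @words;   # at least three: aaa aaaa
my @one_two = grep { /^a{1,2}$/ } @words;   # one or two: a aa
my @upto2   = grep { /^a{0,2}$/ } @words;   # {0,m} instead of {,m}: '' a aa
my @opt     = grep { /^ab?$/ } ('a', 'ab', 'abb');   # zero or one 'b': a ab

print scalar(@star), " @exact2 | @min3 | @one_two | @opt\n";
```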

Greedy repetition
• Repetitions are greedy: they eat as many characters as possible
• A ? after the repetition (*?, +?, ??, {n,m}?) makes it non-greedy: it eats as few characters as possible
    $text = 'Hook is not used on island Java.';
    $text =~ /(.*)is/;      #1
    print "$1.\n";
    $text =~ /(.*?)is/;     #2
    print "$1.\n";
    $text =~ /(.*?)is.*n/;  #3
    print "$1.\n";
OUTPUT:
    Hook is not used on .
    Hook .
    Hook .
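
Here is another sketch of greedy versus non-greedy matching, using a made-up HTML-like string. It shows where each form stops.

```perl
use strict;
use warnings;

my $html = '<b>bold</b>';

# greedy: .+ runs to the LAST '>' it can still match
my ($greedy) = $html =~ /<(.+)>/;
# non-greedy: .+? stops at the FIRST '>'
my ($lazy)   = $html =~ /<(.+?)>/;

print "$greedy\n";   # b>bold</b
print "$lazy\n";     # b
```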

Other extensions
• Other UNIX tools (e.g. grep, sed) also use similar, but simpler, regular expressions
• Perl regular expressions are more powerful
Some of the extensions are listed on the next slides.

Regular expression comment
(?# comment comes here)
• Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments!
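
Here is a minimal sketch of inline (?# ) comments in a date pattern. It assumes the date format and uses \d (a digit), which is not listed on the slides. The regex engine ignores each comment, and a comment ends at the first ")".

```perl
use strict;
use warnings;

my $date = '2002-01-15';

# each (?# ... ) is skipped by the regex engine
my @parts = $date =~ /^(\d{4})(?# year)-(\d\d)(?# month)-(\d\d)(?# day)$/;

print "@parts\n";   # 2002 01 15
```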

Regular Expression Parentheses
• (?: sub-expression ) groups without creating $n
(We already met (?: ) in an earlier example, but this is the proper place to discuss this construct.)
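
A typical use of (?: ) is to repeat a group without spending a $n on it. The following is a sketch with a made-up input string.

```perl
use strict;
use warnings;

# (?:ab)+ repeats "ab" but creates no capture of its own,
# so the outer group is $1 and (\w) is $2
my @m = 'ababab-x' =~ /^((?:ab)+)-(\w)$/;

print scalar(@m), " $m[0] $m[1]\n";   # 2 ababab x
```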

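Positive look forward (?= subregexp)
    $t = 'jamaica rum rum kingston rum';
    $t =~ s/([aeoui])(?=\w)/uc($1)/ge;
    print $t;
OUTPUT:
    jAmAIca rUm rUm kIngstOn rUm
Example: convert to upper case every vowel that is followed by another word character, that is, every vowel not at the end of a word. The (?=\w) part must match, but it does not consume the character.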

Negative look forward (?! subregexp)
    $t = 'jamaica rum rum kingston rum';
    $t =~ s/([aeoui])(?!\w)/uc($1)/ge;
    print $t;
OUTPUT:
    jamaicA rum rum kingston rum
Example: convert every vowel standing at the end of a word to upper case.

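Option change inside the regular expression (?imsx)
• This can be used inside the pattern of the m/ or s/ operator; (?-imsx) switches the options off again
• Only i, m, s and x can be set this way; the g, o and e options cannot be used
Now we go back to operator m/ and discuss some details.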
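
This sketch uses made-up strings to show that an embedded (?i) takes effect from its position in the pattern, and that (?-i) turns it off again.

```perl
use strict;
use warnings;

# (?i) at the start makes the whole pattern case-insensitive
my @apples = grep { /(?i)apple/ } ('Apple', 'APPLE', 'pear');

# only the 'b' is case-insensitive here: (?i) on, then (?-i) off
my $ok  = ('aBc' =~ /a(?i)b(?-i)c/) ? 1 : 0;   # 1
my $bad = ('aBC' =~ /a(?i)b(?-i)c/) ? 1 : 0;   # 0: 'C' must be lower case

print "@apples $ok $bad\n";   # Apple APPLE 1 0
```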

M operator array result
    @k = "abbabaa" =~ m/(bb).+(a.)/;
    print $#k;
    print ' ',$k[0],' ',$k[1],"\n";
OUTPUT:
    1 bb aa
NOTES: Parts of the expression are enclosed in ( ). $1, $2 ... are the default variables where the matched substrings are put; in list context the match also returns them as a list.

M operator option g(1)
    @k = "abbabaa" =~ m/(b)(a)/g;
    print $#k,' ',$k[0],' ',$k[1],' ',$k[2],' ',$k[3],"\n";
OUTPUT:
    3 b a b a
NOTES: With the g option in list context, the match returns the captured substrings of every match, one after the other.

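M operator option g(2)
    $t = "abbabaa";
    while( $t =~ m/(ab)(b|a)/g ){
      print pos($t)," $1 $2\n";
    }
OUTPUT:
    3 ab b
    6 ab a
NOTES: In scalar context each m//g continues where the previous match ended; pos($t) returns that position.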

M operator option i
• Case-insensitive match
    print '.',"apple" =~ /AppLe/,".\n";
    print '.',"apple" =~ /AppLe/i,".\n";
• prints
    ..
    .1.

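M operator options m and s
    $t = "mah\na\nb";
    while( $t =~ /(.?.)$/mg ){ print '.',$1; } print ".\n";
    while( $t =~ /(.?.)$/sg ){ print '.',$1; } print ".\n";
    while( $t =~ /(.?.)$/g ){ print '.',$1; } print ".\n";
OUTPUT:
    .ah.a.b.
    .
    b.
    .b.
• m: $ matches before every \n in the string (and ^ right after every \n)
• s: . matches \n as well (otherwise . is any character except \n)
In the second loop the single match is "\nb", which is why that output is broken onto a new line.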
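
Here is a further sketch of m and s on a made-up multi-line string. With m, ^ finds the start of every line. With s, . steps over the newline.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\nthird";

my @with_m    = $text =~ /^(\w+)/mg;   # ^ at every line start: first second third
my @without_m = $text =~ /^(\w+)/g;    # ^ only at string start: first

my $plain  = ("a\nb" =~ /^a.b$/)  ? 1 : 0;   # 0: . does not match \n
my $single = ("a\nb" =~ /^a.b$/s) ? 1 : 0;   # 1: with s it does

print "@with_m | @without_m | $plain $single\n";
```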

M operator option o
• Compile the regular expression only once, to save processor time, even if the variables interpolated into it change later
    $t = "al brab"; $a = 'al'; $b = 'rab';
    &with_o; &without_o;
    $b = 'fe';
    &with_o; &without_o;
    sub with_o    { print ' q',$t =~ /$a\sb$b/o }
    sub without_o { print ' p',$t =~ /$a\sb$b/ }
• prints
    q1 p1 q1 p

M operator option x
    @k = "abbabaa" =~ m/(bb)  # exactly two 'b' characters, the first capture
                        .+    # one or more arbitrary characters
                        (a.)  # a letter 'a' and exactly one arbitrary character
                       /x;    # whitespace and comments are allowed in the pattern
    print $#k;
    print ' ',$k[0],' ',$k[1],"\n";
OUTPUT:
    1 bb aa
This option allows whitespace and comments inside the pattern to ease readability; a literal space has to be written as '\ ' (a backslash followed by a space).
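
This sketch spreads the e-mail pattern /^.*\@\w+\..+$/ from the earlier slides out with the x option. It assumes qr// (a precompiled pattern, not covered on the slides). Comments avoid the characters $, @ and /, because Perl interpolates variables and looks for the closing delimiter before the x option takes effect.

```perl
use strict;
use warnings;

my $email_re = qr/
    ^ .*      # anything before the at sign
    \@        # a single at sign
    \w+       # one or more word characters
    \.        # a literal dot
    .+ $      # one or more characters up to the end of the string
/x;

my @good = grep { $_ =~ $email_re } ('peter@verhas.com', 'hab.akukk%mikkamakka@jeno');
print "@good\n";   # peter@verhas.com
```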

Operator s
    $text =~ s/regexp/replace/egimosx
• Options:
• e the replacement is evaluated as a Perl expression
• g global search and replace
• i case-insensitive search
• m string is treated as multiple lines
• o regular expression is compiled only once
• s string is treated as a single line
• x extended syntax for the regexp
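
Here is a sketch of the g, i and e options on made-up strings. The (my $copy = $orig) =~ s/// idiom edits a copy and leaves the original untouched.

```perl
use strict;
use warnings;

my $t = 'Apple apple APPLE';
(my $first = $t) =~ s/apple/pear/;     # first exact-case match only: Apple pear APPLE
(my $all   = $t) =~ s/apple/pear/gi;   # g + i: pear pear pear

# e: the replacement part is Perl code, here doubling every number
(my $doubled = '3 apples, 12 pears') =~ s/(\d+)/$1 * 2/ge;   # 6 apples, 24 pears

print "$first\n$all\n$doubled\n";
```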

Global Search and Replace
    $t = "abbab";
    $t =~ s/ab/aa/g;
    print $t;
OUTPUT:
    aabaa
The g option replaces every occurrence of the search regular expression with the replacement string.
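
Here is the same replacement with and without g, as a small sketch. It also relies on the fact that s/// returns the number of replacements it made.

```perl
use strict;
use warnings;

my $once = "abbab";
$once =~ s/ab/aa/;                 # without g: only the first match is replaced

my $all = "abbab";
my $count = ($all =~ s/ab/aa/g);   # s///g returns the number of replacements

print "$once $all $count\n";       # aabab aabaa 2
```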

m and s operators with different delimiters
• / is the default, but you can use
• ' to get a non-interpolated pattern
• other non-alphanumeric characters
• (), {}, [] as matching character pairs; in this case s{search}{replace}

m and s operators with different delimiters example
    $text = 'a@bba@bbabb';
    @b = ('bba');
    $text =~ s{@b}{q}g;
    print "$text\n";
    $text = 'a@bba@bbabb';
    $text =~ s'@b'q'g;
    print "$text\n";
OUTPUT:
    a@q@qbb
    aqbaqbabb
@b is interpolated in the first search but not in the second.
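
Another common reason to change delimiters is a pattern full of slashes. The following is a sketch using a hypothetical file path.

```perl
use strict;
use warnings;

my $path = '/usr/local/bin/perl';

# with / as delimiter every / in the pattern must be escaped ...
(my $p1 = $path) =~ s/\/usr\/local/\/opt/;
# ... with a bracketing pair (or another character) it need not be
(my $p2 = $path) =~ s{/usr/local}{/opt};
my $is_usr = ($path =~ m#^/usr/#) ? 1 : 0;

print "$p1\n$p2\n$is_usr\n";   # /opt/bin/perl twice, then 1
```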

Thank you for your kind attention.
