Regular Expressions in Perl

CS/BIO 271 – Introduction to Bioinformatics: Regular Expressions in Perl

Regular Expressions
• Regular expressions are a powerful tool for matching patterns against strings
• Available in many languages (AWK, Sed, Perl, Python, Ruby, C/C++, others)
• Matching strings with regexps is usually fast, although patterns that force heavy backtracking can be slow
Types & Regular Expressions

RegExp basics
• A regular expression is a pattern that can be compared to a string
• A regular expression is created using the / / delimiters:
    /^[abc].*f$/
• A regular expression is matched using the =~ (binding) operator
• A regular expression match returns true or false:
    if ($mystring =~ /^[abc].*f$/) { }
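A minimal sketch of the binding operator used in a conditional. The pattern and the name $mystring come from the slide; the describe helper and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: reports whether a string matches the slide's pattern.
# The pattern reads: starts with a, b, or c; then anything; ends with f.
sub describe {
    my ($mystring) = @_;
    if ($mystring =~ /^[abc].*f$/) {
        return "match";
    }
    return "no match";
}

print describe("beef"),  "\n";   # starts with b and ends with f
print describe("chef!"), "\n";   # does not end with f
print describe("leaf"),  "\n";   # does not start with a, b, or c
```

The negated binding operator !~ returns true when the pattern does not match.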

String Matching
• Examples of a few simple regular expressions:
    $a = "Fats Waller";
    $a =~ /a/    » 1 (true)
    $a =~ /z/    » "" (empty string, false)
    $a =~ /ll/   » 1 (true)
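The return values above can be checked with a short script. This is a sketch only; the variable is called $name here rather than $a, because $a is also used by Perl's sort.

```perl
use strict;
use warnings;

my $name = "Fats Waller";

# In scalar context a match yields 1 on success and the empty string on failure.
my $has_a  = ($name =~ /a/);
my $has_z  = ($name =~ /z/);
my $has_ll = ($name =~ /ll/);

printf "a: '%s'  z: '%s'  ll: '%s'\n", $has_a, $has_z, $has_ll;
```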

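Regular Expression Patterns
• Most characters match themselves
• Wildcard: . (period) = any character except newline (any character at all with the /s modifier)
• Anchors
  • ^ = “start of line” (start of the string unless the /m modifier is used)
  • $ = “end of line” (end of the string, or just before a final newline, unless /m is used)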
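A small sketch of the wildcard and the two anchors. The DNA string is a made-up example, not taken from the course material.

```perl
use strict;
use warnings;

my $seq = "ATGCGTTAA";   # made-up sequence for illustration

my $starts_with_atg = $seq =~ /^ATG/ ? 1 : 0;   # ^ anchors the match at the start
my $ends_with_taa   = $seq =~ /TAA$/ ? 1 : 0;   # $ anchors the match at the end
my $g_any_t         = $seq =~ /G.T/  ? 1 : 0;   # . stands for any one character
my $atg_at_end      = $seq =~ /ATG$/ ? 1 : 0;   # ATG occurs, but not at the end

print "$starts_with_atg $ends_with_taa $g_any_t $atg_at_end\n";
```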

Character Classes
• Character classes appear within [] pairs
• Most special regexp characters (^, $, etc.) are turned off
• Escape sequences (\n etc.) still work
• [aeiou]
• [0-9]
• ^ as first character = negate the class
• A literal ] must appear first (or be escaped); a literal - must appear first or last: []abn-z-]
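The following sketch exercises a plain class, a negated class, and a class containing a literal ] and -. The word list and the DNA string are invented for the example.

```perl
use strict;
use warnings;

# Words that start with a vowel.
my @vowel_words = grep { /^[aeiou]/ } qw(apple banana orange kiwi);

# A negated class finds the first character that is not a DNA base.
my $dna = "ACGTNACG";
my ($bad) = $dna =~ /([^ACGT])/;

# ] is literal when it comes first; - is literal when it comes first or last.
my $class   = qr/^[]abn-z-]$/;
my @literal = grep { $_ =~ $class } ("]", "-", "a", "c", "q");

print "@vowel_words\n";
print "$bad\n";
print "@literal\n";
```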

Predefined character classes
• These work inside or outside []’s:
  • \d = digit = [0-9]
  • \D = non-digit = [^0-9]
  • \s = whitespace, \S = non-whitespace
  • \w = word character = [a-zA-Z0-9_]
  • \W = non-word character
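As an illustration, the predefined classes can pull fields out of a whitespace-separated record. The record layout and the variable names below are assumptions made for this sketch.

```perl
use strict;
use warnings;

my $line = "chr12  1045  gene_A";   # made-up record

# \w+ is a run of word characters, \s+ a run of whitespace, \d+ a run of digits.
my ($chrom, $pos, $gene) = $line =~ /^(\w+)\s+(\d+)\s+(\w+)$/;

# \D finds the first non-digit character.
my ($non_digit) = "2024-05" =~ /(\D)/;

# A predefined class also works inside brackets: word characters or a hyphen.
my $id_ok = "gene_A-12" =~ /^[\w-]+$/ ? 1 : 0;

print "$chrom $pos $gene $non_digit $id_ok\n";
```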

Repetition in Regexps
• These quantify the preceding character or class:
  • * = zero or more
  • + = one or more
  • ? = zero or one
  • {m,n} = at least m and at most n
  • {m,} = at least m
• High precedence – a quantifier applies only to the single preceding character or class, unless grouped:
    /^ran*$/ vs. /^r(an)*$/
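The difference between the two patterns on the slide shows up when both are run over the same word list. The words and the {2,3} example are made up for this sketch.

```perl
use strict;
use warnings;

my @words = qw(r ra ran rann ranan);

# Ungrouped: * applies only to the n.
my @char_repeat = grep { /^ran*$/ } @words;

# Grouped: * applies to the whole group "an".
my @group_repeat = grep { /^r(an)*$/ } @words;

# {2,3} = at least two and at most three of the preceding character.
my @two_to_three = grep { /^A{2,3}$/ } qw(A AA AAA AAAA);

print "@char_repeat\n";
print "@group_repeat\n";
print "@two_to_three\n";
```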

Alternation
• | is like “or” – it matches either the regexp before the | or the one after
• Low precedence – alternates entire regexps unless grouped
• /red ball|angry sky/ matches “red ball” or “angry sky”, not “red ball sky” or “red angry sky”
• /red (ball|angry) sky/ does the latter
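A sketch of how grouping changes what alternation matches. The matched_text helper is hypothetical; it returns the text that the pattern actually matched, which makes the precedence visible even when the pattern is not anchored.

```perl
use strict;
use warnings;

# Hypothetical helper: the matched portion of the string, or undef if no match.
sub matched_text {
    my ($string, $regex) = @_;
    return $string =~ $regex ? $& : undef;
}

my $ungrouped = qr/red ball|angry sky/;
my $grouped   = qr/red (ball|angry) sky/;

# Ungrouped: only one of the two whole alternatives is matched.
my $m1 = matched_text("red angry sky", $ungrouped);
# Grouped: "red" and "sky" are required around either alternative.
my $m2 = matched_text("red angry sky", $grouped);
my $m3 = matched_text("red ball", $grouped);

print defined $_ ? "$_\n" : "(no match)\n" for ($m1, $m2, $m3);
```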

Side Effects (Perl Magic)
• After a successful match, some “special” Perl variables are automatically set:
  • $& – the part of the string that matched the pattern
  • $` – the part of the string before the match
  • $' – the part of the string after the match
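A short sketch that copies the three match variables right after a successful match. The sample sentence is an arbitrary choice for this example.

```perl
use strict;
use warnings;

my $string = "The moon is made of cheese";
my ($before, $matched, $after);

if ($string =~ /moon/) {
    # Copy the special variables at once; the next successful match overwrites them.
    ($before, $matched, $after) = ($`, $&, $');
}

printf "before=<%s> match=<%s> after=<%s>\n", $before, $matched, $after;
```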

Side effects and grouping
• When you use ()’s for grouping, Perl assigns the match within the first () pair to:
  • \1 within the pattern
  • $1 outside the pattern
    "mississippi" =~ /^.*(iss)+.*$/ » $1 = "iss"
    /([aeiou][aeiou]).*\1/
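Both patterns from the slide can be tried directly. The test phrases in this sketch are invented; the second pattern looks for a pair of vowels that occurs again later in the string.

```perl
use strict;
use warnings;

# $1 holds what the first group matched (here, its last repetition).
my $group;
if ("mississippi" =~ /^.*(iss)+.*$/) {
    $group = $1;
}

# \1 inside the pattern must match the same text that group 1 matched.
my @phrases  = ("seen and been", "moon and sun", "book about food");
my @repeated = grep { /([aeiou][aeiou]).*\1/ } @phrases;

my $pair;
if ("seen and been" =~ /([aeiou][aeiou]).*\1/) {
    $pair = $1;
}

print "$group\n";
print join(", ", @repeated), "\n";
print "$pair\n";
```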

Repetition and greediness
• By default, repetition is greedy, meaning that it will match as many characters as possible.
• You can make a repetition quantifier non-greedy by adding ‘?’ after it
• In the examples, showRE stands for a helper (not a Perl built-in) that marks the matched text with << >>:
    $a = "The moon is made of cheese";
    showRE($a, qr/\w+/)            » <<The>> moon is made of cheese
    showRE($a, qr/\s.*\s/)         » The<< moon is made of >>cheese
    showRE($a, qr/\s.*?\s/)        » The<< moon >>is made of cheese
    showRE($a, qr/[aeiou]{2,99}/)  » The m<<oo>>n is made of cheese
    showRE($a, qr/mo?o/)           » The <<moo>>n is made of cheese
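One possible way to write a showRE-style helper in Perl, assuming it takes a string and a compiled pattern (qr//) and returns the string with the first match wrapped in << >>. This is a sketch, not necessarily the helper used in the course.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap the first match in << >>, or report "no match".
sub showRE {
    my ($string, $regex) = @_;
    if ($string =~ $regex) {
        my ($pre, $match, $post) = ($`, $&, $');
        return "$pre<<$match>>$post";
    }
    return "no match";
}

my $text = "The moon is made of cheese";

print showRE($text, qr/\s.*\s/),  "\n";   # greedy: spans to the last space
print showRE($text, qr/\s.*?\s/), "\n";   # non-greedy: stops at the next space
```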

RegExp Substitutions
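A minimal sketch of substitution using Perl's s/pattern/replacement/ operator. The DNA string and the name example are made up for illustration.

```perl
use strict;
use warnings;

my $dna = "ATGCGTTAA";

# Copy the string, then replace every T with U (/g = all occurrences).
(my $rna = $dna) =~ s/T/U/g;

# Without /g only the first occurrence is replaced.
(my $first_only = $dna) =~ s/T/U/;

# Captured groups are available as $1, $2 in the replacement.
(my $swapped = "Waller, Fats") =~ s/(\w+), (\w+)/$2 $1/;

print "$rna\n$first_only\n$swapped\n";
```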

Using RegExps
• Repeated regexps with list context and /g
• Single matches
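The two usage styles can be sketched as follows. The sequence is invented for the example; @- is Perl's built-in array of match start offsets.

```perl
use strict;
use warnings;

my $seq = "ATGAAATGCCATGA";   # made-up sequence

# List context with /g: every match is returned at once.
my @hits = $seq =~ /ATG/g;

# Scalar context with /g: each evaluation continues after the previous match.
my @positions;
while ($seq =~ /ATG/g) {
    push @positions, $-[0];   # offset where the current match starts
}

# A single match in list context returns the captured groups.
my ($first, $last) = "Fats Waller" =~ /^(\w+)\s+(\w+)$/;

print scalar(@hits), " hits at @positions\n";
print "$last, $first\n";
```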
