
Regular Expressions in Perl

CS/BIO 271 – Introduction to Bioinformatics. Regular expressions are a powerful tool for matching patterns against strings and are available in many languages (AWK, sed, Perl, Python, Ruby, C/C++, and others).

wirt

Presentation Transcript


  1. CS/BIO 271 – Introduction to Bioinformatics: Regular Expressions in Perl

  2. Regular Expressions
     • Regular expressions are a powerful tool for matching patterns against strings
     • Available in many languages (AWK, sed, Perl, Python, Ruby, C/C++, and others)
     • Matching strings with regexps is typically fast and efficient
     Types & Regular Expressions

  3. RegExp basics
     • A regular expression is a pattern that can be compared to a string
     • In Perl, a regular expression is usually written between / / delimiters (the match operator m//, where the m is optional):
       /^[abc].*f$/
     • A regular expression is matched against a string using the =~ (binding) operator
     • In scalar context, a match returns true or false:
       if ($mystring =~ /^[abc].*f$/) { ... }
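A minimal runnable sketch of these basics, using only core Perl; the sample strings are made up for illustration.

```perl
use strict;
use warnings;

my $mystring = "abcdef";    # made-up sample string

# A pattern between / / delimiters, bound to the string with =~
if ($mystring =~ /^[abc].*f$/) {
    print "'$mystring' starts with a, b or c and ends with f\n";
}

# The match itself is an expression: true on success, false on failure
my $other   = "xyzf";
my $matched = ($other =~ /^[abc].*f$/) ? 1 : 0;
print "'$other' matched: $matched\n";

# !~ is the negated form of =~
print "'$mystring' contains no digits\n" if $mystring !~ /\d/;
```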

  4. String Matching
     • Examples of a few simple regular expressions:
       my $str = "Fats Waller";
       $str =~ /a/    » 1 (true)
       $str =~ /z/    » "" (empty string, false)
       $str =~ /ll/   » 1 (true)
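The same checks as a runnable Perl sketch. It uses $str rather than $a, because $a and $b are special package variables used by sort.

```perl
use strict;
use warnings;

my $str = "Fats Waller";

# In scalar context a match yields 1 on success and "" (empty string) on failure
my $has_a  = $str =~ /a/;     # 1
my $has_z  = $str =~ /z/;     # ""
my $has_ll = $str =~ /ll/;    # 1

my %tests = (a => $has_a, z => $has_z, ll => $has_ll);
for my $pat (sort keys %tests) {
    my $verdict = $tests{$pat} ? "true" : "false";
    print "/$pat/: $verdict\n";
}
```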

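  5. Regular Expression Patterns
     • Most characters match themselves
     • Wildcard: . (period) = any character except newline (with the /s modifier, newline too)
     • Anchors
       • ^ = start of the string (start of any line with the /m modifier)
       • $ = end of the string, or just before a final newline (end of any line with /m)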
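A small sketch of how ., ^ and $ behave in Perl, including the /s and /m modifiers; the strings are made up for illustration.

```perl
use strict;
use warnings;

# '.' matches any character except "\n" unless the /s modifier is given
my $dot_plain = "a\nb" =~ /a.b/;     # false
my $dot_s     = "a\nb" =~ /a.b/s;    # true

my $text = "first line\nsecond line\n";

# Without /m, ^ anchors only at the start of the whole string
my @starts_plain = $text =~ /^(\w+)/g;     # ("first")

# With /m, ^ also matches right after each embedded newline
my @starts_multi = $text =~ /^(\w+)/mg;    # ("first", "second")

# Without /m, $ matches at the very end or just before a final newline
my $ends_with_line = $text =~ /line$/;     # true

print "plain: @starts_plain\nmulti: @starts_multi\n";
```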

  6. Character Classes
     • Character classes appear within [ ] pairs; each class matches a single character
     • Most special regexp characters ($, ., *, +, etc.) lose their special meaning inside a class
     • Escape sequences (\n, \t, etc.) still work
     • Examples: [aeiou], [0-9]
     • ^ as the first character negates the class: [^aeiou]
     • ] is literal if it appears first, and - is literal if it appears first or last: []abn-z-]
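A sketch of these character-class rules on made-up strings, including a DNA-alphabet check that fits the course topic.

```perl
use strict;
use warnings;

# [aeiou] matches one vowel; /g in list context collects every match
my @vowels = "bioinformatics" =~ /[aeiou]/g;     # i o i o a i

# A leading ^ negates the class: one character that is NOT a vowel
my @others = "dna" =~ /[^aeiou]/g;               # d n

# A whole sequence made only of A, C, G and T (anchored at both ends)
my $is_dna = "GATTACA" =~ /^[ACGT]+$/;           # true

# Any character outside ACGT flags a non-DNA symbol
my $has_other = "GATXACA" =~ /[^ACGT]/;          # true

# ] is literal when first in the class, - is literal when last
my @brackets = "x]y-z" =~ /[]-]/g;               # ] -

# Escapes such as \t still work inside a class
my $has_sep = "col1\tcol2" =~ /[\t,]/;           # true
```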

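  7. Predefined character classes
     • These work inside or outside [ ]:
       • \d = digit = [0-9]
       • \D = non-digit = [^0-9]
       • \s = whitespace, \S = non-whitespace
       • \w = word character = [a-zA-Z0-9_]
       • \W = non-word character
     • The bracketed equivalents hold for ASCII text; on Unicode strings \d and \w also match non-ASCII digits and letters unless the /a modifier is used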
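A sketch that parses a made-up, whitespace-separated record with the predefined classes. The /a modifier shown at the end needs Perl 5.14 or later.

```perl
use strict;
use warnings;

my $record = "contig12  3051 read_A";    # made-up record, note the double space

# \w+ = run of word characters, \s+ = run of whitespace, \d+ = run of digits
my ($name, $pos, $read) = $record =~ /^(\w+)\s+(\d+)\s+(\w+)$/;

# \S+ with /g pulls out every run of non-whitespace characters
my @fields = $record =~ /(\S+)/g;

# Predefined classes also work inside []: [\d.] is a digit or a period
my ($version) = "release 2.14b" =~ /([\d.]+)/;

# The /a modifier restricts \d, \s and \w to ASCII even on Unicode strings
my $ascii_digits = "2024" =~ /^\d+$/a;

print "$name at $pos ($read); fields: @fields; version $version\n";
```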

  8. Repetition in Regexps
     • These quantify the preceding character or class:
       • * = zero or more
       • + = one or more
       • ? = zero or one
       • {m,n} = at least m and at most n
       • {m,} = at least m
       • {n} = exactly n
     • High precedence: a quantifier applies to only one character or class unless grouped:
       /^ran*$/ vs. /^r(an)*$/
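A runnable sketch contrasting /^ran*$/ with /^r(an)*$/ on a few test words, plus bounded repetition on a made-up sequence.

```perl
use strict;
use warnings;

# * applies only to the item before it: /^ran*$/ means r, a, then zero or more n's.
# Grouping changes that: /^r(an)*$/ means r followed by zero or more "an" pairs.
my %result;
for my $s (qw(r ra rannn ranan)) {
    my $ungrouped = $s =~ /^ran*$/   ? "yes" : "no";
    my $grouped   = $s =~ /^r(an)*$/ ? "yes" : "no";
    $result{$s} = "$ungrouped,$grouped";
    print "$s: /^ran*\$/ $ungrouped, /^r(an)*\$/ $grouped\n";
}

# Bounded repetition on a made-up sequence
my ($poly_a) = "CGAAAAAT" =~ /(A{2,4})/;    # greedy, so capped at 4 A's
my $codon_ok = "ATG" =~ /^[ACGT]{3}$/;      # exactly three bases
my $long_run = "CGAAAAAT" =~ /A{5,}/;       # five or more A's in a row
```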

  9. Alternation
     • | is like "or": it matches either the regexp before the | or the one after it
     • Low precedence: it alternates entire regexps unless grouped
     • /red ball|angry sky/ looks for "red ball" or "angry sky"; it does not mean "red ball sky" or "red angry sky"
     • /red (ball|angry) sky/ matches "red ball sky" or "red angry sky"
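A runnable sketch of the precedence difference. The anchors ^ and $ are added so each pattern must match the whole phrase, and (?: ) is Perl's non-capturing group, used so the anchors apply to both alternatives rather than only the first and last.

```perl
use strict;
use warnings;

my %result;
for my $p ("red ball", "angry sky", "red ball sky", "red angry sky") {
    # Alternatives are the whole phrases "red ball" and "angry sky"
    my $ungrouped = $p =~ /^(?:red ball|angry sky)$/ ? "yes" : "no";
    # Only ball|angry alternates; "red " and " sky" are shared
    my $grouped   = $p =~ /^red (?:ball|angry) sky$/ ? "yes" : "no";
    $result{$p} = "$ungrouped,$grouped";
    print "$p: $result{$p}\n";
}

# Unanchored, the first pattern still finds "red ball" inside "red ball sky"
my $substring_hit = "red ball sky" =~ /red ball|angry sky/;
```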

  10. Side Effects (Perl Magic)
     • After a successful match, some "special" Perl variables are set automatically:
       • $& – the part of the string that matched the pattern
       • $` – the part of the string before the match
       • $' – the part of the string after the match
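A sketch on a made-up DNA string. It also shows the /p modifier with ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH}, which Perl provides as alternatives to $`, $& and $' (in Perl versions before 5.20, merely using $& and friends anywhere slowed down every match in the program).

```perl
use strict;
use warnings;

my $seq = "TTGACCATGGCC";    # made-up sequence

my ($before, $match, $after);
if ($seq =~ /ATG/) {
    ($before, $match, $after) = ($`, $&, $');
}

# Same information through the /p variables
my ($pre_p, $match_p, $post_p);
if ($seq =~ /ATG/p) {
    ($pre_p, $match_p, $post_p) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

print "before=$before match=$match after=$after\n";
```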

  11. Side effects and grouping
     • When you use ( ) for grouping, Perl assigns the text matched by the first ( ) pair to:
       • \1 within the pattern
       • $1 outside the pattern
     • The second pair goes to \2 and $2, and so on
       "mississippi" =~ /^.*(iss)+.*$/   » $1 = "iss"
       /([aeiou][aeiou]).*\1/            » a pair of adjacent vowels that occurs again later in the string
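A runnable sketch of capture groups and a backreference; "tooth root" and "geneA:120-450" are made-up inputs.

```perl
use strict;
use warnings;

# $1 holds what the first ( ) pair captured
my $iss;
if ("mississippi" =~ /^.*(iss)+.*$/) {
    $iss = $1;
}

# \1 inside the pattern must repeat exactly what group 1 matched
my ($pair) = "tooth root" =~ /([aeiou][aeiou]).*\1/;    # "oo"

# With several groups, list context returns $1, $2, $3, ... in order
my ($gene, $start, $end) = "geneA:120-450" =~ /^(\w+):(\d+)-(\d+)$/;

print "$iss $pair $gene $start $end\n";
```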

  12. Repetition and greediness
     • By default, repetition is greedy: it matches as many characters as possible
     • You can make a quantifier non-greedy by adding ? after it (*?, +?, ??, {m,n}?)
     • In the examples below, showRE(str, re) is not a Perl built-in; it stands for a helper that marks the part of str matched by re with << >>
       $text = "The moon is made of cheese";
       showRE($text, qr/\w+/)            » <<The>> moon is made of cheese
       showRE($text, qr/\s.*\s/)         » The<< moon is made of >>cheese
       showRE($text, qr/\s.*?\s/)        » The<< moon >>is made of cheese
       showRE($text, qr/[aeiou]{2,99}/)  » The m<<oo>>n is made of cheese
       showRE($text, qr/mo?o/)           » The <<moo>>n is made of cheese
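A Perl version of the greediness examples. show_re is a hypothetical helper written here for illustration (it is not part of Perl); it wraps the matched part in << >> using $`, $& and $'.

```perl
use strict;
use warnings;

# Hypothetical helper: mark the part of $str matched by the compiled regex $re
sub show_re {
    my ($str, $re) = @_;
    return "no match" unless $str =~ $re;
    return $` . "<<" . $& . ">>" . $';
}

my $text = "The moon is made of cheese";
my @shown = map { show_re($text, $_) } (
    qr/\w+/,              # greedy run of word characters
    qr/\s.*\s/,           # greedy: stretches to the LAST space
    qr/\s.*?\s/,          # non-greedy: stops at the NEXT space
    qr/[aeiou]{2,99}/,    # two to 99 vowels in a row
    qr/mo?o/,             # m, an optional o, then o
);
print "$_\n" for @shown;
```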

  13. RegExp Substitutions
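A minimal sketch of Perl's substitution operator s/PATTERN/REPLACEMENT/ with a few common modifiers. The sample strings are made up, and the /r modifier assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

my $seq = "atg cgt aac";    # made-up spaced sequence

# s/// changes the first match only; copy first so $seq stays intact
(my $first = $seq) =~ s/ /-/;                    # "atg-cgt aac"

# /g replaces every match; s///g returns the number of replacements made
my $count = (my $joined = $seq) =~ s/\s+//g;     # "atgcgtaac", 2 replacements

# Captures from the pattern can be reused in the replacement
(my $swapped = "Waller, Fats") =~ s/^(\w+), (\w+)$/$2 $1/;    # "Fats Waller"

# /r returns a modified copy and leaves the original untouched
my $upper = $joined =~ s/([acgt])/\U$1/gr;       # "ATGCGTAAC"

# /i makes the pattern case-insensitive
(my $masked = "ACGTacgt") =~ s/a/N/gi;           # "NCGTNcgt"
```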

  14. Using RegExps
     • Repeated regexps with list context and /g
     • Single matches
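A sketch contrasting /g in list context, /g in scalar context (a while loop), and a single match without /g. The sequence is made up, and the ORF pattern is a hypothetical, simplified illustration rather than a real gene finder.

```perl
use strict;
use warnings;

my $dna = "ATGAAATAGCCATGTTTTGA";    # made-up sequence

# List context + /g: all matches come back at once
my @hits   = $dna =~ /ATG/g;             # ("ATG", "ATG")
my @codons = "ATGAAATAG" =~ /(...)/g;    # ("ATG", "AAA", "TAG")

# Scalar context + /g: each test resumes where the previous match ended;
# $-[0] is the offset where the current match started
my @offsets;
while ($dna =~ /ATG/g) {
    push @offsets, $-[0];               # 0, then 11
}

# Single match (no /g): only the first occurrence is found.
# Hypothetical mini ORF finder: ATG, then whole codons, lazily up to the first in-frame stop
my ($orf) = $dna =~ /(ATG(?:...)*?(?:TAA|TAG|TGA))/;    # "ATGAAATAG"

print "hits: @hits; offsets: @offsets; orf: $orf\n";
```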
