
Regular Expressions in Perl – Part 1

Regular Expressions in Perl – Part 1. Karthik Sangaiah. Perl Text Processing. Developed by Larry Wall “There’s more than one way to do it” “Easy things should be easy and hard things should be possible” Main purpose of Perl was for text manipulation


Presentation Transcript


  1. Regular Expressions in Perl – Part 1 Karthik Sangaiah

  2. Perl Text Processing
     • Developed by Larry Wall
     • “There’s more than one way to do it”
     • “Easy things should be easy and hard things should be possible”
     • Main purpose of Perl was text manipulation
     • Regular expressions are fundamental to text processing

  3. Regular Expressions (Regex)
     • String that describes a pattern
     • Simplest regex is a word
     • A regex consisting of a word matches any string that contains that word
       Ex:
         "Hello World" =~ /World/;

  4. Basic Regex Operators
     • “!~” operator produces TRUE if regex does NOT match a string
       Ex:
         if ("Sample Words" !~ /Sample/) {
             print "It doesn't match\n";
         } else {
             print "It matches\n";
         }
     • “=~” operator produces TRUE if regex matches a string
       Ex:
         if ("Sample Words" =~ /Sample/) {
             print "It matches\n";
         } else {
             print "It doesn't match\n";
         }

  5. Variables in Regex – Part 1
     • Can use a variable as a regex
       Ex:
         $temp = "ls";
         "ls -l" =~ /$temp/;
     • If using the default variable “$_”, “$_ =~” can be omitted
       Ex:
         $_ = "ls -l";
         if (/ls/) {
             print "It matches\n";
         } else {
             print "It doesn't match\n";
         }

  6. Variables in Regex – Part 2
     • Regexes in Perl are mostly treated as double-quoted strings
     • Values of variables in a regex are substituted in before the regex is evaluated for matching
       Ex:
         $foo = 'vision';
         'television' =~ /tele$foo/;
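
The two slides on variables can be tried out with a short script. This is a minimal sketch; the variable names ($suffix, $matched, $default_matched) are illustrative and not from the slides.

```perl
use strict;
use warnings;

# A variable inside the pattern is interpolated before matching,
# so /tele$suffix/ behaves like /television/.
my $suffix  = 'vision';
my $matched = 'television' =~ /tele$suffix/ ? 1 : 0;
print "interpolated match: $matched\n";

# With the default variable $_, the "$_ =~" part can be dropped.
my $default_matched = 0;
for ('ls -l') {                  # "for" aliases $_ to the string
    $default_matched = 1 if /ls/;
}
print "default-variable match: $default_matched\n";
```

Both flags end up as 1 here, since "television" contains "television" and "ls -l" contains "ls".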

  7. Delimiters
     • The default “/ /” delimiters can be changed to arbitrary delimiters by using “=~ m”
       Ex:
         "Sample Text" =~ m!Text!;
         "Sample Text" =~ m{Text};
         "Sample Text" =~ m"Text";

  8. Metacharacters
     • Reserved for use in regex notation
     • { }, [ ], ( ), ^, $, ., |, *, +, ?, \
     • To match a metacharacter literally, put “\” before it in the regex
       Ex:
         "5*2=10" =~ /5\*2/;
         "/usr/bin/perl" =~ /\/usr\/bin\/perl/;
     • “/” also needs to be backslashed if it’s used as the delimiter
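
As a rough illustration of escaping, the sketch below compares escaped and unescaped forms, and shows how an alternate delimiter avoids backslashing “/”. The variable names are made up for this example.

```perl
use strict;
use warnings;

my $path = '/usr/bin/perl';

# Default delimiters: every "/" inside the pattern must be backslashed.
my $escaped_slashes = $path =~ /\/usr\/bin\/perl/ ? 1 : 0;

# Alternate delimiters: "/" is an ordinary character inside m{...}.
my $brace_delims = $path =~ m{/usr/bin/perl} ? 1 : 0;

# An escaped "*" matches a literal asterisk.
my $literal_star = '5*2=10' =~ /5\*2/ ? 1 : 0;

# Unescaped "." matches any character; "\." matches only a period.
my $dot_any     = 'axb' =~ /a.b/  ? 1 : 0;
my $dot_literal = 'axb' =~ /a\.b/ ? 1 : 0;

print "$escaped_slashes $brace_delims $literal_star $dot_any $dot_literal\n";
```

Only the last test fails to match, because "axb" contains no literal period.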

  9. Anchor Metacharacters
     • “^” matches at beginning of string
     • “$” matches at end of string or before a newline at end of string
       Ex:
         "television" =~ /^tele/;
         "television" =~ /vision$/;
     • When “^” and “$” are used together, the regex has to match at both the beginning and the end of the string (i.e. match the whole string).
       Ex:
         "vision" =~ /^vision$/;
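
A small sketch of the anchor rules above, including the case where “$” matches just before a trailing newline. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Both anchors together: the pattern must cover the whole string.
my $whole   = 'vision'     =~ /^vision$/ ? 1 : 0;   # whole string matches
my $partial = 'television' =~ /^vision$/ ? 1 : 0;   # "tele" precedes "vision"

# "$" also matches before a newline that ends the string.
my $trailing_newline = "vision\n" =~ /^vision$/ ? 1 : 0;

print "whole=$whole partial=$partial trailing_newline=$trailing_newline\n";
```

This prints whole=1 partial=0 trailing_newline=1.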

  10. Character Classes – Part 1
      • Allows a set of possible characters, rather than a single character, to match
      • Character classes are denoted by […] with the set of characters to match inside
        Ex.
          /[btc]all/;           # matches ball, tall, or call
          /word[0123456789]/;   # matches word0…word9

  11. Character Classes – Part 2
      • Special characters in a character class are handled with backslash as well
      • Special characters within a character class: “-”, “]”, “\”, “^”, “$”
        Ex:
          /[\$c]w/;      # matches $w or cw
          $x = 'btc';
          /[$x]all/;     # matches ball, tall, or call
          /[\$x]all/;    # matches $all or xall
          /[\\$x]all/;   # matches \all, ball, tall, or call

  12. Character Classes – Part 3
      • Special char. “-” is used as the range operator
        Ex:
          /word[0-9]/;      # matches word0…word9
          /word[0-9a-z]/;   # matches word0…word9, or worda…wordz
      • Special char. “^” in the first position of a character class denotes a negated character class
        Ex:
          /[^0-9]/;         # matches a non-digit character
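
Ranges and negation can be checked against a handful of sample strings. The word list below is invented for this sketch; the anchors are added so that each class has to match exactly one trailing character.

```perl
use strict;
use warnings;

my @words = ('word7', 'wordq', 'wordQ', 'word_');

# Range class: a digit or a lowercase letter after "word".
my @in_range = grep { /^word[0-9a-z]$/ } @words;

# Negated class: any single character that is not a digit.
my @negated = grep { /^word[^0-9]$/ } @words;

print "in range: @in_range\n";   # word7 wordq
print "negated:  @negated\n";    # wordq wordQ word_
```

Note that "wordQ" is outside [0-9a-z] because ranges are case-sensitive.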

  13. Character Classes – Part 4
      • Common character class abbreviations:
        \d – digit, [0-9]
        \s – whitespace character, [\ \t\r\n\f]
        \w – word character (alphanumeric or _), [0-9a-zA-Z_]
        \D – negated \d
        \S – negated \s
        \W – negated \w
        .  – any character but “\n”
      • Abbreviations can be used inside and outside character classes
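
The abbreviations can be exercised on a sample string. This is only a sketch; the string "id_42\tready" and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $line = "id_42\tready";

my $two_digits = $line =~ /\d\d/   ? 1 : 0;   # "42"
my $across_tab = $line =~ /\w\s\w/ ? 1 : 0;   # "2", tab, "r"
my $non_word   = $line =~ /\W/     ? 1 : 0;   # the tab is not a word character
my $non_digit  = '2024' =~ /\D/    ? 1 : 0;   # no non-digit present

# Abbreviations also work inside a character class.
my $in_class = '3 apples' =~ /^[\d\s][\d\s]/ ? 1 : 0;   # "3" then a space

# "." does not match a newline by default.
my $dot_newline = "a\nb" =~ /a.b/ ? 1 : 0;

print "$two_digits $across_tab $non_word $non_digit $in_class $dot_newline\n";
```

This prints 1 1 1 0 1 0.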

  14. Word Anchors
      • “\b” matches the boundary between a word character and a non-word character
        Ex:
          $x = "Exam1 Question from Sample Exam";
          $x =~ /Exam/;       # matches Exam in Exam1
          $x =~ /\bExam/;     # matches Exam in Exam1 (start of string counts as a boundary)
          $x =~ /\bExam\b/;   # matches Exam at end of string
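
To see where each pattern actually matches, the sketch below reports the start offset of the match using Perl's built-in @- array ($-[0] is the offset where the last successful match began). The helper match_offset is a hypothetical name introduced for this example.

```perl
use strict;
use warnings;

# Return the offset where $regex first matches in $string, or -1.
sub match_offset {
    my ($string, $regex) = @_;
    return $string =~ $regex ? $-[0] : -1;
}

my $x = 'Exam1 Question from Sample Exam';

my $plain    = match_offset($x, qr/Exam/);       # "Exam" inside "Exam1"
my $leading  = match_offset($x, qr/\bExam/);     # still "Exam1": boundary at start
my $trailing = match_offset($x, qr/Exam\b/);     # "Exam1" fails, final "Exam" matches
my $both     = match_offset($x, qr/\bExam\b/);   # only the final "Exam"

print "$plain $leading $trailing $both\n";       # 0 0 27 27
```

The trailing “\b” is what rules out "Exam1", since "m" and "1" are both word characters and there is no boundary between them.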

  15. Single line vs. Multi-line – Part 1
      • Often, we want to match against lines and ignore newline characters
      • Sometimes we need to keep track of newlines
      • //s – Single line matching
      • //m – Multi-line matching
      • These modifiers affect two aspects of how the regex is interpreted:
        – How the “.” character class is defined
        – Where the anchors, ^ and $, are able to match

  16. Single line vs. Multi-line – Part 2
      • No modifier (//) – Default
        – . matches all characters but \n
        – ^ matches at beginning of string
        – $ matches at end of string or before a newline at the end of string
      • String as single long line (//s)
        – . matches any character
        – ^ matches at beginning of string
        – $ matches at end of string or before a newline at the end of string

  17. Single line vs. Multi-line – Part 3
      • String as multiple lines (//m)
        – . matches all characters but \n
        – ^ matches at beginning of any line within the string
        – $ matches at end of any line within the string
      • String as single long line, but detect multiple lines (//sm)
        – . matches any character
        – ^ matches at beginning of any line within the string
        – $ matches at end of any line within the string

  18. Single line vs. Multi-line – Part 4
        $x = "You will know how to use Perl\nFor text processing\n";
        $x =~ /^For/;     # No match, "For" not at start of string
        $x =~ /^For/s;    # No match, "For" not at start of string
        $x =~ /^For/m;    # match, "For" at start of second line
        $x =~ /^For/sm;   # match, "For" at start of second line
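
The slide above demonstrates the effect on “^”. A similar sketch for “.” and “$” is given below, using a shortened, made-up string and illustrative variable names.

```perl
use strict;
use warnings;

my $x = "Perl\nFor text\n";

# "." and the newline between the two lines
my $dot_default = $x =~ /Perl.For/  ? 1 : 0;   # "." cannot match "\n"
my $dot_s       = $x =~ /Perl.For/s ? 1 : 0;   # with //s it can

# "$" and the end of the first line
my $end_default = $x =~ /Perl$/  ? 1 : 0;      # "Perl" is not at end of string
my $end_m       = $x =~ /Perl$/m ? 1 : 0;      # with //m, "$" matches at any line end

print "$dot_default $dot_s $end_default $end_m\n";   # 0 1 0 1
```

The two modifiers are independent: //s changes only “.”, and //m changes only the anchors.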

  19. Alternation
      • Alternation metacharacter “|”
      • Used to match different possible words or character strings
      • Word 1 or word 2 -> /word1|word2/;
      • Perl tries to match the regex at the earliest possible point in the string
      • At a given position, the alternatives are tried from left to right and the first one that matches is used
        Ex.
          "shoes and strings" =~ /shoes|strings|and/;   # matches "shoes"
          "shoes" =~ /s|sh|sho|shoes/;                  # matches "s"
          "shoes" =~ /shoes|sho|s/;                     # matches "shoes"
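
Which alternative wins can be observed with the built-in $& variable, which holds the text of the last successful match. This is a sketch; the first pattern deliberately lists "shoes" last to show that the earliest position in the string takes priority over the order of the alternatives.

```perl
use strict;
use warnings;

# Earliest position wins: "shoes" starts at offset 0, even though
# it is the last alternative listed.
my $earliest = 'shoes and strings' =~ /strings|and|shoes/ ? $& : undef;

# At the same position, the first alternative that matches is used.
my $short_first = 'shoes' =~ /s|sh|sho|shoes/ ? $& : undef;
my $long_first  = 'shoes' =~ /shoes|sho|s/    ? $& : undef;

print "$earliest $short_first $long_first\n";   # shoes s shoes
```

Putting longer alternatives first is therefore a common way to get the longest match when the alternatives share a prefix.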

  20. Resources
      • Perl Resource 5: Perl Regular Expressions Tutorial
        http://www.cs.drexel.edu/~knowak/cs265_fall_2010/perlretut_2007.pdf
      • Perl History
        http://www.xmluk.org/perl-cgi-history-information.htm
      • Perl Special Variables
        http://www.kichwa.com/quik_ref/spec_variables.html

  21. Questions?
