LING/C SC/PSYC 438/538 Lecture 7 Sandiway Fong
Administrivia • Reminder • Perl homework on repeated word detection due Thursday!
Chapter 2: JM • Today • Let’s use your newly acquired Perl skills on Regular Expressions (section 2.1 of the textbook) • Online tutorials • http://perldoc.perl.org/perlrequick.html • http://perldoc.perl.org/perlretut.html
Pattern Matching JM, Chapter 2, pg 17 Merriam-Webster online
Chapter 2: JM
• Perl regular expression (re) matching:
  • $a =~ /foo/
  • /…/ contains a regular expression
  • evaluates to true or false depending on whether the contents of $a match
• Perl regular expression (re) match and substitute:
  • $a =~ s/foo/bar/
  • s/…match…/…substitute…/ contains two expressions
  • modifies $a by finding the first occurrence of match and replacing it with substitute
  • s/…match…/…substitute…/g performs a global match and substitute (every occurrence)
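A minimal sketch of the three operations on this slide: match, substitute once, and substitute globally. The sample string and the variable names ($text, $once, $all) are made up for illustration; the slide's $a is avoided only because $a is also used by Perl's sort.

```perl
use strict;
use warnings;

my $text = "foo foo foo";            # hypothetical sample string

# Match: true if $text contains "foo" anywhere
my $found = ($text =~ /foo/) ? 1 : 0;

# Substitute without /g: only the first occurrence is replaced
(my $once = $text) =~ s/foo/bar/;

# Substitute with /g: every occurrence is replaced
(my $all = $text) =~ s/foo/bar/g;

print "found=$found\n";              # found=1
print "once=$once\n";                # once=bar foo foo
print "all=$all\n";                  # all=bar bar bar
```

The `(my $copy = $text) =~ s/…/…/` idiom copies the string first, so the original stays unchanged for comparison.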
Chapter 2: JM
• Most useful with the code template for reading in a file line-by-line:
  open(my $txtfile, '<', $ARGV[0]) or die "$ARGV[0] not found!\n";
  while (my $line = <$txtfile>) {
      # do RE stuff with $line
  }
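One way the template might be filled in, as a sketch: count the lines that contain "foo". The helper name count_foo_lines and the sample data are hypothetical, and an in-memory filehandle stands in for $ARGV[0] so the example runs without a file on disk.

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines on a filehandle that contain "foo"
sub count_foo_lines {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /foo/;   # the "RE stuff" for each line
    }
    return $count;
}

# In-memory filehandle standing in for a real file (assumption for the demo)
my $data = "food\nbar\nfoo bar\n";
open(my $txtfile, '<', \$data) or die "cannot open in-memory file: $!\n";
my $n = count_foo_lines($txtfile);
close($txtfile);

print "$n\n";                         # 2
```

With a real file, replace `\$data` by `$ARGV[0]` as in the template.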
Chapter 2: JM character class: Perl lingo
Chapter 2: JM • backslash + lowercase letter names a character class (e.g. \d, \w, \s) • the uppercase variant matches everything but that class (e.g. \D, \W, \S)
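A short sketch of the lowercase/uppercase convention, using \d (digits) and \D (non-digits). The sample string is made up for illustration.

```perl
use strict;
use warnings;

my $s = "Room 42, floor 3";            # hypothetical sample string

my @numbers = $s =~ /\d+/g;            # maximal runs of digits
(my $no_digits   = $s) =~ s/\d//g;     # delete everything \d matches
(my $only_digits = $s) =~ s/\D//g;     # delete everything \D matches

print "@numbers\n";                    # 42 3
print "[$no_digits]\n";                # [Room , floor ]
print "[$only_digits]\n";              # [423]
```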
Chapter 2: JM Sheeptalk
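The slide's figure did not survive, so the following is only a sketch under an assumption: that "Sheeptalk" refers to the textbook's sheep language (baa!, baaa!, baaaa!, …), for which /baa+!/ is one possible pattern. The anchors ^ and $ are added here so that the whole string must be sheeptalk.

```perl
use strict;
use warnings;

# Assumed from the textbook: "b", then two or more "a"s, then "!"
my $sheep = qr/^baa+!$/;

my @results;
for my $s ("baa!", "baaaaa!", "ba!", "baa") {
    my $verdict = ($s =~ $sheep) ? "sheeptalk" : "not sheeptalk";
    push @results, "$s => $verdict";
    print "$s => $verdict\n";
}
```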
Chapter 2: JM
• Precedence of operators
  • Example: Column 1 Column 2 Column 3 …
  • /Column [0-9]+ */
  • /(Column [0-9]+ *)*/
  • /house(cat(s|)|)/
• Perl:
  • in a regular expression, the text matched by the pattern within a pair of parentheses is stored in the designated variables $1 (and $2 and so on)
• Precedence Hierarchy: space
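A sketch of how grouping changes what * applies to, and of how $1 and $2 are numbered by opening parenthesis. The extra outer parentheses in the first two patterns are added here only so that the matched text can be captured and printed; the variable names are hypothetical.

```perl
use strict;
use warnings;

my $header = "Column 1 Column 2 Column 3";

# Without grouping, * applies only to the space just before it
my ($one) = $header =~ /(Column [0-9]+ *)/;

# With grouping, * repeats the whole "Column N " unit.
# $all is the outer group; $last holds the final repetition of the inner group.
my ($all, $last) = $header =~ /((Column [0-9]+ *)*)/;

print "<$one>\n";      # <Column 1 >
print "<$all>\n";      # <Column 1 Column 2 Column 3>
print "<$last>\n";     # <Column 3>

# Nested groups: $1 is the outer (cat...) group, $2 the inner (s|) group
my ($g1, $g2);
if ("housecats" =~ /house(cat(s|)|)/) {
    ($g1, $g2) = ($1, $2);
    print "<$g1> <$g2>\n";   # <cats> <s>
}
```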
Chapter 2: JM • From http://perldoc.perl.org/perlretut.html • In scalar context, a match returns 1 (true) or "" (the empty string, false) • A shortcut: in list context, a match with parentheses returns the list of captured strings ($1, $2, …)
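A brief sketch of the two contexts. The time string and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $time = "12:34:56";                                # hypothetical sample

# Scalar context: the match returns 1 or the empty string
my $ok = ($time =~ /(\d+):(\d+):(\d+)/);
my $no = ($time =~ /xyz/);

# List context: the match returns the captured groups directly
my ($h, $m, $sec) = ($time =~ /(\d+):(\d+):(\d+)/);

print "ok=<$ok> no=<$no>\n";                          # ok=<1> no=<>
print "$h hours, $m minutes, $sec seconds\n";         # 12 hours, 34 minutes, 56 seconds
```

The list-context form avoids copying $1, $2, $3 into named variables by hand.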
Chapter 2: JM • s/([0-9]+)/<\1>/ : what does this do? • Backreferences give Perl regexps more expressive power than finite state automata (fsa)
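A sketch answering the slide's question, plus one illustration of the expressive-power point. In the replacement part the example writes $1 rather than \1; both refer to the first group there, and the Perl documentation recommends $1. The pattern /^(.+)\1$/ is an illustrative choice, not taken from the slides: it accepts only strings that are some substring written twice, which no finite state automaton can recognize in general.

```perl
use strict;
use warnings;

my $s = "abc 123 def 45";             # hypothetical sample string

# Wrap the first run of digits in angle brackets
(my $t = $s) =~ s/([0-9]+)/<$1>/;
print "$t\n";                         # abc <123> def 45

# Backreference inside the pattern: \1 must repeat whatever group 1 matched
my %doubled;
for my $w ("abab", "abcabc", "abba") {
    $doubled{$w} = ($w =~ /^(.+)\1$/) ? 1 : 0;
    print "$w: $doubled{$w}\n";       # abab: 1, abcabc: 1, abba: 0
}
```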
Shortest vs. Greedy Matching
• default behavior
  • in Perl RE match: longest possible matching string
  • aka "greedy matching"
  • this behavior can be changed, see following slide
• RE search is supposed to be fast
  • but search time is not necessarily proportional to the length of the input being searched
  • in fact, Perl RE matching can take exponential time (in the length of the input)
  • matching is non-deterministic
  • may need to backtrack (revisit earlier choices) if it matches incorrectly part of the way through
[Figure: two plots of time against length, one linear and one exponential]
Shortest vs. Greedy Matching
from http://www.perl.com/doc/manual/html/pod/perlre.html
• Example:
  $_ = "The food is under the bar in the barn.";
  if ( /foo(.*)bar/ ) { print "matched <$1>\n"; }
• Notes:
  • $_ is the default variable, and also the default target for matching: a bare /…/ is an implicit $_ =~ /…/
  • variable $1 refers to the parenthesized part of the match, (.*)
• Output:
  • matched <d is under the bar in the >
Shortest vs. Greedy Matching
from http://www.perl.com/doc/manual/html/pod/perlre.html
• Example:
  $_ = "The food is under the bar in the barn.";
  if ( /foo(.*?)bar/ ) { print "matched <$1>\n"; }
• Notes:
  • a ? immediately following a repetition operator like * makes the operator work in non-greedy mode
• Output:
  • matched <d is under the >
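The same contrast can be sketched with + and +? on a different string. The markup-like sample and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $s = "<b>bold</b> and <i>italic</i>";   # hypothetical sample string

# Greedy: .+ runs to the end, then backs off to the last ">"
my ($greedy) = $s =~ /<(.+)>/;

# Non-greedy: .+? stops at the first ">" that lets the match succeed
my ($lazy) = $s =~ /<(.+?)>/;

print "greedy: $greedy\n";   # greedy: b>bold</b> and <i>italic</i
print "lazy:   $lazy\n";     # lazy:   b
```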