LING/C SC/PSYC 438/538. Lecture 5 9/8 Sandiway Fong. Administrivia. Homework 1 (from lecture 3) was due last night (at midnight). Today’s Topics. Review Homework 1 We’ll go through it in class today Chapter 2 of JM Section 2.1 on regular expressions ( which you’ve already read … ).
LING/C SC/PSYC 438/538 Lecture 5 9/8 Sandiway Fong
Administrivia • Homework 1 (from lecture 3) • was due last night (at midnight)
Today’s Topics • Review • Homework 1 • We’ll go through it in class today • Chapter 2 of JM • Section 2.1 on regular expressions • (which you’ve already read…)
Safari Book available online (Thanks! Don Merson) • UA Library has been given access to the full Safari Books Online service. • This allows you to read a vast number of technical books via your browser. • However, it is currently only a trial. http://proquest.safaribooksonline.com.ezproxy1.library.arizona.edu/
Homework Review • Question 1: 438 and 538 (7 points) • Given • @sentence1 = (I, saw, the, the, cat, on, the, mat); • @sentence2 = (the, cat, sat, on, the, mat); • Write a simple Perl program which detects repeated words (many spell checker/grammar programs have this capability) • It should print a message stating the repeated word and its position if one exists • e.g. word 3 “the” is repeated in the case of sentence1 • No repeated words found in the case of sentence2 • note: output multiple messages if there are multiple repeated words • Hint: use a loop • Submit your Perl code and show examples of your program working
Homework Review • Thinking algorithmically… w1 w2 w3 w4 w5 Compare w1 with w2 Compare w2 with w3 Compare w3 with w4 Compare w4 with w5
Homework Review • Turning an algorithm into Perl code: • Array indices start from 0 and end at $#words, so array @words holds words[0], words[1], …, words[n-1] • Compare words[0] with words[1], then words[1] with words[2], …, and finally words[n-2] with words[n-1] • “for” loop implementation:
    for ($i=0; $i<$#words; $i++) {
        compare word indexed by $i to word indexed by $i+1
        if same string, print message
    }
Homework Review • First iteration (there are many ways to do this…) • (the basic for-loop)
    # qw(...) builds a list of quoted words, so no barewords are needed
    my @sentence1 = qw(I saw the the cat on the mat);
    my @sentence2 = qw(the cat sat on the mat);
    my @words = @sentence1;
    # stop one short of the last index, since each step looks at $i+1
    for (my $i=0; $i<$#words; $i++) {
        if ($words[$i] eq $words[$i+1]) {
            print "word $i \"$words[$i]\" is repeated\n"
        }
    }
Homework Review • 2nd iteration • (setting a flag when a repeated word is found) • (condition the output based on the value of the flag)
    my $flag = 0;
    for (my $i=0; $i<$#words; $i++) {
        if ($words[$i] eq $words[$i+1]) {
            print "word $i \"$words[$i]\" is repeated\n";
            $flag = 1
        }
    }
    print "No words repeated\n" unless $flag
Homework Review • 3rd iteration • (encapsulating the loop in a subroutine)
    sub check_repeated {
        my @words = @_;
        my $flag = 0;
        for (my $i=0; $i<$#words; $i++) {
            if ($words[$i] eq $words[$i+1]) {
                print "word $i \"$words[$i]\" is repeated\n";
                $flag = 1
            }
        }
        print "No words repeated\n" unless $flag
    }
    print "@sentence1\n";
    check_repeated(@sentence1);
    print "@sentence2\n";
    check_repeated(@sentence2);
Homework Review • Question 2: 438 and 538 (3 points) • Describe what it would take to stop a repeated word program from flagging legitimate examples of repeated words in a sentence • (No spell checker/grammar program that I know of has this capability) • Examples of legitimately repeated words: • I wish that that question had an answer • Because he had had too many beers already, he skipped the Friday office happy hour
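One crude way to approach Question 2 is sketched here, assuming a hand-built allow-list of words that may legitimately occur twice in a row. The names %allow_repeat and suspicious_repeats are hypothetical, and the list contents are only illustrative. This is not a real solution: "had had" can also be a typing error, and telling the two cases apart would need some analysis of sentence structure.

```perl
use strict;
use warnings;

# Hypothetical allow-list: words that can legitimately occur twice in a row.
my %allow_repeat = map { $_ => 1 } qw(that had);

# Return the 0-based positions i where word i equals word i+1
# and the word is not on the allow-list.
sub suspicious_repeats {
    my @words = @_;
    my @positions;
    for my $i (0 .. $#words - 1) {
        next unless lc $words[$i] eq lc $words[$i + 1];
        next if $allow_repeat{ lc $words[$i] };
        push @positions, $i;
    }
    return @positions;
}

my @s1 = qw(I saw the the cat on the mat);
my @s2 = qw(I wish that that question had an answer);
print "s1: @{[ suspicious_repeats(@s1) ]}\n";   # flags position 2
print "s2: @{[ suspicious_repeats(@s2) ]}\n";   # flags nothing
```

The sub returns positions instead of printing, so the caller decides how to report them.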
Homework Review • Question 3: 538 (10 points), (438 extra credit) • Write a simple Perl program that outputs word frequencies for a sentence • E.g. given • @sentence1 = (I, saw, the, cat, on, the, mat, by, the, saw, table); • output a summary that looks something like: • the occurs 3 times • saw occurs twice • I, cat, mat, on, by, table occur once only • Hint: build a hash keyed by word with value frequency • Submit your Perl code and show examples of your program working
Homework Review • Thinking algorithmically… • foreach $word (@sentence) visits w0, w1, w2, w3, w4, w5 in turn • hash data structure = “labeled medicine cabinet”
Homework Review • Sample answer
    @sentence = qw(the cat sat on the mat that the cat likes most);
    %freq = ();
    foreach $word (@sentence) {
        if (exists $freq{$word}) {
            $freq{$word}++;       # seen before: add one
        } else {
            $freq{$word} = 1;     # first time we see this word
        }
    }
    foreach $word (keys %freq) {
        print "$word occurs $freq{$word} time(s)\n";
    }
• Output of perl e2.prl (hash key order is not fixed, so the order of the lines may differ):
    on occurs 1 time(s)
    the occurs 3 time(s)
    cat occurs 2 time(s)
    most occurs 1 time(s)
    sat occurs 1 time(s)
    likes occurs 1 time(s)
    that occurs 1 time(s)
    mat occurs 1 time(s)
• Further simplifications to the code are possible but the basic logic remains
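One possible simplification is sketched below. It relies on the fact that applying ++ to a hash entry that does not exist yet creates it with the value 1, so the exists test can be dropped. The sub name word_freq and the sorted output are additions for illustration, not part of the sample answer.

```perl
use strict;
use warnings;

# Count how often each word occurs; returns a reference to a hash
# keyed by word with the frequency as value.
sub word_freq {
    my %freq;
    $freq{$_}++ for @_;   # a missing key starts as undef, and ++ makes it 1
    return \%freq;
}

my @sentence = qw(the cat sat on the mat that the cat likes most);
my $freq = word_freq(@sentence);

# Sort by descending frequency, then alphabetically, so the output order is stable.
for my $word (sort { $freq->{$b} <=> $freq->{$a} or $a cmp $b } keys %$freq) {
    print "$word occurs $freq->{$word} time(s)\n";
}
```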
Chapter 2: JM • Today • using your Perl skills on • Section 2.1 Regular Expressions • Online tutorials • http://perldoc.perl.org/perlrequick.html • http://perldoc.perl.org/perlretut.html
Pattern Matching JM, Chapter 2, pg 17 Merriam-Webster online
Chapter 2: JM • Perl regular expression (re) matching: • $a =~ /foo/ • /…/ contains a regular expression • evaluates to true/false depending on whether the contents of $a match • Perl regular expression (re) match and substitute: • $a =~ s/foo/bar/ • s/…match…/…substitute…/ contains two expressions • modifies $a by finding the first occurrence of match and replacing it with substitute • s/…match…/…substitute…/g performs a global match and substitute (every occurrence is replaced)
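A small sketch of the three forms above. The helper names replace_first and replace_all are hypothetical, and the variable is called $text here only because $a is also used by Perl's sort.

```perl
use strict;
use warnings;

# Replace only the first occurrence of "foo" with "bar".
sub replace_first { my ($s) = @_; $s =~ s/foo/bar/;  return $s; }

# Replace every occurrence of "foo" with "bar" (the /g modifier).
sub replace_all   { my ($s) = @_; $s =~ s/foo/bar/g; return $s; }

my $text = "foo and foo again";
print "match\n" if $text =~ /foo/;    # a plain match leaves $text unchanged
print replace_first($text), "\n";     # bar and foo again
print replace_all($text), "\n";       # bar and bar again
```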
Chapter 2: JM • Most useful with code for reading in a file line-by-line:
    # open the file named by the first command-line argument for reading
    open($txtfile, '<', $ARGV[0]) or die "$ARGV[0] not found!\n";
    while ($line = <$txtfile>) {
        # do RE stuff with $line
    }
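A minimal sketch of the "RE stuff" step, assuming we simply want to collect the lines that match a pattern. The sub name matching_lines is hypothetical, and an in-memory filehandle stands in for a real text file so the example runs on its own.

```perl
use strict;
use warnings;

# Return the lines read from filehandle $fh that match the regex $re.
sub matching_lines {
    my ($fh, $re) = @_;
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;                          # drop the trailing newline
        push @hits, $line if $line =~ $re;
    }
    return @hits;
}

# Stand-in for a text file: an in-memory filehandle over a string.
my $data = "the cat sat\non the mat\nthe end\n";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!\n";
print "$_\n" for matching_lines($fh, qr/^the/);   # lines starting with "the"
close($fh);
```

With a real file, the open call would take a file name such as $ARGV[0] in place of \$data.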
Chapter 2: JM Sheeptalk
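JM's sheep language is the set of strings baa!, baaa!, baaaa!, …, described by the pattern /baa+!/. A short sketch follows; the sub name is_sheeptalk is hypothetical, and the ^ and $ anchors are an added assumption so that the whole string, not just a substring, must be sheep talk.

```perl
use strict;
use warnings;

# True if the whole string is sheep talk: "b", at least two "a"s, then "!".
sub is_sheeptalk {
    my ($s) = @_;
    return $s =~ /^baa+!$/ ? 1 : 0;
}

for my $s ("baa!", "baaaaa!", "ba!", "baa") {
    printf "%-8s %s\n", $s, is_sheeptalk($s) ? "sheep talk" : "not sheep talk";
}
```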
Chapter 2: JM • Precedence of operators • Example: Column 1 Column 2 Column 3 … • /Column [0-9]+ */ • /(Column [0-9]+ *)*/ • /house(cat(s|)|)/ • Perl: • In a regular expression, the text matched by the pattern inside a pair of parentheses is stored in $1 (and $2 and so on) • Precedence Hierarchy:
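The effect of the parentheses in these patterns can be checked with a short sketch. The sub names matched_text and housecat_groups are hypothetical, and the ^ and $ anchors in the housecat pattern are an added assumption so that only the whole word is tested.

```perl
use strict;
use warnings;

# Return the substring matched by $re in $s ($& holds the last match),
# or undef if there is no match.
sub matched_text {
    my ($s, $re) = @_;
    return $s =~ $re ? $& : undef;
}

# Return ($1, $2) for /house(cat(s|)|)/, or an empty list on no match.
sub housecat_groups {
    my ($s) = @_;
    return $s =~ /^house(cat(s|)|)$/ ? ($1, $2) : ();
}

my $header = "Column 1 Column 2 Column 3";
# Without parentheses, * applies only to the preceding space: one unit matches.
print "'", matched_text($header, qr/Column [0-9]+ */), "'\n";     # 'Column 1 '
# With parentheses, * repeats the whole "Column N " unit.
print "'", matched_text($header, qr/(Column [0-9]+ *)*/), "'\n";  # 'Column 1 Column 2 Column 3'

my ($g1, $g2) = housecat_groups("housecats");
print "\$1 = '$g1', \$2 = '$g2'\n";                               # $1 = 'cats', $2 = 's'
```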
Chapter 2: JM http://perldoc.perl.org/perlretut.html A shortcut: list context for matching
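In list context, a successful match returns the captured groups directly, so they can be assigned without going through $1, $2, and so on. A minimal sketch, using an hh:mm:ss time string as an assumed input format; the sub name parse_time is hypothetical.

```perl
use strict;
use warnings;

# Split an "hh:mm:ss" string into its three fields using a match in list
# context; returns an empty list when the string does not match.
sub parse_time {
    my ($time) = @_;
    return $time =~ /^(\d\d):(\d\d):(\d\d)$/;
}

my ($hours, $minutes, $seconds) = parse_time("11:45:07");
print "hours=$hours minutes=$minutes seconds=$seconds\n";   # hours=11 minutes=45 seconds=07
```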
Chapter 2: JM • s/([0-9]+)/<\1>/
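A runnable sketch of this substitution. Inside the replacement part Perl prefers $1 to \1 (the \1 form still works but triggers a warning under use warnings), so $1 is used here. The sub name bracket_number is hypothetical.

```perl
use strict;
use warnings;

# Wrap the first run of digits in angle brackets.
sub bracket_number {
    my ($s) = @_;
    $s =~ s/([0-9]+)/<$1>/;   # $1 is the text captured by ([0-9]+)
    return $s;
}

print bracket_number("the 35 boxes"), "\n";   # the <35> boxes
```

Adding the /g modifier would bracket every run of digits instead of only the first.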