LING 388: Language and Computers
Sandiway Fong
Lecture 4: 8/30
Today’s Lecture
• Recap
• More on Perl and regexps
• Homework 1
  • due next Thursday
  • in my mailbox by midnight
Perl: recap
• Scalar variables: always prefixed by $, e.g. $count, $i
• Assignment and arithmetic expressions, e.g.
  $count = 0;
  $count = $count + 1;
  $count++; (auto-increment)
• Arithmetic operators: + addition, - subtraction, * multiplication, ** exponentiation, / division
• Variables and strings:
  $i = "this";
  $i = $i . " moment";
  . is the string concatenation operator
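The recap can be tried as one small script. This is a minimal sketch: `use strict`, `use warnings` and the `my` declarations are additions not on the slide (they make Perl insist that variables are declared), and the variable names other than $count and $i are made up for illustration.

```perl
use strict;
use warnings;

# Scalar variables carry the $ sigil
my $count = 0;
$count = $count + 1;     # explicit assignment: $count is now 1
$count++;                # auto-increment: $count is now 2

# The arithmetic operators from the slide
my $sum  = 7 + 2;        # 9
my $diff = 7 - 2;        # 5
my $prod = 7 * 2;        # 14
my $pow  = 7 ** 2;       # 49
my $quot = 7 / 2;        # 3.5: / does not truncate to an integer

# String concatenation with the . operator
my $i = "this";
$i = $i . " moment";     # "this moment"

print "count=$count sum=$sum diff=$diff prod=$prod pow=$pow quot=$quot\n";
print "$i\n";
```

Running it should print `count=2 sum=9 diff=5 prod=14 pow=49 quot=3.5` followed by `this moment`.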
Perl: recap
• Example:
  $i = 99; $j = 100;
  if ($j > $i) { print "$j greater than $i\n" } else { print "$j less than $i\n" }
• Substitute gt for > and a surprising result obtains: the program prints "100 less than 99"
  • reason: string comparison proceeds character by character (left to right), and the ASCII code of 1 is 49, which is less than 57, the code of 9
• Numeric comparisons: == equality, != inequality, < less than, > greater than, <= less than or equal, >= greater than or equal
• String comparisons: eq equality, ne inequality, lt less than, gt greater than, le less than or equal, ge greater than or equal
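A minimal sketch that runs the slide's example with both operators side by side, so the difference between numeric and string comparison is visible in one run. `use strict`, `use warnings`, `my`, and the names $numeric and $string are additions for illustration.

```perl
use strict;
use warnings;

my $i = 99;
my $j = 100;

# Numeric comparison: 100 > 99 is true
my $numeric = ($j > $i) ? "$j greater than $i" : "$j less than $i";

# String comparison: "100" vs "99" is decided by the first characters,
# '1' (ASCII 49) vs '9' (ASCII 57), so "100" gt "99" is false
my $string = ($j gt $i) ? "$j greater than $i" : "$j less than $i";

print "numeric: $numeric\n";   # numeric: 100 greater than 99
print "string:  $string\n";    # string:  100 less than 99
printf "ord('1') = %d, ord('9') = %d\n", ord('1'), ord('9');   # 49, 57
```

The built-in `ord` returns a character's code, which confirms the 49 vs 57 explanation.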
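More Perl
• Iteration:
  (while loop) $i = 10; while ($i > 0) { $i-- }
  counts $i down: 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 (the body runs for $i = 10 through 1; the loop exits when $i reaches 0)
  (for loop) $max = 7; for ($i = 0; $i <= $max; $i++) { ... }
  counts $i up: 0, 1, 2, 3, 4, 5, 6, 7, 8 (the body runs for $i = 0 through 7; the loop exits when $i reaches 8)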
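To check the value sequences, here is a minimal sketch that records what $i holds in each loop. The arrays @down and @up are hypothetical names added only to collect the values; `use strict`, `use warnings` and `my` are also additions.

```perl
use strict;
use warnings;

# while loop from the slide, recording every value $i takes
my $i = 10;
my @down = ($i);             # starting value
while ($i > 0) {
    $i--;
    push @down, $i;          # value after each decrement
}
print "while: @down\n";      # while: 10 9 8 7 6 5 4 3 2 1 0

# for loop from the slide, recording the values seen inside the body
my $max = 7;
my @up;
for ($i = 0; $i <= $max; $i++) {
    push @up, $i;
}
print "for body: @up\n";     # for body: 0 1 2 3 4 5 6 7
print "after loop: $i\n";    # after loop: 8
```

A C-style `for` reuses the existing $i rather than a loop-local copy, which is why its final value, 8, is still visible after the loop.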
Perl and regexps
• We have already seen how to incorporate regexp matching in a Perl program:
  open (F, $ARGV[0]) or die "$ARGV[0] not found!\n";
  while (<F>) { print $_ if (/regexp/); }
• By default, /regexp/ matches against the value of the variable $_ (filled in by <F>, one line at a time)
• We can also match against a variable of our own choosing using the =~ operator:
  $x = "this string";
  if ($x =~ /^this/) { print "ok" }
• Matching is case sensitive by default; this can be changed using the modifier i: /regexp/i
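So that the example runs without a data file, this sketch assumes a small hypothetical text held in a string and reads it through an in-memory filehandle (opening a reference to a string) in place of the file named by $ARGV[0]. It counts lines matched with and without the i modifier, then shows =~ on a variable of our own. The counter names are illustrative.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for the file named by $ARGV[0].
# Opening a reference to a string gives an in-memory filehandle.
my $text = "The cat sat\nthe mat\nno match here\n";
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!\n";

my ($sensitive, $insensitive) = (0, 0);
while (<$fh>) {                   # each line is read into $_
    $sensitive++   if /the/;      # matches against $_, case sensitive
    $insensitive++ if /the/i;     # i modifier: case insensitive
}
close($fh);
print "/the/ matched $sensitive line(s)\n";     # 1 ("the mat")
print "/the/i matched $insensitive line(s)\n";  # 2 (also "The cat sat")

# Matching a variable of our own choosing with =~
my $x = "this string";
if ($x =~ /^this/) { print "ok\n" }
```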
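Perl and regexps
• Multiple matches within a string can be made using the g modifier with a loop:
  $x = "the cat sat on the mat";
  while ($x =~ /the/) { print "match!\n" }
  goes into an infinite loop and keeps printing match!, since every test starts again from the beginning of the string
  whereas:
  while ($x =~ /the/g) { print "match!\n" }
  prints match! twice: with g, each match resumes where the previous one ended, and the loop stops when no further match is found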
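A minimal sketch of the g loop that also records where each match ends, using the built-in `pos`, plus the list-context form of /g that returns all matches at once. @positions and @all are illustrative names, not from the slide.

```perl
use strict;
use warnings;

my $x = "the cat sat on the mat";

# With /g in scalar context, each successful match remembers where it
# stopped (pos($x)), so the next test searches from there; the loop
# ends when no further match is found.
my @positions;
while ($x =~ /the/g) {
    print "match!\n";
    push @positions, pos($x);   # offset just past this match
}
print "matches end at: @positions\n";   # matches end at: 3 18

# In list context, /g returns every match at once
my @all = ($x =~ /the/g);
print "found ", scalar(@all), " matches\n";   # found 2 matches
```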
Perl and regexps
• Grouping uses the metacharacters ( and ) to delimit a group
  • inside a regexp, each group can be referenced using \1, \2, and so on
  • outside a regexp, each group is stored in a variable $1, $2, and so on
• Example: doubled vowel ([aeiou])\1 matches heed and book but not head
  • cf. [aeiou][aeiou], which matches any two vowels in a row, so it matches head as well
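The doubled-vowel example can be checked directly. This sketch applies both patterns to the three words from the slide with `grep`, then reads the captured vowel back through $1 outside the regexp. The array and variable names are illustrative additions.

```perl
use strict;
use warnings;

my @words = qw(heed book head);

# ([aeiou])\1 : a vowel captured as group 1, then the same vowel again
my @doubled = grep { /([aeiou])\1/ } @words;      # heed, book

# [aeiou][aeiou] : any two vowels in a row, not necessarily identical
my @any_two = grep { /[aeiou][aeiou]/ } @words;   # heed, book, head

print "doubled vowel: @doubled\n";
print "two vowels:    @any_two\n";

# Outside the regexp, the text captured by group 1 is available as $1
my $vowel = "book" =~ /([aeiou])\1/ ? $1 : "";
print "book doubles the vowel '$vowel'\n";        # 'o'
```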
Homework 1
• out today
• due next Thursday
• in my mailbox by midnight
Homework 1
• Data:
  • text file wsj500.txt
  • download from course webpage
  • make sure the newlines are correct for your platform
  • 500 sentences from the Wall Street Journal (WSJ) part of the Penn Treebank
  • one sentence per line
  • words, and also punctuation, are separated by spaces
Homework 1
Question 1
• Write a Perl program to count the number of lines in a file and print the result
• Submit your program
• Demonstrate that it works on the test file (copy the output of the command interpreter)
Homework 1
Question 2
• Write a Perl program to count the number of words in wsj500.txt that satisfy the following criteria:
  • there are two identical vowels in a row within the word, and
  • the word also ends in (lowercase) s
Question 3
• Modify your Perl program from Question 2 to print out what those words are