Introduction to Perl Pawel Sirotkin 28.11-01.12.2008, Riga
Overview
• About programming
• Why Perl?
• How to write, how to run
• Variables
• Operations
• Basic input and output
• Conditionals and loops
• Regular expressions
Introduction to Perl, NLL Riga 2008, by Pawel Sirotkin

About programming
• Working with algorithms
• A program needs to contain exact commands
  • (Mostly) not: Go buy some bread
  • But: Put on your coat and shoes, open the door, go through it, close the door, go down the stairs…
• A program has a certain input
• It processes that input
• It produces a certain output

Why Perl?
• Easy to learn
• Simple syntax
• Good at manipulating text
• Good at dealing with regular expressions

How to write a Perl program
• Perl programs can be written in any text editor
  • Notepad, vim, even Word…
  • Recommended: a simple text editor with syntax highlighting
• Write the program code
• Save the file as xxx.pl
  • The .pl extension is not necessary, but useful

What is a Perl program like?
    # This *very* simple program prints "Hello World!"
    print "Hello World!";

What is a Perl program like?
• Everything on a line after the # is a comment. It is ignored by Perl
• What are comments for, then?
  • They are for you, and for others who will have to read the code
  • Imagine looking at a complex program in a few months and trying to figure out what it does
• Write as many comments as you can

What is a Perl program like?
• print "Hello World!"; is a Perl command
  • In this case, for printing text on the screen
• Every command should start on a new line
  • Not a Perl requirement, but crucial for readability
• Every command should end with a semicolon;
• Many commands take arguments
  • Here: "Hello World!"

What to do with the program?
• Perl works from the command line
  • Windows: "Start", "Run…"
• Go to the directory where you saved the program
  • E.g.: cd C:\Perl\MyPrograms
• Run the program:
  • perl myprogram.pl
• See the results of your labours!

Exercise (1)
• Create a folder for your Perl programs
• Open the editor of your choice and write the "Hello World" program
  • The command is print "Hello World!";
  • Don't forget the comment!
• Save the program
• Run it!
• What happens if you mistype the print command?

Variables
• The "Hello World" program always has the same output
  • Not a very useful program, as such
• We need to be able to change the output
• Variables are objects that can hold different values

Defining variables
    # We define a variable "a" and assign it a value of 42
    $a = 42;
• To define a variable, write a dollar sign followed by the variable's name
• Names should consist of letters, digits and the underscore
  • They should start with a letter
• Variable names are case-sensitive!
  • $a and $A are different variables!
• Generally, a variable's name should tell you what the variable holds

Defining variables
• Variables can be assigned values
  • Strings: text (a character sequence) in single or double quotes
  • Numbers
• $a = 42;
• $a = "some text";

Changing variables
• Arithmetic operations
  • $a = 42 / 2; # division
  • $a = 42 + 5; # addition
  • $a = $b * 2; # multiplication
  • $a = $a - $b; # subtraction
• Also useful:
  • $a += 42; # the same as $a = $a + 42;
  • The same works for -, * and /
• String operations
  • $a = "some" . " text"; # concatenation
  • $a = $a . " more text";

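The operations above can be tried out in one short script. This is only a minimal sketch: the variable names are made up, `use strict`, `use warnings` and `my` are additions that the slides do not use, and `.=` is assumed here as the string counterpart of `+=`.

```perl
use strict;
use warnings;

my $count = 42;
$count += 8;    # same as $count = $count + 8;  now 50
$count -= 20;   # now 30
$count *= 2;    # now 60
$count /= 4;    # now 15

my $text = "some" . " text";   # concatenation: "some text"
$text .= " and more";          # same as $text = $text . " and more";

print "$count\n";
print "$text\n";
```

With `use strict`, every variable has to be declared with `my` before its first use, which helps catch misspelled variable names early.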
Basic output
• We have already seen an output command
  • print "text";
  • print $a;
  • print "text $a";
  • print "text " . ($a + $b) . " more text.";
    • The parentheses are needed: . and + have the same precedence, so without them Perl would concatenate first and add afterwards
• Special characters:
  • \n: new line
  • \t: tab

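A small sketch of output with a computed value and the special characters. In Perl, `.` and `+` share one precedence level and are evaluated left to right, so the sum is put in parentheses here. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $apples = 3;
my $pears  = 4;

# Parentheses make sure the sum is computed before it is concatenated.
my $message = "We have " . ($apples + $pears) . " pieces of fruit.\n";
print $message;

# Inside double quotes, \t inserts a tab and \n ends the line.
print "apples:\t$apples\n";
print "pears:\t$pears\n";
```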
Exercise (2)
• Define a variable
• Assign it a value of 15
• Print it
• Double the value
• Print it again
• Define another variable with the string "apples"
• Print both variables
• Change the first variable to its square and the second to "pears"
• Print both variables

Basic input
• The <> operator reads one line of input from the standard source (usually the keyboard)
• Syntax:
  • $a = <>;
• Don't forget to tell the user what they are supposed to enter!
• Try the following program:
    # This program asks the user for their name and greets them
    print "What is your name? ";
    $name = <>;
    print "Hello $name!";

Input, output and new lines
• As the user input is followed by the [Enter] key, the string in $name ends in a new line
• The chomp function deletes the new line at the end of a string
• Try the following, modified program:
    # This program asks the user for their name and greets them
    print "What is your name? ";
    $name = <>;
    chomp($name);
    print "Hello $name!";

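To see what chomp does without typing anything, the sketch below uses a fixed string in place of real keyboard input. The name "Anna" and the length checks are illustrative assumptions, not part of the slides.

```perl
use strict;
use warnings;

# Stand-in for a line typed by the user: it ends with the newline from [Enter].
my $name = "Anna\n";

my $before = length($name);   # four letters plus the newline
chomp($name);                 # removes the trailing newline, if there is one
my $after  = length($name);   # only the four letters remain

print "Hello $name! (length went from $before to $after)\n";
```

Calling chomp on a string that has no trailing newline leaves it unchanged, so it is safe to call it on every input line.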
Exercise (3)
• Let the user enter the radius of a circle
• Tell them the diameter (2r), circumference (2πr) and area (πr²) of the circle
• Try doing this using one variable for each measure
• Try doing this using only one variable

If, else
• Until now, the course the program runs is fixed
• The if clause allows us to take different actions in different circumstances
    # Let's try out a conditional clause
    print "Please enter password: ";
    $password = <>;
    if ($password == 42) {
        print "Correct password! Welcome.";
    }
    else {
        print "Wrong password! Access denied.";
    }

If, else
• Note: = is the assignment operator, == is the (numeric) comparison operator
• else is an optional clause; its block runs when the if condition is false

Exercise (4)
• Try out the password program.
  • Why doesn't it work correctly? Fix it.
• Tell the user if the number they entered is too large or too small
  • Hint: The comparison operators you'll need are < and >
• Ask the user for a geometric shape (circle or square), and then for a radius or side length. Return the area and perimeter.

While
• What if we want to do checks until something happens?
• The while loop repeats its commands as long as its condition is true
  • Here: as long as the password is not 42
• Note: in the example below, $password has no value at first, so it is not equal to 42 and the loop body runs at least once
    # Now on to a "while" loop
    while ($password != 42) {
        print "Access denied.\n";
        print "Please enter password: ";
        $password = <>;
        chomp($password);
    }
    print "Correct password! Welcome.";

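The loop logic can also be traced without keyboard input. In this sketch a hypothetical list of attempts stands in for what a user would type, and `shift` (which takes the first element off an array) is an assumption beyond what the slides introduce.

```perl
use strict;
use warnings;

# Hypothetical attempts standing in for user input.
my @attempts = (7, 13, 42, 99);
my $tries    = 0;
my $password = 0;

# The body repeats as long as the condition is true.
while ($password != 42) {
    $password = shift(@attempts);   # take the next attempt from the front
    $tries++;
    print "Attempt $tries: $password\n";
}
print "Correct password after $tries tries.\n";
```

The last value, 99, is never read: once the condition becomes false the loop stops.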
Exercise (5)
• Write a small game: take a number, and make the user guess it. Tell them if the guess is too high or too low. If the user gets it right, the program terminates.
• If you like, you can take a random number:
    $random = int(rand(10));   # a whole number from 0 to 9

Perl regular expressions
• Regular expressions are very useful for text processing
• Perl matching operator: =~
• Perl non-matching operator: !~
• The regular expression must be enclosed in forward slashes: /regex/
• The program below accepts any password that contains the characters "42" anywhere
    # A "while" loop with regular expressions
    while ($password !~ /42/) {   # While the entered line doesn't contain "42"
        print "Access denied.\n";
        print "Please enter password: ";
        $password = <>;
        chomp($password);
    }
    print "Correct password! Welcome.";

Perl regular expressions
• Simple string: some text
• One of a number of symbols: [aA]
  • Matches a or A
  • Also possible: [tT]he, matching the or The
• One of a continuous range of symbols: [a-h][1-8]
  • Matches any two-character string from a1 to h8
• Special characters
  • ^ matches the beginning of a line
  • $ matches the end of a line

Perl regular expressions
• More special characters
• Wildcard: the dot . matches any single character
  • b.d matches bad, bed, bid, bud…
  • Don't forget: it also matches forbid, badly…
• + matches one or more occurrences of the previous character
  • re+d matches red and reed (and also reeed and so on!)
• * matches zero or more occurrences of the previous character
  • bel* matches be, bel and bell (and belll…)
• ? matches zero or one occurrence of the previous character
  • soo?n matches son or soon

Perl regular expressions
• Character classes
• \d: digits
  • Rule \d+ matches Rule 1, Rule 2, ..., Rule 334...
• \w: "word characters" (letters, digits, _)
  • \w+ \w+: any two "words" separated by a blank
• \s: any whitespace (blanks, tabs)
  • ^\s*\d: any line where the first non-whitespace character is a digit
• Capitalize the symbols to get the opposite
  • \S is anything but whitespace, \D are non-digits…

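A few of the patterns from these slides can be checked against sample strings. This is a sketch with made-up test lines; it uses an array, `foreach` and `join`, which go slightly beyond what has been introduced at this point in the course.

```perl
use strict;
use warnings;

my @lines = ("Rule 12", "  7 dwarfs", "bad weather", "reed");
my @matched;   # records which pattern matched, in order

foreach my $line (@lines) {
    if ($line =~ /^Rule \d+$/) { push(@matched, "rule");  }   # "Rule" plus digits
    if ($line =~ /^\s*\d/)     { push(@matched, "digit"); }   # first non-blank is a digit
    if ($line =~ /b.d/)        { push(@matched, "b.d");   }   # b, any character, d
    if ($line =~ /^re+d$/)     { push(@matched, "re+d");  }   # r, one or more e, d
}

print join(", ", @matched), "\n";   # prints: rule, digit, b.d, re+d
```

Each sample line matches exactly one of the four patterns, so the output lists the patterns in the order of the lines.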
Exercise (6)
• Write a program which asks the user for their e-mail address.
• Check if the address is syntactically correct.
• Possible rules:
  • Must contain an @ character
  • At least one symbol before it
  • Must contain a dot
  • At least two symbols between @ and .
  • At least two symbols after .
  • No fancy symbols like {§*
• Do you accept addresses with more than one dot?

Perl regular expressions
• Switches
  • Tell Perl how to deal with the regular expression
• /regex/i: ignore lower/upper case
  • /wiebke/i matches Wiebke and wiebke
• s/regex/replacement/: substitute the text matched by regex with replacement
  • $text =~ s/Mark/Euro/;
• /regex/g: apply to every match in the string, not only the first one
    # What the //g switch does
    $text = "The meat costs 10 Mark, the fish costs 15 Mark.";
    $text2 = $text;
    $text =~ s/Mark/Euro/;     # "The meat costs 10 Euro, the fish costs 15 Mark."
    $text2 =~ s/Mark/Euro/g;   # "The meat costs 10 Euro, the fish costs 15 Euro."

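Switches can be combined. The sketch below uses a variant of the sentence above with one lower-case "mark" (an assumption made for illustration) and relies on the fact that a substitution returns the number of replacements it made.

```perl
use strict;
use warnings;

my $text = "The meat costs 10 Mark, the fish costs 15 mark.";

my $first_only = $text;
$first_only =~ s/mark/Euro/i;            # /i: ignore case, first match only

my $all   = $text;
my $count = ($all =~ s/mark/Euro/gi);    # /g and /i together: every match

print "$first_only\n";
print "$all\n";
print "Replacements: $count\n";
```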
Perl regular expressions
• Grouping
  • Allows us to use the matched string
• /(text)/ matches text and stores it in a variable
  • The first group is stored in $1, the second in $2...
    # Substitution and grouping
    $sum = 0;   # initializing the variable with zero
    $text = "The meat costs 10 Mark, the fish costs 15 Mark.";
    while ($text =~ s/(\d+) Mark/$1 Euro/) {   # digits, a blank, "Mark"
        $sum = $sum + $1;   # adding amount to $sum value
    }
    print "Substituted $sum Mark for Euro!";

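A pattern can contain several groups at once. This sketch splits a made-up price entry into three parts; the entry format and the variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my $entry = "fish: 15 Mark";
my ($item, $amount, $currency);

# Three groups: a word, a number and another word.
if ($entry =~ /^(\w+): (\d+) (\w+)$/) {
    ($item, $amount, $currency) = ($1, $2, $3);
    print "$item costs $amount $currency\n";
}
```

Copying $1, $2 and $3 into named variables right after the match is a common habit, because the next successful match overwrites them.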
Reading files
• What if we want to have input from a file, not from the user?
• Open file for reading:
  • open(INPUT, "<file.ext");
• Read a line:
  • $line = <INPUT>;
  • $line = <>; # is just a special case, with no filehandle given

Writing files
• What if we want to print to a file, not to the screen?
• Open file for writing:
  • open(OUTPUT, ">file.ext");
• Write:
  • print OUTPUT "Some text...";

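Writing and reading can be combined into one round trip. The sketch below is one possible way to do it, not the form used on the slides: it uses the three-argument form of open with filehandles stored in `my` variables, adds `or die` so that a failed open is reported, and works on a hypothetical file name that is deleted again at the end.

```perl
use strict;
use warnings;

my $file = "perl_intro_demo.txt";   # hypothetical file name

# Write two lines; "or die" stops the program if the file cannot be opened.
open(my $out, ">", $file) or die "Cannot write $file: $!";
print $out "first line\n";
print $out "second line\n";
close($out);

# Read the file back line by line.
my @lines;
open(my $in, "<", $file) or die "Cannot read $file: $!";
while (my $line = <$in>) {
    chomp($line);
    push(@lines, $line);
}
close($in);

unlink($file);   # remove the demo file again
print scalar(@lines), " lines read\n";
```

Closing the output file before reading it makes sure that everything printed to it has actually been written.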
Reading files
• A program for testing e-mail addresses
• Note: If we want to use a special character literally, we need to escape it with a backslash
  • In strings: "
  • In regular expressions: . + * ^ $ and the backslash \ itself
    open(INPUT, "<test.txt");
    while ($line = <INPUT>) {
        chomp($line);
        if ($line =~ /^.+@..+\...+$/) {   # testing for e-mail: x@xx.xx
            print "\"$line\" is a valid e-mail address.\n";
        }
        else {
            print "E-mail address \"$line\" not valid.\n";
        }
    }

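The pattern can be tried on a few strings without a test file. In this sketch the helper `looks_like_email` and the sample addresses are hypothetical, and the @ is written as `\@`, a cautious choice that keeps Perl from ever treating it as the start of an array name.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the x@xx.xx pattern.
sub looks_like_email {
    my ($address) = @_;
    if ($address =~ /^.+\@..+\...+$/) {
        return 1;
    }
    return 0;
}

# Single quotes keep Perl from interpreting the @ inside the strings.
foreach my $candidate ('anna@example.org', 'a@b.c', 'no-at-sign.org') {
    print "$candidate: ", looks_like_email($candidate), "\n";
}
```

Only the first address passes. The pattern is still very permissive: the dot matches any character, so blanks and unusual symbols are accepted as well.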
Exercise (7)
• Make a text file and fill it with a Wikipedia article
• Count the number of definite and indefinite articles
• Count the number of numbers and digits
• Insert a <number!> tag before every number

Arrays
• Arrays contain ordered lists of values
• Syntax:
  • @days = ("Monday", "Tuesday", "Friday");
  • $days[0] = "Saturday";
  • $day = $days[2];
• Useful for storing linear sequences of values
• Note: @ for the whole array, $ for a single element

Arrays
• Useful array commands
• push(@array, "element");
  • Adds a new element to the end of the array
  • Creates the array if necessary
• $element = pop(@array);
  • Removes the last value of @array and stores it in $element
    # Trying out arrays
    @tags = ("N", "V", "Adj");
    $tag1 = pop(@tags);          # $tag1 is now "Adj", @tags is ("N", "V")
    $tag2 = pop(@tags);          # $tag2 is now "V", @tags is ("N")
    push(@tags, $tag2, $tag1);   # @tags is now again ("N", "V", "Adj")

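A short sketch that puts the array commands together and walks over the elements. `foreach`, `scalar` and `$#days` (the index of the last element) are not on the slides and are used here as assumptions about what is handy to know next.

```perl
use strict;
use warnings;

my @days = ("Monday", "Tuesday", "Friday");

push(@days, "Saturday");           # add to the end
my $last       = pop(@days);       # remove from the end again
my $count      = scalar(@days);    # number of elements
my $last_index = $#days;           # index of the last element (counting from 0)

foreach my $day (@days) {
    print "$day\n";
}
print "count: $count, last index: $last_index, popped: $last\n";
```

Because indices start at 0, the last index is always one less than the number of elements.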
Hashes
• Hashes are associative arrays
• They are lists where the elements are not ordered, but identified by a "name" (the key)
• Syntax:
  • %probability = ("verb", 0.32, "adjective", 0.02, "adverb", 0);
  • $probability{"noun"} = 0.52;

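To look at everything stored in a hash, one can loop over its keys. The sketch below assumes the built-in functions `keys`, `sort` and `exists`, which the slides do not cover.

```perl
use strict;
use warnings;

my %probability = ("verb", 0.32, "adjective", 0.02, "adverb", 0);
$probability{"noun"} = 0.52;

# keys returns the names in no particular order, so they are sorted here.
foreach my $tag (sort keys %probability) {
    print "$tag\t$probability{$tag}\n";
}

my $entries = scalar(keys %probability);   # number of key/value pairs

# exists checks for a key without creating it.
if (exists $probability{"pronoun"}) {
    print "pronoun is known\n";
}
else {
    print "pronoun is not known ($entries entries in total)\n";
}
```

Note that "adverb" exists even though its value is 0: `exists` tests for the key, not for a true value.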
Exercise (8)
• What happens if you try to print an array?
• What about a hash?
• What happens if you convert an array into a hash, or the other way round?

Practical: Tokenizer
• Take a Wikipedia article and put it into a text file
  • Clean it up if necessary
• Tokenize it!
  • We only want one word per line
  • Insert a "sentence boundary" symbol where appropriate
• The output should be another file
• Think about what choices you make and why!

Practical: Tagger
• Take the POS-annotated corpus from treebank.txt
• Clean and tokenize it
• Count the tag-token probabilities
• Count the transition probabilities
  • For a first attempt, I strongly recommend bigrams
• Apply the Viterbi algorithm and tag an input file of your choice!

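The counting steps could start out roughly like the sketch below. It is not a solution for treebank.txt: the word/TAG format, the toy corpus, the sentence-start marker and all variable names are assumptions, and the hashes of hashes go beyond what the slides introduce.

```perl
use strict;
use warnings;

# Assumed toy format: tokens written as word/TAG, separated by blanks.
my $corpus = "the/DET dog/N barks/V the/DET cat/N sleeps/V";

my %tag_count;    # how often each tag occurs
my %token_tag;    # how often a token occurs with a tag
my %transition;   # how often one tag follows another (bigrams)
my $previous = "<s>";   # assumed marker for the start of the text

foreach my $pair (split(/\s+/, $corpus)) {
    if ($pair =~ /^(.+)\/(.+)$/) {
        my ($token, $tag) = ($1, $2);
        $tag_count{$tag}++;
        $token_tag{$token}{$tag}++;
        $transition{$previous}{$tag}++;
        $previous = $tag;
    }
}

# Relative frequency as a simple estimate of P(token | tag).
my $p_dog_given_n = $token_tag{"dog"}{"N"} / $tag_count{"N"};
print "P(dog | N) = $p_dog_given_n\n";
print "DET followed by N: $transition{DET}{N} times\n";
```

Dividing a count by the total for its tag gives a relative frequency; with real data, unseen combinations have no entry at all and need separate handling.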
Practical: Tagger++
• If it's still too easy, or if you want a long-term aim:
• Implement smoothing: words can have tags you haven't seen them with, or appear in contexts you have never seen them in before
• Try to figure out a way to guess the tags for unknown words better
• Write a program to train on 9/10 of the corpus, and test it on the rest.
  • Compare your results to the actual annotations
  • Do this 10 times, once for each possible 9/10 split
• Still too easy? Implement trigrams and compare the results.