Perl Basics A Perl Tutorial NLP Course - 2006
What is Perl? • Practical Extraction and Report Language • Interpreted Language • Optimized for String Manipulation and File I/O • Full support for Regular Expressions
Running Perl Scripts • Windows • Download ActivePerl from ActiveState • Just run the script from a 'Command Prompt' window • UNIX – Cygwin • Put the following in the first line of your script #!/usr/bin/perl • Run the script % perl script_name
Basic Syntax • Statements end with semicolon ‘;’ • Comments start with ‘#’ • Only single line comments • Variables • You don’t have to declare a variable before you access it • You don't have to declare a variable's type
Scalars and Identifiers • Identifiers • A variable name • Case sensitive • Scalar • A single value (string or numerical) • Accessed by prefixing an identifier with '$' • Assignment with '=' $scalar = expression
Strings • Quoting Strings • With ' (apostrophe) • Everything is interpreted literally • With " (double quotes) • Variables get expanded • With ` (backtick) • The text is executed as a separate process, and the output of the command is returned as the value of the string Check 01_printDate.pl
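The three quoting styles can be compared side by side. This is a minimal sketch, not the course's 01_printDate.pl: the variable names are illustrative, the backtick line assumes an `echo` command is available on the system, and `use strict` with `my` declarations is an addition the slides do not cover.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = "Perl";

my $single = 'Hello, $name\n';   # literal: $name and \n are kept as typed
my $double = "Hello, $name\n";   # $name is expanded, \n becomes a newline
my $output = `echo hello`;       # runs the command and captures what it prints

print($single, "\n");
print($double);
print("Command output: ", $output);
```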
String Operators
$string1 = "potato";
$string2 = "head";
$newstring = $string1 . $string2;   # "potatohead"
$newerstring = $string1 x 2;        # "potatopotato"
$string1 .= $string2;               # "potatohead"
Check concat_input.pl
Perl Functions • Perl functions are identified by their unique names (print, chop, close, etc) • Function arguments are supplied as a comma-separated list in parentheses • The commas are necessary • The parentheses are often optional • Be careful! You can write some nasty and unreadable code this way! Check 02_unreadable.pl
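As a rough illustration of optional parentheses (this is not the course's 02_unreadable.pl; the variable names are made up for the example), the two `join` calls below are equivalent. The commented-out `print` line shows one well-known way that dropping or misplacing parentheses can mislead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @words = ("b", "c", "a");

# With parentheses: the nesting is explicit.
my $with = join(", ", sort(@words));

# Without parentheses: same result, but the reader must work out the nesting.
my $without = join ", ", sort @words;

print("$with\n");
print("$without\n");

# Pitfall: parentheses directly after print are taken as its whole argument list,
# so the following line would print 3, not 9:
#   print (1 + 2) * 3;
print((1 + 2) * 3, "\n");   # explicit outer parentheses print 9
```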
Lists • Ordered collection of scalars • Zero indexed (first item in position '0') • Elements addressed by their positions • List Operators • (): list constructor • , : element separator • []: take slices (single or multiple element chunks)
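The list operators can be tried directly on a literal list. A small sketch with illustrative names; the `..` range operator used in the last slice is standard Perl but is not covered on the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# () constructs the list, commas separate the elements.
my @letters = ("a", "b", "c", "d", "e");

# [] takes slices; positions start at 0.
my $second = ("a", "b", "c", "d", "e")[1];       # single element: "b"
my @pair   = ("a", "b", "c", "d", "e")[1, 3];    # two elements: "b", "d"
my @range  = @letters[2 .. 4];                   # a run of elements: "c", "d", "e"

print("second: $second\n");
print("pair: @pair\n");
print("range: @range\n");
```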
List Operations • sort(LIST) returns a new list, the sorted version of LIST (string order by default) • reverse(LIST) returns a new list, the reverse of LIST • join(EXPR, LIST) returns a string version of LIST, delimited by EXPR • split(PATTERN, EXPR) returns a list of the portions of EXPR that lie between matches of PATTERN (PATTERN is the separator) Check 03_listOps.pl
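A short sketch of the four list operations (not the course's 03_listOps.pl; the data and variable names are illustrative). The last two lines show that `sort` compares as strings unless given a comparison block, which matters for numbers with more than one digit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @fruits   = ("pear", "apple", "fig");
my @sorted   = sort(@fruits);            # apple, fig, pear
my @reversed = reverse(@sorted);         # pear, fig, apple
my $joined   = join("-", @sorted);       # "apple-fig-pear"
my @parts    = split(/-/, $joined);      # "-" is the separator: apple, fig, pear

# Default sort is by string, so "10" comes before "2".
my @asStrings = sort(10, 9, 2);                 # 10, 2, 9
my @asNumbers = sort { $a <=> $b } (10, 9, 2);  # 2, 9, 10

print("joined: $joined\n");
print("parts: @parts\n");
print("string sort: @asStrings\n");
print("numeric sort: @asNumbers\n");
```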
Arrays • A named list • Dynamically allocated, can be saved • Zero-indexed • Shares list operations, and adds to them • Array Operators • @: reference to the array (or a portion of it, with []) • $: reference to an element (used with [])
Array Operations • push(@ARRAY, LIST) add the LIST to the end of the @ARRAY • pop(@ARRAY) remove and return the last element of @ARRAY • unshift(@ARRAY, LIST) add the LIST to the front of @ARRAY • shift(@ARRAY) remove and return the first element of @ARRAY • scalar(@ARRAY) return the number of elements in the @ARRAY Check 04_arrayOps.pl
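The five operations can be traced on a small array. This is a sketch with made-up data rather than the course's 04_arrayOps.pl; the comments show the array contents after each step.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @queue = ("b", "c");

push(@queue, "d", "e");        # add to the end:        b c d e
my $last = pop(@queue);        # remove from the end:   b c d    ($last is "e")
unshift(@queue, "a");          # add to the front:      a b c d
my $first = shift(@queue);     # remove from the front: b c d    ($first is "a")
my $count = scalar(@queue);    # number of elements: 3

print("queue: @queue\n");
print("first: $first, last: $last, count: $count\n");
```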
Associative Arrays - Hashes • Arrays indexed on arbitrary string values • Key-Value pairs • Use the key to look up its associated value • Hash Operators • % : refers to the hash • {}: denotes the key • $ : the value of the element indexed by the key (used with {})
Hash Operations • keys(%HASH) returns a list of all the keys in %HASH • values(%HASH) returns a list of all the values in %HASH • each(%HASH) iterates through the key-value pairs of %HASH • delete($HASH{KEY}) removes the key-value pair for KEY from %HASH
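A minimal sketch of these operations, with illustrative data. Hashes have no guaranteed order, so the example sorts the keys and values before using them; the `=>` separator is standard Perl but does not appear on the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %age = ("alice" => 30, "bob" => 25, "carol" => 41);

my @names = sort(keys(%age));                   # alice, bob, carol
my @ages  = sort { $a <=> $b } values(%age);    # 25, 30, 41

delete($age{"bob"});                            # remove bob and his value
my $remaining = scalar(keys(%age));             # 2 pairs left

# each returns one key-value pair per call, in no particular order.
while (my ($name, $years) = each(%age)) {
    print("$name is $years\n");
}
```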
Arrays Example
#!/usr/bin/perl
# Simple list operations

# Address an element in the list
@stringInstruments = ("violin","viola","cello","bass");
@brass = ("trumpet","horn","trombone","euphonium","tuba");
$biggestInstrument = $stringInstruments[3];
print("The biggest instrument: ", $biggestInstrument, "\n");

# Join elements at positions 0, 1, 2 and 4 into a white-space delimited string
print("orchestral brass: ", join(" ",@brass[0,1,2,4]), "\n");

@unsorted_num = ('3','5','2','1','4');
@sorted_num = sort( @unsorted_num ); # Sort the list (string order by default)
print("Numbers (Sorted, 1-5): ", join(" ", @sorted_num), "\n");
Hashes Example
#!/usr/bin/perl
# Simple hash operations
$player{"clarinet"} = "Susan Bartlett";
$player{"bassoon"} = "Andrew Vandesteeg";
$player{"flute"} = "Heidi Lawson";
$player{"oboe"} = "Jeanine Hassel";
@woodwinds = keys(%player);
@woodwindPlayers = values(%player);

# Who plays the oboe?
print("Oboe: ", $player{'oboe'}, "\n");

$playerCount = scalar(@woodwindPlayers);

# Walk through every instrument/player pair
while (($instrument, $name) = each(%player)) {
    print( "$name plays the $instrument\n" );
}
Pattern Matching • A pattern is a sequence of characters to be searched for in a character string • /pattern/ • Match operators • =~: tests whether a pattern is matched • !~: tests whether a pattern is not matched
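The two match operators can be sketched as follows; the sentence and the flag variables are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sentence = "The quick brown fox";

my $hasFox  = 0;
my $lacksDog = 0;

# =~ is true when the pattern occurs somewhere in the string.
if ($sentence =~ /fox/) {
    $hasFox = 1;
    print("Found a fox\n");
}

# !~ is true when the pattern does not occur anywhere in the string.
if ($sentence !~ /dog/) {
    $lacksDog = 1;
    print("No dog here\n");
}
```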
Backreferences • Parentheses memorize (capture) the portion of the input they match • /[a-z]+(.)[a-z]+\1[a-z]+/ • Matches: asd-eeed-sdsa, sd-sss-ws • Does NOT match: as_eee-dfg • \1 in the pattern stands for whatever was matched by (.) • The captured text can also be accessed immediately after the pattern is matched, as $1
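The slide's pattern can be checked against its three sample strings. In this sketch the pattern is stored with `qr//` (standard Perl, not covered on the slides) so it can be reused, and the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Letters, any one separator character, letters, the SAME separator, letters.
my $pattern = qr/[a-z]+(.)[a-z]+\1[a-z]+/;

my @matched;
foreach my $candidate ("asd-eeed-sdsa", "sd-sss-ws", "as_eee-dfg") {
    if ($candidate =~ $pattern) {
        push(@matched, $candidate);
        print("$candidate matches, separator is '$1'\n");
    } else {
        print("$candidate does not match\n");
    }
}

# After a successful match, $1 holds what (.) captured.
my $separator = "";
if ("asd-eeed-sdsa" =~ $pattern) {
    $separator = $1;
}
```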
Substitutions • Substitution operator • s/pattern/substitution/options • Starting each time from $string = "abc123def"; • $string =~ s/123/456/ Result: "abc456def" • $string =~ s/123// Result: "abcdef" • $string =~ s/(\d+)/[$1]/ Result: "abc[123]def" (uses a backreference)
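Because s/// changes the string in place, the sketch below copies the original before each substitution so that every result starts from "abc123def". The variable names are illustrative, and the final line adds the `g` (global) option as one example of the options field.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $original = "abc123def";

my $replaced = $original;
$replaced =~ s/123/456/;          # "abc456def"

my $deleted = $original;
$deleted =~ s/123//;              # "abcdef"

my $bracketed = $original;
$bracketed =~ s/(\d+)/[$1]/;      # "abc[123]def", $1 is the captured digits

my $everyDigit = "a1b2c3";
$everyDigit =~ s/\d/X/g;          # g replaces every match: "aXbXcX"

print("$replaced $deleted $bracketed $everyDigit\n");
```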
String - Pattern Examples
A simple Example
#!/usr/bin/perl
print ("Ask me a question politely:\n");
$question = <STDIN>;
# what about capital P in "please"?
if ($question =~ /please/) {
    print ("Thank you for being polite!\n");
} else {
    print ("That was not very polite!\n");
}
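One possible answer to the capital-P question in the comment is the `i` option, which makes the match case-insensitive. The sketch below uses fixed sample questions (illustrative, not from the course files) instead of reading from STDIN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @questions = (
    "Please pass the salt",
    "Could you pass the salt, please?",
    "Pass the salt",
);

my $politeCount = 0;
foreach my $question (@questions) {
    # /i ignores case, so "Please" and "please" both match.
    if ($question =~ /please/i) {
        $politeCount++;
        print("Polite: $question\n");
    } else {
        print("Not polite: $question\n");
    }
}
```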
String – Pattern Example
#!/usr/bin/perl
print ("Enter a variable name:\n");
$varname = <STDIN>;
chop ($varname);
# Try asd$asdas... It gets accepted!
if ($varname =~ /\$[A-Za-z][_0-9a-zA-Z]*/) {
    print ("$varname is a legal scalar variable\n");
} elsif ($varname =~ /@[A-Za-z][_0-9a-zA-Z]*/) {
    print ("$varname is a legal array variable\n");
} elsif ($varname =~ /[A-Za-z][_0-9a-zA-Z]*/) {
    print ("$varname is a legal file variable\n");
} else {
    print ("I don't understand what $varname is.\n");
}
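The input asd$asdas is accepted because the patterns may match anywhere inside the string. One way to close that gap, sketched here under the assumption that the whole input must be a name, is to anchor each pattern with ^ and $. The `classify` helper is hypothetical and only exists to make the checks easy to run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the kind of variable name, or "unknown".
sub classify {
    my ($varname) = @_;
    # ^ and $ force the pattern to cover the entire string.
    return "scalar" if $varname =~ /^\$[A-Za-z][_0-9a-zA-Z]*$/;
    return "array"  if $varname =~ /^\@[A-Za-z][_0-9a-zA-Z]*$/;
    return "file"   if $varname =~ /^[A-Za-z][_0-9a-zA-Z]*$/;
    return "unknown";
}

foreach my $input ('$count', '@items', 'LOGFILE', 'asd$asdas') {
    print("$input: ", classify($input), "\n");
}
```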