Perl Basics A Perl Tutorial NLP Course - 2003

What is Perl?
• Practical Extraction and Report Language
• Interpreted Language
• Optimized for String Manipulation and File I/O
• Full support for Regular Expressions

Running Perl Scripts
• Windows
  • Download ActivePerl from ActiveState
  • Just run the script from a 'Command Prompt' window
• UNIX – Cygwin
  • Put the following on the first line of your script (adjust the path to wherever perl is installed on your system)
      #!/usr/local/bin/perl
  • Make the script executable
      % chmod +x script_name
  • Run the script
      % ./script_name

Basic Syntax
• Statements end with a semicolon
• Comments start with '#'
  • Only single-line comments
• Variables
  • You don't have to declare a variable before you access it
  • You don't have to declare a variable's type
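The syntax rules above can be seen in a few lines. This is a minimal sketch; the variable names `$count` and `$total` are made up for illustration, and the `use strict` block is an addition that is not on the slide (it shows the common practice of requiring declarations with `my`).

```perl
# This whole line is a comment; Perl ignores it.
$count = 3;                  # no declaration and no type needed
$count = "three";            # the same variable can hold a string later
print "count is $count\n";   # every statement ends with a semicolon

{
    # Not on the slide: inside this block strict mode is switched on,
    # so variables must be declared with 'my' before they are used.
    use strict;
    my $total = 10;
    print "total is $total\n";
}
```

Outside the block the script runs in Perl's default, permissive mode, which is what the slide describes.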

Scalars and Identifiers
• Identifiers
  • A variable name
  • Case sensitive
• Scalar
  • A single value (string or numerical)
  • Accessed by prefixing an identifier with '$'
  • Assignment with '='
      $scalar = expression;

Strings
• Quoting Strings
  • With ' (apostrophe)
    • Everything is interpreted literally
  • With " (double quotes)
    • Variables get expanded
  • With ` (backtick)
    • The text is executed as a separate process, and the output of the command is returned as the value of the string
Check 01_printDate.pl
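The three quoting styles can be compared side by side. This sketch is not the course's 01_printDate.pl; the variable names are illustrative, and the backtick command simply re-invokes the running perl (via the built-in `$^X` variable) so that the example does not depend on any particular shell utility.

```perl
use strict;
use warnings;

my $name = "Perl";

# Single quotes: no expansion, so $name and \n stay as typed.
my $literal = 'Hello, $name\n';

# Double quotes: $name is expanded and \n becomes a newline.
my $expanded = "Hello, $name\n";

# Backticks: the text runs as a separate process and its output
# becomes the value of the string.
my $output = `"$^X" -e "print 6*7"`;

print $literal, "\n";
print $expanded;
print "Command output: $output\n";
```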

String Operators
    $string1 = "potato";
    $string2 = "head";
    $newstring = $string1 . $string2;    # "potatohead"
    $newerstring = $string1 x 2;         # "potatopotato"
    $string1 .= $string2;                # $string1 is now "potatohead"
Check concat_input.pl

Perl Functions
• Perl functions are identified by their unique names (print, chop, close, etc.)
• Function arguments are supplied as a comma-separated list in parentheses
  • The commas are necessary
  • The parentheses are often not
• Be careful! You can write some nasty and unreadable code this way!
Check 02_unreadable.pl
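A small sketch of why optional parentheses deserve care. It is not the course's 02_unreadable.pl; the variable names are illustrative. The last two lines rely on Perl's precedence rules: a named unary operator such as `length` binds less tightly than the `.` concatenation operator unless parentheses are written.

```perl
use strict;
use warnings;

my @words = ("pear", "apple", "fig");

# With parentheses: the argument list of each call is explicit.
my $with_parens = join(", ", sort(@words));

# Without parentheses: same result, but the reader has to work out
# which arguments belong to which function.
my $without_parens = join ", ", sort @words;

# Here the parentheses change the meaning.
my $whole = length "abc" . "de";     # length of "abcde", i.e. 5
my $glued = length("abc") . "de";    # 3 followed by "de", i.e. "3de"

print "$with_parens\n$without_parens\n$whole\n$glued\n";
```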

Lists
• Ordered collection of scalars
• Zero indexed (first item in position '0')
• Elements addressed by their positions
• List Operators
  • (): list constructor
  • , : element separator
  • []: take slices (single or multiple element chunks)
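The list operators can be tried directly on literal lists. A minimal sketch with made-up values and variable names:

```perl
use strict;
use warnings;

# () builds a list and , separates its elements.
# Assigning to a shorter list of variables drops the extra elements.
my ($first, $second) = ("alpha", "beta", "gamma");

# [] applied to a list picks elements by zero-based position.
my $one  = ("alpha", "beta", "gamma")[0];              # single element
my @some = ("alpha", "beta", "gamma", "delta")[1, 3];  # slice of two elements
my $last = ("alpha", "beta", "gamma")[-1];             # negative index counts from the end

print "$first $second $one @some $last\n";
```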

List Operations
• sort(LIST): a new list, the sorted version of LIST
• reverse(LIST): a new list, the reverse of LIST
• join(EXPR, LIST): a string version of LIST, delimited by EXPR
• split(PATTERN, EXPR): a list of the portions of EXPR that lie between matches of PATTERN (PATTERN acts as the delimiter)
Check 03_listOps.pl
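The four operations in action. This is a sketch with illustrative data, not the course's 03_listOps.pl. The last two lines go slightly beyond the slide: `sort` compares elements as strings by default, so numeric data needs an explicit comparison block.

```perl
use strict;
use warnings;

my @fruits = ("pear", "apple", "fig");

my @sorted   = sort(@fruits);          # ("apple", "fig", "pear")
my @reversed = reverse(@fruits);       # ("fig", "apple", "pear")
my $joined   = join("-", @fruits);     # "pear-apple-fig"
my @fields   = split(/,/, "a,b,c");    # ("a", "b", "c"): the commas are removed

# Default sort is a string comparison; <=> compares numerically.
my @as_strings = sort(10, 9, 100);                 # (10, 100, 9)
my @as_numbers = sort { $a <=> $b } (10, 9, 100);  # (9, 10, 100)

print "@sorted | @reversed | $joined | @fields\n";
print "@as_strings | @as_numbers\n";
```

Note that `@fruits` itself is unchanged: `sort` and `reverse` return new lists.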

Arrays
• A named list
• Dynamically allocated, can be saved
• Zero-indexed
• Shares list operations, and adds to them
• Array Operators
  • @: refers to the whole array (or to a portion of it, with [])
  • $: refers to a single element (used with [])
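The difference between the `@` and `$` prefixes is easiest to see on one array. A minimal sketch; the array name `@colors` and its contents are made up.

```perl
use strict;
use warnings;

my @colors = ("red", "green", "blue");

my $first = $colors[0];       # $ because one element is a scalar
my @pair  = @colors[1, 2];    # @ because a slice is a list

$colors[3] = "black";         # assigning past the end grows the array

print "$first | @pair | @colors\n";
```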

Array Operations
• push(@ARRAY, LIST): add the LIST to the end of @ARRAY
• pop(@ARRAY): remove and return the last element of @ARRAY
• unshift(@ARRAY, LIST): add the LIST to the front of @ARRAY
• shift(@ARRAY): remove and return the first element of @ARRAY
• scalar(@ARRAY): return the number of elements in @ARRAY
Check 04_arrayOps.pl
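A short walk through the five operations, with the array contents after each step shown in the comments. This is an illustrative sketch, not the course's 04_arrayOps.pl.

```perl
use strict;
use warnings;

my @queue = (2, 3);

push(@queue, 4, 5);            # @queue is now (2, 3, 4, 5)
my $last = pop(@queue);        # $last is 5, @queue is (2, 3, 4)
unshift(@queue, 0, 1);         # @queue is now (0, 1, 2, 3, 4)
my $first = shift(@queue);     # $first is 0, @queue is (1, 2, 3, 4)
my $size = scalar(@queue);     # 4

print "last=$last first=$first size=$size queue=@queue\n";
```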

Associative Arrays - Hashes
• Arrays indexed on arbitrary string values
• Key-Value pairs
  • Use the key to look up its associated value
• Hash Operators
  • % : refers to the whole hash
  • {}: encloses the key
  • $ : the value of the element indexed by the key (used with {})

Hash Operations
• keys(%ARRAY): return a list of all the keys in %ARRAY
• values(%ARRAY): return a list of all the values in %ARRAY
• each(%ARRAY): iterate through the key-value pairs of %ARRAY, one pair per call
• delete($ARRAY{KEY}): remove the key-value pair associated with KEY from %ARRAY
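The hash operators and operations can be sketched together as follows. The hash name `%age` and its contents are invented for illustration. Because a hash has no guaranteed order, the keys and values are sorted before printing.

```perl
use strict;
use warnings;

my %age = ("alice", 30, "bob", 25, "carol", 41);

my $bob = $age{"bob"};     # look up a value by its key
$age{"dave"} = 19;         # add a new key-value pair
delete($age{"alice"});     # remove a pair

my @names = sort(keys(%age));                   # ("bob", "carol", "dave")
my @ages  = sort { $a <=> $b } values(%age);    # (19, 25, 41)

# each() hands back one (key, value) pair per call until none remain.
my $total = 0;
while (my ($name, $years) = each(%age)) {
    $total += $years;
}

print "bob=$bob names=@names ages=@ages total=$total\n";
```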

Pattern Matching
• A pattern is a sequence of characters to be searched for in a character string
  • /pattern/
• Match operators
  • =~: tests whether a pattern is matched
  • !~: tests whether a pattern is not matched
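Both match operators used to sort a few strings into two groups. A minimal sketch with made-up input lines:

```perl
use strict;
use warnings;

my @lines = ("perl is fun", "python too", "I like perl");
my (@with_perl, @without_perl);

foreach my $line (@lines) {
    if ($line =~ /perl/) {          # true when the pattern occurs in $line
        push(@with_perl, $line);
    }
    if ($line !~ /perl/) {          # true when it does not
        push(@without_perl, $line);
    }
}

print "matched: ", scalar(@with_perl), "\n";
print "not matched: @without_perl\n";
```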

Backreferences
• Memory of the matched portion of the input
• /[a-z]+(.)[a-z]+\1[a-z]+/
  • Matches: asd-eeed-sdsa, sd-sss-ws
  • Does NOT match: as.eee-dfg
• The captured text can also be accessed immediately after the pattern is matched
  • The text captured by (.) in the pattern above is available as $1
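The slide's pattern can be run against its three example strings. In this sketch the hash `%separator` is a made-up helper that records what `(.)` captured for each string that matched.

```perl
use strict;
use warnings;

my @inputs = ("asd-eeed-sdsa", "sd-sss-ws", "as.eee-dfg");
my %separator;   # input string => captured separator, matches only

foreach my $input (@inputs) {
    # \1 inside the pattern must repeat whatever (.) captured.
    if ($input =~ /[a-z]+(.)[a-z]+\1[a-z]+/) {
        $separator{$input} = $1;    # $1 holds the capture after the match
        print "$input matches, separator is '$1'\n";
    } else {
        print "$input does not match\n";
    }
}
```

The third string fails because its two separators differ ('.' and '-'), so `\1` cannot repeat the first one.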

Substitutions
• Substitution operator
  • s/pattern/substitution/options
• If $string = "abc123def"; then each of the following, applied to that original value, gives:
  • $string =~ s/123/456/;        Result: "abc456def"
  • $string =~ s/123//;           Result: "abcdef"
  • $string =~ s/(\d+)/[$1]/;     Result: "abc[123]def"
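The options part of the operator is not illustrated on the slide. The sketch below shows two standard ones, `g` (replace every match rather than only the first) and `i` (ignore case). The strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $first_only = "abc123def123";
$first_only =~ s/123/456/;         # no options: only the first match changes

my $every = "abc123def123";
$every =~ s/123/456/g;             # g: every match changes

my $any_case = "Perl perl PERL";
$any_case =~ s/perl/camel/gi;      # i: match regardless of case

# s/// returns the number of substitutions it made.
my $digits = "abc123def";
my $replaced = ($digits =~ s/\d/#/g);

print "$first_only\n$every\n$any_case\n$digits ($replaced replaced)\n";
```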