Introduction to PERL ICS 215
About Perl • P(ractical) E(xtraction and) R(eporting) L(anguage) • P(athologically) E(clectic) R(ubbish) L(ister) • well suited to simple one-line scripts, thanks to useful command-line switches • the interpreter program: perl • single step: perl compiles and runs a script in one go (no separate compile step)
Perl History • 1987: original version by Larry Wall • goal: a Unix scripting language • 1994: standard version 5.x • now two separate languages • Perl 5 • 5/2013: latest stable release 5.18 (at the time of writing) • Perl 6 • 2000: redesign of Perl announced
Perl 6 • goal • remove "historical warts" • motto • easy things should stay easy, hard things should get easier, and impossible things should get hard • cleanup of APIs (and of the internal design) • for our purposes • the basic concepts carry over, but all examples here use Perl 5 syntax
Perl's Influences • syntax • C • lists • LISP • hashes • AWK • regular expressions • sed
New in Perl 5 • complex data structures (via references) • functional programming • first-class functions • i.e., anonymous subroutines and "closures" as values • see later • object-oriented programming
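A closure is a subroutine value that keeps access to the lexical variables that were in scope when it was created. The sketch below uses a hypothetical `make_counter` function to show that each returned subroutine carries its own private counter.

```perl
use strict;
use warnings;

# make_counter (hypothetical name) returns an anonymous subroutine, a closure,
# that remembers its own private $count between calls
sub make_counter {
    my ($start) = @_;
    my $count = $start;
    return sub { return $count++ };    # returns the current value, then increments
}

my $counter_a = make_counter(10);
my $counter_b = make_counter(100);

my @seen = ($counter_a->(), $counter_a->(), $counter_b->());
print "@seen\n";    # prints "10 11 100"
```

Each call to `make_counter` creates a fresh `$count`, so the two counters never interfere with each other.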
Perl as a Programming Language • C-like syntax • features common to procedural languages • variables • expressions • assignments • control structures: branching, loops • brace-delimited blocks • subroutines • modules
Data Types • data typing is automatic • automatic type conversions, e.g., • from number to string • from string to number • a non-numeric string used as a number becomes 0 (with a warning under use warnings) • some misuses are fatal, e.g., using a string as a reference under use strict • automatic memory management • storage for data is allocated and freed automatically • using reference counting • circular data structures are not freed automatically (break the cycle, or use weak references)
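The conversions listed above can be seen in a few lines. This is a minimal sketch; the variable names are illustrative, and the last part assumes `use strict` is on so that treating a string as a reference dies.

```perl
use strict;
use warnings;

my $sum    = "10" + 5;      # string "10" converted to a number: 15
my $joined = 10 . 5;        # numbers converted to strings: "105"
my $double = "3.5" * 2;     # 7

# A non-numeric string becomes 0; under "use warnings" Perl only warns.
my $zero = do {
    no warnings 'numeric';  # silence the "isn't numeric" warning for this demo
    "apples" + 0;
};

# Using a string as a reference is fatal under "use strict".
my $ok = eval { my $name = "counter"; my $value = $$name; 1 };
my $error = $@;

print "$sum $joined $double $zero\n";   # prints "15 105 7 0"
print "fatal: $error" unless $ok;
```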
Readability of Perl • can be VERY unreadable (here, a rot13-encoded version of the prime-number program on the next slide): ($_=q|cevag"znkvzhz:";@_=(2..<>);juvyr($a=fuvsg@_){cevag"$a";@_=terc{$_%$a}@_;}|)=~tr|a-z|n-za-m|;eval"$_"; • don't do that • Perl's motto: • There's more than one way to do it.
Readability of Perl cont. • typically much better:
    # prompt and read a number from the user
    print "maximum of range 2..: ";
    my $maximum = <STDIN>;   # what the user entered
    chomp $maximum;          # drop the trailing newline
    # make array of numbers up to maximum
    my @numbers = (2..$maximum);
    # keep finding primes
    while (my $prime = shift @numbers) {
        # print the next prime
        print "$prime\n";
        # remove multiples of $prime
        @numbers = grep {$_ % $prime} @numbers;
    }
Pragma • compiler directive • changes how the program is compiled • mandatory (in this course): use strict and use warnings • use strict • variables must be declared (e.g., with my) • prevents common errors such as misspelled variable names • disallows unsafe constructs (e.g., symbolic references) • more formal, less casual • use warnings • helpful diagnostics
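To see what `use strict` buys you, the sketch below compiles a snippet containing a misspelled variable (`$totl` instead of `$total`, a hypothetical typo) through a string `eval`, which inherits the surrounding `use strict`. Without the pragma, Perl would silently create a new global variable.

```perl
use strict;
use warnings;

my $total = 0;    # declared with my: fine under strict

# The typo $totl is rejected at compile time instead of silently
# creating a new global variable.
my $compiled = eval q{ $totl = 1; 1 };
my $error    = $@;

print $compiled ? "compiled\n" : "rejected: $error";
```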
Convention • whitespace between tokens is ignored • use spaces for readability • statements end with semicolons ; • a sequence of statements can be combined into a block with curly braces {} • good practices • single statement per line • indent statements in a block • use empty lines to separate subroutines • comment logical "blocks" rather than separating them with empty lines (hard to scroll)
Comments • comments start with # and run to the end of the line • no multiline comments out of the box (unlike /* ... */ in C or Java); POD blocks (=pod ... =cut) are sometimes used instead • use comments for documentation, be • terse, • but expressive
Sigil Convention • variable names start with a sigil • the sigil identifies the data type • scalar: $ • $my_course = 'ICS 215'; • array: @ • @courses = ('ICS 665', $my_course); • hash: % • %age = ("Jan" => 61, "Parvina" => 9, "Kian" => 4); • but: $age{"Jan"}; # $ because a single value is a scalar! • subroutines: & (optional) • &quicksort(@numbers_to_sort); • by convention, use underscores (snake_case) rather than camelCase
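The sigil rules can be checked in one small program. This is a sketch; `lower` is a hypothetical subroutine used only to show the optional `&`.

```perl
use strict;
use warnings;

my $my_course = 'ICS 215';                              # scalar: $
my @courses   = ('ICS 665', $my_course);                # array:  @
my %age       = (Jan => 61, Parvina => 9, Kian => 4);   # hash:   % (parentheses, not braces)

# A single element is a scalar, so it takes the $ sigil,
# with [] for arrays and {} for hashes.
my $first = $courses[0];    # 'ICS 665'
my $jans  = $age{Jan};      # 61

sub lower { return lc $_[0] }        # hypothetical helper
my $quiet = &lower($my_course);      # & is optional: lower($my_course) works too

print "$first $jans $quiet\n";       # prints "ICS 665 61 ics 215"
```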
Strings • in single quotes • 'ICS 215' • only two escape sequences are recognized • \' single quote • \\ backslash • or in double quotes • "ICS 215" • more escape sequences • \n new line • \" double quote • \' single quote • \x77 hex value (here "w") • etc.
Interpolation in Strings • a variable can occur in a double-quoted string $hi = "Hi. My name is $first_name $last_name"; • the current value of the variable is inserted into the string where it occurs • Note: • no interpolation happens in single-quoted strings
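Single versus double quotes, covering both escapes and interpolation, can be compared side by side. A minimal sketch; the names are placeholders.

```perl
use strict;
use warnings;

my $first_name = 'Ada';
my $last_name  = 'Lovelace';

my $double = "Hi. My name is $first_name $last_name\n";   # variables and \n are expanded
my $single = 'Hi. My name is $first_name\n';              # everything taken literally

my $escapes = "tab:\t quote:\" hex:\x77";    # \x77 is the character "w"
my $literal = 'It\'s a back\\slash';        # only \' and \\ are special here

print $double;          # Hi. My name is Ada Lovelace
print $single, "\n";    # Hi. My name is $first_name\n
```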
Simple Data Structures • scalar • single value • holds the assigned value until reassigned • array • multiple ordered values • accessed by index • hash • multiple unordered values • accessed by key • like HashMap in Java • the my declarator limits a variable to the enclosing block (lexical scope)
Scalar Constants • number • integer 12 • real (floating point) 3.14159 2.71828182845905 1E+100 • octal, hexadecimal 017, 0xF • string • a single character "a" • many characters "A quick brown..." • Unicode character by code point "\x{263A}" (☺) • reference
Scalar Variables • names can't begin with a digit (after the sigil $) • case-sensitive • reserved special names $_ $1 $/ • any scalar value or variable can be assigned to another scalar variable • a variable can hold a number at one point and a string at another • Perl is a loosely, dynamically typed language • unlike Java, a strongly, statically typed language • a declared but unassigned variable is undefined my $favorite_food; # undefined
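The points about undefined values and dynamic typing fit in a short sketch; the food name is only an example value.

```perl
use strict;
use warnings;

my $favorite_food;    # declared but not assigned
my $state = defined $favorite_food ? 'defined' : 'undefined';

my $value = 42;       # holds a number ...
$value    = 'poke';   # ... and later a string: loose, dynamic typing

$favorite_food = $value;    # any scalar can be assigned to another scalar
print "$state, now $favorite_food\n";    # prints "undefined, now poke"
```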
Arrays • store any number of ordered scalars • numbers, strings, references • any combination thereof • indexed by number • starting at 0 • accessed by index in square brackets [] • each item is a scalar • e.g., $array[0] • last index is $#array • last item is $array[$#array] • negative indices count from the end of the list, e.g., $array[-1] is the last item
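Indexing, `$#array`, and negative indices look like this in practice. A sketch with made-up course names; `scalar @courses` (the item count) is an extra not on the slide.

```perl
use strict;
use warnings;

my @courses = ('ics111', 'ics215', 'ics311', 'ics465');

my $first     = $courses[0];            # 'ics111'
my $last_idx  = $#courses;              # 3
my $last      = $courses[$#courses];    # 'ics465'
my $also_last = $courses[-1];           # negative index counts from the end
my $count     = scalar @courses;        # 4: an array in scalar context gives its size

print "$first $last_idx $last $also_last $count\n";
```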
Arrays cont. • grow and shrink dynamically • push adds an item after the last one • pop removes the last item • recall: a stack • unshift adds an item before the first one • shift removes the first item • splice removes (or inserts) items at a given index • delete on an array item only undefines it and is discouraged
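The functions above can be traced on a small array. A minimal sketch; the comments show the array contents after each step.

```perl
use strict;
use warnings;

my @stack = (1, 2);
push @stack, 3;              # (1, 2, 3)
my $top = pop @stack;        # 3; @stack is (1, 2)
unshift @stack, 0;           # (0, 1, 2)
my $front = shift @stack;    # 0; @stack is (1, 2)

my @days    = qw(mon tue wed thu);
my @removed = splice(@days, 1, 2);    # remove 2 items starting at index 1

print "@stack | $top $front | @days | @removed\n";
# prints "1 2 | 3 0 | mon thu | tue wed"
```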
Array Slices • sub-arrays or "slices" • made with @array[@indices] • @array[0] is a slice • a list containing a single scalar • contrast with $array[0], a scalar
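Slices select several items at once and can also be assigned to. A sketch; the swap idiom on the last line is an extra illustration.

```perl
use strict;
use warnings;

my @array = qw(zero one two three four);

my @middle = @array[1 .. 3];    # slice: ('one', 'two', 'three')
my @picked = @array[0, -1];     # ('zero', 'four')
my $scalar = $array[0];         # a single scalar: 'zero'

@array[0, 1] = @array[1, 0];    # assigning to a slice: swap the first two items

print "@middle | @picked | $scalar | @array\n";
# prints "one two three | zero four | zero | one zero two three four"
```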
Array Construction • enumerating items @array = (215, "215", '215', 3.14); • range @numbers = (215..665); @letters = ("a" .. "z"); • combination thereof @array = (215..665, "215");
foreach and Arrays • foreach loop iterates over an entire array
    my @fruits = qw(papaya pineapple guava);
    foreach my $fruit (@fruits) {
        print "Let's have $fruit smoothie!\n";
    }
• output
    Let's have papaya smoothie!
    Let's have pineapple smoothie!
    Let's have guava smoothie!
Hashes • a set of key/value pairs, a "map" • values can be any scalars • numbers, strings, references • accessed by key (keys are strings) • assigning a new value to an existing key overwrites the old value • items (key/value pairs) can be added or removed • can be sliced • simple to iterate
Hash Construction • use => (the "fat comma")
    %capitals = (us => 'Washington', ch => 'Bern', cz => 'Prague');
• possible, but not recommended
    %capitals = ('us', 'Washington', 'ch', 'Bern');
• using variables
    $usa    = 'us';
    $swiss  = 'ch';
    @cities = ('Washington', 'Bern');
    %capitals = ($usa => $cities[0], $swiss => $cities[1]);
Hash Construction cont. • using hashes
    %new_capitals = (ca => 'Ottawa', hi => 'Honolulu');
    %all_capitals = (%capitals, %new_capitals);
    print "The capital of Hawaii is $all_capitals{hi}\n";
• output
    The capital of Hawaii is Honolulu
Accessing Hashes • items are retrieved as scalars $hi_capital = $all_capitals{hi}; • items are assigned as scalars $all_capitals{cz} = 'Prague'; • assigning a value to a new key adds the key/value pair to the hash
Hashes as a Set • dynamically take on the size needed • hashes can grow or shrink • can be empty (but defined) %capitals = (); • items can be deleted by deleting the key delete $capitals{hi}; • we can check whether an item with a given key exists print "exists" if exists $capitals{hi};
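Adding, overwriting, deleting, and testing keys can be combined in one sketch. `scalar keys %capitals` (the number of pairs) is an extra not on the slides.

```perl
use strict;
use warnings;

my %capitals = (us => 'Washington', ch => 'Bern');

my $swiss = $capitals{ch};              # retrieve a value as a scalar: 'Bern'
$capitals{cz} = 'Prague';               # new key: adds the pair
$capitals{us} = 'Washington, D.C.';     # existing key: overwrites the old value

delete $capitals{ch};                   # remove the pair with key 'ch'

my $has_ch = exists $capitals{ch} ? 'yes' : 'no';    # 'no'
my $size   = scalar keys %capitals;                  # 2 (us, cz)

%capitals = ();                         # empty, but still defined
print "$swiss $has_ch $size ", scalar(keys %capitals), "\n";    # prints "Bern no 2 0"
```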
Hash Slices • a subset of a hash's values, a "slice" • a slice is a list • constructed with @hash{@some_keys} • e.g.: @foreign = @all_capitals{'ch', 'cz', 'ca'}; • watch out: • @hash{$key} is a list with one item, the key's value • $hash{$key} is a scalar
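A hash slice returns values in the order the keys are listed, and it can also assign several keys at once. A sketch; the Alaska entry is an added example value.

```perl
use strict;
use warnings;

my %all_capitals = (us => 'Washington', ch => 'Bern', cz => 'Prague', ca => 'Ottawa');

my @foreign = @all_capitals{'ch', 'cz', 'ca'};    # ('Bern', 'Prague', 'Ottawa')
my $one     = $all_capitals{us};                  # a scalar

@all_capitals{'hi', 'ak'} = ('Honolulu', 'Juneau');    # assign to several keys at once

print "@foreign | $one | $all_capitals{ak}\n";    # prints "Bern Prague Ottawa | Washington | Juneau"
```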
List of Hash Components • keys returns a list of the hash's keys • in no particular order • values returns a list of the hash's values • in no particular order (but in the same order as keys, as long as the hash is unchanged) • each returns the next key/value pair as a two-item list • pairs are returned in no particular order • typically used in a while loop
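Because `keys` and `values` come back in no particular order, a common idiom is to sort them before use. A minimal sketch:

```perl
use strict;
use warnings;

my %capitals = (us => 'Washington', ch => 'Bern', cz => 'Prague');

my @countries = sort keys %capitals;    # sort for a predictable order
my @cities    = sort values %capitals;

my @lines;
foreach my $country (sort keys %capitals) {
    push @lines, "$country: $capitals{$country}";
}

print "@countries | @cities\n";    # prints "ch cz us | Bern Prague Washington"
print "$_\n" foreach @lines;
```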
while and Hashes • while-each loops over an entire hash
    while (my ($country, $capital) = each %capitals) {
        print "The capital of $country is $capital\n";
    }
• output (order may vary)
    The capital of ch is Bern
    The capital of cz is Prague
    The capital of us is Washington
Control Statements • Branches • if • unless • Loops • while • until • do • for • foreach • modifiers • if, unless, while, until, foreach • but as a suffix of a simple statement
if Statement • if (condition) { statements } elsif (condition) { statements } else { statements } • else and elsif are optional • semantics as in Java, but braces are required even around a single statement • unless is the opposite of if
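A branch chain with `elsif`, plus an `unless`, might look like the sketch below; `level` and its thresholds are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a course number by level.
sub level {
    my ($number) = @_;
    if ($number < 200) {
        return 'introductory';
    } elsif ($number < 400) {    # note the spelling: elsif
        return 'intermediate';
    } else {
        return 'advanced';
    }
}

my $note = '';
unless (level(215) eq 'advanced') {    # same as: if (not ...)
    $note = 'not advanced';
}

my $summary = join ' ', level(111), level(215), level(465);
print "$summary ($note)\n";    # prints "introductory intermediate advanced (not advanced)"
```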
Loop Statements • while (condition) {} • loops while condition is true • possibly not at all • until (condition) {} • loops until condition becomes true • the opposite of while • do {} while (condition) • loops at least once while condition is true • do {} until (condition) • loops at least once until condition becomes true • for (initialization; condition; increment) {} • as in Java • foreach • loops over a list or array
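Each loop form can be traced by recording what it does in a log array. A sketch; the labels in `@log` are arbitrary.

```perl
use strict;
use warnings;

my @log;

my $n = 3;
while ($n > 0) { push @log, "w$n"; $n--; }        # w3 w2 w1

my $m = 0;
until ($m >= 2) { push @log, "u$m"; $m++; }       # u0 u1

my $k = 10;
do { push @log, "d$k"; $k++; } while ($k < 5);    # runs once although 10 < 5 is false

for (my $i = 0; $i < 2; $i++) { push @log, "f$i"; }    # C-style: f0 f1

foreach my $day (qw(sat sun)) { push @log, $day; }

print "@log\n";    # prints "w3 w2 w1 u0 u1 d10 f0 f1 sat sun"
```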
Modifiers • if, unless, while, until, foreach • following a statement • to be used only with a single simple statement • e.g. • attend_215() unless $is_holiday; • print $_ . "\n" foreach @weekdays; • advantages • make programs more readable • emphasize the statement rather than the control • parentheses () may be unnecessary
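Statement modifiers in action, as a sketch; `@out` collects results instead of printing so they can be checked.

```perl
use strict;
use warnings;

my @weekdays   = qw(mon tue wed);
my $is_holiday = 0;
my @out;

push @out, "class" unless $is_holiday;    # statement first, condition after
push @out, uc $_   foreach @weekdays;     # $_ is each item in turn
push @out, "never" if $is_holiday;

my $countdown = 3;
$countdown-- while $countdown > 0;        # no braces or parentheses needed

print "@out $countdown\n";    # prints "class MON TUE WED 0"
```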
Operators • Numeric • +, -, *, /, % • assignment: =, +=, -=, *=, etc., ++, -- • bit shift: <<, >> • String • concatenation: . • repetition: x • assignment: .=, x= • Comparison and Boolean • <, >, <=, >=, ==, != (on numbers) • lt, gt, le, ge, eq, ne (on strings) • &&, ||, ! (high precedence); and, or, not (low precedence) • ternary conditional ? : • my $max = $x > $y ? $x : $y;
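The numeric and string comparison operators can disagree on the same values, which is why Perl keeps both sets. A minimal sketch:

```perl
use strict;
use warnings;

my $numeric_equal = ("10" == 10.0)  ? 'yes' : 'no';    # yes: equal as numbers
my $string_equal  = ("10" eq "10.0") ? 'yes' : 'no';   # no: different strings
my $string_order  = ("10" lt "9")    ? 'yes' : 'no';   # yes: "1" sorts before "9"

my $name = "ICS" . " " . 215;    # concatenation: "ICS 215"
my $rule = "-" x 5;              # repetition: "-----"
$name .= "!";                    # append: "ICS 215!"

my ($x, $y) = (3, 7);
my $max     = $x > $y ? $x : $y;    # ternary: 7
my $shifted = 1 << 3;               # bit shift: 8

print "$numeric_equal $string_equal $string_order $name $rule $max $shifted\n";
# prints "yes no yes ICS 215! ----- 7 8"
```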
Array Functions • push, pop, unshift, shift • split my @pets = split(/, /, "cat, dog, bird"); # ("cat", "dog", "bird") • join my $pets = join(", ", @pets); # "cat, dog, bird" • sort • sorts as strings (alphabetically) by default • reverse • in list context: reverses the list my @reversed_pets = reverse sort qw(cat dog bird); # ("dog", "cat", "bird") • in scalar context: concatenates the items into a string, then reverses it my $semordnilap = reverse "deliver no evil"; # "live on reviled" • grep
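Because the default `sort` compares strings, numbers need an explicit comparison block. The sketch below shows that, alongside `split`, `join`, and both uses of `reverse`; `$a <=> $b` is Perl's numeric comparison operator.

```perl
use strict;
use warnings;

my @pets = split /, /, "cat, dog, bird";    # ('cat', 'dog', 'bird')
my $line = join '-', @pets;                 # 'cat-dog-bird'

my @numbers    = (10, 9, 100);
my @by_default = sort @numbers;                   # string order: (10, 100, 9)
my @by_number  = sort { $a <=> $b } @numbers;     # numeric order: (9, 10, 100)

my @reversed = reverse sort @pets;          # list context: ('dog', 'cat', 'bird')
my $backward = reverse "deliver no evil";   # scalar context: 'live on reviled'

print "$line | @by_default | @by_number | @reversed | $backward\n";
```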
grep • finds matching array items • typically based on a regular expression my @courses = qw(ics111 art211 ics215 com415); my @ics = grep(/ics/, @courses); # ("ics111", "ics215") • or based on a condition my @numbers = (22, 13, 51, 70, 111, 33, 22); my @odds = grep {$_ % 2 == 1} @numbers; # (13, 51, 111, 33) • Note: • grep assigns consecutive items of @numbers to $_ • $_ is the current item in @numbers
References • refer to other data • syntax: \ my $courses_ref = \@courses; • references are scalars • dereferencing yields the data • syntax: -> my $course = $courses_ref->[1]; • allow you to • build hierarchical data structures • pass arguments by reference • create anonymous data
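The three uses listed above (hierarchical data, passing by reference, anonymous data) fit in one sketch. `add_course` and the student record are hypothetical.

```perl
use strict;
use warnings;

my @courses     = ('ics111', 'ics215');
my $courses_ref = \@courses;             # reference to an existing array
my $course      = $courses_ref->[1];     # dereference one item: 'ics215'
my @copy        = @$courses_ref;         # dereference the whole array (a copy)

# Anonymous data: [] builds an array reference, {} a hash reference.
my $student = { name => 'Kian', courses => ['ics215', 'ics311'] };
my $second  = $student->{courses}[1];    # 'ics311'

# Passing by reference: the subroutine changes the caller's array.
sub add_course {
    my ($list_ref, $new) = @_;
    push @$list_ref, $new;
}
add_course($courses_ref, 'ics311');

print "$course | @copy | $second | @courses\n";
# prints "ics215 | ics111 ics215 | ics311 | ics111 ics215 ics311"
```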
Hierarchical/Anonymous Data • via references
    my $courses = {ics => 215, lis => 699};
    print "I need ICS $courses->{ics}\n";
• output
    I need ICS 215
• between the subscripts of nested arrays and hashes, -> is optional
    my @ics = (215, 311, 465);
    my @lis = (699, 691);
    my @courses = (\@ics, \@lis);
    print "I added ICS $courses[0]->[0]" . " and LIS $courses[1][1]\n";
• output
    I added ICS 215 and LIS 691
Subroutines • declared with sub • return a value • with a return statement • otherwise, the value of the last evaluated expression • a scalar or a list • wantarray tells which context (list or scalar) the caller wants • the sigil & is optional when calling • recursive calls are ok • subroutines are a data type, too (code references)
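A sketch of `wantarray`, an implicit return value, recursion, and a code reference; `odd_numbers` and `factorial` are hypothetical examples.

```perl
use strict;
use warnings;

# Return a list in list context, a count in scalar context.
sub odd_numbers {
    my @odds = grep { $_ % 2 == 1 } @_;
    return wantarray ? @odds : scalar @odds;
}

# Recursive; with no explicit return, the last expression's value is returned.
sub factorial {
    my ($n) = @_;
    $n <= 1 ? 1 : $n * factorial($n - 1);
}

my @list  = odd_numbers(22, 13, 51, 70);    # list context: (13, 51)
my $count = odd_numbers(22, 13, 51, 70);    # scalar context: 2
my $code  = \&factorial;                    # a subroutine used as data
my $fact  = $code->(5);                     # 120

print "@list | $count | $fact\n";    # prints "13 51 | 2 | 120"
```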
Arguments • arguments in parentheses () • often not needed • but good practice • arguments are available in the @_ array • readability alert: assign @_ to local (my) variables • also accessible via $_[i] • where i is the index in @_ • passing by value • copy @_ into my variables • passing by reference • pass \@array or \%hash; note that the items of @_ are aliases to the caller's arguments
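Copying `@_` gives pass-by-value behavior, while writing to `$_[0]` directly changes the caller's variable, because the items of `@_` are aliases. A sketch with two hypothetical subroutines:

```perl
use strict;
use warnings;

# By value: copy @_ into a lexical variable; changes stay local.
sub doubled {
    my ($n) = @_;
    $n *= 2;
    return $n;
}

# By alias: assigning to $_[0] changes the caller's variable.
sub double_in_place {
    $_[0] *= 2;
}

my $x      = 5;
my $y      = doubled($x);    # 10; $x is unchanged
my $before = $x;             # still 5
double_in_place($x);         # $x is now 10

print "$before $x $y\n";     # prints "5 10 10"
```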
Subroutine Example
    sub fibonacci {
        my ($n) = @_;
        die "Argument must be > 0" if $n < 1;
        return 1 if $n <= 2;
        fibonacci($n - 1) + fibonacci($n - 2);
    }
    my @series = ();
    foreach my $n (1..5) {
        push(@series, fibonacci($n));
    }
    print "fibonacci numbers\n@series\n";
Regular Expressions • abbreviated as "regex" (also regexp) • textual pattern matching • based on the theory of regular languages and finite automata: they are a language! • matching • letters, numbers, white-space, other characters • can exclude specific characters from a match • match boundaries between words, line begin, line end, etc. • match subpatterns • match repetitions
RegEx Gotchas • gotchas • typically "line oriented" • difficult to match across line-end characters • some characters have special meaning • if you mean the actual character you must "escape" it • use \ • numerous regex operators in Perl
Regex Samples • ask questions about a string my $string = "Did the fox jump over the dog?"; • whether it • contains the substring "fox"? print "$string\n" if $string =~ /fox/; # yes • doesn't contain the letter "q"? print "$string\n" if $string !~ /q/; # yes • begins with the letters "z" or "1"? print "$string\n" if $string =~ /^[z1]/; # no • ends with a question mark? print "$string\n" if $string =~ /\?$/; # yes • contains only letters or digits? print "$string\n" if $string =~ /^[a-zA-Z0-9]*$/; # no • contains only digits? print "$string\n" if $string =~ /^\d*$/; # no
Regex Operators & Quoting • use the binding operators to test for a match • string =~ regex # true if it matches • string !~ regex # true if it doesn't match • regexes may be quoted in several ways • default slashes /regex/ • other delimiters may be used with the match operator m print "$string\n" if $string =~ m(bird); print "$string\n" if $string !~ m|cat|; • if the regex contains a slash /, use another delimiter with m
Regex Language • sets of characters to match are enclosed in [] print "$string\n" if $string =~ /[bf]ox/; # box or fox • exclude characters from a set with ^ print "$string\n" if $string =~ m/fo[^g]/; # fox matches • predefined character sets (there are others) • \s white-space • \S anything but white-space • \d digits • \D anything but digits • \w word characters: letters, digits, underscore • \W anything but word characters • . anything except newline print "$string\n" if $string =~ m/\s\w.x\s/; # matches " fox " • if you need to match . (dot), escape it: \.
Regex Boundaries • start of the string/line ^ • end of the string/line $ • word boundary \b my $string = "Did the fox jump over the dog?"; print "$string\n" if $string =~ /^[^D]/; # no print "$string\n" if $string =~ /\?$/; # yes print "$string\n" if $string =~ /\bfox\b/; # yes • if you need to match ^ or $, escape it: \^, \$
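A sketch of boundaries, an escaped `$`, and a capturing group (the "match subpatterns" item from the regex overview). The strings "foxglove" and 'cost: $5' are made-up test inputs.

```perl
use strict;
use warnings;

my $string = "Did the fox jump over the dog?";

my $whole_word = ($string =~ /\bfox\b/)     ? 'yes' : 'no';    # yes
my $in_word    = ("foxglove" =~ /\bfox\b/)  ? 'yes' : 'no';    # no: "fox" is not a whole word here
my $starts     = ($string =~ /^Did/)        ? 'yes' : 'no';    # yes
my $dollar     = ('cost: $5' =~ /\$\d/)     ? 'yes' : 'no';    # escaped $ matches a literal dollar sign

# Parentheses capture a subpattern; in list context the match returns the captures.
my ($animal) = $string =~ /the (\w+) jump/;    # 'fox'

print "$whole_word $in_word $starts $dollar $animal\n";    # prints "yes no yes yes fox"
```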