
Perl Tutorial


cooper


  1. Practical extraction and report language Perl Tutorial http://www.comp.leeds.ac.uk/Perl/start.html

  2. Why Perl? • Perl is built around regular expressions • REs are good for string processing • Therefore Perl is a good scripting language • Perl is especially popular for CGI scripts • Perl makes full use of the power of UNIX • Short Perl programs can be very short • “Perl is designed to make the easy jobs easy, without making the difficult jobs impossible.” -- Larry Wall, Programming Perl

  3. Why not Perl? • Perl is very UNIX-oriented • Perl is available on other platforms... • ...but isn’t always fully implemented there • However, Perl is often the best way to get some UNIX capabilities on less capable platforms • Perl does not scale well to large programs • Weak subroutines, heavy use of global variables • Perl’s syntax is not particularly appealing

  4. What is a scripting language? • Operating systems can do many things • copy, move, create, delete, compare files • execute programs, including compilers • schedule activities, monitor processes, etc. • A command-line interface gives you access to these functions, but only one at a time • A scripting language is a “wrapper” language that integrates OS functions

  5. Major scripting languages • UNIX has sh, Perl • Macintosh has AppleScript, Frontier • Windows has no major scripting languages • probably due to the weaknesses of DOS • Generic scripting languages include: • Perl (most popular) • Tcl (easiest for beginners) • Python (new, Java-like, best for large programs)

  6. Perl Example 1
       #!/usr/local/bin/perl
       #
       # Program to do the obvious
       #
       print 'Hello world.';   # Print a message

  7. Comments on “Hello, World” • Comments run from # to the end of the line • But the first line, #!/usr/local/bin/perl, tells the system where to find the Perl interpreter • Perl statements end with semicolons • Perl is case-sensitive • Perl is compiled and run in a single operation

  8. Variables • A variable is a name of a place where some information is stored. For example: • $yearOfBirth = 1976; • $currentYear = 2000; • $age = $currentYear-$yearOfBirth; • print $age; • Same name can store strings: • $yearOfBirth = 'None of your business'; • The variables in the example program can be identified as such because their names start with a dollar ($). Perl uses different prefix characters for structure names in programs. Here is an overview: • $: variable containing scalar values such as a number or a string • @: variable containing a list with numeric keys • %: variable containing a list with strings as keys • &: subroutine

  9. Operations on numbers • Perl contains the following arithmetic operators: • +: sum • -: subtraction • *: product • /: division • %: modulo division • **: exponent • Apart from these operators, Perl contains some built-in arithmetic functions. Some of these are mentioned in the following list: • abs($x): absolute value • int($x): integer part • rand(): random number between 0 and 1 • sqrt($x): square root

  10. Test your understanding • $text =~ s/bug/feature/; • $text =~ s/bug/feature/g; • $text =~ tr/A-Z/a-z/; • $text =~ tr/AEIOUaeiou//d; • $text =~ tr/0-9/x/cs; • $text =~ s/[A-Z]/CAPS/g;

  11. Examples • # replace first occurrence of "bug" • $text =~ s/bug/feature/; • # replace all occurrences of "bug" • $text =~ s/bug/feature/g; • # convert to lower case (tr takes plain ranges; square brackets would be treated as literal characters) • $text =~ tr/A-Z/a-z/; • # delete vowels • $text =~ tr/AEIOUaeiou//d; • # replace each sequence of non-digits with a single x • $text =~ tr/0-9/x/cs; • # replace each capital character by CAPS • $text =~ s/[A-Z]/CAPS/g;
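The substitution and transliteration operators above can be tried on a sample string. This is only a minimal sketch: the sample text and the variable names are made up for illustration, and each operator works on its own copy of the text so the results can be compared side by side.

```perl
use strict;
use warnings;

my $text = "Bug 42: a bug is a bug";       # hypothetical sample text

# (my $copy = $text) =~ ... applies the operator to a fresh copy of $text
(my $first = $text) =~ s/bug/feature/;     # replace the first "bug" only
(my $all   = $text) =~ s/bug/feature/g;    # replace every "bug"
(my $lower = $text) =~ tr/A-Z/a-z/;        # convert to lower case
(my $novow = $text) =~ tr/AEIOUaeiou//d;   # delete vowels
(my $mask  = $text) =~ tr/0-9/x/cs;        # each run of non-digits becomes one x

print "$first\n$all\n$lower\n$novow\n$mask\n";
```

Note that "Bug" with a capital B is left alone by s/bug/feature/, because matching is case-sensitive.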

  12. Regular expressions Examples: 1. Clean an HTML formatted text 2. Grab URLs from a Web page 3. Transform all lines from a file into lower case • \b: word boundaries • \d: digits • \n: newline • \r: carriage return • \s: white space characters • \t: tab • \w: word characters (letters, digits and underscore) • ^: beginning of string • $: end of string • .: any character except newline • [bdkp]: characters b, d, k and p • [a-f]: characters a to f • [^a-f]: all characters except a to f • abc|def: string abc or string def • *: zero or more times • +: one or more times • ?: zero or one time • {p,q}: at least p times and at most q times • {p,}: at least p times • {p}: exactly p times

  13. Lists and arrays • @a = (); # empty list • @b = (1,2,3); # three numbers • @c = ("Jan","Piet","Marie"); # three strings • @d = ("Dirk",1.92,46,"20-03-1977"); # a mixed list • Variables and sublists are interpolated in a list • @b = ($a,$a+1,$a+2); # variable interpolation • @c = ("Jan",("Piet","Marie")); # list interpolation • @d = ("Dirk",1.92,46,(),"20-03-1977"); # empty list • # don’t get lists containing lists – just a simple list • @e = ( @b, @c ); # same as (1,2,3,"Jan","Piet","Marie")

  14. Lists and arrays • Practical construction operators • ($x..$y) • @x = (1..6); # same as (1, 2, 3, 4, 5, 6) • @z = (2..5,8,11..13); # same as (2,3,4,5,8,11,12,13) • qw() "quote word" function • qw(Jan Piet Marie) is a shorter notation for ("Jan","Piet","Marie").

  15. Split • It takes a regular expression and a string, and splits the string into a list, breaking it into pieces at places where the regular expression matches.
       $string = "Jan Piet\nMarie \tDirk";
       @list = split /\s+/, $string;        # yields ( "Jan","Piet","Marie","Dirk" )
                                            # remember \s is a white space
       $string = " Jan Piet\nMarie \tDirk\n";
       @list = split /\s+/, $string;        # yields ( "", "Jan","Piet","Marie","Dirk" )
                                            # empty string at the beginning!!!
                                            # (trailing empty fields are dropped)
       $string = "Jan:Piet;Marie---Dirk";   # use any regular expression...
       @list = split /[:;]|---/, $string;   # yields ( "Jan","Piet","Marie","Dirk" )
       $string = "Jan Piet";                # use an empty regular expression to split into characters
       @letters = split //, $string;        # yields ( "J","a","n"," ","P","i","e","t" )
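How split treats empty fields at the edges of a string is easy to get wrong, so here is a small sketch that counts the fields produced in three ways. The variable names are illustrative. The ' ' pattern and the negative limit are standard split features that the slide does not cover.

```perl
use strict;
use warnings;

my $string = " Jan Piet\nMarie \tDirk\n";

my @fields = split /\s+/, $string;       # leading empty field kept, trailing ones dropped
my @words  = split ' ',   $string;       # special case ' ': leading white space is skipped
my @keep   = split /\s+/, $string, -1;   # negative limit: trailing empty fields are kept

print scalar(@fields), " ", scalar(@words), " ", scalar(@keep), "\n";   # prints "5 4 6"
```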

  16. More about arrays • @array = ("an","bert","cindy","dirk"); • $length = @array; # $length now has the value 4 • print $length; # prints 4 • print $#array; # prints 3, last valid subscript • print $array[$#array]; # prints "dirk" • print scalar(@array); # prints 4

  17. Working with lists • Subscripts convert lists to strings
       @array = ("an","bert","cindy","dirk");
       print "The array contains $array[0] $array[1] $array[2] $array[3]";
       # interpolate
       print "The array contains @array";
     • Function join STRING LIST
       $string = join ":", @array;   # $string now has the value "an:bert:cindy:dirk"
     • Iteration over lists
       for ( $i=0 ; $i<=$#array; $i++ ) {
           $item = $array[$i];       # $item is a copy, so @array is not changed
           $item =~ tr/a-z/A-Z/;
           print "$item ";
       }
       foreach $item (@array) {
           $item =~ tr/a-z/A-Z/;     # $item is an alias here, so @array itself is changed
           print "$item ";           # prints an upper-case version of each item
       }

  18. More about arrays – multiple value assignments • ($a, $b) = ("one","two"); • ($onething, @manythings) = (1,2,3,4,5,6); • # now $onething equals 1 • # and @manythings = (2,3,4,5,6) • ($array[0],$array[1]) = ($array[1],$array[0]); • # swap the first two • Pay attention to the fact that assignment to a variable first evaluates the right-hand side of the expression, and then makes a copy of the result • @array = ("an","bert","cindy","dirk"); • @copyarray = @array; # copies every element (any references stored in the array would still be shared) • $copyarray[2] = "XXXXX"; # @array is unchanged

  19. Manipulating lists and their elements PUSH • push ARRAY LIST • appends the list to the end of the array. • if the second argument is a scalar rather than a list, it appends it as the last item of the array. • @array = ("an","bert","cindy","dirk"); • @brray = ("eve","frank"); • push @array, @brray; • # @array is ("an","bert","cindy","dirk","eve","frank") • push @brray, "gerben"; • # @brray is ("eve","frank","gerben")

  20. Manipulating lists and their elements POP • pop ARRAY does the opposite of push. It removes the last item of its argument list and returns it. • If the list is empty it returns undef. • @array = ("an","bert","cindy","dirk"); • $item = pop @array; • # $item is "dirk" and @array is ( "an","bert","cindy") • shift @array removes the first element - works on the left end of the list, but is otherwise the same as pop. • unshift (@array, @newStuff) puts stuff on the left side of the list, just as push does for the right side.
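The four operations (push, pop, shift, unshift) can be seen together in one short sketch. The array name @queue and its contents are made up for illustration.

```perl
use strict;
use warnings;

my @queue = ("bert", "cindy");

push    @queue, "dirk";       # add at the right end
unshift @queue, "an";         # add at the left end
my $last  = pop   @queue;     # remove from the right end
my $first = shift @queue;     # remove from the left end

print "$first $last (@queue)\n";   # prints "an dirk (bert cindy)"
```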

  21. Grep • grep CONDITION LIST • returns a list of all items from list that satisfy some condition. • For example: • @large = grep $_ > 10, (1,2,4,8,16,25); # returns (16,25) • @i_names = grep /i/, @array; # returns ("cindy","dirk")

  22. map • map OPERATION LIST • is an extension of grep, and performs an arbitrary operation on each element of a list. • For example: • @array = ("an","bert","cindy","dirk"); • @more = map $_ + 3, (1,2,4,8,16,25); • # returns (4,5,7,11,19,28) • @initials = map substr($_,0,1), @array; • # returns ("a","b","c","d")
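grep and map are often chained: grep selects elements and map transforms the ones that remain. The following is a minimal sketch using the block form ({ ... }) rather than the expression form shown on the slides; both forms are standard Perl. The variable names are illustrative.

```perl
use strict;
use warnings;

my @array = ("an", "bert", "cindy", "dirk");

my @long  = grep { length($_) > 2 } @array;   # keep names longer than two characters
my @upper = map  { uc($_) } @long;            # upper-case each remaining name

print "@upper\n";   # prints "BERT CINDY DIRK"
```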

  23. Hashes (Associative Arrays) • associate keys with values – named with % • allow almost instantaneous lookup of the value that is associated with some particular key • Examples: if %wordfrequency is the hash table,
       $wordfrequency{"the"} = 12731;   # creates key "the", value 12731
       $phonenumber{"An De Wilde"} = "+31-20-6777871";
       $index{$word} = $nwords;
       $occurrences{$a}++;              # if this is the first reference,
                                        # the value associated with $a will
                                        # be increased from 0 to 1
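The ++ idiom above is the usual way to count occurrences. As a rough sketch, assuming a made-up input sentence, a word-frequency table could be built like this:

```perl
use strict;
use warnings;

my $text = "the cat and the dog and the bird";   # hypothetical input
my %wordfrequency;

# A key seen for the first time starts out undefined, which ++ treats as 0
$wordfrequency{$_}++ for split ' ', $text;

for my $word (sort keys %wordfrequency) {
    print "$word: $wordfrequency{$word}\n";
}
```

The keys are sorted before printing because a hash has no guaranteed order.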

  24. Hash Operations • %birthdays = ("An","25-02-1975","Bert","12-10-1953","Cindy","23-05-1969","Dirk","01-04-1961"); • # fill the hash • %birthdays = (An => "25-02-1975", Bert => "12-10-1953", Cindy => "23-05-1969", Dirk => "01-04-1961" ); • # fill the hash; the same as above, but more explicit • @list = %birthdays; # make a list of the key/value pairs • %copy_of_bdays = %birthdays; # copy a hash

  25. Hashes (What if not there?) • Existing, defined and true are three different tests. • If a key does not exist in the hash, accessing it returns the undef value. • The special test function exists(HASHENTRY) returns true if the key exists in the hash, even if its value is undef • if($hash{$key}){...} is false when the key is missing or its value is undef, 0, "0" or the empty string • if(defined($hash{$key})){...} is false only when the key is missing or its value is undef • print "Exists\n" if exists $hash{$key};
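The difference between the three tests shows up most clearly on values such as 0, the empty string and undef. The sketch below uses a made-up hash and records the outcome of each test per key; the names %hash and %report are illustrative only.

```perl
use strict;
use warnings;

my %hash = (zero => 0, empty => "", none => undef, name => "An");
my %report;

for my $key (qw(zero empty none name missing)) {
    my $exists  = exists  $hash{$key} ? "yes" : "no";   # is the key present at all?
    my $defined = defined $hash{$key} ? "yes" : "no";   # is the value something other than undef?
    my $true    = $hash{$key}         ? "yes" : "no";   # is the value true?
    $report{$key} = "exists=$exists defined=$defined true=$true";
    print "$key: $report{$key}\n";
}
```

Only "name" passes all three tests, and only "missing" fails exists.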

  26. Perl Example 2
       #!/usr/bin/perl
       # Remove blank lines from a file
       # Usage: singlespace < oldfile > newfile
       while ($line = <STDIN>) {
           if ($line eq "\n") { next; }
           print "$line";
       }

  27. More Perl notes • On the UNIX command line: • < filename means to get input from this file • > filename means to send output to this file • In Perl, <STDIN> reads a line from standard input; STDOUT is the standard output filehandle, which print uses by default • Scalar variables start with $ • Scalar variables hold strings or numbers, and they are interchangeable • Examples: • $priority = 9; • $priority = '9'; • Array variables start with @

  28. Perl Example 3
       #!/usr/local/bin/perl
       # Usage: fixm <filenames>
       # Replace \r with \n -- replaces input files
       foreach $file (@ARGV) {
           print "Processing $file\n";
           if (-e "fixm_temp") { die "*** File fixm_temp already exists!\n"; }
           if (! -e $file) { die "*** No such file: $file!\n"; }
           open DOIT, "| tr \'\\015' \'\\012' < $file > fixm_temp"
               or die "*** Can't: tr '\\015' '\\012' < $file > fixm_temp\n";
           close DOIT;
           open DOIT, "| mv -f fixm_temp $file"
               or die "*** Can't: mv -f fixm_temp $file\n";
           close DOIT;
       }

  29. Comments on example 3 • In # Usage: fixm <filenames>, the angle brackets just mean to supply a list of file names here • In UNIX text editors, the \r (carriage return) character usually shows up as ^M (hence the name fixm_temp) • The UNIX command tr '\015' '\012' replaces all \015 characters (\r) with \012 (\n) characters • The format of the open and close commands is: • open fileHandle, fileName • close fileHandle • "| tr \'\\015' \'\\012' < $file > fixm_temp" says: start the tr command (the leading | opens a pipe to it), take its input from $file, and put the output in fixm_temp

  30. Arithmetic in Perl
       $a = 1 + 2;    # Add 1 and 2 and store in $a
       $a = 3 - 4;    # Subtract 4 from 3 and store in $a
       $a = 5 * 6;    # Multiply 5 and 6
       $a = 7 / 8;    # Divide 7 by 8 to give 0.875
       $a = 9 ** 10;  # Nine to the power of 10, that is, 3486784401
       $a = 5 % 2;    # Remainder of 5 divided by 2
       ++$a;          # Increment $a and then return it
       $a++;          # Return $a and then increment it
       --$a;          # Decrement $a and then return it
       $a--;          # Return $a and then decrement it

  31. String and assignment operators
       $a = $b . $c;  # Concatenate $b and $c
       $a = $b x $c;  # $b repeated $c times
       $a = $b;       # Assign $b to $a
       $a += $b;      # Add $b to $a
       $a -= $b;      # Subtract $b from $a
       $a .= $b;      # Append $b onto $a

  32. Single and double quotes • $a = 'apples'; • $b = 'bananas'; • print $a . ' and ' . $b; • prints: apples and bananas • print '$a and $b'; • prints: $a and $b • print "$a and $b"; • prints: apples and bananas

  33. Arrays • @food = ("apples", "bananas", "cherries"); • But a single element is a scalar, so it is written with $: • print $food[1]; • prints "bananas" • @morefood = ("meat", @food); • # @morefood is now ("meat", "apples", "bananas", "cherries") • ($a, $b, $c) = (5, 10, 20);

  34. push and pop • push adds one or more things to the end of a list • push (@food, "eggs", "bread"); • push returns the new length of the list • pop removes and returns the last element • $sandwich = pop(@food); • $len = @food; # $len gets length of @food • $#food # returns index of last element

  35. foreach
       # Visit each item in turn and call it $morsel
       foreach $morsel (@food) {
           print "$morsel\n";
           print "Yum yum\n";
       }

  36. Tests • “Zero” is false. This includes: 0, '0', "0", '', "" and undef • Anything not false is true • Use == and != for numbers, eq and ne for strings • &&, ||, and ! are and, or, and not, respectively.
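The truth rule has a few edge cases worth checking by hand: only the exact string '0' counts as a false "zero" string. The sketch below filters a made-up list of candidate values and keeps the ones Perl treats as true.

```perl
use strict;
use warnings;

# Candidate values (chosen for illustration): which ones does Perl treat as true?
my @values = (0, '0', '', '0.0', '00', ' ', 'a');
my @true   = grep { $_ } @values;

print "true: '$_'\n" for @true;   # lists '0.0', '00', ' ' and 'a'
```

The strings '0.0' and '00' are true even though they are numerically equal to zero.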

  37. for loops • for loops are just as in C or Java • for ($i = 0; $i < 10; ++$i){ print "$i\n";}

  38. while loops
       #!/usr/local/bin/perl
       print "Password? ";
       $a = <STDIN>;
       chop $a;   # Remove the newline at end
       while ($a ne "fred") {
           print "sorry. Again? ";
           $a = <STDIN>;
           chop $a;
       }

  39. do..while and do..until loops
       #!/usr/local/bin/perl
       do {
           print "Password? ";
           $a = <STDIN>;
           chop $a;
       } while ($a ne "fred");

  40. if statements
       if ($a) {
           print "The string is not empty\n";
       } else {
           print "The string is empty\n";
       }

  41. if - elsif statements
       if (!$a) {
           print "The string is empty\n";
       } elsif (length($a) == 1) {
           print "The string has one character\n";
       } elsif (length($a) == 2) {
           print "The string has two characters\n";
       } else {
           print "The string has many characters\n";
       }

  42. Why Perl? • Two factors make Perl important: • Pattern matching/string manipulation • Based on regular expressions (REs) • REs are similar in power to those in Formal Languages… • …but have many convenience features • Ability to execute UNIX commands • Less useful outside a UNIX environment

  43. Basic pattern matching • $sentence =~ /the/ • True if $sentence contains "the" • $sentence = "The dog bites."; • if ($sentence =~ /the/) # is false • …because Perl is case-sensitive • !~ is "does not contain"
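The case-sensitivity point can be checked directly. In this sketch the /i modifier (a standard Perl feature that the slides do not mention) is added for comparison; the variable names are illustrative.

```perl
use strict;
use warnings;

my $sentence = "The dog bites.";

my $plain  = ($sentence =~ /the/)  ? 1 : 0;   # 0: "The" is not "the"
my $nocase = ($sentence =~ /the/i) ? 1 : 0;   # 1: /i ignores case
my $no_cat = ($sentence !~ /cat/)  ? 1 : 0;   # 1: the sentence does not contain "cat"

print "$plain $nocase $no_cat\n";   # prints "0 1 1"
```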

  44. RE special characters
       .   # Any single character except a newline
       ^   # The beginning of the line or string
       $   # The end of the line or string
       *   # Zero or more of the last character
       +   # One or more of the last character
       ?   # Zero or one of the last character

  45. RE examples
       ^.*$      # matches the entire string
       hi.*bye   # matches from "hi" to "bye" inclusive
       x +y      # matches x, one or more blanks, and y
       ^Dear     # matches "Dear" only at beginning
       bags?     # matches "bag" or "bags"
       hiss+     # matches "hiss", "hisss", "hissss", etc.

  46. Square brackets
       [qjk]      # Either q or j or k
       [^qjk]     # Neither q nor j nor k
       [a-z]      # Anything from a to z inclusive
       [^a-z]     # No lower case letters
       [a-zA-Z]   # Any letter
       [a-z]+     # Any non-zero sequence of
                  # lower case letters

  47. More examples
       [aeiou]+        # matches one or more vowels
       [^aeiou]+       # matches one or more nonvowels
       [0-9]+          # matches an unsigned integer
       [0-9A-F]        # matches a single hex digit
       [a-zA-Z]        # matches any letter
       [a-zA-Z0-9_]+   # matches identifiers

  48. More special characters
       \n   # A newline
       \t   # A tab
       \w   # Any alphanumeric; same as [a-zA-Z0-9_]
       \W   # Any non-word char; same as [^a-zA-Z0-9_]
       \d   # Any digit. The same as [0-9]
       \D   # Any non-digit. The same as [^0-9]
       \s   # Any whitespace character
       \S   # Any non-whitespace character
       \b   # A word boundary, outside [] only
       \B   # No word boundary

  49. Quoting special characters
       \|   # Vertical bar
       \[   # An open square bracket
       \)   # A closing parenthesis
       \*   # An asterisk
       \^   # A caret symbol
       \/   # A slash
       \\   # A backslash

  50. Alternatives and parentheses
       jelly|cream   # Either jelly or cream
       (eg|le)gs     # Either eggs or legs
       (da)+         # Either da or dada or
                     # dadada or...
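The grouping patterns above can be tested against a few words. This is a sketch only: the word list is made up, and the ^ and $ anchors are added so that each pattern must match the whole word rather than just part of it.

```perl
use strict;
use warnings;

my %kind;
for my $word (qw(eggs legs pegs da dada dad)) {
    $kind{$word} = $word =~ /^(eg|le)gs$/ ? "eggs-or-legs"   # alternation inside a group
                 : $word =~ /^(da)+$/     ? "da-repeated"    # a group repeated one or more times
                 :                          "no match";
    print "$word: $kind{$word}\n";
}
```

Without the anchors, /(da)+/ would also match "dad", since the pattern only needs to be found somewhere in the string.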
