
An Introduction to Perl

An Introduction to Perl. Sources and inspirations: http://www.cs.utk.edu/~plank/plank/classes/cs494/494/notes/Perl/lecture.html Randal L. Schwartz and Tom Christiansen, “Learning Perl” 2nd ed., O’Reilly Randal L. Schwartz and Tom Phoenix, “Learning Perl” 3rd ed., O’Reilly


Presentation Transcript


  1. An Introduction to Perl Sources and inspirations: http://www.cs.utk.edu/~plank/plank/classes/cs494/494/notes/Perl/lecture.html Randal L. Schwartz and Tom Christiansen, “Learning Perl” 2nd ed., O’Reilly; Randal L. Schwartz and Tom Phoenix, “Learning Perl” 3rd ed., O’Reilly. Dr. Nathalie Japkowicz, Dr. Alan Williams. Go O'Reilly! CSI 3125, Perl

  2. Perl overview (1) • Perl = Practical Extraction and Report Language • Perl = Pathologically Eclectic Rubbish Lister • It is a powerful general-purpose language, particularly useful for writing “quick and dirty” programs. • Invented by Larry Wall, with no apologies for its lack of elegance (!). • If you know C and a fair bit of Unix (or Linux), you can learn Perl in days (well, some of it...).

  3. Perl overview (2) • In the hierarchy of programming languages, Perl sits halfway between high-level languages such as Pascal, C and C++, and shell scripts (languages that add control structures to Unix command-line instructions) such as sh, sed and awk. • By the way: • awk = Aho, Weinberger, Kernighan • sed = Stream Editor.

  4. Advantages of Perl (1) • Perl combines the best (according to its admirers) features of: Unix/Linux shell programming; the commands sed, grep, awk and tr; C; Cobol. • Shell scripts are usually written as many small files that refer to each other. Perl achieves the functionality of such scripts in a single program file.

  5. Advantages of Perl (2) • Perl offers extremely strong regular expression capabilities, which allow fast, flexible and reliable string handling operations, especially pattern matching. As a result, Perl works particularly well in text processing applications. • As a matter of fact, it is Perl that allowed a lot of text documents to be quickly moved to the HTML format in the early 1990s, allowing the Web to expand so rapidly.

  6. Disadvantages of Perl • Perl is a jumble! It contains many, many features from many languages and tools. • It contains different constructs for the same functionality (for example, there are at least 5 ways to perform a one-line if statement). It is not a very readable language. • You cannot distribute a Perl program as an opaque binary. That is, you cannot really commercialize products you develop in Perl.
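The remark about several constructs for a one-line if can be made concrete. The following is a minimal sketch, not taken from the slides; the variable names `$n` and `@out` are made up for illustration. Each of the five statements performs the same conditional action.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $n = 5;      # hypothetical test value
my @out;        # records which forms fired

if ($n > 3) { push(@out, "block if") }            # ordinary if with a block
push(@out, "if modifier") if $n > 3;              # statement modifier
push(@out, "unless modifier") unless $n <= 3;     # negated modifier
$n > 3 and push(@out, "and");                     # low-precedence and
$n > 3 && push(@out, "&&");                       # C-style short circuit

print "$_\n" foreach @out;
```

All five lines add an entry to `@out`, so the script prints five labels. Which form reads best is a matter of taste, which is the slide's point.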

  7. Perl resources and versions • http://www.perl.org tells you everything that you want to know about Perl. • What you will see here is Perl 5. • Perl 5.8.0 was released in July 2002. • Perl 6 (http://dev.perl.org/perl6/) is the next version, still under development, but moving along nicely. The first book on Perl 6 is in stores (http://www.oreilly.com/catalog/perl6es).

  8. Scalar data: strings and numbers
     Scalars need not be defined or have their types declared: Perl understands from context.
         % cat hellos.pl
         #!/usr/bin/perl -w
         print "Hello" . " " . "world\n";
         print "hi there " . 2 . " worlds!" . "\n";
         print (("5" + 6) . " eggs\n" . " in " . " 3 + 2 = " . ("3" + "2") . " baskets\n" );
         % hellos.pl                     <- invoke Perl
         Hello world
         hi there 2 worlds!
         11 eggs
          in  3 + 2 = 5 baskets

  9. Scalar variables
     Scalar variable names start with a dollar sign. They do not have to be declared.
         % cat scalar.pl
         #!/usr/bin/perl -w
         $i = 1;
         $j = "2";
         print "$i and $j \n";
         $k = $i + $j;
         print "$k\n";
         print $i . $j . "\n";
         print '$k\n' . "\n";
         % scalar.pl
         1 and 2
         3
         12
         $k\n

  10. Quotes and substitution
      Suppose $x = 3.
      Single-quotes ' ' allow no substitution except for the escape sequences \\ and \'.
          print('$x\n');            gives $x\n and no new line.
      Double-quotes " " allow substitution of variables like $x and control codes like \n (newline).
          print("$x\n");            gives 3 (and a new line).
      Back-quotes ` ` also allow substitution, then try to execute the result as a system command, returning as the final value whatever the system command outputs.
          $y = `date`; print($y);   results in Sun Aug 10 07:04:17 EDT 2003
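The difference between single and double quotes can be checked directly. This is a minimal sketch under the slide's assumption that `$x` is 3; the variable names `$single`, `$double` and `$escaped` are illustrative. Back-quotes are left out because their output depends on the system the script runs on.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = 3;

my $single  = '$x\n';            # no substitution: the four characters $ x \ n
my $double  = "$x\n";            # substitution: the digit 3 plus a newline
my $escaped = 'it\'s \\ here';   # \' and \\ are the only escapes in single quotes

print $single, "\n";             # prints $x\n literally
print $double;                   # prints 3 and ends the line
print $escaped, "\n";            # prints it's \ here
print length($single), " ", length($double), "\n";
```

The last line prints the two lengths, 4 and 2, which makes the effect of substitution visible even when the strings look alike in source form.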

  11. Control statements: if, else, elsif
          % cat names.pl
          #!/usr/bin/perl -w
          $name = <STDIN>;        # standard input
          chomp($name);           # cut newline
          if ($name gt 'fred') {
              print "'$name' follows 'fred'\n";
          } elsif ($name eq 'fred') {
              print "both names are 'fred'\n";
          } else {
              print "'$name' precedes 'fred'\n";
          }
          % names.pl
          stan
          'stan' follows 'fred'
          % names.pl
          Stan                        <- my input
          'Stan' precedes 'fred'      <- Perl's output

  12. Control statements: loops (1)
          % cat oddsum_while.pl
          #!/usr/bin/perl -w
          # Add up some odd numbers
          $max = <STDIN>;
          $n = 1;
          while ($n < $max) {
              $sum += $n;
              $n += 2;            # On to the next odd number
          }
          print "The total is $sum.\n";
          % oddsum_while.pl
          10                                                                 <- my input
          Use of uninitialized value at oddnums.pl line 6, <STDIN> chunk 1.  <- a warning
          The total is 25.                                                   <- Perl's output

  13. Control statements: loops (2) • End-line comments begin with # • It is okay, though not nice, to use a variable without initialization (like $sum). Such a variable is initialized to 0 if it is first used as a number or to the empty string "" if it is first used as a string. In fact, it is always undef, variously converted. • Perl can, if asked, issue a warning (use the -w flag). • Of course, while is only one of many looping constructs in Perl. Read on...
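The statement that an unset variable is always undef, "variously converted", can be sketched as follows. The names `$n`, `$as_number` and `$as_string` are illustrative, and the `no warnings 'uninitialized'` line is an addition here that only keeps the sketch quiet when warnings are on.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $n;                               # declared but never assigned: undef
print defined($n) ? "defined\n" : "undef\n";

my ($as_number, $as_string);
{
    no warnings 'uninitialized';     # suppress the warning for this block only
    $as_number = $n + 0;             # undef used as a number gives 0
    $as_string = $n . "";            # undef used as a string gives ""
}

print "as number: $as_number\n";
print "as string: '$as_string'\n";
```

The variable itself stays undef throughout; only the values computed from it are 0 and the empty string.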

  14. Control statements: loops (3)
          % cat oddsum_until.pl
          #!/usr/bin/perl -w
          # Add up some odd numbers
          $max = <STDIN>;
          $n = 1;
          $sum = 0;
          until ($n >= $max) {
              $sum += $n;
              $n += 2;            # On to the next odd number
          }
          print "The total is $sum.\n";
          % oddsum_until.pl
          10
          The total is 25.

  15. Control statements: loops (4)
          % cat oddsum_for.pl
          #!/usr/bin/perl -w
          # Add up some odd numbers
          $max = <STDIN>;
          $sum = 0;
          for ($n = 1 ; $n < $max ; $n += 2) {
              $sum += $n;
          }
          print "The total is $sum.\n";
          % oddsum_for.pl
          10
          The total is 25.
      We also have do-while and do-until, and we have foreach. Read on.

  16. Control statements: loops (5)
          % cat oddsum_foreach.pl
          #!/usr/bin/perl -w
          # Add up some odd numbers
          $max = <STDIN>;
          $sum = 0;
          foreach $n ( (1 .. $max) ) {
              if ( $n % 2 != 0 ) { $sum += $n; }
          }
          print "The total is $sum.\n";
          % oddsum_foreach.pl
          10
          The total is 25.
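Slide 15 mentions do-while and do-until without showing them. A minimal sketch of both, wrapped in two hypothetical subroutines (`odd_sum_do_while` and `odd_sum_do_until`, names invented here) so that the limit is passed in rather than read from standard input:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sum the odd numbers below $max; the test comes after the body.
sub odd_sum_do_while {
    my ($max) = @_;
    my ($n, $sum) = (1, 0);
    do {
        $sum += $n;
        $n += 2;
    } while ($n < $max);
    return $sum;
}

# The same loop with the condition negated.
sub odd_sum_do_until {
    my ($max) = @_;
    my ($n, $sum) = (1, 0);
    do {
        $sum += $n;
        $n += 2;
    } until ($n >= $max);
    return $sum;
}

print "The total is ", odd_sum_do_while(10), ".\n";
print "The total is ", odd_sum_do_until(10), ".\n";
```

Because the condition is checked after the body, both forms run the body at least once. With a limit of 1 they return 1, whereas the while and until loops of slides 12 and 14 would leave the sum at 0.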

  17. Control constructs compared

  18. Lists and arrays • A list is an ordered collection of scalars. An array is a variable that contains a list. • Each element is an independent scalar value. A list can hold numbers, strings, undef values—any mixture of kinds of scalar values. • To use an array element, prefix the array name with a $; place a subscript in square brackets. • To access the whole array, prefix its name with a @. • You can copy an array into another. You can use the operators sort, reverse, push, pop, split.
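The array operations listed above can be tried in a few lines. This is a minimal sketch with made-up data; all variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @words = split(/\s+/, "pear apple fig");   # ("pear", "apple", "fig")
push(@words, "kiwi");                          # add an element at the end
my $last = pop(@words);                        # remove it again: "kiwi"

my @sorted   = sort @words;                    # string order
my @reversed = reverse @sorted;                # opposite order
my @copy     = @words;                         # copying an array
$copy[0]     = "plum";                         # changes the copy only
my $count    = @words;                         # array in scalar context: its length

print "first element: $words[0]\n";            # $ and [ ] select one element
print "whole array:   @words\n";               # @ names the whole array
print "sorted:        @sorted\n";
print "reversed:      @reversed\n";
print "count:         $count\n";
```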

  19. Command-line arguments Suppose that a Perl program stored in the file cleanUp is invoked in Unix/Linux with the command: cleanUp -o result.htm data.htm The built-in list named @ARGV then contains three elements: ('-o', 'result.htm', 'data.htm') These three elements can be accessed as: $ARGV[0] $ARGV[1] $ARGV[2]
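One way a program such as cleanUp might walk through @ARGV is sketched below. The helper `parse_args` is hypothetical and not part of the slides; when the script is started without arguments it falls back to the slide's example list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the name given after -o, then the other arguments.
sub parse_args {
    my @args = @_;
    my $output;
    my @inputs;
    while (@args) {
        my $arg = shift @args;
        if ($arg eq '-o') {
            $output = shift @args;     # the word after -o names the output file
        } else {
            push @inputs, $arg;        # anything else is treated as an input file
        }
    }
    return ($output, @inputs);
}

# Use the real command line if there is one, otherwise the slide's example.
my @argv = @ARGV ? @ARGV : ('-o', 'result.htm', 'data.htm');
my ($output, @inputs) = parse_args(@argv);

print "output: ", (defined $output ? $output : "(none)"), "\n";
print "inputs: @inputs\n";
```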

  20. Array examples (1)
          % cat arraysort.pl
          #!/usr/bin/perl -w
          $i = 0;
          while ($k = <STDIN>) {
              $a[$i++] = $k;
          }
          print "===== sorted =====\n";
          print sort(@a);
          % arraysort.pl
          Nathalie
          Frank
          hello
          John
          Zebra
          notary
          nil
                                  <- control-D here
          ===== sorted =====
          Frank
          John
          Nathalie
          Zebra
          hello
          nil
          notary

  21. Array examples (2A)
      Reversing a text file (whole lines).
          % cat whole_rev.pl
          #!/usr/bin/perl -w
          while ($k = <STDIN>) {
              push(@a, $k);
          }
          print "== reversed ==\n";
          while ($oldval = pop(@a)) {
              print $oldval;
          }
          % whole_rev.pl
          a b c d
          e f
          g h i
                                  <- control-D here
          == reversed ==
          g h i
          e f
          a b c d

  22. Array examples (2B)
      Reversing each line in a text file
          % cat each_rev.pl
          #!/usr/bin/perl -w
          while ($k = <STDIN>) {
              @a = split(/\s+/, $k);
              $s = "";
              for ($i = @a; $i > 0; $i--) {
                  $s = "$s$a[$i-1] ";
              }
              chop($s);
              print "$s\n"
          }
          % each_rev.pl
          a bc d efg
          efg d bc a              <- output
          hi j
          j hi
          klm nopq st
          st nopq klm
                                  <- control-D
      split cuts the line on white space (we will see regular expressions soon)

  23. Array examples (3)
      Reversing a text file (whole lines)
          print reverse(<STDIN>);
      Reversing each line in a text file
          while ($k = <STDIN>) {
              $s = "";
              foreach $i (reverse(split(/\s+/, $k))) {
                  $s = "$s$i ";
              }
              chop($s);
              print "$s\n";
          }

  24. A digression: Perl's favourite default variable
      By default, Perl reads into $_
          while (<STDIN>) {
              $s = "";
              foreach $i (reverse(split(/\s+/, $_))) {
                  $s = "$s$i ";
              }
              chop($s);
              print "$s\n";
          }
      By default, Perl splits $_ too!
          while (<STDIN>) {
              $s = "";
              foreach $i (reverse(split(/\s+/))) {
                  $s = "$s$i ";
              }
              chop($s);
              print "$s\n";
          }
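The same default applies to foreach and to split outside an input loop. A minimal sketch, assuming the lines are passed to a hypothetical subroutine `reverse_words` instead of being read from standard input, and using join in place of the concatenate-and-chop idiom of the slides:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse the words of each line given as an argument.
sub reverse_words {
    my @out;
    foreach (@_) {                                    # no loop variable: each line lands in $_
        push @out, join(" ", reverse(split(/\s+/)));  # split with no string argument splits $_
    }
    return @out;
}

print "$_\n" foreach reverse_words("a bc d efg", "hi j", "klm nopq st");
```

Run on the three input lines of slide 22, this prints the same three reversed lines.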

  25. Hashes • A hash is similar to an array, but instead of numeric subscripts we can use any scalar as a key, and we use curly brackets rather than square brackets. • The official name is associative array (known to be implemented by hashing). • Keys and values can be any scalars; keys are always converted to strings. • To refer to a hash as a whole, prefix its name with a %. • If you assign a hash to an array, it becomes a simple list of alternating keys and values.
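The points above can be sketched in a few lines. The hash `%age` and its contents are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %age = ("fred", 35, "wilma", 33);   # a list of key, value pairs
$age{"barney"} = 34;                   # curly brackets select one element
$age{2.50} = "a numeric key";          # the key is stored as the string "2.5"

my @keys = sort keys %age;             # keys come back as strings
my @flat = %age;                       # hash assigned to an array: keys and values alternate

print "fred is $age{fred}\n";
print "keys: @keys\n";
print "elements in the flat list: ", scalar(@flat), "\n";
print "key 2.5 exists\n" if exists $age{"2.5"};
```

The order of the pairs inside `@flat` is not predictable, which is why the sketch sorts the keys before printing them.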

  26. Hash examples I (1)
          % cat hash_array.pl
          #!/usr/bin/perl -w
          %some_hash = ("foo", 35, "bar", 12.4, 2.5, "hello",
                        "wilma", 1.72e30, "betty", "bye\n");
          @an_array = %some_hash;
          print "@an_array\n========\n";
          foreach $key (keys %some_hash) {
              print "$key: ";
              print delete $some_hash{$key};
              print "\n";
          }

  27. Hash examples I (2)
          % hash_array.pl
          betty bye
           wilma 1.72e+30 foo 35 2.5 hello bar 12.4
          ========
          betty: bye

          wilma: 1.72e+30
          foo: 35
          2.5: hello
          bar: 12.4

  28. Hash examples II
          % cat hash_arrows.pl
          #!/usr/bin/perl -w
          my %hash = ( "a" => 1, "b" => 2, "c" => 3);
          foreach $key (sort keys %hash) {
              $value = $hash{$key};
              print "$key => $value\n";
          }
          % hash_arrows.pl
          a => 1
          b => 2
          c => 3

  29. A brief interlude: the diamond operator
          % cat concat
          #!/usr/bin/perl -w
          while ( <> ) {
              print $_;
          }
          % cat a
          one-a
          two-a
          % cat b
          three-b
          four-b
          five-b
          % concat a b
          one-a
          two-a
          three-b
          four-b
          five-b
          % concat a b >c
          % cat c
          one-a
          two-a
          three-b
          four-b
          five-b
      <> loops over the files listed as command-line arguments; $_ is the current input line

  30. Hash examples III: character frequency count
          % cat frequency.pl
          #!/usr/bin/perl -w
          while (<>) {
              # split $_ into single characters, loop
              foreach $c (split //) {
                  # Increment $count of $c
                  ++$count{$c};
              }
          }
          # end of input, print %count
          for $c (sort keys %count) {
              print "$c\t$count{$c}\n";
          }

  31. Character frequency count (2)
          % frequency.pl
          Nathalie
          Fran
          hello
          John
          rather
          Notary
          F 1
          J 1
          ^D
      Output (the newline and space characters are labelled \n and space here):
          \n      8
          space   2
          1       2
          F       2
          J       2
          N       2
          a       5
          e       3
          h       4
          i       1
          l       3
          n       2
          o       3
          r       4
          t       3
          y       1

  32. Subroutines
      • A subroutine is a user-defined function. The syntax is very simple; so is the semantics.
          #!/usr/bin/perl
          sub max {
              if ( $x > $y ) { $x } else { $y }
          }
          $x = 10; $y = 11;
          print &max . "\n";
      • There are no arguments; the script accesses two global variables. The subroutine call is marked with &. The value returned is that of the last expression evaluated.

  33. Subroutines (2)
      A few housekeeping rules.
      • You can place your definitions anywhere in the file, though it is recommended to have them at the beginning.
      • Perl always uses the latest definition in the file; any preceding one is ignored.
      • Certain elements of the syntax are optional.
      • The & might sometimes be omitted (but it is not a good idea).
      • The return operator may precede a value to be returned (this can be useful):
          if ( $x > $y ) { return $x } else { return $y }
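The optional pieces can be seen side by side in a minimal sketch. It keeps the two global variables of the previous slide, declared with `our` only so that the script also runs under `use strict`; the names `$with_amp` and `$without_amp` are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our ($x, $y);                 # the two globals the subroutine reads

sub max {
    if ( $x > $y ) { return $x } else { return $y }   # explicit return
}

($x, $y) = (10, 11);
my $with_amp    = &max;       # call marked with &
my $without_amp = max();      # & omitted; parentheses mark the call instead

print "$with_amp $without_amp\n";
```

Both calls return 11, so the script prints the same value twice.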

  34. Subroutines (3)
      • Clearly, the use of global variables is much too limited. Subroutines take arguments, and work on them via a predefined list variable @_ or its elements $_[0], $_[1] and so on.
          #!/usr/bin/perl
          sub max {
              if ( $_[0] > $_[1] ) { $_[0] } else { $_[1] }
          }
          print &max ( 12, 13 ) . "\n";

  35. Subroutines (4)
      • $_[0], $_[1] are not fun to work with. We can rename them locally, using the my operator, which creates a sub's private variables. Here, we declare two such variables and right away initialize them.
          #!/usr/bin/perl
          sub max {
              my ( $a, $b ) = @_;
              if ( $a > $b ) { $a } else { $b }
          }
          print &max ( 15, 14 ) . "\n";

  36. Subroutines (5)
      • But: this is not a safe max calculation.
          #!/usr/bin/perl
          sub max {
              my ( $a, $b ) = @_;
              if ( $a > $b ) { $a } else { $b }
          }
          print &max ( 16, 19, 23 ) . "\n";
          print &max ( 26 ) . "\n";
      • This produces 19 (23 gets ignored) and 26 (the second value is undef, that is, 0).

  37. Subroutines (6)
      • We could stop the subroutine if the number of arguments is wrong. The (generally very useful!) operator die does that for us.
          #!/usr/bin/perl
          sub max {
              if ( @_ != 2 ) {
                  die "max needs two arguments: @_\n";
              }
              my ( $a, $b ) = @_;
              if ( $a > $b ) { $a } else { $b }
          }
          print &max ( 16, 19, 23 ) . "\n";
      The script is stopped after printing this:
          max needs two arguments: 16 19 23

  38. Subroutines (7)
      • We can have just a warning, if we use the operator warn instead.
          #!/usr/bin/perl
          sub max {
              if ( @_ != 2 ) {
                  warn "max needs two arguments: @_\n";
              }
              my ( $a, $b ) = @_;
              if ( $a > $b ) { $a } else { $b }
          }
          print &max ( 16, 19, 23 ) . "\n";
      The script prints this:
          max needs two arguments: 16 19 23
          19

  39. Subroutines (8)
      • It is, by the way, not a bad idea to generalize max by allowing it to take any number of arguments.
          #!/usr/bin/perl
          sub max {
              my ( $curr_max ) = shift @_;
              foreach ( @_ ) {
                  if ( $_ > $curr_max ) { $curr_max = $_; }
              }
              $curr_max
          }
          print &max ( 15, 14 ) . "\n";
          print &max ( 16, 19, 23 ) . "\n";
          print &max ( 26 ) . "\n";

  40. Subroutines (9)
      • This even works for empty lists.
          #!/usr/bin/perl
          sub max {
              my ( $curr_max ) = shift @_;
              foreach ( @_ ) {
                  if ( $_ > $curr_max ) { $curr_max = $_; }
              }
              $curr_max
          }
          $z = &max ( );
          if ( defined $z ) { print $z . "\n"; }
          else { print "undefined\n"; }

  41. Regular expressions (1)
      • A regular expression (also called a pattern) is a template that describes a class of strings. A string can either match or not match the pattern.
      • The simplest pattern is one character.
      • A character class (the pattern matches any of these characters) is written in square brackets:
          [01234567]   an octal digit
          [0-7]        an octal digit
          [0-9A-F]     a hex digit
          [^A-Za-z]    not a letter (^ "negates")
          [0-9-]       a decimal digit or a minus

  42. Regular expressions (2)
      • Metacharacters:
          . (dot)   any character except \n
      • Anchors:
          ^         the beginning of a string
          $         the end of a string
      • Multipliers:
          *         repeat the preceding item 0 or more times
          +         repeat the preceding item 1 or more times
          ?         make the preceding item optional
          {n}       repeat n times
          {n,m}     repeat n to m times (n <= m)
          {n,}      repeat n or more times
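Each multiplier can be tried against a short string. This is a minimal sketch; the helper `check` and the test strings are made up here, and `qr//` (which builds a pattern value) is a Perl feature the slides do not cover.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report whether a string matches a pattern.
sub check {
    my ($string, $pattern) = @_;
    return $string =~ $pattern ? "yes" : "no";
}

print check("ac",     qr/^ab*c$/),     "\n";   # yes: * allows zero b's
print check("ac",     qr/^ab+c$/),     "\n";   # no:  + needs at least one b
print check("abc",    qr/^ab?c$/),     "\n";   # yes: ? allows one optional b
print check("abbbc",  qr/^ab{3}c$/),   "\n";   # yes: exactly three b's
print check("abbbbc", qr/^ab{2,3}c$/), "\n";   # no:  four b's is more than three
print check("abbbbc", qr/^ab{2,}c$/),  "\n";   # yes: two or more b's
print check("a\nc",   qr/^a.c$/),      "\n";   # no:  dot does not match \n
```

The ^ and $ anchors are what make the counts exact; without them a pattern such as b{2,3} would also succeed on part of a longer run of b's.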

  43. Regular expressions (3)
      • The Boolean operator =~ tries to match a string with a regular expression written inside slashes.
          $x = "01239876AGH";
          if ( $x =~ /^0[1-9]{4,}/ ) { print "yes1\n"; }
          if ( $x =~ /[A-Z]{3}$/ ) { print "yes2\n"; }
          if ( $x =~ /^.*[A-Z]{4}$/ ) { print "yes3\n"; }

  44. Regular expressions (4)
          $x = "01239876AGH";
          if ( $x =~ /([0-9]{4}|[A-Z]{3}){2,}/ ) { print "yes4\n"; }
          if ( $x =~ /(0?|4)(5|[1abc]{1,})/ ) { print "yes5\n"; }
      • Patterns can be grouped by parentheses (the whole pattern becomes one item). Alternative is denoted by the bar |.
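The slides do not say which of the five tests on slides 43 and 44 succeed. A minimal sketch that runs them together and collects the labels in an array (`@hits`, a name invented here) instead of printing each one:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = "01239876AGH";
my @hits;

push @hits, "yes1" if $x =~ /^0[1-9]{4,}/;                # 0, then seven digits from 1-9
push @hits, "yes2" if $x =~ /[A-Z]{3}$/;                  # AGH at the end
push @hits, "yes3" if $x =~ /^.*[A-Z]{4}$/;               # fails: only three letters at the end
push @hits, "yes4" if $x =~ /([0-9]{4}|[A-Z]{3}){2,}/;    # 0123 followed by 9876
push @hits, "yes5" if $x =~ /(0?|4)(5|[1abc]{1,})/;       # 0 followed by 1

print "@hits\n";   # prints: yes1 yes2 yes4 yes5
```

Only the third pattern fails, because it asks for four capital letters at the end of a string that ends in three.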

  45. Regular expressions (5)
      • The precedence of pattern elements (highest first):
          parentheses         ( )
          multipliers         * + ? {n} {n,m} {n,}
          sequence, anchors   ^ $
          alternation         |
      • Some character classes are predefined:
                                         class   not class
          digit                          \d      \D
          word char [a-zA-Z0-9_]         \w      \W
          whitespace                     \s      \S
      • Some additional anchors:
          word boundary                  \b      \B
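The predefined classes and the word-boundary anchors can be tried on one sample string. This is a minimal sketch; the helper `check` and the string "Room 42 is free" are made up, and `qr//` is again used only to pass patterns around.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report whether a string matches a pattern.
sub check {
    my ($string, $pattern) = @_;
    return $string =~ $pattern ? "yes" : "no";
}

my $line = "Room 42 is free";

print check($line, qr/\d\d/),    "\n";   # yes: two digits in a row (42)
print check($line, qr/^\D+$/),   "\n";   # no:  the string is not digit-free
print check($line, qr/\w\s\w/),  "\n";   # yes: "m 4" is word char, space, word char
print check($line, qr/\bis\b/),  "\n";   # yes: "is" stands as a whole word
print check($line, qr/\bre\b/),  "\n";   # no:  "re" occurs only inside "free"
print check($line, qr/\Bre\B/),  "\n";   # yes: that "re" has word chars on both sides
print check("a_b", qr/^\w+$/),   "\n";   # yes: the underscore counts as a word char
```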

  46. Regular expression examples (1)
          $i = "Jim";
                              match?
          $i =~ /Jim/;        yes
          $i =~ /J/;          yes
          $i =~ /j/;          no
          $i =~ /j/i;         yes
          $i =~ /\w/;         yes
          $i =~ /\W/;         no
      Case is ignored in matching if the postfix i is used.

  47. Regular expression examples (2)
          $j = "JjJjJjJj";
          $j =~ /j*/;         yes: matches anything (here, the empty string at the start)
          $j =~ /j+/;         yes: matches the first j
          $j =~ /j?/;         yes: j is optional, so the empty string at the start matches
          $j =~ /j{2}/;       no
          $j =~ /j{2}/i;      yes: ignores case
          $j =~ /(Jj){3}/;    yes

  48. Regular expression examples (3)
          $k = "Boom Boom, out go the lights!";
          $k =~ /Jim|Boom/;          # yes: matches Boom
          $k =~ /(Boom){2}/;         # no: a space between Booms
          $k =~ /(Boom ){2}/;        # no: fails on the comma
          $k =~ /(Boom\W){2}/;       # yes: \W is space, comma
          $k =~ /\bBoom\b/;          # yes
          $k =~ /\bBoom.*the\b/;     # yes
          $k =~ /\Bgo\B/;            # no: "go" is a complete word
          $k =~ /\Bgh\B/;            # yes: the "gh" inside "lights"

  49. Regular expression substitution (1)
      We can modify a string variable by applying a substitution. The operator is =~ and the substitution is written as:
          s/pattern/replacement/
      The first part is a pattern; the second is the text that replaces whatever the pattern matched.
          $v = "a string to play with";
          $v =~ s/^\w+/just a single/;
          print "$v\n";
      This prints:
          just a single string to play with

  50. Regular expression substitution (2)
      Matched patterns are remembered in built-in variables $1, $2, $3 etc. These variables keep their values till the next matching operation. Each set of parentheses in a pattern corresponds to a "memory" variable.
          # $v == "just a single string to play with"
          $v =~ s/(\b\w*\b)(.*)/'$1'$2/;
          print "$v\n";
          print "$2, $1 $1\n";
      This prints:
          'just' a single string to play with
           a single string to play with, just just
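The memory variables are most useful for rearranging the parts of a string. A minimal sketch, using a made-up "surname, first name" string; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name    = "Wall, Larry";
my $swapped = $name;                          # work on a copy, keep the original

# Two sets of parentheses: $1 is the surname, $2 the first name.
$swapped =~ s/^(\w+), (\w+)$/$2 $1/;

my ($first_memory, $second_memory) = ($1, $2);   # still set after the substitution

print "$swapped\n";                              # prints: Larry Wall
print "$first_memory / $second_memory\n";        # prints: Wall / Larry
```

Copying `$1` and `$2` into ordinary variables right away is a common precaution, since the next successful match overwrites them.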
