Useful links & books • http://www.activestate.com • http://www.netcat.co.uk/rob/perl/win32perltut.html • http://www.ebb.org/PickingUpPerl • Randal L. Schwartz, Learning Perl, 3rd Ed, July 2001 • David N. Blank-Edelman, Perl for System Administration, O’Reilly, 1st Ed, July 2000
Basic Concepts What is Perl? • Perl is a programming language • The best language for processing text • Cross platform, free & open • Microsoft have invested heavily in ActiveState to improve support for Windows in Perl • Has excellent connection to the operating system • Has enormous range of modules for thousands of application types
Basic Concepts Eclectic • Borrows ideas from many languages, including: • C, C++ • Shell • Lisp • BASIC • …even Fortran • Many others…
Basic Concepts Why should I learn it? • Consider a real-life sysadmin problem: • You must create accounts for 1500 students • TEACHING BEGINS TOMORROW!!! • The admission system gives you the student enrollment data… but only as PDF files with an irregular format, which • has a variable number of lines before the student data begins • has a different number of columns in different files • has many rows per enrolled student
Basic Concepts Sample data for new courses: 15 N CHAN Tai Man M 991111111 X123456(7) 28384858 CHEUNG 10-SEP-01 10-SEP-01 29394959 TAI MO BROTHER 91234567 • There is a different number of lines above the student records • There is a different number of characters within each column from file to file • There are many files
Basic Concepts Solution in Perl
    #! /usr/bin/perl -w
    use strict;
    my $course;
    my $year;
    while ( <> ) {
        chomp;
        if ( /^\s*Course :\s(\d+)\s/ ) {
            $course = $1;
            undef $year;
            next;
        }
        elsif ( m!^\s*Course :\s(\d+)/(\d)\s! ) {
            $course = $1;
            $year   = $2;
            next;
        }
        if ( my ( $name, $gender, $student_id, $hk_id )
                = m{
                      \s\s+                 # at least 2 spaces
                      (                     # this matches the name
                        [A-Z]+              # family name is upper case
                        (?:\s[A-Z][a-z]*)+  # one or more given names
                      )
                      \s\s+                 # at least 2 spaces
                      ([MF])                # gender
                      \s+                   # at least one space
                      (\d{9})               # student id is 9 digits
                      \s\s+                 # at least 2 spaces
                      ([a-zA-Z]\d{6}\([\dA-Z]\))  # HK ID
                   }x ) {
            print "sex=$gender, student ID = $student_id, ",
                  "hkID = $hk_id, course = $course, name=$name, ",
                  defined $year ? "year = $year\n" : "\n";
            next;
        }
        warn "POSSIBLE UNMATCHED STUDENT: $_\n" if m!^\s*\d+\s+!;
    }
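To try the student-record pattern on its own, here is a minimal sketch that wraps it in a hypothetical subroutine, parse_student, and runs it on one made-up line. It assumes, as the pattern itself does, that columns are separated by at least two spaces; the sample line is invented and is not real admission data.

```perl
use strict;
use warnings;

# Hypothetical helper: the student-record pattern from the solution,
# wrapped in a sub so it can be tried on one line at a time.
sub parse_student {
    my ($line) = @_;
    my @fields = $line =~ m{
        \s\s+ ( [A-Z]+ (?:\s[A-Z][a-z]*)+ )   # family name, then given names
        \s\s+ ([MF])                          # gender
        \s+   (\d{9})                         # student id is 9 digits
        \s\s+ ([a-zA-Z]\d{6}\([\dA-Z]\))      # HK ID
    }x;
    return @fields;    # empty list when the line is not a student record
}

# Made-up sample line, assuming columns are separated by two or more spaces
my @student = parse_student("  15  CHAN Tai Man  M 991111111  X123456(7)  28384858");
print "name=$student[0], gender=$student[1], id=$student[2], hkID=$student[3]\n";
```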
Basic Concepts Why not use another language? • This program took a very short time to write. It is very robust and highly portable. • Perl is strongly supported by Microsoft • Perl is optimized for text processing and systems administration programming. For problems such as the irregular admission data described before, Perl is second to no other programming language • Other languages would require much more code. • The Perl solution given on the last slide has: • comments • plenty of space to show structure • …and it handles exceptional situations, such as lines it cannot parse (i.e., it is robust)
Basic Concepts A simple Perl program • A simple Perl program: print "Hello World\n"; • Use a text editor to type the above program • Save the file as hello.pl • (be careful not to save the file as .txt or .doc) • Run the file from a command prompt, such as • c:\perl>perl hello.pl • where c:\perl is the directory in which you saved the hello.pl file. • Note that Perl compiles the program again every time it is run; there is no separate compile step.
Basic Concepts Variables • There are three basic types of variable: • Scalar (a single value: a number or a string or…) • Array (an ordered list of scalars, indexed by numbers) • Hash (an unordered collection of scalars indexed by strings/keys instead of numbers)
Basic Concepts @array and %hashes • @array • Starts with a @ • Indexes start at 0, like in C • %hashes • Unfamiliar concept to many of you • Like an array, but indexed by strings/keys • A data structure like a database
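A minimal sketch of all three variable types side by side. It adds `use strict`, `use warnings` and `my` declarations, which the slides' short examples omit; the names and phone extensions are made up for illustration.

```perl
use strict;
use warnings;

# Illustrative data only
my $course   = "Perl 101";                           # scalar: one value
my @students = ("Muriel", "Gavin", "Susanne");       # array: ordered, index starts at 0
my %phone    = ("Muriel", "2838", "Gavin", "2939");  # hash: key/value pairs

print "Course: $course\n";
print "First student: $students[0]\n";          # one element of an array uses $ and [ ]
print "Gavin's extension: $phone{Gavin}\n";     # one element of a hash uses $ and { }
print "Number of students: ", scalar(@students), "\n";
```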
Basic Concepts $scalars: • Start with a dollar sign $ • Hold a single value, not a collection • A string is a scalar, and so is a number • Example program:
    $string = "snm";
    $num1 = 100;
    $num2 = 1.01;
    print "The string is $string, number 1 is $num1 and number 2 is $num2\n";
• Running this program displays: The string is snm, number 1 is 100 and number 2 is 1.01
Basic Concepts Typing • Typing: declaring the type of a variable • A basic step in almost all programming languages. • Example in C: int x = 10; • In the previous example there is no typing: the type of each variable is never declared. • Typing is important in large programming projects. • In short programs, it is easier and quicker to do without type declarations. • The variable prefixes ($, @, %) make it easy to see where the variables are and what kind of variable each one is. • Perl is said to be a loosely typed language, as opposed to strongly typed languages such as C++
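To see loose typing in action, the sketch below (a hypothetical example, not from the original slides) stores one value and uses it both as a number and as a string; the operator, not a declaration, decides how the value is treated.

```perl
use strict;
use warnings;

my $x = "10";            # stored as a string, no type declared
my $sum    = $x + 5;     # numeric addition: "10" is used as the number 10
my $joined = $x . 5;     # string concatenation (the dot operator): 5 is used as "5"

print "sum = $sum, joined = $joined\n";   # sum = 15, joined = 105
```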
Basic Concepts Variable Interpolation
    $string="perl";
    $num=20;
    print "Doubles: The string is $string and the number is $num\n";
    print 'Singles: The string is $string and the number is $num\n';
• Result of running the above program:
    Doubles: The string is perl and the number is 20
    Singles: The string is $string and the number is $num\n
• Note that double quotes support variable interpolation (and escapes such as \n); single quotes do not, so even \n is printed as written.
Basic Concepts Escaping • Escaping, by putting a backslash \ in front of a special character such as $ or @, makes Perl print the character itself instead of treating it as the start of a variable. • The special characters, such as $, @ and #, are also called metacharacters. • Escaping also turns some ordinary characters into something special, like the n in the newline \n. • What will the following commands print?
    print "home directory is c:\\perl\\\n";
    $num1 = 10;
    print "\$num1 is $num1";
Basic Concepts Strings and Increments
    $string="perl";
    $num=20;
    print "The string is $string and the number is $num\n";
    $num++;
    $string++;
    print "The incremented string is $string and the incremented number is $num\n";
• A number can be incremented, and so can a string. • What will be displayed when the above program is run?
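One way to check your answer: the sketch below wraps the increment in a small hypothetical helper, next_string, and applies it to a few strings. Perl's "magic" string increment applies to strings made of letters optionally followed by digits, and it rolls over like an odometer.

```perl
use strict;
use warnings;

# Hypothetical helper: return the "magic" increment of a string
sub next_string {
    my ($s) = @_;
    $s++;          # letters and digits roll over, carrying to the left
    return $s;
}

foreach my $word ("perl", "Az", "zz", "a9") {
    print "$word -> ", next_string($word), "\n";
}
# perl -> perm, Az -> Ba, zz -> aaa, a9 -> b0
```

So on the slide, $string++ turns perl into perm, and $num++ turns 20 into 21.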
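Basic Concepts Print: a list operator • The print function is a list operator. It accepts a list of things, separated by commas, to print. For example,
    $var="Perl";
    $num=10;
    print "Two \$nums are $num * 2 and adding one to \$var makes $var++\n";
    print "Two \$nums are ", $num * 2, " and adding one to \$var makes ", $var++, "\n";
    print "\$var is now $var\n";
• Notice the difference between the three lines being printed. Inside double quotes only the variables are interpolated, so * 2 and ++ are printed as plain text. In the comma-separated list, $num * 2 and $var++ are evaluated; $var++ returns the old value "Perl" and then increments $var to "Perm".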
Basic Concepts Subroutines • A subroutine is a user-defined function, typically written for code that is used more than once in a program. Subroutines can be placed at the beginning, middle or end of a Perl program (or script). • A subroutine is defined with the keyword sub followed by its name. A pair of curly brackets { } encloses the code of the subroutine; the area between the two brackets is called a block. In this example the prefix & is used when calling the subroutine. • For example,
    $num=10;           # sets $num to 10
    &print_results;    # prints variable $num
    $num++;
    &print_results;
    $num*=3;
    &print_results;
    sub print_results { print "\$num is $num\n"; }
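The slide's subroutine reads the global $num directly. A common alternative, sketched below with a hypothetical describe_num subroutine, passes the value in through the special array @_ and returns a string instead of printing; when the call uses parentheses, the & prefix can be left out.

```perl
use strict;
use warnings;

# Hypothetical subroutine: takes the number as an argument, returns the text
sub describe_num {
    my ($num) = @_;              # copy the first argument out of @_
    return "\$num is $num";
}

my $num = 10;
print describe_num($num), "\n";  # $num is 10
$num++;
print describe_num($num), "\n";  # $num is 11
$num *= 3;
print describe_num($num), "\n";  # $num is 33
```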
Basic Concepts Test: ‘truth’ in Perl • To test for truth, we can use the if, while and unless statements. • What is ‘true’ in Perl? • Any string is true except for "" and "0". • Any number is true except for 0. This includes negative numbers. • Any undefined variable is false. An undefined variable is one which doesn't have a value, i.e. has not been assigned to. • For example:
    $day="Monday";
    if ($day eq "Monday") { print "Go to school\n"; }
Basic Concepts Test for truth: single variable
    &isit;            # $test1 is at this moment undefined
    $test1="hello";   # a string, not equal to "" or "0"
    &isit;
    $test1=0.0;       # $test1 is now a number, effectively 0
    &isit;
    $test1="0.0";     # $test1 is a string, but NOT effectively 0 !
    &isit;
    sub isit {
        if ($test1) { print "$test1 is true\n"; }    # tests $test1 for truth or not
        else        { print "$test1 is false\n"; }   # else statement if it is not true
    }
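A quick way to check the rules above is to run several values through one test. The helper name truth below is hypothetical; note that the number 0.0 prints as 0 because Perl converts it to a string for printing.

```perl
use strict;
use warnings;

# Hypothetical helper: report how a value behaves in a boolean test
sub truth {
    my ($value) = @_;
    return $value ? "true" : "false";
}

foreach my $value ("", "0", 0.0, "0.0", "00", -1, " ") {
    print "'$value' is ", truth($value), "\n";
}
print "undef is ", truth(undef), "\n";
```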
Basic Concepts Test for truth: expression
    $x=5;
    $y=5;
    if ($x - $y) { print '$x - $y is ',$x-$y," which is true\n"; }
    else         { print '$x - $y is ',$x-$y," which is false\n"; }
• Here $x - $y evaluates to 0, so the test is false and the else branch runs.
Basic Concepts Multiple tests with elsif
    $age=25;
    $max=30;
    $min=18;
    # Example 1, using two if statements
    if ($age > $max) { print "Too old !\n"; }
    if ($age < $min) { print "Too young !\n"; }
    # Example 2, using elsif
    if ($age > $max) { print "Too old !\n"; }
    elsif ($age < $min) { print "Too young !\n"; }
    else { print "Just right !\n"; }
• Compare examples 1 and 2 to see the usefulness of elsif (note the Perl spelling: elsif, not elseif). The chain stops at the first test that is true, and else catches every remaining case.
Basic Concepts Comparison: Equality
    $num1=15;
    $num2=15;
    if ($num1 == $num2) { print "num1 equal num2.\n"; }
    else { print "num1 not equal num2.\n"; }
    $name1 = 'Mark';
    $name2 = 'Tony';
    if ($name1 == $name2) { print "name1 equal name2.\n"; }
    else { print "name1 not equal name2\n"; }
• Is Mark equal to Tony? Yes, numerically! == converts both strings to numbers, and a string that does not start with a number becomes 0 (with a warning under -w). • To compare a string with a string, use eq rather than ==.
Basic Concepts Comparison Operators
    Comparison                 Numeric   String
    Equal                      ==        eq
    Not equal                  !=        ne
    Greater than               >         gt
    Less than                  <         lt
    Greater than or equal to   >=        ge
    Less than or equal to      <=        le
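A short sketch contrasting the numeric and string operators from the table. It switches off the "isn't numeric" warning only so the Mark/Tony comparison runs quietly; normally that warning is a useful hint that == was used where eq was meant.

```perl
use strict;
use warnings;
no warnings 'numeric';   # silence "isn't numeric" warnings for this demo only

my ($name1, $name2) = ('Mark', 'Tony');
my $same_number = ($name1 == $name2) ? 1 : 0;   # 1: both strings convert to 0
my $same_string = ($name1 eq $name2) ? 1 : 0;   # 0: the characters differ
print "== gives $same_number, eq gives $same_string\n";

# The reverse case: equal as numbers, different as strings
print "10 == 10.0 is ", (10 == "10.0" ? "true" : "false"), "\n";    # true
print "10 eq 10.0 is ", ("10" eq "10.0" ? "true" : "false"), "\n";  # false
print "abc lt abd is ", ("abc" lt "abd" ? "true" : "false"), "\n";  # true (alphabetical)
```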
Basic Concepts User Input: STDIN • To get input from the user through the keyboard, read a line from the <STDIN> filehandle. For example,
    print "Please tell me your name: ";
    $name=<STDIN>;
    print "Your input is, $name\n";
• Compare the above example with the one below
    print "Please tell me your name: ";
    $name=<STDIN>;                     # these two lines can be reduced to just
    chop $name;                        # one line as chop($name=<STDIN>)
    print "Your input is, $name\n";    # it is better/safer to use chomp rather than chop
• The first example keeps the newline produced by pressing Enter as part of $name • In the second example, chop removes the last character of the input, which here is that newline. chomp removes only a trailing newline, so it is safe even when the input does not end with one.
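The difference between chop and chomp matters when the last line of input has no newline (for example, the last line of a file). This sketch applies both to strings directly instead of reading from STDIN, so it can run unattended; the variable names are illustrative.

```perl
use strict;
use warnings;

my $with_newline = "Alice\n";   # what <STDIN> returns after typing Alice + Enter
my $no_newline   = "Alice";     # e.g. a last line of a file with no final newline

my ($chop1, $chomp1) = ($with_newline, $with_newline);
chop  $chop1;     # "Alice": the newline was the last character
chomp $chomp1;    # "Alice": the trailing newline is removed

my ($chop2, $chomp2) = ($no_newline, $no_newline);
chop  $chop2;     # "Alic": chop removed a real character
chomp $chomp2;    # "Alice": no newline, so nothing is removed

print "chop:  '$chop1' '$chop2'\nchomp: '$chomp1' '$chomp2'\n";
```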
Basic Concepts Arrays • Perl has two types of array: arrays and associative arrays (or hashes). We will first talk about arrays, which you are probably familiar with. • An array is an ordered list of scalar variables, as demonstrated in the example below.
    @names=("Muriel","Gavin","Susanne","Sarah","Anna");
    print "The elements of \@names are @names\n";
    print "The first element is $names[0] \n";
    print "The third element is $names[2] \n";
    print 'There are ',scalar(@names)," elements in the array\n";
• Notice the @ prefix for referring to the whole array, and the $ prefix for a single element such as $names[0]. You can refer to an individual element in the list or to the whole collection.
Basic Concepts $scalar and @array • The following example shows the difference between arrays (with @ prefix) and single scalar variables (with $ prefix).
    $myvar="scalar variable";
    @myvar=("one","element","of","an","array","called","myvar");
    print $myvar;       # refers to the contents of a scalar variable called myvar
    print "\n";
    print $myvar[1];    # refers to the 2nd element of the array, notice the $ prefix!
    print "\n";
    print @myvar;       # refers to all the elements of array myvar
    print "\n";
    print "@myvar ";    # notice the better print format
Basic Concepts Accessing arrays • Example of accessing an array
    print "Enter a number :";
    chomp ($x=<STDIN>);
    @names=("Muriel","Gavin","Susanne","Sarah","Anna");
    print "You requested element $x who is $names[$x]\n";
    print "The first two elements are @names[0,1]\n";
    print "The first three elements are @names[0..2]\n";
    print "The last element is $names[-1]\n";
    print "The index number of the last element is $#names \n";
• Notice that an element in an array is indexed by a number starting at 0 ($x = 0, 1, 2, etc.); a negative index such as -1 counts back from the end. • @names[0,1] and @names[0..2] are slices, lists of several elements, hence the @ prefix. • The last line prints $#names, the index number of the last element.
Basic Concepts Arrays and For Loops • A simple for loop for accessing the elements of an array:
    @names=("Muriel","Gavin","Susanne","Sarah","Anna","Paul","Trish","Simon");
    for ($x=0; $x <= $#names; $x++) { print "$names[$x]\n"; }
• A for loop with the range operator:
    for $x (0 .. $#names) { print "$names[$x]\n"; }
• The foreach loop:
    foreach $person (@names) { print "$person\n"; }
• The default input and pattern-searching variable $_:
    foreach (@names) { print "$_\n"; }   # if a loop variable is not specified, $_ is used by default
    foreach (@names) { print; }         # print prints $_ by default, even though it is not written
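Basic Concepts Premature end to for loop • Stopping an endless loop: while (1) never ends by itself, so press CTRL-C to interrupt the program.
    while (1) { $x++; print "$x: You can press CTRL-C to interrupt a perl program\n"; }
• The last operator exits a loop immediately:
    @names=('Mrs Smith','Mr Jones','Ms Samuel','Dr Jansen','Sir Philip');
    foreach $person (@names) {
        print "$person\n";
        last if $person=~/Dr /;
    }
• $person =~ /Dr / is an example of a regular expression match, which we will discuss later. Here the loop prints the names up to and including Dr Jansen, then stops.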
Basic Concepts Changing the elements of an array • Adding a new element with the push operator:
    print "Enter a name :";
    chomp ($x=<STDIN>);
    @names=("Muriel","Gavin","Susanne","Sarah");
    print "@names\n";
    push (@names, $x);
    print "@names\n";
• Other operators are • pop: removes and returns a value from the end of the array • shift: removes and returns a value from the beginning of the array • unshift: adds a value to the beginning of the array
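A minimal sketch using all four operators on the slide's list; the extra names Anna and Paul are made up for the example.

```perl
use strict;
use warnings;

my @names = ("Muriel", "Gavin", "Susanne", "Sarah");

push    @names, "Anna";      # add to the end
unshift @names, "Paul";      # add to the beginning
my $last  = pop   @names;    # remove and return the last element ("Anna")
my $first = shift @names;    # remove and return the first element ("Paul")

print "Removed $first and $last; left with @names\n";
# Removed Paul and Anna; left with Muriel Gavin Susanne Sarah
```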
Regular Expressions Regular Expressions (regex) • Perl is widely renowned for excellence in text processing, and regular expressions are one of the big factors behind this fame. • What is a regular expression? A regular expression is simply a string that describes a pattern. Patterns are in common use these days; examples are the patterns typed into a search engine to find web pages and the patterns used to list files in a directory, e.g., ls *.txt or dir *.*. • In Perl, the patterns described by regular expressions are used to search strings, extract desired parts of strings, and do search-and-replace operations. • Regular expressions are constructed using simple concepts like conditionals and loops. • 'Regular expression' is often abbreviated as regexp or regex.
Regular Expressions Regex - a simple word matching
    "Hello World" =~ /World/;   # matches
• "Hello World" is a simple double-quoted string • World is the regular expression • the // enclosing /World/ tells Perl to search a string for a match. • The operator =~ associates the string with the regexp match and produces a true value if the regexp matched, or false if the regexp did not match. • In this case, World matches the 2nd word in "Hello World", so the expression is true. • Expressions like this are useful in conditionals:
    if ("Hello World" =~ /World/) { print "It matches\n"; }
    else { print "It doesn't match\n"; }
Regular Expressions Why use regex • The following two programs both check the user's answer. The first uses eq, which requires an exact match; the second uses the regex /the faq/, which matches if "the faq" appears anywhere in the answer. • Using eq:
    print "What do you read before joining any Perl discussion ? ";
    chomp ($_=<STDIN>);
    print "Your answer was : $_\n";
    if ($_ eq "the faq") { print "Right ! Join up !\n"; }
    else { print "You should enter 'the faq' in order to join !\n"; }
• Using a regex:
    print "What do you read before joining any Perl discussion ? ";
    chomp ($_=<STDIN>);
    print "Your answer was : $_\n";
    if ($_=~/the faq/) { print "Right ! Join up !\n"; }
    else { print "You should enter 'the faq' in order to join !\n"; }
Regular Expressions Why use regex • The previous regex-based program is slightly modified below. • With the /i switch, which specifies case-insensitivity, the program accepts all variations, such as "the Faq" and "the FAQ". And "the faq" can be within a longer text, such as "I would like to read the FAQ section before I join Perl discussion".
    print "What do you read before joining any Perl discussion ? ";
    chomp ($_=<STDIN>);
    print "Your answer was : $_\n";
    if (/the faq/i) { print "Right ! Join up !\n"; }
    else { print "You should enter 'the faq' in order to join !\n"; }
Regular Expressions Match & not match
    $_="perl for Win32";                             # sets the string to be searched
    if ($_=~/perl/) { print "1 Found perl\n" };      # is 'perl' inside $_ ? $_ is "perl for Win32".
    if (/perl/) { print "2 Found perl\n" };          # same as above. Don't need the =~
    if (/PeRl/) { print "3 Found PeRl\n" };          # this will fail because of case sensitivity
    if (/win32/i) { print "4 Found win32 (i)\n" };   # with /i, case insensitivity
    print "5 Found!\n" if / /;                       # if is put after the print, looking for a space
    print "6 Found!!\n" unless $_!~/ /;              # 'unless' and '!~' are both negative, so they cancel out
    $find=32;                                        # create some variables to search for
    $find2=" for ";
    if (/$find/) { print "7 Found '$find'\n" };      # you can search for variables like numbers
    if (/$find2/) { print "8 Found '$find2'\n" };    # and of course strings !
Regular Expressions Character Class • In the example below, @names is initialized using whitespace as a delimiter instead of a comma. A word ends with whitespace (like tabs, spaces, newlines etc). • qw means 'quote words': it splits the list into words. • The square brackets enclose a set of single characters, any one of which can match at that position. Here either Karl or Carl must be in each element.
    @names=qw(Karlson Carleon Karla Carla Karin Carina KCarl Needanotherword);
    foreach (@names) {    # sets each element of @names to $_ in turn
        if (/[KC]arl/) { print "Match ! $_\n"; }
        else { print "Sorry. $_\n"; }
    }
Regular Expressions Character Class – more examples • The following example matches if a name contains K, C or Z, then arl, then either s or a. KCarl, however, will not match because nothing follows arl.
    @names=qw(Karlson Carleon Karla Carla Karin Carina KCarl Needanotherword);
    foreach (@names) {    # sets each element of @names to $_ in turn
        if (/[KCZ]arl[sa]/) { print "Match ! $_\n"; }
        else { print "Sorry. $_\n"; }
    }
• /[KCZ]arl[^sa]/ will match K, C or Z, then arl, and then any character EXCEPT s or a. The caret ^ is put at the start of the class so that it produces a negation (except) effect. • /[abcdeZ]arl/ or /[a-eZ]arl/ will match a, b, … e, or Z, and then arl.
Regular Expressions More examples of regex matching • Matching at specific points • To match at the end of the line, use $; for example, /a$/ matches strings ending with a. • The caret ^ matches at the beginning of the string, for example, /^n/i • Placed first inside square brackets, the caret ^ instead negates a character class, for example, [^KCZ]arl. • Negating the regex • To negate the entire regex, change =~ to !~ (remember that ! means 'not'). • For example, ($_ !~ /[KC]arl/) or just (!/[KC]arl/)
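The sketch below runs the anchors and the negated match over the same @names list used in the character class examples, collecting the names that pass each test with push. The result array names are illustrative only.

```perl
use strict;
use warnings;

my @names = qw(Karlson Carleon Karla Carla Karin Carina KCarl Needanotherword);
my (@ends_in_a, @starts_with_n, @no_karl);

foreach (@names) {
    push @ends_in_a,     $_ if /a$/;            # $ anchors the match at the end
    push @starts_with_n, $_ if /^n/i;           # ^ anchors at the start; /i ignores case
    push @no_karl,       $_ if $_ !~ /[KC]arl/; # !~ is true when the regex does NOT match
}

print "Ending in a:      @ends_in_a\n";        # Karla Carla Carina
print "Starting with n:  @starts_with_n\n";    # Needanotherword
print "No Karl or Carl:  @no_karl\n";          # Karin Carina Needanotherword
```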
Regular Expressions Return the match • With parentheses ( ) around part of the regex, the text matched inside the first pair is put into a variable called $1. • In list context, a match also returns what the parentheses captured, so it can be assigned to variables of your choice, as shown below. • $star is assigned an empty string because [0-9]* matches zero or more digits: it succeeds immediately at the very start of the string ('T') by matching zero digits. • $plus gets the value 2200 because [0-9]+ matches one or more digits. The match cannot succeed until a digit from 0 to 9 is found; once one is found, the match continues as long as the next character is a digit, then it stops.
    $_='The number is 2200';
    ($one)=/([0-9])/;      # returns one digit only
    ($star)=/([0-9]*)/;    # * matches zero or more characters
    ($plus)=/([0-9]+)/;    # + matches one or more characters
    print "$one, $star and $plus\n";
• Result displayed: 2,  and 2200
    $_='name is John';
    /(john)/i;
    print "name is $1 \n";
• Result displayed: name is John
Regular Expressions Return the match: $1, $2 • In the example below, the regex starts by matching '<'. Once '<' has been found, the first pair of parentheses ( ) matches 'robert' and puts the matched text into the variable $1. • The regex then carries on by matching the character @ (the \ stops Perl from treating @ as the start of an array name). Once @ has been found, the second pair of parentheses ( ) matches 'netcat.co.uk' and puts the matched text into the variable $2. The dots are written \. so that they match only a literal dot, not any character.
    $_='My email address is <Robert@NetCat.co.uk>.';
    /<(robert)\@(netcat\.co\.uk)>/i;
    print "$1 at $2\n";    # print result: Robert at NetCat.co.uk
Regular Expressions Match & return the unknown • The example on the last slide requires that the email address be known in advance. • In the examples below, the regex does not know exactly what to match. • In the /(.*)/ regex, the dot . matches any character (except a newline), and the * means zero or more of the preceding item, so .* matches any run of characters. • Hence, the first line prints the whole sentence assigned to $_ • In the second line, the regex /<(.*)>/ must start with '<' and end with '>', so the parentheses put only the text between < and > into $1. • The third line gets the same result with [^>]+, which means one or more characters that are not >.
    $_='My email address is <staff@abc.com>.';
    print "$1 \n" if /(.*)/i;       # print result: My email address is <staff@abc.com>.
    print "$1 \n" if /<(.*)>/i;     # print result: staff@abc.com
    print "$1\n" if /<([^>]+)/i;    # same result as /<(.*)>/
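Building on the two slides above, one pattern can capture the unknown parts of the address separately. This sketch assumes the address sits between < and > and contains a single @; the [^\@>] class stops the first capture at the @ sign. It is a teaching example, not a full email address validator.

```perl
use strict;
use warnings;

my $line = 'My email address is <staff@abc.com>.';   # single quotes: @abc is not interpolated
my ($user, $domain);

if ($line =~ /<([^\@>]+)\@([^>]+)>/) {
    ($user, $domain) = ($1, $2);      # copy the captures before another match replaces them
    print "user = $user, domain = $domain\n";   # user = staff, domain = abc.com
}
```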
Files Opening Files • Assume you have a file called data.txt in the c:\perl directory; the following program opens the file and reads its content. • If the open operation fails, the code after or is evaluated. die prints a message and exits the script. $! holds the system's reason for the failure (for example, "No such file or directory"); because the message does not end with a newline, die also reports the script name and the line number. • The special variable $. is the current line number, starting at 1.
    $data="c:/perl/data.txt";
    open DATA, $data or die "Cannot open $data for read :$!";
    while (<DATA>) { print "Line $. is : $_"; }
Files Opening Files • The line input operator, the angle brackets <>, reads from the current position in the file up to and including the next newline. The data read goes into $_. On the next iteration of the loop, data is read from where the last read left off, up to the next newline, and so on until there is no more data. When that happens, the condition is false and the loop terminates. • When open is given only a filehandle, it takes the filename from the package variable with the same name as the filehandle ($DATA for DATA). Variable names are case-sensitive, so the filename must be in $DATA, not $data.
    # This program is similar to the one on the last slide
    $DATA="c:\\perl\\data.txt";
    open DATA or die "Cannot open $DATA for read :$!";
    print "Line $. is : $_" while (<DATA>);
Files Writing & appending data to a File • To open a file for writing data, put > in front of the filename. This creates the file, or empties it if it already exists. • To open a file for appending data, put >> in front of the filename. • When printing to a file, put the filehandle name after print, with no comma after it. • Perl closes open files automatically when the program exits, but closing a file explicitly once you have finished with it is good practice.
    $out="c:/perl/out.txt";
    open OUT, ">$out" or die "Cannot open $out for write :$!";   # to append data use ">>$out" instead of ">$out"
    for $i (1..10) { print OUT "$i : The time is now : ",scalar(localtime),"\n"; }
    close OUT;
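Modern Perl code usually passes the open mode as a separate argument and stores the filehandle in a lexical variable. The sketch below writes, appends to, and then reads back a file using that three-argument form. To keep it runnable anywhere, it writes to a temporary file from the core File::Temp module instead of c:/perl/out.txt; that substitution is an assumption of this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumption: a temporary file stands in for c:/perl/out.txt
my (undef, $out) = tempfile(UNLINK => 1);

open(my $fh, '>', $out) or die "Cannot open $out for write: $!";   # '>' creates or empties
print $fh "line $_\n" for 1 .. 3;                                  # no comma after $fh
close($fh) or die "Cannot close $out: $!";

open($fh, '>>', $out) or die "Cannot open $out for append: $!";    # '>>' adds to the end
print $fh "line 4\n";
close($fh) or die "Cannot close $out: $!";

open($fh, '<', $out) or die "Cannot open $out for read: $!";       # '<' reads
my @lines;
while (my $line = <$fh>) {
    chomp $line;
    push @lines, "$.: $line";       # $. is the current line number
}
close($fh);

print "$_\n" for @lines;            # 1: line 1 ... 4: line 4
```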
Hash Associative Arrays • Arrays are an ordered list of scalar variables, which you access by their index number starting at 0. The elements in arrays always stay in the same order. • Hashes are a list of scalars, but instead of being accessed by index number, they are accessed by a key.
    @myarray                       %myhash
    Index No.  Value               Key  Value
    0          The Netherlands     NL   The Netherlands
    1          Belgium             BE   Belgium
    2          Germany             DE   Germany
    3          Monaco              MC   Monaco
    4          Spain               ES   Spain
Hash @Array & %Hash • So if we want 'Belgium' from @myarray and also from %myhash, it'll be: • print "$myarray[1]"; • print "$myhash{'BE'}"; • Notice that the $ prefix is used, because each element is a scalar variable. • Also, notice that a hash uses braces { } instead of square brackets. • So why use hashes? When you want to look something up by a keyword. • Suppose we wanted to create a program which returns the name of the country when given a country code, e.g., input ES and the program returns Spain. • You could do it with arrays, but it is more complicated: you would need two arrays and a search through one of them to find the position of the code.
Hash Hash example • Notice the way %countries is defined: exactly the same as a normal array, except that the values are put into the hash in key/value pairs. • Since the keys in the example below are all uppercase, tr is used to change the input to uppercase. • Each key of a hash must be unique. You cannot have two keys with the same name. If you do define a certain key twice, the second value overwrites the first. • The values of a hash can be duplicates, but never the keys.
    %countries=('NL','Netherlands','BE','Belgium','DE','Germany','MC','Monaco','ES','Spain');
    print "Enter the country code:";
    chomp ($find=<STDIN>);
    $find=~tr/a-z/A-Z/;    # change to uppercase, if input is lowercase
    print "$countries{$find} has the code $find\n";
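The slide's program prints a warning and an empty country name when the code is unknown. A minimal sketch of a safer lookup, using a hypothetical country_for subroutine, checks with exists first and uses the built-in uc function in place of tr/a-z/A-Z/ (both convert a-z to A-Z for these codes).

```perl
use strict;
use warnings;

my %countries = ('NL','Netherlands','BE','Belgium','DE','Germany','MC','Monaco','ES','Spain');

# Hypothetical helper: returns the country name, or undef for an unknown code
sub country_for {
    my ($code) = @_;
    $code = uc $code;                    # like tr/a-z/A-Z/: convert to uppercase
    return exists $countries{$code} ? $countries{$code} : undef;
}

foreach my $input ('es', 'BE', 'xx') {
    my $country = country_for($input);
    if (defined $country) { print "$country has the code ", uc($input), "\n"; }
    else                  { print "Sorry, there is no country with the code ", uc($input), "\n"; }
}
```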
Hash Array Vs Hash • So why use arrays? One excellent reason is that an array keeps its elements in the order you created them in. • With a hash, Perl stores the elements in its own internal order, chosen for fast lookup. Use print (as in the example below) to see that there is no recognizable sequence at all; in Perl 5.18 and later, the order can even change from one run to the next. • If you were writing code that stored a list of variables over time and you wanted them back in the order you stored them, don't use a hash.
    %countries=('NL','Netherlands','BE','Belgium','DE','Germany','MC','Monaco','ES','Spain');
    print %countries;
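When you do want a hash's contents in a predictable order, a common approach is to sort the keys each time you list them. The sketch below shows sorting by key and by value; $a and $b are the special variables that sort sets for each comparison.

```perl
use strict;
use warnings;

my %countries = ('NL','Netherlands','BE','Belgium','DE','Germany','MC','Monaco','ES','Spain');

# sort with no block compares strings alphabetically
my @codes = sort keys %countries;
print "$_ => $countries{$_}\n" foreach @codes;

# A sort block compares the values instead; sort sets $a and $b for each comparison
my @by_name = sort { $countries{$a} cmp $countries{$b} } keys %countries;
print "@by_name\n";   # BE DE MC NL ES
```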
Hash Accessing Hash
    Assigning              $countries{PT}='Portugal';
    Deleting               delete $countries{NL};
    All the keys           print keys %countries;
    All the values         print values %countries;
    A slice of hash        print @countries{'NL','BE'};
    How many elements?     print scalar(keys %countries);
    Does the key exist?    print "It's there !\n" if exists $countries{'NL'};
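The hash operations listed above, collected into one runnable sketch. The values checked at the end follow from the steps in order: five countries, plus Portugal, minus the Netherlands.

```perl
use strict;
use warnings;

my %countries = ('NL','Netherlands','BE','Belgium','DE','Germany','MC','Monaco','ES','Spain');

$countries{PT} = 'Portugal';                    # assign a new key/value pair
my $removed = delete $countries{NL};            # delete also returns the removed value
my @pair    = @countries{'BE','DE'};            # slice: the values for two keys
my $count   = scalar(keys %countries);          # number of key/value pairs
my $has_nl  = exists $countries{NL} ? "yes" : "no";

print "removed $removed; slice: @pair; $count pairs; NL still there? $has_nl\n";
# removed Netherlands; slice: Belgium Germany; 5 pairs; NL still there? no
```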
Hash: iteration • keys and values each return a list. • We can iterate over a list, as in the example below:
    foreach (keys %countries) { print "The key $_ contains $countries{$_}\n"; }
• Another example, similar to the one above:
    while (($code,$name)=each %countries) { print "The key $code contains $name\n"; }
• The each function returns the next key/value pair of the hash each time it is called, and is slightly faster because it does not build a list of all the keys first. In this example we assign the pair to a list (with the parentheses). When there are no more pairs, each returns an empty list, which is false, so the while loop stops.