Perl Practical Extraction and Report Language
PERL language • Windows • Perl-Win32 • ActiveState Perl • Linux • use the whereis command to locate Perl • sources • Learning Perl, O’Reilly,ISBN 0-596-10105-8 • http://www.comp.leeds.ac.uk/Perl/ • Perl for Dummies, 2nd ed., ISBN 0-7645-0460-6 • Perl by Example, Quigley, ISBN 0-13-028251-0
Command line • perl filename.pl • runs as a command line interface • use a text editor to make / save the .pl file
PERL • First line of the program • #!/usr/bin/perl -w • instructs perl to run with the warning option • not required in Windows versions • options • -c check syntax • -w many warnings enabled • -W all warnings enabled • -X disable all warnings • -v version • -e one line programs (immediate mode) • -d debugger
Comments • # character at the beginning of a line indicates a comment • can also appear in the middle of a line after a command • rest of line is ignored • blank lines are ignored
System Commands • the ` character (“backtick”) executes a system command
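As a minimal sketch of the backtick operator, the snippet below captures the output of a system command in a scalar. It assumes an `echo` command is available on the system; the variable name `$output` is only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Backticks run a command and return whatever it wrote to standard output.
# Assumption: an "echo" command exists on this system.
my $output = `echo hello`;
chomp $output;    # drop the trailing newline the command printed

print "The command printed: $output\n";
```

In list context the same expression would return one element per output line.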
Perl statements • Conditional tests • Loops • Direct statements • open(INFILE, $TheFile) or die "The file $TheFile could not be found.\n"; • $LineCount = $LineCount + 1; • Statements end in ";"
Simple starts • print "This is a test"; • case sensitivity (print not PRINT) • Looping • while(condition) • { } #End of while loop
Scalar variables • Hold both strings and numbers • completely interchangeable • $priority = 9; • $priority = 'high'; • Accepts numbers as strings • $priority = '9'; • $default = '0009'; • can still cope with arithmetic and other operations quite happily • Variable names consist of numbers, letters and underscores • Case sensitive • should not start with a number • $_ is a special variable (many exist)
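The interchangeability of strings and numbers can be sketched as follows. The values come from the slide; the names `$sum`, `$same_number` and `$same_string` are illustrative only.

```perl
use strict;
use warnings;

my $priority = '9';
my $default  = '0009';

# Used in arithmetic, both strings are converted to the number 9.
my $sum = $priority + $default;

# Numeric comparison sees 9 == 9; string comparison sees two different strings.
my $same_number = ($priority == $default) ? 1 : 0;
my $same_string = ($priority eq $default) ? 1 : 0;

print "sum=$sum same_number=$same_number same_string=$same_string\n";
```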
Math Operators • Perl uses all the usual C arithmetic operators: • $a = 1 + 2; # Add 1 and 2 and store in $a • $a = 3 - 4; # Subtract 4 from 3 and store in $a • $a = 5 * 6; # Multiply 5 and 6 • $a = 7 / 8; # Divide 7 by 8 to give 0.875 • $a = 9 ** 10; # Nine to the power of 10 • $a = 5 % 2; # Remainder of 5 divided by 2 • ++$a; # Increment $a and then return it • $a++; # Return $a and then increment it • --$a; # Decrement $a and then return it • $a--; # Return $a and then decrement it
String Operators • $a = $b . $c; # Concatenate $b and $c • $a = $b x $c; # $b repeated $c times • type man perlop for other operators
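A short sketch of the two string operators, using made-up values (`$word`, `$chorus` and `$line` are illustrative names):

```perl
use strict;
use warnings;

my $word   = 'na';
my $chorus = $word x 4;          # repetition: the string repeated four times
my $line   = $chorus . ' hey';   # concatenation

print "$line\n";
```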
Perl Assignments • $a = $b; # Assign $b to $a • $a += $b; # Add $b to $a • $a -= $b; # Subtract $b from $a • $a .= $b; # Append $b onto $a
Interpolation • $a = 'apples'; • $b = 'pears'; • print $a.' and '.$b; • prints apples and pears using concatenation • Single quotes versus double quotes • print '$a and $b'; • prints literally $a and $b • print "$a and $b"; • double quotes force interpolation of any codes, including interpreting variables • Other codes that are interpolated include special characters such as newline (\n) and tab (\t)
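To make the quoting difference concrete, here is a small sketch that stores both forms in variables instead of printing them directly. The names `$fruit`, `$single` and `$double` are illustrative.

```perl
use strict;
use warnings;

my $fruit = 'apples';

my $single = 'I like $fruit\n';   # kept literally: dollar sign, backslash and all
my $double = "I like $fruit\n";   # variable and \n are both interpolated

print $single, "\n";
print $double;
```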
Printing words • When printing a list of words to STDOUT • an unquoted word must start with an alphanumeric character • the remainder may be alphanumeric characters and underscores • Perl words are case sensitive • if unquoted, a word could conflict with identifiers • If a word has no special meaning to Perl • it is treated as if surrounded by single quotes
Literals • numeric • 12345 integer • 0b1101 binary • 0x456fff hex • 0777 octal (leading zero) • 23.45 float • .234E-2 scientific notation
Literals • string literals • \n newline • \t tab • \r carriage return • \f form feed • \b backspace • \a alarm/bell • \e escape • \033 octal character • \xff hex character • \c[ control character • \l convert next char to lowercase • \u convert next char to uppercase • \L convert chars to lowercase until "\E" found • \U convert chars to uppercase until "\E" found • \Q backslash all following non-alphanumeric chars until "\E" • \E ends upper / lower conversion • \\ backslash
Literals • special literals • __LINE__ • current line of the script • __FILE__ • name of the script • __END__ • logical end of the file • trailing text following will be ignored • CTRL-d (\004) in Unix • CTRL-z (\032) in MS-DOS • __DATA__ • indicates data contained in script instead of external file • __PACKAGE__ • current package (default is main)
Print function • prints a string or a comma-separated list of strings to the Perl filehandle STDOUT • success = 1, fail = 0 • print "Hello", "world", "\n"; • Helloworld • print "Hello world\n"; • Hello world • print Hello, world, "\n"; • No comma allowed after filehandle at ./perl.s. line 1 • Perl thinks that 'Hello' is a filehandle • print STDOUT Hello, world, "\n"; • Helloworld • (no comma after STDOUT)
Printing literals • print "The price is $100.\n"; • The price is . • ($100 is read as a variable, which is empty here) • print "The price is \$100.\n"; • The price is $100. • print "The price is \$",100,".\n"; • The price is $100. • print "The binary number is converted to: ",0b10001,".\n"; • The binary number is converted to: 17. • print "The octal number is converted to: ",0777,".\n"; • The octal number is converted to: 511. • print "The hex number is converted to: ",0xAbcF,".\n"; • The hex number is converted to: 43983. • print "The unformatted number is ", 14.56,".\n"; • The unformatted number is 14.56.
printf • prints a formatted string to a filehandle (STDOUT is default) • printf("The name is %s and the number is %d\n", "John", 50); • "John" subs for the %s • 50 subs for %d
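The same format strings work with `sprintf`, which returns the formatted text instead of printing it. The sketch below reuses the slide's example and adds two other common conversions; the variable names are illustrative.

```perl
use strict;
use warnings;

# sprintf builds the string; printf would send it straight to STDOUT.
my $line   = sprintf("The name is %s and the number is %d\n", 'John', 50);
my $price  = sprintf("%.2f", 14.5);   # two digits after the decimal point
my $padded = sprintf("%05d", 42);     # zero-padded to a width of five

print $line;
print "price=$price padded=$padded\n";
```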
Printing without quotes • the "here" document • print from 'here to here' • delimited text
$price = 1000;
print <<EOF;
The consumer said, "As I look over my budget, I'd say
the price of $price is right. I'll give you \$500 to start."\n
EOF
• Output
The consumer said, "As I look over my budget, I'd say
the price of 1000 is right. I'll give you $500 to start."
• $price is interpolated (an unquoted delimiter behaves like double quotes)
Printing without quotes
$price = 1000;
print <<'FINIS';
The consumer said, "As I look over my budget, I'd say
the price of $price is too much.\n I'll settle for $500."
FINIS
• Output
The consumer said, "As I look over my budget, I'd say
the price of $price is too much.\n I'll settle for $500."
• $price is not interpolated (delimiter is in single quotes)
Printing without quotes
print <<"" x 2;    # the text ends at the first blank line; Perl 5.28 and later reject a bare <<
Here's to a new day. Woo-hoo!
(blank line)
print "\nLet's do some stuff.\n";
print <<`END`;     # backtick executes system commands
echo Today is
date
END
• Output
Here's to a new day. Woo-hoo!
Here's to a new day. Woo-hoo!
Let's do some stuff.
Today is
Sun Mar 19 12:48:36 EST 2006
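A here document is an ordinary string expression, so it can be assigned as well as printed. The following sketch contrasts a double-quoted and a single-quoted delimiter; the delimiter name `EOF` and the variable names are arbitrary choices.

```perl
use strict;
use warnings;

my $price = 1000;

# Double-quoted delimiter: variables are interpolated.
my $interpolated = <<"EOF";
The price is $price.
EOF

# Single-quoted delimiter: the text is taken literally.
my $literal = <<'EOF';
The price is $price.
EOF

print $interpolated;
print $literal;
```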
Arrays • @food = ("apples", "pears", "eels"); • @music = ("whistle", "flute"); • $food[2] • returns “eels” (index is 0-based) • $ used as it’s a scalar now and not an array • @moremusic = ("organ", @music, "harp"); • explodes the @music • equivalent to…@moremusic = ("organ", "whistle", "flute", "harp"); • push(@food, "eggs"); • adds the element to the array
Arrays • push two or more items • push(@food, "eggs", "lard"); • push(@food, ("eggs", "lard")); • push(@food, @morefood); • push function returns the length of the new list • pop function • removes the last item from a list and returns it
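A brief sketch of the return values of `push` and `pop`, using the slide's `@food` array (`$count` and `$last` are illustrative names):

```perl
use strict;
use warnings;

my @food = ('apples', 'pears', 'eels');

my $count = push(@food, 'eggs', 'lard');   # push returns the new length
my $last  = pop(@food);                    # pop removes and returns the last item

print "length after push: $count\n";
print "popped: $last, remaining: @food\n";
```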
Arrays • $f = @food; • assigns the length of food to $f • $f = "@food"; • turns array into space delimited string and assigns it to $f
Arrays • Multiple assignments • ($a, $b) = ($c, $d); # Same as $a=$c; $b=$d; • ($a, $b) = @food; # $a and $b are the first two items of @food • ($a, @somefood) = @food; # $a is the first item of @food; @somefood is a list of the others • (@somefood, $a) = @food; # @somefood is @food and $a is undefined
Arrays • Finding the last index of an array • $#food • not to be confused with the number of elements • Displaying arrays • print @food; # By itself • print "@food"; # Embedded in double quotes • print @food.""; # In a scalar context
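The difference between the last index and the number of elements can be checked with a short sketch. It reuses the slide's `@food` array; the other names are illustrative.

```perl
use strict;
use warnings;

my @food = ('apples', 'pears', 'eels');

my $last_index = $#food;          # index of the final element
my $count      = scalar(@food);   # number of elements
my $as_string  = "@food";         # elements joined with spaces
my $in_scalar  = @food . "";      # concatenation forces scalar context: the count

print "last index: $last_index, count: $count\n";
print "interpolated: $as_string\n";
print "scalar context: $in_scalar\n";
```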
File Handling • Example • $file = '/etc/passwd'; # Name the file • open(INFO, $file); # Open the file • @lines = <INFO>; # Read it into an array • close(INFO); # Close the file • print @lines; # Print the array • Modes • open(INFO, $file); # Open for input • open(INFO, ">$file"); # Open for output • open(INFO, ">>$file"); # Open for appending • open(INFO, "<$file"); # Also open for input
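The sketch below is a variation on the example above rather than the slide's own method: it uses the three-argument form of `open` with lexical filehandles and checks each call with `or die`. A scratch file created with the core `File::Temp` module stands in for a real data file; that substitution and all variable names are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumption: a temporary scratch file replaces a real input file.
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
close($tmp_fh);

# Open for output, write two lines, and close.
open(my $out, '>', $file) or die "Cannot open $file for output: $!\n";
print $out "first line\n", "second line\n";
close($out);

# Open for input and read every line into an array.
open(my $in, '<', $file) or die "Cannot open $file for input: $!\n";
my @lines = <$in>;
close($in);

print @lines;
```

Keeping the mode as a separate argument means a file name that happens to begin with `>` or `<` cannot change how the file is opened.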
Special Variables • $_ default input • $/ input record separator (newline by default) • $[ index of the first list element • $| Force flushing to file handle if set to true (false is default). • $] Perl version • $0 name of the file containing Perl being run • $^T Time of program start • $. input line number of last file handle read • $ARGV name of current file when using <ARGV> • @ARGV command line arguments • @INC list of directories for do, require and use • %INC files that have been used by do and require • %ENV OS environment variables
File Handling • print something to a file you've already opened for output • print INFO "This line goes to the file.\n"; • open the standard input (usually the keyboard) and standard output (usually the screen) • open(INFO, '-'); # Open standard input • open(INFO, '>-'); # Open standard output
Testing
$a == $b    # Is $a numerically equal to $b?
            # Beware: don't use the = operator.
$a != $b    # Is $a numerically unequal to $b?
$a eq $b    # Is $a string-equal to $b?
$a ne $b    # Is $a string-unequal to $b?
You can also use logical and, or and not:
($a && $b)  # Are $a and $b both true?
($a || $b)  # Is either $a or $b true?
!($a)       # Is $a false?
Non-zero numbers are true; strings are true unless they are empty or exactly "0".
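A small sketch of how the numeric and string comparisons can disagree, and of which values count as false. The sample values and variable names are illustrative.

```perl
use strict;
use warnings;

my $x = '1.0';
my $y = '1';

my $numeric_equal = ($x == $y) ? 1 : 0;   # compared as the numbers 1 and 1
my $string_equal  = ($x eq $y) ? 1 : 0;   # compared character by character

# Collect the candidates that Perl treats as false.
my @false_values = grep { !$_ } (0, '0', '', 'a', '00', '0.0');

print "numeric_equal=$numeric_equal string_equal=$string_equal\n";
print "number of false values: ", scalar(@false_values), "\n";
```

Only `0`, `'0'` and the empty string are false here; `'00'` and `'0.0'` are non-empty strings other than `'0'`, so they are true.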
if
if ($a)
{
    print "The string is not empty\n";
}
else
{
    print "The string is empty\n";
}
if / else
if (!$a)                      # The ! is the not operator
{
    print "The string is empty\n";
}
elsif (length($a) == 1)       # If above fails, try this
{
    print "The string has one character\n";
}
elsif (length($a) == 2)       # If that fails, try this
{
    print "The string has two characters\n";
}
else                          # Now, everything has failed
{
    print "The string has lots of characters\n";
}
for
for ($i = 0; $i < 10; ++$i)   # Start with $i = 0
                              # Do it while $i < 10
                              # Increment $i before repeating
{
    print "$i\n";
}
foreach
foreach $morsel (@food)       # Visit each item in turn
                              # and call it $morsel
{
    print "$morsel\n";        # Print the item
    print "Yum yum\n";        # That was nice
}
while / until
#!/usr/local/bin/perl
print "Password? ";           # Ask for input
$a = <STDIN>;                 # Get input
chomp $a;                     # Remove the newline at end
while ($a ne "fred")          # While input is wrong...
{
    print "sorry. Again? ";   # Ask again
    $a = <STDIN>;             # Get input again
    chomp $a;                 # Remove the newline again
}
while / until
#!/usr/local/bin/perl
do
{
    print "Password? ";       # Ask for input
    $a = <STDIN>;             # Get input
    chomp $a;                 # Remove the newline
} while ($a ne "fred");       # Redo while wrong input
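The `until` form named in the slide title runs its body while the condition is false. A minimal sketch follows, with a fixed list of attempts standing in for keyboard input so that it runs unattended; the list and the variable names are assumptions.

```perl
use strict;
use warnings;

# Assumption: canned attempts replace reading from <STDIN>.
my @attempts = ('barney', 'wilma', 'fred');
my $tries    = 0;
my $guess    = '';

# until (COND) is equivalent to while (!COND).
until ($guess eq 'fred') {
    $guess = shift @attempts;   # take the next attempt
    $tries++;
}

print "Accepted after $tries tries\n";
```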
Regular Expressions Matching Strings and String Manipulation
Regular Expressions • a regular expression is contained in slashes, and matching occurs with the =~ operator • the following expression is true if the string "the" appears in variable $sentence • $sentence =~ /the/ • case sensitive!!! • $sentence !~ /the/ • true if no match found
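The two match operators can be tried out with a short sketch. The sample sentences are made up, and the `/i` modifier (case-insensitive matching) goes beyond what the slide covers.

```perl
use strict;
use warnings;

my $sentence = 'The quick brown fox jumps over the lazy dog';

my $has_the   = ($sentence =~ /the/)   ? 1 : 0;   # lowercase "the" occurs later on
my $capital   = ('The end' =~ /the/)   ? 1 : 0;   # case sensitive: no match
my $any_case  = ('The end' =~ /the/i)  ? 1 : 0;   # /i ignores case
my $no_cat    = ($sentence !~ /cat/)   ? 1 : 0;   # true because nothing matches

print "has_the=$has_the capital=$capital any_case=$any_case no_cat=$no_cat\n";
```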
Regular Expressions • /abc/ • Any string matching this pattern • m?abc? • Only the first occurrence matching this pattern (until reset is called); Perl 5.22 and later require the leading m
RE Characters and Meanings
.    # Any single character except a newline
^    # The beginning of the line or string
$    # The end of the line or string
*    # Zero or more of the last character
+    # One or more of the last character
?    # Zero or one of the last character
RE expressions
t.e     matches the, tre, tle; does not match te or tale
^f      matches f at the beginning of a line
^ftp    matches ftp at the beginning of a line
e$      matches e at the end of a line
tle$    matches tle at the end of a line
und*    matches un followed by zero or more d characters: un, und, undd, unddd
RE expressions
.*      Any string without a newline. The . matches any character except a newline, and the * means zero or more of these.
^$      A line with nothing in it (the beginning of the line is immediately followed by the end).
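The patterns listed above can be checked against sample words with `grep`. In this sketch the word lists and the host name `ftp.example.org` are made up, and `und*` is anchored with `^` and `$` so that it has to match the whole word.

```perl
use strict;
use warnings;

# t.e needs exactly one character between t and e.
my @dot_hits = grep { /t.e/ } qw(the tre tle te tale);

# Anchored so that the whole word must be "un" plus zero or more d's.
my @und_hits = grep { /^und*$/ } qw(un und undd under);

my $blank      = ('' =~ /^$/)                  ? 1 : 0;   # empty line
my $starts_ftp = ('ftp.example.org' =~ /^ftp/) ? 1 : 0;   # ftp at the start

print "t.e matches: @dot_hits\n";
print "und* matches: @und_hits\n";
print "blank=$blank starts_ftp=$starts_ftp\n";
```

Without the anchors, `und*` would also match inside a longer word such as `under`.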
RE Options
[qjk]       # Either q or j or k
[^qjk]      # Neither q nor j nor k
[a-z]       # Anything from a to z inclusive
[^a-z]      # No lower case letters
[a-zA-Z]    # Any letter
[a-z]+      # Any non-zero sequence of lower case letters
RE Expressions
The vertical bar "|" is used as an "or" operator
jelly|cream    # Either jelly or cream
(eg|le)gs      # Either eggs or legs
(da)+          # Either da or dada or dadada or...
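A short sketch of alternation and grouping, again with made-up word lists. The patterns are anchored so that they must match the whole word.

```perl
use strict;
use warnings;

# The group limits the alternation to the first two letters.
my @gs_hits = grep { /^(eg|le)gs$/ } qw(eggs legs pegs);

# A quantifier after a group repeats the whole group.
my @da_hits = grep { /^(da)+$/ } qw(da dada dad dadada);

my $dessert = ('strawberries and cream' =~ /jelly|cream/) ? 1 : 0;

print "gs: @gs_hits\n";
print "da: @da_hits\n";
print "dessert=$dessert\n";
```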
Special Characters
\n    # A newline
\t    # A tab
\w    # Any alphanumeric (word) character. The same as [a-zA-Z0-9_]
\W    # Any non-word character. The same as [^a-zA-Z0-9_]
\d    # Any digit. The same as [0-9]
\D    # Any non-digit. The same as [^0-9]
\s    # Any whitespace character: space, tab, newline, etc
\S    # Any non-whitespace character
\b    # A word boundary, outside [] only
\B    # No word boundary
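The character classes can be combined with the `/g` modifier, which in list context returns every match. The modifier and the sample text are additions for illustration and are not part of the slides.

```perl
use strict;
use warnings;

my $text = 'Order 66 shipped on 2006-03-19';

my @numbers = $text =~ /\d+/g;        # every run of digits
my @words   = $text =~ /\b\w+\b/g;    # every run of word characters
my @gaps    = $text =~ /\s/g;         # every single whitespace character

print "numbers: @numbers\n";
print "word count: ", scalar(@words), "\n";
print "whitespace count: ", scalar(@gaps), "\n";
```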
Special Characters • When you need to match a special character, use the backslash to indicate the character (literal character follows)
\|    # Vertical bar
\[    # An open square bracket
\)    # A closing parenthesis
\*    # An asterisk
\^    # A caret symbol
\/    # A slash
\\    # A backslash
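The effect of escaping can be seen by comparing an escaped and an unescaped asterisk. The sketch also uses Perl's built-in `quotemeta` function, which is not mentioned in the slides, to add the backslashes automatically; the sample strings are made up.

```perl
use strict;
use warnings;

my $note = 'cost: 5*3 (approx)';

my $escaped   = ($note =~ /5\*3/)   ? 1 : 0;   # \* matches a literal asterisk
my $unescaped = ('5553' =~ /5*3/)   ? 1 : 0;   # here * means "zero or more 5s"

# quotemeta backslashes every non-word character in a string.
my $safe        = quotemeta('(approx)');
my $found_paren = ($note =~ /$safe/) ? 1 : 0;

print "escaped=$escaped unescaped=$unescaped\n";
print "quoted pattern: $safe, found=$found_paren\n";
```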