Introduction to Perl Part II

By: Cédric Notredame (Adapted from BT McInnes)

Passing Arguments To Your Program

Command Line Arguments
• Command line arguments in Perl are extremely easy.
• @ARGV is the array that holds all arguments passed in from the command line.
• Example:
    ./prog.pl arg1 arg2 arg3
  @ARGV would contain ('arg1', 'arg2', 'arg3')
• $#ARGV returns the index of the last command line argument, which is the number of arguments minus one (here 2; it is -1 when no arguments were passed).
• Remember: $#array is the last index of the array, not its size. Use scalar(@array) to get the number of elements.
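The difference between the element count and the last index is easy to check directly. A minimal sketch, assuming a hypothetical helper named describe_args that is not part of the original slides:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reports the size and the last index of an
# argument list, the way a script would inspect @ARGV.
sub describe_args {
    my @args = @_;
    my $count      = scalar @args;   # number of elements
    my $last_index = $#args;         # index of the last element (count - 1)
    return ($count, $last_index);
}

my ($count, $last_index) = describe_args(@ARGV);
print "Number of arguments: $count\n";
print "Last index:          $last_index\n";
```

Run as ./prog.pl arg1 arg2 arg3, this should report 3 arguments and a last index of 2.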

Reading/Writing Files

File Handles
• Opening a file:
    open (SRC, "my_file.txt");
• Reading from a file:
    $line = <SRC>;   # reads up to and including the next newline character
• Closing a file:
    close (SRC);

File Handles cont...
• Opening a file for output (an existing file is overwritten):
    open (DST, ">my_file.txt");
• Opening a file for appending:
    open (DST, ">>my_file.txt");
• Writing to a file:
    print DST "Printing my first line.\n";
• Safeguarding against opening a nonexistent file:
    open (SRC, "file.txt") || die "Could not open file.\n";

File Test Operators
• Check to see if a file exists:
    if ( -e "file.txt") {
        # The file exists!
    }
• Other file test operators:
    -r   readable
    -x   executable
    -d   is a directory
    -T   is a text file

Quick Program with File Handles
• Program to copy a file to a destination file:
    #!/usr/bin/perl -w
    open(SRC, "file.txt") || die "Could not open source file.\n";
    open(DST, ">newfile.txt") || die "Could not open destination file.\n";
    while ( $line = <SRC> ) {
        print DST $line;
    }
    close SRC;
    close DST;
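The slides use bareword handles (SRC, DST) and two-argument open. As a hedged alternative, here is a minimal sketch of the same copy loop with lexical file handles and three-argument open, the form the current Perl documentation recommends. The helper name copy_file is an assumption made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copies $src to $dst line by line.
sub copy_file {
    my ($src, $dst) = @_;

    # Three-argument open keeps the mode ('<', '>') separate from the
    # file name; $! holds the system error message on failure.
    open(my $in,  '<', $src) or die "Could not open $src: $!\n";
    open(my $out, '>', $dst) or die "Could not open $dst: $!\n";

    while (my $line = <$in>) {
        print {$out} $line;
    }

    close($in);
    close($out) or die "Could not close $dst: $!\n";
    return;
}

# Only copy when both file names were given on the command line.
copy_file($ARGV[0], $ARGV[1]) if @ARGV == 2;
```

Lexical handles are closed automatically when they go out of scope, so a forgotten close is less likely to leave a file open.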

Some Default File Handles
• STDIN: Standard input
    $line = <STDIN>;   # takes input from stdin
• STDOUT: Standard output
    print STDOUT "File handling in Perl is sweet!\n";
• STDERR: Standard error
    print STDERR "Error!!\n";

The <> File Handle
• The "empty" file handle reads from the file(s) named on the command line, or from STDIN if none are given:
    $line = <>;
• If the program is run as ./prog.pl file.txt, this will automatically open file.txt and read the first line.
• If the program is run as ./prog.pl file1.txt file2.txt, this will first read file1.txt and then file2.txt. The lines arrive as one continuous stream, so unless you check for it (for example with eof or $ARGV) you will not know where one file ends and the other begins.

The <> File Handle cont...
• If the program is run as ./prog.pl with no arguments, it will wait for you to enter text at the prompt, and will continue until you enter the EOF character.
  • CTRL-D in UNIX
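When several files are read through <>, Perl's built-in $ARGV variable (the name of the file currently being read) and eof can be used to see where each file ends. A minimal sketch, assuming a hypothetical helper named tag_lines:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads every file named in @ARGV through <> and
# returns one "filename:line_number:text" string per line.
sub tag_lines {
    my @tagged;
    while (my $line = <>) {
        chomp $line;
        push @tagged, "$ARGV:$.:$line";
        # eof without parentheses is true at the end of EACH file;
        # closing ARGV there resets the line counter $. for the next file.
        close(ARGV) if eof;
    }
    return @tagged;
}

# Only run when file names were given, so the sketch never waits on STDIN.
if (@ARGV) {
    print "$_\n" for tag_lines();
}
```

Run as ./prog.pl file1.txt file2.txt, each output line is prefixed with the file it came from and its line number within that file.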

Example Program with STDIN
• Suppose you want to determine if you are one of the Three Stooges:
    #!/usr/local/bin/perl
    %stooges = (larry => 1, moe => 1, curly => 1);
    print "Enter your name: ? ";
    $name = <STDIN>;
    chomp $name;
    if ($stooges{ lc($name) }) {
        print "You are one of the Three Stooges!!\n";
    } else {
        print "Sorry, you are not a Stooge!!\n";
    }

Combining File Content
Given the two following files:
    File1.txt
    1
    2
    3
and
    File2.txt
    a
    b
    c
write a program that takes the two files as arguments and outputs a third file that looks like:
    File3.txt
    1
    a
    2
    b
    3
    c
Tip: ./mix_files File1.txt File2.txt File3.txt

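Combining File Content
    #!/usr/bin/perl
    open (F, $ARGV[0]) || die "Could not open $ARGV[0].\n";
    open (G, $ARGV[1]) || die "Could not open $ARGV[1].\n";
    open (H, ">$ARGV[2]") || die "Could not open $ARGV[2].\n";
    # stop as soon as either input file runs out of lines
    while ( defined ($l1=<F>) && defined ($l2=<G>) ) {
        print H "$l1$l2";
    }
    close (F);
    close (G);
    close (H);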

Chomp and Chop
• Chomp: function that deletes a trailing newline from the end of a string.
    $line = "this is the first line of text\n";
    chomp $line;    # removes the newline character
    print $line;    # prints "this is the first line of text" without a trailing newline
• Chop: function that chops off the last character of a string, whatever it is.
    $line = "this is the first line of text";
    chop $line;
    print $line;    # prints "this is the first line of tex"
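The practical difference shows up when the string has no trailing newline: chomp leaves it alone, while chop still removes a character. A minimal sketch, assuming two hypothetical helpers that work on a copy of their argument:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers: both work on a copy, so the caller's string is untouched.
sub strip_newline {
    my ($text) = @_;
    chomp $text;    # removes a trailing newline (with the default $/), otherwise does nothing
    return $text;
}

sub strip_last_char {
    my ($text) = @_;
    chop $text;     # always removes the last character, whatever it is
    return $text;
}

for my $input ("text\n", "text") {
    printf "chomp: [%s]  chop: [%s]\n",
        strip_newline($input), strip_last_char($input);
}
```

This is why chomp is the usual choice for cleaning up lines read from a file or from STDIN.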

Matching Regular Expressions

Regular Expressions
• What are regular expressions? A few definitions:
  • A regular expression specifies a class of strings. The classes that can be described this way are the regular languages of formal language theory.
  • In other words, a formula for matching strings that follow a specified pattern.
• Some things you can do with regular expressions:
  • Parse the text
  • Add and/or replace subsections of text
  • Remove pieces of the text

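Regular Expressions cont..
• A regular expression characterizes a regular language.
• A familiar relative in UNIX is the shell wildcard. Wildcards use a simpler syntax than Perl regular expressions (in the shell, * means "any sequence of characters"), but the idea of describing a set of strings with a pattern is the same:
  • ls *.c
    Lists all the files in the current directory whose names end in '.c'
  • ls *.txt
    Lists all the files in the current directory whose names end in '.txt'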

Simple Example for Clarity
• In the simplest form, a regular expression is a string of characters that you are looking for.
• We want to find all the words that contain the string 'ing' in our text.
• The regular expression we would use: /ing/

The Match Operator
• What would our program then look like?
    if($word=~m/ing/) { print "$word\n"; }

Exercise:
• Download any text you wish from the internet and count all the words it contains that include "ing".
    wget "http://www.trinity.edu/~mkearl/family.html"

Exercise:
    #!/usr/local/bin/perl
    while(<>) {
        chomp;
        @words = split / /;
        foreach $word (@words) {
            if($word=~m/ing/) { print "$word\n"; $ing++; }
        }
    }
    print "$ing Words in ing\n";

Regular Expression Types
• Regular expressions are composed of two types of characters:
  • Literals
    • Normal text characters
    • Like what we saw in the previous program ( /ing/ )
  • Metacharacters
    • Special characters
    • Add a great deal of flexibility to your search

Metacharacters
• Match more than just characters
• Match line position
  • ^ start of a line (caret)
  • $ end of a line (dollar sign)
• Match any one character in a list: [ ... ]
• Example:
  • /[Bb]ridget/ matches Bridget or bridget
  • /Mc[Ii]nnes/ matches McInnes or Mcinnes

Our Simple Example Revisited
• Now suppose we only want to match words that end in 'ing' rather than just contain 'ing'.
• How would we change our regular expression to accomplish this?
  • Previous regular expression: $word =~ m/ing/
  • New regular expression: $word =~ m/ing$/

Ranges in Regular Expressions
• Ranges can be specified in regular expressions
• Valid ranges
  • [A-Z] Upper case Roman alphabet
  • [a-z] Lower case Roman alphabet
  • [A-Za-z] Upper or lower case Roman alphabet
  • [A-F] Upper case A through F
  • [A-z] Valid, but be careful: it also matches the six ASCII characters that lie between Z and a ( [ \ ] ^ _ ` )
• Invalid ranges (the start of a range must come before its end in ASCII order)
  • [a-Z] Not valid
  • [F-A] Not valid

Ranges cont ...
• Ranges of digits can also be specified
  • [0-9] Valid
  • [9-0] Invalid
• Negating ranges
  • /[^0-9]/
    Match any character except a digit
  • /[^a]/
    Match any character except an a
  • /^[^A-Z]/
    Match any line whose first character is not an upper case letter
    • First ^ : start of line
    • Second ^ : negation
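The range and negation rules above can be checked with a few one-line tests. A minimal sketch, assuming a hypothetical helper named matches; the sample strings are illustrative only:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: 1 if $text matches the compiled pattern, else 0.
sub matches {
    my ($text, $pattern) = @_;
    return $text =~ $pattern ? 1 : 0;
}

my @checks = (
    [ '_',    qr/[A-Za-z]/ ],   # 0: underscore is not a letter
    [ '_',    qr/[A-z]/    ],   # 1: but it lies between 'Z' and 'a' in ASCII
    [ '42',   qr/[^0-9]/   ],   # 0: every character is a digit
    [ 'perl', qr/^[^A-Z]/  ],   # 1: first character is not upper case
    [ 'Perl', qr/^[^A-Z]/  ],   # 0: first character is upper case
);

for my $check (@checks) {
    my ($text, $pattern) = @$check;
    print "'$text' against $pattern: ", matches($text, $pattern), "\n";
}
```

The second check is the reason [A-z] deserves caution: it silently accepts punctuation that [A-Za-z] rejects.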

Our Simple Example Again
• Now suppose we want to create a list of all the words in our text that do not end in 'ing'.
• How would we change our regular expression to accomplish this?
  • Previous regular expression: $word =~ m/ing$/
  • New regular expression: !($word =~ m/(ing)$/), or equivalently $word !~ m/ing$/

Matching Interrogations
• $string=~/([^.?]+\?)/
• $string=~/[.?]([A-Z0-9][^.?]+\?)/
• $string=~/([\w\s]+\?)/
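The slide gives the three patterns without sample input. A minimal sketch of what each one captures, assuming two made-up test sentences and a hypothetical helper named first_capture:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the text captured by the first set of
# parentheses, or undef when the pattern does not match.
sub first_capture {
    my ($text, $pattern) = @_;
    return $text =~ $pattern ? $1 : undef;
}

# Illustrative inputs, not taken from the slides.
my $spaced  = 'It works. Does it scale? Yes.';
my $compact = 'Really?Are you sure?';

# Patterns 1 and 3 capture the question, including the space before it.
print '[', first_capture($spaced, qr/([^.?]+\?)/),  "]\n";
print '[', first_capture($spaced, qr/([\w\s]+\?)/), "]\n";

# Pattern 2 needs the question to start right after '.' or '?', with no
# space in between, so it matches $compact but not $spaced.
print '[', first_capture($compact, qr/[.?]([A-Z0-9][^.?]+\?)/), "]\n";
```

The first two patterns differ once the question contains punctuation such as a comma: [^.?] accepts it, [\w\s] does not.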

Removing HTML Tags
    $string=~s/\<[^>]+\>/ /g
• Each tag (a '<', one or more characters other than '>', then '>') is replaced by a single space.
• g: substitute EVERY instance
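A minimal sketch of the substitution applied to a short fragment, assuming a hypothetical helper named strip_tags. This is only adequate for simple markup; it is not a real HTML parser and will be confused by, for example, a '>' inside an attribute value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of $html with every tag replaced
# by a single space, using the same regular expression as the slide
# (without the optional backslashes before < and >).
sub strip_tags {
    my ($html) = @_;
    (my $text = $html) =~ s/<[^>]+>/ /g;
    return $text;
}

print '[', strip_tags('<p>Perl is <b>sweet</b>!</p>'), "]\n";
```

Because each tag becomes a space, the result contains extra spaces where tags stood; a follow-up such as s/\s+/ /g can collapse them if that matters.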

Literal Metacharacters
• Suppose that you actually want to look for all strings that contain '$' in your text.
• Use the \ symbol:
    /\$/    Regular expression to search for a literal $
• What do the following regular expressions match?
    /[ABCDEFGHIJKLMNOP\$]\$/
    /[A-P\$]\$/
• Both match any line that contains one of (A-P or $) immediately followed by $.

Patterns Provided in Perl
• Some patterns:
    \d   [0-9]
    \w   [a-zA-Z0-9_]
    \s   [ \r\t\n\f]    (white space pattern)
    \D   [^0-9]
    \W   [^a-zA-Z0-9_]
    \S   [^ \r\t\n\f]
• Example: (19\d\d)
  • Looks for any year in the 1900's
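A minimal sketch of the (19\d\d) example in use. The \b word boundaries around the pattern are an addition of this sketch, not part of the slide; they keep the pattern from matching four digits inside a longer number. The helper name find_years is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns every four-digit year of the 1900's in $text.
sub find_years {
    my ($text) = @_;
    # In list context, a /g match returns all captured groups.
    my @years = $text =~ /\b(19\d\d)\b/g;
    return @years;
}

my @years = find_years('Perl 1 appeared in 1987 and Perl 5 in 1994; id 219876.');
print "@years\n";   # 1987 1994
```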

Using Patterns in our Example
• Commonly, words are not separated by just a single space but by tabs, returns, etc.
• Let's modify our split function to handle any run of white space:
    #!/usr/local/bin/perl
    while(<>) {
        chomp;
        @words = split /\s+/, $_;
        foreach $word (@words) {
            if($word=~m/ing$/) { print "$word\n"; }
        }
    }

Word Boundary Metacharacter
• Regular expression to match the start or the end of a 'word': \b
• Examples:
  • /Jeff\b/ Match Jeff but not Jefferson
  • /Carol\b/ Match Carol but not Caroline
  • /Rollin\b/ Match Rollin but not Rolling
  • /\bform/ Match form or formation but not Information
  • /\bform\b/ Match form but neither information nor formation

DOT Metacharacter
• The DOT metacharacter, '.', stands for any single character except a newline.
  • /b.bble/
    Could match: bobble, babble, bubble
  • /.oat/
    Could match: boat, coat, goat
• Note: '.*' matches any run of characters, as long as possible within one line. This is handy, but it can match far more than you intended.

PIPE Metacharacter
• The PIPE metacharacter is used for alternation
  • /Bridget (Thomson|McInnes)/
    Match Bridget Thomson or Bridget McInnes. In the string 'Bridget Thomson McInnes' only the 'Bridget Thomson' part is matched.
  • /B|bridget/
    Match B or bridget
  • /^(B|b)ridget/
    Match Bridget or bridget at the beginning of a line
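The three alternation examples can be compared by looking at exactly which text each pattern matches. A minimal sketch, assuming a hypothetical helper named first_match and illustrative input strings:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the text matched by $pattern in $text,
# or undef when there is no match. The extra parentheses capture the
# whole match in $1.
sub first_match {
    my ($text, $pattern) = @_;
    return $text =~ /($pattern)/ ? $1 : undef;
}

# Grouping limits the alternation to the surname.
print first_match('Bridget Thomson McInnes', qr/Bridget (Thomson|McInnes)/), "\n";

# Without grouping, | splits the whole pattern: 'B' or 'bridget'.
print first_match('Bob', qr/B|bridget/), "\n";

# With grouping and ^, only a line starting with Bridget/bridget matches.
print first_match('bridget sings', qr/^(B|b)ridget/), "\n";
```

The second call prints just B, which shows that /B|bridget/ is not the same as /(B|b)ridget/.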

Our Simple Example
• Now with our example, suppose that we want to get not only all words that end in 'ing' but also those that end in 'ed'.
• How would we change our regular expression to accomplish this?
  • Previous regular expression: $word =~ m/ing$/
  • New regular expression: $word =~ m/(ing|ed)$/

The ? Metacharacter
• The metacharacter ? indicates that the character immediately preceding it occurs zero or one time.
• Examples:
  • /worl?ds/
    Match either 'worlds' or 'words'
  • /m?ethane/
    Match either 'methane' or 'ethane'

The * Metacharacter
• The metacharacter * indicates that the character immediately preceding it occurs zero or more times.
• Example:
  • /ab*c/ Match 'ac', 'abc', 'abbc', 'abbbc', etc.
  • Matches any string that contains an a, optionally followed by a sequence of b's, followed by a c.
• Sometimes called the Kleene star
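A minimal sketch that filters word lists with the ? and * examples. The ^ and $ anchors are an addition of this sketch so that each word must match as a whole; select_words is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the words that match $pattern.
sub select_words {
    my ($pattern, @words) = @_;
    return grep { $_ =~ $pattern } @words;
}

my @star     = select_words(qr/^ab*c$/,    qw(ac abc abbbc adc abcb));
my @optional = select_words(qr/^worl?ds$/, qw(words worlds worllds word));

print "ab*c    : @star\n";       # ac abc abbbc
print "worl?ds : @optional\n";   # words worlds
```

Note that 'worllds' is rejected: ? allows at most one l, whereas /worl*ds/ would accept it.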

Our Simple Example Again
• Now suppose we want to create a list of all the words in our text that end in 'ing' or 'ings'.
• How would we change our regular expression to accomplish this?
  • Previous regular expression: $word =~ m/ing$/
  • New regular expression: $word =~ m/ings?$/

Exercise
• For each of the strings 1) to 5), say which of the patterns 1) to 12) it matches. Where there is a match, what would be the values of $MATCH (the English module's name for $&), $1, $2, etc.?
• Strings:
    1) the quick brown fox jumped over the lazy dog
    2) The Sea! The Sea!
    3) (.+)\s*\1
    4) 9780471975632
    5) C:\DOS\PATH\NAME
• Patterns:
    1) /[a-z]/
    2) /(\W+)/
    3) /\W*/
    4) /^\w+$/
    5) /[^\w+$]/
    6) /\d/
    7) /(.+)\s*\1/
    8) /((.+)\s*\1)/
    9) /(.+)\s*((\1))/
    10) /\DOS/
    11) /\\DOS/
    12) /\\\DOS/

Exercise
• Answers: the numbers after each string are the patterns it matches; the numbers after each pattern are the strings it matches.
• Strings:
    1) the quick brown fox jumped over the lazy dog    1, 2, 3, 5
    2) The Sea! The Sea!                               1, 2, 3, 5, 7, 9
    3) (.+)\s*\1                                       1, 2, 3, 5, 6
    4) 9780471975632                                   3, 4, 6
    5) C:\DOS\PATH\NAME                                2, 3, 5, 10, 11, 12
• Patterns:
    1) /[a-z]/              1, 2, 3
    2) /(\W+)/              1, 2, 3, 5
    3) /\W*/                1, 2, 3, 4, 5 (it can match the empty string, so it matches every string)
    4) /^\w+$/              4
    5) /[^\w+$]/            1, 2, 3, 5
    6) /\d/                 3, 4
    7) /(.+)\s*\1/          2
    8) /((.+)\s*\1)/        none (\1 refers to the group that is still being matched)
    9) /(.+)\s*((\1))/      2
    10) /\DOS/              5
    11) /\\DOS/             5
    12) /\\\DOS/            5

Modifying Text With Regular Expressions

Modifying Text
• Match
  • Up to this point, we have only attempted to match a given regular expression
  • Example: $variable =~ m/regex/
• Substitution
  • Takes matching one step further: if there is a match, replace it with the given string
  • Example: $variable =~ s/regex/replacement/
    $var =~ s/Cedric/Notredame/g;
    $var =~ s/ing/ed/;

Substitution Example
• Suppose that when we find a word that ends in 'ing' we want to replace the 'ing' with 'ed'.
    #!/usr/local/bin/perl -w
    while(<>) {
        chomp $_;
        @words = split /\s+/, $_;
        foreach $word (@words) {
            if($word=~s/ing$/ed/) { print "$word\n"; }
        }
    }

Special Variables Modified by a Match
• $target = "I have 25 apples";  $target =~ /(\d+)/;
• $&  =>  "25"
  • A copy of the text matched by the regex
• $`  =>  "I have "
  • A copy of the target text before the match
• $'  =>  " apples"
  • A copy of the target text after the match
• $1, $2, $3, etc.   ($1 => "25")
  • The text matched by the 1st, 2nd, etc., set of parentheses. Note: $0 is not part of this series (it holds the name of the program).
• $+  =>  "25"
  • A copy of the text matched by the highest-numbered set of parentheses that took part in the match
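It is easy to mix up $` (before the match) and $' (after the match), so a quick check helps. A minimal sketch using the same target string, assuming a hypothetical helper named match_parts:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: matches /(\d+)/ against $target and returns the
# special match variables in a hash reference, or undef if nothing matched.
sub match_parts {
    my ($target) = @_;
    if ($target =~ /(\d+)/) {
        return {
            prematch   => $`,   # text before the match
            match      => $&,   # text that matched
            postmatch  => $',   # text after the match
            group1     => $1,   # first set of parentheses
            last_group => $+,   # last set of parentheses that matched
        };
    }
    return;
}

my $parts = match_parts('I have 25 apples');
for my $name (qw(prematch match postmatch group1 last_group)) {
    print "$name: [$parts->{$name}]\n";
}
```

The match variables are copied into the hash right away because the next successful match overwrites them.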

Our Simple Example Once Again
• Now let's revise our program to find all the words that end in 'ing' without splitting our line of text into an array of words. With the g modifier, each pass through the while loop finds the next match on the line:
    #!/usr/local/bin/perl -w
    while(<>) {
        chomp $_;
        while($_=~/([A-Za-z]*ing\b)/g) { print "$&\n"; }
    }

Example
    #!/usr/local/bin/perl
    $exp = <STDIN>;
    chomp $exp;
    if($exp=~/^([A-Za-z\s]*)\bcrave\b([\sA-Za-z]+)/) {
        print "$1\n";
        print "$2\n";
    }
• Run the program with the string: I crave to rule the world!
• Results:
  • "I "
  • " to rule the world" (the '!' is not captured because it is not in the character class)

Example
    #!/usr/local/bin/perl
    $exp = <STDIN>;
    chomp $exp;
    if($exp=~/\bcrave\b/) {
        print "$`\n";
        print "$&\n";
        print "$'\n";
    }
• Run the program with the string: I crave to rule the world!
• Results:
  • "I "
  • "crave"
  • " to rule the world!"

Thank you 
