Introduction to Perl, Part II
By: Bridget Thomson McInnes
22 January 2004

File Handles
• Very simple compared to C/C++!
• Are not prefixed with a symbol ($, @, %, etc.)
• Opening a file:
    open(SRC, "my_file.txt");
• Reading from a file:
    $line = <SRC>;   # reads up to and including the next newline character
• Closing a file:
    close(SRC);

File Handles cont...
• Opening a file for output (creates the file, or overwrites it if it already exists):
    open(DST, ">my_file.txt");
• Opening a file for appending:
    open(DST, ">>my_file.txt");
• Writing to a file:
    print DST "Printing my first line.\n";
• Safeguarding against a file that cannot be opened (for example, one that does not exist):
    open(SRC, "file.txt") || die "Could not open file.\n";
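The write, append, and read steps above can be sketched in one short script. The file name demo_output.txt is hypothetical, and adding $! (Perl's system error message) to the die text is a common convention rather than something the slides show.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $file = "demo_output.txt";   # hypothetical file name for this sketch

# ">" creates the file, or truncates it if it already exists
open(DST, ">$file") || die "Could not open $file for writing: $!\n";
print DST "Printing my first line.\n";
close(DST);

# ">>" keeps the existing contents and adds to the end
open(DST, ">>$file") || die "Could not open $file for appending: $!\n";
print DST "Printing my second line.\n";
close(DST);

# No prefix opens the file for reading
open(SRC, $file) || die "Could not open $file for reading: $!\n";
my @lines = <SRC>;              # in list context, <SRC> reads every remaining line
close(SRC);

print "Read ", scalar(@lines), " lines from $file\n";
```

Running the script twice still leaves two lines in the file, because the first open uses ">" and starts the file over.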

File Test Operators
• Check to see if a file exists:
    if (-e "file.txt") {
        # The file exists!
    }
• Other file test operators:
    -r   readable
    -x   executable
    -d   is a directory
    -T   is a text file
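As a small illustration of combining these operators, the sketch below describes a path in words. The sub name describe_path and the missing file name are hypothetical, chosen only for this example.

```perl
use strict;
use warnings;

# Describe a path using the file test operators from the slide.
sub describe_path {
    my ($path) = @_;
    return "does not exist" unless -e $path;
    return "directory"      if -d $path;
    my @traits;
    push @traits, "readable"   if -r $path;
    push @traits, "executable" if -x $path;
    push @traits, "text"       if -T $path;
    return @traits ? join(", ", @traits) : "exists";
}

print ". is: ", describe_path("."), "\n";                      # the current directory
print "missing file is: ", describe_path("no_such_file_here.txt"), "\n";
```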

Quick Program with File Handles
• Program to copy a file to a destination file:
    #!/usr/local/bin/perl -w
    open(SRC, "file.txt") || die "Could not open source file.\n";
    open(DST, ">newfile.txt") || die "Could not open destination file.\n";
    while ($line = <SRC>) {
        print DST $line;
    }
    close SRC;
    close DST;

Some Default File Handles
• STDIN : Standard input
    $line = <STDIN>;   # takes input from stdin
• STDOUT : Standard output
    print STDOUT "File handling in Perl is sweet!\n";
• STDERR : Standard error
    print STDERR "Error!!\n";

The <> File Handle
• The "empty" file handle reads from the file(s) named on the command line, or from STDIN if none are given:
    $line = <>;
• If the program is run as ./prog.pl file.txt, this will automatically open file.txt and read the first line.
• If the program is run as ./prog.pl file1.txt file2.txt, this will first read file1.txt and then file2.txt. The lines arrive as one continuous stream, so nothing in the read itself marks where one file ends and the other begins.

The <> File Handle cont...
• If the program is run as ./prog.pl with no arguments, it waits for you to enter text and continues reading until you enter the end-of-file (EOF) character
    • CTRL-D in UNIX
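One way to tell the input files apart while reading through <> is Perl's built-in $ARGV variable, which holds the name of the file currently being read. The sketch below is not from the slides: it creates two small files with hypothetical names so that it is self-contained, then sets @ARGV by hand to imitate command-line arguments.

```perl
use strict;
use warnings;

# Create two small demo files (hypothetical names) so the sketch is self-contained.
my %contents = ("argv_demo_1.txt" => "a\nb\n", "argv_demo_2.txt" => "c\n");
foreach my $name (keys %contents) {
    open(OUT, ">$name") || die "Could not create $name: $!\n";
    print OUT $contents{$name};
    close(OUT);
}

# Pretend the program was run as: ./prog.pl argv_demo_1.txt argv_demo_2.txt
@ARGV = ("argv_demo_1.txt", "argv_demo_2.txt");

my %lines_per_file;
while (my $line = <>) {
    $lines_per_file{$ARGV}++;   # $ARGV names the file <> is reading right now
}

foreach my $name (sort keys %lines_per_file) {
    print "$name: $lines_per_file{$name} line(s)\n";
}

unlink "argv_demo_1.txt", "argv_demo_2.txt";   # remove the demo files
```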

Example Program with STDIN
• Suppose you want to determine if you are one of the Three Stooges:
    #!/usr/local/bin/perl
    %stooges = (larry => 1, moe => 1, curly => 1);
    print "Enter your name: ";
    $name = <STDIN>;
    chomp $name;
    if ($stooges{lc($name)}) {
        print "You are one of the Three Stooges!!\n";
    } else {
        print "Sorry, you are not a Stooge!!\n";
    }

Chomp and Chop
• chomp : function that deletes a trailing newline from the end of a string.
    $line = "this is the first line of text\n";
    chomp $line;   # removes the newline character
    print $line;   # prints "this is the first line of text" with no trailing newline
• chop : function that removes the last character of a string, whatever it is.
    $line = "this is the first line of text";
    chop $line;
    print $line;   # prints "this is the first line of tex"
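The difference between the two functions shows most clearly when each is applied to a string with and without a trailing newline. A minimal sketch (the variable names are chosen for this example only):

```perl
use strict;
use warnings;

my $with_newline    = "first line of text\n";
my $without_newline = "first line of text";

# chomp removes a trailing newline only if one is present
chomp(my $chomped_nl    = $with_newline);
chomp(my $chomped_plain = $without_newline);

# chop always removes the last character, whatever it is
chop(my $chopped_nl    = $with_newline);
chop(my $chopped_plain = $without_newline);

print "chomp: [$chomped_nl] [$chomped_plain]\n";
print "chop:  [$chopped_nl] [$chopped_plain]\n";
```

This is why chomp is the safer choice for cleaning up input lines: calling it on a string that has no newline leaves the string unchanged.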

Regular Expressions
• What are regular expressions? A few definitions:
    • A regular expression specifies a set of strings; the sets that can be specified this way are the regular languages of formal language theory
    • In other words, a formula for matching strings that follow a specified pattern
• Some things you can do with regular expressions:
    • Parse text
    • Add and/or replace subsections of text
    • Remove pieces of text
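
Regular Expressions cont...
• A regular expression characterizes a regular language
• A familiar relative in UNIX is the shell wildcard pattern. Wildcards use a simpler syntax than regular expressions (a lone '*' means "any sequence of characters"), but the idea of describing many strings with one pattern is the same:
    • ls *.c
        • Lists all the files in the current directory whose names end in '.c'
    • ls *.txt
        • Lists all the files in the current directory whose names end in '.txt'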

Simple Example for Clarity
• In its simplest form, a regular expression is a string of characters that you are looking for
• We want to find all the words that contain the string 'ing' in our text
• The regular expression we would use: /ing/

Simple Example cont...
• What would our program then look like?
    #!/usr/local/bin/perl
    while (<>) {
        chomp;
        @words = split / /;
        foreach $word (@words) {
            if ($word =~ m/ing/) {
                print "$word\n";
            }
        }
    }

Regular Expression Types
• Regular expressions are composed of two types of characters:
    • Literals
        • Normal text characters
        • Like what we saw in the previous program ( /ing/ )
    • Metacharacters
        • Special characters
        • Add a great deal of flexibility to your search

Metacharacters
• Match more than just characters
• Match line position:
    • ^  start of a line (caret)
    • $  end of a line (dollar sign)
• Match any one of the characters in a list: [ ... ]
• Examples:
    • /[Bb]ridget/ matches Bridget or bridget
    • /Mc[Ii]nnes/ matches McInnes or Mcinnes

Our Simple Example Revisited
• Now suppose we only want to match words that end in 'ing' rather than just contain 'ing'
• How would we change our regular expression to accomplish this?
    • Previous regular expression: $word =~ m/ing/
    • New regular expression: $word =~ m/ing$/

Ranges in Regular Expressions
• Ranges can be specified in regular expressions
• Valid ranges:
    [A-Z]      Upper case Roman alphabet
    [a-z]      Lower case Roman alphabet
    [A-Za-z]   Upper or lower case Roman alphabet
    [A-F]      Upper case A through F
    [A-z]      Valid, but be careful: in ASCII it also matches the six characters between 'Z' and 'a' ( [ \ ] ^ _ ` )
• Invalid ranges:
    [a-Z]      Not valid (in ASCII, 'a' comes after 'Z')
    [F-A]      Not valid (the range runs backwards)

Ranges cont...
• Ranges of digits can also be specified:
    [0-9]   Valid
    [9-0]   Invalid
• Negating ranges:
    /[^0-9]/    Match any character except a digit
    /[^a]/      Match any character except an 'a' (without the brackets, /^a/ matches an 'a' at the start of a line)
    /^[^A-Z]/   Match a line whose first character is anything other than an upper case letter
        • First ^ : start of line
        • Second ^ : negation
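The two roles of ^ are easy to confuse, so here is a minimal sketch that runs three of the patterns against a few sample strings. The sub names and the sample strings are made up for this illustration.

```perl
use strict;
use warnings;

# Each sub returns 1 if its pattern matches the argument and 0 otherwise.
sub has_non_digit   { return $_[0] =~ /[^0-9]/  ? 1 : 0; }   # ^ inside [] : negation
sub starts_with_a   { return $_[0] =~ /^a/      ? 1 : 0; }   # ^ outside [] : start of line
sub starts_non_caps { return $_[0] =~ /^[^A-Z]/ ? 1 : 0; }   # both roles together

foreach my $s ("42", "4x2", "apple", "Apple") {
    printf "%-6s non-digit:%d  starts with a:%d  first char not A-Z:%d\n",
        $s, has_non_digit($s), starts_with_a($s), starts_non_caps($s);
}
```

Note that a negated class still has to match one character, so /^[^A-Z]/ does not match an empty string.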
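
Our Simple Example Again
• Now suppose we want to create a list of all the words in our text that do not end in 'ing'
• How would we change our test to accomplish this?
    • Previous test: $word =~ m/ing$/
    • New test: $word !~ m/ing$/ (the !~ operator is true when the pattern does not match)
• Note that a negated character class such as /[^ing]$/ does not do this: it matches a word whose last character is anything other than 'i', 'n', or 'g'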
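Selecting the words that do not end in 'ing' means negating the whole match, which Perl writes with the !~ operator. The sketch below contrasts that with a negated character class on a made-up word list; the list and variable names are assumptions for this example.

```perl
use strict;
use warnings;

my @words = qw(singing sang ring rung inga);

# Negate the whole match: keep words that do NOT end in "ing"
my @not_ing = grep { $_ !~ /ing$/ } @words;

# A negated class tests only one character: the last character is not i, n, or g
my @last_char_other = grep { /[^ing]$/ } @words;

print "Do not end in 'ing':       @not_ing\n";
print "Last char not i, n, or g:  @last_char_other\n";
```

Here "sang" and "rung" do not end in 'ing', yet the character-class version rejects them because their last letter is 'g'.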

Literal Metacharacters
• Suppose that you actually want to look for every '^' in your text
• Use the \ symbol:
    /\^/   Regular expression to search for a literal '^'
• What does the following regular expression match?
    /[A-Z^]\^/
    • Matches any line that contains (A-Z or ^) followed by ^
    • Inside the brackets, a ^ that is not the first character is already literal; outside the brackets it must be escaped with \

Patterns Provided in Perl
• Some patterns (shown with their ASCII equivalents):
    \d   [0-9]
    \w   [a-zA-Z0-9_]
    \s   [ \r\t\n\f]   (white space pattern)
    \D   [^0-9]
    \W   [^a-zA-Z0-9_]
    \S   [^ \r\t\n\f]
• Example: /19\d\d/
    • Looks for any year in the 1900s
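A minimal sketch of two of these shorthand patterns at work, using a made-up sentence. The \b word boundary around the year and the /g flag (find every match, not just the first) are additions for this example.

```perl
use strict;
use warnings;

my $text = "Born in 1969,\tmoved in 2004\nand again in 1987.";

# \d matches one digit, so 19\d\d matches a four-digit year starting with 19.
# In list context with /g, the match returns every captured year.
my @years = $text =~ /\b(19\d\d)\b/g;

# \s+ matches any run of spaces, tabs, or newlines
my @fields = split /\s+/, $text;

print "Years in the 1900s: @years\n";
print "Number of whitespace-separated fields: ", scalar(@fields), "\n";
```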

Using Patterns in our Example
• Commonly, words are not separated by just a single space but by tabs, newlines, etc.
• Let's modify our split function to handle multiple white space characters:
    #!/usr/local/bin/perl
    while (<>) {
        chomp;
        @words = split /\s+/, $_;
        foreach $word (@words) {
            if ($word =~ m/ing/) {
                print "$word\n";
            }
        }
    }

Word Boundary Metacharacter
• Regular expression to match the start or the end of a 'word': \b
• Examples:
    /Jeff\b/     Match Jeff but not Jefferson
    /Carol\b/    Match Carol but not Caroline
    /Rollin\b/   Match Rollin but not Rolling
    /\bform/     Match form or formation but not information
    /\bform\b/   Match form but neither information nor formation
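The effect of placing \b before, after, or on both sides of a pattern can be checked on a short word list. A minimal sketch; the word "reform" is added to the slide's examples to show the trailing-boundary case.

```perl
use strict;
use warnings;

my @words = qw(form formation information reform);

my @starts = grep { /\bform/ }   @words;   # "form" at the start of a word
my @ends   = grep { /form\b/ }   @words;   # "form" at the end of a word
my @whole  = grep { /\bform\b/ } @words;   # "form" as a complete word

print "Starts with form: @starts\n";
print "Ends with form:   @ends\n";
print "Exactly form:     @whole\n";
```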

DOT Metacharacter
• The DOT metacharacter, '.', stands for any single character except a newline
    /b.bble/   Could match: bobble, babble, bubble
    /.oat/     Could match: boat, coat, goat
• Note: '.*' means "any number of any characters". This can be handy, but it matches as much as it can, so it may swallow more of the line than you intended.

PIPE Metacharacter
• The PIPE metacharacter, '|', is used for alternation
    /Bridget (Thomson|McInnes)/   Match Bridget Thomson or Bridget McInnes; in 'Bridget Thomson McInnes' only 'Bridget Thomson' is matched
    /B|bridget/                   Match B or bridget (without parentheses, the alternation splits the whole pattern)
    /^(B|b)ridget/                Match Bridget or bridget at the beginning of a line
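The difference that parentheses make to alternation can be seen by running both forms against the same strings. A minimal sketch; the sub names and sample strings are invented for this example.

```perl
use strict;
use warnings;

# Without parentheses the choice is between "B" and "bridget"
sub loose_match    { return $_[0] =~ /B|bridget/     ? 1 : 0; }

# With parentheses the choice is only between "B" and "b"
sub anchored_match { return $_[0] =~ /^(B|b)ridget/  ? 1 : 0; }

foreach my $s ("Bob", "bridget", "Bridget", "a bridge") {
    printf "%-10s /B|bridget/:%d  /^(B|b)ridget/:%d\n",
        $s, loose_match($s), anchored_match($s);
}
```

"Bob" matches the first pattern simply because it contains a 'B', which is rarely what was intended.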

Our Simple Example
• Now with our example, suppose that we want to get not only all the words that end in 'ing' but also those that end in 'ed'
• How would we change our regular expression to accomplish this?
    • Previous regular expression: $word =~ m/ing$/
    • New regular expression: $word =~ m/(ing|ed)$/

The ? Metacharacter
• The metacharacter ? indicates that the character immediately preceding it occurs zero or one time
• Examples:
    /worl?ds/     Match either 'worlds' or 'words'
    /m?ethane/    Match either 'methane' or 'ethane'

The * Metacharacter
• The metacharacter * indicates that the character immediately preceding it occurs zero or more times
• Example:
    /ab*c/   Match 'ac', 'abc', 'abbc', 'abbbc', etc.
    • Matches an a, optionally followed by a sequence of b's, and then a c
• Sometimes called the Kleene star
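Both quantifiers can be tried on short word lists. In this minimal sketch the patterns are anchored with ^ and $ so that each must account for the whole word; the anchors and the word lists are additions for this example.

```perl
use strict;
use warnings;

# * : zero or more of the preceding character
my @star = grep { /^ab*c$/ } qw(ac abc abbbc adc ab);

# ? : zero or one of the preceding character
my @optional = grep { /^worl?ds$/ } qw(words worlds worllds wods);

print "Matched by /^ab*c\$/:    @star\n";
print "Matched by /^worl?ds\$/: @optional\n";
```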

Our Simple Example Again
• Now suppose we want to create a list of all the words in our text that end in 'ing' or 'ings'
• How would we change our regular expression to accomplish this?
    • Previous regular expression: $word =~ m/ing$/
    • New regular expression: $word =~ m/ings?$/

Modifying Text
• Match
    • Up to this point, we have only attempted to match a given regular expression
    • Example: $variable =~ m/regex/
• Substitution
    • Takes the match one step further: if there is a match, replace it with the given string
    • Example: $variable =~ s/regex/replacement/
        $var =~ s/Thomson/McInnes/;
        $var =~ s/Bridgette/Bridget/;

Substitution Example
• Suppose that when we find all our words that end in 'ing', we want to replace the 'ing' with 'ed':
    #!/usr/local/bin/perl -w
    while (<>) {
        chomp $_;
        @words = split /\s+/, $_;
        foreach $word (@words) {
            if ($word =~ s/ing$/ed/) {
                print "$word\n";
            }
        }
    }
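Substitution can also be applied to a whole line at once instead of word by word. The sketch below assumes two things the slides do not cover: the /g flag, which replaces every match rather than only the first, and the fact that s/// returns the number of replacements made. The sample text is made up.

```perl
use strict;
use warnings;

my $text = "singing ringing thing";

# Without /g only the first match is replaced
my $first_only = $text;
$first_only =~ s/ing\b/ed/;

# With /g every match is replaced; the return value is the number of replacements
my $every = $text;
my $count = ($every =~ s/ing\b/ed/g);

print "First only: $first_only\n";
print "Every one:  $every ($count replacements)\n";
```

The \b keeps the pattern at the end of each word, playing the role that $ plays when each word is tested separately.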

Special Variables Modified by a Match
• $&   A copy of the text matched by the regex
• $`   A copy of the target text in front of the match
• $'   A copy of the target text after the match
• $1, $2, $3, etc.   The text matched by the 1st, 2nd, etc. set of parentheses. Note: $0 is not part of this series (it holds the name of the program)
• $+   A copy of the highest-numbered of $1, $2, $3, etc. that took part in the match
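All of these variables can be inspected after a single match. A minimal sketch using a made-up sentence; the values are copied into ordinary variables straight away, since the next successful match would overwrite them.

```perl
use strict;
use warnings;

my $text = "Presented on 22 January 2004 in class";
my ($before, $matched, $after, $last_group);
my @groups;

if ($text =~ /(\d+) (\w+) (\d+)/) {
    ($before, $matched, $after) = ($`, $&, $');   # text before, matched text, text after
    @groups     = ($1, $2, $3);                   # one entry per set of parentheses
    $last_group = $+;                             # same as $3 here
}

print "Before the match: [$before]\n";
print "Matched text:     [$matched]\n";
print "After the match:  [$after]\n";
print "Groups:           [@groups]\n";
print "Last group:       [$last_group]\n";
```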

Our Simple Example Once Again
• Now let's revise our program to find all the words that end in 'ing' without splitting our line of text into an array of words:
    #!/usr/local/bin/perl -w
    while (<>) {
        chomp $_;
        # the g flag repeats the match so that every word on the line is found
        while ($_ =~ /([A-Za-z]*ing\b)/g) {
            print "$&\n";
        }
    }
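
Example
    #!/usr/local/bin/perl
    $exp = <STDIN>;
    chomp $exp;
    if ($exp =~ /^([A-Za-z]+\s)*\bcrave\b(\s[A-Za-z]+)*/) {
        print "$1\n";
        print "$2\n";
    }
• Run the program with the string: I crave to rule the world!
• Results:
    • "I " (with a trailing space)
    • " world" (with a leading space)
• A repeated group such as ( ... )* keeps only the text of its last repetition, so $2 holds " world" rather than everything after 'crave'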
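To capture everything before and after 'crave' rather than a single word on each side, one option is to wrap each repeated group in an outer set of capturing parentheses and make the inner group non-capturing with (?: ... ). This is a sketch of that variation, not the program from the slide; the (?: ... ) syntax and the variable names are additions here.

```perl
use strict;
use warnings;

my $exp = "I crave to rule the world!";
my ($before, $after);

# The outer parentheses capture the whole repeated run;
# (?: ... ) groups without creating a numbered variable.
if ($exp =~ /^((?:[A-Za-z]+\s)*)crave\b((?:\s[A-Za-z]+)*)/) {
    ($before, $after) = ($1, $2);
}

print "Before crave: [$before]\n";
print "After crave:  [$after]\n";
```

The final '!' is not captured because the character class allows only letters.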
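
Example
    #!/usr/local/bin/perl
    $exp = <STDIN>;
    chomp $exp;
    if ($exp =~ /\bcrave\b/) {
        print "$`\n";
        print "$&\n";
        print "$'\n";
    }
• Run the program with the string: I crave to rule the world!
• Results:
    • "I " (the text before the match, including the space)
    • "crave"
    • " to rule the world!" (the text after the match, including the space)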