Chapter 7 Matching Patterns and Files
Objectives • Use Perl pattern matching and regular expressions to filter input data • Work with files to enable a program to store and retrieve data
Patterns in String Variables • Many programming problems require matching, changing, or manipulating patterns in string variables. • An important use is verifying input fields of a form • helps provide security against accidental or malicious attacks. • For example, if expecting a form field to provide a telephone number as input, your program needs a way to verify that the input comprises a string of seven digits.
Four Different Constructs • We will look at four Perl string manipulation constructs: • The match operator enables your program to look for patterns in strings. • The substitute operator enables your program to change patterns in strings. • The split function enables your program to split strings into separate variables based on a pattern. • Regular expressions provide a pattern-matching language that works with these operators and functions to manipulate string variables.
The Match Operator • The match operator is used to test if a pattern appears in a string. • It is used with the binding operator (“=~”) to see whether a variable contains a particular pattern.
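The slide's own example is not reproduced in this text, so the following is only a minimal sketch of the match operator used with the binding operator. The variable $name, its value, and the pattern Dave are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input value; in a CGI program it might come from param().
my $name = "Dave Smith";

# m/Dave/ is the match operator; =~ binds it to $name.
my $result;
if ( $name =~ m/Dave/ ) {
    $result = "found Dave";
} else {
    $result = "no Dave";
}
print "$result\n";
```

The condition is true whenever the pattern appears anywhere in the string, not only at the start.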
Other Delimiters? • The slash (“/”) is the most common match delimiter. • Others are possible. For example, both of the following use valid match operator syntax: • if ( $name =~ m!Dave! ) { • if ( $name =~ m<Dave> ) { • The reverse binding operator (“!~”) tests whether a pattern is NOT found: if ( $color !~ m/blue/ ) {
The Substitution Operator • Similar to the match operator but also enables you to change the matched string. • Use with the binding operator (“=~”) to test whether a variable contains a pattern
How It Works • Replaces the first occurrence of the search pattern with the change pattern in the string variable. • For example, the following changes the first occurrence of t to T: $name = "tom turtle"; $name =~ s/t/T/; print "Name=$name"; • The output of this code would be Name=Tom turtle
Changing All Occurrences • You can place a g (for global substitution) at the end of the substitution expression to change all occurrences of the target pattern in the search string. For example, • $name = "tom turtle"; • $name =~ s/t/T/g; • print "Name=$name"; • The output of this code would be • Name=Tom TurTle
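As a rough side-by-side sketch of the two forms, the following runs the single and the global substitution on copies of the same string. The variable names $first, $all, $copy, and $count are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $first = "tom turtle";
my $all   = "tom turtle";

$first =~ s/t/T/;     # no modifier: only the first t changes
$all   =~ s/t/T/g;    # g modifier: every t changes

print "first=$first\n";
print "all=$all\n";

# s/// also returns the number of substitutions it made.
my $copy  = "tom turtle";
my $count = ( $copy =~ s/t/T/g );
print "count=$count\n";
```

The return value can be handy when a program needs to know whether anything was changed at all.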
Using Translate • A similar operator is tr (for “translate”). It is useful for translating characters from uppercase to lowercase, and vice versa. • tr lets you specify a range of characters to translate from and a range of characters to translate to. The ranges are written without square brackets, because tr does not use regular-expression character classes: $name="smokeY"; $name =~ tr/a-z/A-Z/; print "name=$name"; • This would output the following: name=SMOKEY
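A small sketch of tr in both directions follows, plus its use as a character counter. The names $upper, $lower, and $vowels are assumptions introduced here, and the counting idiom goes slightly beyond what the slide covers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = "smokeY";

# Copy first, then translate the copy, so $name itself is unchanged.
( my $upper = $name ) =~ tr/a-z/A-Z/;
( my $lower = $name ) =~ tr/A-Z/a-z/;

# With an empty replacement list, tr only counts the matching characters.
my $vowels = ( $name =~ tr/aeiou// );

print "upper=$upper lower=$lower vowels=$vowels\n";
```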
A Full Pattern Matching Example
#!/usr/bin/perl
use CGI ':standard';
print header, start_html('Command Search');
@PartNums = ( 'XX1234', 'XX1892', 'XX9510' );
$com   = param('command');
$prod  = param('uprod');
$found = 0;    # set to 1 once the part number is validated
if ( $com eq "ORDER" || $com eq "RETURN" ) {
    $prod =~ s/xx/XX/g;    # switch xx to XX
    if ( $prod =~ /XX/ ) {
        # compare the product number with each known part number
        foreach $item (@PartNums) {
            if ( $item eq $prod ) {
                print "VALIDATED command=$com prodnum=$prod";
                $found = 1;
            }
        }
        if ( $found != 1 ) {
            print br, "Sorry Prod Num=$prod NOT FOUND";
        }
    } else {
        print br, "Sorry that prod num prodnum=$prod looks wrong";
    }
} else {
    print br, "Invalid command=$com did not receive ORDER or RETURN";
}
print end_html;
Using Regular Expressions • Regular expressions enable programs to match patterns more completely. • They make up a small language of special matching operators that can be employed to enhance Perl string pattern matching.
The Alternation Operator • The alternation operator (“|”) looks for alternative strings within a pattern; that is, you use it to indicate that the program should match one pattern OR the other. • For example, a match statement that tests $address against the pattern com|edu matches either com or edu.
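The alternation described above might be sketched as follows. The sample addresses and the @matched array are hypothetical values chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample values for $address.
my @addresses = ( 'www.mysite.com', 'www.school.edu', 'www.group.org' );

my @matched;
foreach my $address (@addresses) {
    # True when the string contains either com or edu.
    if ( $address =~ /com|edu/ ) {
        push @matched, $address;
    }
}
print "matched: @matched\n";
```

Because the pattern is not anchored, it matches com or edu anywhere in the string, so a stricter check would need the refinements covered later in the chapter.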
Parentheses for Groupings • You use parentheses within regular expressions to specify groupings. For example, the following matches a $name value of Dave or David.
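The slide's example is not preserved in this text. One plausible way to write such a grouping, offered only as a sketch, combines parentheses with alternation; the pattern Dav(e|id) and the sample names are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample names.
my @names = ( 'Dave', 'David', 'Davy' );

my %is_match;
foreach my $name (@names) {
    # The group (e|id) limits the alternation to the text after "Dav".
    $is_match{$name} = ( $name =~ /Dav(e|id)/ ) ? 1 : 0;
    print "$name: $is_match{$name}\n";
}
```

Without the parentheses, /Dave|id/ would mean "Dave" or "id", which is a different test.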
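Special Character Classes • Perl has a special set of character classes for shorthand pattern matching. • For example, consider these two statements: if ( $name =~ m/ / ) { if ( $name =~ m/\s/ ) { • The first matches a literal space character; the second uses the \s character class, which matches any whitespace character (such as a space, tab, or newline).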
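To make the difference between the two statements concrete, here is a small sketch that tests a literal space against \s, with \d added as a second character class. The sample strings and the %result hash are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $with_space = "John Smith";
my $with_tab   = "John\tSmith";    # tab instead of a space

my %result = (
    space_vs_literal => ( $with_space =~ m/ /  ? 1 : 0 ),  # literal space found
    tab_vs_literal   => ( $with_tab   =~ m/ /  ? 1 : 0 ),  # no literal space
    tab_vs_class     => ( $with_tab   =~ m/\s/ ? 1 : 0 ),  # \s also matches a tab
    digit_class      => ( "Room 42"   =~ m/\d/ ? 1 : 0 ),  # \d matches any digit
);

foreach my $test ( sort keys %result ) {
    print "$test=$result{$test}\n";
}
```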
Setting Specific Patterns w/ Quantifiers • Character quantifiers let you look for very specific patterns. • For example, use the dollar sign (“$”) to match only if a string ends with a specified pattern: if ( $Name =~ /Jones$/ ) { • This matches “John Jones” and “The guilty party is Jones” but not “Jones is here”.
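The three sample strings above can be checked with a short sketch. The caret (“^”) test for the start of a string is added here for contrast; the arrays @ends and @starts are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = ( "John Jones", "Jones is here", "The guilty party is Jones" );

# $ requires the pattern at the end of the string.
my @ends = grep { /Jones$/ } @lines;

# ^ requires the pattern at the start of the string.
my @starts = grep { /^Jones/ } @lines;

print "ends with Jones: ", join( " | ", @ends ), "\n";
print "starts with Jones: ", join( " | ", @starts ), "\n";
```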
Building Regular Expressions That Work • Regular expressions are very powerful, but they can also be virtually unreadable. • When building one, start with a simple regular expression and then refine it incrementally. • Build a piece and then test it. • The following example will build a regular expression for a date checker that accepts mm/dd/yyyy format (for example, 05/05/2002 but not 5/12/01).
Building Regular Expressions That Work 1. Determine the precise field rules. - What is valid input and what is not valid input? • E.g., For a date field, think through the valid and invalid rules for the field. • You might allow 09/09/2002 but not 9/9/2002 or Sep/9/2002. • Work through several examples as follows:
Building Regular Expressions that Work 1. Determine the precise field rules. 2. Get form and form-handling programs working • Build a sending form with the input field. • Build the receiving program that accepts the field. • For example, a first-cut receiving program: $date = param('udate'); if ( $date =~ m/.+/ ) { print 'Valid date=', $date; } else { print 'Invalid date=', $date; } • Here .+ matches any sequence of one or more characters.
Building Regular Expressions that Work 1. Determine the precise field rules. 2. Get form and form-handling programs working 3. Start with the most specific term possible. • For example, slashes must always separate two characters (for the month), followed by two more characters (for the day), followed by four characters (for the year). if ( $date =~ m{../../....} ) { • Each dot matches any single character: any 2 characters, a slash, any 2 characters, a slash, then any 4 characters.
Building Regular Expressions that Work 1. Determine the precise field rules. 2. Get form and form-handling programs working 3. Start with the most specific term possible. 4. Anchor and refine. (Use ^ and $ when possible) • if ( $date =~ m{^\d\d/\d\d/\d\d\d\d$} ) { • The string must start with 2 digits, have 2 digits in the middle, and end with 4 digits.
Building Regular Expressions that Work 1. Determine the precise field rules. 2. Get form and form-handling programs working 3. Start with the most specific term possible. 4. Anchor and refine. (Use ^ and $ when possible) 5. Get more specific if possible. • The first digit of the day can be only 0, 1, 2 or 3. For example, 05/55/2002 is clearly an illegal date. • Only years that begin with 2 (2000 through 2999) are allowed, because we don't care about dates like 05/05/1999 or 05/05/3003.
Building Regular Expressions that Work • Add these rules as follows: if ( $date =~ m{^\d\d/[0-3]\d/2\d\d\d$} ) { • The day must start with a digit from 0 to 3, and the year must start with a 2. • Now the regular expression recognizes input like 09/99/2001 and 05/05/4000 as illegal.
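The incremental refinement in steps 2 through 5 can be tried out with a small test harness like the sketch below. It runs the four patterns from the slides against a few sample dates taken from the text; the @steps table, the step labels, and the %accepted hash are assumptions added for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each entry pairs a label (made up here) with a pattern from the slides.
my @steps = (
    [ 'any characters',     qr{.+} ],
    [ 'slashes in place',   qr{../../....} ],
    [ 'anchored digits',    qr{^\d\d/\d\d/\d\d\d\d$} ],
    [ 'day and year rules', qr{^\d\d/[0-3]\d/2\d\d\d$} ],
);

my @samples = ( '05/05/2002', '5/12/01', '09/99/2001', '05/05/4000' );

my %accepted;
foreach my $step (@steps) {
    my ( $label, $re ) = @$step;
    # Keep only the samples this version of the pattern accepts.
    $accepted{$label} = [ grep { $_ =~ $re } @samples ];
    print "$label: @{ $accepted{$label} }\n";
}
```

Keeping a fixed list of good and bad samples and rerunning it after each change is one way to follow the "build a piece and then test" advice.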
Tip: Regular Expression Special Variables • Perl regular expressions set several special scalar variables: • $& will be equal to the first matching text, • $` will be the text before the match, and • $' will be the text after the first match. $name='*****Marty'; if ( $name =~ m/\w/ ) { print "got match at=$& "; print "B4=$` after=$'"; } else { print "Not match"; } • This would output: got match at=M B4=***** after=arty
Full Example Program
#!/usr/bin/perl
use CGI ':standard';
print header, start_html('Date Check');
$date = param('udate');
if ( $date =~ m{^\d\d/[0-3]\d/2\d\d\d$} ) {
    print 'Valid date=', $date;
} else {
    print 'Invalid date=', $date;
}
print end_html;
The Split Function • split() breaks a string into pieces based on a field separator. It takes 2 arguments: • a pattern to match (which can contain regular expressions) • and a string variable to split (the string is cut at each place the pattern matches).
split() Example $line = "Please , pass thepepper"; @result = split( /\s+/, $line ); • The pattern /\s+/ matches 1 or more whitespace characters, $line is the variable to split, and the results go into a list. • This sets the list variable @result as follows: $result[0] = "Please"; $result[1] = ","; $result[2] = "pass"; $result[3] = "thepepper";
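Another split() Example • Another split() example: $line = "Baseball, hot dogs, apple pie"; @newline = split( /,/, $line ); print "newline= @newline"; • These lines will have the following output (each piece after the first keeps its leading space, so two spaces separate the items, which a browser displays as one): • newline= Baseball  hot dogs  apple pie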
The Split Function • When you know how many matches to expect: $line = "AA1234:Hammer:122:12"; ($partno, $part, $num, $cost) = split( /:/, $line ); print "partno= $partno part=$part num=$num cost=$cost"; • This would output the following: partno= AA1234 part=Hammer num=122 cost=12
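As a brief sketch of both split() styles, the following splits the colon-separated record into named variables and then splits the comma-separated list with a pattern that also absorbs the surrounding spaces. The $total calculation and the /\s*,\s*/ pattern are additions made here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fixed number of fields: assign directly to named variables.
my $line = "AA1234:Hammer:122:12";
my ( $partno, $part, $num, $cost ) = split( /:/, $line );
my $total = $num * $cost;    # the fields can be used as numbers
print "partno=$partno part=$part total=$total\n";

# Unknown number of fields: collect into an array.
# \s* on each side of the comma strips the spaces around it.
my @items = split( /\s*,\s*/, "Baseball, hot dogs, apple pie" );
print join( "|", @items ), "\n";
```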
A Program with split()
#!/usr/bin/perl
use CGI ':standard';
print header, start_html('Date Check');
$date = param('udate');
if ( $date =~ m{^\d\d/[0-3]\d/2\d\d\d$} ) {
    print 'OK from REG EXP date=', $date;
    # break the date into its three fields
    ($mon, $day, $year) = split( /\//, $date );
    if ( $mon >= 1 && $mon <= 12 ) {
        # the regular expression still allows day 00, so check both ends
        if ( $day >= 1 && $day <= 31 ) {
            print " Valid date mon=$mon day=$day year=$year";
        } else {
            print " Illegal day specified day=$day";
        }
    } else {
        print " Illegal month specified mon=$mon";
    }
} else {
    print 'Invalid date=', $date;
}
print end_html;
Working with Files • So far, our programs cannot keep data values from one run to the next. • Working with files enables programs to store data, which can then be used at some future time. • We will describe ways to work with files in CGI/Perl programs, including • opening files, • closing files, • and reading from and writing to files.
Using the open() Function • Use open() to connect a program to a physical file on a Web server. It has the following format: open(FILEHANDLE, filename); • file handle - Starts with a letter, not with “$”, “@”, or “%”. By Perl convention, specify it in all capital letters. • filename - The name of the file to connect to. If the file resides in the same file system directory as the program, just specify the filename (and not the full file path).
More On open() function • open() returns a true (nonzero) value when it successfully opens the file and a false value when the attempt fails. • A common way to use open(): $infile = "mydata.txt"; open (INFILE, $infile ) || die "Cannot open $infile : $!"; • This connects to mydata.txt and executes die only when open() fails; $! holds the system error message.
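To see the failure path without stopping the program, the open-or-die idiom can be wrapped in eval, as in the sketch below. The file name is hypothetical and is assumed not to exist; the three-argument form of open() with an explicit '<' read mode is a variation on the two-argument call shown on the slide.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical file name, assumed NOT to exist in the current directory.
my $infile = "no_such_file_for_demo.txt";

my $message = '';
eval {
    # die runs only when open() returns false.
    open( INFILE, '<', $infile ) || die "Cannot open $infile: $!";
    close(INFILE);
    1;
} or $message = $@;    # $@ holds the text passed to die

print "open failed: $message" if $message;
```

In a real CGI program the die would normally be left uncaught, so that the message reaches the server log or the browser.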
Sending System Messages to Browser • Many Web servers direct die messages to their log files. • The end user might still receive a generic Internal Server Error message. • The CGI::Carp Perl module can redirect these messages. • Add the following line at the beginning of your program (after the use CGI ':standard'; line): use CGI::Carp "fatalsToBrowser"; • It will forward the message to the browser.
Using the File Handle to Read Files • Use the file handle to refer to the file once it is opened. • Combine the file handle with the file input operator (“<>”) to read a file into your program. • For example, the following opens a file and then outputs the first and third lines of the file. • The program uses mydata.txt containing: Apples are red Bananas are yellow Carrots are orange Dates are brown
Example Program
$infile="mydata.txt";
open (INFILE, $infile ) || die "Cannot open $infile: $!";
@infile = <INFILE>;
print $infile[0];
print $infile[2];
close (INFILE);
• Then the output of this program would be
Apples are red
Carrots are orange
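A self-contained variant of the program above is sketched here so that it can run without an existing mydata.txt. It first writes the four sample lines to a scratch file and then reads them back. The use of File::Temp, the three-argument open(), and the @lines array are assumptions of this sketch and not part of the original example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file holding the sample data (removed when the program exits).
my ( $tmp_fh, $infile ) = tempfile( UNLINK => 1 );
print $tmp_fh "Apples are red\nBananas are yellow\n";
print $tmp_fh "Carrots are orange\nDates are brown\n";
close($tmp_fh);

# Read the file back through a file handle.
open( INFILE, '<', $infile ) || die "Cannot open $infile: $!";
my @lines = <INFILE>;    # list context reads every line at once
close(INFILE);

chomp(@lines);           # drop the trailing newline from each line
print "line 1: $lines[0]\n";
print "line 3: $lines[2]\n";
print "total lines: ", scalar(@lines), "\n";
```

Array indexes start at 0, which is why the first and third lines are elements 0 and 2.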