Learn how to use Perl pattern matching, regular expressions, and file handling to filter and manipulate input data. Improve your program's security and efficiency by verifying input fields and working with string variables.
Chapter 7 Matching Patterns and Files
Objectives • Use Perl pattern matching and regular expressions to filter input data • Work with files to enable a program to store and retrieve data
Patterns in String Variables • Many programming problems require matching, changing, or manipulating patterns in string variables. • An important use is verifying input fields of a form • helps provide security against accidental or malicious attacks. • For example, if expecting a form field to provide a telephone number as input, your program needs a way to verify that the input comprises a string of seven digits.
Four Different Constructs • We will look at 4 different Perl string manipulation constructs: • The match operator enables your program to look for patterns in strings. • The substitute operator enables your program to change patterns in strings. • The split function enables your program to split strings into separate variables based on a pattern. • Regular expressions provide a pattern matching language that can work with these operators and functions to work on string variables.
The Match Operator • The match operator is used to test if a pattern appears in a string. • It is used with the binding operator (“=~”) to see whether a variable contains a particular pattern.
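A minimal sketch of the match operator used with the binding operator. The variable $name and its sample value are assumptions made for illustration; they do not come from the slides.

```perl
use strict;
use warnings;

# Hypothetical sample value, chosen for illustration.
my $name = "Dave Smith";

# m/Dave/ is true when the string in $name contains "Dave" anywhere.
my $found = ( $name =~ m/Dave/ ) ? 1 : 0;

if ($found) {
    print "Found Dave in '$name'\n";
} else {
    print "No Dave in '$name'\n";
}
```

The match is case sensitive, so a value such as "dave smith" would take the else branch.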
Other Delimiters? • Slash (“/”) is the most common match delimiter. • Others are possible. For example, both of the following use valid match operator syntax: • if ( $name =~ m!Dave! ) { • if ( $name =~ m<Dave> ) { • The negated binding operator (“!~”) tests whether a pattern is NOT found: if ( $color !~ m/blue/ ) {
The Substitution Operator • Similar to the match operator but also enables you to change the matched string. • Use with the binding operator (“=~”) to test whether a variable contains a pattern
How It Works • Replaces the first occurrence of the search pattern with the change pattern in the string variable. • For example, the following changes the first occurrence of t to T:
    $name = "tom turtle";
    $name =~ s/t/T/;
    print "Name=$name";
• The output of this code would be Name=Tom turtle
Changing All Occurrences • You can place a g (for global substitution) at the end of the substitution expression to change all occurrences of the target pattern in the search string. For example:
    $name = "tom turtle";
    $name =~ s/t/T/g;
    print "Name=$name";
• The output of this code would be Name=Tom TurTle
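The two substitution forms can be compared side by side. This is a small sketch; the variable names $first, $all, and $count are assumptions introduced for the example.

```perl
use strict;
use warnings;

my $first = "tom turtle";
my $all   = "tom turtle";

# Without g, only the first t is changed.
$first =~ s/t/T/;

# With g, every t is changed; s/// returns the number of substitutions made.
my $count = ( $all =~ s/t/T/g );

print "first=$first all=$all count=$count\n";
```

Capturing the return value is optional, but it is a convenient way to check whether anything was changed at all.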
Using Translate • A similar operator is called tr (for “translate”). It is useful for translating characters from uppercase to lowercase, and vice versa. • The tr operator allows you to specify a range of characters to translate from and a range of characters to translate to. Ranges are written without square brackets, because tr does not use regular-expression character classes:
    $name = "smokeY";
    $name =~ tr/a-z/A-Z/;
    print "name=$name";
• Would output the following: name=SMOKEY
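A short sketch of two common tr idioms: translating a copy of a string, and counting characters without changing anything. The names $upper and $lower_count are assumptions for illustration.

```perl
use strict;
use warnings;

my $name = "smokeY";

# Copy $name, then translate lowercase letters to uppercase in the copy.
( my $upper = $name ) =~ tr/a-z/A-Z/;

# With an empty replacement list, tr changes nothing and just counts matches.
my $lower_count = ( $name =~ tr/a-z// );

print "upper=$upper lower_count=$lower_count\n";
```

For plain case conversion, Perl's built-in uc() and lc() functions are an alternative to tr.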
A Full Pattern Matching Example
    #!/usr/bin/perl
    use CGI ':standard';
    print header, start_html('Command Search');
    @PartNums = ( 'XX1234', 'XX1892', 'XX9510' );
    $com  = param('command');
    $prod = param('uprod');
    $found = 0;    # set to 1 once the product number is found
    if ( $com eq "ORDER" || $com eq "RETURN" ) {
        $prod =~ s/xx/XX/g;    # switch xx to XX
        if ( $prod =~ /XX/ ) {
            foreach $item ( @PartNums ) {
                if ( $item eq $prod ) {
                    print "VALIDATED command=$com prodnum=$prod";
                    $found = 1;
                }
            }
            if ( $found != 1 ) {
                print br, "Sorry Prod Num=$prod NOT FOUND";
            }
        } else {
            print br, "Sorry that prod num prodnum=$prod looks wrong";
        }
    } else {
        print br, "Invalid command=$com did not receive ORDER or RETURN";
    }
    print end_html;
Using Regular Expressions • Regular expressions enable programs to match patterns more completely. • They make up a small language of special matching operators that can be employed to enhance Perl string pattern matching.
The Alternation Operator • The alternation operator (“|”) looks for alternative strings within a pattern. • (That is, you use it to indicate that the program should match one pattern OR the other.) • For example, the following match statement is true when $address contains either com or edu: if ( $address =~ /com|edu/ ) {
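A minimal sketch of alternation applied to a few sample values. The addresses are made up for illustration and are not from the original slides.

```perl
use strict;
use warnings;

# Hypothetical sample addresses.
my @addresses = ( 'www.example.com', 'www.example.edu', 'www.example.org' );

# Keep only the addresses that contain either "com" or "edu".
my @matched = grep { /com|edu/ } @addresses;

print "matched: @matched\n";
```

Because the pattern is not anchored, it matches com or edu anywhere in the string, so a value like command.org would also match.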
Parentheses for Groupings • You use parentheses within regular expressions to specify groupings. For example, the following matches a $name value of Dave or David.
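One way to write such a grouping is sketched here. The pattern /Dav(e|id)/ is an assumption chosen to fit the description; the pattern used on the original slide is not shown in this text.

```perl
use strict;
use warnings;

# Record, for each hypothetical name, whether the grouped pattern matches.
my %result;
for my $name ( 'Dave', 'David', 'Davy' ) {
    # "Dav" must be followed by either "e" or "id".
    $result{$name} = ( $name =~ /Dav(e|id)/ ) ? 1 : 0;
}

print "$_ => $result{$_}\n" for sort keys %result;
```

The parentheses limit the alternation to the ending, so the shared prefix Dav is written only once.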
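Special Character Classes • Perl has a special set of character classes for shorthand pattern matching. • For example, consider these two statements: if ( $name =~ m/ / ) { if ($name =~ m/\s/ ) { • The first matches only a literal space character; the second uses the \s class, which matches any whitespace character (space, tab, or newline).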
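The difference between a literal space and \s shows up with a tab-separated value. This is a small sketch; the sample strings are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical value containing a tab rather than a space.
my $tabbed = "Dave\tSmith";

my $has_space      = ( $tabbed =~ m/ / )   ? 1 : 0;   # literal space only
my $has_whitespace = ( $tabbed =~ m/\s/ )  ? 1 : 0;   # any whitespace character
my $has_digit      = ( "XX1234" =~ m/\d/ ) ? 1 : 0;   # \d matches a digit

print "space=$has_space whitespace=$has_whitespace digit=$has_digit\n";
```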
Setting Specific Patterns w/ Quantifiers • Character quantifiers let you look for very specific patterns. • For example, use the dollar sign (“$”) to match only if a string ends with a specified pattern. if ($Name =~ /Jones$/ ) { • Matches “John Jones” and “The guilty party is Jones”, but not “Jones is here”.
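A brief sketch that runs the three sample strings through an end anchor, and through the matching start anchor (^) for contrast. The array names are assumptions for illustration.

```perl
use strict;
use warnings;

my @lines = ( 'John Jones', 'Jones is here', 'The guilty party is Jones' );

# $ anchors the pattern to the end of the string.
my @ends_with = grep { /Jones$/ } @lines;

# ^ anchors the pattern to the start of the string.
my @starts_with = grep { /^Jones/ } @lines;

print "ends with Jones:   $_\n" for @ends_with;
print "starts with Jones: $_\n" for @starts_with;
```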
Building Regular Expressions That Work • Regular expressions are very powerful, but they can also be virtually unreadable. • When building one, start with a simple regular expression and then refine it incrementally. • Build a piece and then test it. • The following example will build a regular expression for a date checker • mm/dd/yyyy format (for example, 05/05/2002 but not 5/12/01).
Building Regular Expressions That Work 1. Determine the precise field rules. - What is valid input and what is not valid input? • E.g., For a date field, think through the valid and invalid rules for the field. • You might allow 09/09/2002 but not 9/9/2002 or Sep/9/2002. • Work through several examples as follows:
Building Regular Expressions that Work 1. Determine the precise field rules. 2. Get form and form-handling programs working • Build a sending form with the input field. • Build the receiving program that accepts the field. • For example, a first-cut receiving program, where .+ matches any sequence of one or more characters:
    $date = param('udate');
    if ( $date =~ m/.+/ ) {
        print 'Valid date=', $date;
    } else {
        print 'Invalid date=', $date;
    }
Building Regular Expressions that Work 1. Determine the precise field rules. 2. Get form and form-handling programs working 3. Start with the most specific term possible. • For example, slashes must always separate two characters (for the month), followed by two more characters (for the day), followed by four characters (for the year). if ( $date =~ m{../../....} ) { • Each . matches any single character, so the pattern reads: any 2 characters, a slash, any 2 characters, a slash, any 4 characters.
Building Regular Expressions that Work 1. Determine the precise field rules. 2. Get form and form-handling programs working 3. Start with the most specific term possible. 4. Anchor and refine. (Use ^ and $ when possible) • if ( $date =~ m{^\d\d/\d\d/\d\d\d\d$} ) { • The pattern now requires a string that starts with 2 digits, has 2 digits in the middle, and ends with 4 digits.
Building Regular Expressions that Work 1. Determine the precise field rules. 2. Get form and form-handling programs working 3. Start with the most specific term possible. 4. Anchor and refine. (Use ^ and $ when possible) 5. Get more specific if possible. • The first digit of the day can be only 0, 1, 2 or 3. For example, 05/55/2002 is clearly an illegal date. • Only years that start with 2 are allowed, because we don’t care about dates like 05/05/1999 or 05/05/3003.
Building Regular Expressions that Work • Add these rules below: if ( $date =~ m{^\d\d/[0-3]\d/2\d\d\d$} ) { • The day must start with a digit from 0 to 3, and the year must start with 2. • Now the regular expression recognizes input like 09/99/2001 and 05/05/4000 as illegal.
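The refined pattern can be exercised outside of a CGI program by wrapping it in a small function. This is a sketch only; the function name looks_like_date is an assumption introduced for the example, and the sample values are the ones mentioned in the steps above.

```perl
use strict;
use warnings;

# Returns 1 when $date fits the refined pattern, 0 otherwise.
sub looks_like_date {
    my ($date) = @_;
    return $date =~ m{^\d\d/[0-3]\d/2\d\d\d$} ? 1 : 0;
}

for my $sample ( '05/05/2002', '5/12/01', '09/99/2001', '05/05/4000' ) {
    printf "%-12s %s\n", $sample,
        looks_like_date($sample) ? 'accepted' : 'rejected';
}
```

The pattern checks only the shape of the input, so values such as 13/01/2002 or 05/39/2002 still pass; range checks on the month and day have to be done separately.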
Tip: Regular Expression Special Variables • Perl regexes set several special scalar variables: • $& will be equal to the first matching text, • $` will be the text before the match, and • $' will be the text after the first match.
    $name = '*****Marty';
    if ( $name =~ m/\w/ ) {
        print "got match at=$& ";
        print "B4=$` after=$'";
    } else {
        print "Not match";
    }
• Would output: got match at=M B4=***** after=arty
Full Example Program
    #!/usr/bin/perl
    use CGI ':standard';
    print header, start_html('Date Check');
    $date = param('udate');
    if ( $date =~ m{^\d\d/[0-3]\d/2\d\d\d$} ) {
        print 'Valid date=', $date;
    } else {
        print 'Invalid date=', $date;
    }
    print end_html;
The Split Function • split() breaks a string into different pieces based on a field separator. It takes 2 arguments: • a pattern to match (which can contain regular expressions) • and a string variable to split (the string is cut at each place the pattern matches).
split() Example
    $line = "Please , pass thepepper";
    @result = split( /\s+/, $line );
• The pattern /\s+/ matches 1 or more whitespace characters, $line is the variable to split, and the pieces are returned as a list. • Sets list variable @result with the following:
    $result[0] = "Please";
    $result[1] = ",";
    $result[2] = "pass";
    $result[3] = "thepepper";
Another split() Example • Another split() example:
    $line = "Baseball, hot dogs, apple pie";
    @newline = split( /,/, $line );
    print "newline= @newline";
• These lines will have the following output (the space after each comma stays with the piece that follows it, so two spaces appear between items):
    newline= Baseball  hot dogs  apple pie
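If the spaces around the commas are not wanted, the separator pattern can absorb them. This is a minimal sketch of that variation, not part of the original slides; the name @items is an assumption.

```perl
use strict;
use warnings;

my $line = "Baseball, hot dogs, apple pie";

# The separator is a comma plus any whitespace on either side of it.
my @items = split /\s*,\s*/, $line;

print "item: [$_]\n" for @items;
```

The brackets in the output make it easy to see that no piece begins or ends with a space.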
The Split Function • When you know how many matches to expect:
    $line = "AA1234:Hammer:122:12";
    ($partno, $part, $num, $cost) = split( /:/, $line );
    print "partno= $partno part=$part num=$num cost=$cost";
• Would output the following: partno= AA1234 part=Hammer num=122 cost=12
A Program with split()
    #!/usr/bin/perl
    use CGI ':standard';
    print header, start_html('Date Check');
    $date = param('udate');
    if ( $date =~ m{^\d\d/[0-3]\d/2\d\d\d$} ) {
        print 'OK from REG EXP date=', $date;
        ($mon, $day, $year) = split( /\//, $date );
        if ( $mon >= 1 && $mon <= 12 ) {
            # the day must also be at least 1, so that 00 is rejected
            if ( $day >= 1 && $day <= 31 ) {
                print " Valid date mon=$mon day=$day year=$year";
            } else {
                print " Illegal day specified day=$day";
            }
        } else {
            print " Illegal month specified mon=$mon";
        }
    } else {
        print 'Invalid date=', $date;
    }
    print end_html;
Working with Files • So far, programs cannot retain data values from one run to the next. • Working with files enables programs to store data, which can then be used at some future time. • We will describe ways to work with files in CGI/Perl programs, including • opening files, • closing files, • and reading from and writing to files
Using the open() Function • Use to connect a program to a physical file on a Web server. It has the following format: open( FILEHANDLE, filename ); • file handle - Starts with a letter or underscore, not with “$”, “@”, or “%”. By Perl convention, specify it in all capital letters. • filename - Name of the file to connect to. If the file resides in the same file system directory as the program, just specify the filename (and not the full file path).
More On open() function • open() returns a true value when it successfully opens the file and a false value when the attempt fails. • A common way to use open():
    $infile = "mydata.txt";
    open (INFILE, $infile ) || die "Cannot open $infile : $!";
• The first line connects to mydata.txt. die executes only when open() fails, and $! outputs the system error message.
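The failure path can be seen without stopping the program by testing the return value directly. This sketch uses the three-argument form of open() with a lexical file handle, a common modern alternative to the bareword handle shown on the slide; the filename is hypothetical and assumed not to exist.

```perl
use strict;
use warnings;

# Hypothetical filename, assumed not to exist, so the failure branch runs.
my $infile = "no_such_file_for_demo.txt";
my $message;

if ( open( my $fh, '<', $infile ) ) {
    $message = "Opened $infile";
    close($fh);
} else {
    # $! holds the operating system's reason for the failure.
    $message = "Cannot open $infile: $!";
}

print "$message\n";
```

The '<' argument states explicitly that the file is opened for reading, which keeps the mode separate from the filename.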
Sending System Messages to Browser • Many Web servers direct die messages to their Web server log files. • The end user might still receive a generic Internal Server Error message. • The CGI::Carp Perl module can redirect messages. • Add the following line at the beginning of your program (after the use CGI ':standard'; line): use CGI::Carp "fatalsToBrowser"; • It will forward the message to the browser
Using the File Handle to Read Files • Use the file handle to refer to the file once opened. • Combine the file handle with the file input operator (“<>”) to read a file into your program. • For example, the following opens a file and then outputs the first and third lines of the file. • The program uses mydata.txt containing:
    Apples are red
    Bananas are yellow
    Carrots are orange
    Dates are brown
Example Program
    $infile = "mydata.txt";
    open (INFILE, $infile ) || die "Cannot open $infile: $!";
    @infile = <INFILE>;
    print $infile[0];
    print $infile[2];
    close (INFILE);
• Then the output of this program would be
    Apples are red
    Carrots are orange
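The same read can be tried without preparing mydata.txt by hand. The sketch below first writes the sample data to a scratch file (using the core File::Temp module, an addition not used in the slides) and then reads it back into an array; the variable names are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file holding the sample data (a stand-in for mydata.txt).
my ( $out, $path ) = tempfile( UNLINK => 1 );
print {$out} "Apples are red\nBananas are yellow\n",
             "Carrots are orange\nDates are brown\n";
close($out) or die "Cannot close $path: $!";

# Read every line of the file into an array, one line per element.
open( my $in, '<', $path ) or die "Cannot open $path: $!";
my @lines = <$in>;
close($in);

# Print the first and third lines; each element keeps its trailing newline.
print $lines[0];
print $lines[2];
```

Reading the whole file into an array is convenient for small files; for large files, reading one line at a time in a while loop uses less memory.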