Regular Expressions CISC/QCSE 810
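Recognizing Matching Strings
• ls *.exe
  • translates to "any set of characters, followed by the exact string '.exe'"
• The "*.exe" is a shell wildcard (glob) pattern, a simpler relative of a regular expression; a roughly equivalent Perl regexp is /^.*\.exe$/
• The shell compares the pattern against all file names in the directory, and ls then lists only those that match "*.exe"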
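The same filtering idea can be sketched in Perl with a regular expression instead of a shell wildcard. The file names below are hypothetical stand-ins for a directory listing; this is a minimal illustration, not how ls itself works.

```perl
use strict;
use warnings;

# Hypothetical file names, standing in for a directory listing.
my @files = ('setup.exe', 'notes.txt', 'run.exe.bak', 'tool.exe');

# \. matches a literal dot; $ anchors the match to the end of the name.
my @exe_files = grep { /\.exe$/ } @files;

print "$_\n" for @exe_files;
```

Note that 'run.exe.bak' is rejected because the pattern is anchored at the end of the name.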
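In Perl
• In Perl, you can test whether a string matches a pattern using the =~ operator
    $s = "Cat In the Hat";
    if ($s =~ /Cat/) {
        print "Matches Cat\n";     # printed: "Cat" occurs in $s
    }
    if ($s =~ /Chat/) {
        print "Matches Chat\n";    # not printed: "Chat" does not occur in $s
    }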
Exercise 1
• Write a regexp that matches only Canadian postal codes
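One possible starting point for Exercise 1, offered as a sketch rather than a full answer: it checks only the letter-digit-letter, digit-letter-digit shape, with an optional single space in the middle. The variable name $postal_re and the sample strings are illustrative assumptions. Real postal codes also exclude some letters (D, F, I, O, Q, U are never used, and W and Z do not appear in the first position), which this pattern does not enforce.

```perl
use strict;
use warnings;

# Shape check only: letter, digit, letter, optional space, digit, letter, digit.
my $postal_re = qr/^[A-Z]\d[A-Z] ?\d[A-Z]\d$/i;

# Sample strings for illustration; the last three should be rejected.
for my $code ('K7L 3N6', 'k7l3n6', '12345', 'K7L 3N', 'K7L  3N6') {
    my $verdict = $code =~ $postal_re ? 'match' : 'no match';
    print "$code: $verdict\n";
}
```

Tightening the character classes to the allowed letters is left as the remaining part of the exercise.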
Exercise 2
• Write a regexp that matches typical intermediate files (.o, .dvi, .tmp)
  • helpful if you want a systematic way to delete them
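A minimal sketch of one way to approach Exercise 2, assuming the files are identified purely by their extension. The file names are hypothetical, and the example only reports what it would delete.

```perl
use strict;
use warnings;

# Match names ending in .o, .dvi, or .tmp; (?:...) groups without capturing.
my $intermediate_re = qr/\.(?:o|dvi|tmp)$/;

# Hypothetical file names for illustration.
my @files = ('main.o', 'main.c', 'thesis.dvi', 'thesis.tex', 'scratch.tmp', 'demo.out');
my @to_delete = grep { $_ =~ $intermediate_re } @files;

print "Would delete: @to_delete\n";
# To actually remove the files, one could pass the checked list to unlink.
```

The end anchor matters here: without it, 'demo.out' would match because it contains ".o".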
String Substitution
• Found an input file (*.dat), looking for a matching output file (<same>.out)
    @input_files = <*.dat>;
    foreach $input_file (@input_files) {
        # Copy to output name
        $output_file = $input_file;
        # Replace the trailing .dat with .out
        # (\. is a literal dot, $ anchors the match at the end of the name)
        $output_file =~ s/\.dat$/.out/;
        if (! -f $output_file) {
            print "Need to create output for $output_file\n";
        }
    }
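In a pattern, an unescaped dot matches any character, so a substitution such as s/.dat/.out/ can fire in the wrong place. The sketch below compares the loose and the escaped, anchored forms on a hypothetical file name chosen to trigger the problem.

```perl
use strict;
use warnings;

my $name = 'xdata.dat';    # hypothetical file name

(my $loose    = $name) =~ s/.dat/.out/;     # . matches any character, first match wins
(my $anchored = $name) =~ s/\.dat$/.out/;   # literal dot, anchored at the end

print "loose:    $loose\n";      # .outa.dat
print "anchored: $anchored\n";   # xdata.out
```

The loose form replaces the leading "xdat", since "x" satisfies the dot.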
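Translating
• $s = "Alternate Ending";
• $s =~ tr/a-z/A-Z/;    # $s is now "ALTERNATE ENDING"
  • tr takes plain character lists, not regexp character classes, so no [ ] brackets are needed
• Can also use 'uc' and 'lc' (more generic for non-English languages)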
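A short sketch comparing tr with uc, and showing that tr also returns the number of characters it matched. The vowel count is an added illustration, not something from the slide.

```perl
use strict;
use warnings;

my $s = 'Alternate Ending';

(my $upper = $s) =~ tr/a-z/A-Z/;          # character-by-character mapping on a copy
my $same   = uc $s;                       # built-in equivalent
my $vowels = ($s =~ tr/aeiouAEIOU//);     # empty replacement list: count only, $s unchanged

print "$upper\n";             # ALTERNATE ENDING
print "$same\n";              # ALTERNATE ENDING
print "vowels: $vowels\n";    # vowels: 6
```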
Grabbing Substrings
• Get root URL
    $url = "http://www.mast.queensu.ca/~math224/Slides/Week_09/driven_spring2.m";
    # Parentheses capture the matched text into $1
    if ($url =~ /(www[\w.]*)/) {
        $short_url = $1;    # $1 is only meaningful after a successful match
        print "Full URL: $url\n";
        print "Site URL: $short_url\n";
    }
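Captures can also be collected directly in list context, which avoids reading $1 after a failed match. The sketch below is one way to do this; it assumes the URL has the usual scheme://host/path shape and does not rely on the host starting with "www".

```perl
use strict;
use warnings;

my $url = 'http://www.mast.queensu.ca/~math224/Slides/Week_09/driven_spring2.m';

# In list context a match returns its captures; the list is empty on failure.
# Capture 1: scheme, capture 2: host (everything up to the next slash).
my ($scheme, $host) = $url =~ m{^(\w+)://([^/]+)};

if (defined $host) {
    print "Scheme: $scheme\n";    # Scheme: http
    print "Host:   $host\n";      # Host:   www.mast.queensu.ca
}
```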
End options
• s/a/A/g: global; replace all matches
  • changes "aaaba" to "AAAbA"
• Compare with s/a/A/
  • changes "aaaba" to "Aaaba"
• /tmp/i: case insensitive
  • recognizes "tmp", "Tmp", "tMP", "TMP"…
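The two options can be checked with a few lines of Perl. This sketch reuses the strings from the slide, plus one extra non-matching string ('temp') added for contrast.

```perl
use strict;
use warnings;

(my $first = 'aaaba') =~ s/a/A/;     # no /g: first match only
(my $all   = 'aaaba') =~ s/a/A/g;    # /g: every match

print "$first $all\n";               # Aaaba AAAbA

# /i ignores case when matching.
my @hits = grep { /tmp/i } ('tmp', 'Tmp', 'tMP', 'TMP', 'temp');
print "@hits\n";                     # tmp Tmp tMP TMP
```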
Exercise
• Write a regexp line that returns all the integers in the text
• Can it be extended to handle floating-point values?
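One possible approach to this exercise, sketched under the assumption that numbers are written in plain decimal notation (no exponents, no thousands separators). The sample text is made up for illustration.

```perl
use strict;
use warnings;

my $text = 'Run 3 took 12.5 s, offset -7, load 0.25';   # hypothetical sample text

# With /g in list context, a match returns every occurrence.
my @integers = $text =~ /-?\d+/g;
# Floating point: optional sign, digits, optional fractional part.
my @numbers  = $text =~ /-?\d+(?:\.\d+)?/g;

print "integers: @integers\n";   # integers: 3 12 5 -7 0 25
print "numbers:  @numbers\n";    # numbers:  3 12.5 -7 0.25
```

The integer pattern splits "12.5" into 12 and 5, which is why the second pattern treats the fractional part as an optional piece of the same number.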
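Functions with Regex
• split
  • split /\s+/, $line;    # split on runs of whitespace
  • split /,/, $line;      # split on commas
  • split /\t/, $line;     # split on tabs
  • split //, $line;       # split into individual characters
• grep
  • @v = qw( aaa bba bbc);
  • @matches = grep /bb/, @v;    # @matches is ("bba", "bbc")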
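The calls above can be tried together in a short script. The input line is a hypothetical example; the grep data comes from the slide.

```perl
use strict;
use warnings;

my $line = "alpha  beta\tgamma,delta";    # hypothetical input line

my @words = split /\s+/, $line;      # ("alpha", "beta", "gamma,delta")
my @pair  = split /,/, $words[2];    # ("gamma", "delta")
my @chars = split //, 'abc';         # ("a", "b", "c")

my @v       = qw(aaa bba bbc);
my @matches = grep { /bb/ } @v;      # ("bba", "bbc")

print scalar(@words), " words, ", scalar(@chars), " chars, matches: @matches\n";
```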
Longer example: Log files
• Parsing log files
    195.5.23.103 - - [25/Mar/2003:02:22:11 -0800] "GET /gcs/new.gif HTTP/1.1" 200 926
    195.5.23.103 - - [25/Mar/2003:02:22:11 -0800] "GET /gcs/update.gif HTTP/1.1" 200 971
    proxy.skynet.be - - [25/Mar/2003:02:40:54 -0800] "GET /gcs/gc1hint.html HTTP/1.1" 200 16358
    j3194.inktomisearch.com - - [25/Mar/2003:03:13:12 -0800] "GET /~gcs/K-12.html HTTP/1.0" 200 3235
    kittyhawk.hhmi.org - - [25/Mar/2003:03:17:20 -0800] "HEAD /gcs/ HTTP/1.0" 200 0
    j3104.inktomisearch.com - - [25/Mar/2003:03:54:43 -0800] "GET /gcs/pa.html HTTP/1.0" 200 5614
    crawl11-public.alexa.com - - [25/Mar/2003:04:51:41 -0800] "GET /gcs/clinical.html HTTP/1.0" 200 20132
    …
    livebot-65-55-208-64.search.live.com - - [24/Jul/2007:22:16:58 -0700] "GET /gcs/webstats/usage_200602.html HTTP/1.0" 200 128720
    203.129.234.42 - - [24/Jul/2007:22:22:39 -0700] "GET /gcs/status/statuscheck.html HTTP/1.1" 200 1522624
    livebot-65-55-208-65.search.live.com - - [24/Jul/2007:22:47:32 -0700] "GET /gcs/webstats/usage_200610.html HTTP/1.0" 200 132580
    …
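A minimal sketch of how one such entry might be taken apart with a single regexp. The helper name parse_log_line and the choice of captured fields are assumptions made for illustration; the pattern is written only against the entries shown above and may need adjusting for other log formats.

```perl
use strict;
use warnings;

# Captures: host, timestamp, method, path, status code, bytes sent.
my $log_re = qr/^(\S+) \S+ \S+ \[([^\]]+)\] "(\w+) (\S+) [^"]*" (\d{3}) (\d+)$/;

# Hypothetical helper: returns the six captures, or an empty list on no match.
sub parse_log_line {
    my ($entry) = @_;
    return $entry =~ $log_re;
}

# One entry copied from the sample log.
my $entry = 'proxy.skynet.be - - [25/Mar/2003:02:40:54 -0800] "GET /gcs/gc1hint.html HTTP/1.1" 200 16358';

if (my ($host, $when, $method, $path, $status, $bytes) = parse_log_line($entry)) {
    print "$host requested $path ($method) at $when: status $status, $bytes bytes\n";
}
```

Looping over the lines of a log file and accumulating, for example, total bytes per host in a hash would follow the same pattern.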
Alternate uses
• If you write your own program with many print statements, you can
  • make the print statements meaningful
    • "Time spent on loading: 23.5s"
  • parse the output afterwards to process/store values
    • $line =~ m/: ([\d.]+)s/;
    • $time = $1;
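As a sketch of the parsing step, the script below pulls timings out of a few lines of program output and stores them by stage name. Only the first line's wording comes from the slide; the other lines, the stage names, and the %seconds hash are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical program output; only the first line's wording comes from the slide.
my @output = (
    'Time spent on loading: 23.5s',
    'Time spent on solving: 101.25s',
    'Done.',
);

my %seconds;
for my $line (@output) {
    # Capture the stage name and the number before the trailing "s".
    if ($line =~ /^Time spent on (\w+): ([\d.]+)s$/) {
        $seconds{$1} = $2;
    }
}

print "$_: $seconds{$_}\n" for sort keys %seconds;
```

Checking the match inside the if ensures that lines such as 'Done.' are skipped instead of reusing stale values of $1 and $2.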
Resources
• Any web search for "perl regular expression tutorial"
• Perl reg exp by example
  • http://www.somacon.com/p127.php
• Reference card
  • http://www.erudil.com/preqr.pdf
• Perl site reference
  • http://perldoc.perl.org/perlre.html