Perl
• Perl: Practical Extraction and Report Language
• for text files
• system management
• combines C, SED, AWK, SH
• interpreted
• dynamic
Data Structures
• scalars $num
• arrays @num
• associative arrays %num
• $num[50]
  • element at index 50 of the array num (the 51st element, since indexes start at 0)
• $#num
  • last index of num
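The sigils above can be tried out with a minimal sketch. The values are made up for illustration; only the name num comes from the slide.

```perl
use strict;
use warnings;

# Hypothetical data: three unrelated variables that happen to share the name "num".
my $num = 42;                     # scalar
my @num = (10, 20, 30);           # array
my %num = (one => 1, two => 2);   # associative array (hash)

my $first      = $num[0];         # one element of @num, so the sigil is $
my $last_index = $#num;           # index of the last element
my $count      = scalar @num;     # number of elements, always $#num + 1
my $two        = $num{two};       # one value of %num, looked up with braces

print "scalar=$num first=$first last_index=$last_index count=$count two=$two\n";
```

The scalar, the array and the hash live in separate namespaces, which is why all three can be called num without clashing.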
Examples
    #! /usr/local/bin/perl -w
    # find the sum of a list of numbers from STDIN
    # one number per line
    $sum = 0;
    while( <STDIN> ) {
        $sum += int $_;
    }
    print "the sum is $sum\n";
Examples
    #!/usr/bin/perl -w
    # find the sum of a list of numbers from STDIN
    # several numbers per line
    $sum = 0;
    while( <STDIN> ) {
        @nums = split;
        foreach (@nums) {
            $sum += int $_;
        }
    }
    print "the sum is $sum\n";
Average
    #!/usr/bin/perl -w
    # find the average of a list of
    # numbers from STDIN
    # several numbers per line
    $sum = 0;
    $count = 0;
    while( <STDIN> ) {
        @nums = split;
        foreach (@nums) {
            $sum += int $_;
            $count++;
        }
    }
    print "the average is ", $sum/$count, "\n";
median
    #!/usr/bin/perl -w
    # find the median of a list of numbers
    # from STDIN
    # several numbers per line
    @nums = ();
    while( <STDIN> ) {
        @nums = (@nums, split );
    }
    # sort numerically; a plain sort compares as strings and puts 10 before 9
    @nums = sort { $a <=> $b } @nums;
    if($#nums % 2) {
        # even number of elements: average the two middle ones
        $median = ($nums[($#nums - 1)/2] + $nums[($#nums + 1)/2])/2;
    } else {
        $median = $nums[$#nums/2];
    }
    print "the median is $median\n";
Output?
    #!/usr/bin/perl -w
    @stuff = ("one", "two", "three");
    print @stuff, "\n";
    $stuff = ("one", "two", "three");    # comma operator: the last value is assigned
    print $stuff, "\n";
    $stuff = @stuff;                     # array in scalar context: number of elements
    print $stuff, "\n";

    onetwothree
    three
    3
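Pattern Matching
    m//
    s///
Modifiers
• i case-insensitive
• m multiple lines (^ and $ also match at line boundaries inside the string)
• s single line (. also matches \n)
• x extended syntax (whitespace and comments in the pattern are ignored)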
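A small sketch of what each modifier changes. The test string is an assumption chosen for illustration.

```perl
use strict;
use warnings;

my $text = "Hello\nworld";   # hypothetical two-line string

my $i    = $text =~ /hello/i       ? 1 : 0;   # i: case is ignored
my $m    = $text =~ /^world$/m     ? 1 : 0;   # m: ^ and $ match at line boundaries
my $no_m = $text =~ /^world$/      ? 1 : 0;   # without m, ^ matches only at the string start
my $s    = $text =~ /Hello.world/s ? 1 : 0;   # s: . also matches "\n"
my $x    = $text =~ / H e l l o    # spaces and comments are ignored
                    /x ? 1 : 0;

print "i=$i m=$m no_m=$no_m s=$s x=$x\n";
```

Only the pattern without m fails here, because "world" does not start at the beginning of the string.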
Regular Expressions
    Code    Meaning
    \w      Alphanumeric Characters
    \W      Non-Alphanumeric Characters
    \s      White Space
    \S      Non-White Space
    \d      Digits
    \D      Non-Digits
    \b      Word Boundary
    \B      Non-Word Boundary
    \A ^    At the Beginning of a String
    \Z $    At the End of a String
    .       Match Any Single Character
Regular Expressions
    *                    Zero or More Occurrences
    ?                    Zero or One Occurrence
    +                    One or More Occurrences
    {N}                  Exactly N Occurrences
    {N,M}                Between N and M Occurrences
    .*<thingy>           Greedy Match, up to the last thingy
    .*?<thingy>          Non-Greedy Match, up to the first thingy
    [set_of_things]      Match Any Item in the Set
    [^set_of_things]     Match Any Item Not in the Set
    (some_expression)    Tag an Expression
    $1..$N               Tagged Expressions used in Substitutions
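The difference between the greedy and non-greedy forms, and the use of tagged expressions in a substitution, can be sketched as follows. The sample strings are assumptions made for illustration.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";   # hypothetical input

my ($greedy) = $html =~ /<(.*)>/;     # .* runs up to the last ">"
my ($lazy)   = $html =~ /<(.*?)>/;    # .*? stops at the first ">"
my ($set)    = $html =~ /<([^>]*)>/;  # a negated set gives the same result here

# $1 and $2 refer to the tagged expressions in the replacement part.
(my $swapped = "key=value") =~ s/(\w+)=(\w+)/$2=$1/;

print "greedy: $greedy\n";    # greedy: b>bold</b> and <i>italic</i
print "lazy: $lazy\n";        # lazy: b
print "set: $set\n";          # set: b
print "swapped: $swapped\n";  # swapped: value=key
```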
Rules
• Rule 1
  • The engine tries to match as far left as it can
• Rule 2
  • The regular expression is regarded as a set of alternatives. The engine tries them left to right. (see page 61)
• Rule 3
  • Items that have choices match from left to right /x*y*/
• Rule 4
  • Assertions
  • ^ $ \b \B \A \Z \G (?=…) (?!…)
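Rules 1 to 3 can be observed directly. The following is a sketch with made-up strings; only /x*y*/ is taken from the slide.

```perl
use strict;
use warnings;

# Rule 1: the leftmost match wins, even though a longer one starts later.
my ($leftmost) = "xfoo foobar" =~ /(foo(?:bar)?)/;
my $leftmost_at = $-[0];                     # offset where the match started

# Rule 2: alternatives are tried left to right; the first that succeeds is used.
my ($alt_a) = "foobar" =~ /(foo|foobar)/;    # "foo"
my ($alt_b) = "foobar" =~ /(foobar|foo)/;    # "foobar"

# Rules 1 and 3 with /x*y*/: at offset 0 both x* and y* can match zero
# characters, so the match succeeds there and is empty.
my ($empty) = "aaxxyy" =~ /(x*y*)/;
my $empty_at = $-[0];

print "leftmost='$leftmost' at $leftmost_at\n";
print "alt_a='$alt_a' alt_b='$alt_b'\n";
print "empty match of length ", length($empty), " at $empty_at\n";
```

The last case is a common surprise: a pattern made only of optional parts always succeeds at the first position it is tried.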
Rules
• Rule 5
  • A quantified atom matches only if the atom itself matches some number of times allowed by the quantifier
    Maximal   Minimal   Allowed count
    {n,m}     {n,m}?    Between n and m
    {n,}      {n,}?     At least n
    {n}       {n}?      Exactly n
    *         *?        0 or more
    +         +?        1 or more
    ?         ??        0 or 1
Rules
• Rule 6
  • Each atom matches according to its type
  • (…) ==> grouping + storage $1, $2
  • . matches any char except \n
  • […] matches any one character from the set
  • Special characters \a \n \r …
  • \1 \2 ... backreference to (…)
  • \033 octal char
  • \xf7 hex char
  • \cD control char
  • any other backslashed char matches the char itself
precedence (highest to lowest)
• () (?: )
• Repetition
• Sequence
• | alternation
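A short sketch of why the precedence order matters. The patterns and strings are assumptions chosen to show the effect.

```perl
use strict;
use warnings;

# Alternation binds loosest: /^cat|dog$/ means (^cat) or (dog$).
my $ungrouped = "catalog" =~ /^cat|dog$/     ? 1 : 0;   # matches: starts with "cat"
my $grouped   = "catalog" =~ /^(?:cat|dog)$/ ? 1 : 0;   # fails: not exactly cat or dog

# Repetition binds tighter than sequence: in /ab+/ the + applies to b only.
my ($rep_char)  = "ababab" =~ /^(ab+)/;       # "ab"
my ($rep_group) = "ababab" =~ /^((?:ab)+)/;   # "ababab"

print "ungrouped=$ungrouped grouped=$grouped\n";
print "rep_char=$rep_char rep_group=$rep_group\n";
```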
How do you fix it?
    /('[^']'*')/
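Examples
    s/^([^ ]+) +([^ ]+)/$2 $1/     # swap the first two words
    /(\w+)\s*=\s*\1/               # a word, "=", then the same word again
    /.{40,}/                       # at least 40 characters
    /^(\d+\.?\d*|\.\d+)$/          # a decimal number
    if (/Time: (..):(..):(..)/){
        $hours = $1;
        $minutes = $2;
        $seconds = $3;
    }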
Default arguments
• $_, @_, @ARGV, STDIN
    sub foo{
        my $x = shift;    # @_ default
    }
• in the main program @ARGV
    while(defined($_ = shift)) {    # defined(), so that an argument of "0" does not end the loop
        if(/^-(.*)/){
            process_option($1);
        } else {
            process_file($_);
        }
    }
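Reading a stream
    open FIN, "myfile" or die;
    # three alternatives; each one consumes the whole stream
    while (<FIN>){       # reads one line at a time
        # do something with $_
    }
    foreach (<FIN>){     # reads all lines into a list first
        # do something with $_
    }
    print sort <FIN>;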
Reading a stream
    # print a window
    @f = <FIN>;
    foreach ( 0..$#f ) {
        if ($f[$_] =~ /\bShazam\b/){
            $lo = ($_ > 0)? $_ -1 : $_;
            $hi = ($_ < $#f)? $_ +1 : $_;
            print map{"$_: $f[$_]"} $lo .. $hi;
        }
    }
Sorting
• sort numerically
    sub numerically { $a <=> $b }
    @list = sort numerically (16, 1, 8, 2, 4, 32);
or
    @list = sort { $a <=> $b } (16, 1, 8, 2, 4, 32);
    @list = sort{uc($a) cmp uc($b)} qw(this is a test);
    #reverse
    @list = sort { $b <=> $a } (16, 1, 8, 2, 4, 32);
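To see why the comparison block matters, here is a sketch that sorts the slide's list with and without it, plus a two-key sort. The word list and the choice of keys are assumptions made for illustration.

```perl
use strict;
use warnings;

my @nums = (16, 1, 8, 2, 4, 32);

my @as_strings = sort @nums;                 # default: string comparison
my @as_numbers = sort { $a <=> $b } @nums;   # numeric comparison

# Two keys: shorter words first, ties broken alphabetically.
my @by_length = sort { length($a) <=> length($b) or $a cmp $b }
                qw(pear fig apple kiwi);

print "@as_strings\n";   # 1 16 2 32 4 8
print "@as_numbers\n";   # 1 2 4 8 16 32
print "@by_length\n";    # fig kiwi pear apple
```

The `or` works because `<=>` returns 0 for equal keys, which is false, so the second comparison decides.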
example
    #! /usr/bin/perl -w
    # This script will count the frequency of distinct words
    # in the file that is given as an argument.
    # Warning: Error checking is minimal!
    die "usage: $0 file\n" unless @ARGV;
    while(<>){
        tr/A-Z/a-z/;                # translate to lowercase
        @w = split(/[\W]+/,$_);     # split into words
        foreach (@w){
            next if $_ eq '';       # skip the empty field left by a leading non-word char
            $list{$_}++;            # increment the counter
        }
    }
    foreach $key (sort {$list{$b} <=> $list{$a}} keys %list) {
        print $key, ' = ', $list{$key}, "\n";
    }
Tokenizing
    # tokenize an arithmetic expression
    while($_ ne ''){    # not while($_): a remaining "0" is false and would end the loop
        if(/^(\d+)/) {
            push @tok, 'num', $1;
        } elsif(/^([+\-\/*()])/) {
            push @tok, 'punct', $1;
        } elsif (/^([\d\D])/) {
            die "invalid char $1 in input";
        }
        $_ = substr($_, length $1);
    }
• substr slows things down
• cut start of string
Tokenizing 2
    while(/ (\d+) | ([+\-\/*()]) | ([\d\D])/gx) {
        if(defined $1){             # groups that took no part in the match are undef
            push @tok, 'num', $1;
        }elsif (defined $2) {
            push @tok, 'punct', $2;
        }else {
            die "invalid char $3 in input";
        }
    }
Tokenizing 3
    {
        if(/\G(\d+)/gc) {
            push @tok, 'num', $1;
        } elsif(/\G([+\-\/*()])/gc) {
            push @tok, 'punct', $1;
        } elsif (/\G([\d\D])/gc) {
            die "invalid char $1 in input";
        }else{
            last;
        }
        redo;
    }
Use split for clarity
    ($a, $b, $c) = /^(\S+)\s+(\S+)\s+(\S+)/;
    ($a, $b, $c) = split /\s+/, $_;
    ($a, $b, $c) = split;
Get the fifth field:
    ($a) = /[^:]*:[^:]*:[^:]*:[^:]*:([^:]*)/;
or
    ($a) = /(?:[^:]*:){4}([^:]*)/;
or
    ($a) = (split /:/)[4];
unpack
    ps l
      F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
    100  1216 30562 30561   7   0  2804 1768 rt_sig S    pts/2      0:00 -tcsh
    000  1216 30658 30562  10   0  2780 1080 -      R    pts/2      0:00 ps l

    chomp (@ps = `ps l`);
    shift @ps;                # drop the header line
    for(@ps){
        # @N jumps to column N, AN reads an N-character field
        ($uid, $pid, $sz, $tt) = unpack '@3 A6 @9 A7 @30 A5 @52 A7', $_;
        print "$uid, $pid, $sz, $tt\n";
    }
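Avoid regex for simple strings
    do_it() if $answer eq 'yes';          # exact comparison, no regex needed
    do_it() if $answer =~ /^yes$/;        # also accepts "yes\n"
    do_it() if $answer =~ /yes/;          # accepts "yes" anywhere in the string
    do_it() if lc($answer) eq 'yes';      # case-insensitive comparison
    do_it() if $answer =~ /^yes$/i;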
    #!/usr/bin/perl
    # remove the comments from a C program
    $filename = shift or die "usage: $0 filename\n";
    open FIN, $filename or die "can't open file";
    while (<FIN>){
        chomp;    # the newline is printed once per line at the end of the loop
        # split into string literals, comment delimiters and everything else
        for(split m!("(?:\\\W|.)*?"|/\*|\*/)!){
            if($in_comment){
                $in_comment = 0 if $_ eq "*/";
            } else {
                if ($_ eq "/*") {
                    $in_comment = 1;
                    print " ";
                } else {
                    print;
                }
            }
        }
        print "\n";
    }
References
    $a = 3.1416;
    $scalar_ref = \$a;
    $array_ref = \@a;
    $hash_ref = \%a;
    $array_el_ref = \$a[3];
    $hash_el_ref = \$a{'John'};
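Taking a reference is only half of the story. A minimal sketch of getting the data back out follows; the variable names and values ($pi, @primes, %age) are hypothetical and not from the slide.

```perl
use strict;
use warnings;

# Hypothetical data to point at.
my $pi     = 3.1416;
my @primes = (2, 3, 5, 7);
my %age    = (John => 42);

my $scalar_ref   = \$pi;
my $array_ref    = \@primes;
my $hash_ref     = \%age;
my $array_el_ref = \$primes[3];

my $value  = $$scalar_ref;          # dereference a scalar reference
my $second = $array_ref->[1];       # one element through an array reference
my $size   = scalar @$array_ref;    # the whole array, here in scalar context
my $john   = $hash_ref->{John};     # one value through a hash reference

$$array_el_ref = 11;                # writes through to $primes[3]

print "value=$value second=$second size=$size john=$john primes=@primes\n";
```

A reference is an alias to the original data, not a copy, so the assignment through $array_el_ref changes @primes itself.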
Lists of Lists
    @LoL = (
        ["fred", "barney" ],
        ["george", "jane", "elroy" ],
        ["homer", "marge", "bart" ],
    );
    print $LoL[2][2];    # prints "bart"

    $ref_to_LoL = [
        ["fred", "barney" ],
        ["george", "jane", "elroy" ],
        ["homer", "marge", "bart" ],
    ];
    print $ref_to_LoL->[2][2];
• Note: $LoL[2][2] implies $LoL[2]->[2]
Grow your own
    while(<>){
        @tmp = split;
        push @LoL, [ @tmp ];
    }
Hashes of Arrays
    %HoL = (
        flinstones => ["fred", "barney" ],
        jetsons => ["george", "jane", "elroy" ],
        simpsons => ["homer", "marge", "bart" ],
    );
• generation
    # reading from a file with format:
    # flinstones: fred barney ..
    while(<>){
        next unless s/^(.*?):\s*//;
        $HoL{$1} = [ split ];
    }
• or
    while($line = <>){
        ($who, $rest) = split /:\s*/, $line, 2;
        @fields = split ' ', $rest;
        $HoL{$who} = [ @fields ];
    }
Hashes of Arrays
    # calling a function
    for $group ("flinstones", "jetsons", "simpsons") {
        $HoL{$group} = [ get_family($group) ];
    }
    # append member to existing family
    push @{ $HoL{flinstones} }, "wilma", "betty";
• access
    $HoL{flinstones}[0] = "fred";
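Once the hash of arrays is built, walking it is the usual next step. This is a sketch using the slide's sample families; the output format and the @lines array are assumptions.

```perl
use strict;
use warnings;

my %HoL = (
    flinstones => [ "fred", "barney" ],
    jetsons    => [ "george", "jane", "elroy" ],
    simpsons   => [ "homer", "marge", "bart" ],
);

# Append to one family through its array reference.
push @{ $HoL{flinstones} }, "wilma";

# Visit the families in sorted key order and flatten each member list.
my @lines;
for my $family (sort keys %HoL) {
    my @members = @{ $HoL{$family} };
    push @lines, "${family}: @members";
}
print "$_\n" for @lines;

my $last_jetson = $HoL{jetsons}[-1];   # a negative index counts from the end
```

Sorting the keys makes the output order predictable, since a hash returns its keys in no particular order.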