Programming in Unix • Regular Expressions • These expressions are used in grep, sed, awk, ed, vi and the various shells
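Regular Expressions • A regular expression is a pattern to be matched against a string • Perl's regular expression syntax is largely a superset of the syntax used by these tools • Most regular expressions used in Unix tools can be used in Perl, though some (such as the \( \) grouping of grep and sed) need minor changes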
Regular Expressions • A string such as abc becomes a regular expression when it is enclosed in slashes: $_ = "I know my abc's"; if (/abc/) { print $_; }
Regular Expressions Single character patterns - a character in the expression must match a single character in the string • The dot “.” matches any single character other than “\n” /r.g/ would match rug or rag
Regular Expressions • Metacharacters and escape sequences let you match particular conditions in a string • . \ | ( ) [ { ^ $ * + ? are all metacharacters • A backslash in front of any metacharacter makes it non-special • To match 5.18 use /5\.18/ • To match 01\20\03 use /01\\20\\03/
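The escaping rule above can be tried out with a short sketch. The sample strings and variable names are hypothetical; quotemeta and \Q...\E are standard Perl features that the slide does not mention, shown here only as a convenience.

```perl
use strict;
use warnings;

# Hypothetical sample strings based on the slide's two examples.
my $version = "perl 5.18 released";
my $date    = 'dated 01\20\03';      # single quotes keep the backslashes literal

# An unescaped dot matches any character, so /5.18/ also matches "5x18".
my $loose_hit  = ("5x18" =~ /5.18/)  ? 1 : 0;
my $strict_hit = ("5x18" =~ /5\.18/) ? 1 : 0;

# Escaped patterns match only the literal text.
my $version_hit = ($version =~ /5\.18/)   ? 1 : 0;
my $date_hit    = ($date =~ /01\\20\\03/) ? 1 : 0;

# quotemeta (or \Q...\E inside a pattern) adds the backslashes for you.
my $literal    = quotemeta("5.18");
my $quoted_hit = ($version =~ /\Q5.18\E/) ? 1 : 0;

print "$loose_hit $strict_hit $version_hit $date_hit $literal $quoted_hit\n";
# prints: 1 0 1 1 5\.18 1
```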
Regular Expressions Some escape sequences you might see
Regular Expressions • The pattern /m./ matches any two-character sequence that starts with m (the second character can be anything except "\n") • my and me are examples of matches
Regular Expressions • A character class is a list of possible characters enclosed in brackets [ ] • It matches any one character listed within the brackets • [a-z] matches any single lowercase letter (a hyphen between two characters gives a range) • A negated character class starts with a caret just inside the opening bracket: [^a-z] matches any single character not in the list
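A small sketch of character classes and negated classes in use. The word list is made up for illustration, and the patterns also use the + multiplier and the ^ and $ anchors so that whole words are tested.

```perl
use strict;
use warnings;

my @words = ("cat", "Cat", "c4t", "ct");   # hypothetical test words

# [a-z] matches one lowercase letter; [^a-z] matches one character that is not.
my @all_lower    = grep { /^[a-z]+$/ }    @words;   # only lowercase letters
my @has_nonlower = grep { /[^a-z]/ }      @words;   # at least one other character
my @vowel_middle = grep { /^c[aeiou]t$/ } @words;   # c, one vowel, t

print "@all_lower | @has_nonlower | @vowel_middle\n";
# prints: cat ct | Cat c4t | cat
```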
Regular Expressions • Grouping Patterns - one or more of…. • Sequence - e.g. abc means a followed by b followed by c • Multipliers • * means zero or more of the immediately previous character • + means one or more of the immediately previous character • ? means zero or one of the immediately previous character
Regular Expressions • General Multiplier • $_ = "fred xxxxxxxxxx barney"; • /x{5,10}/ #would look for 5 to 10 repetitions of the letter x • s/x{5,10}/and/; #would substitute and for the x's • The count goes in curly braces; square brackets would form a character class instead
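The following sketch contrasts the general multiplier with a character class, since the two are easy to mix up. Variable names are illustrative; the {n} and {n,} forms are standard Perl variants that the slide does not list.

```perl
use strict;
use warnings;

my $text = "fred " . ("x" x 10) . " barney";   # ten x's, as on the slide

# Curly braces are the multiplier: 5 to 10 x's in a row.
(my $braces = $text) =~ s/x{5,10}/and/;

# Square brackets are a character class: an x followed by one of 5 , 1 0.
my $class_on_text = ($text =~ /x[5,10]/) ? 1 : 0;
my $class_on_x5   = ("x5"  =~ /x[5,10]/) ? 1 : 0;

# Other forms: {n} means exactly n, {n,} means n or more.
my $exactly_three = ("xxx" =~ /^x{3}$/)  ? 1 : 0;
my $five_or_more  = ("xxx" =~ /^x{5,}$/) ? 1 : 0;

print "$braces\n";
print "$class_on_text $class_on_x5 $exactly_three $five_or_more\n";
# prints: fred and barney
#         0 1 1 0
```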
Regular Expressions • Parentheses group part of a pattern • (a) matches an a • ([a-z]) matches any single lowercase letter • Alternation • a|b|c matches exactly one of the alternatives • For single-character alternatives like these, /[abc]/ works the same way
Regular Expressions • Anchoring Patterns • Generally when a pattern is matched against a string it is evaluated from left to right, matching at the first opportunity • \b requires a word boundary at the indicated point • \B requires that there is not a word boundary • ^ matches the beginning of the string • $ matches the end of the string (or just before a trailing newline)
Regular Expressions • /fred\b/; #matches fred but not frederick • /\bmo/; #matches moe but not Elmo • /\bFred\b/; #matches Fred but not Freddy or AlFred • /\b\+\b/; #matches "x+y" but not "++" or " + "
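The word-boundary cases above can be checked directly. This is a minimal sketch with an illustrative helper name (hit); it records 1 for a match and 0 for no match.

```perl
use strict;
use warnings;

# Helper (hypothetical name): 1 if the string matches the pattern, else 0.
sub hit {
    my ($string, $pattern) = @_;
    return $string =~ $pattern ? 1 : 0;
}

my @fred  = (hit("fred", qr/fred\b/),   hit("frederick", qr/fred\b/));
my @mo    = (hit("moe",  qr/\bmo/),     hit("Elmo",      qr/\bmo/));
my @Fred  = (hit("Fred", qr/\bFred\b/), hit("Freddy",    qr/\bFred\b/),
             hit("AlFred", qr/\bFred\b/));

# \b needs a word character on exactly one side, so a plus sign has
# boundaries on both sides only when it sits between word characters.
my @plus  = (hit("x+y", qr/\b\+\b/), hit("++", qr/\b\+\b/),
             hit(" + ", qr/\b\+\b/));

print "@fred | @mo | @Fred | @plus\n";
# prints: 1 0 | 1 0 | 1 0 0 | 1 0 0
```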
Regular Expressions • Precedence (highest to lowest) • Parentheses ( ) • Quantifiers * + ? { } • Anchors and sequence ^ $ \b \B • Alternation |
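A short sketch of why the precedence order matters in practice. The test strings are made up; the point is that alternation binds most loosely and quantifiers bind more tightly than sequence.

```perl
use strict;
use warnings;

my $input = "fred flintstone";   # hypothetical test string

# Alternation binds loosest: this reads as (^fred) or (barney$).
my $unparenthesized = ($input =~ /^fred|barney$/)   ? 1 : 0;

# Parentheses bind tightest: the whole string must be fred or barney.
my $parenthesized   = ($input =~ /^(fred|barney)$/) ? 1 : 0;

# Quantifiers bind tighter than sequence: /ab*/ is a(b*), not (ab)*.
my $tail_repeat  = ("abbb" =~ /^ab*$/)   ? 1 : 0;
my $group_repeat = ("abab" =~ /^(ab)*$/) ? 1 : 0;
my $not_group    = ("abab" =~ /^ab*$/)   ? 1 : 0;

print "$unparenthesized $parenthesized $tail_repeat $group_repeat $not_group\n";
# prints: 1 0 1 1 0
```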
Regular Expressions • Matches with m// (the m is not needed when the delimiters are slashes) • A search using /pattern/ is actually a shortcut for m/pattern/ • You may choose any pair of delimiters to quote the contents • Where you used /fred/ you can use m(fred) or m,fred, or m<fred> or m!fred!
Regular Expressions • Different delimiter • To use a delimiter other than the slash (/), put the letter m in front of the new delimiter • e.g. m@/usr/etc@ matches /usr/etc without having to escape the slashes
Regular Expressions • The binding operator =~ selects a different target: it tells Perl to match the pattern on the right against the string on the left (instead of matching $_) • Ignoring case with /i • [yY] matches either upper or lower case y • /^procedure/i #matches procedure at the start of the string in any mix of upper and lower case
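The delimiter, binding-operator, and /i points can be combined in one sketch. The path and sample strings are hypothetical.

```perl
use strict;
use warnings;

my $path = "/usr/etc/passwd";   # hypothetical path

# With slash delimiters every slash in the pattern must be escaped.
my $escaped  = ($path =~ /\/usr\/etc/) ? 1 : 0;
# With m and another delimiter the slashes can stay as they are.
my $at_delim = ($path =~ m@/usr/etc@)  ? 1 : 0;

# Without =~ the match is against $_; with =~ it is against the left operand.
$_ = "nothing of interest";
my $default_target = (/procedure/i)                      ? 1 : 0;
my $bound_target   = ("PROCEDURE main" =~ /^procedure/i) ? 1 : 0;
my $case_sensitive = ("PROCEDURE main" =~ /^procedure/)  ? 1 : 0;

print "$escaped $at_delim $default_target $bound_target $case_sensitive\n";
# prints: 1 1 0 1 0
```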
Regular Expressions • Case shifting • $_ = "I saw Barney with Fred."; • s/(fred|barney)/\U$1/gi; #Now $_ is "I saw BARNEY with FRED."
Regular Expressions • The split operator breaks up a string according to a separator. This is useful for tab-separated or colon-separated data • @fields = split /:/, "abc:def:g:h"; gives you ("abc", "def", "g", "h") • @fields = split /:/, "abc:def::g:h"; gives you ("abc", "def", "", "g", "h")
Regular Expressions • It is common to split on whitespace using /\s+/ as the pattern • Any run of whitespace then counts as a single separator • $input = "This is a \t test.\n"; split /\s+/, $input; gives you ("This", "is", "a", "test.")
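A sketch of how split treats empty fields, which is where it most often surprises. The input strings are made up; the single-space form split ' ' is a standard Perl special case that the slides do not cover.

```perl
use strict;
use warnings;

my @fields   = split /:/,   "abc:def::g:h";         # empty middle field is kept
my @trailing = split /:/,   "abc:def:::";           # trailing empty fields are dropped
my @words    = split /\s+/, "This  is a \t test.\n";
my @leading  = split /\s+/, "  indented line";      # leading empty field appears
my @awk_mode = split ' ',   "  indented line";      # a single-space string skips it

print scalar(@fields), " ", scalar(@trailing), " ", scalar(@words), " ",
      scalar(@leading), " ", scalar(@awk_mode), "\n";
# prints: 5 2 4 3 2
```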
Regular Expressions • Substitutions • $_ = "foot fool buffoon"; • s/foo/bar/; # $_ is now "bart fool buffoon" • s/// makes just one replacement • s/foo/bar/g; # $_ is now "bart barl bufbarn" • /g replaces globally, on all possible matches
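The substitution slides can be condensed into one runnable sketch. It works on copies so the original string survives, and it also shows that s///g returns the number of replacements made; variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "foot fool buffoon";

# Copy first, then substitute on the copy, so $text stays unchanged.
(my $once = $text) =~ s/foo/bar/;     # first match only
(my $all  = $text) =~ s/foo/bar/g;    # every match

# In scalar context s/// returns how many replacements were made.
my $copy  = $text;
my $count = ($copy =~ s/foo/bar/g);

# Case shifting from the earlier slide: \U upper-cases what follows it.
my $line = "I saw Barney with Fred.";
$line =~ s/(fred|barney)/\U$1/gi;

print "$once\n$all\n$count\n$line\n";
# prints: bart fool buffoon
#         bart barl bufbarn
#         3
#         I saw BARNEY with FRED.
```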
Regular Expressions • The join function takes a list of values and glues them together with a separator string. It performs the opposite of split. • For example $info = join("\n", "Name", "Address", "Zip Code"); print $info; displays Name, Address and Zip Code, each on its own line
Regular Expressions • Or take a list • @values = (2, 4, 6, 8, 10); • $new_value = join "-", @values; # $new_value looks like "2-4-6-8-10" • $new_value = join ":", @values; # $new_value looks like "2:4:6:8:10" • $new_value = join "-", "cat", @values; # $new_value looks like "cat-2-4-6-8-10"
Filehandles and File Tests • What is a filehandle? • An I/O connection between your Perl process and the outside world • Filehandles are named like labeled blocks • The names are easy to confuse with present or future reserved words, so the recommendation is to use all UPPERCASE letters in filehandle names
Filehandles and File Tests • The syntax is: open (FILEHANDLE, "somename"); • FILEHANDLE is the new filehandle and somename is the external filename (such as a file or device) • To open a file for writing, use the same open statement but prefix the filename with a greater-than sign (caution: this will overwrite any existing file with the same name) • open (OUT, ">outfile");
Filehandles and File Tests • Syntax continued: • To open a file to append data to it: open (LOGFILE, ">>mylogfile"); • All forms of open return true for success and false for failure • When finished with a filehandle you close it: close(LOGFILE); • Reopening a filehandle closes the previously opened file automatically
Filehandles and File Tests • When a filehandle does not open successfully you can use the die function to report that an error has occurred • An unless statement can be written as a logical or: unless (this) { that; } is equivalent to this || that; • First, an unless statement that only prints a message: unless (open (DATAPLACE, ">/tmp/dataplace")) { print "Sorry, I couldn't create your file"; } else { #the rest of your program }
Filehandles and File Tests • Or make it even simpler with die: unless (open DATAPLACE, ">/tmp/dataplace") { die "Sorry, I couldn't create your file"; } • or, as a logical or: open (DATAPLACE, ">/tmp/dataplace") || die "Sorry, I couldn't create your file";
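The open, append, close, and die pieces fit together as in the sketch below. It keeps the slides' bareword filehandle style; the scratch directory comes from the core File::Temp module so nothing outside it is touched, and the file name and messages are hypothetical. Adding $! to the die message, which the slides do not show, reports the system's reason for the failure.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);     # hypothetical scratch directory
my $file = "$dir/dataplace";

# ">" creates or overwrites the file.
open(DATAPLACE, ">$file") || die "Sorry, I couldn't create $file: $!";
print DATAPLACE "first line\n";
close(DATAPLACE);

# ">>" appends to the existing contents.
open(DATAPLACE, ">>$file") || die "Sorry, I couldn't append to $file: $!";
print DATAPLACE "second line\n";
close(DATAPLACE);

# No prefix opens the file for reading.
open(DATAPLACE, $file) || die "Sorry, I couldn't read $file: $!";
my @lines = <DATAPLACE>;
close(DATAPLACE);

# open returns false on failure, which is what the || die idiom relies on.
my $opened_missing = open(DATAPLACE, "$dir/no_such_file") ? 1 : 0;

print scalar(@lines), " lines, missing file opened: $opened_missing\n";
# prints: 2 lines, missing file opened: 0
```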
Filehandles and File Tests • The -x File Tests • Suppose you want to make sure that there isn't already a file by that name (so you don't blow away valuable data) before you open and write to a file • Use file tests (see page 157-8) • -e tests whether a file or directory exists
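A minimal sketch of guarding a write with the -e test. The helper name create_unless_exists and the file name are hypothetical; -d and -s are further standard file tests, included only to show the family resemblance.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);   # hypothetical scratch directory
my $path = "$dir/outfile";

# Create the file only when nothing by that name exists yet.
sub create_unless_exists {
    my ($name) = @_;
    return 0 if -e $name;                       # refuse to overwrite
    open(OUT, ">$name") || die "Cannot create $name: $!";
    print OUT "data\n";
    close(OUT);
    return 1;
}

my $first_try  = create_unless_exists($path);   # file is created
my $second_try = create_unless_exists($path);   # file already exists
my $is_dir     = (-d $dir) ? 1 : 0;             # -d tests for a directory
my $size       = -s $path;                      # -s gives the size in bytes

print "$first_try $second_try $is_dir\n";
# prints: 1 0 1
```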
Formats • Helps you generate simple, formatted reports and charts • Keeps track of number of lines per page, current page • Use “format” to declare and “write” to execute
Declaring a Format • format MYNAME = FORMLIST . (the closing dot must be on a line by itself) • Note: if MYNAME is omitted, the format is named STDOUT • FORMLIST is a list of lines, each of which is one of the following: • A comment (start the line with #) • A "picture" line giving the layout of one output line • An argument line supplying the values to plug into the previous "picture" line
Special Values • FORMAT_NAME_TOP defines text that will appear at the top of each page • FORMAT_NAME defines the format and variables for each line that should print as the body of the report • You should define the format and its _TOP format together somewhere in your program (often seen at the end)
Example
# a report on the /etc/passwd file
format MY_REPORT_TOP =
Password File Report
Name Login Uid Gid Shell Home
-------------------------------------------------------------------
.
Example
# how to send output to the screen
format STDOUT =
Password File Report
Name Login Uid Gid Shell Home
-------------------------------------------------------------------
.
write;   # STDOUT is already open, so no open call is needed
Example (cont...)
format MY_REPORT =
@<<<<< @||||||| @<<<< @>>>> @>>>> @<<<<<<<<<<<<
$name, $login, $uid, $gid, $shell, $home
.
Then to print this when you want (write takes a filehandle, so MY_REPORT must also be the name of an open filehandle):
write MY_REPORT;
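The pieces of a report (a _TOP format, a body format, and write on a filehandle of the same name) can be seen working together in the sketch below. The REPORT filehandle, the three-column layout, and the sample rows are all hypothetical and much smaller than the password report above; the output goes to a scratch file created with the core File::Temp module and is then read back.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

our ($name, $login, $uid);   # package variables the formats read

# Printed once at the top of each page.
format REPORT_TOP =
Name       Login      Uid
-------------------------
.

# Printed on every write: left-justified, centered, right-justified.
format REPORT =
@<<<<<<<<< @||||||||| @>>
$name,     $login,    $uid
.

my $dir  = tempdir(CLEANUP => 1);          # hypothetical scratch directory
my $path = "$dir/report.txt";

open(REPORT, ">$path") || die "Cannot create $path: $!";
for my $row (["Fred Flintstone", "fred", 101], ["Barney", "barney", 7]) {
    ($name, $login, $uid) = @$row;
    write REPORT;                          # uses formats REPORT and REPORT_TOP
}
close(REPORT);

# Read the report back to show what was written.
open(REPORT, $path) || die "Cannot read $path: $!";
my @report_lines = <REPORT>;
close(REPORT);
print @report_lines;
```

The first name is longer than its ten-character field, so it is cut off in the output, which is the truncation behaviour that picture fields have by default.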
Example of Code
#!/usr/local/bin/perl -w
print "This is an address label program\n";
print "Enter your name: \n";
$name=<>;
print "Enter your street address: \n";
$street=<>;
print "Enter your City, State, and Zip: \n";
$therest=<>;
open (AddressLabel,">myaddrlist") || die "Cannot create myaddrlist: $!";
write (AddressLabel);
close (AddressLabel);
format AddressLabel =
==================================
| @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< |
$name
| @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< |
$street
| @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< |
$therest
==================================
.
Example Entering Data
# addrlabel.pl
This is an address label program
Enter your name:
Mike
Enter your street address:
14590 Roller Coaster Rd
Enter your City, State, and Zip:
Denver, CO 80931
Example Output to File
# cat myaddrlist
==================================
| Mike                           |
| 14590 Roller Coaster Rd        |
| Denver, CO 80931               |
==================================
Format Pictures • @ or ^ indicates substitution at run-time • < left justify • > right justify • | centering • If the variable has more characters than the format picture, it will be truncated • To avoid truncating use “@*” on a format line by itself.
The ^ Picture • Starting a field with ^ prints only as much of the text as fits in the field on the first use • The next time the same variable is referenced, it contains only the part of the string that has not yet been printed; the next field's worth is printed, and so on • Warning: this destroys the original value of the variable, so save a copy if you will need it again
Example of the ^
# a report from a bug report form
format BUG_REPORT =
Subject: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$subject
From: @<<<<<<<<<<<<<< Priority: @<<<<<<<<<<
$from, $priority
Description: ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$description
^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$description
^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<…
$description
.
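The way a ^ field consumes its variable is easiest to see by running a small case. This sketch uses a hypothetical two-line description format on STDOUT and keeps a copy of the text first, as the warning on the previous slide recommends.

```perl
use strict;
use warnings;

our $description = "The parser crashes when the input file is empty";
my  $original    = $description;   # keep a copy; ^ fields consume the variable

# Each ^ field prints as many whole words as fit in 20 columns
# and removes them from $description.
format STDOUT =
Description: ^<<<<<<<<<<<<<<<<<<<
             $description
             ^<<<<<<<<<<<<<<<<<<<
             $description
.

write;

# Whatever did not fit on the two lines is still in the variable.
print "left over: '$description'\n";
print "original:  '$original'\n";
```

Because the text is longer than the two 20-column fields, part of it remains in $description after the write, while $original still holds the full string.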
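Special Variables • $~ contains the name of the current format (by default, the name of the filehandle) • $^ contains the name of the current top-of-page format (by default, the filehandle name with _TOP appended) • $% contains the current output page number • $= contains the number of lines per page (default 60) • $- contains the number of lines remaining on the current page (set it to zero to force a new page)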
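To Use Special Variables • These variables belong to the currently selected filehandle, so select the filehandle first, set them, then restore the old selection: $myform = select(MYFORMAT); $~ = "My_Other_Format"; $^ = "My_Top_Format"; select($myform); • Here MYFORMAT is the filehandle being configured, and $myform holds the previously selected filehandle that select returned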
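A sketch of switching formats on one filehandle through $~. The format names SHORT_FORM and LONG_FORM and the sample value are hypothetical; STDOUT is used as the filehandle so the result is visible immediately.

```perl
use strict;
use warnings;

our $item = "widget";   # hypothetical value printed by both formats

format SHORT_FORM =
item: @<<<<<<<<<
$item
.

format LONG_FORM =
item (long form): @<<<<<<<<<<<<<<<<<<<
$item
.

# select returns the previously selected filehandle so it can be restored.
my $previous = select(STDOUT);
$~ = "SHORT_FORM";        # write on STDOUT now uses SHORT_FORM
select($previous);
write;

# Switch the same filehandle to the other format and write again.
$previous = select(STDOUT);
$~ = "LONG_FORM";
select($previous);
write;

my $format_in_use = $~;   # STDOUT is the selected handle here
print "current format: $format_in_use\n";
```

Selecting the handle, setting the variable, and restoring the old selection keeps the change local to that one filehandle, which matters once a program writes formatted output to more than one file.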