CSE4251 The Unix Programming Environment
Lecture 5: Regular Expressions; grep
Why Regular Expressions
• Regular expressions are used to describe text patterns/filters
• Unix commands/utilities that support regular expressions:
  • grep (fgrep, egrep) - search a file for a string or regular expression
  • sed - stream editor
  • awk (nawk) - pattern scanning and processing language
• There are some minor differences between the regular expressions supported by these programs
• We will cover the general matching operators first.
Character Class
• […] matches any one of the enclosed chars
• [abc] matches a single a, b, or c
• [a-z] matches any one of abcdef…xyz
• [^…] matches any single character not included
• [^A-Za-z] matches a single character as long as it is not a letter.
• Example: [Dd][Aa][Vv][Ee]
  • Matches "Dave" or "dave" or "dAVE"
  • Does not match "ave" or "da"
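The character-class rules above can be tried from the shell. This is a minimal sketch, assuming a POSIX grep is available; the input strings are fed in with printf, and the variable names (matches, nonletter) are illustrative only.

```shell
# Candidate strings, one per line; grep keeps the lines the class pattern matches.
matches=$(printf '%s\n' Dave dave dAVE ave da | grep '[Dd][Aa][Vv][Ee]')
printf '%s\n' "$matches"

# Negated class: count the lines that contain at least one non-letter.
nonletter=$(printf '%s\n' abc a1c 'x y' | grep -c '[^A-Za-z]')
printf '%s\n' "$nonletter"
```

The first command keeps Dave, dave and dAVE; the second counts the two lines that contain a digit or a space.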
Regular Expression Operators
• Any character (except a metacharacter!) matches itself.
• . Matches any single character except newline.
• * Matches 0 or more of the immediately preceding R.E.
• ? Matches 0 or 1 instances of the immediately preceding R.E.
• + Matches 1 or more instances of the immediately preceding R.E.
• ^ Matches the following R.E. at the beginning of the line
• $ Matches the preceding R.E. at the end of the line
• | Matches the R.E. specified before or after this symbol
• \ Turns off the special meaning of the following character
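A few of these operators in action, as a rough sketch. It assumes a POSIX grep; because ?, + and | belong to the extended syntax, those commands use grep -E. The word list and variable names are made up for illustration.

```shell
words='color
colour
colouur
cat
dog
a.b
axb'

# ? makes the preceding "u" optional.
optional=$(printf '%s\n' "$words" | grep -E '^colou?r$')
# + requires at least one "u".
one_or_more=$(printf '%s\n' "$words" | grep -E '^colou+r$')
# | chooses between the two alternatives inside the parentheses.
either=$(printf '%s\n' "$words" | grep -E '^(cat|dog)$')
# \ turns off the special meaning of ".", so only a literal dot matches.
literal_dot=$(printf '%s\n' "$words" | grep 'a\.b')

printf '%s\n' "$optional" "$one_or_more" "$either" "$literal_dot"
```

Without the backslash, the pattern a.b would also select axb, since an unescaped dot matches any single character.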
R.E. patterns
• {n} The preceding item is matched exactly n times.
• {n,} The preceding item is matched n or more times.
• {n,m} The preceding item is matched at least n times, but not more than m times.
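The three interval forms can be compared on a short list of numbers. A minimal sketch, assuming grep -E (in the basic syntax the braces would be written \{n,m\}); the sample numbers and variable names are illustrative.

```shell
numbers='7
42
123
4567'

# {4}: exactly four digits on the line.
exactly_four=$(printf '%s\n' "$numbers" | grep -E '^[0-9]{4}$')
# {3,}: three or more digits.
three_or_more=$(printf '%s\n' "$numbers" | grep -E '^[0-9]{3,}$')
# {2,3}: at least two but not more than three digits.
two_to_three=$(printf '%s\n' "$numbers" | grep -E '^[0-9]{2,3}$')

printf '%s\n' "$exactly_four" "$three_or_more" "$two_to_three"
```

The ^ and $ anchors matter here: without them, {2,3} would also select 4567, because any two adjacent digits inside it satisfy the pattern.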
R.E. patterns
• If you put a subpattern inside parens you can apply + * and ? to the entire subpattern.
• a(bc)*d matches "ad" and "abcbcd"
• a(bc)*d does not match "abcxd" or "bcbcd"
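The grouping example can be verified directly. This sketch assumes grep -E, where plain parentheses group; the variable name grouped is illustrative.

```shell
# Only the strings where "a" is followed by zero or more "bc" and then "d" survive.
grouped=$(printf '%s\n' ad abcbcd abcxd bcbcd | grep -E 'a(bc)*d')
printf '%s\n' "$grouped"
```

Only ad and abcbcd are printed, in line with the slide.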
R.E. patterns
• \< beginning of word anchor
  • \<abc matches "abcd" but not "dabc"
• \> end of word anchor
  • abc\> matches "dabc" but not "abcd"
• \(…\) stores the text matched by the pattern inside \( and \)
  • \(abc\)def matches "abcdef" and stores abc in \1
  • So \(abc\)def\1 matches "abcdefabc"
  • Can store up to 9 matches
  • \(ab\)c\(de\)f\1\2 will match abcdefabde
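A short check of the word anchors and back-references. This is a sketch under the assumption that the installed grep accepts \< and \> (they are extensions found in GNU grep and most BSD greps, not part of POSIX); the test strings come from the slide plus one hypothetical non-matching string per case.

```shell
# \< anchors the match at the start of a word.
word_start=$(printf '%s\n' abcd dabc | grep '\<abc')
# \> anchors the match at the end of a word.
word_end=$(printf '%s\n' abcd dabc | grep 'abc\>')
# \1 must repeat exactly what the first \( \) group matched.
one_ref=$(printf '%s\n' abcdefabc abcdefxyz | grep '\(abc\)def\1')
# \1 and \2 refer to the first and second groups, in order.
two_refs=$(printf '%s\n' abcdefabde abcdefdeab | grep '\(ab\)c\(de\)f\1\2')

printf '%s\n' "$word_start" "$word_end" "$one_ref" "$two_refs"
```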
Examples of R.E.
• x[abc]?x matches "xax" or "xx"
• [abc]* matches "aaaaa" or "acbca"
• 0*10 matches "010" or "0000010" or "10"
• ^(dog)$ matches lines that consist of exactly "dog"
• [\t ]*
• (A|a)+b*c?
Example
Sample text:
• Christian Scott lives here and will put on a Christmas party
• There are around 30 to 35 people invited.
• They are:
• Tom
• Dan
• Rhonda Savage
• Nicky and Kimberly.
• Steve, Suzanne, Ginger and Larry
Patterns to try:
  ^[A-Z]..$
  ^[A-Z][a-z]*3[0-5]
  [a-z]*\.
  ^ *[A-Z][a-z][a-z]$
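One way to work through this exercise is to save the sample text to a file and run each pattern against it. The sketch below assumes that every sample line starts in column 1 with no leading or trailing spaces (the original slide layout is not recoverable), and uses a temporary file created with mktemp; the variable names are illustrative.

```shell
sample=$(mktemp)
cat > "$sample" <<'EOF'
Christian Scott lives here and will put on a Christmas party
There are around 30 to 35 people invited.
They are:
Tom
Dan
Rhonda Savage
Nicky and Kimberly.
Steve, Suzanne, Ginger and Larry
EOF

# Exactly three characters, the first one an uppercase letter.
three_chars=$(grep '^[A-Z]..$' "$sample")
# Uppercase letter, lowercase letters, then 30-35 with nothing in between;
# the space after "There" blocks the match, so the count is 0.
no_match=$(grep -c '^[A-Z][a-z]*3[0-5]' "$sample" || true)
# Any line containing a literal period.
with_period=$(grep '[a-z]*\.' "$sample")
# Optional leading spaces, then one uppercase and two lowercase letters.
short_names=$(grep '^ *[A-Z][a-z][a-z]$' "$sample")

printf '%s\n' "$three_chars" "$no_match" "$with_period" "$short_names"
rm -f "$sample"
```

Under these assumptions the first and last patterns both select Tom and Dan, the second selects nothing, and the third selects the two lines that end in a period.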
Review: Metacharacters for filename abbreviation (lecture 2)
• * Matches anything: ls Test*.doc
• ? Matches any single character: ls Test?.doc
• [abc…] Matches any of the enclosed characters: ls T[eE][sS][tT].doc
• [a-z] Matches any character in a range: ls [a-zA-Z]*
• [!abc…] Matches any character except those listed: ls [!0-9]*
Difference
• Although there are similarities to the metacharacters used in filename expansion, we are talking about something different.
• Filename expansion is done by the shell.
• Regular expressions are used by commands (programs).
• Because of this overlap, be careful when specifying an R.E. on the command line
• Good idea to always quote an R.E. that contains special chars ('' or "") on the command line
• Example: $ grep '[a-z]*' somefile.txt
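The effect of quoting can be made visible by letting echo stand in for grep, so that we see the arguments a command would actually receive. This is only an illustration: the temporary directory and the file names a1 and a2 are hypothetical.

```shell
workdir=$(mktemp -d)
touch "$workdir/a1" "$workdir/a2"

# Unquoted: the shell expands a* to the matching file names before the command runs.
unquoted=$(cd "$workdir" && echo a*)
# Quoted: the pattern reaches the command unchanged.
quoted=$(cd "$workdir" && echo 'a*')

printf '%s\n' "$unquoted" "$quoted"
rm -r "$workdir"
```

With an unquoted pattern, grep would have been handed a1 as its pattern and a2 as a file to search, which is rarely what was intended.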
grep - search for a string
• grep [-bchilnsvw] PATTERN [filename...]
• Read files or standard/redirected input
• Search for specified pattern in each line
• Send results to the standard output
• Examples:
  $ grep '^X11' *    - search all files for lines starting with the string "X11"
  $ grep -v text file    - print lines that do not match "text"
• Exit status: 0 - pattern found; 1 - not found; greater than 1 - an error occurred.
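The exit status is what makes grep useful in shell conditionals. A minimal sketch, assuming a POSIX grep (the -q option suppresses output so only the status is used); the file contents and variable names are invented for the example.

```shell
f=$(mktemp)
printf '%s\n' 'X11 forwarding on' 'no display' > "$f"

# Status 0 (pattern found) makes the "then" branch run.
if grep -q '^X11' "$f"; then found=yes; else found=no; fi

# Capture the status of a search that finds nothing.
status=0
grep -q 'Wayland' "$f" || status=$?

# -v inverts the selection: keep lines that do not match.
kept=$(grep -v 'display' "$f")

printf '%s\n' "$found" "$status" "$kept"
rm -f "$f"
```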
Regular Expressions for grep
c       any non-special character c matches itself
\c      turn off any special meaning of character c
^       beginning of line
$       end of line
.       any single character
[...]   any of characters in range ...
[^...]  any single character not in range ...
r*      zero or more occurrences of r
grep - options
• Some useful options
  -c count number of matching lines
  -i ignore case
  -l list only the files with matching lines
  -L list files that do not match
  -v display lines that do not match
  -n print line numbers
  -r recursively search the sub-directories
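Several of these options can be tried on two small throwaway files. This is a sketch with hypothetical file names and contents; -c, -i, -l and -n are in POSIX, while -L is assumed to be available as it is in GNU and BSD grep.

```shell
dir=$(mktemp -d)
printf '%s\n' 'north NO' 'Northwest NW' 'south SO' > "$dir/a.txt"
printf '%s\n' 'east EA' > "$dir/b.txt"

# -c with -i: count matching lines, ignoring case.
count=$(grep -c -i 'north' "$dir/a.txt")
# -n: prefix each matching line with its line number.
numbered=$(grep -n 'south' "$dir/a.txt")
# -l: names of files that contain a match.
with_match=$(cd "$dir" && grep -l 'north' a.txt b.txt)
# -L: names of files that contain no match.
without_match=$(cd "$dir" && grep -L 'north' a.txt b.txt || true)

printf '%s\n' "$count" "$numbered" "$with_match" "$without_match"
rm -r "$dir"
```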
grep advanced options
• -F fixed string, don't interpret the pattern as an R.E.
• -m NUM stop reading a file after NUM matching lines
• --exclude=GLOB skip files whose names match GLOB
• --include=GLOB search only files whose names match GLOB
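A rough illustration of -F, -m and --include follows. It assumes GNU grep (BSD grep also accepts these options); the files and their contents are hypothetical.

```shell
dir=$(mktemp -d)
printf '%s\n' 'price 5.1' 'price 501' 'price 5.1 again' > "$dir/prices.txt"
printf '%s\n' 'price 5.1' > "$dir/prices.log"

# As an R.E., "." matches any character, so "501" is counted too.
as_regex=$(grep -c '5.1' "$dir/prices.txt")
# With -F the pattern is a fixed string and only a literal "5.1" counts.
as_fixed=$(grep -c -F '5.1' "$dir/prices.txt")
# -m 1 stops after the first matching line.
first_only=$(grep -m 1 -F '5.1' "$dir/prices.txt")
# --include restricts a recursive search to file names matching the glob.
only_txt=$(grep -r -l --include='*.txt' 'price' "$dir")

printf '%s\n' "$as_regex" "$as_fixed" "$first_only" "$only_txt"
```

Note that the glob given to --include is quoted for the same reason regular expressions are: otherwise the shell might expand it first.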
gamefile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Heme 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Webber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
demos
$ grep NW gamefile
northwest NW Charles Main 3.0 .98 3 34
$ grep '^n' gamefile
northwest NW Charles Main 3.0 .98 3 34
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Webber 4.5 .89 5 9
$ grep '4$' gamefile
northwest NW Charles Main 3.0 .98 3 34
$ grep TB Savage gamefile
grep: Savage: No such file or directory
gamefile:eastern EA TB Savage 4.4 .84 5 20
$ grep -l 'SE' *
gamefile
demos cont.
$ grep '5\..' gamefile
western WE Sharon Gray 5.3 .97 5 23
southern SO Suan Chin 5.1 .95 4 15
northeast NE AM Main Jr. 5.1 .94 3 13
central CT Ann Stephens 5.7 .94 5 13
$ grep '\(3\)\.[0-9].*\1 *\1' gamefile
northwest NW Charles Main 3.0 .98 3 34
$ grep '\<north' gamefile
northwest NW Charles Main 3.0 .98 3 34
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Webber 4.5 .89 5 9
$ grep '\<north\>' gamefile
north NO Margot Webber 4.5 .89 5 9
demos cont.
$ grep -v "Suan Chin" gamefile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southeast SE Patricia Heme 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Webber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
$ grep -c 'west' gamefile
3
$ grep -w 'north' gamefile
north NO Margot Webber 4.5 .89 5 9
$ grep -i "$LOGNAME" /etc/passwd
zhengm:x:503:504::/home/zhengm:/bin/bash
grep with pipes
• Remember, we can use pipes when a file is expected
• ls -l | grep 'a*'
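When no file name is given, grep reads its standard input, so it can filter the output of any command. The following sketch pipes a directory listing through grep; the temporary directory and its file names are hypothetical.

```shell
dir=$(mktemp -d)
touch "$dir/alpha.txt" "$dir/beta.doc" "$dir/gamma.txt"

# Keep only the names that end in ".txt".
txt_files=$(ls "$dir" | grep '\.txt$')
# Count the names that start with "a".
starts_with_a=$(ls "$dir" | grep -c '^a')
# 'a*' means zero or more "a", so it matches every line, including beta.doc.
star_matches=$(ls "$dir" | grep -c 'a*')

printf '%s\n' "$txt_files" "$starts_with_a" "$star_matches"
rm -r "$dir"
```

The last command is a reminder that * in an R.E. is not the shell wildcard: to select names beginning with "a", anchor the pattern as '^a' instead.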