Lecture 6 Regular Expressions grep
Why Regular Expressions?
• Regular expressions are used to describe text patterns/filters
• Unix commands/utilities that support regular expressions:
  • grep (fgrep, egrep) - search a file for a string or regular expression
  • sed - stream editor
  • awk (nawk) - pattern scanning and processing language
• There are some minor differences between the regular expressions supported by these programs
• We will cover the general matching operators first.
Character Class
• […] matches any one of the enclosed chars
• [abc] matches a single a, b, or c
• [a-z] matches any one of abcdef…xyz
• [^…] matches any single character not included
• [^A-Za-z] matches a single character as long as it is not a letter.
• Example: [Dd][Aa][Vv][Ee]
  • Matches "Dave" or "dave" or "dAVE"
  • Does not match "ave" or "da"
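A minimal sketch of the character-class examples above, run through grep. The candidate strings come from the slide; the variable names and the extra input "ab3" are made up for illustration.

```shell
# Feed candidate lines to grep; it keeps the lines that contain a match.
matches=$(printf '%s\n' 'Dave' 'dAVE' 'ave' 'da' | grep '[Dd][Aa][Vv][Ee]')

# [^A-Za-z] needs at least one non-letter somewhere on the line.
nonletter=$(printf '%s\n' 'abc' 'ab3' | grep '[^A-Za-z]')

printf 'dave in any case:\n%s\n' "$matches"       # Dave, dAVE
printf 'contains a non-letter: %s\n' "$nonletter" # ab3
```

Note that grep matches anywhere in the line, so a line only has to contain the pattern, not equal it.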
Regular Expression Operators
• Any character (except a metacharacter!) matches itself.
• . Matches any single character except newline.
• * Matches 0 or more instances of the immediately preceding R.E.
• ? Matches 0 or 1 instances of the immediately preceding R.E.
• + Matches 1 or more instances of the immediately preceding R.E.
• ^ Matches the following R.E. at the beginning of the line
• $ Matches the preceding R.E. at the end of the line
• | Matches the R.E. specified before or after this symbol
• \ Turns off the special meaning of the following character
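The operators above can be tried on a small word list. This is only a sketch: the words are invented, and grep -E (the egrep syntax) is assumed so that ?, + and | work unescaped.

```shell
# Hypothetical word list, one word per line.
words='color
colour
colouur
cat
concat'

optional=$(printf '%s\n' "$words" | grep -E '^colou?r$')     # u appears 0 or 1 times: color, colour
oneplus=$(printf '%s\n' "$words" | grep -E '^colou+r$')      # u appears 1 or more times: colour, colouur
anydot=$(printf '%s\n' "$words" | grep -E '^c.t$')           # any single char between c and t: cat
anchored=$(printf '%s\n' "$words" | grep -E '^cat')          # only lines that start with cat: cat
either=$(printf '%s\n' "$words" | grep -E -c 'cat|colour')   # count of lines with cat or colour: 3

printf '%s\n' "$optional" "$oneplus" "$anydot" "$anchored" "$either"
```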
R.E. patterns
• {n} The preceding item is matched exactly n times.
• {n,} The preceding item is matched n or more times.
• {n,m} The preceding item is matched at least n times, but not more than m times.
• In grep's basic syntax the braces are escaped (\{n\}); egrep (grep -E) takes them unescaped.
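A short sketch of the repetition counts, using made-up numeric input. The first three commands assume grep -E; the last shows the same idea with escaped braces in basic grep syntax.

```shell
# Hypothetical input: numbers with 1 to 4 digits.
nums='7
42
123
1234'

exactly3=$(printf '%s\n' "$nums" | grep -E '^[0-9]{3}$')    # exactly three digits: 123
atleast3=$(printf '%s\n' "$nums" | grep -E '^[0-9]{3,}$')   # three or more digits: 123, 1234
two_or_3=$(printf '%s\n' "$nums" | grep -E '^[0-9]{2,3}$')  # two or three digits: 42, 123
basic=$(printf '%s\n' "$nums" | grep '^[0-9]\{3\}$')        # basic syntax, escaped braces: 123

printf '%s\n' "$exactly3" "$atleast3" "$two_or_3" "$basic"
```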
R.E. patterns
• If you put a subpattern inside parens, you can apply +, * and ? to the entire subpattern.
• a(bc)*d matches "ad" and "abcbcd"; it does not match "abcxd" or "bcbcd"
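The grouping example can be checked directly. A minimal sketch, assuming grep -E so the parentheses group without backslashes; the variable name is illustrative.

```shell
# The four strings from the slide, one per line.
grouped=$(printf '%s\n' 'ad' 'abcbcd' 'abcxd' 'bcbcd' | grep -E 'a(bc)*d')

printf '%s\n' "$grouped"   # ad, abcbcd
```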
R.E. patterns
• \< beginning-of-word anchor: \<abc matches "abcd" but not "dabc"
• \> end-of-word anchor: abc\> matches "dabc" but not "abcd"
• \(…\) stores the pattern inside \( and \)
• \(abc\)def matches "abcdef" and stores abc in \1, so \(abc\)def\1 matches "abcdefabc".
• Can store up to 9 matches: \(ab\)c\(de\)f\1\2 will match abcdefabde
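A sketch of the word anchors and stored patterns in basic grep syntax. The \< and \> anchors are assumed to be available (they are a GNU grep extension rather than part of the POSIX syntax), and the non-matching inputs are invented for contrast.

```shell
# Word anchors: \< must sit at the start of a word, \> at the end.
wordstart=$(printf '%s\n' 'abcd' 'dabc' | grep '\<abc')   # abcd
wordend=$(printf '%s\n' 'abcd' 'dabc' | grep 'abc\>')     # dabc

# \1 must repeat exactly what the first \( \) group matched.
backref=$(printf '%s\n' 'abcdefabc' 'abcdefxyz' | grep '\(abc\)def\1')            # abcdefabc

# Two groups: \1 is "ab", \2 is "de", so the tail must be "abde".
tworefs=$(printf '%s\n' 'abcdefabde' 'abcdefdeab' | grep '\(ab\)c\(de\)f\1\2')    # abcdefabde

printf '%s\n' "$wordstart" "$wordend" "$backref" "$tworefs"
```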
Examples of R.E.
• x[abc]?x matches "xax" or "xx"
• [abc]* matches "aaaaa" or "acbca"
• 0*10 matches "010" or "0000010" or "10"
• ^(dog)$ matches lines that consist of exactly dog
• [\t ]*
• (A|a)+b*c?
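Three of the examples above, checked with grep -E. This is a sketch; the non-matching inputs ("xabx", "011", "dog dog", "hotdog") are invented to show where each pattern stops matching.

```shell
# One optional character from [abc] between two x's.
opt=$(printf '%s\n' 'xax' 'xx' 'xabx' | grep -E 'x[abc]?x')               # xax, xx

# Any number of leading zeros, then "10"; -c counts matching lines.
zeros=$(printf '%s\n' '010' '0000010' '10' '011' | grep -E -c '0*10')     # 3

# Both anchors together: the whole line must be "dog".
dog=$(printf '%s\n' 'dog' 'dog dog' 'hotdog' | grep -E '^(dog)$')         # dog

printf '%s\n' "$opt" "$zeros" "$dog"
```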
Example
• Christian Scott lives here and will put on a Christmas party
• There are around 30 to 35 people invited.
• They are:
• Tom
• Dan
• Rhonda Savage
• Nicky and Kimberly.
• Steve, Suzanne, Ginger and Larry
^[A-Z]..$
^[A-Z][a-z]*3[0-5]
[a-z]*\.
^ *[A-Z][a-z][a-z]$
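One way to check the four patterns is to run them over the sample text. The sketch below assumes the sample lines carry no leading or trailing spaces (the flattened slide does not show the original indentation) and reads the patterns exactly as written; the file and variable names are illustrative.

```shell
# Sample text from the slide, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Christian Scott lives here and will put on a Christmas party
There are around 30 to 35 people invited.
They are:
Tom
Dan
Rhonda Savage
Nicky and Kimberly.
Steve, Suzanne, Ginger and Larry
EOF

p1=$(grep '^[A-Z]..$' "$sample")                 # three-character lines: Tom, Dan
p2=$(grep -c '^[A-Z][a-z]*3[0-5]' "$sample")     # no line fits under this assumption: 0
p3=$(grep -c '[a-z]*\.' "$sample")               # lines containing a literal period: 2
p4=$(grep '^ *[A-Z][a-z][a-z]$' "$sample")       # optional leading spaces, then Xxx: Tom, Dan

rm -f "$sample"
printf '%s\n' "$p1" "$p2" "$p3" "$p4"
```

The second pattern finds nothing here because every capitalized first word is followed by a space, not by 3.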
Review: Metacharacters for filename abbreviation
• * Matches anything: ls Test*.doc
• ? Matches any single character: ls Test?.doc
• [abc…] Matches any of the enclosed characters: ls T[eE][sS][tT].doc
• [a-z] Matches any character in a range: ls [a-zA-Z]*
• [!abc…] Matches any character except those listed: ls [!0-9]*
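A small sketch of how the shell expands these filename metacharacters. The file names and the count helper are hypothetical; the helper just reports how many names the shell produced for each pattern.

```shell
# Work in a scratch directory so the globs only see our files.
dir=$(mktemp -d)
old=$PWD
cd "$dir" || exit 1
touch Test1.doc Test12.doc TEST.doc 9lives.txt

# Hypothetical helper: prints the number of arguments it received.
count() { echo "$#"; }

n_star=$(count Test*.doc)             # Test1.doc Test12.doc -> 2
n_qmark=$(count Test?.doc)            # Test1.doc -> 1
n_class=$(count T[eE][sS][tT].doc)    # TEST.doc -> 1
n_not_digit=$(count [!0-9]*)          # every name not starting with a digit -> 3

cd "$old" || exit 1
rm -rf "$dir"
printf '%s %s %s %s\n' "$n_star" "$n_qmark" "$n_class" "$n_not_digit"
```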
Difference !!
• Although there are similarities to the metacharacters used in filename expansion, we are talking about something different!
• Filename expansion is done by the shell.
• Regular expressions are used by commands (programs).
• Because of this overlap, be careful when specifying an RE on the command line
• Good idea to always quote an RE containing special chars ('' or "") on the command line
• Example: % grep '[a-z]*' chap[12]*
• Note: filename mask expanded by shell with “
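To see why quoting matters, the sketch below compares what a command receives when a pattern is quoted and when it is not. The scratch files (chap1, chap2, a, ab) are invented for the demonstration.

```shell
dir=$(mktemp -d)
old=$PWD
cd "$dir" || exit 1
printf 'alpha\n' > chap1
printf 'beta 42\n' > chap2
touch a ab

# Unquoted: the shell replaces a* with matching file names before the command runs.
unquoted=$(echo a*)      # a ab
# Quoted: the command receives the two characters a* untouched.
quoted=$(echo 'a*')      # a*

# Quoted RE for grep, unquoted filename mask for the shell.
hits=$(grep -l '[0-9]' chap[12]*)   # chap2

cd "$old" || exit 1
rm -rf "$dir"
printf '%s\n' "$unquoted" "$quoted" "$hits"
```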
grep - search for a string
• grep [-bchilnsvw] PATTERN [filename...]
• Reads files or standard/redirected input
• Searches for the specified pattern in each line
• Sends results to the standard output
• Examples:
  % grep '^X11' * - search all files in the current directory for lines starting with the string "X11"
  % grep -v text file - print lines that do not match "text"
• Exit status: 0 - pattern found; 1 - not found; greater than 1 - an error occurred.
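The exit status is what makes grep useful in scripts. A minimal sketch with an invented two-line file; -q suppresses output so only the status is observed.

```shell
file=$(mktemp)
printf 'X11 server\nplain text\n' > "$file"

grep -q '^X11' "$file";     found=$?     # a line starts with X11 -> 0
grep -q '^Wayland' "$file"; missing=$?   # no such line -> 1

inverted=$(grep -v text "$file")         # lines without "text": X11 server

rm -f "$file"
printf '%s %s %s\n' "$found" "$missing" "$inverted"
```

A test such as `if grep -q PATTERN file; then ...` relies on exactly this status.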
Regular Expressions for grep
c       any non-special character c matches itself
\c      turn off any special meaning of character c
^       beginning of line
$       end of line
.       any single character
[...]   any of the characters in range ...
[^...]  any single character not in range ...
r*      zero or more occurrences of r
grep - options
• Some useful options
  -c count number of matching lines
  -i ignore case
  -h do not display filename
  -l list only the files with matching lines
  -L list only the files that do not match
  -v display lines that do not match
  -n print line numbers
  -r recursively search the sub-directories
  -x match only whole lines
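A few of these options on two invented files. This is a sketch; -L is assumed to be available, as it is in GNU and BSD grep, though it is not part of POSIX.

```shell
dir=$(mktemp -d)
printf 'North\nnorth pole\nsouth\n' > "$dir/a.txt"
printf 'east\n' > "$dir/b.txt"

count=$(grep -c -i 'north' "$dir/a.txt")              # matching lines, any case: 2
numbered=$(grep -n 'south' "$dir/a.txt")              # line number prefix: 3:south
whole=$(grep -x -i 'north' "$dir/a.txt")              # whole line must match: North
with=$(cd "$dir" && grep -l 'north' a.txt b.txt)      # files with a match: a.txt
without=$(cd "$dir" && grep -L 'north' a.txt b.txt)   # files without a match: b.txt

rm -rf "$dir"
printf '%s\n' "$count" "$numbered" "$whole" "$with" "$without"
```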
grep advanced options
• -F fixed string, don't interpret the pattern as an R.E.
• -m NUM stop reading a file after NUM matching lines
• --exclude=GLOB skip files whose name matches GLOB
• --exclude-from=FILE same as --exclude, except the GLOBs are read from FILE
• --include=GLOB search only files whose names match GLOB
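A sketch of -F, -m and --include on invented files, assuming GNU grep (the long options are not in POSIX). It shows how the same pattern text behaves as an R.E. and as a fixed string.

```shell
dir=$(mktemp -d)
printf 'a.c\nabc\na.c again\n' > "$dir/notes.txt"
printf 'a.c\n' > "$dir/main.c"

regex=$(grep -c 'a.c' "$dir/notes.txt")      # . matches any char, so abc counts too: 3
fixed=$(grep -c -F 'a.c' "$dir/notes.txt")   # literal "a.c" only: 2
first=$(grep -m 1 'a.c' "$dir/notes.txt")    # stop after the first matching line: a.c

# Recursive search restricted to file names matching *.c
only_c=$(cd "$dir" && grep -r -l --include='*.c' 'a.c' .)   # ./main.c

rm -rf "$dir"
printf '%s\n' "$regex" "$fixed" "$first" "$only_c"
```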
gamefile
northwest  NW  Charles Main   3.0  .98  3  34
western    WE  Sharon Gray    5.3  .97  5  23
southwest  SW  Lewis Dalsass  2.7  .8   2  18
southern   SO  Suan Chin      5.1  .95  4  15
southeast  SE  Patricia Heme  4.0  .7   4  17
eastern    EA  TB Savage      4.4  .84  5  20
northeast  NE  AM Main Jr.    5.1  .94  3  13
north      NO  Margot Webber  4.5  .89  5  9
central    CT  Ann Stephens   5.7  .94  5  13
demos
• grep NW gamefile
• grep '^n' gamefile
• grep '4$' gamefile
• grep TB Savage gamefile #cannot find file Savage
• grep '5\..' gamefile
• grep '\(3\)\.[0-9].*\1 *\1' gamefile
• grep '\<north' gamefile
• grep '\<north\>' gamefile
Now, what do you want to find?
Demos cont.
grep -v "Suan Chin" gamefile
grep -l 'SE' *
grep -c 'west' gamefile
grep -w 'north' gamefile
grep -i "$LOGNAME" /etc/passwd
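The demos can be reproduced by recreating gamefile and counting the lines each command selects. This sketch assumes the fields are separated by single spaces (the flattened slide does not preserve the original spacing) and that \< and \> are supported, as in GNU grep.

```shell
gamefile=$(mktemp)
cat > "$gamefile" <<'EOF'
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Heme 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Webber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
EOF

starts_n=$(grep -c '^n' "$gamefile")                    # northwest, northeast, north: 3
ends_4=$(grep -c '4$' "$gamefile")                      # only northwest ends in 4: 1
five_dot=$(grep -c '5\..' "$gamefile")                  # 5.3, 5.1, 5.1, 5.7: 4
stored=$(grep -c '\(3\)\.[0-9].*\1 *\1' "$gamefile")    # 3.0 ... 3 34 on the northwest line: 1
word_start=$(grep -c '\<north' "$gamefile")             # words beginning with north: 3
whole_word=$(grep -c '\<north\>' "$gamefile")           # the word north by itself: 1
same_as_w=$(grep -c -w 'north' "$gamefile")             # -w gives the same result: 1
not_suan=$(grep -c -v 'Suan Chin' "$gamefile")          # every line but one: 8
west=$(grep -c 'west' "$gamefile")                      # northwest, western, southwest: 3

# Quoting keeps "TB Savage" as one pattern instead of a pattern plus a file name.
savage=$(grep 'TB Savage' "$gamefile")

rm -f "$gamefile"
printf '%s\n' "$starts_n" "$ends_4" "$five_dot" "$stored" "$word_start" \
  "$whole_word" "$same_as_w" "$not_suan" "$west" "$savage"
```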
grep with pipes
• Remember, we can use pipes when a file is expected
• ls -l | grep 'a*'
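A sketch of grep reading from a pipe instead of a file. The directory contents are invented, and a more selective pattern than the slide's is used so the filtering is visible.

```shell
dir=$(mktemp -d)
touch "$dir/alpha.txt" "$dir/beta.txt" "$dir/gamma.log"

# grep has no file argument, so it filters the lines that ls writes to the pipe.
txt=$(ls "$dir" | grep '\.txt$')   # alpha.txt, beta.txt

rm -rf "$dir"
printf '%s\n' "$txt"
```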
Recommended Reading • Chapter 3 • Chapter 4, sections 4.1 – 4.5