grep (Global Regular Expression Print)
• Operation
  • Search a group of files
  • Find all lines that contain a particular regular expression pattern
  • Write the matching lines to standard output (redirect with > to save them in a file)
  • grep returns to the prompt with no extra output when it is done
• Syntax: grep [-cilLnrsvwx] pattern [list of files]
• Examples
  • Find information about the user harley
    >grep harley /etc/passwd
  • Find all lines containing the string xxx in the files of the current directory
    >grep xxx *

grep Flags
• -c  count the number of matching lines
• -i  ignore case when searching for matches
• -l  list the names of files containing matches
• -L  list the names of files that do not have a match
• -n  write the line number in front of each matching line
• -r  perform a recursive directory search
• -s  suppress warning and error messages
• -v  select the lines that do not match the pattern
• -w  match only complete words
• -x  match only lines that exactly match the pattern

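A minimal sketch of several of these flags in action. The sample files fruit.txt and veg.txt and their contents are made up for illustration; -L is not part of POSIX grep, so this assumes a GNU or BSD grep.

```shell
# Sample data; the directory and file names are hypothetical.
tmp=$(mktemp -d)
printf '%s\n' 'Apple pie' 'apple' 'banana split' 'pineapple' > "$tmp/fruit.txt"
printf '%s\n' 'carrot' 'celery' > "$tmp/veg.txt"

count=$(grep -c apple "$tmp/fruit.txt")        # lines containing "apple"
nocase=$(grep -ci apple "$tmp/fruit.txt")      # same, ignoring case
numbered=$(grep -n banana "$tmp/fruit.txt")    # line number, colon, line
words=$(grep -w apple "$tmp/fruit.txt")        # "apple" as a whole word only
exact=$(grep -x apple "$tmp/fruit.txt")        # whole line must equal "apple"
inverted=$(grep -vc apple "$tmp/fruit.txt")    # lines without "apple"
has_match=$(cd "$tmp" && grep -l apple fruit.txt veg.txt)  # files with a match
no_match=$(cd "$tmp" && grep -L apple fruit.txt veg.txt)   # files without one

printf '%s\n' "$count" "$nocase" "$numbered" "$words" "$exact" \
    "$inverted" "$has_match" "$no_match"
rm -r "$tmp"
```

Note that -w rejects "pineapple" because the match is preceded by a word character, while plain grep accepts it.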
Regular Expressions
Note: Many UNIX programs use these (vi, sed, more, grep, awk)
• Industry standard way to specify patterns
  • In Java: string.matches("pattern");
  • In Java: string.replaceAll("pattern", string)
• Meta-characters and operators
  ^       beginning of a line
  $       end of a line
  *       match 0 or more of the previous group
  +       match 1 or more of the previous group
  ?       match 0 or 1 of the previous group
  {n}     match exactly n of the previous group
  {m,n}   match m to n of the previous group
  {n,}    match n or more of the previous group
  |       match either the group before or the group after
  .       match any character except for new line
  \       literally interpret the following meta-character or operator

Regular Expression Examples
Note: To use ( ) { } | or + in grep, use the -E (extended) switch or precede them with \

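A short sketch of the extended operators with grep -E, plus one pattern written in the escaped basic form. The sample lines are invented for illustration.

```shell
# Sample lines (made up for this sketch)
sample=$(printf '%s\n' 'ac' 'abc' 'abbc' 'abbbc' 'xyz')

plus=$(printf '%s\n' "$sample" | grep -E '^ab+c$')         # one or more b
optional=$(printf '%s\n' "$sample" | grep -E '^ab?c$')     # zero or one b
counted=$(printf '%s\n' "$sample" | grep -E '^ab{2,3}c$')  # two or three b
either=$(printf '%s\n' "$sample" | grep -E '^(ac|xyz)$')   # alternation
escaped=$(printf '%s\n' "$sample" | grep '^ab\{2,3\}c$')   # basic syntax, braces escaped

printf '%s\n' "$plus" "$optional" "$counted" "$either" "$escaped"
```

The last two interval patterns select the same lines; only the quoting of the braces differs.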
More grep Examples
Contents of a file called homework
  Math: problems 12-10 to 12-33, due Monday
  BasketWeaving: make a 6-inch basket, DONE
  Psychology: essay on Animal Existentialism, due end of term
  Surfing: catch at least 10
grep commands
  >grep -v DONE homework             displays all but line 2
  >grep -c DONE homework             displays 1
  >grep -wi ".*a.*" homework         displays all lines
  >grep -w "m.*e" homework           displays line 2
  >grep -i "d.*e" homework           displays lines 1, 2 and 3
  >grep '\(Ma\|DO\).*' homework      displays lines 1 and 2
Note: the last example escapes the parentheses and the vertical bar

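The homework examples can be checked with a small script along these lines. It is a sketch: the file is recreated in a temporary directory, the fourth line is used as it appears on the slide, and the last pattern is written with -E rather than escaped characters, which should select the same lines.

```shell
tmp=$(mktemp -d)
cat > "$tmp/homework" <<'EOF'
Math: problems 12-10 to 12-33, due Monday
BasketWeaving: make a 6-inch basket, DONE
Psychology: essay on Animal Existentialism, due end of term
Surfing: catch at least 10
EOF

# For each pattern, collect the numbers of the matching lines.
not_done=$(grep -vn DONE "$tmp/homework" | cut -d: -f1 | paste -s -d, -)
done_count=$(grep -c DONE "$tmp/homework")
all_a=$(grep -wic '.*a.*' "$tmp/homework")
word_me=$(grep -nw 'm.*e' "$tmp/homework" | cut -d: -f1 | paste -s -d, -)
d_to_e=$(grep -ni 'd.*e' "$tmp/homework" | cut -d: -f1 | paste -s -d, -)
ma_or_do=$(grep -nE '(Ma|DO).*' "$tmp/homework" | cut -d: -f1 | paste -s -d, -)

printf '%s\n' "$not_done" "$done_count" "$all_a" "$word_me" "$d_to_e" "$ma_or_do"
rm -r "$tmp"
```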
Sorting Data
• Background
  • Each line in a file is a record
  • Each line is a series of fields separated by spaces and/or tabs
• Commands
  >sort fileName                   sorts fileName on the 1st field of each line
  >sort +5 fileName                sorts on the 6th field of each line
  >sort -n +4 fileName             sorts on the 5th field numerically
  >sort -t ':' +3r +2 fileName     sorts descending on the 4th field, and then ascending on the 3rd, with ':' as a delimiter
  >sort -t ':' fileName            sorts using ':' as a separator character
  >sort -u +1 fileName             sorts on the 2nd field and removes duplicates (output must be unique)
  >sort -k 3,4                     in a pipe, sorts by the key from field 3 through field 4
  >sort +4n +7                     sorts numerically by the 5th field and alphabetically by the 8th
Note: +N is the older, zero-based form (+5 skips five fields). Current POSIX sort uses one-based -k keys, so +5 becomes -k 6 and +4n becomes -k 5n.

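A minimal sketch of field-based sorting using the one-based -k key syntax. The colon-separated records (name:quantity:price) are invented for illustration.

```shell
# Made-up records in the form name:quantity:price
data=$(printf '%s\n' 'pear:10:3' 'apple:2:7' 'fig:10:1' 'apple:2:7')

# -u drops duplicate lines after sorting on the whole line
unique=$(printf '%s\n' "$data" | sort -u)

# Field 2 compared as text: "10" sorts before "2"
as_text=$(printf '%s\n' "$data" | sort -t: -k2,2)

# Field 2 numerically, then field 3 numerically in descending order
numeric=$(printf '%s\n' "$data" | sort -t: -k2,2n -k3,3nr)

printf '%s\n' "$unique" "---" "$as_text" "---" "$numeric"
```

Writing the key as -k2,2 rather than -k2 limits the comparison to that one field; without the end position the key runs to the end of the line.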
SED (Stream Editor)
• SED is a filter
  • Input from stdin or a file
  • Output to stdout or a file
  • Modifies the input to produce the output
  • Non-interactive
• Processing
  • Read from an input stream
  • Perform line oriented commands
  • Write to an output stream
• Syntax:
  >sed [-i] command | [-e command] … [file]

Search and Replace
Note: Similar search syntax works in vi, more, and awk
• Search, change, and redirect to newFile
  >sed 's/cat/dog/g' file > newFile
• Search, change, and edit the file in place
  >sed -i 's/cat/dog/g' file
• Apply to a specific range of lines
  >sed '5,10s/cat/dog/g' file
• Apply the search only to lines containing OK
  >sed '/OK/s/cat/dog/g' names
• Apply the search only to lines having 2 consecutive numeric characters
  >sed '/[0-9]\{2\}/s/cat/dog/g' names
• Delete a range of lines
  >sed '5,10d' file
Note: single quotes suppress the shell's interpretation of special characters
Note: You must escape the characters +, { and } for them to work as operators

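A small sketch of these addressing forms, run on three invented lines piped through sed so that no file is modified.

```shell
# Three made-up input lines
text=$(printf '%s\n' 'cat 1' 'cat cat 22 OK' 'cat 333')

every=$(printf '%s\n' "$text" | sed 's/cat/dog/g')          # all occurrences
first_only=$(printf '%s\n' "$text" | sed 's/cat/dog/')      # first per line
in_range=$(printf '%s\n' "$text" | sed '2,3s/cat/dog/g')    # lines 2 to 3 only
ok_lines=$(printf '%s\n' "$text" | sed '/OK/s/cat/dog/g')   # lines containing OK
two_digits=$(printf '%s\n' "$text" | sed '/[0-9]\{2\}/s/cat/dog/g')
deleted=$(printf '%s\n' "$text" | sed '1,2d')               # drop lines 1 to 2

printf '%s\n' "$every" "---" "$first_only" "---" "$ok_lines" "---" "$deleted"
```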
Complex Commands
>sed -i \
   -e 's/mon/Monday/g' \
   -e 's/tue/Tuesday/g' \
   -e 's/wed/Wednesday/g' \
   -e 's/thu/Thursday/g' \
   -e 's/fri/Friday/g' \
   -e 's/sat/Saturday/g' \
   -e 's/sun/Sunday/g' \
   calendar
• The backslash is a continuation character
• The -e specifies another command (expression)
• The /g means change every occurrence on each line, not just the first

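A reduced sketch of chained -e commands, reading from a pipe instead of editing a file in place. The schedule text is invented; it also shows that a plain s/sun/Sunday/g rewrites "sun" inside longer words, which may or may not be what is wanted.

```shell
# Made-up schedule lines
schedule=$(printf '%s\n' 'mon: gym' 'fri: sunny walk')

# The commands are applied to each line in the order given
expanded=$(printf '%s\n' "$schedule" | sed \
    -e 's/mon/Monday/g' \
    -e 's/fri/Friday/g' \
    -e 's/sun/Sunday/g')

printf '%s\n' "$expanded"
```

Here "sunny" becomes "Sundayny", so a tighter pattern (for example anchoring on the trailing colon) would be needed on real data.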
AWK
• AWK (Aho, Weinberger, Kernighan)
  • Special purpose programming language
  • Interpreted
  • Useful for UNIX scripts
• Purposes
  • Filter text files based on supplied patterns
  • Produce reports
  • Callable from "vi"
  • Create simple databases
  • Simple mathematical operations
  • Creating scripts
• Not good for large complicated tasks
• Other interpreted languages: perl, php

General Syntax
>awk '<awk program>'
  BEGIN {<initialization>}
  <pattern> {<action>}
  <pattern> {<action>}
  • • •
  <pattern> {<action>}
  END {<final actions>}
• The single quote causes the shell to ignore special characters
• The various clauses are optional
• Much of the syntax for <action> clauses is C and Java compatible
• The patterns utilize regular expressions

AWK General Operation
• Each file consists of a series of records
• Each record is a series of fields
• Defaults
  • Record separator: new line character
  • Field separator: white space characters
• Flow of Operation
  • Read the input file line by line
  • If the line matches a pattern, run that pattern's action
  • Otherwise skip the line

Some AWK Simple Examples
• Print fields of records in a file
  >awk '{print $5, $6, $7, $8}' fileName
• Print lines with a search string
  >awk '/gold/ {print}' fileName
• Print the number of records
  >awk 'END {print NR, "records"}' fileName
• Print records using a condition
  >awk '{if ($3 < 1980) print $3}' fileName
• Using variables
  >awk '/gold/ {sum += $2} END {print "value = " sum}' fileName

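A sketch of these one-liners on a tiny invented data set. The column layout (metal, weight, year, country) is an assumption chosen so that $2 and $3 behave as in the examples above.

```shell
# Made-up records: metal weight year country
coins=$(printf '%s\n' 'gold 1 1986 USA' 'silver 10 1981 USA' 'gold 0.5 1979 Canada')

gold_lines=$(printf '%s\n' "$coins" | awk '/gold/ {print}')
records=$(printf '%s\n' "$coins" | awk 'END {print NR, "records"}')
old_years=$(printf '%s\n' "$coins" | awk '$3 < 1980 {print $3}')
gold_weight=$(printf '%s\n' "$coins" | awk '/gold/ {sum += $2} END {print "value = " sum}')

printf '%s\n' "$gold_lines" "$records" "$old_years" "$gold_weight"
```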
Longer Program in a file
  # awk program summarizing a coin collection
  BEGIN  { num_gold = 0; wt_gold = 0; }
  /gold/ { num_gold++; wt_gold += $2; }
  END    { val_gold = 485 * wt_gold;
           printf("\n Gold Pieces: %2d", num_gold);
           printf("\n Gold Weight: %5.2f", wt_gold);
           printf("\n Gold Value: %7.2f\n", val_gold);
         }
To execute an AWK program:
  >awk -f <program fileName>

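One way to try a program file end to end is sketched below. The file names coins.awk and coins.txt and the three data records are hypothetical; the program follows the slide, with each printf putting its newline at the end of the line.

```shell
tmp=$(mktemp -d)

# Program file: the pattern /gold/ selects records, END prints the summary
cat > "$tmp/coins.awk" <<'EOF'
# awk program summarizing a coin collection
BEGIN  { num_gold = 0; wt_gold = 0 }
/gold/ { num_gold++; wt_gold += $2 }
END    {
    val_gold = 485 * wt_gold      # 485 is the price factor used on the slide
    printf("Gold Pieces: %2d\n", num_gold)
    printf("Gold Weight: %5.2f\n", wt_gold)
    printf("Gold Value: %7.2f\n", val_gold)
}
EOF

# Made-up data: metal weight year country
printf '%s\n' 'gold 1 1986 USA' 'silver 10 1981 USA' 'gold 0.5 1979 Canada' > "$tmp/coins.txt"

report=$(awk -f "$tmp/coins.awk" "$tmp/coins.txt")
printf '%s\n' "$report"
rm -r "$tmp"
```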
Invoking AWK
>awk [-F<ch>] [<program>] [-f <programFile>] [<vars>] [- | <dataFile>]
• <ch> is a field separator (default: space, tab)
• <program> an AWK program
• <programFile> a file containing an AWK program
• <vars> a series of variables to initialize
  >awk -f program f1=file2 f2=file1 > output
• - means accept AWK input from STDIN
• <dataFile> a file containing data to process
Note: AWK is often invoked repeatedly in shell scripts

Search Patterns
• An exact string: /The/
• A string starting a line: /^The/
• A string ending a line: /The$/
• A string ignoring case of first letter: /[Tt]he/
• Decimal: /[0-9]*\.[0-9]*/
• Alphanumeric: /[a-zA-Z0-9]*/
• Choice between two strings: /(da|De).*/
• Numeric: /[+-]?[0-9]+/
• Any Boolean expression: $4>90 or $4>$5
Note: Some utilities require \(, \) and \| if you use the ( ) | regular expression characters

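A brief sketch of a few of these patterns used as awk selectors. The input lines are invented, and the decimal pattern here requires at least one digit on each side of the point, which is stricter than the form listed above.

```shell
# Made-up input lines
lines=$(printf '%s\n' 'The end' 'Over the hill' 'pi is 3.14' 'Total 42')

starts=$(printf '%s\n' "$lines" | awk '/^The/')            # line begins with The
ends=$(printf '%s\n' "$lines" | awk '/hill$/')             # line ends with hill
either_case=$(printf '%s\n' "$lines" | awk '/[Tt]he/')     # The or the anywhere
decimal=$(printf '%s\n' "$lines" | awk '/[0-9]+\.[0-9]+/') # digits, point, digits

printf '%s\n' "$starts" "$ends" "$either_case" "$decimal"
```

A pattern with no action prints every matching record, which is why none of these programs needs an explicit print.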
Built-in Variables
• NR: Number of records read so far (the total, once in END)
• NF: Number of fields in the current record
• FILENAME: The current input file
• FS: Field separator character
• RS: Record separator character
• OFS: Output field separator character
• ORS: Output record separator character
• OFMT: The default output format for numbers in print

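A minimal sketch showing NR, NF, FS and OFS together. The colon-separated records are invented and only loosely modeled on /etc/passwd.

```shell
# Made-up colon-separated records
users=$(printf '%s\n' 'harley:x:502:Harley Hahn' 'root:x:0:Super User')

# FS splits the input on colons; OFS is placed between the print arguments
summary=$(printf '%s\n' "$users" |
    awk 'BEGIN {FS = ":"; OFS = " | "} {print NR, NF, $1, $3}')

printf '%s\n' "$summary"
```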
Arrays and control structures
• Indexed and associative arrays
  • By index: months[3] = "March";
  • Associative: debts["Kim"] = 1000;
  • Note: by convention arrays index from one, not zero (split numbers its elements from 1)
• Counter controlled: for (i=1; i<100; i++) data[i] = i;
• Iterator: for (i in names) print i, names[i];
• Pre test: i=0; while (i<20) { data[i] = i; i++ }
• Condition:
  if (i==1) print debts["Kim"]; else print debts["Joe"];
  print (i==1 ? debts["Kim"] : debts["Joe"]);
• Unconditional control statements
  • break: jump out of a loop
  • continue: next iteration
  • next: get next line of input
  • exit: exit the AWK program

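The array and loop forms can be combined in a short BEGIN-only program, sketched here. The names and amounts are invented; because for-in visits keys in an unspecified order, the loop only accumulates a total.

```shell
result=$(awk 'BEGIN {
    debts["Kim"] = 1000; debts["Joe"] = 250        # associative array
    for (i = 1; i <= 3; i++) squares[i] = i * i    # counter controlled loop
    total = 0
    for (name in debts) total += debts[name]       # iterator over the keys
    i = 1; sum = 0
    while (i <= 3) { sum += squares[i]; i++ }      # pre-test loop
    if (total > 1000) status = "over"; else status = "under"
    print total, sum, status
}')

printf '%s\n' "$result"
```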
Built-in functions
• Square root: print sqrt(3.6)
• Integer portion: print int(3.2)
• Substring: print substr("abcde", 3, 2);
• Split: n = split("a;b;c;d;e", letters, ";");
• Position: print index("gorbachev", "bach");
Note: if a substring doesn't exist, 0 is returned
Note: strings index from one, not zero

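A quick sketch exercising these functions in a BEGIN block. split takes the target array as its second argument and returns the number of pieces.

```shell
out=$(awk 'BEGIN {
    n = split("a;b;c;d;e", letters, ";")   # fills letters[1] .. letters[n]
    print n, letters[1], letters[n]
    print substr("abcde", 3, 2)            # 2 characters starting at position 3
    print index("gorbachev", "bach")       # position of the substring
    print index("gorbachev", "xyz")        # 0 when the substring is absent
    print int(3.2), sqrt(16)
}')

printf '%s\n' "$out"
```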
printf
• printf(<template>, <arguments>);
• printf applies the template to the arguments
• Formats are specified in the templates
  %d for integer output
  %o for octal
  %x for hexadecimal
  %s for string
  %e for exponential format
  %f for floating point format
• Greater control
  %5.2f means 5 characters wide, with two digits after the decimal point
  %-8.4s means left justify, 8 wide, print at most 4 characters
  %08d means 8 wide, padded with leading zeroes

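A small sketch of these format specifiers. The square brackets are only there to make the field widths visible.

```shell
formatted=$(awk 'BEGIN {
    printf("[%d] [%o] [%x]\n", 255, 255, 255)   # decimal, octal, hexadecimal
    printf("[%5.2f]\n", 3.14159)                # 5 wide, 2 decimals
    printf("[%-8.4s]\n", "abcdefgh")            # left justified, 4 characters kept
    printf("[%08d]\n", 42)                      # zero padded to 8 digits
}')

printf '%s\n' "$formatted"
```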
Escape Characters
• New line: \n
• Carriage return: \r
• Backspace: \b
• Horizontal tab: \t
• Form feed: \f
• A quote: \"
• A backslash: \\

AWK redirection and pipes
• Append the first field of each record to a file
  >awk '{print $1 >> "file"}' fileName
• Pipe output to a utility that translates from lower to upper case
  >ls -l | awk '{print $9}' | tr '[a-z]' '[A-Z]'
• Sort the grades file and print the first field
  >sort +4n grades | awk '{print $1}'
• List .txt files < 2000 bytes, print sorted descending
  >ls -l | grep '\.txt$' | awk '$5 < 2000 {print $9, $5}' | sort -nr +1

More Examples
• Print Bush's grades
  >awk '/Bush/ {print $3, $4}' grades
• Print first name, last name, and quiz 3 grade for everyone who got more than 90 on quizzes 1 and 2
  >awk '{if ($4>90 && $5>90) print $3, $2, $6}' grades
  >awk '$4>90 && $5>90 {print $3, $2, $6}' grades
• Print the username for the user with userid 502
  >awk -F: '{if ($3==502) print $1}' /etc/passwd
  >awk -F: '$3==502 {print $1}' /etc/passwd