CS465 - Unix The awk Utility

Background
• awk was developed by Aho, Weinberger, and Kernighan (of K & R)
• Was further extended at Bell Labs
• Handles simple data-reformatting jobs easily with just a few lines of code
• Versions
  • awk - original version
  • nawk - new awk - improved awk
  • gawk - GNU awk - improved nawk

How awk works
• awk commands include patterns and actions
• Scans the input line by line, searching for lines that match a certain pattern (or regular expression)
• Performs a selected action on the matching lines
• awk can be used:
  • at the command line for simple operations
  • in programs or scripts for larger applications

Running awk
• From the command line:
    $ awk '/pattern/{action}' file
• Or from an awk script file:
    $ cat awkscript
    # This is a comment
    /pattern/ {action}
    $ awk -f awkscript file

awk’s Format using Input from a File
    $ awk '/pattern/' filename
• awk will act like grep, printing each matching line (quote the pattern so the shell passes it through unchanged)
    $ awk '{action}' filename
• awk will apply the action to every line in the file
    $ awk '/pattern/ {action}' filename
• awk will apply the action to every line in the file that matches the pattern
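
The three forms above can be tried side by side. This is a minimal sketch, assuming a small made-up data set (the fruit names and the variable names are hypothetical, not from the slides), fed to awk through a pipe instead of a file:

```shell
# Hypothetical sample data; any whitespace-separated text behaves the same way.
data='apple 3
banana 12
cherry 7'

# Pattern only: matching lines are printed unchanged (grep-like).
pattern_only=$(printf '%s\n' "$data" | awk '/an/')

# Action only: the action runs on every line.
action_only=$(printf '%s\n' "$data" | awk '{ print $1 }')

# Pattern and action: the action runs only on matching lines.
both=$(printf '%s\n' "$data" | awk '/rr/ { print $2 }')

printf '%s\n' "$pattern_only" "$action_only" "$both"
```

Only the banana line contains "an", and only the cherry line contains "rr", so the first and third commands each select a single line.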

Records and Fields
• awk divides the input into records and fields
• Each line is a record (by default)
• Each record is split into fields, delimited by a special character (whitespace by default)
• Can change the delimiter with -F or FS

                field-1   field-2   field-3
                   |         |         |
                   v         v         v
    record 1 -> George    Jones     Admin
    record 2 -> Anthony   Smith     Accounting

awk field variables
• awk creates variables $1, $2, $3… that correspond to the resulting fields (much like positional parameters in a shell script)
• $1 is the first field, $2 is the second…
• $0 is a special field which is the entire line
• NF is always set to the number of fields in the current line (written without a dollar sign)
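
A short sketch of the field variables, using one line of the students data from the examples. One detail that is not on the slide: because NF holds the field count, $NF refers to the last field. The shell variable names here are made up for illustration.

```shell
line='Bill White 7777771 1980/01/01 Science'

# NF is the number of fields in the current record.
fields=$(printf '%s\n' "$line" | awk '{ print NF }')

# $NF is the field whose number is NF, i.e. the last field.
last=$(printf '%s\n' "$line" | awk '{ print $NF }')

# $0 is the whole record, unchanged.
whole=$(printf '%s\n' "$line" | awk '{ print $0 }')

printf '%s\n' "$fields" "$last" "$whole"
```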

Example #1
    $ cat students
    Bill White 7777771 1980/01/01 Science
    Jill Blue 1111117 1978/03/20 Arts
    Ben Teal 7171717 1985/02/26 CompSci
    Sue Beige 1717171 1963/09/12 Science
    $
    $ awk '/Science/{print $1, $2}' students
    Bill White
    Sue Beige
    $
• Commas indicate that we want the output to be delimited by spaces (otherwise the fields are concatenated):
    $ awk '/Science/{print $1 $2}' students
    BillWhite
    SueBeige

Example #2
    $ cat phonelist
    Joe Smith 774-0888
    Mary Jones 772-2345
    Hank Knight 494-8888
    $
    $ awk '{print "Name: ", $1, $2, \
    " Telephone:", $3}' phonelist
    Name:  Joe Smith  Telephone: 774-0888
    Name:  Mary Jones  Telephone: 772-2345
    Name:  Hank Knight  Telephone: 494-8888
    $
• No pattern given, so matches ALL lines
• Text strings to print are placed in double quotes
• Each comma adds a space, so a string that already ends or begins with a space produces two spaces in the output

Example #3
Given a username, display the person’s real name:
    $ grep small /etc/passwd
    small000:x:1164:102:Faculty - Pam Smallwood:/export/home/small000:/bin/ksh
    $
    $ awk -F: '/small000/{print $5}' /etc/passwd
    Faculty - Pam Smallwood
    $

awk using Input from Commands
• You can run awk in a pipeline, using input from another command:
    $ command | awk '/pattern/ {action}'
• Takes the output from the command and pipes it into awk, which then performs the action on all lines that match the pattern

Piped awk Input Example
    $ w
    1:04pm up 25 day(s), 5:37, 6 users, load average: 0.00, 0.00, 0.01
    User tty login@ idle JCPU PCPU what
    pugli766 pts/8 Tue10pm 3days -ksh
    lin318 pts/17 10:58am 1:45 vi choosesort
    small000 pts/18 12:43pm w
    mcdev712 pts/10 11:52am 14 1 vi adddata
    gibbo201 pts/12 12:15pm 18 -ksh
    nelso828 pts/16 7:17pm 17:43 -ksh
    $
    $ w | awk '/ksh/{print $1}'
    pugli766
    gibbo201
    nelso828
    $

Relational Operators
• awk can use relational operators ( <, >, <=, >=, ==, != ) to compare a field to a value; ! negates a condition or pattern
• If the outcome of the comparison is true, then the action is performed (with no action given, the matching record is printed)
• Examples:
  • To print every record in the log.txt file in which the second field is larger than 10:
      $ awk '$2 > 10' log.txt
  • To print every record in the log.txt file which does NOT contain 'Win32':
      $ awk '!/Win32/' log.txt
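
The log.txt file itself is not shown in the slides, so the sketch below assumes a hypothetical three-line version of it (name, count, platform) to show the two commands above plus a string comparison:

```shell
# Hypothetical stand-in for log.txt; the real file is not shown in the slides.
log='alpha 5 Win32
beta 12 Linux
gamma 40 Win32'

# Numeric comparison on the second field; no action, so matching records are printed.
big=$(printf '%s\n' "$log" | awk '$2 > 10')

# Negated regular expression: records that do NOT contain Win32.
not_win=$(printf '%s\n' "$log" | awk '!/Win32/')

# String comparison on a field, combined with an action.
win_names=$(printf '%s\n' "$log" | awk '$3 == "Win32" { print $1 }')

printf '%s\n' "$big" "$not_win" "$win_names"
```

Note the double quotes around "Win32" inside the awk program: without them awk would treat Win32 as an (empty) variable name instead of a string.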

Relational Operator Example
    $ who
    pugli766 pts/8 Jun 3 22:24 (da1-229-38-103.den.pcisys.net)
    lin318 pts/17 Jun 6 10:58 (12-254-120-56.client.attbi.com)
    small000 pts/18 Jun 6 13:16 (mackey.rbe36-213.den.pcisys.net)
    mcdev712 pts/10 Jun 6 11:52 (ip68-104-41-121.lv.lv.cox.net)
    gibbo201 pts/12 Jun 6 12:15 (12-219-115-107.client.mchsi.com)
    nelso828 pts/16 Jun 5 19:17 (65.100.138.177)
    $
    $ who | awk '$4 < 6 {print $1, $3, $4, $5}'
    pugli766 Jun 3 22:24
    nelso828 Jun 5 19:17
    $

Piping awk output
    $ who
    pugli766 pts/8 Jun 3 22:24 (da1-229-38-103.den.pcisys.net)
    lin318 pts/17 Jun 6 10:58 (12-254-120-56.client.attbi.com)
    small000 pts/18 Jun 6 13:16 (mackey.rbe36-213.den.pcisys.net)
    mcdev712 pts/10 Jun 6 11:52 (ip68-104-41-121.lv.lv.cox.net)
    gibbo201 pts/12 Jun 6 12:15 (12-219-115-107.client.mchsi.com)
    nelso828 pts/16 Jun 5 19:17 (65.100.138.177)
    $
    $ who | awk '$4 == 6 {print $1}' | sort
    gibbo201
    lin318
    mcdev712
    small000
    $

awk Programming
• An awk program is a list of rules
• Each rule is applied sequentially to each line (record)
• Example:
    /pattern1/ { action1 }
    /pattern2/ { action2 }
    /pattern3/ { action3 }

awk - pattern matching
• Before an action runs, each line is tested against the rule's pattern.
    /pattern/ { action }     execute if line matches pattern
• The pattern is a regular expression.
• Examples:
    /^$/ { print "This line is blank" }
    /num/ { print "Line includes num" }
    /[0-9]+$/ { print "Integer at end:", $0 }
    /[A-Za-z]+/ { print "String:", $0 }
    /^[A-Z]/ { print "Starts w/uppercase letter" }
• Write [A-Za-z] for letters; in ASCII the range [A-z] also matches the punctuation characters that sit between Z and a
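
A sketch of how a rule list behaves on a few made-up input lines (the input and the shortened messages are hypothetical). Every rule is tested against every line, so a single line can trigger more than one action:

```shell
# Three input lines: "num 42", an empty line, and "hello".
out=$(printf 'num 42\n\nhello\n' | awk '
/^$/      { print "blank" }
/num/     { print "has num" }
/[0-9]+$/ { print "ends in digits: " $0 }
')

printf '%s\n' "$out"
```

The first line matches two rules and produces two output lines, the empty line matches only the first rule, and "hello" matches none, so nothing is printed for it.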

awk program from a file
• The awk commands (program) can be placed into a file
• The -f (lowercase f) option indicates that the commands come from the file whose name follows the -f
    $ awk -f awkfile datafile
• The contents of the file called awkfile will be used as the commands for awk

Example 1
    $ cat students
    Bill White 333333 1980/01/01 Science
    Jill Blue 333444 1978/03/20 Arts
    Bill Teal 555555 1985/02/26 CompSci
    Sue Beige 555777 1963/09/12 Science
    $ cat awkprog
    /5?5/ {print $1, $2}
    /3*4/ {print $5}
    $
    $ awk -f awkprog students
    Arts
    Bill Teal
    Sue Beige
    $
**NOTE: All patterns are applied to each line before moving to the next line

Example 2
    $ cat students
    Bill White 333333 1980/01/01 Science
    Jill Blue 333444 1978/03/20 Arts
    Bill Teal 555555 1985/02/26 CompSci
    Sue Beige 555777 1963/09/12 Science
    $ cat awkprog
    /Science/ {print "Science stu:", $1, $2}
    /CompSci/ {print "Computing stu:", $1, $2}
    $
    $ awk -f awkprog students
    Science stu: Bill White
    Computing stu: Bill Teal
    Science stu: Sue Beige
    $

More about Patterns
• Patterns can be:
  • Empty: will match everything
  • Regular expressions: /reg-expression/
  • Boolean expressions: $2=="foo" && $7=="bar"
  • Ranges: /jones/,/smith/

Example - Boolean Expressions
    $ cat students
    Bill White 333333 1980/01/01 Science
    Jill Blue 333444 1978/03/20 Arts
    Bill Teal 555555 1985/02/26 CompSci
    Sue Beige 555777 1963/09/12 Science
    $ cat awkprog
    $3 <= 444444 {print "Not counted"}
    $3 > 444444 {print $2 ",", $1}
    $
    $ awk -f awkprog students
    Not counted
    Not counted
    Teal, Bill
    Beige, Sue
    $

Example - Ranges
    $ cat students
    Bill White 333333 1980/01/01 Science
    Jill Blue 333444 1978/03/20 Arts
    Bill Teal 555555 1985/02/26 CompSci
    Sue Beige 555777 1963/09/12 Science
    $
    $ awk '/333333/,/555555/' students
    Bill White 333333 1980/01/01 Science
    Jill Blue 333444 1978/03/20 Arts
    Bill Teal 555555 1985/02/26 CompSci
    $

More Built-In awk Variables
• Two types: Informative and Configuration
• Informative:
    NR = Current Record Number (starts at 1)
      • Counts ALL records, not just those that match
    NF = Number of Fields in the Current Record
    FILENAME = Current Input Data File
      • Undefined in the BEGIN block

Example using NF
    $ cat names
    Pam Sue Laurie
    Bob Joe Bill Dave
    Joan Jill

    $
    $ awk '{print NF}' names
    3
    4
    2
    0
    $
• The last line of names is blank, so its NF is 0

Example using a Boolean, NF, and NR
    $ cat names
    Pam Sue Laurie
    Bob Joe Bill Dave
    Joan Jill

    $
    $ awk 'NF > 2 {print NR ":", NF, "fields"}' names
    1: 3 fields
    2: 4 fields
    $

Built-in awk functions
    log(expr)        natural logarithm
    index(s1,s2)     position of string s2 in string s1
    length(s)        string length
    substr(s,m,n)    n-char substring of s starting at m
    tolower(s)       converts string to lowercase
    printf()         print formatted - like C printf
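
A quick sketch exercising the functions listed above on a made-up input string ("Hello World" is just an illustration). String positions in awk start at 1, not 0:

```shell
out=$(echo 'Hello World' | awk '{
    print length($0)            # number of characters in the line
    print index($0, "World")    # position where "World" starts
    print substr($0, 1, 5)      # 5 characters starting at position 1
    print tolower($0)           # lowercase copy of the line
    print log(1)                # natural logarithm of 1
}')

printf '%s\n' "$out"
```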

print & printf
• Use print in an awk statement to output specific field(s)
• printf is more versatile
  • works like printf in the C language
  • may contain a format specifier and a modifier

Format Specification
• A format specification consists of a percent symbol, an optional modifier, optional width and precision values, and a conversion character
• To display the third field as a floating point number with two decimal places:
    awk '{printf("%.2f\n", $3)}' file
• You can include additional text in the printf statement:
    '{printf ("3rd value: %.2f\n", $3)}'

Specifiers, Width, Precision, & Modifiers
• Type specifiers:
    %c   single character
    %d   integer (decimal)
    %f   floating point
    %s   string
• Between the % and the specifier you can place the width and precision
  • %6.2f means a floating point number in a field of width 6 with two decimal places
• Modifiers control details of appearance:
    -   minus sign is the left-justification modifier (the default is right justification)
    +   plus sign forces the appearance of a sign (+,-) for numeric output
    0   zero pads a right-justified number with zeros
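
The specifiers and modifiers above can be compared on a single made-up record (the name and numbers are hypothetical). The square brackets are only there to make the padding visible:

```shell
out=$(echo 'Marge 34 3.14159' | awk '{
    # %-8s  left-justified string in a field of width 8
    # %5d   right-justified integer in a field of width 5
    # %05d  same width, padded with zeros
    # %6.2f width 6, two decimal places
    # %+d   integer with an explicit sign
    printf("[%-8s] [%5d] [%05d] [%6.2f] [%+d]\n", $1, $2, $2, $3, $2)
}')

printf '%s\n' "$out"
```

Unlike print, printf adds no newline of its own, which is why the format string ends in \n.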

awk Variables
• No need for declaration
• An uninitialized variable acts as both 0 and the empty string
• A variable's type is a combination of floating-point and string
• A variable is converted as needed, based on its use
    title = "Number of students"
    no = 100
    weight = 13.4
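
A small sketch of the conversion rules above. The variable names (count, name, x, y) are made up for illustration. The program consists of a BEGIN rule only, so awk runs it without reading any input:

```shell
out=$(awk 'BEGIN {
    print count + 0        # never assigned, used as a number: acts as 0
    print "[" name "]"     # never assigned, used as a string: acts as empty
    x = "3"                # assigned a string
    print x + 4            # arithmetic converts the string to a number
    y = 10                 # assigned a number
    print y " items"       # concatenation converts the number to a string
}')

printf '%s\n' "$out"
```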

awk program execution
    BEGIN { ... }            Executes only once, before reading input data
    { ... }                  Executes for each input line
    specification { ... }    Executes for each input line that matches the specified /pattern/ or Boolean expression
    END { ... }              Executes once at the end, after all lines have been processed

Example #1: Count # lines in file
    $ cat awkprog
    BEGIN {total = 0}
    {total = total + 1}
    END {print total " lines"}
    $ cat testfile
    Hello There
    Goodbye!
    $
• Set total to 0 before processing any lines
• For every row in the file, execute {total = total + 1}
• Print total after all lines are processed
    $ awk -f awkprog testfile
    2 lines
    $

Ex #2: Count lines containing a pattern
• A rule such as /pattern/ {totalpattern++} only executes for lines in which the pattern appears
    $ cat Simpsons
    Marge 34
    Homer 32
    Lisa 10
    Bart 11
    Maggie 01
    $ cat countthem
    BEGIN {totalMa = 0; totalar = 0}
    /Ma/ { totalMa++ }
    /ar/ { totalar++ }
    END { print totalMa " Ma's"
          print totalar " ar's"}
    $
    $ awk -f countthem Simpsons
    2 Ma's
    2 ar's
    $

Example #3: Add line numbers
    $ cat numawk
    BEGIN { print "Line numbers by awk" }
    { print NR ":", $0 }
    END { print "Done processing " FILENAME }
    $ cat testfile
    Hello There
    Goodbye!
    $
    $ awk -f numawk testfile
    Line numbers by awk
    1: Hello There
    2: Goodbye!
    Done processing testfile
    $

More Built-In awk Variables
• Two types: Informative and Configuration
• Configuration:
    FS  = Input field separator
    OFS = Output field separator
      (default for both is space " ")
    RS  = Input record separator
    ORS = Output record separator
      (default for both is newline "\n")
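
A sketch of the four configuration variables on small made-up inputs. The " | " and ";" separators are arbitrary choices for illustration. All of them are set in a BEGIN block so they take effect before the first record is read:

```shell
# FS and OFS: read comma-separated fields, write them back with a different separator.
swapped=$(printf 'Bill Jones,3333,M\nPam Smith,5555,F\n' |
    awk 'BEGIN { FS = ","; OFS = " | " } { print $2, $1 }')

# ORS: end each output record with ";" instead of a newline.
joined=$(printf 'a\nb\nc\n' | awk 'BEGIN { ORS = ";" } { print }')

# RS: treat ";" as the end of an input record instead of newline.
records=$(printf 'x;y;z' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')

printf '%s\n' "$swapped" "$joined" "$records"
```

OFS only appears where print has a comma between its arguments; fields joined by plain concatenation are not separated.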

Example #1: Reverse 2 columns
    $ cat switch
    BEGIN {FS="\t"}
    {print $2 "\t" $1}
    $ awk -f switch Simpsons
    34      Marge
    32      Homer
    10      Lisa
    11      Bart
    01      Maggie
    $
NOTE: Columns separated by tabs
• Alternatively you could do the following (quote the \t so the shell does not strip the backslash before awk sees it):
    $ awk -F'\t' '{print $2 "\t" $1}' Simpsons

Example #2: Sum a column
    $ cat awksum2
    BEGIN {
        FS="\t"
        sum = 0
    }
    {sum = sum + $2}
    END {
        print "Done"
        print "Total sum is " sum
    }
    $
    $ awk -f awksum2 Simpsons
    Done
    Total sum is 88
    $

Example #3: Comma delimited file
    $ cat names
    Bill Jones,3333,M
    Pam Smith,5555,F
    Sue Smith,4444,F
    $
    $ awk -F, '{print $2}' names
    3333
    5555
    4444
    $

Longer awk program
    $ cat awkprog
    BEGIN { print "Processing..." }
    # print number of fields in first line
    NR == 1 { print $0, NF, "fields"}
    /^Unix/ { print "Line starts with Unix: ", $0 }
    /Unix$/ { print "Line ends with Unix: " $0 }
    # finishing it up
    END {print NR " lines checked"}
    $

awk program execution
    $ cat datfile
    First Line
    Unix is great!
    What else is better?
    This is Unix
    Yes it is Unix
    Goodbye!
    $
    $ awk -f awkprog datfile
    Processing...
    First Line 2 fields
    Line starts with Unix:  Unix is great!
    Line ends with Unix: This is Unix
    Line ends with Unix: Yes it is Unix
    6 lines checked
    $
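
awk programming language syntax
    if ( found == 1 )        # if (expr)
        print "Found";       #     {action1}
    else                     # else
        print "Not found";   #     {action2}

    while ( i <= 100)        # while (cond)
    { i = i + 1;             # { actions...
      print i }              # }
• awk has no true or false keywords; use 1 and 0 (an unquoted word such as true is just an uninitialized variable)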

awk programming language syntax
    do                                # do
    { i = i + 1;                      # { actions ...
      print i }                       # }
    while ( i < 100);                 # while (cond);

    for (i=1; i < 10; i++ )           # for (set; test; incr)
    {                                 # {
      sqr = i * i;                    #   actions
      print i " squared is " sqr}     # }
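
The control structures above can be combined into one small runnable program. This is a sketch in which the loop bounds and messages are made up. It runs entirely inside BEGIN, so no input file is needed:

```shell
out=$(awk 'BEGIN {
    # for loop with an if/else inside it
    for (i = 1; i <= 3; i++) {
        sqr = i * i
        if (sqr % 2 == 0)
            print i " squared is " sqr " (even)"
        else
            print i " squared is " sqr " (odd)"
    }

    # while loop: n starts at 0 and is incremented until the test fails
    n = 0
    while (n < 2)
        n++
    print "n is " n
}')

printf '%s\n' "$out"
```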

awk – longer example
• Write an awk program that prints out the contents of a directory in the following format:

         BYTES    FILE
         24576    copyfile
           736    copyfile.c
           740    copyfile.c~
         24576    dirlist
           989    dirlist.c
           977    dirlist.c%
         24576    envadv
           185    envadv.c
         <dir>    tmp
           740    x.c

     Total: 78095 bytes in
     9 regular files

awk example - code
    $ cat awkprog
    BEGIN { print " BYTES \t FILE"; sum=0; filenum=0 }
    # test for lines starting with -
    /^-/ {
        sum += $5
        ++filenum
        printf ("%10d \t%s\n", $5, $9)
    }
    # test for directories - line starts with d
    /^d/ { print " <dir> \t", $9 }
    # conclusion
    END {
        print "\n Total: " sum " bytes in"
        print " " filenum " regular files"
    }
    $
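
The slides do not show how the program is run. Its patterns and field numbers ($5 for size, $9 for name) suggest it reads the output of ls -l, presumably as ls -l | awk -f awkprog. The sketch below assumes that layout and uses a few canned, made-up listing lines so the result is reproducible. The summary line is simplified and the dirs counter is an addition for illustration.

```shell
# Hypothetical ls -l style listing; real column layouts can differ between systems.
listing='total 56
-rwxr-xr-x 1 small000 faculty 24576 Jun  6 13:16 copyfile
-rw-r--r-- 1 small000 faculty   736 Jun  6 13:10 copyfile.c
drwxr-xr-x 2 small000 faculty   512 Jun  5 09:00 tmp'

out=$(printf '%s\n' "$listing" | awk '
/^-/ { sum += $5; filenum++ }     # regular files: add up the size column
/^d/ { dirs++ }                   # directories: just count them
END  { print sum " bytes in " filenum " regular files; directories: " dirs }
')

printf '%s\n' "$out"
```

The leading "total" line starts with neither - nor d, so no rule matches it. File names containing spaces would spill past $9, a limitation this sketch shares with the slide's program.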