Lecture 10: Introduction to AWK
COP 3344 Introduction to UNIX
What is AWK
• An important early text-manipulation language
• Created by Al Aho, Peter Weinberger, and Brian Kernighan
• This Unix utility manipulates text files that are viewed as arranged in columns
• awk reads its input (standard input or a set of files) one line at a time, splits each line into fields, and processes it
• Fields are separated by whitespace by default, but any specified character can serve as the field separator
• Other flavors of awk exist, such as nawk and gawk
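To make the field-splitting idea concrete, here is a minimal sketch. The sample records are invented for illustration and are not part of the lecture.

```shell
# Hypothetical sample data; not from the lecture.
# Default splitting: runs of blanks and tabs separate the fields.
ws_out=$(printf 'alice   30\tOrlando\n' | awk '{ print $2 }')
echo "$ws_out"

# The same record with ":" between fields, selected with -F.
colon_out=$(printf 'alice:30:Orlando\n' | awk -F: '{ print $3 }')
echo "$colon_out"
```

The first command prints the second whitespace-separated field; the second prints the third colon-separated field.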
Awk Command Structure
• awk [options] 'program' [file(s)]
• awk [options] -f programfile [file(s)]
• A program consists of one or more pairs of the following form:
• pattern { procedure }
• BEGIN and END constructs can also be used
• An important option is -Fc, where c is the field separator to use. For example, awk -F: . . . indicates that the separator is ":"
• Example
• awk -F: '/this/ { print $2 }' file1
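The two invocation forms can be compared side by side. The sketch below assumes a scratch directory created with mktemp; the file names file1 and prog.awk and the data are hypothetical.

```shell
# Hypothetical file names and data, used only for illustration.
tmpdir=$(mktemp -d)
printf 'x:this one:1\ny:that one:2\nz:this too:3\n' > "$tmpdir/file1"

# Form 1: the program is given inline, inside single quotes.
inline_out=$(awk -F: '/this/ { print $2 }' "$tmpdir/file1")

# Form 2: the same program is stored in a file and loaded with -f.
printf '%s\n' '/this/ { print $2 }' > "$tmpdir/prog.awk"
file_out=$(awk -F: -f "$tmpdir/prog.awk" "$tmpdir/file1")

echo "$inline_out"
echo "$file_out"
rm -rf "$tmpdir"
```

Both forms should produce the same output, since only the way the program text is supplied differs.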
Awk Program Processing
• awk scans each input line for each pattern, and when a match occurs the associated actions defined by the procedure are executed. The general form of a program is:
• BEGIN { initial statements }
• pattern { procedure }
• pattern { procedure }
• END { final statements }
• If the pattern is missing, the procedure is applied to every line
• If the procedure is missing, the matched lines are written to standard output
• Fields are referred to by the variables $1, $2, …, $n. $0 refers to the entire record (the line).
• The BEGIN statements run before any input line is processed; the END statements run after all input lines have been processed.
• Many programs contain only one pattern { procedure } pair
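The processing order described above can be observed with a small sketch. The three input lines are made up for illustration.

```shell
# Hypothetical three-line input; not from the lecture.
order_out=$(printf 'one\ntwo\nthree\n' | awk '
BEGIN { print "start" }        # runs before any input is read
/t/   { print "match:", $0 }   # runs for each line containing t
END   { print "lines:", NR }   # runs after the last line
')
echo "$order_out"

# A pattern with no procedure prints every matching line unchanged.
missing_proc_out=$(printf 'one\ntwo\nthree\n' | awk '/t/')
echo "$missing_proc_out"
```

The first program prints "start" before any match lines and the line count last; the second shows the default action when the procedure is omitted.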
awk patterns
• awk patterns can take the following forms:
• /regular expression/
• relational expression
• field-matching expression
• Example patterns
• /this/
• /^alpha*/
• NF > 2
• $1 == $2
• $1 ~ /m$/
Example pattern-procedures
• Print the second field of each line
{ print $2 }
• Print the first field of all lines that contain the pattern alpha
/alpha/ { print $1 }
• Print all records containing more than two fields
NF > 2
• Add the numbers in the second column when the first field matches the word "add"
$1 ~ /^add$/ { total += $2 }
END { print "total is", total }
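Two of the pattern-procedures above can be run against sample input as follows. The input lines are hypothetical and chosen so that one line ("adder 1") shows why the pattern is anchored with ^ and $.

```shell
# Hypothetical input; "adder" is included to show that /^add$/ requires
# the first field to be exactly "add".
total_out=$(printf 'add 5\nskip 100\nadd 7\nadder 1\n' | awk '
$1 ~ /^add$/ { total += $2 }
END { print "total is", total }
')
echo "$total_out"

# Pattern with no procedure: print records with more than two fields.
nf_out=$(printf 'a b\na b c\n' | awk 'NF > 2')
echo "$nf_out"
```

Only the two lines whose first field is exactly "add" contribute to the total.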
awk Regular Expressions
• Regular expressions are formed in the same way as they are for extended grep. All the operators are available
• Note that regular expressions must be placed between slashes: /<regular expression>/
• Examples
• /D[Rr]\./ # matches any line containing DR. or Dr.
• /^alpha/ # matches any line starting with alpha
• /^[a-zA-Z]+/ # matches any line starting with a sequence of one or more letters
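The three example expressions can be tried on a small sample. The five lines below are invented for illustration.

```shell
# Hypothetical sample lines; not from the lecture.
sample='DR. Smith
Dr. Jones
Drew Lee
alpha beta
42 alpha'

# Lines containing DR. or Dr. (the backslash makes the dot literal).
dr_out=$(printf '%s\n' "$sample" | awk '/D[Rr]\./')
echo "$dr_out"

# Lines starting with alpha.
alpha_out=$(printf '%s\n' "$sample" | awk '/^alpha/')
echo "$alpha_out"

# Count of lines that start with one or more letters.
letters_out=$(printf '%s\n' "$sample" | awk '/^[a-zA-Z]+/ { n++ } END { print n }')
echo "$letters_out"
```

"Drew Lee" is not selected by the first pattern because the character after "Dr" is not a literal dot, and "42 alpha" is not selected by the other two because it starts with a digit.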
awk Relational Expressions
• Relational expressions can consist of strings, numbers, arithmetic and string operators, relational operators, defined variables, and predefined variables.
• $1, …, $n are the fields of the record
• $0 is the entire line
• NF is the number of fields in the current line
• NR is the number of the current line
• FS is the field separator
• FILENAME is the current filename
• Many relational operators are available, and expressions can be combined with && and ||
• NF > 5 && $1 == $2
• /while/ || /do/
• Note: variables can be assigned with the "=" operator
• FS = ","
• total = 5
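A short sketch of the predefined variables in use. The comma-separated data and the temporary file are assumptions made for the example.

```shell
# Hypothetical comma-separated data written to a scratch file.
tmpfile=$(mktemp)
printf 'a,a,1,2,3,4\nb,c,1\nd,d,1,2,3,4\n' > "$tmpfile"

# FS is assigned in BEGIN so it applies from the first line on.
# The pattern selects lines with more than five fields whose first
# two fields are equal; the procedure prints NR, NF and $1.
rel_out=$(awk 'BEGIN { FS = "," } NF > 5 && $1 == $2 { print NR, NF, $1 }' "$tmpfile")
echo "$rel_out"

# FILENAME holds the name of the file currently being read.
fname_out=$(awk 'NR == 1 { print FILENAME }' "$tmpfile")
echo "$fname_out"
rm -f "$tmpfile"
```

Setting FS inside BEGIN has the same effect here as passing -F, on the command line.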
awk field matching expressions
• Field-matching expressions check whether a regular expression matches ("~") or does not match ("!~") a field.
• Examples
• $1 ~ /D[Rr]\./ # first field contains DR. or Dr.
• $1 !~ /From/ # first field does not contain From
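Both operators can be checked on a few sample lines. The input below is hypothetical.

```shell
# Hypothetical input lines; not from the lecture.
mail='From alice
Dr. Bob
To carol'

# Print the second field when the first field contains DR. or Dr.
match_out=$(printf '%s\n' "$mail" | awk '$1 ~ /D[Rr]\./ { print $2 }')
echo "$match_out"

# Print the first field of lines whose first field does not contain From.
nomatch_out=$(printf '%s\n' "$mail" | awk '$1 !~ /From/ { print $1 }')
echo "$nomatch_out"
```

Unlike a bare /regular expression/ pattern, which is tested against the whole line, these tests look only at the named field.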
awk procedures
• An awk procedure specifies the processing of a line that matches a given pattern. A procedure is enclosed in "{" and "}" and consists of statements separated by semicolons or newlines.
• awk is a full programming language and contains control statements (such as do while, for, if, break, and continue)
• Note that BEGIN can be used to initialize variables and END can be used for post-processing after all records have been processed
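As an illustration of control statements inside a procedure, the sketch below sums the fields of each line with a for loop and classifies the result with if/else. The input numbers and the threshold 7 are arbitrary choices for the example.

```shell
# Hypothetical numeric input; the threshold 7 is arbitrary.
ctl_out=$(printf '3 1 4\n1 5\n' | awk '
{
    sum = 0
    for (i = 1; i <= NF; i++) {   # loop over every field on the line
        sum += $i
    }
    if (sum > 7) {
        print NR ": big", sum
    } else {
        print NR ": small", sum
    }
}')
echo "$ctl_out"
```

Because the procedure has no pattern, it runs for every input line, and sum is reset at the start of each one.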
awk examples
• # print the second and first fields of each line that contains the string this
awk '/this/ { print $2, $1 }' file1
• # sum the values in the second column for lines that contain add, and print the final sum
awk 'BEGIN { sum=0 } /add/ { sum += $2 } \
END { print sum }' file2
• # illustrating if statements and the or operator
awk '/green/ || /yellow/ \
{ if ($1=="green") print $1 ; \
else if ($1=="yellow") print "SLOW DOWN"; }' \
file3
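The last example can be tried without a file3 by piping in sample lines. The input below is invented, and the program is laid out with braces for readability; it is a sketch of the same logic rather than the lecture's exact text.

```shell
# Hypothetical input standing in for file3.
light_out=$(printf 'green go\nyellow wait\nred stop\nthe green door\n' | awk '
/green/ || /yellow/ {
    if ($1 == "green") {
        print $1
    } else if ($1 == "yellow") {
        print "SLOW DOWN"
    }
}')
echo "$light_out"
```

The line "the green door" is selected by the pattern, which tests the whole line, but prints nothing because its first field is neither green nor yellow.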