10 The Awk Programming Language
Mauro Jaskelioff (originally by Gail Hopkins)
Introduction
• What is awk?
• Command line syntax
• Patterns and procedures
• Commands
• Variables
• Built-in variables
• Variable assignment
• Arrays
• Defining functions
What is awk?
• A pattern-matching program for processing files
• There are different versions of awk:
  • awk - the original version, sometimes called old awk, or oawk
  • New awk - additional features added in 1984; often called nawk
  • GNU awk (gawk) - has even more features
• The version installed on unnc-cslinux is GNU awk 3.1.3
What does awk do?
• A text file is thought of as being made up of records and fields
• On this file you can:
  • Do arithmetic and string operations
  • Use loops and conditionals (if-then-else)
  • Produce formatted reports
What does awk do? (2)
• nawk (new awk) also allows you to:
  • Execute UNIX commands from within a script
  • Process the output from UNIX commands
  • Work with multiple input streams
  • Define functions
What does awk do? (3)
• awk can also be combined with sed and shell scripting!
• Shell scripts are easy and quick to write, but the shell on its own lacks functionality for processing text
• sed, awk and the shell are designed to be integrated
• Simply invoke the sed or awk interpreter from within the shell script, rather than from the command line!
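
A minimal sketch of this integration, assuming a hypothetical shell function named sum_column and made-up input: sed removes comment lines, then awk sums the second column.

```shell
# sum_column is a hypothetical helper, not something from the slides.
# It reads standard input, drops lines starting with "#" using sed,
# and lets awk add up the second whitespace-separated field.
sum_column() {
    sed '/^#/d' | awk '{ total += $2 } END { print total }'
}

# Made-up sample data: one comment line and two data lines.
total=$(printf '# name value\na 2\nb 3\n' | sum_column)
echo "$total"
```

The shell supplies the plumbing (functions, pipes, command substitution), while sed and awk do the text processing.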
awk Command Line Syntax
• From the command line, you can invoke awk in two ways:
  • awk [options] 'script' var=value file(s)
    • Here, the script is given directly on the command line
  • awk [options] -f scriptfile var=value file(s)
    • Here, the script is stored in scriptfile and specified with the -f flag
    • nawk allows you to specify more than one scriptfile at a time (-f scriptfile1 -f scriptfile2, etc.)
awk Command Line Syntax - assigning values to variables
• You can assign a value to a variable on the command line (nawk only)
• This value can be one of three things:
  • A literal, e.g. count=5
      awk -f scriptFile count=5
  • A shell variable, e.g. $count
      awk -f scriptFile count=$count
  • A command substitution, e.g. `cmd`
      awk -f scriptFile count=`who | wc -l`
• A value assigned this way is ONLY available after the BEGIN block of the script has been executed
• To make the value available to the BEGIN block, use -v:
      awk -v count=5 -f scriptFile
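
The difference between the two forms of assignment can be checked with a small experiment. This is only a sketch: the inline script and the two-line input are made up for illustration.

```shell
# Inline script that reports what BEGIN and END can see in "count".
script='BEGIN { print "BEGIN sees [" count "]" } END { print "END sees [" count "]" }'

# count=5 given as an operand: assigned only after BEGIN has run.
late=$(printf 'a\nb\n' | awk "$script" count=5)

# count=5 given with -v: assigned before BEGIN runs.
early=$(printf 'a\nb\n' | awk -v count=5 "$script")

printf '%s\n' "$late" "$early"
```

With the operand form, BEGIN sees an empty value and END sees 5; with -v, both see 5.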
awk Command Line Syntax - giving awk a file to operate on
• awk operates on one or more files
• You do not have to give awk any files to operate on
  • Either don't specify one
  • Or name standard input explicitly using '-'
      awk -f scriptFile -
• If you don't give awk a file to operate on, it takes input from STDIN
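
A short sketch of both ways of reading STDIN, using made-up input piped in with printf:

```shell
# No file operand: awk reads standard input.
implicit=$(printf 'one\ntwo\n' | awk '{ print NR ": " $0 }')

# "-" as the file operand: standard input is named explicitly.
explicit=$(printf 'one\ntwo\n' | awk '{ print NR ": " $0 }' -)

printf '%s\n' "$implicit" "$explicit"
```

Both commands number the same two input lines, so the two results are identical.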
awk Command Line Syntax - Field separators
• You can set a field separator
  • In other words, a symbol (or even a regular expression in nawk) that should appear between fields of a record
• Do this using -F
  • E.g. awk -F';' -f scriptFile count=5 myFile
  • Would look for fields in a record (or line) in myFile separated by a semi-colon
  • Also awk -f scriptFile FS=';' count=5 myFile
• Fields are referred to by the variables $1, $2, etc.
• $0 means the whole record
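
The following sketch compares -F, an FS assignment on the command line, and a regular expression separator. The sample records are invented for illustration.

```shell
record='alice;30;Nottingham'

# Separator set with -F.
with_flag=$(printf '%s\n' "$record" | awk -F';' '{ print $2 }')

# Separator set by assigning FS as an operand before the input ("-" is STDIN).
with_assign=$(printf '%s\n' "$record" | awk '{ print $2 }' FS=';' -)

# Regular expression separator: one or more semi-colons or commas.
with_regex=$(printf 'a;;b,c\n' | awk -F'[;,]+' '{ print NF ": " $3 }')

printf '%s\n' "$with_flag" "$with_assign" "$with_regex"
```

The first two commands print the second field (30); the third treats ";;" as a single separator and so finds three fields.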
Field Separators - example
    $ head /etc/passwd
    root:x:0:0:root:/root:/bin/bash
    rootnir:x:0:0:root:/root:/bin/bash
    bin:x:1:1:bin:/bin:/sbin/nologin
    daemon:x:2:2:daemon:/sbin:/sbin/nologin
    adm:x:3:4:adm:/var/adm:/sbin/nologin
    lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
    sync:x:5:0:sync:/sbin:/bin/sync
    shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
    halt:x:7:0:halt:/sbin:/sbin/halt
    mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
• Suppose you want to extract and print the first three (colon-separated) fields of each record in /etc/passwd, on separate lines
Field Separators - example (2)
    $ awk -F: '{print $1; print $2; print $3}' /etc/passwd
    root
    x
    0
    rootnir
    x
    0
    bin
    x
    1
    daemon
    x
    2
    adm
    x
    3
    …
• -F: - look for fields separated by a colon
• {print $1; print $2; print $3} - print the first ($1), second ($2) and third ($3) field
• /etc/passwd - look in the file /etc/passwd
Patterns and Procedures
• awk scripts consist of patterns and procedures:
    awk -F: '/^...:/ {print $1}' /etc/passwd
  • Pattern: /^...:/
  • Procedure: {print $1}
• Either the pattern or the procedure may be omitted
  • If the pattern is missing, the procedure applies to all lines
  • If the procedure is missing, the line matched by the pattern is printed
Patterns
A pattern can be:
• /regular expression/
  • Use the metacharacters we have already seen
  • ^ and $ mean the beginning and end of the string being matched (the whole record by default, or a single field when matching against one), NOT necessarily the beginning/end of a line
      awk -F: '/^...:/ {print $1}' /etc/passwd
• Relational expression
  • Use relational operators, e.g. $1 > $2
  • Can do numeric or string comparisons
      awk -F: '$1=="gdm" {print $0}' /etc/passwd
Patterns (2)
• Pattern-matching expression
  • E.g. quoted strings, numbers, operators, defined variables…
  • ~ means match, !~ means don't match
      awk -F: '$1 ~ /.dm.*/ {print $0}' /etc/passwd
• BEGIN
  • Specifies procedures that take place before the first input line is processed
      awk 'BEGIN {print "Version 1.0"}' dataFile
• END
  • Specifies procedures that take place after the last input record is read
      awk 'END {print "end of data"}' dataFile
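
The pattern types above can be combined in one script. The sketch below uses a few made-up passwd-style records rather than the real /etc/passwd; the counters matched and high are names chosen for this example.

```shell
# Made-up records in /etc/passwd style: name:password:uid:gid
data='root:x:0:0
gdm:x:42:42
adm:x:3:4'

report=$(printf '%s\n' "$data" | awk -F: '
BEGIN       { print "users ending in dm:" }          # runs before any input
$1 ~ /dm$/  { print "  " $1; matched++ }             # pattern-matching expression
$3 > 10     { high++ }                               # relational expression (numeric)
END         { print matched " matched, " high " with uid > 10" }')

printf '%s\n' "$report"
```

BEGIN prints the heading once, the two middle rules run for every record that satisfies their pattern, and END prints the totals after the last record.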
Procedures
• Consist of one or more:
  • Commands
  • Functions
  • Variable assignments
• These are separated by newlines or semi-colons and are contained within curly brackets { }
Commands used with Procedures
• There are 5 types of commands:
  • Assignments of variables or arrays
  • Commands that print
  • Built-in functions
  • Control-flow commands
  • User-defined functions (in nawk only)
Some Examples using Patterns and Procedures
    awk -F: '{print $1}' /etc/passwd
• print the first field of each line in /etc/passwd
    awk '/root/' /etc/passwd
• print all lines in /etc/passwd that contain the pattern "root"
    awk -F: '/root/ {print $1}' /etc/passwd
• print the first field of lines that contain "root" in /etc/passwd
    awk '{print NR}'
• print the number of each record as it is read (here from STDIN, since no file is given)
awk Built-in Variables
• awk has a number of built-in variables:
  • FILENAME - current filename
  • FS - Field separator
  • NF - Number of fields in current record
  • NR - Number of current record
  • RS - Record separator
  • $0 - Entire input record
  • $n - nth field in current record
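
A brief sketch of some of these variables in use, with invented input. The first command prints the record number, the field count and the last field of each record; the second changes RS so that records end at semi-colons instead of newlines.

```shell
# NR = record number, NF = number of fields, $NF = last field.
fields=$(printf 'a b c\nd e\n' | awk '{ print NR, NF, $NF }')

# RS set in BEGIN so that it applies before the first record is read.
records=$(printf 'x;y;z' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')

printf '%s\n' "$fields" "$records"
```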
Variable Assignments
• Assign variables with an =, e.g.:
  • FS = ":"
  • var1 = count+2
  • var2 = max-min
  • var3 = 2 < 3 ? 4 : 5
• Access variables using just the name
  • {print var3}
  • What's the result?
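
One way to check the assignments above is to run them inside a BEGIN block, which needs no input file. The starting values of count, max and min are assumptions made for this sketch.

```shell
result=$(awk 'BEGIN {
    count = 3; max = 10; min = 4      # assumed starting values
    var1 = count + 2                  # arithmetic on a variable
    var2 = max - min
    var3 = 2 < 3 ? 4 : 5              # conditional: 2 < 3 is true, so 4
    print var1, var2, var3
}')
echo "$result"
```

Because 2 < 3 is true, the conditional expression yields its first alternative, so var3 is 4.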
Arrays in awk
• awk has arrays with elements subscripted with strings (associative arrays)
• Assign arrays in one of two ways:
  • Name them in an assignment statement
      myArray[i]=n++
  • Use the split() function
      n=split(input, words, " ")
Reading elements in an array
• Using a for loop:
    for (item in array) print array[item]
• Using the operator in:
    if (key in array) ...
  • …use this to see if an index exists (nawk)
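
A small sketch that puts split(), an associative array, the for loop and the in operator together. The input line and the array name seen are made up for illustration. Note that the order in which a for (key in array) loop visits elements is not specified, so the example only counts the keys.

```shell
counts=$(printf 'apple pear apple\n' | awk '{
    n = split($0, words, " ")                    # words[1..n] hold the fields
    for (i = 1; i <= n; i++) seen[words[i]]++    # string-subscripted counters
    for (k in seen) distinct++                   # visit every index once
    if ("apple" in seen) print "apple:", seen["apple"]
    if (!("plum" in seen)) print "plum: absent"  # test without creating the element
    print "words:", n, "distinct:", distinct
}')
printf '%s\n' "$counts"
```

Testing with in does not create the element, whereas referring to seen["plum"] directly would add an empty entry to the array.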
Defining Functions in awk
• You can define your own functions in awk, in much the same way as you define a function in C or Java
• Thus code that is to be repeated can be grouped together inside a function
  • Allows code reuse!
• NOTE: when calling a function you have defined yourself, no space is allowed between the function name and the opening parenthesis
An Example using a Function and an Array
    # capitalise the first letter of each word in a string
    function capitalise(input) {
        result = ""
        n = split(input, words, " ")
        for (i = 1; i <= n; i++) {
            w = words[i]
            w = toupper(substr(w, 1, 1)) substr(w, 2)
            if (i > 1)
                result = result " "
            result = result w
        }
        return result
    }

    # this is the main program
    {
        print capitalise($0)
    }
Break-down of Example
    # capitalise the first letter of each word in a string
    function capitalise(input) {
    …
• input is the variable to be used in the function - it contains whatever the caller called the function with
Break-down of Example (2)
    …
    result = ""
    n = split(input, words, " ")
    …
• Set result to be an empty string
• Take the input and split it up into the array "words" - divide the input wherever there is a space
• n is the result returned by the split command and contains the number of elements in the array "words"
Break-down of Example (3)
    …
    for (i = 1; i <= n; i++) {
        w = words[i]
        w = toupper(substr(w, 1, 1)) substr(w, 2)
        if (i > 1)
            result = result " "
        result = result w
    }
    return result
    }
    …
• For each element of the array from 1 to the number of elements…
• Assign the element to w
• Take the substring which starts at the first character and has a length of 1 and capitalise it using toupper()
• Take the remainder of the string starting at the 2nd character and append it to the capitalised character
• For every word after the first, tag a space on to the end of the result string
• Tag the next word on to the end of the result string
Break-down of Example (4)
    …
    # this is the main program
    {
        print capitalise($0)
    }
• The line starting with # is a comment in awk
• Call the capitalise function with the entire input record. Print the result.
Output from Example
• Given the input file:
    In theory there is no difference between theory and practice, but in practice there is
• …our capitalise function will output:
    In Theory There Is No Difference Between Theory And Practice, But In Practice There Is
Summary
• An introduction to awk
• Using awk patterns and procedures on the command line
• Writing awk scripts