280 likes | 426 Views
Unix and Software Tools (P51UST) Awk Programming. Ruibin Bai (Room AB326) Division of Computer Science The University of Nottingham Ningbo, China. What is awk?. A pattern matching program for processing texts, initially implemented in 1977.
E N D
Unix and Software Tools (P51UST) Awk Programming Ruibin Bai (Room AB326) Division of Computer Science The University of Nottingham Ningbo, China P51UST: Unix and Software Tools
What is awk? • A pattern matching program for processing texts, initially implemented in 1977. • The name AWK is derived from family names of its authors: Alfred Aho, Peter Weinberger and Brain Kerinighan • There are different versions of awk: • awk - the original version, sometimes called old awk, or oawk • New awk - additional features added in 1985. Often called nawk • GNU awk (gawk)- has even more features
What does awk do? • A text file is thought of as being made up of records and fields • On this file you can: • Do arithmetic and string operations • Use loops and conditionals (if-then-else) • Produce formatted reports
What does awk do? (2) • awk (new awk) also allows you to: • Execute UNIX commands from within a script • Process the output from UNIX commands • Work with multiple input streams • Define functions
What does awk do? (3) • awk can also be combined with shell scripting! • Shell is very easy and quick to write, but it lacks functionality. • awk and shell are designed to be integrated • Simply invoke the awk interpreter from within the shell script, rather than from the command line!
Awk Syntax The awk command has the following syntax awk [-F field_sep] ‘program’ target-files(s) Or awk[-F field_sep]–f program.file target-file(s) P51UST: Unix and Software Tools
Awk Syntax awk [-F field_sep] ‘program’ target-files(s) • The program is one or more awk programming commands, typed in at command-line, enclosed by single quotes. • target-files is one or more of the input files the command is to process. • Option –F : allows you to change awk’s field separator. Default field separator is white space (one or more spaces or tabs). P51UST: Unix and Software Tools
Awk Syntax awk[-F field_sep]–f program.file target-file(s) • Option –f specifies that the filename that follows contains the awk programming commands. • Awk takes its instructions from that file rather than from the command line. • target-files is one or more of the input files the command is to process. • Using –f option is preferred • More efficient to debug, modify and enhance your awk programming commands. • You could use this awk script again in future • Easier to manage if the program grown over time. P51UST: Unix and Software Tools
Example – First Taste of awk • When type in the following command in a terminal $ cat fingr.txt zliyccj Cai Jiangliang pts/4 1d Apr 8 10:51 zliyccj Cai Jiangliang *pts/17 15:12 Apr 8 19:11 zliyccj2 Chen Jianjun *pts/6 13:48 Apr 8 21:29 zliyccj2 Chen Jianjun pts/30 14:47 Apr 8 20:31 zliyccl Cao Lizhou pts/98 20:09 Apr 8 15:08 zliychj2 He Jiansen pts/19 17:04 Apr 8 18:14 zliychl Huang Lun *pts/26 14:23 Apr 8 18:04 ... • Task: extract the username field? P51UST: Unix and Software Tools
Example – First Taste of awk • Field separator: white space and tabs, default • Target-file: fingr.txt • Commands: awk ‘{print $1}’ fingr.txt Or awk -F “ ” '{print $1}' fingr.txt P51UST: Unix and Software Tools
Awk Variables • Some common awk variables P51UST: Unix and Software Tools
Awk Program Format • General Format pattern {action} • A pattern selects which of the lines from the file or files, if any, are acted upon. • Patterns can be relational expressions or regular expressions, or others. • An action is defined as one or more awk commands enclosed in a pair of curly brackes ‘{ }’. P51UST: Unix and Software Tools
Awk Program Format Program 1 Program 2 • The action associated with a particular pattern must begin on the same line as the pattern with which it is associated. • Program 1 and Program 2 performs very differently. pattern { action 1 action 2 } pattern { action 1 action 2 } P51UST: Unix and Software Tools
Awk Program Format • Patterns without associated actions will print the lines that are matched, while actions without patterns will be applied on every line. • For program 1, the actions will be performed only on lines that match the pattern. • In program 2, each line that matches the pattern would be displayed and the actions would be performed on every line. pattern { action 1 action 2 } pattern { action 1 action 2 } P51UST: Unix and Software Tools
Patterns (1) • Relational expressions $1 < 4 {action} $1 <= 4 {action} $1 == “bac” {action} ($1 < $2 && $3 > $1) || $4 != “abc” {action} • The tilde (~) $1 ~ /^z/ {action} Note: Regular expression must be enclosed in forward slashes ‘/’ P51UST: Unix and Software Tools
Example (($1 < $2 && $3 > $1) && $4 !=“frog”) || $6 >0 Data: 1 4 0 toad frog 0 3 7 9 frog salamander 0 3 7 9 fish salamander 0 0 0 1 cricket fish 9 5 1 3 toad spider -4 0 0 0 wasp bee 1 Which of these lines do you predict will match? P51UST: Unix and Software Tools
Patterns (2) • Regular expressions • You should know this already • A regular expression should be enclosed in forward slashes awk ‘/6$/ {print $0}’ fingr.txt P51UST: Unix and Software Tools
Patterns (2) • Regular expressions • You should know this already • A regular expression should be enclosed in forward slashes awk ‘/6$/ {print $0}’ fingr.txt P51UST: Unix and Software Tools
Pattern (3) • Special Patterns: BEGIN, END BEGIN: • Is the first pattern in an awk script • The associated actions will be performed before the target file is opened. • The open curly bracket must appear on the same line as the BEGIN • A good place to print out headers, initilise counters, and set the field separator for the target file. • Do not try to manipulate target files in the associated actions. P51UST: Unix and Software Tools
Pattern -BEGIN BEGIN { FS=“:” OFS=“ ” count=0 print (“This is a cool heading”) } P51UST: Unix and Software Tools
Pattern (4) • Special Patterns: END • Is the last pattern in an awk script • The associated actions will be performed after the target file is successfully closed. • The open curly bracket must appear on the same line as the END • Useful for putting trailers on files, summing up the data presented, and doing other end-of-file housekeeping tasks. P51UST: Unix and Software Tools
Patterns – Putting Things Together BEGIN { print ("Here come the lines from the input file:") print " " } {print $0} END { print " " print "There were "NR" lines in the file "FILENAME"." } P51UST: Unix and Software Tools
Examples BEGIN { FS="\n" RS="" #empty line as a record separator } { print $0 } END{ print "total number of fields is " NF } P51UST: Unix and Software Tools
Next Lecture • Awk commands • Loops and conditionals • Arrays • Functions P51UST: Unix and Software Tools