CIS 218 – Advanced UNIX (g)awk
Overview • awk is a programming language • awk uses syntax based on grep and sed for handling numbers and text • awk provides field-level addressability, and addressability within a field (word) using substring commands • awk reads its input line by line and works on each line field by field
awk command syntax • There are two ways to execute an awk program/script: • awk [-F field-separator] 'program' target-file • awk [-F field-separator] -f program.file target-file • From our discussion of sed, and Refrigerator Rule No. 5, I would hope you are firmly committed to the second form!
awk Variables • There are a number of awk variables that are very useful • FS (The field separator, defaults to white space) • OFS (Output field separator, can be critical) • NR (Number of records, a sequential counter) • NF (Number of fields in the current record) • FILENAME (Name of the current target file)
awk Variables (cont.) • $0 (The entire line as read from the target file) • $n (The nth field in the record. This is how we get field-level addressability in awk) • nawk, gawk, etc. give us more variables; the two most significant are: • ARGC (the count of the command line arguments) • ARGV (an array of the command line arguments)
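A minimal sketch of the variables above in use. The colon-separated sample data is made up for illustration; -F sets FS from the command line, and OFS is set in a BEGIN action.

```shell
# Hypothetical sample data, two records with three colon-separated fields each.
result=$(printf 'root:0:/root\nalice:1000:/home/alice\n' |
  awk -F: 'BEGIN { OFS = " -> " }   # OFS goes between comma-separated print items
           { print NR, NF, $1, $3 }')
printf '%s\n' "$result"
```

Each output line shows the record number, the field count, and the first and third fields, joined by the output field separator.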
Parts of a program • All programs are composed of one or more of the following three constructs: • sequence (a series of instructions, one following the next, executed sequentially) • selection (the ability of the code to decide which instructions to execute, conditional execution) • iteration (adding looping so that selected code will be repeated over and over)
awk Program Format • Awk programs are composed of pattern {action} pairs (actions must be enclosed in French braces {} ) • a pattern without a corresponding action takes the default action, print $0 • an action without a corresponding pattern is applied to every line • each input line is submitted to every pattern/action pair
awk Program Format (cont.) • Placement of the open French brace is critical • pattern { action 1; action 2 } with the open brace on the same line as the pattern: both actions are executed for lines matching the pattern • pattern on a line by itself, with { action 1; action 2 } starting on the next line: lines matching the pattern are printed, and both actions are performed on every line!
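The effect of brace placement can be sketched with a small, made-up data set. The variable names (data, same_line, next_line) are only for this illustration.

```shell
# Hypothetical three-line input: a name and a count.
data='apple 3
banana 7
cherry 12'

# Brace on the pattern's line: the action runs only for matching lines.
same_line=$(printf '%s\n' "$data" | awk '$2 > 5 { print "big:", $1 }')

# Brace on the next line: the pattern stands alone (default action, print $0),
# and the braced action becomes a separate rule that runs for every line.
next_line=$(printf '%s\n' "$data" | awk '$2 > 5
{ print "big:", $1 }')

printf '%s\n---\n%s\n' "$same_line" "$next_line"
```

In the second form, "apple" is reported as big even though it does not match, and the matching lines are also echoed in full.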
Patterns • In an awk program, the pattern is the selection tool that decides what actions are applied to which lines. • Patterns can be: • relational expressions • regular expressions • magic patterns
Regular Expression patterns • Must be enclosed in slashes /RE/ • Anchors apply to the entire line if they are used as the only pattern • Remember, you can use regular expressions in relational patterns with ~ and !~ to apply them to fields • Both true regular expressions and fixed patterns can be used as REs in awk
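A short sketch of the difference between a bare /RE/ pattern, which is tested against the whole line, and ~ or !~, which test a single field. The log-style input is invented for the example.

```shell
# Hypothetical log lines; the first word is a severity level.
data='error disk full
warning error rate rising
info all good'

# Bare /RE/: matches "error" anywhere in the line.
whole_line=$(printf '%s\n' "$data" | awk '/error/ { print $1 }')

# ~ applies the RE (with its anchor) to field 1 only.
first_field=$(printf '%s\n' "$data" | awk '$1 ~ /^err/ { print $0 }')

# !~ selects lines whose first field does NOT match.
not_error=$(printf '%s\n' "$data" | awk '$1 !~ /error/ { print $1 }')

printf '%s\n---\n%s\n---\n%s\n' "$whole_line" "$first_field" "$not_error"
```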
Pre/Post Processing • There are two magic patterns in awk: • BEGIN {the action associated is performed before the target file is opened} • END {the action associated is performed after the target file is successfully closed} • Both are coded in UPPER CASE
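As a sketch, BEGIN is a natural place for a heading and END for a total. The numbers fed in here are made up.

```shell
result=$(printf '10\n20\n30\n' |
  awk 'BEGIN { print "Values" }       # runs once, before any input is read
       { total += $1 }                # runs for every input line
       END { print "Total:", total, "from", NR, "records" }')
printf '%s\n' "$result"
```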
# comments • Like most scripting languages, awk uses # to indicate a comment • awk scripts should be well documented • Comments should explain what you are doing and why.
print • The print command is the simple output tool for awk. Basically an "echo". • You can direct print to send its data to a file with the > operator • Generally print is used for simple output or debugging output
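A minimal sketch of print sending its data to a file with >. The temporary file and the variable name out are assumptions of this example; the file name is passed in with -v so the awk program itself stays generic.

```shell
tmpfile=$(mktemp)   # scratch file for the demonstration

# Write the first field of every line to the file named by the awk variable out.
printf 'alice 30\nbob 25\n' | awk -v out="$tmpfile" '{ print $1 > out }'

saved=$(cat "$tmpfile")
rm -f "$tmpfile"
printf '%s\n' "$saved"
```

Within one awk run, > truncates the file the first time it is opened and later prints to the same name are appended to it.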
printf • Similar in concept to the "C" language command. The format of a printf command is: printf("formatting string", variables) • The formatting specifiers in the string correspond to the variables one for one, in order. • Each formatting specifier is prefixed by %
printf (cont.) • The formatting specifiers contain the following characters: • - indicates that the data should be left justified • n indicates the minimum width of the field • .n indicates the maximum width of a string field (for a floating-point number, the number of digits after the decimal point) • "%-5s" indicates a string field, left justified, with a minimum width of 5 bytes
printf spacing characters • There are two characters available to change the spacing of your text: • \n inserts a newline character. You must use this if you want your output to occur on successive lines. • \t inserts a tab character
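The specifiers described above can be sketched on two made-up records: a left-justified string of minimum width 8, a number of minimum width 3, and a string truncated to 2 characters.

```shell
result=$(printf 'alice 30\nbob 7\n' |
  awk '{ printf("%-8s|%3d|%.2s\n", $1, $2, $1) }')   # \n ends each output line
printf '%s\n' "$result"
```

Without the \n in the format string, both records would run together on one output line.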
getline • getline is used to read from the keyboard • It can also capture the results of a command, using "command" | getline variable, but this form is seldom used • Read from the keyboard using getline variable < "/dev/tty" • If you don't supply a variable, awk will overwrite $0, so in most cases you want to use a variable.
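Keyboard input cannot be shown in a non-interactive example, so this sketch uses the command form instead; the keyboard form appears only as a comment. The echo command and the variable names are placeholders.

```shell
result=$(awk 'BEGIN {
  cmd = "echo hello from the shell"
  cmd | getline reply     # capture the first line of the command output
  close(cmd)              # release the pipe once we are done with it
  print "got:", reply
  # Keyboard form, for interactive use:  getline answer < "/dev/tty"
}')
printf '%s\n' "$result"
```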
rand() srand() • The rand() function generates pseudo-random numbers in the range 0 to 1 (0 <= n < 1). • Given the same seed, it will always generate the same series of numbers. • srand() is used to supply a new seed to rand(). • If you don't supply srand() a value, it uses the current time as the seed.
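A small sketch of seeding: two runs with the same seed should produce the same "dice rolls". The seed 42 and the scaling to 1 through 6 are arbitrary choices for the illustration.

```shell
# The same awk program text is run twice with a fixed seed.
roll='BEGIN { srand(42); print int(rand() * 6) + 1, int(rand() * 6) + 1 }'
first=$(awk "$roll")
second=$(awk "$roll")
printf '%s\n%s\n' "$first" "$second"
```

Calling srand() with no argument instead would seed from the clock, so runs would normally differ.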
system() • The system() function allows you to execute system commands within an awk script. • You must enclose the system command in quotation marks. • You cannot capture the output from the system() function within the script but you can capture the return code.
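As a sketch, the return code of system() can drive a decision inside the script. The directory /no/such/dir is a placeholder that is assumed not to exist.

```shell
result=$(awk 'BEGIN {
  rc = system("test -d /")              # return code 0: the root directory exists
  print (rc == 0 ? "found" : "missing")
  rc = system("test -d /no/such/dir")   # non-zero return code: test failed
  print (rc == 0 ? "found" : "missing")
}')
printf '%s\n' "$result"
```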
length() • The length([argument]) function returns the length of the argument in bytes. • If you give length() a number, it will return the number of digits in the number. • If you don’t give length() an argument, it will use $0 by default.
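The three cases above, sketched on one made-up input line:

```shell
result=$(printf 'hello world\n' |
  awk '{ print length("hello"), length(12345), length() }')   # string, number, $0
printf '%s\n' "$result"
```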
index() • The index(string,target) function returns the position of the first occurrence of the target within the string. • The index() function is often used to set the boundary for the substr() function.
substr() • The substr(string,start[,length]) function will return the part of the string beginning with start and continuing for length bytes. • If you don’t give it a length, it will return all the bytes between the start and the end of the string.
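A sketch of index() setting the boundary for substr(): splitting a made-up e-mail address at the @ sign. The variable names are illustrative only.

```shell
result=$(awk 'BEGIN {
  addr = "alice@example.com"
  at   = index(addr, "@")           # position of the first "@"
  user = substr(addr, 1, at - 1)    # everything before it
  host = substr(addr, at + 1)       # no length given: runs to the end of the string
  print at, user, host
}')
printf '%s\n' "$result"
```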
split() • You will use split(string, array[, separator]) to divide a string into parts using separator to parse them, storing the resultant parts in the array. • If you don’t code a separator, the function will use the field separator to parse the string.
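A brief sketch of split() with and without a separator, on an invented input line. split() returns the number of parts it stored, and the array is indexed from 1.

```shell
result=$(printf '2024-01-15 a b c\n' | awk '{
  n = split($1, parts, "-")    # explicit separator
  m = split($0, words)         # no separator: uses the field separator
  print n, parts[1], parts[3], m, words[4]
}')
printf '%s\n' "$result"
```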
if • Besides using patterns, if gives us another way to perform selection • The format of an if statement is if (condition) {verb(s)} [else { verb(s)}] • If you have more than one verb, they must be enclosed in French braces.
if • A sample if
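A minimal sketch of an if/else, using made-up names and scores and an assumed pass mark of 50. The if branch has two verbs and so needs braces; the else branch has one and does not.

```shell
result=$(printf 'alice 82\nbob 47\n' | awk '{
  if ($2 >= 50) {
    status = "pass"
    passed++
  } else
    status = "fail"
  print $1, status
}
END { print passed, "passed" }')
printf '%s\n' "$result"
```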
exit • exit stops the processing of input: the input file is closed • Control is transferred to the action associated with the END magic pattern if there is one • Generally used as a bailout in case of catastrophic errors
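A sketch of exit as a bailout, with an invented STOP marker in the input. The line after the marker is never read, but the END action still runs.

```shell
result=$(printf '1\n2\nSTOP\n3\n' |
  awk '$1 == "STOP" { exit }          # stop reading; control passes to END
       { sum += $1 }
       END { print "sum so far:", sum }')
printf '%s\n' "$result"
```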
for loop • This is a counted loop • executes until the counter reaches the target value • Increment (count up) or decrement (count down) • also works with the elements of an array • multiple verbs must be enclosed in { }
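The three uses of for listed above can be sketched as follows. The array seen and its keys are made up; the order in which for (key in array) visits elements is not defined, so the example only counts them.

```shell
result=$(awk 'BEGIN {
  for (i = 1; i <= 5; i++) total += i                  # count up
  for (i = 3; i >= 1; i--) countdown = countdown i     # count down
  seen["a"] = 1; seen["b"] = 1; seen["c"] = 1
  for (key in seen) count++                            # once per array element
  print total
  print countdown
  print count
}')
printf '%s\n' "$result"
```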
while loop • The while loop is an example of conditional execution • The loop cycles as long as the condition specified is true • A while loop always checks its condition before each pass, so the body may never execute at all • multiple verbs must be enclosed in { }
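A small sketch of a while loop with a braced body, followed by one whose condition is false from the start. The starting value 40 is arbitrary.

```shell
result=$(awk 'BEGIN {
  n = 40
  while (n > 5) {          # two verbs, so braces are required
    n = int(n / 2)
    steps++
  }
  print n, steps
  while (n > 100) never++  # false on the first check: the body runs zero times
  print never + 0
}')
printf '%s\n' "$result"
```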
do/while • Even though it has a while in it, this is an example of until logic. • Until logic is shunned by conscientious coders. • ‘nuff said
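For comparison, a sketch of what sets do/while apart: the condition is tested at the bottom of the loop, so the body runs once even when the condition is false at the outset.

```shell
result=$(awk 'BEGIN {
  n = 10
  do {
    runs++
  } while (n < 5)    # false, but only checked after the first pass
  print runs
}')
printf '%s\n' "$result"
```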
break • Used to exit from a loop • Control is passed to the line following the end of the loop • Causes an exit from the loop but NOT the awk script. If you want to bail out of the whole script, use the exit command.
continue • Causes awk to skip the rest of the body of the loop for the current value • In a for loop the counter is incremented, and the next cycle of the loop is started • In a while loop, the next iteration of the loop starts
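A sketch combining break and continue in one counted loop: even values are skipped, and the loop is abandoned once the counter passes 7.

```shell
result=$(awk 'BEGIN {
  for (i = 1; i <= 10; i++) {
    if (i % 2 == 0) continue   # skip the rest of the body; i++ still happens
    if (i > 7) break           # leave the loop, but not the script
    odds = odds i
  }
  print odds, i                # i keeps the value it had at the break
}')
printf '%s\n' "$result"
```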
next • Causes the script to stop processing the current line and start over at the first pattern/action pair • takes the next element from standard input or the target file • Like exit, this command affects the whole script
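A common use of next, sketched on made-up input: skip comment lines so that the remaining pattern/action pairs never see them.

```shell
result=$(printf '# header\nalice 30\n# note\nbob 25\n' |
  awk '/^#/ { next }                     # abandon this line, read the next one
       { count++; print count, $1 }')
printf '%s\n' "$result"
```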