Unix and Software Tools (P51UST)
Awk Programming (3)
Ruibin Bai (Room AB326)
Division of Computer Science
The University of Nottingham Ningbo, China
P51UST: Unix and Software Tools
Contents
• Missing things
• Reading input from a pipeline
• More string functions
• System variables
• Forcing variable types
• Arithmetic functions
Reading input from a pipe
• The UNIX "who am i" command gives output of the following form:
    $ who am i
    zlizrb1 pts/21 Apr 17 15:15 (pcname)
• This output can be piped to getline:
    "who am i" | getline
• Here, $0 is set to the output of the command and the line is parsed into fields, so that "zlizrb1" is put in field $1, "pts/21" in $2, etc.
• The system variable NF is set to the number of fields
Reading Input From a Pipe
• This script pipes the result of the "who am i" command to getline, which parses it into fields
• Field number 1 is assigned to the variable "name" and the field separator FS is set to ":"
    $ awk '
    BEGIN {
        "who am i" | getline
        name = $1
        FS = ":"
    }
    $1 == name { print $5 }
    ' /etc/passwd
• The script then tests whether the first field ($1) of each line in /etc/passwd is the same as the value stored in name (the fields in /etc/passwd are separated by a ":")
• If so, the 5th field of /etc/passwd is printed (it contains the corresponding user's full name)
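The field splitting described above can be checked with a small sketch. It assumes an echo command standing in for "who am i", so that the result does not depend on who is logged in; the sample text is taken from the earlier slide.

```shell
result=$(awk 'BEGIN {
    # "echo ..." is a stand-in for "who am i" (an assumption, for repeatable output)
    cmd = "echo zlizrb1 pts/21 Apr 17 15:15"
    if ((cmd | getline) > 0)      # getline returns 1 when a line was read
        print NF, $1, $2          # $0 was replaced and split into fields
    close(cmd)
}')
printf '%s\n' "$result"
```

This prints "5 zlizrb1 pts/21". Testing the return value of getline guards against a command that produces no output.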
Some Important Limitations
• There is a limit to the number of pipes and files that the system can have open at any one time
    • This limit varies from system to system
    • In many traditional implementations of awk, up to 10 open files are allowed
    • Use the close() function!
• Some other limits are:
    • Number of fields per record
    • Characters per input record
    • Characters per field
• See the awk manual page for more information
Using close() with Pipes and Files
• Why use close()?
    • So your program can open as many pipes and files as it needs without exceeding the system limit
    • It allows your program to run the same command twice
    • You may need close() to force an output pipe to finish its work:
        { print | "sort > myFile" }          # send every record to sort
        END {
            close("sort > myFile")           # sort finishes writing myFile
            while ((getline < "myFile") > 0)
                print                        # do more stuff with each sorted line
        }
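The last point can be tried out with the following sketch. It assumes a scratch file from mktemp in place of "myFile" and three made-up input lines; the variable names out, cmd and line are illustrative only.

```shell
out=$(mktemp)    # hypothetical scratch file standing in for "myFile"
result=$(printf 'pear\napple\nfig\n' | awk -v out="$out" '
BEGIN { cmd = "sort > " out }        # one string, reused for print and close()
{ print | cmd }                      # every input line goes to the sort pipe
END {
    close(cmd)                       # wait until sort has written the file
    while ((getline line < out) > 0)
        print "sorted:", line
    close(out)
}')
rm -f "$out"
printf '%s\n' "$result"
```

Storing the command in one variable helps because close() must be given exactly the same string that opened the pipe.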
Advanced String Functions (1)
• gsub(regex, s, str)
    • Globally substitutes s for each match of the regular expression regex in the string str
    • Returns the number of substitutions
    • If str is not supplied, $0 is used
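A minimal sketch of both forms of gsub(), with and without the third argument. The input line and the variable names n, m and s are made up for illustration.

```shell
result=$(printf 'the cat sat on the mat\n' | awk '{
    n = gsub(/at/, "og")          # no third argument, so $0 is edited in place
    s = "banana"
    m = gsub(/an/, "AN", s)       # third argument: only the variable s is edited
    print n, $0
    print m, s
}')
printf '%s\n' "$result"
```

The first line printed is "3 the cog sog on the mog" and the second is "2 bANANa", so the return value is the count of replacements in each case.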
Advanced String Functions (2)
• asort(src[,d])
    • Supported in gawk
    • Sorts the array src based on the element values
    • If d is specified, src is copied into d and d is sorted instead
    • Also replaces the indices with values from 1 to the number of elements in the array
Advanced String Functions (3)
• asorti(src[,d])
    • Supported in gawk
    • Like asort(), but the sorting is based on the indices of the array, not on the element values
    • The original indices become the values of the sorted array
asort vs asorti
    BEGIN {
        arr[1] = "a"; arr[2] = "d"
        arr[4] = "f"; arr[8] = "c"
        asort(arr, arrcpy1)
        asorti(arr, arrcpy2)
        print "Original array"
        for (i in arr)
            print "arr[" i "]=" arr[i]
        print "Array after using asort"
        for (i in arrcpy1)
            print "arr[" i "]=" arrcpy1[i]
        print "Array after using asorti"
        for (i in arrcpy2)
            print "arr[" i "]=" arrcpy2[i]
    }
Output
    $ awk -f asorti.awk
    Original array
    arr[4]=f
    arr[8]=c
    arr[1]=a
    arr[2]=d
    Array after using asort
    arr[4]=f
    arr[1]=a
    arr[2]=c
    arr[3]=d
    Array after using asorti
    arr[4]=8
    arr[1]=1
    arr[2]=2
    arr[3]=4
• The order in which for (i in arr) visits the elements is not defined, so the lines within each group may appear in a different order
System Variables that are Arrays
• awk provides two system variables that are arrays: ARGV and ENVIRON
• ARGV
    • An array containing the command line arguments given to awk
    • Options to awk itself (such as -f) and the script are not included
    • The number of elements is stored in another variable called ARGC (not an array)
    • The array is indexed from 0 (unlike arrays built by functions such as split(), which start at 1)
    • The first element, ARGV[0], is the name of the command that invoked the script
    • The last element is therefore ARGV[ARGC-1]
    • E.g. ARGV[ARGC-1], ARGV[2]
System Variables that are Arrays (2)
• ENVIRON
    • An array containing environment variables
    • Each element is the value of one variable in the current environment
    • The index of each element is the name of the environment variable
    • E.g. ENVIRON["PATH"], ENVIRON["SHELL"]
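A short sketch of looking up an environment variable by name. GREETING and NO_SUCH_VAR_12345 are hypothetical names chosen for this example; GREETING is set only for the single awk invocation.

```shell
# GREETING is a hypothetical variable, set only for this one command
result=$(GREETING=hello awk 'BEGIN {
    print ENVIRON["GREETING"]
    # the "in" operator tests for a variable without creating an element
    if (!("NO_SUCH_VAR_12345" in ENVIRON))
        print "unset"
}')
printf '%s\n' "$result"
```

Using "in" rather than comparing against the empty string distinguishes a variable that is unset from one that is set to an empty value.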
ARGV Example
    BEGIN {
        for (x = 0; x < ARGC; x++)
            print ARGV[x]
        print ARGC
    }
Output
    $ awk -f parameters.awk 2007 G51UST "Gail Hopkins" students=80 -
    awk
    2007
    G51UST
    Gail Hopkins
    students=80
    -
    6
Use of Backslash
• Backslash can be used:
    • To continue strings across new lines
    marian$ awk 'BEGIN { print "hello, \
    > world" }'
Output
    hello, world
Forcing Variable Types
• In awk, you do not declare variables and give them types
• Sometimes you want to force awk to treat a variable as a particular type, e.g. as a number or as a string
• To force a variable, x, to be treated as a number, put in the line:
    x = x + 0
• To force a variable, x, to be treated as a string, put in the line:
    x = x ""
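The effect of forcing a type shows up most clearly in comparisons. The sketch below uses made-up variables a, b, n and m; it is an illustration under those assumptions, not part of the original slides.

```shell
result=$(awk 'BEGIN {
    a = "10"; b = "9"         # string constants, so they compare as strings
    print (a < b)             # "10" sorts before "9" as text: prints 1
    print (a + 0 < b + 0)     # forced to numbers, 10 < 9 is false: prints 0
    n = 10; m = 9             # numeric constants
    print (n "" < m "")       # forced to strings again: prints 1
}')
printf '%s\n' "$result"
```

The three lines printed are 1, 0 and 1: the same two values order differently depending on whether awk treats them as strings or as numbers.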
Built-in Arithmetic Functions
• awk has a number of built-in arithmetic functions. Some are shown below:
    • exp(x)   Returns e to the power x
    • int(x)   Returns x truncated to an integer
    • sqrt(x)  Returns the square root of x
    • cos(x)   Returns the cosine of x (x in radians)
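A quick check of the four functions listed above, using argument values picked so that the results are exact.

```shell
result=$(awk 'BEGIN {
    # int() truncates toward zero, so int(-3.9) is -3, not -4
    print exp(0), int(3.9), int(-3.9), sqrt(16), cos(0)
}')
printf '%s\n' "$result"
```

This prints "1 3 -3 4 1".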
A Summary of Awk Functions