awk text processing language
awk • Created for Unix by Aho, Weinberger and Kernighan • Basically an interpreted text processing programming language • Updated versions • NAWK: New awk • GAWK: Free Software Foundation’s version
awk Basics • Basic form: • awk options 'selection criteria {action}' file(s) • Selection criteria can use regular expressions • Files are read one line at a time, with each line split into fields • Fields are numbered ($1, $2, etc.) • Entire line is $0 • Can run standalone (program given on the command line) • Can run as a program (stored in a file) • Uses whitespace (blanks and tabs) as the default field separator
-f Option (stored awk programs) • awk programs can be stored in a file • awk -f awkfile datafile • awkfile contains the awk program • datafile contains the data
Example • Find the TAs in the personnel file • The file is blank separated • -F defines the delimiter • Use \ to escape the blank (a blank after the \, then a second blank before the program) • Note: the blank is the default separator anyway • Title is in the 3rd field
# cat personnel.data
Tony Kombol Lecturer 800111222 704-687-1111
Jinyue Xia TA 800111333 704-687-2222
Hadi Hashemi TA 800111444 704-687-3333
#
# awk -F\  '$3 == "TA" { print }' personnel.data
Jinyue Xia TA 800111333 704-687-2222
Hadi Hashemi TA 800111444 704-687-3333
#
example • To run an awk program • personnel.data has the data • findta.awk is the code • Looks for TA (3rd parm) • Prints first name and telephone number (1st and 5th parms) • Note: what small formatting problem is here?
# awk -F\  -f findta.awk personnel.data
TAs
Jinyue704-687-2222
Hadi704-687-3333
Done
# cat personnel.data
Tony Kombol Lecturer 800111222 704-687-1111
Jinyue Xia TA 800111333 704-687-2222
Hadi Hashemi TA 800111444 704-687-3333
# cat findta.awk
BEGIN { print "TAs"; }
$3 == "TA" {print $1 $5}
END { print "Done" }
print and printf • Output goes to std out • can be redirected with > or | • File name or command must be in quotes: • print $2, $1 | "sort" • the output of the print goes to the sort command • print is unformatted • printf allows formatting • %s – string • %-20s • at least 20 characters wide, left justified (-) • %d – integer • %8d • at least 8 characters wide for the number • %f – floating point • %4.8f • at least 4 characters wide in total, with 8 digits to the right of the decimal point • printf needs \n to start a new line
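A minimal sketch of the printf formats on this slide. The sample input (item name, count, price) is invented for illustration and is not from the slides.

```shell
# Invented sample data: name, count, price.
out=$(printf 'widget 7 2.5\ngadget 12 10\n' |
  awk '{ printf "%-10s|%4d|%8.2f\n", $1, $2, $3 }')
echo "$out"
```

The name is padded on the right to 10 characters, the count is right-aligned in 4, and the price is right-aligned in 8 with 2 decimals.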
Number processing • AWK supports basic computation • + - addition • - - subtraction • * - multiplication • / - division • % - modulus • ^ - exponentiation • Also supports: • ++ - add one to itself (post and pre fix) • += - add and assign to self • -- - subtract one from self (post and pre fix) • -= - subtract from self • *= - multiply self • /= - divide self
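The operators above can be tried without any input file by putting everything in a BEGIN block. This is only a sketch; the variable name n is arbitrary.

```shell
out=$(awk 'BEGIN {
  n = 7
  print n + 2, n - 2, n * 2, n / 2, n % 2, n ^ 2   # basic operators
  n += 3; print n     # add and assign
  n++;    print n     # add one
  n *= 2; print n     # multiply and assign
}')
echo "$out"
```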
Variables and Expressions • awk is loosely typed • do not need to declare variables • x = 5 • do not need $ to access a variable (unlike shell variables) • print x • strings are double quoted • x = "This is a string" • no string concatenation operator, done by context
x = "string1"; y = "string2"
print x y
• A space between the operands is required • some conversions done automatically
x = "56"; y = 43; z = "abc"
print x y # gives 5643; y converted to string
print x + y # gives 99; + converts x to a number
print y + z # gives 43; + converts z to the number 0
Comparison and Logical Operators • awk supports string and numeric comparisons • == is the equality operator • = is for assignment • < and > can be used on strings • Beware of conversions when dealing with strings that consist of numbers • ~ is the regular expression match operator • $2 ~ /[dh]og/ • true when field 2 contains hog or dog
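The warning about numeric-looking strings can be seen in a small sketch. The input line is invented; concatenating an empty string is one common way to force a string comparison.

```shell
out=$(echo '10 9 dog' | awk '{
  if ($1 > $2)           print "numeric 10 > 9"   # input fields that look numeric compare as numbers
  if (($1 "") < ($2 "")) print "string 10 < 9"    # as strings, "1" sorts before "9"
  if ($3 ~ /[dh]og/)     print "regex matches " $3
}')
echo "$out"
```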
Comparison and Logical Operators • awk supports boolean operations • && - and • || - or • ! - not
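A short sketch combining the three boolean operators in one selection criterion. The names, titles and year counts are made up for illustration.

```shell
# Select TAs or RAs who do NOT have fewer than 5 years.
out=$(printf 'Ann TA 3\nBob RA 7\nCid TA 9\nDee URA 1\n' |
  awk '($2 == "TA" || $2 == "RA") && !($3 < 5) { print $1 }')
echo "$out"
```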
simple comparison • Field 6 is number of years with organization • Find those with more than 5 years
# awk '$6 > 5 { print $2 ", " $1 ":" $6}' personnelyears.data
Kombol, Tony:6
Flintstone, Fred:10
#
# cat personnelyears.data
Tony Kombol Lecturer 800111222 704-687-1111 6
Jinyue Xia TA 800111333 704-687-2222 3
Hadi Hashemi TA 800111444 704-687-3333 1
Fred Flintstone RA 800123321 704-687-1212 10
Barney Rubble URA 800112233 704-687-3344 4
#
Regular Expression comparison example • Find the TAs and RAs including the URAs
# awk '$3 ~ /[RT]A/ {print $1 " " $2 " " $5}' personnel.data
Jinyue Xia 704-687-2222
Hadi Hashemi 704-687-3333
Fred Flintstone 704-687-1212
Barney Rubble 704-687-3344
#
# cat personnel.data
Tony Kombol Lecturer 800111222 704-687-1111
Jinyue Xia TA 800111333 704-687-2222
Hadi Hashemi TA 800111444 704-687-3333
Fred Flintstone RA 800123321 704-687-1212
Barney Rubble URA 800112233 704-687-3344
BEGIN and END Sections • BEGIN and END allow for some pre- and post-processing • Both are optional • General format: • BEGIN { action } { action } END { action } • BEGIN's actions are done before the processing of the datafile begins • Good for headers, setup, etc. • END's actions are done after the processing of the datafile ends • Good for post processing, notes, etc.
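The general format above can be sketched with a header, a per-line action and a final total. The item names and counts are invented sample data, and total is an arbitrary variable name.

```shell
out=$(printf 'apples 3\npears 5\nplums 4\n' |
  awk 'BEGIN { print "Item Count" }      # runs once, before any input is read
       { total += $2; print $1, $2 }     # runs once per input line
       END { print "Total", total }      # runs once, after the last line
  ')
echo "$out"
```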
another regular expression • This is a more complex check using a file for the awk program • Check to see the ID is 800…… • That is 800 followed by 6 characters
# cat findbadid.awk
BEGIN { print "List of bad IDs follows"; }
$4 !~ /^800....../ { print $1 " " $2 " has a bad id:" $4};
END { print "End of list"; }
#
# cat personnelbad.data
Tony Kombol Lecturer 800111222 704-687-1111 6
Jinyue Xia TA 800111333 704-687-2222 3
Hadi Hashemi TA 800111444 704-687-3333 1
Fred Flintstone RA 800123321 704-687-1212 10
Barney Rubble URA 800112233 704-687-3344 4
Bad Id LX 809123456 704-687-8890 0
# awk -f findbadid.awk personnelbad.data
List of bad IDs follows
Bad Id has a bad id:809123456
End of list
awk file example
# cat ckgrades.awk
BEGIN { print "Listing Bs\n" }
$3 == "B" { print $0 }
END { print "\nDone" }
# awk -F: -f ckgrades.awk grades.data
Listing Bs

Tara Boomdea: 85:B
Zorbax Bottlewit:88:B

Done
#
# cat grades.data
Fred Ziffle:99:A
Arnold Ziffle: 55: F
Tara Boomdea: 85:B
Neo:100:A
Buffy Summers: 72:C
Sheldon Cooper:67:D
Zorbon Prentwist: 88 : B
Zorbax Bottlewit:88:B
Bad Grade: 33: A
Note: ": B" does not get matched, because the field is " B" (with a leading blank), not "B"
Positional Parameters • Parameters are usually used as the fields of each line • A shell parameter can also be passed to the awk program • Used when awk is run from a shell program • Must be in single quotes in the awk program, so that the shell (not awk) expands it • e.g. • Instead of • $4 > 12 • 4th parm in line is > 12 • use • $4 > '$2' • 4th parm in line is > 2nd parm passed to the shell program: • prog.awk 50 82
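A hedged sketch of passing a shell parameter into awk. The two shell functions below are hypothetical stand-ins for the shell program on the slide. The first uses the quote technique described above; the second uses awk's -v option, which is a different but widely used way to do the same thing. The names over_limit, over_limit_v and limit are illustrative only.

```shell
# Quote technique: the single quotes close, the shell expands its own $1,
# then the single quotes reopen. awk sees the program: $2 > 50 { print $1 }
over_limit() {
  awk '$2 > '"$1"' { print $1 }'
}

# Alternative: -v copies the shell value into an awk variable.
over_limit_v() {
  awk -v limit="$1" '$2 > limit { print $1 }'
}

out1=$(printf 'a 10\nb 60\nc 82\n' | over_limit 50)
out2=$(printf 'a 10\nb 60\nc 82\n' | over_limit_v 50)
echo "$out1"
echo "$out2"
```

The -v form keeps the awk program inside one pair of quotes, which tends to be easier to read when the program grows.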
Arrays • awk supports arrays • arrays do not need to be "declared" • "declared" the minute they are used • Arrays are associative • index can be • numeric • alphabetic
thisday["Tue"] = "Tuesday";
thisday[2] = "Tuesday";
• above are two array elements for the array thisday • each references a separate string
printf("thisday[\"Tue\"] is %s\n", thisday["Tue"]) ;
printf("thisday[2] is %s\n", thisday[2]) ;
• Both will print "Tuesday" for the array element referenced
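A typical use of associative arrays is counting by key. This sketch uses invented name/title pairs; count is an arbitrary array name.

```shell
out=$(printf 'Tony Lecturer\nJinyue TA\nHadi TA\nFred RA\n' |
  awk '{ count[$2]++ }     # the element is created the first time it is used
       END {
         print "TA", count["TA"]
         print "RA", count["RA"]
         print "Lecturer", count["Lecturer"]
       }')
echo "$out"
```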
Arrays • ENVIRON[ ] • an associative array containing all the environment variables
# awk 'BEGIN{for (env in ENVIRON)print env "=" ENVIRON[env]}'
SSH_CLIENT=10.23.161.139 59365 22
HOME=/home/tkombol
TERM=xterm
LESSOPEN=| /usr/bin/lesspipe %s
SHELL=/bin/bash
USER=tkombol
_=/usr/bin/awk
SHLVL=1
PWD=/home/tkombol
SSH_CONNECTION=10.23.161.139 59365 152.15.95.103 22
LANG=en_US.UTF-8
MAIL=/var/mail/tkombol
LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.svgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:
HISTCONTROL=ignoredups
PATH=/usr/local/bin:/usr/bin:/bin:/usr/games
LESSCLOSE=/usr/bin/lesspipe %s %s
LOGNAME=tkombol
SSH_TTY=/dev/pts/2
#
Built-in Variables • awk has a set of built-in variables • Some can be overridden
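The slide's list of variables did not survive, so this sketch assumes four standard ones: NR (current line number), NF (number of fields in the line), FS (input field separator) and OFS (output field separator). FS and OFS are overridden in BEGIN; the input is invented.

```shell
out=$(printf 'a:b:c\nd:e\n' |
  awk 'BEGIN { FS = ":"; OFS = "-" }    # override the input and output separators
       { print NR, NF, $NF }            # $NF is the last field of the line
  ')
echo "$out"
```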
Functions • awk has several built-in functions • () are optional if no parms • encouraged to use • Arithmetic functions • String functions
Arithmetic Functions • int(x) • sqrt(x)
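A quick sketch of the two functions named above, run from a BEGIN block so no input is needed.

```shell
out=$(awk 'BEGIN {
  print int(7.9)     # int() drops the fractional part
  print int(-7.9)    # truncation is toward zero
  print sqrt(16)
}')
echo "$out"
```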
String Functions • length() • length of complete line • length(x) • length of x • tolower(s) • returns s as lower case • toupper(s) • returns s as upper case • substr(str,m) • returns string starting at m to end of string • substr(str,m,n) • returns string starting at m for n characters • index(s1,s2) • finds the position of s2 inside s1 • split(str,arr,ch) • splits str into an array, the delimiter is ch • system("cmd") • executes a system (Linux) command and returns exit status
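The string functions can be exercised on a fixed string. This is a sketch only; the string and the array name parts are arbitrary. Positions in awk strings start at 1.

```shell
out=$(awk 'BEGIN {
  s = "Text Processing"
  print length(s)
  print toupper(s)
  print substr(s, 6)          # from position 6 to the end
  print substr(s, 1, 4)       # 4 characters starting at position 1
  print index(s, "Pro")       # position of "Pro" inside s
  n = split("a:b:c", parts, ":")
  print n, parts[1], parts[3] # split returns the number of pieces
}')
echo "$out"
```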
If • Syntax: • if (cond true) { statements} else { statements} • Notes: • else is optional • {} not needed for single statements
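A small sketch of if with else, using single statements so no {} are needed. The scores and the grade cut-offs are invented for illustration.

```shell
out=$(printf '91\n72\n40\n' |
  awk '{
    if ($1 >= 90)      grade = "A"
    else if ($1 >= 70) grade = "C"
    else               grade = "F"
    print $1, grade
  }')
echo "$out"
```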
For • Syntax form 1: • for ( startval ; condition ; control) statement • C like in form • Example: • for ( k=1 ; k<9 ; k++ ) print k • Syntax form 2: • for ( var in array) statement • Will scan every index in the array • Great for associative array • Non numeric indices • Gaps in array • See the ENVIRON example on the earlier Arrays slide
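Both forms of for in one sketch. The array day and its contents are illustrative. The order in which the second form visits indices is not specified, so the example only counts them.

```shell
out=$(awk 'BEGIN {
  for (k = 1; k < 4; k++) print "k=" k    # form 1, C style
  day["Tue"] = "Tuesday"; day["Wed"] = "Wednesday"
  n = 0
  for (key in day) n++                    # form 2, visits every index once
  print n, "keys"
}')
echo "$out"
```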
While • Syntax: • while (cond is true) { statement(s)}
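A sketch of while that walks the fields of a line, assuming the standard built-in NF (number of fields). The input line is invented.

```shell
out=$(echo 'a b c' | awk '{
  i = 1
  while (i <= NF) {     # loop until every field has been printed
    print i, $i
    i++
  }
}')
echo "$out"
```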
continue and break • continue and break can be used in both kinds of loop • for • while • break • stops the loop • continue • stops processing statements in this iteration • continues to next iteration
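A sketch showing both statements in one for loop; the limits are arbitrary.

```shell
out=$(awk 'BEGIN {
  for (k = 1; k <= 10; k++) {
    if (k % 2 == 0) continue   # skip even numbers, go on to the next iteration
    if (k > 7) break           # leave the loop entirely
    print k
  }
}')
echo "$out"
```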
Resources • Awk - A Tutorial and Introduction - by Bruce Barnett • http://www.grymoire.com/Unix/Awk.html • Awk Tutorial - Main Page • http://robert.wsi.edu.pl/awk/
Summary • awk is a "primitive" scripting language • good for processing text files • filtering • perl is a more modern replacement • "religious war" over which is better • if you understand awk it will be a good basis to understand perl