
AWK


kershner

Presentation Transcript


  1. AWK

  2. awk: a text processing language

  3. awk • Created for Unix by Aho, Weinberger and Kernighan • Basically an: • interpreted • text processing • programming language • Updated versions • NAWK • New awk • GAWK • Free Software Foundation's (GNU) version

  4. awk Basics • Basic form: • awk options 'selection criteria {action}' file(s) • Selection criteria can use regular expressions • Files are read one line at a time, with each line split into fields • Fields are numbered ($1, $2, etc…) • Entire line is $0 • Can run standalone (program given on the command line) • Can run as a program (stored in a file) • Uses whitespace (blanks and tabs) as the default field separator
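A minimal sketch of the basic form, assuming a POSIX shell and any standard awk. The two sample lines are made up for illustration and are not from the slides. The first rule selects with a regular expression, the second with a field comparison.

```shell
# Two rules: a regular-expression selection and a field comparison.
# The input lines are hypothetical sample data.
out=$(printf 'alpha beta gamma\none two three\n' |
  awk '/^a/        { print "regex match:", $2 }
       $1 == "one" { print "field match:", $3 }')
printf '%s\n' "$out"
```

Each input line is tested against every rule in order, so a line can trigger more than one action.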

  5. -f Option (stored awk programs) • awk programs can be stored in a file • awk -f awkfile datafile • the file named after -f (awkfile) holds the awk program • datafile contains the data

  6. Example • Find the TAs in the personnel file • The file is blank separated • -F defines the delimiter • Use "\ " to escape the blank (a blank after the \) • Note: the blank is the default separator anyway • Title is in the 3rd field
      # cat personnel.data
      Tony Kombol Lecturer 800111222 704-687-1111
      Jinyue Xia TA 800111333 704-687-2222
      Hadi Hashemi TA 800111444 704-687-3333
      #
      # awk -F\  '$3 == "TA" { print }' personnel.data
      Jinyue Xia TA 800111333 704-687-2222
      Hadi Hashemi TA 800111444 704-687-3333
      #

  7. Example • To run an awk program • personnel.data has the data • findta.awk is the code • Looks for TA (3rd field) • Prints first name and telephone number (1st and 5th fields) • Note: what small formatting problem is here?
      # awk -F\  -f findta.awk personnel.data
      TAs
      Jinyue704-687-2222
      Hadi704-687-3333
      Done
      # cat personnel.data
      Tony Kombol Lecturer 800111222 704-687-1111
      Jinyue Xia TA 800111333 704-687-2222
      Hadi Hashemi TA 800111444 704-687-3333
      # cat findta.awk
      BEGIN { print "TAs"; }
      $3 == "TA" {print $1 $5}
      END { print "Done" }

  8. print and printf • Output goes to std out • can be redirected with > or | • the redirection target must be in quotes: • print $2, $1 | "sort" • the output of the print goes to the sort routine • print is unformatted • printf allows formatting • %s – string • %-20s • 20 character spaces, left-justified (-) • %d – integer • %8d • set aside 8 spaces for the number • %f – floating point • %4.8f • minimum total width of 4 characters, with 8 digits to the right of the decimal point • printf needs \n to start a new line
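To see the format specifiers side by side, here is a small sketch. The input rows and the "divide by 4" column are arbitrary choices made only to produce a floating-point value; they are not from the slides.

```shell
# %-10s left-justifies in 10 columns, %4d right-justifies an integer in 4,
# %8.2f prints 2 decimals in a field at least 8 wide.
out=$(printf 'Tony Kombol 6\nFred Flintstone 10\n' |
  awk '{ printf "%-10s|%4d|%8.2f\n", $2, $3, $3 / 4 }')
printf '%s\n' "$out"
```

The `|` characters in the format string are there only to make the column boundaries visible.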

  9. Number processing • AWK supports basic computation • + - addition • - - subtraction • * - multiplication • / - division • % - modulus • ^ - exponentiation • Also supports: • ++ - add one to self (postfix and prefix) • += - add and assign to self • -- - subtract one from self (postfix and prefix) • -= - subtract from self • *= - multiply self • /= - divide self
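A quick sketch of a few of these operators, run entirely in a BEGIN block so no input file is needed. The starting value 7 is arbitrary.

```shell
# Modulus, exponentiation and division, then += and ++ applied to x.
out=$(awk 'BEGIN { x = 7; print x % 3, x ^ 2, x / 2; x += 3; x++; print x }')
printf '%s\n' "$out"
```

Note that division is floating point, so 7 / 2 yields 3.5 rather than 3.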

  10. Variables and Expressions • awk is loosely typed • do not need to declare variables • x = 5 • do not need $ to use variables, unlike bash • print x • strings are double quoted • x = "This is a string" • no string concatenation operator; concatenation is done by context (placing values side by side)
      x = "string1"; y = "string2"
      print x y
  • Space is required • some conversions are done automatically
      x = "56"; y = 43; z = "abc"
      print x y      # gives 5643: y is converted to a string
      print x + y    # gives 99: + converts x to a number
      print y + z    # gives 43: + converts z to the number 0
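The conversion rules can be checked directly from the shell. This sketch simply runs the slide's three print statements in a BEGIN block.

```shell
# Concatenation treats both values as strings; + treats both as numbers.
out=$(awk 'BEGIN { x = "56"; y = 43; z = "abc"; print x y; print x + y; print y + z }')
printf '%s\n' "$out"
```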

  11. Comparison and Logical Operators • awk supports string and numeric comparisons • == is the equality operator • = is for assignment • < and > can be used on strings • Beware of conversions when dealing with strings that consist of numbers • ~ is used for regular expression matching • $2 ~ /[dh]og/ • field 2 contains hog or dog
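The warning about numeric-looking strings is easy to demonstrate. In this sketch the string constants "10" and "9" are compared character by character, while the numbers 10 and 9 are compared by value. The three sample lines for the regular expression are made up.

```shell
# String constants compare as strings ("1" sorts before "9"), numbers by value.
cmp=$(awk 'BEGIN { s = ("10" < "9"); n = (10 < 9); print s, n }')
printf '%s\n' "$cmp"

# The ~ operator tests field 2 against a regular expression.
hits=$(printf 'big dog\nfat hog\nold log\n' | awk '$2 ~ /[dh]og/ { print $2 }')
printf '%s\n' "$hits"
```

A comparison result is 1 for true and 0 for false.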

  12. Comparison and Logical Operators • awk supports boolean operations • && - and • || - or • ! - not
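As an illustration of combining the three boolean operators in one selection, the sketch below uses a cut-down table (name, title, years) that is assumed for the example rather than taken verbatim from the slides.

```shell
# Select TAs or RAs whose years (field 3) are NOT below 3.
out=$(printf 'Tony Lecturer 6\nJinyue TA 3\nHadi TA 1\nFred RA 10\n' |
  awk '($2 == "TA" || $2 == "RA") && !($3 < 3) { print $1, $3 }')
printf '%s\n' "$out"
```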

  13. simple comparison • Field 6 is number of years with organization • Find those with more than 5 years
      # awk '$6 > 5 { print $2 ", " $1 ":" $6}' personnelyears.data
      Kombol, Tony:6
      Flintstone, Fred:10
      #
      # cat personnelyears.data
      Tony Kombol Lecturer 800111222 704-687-1111 6
      Jinyue Xia TA 800111333 704-687-2222 3
      Hadi Hashemi TA 800111444 704-687-3333 1
      Fred Flintstone RA 800123321 704-687-1212 10
      Barney Rubble URA 800112233 704-687-3344 4
      #

  14. Regular Expression comparison example • Find the TAs and RAs including the URAs
      # awk '$3 ~ /[RT]A/ {print $1 " " $2 " " $5}' personnel.data
      Jinyue Xia 704-687-2222
      Hadi Hashemi 704-687-3333
      Fred Flintstone 704-687-1212
      Barney Rubble 704-687-3344
      #
      # cat personnel.data
      Tony Kombol Lecturer 800111222 704-687-1111
      Jinyue Xia TA 800111333 704-687-2222
      Hadi Hashemi TA 800111444 704-687-3333
      Fred Flintstone RA 800123321 704-687-1212
      Barney Rubble URA 800112233 704-687-3344

  15. BEGIN and END Sections • BEGIN and END • Allow for some pre and post processing • Both are optional • General format:
      BEGIN { action }
      { action }
      END { action }
  • BEGIN's actions are done before the processing of the datafile begins • Good for headers, setup, etc. • END's actions are done after the processing of the datafile ends • Good for post processing, notes, etc.
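A typical use of the three sections is a header, a running total, and a summary. This sketch assumes a two-column input (name, years) invented for the example, and uses the built-in variable NR, which holds the number of records read so far.

```shell
# BEGIN prints a header, the main rule accumulates, END reports the average.
report=$(printf 'Tony 6\nJinyue 3\nHadi 1\nFred 10\n' |
  awk 'BEGIN { print "Name Years" }
       { total += $2; print $1, $2 }
       END { print "Average:", total / NR }')
printf '%s\n' "$report"
```

The variable total needs no initialization: an unset awk variable acts as 0 in arithmetic.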

  16. another regular expression • This is a more complex check using a file for the awk program • Check to see the ID is 800…… • That is 800 followed by 6 characters
      # cat findbadid.awk
      BEGIN { print "List of bad IDs follows"; }
      $4 !~ /^800....../ { print $1 " " $2 " has a bad id:" $4}
      END { print "End of list"; }
      #
      # cat personnelbad.data
      Tony Kombol Lecturer 800111222 704-687-1111 6
      Jinyue Xia TA 800111333 704-687-2222 3
      Hadi Hashemi TA 800111444 704-687-3333 1
      Fred Flintstone RA 800123321 704-687-1212 10
      Barney Rubble URA 800112233 704-687-3344 4
      Bad Id LX 809123456 704-687-8890 0
      # awk -f findbadid.awk personnelbad.data
      List of bad IDs follows
      Bad Id has a bad id:809123456
      End of list

  17. awk file example • Note: " B" (with a leading blank, as in "Zorbon Prentwist: 88 : B") does not get matched, because the field is not exactly "B"
      # cat ckgrades.awk
      BEGIN { print "Listing Bs\n" }
      $3 == "B" { print $0 }
      END { print "\nDone" }
      # awk -F: -f ckgrades.awk grades.data
      Listing Bs

      Tara Boomdea: 85:B
      Zorbax Bottlewit:88:B

      Done
      #
      # cat grades.data
      Fred Ziffle:99:A
      Arnold Ziffle: 55: F
      Tara Boomdea: 85:B
      Neo:100:A
      Buffy Summers: 72:C
      Sheldon Cooper:67:D
      Zorbon Prentwist: 88 : B
      Zorbax Bottlewit:88:B
      Bad Grade: 33: A

  18. Positional Parameters • Parameters ($1, $2, etc.) are usually used as the fields of each line • A shell parameter can also be passed to the awk program • Used when awk runs inside a shell program • Must be in quotes in the program, so that the shell (not awk) expands it • e.g. • Instead of • $4 > 12 • 4th field in line is > 12 • use • $4 > '$2' • 4th field in line is > 2nd parameter passed to the shell program: • prog.awk 50 82
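The quoting trick can be sketched with two shell functions. Both function names and the sample data are hypothetical. The first splices the shell's parameter into the awk program text by closing and reopening the single quotes, as the slide describes; the second uses awk's standard -v option, which the slides do not mention but which avoids the quote juggling.

```shell
# Hypothetical helpers: inside each function, "$1" is the SHELL's first parameter.
over_limit_spliced() {
  # The single quotes close before "$1" and reopen after it,
  # so awk sees the literal program: $2 > 50 { print $1 }
  awk '$2 > '"$1"' { print $1 }'
}
over_limit_v() {
  # -v assigns the shell value to the awk variable "limit" before the program runs.
  awk -v limit="$1" '$2 > limit { print $1 }'
}

spliced=$(printf 'a 10\nb 60\nc 90\n' | over_limit_spliced 50)
with_v=$(printf 'a 10\nb 60\nc 90\n' | over_limit_v 50)
printf '%s\n' "$spliced" "$with_v"
```

Both forms select the same lines here. The -v form keeps the awk program fully quoted, which tends to be easier to read and safer when the value contains unusual characters.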

  19. Arrays • awk supports arrays • arrays do not need to be "declared" • "declared" the minute they are used • Arrays are associative • index can be • numeric • alphabetic
      thisday["Tue"] = "Tuesday";
      thisday[2] = "Tuesday";
  • above are two array elements for the array thisday • each references a separate string
      printf("thisday[\"Tue\"] is %s", thisday["Tue"]) ;
      printf("thisday[2] is %s", thisday[2]) ;
  • Both will print "Tuesday" for the array element referenced
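A common use of associative arrays is counting occurrences of a key. This sketch counts how often each title appears in a shortened three-field version of the personnel data (the shortening is an assumption made for brevity). Because awk does not guarantee the order of a for-in scan, the output is piped through sort.

```shell
# count[] is created on first use; the title in field 3 is the index.
counts=$(printf 'Tony Kombol Lecturer\nJinyue Xia TA\nHadi Hashemi TA\nFred Flintstone RA\n' |
  awk '{ count[$3]++ } END { for (title in count) print title, count[title] }' |
  LC_ALL=C sort)
printf '%s\n' "$counts"
```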

  20. Arrays • ENVIRON[ ] • an associative array containing all the environment variables
      # awk 'BEGIN{for (env in ENVIRON) print env "=" ENVIRON[env]}'
      SSH_CLIENT=10.23.161.139 59365 22
      HOME=/home/tkombol
      TERM=xterm
      LESSOPEN=| /usr/bin/lesspipe %s
      SHELL=/bin/bash
      USER=tkombol
      _=/usr/bin/awk
      SHLVL=1
      PWD=/home/tkombol
      SSH_CONNECTION=10.23.161.139 59365 152.15.95.103 22
      LANG=en_US.UTF-8
      MAIL=/var/mail/tkombol
      LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.svgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:
      HISTCONTROL=ignoredups
      PATH=/usr/local/bin:/usr/bin:/bin:/usr/games
      LESSCLOSE=/usr/bin/lesspipe %s %s
      LOGNAME=tkombol
      SSH_TTY=/dev/pts/2
      #
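ENVIRON can also be indexed directly by variable name. In this sketch DEMO_GREETING is a made-up variable, set only for the single awk invocation.

```shell
# Look up one environment variable by name instead of scanning the whole array.
greeting=$(DEMO_GREETING=hello awk 'BEGIN { print ENVIRON["DEMO_GREETING"] }')
printf '%s\n' "$greeting"
```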

  21. Built-in Variables • awk has a set of built-in variables • Some can be overridden

  22. Functions • awk has several built-in functions • () are optional if there are no parameters • but using them is encouraged • Arithmetic functions • String functions

  23. Arithmetic Functions • int(x) • sqrt(x)
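A short sketch of the two functions listed. Note that int() truncates toward zero rather than rounding; the input values are arbitrary.

```shell
# int() drops the fractional part; sqrt() returns the square root.
out=$(awk 'BEGIN { print int(7.9), int(-7.9), sqrt(16) }')
printf '%s\n' "$out"
```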

  24. String Functions • length() • length of complete line • length(x) • length of x • tolower(s) • returns s as lower case • toupper(s) • returns s as upper case • substr(str,m) • returns string starting at m to end of string • substr(str,m,n) • returns string starting at m for n characters • index(s1,s2) • finds the position of s2 inside s1 • split(str,arr,ch) • splits str into an array, the delimiter is ch • system("cmd") • executes a system (Linux) command and returns exit status
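The string functions can be tried out in a BEGIN block. The strings "Hello World" and "a:b:c" are arbitrary examples. Positions in awk strings start at 1, and split() returns the number of pieces it produced.

```shell
# Exercise length, toupper, tolower, substr (both forms), index and split.
out=$(awk 'BEGIN {
  s = "Hello World"
  print length(s)
  print toupper(s)
  print tolower(s)
  print substr(s, 7)        # from position 7 to the end
  print substr(s, 1, 5)     # 5 characters starting at position 1
  print index(s, "World")   # position of "World" inside s
  n = split("a:b:c", parts, ":")
  print n, parts[1], parts[3]
}')
printf '%s\n' "$out"
```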

  25. if • Syntax: • if (cond true) { statements} else { statements} • Notes: • else is optional • {} not needed for single statements
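A small sketch of if/else applied to each input line. The pass mark of 60 and the two-column input are assumptions chosen for the example.

```shell
# Classify each line by comparing field 2 against a threshold.
out=$(printf 'Neo 100\nArnold 55\n' |
  awk '{ if ($2 >= 60) { print $1, "pass" } else { print $1, "fail" } }')
printf '%s\n' "$out"
```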

  26. for • Syntax form 1: • for ( startval ; condition ; control ) statement • C like in form • Example: • for ( k=1 ; k<9 ; k++ ) print k • Syntax form 2: • for ( var in array ) statement • Will scan every var in the array • Great for associative array • Non numeric indices • Gaps in array • See ENVIRON example in previous slide
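The C-like form can be sketched in two ways: summing the values from the slide's loop, and walking the fields of a line backwards using the built-in variable NF (the number of fields). The input line "a b c" is made up.

```shell
# Form 1: add up the values the slide's loop would print (1 through 8).
sum=$(awk 'BEGIN { for (k = 1; k < 9; k++) total += k; print total }')

# Form 1 again, counting down from NF to print the fields in reverse order.
reversed=$(printf 'a b c\n' |
  awk '{ for (i = NF; i >= 1; i--) printf "%s%s", $i, (i > 1 ? " " : "\n") }')
printf '%s\n' "$sum" "$reversed"
```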

  27. While • Syntax: • while (cond is true) { statement(s)}

  28. continue and break • continue and break can be used to alter the flow of any loop • for • while • break • stops the loop • continue • stops processing the remaining statements in this iteration • continues to next iteration
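The following sketch combines a while loop with both statements. The loop bounds are arbitrary: even values are skipped with continue, and the loop is abandoned with break once the counter passes 7.

```shell
# Prints the odd numbers up to 7, one per line.
out=$(awk 'BEGIN {
  k = 0
  while (k < 10) {
    k++
    if (k % 2 == 0) continue   # skip the rest of this iteration
    if (k > 7) break           # leave the loop entirely
    print k
  }
}')
printf '%s\n' "$out"
```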

  29. Resources • Awk - A Tutorial and Introduction - by Bruce Barnett • http://www.grymoire.com/Unix/Awk.html • Awk Tutorial - Main Page • http://robert.wsi.edu.pl/awk/

  30. Which is not a "scripting language"? • Auk • Awk • Perl • Pearl • Bash • Bam

  31. Summary • awk is a "primitive" scripting language • good for processing text files • filtering • perl is a more modern replacement • "religious war" over which is better • if you understand awk, it will be a good basis for understanding perl
