
Introduction to Awk


garin

Presentation Transcript


  1. Introduction to Awk
     Awk is a convenient and expressive programming language that can be applied to a wide variety of computing and data manipulation tasks.

  2. Awk
     • Works well on record-type data
     • Reads input file(s) a line at a time
     • Parses each line into fields
     • Performs user-defined tests against each line, performs actions on matches

  3. Other Common Uses
     • Input validation
       • Does every record have the same # of fields?
       • Do values make sense (negative time, hourly wage > $1000, etc.)?
     • Filtering out certain fields
     • Searches
       • Who got a zero on lab 3?
       • Who got the highest grade?
     • Many others
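The validation checks listed on this slide could be sketched roughly as follows. The sample records, the three-field layout (name, hourly wage, hours worked) and the exact messages are assumptions made up for illustration; they are not from the slides.

```shell
# Hypothetical sample: name, hourly wage, hours worked.
# Line 3 is missing a field, line 4 has negative hours, line 5 an implausible wage.
records='Beth 4.00 0
Kathy 4.00 10
Mark 5.00
Mary 5.50 -2
Susie 1200 18'

report=$(printf '%s\n' "$records" | awk '
    NF != 3   { print "line " NR ": expected 3 fields, got " NF; next }
    $3 < 0    { print "line " NR ": negative hours " $3 }
    $2 > 1000 { print "line " NR ": hourly wage over 1000: " $2 }
')
printf '%s\n' "$report"
```

Each rule is an ordinary pattern/action pair; `next` skips the remaining checks for a record that is already known to be malformed.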

  4. Invocation
     • Can write little one-liners on the command line (very handy)
       • Print the 3rd field of every line:
             $ awk '{ print $3 }' input.txt
     • Execute an awk script file:
             $ awk -f script.awk input.txt
     • Or, use this sha-bang as the first line, and give your script execute permissions (adjust the path if awk lives elsewhere on your system, e.g. /usr/bin/awk):
             #!/bin/awk -f
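As a small check that the one-liner and the script-file forms are equivalent, the sketch below runs the same program both ways. The temporary directory and the two-line input file are assumptions for illustration.

```shell
workdir=$(mktemp -d)
printf 'a b c\nd e f\n' > "$workdir/input.txt"

# 1. One-liner on the command line.
inline=$(awk '{ print $3 }' "$workdir/input.txt")

# 2. The same program stored in a script file and run with -f.
printf '{ print $3 }\n' > "$workdir/script.awk"
from_file=$(awk -f "$workdir/script.awk" "$workdir/input.txt")

printf '%s\n' "$inline" "$from_file"
rm -r "$workdir"
```

Both variables should hold the third field of each input line.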

  5. Form of an AWK program
     • AWK programs are entries of the form:
           pattern { action }
     • pattern: some test, either a regular expression or a C-like condition
       • if omitted, the action is applied to every line
     • action: a statement or set of statements
       • if not provided, the default action is to print the entire line, much like grep
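The three shapes an entry can take (pattern only, action only, pattern plus action) might look like this. The three-line sample data is made up for the example.

```shell
data='alpha 1
beta 20
gamma 3'

# Pattern only: the default action prints the whole matching line (like grep).
pattern_only=$(printf '%s\n' "$data" | awk '/^b/')

# Action only: with no pattern, the action runs on every line.
action_only=$(printf '%s\n' "$data" | awk '{ print $1 }')

# C-like condition plus an action.
both=$(printf '%s\n' "$data" | awk '$2 > 5 { print $1, $2 }')

printf '%s\n' "$pattern_only" "$action_only" "$both"
```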

  6. Form of an AWK program
     • Input files are parsed, a record (line) at a time
     • Each line is checked against each pattern, in order
     • There are 2 special patterns:
       • BEGIN: true before any records are read
       • END: true at end of input (after all records have been read)
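A minimal sketch of how BEGIN, a per-record rule and END fit together, using three made-up numeric input lines:

```shell
summary=$(printf '3\n4\n5\n' | awk '
    BEGIN { print "start" }                      # runs before any record is read
          { sum += $1 }                          # runs once for every record
    END   { print "records:", NR, "sum:", sum }  # runs after the last record
')
printf '%s\n' "$summary"
```

This prints `start` first, then `records: 3 sum: 12` once the input is exhausted.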

  7. Awk Features
     • Patterns can be regular expressions or C-like conditions.
     • Each line of the input is matched against the patterns, one after the next. If a match occurs, the corresponding action is performed.
     • Input lines are parsed and split into fields, which are accessed by $1,…,$NF, where NF is a variable set to the number of fields. The variable $0 contains the entire line, and by default lines are split on white space (blanks, tabs).
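To illustrate the field splitting described above, the sketch below feeds awk one made-up line containing a double blank and a tab, then prints NF, the first field, the last field, and the untouched $0.

```shell
# Runs of blanks and tabs count as a single separator by default.
fields=$(printf 'the quick  brown\tfox\n' | awk '{ print NF; print $1; print $NF }')

# $0 still holds the line exactly as it was read.
whole=$(printf 'the quick  brown\tfox\n' | awk '{ print $0 }')

printf '%s\n' "$fields" "$whole"
```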

  8. Variables
     • Not declared, nor typed
     • No character type
     • Only strings and floating-point numbers (integers are supported as a special case of numbers)
     • $n refers to the nth field (where n is some integer value)
           # prints each field on the line
           { for (i = 1; i <= NF; ++i) print $i }
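Because variables are untyped, the same value can be used as a number or as a string depending on the operator. A small sketch (the variable name `x` is arbitrary):

```shell
typed=$(awk 'BEGIN {
    x = "3"        # assigned as a string, never declared
    print x + 4    # arithmetic context: treated as the number 3
    print x "4"    # concatenation: treated as the string "3"
    print 7 / 2    # division is floating point, not integer
}')
printf '%s\n' "$typed"
```

The three lines printed are `7`, `34` and `3.5`.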

  9. Some Built-in Variables
     • FS: the input field separator
     • OFS: the output field separator
     • NF: # of fields; changes w/each record
     • NR: the # of records read (so far). So, the current record #
     • FNR: the # of records read so far, reset for each named file
     • $0: the entire input line
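The difference between NR and FNR only shows up with more than one input file. The sketch below uses two made-up files in a temporary directory; FILENAME is another standard built-in variable, not listed on the slide, that holds the name of the file currently being read.

```shell
workdir=$(mktemp -d)
printf 'a\nb\n'    > "$workdir/one.txt"
printf 'c\nd\ne\n' > "$workdir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
counts=$(cd "$workdir" && awk '{ print FILENAME, NR, FNR }' one.txt two.txt)

printf '%s\n' "$counts"
rm -r "$workdir"
```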

  10. Example
      Print pay for those employees who actually worked:
          $ awk '$3>0 {print $1, $2*$3}' emp.data
          Kathy 40
          Mark 100
          Mary 121
          Susie 76.5
      The input file:
          $ cat emp.data
          Beth 4.00 0
          Dan 3.75 0
          Kathy 4.00 10
          Mark 5.00 20
          Mary 5.50 22
          Susie 4.25 18

  11. Example: CSV file
          $ cat students.csv
          smith,john,js12
          jones,fred,fj84
          bee,sue,sb23
          fife,ralph,rf86
          james,jim,jj22
          cook,nancy,nc54
          banana,anna,ab67
          russ,sam,sr77
          loeb,lisa,guitarHottie

          $ cat getEmails.awk
          #!/bin/awk -f
          BEGIN { FS = "," }
          { printf( "%s's email is: %s@school.edu\n", $2, $3 ); }

          $ ./getEmails.awk students.csv
          john's email is: js12@school.edu
          fred's email is: fj84@school.edu
          sue's email is: sb23@school.edu
          ralph's email is: rf86@school.edu
          jim's email is: jj22@school.edu
          nancy's email is: nc54@school.edu
          anna's email is: ab67@school.edu
          sam's email is: sr77@school.edu
          lisa's email is: guitarHottie@school.edu

  12. Example: output separator
          $ cat out.awk
          #!/bin/awk -f
          BEGIN { FS = ","; OFS = "-*-"; }
          { print $1, $2, $3; }

          $ ./out.awk students.csv
          smith-*-john-*-js12
          jones-*-fred-*-fj84
          bee-*-sue-*-sb23
          fife-*-ralph-*-rf86
          james-*-jim-*-jj22
          cook-*-nancy-*-nc54
          banana-*-anna-*-ab67
          russ-*-sam-*-sr77
          loeb-*-lisa-*-guitarHottie
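One detail worth seeing in action: OFS is inserted between comma-separated arguments of `print`, but `$0` itself keeps its original separators until a field is assigned. The sketch below reuses the first record of students.csv; the `$1 = $1` idiom for forcing the record to be rebuilt is standard awk behaviour, though it does not appear in the slides.

```shell
prog_begin='BEGIN { FS = ","; OFS = "-*-" }'

# Comma-separated print arguments are joined with OFS.
joined=$(printf 'smith,john,js12\n' | awk "$prog_begin"' { print $1, $2, $3 }')

# $0 is printed as read, so the commas survive.
unchanged=$(printf 'smith,john,js12\n' | awk "$prog_begin"' { print $0 }')

# Assigning any field rebuilds $0 using OFS.
rebuilt=$(printf 'smith,john,js12\n' | awk "$prog_begin"' { $1 = $1; print $0 }')

printf '%s\n' "$joined" "$unchanged" "$rebuilt"
```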

  13. Flow Control
      • Awk syntax is much like C
      • Same loops, if statements, etc.
      • AWK is named after its authors: Aho, Weinberger, Kernighan
      • Kernighan also co-wrote "The C Programming Language" with Dennis Ritchie, the creator of C
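A short sketch of the C-like control flow, assuming made-up score data and arbitrary thresholds (90 and 60) chosen only for the example:

```shell
# if / else if / else, applied to every record.
graded=$(printf 'Fred 90\nSam 50\nSue 100\n' | awk '{
    if ($2 >= 90)
        grade = "high"
    else if ($2 >= 60)
        grade = "pass"
    else
        grade = "fail"
    print $1, grade
}')

# A C-style for loop; BEGIN lets it run without any input.
countdown=$(awk 'BEGIN {
    for (i = 3; i > 0; i--)
        printf "%d ", i
    print "go"
}')

printf '%s\n' "$graded" "$countdown"
```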

  14. Associative Arrays
      • Awk also supports arrays that can be indexed by arbitrary strings. They are implemented using hash tables.
            Total["Sue"] = 100;
      • It is possible to loop over all indices that have currently been assigned values.
            for (name in Total) print name, Total[name];

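  15. Example using Associative Arrays
          $ cat scores
          Fred 90
          Sue 100
          Fred 85
          Sam 70
          Sue 98
          Sam 50
          Fred 70

          $ cat total.awk
          { Total[$1] += $2 }
          END { for (i in Total) print i, Total[i]; }

          $ awk -f total.awk scores
          Sue 198
          Sam 120
          Fred 245
      The order in which for (i in Total) visits the indices is unspecified, so the output lines may appear in a different order on another awk implementation.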
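Since the traversal order of an associative array is not defined, one common way to get repeatable output is to pipe the result through `sort`. The sketch below reuses the scores data from this slide; adding `sort` is a suggestion, not part of the original example.

```shell
totals=$(printf 'Fred 90\nSue 100\nFred 85\nSam 70\nSue 98\nSam 50\nFred 70\n' |
    awk '{ Total[$1] += $2 } END { for (name in Total) print name, Total[name] }' |
    sort)
printf '%s\n' "$totals"
```

With sorting, the names always come out alphabetically regardless of which awk is installed.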

  16. Useful one-liners
      • Line count:
            awk 'END {print NR}'
      • grep:
            awk '/pat/'
      • head:
            awk 'NR<=10'
      • Add line #s to a file:
            awk '{print NR, $0}'
            awk '{ printf( "%5d %s\n", NR, $0 )}'
      • Many more. See the resources tab on the course webpage for links to more examples.
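The one-liners can be tried on a small sample without creating a file. The three-line input below is made up, and the head example uses 2 instead of 10 so that the effect is visible on such short input.

```shell
sample=$(printf 'one\ntwo pat\nthree\n')

count=$(printf '%s\n' "$sample" | awk 'END { print NR }')      # like wc -l
matches=$(printf '%s\n' "$sample" | awk '/pat/')                # like grep pat
first_two=$(printf '%s\n' "$sample" | awk 'NR <= 2')            # like head -n 2
numbered=$(printf '%s\n' "$sample" | awk '{ printf("%5d %s\n", NR, $0) }')

printf '%s\n' "$count" "$matches" "$first_two" "$numbered"
```

Note the `\n` in the printf format: unlike `print`, `printf` does not add a newline on its own.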
