A talk about AWK
Don Newcomb, 18 Jan 2014
What is AWK?
• AWK is an interpreted programming language.
• It is primarily used for text processing and data formatting.
• Very handy for one-time data conversion scripts or command-line data manipulation.
AWK History
• Developed at Bell Labs in the ’70s and named for its authors: Aho, Weinberger and Kernighan. Awk sounded better than Wak or Kaw.
• Homophone of auk, the name of a bird, so it is an original member of the Unix menagerie.
• Part of the original Unix software suite for formatting telephone directories.
• Many flavors, variations and extensions in 40 years. GNU AWK (gawk) is commonly used on Linux.
Basics: Fields, patterns & actions
• AWK looks at a text file as a series of records (lines) made up of fields. The default field separator is white space and the default record separator is a newline (LF).
• Each record is checked for a match against the pattern. A blank pattern always matches.
• When a pattern matches, the defined action is taken.
• The pattern precedes the action, which is enclosed in curly braces {}: 'pattern { action }'
• Patterns can be regular expressions.
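The pattern/action model above can be sketched with a small made-up file. The file contents and the shell variable names (tmp, matched, all_depts) are assumptions for illustration only; they do not come from the talk.

```shell
# Hypothetical sample data: name, department, salary (whitespace-separated).
tmp=$(mktemp)
printf '%s\n' 'alice eng 100' 'bob ops 80' 'carol eng 120' > "$tmp"

# The regular-expression pattern /eng/ selects records;
# the action prints fields 1 and 3 of each selected record.
matched=$(awk '/eng/ { print $1, $3 }' "$tmp")

# A blank pattern matches every record, so the action runs on each line.
all_depts=$(awk '{ print $2 }' "$tmp")

printf '%s\n' "$matched" "$all_depts"
rm -f "$tmp"
```

The first command prints only the two "eng" records; the second prints the department field of all three.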
Simple examples
• awk < file '{print $3, $1}'
• echo 42 | awk '{ print ($1-32)*(5/9) }'
• awk ' /^bozon/ { print $4 + $7 }'
• awk < words '/^water/ {print $1}'
• awk '/water$/ {print $1}' words
• awk 'BEGIN{print "Hello, World"}'
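Variables
• Variables are defined when first assigned (e.g. x=0.3, name="Don").
• Predefined variables include $0 (the whole record), $1, $2, ... (the fields), NF (number of fields), $NF (the last field), FS (field separator) and RS (record separator).
• Arrays are associative tables:
• french["cat"] = "chat"
• french["dog"] = "chien"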
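A minimal sketch of these variables in use, assuming a made-up colon-separated input. The input text and the shell variable names (input, summary, translated) are hypothetical; the french array is the one from the slide.

```shell
# Hypothetical colon-separated input, two records of different lengths.
input='cat:animal:small
dog:animal:large:loud'

# FS is set in BEGIN so it takes effect before the first record is split.
# NF is the field count and $NF is the last field of the current record.
summary=$(printf '%s\n' "$input" | awk 'BEGIN { FS = ":" } { print NF, $1, $NF }')

# Associative array lookup keyed by the first field.
translated=$(printf '%s\n' "$input" | awk '
BEGIN { FS = ":"; french["cat"] = "chat"; french["dog"] = "chien" }
{ print $1, "->", french[$1] }')

printf '%s\n' "$summary" "$translated"
```

Setting FS inside BEGIN is one common way to change the separator; the -F command-line option is another.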
Iteration anyone?
• awk '{ tot=0; for (i=1; i<=NF; i++) tot += $i; print tot/NF; }' file
• awk '{ if ( $1 == "parse") print $0}' words
• {if (x % 2 == 0) print "x is even"; else print "x is odd"}
• while, do-while, switch & break also exist (switch is a gawk extension).
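To make the loops concrete, here is a small sketch that runs the for loop from the slide on made-up numbers and adds a while loop. The input rows and the shell variable names (rows, averages, reversed) are assumptions for illustration.

```shell
# Hypothetical input: one row of numbers per line.
rows='1 2 3
10 20'

# for loop from the slide: average of the fields on each line.
averages=$(printf '%s\n' "$rows" | awk '{ tot = 0; for (i = 1; i <= NF; i++) tot += $i; print tot / NF }')

# while loop: print the fields of each line in reverse order.
reversed=$(printf '%s\n' "$rows" | awk '{
    i = NF
    while (i > 1) { printf "%s ", $i; i-- }   # every field except the first
    print $1                                   # first field ends the line
}')

printf '%s\n' "$averages" "$reversed"
```

Note that the averaging loop divides by NF, so it would fail with a division-by-zero error on an empty line; the sample input avoids that case.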
More on patterns
• There are two special patterns, BEGIN and END. The BEGIN action runs at the start of the program, before the first record is read, and the END action runs after the last record, when EOF is encountered.
• Two patterns separated by a comma form a range: it selects all records from one matching pattern 1 through the next record matching pattern 2, inclusive.
• Patterns can be expressions.
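The three kinds of pattern described above can be sketched on a made-up input. The input lines and the shell variable names (lines, counted, ranged, long_lines) are hypothetical.

```shell
# Hypothetical input with a marked-off region.
lines='header
START
one
two
STOP
trailer'

# BEGIN runs before any input is read; END runs after the last record.
counted=$(printf '%s\n' "$lines" | awk 'BEGIN { print "begin" } { n++ } END { print n, "records" }')

# Range pattern: from a record matching /START/ through the next one matching /STOP/.
ranged=$(printf '%s\n' "$lines" | awk '/START/,/STOP/ { print }')

# Expression as a pattern: with no action given, matching records are printed.
long_lines=$(printf '%s\n' "$lines" | awk 'length($0) > 4')

printf '%s\n' "$counted" "$ranged" "$long_lines"
```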
Example
• awk < words '/^parse/,/^parsley/{print $0}'
• awk < words '/parse/,/parsley/{print $0}'
• awk < words ' $1 ~ /parse/ {print $0}'
• awk < words ' $1 == "parse" {print $0}'
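The four commands differ in subtle ways, which a small word list makes visible. This is a sketch only: the word list below is invented for illustration and is not the words file used in the talk, and the shell variable names are hypothetical.

```shell
# Hypothetical stand-in for the "words" file, one word per line.
words=$(mktemp)
printf '%s\n' parse parsec parser parsley parsnip sparse > "$words"

# Anchored range: starts at a line beginning with "parse",
# ends at the next line beginning with "parsley".
anchored=$(awk '/^parse/,/^parsley/ { print $0 }' "$words")

# Unanchored range: "sparse" also contains "parse", so it opens a second range.
unanchored=$(awk '/parse/,/parsley/ { print $0 }' "$words")

# Regex match on field 1: any word containing "parse".
contains=$(awk '$1 ~ /parse/ { print $0 }' "$words")

# String equality on field 1: only the exact word "parse".
exact=$(awk '$1 == "parse" { print $0 }' "$words")

printf '%s\n' "$anchored" --- "$unanchored" --- "$contains" --- "$exact"
rm -f "$words"
```

With this list, the anchored range stops at "parsley", the unanchored range additionally prints "sparse", the regex match skips "parsley" and "parsnip", and the equality test prints a single line.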
Trivia
• Awk is most often used for one-line, throwaway scripts, but some interesting things have been done using the language. Henry Spencer wrote "The Amazing Awk Assembler" (aaa) entirely in awk and sed. It converts assembly language for the old Motorola 68XX chips into machine code.