Mastering AWK: An Essential Guide for Unix Users

Unix Talk #2 • AWK overview • Patterns and actions • Records and fields • Print vs. printf

Introduction • Students' grades in a text file: • John 22 56 38 70 85 80 • Alex 90 89 79 98 35 • How can I calculate John's current average from this file? • GREP? • Searching for John with grep gives me the line. • Then I can use my calculator to work out the average. • SED? • sed lets me print, change, delete, etc. • But I really want to manipulate the values within this line automatically. • This is where awk comes in. • (awk me amadeus)
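Here is a minimal awk sketch that averages John's scores. It assumes the two sample lines are stored in a file named grades (a hypothetical name). It also assumes that every field after the name is a score.

```shell
# Create the hypothetical grades file from the sample lines
cat > grades <<'EOF'
John 22 56 38 70 85 80
Alex 90 89 79 98 35
EOF

# On the line whose first field is John, add fields 2..NF and divide
# by the number of scores (NF - 1, because field 1 is the name)
awk '$1 == "John" {
    sum = 0
    for (i = 2; i <= NF; i++) sum += $i
    printf "%s %.2f\n", $1, sum / (NF - 1)
}' grades
```

The program uses NF instead of a fixed count. The same program therefore also works for Alex's line, which has five scores instead of six.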

awk • The name comes from the first initials of the last names of its authors: Aho, Weinberger and Kernighan • Which awk are we tawking about? • awk • nawk – new awk (on CS machines) • gawk – GNU awk (bart)

AWK syntax • awk '/pattern/' file • awk '{action}' file • awk '/pattern/ {action;}' file • cat file | awk '{action}' • awk automatically reads the file for you, line by line. • No need to open/close the file (as you would in C or Java). • The pattern section FINDS LINES matching that pattern • The action section performs the actions you defined on the lines it found • The original file does not change.
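The four syntax forms can be tried on a tiny file. The file name fruits and its contents are hypothetical.

```shell
# Hypothetical sample file
printf 'apple 1\nbanana 2\ncherry 3\n' > fruits

awk '/an/' fruits                # pattern only: prints matching lines (banana 2)
awk '{ print $1 }' fruits        # action only: runs on every line
awk '/an/ { print $2; }' fruits  # pattern + action: prints 2
cat fruits | awk '{ print $2 }'  # same action, input from a pipe
cat fruits                       # the original file is unchanged
```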

Simple example • awk '{ print }' fruit_prices • Note: here the pattern is missing, so the action (the awk command print) is applied to every line read, printing each one

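Simple example • awk '/\$[0-9]*\.[0-9][0-9]*/ { print }' fruit_prices • Prints only the lines containing a price written as a $ followed by digits, a decimal point and at least one more digit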
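The talk never shows the contents of fruit_prices. The file below is invented for illustration only. Only lines with a $-prefixed price pass the pattern.

```shell
# Hypothetical fruit_prices file (the quoted 'EOF' keeps the $ signs literal)
cat > fruit_prices <<'EOF'
Fruit Price/lb
Banana $0.89
Apple $1.99
Cherry $12.50
Mango 2.00
EOF

# \$ matches a literal dollar sign and \. a literal dot;
# the header and the Mango line (no $) are not printed
awk '/\$[0-9]*\.[0-9][0-9]*/ { print }' fruit_prices
```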

Action • Actions are written by the programmer; they are not limited to a fixed set of commands such as print, delete, etc. (p/d/s in sed). That is why awk is so awesome! • Actions consist of • variable assignments, • arithmetic and logic operators, • decision structures, • looping structures. • For example, print, if, while and for • awk '{print}' filename
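The following sketch combines an assignment, a loop and a decision in one action to find each student's best score. The input lines are hypothetical.

```shell
# Hypothetical input: a name followed by three scores
printf 'John 22 56 38\nAlex 90 89 79\n' | awk '{
    best = $2 + 0                    # variable assignment (+ 0 forces a number)
    for (i = 3; i <= NF; i++)        # looping structure
        if ($i + 0 > best)           # decision structure
            best = $i + 0
    print $1, "best score:", best    # print joins the items with OFS (a space)
}'
```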

Execution types • format 1: awk 'script' • where INPUT must come from a pipe or STDIN • command | awk 'script' • format 2: awk 'script' input1 input2 ... inputn • where we supply input FILES as input1, input2, etc. • format 3: awk -f script_file input1 ... • (in a script file, # starts a comment)
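The three formats can be compared side by side. The contents of input1 and input2 and the script name nf.awk are hypothetical.

```shell
# Hypothetical input files
printf 'a b c\nd e\n' > input1
printf 'f\n' > input2

# format 1: input from a pipe
printf 'x y\n' | awk '{ print NF }'

# format 2: input files named on the command line
awk '{ print NF }' input1 input2

# format 3: the program lives in a script file (hypothetical nf.awk)
cat > nf.awk <<'EOF'
# nf.awk: print the number of fields on each input line
{ print NF }   # a comment may also follow code
EOF
awk -f nf.awk input1 input2
```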

Pattern • Types • Regular expressions • BEGIN • does its actions BEFORE reading any input • END • does its actions AFTER reading ALL input • Pattern is optional • If no pattern is specified, the action runs for EVERY LINE, one at a time. • awk '{Action}' filename • awk '{print;}' names prints all lines • awk 'BEGIN {print "The average grades"}'
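BEGIN and END can be seen in isolation. The two input lines in the second command are hypothetical.

```shell
# BEGIN-only program: the action runs and awk exits without reading any input
awk 'BEGIN { print "The average grades" }'

# END runs once, after the last line is read; NR still holds the line count
printf 'John 22\nAlex 90\n' | awk 'END { print NR " students" }'
```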

Awk Regular Expression Metacharacters • Supports • ^, $, ., *, +, ?, [ABC], [^ABC], • [A-Z], A|B, (AB)+, \, & (in the replacement text of sub/gsub, & stands for the matched text) • Does not support • Backreferences (\1, \2, ...), • BRE-style repetition \{ \}
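A few of the supported metacharacters can be tested against a small hypothetical word list. Saved as the file words, it illustrates alternation, grouping with +, a negated bracket, and & in sub().

```shell
printf 'cat\ndog\ncow\nabab\n' > words    # hypothetical sample words

awk '/^(cat|dog)$/' words    # alternation A|B: cat, dog
awk '/^(ab)+$/' words        # grouping with +: abab
awk '/^c[^a]w$/' words       # negated bracket [^ABC]: cow

# In the replacement text of sub()/gsub(), & stands for the matched text
echo 'price 5' | awk '{ sub(/[0-9]+/, "[&]"); print }'    # price [5]
```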

awk ' BEGIN { actions ; } /pattern/ { actions ; } /pattern/ { actions ; } END { actions ; } ' files • Execution steps: 1. If a BEGIN pattern is present, executes its actions 2. Reads an input line and parses it into fields 3. Compares each of the specified patterns against the input line; if it finds a match, executes the actions. This step is repeated for all patterns. 4. Repeats steps 2 and 3 while input lines are present 5. After the script reads all the input lines, if the END pattern is present, executes its actions
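The following trace makes the execution steps visible. Every pattern is checked against every line, so "banana" triggers both actions. The two input lines are hypothetical.

```shell
printf 'apple\nbanana\n' | awk '
BEGIN { print "begin" }
/a/   { print "pattern 1 matched: " $0 }   # every line containing "a"
/an/  { print "pattern 2 matched: " $0 }   # only lines containing "an"
END   { print "end after " NR " lines" }'
```

This prints "begin", then the pattern 1 match for apple, then both the pattern 1 and pattern 2 matches for banana, and finally "end after 2 lines".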

Try This! • Place the following in the file tryawk1.awk: BEGIN { print "Starting to read input"; nLines = 0; } /^.*$/ { nLines++; } END { print "DONE: Total lines = " nLines; } • Run the command: cat tryawk1.awk | awk -f tryawk1.awk • Counts the # of lines in the input • nLines is a variable … note: NO declaration, just use it • The print command prints a line of text and adds a newline to the end of the line
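The exercise can be reproduced with a heredoc. Use plain ASCII quotes, because awk rejects curly quotes. The built-in NR gives the same count without a hand-made counter.

```shell
# Create tryawk1.awk with straight ASCII quotes
cat > tryawk1.awk <<'EOF'
BEGIN { print "Starting to read input"; nLines = 0; }
/^.*$/ { nLines++; }
END { print "DONE: Total lines = " nLines; }
EOF

cat tryawk1.awk | awk -f tryawk1.awk

# NR already holds the number of records read, so this prints the same total
awk 'END { print "DONE: Total lines = " NR }' tryawk1.awk
```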

Records and fields • awk has RECORDS (lines) and FIELDS • $0 represents the entire line of input • $1 represents the first field • print works much like echo • print $1 $2 # $1 concatenated with $2 • print $1, $2 # $1, then OFS, then $2 • cat fruit_prices • awk '{print;}' fruit_prices #prints all lines • awk '{print $0;}' fruit_prices #prints each entire line • awk '{print $1;}' fruit_prices #prints first field in each line • awk '{print $2;}' fruit_prices #prints second field in each line
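The difference between concatenation and the comma shows up on a single hypothetical line.

```shell
echo 'Apple 1.99' | awk '{ print $1 $2 }'                      # Apple1.99 (concatenated)
echo 'Apple 1.99' | awk '{ print $1, $2 }'                     # Apple 1.99 (OFS is a space)
echo 'Apple 1.99' | awk 'BEGIN { OFS = "|" } { print $1, $2 }' # Apple|1.99
```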

Examples • cat phones.data • John Robinson 234-3456 • Yin Pan 123-4567 • awk '{ print $1, $2, $3 }' phones.data • John Robinson 234-3456 • Yin Pan 123-4567 • awk '{ print $2 ", " $1, $3 }' phones.data • Robinson, John 234-3456 • Pan, Yin 123-4567 • awk '/^$/ { print x += 1 }' phones.data # prints a running count of blank lines • awk '/Mary/ { print $0 }' phones.data # prints the lines containing Mary
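To get output like "Robinson, John 234-3456", the ", " string must be concatenated onto $2. Passing it as a separate print argument does not work, because every comma in a print list inserts OFS. The sketch below recreates phones.data from the two lines shown.

```shell
cat > phones.data <<'EOF'
John Robinson 234-3456
Yin Pan 123-4567
EOF

# ", " as its own argument: OFS (a space) lands on both sides of it
awk '{ print $2 ", ", $1, $3 }' phones.data   # Robinson ,  John 234-3456 ...

# ", " concatenated onto $2: only the remaining comma inserts OFS
awk '{ print $2 ", " $1, $3 }' phones.data    # Robinson, John 234-3456 ...
```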

Examples (cont'd) • ls -l | awk ' $6 == "Oct" { sum += $5 ; } END { print sum ; } ' • Adds up the sizes (field 5) of the files whose month (field 6) is Oct • ls -l | awk -f block_use.awk • cat block_use.awk • $6 == "Oct" { sum += $5 ; } END { print sum ; }
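`ls -l` output differs between systems. This sketch therefore feeds hypothetical listing lines in the common layout, with the size in field 5 and the month in field 6.

```shell
# Hypothetical `ls -l` lines; the "total" line has no field 6, so it never matches
printf '%s\n' \
    'total 12' \
    '-rw-r--r-- 1 yin staff 1200 Oct 3 10:15 a.txt' \
    '-rw-r--r-- 1 yin staff 300 Oct 9 08:00 b.txt' \
    '-rw-r--r-- 1 yin staff 5000 Sep 1 12:00 c.txt' |
awk '$6 == "Oct" { sum += $5 } END { print sum + 0 }'   # 1500
```

The `+ 0` is a small design choice. It makes the program print 0 instead of an empty line when no file matches.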

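Taking Pattern-specific Actions
#!/bin/sh
awk '
/\$[1-9][0-9]*\.[0-9][0-9]*/ { print $0, "*" ; }
/\$0\.[0-9][0-9]*/ { print ; }
' fruit_prices
• Lines priced at $1.00 or more are printed with a trailing "*"; lines priced under $1.00 are printed unchanged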
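The pattern-specific script can be run against an invented fruit_prices file. The file is hypothetical, as the talk does not show its contents.

```shell
# Hypothetical fruit_prices file
cat > fruit_prices <<'EOF'
Fruit Price/lb
Banana $0.89
Apple $1.99
Cherry $12.50
Mango 2.00
EOF

awk '
/\$[1-9][0-9]*\.[0-9][0-9]*/ { print $0, "*" ; }   # $1.00 and up: flag with *
/\$0\.[0-9][0-9]*/           { print ; }            # under $1.00: print as-is
' fruit_prices
```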

Intrinsic variables • awk defines RECORDS (lines) and FIELDS • FS, input field separator (default = space/tab) • OFS, output field separator (default = space) • ORS, output record separator (default = newline) • RS, input record separator (default = newline) • NR, number of the current record being processed • NF, number of fields in the current record • FILENAME, awk sets this variable to the name of the file it is currently reading. (If you have more than one input file, awk resets this variable as it reads each file in turn.)
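NR, NF and FILENAME can be printed together across two hypothetical files. This shows that NR keeps counting across files while FILENAME changes.

```shell
printf 'a b c\nd e\n' > one.txt    # hypothetical input files
printf 'f\n' > two.txt

awk '{ print FILENAME, NR, NF }' one.txt two.txt
```

This prints "one.txt 1 3", "one.txt 2 2" and "two.txt 3 1".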

How does awk work • awk '{print $1, $3}' names • Reads a line of input into $0, using RS to decide where the line ends • Breaks the line into fields based on FS and stores them in numbered variables, starting with $1 • Prints the fields with print (or other statements), using OFS to separate fields • After awk displays its output, it goes to the next line and repeats. The output lines are separated by ORS.
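Changing OFS and ORS in BEGIN makes both separators visible in the output. The input lines are hypothetical.

```shell
printf 'Yin Pan 123-4567\nJohn Robinson 234-3456\n' |
awk 'BEGIN { OFS = ":"; ORS = ";\n" } { print $1, $3 }'
```

This prints "Yin:123-4567;" and "John:234-3456;". Each comma in print becomes ":" (OFS), and every output line ends with ";" plus a newline (ORS).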

Changing the Input Field Separator • Manually resetting FS in a BEGIN pattern • Forces you to hard-code the value of the field separator • BEGIN{FS=":" ; } • Example: • $ awk 'BEGIN { FS=":" ; } { print $1, $6 ; }' /etc/passwd • Specifying the -F option to awk • awk -F: ' { … } ' • Enables using a shell variable to specify the field separator dynamically • Example: • sep=':' • $ awk -F"$sep" ' { print $1, $6 ; }' /etc/passwd
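Both ways of setting FS can be checked on a small copy instead of the real /etc/passwd. The two lines in passwd.sample are hypothetical.

```shell
# Two hypothetical /etc/passwd-style lines (field 1 = user, field 6 = home directory)
printf '%s\n' 'root:x:0:0:root:/root:/bin/sh' \
              'yin:x:1000:1000:Yin Pan:/home/yin:/bin/bash' > passwd.sample

awk 'BEGIN { FS = ":" } { print $1, $6 }' passwd.sample   # FS hard-coded in BEGIN

sep=':'
awk -F"$sep" '{ print $1, $6 }' passwd.sample             # FS taken from a shell variable
```

Both commands print "root /root" and "yin /home/yin".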

Example • Each record in the input file spans three lines, each with its own separator: • FirstName;LastName;Address;City;State;Zip;Phone • SSN:DOB:NumberOfDependents • HospitalizationCode,DentalCode,LifeCode • Convert this file format to: • SSN,LastName,FirstName,Address,….

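awk 'BEGIN { OFS=","; FS=";" }
NR%3==1 { F=$1; L=$2; A=$3; …; FS=":" }          # line 1 of a record; set FS for line 2
NR%3==2 { SSN=$1; DOB=$2; …; FS="," }            # line 2; set FS for line 3
NR%3==0 { …; print SSN, L, F, A, …; FS=";" }     # line 3; print, then reset FS for the next record
' filename
• A new FS value only applies to the NEXT line read, so each block reads its own fields first and then sets the separator for the line that follows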
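A complete, runnable version needs choices that the slide leaves open. The sketch below assumes a hypothetical input file, employees.txt, with invented data. It also assumes an output order in which the fields after Address simply follow their input order. Both are illustrative assumptions, not the talk's own solution.

```shell
# Hypothetical input: two three-line records
cat > employees.txt <<'EOF'
Yin;Pan;1 Main St;Rochester;NY;14623;123-4567
111-22-3333:1980-01-01:2
H1,D2,L3
Ann;Lee;2 Oak Ave;Albany;NY;12203;555-0100
222-33-4444:1975-05-05:0
H9,D8,L7
EOF

awk '
BEGIN { OFS = ","; FS = ";" }          # line 1 of the first record uses ";"
NR % 3 == 1 {                          # name/address line
    first = $1; last = $2; addr = $3; city = $4; state = $5; zip = $6; phone = $7
    FS = ":"                           # applies when the NEXT line is read
    next
}
NR % 3 == 2 {                          # SSN line
    ssn = $1; dob = $2; deps = $3
    FS = ","
    next
}
NR % 3 == 0 {                          # code line: print the merged record
    print ssn, last, first, addr, city, state, zip, phone, dob, deps, $1, $2, $3
    FS = ";"                           # back to ";" for the next record
}' employees.txt
```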

Print vs. Printf • printf • 1st argument is a string … the 'format' • Prints each character of the format • Upon reaching a %, the next few characters are a format specifier • The next argument is printed according to the specifier • Does not append a newline • More control over appearance of output • Consider awk 'BEGIN { printf "%5.2f\n", 2/3; }' • Prints ␣0.67 (here, ␣ represents a space) • %5.2f means print a floating-point number (the 'f') in a field 5 characters wide, with 2 digits to the right of the decimal point.
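Brackets around the field make the padding in %5.2f visible. A second command shows that printf adds no newline of its own.

```shell
# 0.67 is four characters, so one space pads it to width 5
awk 'BEGIN { printf "[%5.2f]\n", 2/3 }'               # [ 0.67]

# printf adds no newline, so two calls can build one line
awk 'BEGIN { printf "Total: "; printf "%d\n", 42 }'   # Total: 42
```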

Why Printf • printf - for formatting the output of your "print" • We already have print, so why printf? • printf allows us to FORMAT output. • can force a value to print as a string • decimals • whole numbers • how many digits fall on either side of the decimal point • scientific notation • make things line up nicely

printf • printf (format, what to print) • printf ( "%s", x) • %s is a PLACEHOLDER for some OUTPUT. • s is a specific type of output (string) • Each placeholder (such as %s) needs exactly ONE value in the "what to print" list • The format goes inside quotes, followed by a comma, followed by the variables to print outside the quotes. • printf ( " s = %s ", x ) • " s = " is LITERAL text, printed as-is

Printf format • s = a character string • f = a floating-point number • d or i = the integer part of a decimal number • e = scientific notation of a floating-point number; g = %e or %f, whichever is shorter • c = an ASCII character • if x=65 and I use this printf statement • printf ( " s = %c ", x ) • output is " s = A " • awk 'BEGIN{x=65; printf("char: %c\n", x)}'
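One awk program can exercise each format letter. The values are arbitrary examples.

```shell
awk 'BEGIN {
    x = 65
    printf "%s|%d|%c\n", "hi", 3.99, x      # hi|3|A  (%d drops the fraction)
    printf "%f|%e\n", 2/3, 1234.5           # 0.666667|1.234500e+03
    printf "%g|%g\n", 1234.5, 0.0000123     # 1234.5|1.23e-05
}'
```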

Printf • More control: • %wd • Print an integer in a field of width w • If the number is shorter than w characters, print leading spaces • Try awk 'BEGIN { printf "%10d\n", 10; }' /dev/null • Try adding a '-' immediately after the % • Left-justifies the value in the field

Printf • %ws • Print a string out in a field of width w • Supply leading spaces as necessary • Place a ‘-’ immediately after the % to get left justification

Printf • %w.df • Prints the value in a field of width w • Prints d digits after the decimal point • Place a '-' immediately after the % to get left justification
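The width specifiers %wd, %ws and %w.df behave the same way. Brackets show the padding, without and with the '-' flag.

```shell
awk 'BEGIN {
    printf "[%5d][%-5d]\n", 42, 42                 # [   42][42   ]
    printf "[%8s][%-8s]\n", "Apple", "Apple"       # [   Apple][Apple   ]
    printf "[%8.3f][%-8.3f]\n", 3.14159, 3.14159   # [   3.142][3.142   ]
}'
```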

Printf examples • Apple 10 20 25 • <---10----><-5-><-5-><-5-> • awk '{printf (" %10s %5d %5d %d\n", $1, $2, $3, $4 )}' file • awk '{printf (" %-10s %5d %5d %d\n", $1, $2, $3, $4 )}' file • the minus sign designates that this field will be LEFT JUSTIFIED • awk '{printf (" %-10s %-5d %-5d %d\n", $1, $2, $3, $4 )}' file • awk '{printf ("|%-15s|\n", $1)}' file
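The 10/5/5/5 ruler can be followed exactly by leaving out the spaces between specifiers. With fixed widths, every output line has the same length (25 characters here). The Kiwi line is hypothetical.

```shell
printf 'Apple 10 20 25\nKiwi 5 150 7\n' |
awk '{ printf "%-10s%5d%5d%5d\n", $1, $2, $3, $4 }'
```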

Printf examples • Let's put an average in there... • printf (" %-10s %-5d %-5d %-5d %f ", $1, $2, $3, $4, average ) • %f prints 6 digits to the RIGHT of the decimal point by default • printf (" %-10s %-5d %-5d %-5d %.2f ", $1, $2, $3, $4, average ) • %.2f says use TWO digits to the RIGHT of the decimal point • printf doesn't provide the newline automatically.... • printf (" %-10s %-5d %-5d %-5d %.2f\n", $1, $2, $3, $4, average )
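The slide never shows how average is computed. This sketch assumes it is the mean of the three numbers on the line.

```shell
echo 'Apple 10 20 25' | awk '{
    average = ($2 + $3 + $4) / 3    # assumed: the mean of fields 2..4
    printf "%-10s %-5d %-5d %-5d %f\n",   $1, $2, $3, $4, average   # ends in 18.333333
    printf "%-10s %-5d %-5d %-5d %.2f\n", $1, $2, $3, $4, average   # ends in 18.33
}'
```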

The OFMT variable (stands for Output Formatting for numbers) • A special awk variable • Controls how print formats non-integer numbers (default "%.6g") • awk 'BEGIN{print 1.243434534;}' # prints 1.24343 • awk 'BEGIN{OFMT="%.2f"; print 1.23344455;}' # prints 1.23
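The commands below compare the default OFMT with a custom one. The last line shows that OFMT is not used for values that are whole numbers, which print as integers.

```shell
awk 'BEGIN { print 1.243434534 }'                 # default "%.6g": 1.24343
awk 'BEGIN { OFMT = "%.2f"; print 1.23344455 }'   # 1.23
awk 'BEGIN { OFMT = "%.2f"; print 1.5 + 1.5 }'    # integral value, printed as 3
```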


UNIX Talk 2007

talk, talk, talk
