CSC 4630 Meeting 12 Exam Reprise
Regular Expressions
In awk, distinguish among
• /[a-dw-z]/
• /^[a-dw-z]/
• /[^a-dw-z]/
• /^[a-dw-z]$/
• !/[a-dw-z]/
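To make the five patterns concrete, here is a small sketch that runs each one against the same four lines. The sample lines and the helper name try are assumptions chosen for illustration; LC_ALL=C is set so the ranges a-d and w-z cover only plain ASCII letters.

```shell
# Hypothetical sample input: four lowercase lines.
# try PATTERN prints the matching lines joined by spaces.
try() {
  printf 'cab\nmoon\nx\necho\n' | LC_ALL=C awk "$1" | paste -s -d ' ' -
}

r1=$(try '/[a-dw-z]/')     # line contains a letter from the set
r2=$(try '/^[a-dw-z]/')    # line starts with a letter from the set
r3=$(try '/[^a-dw-z]/')    # line contains a character outside the set
r4=$(try '/^[a-dw-z]$/')   # line is exactly one letter from the set
r5=$(try '!/[a-dw-z]/')    # line contains no letter from the set

printf '%s\n' "$r1" "$r2" "$r3" "$r4" "$r5"
# cab x echo
# cab x
# moon echo
# x
# moon
```

Note that /[^a-dw-z]/ and !/[a-dw-z]/ are not the same test: echo matches the first (it contains e, h, o) but not the second (it also contains c).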
Variables in Scripts
• Built-in vs. user defined
• Initialization conventions
• Values vs. names
• Types and coercion
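A minimal sketch of the initialization and coercion conventions in awk. The variable names n and s are assumptions for illustration; neither is assigned before use.

```shell
out=$(awk 'BEGIN {
  print n + 1        # n is unassigned, so its numeric value is 0
  print "[" s "]"    # s is unassigned, so its string value is empty
  print "3" + 4      # the string "3" is coerced to a number
  print 3 4          # the numbers are coerced to strings and concatenated
}' | paste -s -d ' ' -)

echo "$out"   # 1 [] 7 34
```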
awk Variables
Compare
• NF
• $NF
• NR
• $NR
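The difference between a variable's value (NF, NR) and the field it names ($NF, $NR) can be seen by printing all four for each line. The three input lines below are made up for illustration.

```shell
out=$(printf 'a b c\nd e\nf g h i\n' | awk '{ print NR, NF, $NF, $NR }')

echo "$out"
# 1 3 c a
# 2 2 e e
# 3 4 i h
```

NF is the number of fields and $NF is the last field; NR is the line number and $NR is whichever field happens to have that number, which is rarely what is wanted.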
awk Variables (2)
Example: Implement wc for numnames
    {len += length($1)}
    END {print NR, NR, NR + len}
Notes:
• len initializes to 0
• NR increments with each line read
• NF is consistently 1, so the word count equals the line count NR
• length($0) = length($1) = length($NF)
• length does not count the \n line terminators, so the character count is NR + len
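As a check, the script can be run next to the real wc. The contents of numnames are not shown in the slides, so this sketch assumes a stand-in input with one word per line and no surrounding blanks.

```shell
# Hypothetical stand-in for numnames: one word per line.
mine=$(printf 'one\ntwo\nthree\n' |
  awk '{len += length($1)} END {print NR, NR, NR + len}')

# wc pads its columns, so normalize the spacing before comparing.
real=$(printf 'one\ntwo\nthree\n' | wc | awk '{print $1, $2, $3}')

echo "$mine"   # 3 3 14
echo "$real"   # 3 3 14
```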
awk Strings
Suppose each line of a file looks like the 14-character string
    (610) 519-4505
How many fields are there in the string?
• If using the default FS, two fields, lengths 5 and 8
• If FS = "-", two fields, lengths 9 and 4
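Both answers can be checked directly by printing NF and the field lengths under each separator:

```shell
# Default FS: split on runs of spaces and tabs.
d=$(printf '(610) 519-4505\n' | awk '{ print NF, length($1), length($2) }')

# FS set to "-" with the -F option.
h=$(printf '(610) 519-4505\n' | awk -F- '{ print NF, length($1), length($2) }')

echo "$d"   # 2 5 8
echo "$h"   # 2 9 4
```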
awk Strings (2)
Reformatting the string (610) 519-4505 and others of this form
    {print substr($1,2,3) "-" $2}
Notes:
• No pattern means the action applies to all lines
• No commas in the print statement means the strings are concatenated; commas mean OFS is inserted between them
• Classic awk does not allow multiple simultaneous field separators (except the default space and tab); POSIX awk treats an FS longer than one character as a regular expression
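A short sketch of the concatenation and comma behavior in print, using the same phone number. Setting OFS with -v is one way to change what the comma inserts.

```shell
# No commas: the three strings are concatenated.
a=$(printf '(610) 519-4505\n' | awk '{ print substr($1,2,3) "-" $2 }')

# A comma: the default OFS, a single space, goes between the items.
b=$(printf '(610) 519-4505\n' | awk '{ print substr($1,2,3), $2 }')

# A comma with OFS set to "-".
c=$(printf '(610) 519-4505\n' | awk -v OFS=- '{ print substr($1,2,3), $2 }')

echo "$a"   # 610-519-4505
echo "$b"   # 610 519-4505
echo "$c"   # 610-519-4505
```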
awk Scripts
Given data separated by tabs, where the fifth field is an elapsed time given as hh:mm, the script computes the average elapsed time.
Try again (by Wednesday), using a template that
• has one action statement in the body
• has one calculation and one print statement for the END pattern
• uses one user-defined variable called t
• uses the + * / % operators and the int and substr functions
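One possible shape for a script that fits this template, offered as a sketch rather than the expected solution. It assumes hh and mm are always two digits, that the input has at least one line, and that the average is truncated to whole minutes; the three data lines are invented.

```shell
# Hypothetical data: five tab-separated fields, elapsed time in field 5.
avg=$(printf 'a\tb\tc\td\t01:30\na\tb\tc\td\t02:15\na\tb\tc\td\t00:45\n' |
  awk -F'\t' '
    # Body: add this line in minutes (hh * 60 + mm) to the running total t.
    { t += substr($5, 1, 2) * 60 + substr($5, 4, 2) }
    # END: t becomes the average in whole minutes, then print as h:m.
    END { t = int(t / NR); print int(t / 60) ":" (t % 60) }')

echo "$avg"   # 1:30
```

Because print is used instead of printf, a minutes value below 10 appears as a single digit (for example 1:5).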
awk Scripts (2)
Given a file of words, one per line, the script returns a frequency count of the letters in the words.
Try again (by Wednesday), using a template that
• has one action statement in the body, a for loop
• has one statement for the END pattern, a for loop that controls the printing
• uses one user-defined variable, an array called lc
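One possible shape for a script that fits this template, again as a sketch under assumptions. The loop variables i and c are needed in addition to lc, the two input words are invented, and the output is piped through sort because awk does not define the order of a for (c in lc) loop.

```shell
freq=$(printf 'abba\ncab\n' |
  awk '
    # Body: step through the word one character at a time and count it.
    { for (i = 1; i <= length($1); i++) lc[substr($1, i, 1)]++ }
    # END: print each letter with its count.
    END { for (c in lc) print c, lc[c] }' |
  LC_ALL=C sort)

echo "$freq"
# a 3
# b 3
# c 1
```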