CSC 4630

CSC 4630, Meeting 7, February 7, 2007. More Scripting Languages: awk, named for Aho, Weinberger, and Kernighan. An awk script is embedded in a nested looping control structure: for each input file line, for each pattern {action} statement, if the pattern matches the line then carry out the action.

Presentation Transcript


  1. CSC 4630 Meeting 7 February 7, 2007

  2. More Scripting Languages
     • awk, named for Aho, Weinberger, Kernighan
     • Script is embedded in a nested looping control structure:
       for each input file line do
         for each pattern {action} do
           if pattern matches line then action
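A minimal sketch of that implicit loop, assuming a POSIX awk on the PATH. The function name loop_demo and the two-line sample input are hypothetical. The order of the output suggests that awk finishes every pattern {action} statement for one line before it reads the next line.

```shell
# Hypothetical two-line input; each statement is tried, in order, on every line.
loop_demo() {
  printf 'banana\napple\n' |
    awk '/a/ { print "A:", $0 }
         /b/ { print "B:", $0 }'
}
loop_demo
```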

  3. awk Programs
     • Generally a sequence of pattern {action} statements
     • If {action} is missing, matched lines are printed (meaning written to STDOUT)
     • If pattern is missing, action is carried out for all lines
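The two defaults can be tried side by side. This is only a sketch: the function names and the sample names fed through printf are made up for illustration.

```shell
# Pattern only: lines that match /M/ are printed unchanged.
no_action() {
  printf 'Mary\nTom\nMike\n' | awk '/M/'
}

# Action only: the action runs for every input line.
no_pattern() {
  printf 'Mary\nTom\n' | awk '{ print "seen:", $0 }'
}

no_action
no_pattern
```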

  4. Running awk Programs
     • Short one, composed at the keyboard with little thought
       • awk 'program' file1 file2 …
       • Note that awk can take a sequence of files as input.
     • Long one, composed in an editor
       • awk -f progfile file1 file2 …
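A sketch of the -f form, assuming mktemp is available. The temporary program file stands in for progfile, and the input is piped in rather than named on the command line; both choices are for illustration only.

```shell
# Write a one-line awk program to a temporary file, then run it with -f.
run_from_file() {
  prog=$(mktemp)                      # hypothetical stand-in for progfile
  printf '{ print $2 }\n' > "$prog"
  printf 'a b\nc d\n' | awk -f "$prog"
  rm -f "$prog"
}
run_from_file
```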

  5. awk's View of Files
     • Input to awk is a set of text files
     • Each file is divided into lines
     • Each line is divided into fields by blanks or tabs (the default separator)
     • Each field is referenced by relative number: $1, $2, $3, …
     • $0 refers to the entire line

  6. Examples
     • awk '{print $1}' names
       • Print the first field in each line of the names file
     • awk '/M/' names
       • Print each line of the names file that contains an upper case M
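Field references can also be reordered and mixed with literal text. The sketch below uses made-up input in place of the names file, and the function name is hypothetical.

```shell
# Swap the first two whitespace-separated fields of each line.
fields_demo() {
  printf 'Ada Lovelace 1815\nAlan Turing 1912\n' |
    awk '{ print $2 ", " $1 }'
}
fields_demo
```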

  7. Some Built-In Variables
     • NR, line number of the current line of input (runs sequentially over all input files)
     • NF, number of fields in the current line
     • FS, the field separator
       • FS = "\t" sets the separator to tab only
       • FS = ":" sets the separator to colon
     • FNR, number of the current line (record) in the current input file (resets when a new input file is opened)

  8. Examples
     • {print NR, NF}
     • {print NR, $0}
     • {print $NF}
     • NR == 10
     • NF != 3
     • NF > 4
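A sketch of the built-in variables at work. The sample records, the temporary files and the function names are all hypothetical, and FS is set inside a BEGIN action so that it takes effect before the first line is split.

```shell
# NR, NF and $NF with a colon separator.
vars_demo() {
  printf 'root:x:0\ndaemon:x:1:extra\n' |
    awk 'BEGIN { FS = ":" } { print NR, NF, $NF }'
}

# NR keeps counting across files; FNR restarts for each file.
nr_fnr_demo() {
  dir=$(mktemp -d)
  printf 'a\nb\n' > "$dir/one"
  printf 'c\n'    > "$dir/two"
  awk '{ print NR, FNR, $0 }' "$dir/one" "$dir/two"
  rm -rf "$dir"
}

vars_demo
nr_fnr_demo
```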

  9. Patterns
     • Special patterns
       • BEGIN: action is done once before any lines of the input file(s) are read
       • END: action is done once after the last file has been processed
     • Relational expressions between strings or numbers
       • Arguments are treated as numbers, if possible
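One common use of BEGIN and END is a header followed by a running total. This is a sketch with made-up numeric input; the function name is hypothetical.

```shell
# BEGIN runs before any input, the middle action once per line, END after the last line.
begin_end_demo() {
  printf '3\n4\n5\n' |
    awk 'BEGIN { print "start" }
         { sum += $1 }
         END { print "lines:", NR, "sum:", sum }'
}
begin_end_demo
```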

  10. Comparison Operators
     • <    less than
     • >    greater than
     • <=   less than or equal to
     • >=   greater than or equal to
     • ==   equal to
     • !=   not equal to
     • ~    matches
     • !~   does not match

  11. Regular Expressions
     • Enclosed in / /
     • A bare /regexp/ pattern matches anywhere in the entire line
     • Field match specified as $3 ~ /Ab/, for example
     • Special symbols: \ ^ $ . [ ] * + ? ( ) |

  12. Examples
     • /Asia/
     • /^.$/
     • /a\$/
     • /\t/
     • $2 !~ /^[0-9]+$/
     • /(apple|cherry) (pie|tart)/ (note space)
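Two of these patterns can be run against small inputs. The inputs and function names below are invented for the sketch.

```shell
# Alternation with a literal space between the two groups.
regex_demo() {
  printf 'apple pie\ncherry tart\napple cake\n' |
    awk '/(apple|cherry) (pie|tart)/'
}

# Field-level test: keep lines whose second field is not all digits.
field_match_demo() {
  printf 'a 12\nb x7\n' | awk '$2 !~ /^[0-9]+$/'
}

regex_demo
field_match_demo
```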

  13. C Escape Sequences
     • \b     backspace
     • \f     formfeed
     • \n     newline
     • \r     carriage return
     • \t     tab
     • \ddd   character whose ASCII value in octal is ddd
     • \"     quotation mark
     • \c     any other character c literally

  14. Actions
     • Mini C-like programs
     • Can extend over several lines
     • Statements terminated by semicolons or newlines. Statements grouped with braces { }.
     • Variables are either floating point numbers or strings.
     • Variables are automatically declared and initialized
       • Strings initialized to "", the empty string
       • Numbers initialized to 0
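The automatic initialization can be observed directly. In this sketch n and s are used without any declaration, and never_set is a hypothetical variable that is never assigned at all.

```shell
# n starts at 0 and s at ""; never_set reads as "" in string context and 0 in numeric context.
init_demo() {
  printf 'x\ny\n' |
    awk '{ n++; s = s $0 }
         END { print n, "[" s "]", "[" never_set "]", never_set + 0 }'
}
init_demo
```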

  15. Assignment Statements
     • Simple version: v = e
       • Variable or field name assigned value of expression
     • Assignment operators: v op= e means v = v op e
       • Legal values of op are + - * / % ^
       • Used because the shorter form runs faster in interpreted code
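A short sketch that applies four of the assignment operators in turn; the starting value 7 is arbitrary and the function name is hypothetical.

```shell
# v goes 7 -> 10 -> 20 -> 2 -> 8.
assign_demo() {
  awk 'BEGIN {
    v = 7
    v += 3; print v
    v *= 2; print v
    v %= 6; print v
    v ^= 3; print v
  }'
}
assign_demo
```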

  16. Increment Operators
     • Borrowed from C
     • Prefix or postfix
     • ++ or --
     • Example: x = 3. What is the value of k?
       • k = x++
       • k = ++x
       • k = x--
       • k = --x
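The four cases can be checked by resetting x to 3 before each one. In this sketch each output line shows k followed by the new value of x; the function name is hypothetical.

```shell
# Postfix forms yield the old value of x; prefix forms yield the updated value.
incr_demo() {
  awk 'BEGIN {
    x = 3; k = x++; print k, x
    x = 3; k = ++x; print k, x
    x = 3; k = x--; print k, x
    x = 3; k = --x; print k, x
  }'
}
incr_demo
```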

  17. Arithmetic Functions
     • sin(x) assumes x is in radians
     • cos(x) assumes x is in radians
     • atan2(y,x) arctangent of y/x, in the range -pi to pi
     • exp(x) exponential
     • log(x) natural logarithm of x, so x > 0
     • sqrt(x) square root of x, so x >= 0
     • int(x) truncates fractional part
     • rand() takes no argument and returns a random number r with 0 <= r < 1
     • srand(x) sets the seed for rand to x
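A few of these functions can be evaluated at easy arguments. This sketch assumes the default output format for numbers; the seed 1 is arbitrary, and only the range of rand() is checked, not its value.

```shell
# int() truncates toward zero; atan2(0, -1) is pi; rand() stays in [0, 1).
math_demo() {
  awk 'BEGIN {
    print int(3.9), int(-3.9), sqrt(16), exp(0), log(1)
    print atan2(0, -1)
    srand(1)
    r = rand()
    in_range = (r >= 0 && r < 1)
    print in_range
  }'
}
math_demo
```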

  18. Strings
     • Literal values enclosed in double quotes
       "abc"   "Wildcats rule"   "20 bananas"
     • Concatenation represented by juxtaposition
       s = "Villanova"
       t = "Wildcats"
       {print s t}

  19. String Functions
     "Standard" string operations (cf. head, tail, firstfew, lastfew, allbut)
     • length(s) length of s
       • length with no argument means length($0)
     • index(s,t) if t is a substring of s, returns the position of its first character; returns 0 otherwise
     • substr(s,p) returns the substring of s that starts at position p (positions count from 1); returns the empty string if p > length(s)
     • substr(s,p,n) returns the substring of length n starting at position p
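A sketch that exercises concatenation by juxtaposition together with length, index and substr. It reuses the Villanova and Wildcats strings from the Strings slide; the function name is hypothetical.

```shell
# Positions are counted from 1, so "nova" starts at position 6 of "Villanova".
string_demo() {
  awk 'BEGIN {
    s = "Villanova"; t = "Wildcats"
    print s t
    print length(t)
    print index(s, "nova")
    print substr(s, 6)
    print substr(s, 1, 5)
  }'
}
string_demo
```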

  20. String Functions (2)
     "Editing" functions; r is a regular expression
     • sub(r,s) replace the first match of r by s in the current record
     • sub(r,s,t) replace the first match of r by s in t
     • gsub(r,s) replace every match of r by s in the current record
     • gsub(r,s,t) replace every match of r by s in t
     In all cases, return the number of substitutions
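The difference between the first-match and global forms shows up on a word with repeated letters. This is a sketch with the made-up input "banana"; each function prints the returned count and then the edited text.

```shell
# gsub on the current record: every "a" becomes "o".
gsub_demo() {
  printf 'banana\n' | awk '{ n = gsub(/a/, "o"); print n, $0 }'
}

# sub on a named variable: only the first "an" is replaced.
sub_demo() {
  awk 'BEGIN { t = "banana"; n = sub(/an/, "AN", t); print n, t }'
}

gsub_demo
sub_demo
```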

  21. Control Structures
     • if (<expression>) <s1> else <s2>
       • <expression> can be any expression; true is defined to be non-zero or non-null
       • <s1> and <s2> can be any group of statements
       • Note the critical parentheses that separate the conditional expression from <s1>

  22. Control Structures (2)
     • while (<expression>) <s1>
       • Same rules as for if-then-else
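A sketch combining while and if. The factorial computation, the threshold of 100 and the function name are only illustrative choices. The second if tests the empty string to show the non-null rule for truth.

```shell
# Compute 5! with while, then branch on the result and on an empty string.
control_demo() {
  awk 'BEGIN {
    n = 5; f = 1
    while (n > 1) { f *= n; n-- }
    if (f > 100) print "big", f
    else print "small", f
    if ("") print "empty string is true"
    else print "empty string is false"
  }'
}
control_demo
```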

  23. Control Structures (3)
     • for (<e1>;<e2>;<e3>) <s1> is equivalent to <e1>; while (<e2>) {<s1>;<e3>}
     • for (k in <array>) <s1> loops over the subscripts of an array, but the order in which the subscripts are visited is unspecified and should not be relied on
       • Careful: awk allows general subscripting. Strings can be used as subscripts.
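Both loop forms in a sketch with hypothetical names and input. Because the traversal order of for (k in count) is not fixed, the output is piped through sort to make it repeatable.

```shell
# C-style for: build the string "123".
for_demo() {
  awk 'BEGIN { for (i = 1; i <= 3; i++) s = s i; print s }'
}

# for-in over string subscripts: count how often each word occurs.
count_demo() {
  printf 'b\na\nb\n' |
    awk '{ count[$1]++ }
         END { for (k in count) print k, count[k] }' |
    sort
}

for_demo
count_demo
```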

  24. Control Structures (4)
     "Go to" structures
     • break: when executed within a for or while statement, causes an immediate exit from the loop
     • continue: when executed within a for or while statement, causes immediate execution of the next iteration
     • next: causes the next line (record) of the input file to be read and the sequence of pattern {action} statements to be executed on it
     • exit: causes the program to jump to the END pattern, execute it, and stop
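A sketch of all four statements. The input format (comment lines that start with #, and a "stop" marker line) is invented for the example, as are the function names.

```shell
# next skips comment lines; exit stops reading at "stop" but still runs END.
next_exit_demo() {
  printf '1\n#skip\n2\nstop\n3\n' |
    awk '/^#/ { next }
         $1 == "stop" { exit }
         { sum += $1 }
         END { print "sum:", sum }'
}

# continue skips even numbers; break leaves the loop once i exceeds 7.
break_continue_demo() {
  awk 'BEGIN {
    for (i = 1; i <= 10; i++) {
      if (i % 2 == 0) continue
      if (i > 7) break
      s = s i
    }
    print s
  }'
}

next_exit_demo
break_continue_demo
```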
