Math 272 AWK UTILITY
BY A Mikati & M Shaito
supervised by: Dr. A Nasri
Awk Utility • Introduction • Some basics • Some samples • Patterns & Actions • Regular Expressions • Boolean • start /end • BEGIN /END
Awk Utility (continued) • Awk variables • Control of flow statements: • a: If_Else statement • b: While Statement • c: For statement
Introduction: • History: • The name awk comes from the initials of its designers: Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan. The original version of awk was written in 1977. In 1985 a new version made the programming language more powerful, introducing user-defined functions, multiple input streams, and computed regular expressions.
Introduction (cont’d): If you are like many computer users, you would frequently like to make changes in various text files wherever certain patterns appear, or extract data from parts of certain lines while discarding the rest. To write a program to do this in a language such as C or Pascal is a time-consuming inconvenience that may take many lines of code. The job may be easier with awk. The awk utility interprets a special-purpose programming language that makes it possible to handle simple data-reformatting jobs easily with just a few lines of code.
Some Basics: • The basic function of awk is to search files for lines (or other units of text) that contain certain patterns. • Awk recognizes the concepts of "file", "record", and "field". • A file consists of records, which by default are the lines of the file. One line becomes one record. • Awk operates on one record at a time. • A record consists of fields, which by default are separated by any number of spaces or tabs. • Field number 1 is accessed with $1, field 2 with $2, and so forth. $0 refers to the whole record.
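The record and field model above can be tried on a small file. This is only a sketch: the file name sample.txt and its contents are made up for illustration.

```shell
# Hypothetical sample data: two records, fields separated by spaces or a tab.
tmp=$(mktemp -d)
printf 'alpha beta gamma\none   two\tthree\n' > "$tmp/sample.txt"

# For each record, print field 2, a separator, then the whole record ($0).
out=$(awk '{ print $2 " | " $0 }' "$tmp/sample.txt")
printf '%s\n' "$out"

rm -rf "$tmp"
```

Note that $0 keeps the original spacing of the line, while $2 is just the second field.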
Some Samples: Perhaps the quickest way of learning awk is to look at some sample programs.
>awk '{print $0}' filename
This one prints the file in its entirety, just like cat(1). Here are some others, along with a quick description of what they do.
>awk '{print $2,$1}' filename
prints the second field, then the first. All other fields are ignored. What if you don't want to apply the program to each line of the file? Say, for example, that you only want to process lines whose first field is greater than the second. The following program does that:
>awk '$1 > $2 {print $1,$2,$1-$2}' filename
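To see what the sample programs produce, they can be run on a few lines of numbers. The file nums.txt and its contents are assumptions made up for this sketch.

```shell
# Hypothetical input: three records with numeric first and second fields.
tmp=$(mktemp -d)
printf '10 3 x\n2 8 y\n7 7 z\n' > "$tmp/nums.txt"

# Second field, then first; the third field is ignored.
swapped=$(awk '{print $2,$1}' "$tmp/nums.txt")

# Only records where field 1 is greater than field 2, with the difference.
diffs=$(awk '$1 > $2 {print $1,$2,$1-$2}' "$tmp/nums.txt")

printf '%s\n---\n%s\n' "$swapped" "$diffs"
rm -rf "$tmp"
```

Only the first record passes the $1 > $2 test, so the second command prints a single line.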
Patterns & Actions: The part outside the curly braces is called the "pattern", and the part inside is the "action". The comparison operators are the ones from C: == != < > <= >= (the conditional operator ?: is also available, although it is not a comparison). If no pattern is given, then the action applies to all lines. This fact was used in the sample programs above. If no action is given, then the entire line is printed. If "print" is used all by itself, the entire line is printed. Thus, the following are equivalent:
awk '$1 > $2' filename
awk '$1 > $2{print}' filename
awk '$1 > $2{print $0}' filename
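A quick way to check that the three forms really are equivalent is to run them on the same input and compare the results. The input file here is hypothetical.

```shell
# Hypothetical input with one record where field 1 exceeds field 2.
tmp=$(mktemp -d)
printf '10 3 x\n2 8 y\n7 7 z\n' > "$tmp/nums.txt"

a=$(awk '$1 > $2' "$tmp/nums.txt")            # pattern only, default action
b=$(awk '$1 > $2{print}' "$tmp/nums.txt")     # bare print
c=$(awk '$1 > $2{print $0}' "$tmp/nums.txt")  # explicit print $0

printf '%s\n' "$a"
rm -rf "$tmp"
```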
Patterns & Actions: (cont’d) The various fields in a line can also be treated as strings instead of numbers. To compare a field to a string, use the following method:
>awk '$1=="foo"{print $2}' filename
There are various types of patterns and actions, which are explained in detail on the following slides.
Kinds of patterns:
• /regular expression/ : A regular expression as a pattern. It matches when the text of the input record fits the regular expression.
• expression : A single expression. It matches when its value is nonzero (if a number) or non-null (if a string).
• BEGIN, END : Special patterns to supply start-up or clean-up information to awk.
• null : The empty pattern matches every input record.
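As a rough illustration of an expression used as a pattern, the sketch below keeps only the records whose first field is odd, because $1 % 2 is nonzero for them. The file ids.txt is made up for the example.

```shell
# Hypothetical input: a number followed by a label.
tmp=$(mktemp -d)
printf '1 a\n2 b\n3 c\n' > "$tmp/ids.txt"

# Expression pattern: matches when $1 % 2 evaluates to a nonzero number.
odd=$(awk '$1 % 2' "$tmp/ids.txt")

# Empty (null) pattern: the action runs for every record.
all=$(awk '{ print $2 }' "$tmp/ids.txt")

printf '%s\n---\n%s\n' "$odd" "$all"
rm -rf "$tmp"
```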
Regular Expressions: A regular expression, or regexp, is a way of describing a class of strings. A regular expression enclosed in slashes (`/') is an awk pattern that matches every input record whose text belongs to that class. The simplest regular expression is a sequence of letters, numbers, or both. Such a regexp matches any string that contains that sequence. Thus, the regexp `foo' matches any string containing `foo'. Therefore, the pattern /foo/ matches any input record containing `foo'. Other kinds of regexps let you specify more complicated classes of strings.
>awk '/foo.*bar/{print $1,$3}' filename
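The regexp /foo.*bar/ matches a record only when `foo' is followed somewhere later by `bar'. A small made-up file shows the difference:

```shell
# Hypothetical input: the third record has "bar" before "foo", so it should not match.
tmp=$(mktemp -d)
printf 'foo 1 bar\nfoobar 2 x\nbar 3 foo\n' > "$tmp/words.txt"

# Print fields 1 and 3 of every record containing foo followed by bar.
out=$(awk '/foo.*bar/{print $1,$3}' "$tmp/words.txt")

printf '%s\n' "$out"
rm -rf "$tmp"
```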
Boolean: A Boolean pattern is an expression which combines other patterns using the Boolean operators "or" (`||'), "and" (`&&'), and "not" (`!'). Whether the Boolean pattern matches an input record depends on whether its subpatterns match. For example, the following command prints all records in the input file `filename' that contain both `2400' and `foo'.
awk '/2400/ && /foo/' filename
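All three Boolean operators can be compared side by side on the same input. The file log.txt below is invented for this sketch.

```shell
# Hypothetical input mixing the strings 2400 and foo.
tmp=$(mktemp -d)
printf '2400 foo\n2400 bar\n1200 foo\n' > "$tmp/log.txt"

both=$(awk '/2400/ && /foo/' "$tmp/log.txt")    # contains both strings
either=$(awk '/2400/ || /foo/' "$tmp/log.txt")  # contains at least one
nofoo=$(awk '!/foo/' "$tmp/log.txt")            # does not contain foo

printf '%s\n---\n%s\n---\n%s\n' "$both" "$either" "$nofoo"
rm -rf "$tmp"
```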
Start & end: There are three special forms of patterns that do not fit the above descriptions. One is the start-end pair of patterns, also known as a range pattern. It is made of two patterns separated by a comma, of the form startpat, endpat, and it matches ranges of consecutive input records. The first pattern, startpat, controls where the range begins, and the second one, endpat, controls where it ends. For example, the following prints every record from one whose first field is "on" through the next one whose first field is "off":
awk '$1 == "on", $1 == "off"' filename
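A range pattern is easiest to understand by watching which records fall inside the range. The switch.txt data below is an assumption made up for the example.

```shell
# Hypothetical input: records before "on" and after "off" should be skipped.
tmp=$(mktemp -d)
printf 'x 1\non 2\ny 3\noff 4\nz 5\n' > "$tmp/switch.txt"

# Range pattern: from the record whose first field is "on"
# through the record whose first field is "off", inclusive.
out=$(awk '$1 == "on", $1 == "off"' "$tmp/switch.txt")

printf '%s\n' "$out"
rm -rf "$tmp"
```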
BEGIN /END: • Any action associated with the BEGIN pattern will happen before any line-by-line processing is done. Actions with the END pattern will happen after all lines are processed. • But how do you put more than one pattern-action pair into an awk program? There are several choices. • One is to just mash them together, like so:
>awk 'BEGIN{print"fee"}\
$1=="foo"{print"fi"}\
END{print"fo fum"}' filename
BEGIN /END: (cont’d) • Another choice is to put the program into a file, like so:
BEGIN{print"fee"}
$1=="foo"{print"fi"}
END{print"fo fum"}
• Let's say that's in the file giant.awk. Now, run it using the "-f" flag to awk:
>awk -f giant.awk filename
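The program-in-a-file approach can be sketched end to end as follows. The input file and its two records are made up; giant.awk is the name used on the slide.

```shell
tmp=$(mktemp -d)

# Write the awk program to a file (quoted heredoc so the shell leaves $1 alone).
cat > "$tmp/giant.awk" <<'EOF'
BEGIN { print "fee" }
$1 == "foo" { print "fi" }
END { print "fo fum" }
EOF

# Hypothetical input: only the first record has "foo" as its first field.
printf 'foo a\nbar b\n' > "$tmp/input.txt"

out=$(awk -f "$tmp/giant.awk" "$tmp/input.txt")
printf '%s\n' "$out"
rm -rf "$tmp"
```

The BEGIN line is printed before any input is read and the END line after the last record, with one "fi" in between for the matching record.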
BEGIN /END: (cont’d) • A third choice is to create a file that calls awk all by itself. The following form will do the trick:
#!/usr/bin/awk -f
BEGIN{print"fee"}
$1=="foo"{print"fi"}
END{print"fo fum"}
• If we call this file giant2.awk, we can run it by first giving it execute permissions,
>chmod u+x giant2.awk
• and then just call it like so:
>./giant2.awk filename
BEGIN /END: (cont’d) awk has variables that can hold either numbers (floating point) or strings. A variable that has not been assigned yet starts out as zero (or the empty string), so no declaration is needed. For example, the following code prints a running total of the fifth column in front of each line:
>awk '{print x+=$5,$0 }' filename
This can be used when looking at file sizes from an "ls -l". It is also useful for balancing one's checkbook, if the amount of the check is kept in one column.
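A running total can be sketched on a few lines shaped like "ls -l" output. The listing below is invented, and the variable name total is just an illustrative choice.

```shell
# Hypothetical listing: field 5 holds the file size.
tmp=$(mktemp -d)
printf -- '-rw-r--r-- 1 u g 100 a.txt\n-rw-r--r-- 1 u g 250 b.txt\n-rw-r--r-- 1 u g 50 c.txt\n' > "$tmp/ls.txt"

# total starts at zero automatically; add field 5 and print the running sum.
running=$(awk '{ total += $5; print total }' "$tmp/ls.txt")

# Same idea, but only report the grand total after the last record.
grand=$(awk '{ total += $5 } END { print total }' "$tmp/ls.txt")

printf '%s\n---\n%s\n' "$running" "$grand"
rm -rf "$tmp"
```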
Actions: An awk program or script consists of a series of rules and function definitions, interspersed. A rule contains a pattern and an action, either of which may be omitted. The purpose of the action is to tell awk what to do once a match for the pattern is found. Thus, the entire program looks somewhat like this:
[pattern] [{ action }]
[pattern] [{ action }]
function name (args) { ... }
An action consists of one or more awk statements, enclosed in curly braces (`{' and `}'). Each statement specifies one thing to be done. The statements are separated by newlines or semicolons.
Actions: (cont’d) Here are the kinds of statements supported in awk: 1) Expressions, which can call functions or assign values to variables. Executing this kind of statement simply computes the value of the expression and then ignores it. This is useful when the expression has side effects. 2) Control statements, which specify the control flow of awk programs. The awk language gives you C-like constructs (if, for, while, and so on) as well as a few special ones. 3) Compound statements, which consist of one or more statements enclosed in curly braces. A compound statement is used in order to put several statements together in the body of an if, while, do or for statement.
Actions:(cont’d) 4)Input control, using the getline command and the next statement 5)Output statements, print and printf. 6)Deletion statements, for deleting array elements.
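Several of the statement kinds listed above can be combined in one short program. This is a sketch under assumptions: the input file, the array name count, and the key "tmp" are all made up for illustration.

```shell
# Hypothetical input: a comment line followed by three data records.
tmp=$(mktemp -d)
printf '# comment\nfoo 1\ntmp 2\nfoo 3\n' > "$tmp/data.txt"

out=$(awk '
  /^#/ { next }        # input control: skip comment lines
  { count[$1]++ }      # expression statement, used for its side effect
  END {
    delete count["tmp"]                      # deletion statement
    present = ("tmp" in count) ? 1 : 0       # 0 once the element is deleted
    printf "foo=%d tmp_present=%d\n", count["foo"], present   # output statement
  }' "$tmp/data.txt")

printf '%s\n' "$out"
rm -rf "$tmp"
```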
Awk variables Most awk variables are available for you to use for your own purposes; they never change except when your program assigns values to them, and never affect anything except when your program examines them. A few variables have special built-in meanings. Some of them awk examines automatically, so that they enable you to tell awk how to do certain things. Others are set automatically by awk, so that they carry information from the internal workings of awk to your program. user-modified: Built-in variables that you change to control awk. Auto-set: Built-in variables where awk gives you info.
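The two groups of built-in variables can be illustrated with four standard ones that the slide does not name: FS and OFS (set by the user to control field splitting and output separation) and NR and NF (set by awk to the current record number and its field count). The input file is made up.

```shell
# Hypothetical colon-separated input.
tmp=$(mktemp -d)
printf 'a:b:c\nd:e\n' > "$tmp/colon.txt"

# User-modified: FS (input field separator), OFS (output field separator).
# Auto-set: NR (record number), NF (number of fields in the record).
out=$(awk 'BEGIN { FS = ":"; OFS = "-" } { print NR, NF, $1 }' "$tmp/colon.txt")

printf '%s\n' "$out"
rm -rf "$tmp"
```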
Control of flow statements: Control statements such as if, while, and so on control the flow of execution in awk programs. Most of the control statements in awk are patterned on similar statements in C. All the control statements start with special keywords such as if and while, to distinguish them from simple expressions. Many control statements contain other statements; for example, the if statement contains another statement which may or may not be executed. The contained statement is called the body. If you want to include more than one statement in the body, group them into a single compound statement with curly braces, separating them with newlines or semicolons.
If-Else Statement: The if-else statement is awk's decision-making statement. It looks like this: if (condition) then-body [else else-body] condition is an expression that controls what the rest of the statement will do. If condition is true, then-body is executed; otherwise, else-body is executed (assuming that the else clause is present). The else part of the statement is optional. The condition is considered false if its value is zero or the null string, and true otherwise. In the following example, x is taken from the first field of each record:
awk '{ x = $1; if (x % 2 == 0) print "x is even"; else print "x is odd" }' filename
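The if-else statement can be tried on a file of numbers. The file name and values are assumptions for this sketch.

```shell
# Hypothetical input: one number per record.
tmp=$(mktemp -d)
printf '4\n7\n' > "$tmp/n.txt"

# Choose a message per record depending on whether field 1 is even.
out=$(awk '{ if ($1 % 2 == 0) print $1, "is even"; else print $1, "is odd" }' "$tmp/n.txt")

printf '%s\n' "$out"
rm -rf "$tmp"
```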
While Statement: In programming, a loop means a part of a program that is (or at least can be) executed two or more times in succession. The while statement is the simplest looping statement in awk. It repeatedly executes a statement as long as a condition is true. It looks like this: while (condition) body. This example prints the first three fields of each record, one per line (the statements are separated by semicolons so that the program fits on one line):
awk '{ i = 1; while (i <= 3) { print $i; i++ } }' filename
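A variation on the while example, sketched under the assumption that records may have fewer than three fields: it loops up to NF, the built-in count of fields in the current record, instead of a fixed 3. The input is made up.

```shell
# Hypothetical input: records with two fields and one field.
tmp=$(mktemp -d)
printf 'a b\nc\n' > "$tmp/f.txt"

# Print every field of each record on its own line, stopping at NF.
out=$(awk '{ i = 1; while (i <= NF) { print $i; i++ } }' "$tmp/f.txt")

printf '%s\n' "$out"
rm -rf "$tmp"
```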
For Statement: The for statement makes it more convenient to count iterations of a loop. The general form of the for statement looks like this: for (initialization; condition; increment) body. This statement starts by executing initialization. Then, as long as condition is true, it repeatedly executes body and then increment. Here is an example of a for statement:
awk '{ for (i = 1; i <= 3; i++) print $i }' filename
This prints the first three fields of each input record, one field at a time.
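The same for-loop shape can also accumulate a value rather than print each field. The sketch below sums all fields of each record; the input file and the variable name sum are illustrative assumptions, and NF is awk's built-in field count.

```shell
# Hypothetical input: rows of numbers of varying length.
tmp=$(mktemp -d)
printf '1 2 3\n4 5\n' > "$tmp/rows.txt"

# Reset sum for each record, add up its fields, then print the row total.
out=$(awk '{ sum = 0; for (i = 1; i <= NF; i++) sum += $i; print sum }' "$tmp/rows.txt")

printf '%s\n' "$out"
rm -rf "$tmp"
```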
Thanks for listening A Mikati M Shaito
For more information about Awk utility VISIT http://mshaito.tripod.com/awk/awk.html http://