Unix Lecture 7 Hana Filip LIN 6932
Text Processing Command Line Utility Programs (cont.)
LAST WEEK: sed, wc, sort, tr, uniq
TODAY: awk, join, paste
Others you may want to check out: comm, cut, ex, iconv, xargs

Text Processing Command Line Utility Programs: awk
• named after the last names of its inventors, Alfred Aho, Peter Weinberger and Brian Kernighan
• a pattern scanning and processing language
• a programming language that makes extensive use of the string datatype, associative arrays* and regular expressions
• AWK programs and sed scripts inspired Larry Wall to write Perl
*associative array (also map, hash, dictionary, lookup table, and in query processing an index or index file): an abstract data type composed of a collection of keys and a collection of values, where each key is associated with one value. The operation of finding the value associated with a key is called a lookup or indexing. The relationship between a key and its value is sometimes called a mapping or binding. Hence, associative arrays are very closely related to the mathematical concept of a function.

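To make the footnote on associative arrays concrete, here is a minimal sketch of a word count in which each word is a key and its count is the value. The input text and the names count, w and counts are made up for illustration.

```shell
#!/bin/sh
# count[word] is an associative array: key = word, value = occurrences.
# "for (w in count)" visits keys in no particular order, so we sort the result.
counts=$(printf 'the cat\nthe dog\n' |
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) print w, count[w] }' |
  sort)
echo "$counts"
```

Looking up count["the"] is the "lookup" operation the footnote describes.
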
awk
• An AWK program is a series of pattern { action } pairs, where pattern is typically an expression and action is a series of commands.
• Each line of input is tested against all the patterns in turn, and the action is executed if the pattern matches or the relevant expression is true.
• Either the pattern or the action may be omitted.
• The pattern defaults to matching every line of input.
• The default action is to print the line of input.

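The two defaults can be seen side by side in a small sketch. The sample data and variable names below are invented for the example.

```shell
#!/bin/sh
data='apple 3
banana 0
cherry 7'

# Pattern only: the default action prints each line where $2 > 0.
only_pattern=$(printf '%s\n' "$data" | awk '$2 > 0')

# Action only: the default pattern matches every line.
only_action=$(printf '%s\n' "$data" | awk '{ print $1 }')

echo "$only_pattern"
echo "$only_action"
```

The first command prints the apple and cherry lines in full; the second prints the first field of all three lines.
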
awk - how to run it
• If the program is short, it is easiest to include it in the command that runs awk:
    % awk 'program' input-file1 input-file2 ...
  where 'program' consists of a series of patterns and actions.
• When the program is long, it is usually more convenient to put it in a file and run it with a command like this:
    % awk -f program-file input-file1 input-file2 ...

awk - how to run it
    % awk 'program' input-file1 input-file2 ...
The single quotes around 'program' make the shell treat all of 'program' as a single argument for awk, and allow the program to be more than one line long.

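As a rough sketch of the two ways of running awk, the following runs the same one-rule program inline and from a program file. The file names prog.awk and list, the sample records, and the use of mktemp (widely available, but not guaranteed on every system) are assumptions for the example.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)

# A tiny input file and a program file holding one pattern-action rule.
printf 'fooey 555-1234\nbarfly 555-7685\n' > "$tmpdir/list"
cat > "$tmpdir/prog.awk" <<'EOF'
/foo/ { print $1 }
EOF

# Same program, given inline (single-quoted) and via -f.
inline=$(awk '/foo/ { print $1 }' "$tmpdir/list")
from_file=$(awk -f "$tmpdir/prog.awk" "$tmpdir/list")

rm -r "$tmpdir"
echo "$inline"
echo "$from_file"
```

Both invocations print the same thing, since only the way the program text reaches awk differs.
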
awk - how to run it
    % awk '/foo/ { print $0 }' list
    fooey     555-1234   2400/1200/300   B
    foot      555-6699   1200/300        B
    macfoo    555-6480   1200/300        A
    sabafoo   555-2127   1200/300        C
When lines containing ‘foo’ are found in the file list, they are printed.
PATTERN: /foo/ - the slashes indicate that ‘foo’ is a pattern (= regular expression) to search for
ACTION: print $0 - the action to print the current line

awk - how to run it
    % awk '/foo/ { print $0 }' list
    fooey     555-1234   2400/1200/300   B
    foot      555-6699   1200/300        B
    macfoo    555-6480   1200/300        A
    sabafoo   555-2127   1200/300        C
egrep selects the same lines:
    % egrep 'foo' list
    fooey     555-1234   2400/1200/300   B
    foot      555-6699   1200/300        B
    macfoo    555-6480   1200/300        A
    sabafoo   555-2127   1200/300        C

awk - how to run it
    % awk '/12/ { print $0 } /21/ { print $0 }' list
    aardvark  555-5553   1200/300        B
    alpo-net  555-3412   2400/1200/300   A
    barfly    555-7685   1200/300        A
    bites     555-1675   2400/1200/300   A
    core      555-2912   1200/300        C
    fooey     555-1234   2400/1200/300   B
    foot      555-6699   1200/300        B
    macfoo    555-6480   1200/300        A
    sdace     555-3430   2400/1200/300   A
    sabafoo   555-2127   1200/300        C
    sabafoo   555-2127   1200/300        C
The sabafoo line is printed twice because it matches both patterns: each rule whose pattern matches runs its own action.

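The double printing can be reproduced on a two-line input. This is only a sketch: the records are cut down from the listing, and the single-rule variant using || is one possible way to print each matching line once.

```shell
#!/bin/sh
input='core 555-2912
sabafoo 555-2127'

# Two rules: a line matching both /12/ and /21/ is printed by each rule.
twice=$(printf '%s\n' "$input" | awk '/12/ { print $0 } /21/ { print $0 }')

# One rule with a combined pattern: each matching line is printed once.
once=$(printf '%s\n' "$input" | awk '/12/ || /21/')

echo "$twice"
echo "$once"
```

Here "core 555-2912" contains 12 but not 21, while "sabafoo 555-2127" contains both.
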
awk - how to run it
The awk language is very useful for producing reports from large amounts of raw data, such as summarizing information from the output of other utility programs like ls.
    % ls -la | awk '$6 == "Apr" { sum += $5 } END { print sum }'
    692947
This command prints the total number of bytes in all the files in the current directory that were last modified in April.

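Because real ls output differs from directory to directory, the sketch below feeds the same awk program a fixed, made-up listing in the usual ls -l column layout (size in field 5, month in field 6). The file names, owner, and sizes are invented.

```shell
#!/bin/sh
# Hypothetical "ls -l" style lines: perms, links, owner, group, size, month, day, time, name.
listing='-rw-r--r-- 1 user staff 100 Apr  3 10:00 a.txt
-rw-r--r-- 1 user staff 250 Mar  9 09:30 b.txt
-rw-r--r-- 1 user staff  42 Apr 21 17:45 c.txt'

# Add up the sizes ($5) of the lines whose month ($6) is Apr; print once at the end.
total=$(printf '%s\n' "$listing" |
  awk '$6 == "Apr" { sum += $5 } END { print sum }')
echo "$total"
```

Only the two April lines contribute, so the total is 100 + 42 = 142.
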
awk - how to run it - executable awk programs
An awk script can have three types of blocks. At least one of them must be present.
• The BEGIN { } block is processed before any input is read.
• The { } block runs for every line of input.
• The END { } block is processed after the final line of input has been read.

awk - how to run it
• BEGIN and END are special patterns.
• They are not used to match input records.
• They are used for supplying start-up or clean-up information to your awk script.
• A BEGIN rule is executed once, before the first input record has been read.
• An END rule is executed once, after all the input has been read.

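A minimal sketch of the three kinds of rule working together; the three-line input and the variable names n and report are arbitrary.

```shell
#!/bin/sh
# BEGIN runs once before any input, the bare { } rule runs per record,
# END runs once after the last record.
report=$(printf 'a\nb\nc\n' |
  awk 'BEGIN { print "start" }
             { n++ }
       END   { print "records:", n }')
echo "$report"
```

The output is "start" followed by "records: 3", regardless of what the three input lines contain.
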
awk - how to run it - executable awk programs
You can write self-contained awk scripts using the ‘#!’ script mechanism:
    #! /usr/bin/awk -f
You may want to check with
    % whereis awk
to see what path to put into the first line of your awk script.

awk - how to run it - executable awk programs
    % vi whowhat
    #! /usr/bin/awk -f
    /[Uu]npaid/ { print $1, "owes", $2 > "deadbeats" }
    % chmod +x whowhat
    % whowhat debts
    % vi deadbeats
    Dick owes 3.87
    Harry owes 56.00
    Tom owes 36.03
    Harry owes 22.60
    Tom owes 11.44

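The slide does not show the debts file, so the sketch below assumes a hypothetical layout (name, amount, status) just to illustrate how print ... > "deadbeats" writes to a file from inside awk. It runs the rule inline in a temporary directory created with mktemp, which is assumed to be available.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)

# Hypothetical debts file: name, amount, payment status.
printf 'Dick 3.87 unpaid\nJane 10.00 paid\nHarry 56.00 Unpaid\n' > "$tmpdir/debts"

# Lines mentioning Unpaid/unpaid are written to the file "deadbeats",
# not to standard output.
(cd "$tmpdir" && awk '/[Uu]npaid/ { print $1, "owes", $2 > "deadbeats" }' debts)

deadbeats=$(cat "$tmpdir/deadbeats")
rm -r "$tmpdir"
echo "$deadbeats"
```

Within one awk run, > "deadbeats" creates or truncates the file on the first write and appends on later writes.
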
awk - how to run it - executable awk programs
    % vi awklw
    #! /usr/bin/awk -f
    BEGIN { nl = 0; nw = 0 }
    { nl++ ; nw += NF }
    END { print "Lines:", nl, "words:", nw }
    % chmod +x awklw
    % awklw machen.txt
    Lines: 2538 words: 21853

awk - how to run it - executable awk programs
++: the increment operator means “add one to a variable” or “make a variable's value one more than it was before.”
NF: a built-in awk variable that holds the number of fields in the current input record (line). Adding NF up over all lines, as awklw does, gives the number of words in the file.
Useful reference for awk usage: Linux and Unix Shell Programming by D. S. W. Tansley, books.google.com/

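To see that NF is reset for every record, here is a small sketch on three made-up lines, one of them empty.

```shell
#!/bin/sh
text='one two three

four five'

# NF is the field count of the current line: 3, then 0 for the empty line, then 2.
per_line=$(printf '%s\n' "$text" | awk '{ print NF }')

# Summing NF over all lines gives the total number of words.
total=$(printf '%s\n' "$text" | awk '{ nw += NF } END { print nw }')

echo "$per_line"
echo "$total"
```
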
paste
paste prints lines consisting of the sequentially corresponding lines of each specified file. In the output, the original lines are separated by TABs, and each output line is terminated with a newline.
    % paste file1 file2 … filen > filen+1
Example:
    % vi numbers
    1
    2
    3
    4
    % vi letters
    a
    b
    c
    d
    % paste numbers letters > numbers.letters
numbers.letters then contains:
    1	a
    2	b
    3	c
    4	d

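The numbers and letters example can be run end to end as follows. This sketch creates the two files in a temporary directory (mktemp is assumed to be available) instead of editing them with vi.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf '1\n2\n3\n4\n' > "$tmpdir/numbers"
printf 'a\nb\nc\nd\n' > "$tmpdir/letters"

# Line N of the output is line N of numbers, a TAB, then line N of letters.
pasted=$(paste "$tmpdir/numbers" "$tmpdir/letters")

rm -r "$tmpdir"
echo "$pasted"
```
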
join
join merges the lines of two sorted text files based on the presence of a common field (by default, the first field of each line).
    % join file1 file2 > file3
Example:
    % vi person1
    Smith john
    newman bill
    % vi person2
    Carpenter mary
    Smith betty
    % join person1 person2 > person3
person3 then contains:
    Smith john betty
Only Smith appears as the first field in both files, so only that line is joined.

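A minimal sketch of join on two small files that are already sorted on their first field. The names are invented for the example and are not meant to reproduce the slide's files exactly; mktemp is assumed to be available.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)

# Both files are sorted on the join field (field 1).
printf 'Carpenter mary\nSmith john\n' > "$tmpdir/p1"
printf 'Newman bill\nSmith betty\n'   > "$tmpdir/p2"

# Only keys present in both files produce an output line:
# key, rest of the line from p1, rest of the line from p2.
joined=$(join "$tmpdir/p1" "$tmpdir/p2")

rm -r "$tmpdir"
echo "$joined"
```

Carpenter and Newman each occur in only one file, so they are dropped and the result is the single line "Smith john betty".
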
Looping Logic
• In looping logic, a control structure (or loop) repeats until some condition exists or some action occurs.
• You already know the foreach loop:
    foreach var ( wordlist )
        command(s)
    end
  It loops through a range of values: it makes a variable take on each value in a specified set, one at a time, and performs some action for each value.

Looping Logic
    foreach var ( wordlist )
        command(s)
    end

    while ( expr )
        command(s)
    end

Looping Logic
    #!/bin/csh
    foreach person (Bob Susan Joe Gerry)
        echo Hello $person
    end
Output:
    Hello Bob
    Hello Susan
    Hello Joe
    Hello Gerry

The while Loop
• A different pattern for looping is created using the while loop:
    while ( condition )
        command(s)
    end
• The while statement best illustrates how to set up a loop to test repeatedly for a matching condition.
• The while loop tests a condition in a manner similar to the if statement.
• As long as the condition is true, the command(s) repeat.

Looping Logic
Adding integers from 1 to 10:
    #!/bin/csh
    set i=1
    set sum=0
    while ($i <= 10)
        echo Adding $i to the sum.
        set sum=`expr $sum + $i`
        set i=`expr $i + 1`
    end
    echo The sum is $sum.
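
For comparison, the same loop can be sketched in POSIX sh rather than csh. This is a swapped-in shell dialect, not the lecture's csh syntax: sh uses [ ... ] with do/done and built-in $(( )) arithmetic in place of expr.

```shell
#!/bin/sh
# Add the integers from 1 to 10, mirroring the csh script step by step.
i=1
sum=0
while [ "$i" -le 10 ]; do
  echo "Adding $i to the sum."
  sum=$((sum + i))
  i=$((i + 1))
done
echo "The sum is $sum."
```

The final line reads "The sum is 55.", which is also what the csh version should report.
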