Some Simple UNIX Filters (Commands That Use Both Standard Input & Standard Output)
• File Level: pr; cmp, comm, diff; sort; uniq
• Content Level: head, tail; cut, paste; tr; grep
• There are a lot of others!!
Formatting Output: pr Command
• pr prepares files for printing by adding formatting, headers, footers, …
• Some options: -k prints in k columns
• -n numbers the lines of output
• -d double-spaces the output
• -l n sets the page length to n lines
• -w m sets the page width to m characters
• Example: a.out | pr -n -d -l 64
• Prints the program's output with line numbers, double spacing, and 64 lines per page.
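A quick illustrative run (the filename report.txt is made up, and the exact header pr prints varies by system):
$ pr -2 -n report.txt | lpr    Formats report.txt in 2 numbered columns and sends it to the printer.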
Comparing Files: cmp Command
• cmp compares 2 files and stops when it finds a difference.
• The comparison is character by character (byte by byte).
• Option: -l lists all byte differences in the files
• Examples: cmp file1 file2
• file1 file2 differ: char 12, line 3
• cmp -l file1 file2 | wc -l
• Displays the number of differences in the 2 files.
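cmp also sets its exit status, which makes it handy in scripts. A minimal sketch, with hypothetical filenames:
$ cmp -s file1 file2 && echo same || echo different    The -s option suppresses all output; only the exit status is used.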
Comparing Files: comm Command
• comm compares 2 files and lists 3 columns of information:
• 1. lines unique to the 1st file  2. lines unique to the 2nd file  3. lines common to both files
• The files must be sorted.
• Options: -1, -2, -3 indicate the columns to drop in the output
• Examples: comm -3 file1 file2 Lists all the lines unique to either file.
• comm -12 file1 file2 Lists all the lines common to both files.
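Since comm requires sorted input, a typical session sorts first (filenames hypothetical):
$ sort file1 > s1; sort file2 > s2
$ comm -12 s1 s2    Drops columns 1 and 2, leaving only the lines common to both files.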
Comparing Files: diff Command
• diff compares 2 files and lists the instructions needed to convert the first file into the second. Here's an example:
• $ diff file1 file2
• 3c3 Change line 3
• < This is line 3 of file 1. from this
• ---
• > This is line 3 of file 2. to this.
• 7a8
• > This is line 8 from file 2. Add this line after line 7 of file1.
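The edit script that diff produces can be applied mechanically with the patch command; a sketch with hypothetical filenames:
$ diff file1 file2 > changes    Save the instructions.
$ patch file1 changes           file1 is now identical to file2.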
Extracting Vertical Data: cut Command
• cut extracts vertical slices of data from a file. Either columns (-c option) or fields (-f option) of data may be extracted. A delimiter (-d option) may be defined to separate fields. The default delimiter is tab.
• Examples: cut -c1-4,8,15- file1 Extracts characters 1 thru 4, the 8th character, and characters 15 thru the end of each line of file1. No whitespace in the column list!!
• cut -d: -f1-3 file2 Extracts fields 1 thru 3 from file2. The fields are separated by the : character.
• who | cut -d" " -f1 Lists the names of all users logged in.
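A concrete case that works on most systems, since /etc/passwd is colon-delimited:
$ cut -d: -f1,7 /etc/passwd    Displays the username and login shell of every account.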
Joining Vertical Data: paste Command
• paste vertically joins 2 files together. A delimiter (-d option) may be defined to separate fields. The default delimiter is tab. The -s option joins lines of a single file together.
• Examples: paste file1 file2 Displays file1 and file2 side by side.
• paste -s -d"::\n" addressbook
• If addressbook holds the 3 lines rick, rick@att.com, and 1234567890, the output is the single line rick:rick@att.com:1234567890
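Another common use of -s, joining every line of one file (the filename names is hypothetical):
$ paste -s -d, names    Produces one comma-separated line containing all the lines of names.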
Displaying Files: head and tail Commands
• head displays the top of a file (1st 10 lines, by default).
• tail displays the end of a file (last 10 lines, by default).
• Options: -n x or -x displays the 1st (for head) or last (for tail) x lines of the file.
• -f continuously displays the end of a file as it grows. This option is for the tail command only. You must use the interrupt key to stop monitoring the file's growth.
• Examples: ls -t | head -n 1 Displays the name of the most recently modified file.
• tail -f install.log Continuously displays the log file as it grows. Use the interrupt key to stop.
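Combining the two extracts a single line from the middle of a file; a small sketch (file1 is hypothetical):
$ head -n 5 file1 | tail -n 1    Displays only line 5 of file1.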
Ordering Files: sort Command
• sort reorders the lines of a file in ascending (or, with -r, descending) order.
• The default order is ASCII: whitespace, numbers, uppercase, and finally, lowercase letters.
• Options: -k n sort on the nth field of the line
• -t char use char as the field delimiter
• -n sort numerically
• -r reverse-order sort
• -u remove repeated lines
• -m list merge the sorted files in list
sort Examples
• Examples: sort -t: -k 2 list Sort on the 2nd field of file list. Fields are separated by :
• sort -t: -k 5.7 -r list Sort file list in reverse order on the 7th character of the 5th field. Fields are separated by :
• sort -n list Numerically sort file list, assumed to contain numbers.
• sort -m file1 file2 The sorted files file1 and file2 are merged.
• cut -d: -f3 list | sort -u Extract the 3rd field from list and sort that field, removing the repeated lines.
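A practical pipeline sketch: rank the entries of the current directory by disk usage, largest first:
$ du -s * | sort -rn | head -n 3    du prints a size and a name per line; -rn sorts the sizes numerically in reverse, and head keeps the 3 largest.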
Removing Duplicates: uniq Command
• uniq displays a presorted file, removing all the duplicate lines from it.
• If 2 files are specified, uniq reads from the first and writes to the second.
• Options: -u lists only the lines that are unique
• -d lists only the lines that are duplicated
• -c counts the frequency of occurrences
• Examples: sort list | uniq - xlist Sorts file list; uniq reads from stdin and writes the output to xlist.
• uniq -c list Displays a count of each unique line in the file list.
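The sort | uniq -c | sort -rn idiom ranks lines by how often they occur (the file list is hypothetical):
$ sort list | uniq -c | sort -rn | head    Displays the 10 most frequent lines, most common first.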
Character Manipulation: tr Command
• tr translates characters from one format to another. Input always comes from standard input; the arguments don't include filenames. The general form is:
• tr options expression1 expression2 < filename
• expression1 is the set of characters to change; expression2 is what they change to. (The expressions should be of equal length.)
• Examples: tr '+-' '*/' < math In the file math, replace all +'s with *'s and replace all -'s with /'s.
• head -n 3 list | tr '[a-z]' '[A-Z]' The 1st 3 lines of the file list are translated to uppercase.
tr Command Options
• Options: -d delete characters from the input stream
• -s compress multiple consecutive characters (squeeze)
• -c use the complement of expression1
• Examples: tr -d '/' < dates Removes all /'s from the file dates.
• tr -s ' ' < names Replaces each string of blanks with a single blank in the file names.
• tr -cd ':' < file1 In the file file1, delete everything that isn't a colon (:). All that's left is a file full of :'s.
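Another everyday use of -d (filenames hypothetical): stripping the carriage returns from a DOS-format text file:
$ tr -d '\r' < dosfile > unixfile    Deletes every carriage return, leaving UNIX line endings.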
Finding Patterns in Files with grep
• grep searches a file and displays the lines containing a pattern. The form is:
• grep options pattern files
• If more than 1 file is listed, the filename is also displayed in the output.
• Some options: -i ignore case when matching
• -n display line numbers as well as lines
• -c display a count of the matching lines
• Examples: grep "professor" college.lst Displays all lines in the file college.lst that contain the string professor.
• grep -i "Rick" college.lst Displays all lines in the file college.lst that contain the string Rick. (Also finds rick, RICK, rIcK, …)
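grep is most often used at the end of a pipeline; for instance (the pattern sshd is just an illustration):
$ ps -e | grep sshd    Lists every process, keeping only the lines that mention sshd.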
Regular Expressions in grep
• Regular expressions are metacharacter patterns that grep interprets differently from how the shell uses them.
• Regular Expression    Meaning
• *                     0 or more of the previous character
• .                     a single character
• [pqr]                 a single p or q or r
• [c1-c2]               a single character in the ASCII range c1 thru c2
• [^pqr]                a single character that is neither p, q, nor r
• ^abc                  abc at the beginning of the line
• abc$                  abc at the end of the line
Example Regular Expressions with grep
• grep "g*" file1 Displays all lines in file1 that contain nothing or g, gg, ggg, … (since every line contains the empty match, this displays every line)
• grep ".*" file1 Displays all lines in file1 that contain nothing or any number of characters (again, every line)
• grep "[1-3]" file1 Displays all lines in file1 that contain a digit between 1 and 3.
• grep "[^a-zA-Z]" file1 Displays all lines in file1 that contain a non-alphabetic character.
• grep "^Rick$" file1 Displays all lines in file1 that contain only Rick.
• grep "^$" file1 Displays all the empty lines in file1.
• grep "R[aeiou]ck" file1 Displays all lines in file1 that contain Rack, Reck, Rick, Rock, or Ruck.
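The pieces combine freely; one more illustrative pattern (file1 is hypothetical):
$ grep "^[0-9][0-9]*$" file1    Displays the lines that consist entirely of digits: anchors, a character class, and repetition together.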
Putting It All Together
• An author wants to count the frequency of the words used in a book chapter.
• 1. Put each word on a separate line: tr " \011" "\012\012" < chapter
• 2. Strip out everything that isn't an alphabetic character or a newline: tr -cd "[a-zA-Z\012]"
• 3. Sort the list: sort
• 4. Count the word frequencies: uniq -c
• 5. Put it all together:
• tr " \011" "\012\012" < chapter | tr -cd "[a-zA-Z\012]" | sort | uniq -c
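A natural extension of step 5 (not on the original slide) ranks the words by how often they occur:
tr " \011" "\012\012" < chapter | tr -cd "[a-zA-Z\012]" | sort | uniq -c | sort -rn | head
The added numeric reverse sort puts the most frequent words first, and head keeps the top 10.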