CS 497C – Introduction to UNIX
Lecture 25: Simple Filters
Chin-Chih Chang
chang@cs.twsu.edu
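sort: Ordering a File
• The sort command orders the lines of a file. It can sort on individual fields, and on columns within those fields.
• When sort is invoked without options, it compares entire lines and orders them in ASCII collating sequence.
• Using the -t option to name the field delimiter, you can sort the file on any field.
• For example, to sort /etc/passwd on the fifth field:
    sort -t: +4 /etc/passwd
• The +4 form (skip four fields) is the older notation. The POSIX equivalent is sort -t: -k 5 /etc/passwd.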
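A minimal sketch of the field sort above, using the POSIX -k notation. The three accounts in passwd.sample are made up for illustration, and LC_ALL=C is set only to pin the collating sequence to ASCII.

```shell
# Sample data in /etc/passwd layout (hypothetical accounts, not real users).
tmp=$(mktemp -d)
cat > "$tmp/passwd.sample" <<'EOF'
root:x:0:0:Super User:/root:/bin/sh
henry:x:501:100:Henry Blofeld:/home/henry:/bin/sh
anne:x:502:100:Anne Archer:/home/anne:/bin/sh
EOF

# -t: sets the field delimiter; -k 5 starts the sort key at field 5,
# the same key that the older "+4" notation selects.
by_name=$(LC_ALL=C sort -t: -k 5 "$tmp/passwd.sample" | cut -d: -f1)
printf '%s\n' "$by_name"   # anne, henry, root (one per line)

rm -rf "$tmp"
```

The login names come out in the order of the full names in field 5, not in the order of the login names themselves.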
sort: Ordering a File
• You can sort on more than one field.
• To make the fifth field the primary key and the first field the secondary key:
    sort -t: +4 -5 +0 /etc/passwd
• With the -n (numeric) option, you can sort in numeric sequence. To sort the group file numerically on its third field:
    sort -t: +2 -3 -n group
• The -u (unique) option lets you purge duplicate lines from a file.
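The following sketch shows a two-key sort and a numeric sort in the POSIX -k notation (-k 5,5 corresponds to +4 -5, and -k 3,3n to +2 -3 -n). The sample files and their contents are invented for the example; the secondary key is limited to field 1 here, which is an assumption about the intent of +0.

```shell
tmp=$(mktemp -d)

# Hypothetical passwd-style records: two users share the same fifth field.
cat > "$tmp/passwd.sample" <<'EOF'
zoe:x:503:100:Operator:/home/zoe:/bin/sh
amy:x:505:100:Webmaster:/home/amy:/bin/sh
bob:x:504:100:Operator:/home/bob:/bin/sh
EOF

# Hypothetical group-style records with numeric IDs in field 3.
cat > "$tmp/group" <<'EOF'
staff:x:100:
dialout:x:20:
adm:x:4:
EOF

# Primary key: field 5. Secondary key: field 1 (breaks the "Operator" tie).
two_keys=$(LC_ALL=C sort -t: -k 5,5 -k 1,1 "$tmp/passwd.sample" | cut -d: -f1)

# Without -n the IDs compare as text, so "100" sorts before "20" and "4".
as_text=$(LC_ALL=C sort -t: -k 3,3 "$tmp/group" | cut -d: -f1)

# With the n modifier the IDs compare as numbers: 4, 20, 100.
as_number=$(LC_ALL=C sort -t: -k 3,3n "$tmp/group" | cut -d: -f1)

printf '%s\n' "$two_keys" "$as_text" "$as_number"
rm -rf "$tmp"
```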
sort: Ordering a File
• To list each distinct value of the third field once, and save the list in des.lst as well:
    cut -d':' -f3 shortlist | sort -u | tee des.lst
• sort uses the -o (output) option to write the result to a file:
    sort -o sortedlist +3 list
• You can check whether a file has actually been sorted with the -c (check) option.
• To merge two or more sorted files, use the -m option:
    sort -m foo1 foo2 foo3
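A short sketch of the -u, -o, -c and -m options working together. The file names and the fruit names in them are placeholders chosen for the example.

```shell
tmp=$(mktemp -d)
printf '%s\n' pear apple pear fig > "$tmp/list"

# -u drops duplicate lines; -o names the output file.
LC_ALL=C sort -u -o "$tmp/sortedlist" "$tmp/list"
unique=$(cat "$tmp/sortedlist")

# -c prints nothing on success; its exit status is 0 only when the input is sorted.
if LC_ALL=C sort -c "$tmp/sortedlist" 2>/dev/null; then sorted_ok=yes; else sorted_ok=no; fi
if LC_ALL=C sort -c "$tmp/list" 2>/dev/null; then unsorted_ok=yes; else unsorted_ok=no; fi

# -m merges files that are already sorted, without re-sorting them.
printf '%s\n' banana kiwi > "$tmp/other"
merged=$(LC_ALL=C sort -m "$tmp/sortedlist" "$tmp/other")

printf '%s\n' "$unique" "$sorted_ok" "$unsorted_ok" "$merged"
rm -rf "$tmp"
```

Because -c reports through its exit status, it fits naturally in an if test inside a script.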
tr: Translating Characters
• The tr (translate) command translates characters and can be used to change the case of letters. It reads only standard input. The syntax is:
    tr options expression1 expression2 < standard input
• You can use tr to replace the : with a | (pipe), and the / with a \. The backslash is doubled because tr treats \ as an escape character:
    tr ':/' '|\\' < /etc/group
• We can change the case of the first three lines from lower to upper:
    head -3 /etc/group | tr '[a-z]' '[A-Z]'
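The two translations above can be tried on a single line without touching /etc/group. The sample line is made up for the example.

```shell
# A hypothetical group-style line used as input.
line='root:x:0:/root'

# Map ':' to '|' and '/' to '\'. The backslash is written twice because
# tr treats a backslash as the start of an escape sequence.
swapped=$(printf '%s\n' "$line" | tr ':/' '|\\')

# Map each lowercase letter to its uppercase counterpart.
upper=$(printf '%s\n' "$line" | tr '[a-z]' '[A-Z]')

printf '%s\n' "$swapped" "$upper"
```

Each character in the first expression is replaced by the character at the same position in the second, which is why the two expressions should have the same length.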
tr: Translating Characters
• The -d (delete) option is used to delete characters.
• The -s (squeeze) option is used to compress multiple consecutive occurrences of a character into one:
    tr -s ' ' < shortlist
• The -c (complement) option complements (negates) the set of characters in the expression.
• You can also use octal values in tr. To replace each | with a newline (octal 012):
    tr '|' '\012' < shortlist
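A sketch of the four options on one record. The record is invented, and only assumes that shortlist uses | as its delimiter, as the commands above suggest.

```shell
# Hypothetical record with padded fields.
record='2233|pat lee   |g.m.      |sales'

# -s squeezes each run of spaces down to a single space.
squeezed=$(printf '%s\n' "$record" | tr -s ' ')

# -d deletes every occurrence of the listed characters (here, the dots).
no_dots=$(printf '%s\n' "$record" | tr -d '.')

# -c complements the set, so -cd deletes everything EXCEPT digits and newline.
digits=$(printf '%s\n' "$record" | tr -cd '0-9\n')

# Octal 012 is the newline: each field ends up on its own line.
fields=$(printf '%s\n' "$record" | tr '|' '\012')

printf '%s\n' "$squeezed" "$no_dots" "$digits" "$fields"
```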
uniq: Locate Repeated and Nonrepeated Lines
• uniq removes duplicate lines, but only when they are adjacent.
• The usual approach is therefore to sort a file and pipe the output to uniq:
    sort dept.lst | uniq -
• The -u (unique) option selects only nonduplicate lines.
• The -d (duplicate) option selects only the repeated ones.
• The -c (count) option displays the frequency of all lines.
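The options above can be compared side by side on a small file. The department names in dept.lst are placeholders, and awk is used only to normalize the leading padding that uniq -c puts before each count.

```shell
tmp=$(mktemp -d)
printf '%s\n' sales admin sales marketing admin sales > "$tmp/dept.lst"

# uniq only compares neighbouring lines, so the input is sorted first.
all=$(LC_ALL=C sort "$tmp/dept.lst" | uniq)
only_once=$(LC_ALL=C sort "$tmp/dept.lst" | uniq -u)   # lines that occur once
repeated=$(LC_ALL=C sort "$tmp/dept.lst" | uniq -d)    # lines that occur more than once
counts=$(LC_ALL=C sort "$tmp/dept.lst" | uniq -c | awk '{print $1, $2}')

printf '%s\n' "$all" "$only_once" "$repeated" "$counts"
rm -rf "$tmp"
```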
nl: Line Numbering
• The nl command numbers only logical lines, which by default means nonempty lines.
• nl uses the tab as the default delimiter between number and text, but we can change it to the : with the -s option.
• You can set the width (-w) of the number format:
    nl -w1 -s: calc.lst
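A small sketch of nl with the same options, fed from printf instead of calc.lst. The three input lines are made up, and the middle one is deliberately empty to show that it receives no number.

```shell
# Three input lines; the second is empty.
numbered=$(printf 'alpha\n\nbeta\n' | nl -w1 -s:)
printf '%s\n' "$numbered"

# Pick out the first and third output lines for inspection.
first=$(printf '%s\n' "$numbered" | sed -n '1p')   # 1:alpha
third=$(printf '%s\n' "$numbered" | sed -n '3p')   # 2:beta
```

The empty line is passed through unnumbered, so "beta" is numbered 2 rather than 3.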
dos2unix and unix2dos: DOS and UNIX Files
• UNIX and DOS files differ in structure. Lines in DOS are terminated by the carriage return - linefeed characters, while a UNIX line uses only linefeed.
• Some UNIX systems feature two utilities, dos2unix and unix2dos, for converting files between DOS and UNIX:
    unix2dos catalog.html catalog.html
    cat *.html | unix2dos > combined.html
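Where dos2unix and unix2dos are not installed, the difference in line endings can still be demonstrated with tr and awk. This is a substitute technique, not what the two utilities do internally, and the file names are placeholders.

```shell
tmp=$(mktemp -d)

# Create a two-line file with DOS line endings (carriage return + linefeed).
printf 'line one\r\nline two\r\n' > "$tmp/dos.txt"

# DOS to UNIX: delete every carriage return (octal 015).
# Note that this also removes any stray carriage return inside a line.
tr -d '\015' < "$tmp/dos.txt" > "$tmp/unix.txt"

# UNIX to DOS: write each line back out terminated by CR + LF.
awk '{ printf "%s\r\n", $0 }' "$tmp/unix.txt" > "$tmp/dos_again.txt"

# The DOS file is one byte per line longer than the UNIX file.
dos_bytes=$(wc -c < "$tmp/dos.txt" | tr -d ' ')
unix_bytes=$(wc -c < "$tmp/unix.txt" | tr -d ' ')
if cmp -s "$tmp/dos.txt" "$tmp/dos_again.txt"; then round_trip=same; else round_trip=different; fi

printf '%s\n' "$dos_bytes" "$unix_bytes" "$round_trip"
rm -rf "$tmp"
```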
spell (ispell): Check Your Spelling
• spell is used to spell-check a document. The command reads a file and lists every word that it does not recognize as correctly spelled.
• The -b (British) option uses the British dictionary.
• Linux has an interactive spell-checking program, ispell.
• When used with the -l option, ispell works noninteractively like spell.
Applying the Filters
• A three-stage operation is shown below:
• Cut out the third field with cut -d'|' -f3 shortlist.
• Sort it next with sort.
• Finally, run uniq -c on the sorted output.
• The three stages can be combined in a pipeline:
    cut -d'|' -f3 shortlist | sort | uniq -c
• To save a manual page in plain-text format (col -b strips the backspace sequences used for bold and underline):
    man ls | col -b > ls.man
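The cut, sort and uniq pipeline can be run end to end on a small stand-in for shortlist. The five records below are invented; only the |-delimited layout with the designation in the third field is taken from the commands above.

```shell
tmp=$(mktemp -d)

# Hypothetical shortlist: id|name|designation|department
cat > "$tmp/shortlist" <<'EOF'
2233|pat lee|g.m.|sales
9876|sam roy|director|production
5678|kim das|d.g.m.|marketing
2365|ann ray|director|personnel
5423|joe sen|chairman|admin
EOF

# Stage 1: cut the third field. Stage 2: sort it. Stage 3: count each value.
# awk only strips the leading padding that uniq -c adds before each count.
tally=$(cut -d'|' -f3 "$tmp/shortlist" | LC_ALL=C sort | uniq -c | awk '{print $1, $2}')

printf '%s\n' "$tally"
rm -rf "$tmp"
```

The result is a frequency table of designations, with "director" counted twice and every other designation once.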