Text Processing
Searching Inside Files
• grep - searches for patterns within files
• grep [options] [[-e] pattern] filename [...]
• -n shows line numbers
• -A NUM prints each match and NUM lines after it
• -B NUM prints each match and the NUM lines preceding it
• -C NUM prints each match and NUM lines before and after it. For -C, NUM defaulted to 2 in older versions of grep; current GNU grep requires an explicit NUM
• -i performs a case-insensitive match
• -v inverts the match; prints the lines that do not match
• --color highlights the matched string in color
The grep command in Linux searches a file or files for a pattern and by default prints the lines containing matches. This default behavior is shown in the following example, where grep returns the entire line that contains the pattern nobody:
$ grep nobody /etc/passwd
nobody:x:99:99:Nobody:/:
grep Examples
1. Consider the following extremely simple examples of using the grep command. They all use this file:
$ cat file
mouse
cat
dog
bear
2. Print all lines that contain the letter e, including their line numbers:
$ grep -n e file
1:mouse
4:bear
3. Print all lines that do not contain the letter e, including their line numbers:
$ grep -nv e file
2:cat
3:dog
4. Print lines containing the pattern cat plus one line preceding each match:
$ grep -B 1 cat file
mouse
cat
5. Print lines containing the case-insensitive pattern BEAR:
$ grep -i BEAR file
bear
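The options listed on the previous slide that the examples do not exercise (-A, -C and repeated -e) can be tried the same way. The following is a minimal sketch, assuming a grep that supports the context options (GNU and BSD grep do); the temporary directory and the variable names are illustrative only.

```shell
# Recreate the four-line sample file in a scratch directory (hypothetical path).
workdir=$(mktemp -d)
printf '%s\n' mouse cat dog bear > "$workdir/file"

# -A 1: each matching line plus one line after it.
after=$(grep -A 1 cat "$workdir/file")

# -C 1: each matching line plus one line before and one line after it.
context=$(grep -C 1 dog "$workdir/file")

# Repeating -e searches for several patterns at once; -n adds line numbers.
either=$(grep -n -e cat -e bear "$workdir/file")

# -i and -v combined: lines that contain neither e nor E.
without_e=$(grep -iv E "$workdir/file")

printf '%s\n' "-A 1 cat:" "$after" "-C 1 dog:" "$context"
printf '%s\n' "-e cat -e bear:" "$either" "-iv E:" "$without_e"

rm -r "$workdir"
```

Each result is captured in a variable so it can be inspected or compared; on the command line the grep calls would normally be run directly.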
The Streaming Editor
• sed - A [s]treaming [ed]itor
• sed [options] 'script' [filename ...]
• performs edits on a stream of text (usually the output of another program)
• often used to automate edits on many files quickly
• small and very efficient
• -i switch for in-place edits with modern versions
Example:
$ cat letter
I love Windows. Windows is my favorite operating system.
Then sed works its magic, fixing the statement with a simple search and replace command. The result is written to STDOUT; the file itself is unchanged unless -i is used:
$ sed 's/Windows/Linux/g' letter
I love Linux. Linux is my favorite operating system.
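The stream and in-place uses mentioned in the bullets can be sketched as follows. This is an illustration under assumptions: the scratch directory is hypothetical, and the -i.bak spelling (in-place edit that keeps a backup copy) is used because it is accepted by both GNU and BSD sed, while a bare -i behaves differently between them.

```shell
# Scratch copy of the letter from the example (hypothetical location).
workdir=$(mktemp -d)
printf '%s\n' 'I love Windows. Windows is my favorite operating system.' > "$workdir/letter"

# Without the trailing g flag, only the first match on each line is replaced.
first_only=$(sed 's/Windows/Linux/' "$workdir/letter")

# sed as a filter on the output of another program.
piped=$(echo 'Windows is installed' | sed 's/Windows/Linux/g')

# In-place edit: the file is rewritten and the original is kept as letter.bak.
sed -i.bak 's/Windows/Linux/g' "$workdir/letter"
edited=$(cat "$workdir/letter")
backup=$(cat "$workdir/letter.bak")

printf '%s\n' "$first_only" "$piped" "$edited" "$backup"

rm -r "$workdir"
```

Keeping the .bak copy is a cautious default when automating edits across many files, since a wrong pattern can otherwise destroy the originals.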
Text Processing with awk
• awk - pattern scanning and processing language
• $ awk -f awk_script_name /path/to/file
• Turing-complete programming language
• splits lines into fields (like cut)
• regex pattern matching (like grep)
• math operations, control statements, variables, IO...
awk Command Examples
Print the lines that end with the string bash:
$ awk '/bash$/' /etc/passwd
. . . output omitted . . .
Print the names of the users (field one) for each line that ends with the string bash:
$ awk -F: '/bash$/ {print $1}' /etc/passwd
. . . output omitted . . .
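Since the output of the commands above depends on the local /etc/passwd, the sketch below runs the same patterns against a small inline sample and adds one example each of a variable and a numeric comparison. The three passwd-style entries are made up for illustration and are not taken from any real system.

```shell
# Hypothetical passwd-style sample data (name:password:UID:GID:comment:home:shell).
passwd_sample='root:x:0:0:root:/root:/bin/bash
nobody:x:99:99:Nobody:/:/sbin/nologin
foo:x:501:501::/home/foo:/bin/bash'

# Field splitting plus regex matching: user names whose line ends in bash.
bash_users=$(printf '%s\n' "$passwd_sample" | awk -F: '/bash$/ {print $1}')

# Variables and an END block: count the matching lines.
bash_count=$(printf '%s\n' "$passwd_sample" | awk -F: '/bash$/ {n++} END {print n+0}')

# Numeric comparison on a field: accounts with a UID of 500 or more.
regular=$(printf '%s\n' "$passwd_sample" | awk -F: '$3 >= 500 {print $1 ":" $3}')

printf '%s\n' "$bash_users" "$bash_count" "$regular"
```

The n+0 in the END block forces a numeric 0 to be printed when no line matches, instead of an empty string.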
Replacing Text Characters
• tr - translates, squeezes & deletes characters
• tr [options] [set1] [set2]
• translates one set of characters into another; commonly used to convert lower case into upper case
$ tr a-z A-Z
• squeezes (collapses) runs of duplicate characters; commonly used to merge multiple blank lines into one
$ tr -s '\n'
• deletes a set of characters; commonly used to delete special characters
$ tr -d '\000'
To display the contents of the lower.txt file and convert all lower-case characters to upper case:
$ cat lower.txt | tr a-z A-Z
THESE ARE CHARACTERS THAT WERE TYPED INTO THIS FILE IN LOWER CASE
To display the contents of the lower.txt file and delete all occurrences of the letter e:
$ cat lower.txt | tr -d e
ths ar charactrs that wr typd into this fil in lowr cas
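The squeeze mode is the one the examples above do not demonstrate, so here is a small sketch of all three modes on inline input. The input strings are invented for the illustration; tr only reads standard input, which is why each call sits at the end of a pipe.

```shell
# Translate: map every lower-case letter to its upper-case counterpart.
upper=$(printf 'hello world\n' | tr a-z A-Z)

# Squeeze: collapse each run of newlines into one, removing blank lines.
no_blank_lines=$(printf 'a\n\n\n\nb\n' | tr -s '\n')

# Squeeze also works on other characters, for example repeated spaces.
single_spaced=$(printf 'too    many   spaces\n' | tr -s ' ')

# Delete: remove every character in the set, here all digits.
no_digits=$(printf 'abc123def456\n' | tr -d '0-9')

printf '%s\n' "$upper" "$no_blank_lines" "$single_spaced" "$no_digits"
```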
Text Sorting
• sort - Sorts text
• sort [options] filename [...]
• can sort on different columns
• by default sorts in lexicographical order: 1, 2, 234, 265, 29, 3, 4, 5
• can be told to sort numerically: 1, 2, 3, 4, 5, 29, 234, 265
• can merge and sort multiple files simultaneously
• can sort in reverse order
• often used to prepare input for the uniq command
Options:
• -n sort numerically
• -r sort in reverse order
• -m do not sort, only merge; this is faster but only works if the input is already sorted
• -t separator use this as a column separator
• -k number sort by this column number, counting the first column as 1
• -o filename output to the specified file instead of STDOUT
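The difference between the two orderings, and the column options, can be checked with a short sketch. The numbers are the ones from the slide; the name:UID lines are invented sample data, and LC_ALL=C is set here only as an assumption to keep the default ordering independent of the local locale.

```shell
# The slide's numbers, deliberately out of order.
numbers=$(printf '%s\n' 29 3 1 265 5 2 234 4)

# Default: lexicographical order, compared character by character.
lexical=$(printf '%s\n' "$numbers" | LC_ALL=C sort)

# -n: numeric order; adding -r reverses it.
numeric=$(printf '%s\n' "$numbers" | sort -n)
reversed=$(printf '%s\n' "$numbers" | sort -rn)

# -t and -k: sort colon-separated records numerically on the second column.
by_uid=$(printf '%s\n' 'foo:501' 'root:0' 'bar:502' 'nobody:99' | sort -t : -k 2 -n)

# Print each result on a single line for easy comparison.
echo $lexical
echo $numeric
echo $reversed
echo $by_uid
```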
Duplicate Removal Utility
• uniq - Removes adjacent duplicate lines, so the input is normally sorted first
• uniq [options] [input_file [output_file]]
• cleanly combines lists of overlapping but not identical information
• -c prefixes each line of output with the number of times it occurred
• taking this output and performing a reverse numeric sort produces a list ordered by number of occurrences
Options:
• -i ignore case, i.e. b is equivalent to B
• -D print all duplicated lines (every copy)
• -d only print duplicated lines (one copy of each)
• -u only print unique lines
uniq Command Examples
Consider the following example, which shows several of the features of the uniq command:
$ cat /tmp/file
mouse
mouse
Mouse
Cat
dog
cat
$ uniq -c /tmp/file
2 mouse
1 Mouse
1 Cat
1 dog
1 cat
$ uniq /tmp/file
mouse
Mouse
Cat
dog
cat
$ uniq -d /tmp/file
mouse
$ uniq -i /tmp/file
mouse
Cat
dog
cat
Because Cat and cat are not on adjacent lines, uniq -i does not merge them.
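The frequency-count idiom described on the previous slide (sort, then uniq -c, then a reverse numeric sort) can be sketched as below, using the same six words as the example file. Folding case with tr first and normalizing the count column with awk are choices made for this illustration, since the exact padding uniq -c prints varies between implementations.

```shell
# The six lines of the example file, supplied inline.
words=$(printf '%s\n' mouse mouse Mouse Cat dog cat)

# uniq alone only collapses adjacent duplicates, so Cat and cat both survive.
adjacent_only=$(printf '%s\n' "$words" | uniq -i)

# Fold case, sort so duplicates become adjacent, count, then order by count.
# awk reprints "count word" with a single space to hide uniq's padding.
ranked=$(printf '%s\n' "$words" \
  | tr A-Z a-z \
  | LC_ALL=C sort \
  | uniq -c \
  | sort -rn \
  | awk '{print $1, $2}')

printf '%s\n' "$adjacent_only" "---" "$ranked"
```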
Extracting Columns of Text
• cut - Extracts selected fields from a line of text
• cut [options] [filename] [...]
• can specify which fields you want to extract
• uses tabs as default delimiter
• -d option to specify a different delimiter
• most useful on structured input (text with columns)
Options:
• -b range extract only the bytes in this range
• -c range extract only the characters in this range
• -f range extract only the fields in this range
• -d delimiter use this delimiter instead of the default
Ranges:
• 6-39 from 6 to 39
• -12 from 1 (the beginning of the line) to 12
• 30- from 30 to the end of the line
$ cat /etc/passwd | cut -d : -f 1,3
foo:501
bar:502
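The three range forms can be tried on a single line of input. This is a minimal sketch; the passwd-style line for the user foo is invented to match the example output above, and the variable names are illustrative.

```shell
# Hypothetical passwd-style line.
line='foo:x:501:501::/home/foo:/bin/bash'

# A list of fields: fields 1 and 3, using : as the delimiter.
name_uid=$(printf '%s\n' "$line" | cut -d : -f 1,3)

# Open-ended range N-: from field 6 to the end of the line.
home_and_shell=$(printf '%s\n' "$line" | cut -d : -f 6-)

# Range -N: from the beginning of the line to character 3.
first_three=$(printf '%s\n' "$line" | cut -c -3)

# Closed range M-N: characters 3 through 5.
middle=$(printf '%s\n' 'abcdefghij' | cut -c 3-5)

printf '%s\n' "$name_uid" "$home_and_shell" "$first_three" "$middle"
```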
Merging Multiple Files
• paste - Merges text from multiple files to STDOUT
• paste [options] [filename] [...]
• -s option to merge files serially
• uses tabs as default delimiter
The three input files:
$ cat file1
A
B
C
D
$ cat file2
one
two
three
four
$ cat file3
1
2
3
4
Merged line by line (the default):
$ paste file1 file2 file3
A	one	1
B	two	2
C	three	3
D	four	4
Merged serially, one file per output line:
$ paste -s file1 file2 file3
A	B	C	D
one	two	three	four
1	2	3	4
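The same three files can be recreated to experiment with paste. The sketch below also uses -d, an option the slide does not list but which paste provides for replacing the default tab delimiter; the scratch directory and variable names are assumptions made for the illustration.

```shell
# Recreate the three input files in a scratch directory (hypothetical path).
workdir=$(mktemp -d)
printf '%s\n' A B C D > "$workdir/file1"
printf '%s\n' one two three four > "$workdir/file2"
printf '%s\n' 1 2 3 4 > "$workdir/file3"

# Default: corresponding lines joined with tabs; keep only the first row here.
first_row=$(paste "$workdir/file1" "$workdir/file2" "$workdir/file3" | head -n 1)

# -d replaces the tab with another delimiter.
colon_joined=$(paste -d : "$workdir/file1" "$workdir/file2" "$workdir/file3")

# -s joins all lines of one file into a single line.
serial=$(paste -s -d , "$workdir/file1")

printf '%s\n' "$first_row" "$colon_joined" "$serial"

rm -r "$workdir"
```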