Chapter 5: Understanding Text Processing
The Complete Guide to Linux System Administration
Objectives
• Use regular expressions in a variety of circumstances
• Manipulate text files in complex ways using multiple command-line utilities
• Use advanced features of the vi editor
• Use the sed and awk text processing utilities
Regular Expressions
• A flexible way to encode many types of complex text patterns
• Used to define patterns in many situations:
  • As a parameter to many Linux commands
  • Within the vi editor
  • Within programming languages, including shell scripts
• Used to match text
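Regular Expressions (continued)
• Acceptable syntax varies in small but important ways, depending on where the expression is used
• Examples:
  • [Rr]eunion[0-9][0-9].jpg
  • [Rr]eunion[0-9]{2}.jpg
  • Reunion-[^d].jpg
• {2} (repeat the preceding item twice) is extended regular expression syntax, as accepted by grep -E; basic syntax, grep's default, writes it as \{2\}
• [^d] matches any single character except d
• An unescaped . matches any single character; use \. to match only a literal dot, as in \.jpg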
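A minimal sketch of how the three example patterns behave with grep, assuming GNU or POSIX grep. The file names are hypothetical sample data piped in with printf, and each pattern is anchored with ^ and $ so it must match the whole name. The last command shows what an unescaped dot lets through.

```shell
#!/bin/sh
# Hypothetical file names, one per line.
names='Reunion01.jpg
reunion22.jpg
Reunion1.jpg
Reunion-a.jpg
Reunion-d.jpg
Reunion01xjpg'

# Basic regular expression (grep's default): two explicit digit classes.
printf '%s\n' "$names" | grep '^[Rr]eunion[0-9][0-9]\.jpg$'

# Extended regular expression (grep -E): {2} repeats the digit class twice.
printf '%s\n' "$names" | grep -E '^[Rr]eunion[0-9]{2}\.jpg$'

# Basic syntax spells the same repetition count as \{2\}.
printf '%s\n' "$names" | grep '^[Rr]eunion[0-9]\{2\}\.jpg$'

# [^d] matches any one character except "d".
printf '%s\n' "$names" | grep '^Reunion-[^d]\.jpg$'

# Unescaped ".": it matches any character, so "Reunion01xjpg" matches too.
printf '%s\n' "$names" | grep '^[Rr]eunion[0-9][0-9].jpg$'
```

The first three commands print the same two names (Reunion01.jpg and reunion22.jpg), which illustrates that one pattern can be written differently for different tools.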
Manipulating Files
• Command-line utilities are useful for:
  • Searching
  • Sorting
  • Reorganizing
  • Otherwise working with text files
Searching for Patterns with grep
• grep
  • Rapidly scans files for a specified pattern
  • Prints the lines of text that contain text matching the pattern
• Take further action on matching lines of text by using a pipe to connect grep with other filtering commands
Searching for Patterns with grep (continued)
• Examples:
  • grep wilson /etc/passwd
  • grep 'thomas[Cc]orp' *txt
• Quote patterns that contain brackets or other shell metacharacters so the shell passes them to grep unchanged
• Often used at the end of a pipe:
  • locate tif | grep frame
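A small sketch of grep used alone and inside a pipeline. The accounts are hypothetical lines in /etc/passwd format, supplied with printf rather than read from the real file, so the output is predictable.

```shell
#!/bin/sh
# Hypothetical accounts in /etc/passwd format (not the real file).
accounts='root:x:0:0:root:/root:/bin/bash
wilson:x:1001:1001:Ann Wilson:/home/wilson:/bin/bash
jones:x:1002:1002:Bob Jones:/home/jones:/bin/sh'

# grep at the end of a pipe: keep only lines containing "wilson".
printf '%s\n' "$accounts" | grep wilson

# A quoted bracket expression matches either "Jones" or "jones".
printf '%s\n' "$accounts" | grep 'Bob [Jj]ones'

# Pass grep's output to another filter: count the accounts whose shell is bash.
printf '%s\n' "$accounts" | grep '/bin/bash$' | wc -l
```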
Examining File Contents
• head and tail commands:
  • Display the first few lines and the last few lines of a file
  • Include 10 lines by default
  • The -n option specifies the number of lines
  • Print output to STDOUT, which can be redirected as needed
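A quick sketch of head and tail on a throwaway file. It assumes a Linux system with mktemp and GNU seq, which are not POSIX utilities. The 15 numbered lines are sample data.

```shell
#!/bin/sh
# Hypothetical 15-line sample file in a temporary location, removed on exit.
f=$(mktemp)
trap 'rm -f "$f"' EXIT
seq 1 15 > "$f"

head "$f"         # first 10 lines (the default)
head -n 3 "$f"    # first 3 lines
tail -n 2 "$f"    # last 2 lines
```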
Examining File Contents (continued)
• tail -f option
  • "Follows" the file, printing new lines as other programs add them to it
  • Very useful for tracking log files
• wc command
  • Counts lines, words, and bytes; -l, -w, and -c select a single count, and -m counts characters
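A sketch of wc on a short sample string. tail -f is not shown because it keeps running until it is interrupted. The sample text has 2 lines, 3 words, and 14 bytes, counting both newline characters.

```shell
#!/bin/sh
# Sample text: "one two" and "three" on separate lines.
text='one two
three'

printf '%s\n' "$text" | wc       # lines, words, and bytes together
printf '%s\n' "$text" | wc -l    # lines only
printf '%s\n' "$text" | wc -w    # words only
printf '%s\n' "$text" | wc -c    # bytes only
```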
Examining File Contents (continued)
• strings command
  • Extracts text strings from a file that includes binary and other non-text data
  • Provides a convenient way to check for information that may not otherwise be available
Manipulating Text Files
• Filtering
  • Modifies part of a text file by adding, removing, or altering data
  • Based on complex rules or patterns
  • Use command-line programs to filter text files
• sort command
  • Sorts all of the lines in a text file
• uniq command
  • Removes duplicate lines, but only when they are adjacent, so its input is usually sorted first
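A sketch showing why uniq is normally preceded by sort. The fruit list is sample data. None of its duplicates are adjacent, so uniq on its own leaves the list unchanged.

```shell
#!/bin/sh
# Sample list with non-adjacent duplicates.
list='pear
apple
pear
fig
apple'

# uniq alone only collapses adjacent repeats, so nothing is removed here.
printf '%s\n' "$list" | uniq

# Sorting first brings duplicates together; uniq then removes them.
printf '%s\n' "$list" | sort | uniq

# sort -u combines both steps.
printf '%s\n' "$list" | sort -u
```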
Manipulating Text Files (continued)
• diff command
  • Displays the differences between two files
  • Output format:
    • < marks lines from the first file that are not found in the second file
    • > marks lines from the second file that are not found in the first file
• cmp command
  • Gives a quick check of whether two files are identical; with -s it prints nothing and reports only through its exit status (0 when the files match)
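A sketch comparing two small temporary files with diff and cmp. The file contents are hypothetical, and mktemp is assumed to be available. diff exits with status 1 when the files differ, so the script reports that status instead of treating it as a failure.

```shell
#!/bin/sh
# Two hypothetical sample files that differ only on line 2.
a=$(mktemp); b=$(mktemp)
trap 'rm -f "$a" "$b"' EXIT
printf '%s\n' apple banana cherry > "$a"
printf '%s\n' apple blueberry cherry > "$b"

# Prints "2c2", "< banana", "---", "> blueberry".
diff "$a" "$b" || echo "diff exit status: $?"

# cmp -s is silent; only its exit status says whether the files match.
if cmp -s "$a" "$b"; then
  echo "identical"
else
  echo "different"
fi
```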
Manipulating Text Files (continued)
• comm command
  • Compares two sorted files line by line
  • Prints three columns: lines found only in the first file, lines found only in the second file, and lines found in both
• ispell spell checker
  • Uses a large dictionary to examine a text file
  • Prompts with suggestions for misspelled words
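A sketch of comm on two already-sorted sample files. The contents are hypothetical. The options -1, -2, and -3 suppress the corresponding output column, so combining them selects a single column.

```shell
#!/bin/sh
# comm requires sorted input; these hypothetical lists are already sorted.
a=$(mktemp); b=$(mktemp)
trap 'rm -f "$a" "$b"' EXIT
printf '%s\n' apple banana cherry > "$a"
printf '%s\n' apple blueberry cherry > "$b"

comm "$a" "$b"        # all three columns, separated by tabs
comm -12 "$a" "$b"    # only lines common to both files
comm -23 "$a" "$b"    # only lines unique to the first file
comm -13 "$a" "$b"    # only lines unique to the second file
```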
Using sed and awk
• sed
  • Complex filtering program
• awk command
  • Generally used for formatting output
Filtering and Editing Text with sed
• sed command
  • Processes each line of a text file according to a series of editing commands
  • By default, prints every line to STDOUT after processing it
  • The -n option suppresses this default output, so only lines printed explicitly (for example, with p) appear
• Example:
  • sed -n '/lincoln/p' /tmp/names
  • Prints to the screen all lines of the /tmp/names file that contain the text "lincoln"
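A sketch of the /lincoln/p example on sample names piped in with printf instead of a /tmp/names file. Running the same script without -n shows that the default output and the explicit p both print each matching line.

```shell
#!/bin/sh
# Hypothetical sample names.
names='abraham lincoln
george washington
mary todd lincoln'

# With -n, only the lines printed by p appear.
printf '%s\n' "$names" | sed -n '/lincoln/p'

# Without -n, every line is printed and matching lines appear twice.
printf '%s\n' "$names" | sed '/lincoln/p'
```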
Filtering and Editing Text with sed (continued)
• Substitution command syntax:
  • /pattern1/s/pattern2/pattern3/g
  • Watches for lines containing pattern1
  • Replaces occurrences of pattern2 with pattern3 on those lines
• g option at the end of the command
  • Means global
  • Causes sed to replace all occurrences on each line; without it, only the first occurrence on each line is replaced
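A sketch of the /pattern1/s/pattern2/pattern3/g form on sample text. Here pattern1 is "color", pattern2 is "red", and pattern3 is "blue". The second line contains "red" but no "color", so it is never changed.

```shell
#!/bin/sh
# Sample text: only the first line contains "color".
text='color: red, red, red
size: red'

# g replaces every "red" on lines that contain "color".
printf '%s\n' "$text" | sed '/color/s/red/blue/g'

# Without g, only the first "red" on each matching line is replaced.
printf '%s\n' "$text" | sed '/color/s/red/blue/'
```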
Filtering and Editing Text with sed (continued)
• Operations can be placed in a file whose name is passed to sed with -f:
  • sed -f nolatin news-article > new_news-article
• & in the replacement text
  • Refers to the text that matched pattern2
  • Example: s/[0-9]*\.[0-9][0-9]/$&/g puts a dollar sign in front of each amount, such as 12.50
• sed is often useful as part of a pipeline of Linux commands
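A sketch of both features on this slide. The script file is a hypothetical stand-in for "nolatin", whose real contents are not given. It holds one sed command per line and is passed to sed with -f. The last command uses & to reinsert each matched amount after a dollar sign.

```shell
#!/bin/sh
# Hypothetical sed script standing in for "nolatin".
script=$(mktemp)
trap 'rm -f "$script"' EXIT
cat > "$script" <<'EOF'
s/e\.g\./for example/g
s/i\.e\./that is/g
EOF

printf '%s\n' 'Use a filter, e.g. sed, i.e. a stream editor.' | sed -f "$script"

# & stands for the text matched by the pattern (here, each amount).
printf '%s\n' 'Price 12.50 and 3.99' | sed 's/[0-9]*\.[0-9][0-9]/$&/g'
```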
Formatting with awk
• Processes text
• Extracts parts of a file
• Formats text according to information you provide on the command line or in a script file
• Formats output based on fields within a line of text
• Many tasks can be performed with either sed or awk
Formatting with awk (continued)
• Fields on a line are normally separated by whitespace
• The -F option changes which character awk uses to separate fields
• The first field is referred to as $1, the second as $2, and so on; $0 is the entire line
• Basic format: /pattern/ { actions }
• Example: ls -l | awk '{ print $3, $9 }'
  • Prints the owner and name of each file; the comma puts a space between the two fields
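A sketch of field selection on sample lines rather than ls -l output, whose column layout can vary between systems. The last command uses -F: to split a /etc/passwd-style line on colons.

```shell
#!/bin/sh
# Whitespace-separated fields; the comma inserts a space in the output.
printf '%s\n' 'alice 42 admin' 'bob 17 staff' | awk '{ print $1, $3 }'

# Without the comma, the fields run together.
printf '%s\n' 'alice 42 admin' | awk '{ print $1 $3 }'

# -F: makes the colon the field separator.
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' | awk -F: '{ print $1, $7 }'
```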
Formatting with awk (continued)
• A regular expression can select which lines awk includes in its output:
  • ls -l | awk '/^l/ { print $3, $9 }' selects lines starting with l, that is, symbolic links
• A variable or comparison can be placed at the beginning of the command instead of a pattern:
  • ls -l | awk '$2 > 3 { print $0 }' selects lines whose second field (the link count) is greater than 3
• Using an awk script file:
  • awk -f awk_command_list text_file
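A sketch of a pattern and a comparison used to select lines, run on hypothetical sample data. The comparison program is then stored in a temporary script file and run with -f, much as awk_command_list would be. The temporary file name is only illustrative.

```shell
#!/bin/sh
# Hypothetical sample data: name, number, group.
data='alice 42 admin
bob 17 staff
carol 35 admin'

# Pattern: only lines ending in "admin"; print the first field.
printf '%s\n' "$data" | awk '/admin$/ { print $1 }'

# Comparison instead of a pattern: numeric test on the second field.
printf '%s\n' "$data" | awk '$2 > 30 { print $0 }'

# A similar program kept in a script file and run with -f.
prog=$(mktemp)
trap 'rm -f "$prog"' EXIT
printf '%s\n' '$2 > 30 { print $1 }' > "$prog"
printf '%s\n' "$data" | awk -f "$prog"
```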
More Advanced Text Editing
• The vi editor provides advanced text editing features
File Operations in vi
• :w command
  • Writes the file you are editing
• :r filename
  • Inserts another file into the file you are editing
• :q command
  • Exits from vi
• :wq
  • Saves and exits
Screen Repositioning
• The line number and cursor position on the line are shown at the bottom right
• ( and ) move backward or forward by one sentence; { and } move backward or forward by one paragraph
• Ctrl+f and Ctrl+b move one screen forward and backward
Screen Repositioning (continued)
• Shift+G
  • Takes you to any line in the file
  • Type the line number first, then Shift+G; Shift+G without a number goes to the last line
• Marks
  • Work like bookmarks
  • m followed by a name (a lowercase letter, a-z) places a mark
  • ' (apostrophe) followed by the mark name returns to the marked line; ` (backtick) followed by the name returns to the exact marked position
Screen Repositioning (continued)
• %
  • Jumps between matching braces, parentheses, and brackets in program source code
• Shift+J
  • Joins the current line with the next line
More Line-Editing Commands
• :h
  • Views the help file (in Vim, :h abbreviates :help)
• Ctrl+]
  • Follows a hyperlink in the help files
• Ctrl+t
  • Navigates back from a followed link
More Line-Editing Commands (continued)
• Forward slash (/)
  • Searches forward from the current cursor position
  • Can use a regular expression as the search pattern
• ?
  • Searches backward
• n key
  • Moves to the next occurrence of the search pattern, in the same direction as the last search
• N key
  • Moves to the previous occurrence of the pattern, in the opposite direction
More Line-Editing Commands (continued)
• Search-and-replace operations
  • Format: :line-number-range s/search-pattern/replacement-text/flags
  • Example: :1,$ s/^configure/configure/
  • 1,$ means from line 1 to the last line ($); % is shorthand for the same range
  • The g flag replaces every match on a line rather than only the first
More Line-Editing Commands (continued)
• Shelling out
  • Executes another Linux command as if you were at a shell prompt
  • Type :! followed by the command
  • Example: :!ls /etc/samba
Setting vi Options
• :set all
  • Views all options currently set in vi
  • Press the spacebar repeatedly to page through all screens of settings
• :set without the word all
  • Displays only the options that have been changed from their default values
• :set followed by an option name
  • Sets that option (for example, :set number)
Setting vi Options (continued)
• Settings can be automated
  • Define an environment variable called EXINIT that contains a set command, which is executed each time vi starts:
    • EXINIT='set nu nosmartindent'
    • Export the variable (export EXINIT) so vi can read it
  • Alternatively, place the settings in a file called .exrc in your home directory
    • $HOME/.exrc is read only when EXINIT is not set, so EXINIT takes precedence
Summary
• Regular expressions are used in many places to define patterns of information
• The grep command searches for lines of text containing a pattern defined using a regular expression
• The sed and awk commands support complex scripting languages that include regular expressions
Summary (continued)
• vi
  • Uses complex combinations of commands to reposition the cursor within text
  • Supports search-and-replace operations
  • The set command defines editor settings