Chapter 5: Understanding Text Processing

Chapter 5 of The Complete Guide to Linux System Administration. Objectives: use regular expressions in a variety of circumstances; manipulate text files in complex ways using multiple command-line utilities; use advanced features of the vi editor.

Presentation Transcript


  1. Chapter 5: Understanding Text Processing • The Complete Guide to Linux System Administration

  2. Objectives • Use regular expressions in a variety of circumstances • Manipulate text files in complex ways using multiple command-line utilities • Use advanced features of the vi editor • Use the sed and awk text processing utilities

  3. Regular Expressions • Flexible way to encode many types of complex text patterns • Used to define a pattern in many situations • As a parameter to many Linux commands • Within the vi editor • Within programming languages, including shell scripts

  4. Regular Expressions (continued)

  5. Regular Expressions (continued)

  6. Regular Expressions (continued) • Acceptable syntax varies in small but important ways, depending on where the expression is used • Examples: • [Rr]eunion[0-9][0-9].jpg • [Rr]eunion[0-9]{2}.jpg • Reunion-[^d].jpg
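
The example patterns above can be tried out with grep. This is a minimal sketch, assuming a hypothetical list of file names invented for the illustration. The dot is escaped here (`\.`) so that it matches only a literal period; the unescaped dot in the slide's patterns matches any single character.

```shell
# Hypothetical file names used only for this illustration.
names='reunion07.jpg
Reunion12.jpg
Reunion-a.jpg
Reunion-d.jpg
reunion7.jpg'

# The unescaped interval {2} needs extended syntax (-E);
# basic grep syntax would write it as \{2\}.
two_digits=$(printf '%s\n' "$names" | grep -E '^[Rr]eunion[0-9]{2}\.jpg$')

# [^d] matches any single character except a lowercase d.
not_d=$(printf '%s\n' "$names" | grep -E '^Reunion-[^d]\.jpg$')

printf '%s\n' "$two_digits"
printf '%s\n' "$not_d"
```

The difference between `{2}` and `\{2\}` is one instance of the syntax variation the slide mentions: the same pattern is written differently in basic and extended regular expressions.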

  7. Manipulating Files • Command-line utilities useful for: • Searching • Sorting • Reorganizing • Otherwise working with text files

  8. Searching for Patterns with grep • grep • Rapidly scan files for specified pattern • Print out lines of text that contain text matching pattern • Take further action on matching lines of text • Using pipe to connect grep with other filtering commands

  9. Searching for Patterns with grep (continued) • Examples: • grep wilson /etc/passwd • grep thomas[Cc]orp *txt • Often used at end of pipe • locate tif | grep frame
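
A small sketch of both uses, searching a file directly and filtering at the end of a pipe. The account data is a hypothetical stand-in for /etc/passwd so that the example does not depend on real users.

```shell
# Hypothetical stand-in for /etc/passwd.
tmp=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'wilson:x:1001:1001:Pat Wilson:/home/wilson:/bin/bash' \
  'thomas:x:1002:1002:Lee Thomas:/home/thomas:/bin/sh' > "$tmp"

# Print every line that contains the pattern.
matches=$(grep wilson "$tmp")

# grep at the end of a pipe filters another command's output;
# -c counts matching lines instead of printing them.
bash_users=$(cut -d: -f1,7 "$tmp" | grep -c '/bin/bash$')

printf '%s\n%s\n' "$matches" "$bash_users"
rm -f "$tmp"
```

Quoting the pattern, as in `grep 'thomas[Cc]orp' *txt`, is generally safer than leaving it bare, because the shell may otherwise expand the brackets as a file name pattern before grep sees them.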

  10. Examining File Contents • head and tail commands: • Display first few lines and last few lines of file • By default include 10 lines • -n option • Specify number of lines • Print output to STDOUT • Redirect as needed

  11. Examining File Contents (continued) • tail -f option • “Follows” file, printing new lines as they are added to file by other programs • Very useful for tracking log files • wc command • Count number of characters, words, and lines
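
The head, tail, and wc commands can be sketched on a throwaway file. The file and its contents are assumptions made up for the illustration.

```shell
log=$(mktemp)
seq 1 25 > "$log"               # 25 lines: 1, 2, ..., 25

first3=$(head -n 3 "$log")      # first three lines instead of the default 10
last2=$(tail -n 2 "$log")       # last two lines
line_count=$(wc -l < "$log")    # reading from stdin keeps the file name out of the output

printf '%s\n' "$first3" "$last2" "$line_count"

# tail -f "$log" would keep running and print lines as they are appended;
# it is left out here because it does not return until interrupted.
rm -f "$log"
```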

  12. Examining File Contents (continued)

  13. Examining File Contents (continued) • strings command • Extracts text strings from file that includes binary and other non-text data • Provides convenient way to check for information that may not be otherwise available

  14. Examining File Contents (continued)

  15. Manipulating Text Files • Filtering • Modify part of text file by adding, removing, or altering data in file • Based on complex rules or patterns • Use command-line programs to filter text files • sort command • Sort all lines in text file • uniq command • Remove adjacent duplicate lines in file (sort first to catch every duplicate)
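
A short sketch of sort and uniq working together as filters, using a made-up word list:

```shell
list=$(mktemp)
printf '%s\n' pear apple pear fig apple pear > "$list"

# uniq only collapses adjacent duplicates, so the input is sorted first.
unique=$(sort "$list" | uniq)

# uniq -c prefixes each line with its count; sort -rn puts the largest count first.
top=$(sort "$list" | uniq -c | sort -rn | head -n 1 | awk '{ print $2 }')

printf '%s\n' "$unique" "$top"
rm -f "$list"
```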

  16. Manipulating Text Files (continued) • diff command • Displays differences between two files • Output format: • < indicates lines that were not found in second file • > indicates lines that were not found in first file • cmp command • Gives quick check of whether two files are identical
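
The diff markers and the cmp quick check can be seen on two small files that differ in one line. The file contents are invented for the illustration.

```shell
a=$(mktemp); b=$(mktemp)
printf '%s\n' alpha beta gamma > "$a"
printf '%s\n' alpha delta gamma > "$b"

# diff exits with status 1 when the files differ, so "|| true" keeps
# a script running under "set -e" from stopping here.
changes=$(diff "$a" "$b" || true)

# cmp -s prints nothing and reports the result through its exit status only.
if cmp -s "$a" "$b"; then same=yes; else same=no; fi

printf '%s\n' "$changes" "$same"
rm -f "$a" "$b"
```

In the diff output, the line marked `<` comes from the first file and the line marked `>` from the second.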

  17. Manipulating Text Files (continued) • comm command • Compares two sorted files line by line • Reports lines unique to each file and lines common to both • ispell spell checker • Uses large dictionary to examine text file • Prompts with suggestions
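
A minimal sketch of comm on two already-sorted lists with hypothetical names. The -1, -2, and -3 options each suppress one of its three output columns (lines only in the first file, lines only in the second, lines in both).

```shell
old=$(mktemp); new=$(mktemp)
printf '%s\n' alice bob carol > "$old"
printf '%s\n' bob carol dave > "$new"

only_old=$(comm -23 "$old" "$new")   # keep column 1: lines only in the first file
only_new=$(comm -13 "$old" "$new")   # keep column 2: lines only in the second file
both=$(comm -12 "$old" "$new")       # keep column 3: lines common to both

printf '%s\n' "$only_old" "$only_new" "$both"
rm -f "$old" "$new"
```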

  18. Manipulating Text Files (continued)

  19. Manipulating Text Files (continued)

  20. Manipulating Text Files (continued)

  21. Using sed and awk • sed • Complex filtering program • awk command • Generally used for formatting output

  22. Filtering and Editing Text with sed • sed command • Processes each line in text file according to a series of editing commands given on the command line • Example: • sed -n '/lincoln/p' /tmp/names • Prints to screen all lines of /tmp/names file that contain text “lincoln” • By default, sed prints every line to STDOUT; the -n option suppresses this so that only lines selected by p are printed
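
The effect of -n can be checked with a small sketch. The names file is a hypothetical stand-in for /tmp/names.

```shell
names=$(mktemp)
printf '%s\n' 'abraham lincoln' 'george washington' 'mary lincoln' > "$names"

# -n turns off the default printing; /lincoln/p prints only matching lines.
only=$(sed -n '/lincoln/p' "$names")

# Without -n, matching lines appear twice: once by default and once from p.
doubled=$(sed '/lincoln/p' "$names" | wc -l)

printf '%s\n' "$only" "$doubled"
rm -f "$names"
```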

  23. Filtering and Editing Text with sed (continued) • Substitution command syntax: • /pattern1/s/pattern2/pattern3/g • Watches for lines containing pattern1 • Replaces occurrences of pattern2 with pattern3 • g option at end of command • Causes sed to replace all occurrences on each line • Means global
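
A sketch of the /pattern1/s/pattern2/pattern3/g form, with and without the g flag. The log-style input is invented for the illustration.

```shell
input='error: disk full, disk offline
info: disk ok'

# Only lines matching /^error/ are edited; g replaces every "disk" on those lines.
global=$(printf '%s\n' "$input" | sed '/^error/s/disk/volume/g')

# Without g, only the first occurrence on each selected line changes.
first_only=$(printf '%s\n' "$input" | sed '/^error/s/disk/volume/')

printf '%s\n' "$global" "$first_only"
```

The second input line is left alone in both runs because it does not match the address pattern.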

  24. Filtering and Editing Text with sed (continued) • Can place operations in file and pass file name to sed command • sed -f nolatin news-article > new_news-article • ( & ) Operator within sed command • Refers to text that matches pattern2 • s/[0-9]*\.[0-9][0-9]/\$&/g • sed often useful as part of pipeline of Linux commands
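
A minimal sketch combining a script file (-f) with the & operator. The script content and the input sentence are assumptions for the illustration: the pattern matches amounts with two decimal places, and the replacement puts a dollar sign in front of whatever was matched.

```shell
script=$(mktemp)

# One sed command per line; & stands for the text the search pattern matched.
# Inside single quotes the dollar sign reaches sed as a literal character.
printf '%s\n' 's/[0-9]*\.[0-9][0-9]/$&/g' > "$script"

priced=$(printf '%s\n' 'coffee 3.50 and cake 12.25' | sed -f "$script")

printf '%s\n' "$priced"
rm -f "$script"
```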

  25. Formatting with awk • Processes text • Extracts parts of file • Formats text according to information you provide on command line or in script file • Format output based on fields within line of text • Often can perform same functions with sed or awk

  26. Formatting with awk (continued) • Each field on line is normally separated by whitespace • Can change which character awk uses to separate fields • First field is referred to by $1, the second by $2, and so on • Basic format: /pattern/ { actions } • Example: ls -l | awk '{ print $3 $9 }' (fields are printed with nothing between them; print $3, $9 separates them with a space)

  27. Formatting with awk (continued) • Can include regular expression to select which lines awk includes in output: • ls -l | awk '/^l/ {print $3 $9 }' • Use variable or comparison in awk command • Put at beginning of command instead of pattern • ls -l | awk ' $2 > 3 {print $0 }' • Using awk script file: • awk -f awk_command_list text_file
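
The pattern, comparison, and field-separator features can be sketched on fixed input rather than ls -l output, whose columns vary between systems. The records below are hypothetical.

```shell
# Hypothetical whitespace-separated records: name, department, hours.
data='ana sales 12
ben ops 40
cy sales 31'

# A regular expression selects lines; a comma in print inserts a space between fields.
sales=$(printf '%s\n' "$data" | awk '/sales/ { print $1, $3 }')

# A comparison can stand in place of the pattern.
busy=$(printf '%s\n' "$data" | awk '$3 > 30 { print $1 }')

# -F changes the field separator, here to a colon.
shell_field=$(printf 'root:x:0:0:root:/root:/bin/bash\n' | awk -F: '{ print $7 }')

printf '%s\n' "$sales" "$busy" "$shell_field"
```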

  28. More Advanced Text Editing • vi editor provides advanced text editing features

  29. File Operations in vi • :w command • Write file you are editing • :r file name • Insert another file into file you are editing • :q command • Exit from vi • :wq • Save and exit

  30. Screen Repositioning • Line number and cursor position on line • Shown at bottom right • Use parentheses and curly braces • Move forward or backward by one sentence or paragraph at a time • Ctrl+f and Ctrl+b key combinations • Move one screen forward and backward

  31. Screen Repositioning (continued) • Shift+G • Takes you to any line in file • Enter line number first, then press Shift+G (with no number, goes to the last line) • Mark • Like bookmark • m command followed by a single-letter name (a-z) • Place mark • ' (apostrophe) command followed by mark name • Return to mark

  32. Screen Repositioning (continued) • % • Navigate between matching braces, parentheses, etc. in program source code • Shift+J • Joins two lines

  33. More Line-Editing Commands • :h • View vi help file • Ctrl+] • Navigate to hyperlinks in help files • Ctrl+t • Navigate back from links in help files

  34. More Line-Editing Commands (continued) • Forward slash (/) • Search forward from current cursor position • Can use regular expression as search pattern • n key • Move to next occurrence of search pattern • ? • Search backwards • N key • Move to previous occurrence of pattern

  35. More Line-Editing Commands (continued) • Search-and-replace operations • Format • :line-number-range s/search-pattern/replacement text/flags • Example • :1,$ s/^configure/configure/

  36. More Line-Editing Commands (continued) • Shelling out • Execute another Linux command • As if you were at shell prompt • Type ! followed by command • Example: :!ls /etc/samba

  37. Setting vi Options • :set all • View all options currently set in vi • Press spacebar multiple times to see all screens of settings • :set without the word all • Displays only the options whose values differ from the defaults • :set followed by option • To set option

  38. Setting vi Options (continued)

  39. Setting vi Options (continued) • Can automate settings • Define environment variable called EXINIT that contains set command • Executed each time vi started • EXINIT='set nu nosmartindent' • Place settings in file called .exrc in home directory • Read at startup when EXINIT is not set (in POSIX vi and Vim, EXINIT takes precedence over ~/.exrc)

  40. Summary • Regular expressions used in many places to define patterns of information • grep command used to search for lines of text containing pattern defined using regular expression • sed and awk commands support complex scripting language that includes regular expressions

  41. Summary (continued) • vi • Uses complex combinations of commands to reposition cursor within text • Supports search-and-replace operations • set command defines editor settings
