The grep Command
Purpose & Use
• Searches the input files for lines containing a match to a given list of patterns
• When a line matches, copies it to standard output by default
• Produces whatever other sort of output is requested with options
The grep command
• Matches on text
• No limit on input line length other than available memory
• Can match arbitrary characters within a line
Invoking grep
• General synopsis: grep options pattern input_file_names
• Zero or more options
• Zero or more input file names; with none, grep searches standard input (or, with ‘-r’, the working directory)
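A minimal sketch of the synopsis in practice, assuming GNU grep and a POSIX shell with `mktemp`; the temporary directory and the files `fruits.txt` and `desserts.txt` are hypothetical sample data created only for the demo.

```shell
# Hypothetical sample files, created only for this demo.
dir=$(mktemp -d)
printf 'apple pie\nbanana split\n' > "$dir/fruits.txt"
printf 'cherry tart\napple crumble\n' > "$dir/desserts.txt"

# One option (-n), one pattern (apple), two input file names.
# With more than one file, each output line is prefixed by its file name.
grep -n apple "$dir/fruits.txt" "$dir/desserts.txt"
# -> $dir/fruits.txt:1:apple pie
# -> $dir/desserts.txt:2:apple crumble

# No input file names: grep reads standard input instead.
printf 'apple\npear\n' | grep apple
# -> apple
```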
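Command-line Options
• Some options are specified by POSIX.2; others are GNU extensions
• Long option names (e.g. ‘--ignore-case’) are always GNU extensions, even for options that POSIX specifies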
Generic Program Information
• ‘--help’ prints a usage message summarizing the command-line options and the bug-reporting address, then exits
• ‘-V’, ‘--version’ prints the version number of grep, then exits
Matching Control
• ‘-e pattern’, ‘--regexp=pattern’ use pattern as the pattern; can be repeated, and protects a pattern that begins with ‘-’
• ‘-f file’, ‘--file=file’ obtain patterns from file, one per line
• ‘-i’, ‘-y’, ‘--ignore-case’ ignore case distinctions in both the pattern and the input
• ‘-v’, ‘--invert-match’ select non-matching lines
• ‘-w’, ‘--word-regexp’ select only lines containing matches that form whole words
• ‘-x’, ‘--line-regexp’ select only matches that exactly match the whole line
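The matching-control options can be compared side by side on one small input. This is a sketch assuming GNU grep; `log.txt` and `patterns.txt` are hypothetical files made up for the example.

```shell
dir=$(mktemp -d)
printf 'Error: disk full\nerror: retrying\nterror level\nok\n' > "$dir/log.txt"
printf 'disk\nretry\n' > "$dir/patterns.txt"

grep -i error "$dir/log.txt"        # 3 lines: 'terror' also contains 'error'
grep -i -w error "$dir/log.txt"     # 2 lines: 'terror' is not a whole word
grep -i -v error "$dir/log.txt"     # ok
grep -x ok "$dir/log.txt"           # ok (the whole line must match)
grep -e disk -e ok "$dir/log.txt"   # lines matching either pattern
# Patterns read from a file, one per line.
grep -f "$dir/patterns.txt" "$dir/log.txt"
```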
General Output Control
• ‘-c’, ‘--count’ print only a count of matching lines for each input file (with ‘-v’, of non-matching lines)
• ‘--color[=WHEN]’, ‘--colour[=WHEN]’ surround the matched (non-empty) strings, matching lines, context lines, file names, line numbers, byte offsets, and separators with escape sequences to display them in color on the terminal; WHEN is ‘never’, ‘always’, or ‘auto’
• ‘-L’, ‘--files-without-match’ print only the names of files that contain no match
• ‘-l’, ‘--files-with-matches’ print only the names of files that contain a match
• ‘-m num’, ‘--max-count=num’ stop reading a file after num matching lines; take care when combining it with other options: with ‘-c’ the count never exceeds num, and with ‘-v’ grep stops after num non-matching lines
• ‘-o’, ‘--only-matching’ print only the matched (non-empty) parts of matching lines, each part on a separate output line
• ‘-q’, ‘--quiet’, ‘--silent’ write nothing to standard output; exit immediately with status 0 if any match is found
• ‘-s’, ‘--no-messages’ suppress error messages about nonexistent or unreadable files
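A short sketch of the output-control options, assuming GNU grep; the files `data.txt` and `other.txt` are hypothetical. The pattern `[0-9][0-9]*` is used instead of `[0-9]+` because `+` is an ordinary character in the default basic regular expressions.

```shell
dir=$(mktemp -d)
printf 'a1 b22 c333\nno digits\nx4\n' > "$dir/data.txt"
printf 'nothing here\n' > "$dir/other.txt"

grep -c '[0-9]' "$dir/data.txt"                    # 2
grep -o '[0-9][0-9]*' "$dir/data.txt"              # 1, 22, 333, 4 (one per line)
grep -l '[0-9]' "$dir/data.txt" "$dir/other.txt"   # only the data.txt path
grep -L '[0-9]' "$dir/data.txt" "$dir/other.txt"   # only the other.txt path
grep -m 1 '[0-9]' "$dir/data.txt"                  # a1 b22 c333
grep -c -m 1 '[0-9]' "$dir/data.txt"               # 1: the count never exceeds num
# -q: no output, only the exit status is used.
if grep -q '[0-9]' "$dir/data.txt"; then echo "has digits"; fi
```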
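Output Line Prefix Control
• When several prefix fields are printed, the order is always file name, line number, byte offset
• ‘-b’, ‘--byte-offset’ print the 0-based byte offset within the input file before each line of output
• ‘-H’, ‘--with-filename’ print the file name for each match; this is the default when there is more than one file to search
• ‘-h’, ‘--no-filename’ suppress the file names on output
• ‘--label=LABEL’ display input that actually comes from standard input as if it came from file LABEL
• ‘-n’, ‘--line-number’ prefix each line of output with its 1-based line number
• ‘-T’, ‘--initial-tab’ make sure the first character of the actual line content lies on a tab stop, so that tab alignment looks normal
• ‘-u’, ‘--unix-byte-offsets’ report Unix-style byte offsets (as if carriage returns were stripped); used with ‘-b’, and only has an effect on MS-DOS and MS-Windows
• ‘-Z’, ‘--null’ output a zero byte (ASCII NUL) instead of the character that normally follows a file name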
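The prefix options can be checked on a three-line file. This sketch assumes GNU grep; `greek.txt` and the label `stdin-copy` are hypothetical names chosen for the example.

```shell
dir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$dir/greek.txt"

grep -n beta "$dir/greek.txt"       # 2:beta (1-based line number)
grep -b beta "$dir/greek.txt"       # 6:beta ("alpha\n" occupies bytes 0-5)
grep -n -b beta "$dir/greek.txt"    # 2:6:beta (line number before byte offset)
grep -H -n beta "$dir/greek.txt"    # file name first, even for a single file
grep -h alpha "$dir/greek.txt" "$dir/greek.txt"   # alpha twice, no file names
# Standard input reported as if it came from a file named stdin-copy.
grep -H --label=stdin-copy gamma < "$dir/greek.txt"   # stdin-copy:gamma
# -l -Z ends each file name with NUL instead of newline; tr makes it visible.
grep -l -Z beta "$dir/greek.txt" | tr '\0' '\n'
```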
Context Line Control
• These options have no effect when ‘-o’ or ‘--only-matching’ is used
• ‘-A num’, ‘--after-context=num’ print num lines of trailing context after matching lines
• ‘-B num’, ‘--before-context=num’ print num lines of leading context before matching lines
• ‘-C num’, ‘-num’, ‘--context=num’ print num lines of leading and trailing output context
• ‘--group-separator=string’ print string instead of ‘--’ between disjoint groups of lines
• ‘--no-group-separator’ print no separator, so disjoint groups of lines appear adjacent to each other
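A sketch of context output, assuming GNU grep; `ctx.txt` is a hypothetical nine-line file with two matches far enough apart that their context groups do not touch, so a separator is printed between them.

```shell
dir=$(mktemp -d)
printf '1\n2\nMATCH\n4\n5\n6\n7\nMATCH\n9\n' > "$dir/ctx.txt"

grep -A 1 MATCH "$dir/ctx.txt"    # MATCH, 4, --, MATCH, 9
grep -B 1 MATCH "$dir/ctx.txt"    # 2, MATCH, --, 7, MATCH
grep -C 1 MATCH "$dir/ctx.txt"    # 2, MATCH, 4, --, 7, MATCH, 9
grep -1 MATCH "$dir/ctx.txt"      # same as -C 1
grep -A 1 --group-separator='==' MATCH "$dir/ctx.txt"   # '==' replaces '--'
grep -A 1 --no-group-separator MATCH "$dir/ctx.txt"     # groups printed back to back
```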
File and Directory Selection
• ‘-a’, ‘--text’ process a binary file as if it were text (same as ‘--binary-files=text’)
• ‘--binary-files=type’ if the first few bytes of a file indicate binary data, assume the file is of type type (‘binary’, ‘text’, or ‘without-match’)
• ‘-D action’, ‘--devices=action’ if an input file is a device, FIFO, or socket, use action (‘read’ or ‘skip’) to process it
• ‘-d action’, ‘--directories=action’ if an input file is a directory, use action (‘read’, ‘skip’, or ‘recurse’) to process it
• ‘--exclude=glob’ skip files whose base name matches glob
• ‘--exclude-dir=dir’ exclude directories matching the pattern dir from recursive directory searches
• ‘-I’ process a binary file as if it did not contain matching data (same as ‘--binary-files=without-match’)
• ‘--include=glob’ search only files whose base name matches glob
• ‘-r’, ‘-R’, ‘--recursive’ for each directory operand, read and process all files in that directory, recursively (newer GNU grep distinguishes them: ‘-r’ follows only symbolic links given on the command line, while ‘-R’ follows all of them)
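Recursive search and the include/exclude filters combine naturally with ‘-l’. This is a sketch assuming GNU grep; the directory tree (`src`, `src/sub`, `build`) and its files are hypothetical.

```shell
dir=$(mktemp -d)
mkdir -p "$dir/src/sub" "$dir/build"
printf 'TODO: fix parser\n' > "$dir/src/main.c"
printf 'TODO: docs\n' > "$dir/src/notes.txt"
printf 'TODO: nested\n' > "$dir/src/sub/util.c"
printf 'TODO: generated\n' > "$dir/build/out.c"

# -r walks the whole tree; -l prints only the names of matching files.
grep -r -l TODO "$dir" | sort                      # all 4 files
# --include keeps only files whose base name matches the glob.
grep -r -l --include='*.c' TODO "$dir" | sort      # the 3 .c files
# --exclude-dir prunes whole directories from the walk.
grep -r -l --include='*.c' --exclude-dir=build TODO "$dir" | sort
# --exclude skips files whose base name matches the glob.
grep -r -l --exclude='*.txt' TODO "$dir" | sort
```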
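Other Options
• ‘--line-buffered’ use line buffering on output; this can cause a performance penalty
• ‘--mmap’ ignored for backwards compatibility; it used to read input with the mmap system call instead of read, which on modern systems rarely if ever yields better performance
• ‘-U’, ‘--binary’ treat the file(s) as binary; on MS-DOS and MS-Windows this stops grep from stripping carriage returns, and it has no effect on other platforms
• ‘-z’, ‘--null-data’ treat the input as a set of lines, each terminated by a zero byte instead of a newline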
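With ‘-z’, each NUL-terminated record counts as one line, even if it contains newlines. A minimal sketch assuming GNU grep; the records and file names are made-up data, and `tr` only turns the NUL terminators back into newlines for display.

```shell
# Two NUL-terminated records; the first one contains an embedded newline.
printf 'first record\nsecond line\0other record\0' | grep -z second | tr '\0' '\n'
# -> first record
# -> second line

# A common pairing: NUL-separated file names, such as those from find -print0.
printf 'a.txt\0b.log\0c.txt\0' | grep -z '\.txt$' | tr '\0' '\n'
# -> a.txt
# -> c.txt
```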
Environment Variables
• GREP_OPTIONS specifies default options to be placed in front of any explicit options (obsolescent; recent GNU grep releases ignore it)
• LC_ALL, LC_COLLATE, LANG specify the locale for the LC_COLLATE category, which determines the collating sequence used to interpret range expressions such as ‘[a-z]’
• LC_ALL, LC_CTYPE, LANG specify the locale for the LC_CTYPE category, which determines the type of characters (e.g., which characters are whitespace)
• LC_ALL, LC_MESSAGES, LANG specify the locale for the LC_MESSAGES category, which determines the language that grep uses for messages
• For each category, the first of LC_ALL, the category variable, and LANG that is set determines the locale
• POSIXLY_CORRECT if set, grep behaves as POSIX.2 requires
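Because LC_ALL takes precedence, setting it to ‘C’ is a simple way to get byte-value range expressions regardless of LANG. A sketch assuming GNU grep; the input lines are made up, and `en_US.UTF-8` is only an example value (it need not be installed, since LC_ALL overrides it).

```shell
# In the C locale, [a-z] covers exactly the 26 lower-case ASCII letters.
printf 'abc\nABC\n123\n' | LC_ALL=C grep '[a-z]'          # abc

# LC_ALL wins over LANG, so this still runs in the C locale.
printf 'abc\nABC\n123\n' | LANG=en_US.UTF-8 LC_ALL=C grep -c '[a-z]'   # 1

# Character classes avoid relying on ranges at all.
printf 'abc\nABC\n123\n' | LC_ALL=C grep '[[:upper:]]'    # ABC
```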
Environment Variables
• GREP_COLOR specifies the color used to highlight matched (non-empty) text; it is deprecated in favor of GREP_COLORS
• GREP_COLORS specifies the colors and other attributes used to highlight various parts of the output, as a colon-separated list of capabilities
• Capabilities include:
• sl= whole selected lines (matching lines, or non-matching lines when ‘-v’ is given)
• cx= whole context lines (non-matching lines, or matching lines when ‘-v’ is given)
• rv Boolean value that reverses (swaps) the meanings of the ‘sl=’ and ‘cx=’ capabilities when the ‘-v’ command-line option is specified
• mt=01;31 matching non-empty text in any matching line (01;31 is bold red)
• ln=32 line numbers prefixing any content line (32 is green)
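A sketch of overriding two capabilities, assuming GNU grep and ANSI/SGR color codes; the particular colors chosen (bold green, yellow) are arbitrary examples. `cat -v` shows the escape sequences as `^[[...m` so they are visible even without a terminal.

```shell
# mt=01;32 (bold green) for matched text, ln=33 (yellow) for line numbers.
# --color=always emits escape sequences even when output is not a terminal.
printf 'hello world\n' | GREP_COLORS='mt=01;32:ln=33' grep --color=always -n world | cat -v

# --color=never prints plain text.
printf 'hello world\n' | grep --color=never world     # hello world
```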
Exit Status
• 0 if a line is selected
• 1 if no lines were selected
• 2 if an error occurred (but with ‘-q’, a selected line gives status 0 even if an error also occurred)
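The exit status is what scripts usually test. A sketch assuming GNU grep; `status.txt` is a hypothetical file and `no-such-file` is deliberately missing to trigger status 2 (‘-s’ hides the error message).

```shell
dir=$(mktemp -d)
printf 'ready\n' > "$dir/status.txt"

grep -q ready "$dir/status.txt"; echo "found: $?"          # found: 0
grep -q missing "$dir/status.txt"; echo "not found: $?"    # not found: 1
grep -s ready "$dir/no-such-file"; echo "error: $?"        # error: 2

# Typical script use: branch on the status and discard the output.
if grep -q ready "$dir/status.txt"; then
  echo "service is ready"
fi
```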
grep Programs
• Four major variants of grep, controlled by the following options:
• ‘-G’, ‘--basic-regexp’ interpret the pattern as a basic regular expression (BRE); this is the default
• ‘-E’, ‘--extended-regexp’ interpret the pattern as an extended regular expression (ERE)
• ‘-F’, ‘--fixed-strings’ interpret the pattern as a list of fixed strings (not regular expressions), separated by newlines, any of which is to be matched
• ‘-P’, ‘--perl-regexp’ interpret the pattern as a Perl regular expression
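The same pattern text can mean different things under each variant. A sketch assuming GNU grep (‘-P’ is left out because it depends on PCRE support in the build); `in.txt` is hypothetical sample data.

```shell
dir=$(mktemp -d)
printf 'a+b\naab\na.b\naxb\n' > "$dir/in.txt"

# -G (default, BRE): '+' is an ordinary character.
grep 'a+b' "$dir/in.txt"       # a+b
# -E (ERE): '+' means "one or more of the preceding a".
grep -E 'a+b' "$dir/in.txt"    # aab
# '.' is "any character" in a regular expression, so all 4 lines match.
grep 'a.b' "$dir/in.txt"
# -F: '.' is just a dot.
grep -F 'a.b' "$dir/in.txt"    # a.b
# -F with a newline-separated list: any of the strings may match.
grep -F -e "$(printf 'axb\naab')" "$dir/in.txt"   # aab, axb
```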
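Regular Expressions
• A regular expression is a pattern that describes a set of strings
• The fundamental building blocks are the regular expressions that match a single character; ‘.’ matches any single character
• A regular expression may be followed by one of several repetition operators:
• ‘?’ the preceding item is optional and is matched at most once
• ‘*’ the preceding item is matched zero or more times
• ‘+’ the preceding item is matched one or more times
• ‘{n}’ the preceding item is matched exactly n times
• ‘{n,}’ the preceding item is matched n or more times
• ‘{,m}’ the preceding item is matched at most m times (a GNU extension)
• ‘{n,m}’ the preceding item is matched at least n times, but not more than m times
• Two regular expressions may be concatenated; the result matches a string made of two substrings that match the two expressions
• Two regular expressions may be joined by the infix operator ‘|’; the result matches any string matching either expression
• Precedence: repetition binds tighter than concatenation, which binds tighter than alternation
• The operators above are written as shown in extended regular expressions (‘-E’); in basic regular expressions, ‘?’, ‘+’, ‘{’, ‘|’, ‘(’, and ‘)’ are ordinary characters, and GNU grep uses ‘\?’, ‘\+’, ‘\{’, ‘\|’, ‘\(’, and ‘\)’ instead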
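Each repetition operator can be checked against lines with a growing number of ‘b’s. A sketch assuming GNU grep; ‘-x’ forces the whole line to match so the counts are exact, and `reps.txt` is hypothetical data.

```shell
dir=$(mktemp -d)
printf 'ac\nabc\nabbc\nabbbc\nabbbbc\n' > "$dir/reps.txt"

grep -E -x 'ab?c' "$dir/reps.txt"       # ac, abc
grep -E -x 'ab*c' "$dir/reps.txt"       # all 5 lines
grep -E -x 'ab+c' "$dir/reps.txt"       # every line except ac
grep -E -x 'ab{2}c' "$dir/reps.txt"     # abbc
grep -E -x 'ab{2,}c' "$dir/reps.txt"    # abbc, abbbc, abbbbc
grep -E -x 'ab{,1}c' "$dir/reps.txt"    # ac, abc (GNU extension)
grep -E -x 'ab{1,3}c' "$dir/reps.txt"   # abc, abbc, abbbc
# Same interval in a basic regular expression (no -E): backslashed braces.
grep -x 'ab\{2\}c' "$dir/reps.txt"      # abbc
# Alternation has the lowest precedence: 'ab|cd' is (ab)|(cd).
printf 'ab\ncd\nabd\n' | grep -E -x 'ab|cd'   # ab, cd
```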
Bracket Expressions
• A bracket expression is a list of characters enclosed by ‘[’ and ‘]’; it matches any single character in that list
• If the first character of the list is the caret ‘^’, it matches any character not in the list
• Predefined classes of characters, used inside a bracket expression (e.g. ‘[[:digit:]]’):
• ‘[:alnum:]’ alphanumeric characters
• ‘[:alpha:]’ alphabetic characters
• ‘[:blank:]’ blank characters: space and tab
• ‘[:cntrl:]’ control characters
• ‘[:digit:]’ digits
• ‘[:graph:]’ graphical characters: ‘[:alnum:]’ and ‘[:punct:]’
• ‘[:lower:]’ lower-case letters
• ‘[:print:]’ printable characters: ‘[:alnum:]’, ‘[:punct:]’, and space
• ‘[:punct:]’ punctuation characters: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
• ‘[:space:]’ space characters: tab, newline, vertical tab, form feed, carriage return, and space
• ‘[:upper:]’ upper-case letters
• ‘[:xdigit:]’ hexadecimal digits
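Note the double brackets when using a class: the class name goes inside a bracket expression. A sketch assuming GNU grep; `chars.txt` is a hypothetical file whose lines exercise digits, case, and punctuation.

```shell
dir=$(mktemp -d)
printf 'abc\nA1\n42\nx-y!\n' > "$dir/chars.txt"

grep '[[:digit:]]' "$dir/chars.txt"                    # A1, 42
grep -x '[[:digit:]][[:digit:]]*' "$dir/chars.txt"     # 42 (digits only)
grep '[[:upper:]]' "$dir/chars.txt"                    # A1
grep '[^[:alnum:]]' "$dir/chars.txt"                   # x-y! (has a non-alphanumeric)
grep -o '[[:punct:]]' "$dir/chars.txt"                 # - and ! on separate lines
grep '[aeiou]' "$dir/chars.txt"                        # abc (case-sensitive list)
```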
The Backslash Character and Special Expressions
• ‘\b’ match the empty string at the edge of a word
• ‘\B’ match the empty string provided it is not at the edge of a word
• ‘\<’ match the empty string at the beginning of a word
• ‘\>’ match the empty string at the end of a word
• ‘\w’ match a word constituent; it is a synonym for ‘[_[:alnum:]]’
• ‘\W’ match a non-word constituent; it is a synonym for ‘[^_[:alnum:]]’
• Example: ‘\brat\b’ matches the separate word ‘rat’; ‘\Brat\B’ matches ‘crate’ but not ‘furry rat’
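The ‘rat’ example can be reproduced directly, with one extra line showing that ‘_’ counts as a word constituent. A sketch assuming GNU grep; `words.txt` is hypothetical.

```shell
dir=$(mktemp -d)
printf 'rat\ncrate\nfurry rat\nmy_rat\n' > "$dir/words.txt"

grep '\brat\b' "$dir/words.txt"     # rat, furry rat ('_' in my_rat is a word character)
grep '\Brat\B' "$dir/words.txt"     # crate
grep '\<rat' "$dir/words.txt"       # rat, furry rat
# \w* extends the match over the surrounding word characters.
grep -o '\w*rat\w*' "$dir/words.txt"   # rat, crate, rat, my_rat
```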
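Example & Questions
• grep -i 'hello.*world' menu.h main.c
• This lists all lines in menu.h and main.c that contain ‘hello’ followed by ‘world’, ignoring case (‘.*’ matches zero or more characters within a line)
• How can you list just the names of matching files?
• How do you search directories recursively?
• What if a pattern has a leading ‘-’?
• How do you search for a whole word, not a part of a word?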
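One possible set of answers, using only options covered in these slides. This is a sketch assuming GNU grep; the contents of `menu.h` and `main.c`, and the extra `lib/flags.txt`, are invented for the demo.

```shell
dir=$(mktemp -d)
printf 'Hello, brave new World\n' > "$dir/menu.h"
printf 'int main(void) { return 0; }\n' > "$dir/main.c"
mkdir "$dir/lib"
printf '%s\n' '-v is a flag' > "$dir/lib/flags.txt"

# The slide's example: prints the matching line, prefixed with menu.h's path.
grep -i 'hello.*world' "$dir/menu.h" "$dir/main.c"
# 1. Only the names of matching files: -l.
grep -l -i 'hello.*world' "$dir/menu.h" "$dir/main.c"
# 2. Recursive search: -r.
grep -r -l main "$dir"
# 3. A pattern starting with '-': use -e, or end the options with --.
grep -r -e '-v' "$dir"
grep -r -- '-v' "$dir"
# 4. Whole words only: -w ('mai' is part of 'main', so it does not match).
grep -w main "$dir/main.c"
grep -w mai "$dir/main.c"
```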