^Regular Expression$
"Sometimes you have a programming problem and it seems like the best solution is to use regular expressions; now you have two problems." - from Programmer’s Comments

What is a RE?
• What is an Expression? ~ not interpreted literally, needs to be evaluated
• What is a Regular Expression? ~ describes patterns, or sequences of characters, without necessarily specifying the characters literally
• Regular Expressions ARE NOT the same as the shell’s wildcards.

How does it match?
• An RE is descriptive of a pattern or sequence of characters.
• Concatenation is the basic operation implied in every RE; that is, a pattern matches adjacent characters.
• To see if the string matches the pattern, it compares the 1st char in the string to the 1st char in the pattern.
• If there is a match, it compares the 2nd char in the string to the 2nd char in the pattern.
• Whenever it fails to make a match, it compares the NEXT char in the string to the 1st char in the pattern.

How does it REALLY match?
Example: the string of chars = "I flip in my flip flops"; the pattern = "flo"
• "I" against "f": no match. Compare the next char of the string with the 1st char of the pattern.
• "f" of the first "flip" against "f": match. Since there is a match, the 2nd char in the pattern is compared with the next char in the string.
• "l" against "l": match again. Since there is a match again, the 3rd char in the pattern is compared with the next char in the string.
• "i" against "o": no match. The 3rd char in the pattern does NOT match the next char in the string. So, returning to the 1st char in the pattern, it compares the next char in the string. There is no match, so the process starts all over again.
• . . .
• "f" of "flops" against "f": match. Compare the 2nd char in the pattern with the next char in the string.
• "l" against "l": match. Compare the 3rd char in the pattern with the next char in the string.
• "o" against "o": since this is also a match, the string of chars matches the pattern.
Question: What is the next comparison step?

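The walk-through above can be sketched as a small shell function. This is only an illustration under stated assumptions: `first_match` is a hypothetical helper name, the pattern is treated as literal text, and the char-by-char comparison at each starting position is compressed into a single prefix test. Real grep implementations are far more sophisticated.

```shell
# Hypothetical helper: print the 1-based position where a literal pattern
# first matches inside a string, or return 1 if it never matches.
first_match() {
    rest=$1      # the part of the string still to be tried
    pattern=$2   # literal pattern, no meta-characters
    pos=1
    while [ -n "$rest" ]; do
        case $rest in
            # Does the pattern match starting at the current char?
            "$pattern"*) echo "$pos"; return 0 ;;
        esac
        rest=${rest#?}       # no: move to the NEXT char in the string
        pos=$((pos + 1))     # and start again at the pattern's 1st char
    done
    return 1
}

first_match "I flip in my flip flops" "flo"   # prints 19
```

The position printed is where the "f" of "flops" sits; both occurrences of "flip" are tried and abandoned on the way there.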
What we already know…
• RE is not limited to literal characters
• There is a meta-character, the dot ( . ), that can be used to match any single character.
Review Question: What shell wildcard is it similar to?
Examples: What will be matched by .at ? What about at ?
Review Question: What other shell wildcard do we know? What does it match?
• In RE, * is also used to match 0 or more occurrences, but only of the PRECEDING character
• In RE, the meta-character * does NOT match anything itself, it only modifies what goes before it
Examples: Will a.*e match the following strings: able, affordable, a red apple ?
In general, what does RE.* match?

The dot ( . ) and the asterisk ( * )
• A . matches a single character (like the shell’s ?).
• The pattern 20.. matches a four-character pattern beginning with 20.
• The * refers to the immediately preceding character.
• This character bears no resemblance to the shell’s *.
• As a regular expression it indicates that the previous character can occur many times or no times at all.
• The pattern e* matches e, ee, eee, eeee, …, but it also matches a null string. If you are looking to match one or more e’s, use ee*.
The Regular Expression .*
• The dot along with the * (.*) is a useful regular expression.
• It signifies any number of characters or none.

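A quick way to check the dot and asterisk rules is to feed grep a few lines and see which survive. The sample words and variable names below are made up for this sketch; only plain grep and its -c (count) option are used.

```shell
# Sample lines invented for this sketch, one word per line.
sample='at
cat
bat
2024
2199
bd
bed
beed'

# .at needs one character before "at", so the bare line "at" is skipped.
dot_at=$(printf '%s\n' "$sample" | grep '.at')

# 20.. is "20" followed by any two characters.
twenty=$(printf '%s\n' "$sample" | grep '20..')

# be*d allows zero or more e's between b and d, so "bd" matches too.
star=$(printf '%s\n' "$sample" | grep 'be*d')

# bee*d requires at least one e.
plus=$(printf '%s\n' "$sample" | grep 'bee*d')

# a.*e is an "a", then anything (or nothing), then an "e".
dot_star=$(printf '%s\n' 'able' 'affordable' 'a red apple' | grep -c 'a.*e')

printf '%s\n' "$dot_at"      # cat, bat
printf '%s\n' "$twenty"      # 2024
printf '%s\n' "$star"        # bd, bed, beed
printf '%s\n' "$plus"        # bed, beed
printf '%s\n' "$dot_star"    # 3
```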
Using Meta-Characters in RE
By looking at the . and * examples, we’ve just looked at two out of three important parts of RE, character sets and modifiers:
• Character sets match one or more characters in a single position
  Example: the hash mark # is a simple char set that matches a single char
• Modifiers specify how many times the previous character is repeated
  Example: * is a modifier that …
• Anchors are used to specify the position of a pattern in relation to a line of text.
  Example: the caret ^ is an anchor that indicates the beginning of the line
Question: Which one of these expressions is more useful? Why? ^.* or ^*

A Closer Look at ^Anchor$
• Since most UNIX text utilities are line oriented, the end-of-line char is a separator.
• REs examine the text between separators, so in order to search for a pattern at either end of a line, we use the anchors ^ and $
• ^ and $ act as anchors only at the beginning and at the end of the pattern respectively
• To match a literal ^ at the beginning of the line or a literal $ at the end of the line, escape the special char by preceding it with a backslash ( \ )
Question: What will ^.$ match?
Note: . doesn’t match the end-of-line char

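The anchor rules can be tried out on a handful of lines. The sample text and variable names are invented for this sketch; the patterns are single-quoted so the shell leaves ^, $ and \ alone.

```shell
# Sample lines invented for this sketch.
text='flip
my flip
flips
^top
cost$'

# ^ anchors the pattern to the beginning of the line.
starts=$(printf '%s\n' "$text" | grep '^flip')

# $ anchors the pattern to the end of the line.
ends=$(printf '%s\n' "$text" | grep 'flip$')

# Both anchors together: the line must be exactly "flip".
whole=$(printf '%s\n' "$text" | grep '^flip$')

# An escaped \^ right after the anchor matches a literal caret at line start.
literal_caret=$(printf '%s\n' "$text" | grep '^\^')

# An escaped \$ right before the anchor matches a literal dollar at line end.
literal_dollar=$(printf '%s\n' "$text" | grep '\$$')

printf '%s\n' "$starts"          # flip, flips
printf '%s\n' "$ends"            # flip, my flip
printf '%s\n' "$whole"           # flip
printf '%s\n' "$literal_caret"   # ^top
printf '%s\n' "$literal_dollar"  # cost$
```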
The character class, [ ]
• Like the shell’s character class, it lets you specify a group of characters within brackets, [ ].
• The match is performed for any single character in the group.
• For example, the expression [to] matches either t or o.
Negating a class (^)
• Regular expressions use the ^ (caret) to negate a character class, while the shell uses the ! (bang).
• When the character class begins with a caret, all characters other than the ones grouped in the class are matched.
• For example, [^a-zA-Z] matches a single nonalphabetic character.

Using [ ] in a Range of Characters
• To match specific characters in a set, use [ ]
• ^[0123456789]$ will match any line of text that consists of exactly ONE digit.
• It is the same as ^[0-9]$
• It is possible to intermix specific characters with char ranges, as in [A-Za-z0-9_]
  Question: what will it match?
• A ^ immediately following [ specifies exceptions, as in [^AEIOU], which matches any single char except an upper case vowel
  Question: what will match [^-_a-z] ?

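The classes above can be tested against one-character lines. The sample lines and variable names are invented for this sketch, and LC_ALL=C is set as an assumption so that ranges such as a-z mean plain ASCII ranges regardless of the local locale.

```shell
export LC_ALL=C   # assumption: plain ASCII ordering for the a-z style ranges

# Sample lines invented for this sketch.
lines='7
42
a
_
-
Q'

# Exactly one digit on the line: "42" has two chars and is rejected.
one_digit=$(printf '%s\n' "$lines" | grep '^[0-9]$')

# Ranges mixed with a specific character (the underscore).
word_char=$(printf '%s\n' "$lines" | grep '^[A-Za-z0-9_]$')

# Negated class: one char that is not "-", not "_" and not a lower case letter.
not_listed=$(printf '%s\n' "$lines" | grep '^[^-_a-z]$')

printf '%s\n' "$one_digit"    # 7
printf '%s\n' "$word_char"    # 7, a, _, Q
printf '%s\n' "$not_listed"   # 7, Q
```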
When metacharacters lose their meaning
• The dot ( . ) and asterisk ( * ) lose their meaning inside the character class and are interpreted literally.
• The * is also matched literally if it’s the first character in the expression.
• Elsewhere, we can escape these characters with a backslash:
  • For example, to look for the pattern g* you need to use grep "g\*".
  • To look for a [, use \[
  • To look for the literal pattern .* use \.\*

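The difference between a live metacharacter and a literal one shows up clearly in match counts. The sample lines and variable names are invented for this sketch; stderr is silenced on the last command only as a precaution, in case a particular grep build chooses to warn about a leading *.

```shell
# Sample lines invented for this sketch.
lines='g*
gg
a.b
axb
*star'

# Unescaped g* means "zero or more g", which every line satisfies.
unescaped=$(printf '%s\n' "$lines" | grep -c 'g*')

# Escaped \* is a literal asterisk.
escaped=$(printf '%s\n' "$lines" | grep 'g\*')

# Outside a class the dot matches any char; inside [ ] it is literal.
any_char=$(printf '%s\n' "$lines" | grep -c 'a.b')
in_class=$(printf '%s\n' "$lines" | grep 'a[.]b')

# A * at the very start of a basic RE has nothing to modify, so it is literal.
leading=$(printf '%s\n' "$lines" | grep '*star' 2>/dev/null)

printf '%s\n' "$unescaped"   # 5
printf '%s\n' "$escaped"     # g*
printf '%s\n' "$any_char"    # 2
printf '%s\n' "$in_class"    # a.b
printf '%s\n' "$leading"     # *star
```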
grep - global regular expression print
• grep is a UNIX command for handling search requirements.
• grep scans its input for a pattern.
• It can display the selected pattern, the line numbers, or the filenames where the pattern occurs.
• grep also happens to be a filter, so it can search its standard input for a pattern and store the output in a file.
Example: who | grep henry > foo

The Grep Family
• grep (Global Regular Expression Print): supports only a limited number of regular expressions
• egrep (Extended Grep): supports most regular expressions but not all of them
• fgrep (Fast Grep): supports only string patterns, no regular expressions

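The three family members can be compared on the same input. This sketch uses `grep -E` and `grep -F`, the option forms that correspond to egrep and fgrep, on the assumption that they are the more portable spelling on current systems; the sample lines and variable names are invented.

```shell
# Sample lines invented for this sketch.
lines='a.c
abc
cat
dog'

# grep: the dot is a metacharacter, so both "a.c" and "abc" match.
basic=$(printf '%s\n' "$lines" | grep 'a.c')

# grep -F (fgrep): the pattern is a plain string, so only "a.c" matches.
fixed=$(printf '%s\n' "$lines" | grep -F 'a.c')

# grep -E (egrep): extended syntax such as | for alternation is available.
extended=$(printf '%s\n' "$lines" | grep -E 'cat|dog')

printf '%s\n' "$basic"      # a.c, abc
printf '%s\n' "$fixed"      # a.c
printf '%s\n' "$extended"   # cat, dog
```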
Quoting in grep
• Since shell metacharacters are expanded before the shell passes the arguments to the program, the command:
  $ grep [A-Z]*.c chap[12]
  could be interpreted by the shell as:
  grep Array.c Bug.c Comp.c chap1 chap2
  so grep would search for the pattern Array.c in the files Bug.c, Comp.c, chap1, and chap2.
• Whereas the intention may have been to search the files chap1 and chap2 for the pattern [A-Z]*.c.
• The solution in most cases is to surround the regular expression in quotes.

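The expansion problem can be reproduced in a scratch directory. This is a sketch under assumptions: `mktemp -d` is taken to be available, the file contents are invented, and `echo` stands in for grep in the unquoted case so that the argument list the shell builds becomes visible.

```shell
export LC_ALL=C                 # assumption: predictable glob ordering
workdir=$(mktemp -d)            # scratch directory for the demo
cd "$workdir" || exit 1

touch Array.c Bug.c Comp.c
printf 'see MAIN.c for details\n' > chap1
printf 'nothing here\n' > chap2

# Unquoted: the shell expands both wildcards before the command runs.
unquoted=$(echo grep [A-Z]*.c chap[12])
echo "$unquoted"                # grep Array.c Bug.c Comp.c chap1 chap2

# Quoted: grep receives the regular expression untouched and
# lists the files that contain a match (-l).
quoted=$(grep -l '[A-Z]*.c' chap1 chap2)
echo "$quoted"                  # chap1

cd / && rm -rf "$workdir"       # clean up the scratch directory
```

Single quotes are used here because they switch off every shell expansion inside the pattern.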
When grep fails…
• When grep doesn’t find a pattern it is silent.
• However, grep returns the following exit values:
  0 One or more matches were found.
  1 No matches were found.
  2 Syntax errors or inaccessible files (even if matches were found).
• Exit values from the last executed command are stored in the special variable $?.
• When writing scripts you will often test the value of this variable to control the flow of execution.
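A script can read the exit value either from $? or directly in an if statement. In this sketch `status_of` is a hypothetical helper, the printf line stands in for real `who` output, and /no/such/dir/file is a path assumed not to exist.

```shell
# Hypothetical helper: run grep with the given arguments, discard its
# output, and print the exit value that $? held afterwards.
status_of() {
    status=0
    grep "$@" >/dev/null 2>&1 || status=$?
    echo "$status"
}

found=$(printf 'henry tty1\n' | status_of 'henry')     # a match
missing=$(printf 'henry tty1\n' | status_of 'alice')   # no match
broken=$(status_of 'henry' /no/such/dir/file)          # unreadable file

echo "$found $missing $broken"   # 0 1 2

# Controlling the flow of execution: if tests grep's exit value directly.
if printf 'henry tty1\n' | grep 'henry' >/dev/null; then
    message='henry is logged in'
else
    message='henry is not logged in'
fi
echo "$message"                  # henry is logged in
```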