Lecture 6 – Regular Expressions • Intro • Meta vs. Non-Meta Characters • FNRE • SMRE
Intro • I have a long text document • I want to extract all email addresses • Same form, different instantiation • How do I find all addresses? • Regular Expressions!
Regular Expression • AKA regex and regexp • A pattern that describes a piece of text • For our discussion, we are matching lines of text • Useful for: • Searching text • Replacing text
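Non-Meta Characters • Hello stands for the string “Hello” literally • When the whole string must match (as with a file name), it matches: • “Hello” • …and does NOT match: • “Hello, Bob” • “Bunnyraptor said Hello today” • “That’s a Hellocopter!” • A line-oriented tool such as grep matches anywhere in a line, so it would report all four of these lines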
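Whether the literal pattern Hello "matches" these strings depends on whether the whole string has to match. As a sketch, assuming Python's `re` module as a stand-in for the matching tools in this lecture, `re.fullmatch` behaves like whole-string (file-name style) matching and `re.search` behaves like grep's match-anywhere-in-the-line:

```python
import re

samples = ["Hello", "Hello, Bob", "Bunnyraptor said Hello today", "That’s a Hellocopter!"]

# fullmatch: the pattern must cover the entire string.
whole_string = [s for s in samples if re.fullmatch(r"Hello", s)]
# search: the pattern may match anywhere inside the string, like grep on a line.
anywhere = [s for s in samples if re.search(r"Hello", s)]

print(whole_string)  # ['Hello']
print(anywhere)      # all four samples
```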
Meta Characters • Sometimes we need special characters • You’ve seen these before • \n, \r, ^M, blah, blah, blah
Meta Characters • We have some new ones • ., ?, *, (), [], {}, +, |, ^, $ • Their meanings depend on which “kind” of RegEx we’re using • FNRE: File Name RegEx • SMRE: String Matching RegEx
FNRE: * • Match zero or more characters • a*txt • “atxt” • “aa.txt” • “aba.txt”
FNRE: ? • Matches exactly one character • Character could be anything • ?.txt • “a.txt” • “_.txt” • “7.txt”
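To experiment with `*` and `?` outside the shell, Python's standard `fnmatch` module implements shell-style wildcard matching. This is only an approximation of the shell: for instance, `fnmatchcase` does not treat a leading `.` in a file name specially, while shell globbing normally does. The sample names below are made up for illustration.

```python
from fnmatch import fnmatchcase

# Sample names; fnmatchcase checks a whole name against a shell-style
# pattern, case-sensitively on every platform.
names = ["atxt", "aa.txt", "aba.txt", "a.txt", "ab.txt", "b.txt", "_.txt", "7.txt"]

# * : zero or more characters
star_matches = [n for n in names if fnmatchcase(n, "a*txt")]
# ? : exactly one character, whatever it is
question_matches = [n for n in names if fnmatchcase(n, "?.txt")]

print(star_matches)      # ['atxt', 'aa.txt', 'aba.txt', 'a.txt', 'ab.txt']
print(question_matches)  # ['a.txt', 'b.txt', '_.txt', '7.txt']
```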
FNRE: . • Not a meta character for FNRE • We’ll hit it again in SMRE
FNRE: ! • “history” character • !c • Re-runs the most recent command in your history that begins with ‘c’
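FNRE: $, #, % • $ • Value of a variable • # • Comments • % • ${fnm%.mp3} • Value of fnm with the shortest suffix matching .mp3 removed • # (second use, inside a ${…} expansion) • ${fnm#*.} • Value of fnm with the shortest prefix matching *. removed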
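The `%` and `#` expansions both strip the shortest piece of the value that matches a wildcard pattern. The sketch below imitates that rule in Python; `remove_shortest_suffix` and `remove_shortest_prefix` are hypothetical helper names, and `fnmatch` stands in for the shell's own pattern matcher, so treat this as an illustration of the idea rather than a reimplementation of bash.

```python
from fnmatch import fnmatchcase

def remove_shortest_suffix(value: str, pattern: str) -> str:
    """Hypothetical analogue of ${value%pattern}."""
    # Try suffixes from shortest (the empty string) to longest.
    for start in range(len(value), -1, -1):
        if fnmatchcase(value[start:], pattern):
            return value[:start]
    return value  # nothing matched: the value is unchanged

def remove_shortest_prefix(value: str, pattern: str) -> str:
    """Hypothetical analogue of ${value#pattern}."""
    # Try prefixes from shortest (the empty string) to longest.
    for end in range(len(value) + 1):
        if fnmatchcase(value[:end], pattern):
            return value[end:]
    return value

fnm = "song.tar.mp3"  # sample file name
print(remove_shortest_suffix(fnm, ".mp3"))  # song.tar
print(remove_shortest_prefix(fnm, "*."))    # tar.mp3
```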
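FNRE: Quoting • ‘text’ • Literal text; nothing is expanded • “text” • Partial expansion: variables ($var) and command substitutions are expanded, but wildcards like * are not • `text` • Command substitution • Runs text as a command and replaces it with the command’s output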
FNRE: () • () • Group commands together (they run in a subshell) • (echo start; ls -l; echo done) | wc -l
FNRE: [] • [range] • Any one character in the range • [d-m] • [chars…] • Any one of the chars inside • [aeiou] • [^chars] • Any one character except chars (POSIX sh spells this [!chars]; bash accepts both) • [^aeiou]
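Bracket expressions can also be tried with Python's `fnmatch` module as a stand-in for the shell. One difference worth knowing, assuming current CPython behavior: `fnmatch` only accepts the POSIX negation form `[!chars]` and treats `[^chars]` as a set that includes a literal `^`. The single-character samples are made up.

```python
from fnmatch import fnmatchcase

chars = ["a", "e", "f", "m", "z", "7"]

# [d-m]: any one character in the range d through m
in_range = [c for c in chars if fnmatchcase(c, "[d-m]")]
# [aeiou]: any one of the listed characters
vowels = [c for c in chars if fnmatchcase(c, "[aeiou]")]
# [!aeiou]: any one character NOT listed (fnmatch's spelling of negation)
not_vowels = [c for c in chars if fnmatchcase(c, "[!aeiou]")]

print(in_range)    # ['e', 'f', 'm']
print(vowels)      # ['a', 'e']
print(not_vowels)  # ['f', 'm', 'z', '7']
```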
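SMRE • Sometimes we aren’t using a shell • Still need regex • There’s a second flavor • SMRE • Used by grep, emacs, sed, etc. • The meta characters are different from FNRE • In grep and sed, +, ?, |, (), and {} work as shown here in extended mode (grep -E, sed -E); in basic mode, GNU grep and sed need them escaped as \+, \?, \|, \(\), \{\}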
SMRE: . • Matches any single character
SMRE: ^ • Matches text at the beginning of a line • ^Unix • “Unix” • “Unix is pretty nifty” • “Unixisalsofloofy”
SMRE: $ • Matches text at the end of a line • We usually feed the tool a file • Each line is matched against the regex separately • CEG$ • “CEG” • “My degree is CEG” • “I could be CS/CEG”
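The anchors can be tried line by line with Python's `re` module, whose syntax for the features in this lecture is close to extended SMRE (grep -E, sed -E), though not identical. This sketch mimics grep by testing each line on its own; the non-matching sample lines are made up for contrast.

```python
import re

lines = [
    "Unix", "Unix is pretty nifty", "Unixisalsofloofy", "I like Unix",
    "CEG", "My degree is CEG", "I could be CS/CEG", "CEG is my degree",
]

# Like grep, test each line on its own and keep those containing a match.
start_hits = [ln for ln in lines if re.search(r"^Unix", ln)]  # ^ : start of line
end_hits = [ln for ln in lines if re.search(r"CEG$", ln)]     # $ : end of line

print(start_hits)  # ['Unix', 'Unix is pretty nifty', 'Unixisalsofloofy']
print(end_hits)    # ['CEG', 'My degree is CEG', 'I could be CS/CEG']
```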
SMRE: [] • [range] • Same as before • [chars…] • Still the same • [^chars] • Yup…
SMRE: * • Match 0 or more times • Quantifier only: it repeats the preceding item and has no “value” of its own • .* • Any character, any number of times • NOTE: also matches blank lines
SMRE: + • Match 1 or more times • [ab]+ • A run of one or more characters, each of which is a or b • (cat)+ • “cat”, “catcat”, “catcatcat”
SMRE: ? • Match 0 or 1 times • b? • “” (empty) or “b”
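The three quantifiers are easiest to compare on whole strings. This sketch assumes Python's `re` as a stand-in for extended SMRE; `matches_whole` is a hypothetical helper that requires the pattern to cover the entire string, so partial matches don't hide what each quantifier allows.

```python
import re

def matches_whole(pattern: str, text: str) -> bool:
    """Hypothetical helper: True only if pattern matches all of text."""
    return re.fullmatch(pattern, text) is not None

# * : zero or more of the preceding item (here, any character)
print(matches_whole(r".*", ""))               # True (a blank line)
# + : one or more
print(matches_whole(r"[ab]+", "abba"))        # True
print(matches_whole(r"[ab]+", ""))            # False (needs at least one)
print(matches_whole(r"(cat)+", "catcatcat"))  # True
# ? : zero or one
print(matches_whole(r"b?", ""))               # True
print(matches_whole(r"b?", "b"))              # True
print(matches_whole(r"b?", "bb"))             # False
```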
SMRE: | • Match either of the expressions • a|b • “a”, “b” • (exp1|exp2) • Match one of the “complex” expressions
SMRE: {} • {n} • Match n times • a{5} “aaaaa” • {m,n} • Match between m and n times • a{2,4} “aa”, “aaa”, “aaaa”
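Alternation and counted repetition can be checked the same way. Assuming Python's `re` as a stand-in for extended SMRE, `re.fullmatch` keeps only candidates that the whole pattern describes; `cat` and `dog` are made-up examples of two "complex" alternatives.

```python
import re

candidates = ["a", "b", "c", "aa", "aaa", "aaaa", "aaaaa", "cat", "dog"]

either = [s for s in candidates if re.fullmatch(r"a|b", s)]           # a or b
pets = [s for s in candidates if re.fullmatch(r"(cat|dog)", s)]       # either whole word
exactly_five = [s for s in candidates if re.fullmatch(r"a{5}", s)]    # {n}
two_to_four = [s for s in candidates if re.fullmatch(r"a{2,4}", s)]   # {m,n}

print(either)        # ['a', 'b']
print(pets)          # ['cat', 'dog']
print(exactly_five)  # ['aaaaa']
print(two_to_four)   # ['aa', 'aaa', 'aaaa']
```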
SMRE: \ • Escape character • Makes the following meta character literal • \* • A literal “*” • Also introduces special sequences • \n • A new line
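A quick way to see why escaping matters is to compare a pattern with and without the backslash. The sketch uses Python's `re` as a stand-in and a made-up sample string: unescaped, `*` quantifies the preceding `2`, so `2*3` can succeed by matching just the `3`.

```python
import re

text = "2*3 = 6"

# \* matches a literal asterisk.
literal = re.search(r"2\*3", text)
# Unescaped, * means "zero or more 2s", so the match can be just "3".
quantified = re.search(r"2*3", text)

print(literal.group())     # 2*3
print(quantified.group())  # 3

# \n in the pattern matches a newline character in the text.
print(re.search(r"a\nb", "a\nb") is not None)  # True
```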
Replacements • Finding text is great • …but we usually want to do more than that • …like replace stuff
Replacements • s/PATTERN/REPLACEMENT/OPTIONS • s is the command: “substitute” • PATTERN is the regex we’re looking for • REPLACEMENT is the text that replaces the matched part of the line (not the whole line); sed also lets it refer to captured groups as \1, \2, … • OPTIONS are flags, e.g., g replaces every match on the line instead of only the first
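The s/PATTERN/REPLACEMENT/OPTIONS form maps fairly directly onto Python's `re.sub`. In this sketch, `sed_substitute` is a hypothetical helper that handles only the `g` option; Python's replacement syntax also differs from sed's in places (for example, sed's `&` for the whole match is written `\g<0>` in Python).

```python
import re

def sed_substitute(text: str, pattern: str, replacement: str, options: str = "") -> str:
    """Hypothetical helper mimicking sed -E 's/PATTERN/REPLACEMENT/OPTIONS' on one line."""
    # Only the g option is handled. re.sub treats count=0 as "replace all";
    # count=1 replaces just the first (leftmost) match, like sed without g.
    count = 0 if "g" in options else 1
    return re.sub(pattern, replacement, text, count=count)

print(sed_substitute("abccba", "a", "x"))       # xbccba (Example 1)
print(sed_substitute("abccba", "a", "x", "g"))  # xbccbx (Example 2)
```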
Examples • For the following examples, we are using SMRE (extended syntax, as with sed -E)
Example 1 • Input • abccba • RegEx • s/a/x/ • Output • xbccba
Example 2 • Input • abccba • RegEx • s/a/x/g • Output • xbccbx
Example 3 • Input • a19b20c3d4e5 • RegEx • s/[0-9]+//g • Output • abcde
Example 4 • Input • duckduckduck • RegEx • s/duck$/goose/ • Output • duckduckgoose
Example 5 • Input • duckduckduck • RegEx • s/^duck/goose/ • Output • gooseduckduck
Example 6 • Input • duckduckduck • RegEx • s/duck/goose/g • Output • goosegoosegoose
Example 7 • Input • abcdefgeeeeee • RegEx • s/[^e]+e/123/ • Output • 123fgeeeeee
Example 8 • Input • abcdefgeeeeee • RegEx • s/.*e/123/ • Output • 123
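Examples 3 through 8 can be replayed with Python's `re.sub`, assuming it behaves like sed -E for these patterns (it does for these inputs). Examples 7 and 8 show greediness: `[^e]+` cannot cross an `e`, so Example 7 stops at the first `e`, while `.*` is greedy and extends to the last `e`, so Example 8 consumes the whole string.

```python
import re

# (input, PATTERN, REPLACEMENT, has g option) for Examples 3 to 8
examples = [
    ("a19b20c3d4e5", r"[0-9]+", "", True),
    ("duckduckduck", r"duck$", "goose", False),
    ("duckduckduck", r"^duck", "goose", False),
    ("duckduckduck", r"duck", "goose", True),
    ("abcdefgeeeeee", r"[^e]+e", "123", False),
    ("abcdefgeeeeee", r".*e", "123", False),
]

# count=0 replaces every match (like g); count=1 replaces only the first.
results = [re.sub(pat, repl, text, count=0 if g else 1)
           for text, pat, repl, g in examples]

for r in results:
    print(r)
# abcde, duckduckgoose, gooseduckduck, goosegoosegoose, 123fgeeeeee, 123
```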
Capturing • Regular expressions can be complex • That makes them difficult to work with • a*b+c • Sub-expressions: a*, b+, c
Capturing • In a*b+c, the sub-expressions aren’t obvious • (a*)(b+)(c) is much easier to read
Capturing • Consider (a*)(b+)(a*) • What if I require both (a*) sub-expressions to match exactly the same text? • Will this do it?
Capturing • As-is, it will not work: (a*)(b+)(a*) also matches • aabbaaaaaa • (a*)(b+)\1 WILL work (anchor it as ^(a*)(b+)\1$ if the whole line must match) • \1 refers to subexpression 1
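A backreference demands the same text, not just the same pattern. The sketch uses Python's `re.fullmatch` (a stand-in, requiring the whole string to match): three independent groups accept a-runs of different lengths, while `\1` forces the trailing run to repeat group 1 exactly. With a match-anywhere search such as grep's, `(a*)(b+)\1` would still find the substring `aabbaa` inside `aabbaaaaaa`.

```python
import re

text = "aabbaaaaaa"

# Independent groups: the two a-runs may differ ("aa" vs "aaaaaa").
independent = re.fullmatch(r"(a*)(b+)(a*)", text) is not None
# \1 must repeat exactly what group 1 matched, so this fails on text.
exact = re.fullmatch(r"(a*)(b+)\1", text) is not None
# Equal a-runs on both sides satisfy the backreference.
exact_equal_runs = re.fullmatch(r"(a*)(b+)\1", "aabbaa") is not None

print(independent)       # True
print(exact)             # False
print(exact_equal_runs)  # True
```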
Subexpressions • Numbered by the position of their opening parenthesis, left to right • (A[BC](dE))([fF][Gg]) • (A[BC](dE)) is 1 • (dE) is 2 • ([fF][Gg]) is 3
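Group numbers can be checked directly: Python's `re` (used here as a stand-in) numbers groups by their opening parenthesis, as POSIX tools do. The input string `ABdEfG` is a made-up example that the pattern matches.

```python
import re

m = re.fullmatch(r"(A[BC](dE))([fF][Gg])", "ABdEfG")

print(m.group(1))  # ABdE  (outer group opens first)
print(m.group(2))  # dE    (nested group opens second)
print(m.group(3))  # fG
```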
Example 1 • (ABC)def\1 • Matches • ABCdefABC • Doesn’t match • ABCdefAbc, ABCdefabc
Example 2 • ([aA][bB][cC])def\1 • Matches • aBCdefaBC, AbcdefAbc, AbCdefAbC • Doesn’t match • abcdefABC, Abcdefabc
Example 3 • (.....) ham \1 • Matches • Hello ham Hello, 12345 ham 12345 • Doesn’t match • Hello ham H…o
Example 4 • ([a-z][A-Z])([1-9])\2\1 • Matches • aA11aA, aF99aF • Does not match • aA11Aa, aA12aA
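The four backreference examples can be verified mechanically. This sketch uses Python's `re.search` as a stand-in for grep-style matching; the Example 3 pattern is written with five dots, and the Example 4 pattern has no space between its groups, matching the sample strings.

```python
import re

# Each entry: (pattern, strings that should match, strings that should not)
cases = [
    (r"(ABC)def\1", ["ABCdefABC"], ["ABCdefAbc", "ABCdefabc"]),
    (r"([aA][bB][cC])def\1",
     ["aBCdefaBC", "AbcdefAbc", "AbCdefAbC"], ["abcdefABC", "Abcdefabc"]),
    (r"(.....) ham \1", ["Hello ham Hello", "12345 ham 12345"], ["Hello ham H…o"]),
    (r"([a-z][A-Z])([1-9])\2\1", ["aA11aA", "aF99aF"], ["aA11Aa", "aA12aA"]),
]

for pattern, good, bad in cases:
    for s in good:
        assert re.search(pattern, s), (pattern, s)
    for s in bad:
        assert not re.search(pattern, s), (pattern, s)

print("all examples behave as described")
```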
Editors • The lab will require you to use several editors • vi • Emacs • sed • You should read Sobell chapters 6, 7, and 13, respectively
Find • Finds files that meet the specified criteria • find PATH… EXPRESSION… • PATH describes the directories to examine • EXPRESSION can be an option, a test, or an action
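As a rough illustration of how a find test ties back to FNRE, the sketch below imitates `find START -name PATTERN`, which compares each entry's base name against a shell wildcard pattern. `find_by_name` is a hypothetical helper built on `os.walk` and `fnmatch`; unlike find, it does not test START itself and supports no other tests or actions. The temporary directory and file names exist only for the demo.

```python
import os
import tempfile
from fnmatch import fnmatchcase

def find_by_name(start: str, pattern: str) -> list:
    """Hypothetical rough analogue of `find START -name PATTERN`."""
    found = []
    for dirpath, dirnames, filenames in os.walk(start):
        # -name tests only the last path component, using wildcard (FNRE) rules.
        for name in dirnames + filenames:
            if fnmatchcase(name, pattern):
                found.append(os.path.join(dirpath, name))
    return sorted(found)

# Demo: build a tiny throwaway tree, then search it for *.txt files.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "notes"))
    for rel in ["a.txt", os.path.join("notes", "b.txt"), os.path.join("notes", "c.md")]:
        open(os.path.join(root, rel), "w").close()
    hits = find_by_name(root, "*.txt")
    print([os.path.relpath(h, root) for h in hits])  # on Linux/macOS: ['a.txt', 'notes/b.txt']
```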