Lecture 6 – Regular Expressions

arden-burke


  1. Lecture 6 – Regular Expressions • Intro • Meta vs. Non-Meta Characters • FNRE • SMRE

  2. Intro • I have a long text document • I want to extract all email addresses • Same form, different instantiation • How do I find all addresses?

  3. Intro • I have a long text document • I want to extract all email addresses • Same form, different instantiation • How do I find all addresses? • Regular Expressions!

  4. Regular Expression • AKA regex and regexp • Pattern which describes a piece of some text • For our discussion, we are matching lines of text • Useful for: • Searching text • Replacing text

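  5. Non-Meta Characters • Hello stands for the string “Hello” literally • As a whole-string pattern, it matches: • “Hello” • Does NOT match • “Hello, Bob” • “Bunnyraptor said Hello today” • “That’s a Hellocopter!” • A line search such as grep reports all four lines, since each one contains Hello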
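The match and no-match lists above assume the pattern has to cover the whole string, while a line-oriented search such as grep reports every line that merely contains the pattern. A minimal sketch using Python's `re` module as a stand-in for the lecture's tools (an assumption, not the course's own setup): `re.fullmatch` models whole-string matching and `re.search` models grep-style searching.

```python
import re

PATTERN = "Hello"  # no meta characters, so each character matches itself

lines = [
    "Hello",
    "Hello, Bob",
    "Bunnyraptor said Hello today",
    "That's a Hellocopter!",
]

# fullmatch: the pattern must account for the entire line.
whole_line = [line for line in lines if re.fullmatch(PATTERN, line)]

# search: the pattern may appear anywhere in the line (grep-style).
anywhere = [line for line in lines if re.search(PATTERN, line)]

print(whole_line)         # ['Hello']
print(anywhere == lines)  # True: every line contains "Hello"
```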

  6. Meta Characters • Sometimes we need special characters • You’ve seen these before • \n, \r, ^M, blah, blah, blah

  7. Meta Characters • We have some new ones • ., ?, *, (), [], {}, +, |, ^, $ • These will depend on which “kind” of RegEx we’re using • FNRE: File Name RegEx • SMRE: String Matching RegEx

  8. FNRE: * • Match zero or more characters • a*txt • “atxt” • “aa.txt” • “aba.txt”

  9. FNRE: ? • Matches exactly one character • Character could be anything • ?.txt • “a.txt” • “_.txt” • “7.txt”
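Python's standard `fnmatch` module implements shell-style wildcards, so it can sketch the two FNRE wildcards above. Treat it as an approximation: unlike the shell, `fnmatchcase` lets `*` and `?` match `/` and a leading dot. The extra file names (`ab.txt`, `notes.md`) are hypothetical.

```python
from fnmatch import fnmatchcase

names = ["atxt", "aa.txt", "aba.txt", "a.txt", "_.txt", "7.txt", "ab.txt", "notes.md"]

# * : zero or more characters (the whole name must match the pattern)
star_hits = [n for n in names if fnmatchcase(n, "a*txt")]

# ? : exactly one character
question_hits = [n for n in names if fnmatchcase(n, "?.txt")]

print(star_hits)      # ['atxt', 'aa.txt', 'aba.txt', 'a.txt', 'ab.txt']
print(question_hits)  # ['a.txt', '_.txt', '7.txt']
```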

  10. FNRE: . • Not a meta character for FNRE • We’ll hit it again in SMRE

  11. FNRE: ! • “history” character • !c • Reruns the most recent command in your history that begins with the character ‘c’

  12. FNRE: $, #, %, # • $ • Value of variable • # • Comments • % • Remove the shortest matching suffix • ${fnm%.mp3} • # (v2) • Remove the shortest matching prefix • ${fnm#*.}
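In bash, `${fnm%PATTERN}` removes the shortest suffix of `$fnm` that matches a wildcard PATTERN, and `${fnm#PATTERN}` removes the shortest matching prefix. The sketch below mimics both in Python with `fnmatch`; the helper names `remove_shortest_suffix` and `remove_shortest_prefix` are hypothetical, and the brute-force loop favors clarity over speed.

```python
from fnmatch import fnmatchcase


def remove_shortest_suffix(value: str, pattern: str) -> str:
    """Hypothetical helper mimicking ${value%pattern}."""
    # Try suffixes from shortest (empty) to longest; drop the first that matches.
    for start in range(len(value), -1, -1):
        if fnmatchcase(value[start:], pattern):
            return value[:start]
    return value  # no suffix matched: value is unchanged


def remove_shortest_prefix(value: str, pattern: str) -> str:
    """Hypothetical helper mimicking ${value#pattern}."""
    # Try prefixes from shortest (empty) to longest; drop the first that matches.
    for end in range(len(value) + 1):
        if fnmatchcase(value[:end], pattern):
            return value[end:]
    return value  # no prefix matched: value is unchanged


print(remove_shortest_suffix("song.mp3", ".mp3"))       # song
print(remove_shortest_prefix("archive.tar.gz", "*."))   # tar.gz
print(remove_shortest_suffix("song.wav", ".mp3"))       # song.wav
```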

  13. FNRE: Quoting • ‘text’ • Literal text • “text” • Expands variables and command substitutions, but not wildcards • `text` • Command substitution • Run text as a command, and replace it with the command’s output

  14. FNRE: () • () • Group commands together (they run in a subshell) • (echo start; ls -l; echo done) | wc -l

  15. FNRE: [] • [range] • Any one character in the range • [d-m] • [chars…] • Any one of the chars inside • [aeiou] • [^chars] • Any one character except chars • [^aeiou]
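A sketch of the three bracket forms with Python's `fnmatch`, using hypothetical file names. One difference to keep in mind: `fnmatch` spells the negated form `[!chars]`, while bash accepts both `[!chars]` and `[^chars]`.

```python
from fnmatch import fnmatchcase

names = ["dog.txt", "mat.txt", "zip.txt", "apple.txt", "bird.txt"]

# [d-m]: first character is one character in the range d..m
in_range = [n for n in names if fnmatchcase(n, "[d-m]*")]

# [aeiou]: first character is one of the listed characters
vowel_first = [n for n in names if fnmatchcase(n, "[aeiou]*")]

# [!aeiou]: first character is anything except those listed (fnmatch's spelling of [^aeiou])
not_vowel_first = [n for n in names if fnmatchcase(n, "[!aeiou]*")]

print(in_range)         # ['dog.txt', 'mat.txt']
print(vowel_first)      # ['apple.txt']
print(not_vowel_first)  # ['dog.txt', 'mat.txt', 'zip.txt', 'bird.txt']
```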

  16. SMRE • Sometimes we aren’t using a shell • Still need regex • There’s a second flavor • SMRE • Used by grep, emacs, sed, etc • The meta characters are different from FNRE

  17. SMRE: . • Matches any single character

  18. SMRE: ^ • Matches text at the beginning of a line • ^Unix • “Unix” • “Unix is pretty nifty” • “Unixisalsofloofy”

  19. SMRE: $ • Matches text at the end of a line • We usually feed it a file • Each line is matched against the regex • CEG$ • “CEG” • “My degree is CEG” • “I could be CS/CEG”
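A minimal grep-like filter shows how `^` and `$` behave when each line of a file is tested on its own. Python's `re` stands in for grep here, the helper name `grep` is hypothetical, and the last line in each group ("I like Unix", "CEG is my degree") is an added non-matching example.

```python
import re


def grep(pattern: str, lines: list[str]) -> list[str]:
    """Hypothetical grep-like filter: keep every line in which the pattern is found."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


lines = [
    "Unix",
    "Unix is pretty nifty",
    "Unixisalsofloofy",
    "I like Unix",
    "CEG",
    "My degree is CEG",
    "I could be CS/CEG",
    "CEG is my degree",
]

starts_with_unix = grep(r"^Unix", lines)
ends_with_ceg = grep(r"CEG$", lines)

print(starts_with_unix)  # ['Unix', 'Unix is pretty nifty', 'Unixisalsofloofy']
print(ends_with_ceg)     # ['CEG', 'My degree is CEG', 'I could be CS/CEG']
```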

  20. SMRE: [] • [range] • Same as before • [chars…] • Still the same • [^chars] • Yup…

  21. SMRE: * • Match the preceding item 0 or more times • A quantifier only; it matches no character by itself • .* • Any character, any number of times • NOTE: also matches blank lines

  22. SMRE: + • Match the preceding item 1 or more times • [ab]+ • One or more characters, each an a or a b • (cat)+ • “cat”, “catcat”, “catcatcat”
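The difference between `*` and `+` is easiest to see on whole-string matches. This sketch uses Python's `re` as a stand-in for the SMRE tools; the helper `matches` and the non-matching inputs are hypothetical additions.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


print(matches(r".*", ""))               # True: .* even matches a blank line
print(matches(r"[ab]+", "abba"))        # True
print(matches(r"[ab]+", ""))            # False: + needs at least one character
print(matches(r"[ab]+", "abc"))         # False: c is neither a nor b
print(matches(r"(cat)+", "catcatcat"))  # True
print(matches(r"(cat)+", "catca"))      # False: the last "cat" is incomplete
```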

  23. SMRE: ? • Match the preceding item 0 or 1 times • b? • “b” or the empty string “”

  24. SMRE: | • Match either of the expressions • a|b • “a”, “b” • (exp1|exp2) • Match one of the “complex” expressions

  25. SMRE: {} • {n} • Match n times • a{5} matches “aaaaa” • {m,n} • Match between m and n times • a{2,4} matches “aa”, “aaa”, “aaaa”
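The `?`, `|` and `{}` operators can be checked the same way, again with Python's `re` standing in for the SMRE tools and whole-string matching assumed. The candidate strings (and the `(cat|dog)s` pattern for the grouped form of `|`) are hypothetical illustrations.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


optional = [t for t in ["", "b", "bb"] if matches(r"b?", t)]
either = [t for t in ["a", "b", "ab"] if matches(r"a|b", t)]
grouped = [t for t in ["cats", "dogs", "cows"] if matches(r"(cat|dog)s", t)]
exactly_five = [t for t in ["aaaa", "aaaaa", "aaaaaa"] if matches(r"a{5}", t)]
two_to_four = [t for t in ["a", "aa", "aaa", "aaaa", "aaaaa"] if matches(r"a{2,4}", t)]

print(optional)      # ['', 'b']
print(either)        # ['a', 'b']
print(grouped)       # ['cats', 'dogs']
print(exactly_five)  # ['aaaaa']
print(two_to_four)   # ['aa', 'aaa', 'aaaa']
```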

  26. SMRE: \ • Escape character • Changes how the next character is interpreted • \* • A literal “*” • \n • A new line
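In Python's `re` (used here as a stand-in for grep and sed) the same backslash rule applies, and `re.escape` adds the backslashes automatically when a whole string should be taken literally. The sample strings are hypothetical.

```python
import re

# Unescaped, . and * are meta characters: a, any char, zero or more b's, c.
print(re.fullmatch(r"a.b*c", "axbbc") is not None)  # True

# re.escape backslash-escapes the meta characters, giving a\.b\*c.
literal = re.escape("a.b*c")
print(re.fullmatch(literal, "axbbc") is not None)   # False
print(re.fullmatch(literal, "a.b*c") is not None)   # True

# A hand-written \* is a literal asterisk.
print(re.sub(r"\*", " times ", "2*3"))  # 2 times 3
```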

  27. Replacements • Finding text is great

  28. Replacements • Finding text is great • …but we usually want to do more than that

  29. Replacements • Finding text is great • …but we usually want to do more than that • …like replace stuff

  30. Replacements • s/PATTERN/REPLACEMENT/OPTIONS • s is the substitute command • PATTERN is the regex we’re looking for • REPLACEMENT is the text that replaces the matched part of the line • OPTIONS modify the substitution, e.g. g replaces every match on the line instead of only the first
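One way to read the `s/PATTERN/REPLACEMENT/OPTIONS` form is to map it onto Python's `re.sub`: without the `g` option only the first match on the line is replaced, with `g` every match is. The helper `sed_substitute` is hypothetical and models only the `g` option; note also that Python's replacement syntax differs from sed's (for example, sed's `&` for "the whole match" is written `\g<0>` in Python).

```python
import re


def sed_substitute(pattern: str, replacement: str, line: str, options: str = "") -> str:
    """Hypothetical helper mimicking s/PATTERN/REPLACEMENT/OPTIONS on one line.

    Only the g option is modeled; any other option letters are ignored.
    """
    count = 0 if "g" in options else 1  # re.sub: count=0 means "replace every match"
    return re.sub(pattern, replacement, line, count=count)


# Only the matched text changes, not the whole line.
print(sed_substitute("cat", "dog", "cat catalog"))       # dog catalog
print(sed_substitute("cat", "dog", "cat catalog", "g"))  # dog dogalog
```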

  31. Examples • For the following examples, we are doing SMRE (extended syntax, as with sed -E, so + is a quantifier)

  32. Example 1 • Input • abccba • RegEx • s/a/x/ • Output • xbccba

  33. Example 2 • Input • abccba • RegEx • s/a/x/g • Output • xbccbx

  34. Example 3 • Input • a19b20c3d4e5 • RegEx • s/[0-9]+//g • Output • abcde

  35. Example 4 • Input • duckduckduck • RegEx • s/duck$/goose/ • Output • duckduckgoose

  36. Example 5 • Input • duckduckduck • RegEx • s/^duck/goose/ • Output • gooseduckduck

  37. Example 6 • Input • duckduckduck • RegEx • s/duck/goose/g • Output • goosegoosegoose

  38. Example 7 • Input • abcdefgeeeeee • RegEx • s/[^e]+e/123/ • Output • 123fgeeeeee

  39. Example 8 • Input • abcdefgeeeeee • RegEx • s/.*e/123/ • Output • 123
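The eight examples can be checked mechanically. This sketch replays each one with Python's `re.sub` as a stand-in for sed (`count=1` for a plain `s///`, `count=0` for `s///g`) and compares the result against the output listed on the slide.

```python
import re

# (input, pattern, replacement, replace_all, expected output) for Examples 1-8
examples = [
    ("abccba", r"a", "x", False, "xbccba"),
    ("abccba", r"a", "x", True, "xbccbx"),
    ("a19b20c3d4e5", r"[0-9]+", "", True, "abcde"),
    ("duckduckduck", r"duck$", "goose", False, "duckduckgoose"),
    ("duckduckduck", r"^duck", "goose", False, "gooseduckduck"),
    ("duckduckduck", r"duck", "goose", True, "goosegoosegoose"),
    ("abcdefgeeeeee", r"[^e]+e", "123", False, "123fgeeeeee"),
    ("abcdefgeeeeee", r".*e", "123", False, "123"),
]

results = []
for text, pattern, replacement, replace_all, expected in examples:
    # count=0 replaces every match (like /g); count=1 replaces only the first.
    output = re.sub(pattern, replacement, text, count=0 if replace_all else 1)
    results.append(output)
    print(f"{text} -> {output} ({'ok' if output == expected else 'MISMATCH'})")
```

Example 8 shows that `.*` is greedy: it runs to the last `e` on the line, not the first.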

  40. Capturing • Regular expressions can be complex • Makes them difficult to work with • a*b+c • a*, b+, c

  41. Capturing • The sub-expressions of a*b+c aren’t obvious • (a*)(b+)(c) is much easier to see

  42. Capturing • Consider (a*)(b+)(a*) • What if I require both “(a*)” sub-expressions to match the same text? • Will this do it?

  43. Capturing • As-is, it will not work • It matches aabbaaaaaa, even though the two runs of a differ • (a*)(b+)\1 WILL work • \1 refers to the text matched by subexpression 1
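A sketch of the difference in Python's `re` (a stand-in for grep or sed), treating each pattern as a whole-string match. The extra input `aabbaa` is an added example.

```python
import re

text = "aabbaaaaaa"

# Two independent (a*) groups: the a-runs may differ, so this matches.
independent = re.fullmatch(r"(a*)(b+)(a*)", text) is not None

# \1 must repeat exactly the text group 1 captured ("aa"), so this does not.
same_run = re.fullmatch(r"(a*)(b+)\1", text) is not None

# Equal runs on both sides satisfy the backreference.
balanced = re.fullmatch(r"(a*)(b+)\1", "aabbaa") is not None

print(independent, same_run, balanced)  # True False True
```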

  44. Subexpressions • (A[BC](dE))([fF][Gg]) • Numbered by their opening parenthesis, left to right • (A[BC](dE)) is 1 • (dE) is 2 • ([fF][Gg]) is 3
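Capture groups are numbered by the position of their opening parenthesis, counting from the left, so an outer group gets a lower number than the groups nested inside it. A quick check in Python's `re` on a hypothetical input that the pattern matches:

```python
import re

m = re.fullmatch(r"(A[BC](dE))([fF][Gg])", "ACdEfG")

print(m.group(1))  # ACdE  (outer group opens first)
print(m.group(2))  # dE    (nested group opens second)
print(m.group(3))  # fG
```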

  45. Example 1 • (ABC)def\1 • Matches • ABCdefABC • Doesn’t match • ABCdefAbc, ABCdefabc

  46. Example 2 • ([aA][bB][cC])def\1 • Matches • aBCdefaBC, AbcdefAbc, AbCdefAbC • Doesn’t match • abcdefABC, Abcdefabc

  47. Example 3 • (.....) ham \1 • Matches • Hello ham Hello, 12345 ham 12345 • Doesn’t match • Hello ham H...o

  48. Example 4 • ([a-z][A-Z])([1-9])\2\1 • Matches • aA11aA, aF99aF • Does not match • aA11Aa, aA12aA
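The four examples above can be replayed as whole-string matches in Python's `re`. Two readings are assumptions: the pattern in Example 3 is taken to be five dots, `(.....)`, and the pattern in Example 4 is taken without a space between the groups, `([a-z][A-Z])([1-9])\2\1`, since none of its listed matches contain a space.

```python
import re

# (pattern, texts that should match, texts that should not) for Examples 1-4
cases = [
    (r"(ABC)def\1", ["ABCdefABC"], ["ABCdefAbc", "ABCdefabc"]),
    (r"([aA][bB][cC])def\1",
     ["aBCdefaBC", "AbcdefAbc", "AbCdefAbC"],
     ["abcdefABC", "Abcdefabc"]),
    (r"(.....) ham \1", ["Hello ham Hello", "12345 ham 12345"], ["Hello ham H...o"]),
    (r"([a-z][A-Z])([1-9])\2\1", ["aA11aA", "aF99aF"], ["aA11Aa", "aA12aA"]),
]

mismatches = []
for pattern, should_match, should_not_match in cases:
    for text in should_match:
        if not re.fullmatch(pattern, text):
            mismatches.append((pattern, text))
    for text in should_not_match:
        if re.fullmatch(pattern, text):
            mismatches.append((pattern, text))

print(mismatches)  # [] when every example behaves as listed
```

Note that `\1` repeats the captured text, not the sub-pattern: in Example 3, `H...o` fails because it is not literally `Hello`.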

  49. Editors • The lab will require you to use several editors • vi • Emacs • sed • You should read Sobell chapters 6, 7, and 13, respectively

  50. Find • It finds files that meet the specified criteria • find PATH… EXPRESSION… • PATH describes the directories to examine • EXPRESSION can be an option, a test, or an action
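To make the PATH / EXPRESSION split concrete, here is a rough Python sketch of what `find PATH -name PATTERN` does: walk every directory under PATH and keep the entries whose name matches a wildcard pattern. The helper `find_by_name` is hypothetical, it handles only the `-name` test, and it skips the starting directory itself.

```python
import os
import tempfile
from fnmatch import fnmatchcase


def find_by_name(path: str, name_pattern: str) -> list[str]:
    """Rough sketch of `find PATH -name PATTERN`: return matching paths under PATH."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(path):
        # -name tests only the last component of each path.
        for entry in dirnames + filenames:
            if fnmatchcase(entry, name_pattern):
                hits.append(os.path.join(dirpath, entry))
    return sorted(hits)


# Demo on a throwaway directory tree: a.txt, b.md, sub/c.txt
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ["a.txt", "b.md", os.path.join("sub", "c.txt")]:
        open(os.path.join(root, rel), "w").close()
    demo_hits = [os.path.relpath(p, root) for p in find_by_name(root, "*.txt")]

print(demo_hits)  # ['a.txt', 'sub/c.txt'] on POSIX systems
```

This corresponds roughly to `find DIR -name '*.txt'`; the quotes keep the shell from expanding the wildcard before find sees it.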
