
Chapter 2. Regular Expressions and Automata 2.1 Regular Expressions

Chapter 2. Regular Expressions and Automata, 2.1 Regular Expressions. March 30, 2007. Artificial Intelligence Laboratory, Pusan National University. Minho Kim. Text: Speech and Language Processing, pages 21-33. Outline: Introduction; Basic Regular Expression Patterns; Disjunction, Grouping, and Precedence; A Simple Example


Presentation Transcript


  1. Chapter 2. Regular Expressions and Automata, 2.1 Regular Expressions • March 30, 2007 • Artificial Intelligence Laboratory, Pusan National University • Minho Kim • Text: Speech and Language Processing, pages 21-33

  2. Outline • Introduction • Basic Regular Expression Patterns • Disjunction, Grouping, and Precedence • A Simple Example • A More Complex Example • Advanced Operators • Regular Expression Substitution, Memory, and ELIZA

  3. Introduction • Regular expressions are one of the unsung successes in standardization in computer science • a language for specifying text search strings • an algebraic notation for characterizing a set of strings • regular expression search • requires a pattern that we want to search for and a corpus of texts to search through • the search function goes through the corpus and returns all texts that contain the pattern

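  4. Basic Regular Expression Patterns (1/6) • the slash / delimits a regular expression in Perl notation and is not part of the pattern itself • metacharacter: the square brackets [ ] match any one of the characters they enclose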

  5. Basic Regular Expression Patterns (2/6) • metacharacter: the dash - specifies a range inside square brackets • /[123456789]/ can be written as /[1-9]/ • /[ABCDEFGHIJKLMNOPQRSTUVWXYZ]/ can be written as /[A-Z]/

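  6. Basic Regular Expression Patterns (3/6) • metacharacter: the caret ^ • as the first symbol inside square brackets it negates the set, so /[^A-Z]/ matches any single character that is not an uppercase letter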
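
The bracket, range, and negation operators from slides 4 to 6 can be tried out with Python's `re` module. This is a minimal sketch rather than material from the slides: the sample pattern `[wW]oodchuck`, the helper `first_match`, and the test strings are assumptions chosen for illustration.

```python
import re

# Hypothetical sample patterns illustrating brackets, ranges, and negation.
either_case = re.compile(r"[wW]oodchuck")   # w or W, followed by "oodchuck"
nonzero_digit = re.compile(r"[1-9]")        # same set as [123456789]
uppercase = re.compile(r"[A-Z]")            # same set as listing A through Z
not_uppercase = re.compile(r"[^A-Z]")       # a leading ^ negates the set


def first_match(pattern, text):
    """Return the first substring matched by pattern, or None if there is none."""
    found = pattern.search(text)
    return found.group(0) if found else None


for sample in ["Woodchuck", "Chapter 2", "ABC"]:
    print(sample, first_match(nonzero_digit, sample), first_match(not_uppercase, sample))
```

Note that Python writes patterns as raw strings (`r"..."`) instead of delimiting them with slashes.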

  7. Basic Regular Expression Patterns (4/6) • metacharacter: the question mark ? • the preceding character is optional (zero or one occurrence) • Kleene * • zero or more occurrences of the immediately previous character or regular expression • /a*/ means ‘any string of zero or more as’ • /aa*/ means ‘one or more as’ • /[ab]*/ means ‘zero or more as or bs’

  8. Basic Regular Expression Patterns (5/6) • Kleene + • one or more of the previous character • /baaa*!/ = /baa+!/ • metacharacter: the period . (wildcard expression) • /beg.n/ • matches any single character between beg and n • begin, beg’n, begun • .* • any string of characters
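
The optionality, Kleene, and wildcard operators from slides 7 and 8 can be checked in the same way. The following is a sketch in Python; the helper `matches_whole` is a hypothetical name, and the pattern `colou?r` is borrowed from the substitution example later in the deck.

```python
import re


def matches_whole(pattern, text):
    """True if the entire string (not just a substring) matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# ? : the previous character is optional
print(matches_whole(r"colou?r", "color"), matches_whole(r"colou?r", "colour"))

# * : zero or more occurrences, so a* also matches the empty string
print(matches_whole(r"a*", ""), matches_whole(r"aa*", ""), matches_whole(r"[ab]*", "abba"))

# + : one or more occurrences; baaa*! and baa+! describe the same strings
for sheep in ["ba!", "baa!", "baaaa!"]:
    print(sheep, matches_whole(r"baaa*!", sheep), matches_whole(r"baa+!", sheep))

# . : exactly one arbitrary character (by default, anything except a newline)
for word in ["begin", "beg'n", "begun", "begn"]:
    print(word, matches_whole(r"beg.n", word))
```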

  9. Basic Regular Expression Patterns (6/6) • Anchor • special metacharacters that tie a pattern to a particular place in a string • caret ^ matches the start of a line • dollar sign $ matches the end of a line • /^The dog\.$/ matches a line that contains only the phrase The dog. • \b matches a word boundary • /the/ vs. /\bthe\b/: /the/ also matches inside there, while /\bthe\b/ matches only the word the • /^$/ matches an empty line
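
A short Python sketch of the anchors follows. The sample text is an assumption made up for illustration; `re.MULTILINE` is needed so that `^` and `$` apply to every line of a multi-line string rather than only to the string as a whole.

```python
import re

# Hypothetical three-line text used only for this demonstration.
text = "The dog.\nThe dog. barked\nSee The dog."

# With re.MULTILINE, ^ and $ anchor at the start and end of each line.
whole_line_hits = re.findall(r"^The dog\.$", text, re.MULTILINE)
print(whole_line_hits)

# Without \b the pattern also matches inside longer words such as "there".
the_anywhere = re.compile(r"the")
the_word = re.compile(r"\bthe\b")
for sample in ["there", "in the house"]:
    print(sample, bool(the_anywhere.search(sample)), bool(the_word.search(sample)))

# ^ immediately followed by $ matches only an empty line.
print(bool(re.fullmatch(r"^$", "")))
```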

  10. Disjunction, Grouping, and Precedence • We can’t use the square brackets [ ] to search for “cat or dog” • metacharacter: the pipe symbol | (disjunction) • /cat|dog/ matches either the string cat or the string dog • How can I specify both guppy and guppies? • /guppy|ies/ does not work: it matches only guppy or ies, because sequences like guppy take precedence over the | • parentheses restrict the disjunction to the suffix: /gupp(y|ies)/

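  11. Disjunction, Grouping, and Precedence • operator precedence hierarchy, from highest to lowest • parentheses ( ) • counters * + ? { } • sequences and anchors (e.g., the, ^my, end$) • disjunction |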
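
The effect of precedence on the guppy example can be verified directly. This is a minimal sketch in Python; the variable names and the `/the*/` versus `/(the)*/` comparison are illustrative assumptions, not taken from the slides.

```python
import re

cat_or_dog = re.compile(r"cat|dog")

# Sequence binds tighter than |, so this means "guppy" OR "ies".
too_loose = re.compile(r"guppy|ies")

# Parentheses limit the disjunction to the suffix: "gupp" then "y" or "ies".
grouped = re.compile(r"gupp(y|ies)")

for word in ["guppy", "guppies", "ies"]:
    print(word, bool(too_loose.fullmatch(word)), bool(grouped.fullmatch(word)))

# Counters bind tighter than sequence: the* repeats only the final "e",
# while (the)* repeats the whole group.
print(bool(re.fullmatch(r"the*", "theee")), bool(re.fullmatch(r"the*", "thethe")))
print(bool(re.fullmatch(r"(the)*", "thethe")))
```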

  12. A Simple Example • goal: write an RE to find cases of the English article the • /the/ • this pattern will miss the word when it begins a sentence and hence is capitalized (i.e., The) • /[tT]he/ • this still matches the embedded in other words (e.g., other or theology) • /\b[tT]he\b/ • alternatively, without \b, require a non-alphabetic character on both sides: /[^a-zA-Z][tT]he[^a-zA-Z]/ • this misses the when it begins a line, so also allow the start of the line: /(^|[^a-zA-Z])[tT]he[^a-zA-Z]/
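
The successive refinements of the pattern for the can be compared on a single sentence. The sketch below uses Python; the sample sentence and the helper `hits` are assumptions made up to exercise each case (capitalization, embedded occurrences, and an underscore next to the word).

```python
import re

# Hypothetical sentence containing The, other, the_25, theology, and the.
sentence = "The other day the_25 theology students read the book."

versions = {
    "v1": r"the",
    "v2": r"[tT]he",
    "v3": r"\b[tT]he\b",
    "v4": r"(^|[^a-zA-Z])[tT]he[^a-zA-Z]",
}


def hits(pattern, text):
    """Return every full match as a list of strings."""
    return [m.group(0) for m in re.finditer(pattern, text)]


for name, pattern in versions.items():
    print(name, hits(pattern, sentence))
```

The \b version skips `the_25` because the underscore counts as a word character, while the last version accepts it; on the other hand, the last version includes the neighboring characters in each match.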

  13. A More Complex Example (1/2) • “any PC with more than 500 MHz and 32 Gb of disk space for less than $1000” • regular expression for prices (e.g., $999.99) • simple regular expression for prices • /$[0-9]+/ • to deal with fractions of dollars • /$[0-9]+\.[0-9][0-9]/ • this pattern only allows $199.99 but not $199 • make the cents optional and add word boundaries: /\b$[0-9]+(\.[0-9][0-9])?\b/ • in most regular expression engines the literal dollar sign has to be written \$, since a bare $ is the end-of-line anchor
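
The price patterns can be sketched in Python with two adjustments, both of which are assumptions of this example rather than part of the slide: the dollar sign is escaped as `\$` because a bare `$` is an anchor in Python, and the leading `\b` is dropped because a word boundary in front of `$` would only hold when a letter or digit directly precedes the dollar sign. The advertisement text and the function name `find_prices` are made up.

```python
import re

# Dollars only, dollars with mandatory cents, and dollars with optional cents.
dollars_only = re.compile(r"\$[0-9]+")
with_cents = re.compile(r"\$[0-9]+\.[0-9][0-9]")
price = re.compile(r"\$[0-9]+(\.[0-9][0-9])?\b")


def find_prices(pattern, text):
    """Return the full text of every price matched by pattern."""
    return [m.group(0) for m in pattern.finditer(text)]


# Hypothetical advertisement text.
ad = "Was $1299.50, now $999.99 or $199 with rebate."

print(find_prices(dollars_only, ad))
print(find_prices(with_cents, ad))
print(find_prices(price, ad))
```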

  14. A More Complex Example (2/2) • regular expressions for processor speed • regular expressions for operating systems and vendors

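  15. Advanced Operators (1/3) • Aliases for common sets of characters • \d (any digit) and \D (any non-digit) • \w (any alphanumeric or underscore) and \W (any other character) • \s (whitespace) and \S (non-whitespace)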
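
As a sketch of how the aliases relate to bracket expressions, the Python snippet below compares each alias with an expanded set over the ASCII range. The `ALIASES` table and the helper `same_on_ascii` are assumptions of this example; the `re.ASCII` flag is used because Python 3 otherwise lets `\d`, `\w`, and `\s` match non-ASCII digits, letters, and spaces as well.

```python
import re

# Alias -> equivalent bracket expression (ASCII-only view, as documented for re.ASCII).
ALIASES = {
    r"\d": r"[0-9]",
    r"\D": r"[^0-9]",
    r"\w": r"[a-zA-Z0-9_]",
    r"\W": r"[^a-zA-Z0-9_]",
    r"\s": r"[ \t\n\r\f\v]",
    r"\S": r"[^ \t\n\r\f\v]",
}


def same_on_ascii(alias, expanded):
    """Check that alias and expanded accept exactly the same ASCII characters."""
    return all(
        (re.fullmatch(alias, ch, re.ASCII) is None) == (re.fullmatch(expanded, ch) is None)
        for ch in map(chr, range(128))
    )


for alias, expanded in ALIASES.items():
    print(alias, expanded, same_on_ascii(alias, expanded))

# Hypothetical sample string.
print(re.findall(r"\d+", "Room 42, 3 p.m."))
```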

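  16. Advanced Operators (2/3) • Regular expression operators for counting • {n} (exactly n occurrences), {n,m} (from n to m occurrences), {n,} (at least n occurrences) • /a\.{24}z/ matches a followed by 24 dots followed by z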
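
The counting operators can be exercised as follows. This is a small Python sketch; the digit patterns are illustrative assumptions added next to the `/a\.{24}z/` example from the slide.

```python
import re

# Exactly 24 literal dots between a and z.
dots = re.compile(r"a\.{24}z")
print(bool(dots.fullmatch("a" + "." * 24 + "z")))
print(bool(dots.fullmatch("a" + "." * 23 + "z")))

# {n,m}: between n and m occurrences; {n,}: at least n occurrences.
two_to_four_digits = re.compile(r"[0-9]{2,4}")
at_least_three_digits = re.compile(r"[0-9]{3,}")
for number in ["7", "123", "12345"]:
    print(
        number,
        bool(two_to_four_digits.fullmatch(number)),
        bool(at_least_three_digits.fullmatch(number)),
    )
```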

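  17. Advanced Operators (3/3) • Some characters need to be backslashed (escaped) to be matched literally • \* (an asterisk), \. (a period), \? (a question mark) • \n (a newline), \t (a tab)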
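
The difference between a metacharacter and its backslashed form is easy to see in Python. The sketch below also uses `re.escape`, a standard-library helper that backslashes special characters in an arbitrary string; the sample strings are assumptions.

```python
import re

# An unescaped period is the wildcard; an escaped one matches only a literal dot.
print(bool(re.search(r".", "abc")), bool(re.search(r"\.", "abc")))

# Escaped metacharacters match themselves; \n and \t match newline and tab.
for pattern, literal in [(r"\*", "*"), (r"\?", "?"), (r"\n", "\n"), (r"\t", "\t")]:
    print(repr(pattern), bool(re.fullmatch(pattern, literal)))

# re.escape builds a pattern that matches the given string literally.
raw = "$199.99?"  # hypothetical string full of metacharacters
literal_pattern = re.escape(raw)
print(bool(re.fullmatch(literal_pattern, raw)))
```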

  18. Regular Expression Substitution, Memory, and ELIZA (1/3) • Perl substitution operator s/regexp1/regexp2/ • s/colour/color/ • number operator (using memory) • changing the 35 boxes to the <35> boxes • s/([0-9]+)/<\1>/ • /the (.*)er they were, the \1er they will be/ • will match The bigger they were, the bigger they will be • but not The bigger they were, the faster they will be • these numbered memories are called registers • an “extended” feature of regular expressions

  19. Regular Expression Substitution, Memory, and ELIZA (2/3) • number operator (cont’d) • /the (.*)er they (.*), the \1er they \2/ • will match The bigger they were, the bigger they were • but not The bigger they were, the bigger they will be • ELIZA • simple natural-language understanding program (Weizenbaum, 1966) • works by substitution using memory
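
Python has no `s/.../.../` operator, but `re.sub` plays the same role and supports the same numbered memories. The following sketch makes two assumptions: the variable names are invented, and `re.IGNORECASE` is added so that the lowercase the in the pattern also matches the capitalized The in the example sentences.

```python
import re

# Plain substitution, the counterpart of s/colour/color/.
print(re.sub(r"colour", "color", "my favourite colour"))

# \1 in the replacement refers to what the first parenthesized group matched.
print(re.sub(r"([0-9]+)", r"<\1>", "the 35 boxes"))

# \1 inside the pattern forces the same text to appear a second time.
one_memory = re.compile(r"the (.*)er they were, the \1er they will be", re.IGNORECASE)
print(bool(one_memory.search("The bigger they were, the bigger they will be")))
print(bool(one_memory.search("The bigger they were, the faster they will be")))

# Two registers: \1 and \2 must each repeat their own group.
two_memories = re.compile(r"the (.*)er they (.*), the \1er they \2", re.IGNORECASE)
print(bool(two_memories.search("The bigger they were, the bigger they were")))
print(bool(two_memories.search("The bigger they were, the bigger they will be")))
```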

  20. Regular Expression Substitution, Memory, and ELIZA (3/3)
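
An ELIZA-style responder can be approximated with a short ordered list of pattern and response pairs, where a response may reuse a numbered memory. The sketch below is only an illustration of the idea: the three rules, the fallback reply `PLEASE GO ON`, and the function name `respond` are assumptions, not the rules of the original 1966 program and not the content of this slide.

```python
import re

# Hypothetical rules, tried in order; \1 copies the first group into the reply.
RULES = [
    (re.compile(r".*\bI am (depressed|sad)\b.*", re.IGNORECASE),
     r"I AM SORRY TO HEAR YOU ARE \1"),
    (re.compile(r".*\ball\b.*", re.IGNORECASE), "IN WHAT WAY"),
    (re.compile(r".*\balways\b.*", re.IGNORECASE),
     "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
]


def respond(utterance, default="PLEASE GO ON"):
    """Return the reply of the first rule whose pattern matches the whole utterance."""
    for pattern, template in RULES:
        match = pattern.fullmatch(utterance)
        if match:
            # expand() fills in \1 from the match; upper() mimics ELIZA's output style.
            return match.expand(template).upper()
    return default


for line in ["Men are all alike.", "I am depressed much of the time.", "Hello."]:
    print(line, "->", respond(line))
```

Rule order matters in this design: the first matching rule wins, so more specific patterns are listed before more general ones.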
