
Regular Expressions

Regular Expressions. Programming Language Concepts Lecture 5. Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida. Regular Expressions. A compact, easy-to-read language description.


Presentation Transcript


  1. Regular Expressions Programming Language Concepts Lecture 5 Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida

  2. Regular Expressions • A compact, easy-to-read language description. • Use operators to denote the language constructors described earlier, to build “complex” languages from simple “atomic” ones.

  3. Regular Expressions
     Definition: A regular expression over an alphabet Σ is recursively defined as follows:
     • ø denotes language ø
     • ε denotes language {ε}
     • a denotes language {a}, for all a ∈ Σ.
     • (P + Q) denotes L(P) ∪ L(Q), where P, Q are r.e.’s.
     • (PQ) denotes L(P)·L(Q), where P, Q are r.e.’s.
     • P* denotes L(P)*, where P is an r.e.
     To prevent excessive parentheses, we assume left associativity, with the following operator precedence hierarchy, from most to least binding: *, ·, +
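The recursive definition can be read almost directly as a program. The following is a minimal sketch, not part of the original slides: it assumes a hypothetical encoding of regular expressions as nested tuples and computes every string of length at most n in L(r), one case per clause of the definition.

```python
# Hypothetical encoding (an assumption for illustration):
#   ("empty",)      ø          ("eps",)        ε
#   ("sym", "a")    a          ("alt", P, Q)   P + Q
#   ("cat", P, Q)   PQ         ("star", P)     P*

def lang(r, n):
    """Return the set of strings of length <= n that belong to L(r)."""
    kind = r[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {r[1]} if n >= 1 else set()
    if kind == "alt":                      # union
        return lang(r[1], n) | lang(r[2], n)
    if kind == "cat":                      # catenation, truncated at length n
        return {x + y
                for x in lang(r[1], n)
                for y in lang(r[2], n)
                if len(x + y) <= n}
    if kind == "star":                     # zero or more catenations
        base = lang(r[1], n) - {""}
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node kind: {kind}")


# (0 + 1)*1, the second example on the next slide
ends_with_one = ("cat",
                 ("star", ("alt", ("sym", "0"), ("sym", "1"))),
                 ("sym", "1"))
print(sorted(lang(ends_with_one, 2)))
```

Bounding the length keeps every set finite, which is why the star case terminates even though L(P*) is usually infinite.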

  4. Regular Expressions
     Examples:
     (0 + 1)*: any string of 0’s and 1’s.
     (0 + 1)*1: any string of 0’s and 1’s, ending with a 1.
     1*01*: any string of 1’s with a single 0 inserted.
     Letter (Letter + Digit)*: an identifier.
     Digit Digit*: an integer.
     Quote Char* Quote: a string. †
     # Char* Eoln: a comment. †
     {Char*}: another comment. †
     † Assuming that Char does not contain quotes, eoln’s, or } .
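For readers who want to try these examples, here is a rough translation into Python's `re` syntax, where the slide's + becomes | and catenation is juxtaposition. The character classes chosen for Letter and Digit (ASCII only) and the names `EXAMPLES` and `matches` are assumptions made for this sketch.

```python
import re

# Slide notation -> Python re pattern (ASCII Letter/Digit are assumptions)
EXAMPLES = {
    "binary string":     r"(0|1)*",                # (0 + 1)*
    "ends with 1":       r"(0|1)*1",               # (0 + 1)*1
    "single 0 among 1s": r"1*01*",                 # 1*01*
    "identifier":        r"[A-Za-z][A-Za-z0-9]*",  # Letter (Letter + Digit)*
    "integer":           r"[0-9][0-9]*",           # Digit Digit*
}

def matches(name, text):
    """True if the whole of text belongs to the named language."""
    return re.fullmatch(EXAMPLES[name], text) is not None


print(matches("ends with 1", "0101"))   # whole string must match
print(matches("identifier", "x1"))
print(matches("identifier", "1x"))
```

`re.fullmatch` is used rather than `re.search` because a regular expression in the slides describes entire strings, not substrings.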

  5. Regular Expressions
     Conversion from right-linear grammars to regular expressions
     Example:
         S → aS        R → aS
           → bR
           → ε
     What does S → aS mean? L(S) ⊇ {a}·L(S)
     S → bR means L(S) ⊇ {b}·L(R)
     S → ε means L(S) ⊇ {ε}

  6. Regular Expressions Together, they mean that L(S) = {a}·L(S) + {b}·L(R) + {ε} or S = aS + bR + ε Similarly, R → aS means R = aS. Thus, S = aS + bR + ε R = aS System of simultaneous equations, in which the variables are nonterminals.

  7. Regular Expressions
     Solving systems of simultaneous equations.
     S = aS + bR + ε
     R = aS
     Back substitute R = aS:
     S = aS + baS + ε
       = (a + ba)S + ε
     Question: What to do with equations of the form X = αX + β ?

  8. Regular Expressions
     Answer: β ∈ L(X), so αβ ∈ L(X), ααβ ∈ L(X), αααβ ∈ L(X), …
     Thus L(X) = α*β.
     In our case, S = (a + ba)S + ε, so S = (a + ba)*ε = (a + ba)*
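As a quick check, substituting X = α*β into the right-hand side of X = αX + β gives back α*β. The derivation below uses only the identity α* = ε + αα*. It shows that α*β is a solution; it does not by itself show that the solution is unique.

```latex
\begin{align*}
\alpha X + \beta
  &= \alpha(\alpha^{*}\beta) + \beta \\
  &= (\alpha\alpha^{*} + \varepsilon)\,\beta \\
  &= \alpha^{*}\beta
\end{align*}
```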

  9. Regular Expressions
     Right-linear regular grammar to regular expression
     1. Write A = α1 + α2 + … + αn if the grammar contains
            A → α1
              → α2
              . . .
              → αn

  10. Regular Expressions
      2. If an equation is of the form X = α, where X does not appear in α, then replace every occurrence of X with α in all other equations, and delete the equation X = α.
      3. If an equation is of the form X = αX + β, where X does not occur in either α or β, then replace the equation with X = α*β.
      Note: Some algebraic manipulations may be needed to obtain the form X = αX + β.
      Important: Catenation is not commutative!

  11. Regular Expressions
      Example:
          S → a         R → abaU       U → aS
            → bU          → U            → b
            → bR
      S = a + bU + bR
      R = abaU + U = (aba + ε)U
      U = aS + b
      Back substitute R:
      S = a + bU + b(aba + ε)U
      U = aS + b

  12. Regular Expressions
      Back substitute U:
      S = a + b(aS + b) + b(aba + ε)(aS + b)
        = a + baS + bb + babaaS + babab + baS + bb     (baS and bb each appear twice)
        = (ba + babaa)S + (a + bb + babab)
      therefore S = (ba + babaa)*(a + bb + babab)
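A result like this can be sanity-checked by brute force: enumerate every string the grammar derives up to some length and compare with what the regular expression accepts. The sketch below is an illustration only. The grammar encoding and the names `GRAMMAR`, `derive`, and `agrees` are assumptions, and the expression is written in Python `re` syntax with | in place of +.

```python
import re
from itertools import product

# The example grammar: nonterminal -> list of (terminals, next nonterminal or None)
GRAMMAR = {
    "S": [("a", None), ("b", "U"), ("b", "R")],
    "R": [("aba", "U"), ("", "U")],
    "U": [("a", "S"), ("b", None)],
}

def derive(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from start."""
    words = set()
    seen = {("", start)}          # sentential forms: (terminal prefix, nonterminal)
    todo = [("", start)]
    while todo:
        prefix, nonterminal = todo.pop()
        for terminals, nxt in grammar[nonterminal]:
            word = prefix + terminals
            if len(word) > max_len:
                continue
            if nxt is None:
                words.add(word)
            elif (word, nxt) not in seen:
                seen.add((word, nxt))
                todo.append((word, nxt))
    return words

# S = (ba + babaa)*(a + bb + babab)
SOLUTION = re.compile(r"(ba|babaa)*(a|bb|babab)")

def agrees(max_len):
    """Compare grammar and expression on every string over {a, b} up to max_len."""
    derived = derive(GRAMMAR, "S", max_len)
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            if (SOLUTION.fullmatch(word) is not None) != (word in derived):
                return False
    return True


print(agrees(10))
```

Agreement up to a fixed length is evidence, not a proof, but it catches most algebra slips in the back-substitution.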

  13. Regular Expressions
      Summarizing:
      [Diagram: conversions among RGR, RGL, RE, NFA, DFA, and minimum DFA; some conversions are marked “Done”, the others “Soon”.]

  14. Regular Expressions
      Regular expression to NFA (Algorithm 1)
      Recursively build the FSA, mimicking the structure of the regular expression. Each FSA built has one start state and one final state.
      Conversions:
      • if ø [Diagram: states 1 and 2.]

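  15. Regular Expressions
      • if ε
      • if a
      • if P + Q
      • if P·Q
      [Diagrams: one NFA fragment per case, built from states 1 and 2, the sub-automata P and Q, and ε-transitions; two alternative constructions are shown for P·Q.]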

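  16. Regular Expressions
      • if P* [Diagram: states 1 and 2 around the sub-automaton P, connected by ε-transitions.]
      Example: (b (aba + ε) a)*
      [Diagram: the first fragments, with transitions b from state 1 to 2, a from 3 to 4, and b from 5 to 6.]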

  17. Regular Expressions
      Example (continued): (b (aba + ε) a)*
      [Diagram: further fragments using states 7 to 11, with transitions on a, b, and ε.]

  18. Regular Expressions
      Example (continued): (b (aba + ε) a)*
      [Diagram: the fragments for b and (aba + ε) joined by ε-transitions, using states 1 to 9, 12, and 13.]

  19. Regular Expressions
      Example (continued): (b (aba + ε) a)*
      [Diagram: the fragment for the final a (states 10 and 11) attached, giving the NFA for b (aba + ε) a.]

  20. Regular Expressions
      Example (continued): (b (aba + ε) a)*
      [Diagram: the complete NFA, states 1 to 15, after applying the P* construction.]
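The bottom-up construction of Algorithm 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the slides' exact construction: regular expressions use a hypothetical nested-tuple encoding, the ε-edges follow the common Thompson-style layout (which may differ in detail from the diagrams), and the state numbers will not match the slides.

```python
from itertools import count

def build(r, edges, fresh):
    """Add an NFA fragment for r to edges; return its (start, final) states.
    Edges are (src, label, dst) triples; label None stands for ε."""
    kind = r[0]
    if kind == "cat":                        # P·Q: ε from P's final to Q's start
        s1, f1 = build(r[1], edges, fresh)
        s2, f2 = build(r[2], edges, fresh)
        edges.append((f1, None, s2))
        return s1, f2
    s, f = next(fresh), next(fresh)          # one start, one final per fragment
    if kind == "eps":
        edges.append((s, None, f))
    elif kind == "sym":
        edges.append((s, r[1], f))
    elif kind == "alt":                      # P + Q: branch out, then rejoin
        for part in r[1:]:
            ps, pf = build(part, edges, fresh)
            edges += [(s, None, ps), (pf, None, f)]
    elif kind == "star":                     # P*: skip P, or loop through it
        ps, pf = build(r[1], edges, fresh)
        edges += [(s, None, f), (s, None, ps), (pf, None, ps), (pf, None, f)]
    # kind == "empty" (ø): no edges, so f is unreachable
    return s, f

def closure(states, edges):
    """All states reachable from states through ε-edges only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(r, text):
    """Build the NFA for r and simulate it on text."""
    edges = []
    start, final = build(r, edges, count(1))
    current = closure({start}, edges)
    for ch in text:
        moved = {dst for src, label, dst in edges
                 if src in current and label == ch}
        current = closure(moved, edges)
    return final in current

def cat(*parts):
    """Left-associative catenation of several sub-expressions."""
    result = parts[0]
    for part in parts[1:]:
        result = ("cat", result, part)
    return result

a, b, eps = ("sym", "a"), ("sym", "b"), ("eps",)
EXAMPLE = ("star", cat(b, ("alt", cat(a, b, a), eps), a))   # (b (aba + ε) a)*

print(accepts(EXAMPLE, "babaaba"))
```

Every fragment has exactly one start and one final state, which is what lets the cases compose without inspecting the inside of P or Q.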

  21. Regular Expressions
      Regular expression to NFA (Algorithm 2)
      Start with: [Diagram: a single transition labeled with the whole expression E.]

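  22. Regular Expressions
      Apply rules:
      [Diagrams: one rewrite rule for each of a*, ab, and a + b. A transition labeled a* is replaced by ε-transitions around a loop on a; a transition labeled ab by a transition on a followed by one on b; a transition labeled a + b by two parallel transitions on a and b.]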

  23. Regular Expressions
      Algorithm 1:
      • Builds FSA bottom up
      • Good for machines
      • Bad for humans
      Algorithm 2:
      • Builds FSA top down
      • Bad for machines
      • Good for humans
      (Arguable.)

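  24. Regular Expressions
      Example (Algorithm 2): (a + b)* (aa + bb) (a + b)*
      [Diagrams: successive expansions of the transitions labeled (a + b)*, aa + bb, aa, bb, and a + b.]

  25. Regular Expressions
      Example (Algorithm 2): ba(a + b)* ab
      [Diagram: the resulting NFA, with transitions on a, b, and ε.]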

  26. Regular Expressions
      Deterministic Finite-State Automata (DFA’s)
      Definition: A deterministic FSA is defined just like an NFA, except that δ: Q × Σ → Q, rather than δ: Q × (Σ ∪ {ε}) → 2^Q.
      Thus, both an ε-transition and two transitions on the same symbol a out of one state are impossible.

  27. Regular Expressions Every transition of a DFA consumes a symbol. Fortunately, DFA’s are just as powerful as NFA’s. Theorem: For every NFA there exists an equivalent (accepting the same language) DFA.

  28. Regular Expressions Conversion from NFA’s to DFA’s: • “Simulate” all moves of the NFA with the DFA. • The start state of the DFA is the start state of the NFA (say, S), together with states that are ε-reachable from S. • Each state in the DFA is a subset of the set of states of the NFA; the notion of being in “any one of” a number of states. • New states in the DFA are constructed by calculating the sets of states that are reachable through symbols, after the start state. • The final states in the DFA are those that contain any final state of the NFA.

  29. Regular Expressions
      Example: a*b + ba*
      [Diagram: NFA with states 1 to 6 and transitions on a, b, and ε.]

  30. Regular Expressions
      DFA for a*b + ba*, obtained from the NFA:

      State    Input a    Input b
      123      23         456
      23       23         6
      456      56         ---
      6        ---        ---
      56       56         ---

      [Diagram: the NFA (states 1 to 6) next to the resulting DFA with states 123, 23, 456, 6, and 56.]
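The subset construction behind this table can be sketched as follows. The NFA diagram is not recoverable from the transcript, so the edge list below is an assumption: it is one NFA for a*b + ba* with states 1 to 6 that reproduces the table above. The names `NFA`, `closure`, and `subset_construction` are likewise illustrative.

```python
# ASSUMPTION: one NFA for a*b + ba* consistent with the slide's table.
# Edges are (src, label, dst); label None stands for ε. State 6 is final.
NFA = [
    (1, None, 2), (2, "a", 2), (2, None, 3), (3, "b", 6),   # a*b branch
    (1, "b", 4), (4, None, 5), (5, "a", 5), (5, None, 6),   # ba* branch
]

def closure(states):
    """All NFA states reachable from states through ε-edges only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in NFA:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def subset_construction(start, alphabet):
    """Return (DFA start state, transition table); DFA states are frozensets."""
    start_set = frozenset(closure({start}))
    table, todo = {}, [start_set]
    while todo:
        state = todo.pop()
        if state in table:
            continue
        table[state] = {}
        for ch in alphabet:
            moved = {dst for src, label, dst in NFA
                     if src in state and label == ch}
            if moved:                        # undefined entries stay absent
                target = frozenset(closure(moved))
                table[state][ch] = target
                todo.append(target)
    return start_set, table


start, table = subset_construction(1, "ab")
for state, row in table.items():
    print(sorted(state), {ch: sorted(t) for ch, t in row.items()})
```

Missing dictionary entries play the role of the "---" cells; the DFA states containing NFA state 6 are the final ones.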

  31. Regular Expressions
      In general, if the NFA has N states, the DFA can have as many as 2^N states.
      Example: ba (a + b)* ab
      [Diagram: NFA with states 0 to 11 and transitions on a, b, and ε.]

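  32. Regular Expressions
      DFA for ba (a + b)* ab. Each state name lists the NFA states in the subset, e.g. 34568910 = {3, 4, 5, 6, 8, 9, 10}.

      State       Input a     Input b
      0           ---         1
      1           234689      ---
      234689      34568910    346789
      34568910    34568910    34678911
      346789      34568910    346789
      34678911    34568910    346789

  33. Regular Expressions
      [Diagram: the resulting DFA with states 0, 1, 234689, 34568910, 346789, and 34678911, and the transitions on a and b listed in the table.]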

  34. Regular Expressions State Minimization Theorem: Given a DFA M, there exists an equivalent DFA M’ that is minimal, i.e. no other equivalent DFA exists with fewer states than M’. Definition: A partition of a set S is a set of subsets of S such that every element of S appears in exactly one of the subsets.

  35. Regular Expressions
      Example: S = {1, 2, 3, 4, 5}
      Π1 = { {1, 2, 3, 4}, {5} }
      Π2 = { {1, 2, 3}, {4}, {5} }
      Π3 = { {1, 3}, {2}, {4}, {5} }
      Note: Π2 is a refinement of Π1, and Π3 is a refinement of Π2.
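Both notions are easy to check mechanically. In this small sketch the function names are assumptions, and "refinement" is taken in the usual sense that every subset of the finer partition lies inside some subset of the coarser one.

```python
def is_partition(blocks, s):
    """Every element of s appears in exactly one of the blocks."""
    members = [x for block in blocks for x in block]
    return len(members) == len(set(members)) and set(members) == set(s)

def is_refinement(fine, coarse):
    """Every block of fine is contained in some block of coarse."""
    return all(any(set(f) <= set(c) for c in coarse) for f in fine)


S = {1, 2, 3, 4, 5}
P1 = [{1, 2, 3, 4}, {5}]
P2 = [{1, 2, 3}, {4}, {5}]
P3 = [{1, 3}, {2}, {4}, {5}]

print(all(is_partition(p, S) for p in (P1, P2, P3)))
print(is_refinement(P2, P1), is_refinement(P3, P2), is_refinement(P1, P2))
```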

  36. Regular Expressions
      Minimization Algorithm:
      1. Remove all undefined transitions by introducing a TRAP state, i.e. a state from which no final state is reachable.
      2. Partition all states into two groups (final states and non-final states).
      3. Complete the “Next State” table for each group, by specifying transitions from group to group.
      4. Form the next partition: split groups in which Next State table entries differ. Repeat from step 3 until no further splitting is possible.
      5. Determine start and final states.

  37. Regular Expressions
      Example: Π0 = { {1, 2, 3, 4}, {5} }
      [Diagram: DFA with states 1 to 5 and transitions on a and b.]

      State    a       b
      1        1234    1234
      2        1234    1234
      3        1234    1234
      4        1234    5
      5        1234    1234

      Split {4} from group {1, 2, 3, 4}

  38. Regular Expressions
      Π1 = { {1, 2, 3}, {4}, {5} }
      [Diagram: the same DFA with states 1 to 5.]

      State    a      b
      1        123    123
      2        123    4
      3        123    123
      4        123    5
      5        123    123

      Split {2} from group {1, 2, 3}

  39. Regular Expressions
      Π2 = { {1, 3}, {2}, {4}, {5} }

      State    a    b
      1        2    13
      3        2    13
      2        2    4
      4        2    5
      5        2    13

      No more splitting.
      [Diagram: the minimal DFA, with states 13, 2, 4, and 5.]
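The refinement loop of this example can be sketched in Python. The transition table below is an assumption: the transcript records only which group each transition enters, so taking state 3 as the b-target of states 1, 3, and 5 is one choice consistent with the tables, and state 5 is taken as the only final state because Π0 separates it. The sketch splits every group in one pass per round rather than one group at a time, which gives the same final partition here.

```python
# ASSUMPTION: concrete transitions consistent with the group-level tables.
DFA = {
    1: {"a": 2, "b": 3},
    2: {"a": 2, "b": 4},
    3: {"a": 2, "b": 3},
    4: {"a": 2, "b": 5},
    5: {"a": 2, "b": 3},
}
FINAL = {5}

def refine(dfa, final, alphabet="ab"):
    """Partition refinement on a total DFA; returns the final set of groups."""
    groups = [frozenset(final), frozenset(set(dfa) - set(final))]
    partition = [g for g in groups if g]     # drop an empty group, if any
    while True:
        index = {q: i for i, g in enumerate(partition) for q in g}
        new_partition = []
        for group in partition:
            # States stay together only if they enter the same groups
            buckets = {}
            for q in group:
                key = tuple(index[dfa[q][ch]] for ch in alphabet)
                buckets.setdefault(key, set()).add(q)
            new_partition.extend(frozenset(b) for b in buckets.values())
        if len(new_partition) == len(partition):
            return set(new_partition)        # no group was split
        partition = new_partition


for group in sorted(refine(DFA, FINAL), key=min):
    print(sorted(group))
```

The table is already total, so no TRAP state is needed; a DFA with undefined transitions would have to be completed first, as in step 1 of the algorithm.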

  40. Regular Expressions
      Summary of Regular Languages
      • Smallest class in the Chomsky hierarchy.
      • Appropriate for lexical analysis.
      • Four representations: RGR, RGL, RE and FSA.
      • All four are equivalent; there are algorithms to perform transformations among them.
      • Various advantages and disadvantages among these four, for language designer, implementor, and user.
      • FSA’s can be made deterministic, and minimal.
