Regular Expressions. Programming Language Concepts, Lecture 5. Prepared by Manuel E. Bermúdez, Ph.D., Associate Professor, University of Florida.
Regular Expressions • A compact, easy-to-read language description. • Use operators to denote the language constructors described earlier, to build “complex” languages from simple “atomic” ones.
Regular Expressions Definition: A regular expression over an alphabet Σ is recursively defined as follows: • ø denotes the language ø • ε denotes the language {ε} • a denotes the language {a}, for all a ∈ Σ. • (P + Q) denotes L(P) ∪ L(Q), where P, Q are r.e.’s. • (PQ) denotes L(P)·L(Q), where P, Q are r.e.’s. • P* denotes L(P)*, where P is an r.e. To prevent excessive parentheses, we assume left associativity, with the following operator precedence hierarchy, from most to least binding: *, ·, +
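The six cases of the definition can be read as a recursive function. The following is a minimal sketch, not part of the original slides: it assumes a regular expression is encoded as a nested tuple (the tags "empty", "eps", "sym", "alt", "cat", "star" are illustrative names) and it computes only the strings of L(r) up to a length bound n, since L(r) may be infinite.

```python
def language(r, n):
    """Return the strings of length <= n in L(r), one case per defining rule."""
    kind = r[0]
    if kind == "empty":      # ø denotes the empty language
        return set()
    if kind == "eps":        # ε denotes {ε}
        return {""}
    if kind == "sym":        # a denotes {a}
        return {r[1]} if n >= 1 else set()
    if kind == "alt":        # (P + Q) denotes L(P) ∪ L(Q)
        return language(r[1], n) | language(r[2], n)
    if kind == "cat":        # (PQ) denotes L(P)·L(Q)
        left, right = language(r[1], n), language(r[2], n)
        return {u + v for u in left for v in right if len(u + v) <= n}
    if kind == "star":       # P* denotes L(P)*
        base = language(r[1], n)
        result = {""}
        frontier = {""}
        while frontier:      # keep appending words of L(P) until nothing new fits
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind}")


# (0 + 1)*1, written with the hypothetical tuple encoding
ZERO, ONE = ("sym", "0"), ("sym", "1")
ENDS_WITH_ONE = ("cat", ("star", ("alt", ZERO, ONE)), ONE)
```

For instance, `language(ENDS_WITH_ONE, 2)` contains exactly the strings "1", "01", and "11".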
Regular Expressions Examples: (0 + 1)*: any string of 0’s and 1’s. (0 + 1)*1: any string of 0’s and 1’s, ending with a 1. 1*01*: any string of 1’s with a single 0 inserted. Letter (Letter + Digit)*: an identifier. Digit Digit*: an integer. Quote Char* Quote: a string.† # Char* Eoln: a comment.† {Char*}: another comment.† († Assuming that Char does not contain quotes, eoln’s, or }.)
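These examples can be tried out with Python's `re` module. The sketch below is an illustration under assumptions: the slides' union operator + is written | in `re` syntax, Letter and Digit are taken to be ASCII letters and digits, and Char is taken to be any character other than a double quote, a newline, or }. The dictionary keys are made-up names.

```python
import re

# Assumed character classes (the slides leave Letter, Digit and Char abstract).
LETTER = r"[A-Za-z]"
DIGIT = r"[0-9]"
CHAR = r'[^"\n}]'   # no quotes, no end-of-line, no closing brace

PATTERNS = {
    "binary string": r"(0|1)*",
    "ends with 1": r"(0|1)*1",
    "single 0 among 1s": r"1*01*",
    "identifier": LETTER + "(" + LETTER + "|" + DIGIT + ")*",
    "integer": DIGIT + DIGIT + "*",
    "string": '"' + CHAR + '*"',
    "line comment": "#" + CHAR + "*\n",
    "brace comment": r"\{" + CHAR + r"*\}",
}


def matches(name, text):
    """True if the whole of `text` belongs to the named language."""
    return re.fullmatch(PATTERNS[name], text) is not None
```

`re.fullmatch` is used rather than `re.match` so that the entire string must belong to the language, which is what the slide's descriptions mean.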
Regular Expressions Conversion from right-linear grammars to regular expressions. Example:
S → aS      R → aS
  → bR
  → ε
What does S → aS mean? L(S) ⊇ {a}·L(S). S → bR means L(S) ⊇ {b}·L(R). S → ε means L(S) ⊇ {ε}.
Regular Expressions Together, they mean that L(S) = {a}·L(S) + {b}·L(R) + {ε}, or S = aS + bR + ε. Similarly, R → aS means R = aS. Thus, S = aS + bR + ε and R = aS: a system of simultaneous equations, in which the variables are the nonterminals.
Regular Expressions Solving systems of simultaneous equations. S = aS + bR + ε and R = aS. Back substitute R = aS: S = aS + baS + ε = (a + ba) S + ε. Question: What to do with equations of the form X = αX + β ?
Regular Expressions Answer: β ∈ L(X), so αβ ∈ L(X), ααβ ∈ L(X), αααβ ∈ L(X), … Thus L(X) = α*β. In our case, S = (a + ba) S + ε = (a + ba)* ε = (a + ba)*
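The same argument can be written as a repeated substitution of the equation into itself. This is a sketch of the reasoning rather than a full proof; a standard side condition, not stated on the slides, is that the solution is unique only when ε is not in L(α).

```latex
\begin{align*}
X &= \alpha X + \beta \\
  &= \alpha(\alpha X + \beta) + \beta
   = \alpha^{2} X + \alpha\beta + \beta \\
  &= \alpha^{3} X + \alpha^{2}\beta + \alpha\beta + \beta \\
  &\;\;\vdots \\
X &= (\varepsilon + \alpha + \alpha^{2} + \alpha^{3} + \cdots)\,\beta
   = \alpha^{*}\beta
\end{align*}
```

With α = a + ba and β = ε this gives S = (a + ba)*ε = (a + ba)*.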
Regular Expressions Right-linear regular grammar ↓ regular expression
1. A = α1 + α2 + … + αn if
A → α1
  → α2
  . . .
  → αn
Regular Expressions 2. If an equation is of the form X = α, where X does not appear in α, then replace every occurrence of X with α in all other equations, and delete the equation X = α. 3. If an equation is of the form X = αX + β, where X does not occur in either α or β, then replace the equation with X = α*β. Note: Some algebraic manipulations may be needed to obtain the form X = αX + β. Important: Catenation is not commutative!
Regular Expressions Example:
S → a       R → abaU      U → aS
  → bU        → U           → b
  → bR
S = a + bU + bR
R = abaU + U = (aba + ε) U
U = aS + b
Back substitute R:
S = a + bU + b(aba + ε) U
U = aS + b
Regular Expressions Back substitute U:
S = a + b(aS + b) + b(aba + ε)(aS + b)
  = a + baS + bb + babaaS + babab + baS + bb   (the terms baS and bb repeat)
  = (ba + babaa)S + (a + bb + babab)
therefore S = (ba + babaa)*(a + bb + babab)
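A result like this can be sanity-checked by brute force: enumerate every string the grammar derives up to some length and compare with the strings the regular expression matches. The sketch below is one hypothetical way to do that; the grammar encoding (a dict from nonterminal to a list of (terminals, next nonterminal or None) pairs) and the function names are assumptions made for illustration, and agreement up to a bounded length is evidence, not a proof.

```python
import re
from itertools import product


def generate(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`."""
    seen = {("", start)}
    stack = [("", start)]
    words = set()
    while stack:
        prefix, nonterminal = stack.pop()
        for terminals, nxt in grammar[nonterminal]:
            word = prefix + terminals
            if len(word) > max_len:
                continue
            if nxt is None:                 # production without a nonterminal
                words.add(word)
            elif (word, nxt) not in seen:   # avoid revisiting the same form
                seen.add((word, nxt))
                stack.append((word, nxt))
    return words


def matching(pattern, max_len, alphabet="ab"):
    """All strings over `alphabet` of length <= max_len matched by `pattern`."""
    compiled = re.compile(pattern)
    return {"".join(t)
            for n in range(max_len + 1)
            for t in product(alphabet, repeat=n)
            if compiled.fullmatch("".join(t))}


# S → aS | bR | ε,  R → aS
GRAMMAR_1 = {"S": [("a", "S"), ("b", "R"), ("", None)],
             "R": [("a", "S")]}

# S → a | bU | bR,  R → abaU | U,  U → aS | b
GRAMMAR_2 = {"S": [("a", None), ("b", "U"), ("b", "R")],
             "R": [("aba", "U"), ("", "U")],
             "U": [("a", "S"), ("b", None)]}
```

Comparing `generate(GRAMMAR_2, "S", 9)` with `matching(r"(ba|babaa)*(a|bb|babab)", 9)` checks the second example; the first grammar can be compared with `(a|ba)*` in the same way.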
Regular Expressions Summarizing: [Diagram: conversions among RGR, RGL, RE, NFA, DFA, and minimum DFA, each conversion marked either “Done” or “Soon”.]
Regular Expressions ALGORITHM 1: Regular Expression ↓ NFA. Recursively build the FSA, mimicking the structure of the regular expression. Each FSA built has one start state, and one final state. Conversions: • if ø: [Figure: states 1 and 2.]
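Regular Expressions • if ε: [Figure: fragment for ε.] • if a: [Figure: state 1 with an a-transition to state 2.] • if P + Q: [Figure: states 1 and 2 joined by ε-transitions to the fragments for P and Q, placed in parallel.] • if P·Q: [Figure: states 1 and 2 with the fragments for P and Q placed in sequence, linked by ε-transitions.]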
Regular Expressions • if P*: [Figure: states 1 and 2 around the fragment for P, linked by ε-transitions.] Example: (b (aba + ε) a)* [Figures: the NFA is built bottom-up over several slides, starting from fragments for the individual symbols and joining them with ε-transitions; the final NFA has states 1 through 15.]
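A bottom-up construction in the spirit of Algorithm 1 can be sketched as follows. The exact placement of ε-transitions in the slides' figures is not recoverable from this transcript, so the code uses the common Thompson-style fragments as an assumption; the tuple encoding of expressions and all function names are illustrative. Every fragment has exactly one start state and one final state, as the slides require.

```python
from itertools import count


def build_nfa(regex):
    """Bottom-up construction; returns (start, final, edges)."""
    fresh = count(1)
    edges = []                      # (source, label, target); label None is ε

    def build(r):
        start, final = next(fresh), next(fresh)
        kind = r[0]
        if kind == "empty":
            pass                    # no path from start to final
        elif kind == "eps":
            edges.append((start, None, final))
        elif kind == "sym":
            edges.append((start, r[1], final))
        elif kind == "alt":         # P + Q: both fragments in parallel
            for sub in r[1:]:
                s, f = build(sub)
                edges.extend([(start, None, s), (f, None, final)])
        elif kind == "cat":         # P·Q: fragments in sequence
            s1, f1 = build(r[1])
            s2, f2 = build(r[2])
            edges.extend([(start, None, s1), (f1, None, s2), (f2, None, final)])
        elif kind == "star":        # P*: skip P, or loop through it
            s, f = build(r[1])
            edges.extend([(start, None, final), (start, None, s),
                          (f, None, s), (f, None, final)])
        else:
            raise ValueError(f"unknown node: {kind}")
        return start, final

    start, final = build(regex)
    return start, final, edges


def eps_closure(states, edges):
    """States reachable from `states` using ε-transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(nfa, word):
    start, final, edges = nfa
    current = eps_closure({start}, edges)
    for ch in word:
        moved = {dst for src, label, dst in edges
                 if src in current and label == ch}
        current = eps_closure(moved, edges)
    return final in current


# (b (aba + ε) a)*
A, B = ("sym", "a"), ("sym", "b")
ABA = ("cat", ("cat", A, B), A)
EXAMPLE = ("star", ("cat", ("cat", B, ("alt", ABA, ("eps",))), A))
```

This sketch allocates two fresh states per subexpression, so its state count differs from the 15-state NFA on the slides; only the accepted language is meant to agree.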
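Regular Expressions ALGORITHM 2: Regular Expression ↓ NFA. Start with: [Figure: a single transition labeled with the whole expression E.]
Regular Expressions Apply rules: [Figures: rewriting rules that replace a transition labeled a*, ab, or a + b with transitions labeled by its parts, using ε-transitions for a*.]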
Regular Expressions Algorithm 1: • Builds FSA bottom up • Good for machines • Bad for humans. Algorithm 2: • Builds FSA top down • Bad for machines • Good for humans. (These judgments are arguable.)
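Regular Expressions Example (Algorithm 2): (a + b)* (aa + bb) (a + b)* [Figures: the transition labeled with the whole expression is expanded step by step, first into (a + b)*, aa + bb, and (a + b)*, then into aa and bb, and finally into single-symbol and ε-transitions.]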
Regular Expressions Example (Algorithm 2): ba(a + b)* ab [Figure: the resulting NFA, with transitions labeled a, b, and ε.]
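The top-down idea can be sketched with a worklist: start from one transition labeled with the whole expression and keep rewriting any transition whose label is not yet a single symbol or ε. The rewriting rules used here (parallel edges for +, a fresh middle state for catenation, a fresh looping state entered and left by ε for *) are an assumption based on what survives of the rule figures; the state numbers 0 (start) and 1 (final) and the tuple encoding are illustrative.

```python
from itertools import count


def top_down(regex):
    """Expand a single edge 0 --regex--> 1 into symbol and ε edges."""
    fresh = count(2)
    pending = [(0, regex, 1)]       # edges still labeled with an expression
    edges = []                      # finished edges; label None is ε
    while pending:
        src, r, dst = pending.pop()
        kind = r[0]
        if kind == "sym":
            edges.append((src, r[1], dst))
        elif kind == "eps":
            edges.append((src, None, dst))
        elif kind == "alt":         # a + b: two parallel edges
            pending += [(src, r[1], dst), (src, r[2], dst)]
        elif kind == "cat":         # ab: insert a new state between a and b
            mid = next(fresh)
            pending += [(src, r[1], mid), (mid, r[2], dst)]
        elif kind == "star":        # a*: new state with a loop, ε in and out
            mid = next(fresh)
            edges += [(src, None, mid), (mid, None, dst)]
            pending.append((mid, r[1], mid))
        elif kind != "empty":       # ø contributes no edge at all
            raise ValueError(f"unknown node: {kind}")
    return edges


def accepts(edges, word):
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for src, label, dst in edges:
                if src == q and label is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({0})
    for ch in word:
        current = closure({dst for src, label, dst in edges
                           if src in current and label == ch})
    return 1 in current


def cat(*parts):
    """Left-associated catenation of several expressions."""
    result = parts[0]
    for part in parts[1:]:
        result = ("cat", result, part)
    return result


A, B = ("sym", "a"), ("sym", "b")
EXAMPLE = cat(B, A, ("star", ("alt", A, B)), A, B)   # ba(a + b)*ab
```

Compared with the bottom-up sketch style, this version adds a state only where a rule needs one, which is part of why hand-drawn results tend to be smaller.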
Regular Expressions Deterministic Finite-State Automata (DFA’s) Definition: A deterministic FSA is defined just like an NFA, except that δ: Q × Σ → Q, rather than δ: Q × (Σ ∪ {ε}) → 2^Q. Thus, both [Figure: an ε-transition] and [Figure: two a-transitions leaving the same state] are impossible.
Regular Expressions Every transition of a DFA consumes a symbol. Fortunately, DFA’s are just as powerful as NFA’s. Theorem: For every NFA there exists an equivalent (accepting the same language) DFA.
Regular Expressions Conversion from NFA’s to DFA’s: • “Simulate” all moves of the NFA with the DFA. • The start state of the DFA is the start state of the NFA (say, S), together with states that are ε-reachable from S. • Each state in the DFA is a subset of the set of states of the NFA; the notion of being in “any one of” a number of states. • New states in the DFA are constructed by calculating the sets of states that are reachable through symbols, after the start state. • The final states in the DFA are those that contain any final state of the NFA.
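The steps above can be sketched as a worklist algorithm over sets of NFA states. This is a minimal illustration, not the slides' own code: the NFA encoding (`delta` as a dict from (state, symbol) to a set of states, `eps` as a dict from state to a set of ε-successors) and the function names are assumptions. The sample NFA is a hypothetical six-state machine for a*b + ba*; its exact edges are assumed for the example.

```python
def eps_closure(states, eps):
    """All states reachable from `states` through ε-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for nxt in eps.get(q, set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def subset_construction(start, finals, delta, eps, alphabet):
    """Return (dfa_start, dfa_finals, table) with frozensets as DFA states."""
    dfa_start = frozenset(eps_closure({start}, eps))
    table = {}
    todo = [dfa_start]
    while todo:
        current = todo.pop()
        if current in table:
            continue
        table[current] = {}
        for symbol in alphabet:
            moved = set()
            for q in current:                   # simulate every NFA move
                moved |= delta.get((q, symbol), set())
            target = frozenset(eps_closure(moved, eps))
            if target:                          # empty set = undefined entry
                table[current][symbol] = target
                todo.append(target)
    dfa_finals = {s for s in table if s & finals}
    return dfa_start, dfa_finals, table


# Hypothetical NFA for a*b + ba* (start 1, final 6); edges are assumed.
EPS = {1: {2}, 2: {3}, 4: {5}, 5: {6}}
DELTA = {(3, "a"): {2}, (3, "b"): {6}, (1, "b"): {4}, (5, "a"): {5}}
DFA_START, DFA_FINALS, TABLE = subset_construction(1, {6}, DELTA, EPS, "ab")
```

Only subsets reachable from the start state are ever created, which is why the resulting DFA is usually far smaller than the worst case.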
Regular Expressions Example: a*b + ba* [Figure: NFA with states 1 through 6 and transitions labeled a, b, and ε.]
Regular Expressions DFA [Figure: the NFA for a*b + ba*, repeated.]
          Input
State     a      b
123       23     456
23        23     6
456       56     ---
6         ---    ---
56        56     ---
[Figure: the resulting DFA, with states 123, 23, 456, 6, and 56.]
Regular Expressions In general, if the NFA has N states, the DFA can have as many as 2^N states. Example: ba (a + b)* ab [Figure: NFA with states 0 through 11 and transitions labeled a, b, and ε.]
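DFA
                      Input
State                 a                    b
{0}                   ---                  {1}
{1}                   {2,3,4,6,8,9}        ---
{2,3,4,6,8,9}         {3,4,5,6,8,9,10}     {3,4,6,7,8,9}
{3,4,5,6,8,9,10}      {3,4,5,6,8,9,10}     {3,4,6,7,8,9,11}
{3,4,6,7,8,9}         {3,4,5,6,8,9,10}     {3,4,6,7,8,9}
{3,4,6,7,8,9,11}      {3,4,5,6,8,9,10}     {3,4,6,7,8,9}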
Regular Expressions [Figure: the resulting DFA, with states {0}, {1}, {2,3,4,6,8,9}, {3,4,5,6,8,9,10}, {3,4,6,7,8,9}, and {3,4,6,7,8,9,11}, and transitions labeled a and b.]
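Running a DFA is a plain table lookup, one transition per input symbol. The sketch below encodes the transition table of this example and compares it against Python's `re` on all short strings. Two things are assumptions: NFA state 11 is taken to be the final state (so the only final DFA state is the one containing 11), and the tuple names are illustrative.

```python
import re
from itertools import product

# DFA states, named by the NFA states they contain.
S0, S1 = (0,), (1,)
S2 = (2, 3, 4, 6, 8, 9)
S3 = (3, 4, 5, 6, 8, 9, 10)
S4 = (3, 4, 6, 7, 8, 9)
S5 = (3, 4, 6, 7, 8, 9, 11)

# A missing entry plays the role of "---" in the table.
DELTA = {
    S0: {"b": S1},
    S1: {"a": S2},
    S2: {"a": S3, "b": S4},
    S3: {"a": S3, "b": S5},
    S4: {"a": S3, "b": S4},
    S5: {"a": S3, "b": S4},
}
FINALS = {S5}   # assumption: NFA state 11 is the NFA's final state


def accepts(word):
    """Run the DFA; every step consumes exactly one symbol."""
    state = S0
    for ch in word:
        state = DELTA[state].get(ch)
        if state is None:           # undefined transition: reject
            return False
    return state in FINALS


def agrees_with_regex(max_len):
    """Compare the DFA with ba(a|b)*ab on all strings up to max_len."""
    pattern = re.compile(r"ba(a|b)*ab")
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            if accepts(word) != (pattern.fullmatch(word) is not None):
                return False
    return True
```

Under these assumptions `agrees_with_regex(8)` holds, which is a quick consistency check on the table rather than a proof.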
Regular Expressions State Minimization Theorem: Given a DFA M, there exists an equivalent DFA M’ that is minimal, i.e. no other equivalent DFA exists with fewer states than M’. Definition: A partition of a set S is a set of subsets of S such that every element of S appears in exactly one of the subsets.
Regular Expressions Example: S = {1, 2, 3, 4, 5} Π1 = { {1, 2, 3, 4}, {5} } Π2 = { {1, 2, 3}, {4}, {5} } Π3 = { {1, 3}, {2}, {4}, {5} } Note: Π2 is a refinement of Π1, and Π3 is a refinement of Π2.
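Both notions are easy to check mechanically. In the sketch below, a partition is a list of Python sets, and "refinement" is read in the usual way: every block of the finer partition lies inside some block of the coarser one. The function names are illustrative.

```python
def is_partition(blocks, universe):
    """Every element of `universe` appears in exactly one block."""
    members = [x for block in blocks for x in block]
    return len(members) == len(set(members)) and set(members) == set(universe)


def is_refinement(fine, coarse):
    """Every block of `fine` is contained in some block of `coarse`."""
    return all(any(set(f) <= set(c) for c in coarse) for f in fine)


S = {1, 2, 3, 4, 5}
PI_1 = [{1, 2, 3, 4}, {5}]
PI_2 = [{1, 2, 3}, {4}, {5}]
PI_3 = [{1, 3}, {2}, {4}, {5}]
```

Here `is_refinement(PI_2, PI_1)` and `is_refinement(PI_3, PI_2)` are both true, while `is_refinement(PI_1, PI_2)` is false.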
Regular Expressions Minimization Algorithm: 1. Remove all undefined transitions by introducing a TRAP state, i.e. a state from which no final state is reachable. 2. Partition all states into two groups (final states and non-final states). 3. Complete the “Next State” table for each group, by specifying transitions from group to group. Form the next partition: split groups in which Next State table entries differ. Repeat step 3 until no further splitting is possible. 4. Determine start and final states.
Regular Expressions Example: [Figure: DFA with states 1 through 5 and transitions labeled a and b.]
Π0 = { {1, 2, 3, 4}, {5} }
State   a      b
1       1234   1234
2       1234   1234
3       1234   1234
4       1234   5
5       1234   1234
Split {4} from partition {1,2,3,4}
Regular Expressions [Figure: the same DFA, repeated.]
Π1 = { {1, 2, 3}, {4}, {5} }
State   a     b
1       123   123
2       123   4
3       123   123
4       123   5
5       123   123
Split {2} from partition {1,2,3}
Regular Expressions Π2 = { {1, 3}, {2}, {4}, {5} }
State   a   b
1       2   13
3       2   13
2       2   4
4       2   5
5       2   13
No more splitting. Minimal DFA: [Figure: DFA with states 13, 2, 4, and 5 and transitions labeled a and b.]
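The refinement loop can be sketched in a few lines. This is an illustration under assumptions, not the slides' code: the DFA must already be total (any TRAP state added beforehand), and every group is split in each round, whereas the slides split one group at a time; the final partition is the same. The transition table below is consistent with the Next State tables above, but which of states 1 and 3 each b-transition targets is an assumption, as is state 5 being the only final state.

```python
def minimize(states, alphabet, delta, finals):
    """Refine {finals, non-finals} until no group splits; returns the groups."""
    partition = [b for b in (set(finals), set(states) - set(finals)) if b]
    while True:
        group = {q: i for i, block in enumerate(partition) for q in block}
        refined = []
        for block in partition:
            buckets = {}
            for q in sorted(block):
                # One row of the "Next State" table: target group per symbol.
                row = tuple(group[delta[q][a]] for a in alphabet)
                buckets.setdefault(row, set()).add(q)
            refined.extend(buckets.values())    # states with equal rows stay together
        if len(refined) == len(partition):      # nothing split: done
            return refined
        partition = refined


# Assumed transitions for the five-state example.
DELTA = {
    1: {"a": 2, "b": 3},
    2: {"a": 2, "b": 4},
    3: {"a": 2, "b": 3},
    4: {"a": 2, "b": 5},
    5: {"a": 2, "b": 3},
}
GROUPS = minimize({1, 2, 3, 4, 5}, "ab", DELTA, {5})
```

With these assumptions the result is the four groups {1, 3}, {2}, {4}, and {5}; the minimal DFA's start state is the group containing the original start state, and its final states are the groups containing original final states.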
Regular Expressions Summary of Regular Languages • Smallest class in the Chomsky hierarchy. • Appropriate for lexical analysis. • Four representations: RGR, RGL, RE and FSA. • All four are equivalent; there are algorithms to perform transformations among them. • Various advantages and disadvantages among these four, for language designer, implementor, and user. • FSA’s can be made deterministic, and minimal.