
Overview of Lexical Analysis in Programming

Learn about the role of lexical analyzers, input buffering, token recognition, Lex language, finite automata, and more in this informative chapter. Understand how finite automata and transition tables work in recognizing regular languages.




  1. Chapter 3 Lexical Analysis

  2. Content • Overview of this chapter 3.1 The Role of the Lexical Analyzer 3.2 Input Buffering 3.3 Specification of Tokens 3.4 Recognition of Tokens 3.5 The Lexical-Analyzer Generator Lex 3.6 Finite Automata 3.7 From Regular Expressions to Automata 3.8 Design of a Lexical-Analyzer Generator

  3. 3.5 The Lexical-Analyzer Generator Lex In this section, we • Introduce a tool: Lex • Learn the Lex language

  4. 3.5.1 Use of Lex • a.out: the generated lexer returns an integer, a code for the token name • yylval: a global variable that holds the attribute value

  5. 3.5.2 Structure of Lex Programs • Form of a Lex program: declarations %% translation rules %% auxiliary functions 1. declarations: variables, manifest constants, and regular definitions 2. translation rules: Pattern { Action }

  6. 3.5.2 Structure of Lex Programs • Lex program for the tokens of Fig. 3.12

  7. 3.5.3 Conflict Resolution in Lex • Two rules: 1. Always prefer a longer prefix to a shorter prefix 2. If the longest possible prefix matches two or more patterns, prefer the pattern listed first in the Lex program
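These two rules can be sketched in Python: scan every pattern at the current position, keep the longest match, and break ties in favor of the first-listed rule. The rule set below is a hypothetical example for illustration, not one from the slides.

```python
import re

# Hypothetical rules in Lex order: (compiled pattern, token name).
# A None token name means "discard" (e.g. whitespace).
rules = [
    (re.compile(r"if"), "IF"),        # keyword listed before identifier
    (re.compile(r"[a-z]+"), "ID"),
    (re.compile(r"[0-9]+"), "NUMBER"),
    (re.compile(r"[ \t]+"), None),
]

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        best_len, best_name = 0, None
        for pattern, name in rules:
            m = pattern.match(text, pos)
            # Strict '>' keeps the first-listed rule on equal lengths.
            if m and len(m.group()) > best_len:
                best_len, best_name = len(m.group()), name
        if best_len == 0:
            raise SyntaxError(f"no rule matches at position {pos}")
        if best_name is not None:
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

On "if ifx 42", the tie between IF and ID on the prefix "if" goes to IF (listed first), while "ifx" goes to ID because its three-character match beats IF's two (longest prefix wins).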

  8. 3.5.4 The Lookahead Operator • Use the slash (/) in a pattern • Example: IF(I,J)=3, where IF is the name of an array IF(condition) THEN..., where IF is a keyword We could write a Lex rule: IF / \(.*\) {letter} For instance: IF(A<(B+C)*D) THEN...

  9. 3.6 Finite Automata • Differences between finite automata and transition diagrams: 1. Finite automata are recognizers 2. Finite automata come in two flavors (a) Nondeterministic finite automata (NFA) (b) Deterministic finite automata (DFA) • Note: DFA and NFA are capable of recognizing the same languages (the regular languages)

  10. 3.6.1 Nondeterministic Finite Automata • A nondeterministic finite automaton (NFA) consists of: 1. A finite set of states S 2. A set of input symbols ∑, the input alphabet; Є is never a member of ∑ 3. A transition function that gives, for each state and for each symbol in ∑∪{Є}, a set of next states 4. A start state (or initial state) s0 from S 5. A set of states F, a subset of S, as the accepting states (or final states)

  11. 3.6.1 Nondeterministic Finite Automata • NFA is very much like a transition diagram except: 1. The same symbol can label edges from one state to several different states 2. An edge may be labeled by Є • Example: An NFA recognizing the language of (a|b)*abb

  12. 3.6.2 Transition Tables • Represent an NFA: rows: states columns: input symbols and Є e.g. Transition table for the NFA of Fig. 3.24 • Advantage: transitions are easy to look up • Disadvantage: takes a lot of space
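Such a table can be stored as a dictionary of dictionaries. The encoding below assumes the usual drawing of the Fig. 3.24 NFA for (a|b)*abb (states 0-3, accepting state 3), since the figure itself is not reproduced in this transcript; missing entries stand for the empty set of next states.

```python
# Transition table of the 4-state NFA accepting (a|b)*abb (Fig. 3.24,
# state numbering assumed): rows are states, columns input symbols.
nfa_table = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}

def move(states, symbol):
    """Union of next states reachable on `symbol` from any state in `states`."""
    result = set()
    for s in states:
        result |= nfa_table.get(s, {}).get(symbol, set())
    return result
```

The space cost noted on the slide shows up directly: every (state, symbol) pair needs a slot even when its entry is empty.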

  13. 3.6.3 Acceptance of Input Strings by Automata • An NFA accepts input string x if and only if there is some path labeled x from the start state to one of the accepting states • Example 1: The string aabb is accepted by the NFA Path: (accepted) Another path: (not accepted)

  14. 3.6.3 Acceptance of Input Strings by Automata • Example 2: An NFA accepting L(aa*|bb*) The string aaa is accepted because of the path:

  15. 3.6.4 Deterministic Finite Automata • A DFA is a special case of an NFA where: 1. There are no moves on input Є 2. For each state s and input symbol a, there is exactly one edge out of s labeled a • Example: A DFA accepting (a|b)*abb

  16. 3.6.4 Deterministic Finite Automata • Simulating a DFA: INPUT: A string x and a DFA D OUTPUT: Answer "yes" if D accepts x; "no" otherwise METHOD:
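The METHOD is the standard single-state simulation loop; a minimal Python sketch follows. The hand-coded DFA for (a|b)*abb and its state numbering 0-3 are assumptions for illustration, not taken from the slides.

```python
def simulate_dfa(dtran, start, accepting, x):
    """Return True iff the DFA accepts x: follow exactly one edge per
    input symbol, then check whether the final state is accepting."""
    s = start
    for c in x:
        s = dtran[s][c]   # determinism: exactly one next state
    return s in accepting

# A DFA accepting (a|b)*abb (common textbook numbering, assumed):
dtran = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
accepting = {3}
```

Unlike an NFA, no sets of states are needed: the two DFA conditions on the slide (no Є-moves, exactly one edge per symbol) make the current state a single value.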

  17. 3.7 From Regular Expressions to Automata In this section, we • Show how to convert NFA's to DFA's • Give a useful algorithm for simulating NFA's • Show how to convert regular expressions to NFA's

  18. 3.7.1 Conversion of an NFA to a DFA • Algorithm: "subset construction" INPUT: An NFA N OUTPUT: A DFA D accepting the same language as N METHOD: 1. Construct a transition table Dtran for D 2. Operations on NFA states Є-closure(s): set of NFA states reachable from NFA state s on Є-transitions alone Є-closure(T): set of NFA states reachable from some NFA state s in set T on Є-transitions alone, i.e. ∪s in T Є-closure(s) move(T, a): set of NFA states to which there is a transition on input symbol a from some state s in T
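The Є-closure operation is typically computed with a stack. In the sketch below, `eps_edges` is a hypothetical adjacency map (state → set of states reached on Є); the example edges encode an assumed Thompson-style NFA for (a|b)*abb whose numbering matches the closure sets worked out in the following slides.

```python
def eps_closure(states, eps_edges):
    """Є-closure(T): all NFA states reachable from `states` on
    Є-transitions alone, computed with a work stack."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in eps_edges.get(s, set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

# Є-edges of a Thompson NFA for (a|b)*abb (state numbering assumed):
eps = {0: {1, 7}, 1: {2, 4}, 3: {6}, 5: {6}, 6: {1, 7}}
```

Each state is pushed at most once, so the closure of a set costs time linear in the number of Є-edges examined.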

  19. 3.7.1 Conversion of an NFA to a DFA 3. subset construction Algorithm

  20. 3.7.1 Conversion of an NFA to a DFA • Example: Given an NFA N for (a|b)*abb, convert it to a DFA 1. Є-closure(0)={0,1,2,4,7}=A 2. Dtran[A, a] = Є-closure(move(A, a))={1,2,3,4,6,7,8}=B Dtran[A, b]=Є-closure(move(A, b))={1,2,4,5,6,7}=C

  21. 3.7.1 Conversion of an NFA to a DFA 3. Dtran[B, a] = Є-closure(move(B, a))={1,2,3,4,6,7,8}=B Dtran[B, b]=Є-closure(move(B, b))={1,2,4,5,6,7,9}=D Dtran[C, a]=Є-closure(move(C, a))={1,2,3,4,6,7,8}=B Dtran[C, b]=Є-closure(move(C, b))={1,2,4,5,6,7}=C 4. Dtran[D, a]=Є-closure(move(D, a))={1,2,3,4,6,7,8}=B Dtran[D, b]=Є-closure(move(D, b))={1,2,4,5,6,7,10}=E 5. Dtran[E, a]=Є-closure(move(E, a))={1,2,3,4,6,7,8}=B Dtran[E, b]=Є-closure(move(E, b))={1,2,4,5,6,7}=C

  22. 3.7.1 Conversion of an NFA to a DFA 6. Transition table Dtran 7. Result (A and C can be merged)
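The whole worked example can be replayed in Python. The NFA encoding below assumes the standard Thompson construction for (a|b)*abb with states 0-10 ("eps" labels Є-edges); under that assumption the subset construction discovers exactly the five DFA states A-E computed above.

```python
from collections import deque

# Assumed Thompson NFA for (a|b)*abb; start state 0, accepting state 10.
nfa = {
    0: {"eps": {1, 7}},  1: {"eps": {2, 4}}, 2: {"a": {3}},
    3: {"eps": {6}},     4: {"b": {5}},      5: {"eps": {6}},
    6: {"eps": {1, 7}},  7: {"a": {8}},      8: {"b": {9}},
    9: {"b": {10}},      10: {},
}

def closure(states):
    result, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa[s].get("eps", set()):
            if t not in result:
                result.add(t)
                stack.append(t)
    return result

def move(states, a):
    return {t for s in states for t in nfa[s].get(a, set())}

def subset_construction(start, alphabet):
    """Build Dtran: each DFA state is a frozenset of NFA states."""
    start_set = frozenset(closure({start}))
    dtran, worklist = {}, deque([start_set])
    while worklist:
        T = worklist.popleft()
        if T in dtran:           # already processed
            continue
        dtran[T] = {}
        for a in alphabet:
            U = frozenset(closure(move(T, a)))
            dtran[T][a] = U
            if U not in dtran:
                worklist.append(U)
    return start_set, dtran
```

Running it yields five DFA states, with the start state A = {0,1,2,4,7} and Dtran[A, a] = B = {1,2,3,4,6,7,8}, matching the hand computation.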

  23. 3.7.2 Simulation of an NFA INPUT: An input string x and an NFA N OUTPUT: Answer "yes" if N accepts x; "no" otherwise METHOD:
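The METHOD (Algorithm 3.22 in outline) tracks the set of states the NFA could be in: start from the Є-closure of the start state, and for each input symbol take the Є-closure of the move set. A self-contained sketch, again using the assumed Fig. 3.24-style NFA for (a|b)*abb:

```python
def simulate_nfa(delta, start, accepting, x):
    """On-the-fly NFA simulation over the current set of states S.
    delta maps state -> {symbol: set of next states}; "" labels Є-edges."""
    def closure(states):
        result, stack = set(states), list(states)
        while stack:
            s = stack.pop()
            for t in delta.get(s, {}).get("", set()) - result:
                result.add(t)
                stack.append(t)
        return result

    S = closure({start})
    for c in x:
        S = closure({t for s in S for t in delta.get(s, {}).get(c, set())})
    return bool(S & accepting)   # accept iff some accepting state reached

# NFA for (a|b)*abb (numbering assumed; no Є-edges in this one):
delta = {0: {"a": {0, 1}, "b": {0}}, 1: {"b": {2}}, 2: {"b": {3}}}
```

This avoids the worst-case exponential blow-up of the subset construction by computing only the state sets that the one input string actually visits.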

  24. 3.7.3 Efficiency of NFA Simulation • Algorithm 3.22 • Data structures we need: 1. Two stacks 2. A boolean array alreadyOn 3. A two-dimensional array move[s, a]

  25. 3.7.3 Efficiency of NFA Simulation • Implement line (1) of Algorithm 3.22 • Implement line (4) of Algorithm 3.22 • Total running time: O(k(n + m)) for an input of length k and an NFA with n states and m transitions

  26. Assignment 1: 3.3.2 (2,3), 3.3.5 (6,9), 3.4.1 (1,2,3), 3.5.2, 3.6.5 (1)

  27. 3.7.4 Construction of an NFA from a Regular Expression • McNaughton-Yamada-Thompson algorithm INPUT: A regular expression r over alphabet ∑ OUTPUT: An NFA N accepting L(r) METHOD: BASIS: 1. For expression Є: 2. For any subexpression a in ∑:

  28. 3.7.4 Construction of an NFA from a Regular Expression INDUCTION: 1. union: r = s|t 2. concatenation: r = st 3. closure: r = s* 4. r = (s): L(r) = L(s), so we can use N(s) as N(r)

  29. 3.7.4 Construction of an NFA from a Regular Expression • Example: Construct an NFA for r = (a|b)*abb 1. r1 = a: 2. r2 = b: 3. r3 = r1|r2: 4. r4 = (r3): is the same as N(r3) 5. r5 = (r3)*:

  30. 3.7.4 Construction of an NFA from a Regular Expression 6. r6=a: 7. r7=r5r6: … 10. r10:
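The basis and induction cases can be mirrored directly in code: each helper builds an NFA fragment with one start and one accepting state, and the example's steps compose them bottom-up into r10 = (a|b)*abb. All names below are illustrative; the edge-list representation is an assumption.

```python
import itertools

# An NFA fragment is (start, accept, edges); edges are (state, label, state)
# triples, and "" labels an Є-transition.
counter = itertools.count()

def symbol_nfa(c):                          # basis: a in ∑
    s, f = next(counter), next(counter)
    return s, f, [(s, c, f)]

def union_nfa(n1, n2):                      # induction: r = s|t
    s1, f1, e1 = n1; s2, f2, e2 = n2
    s, f = next(counter), next(counter)
    return s, f, e1 + e2 + [(s, "", s1), (s, "", s2), (f1, "", f), (f2, "", f)]

def concat_nfa(n1, n2):                     # induction: r = st
    s1, f1, e1 = n1; s2, f2, e2 = n2
    return s1, f2, e1 + e2 + [(f1, "", s2)]

def star_nfa(n):                            # induction: r = s*
    s1, f1, e1 = n
    s, f = next(counter), next(counter)
    return s, f, e1 + [(s, "", s1), (f1, "", f), (f1, "", s1), (s, "", f)]

def accepts(nfa, x):
    """Check acceptance by simulating the fragment's edge list."""
    start, accept, edges = nfa
    def closure(states):
        result, stack = set(states), list(states)
        while stack:
            p = stack.pop()
            for (u, lbl, v) in edges:
                if u == p and lbl == "" and v not in result:
                    result.add(v)
                    stack.append(v)
        return result
    S = closure({start})
    for c in x:
        S = closure({v for (u, lbl, v) in edges if u in S and lbl == c})
    return accept in S

# Steps 1-10 composed: r10 = (a|b)*abb
r10 = concat_nfa(
    concat_nfa(
        concat_nfa(star_nfa(union_nfa(symbol_nfa("a"), symbol_nfa("b"))),
                   symbol_nfa("a")),
        symbol_nfa("b")),
    symbol_nfa("b"))
```

Because every case adds a constant number of states and edges, the resulting NFA has size linear in the length of the regular expression.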

  31. 3.7.5 Efficiency of String-Processing Algorithms

  32. 3.8 Design of a Lexical-Analyzer Generator How is a lexical-analyzer generator such as Lex architected? • Approaches based on NFA's • Approaches based on DFA's (Lex's implementation)

  33. 3.8.1 The Structure of the Generated Analyzer • Architecture of a lexical analyzer generated by Lex 1. A transition table for the automaton. 2. Those functions that are passed directly through Lex to the output. 3. The actions from the input program, which appear as fragments of code to be invoked at the appropriate time by the automaton simulator.

  34. 3.8.1 The Structure of the Generated Analyzer • An NFA constructed from a Lex program • Example: a { action A1 for pattern p1 } abb { action A2 for pattern p2 } a*b+ { action A3 for pattern p3 }

  35. 3.8.1 The Structure of the Generated Analyzer NFA's for a, abb, and a*b+ Combined NFA:

  36. 3.8.2 Pattern Matching Based on NFA's • Example: a { action A1 for pattern p1 } abb { action A2 for pattern p2 } a*b+ { action A3 for pattern p3 } input: aaba

  37. 3.8.3 DFA's for Lexical Analyzers • Convert the NFA into an equivalent DFA 1. Simulate the DFA until there is no next state 2. Then back up; as soon as we meet an accepting state, perform the associated action

  38. 3.8.4 Implementing the Lookahead Operator • Treat the / as if it were Є • Example: An NFA recognizing the keyword IF

  39. 3.9.6 Minimizing the Number of States of a DFA • The minimum-state DFA can be constructed from any DFA for the same language by grouping sets of equivalent states. • We say that string x distinguishes state s from state t if exactly one of the states reached from s and t by following the path with label x is an accepting state. • State s is distinguishable from state t if there is some string that distinguishes them.

  40. 3.9.6 Minimizing the Number of States of a DFA • Initially, the partition consists of two groups: the accepting states and the nonaccepting states. • The fundamental step is to take some group of the current partition, say A = {s1, s2, . . . , sk}, and some input symbol a, and see whether a can be used to distinguish between any states in group A.

  41. 3.9.6 Minimizing the Number of States of a DFA • We examine the transitions from each of s1, s2, . . . , sk on input a, and if the states reached fall into two or more groups of the current partition, we split A into a collection of groups. • si and sj are in the same group if and only if they go to the same group on input a. • Repeat this process of splitting groups until no group can be split further for any input symbol.

  42. 3.9.6 Minimizing the Number of States of a DFA Algorithm 3.39: Minimizing the number of states of a DFA. • INPUT: A DFA D with set of states S, input alphabet ∑, start state s0, and set of accepting states F. • OUTPUT: A DFA D' accepting the same language as D and having as few states as possible.

  43. 3.9.6 Minimizing the Number of States of a DFA • Π = {A, B, C, D}, {E} • Πnew = {A, B, C}, {D}, {E} • Πfinal = {A, C}, {B}, {D}, {E}
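The partition refinement can be replayed on the five-state DFA A-E from the (a|b)*abb example (its transitions encoded by hand below). This is a minimal sketch in the spirit of Algorithm 3.39, not Lex's actual implementation: split any group whose members go to different groups on some symbol, and stop when nothing splits.

```python
def minimize(dtran, accepting, alphabet):
    """Partition refinement: start from {accepting, nonaccepting} and
    split groups until stable. Returns the list of state groups."""
    states = set(dtran)
    partition = [g for g in (states & accepting, states - accepting) if g]
    while True:
        new_partition = []
        for group in partition:
            buckets = {}
            for s in group:
                # Key: which group of the current partition each symbol
                # sends s to; states with equal keys stay together.
                key = tuple(
                    next(i for i, g in enumerate(partition) if dtran[s][a] in g)
                    for a in alphabet)
                buckets.setdefault(key, set()).add(s)
            new_partition.extend(buckets.values())
        if len(new_partition) == len(partition):   # nothing split
            return new_partition
        partition = new_partition

# The five-state DFA built by the subset construction for (a|b)*abb:
dtran = {
    "A": {"a": "B", "b": "C"},
    "B": {"a": "B", "b": "D"},
    "C": {"a": "B", "b": "C"},
    "D": {"a": "B", "b": "E"},
    "E": {"a": "B", "b": "C"},
}
```

With accepting set {E}, the refinement reproduces the slide's partitions: first {D} splits off (it reaches E on b), then {B}, leaving {A, C} merged in the minimal four-state DFA.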

  44. The end of Lecture03
