330 likes | 669 Views
Lecture # 3. Chapter #3: Lexical Analysis. Role of Lexical Analyzer. It is the first phase of compiler Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis Reasons to make it a separate phase are:
E N D
Lecture # 3 Chapter #3: Lexical Analysis
Role of Lexical Analyzer • It is the first phase of compiler • Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis • Reasons to make it a separate phase are: • Simplifies the design of the compiler • Provides efficient implementation • Improves portability
Interaction of the Lexical Analyzer with the Parser (Fig 3.1) Token,tokenval LexicalAnalyzer Parser SourceProgram Get nexttoken error error Symbol Table
Tokens, Patterns, and Lexemes • A token is a classification of lexical units • For example: id and num • Lexemes are the specific character strings that make up a token • For example: abc and 123 • Patterns are rules describing the set of lexemes belonging to a token • For example: “letter followed by letters and digits” and “non-empty sequence of digits”
Attributes of Tokens Lexical analyzer y := 31 + 28*x <id, “y”> <assign, :=> <num, 31> <operator , +> <num, 28> <operator, *> <id, “x”> token Parser tokenval(token attribute)
Section # 3.3: Specification of Tokens • Alphabet: Finite, nonempty set of symbols Example: Example: • Strings: Finite sequence of symbols from an alphabet e.g. 0011001 • Empty String: The string with zero occurrences of symbols from alphabet. The empty string is denoted by
Continue… • Length of String: Number of positions for symbols in the string. |w| denotes the length of string w Example |0110| = 4; | | = 0 • Powers of an Alphabet: = = the set of strings of length k with symbols from Example:
Continue.. • The set of all strings over is denoted
Continue.. • Language: is a specific set of strings over some fixed alphabet Example: The set of legal English words The set of strings consisting of n 0's followed by n 1’s LP = the set of binary numbers whose value is prime
Concatenation and Exponentiation • The concatenation of two strings x and y is denoted by xy • The exponentiationof a string s is defined by s0 = si = si-1s for i > 0note that s = s = s
Language Operations • Union L M = {s s L or s M} • Concatenation LM = {xy x L and y M} • Exponentiation L0 = {}; Li = Li-1L • Kleene closure L* = i=0,…, Li • Positive closureL+ = i=1,…, Li
Regular Expressions • Basis symbols: • is a regular expression denoting language {} • a is a regular expression denoting {a} • If r and s are regular expressions denoting languages L(r) and M(s) respectively, then • rs is a regular expression denoting L(r) M(s) • rs is a regular expression denoting L(r)M(s) • r* is a regular expression denoting L(r)* • (r) is a regular expression denoting L(r) • A language defined by a regular expression is called a Regular set or a Regular Language
Regular Definitions • Regular definitions introduce a naming convention: d1 r1d2 r2…dn rnwhere each ri is a regular expression over {d1, d2, …, di-1}
Example:letter AB…Zab…z digit 01…9id letter ( letterdigit)* • The following shorthands are often used:r+ = rr*r? = r [a-z] = abc…z • Examples:digit [0-9]num digit+ (. digit+)? ( E (+-)? digit+ )?
Regular Definitions and Grammars Grammar stmt ifexprthenstmtif exprthenstmtelsestmtexpr term reloptermtermterm idnum Regular definitions if if then then else elserelop <<=<>>>== id letter ( letter | digit )* num digit+ (. digit+)? ( E (+-)? digit+ )?
Coding Regular Definitions in Transition Diagrams relop <<=<>>>== start < = 0 1 2 return(relop, LE) > 3 return(relop, NE) other * 4 return(relop, LT) = return(relop, EQ) 5 > = 6 7 return(relop, GE) other * 8 return(relop, GT) id letter ( letterdigit )* letter or digit start letter other * 9 10 11 return(gettoken(),install_id())
Section 3.6 : Finite Automata • Finite Automata are used as a model for: • Software for designing digital circuits • Lexical analyzer of a compiler • Searching for keywords in a file or on the web. • Software for verifying finite state systems, such as communication protocols.
Design of a Lexical Analyzer Generator • Translate regular expressions to NFA • Translate NFA to an efficient DFA Optional regularexpressions NFA DFA Simulate NFAto recognizetokens Simulate DFAto recognizetokens
Nondeterministic Finite Automata • An NFA is a 5-tuple (S, , , s0, F) whereS is a finite set of states is a finite set of symbols, the alphabet is a mapping from S to a set of statess0 S is the start stateF S is the set of accepting (or final) states
Transition Graph • An NFA can be diagrammatically represented by a labeled directed graph called a transition graph a S = {0,1,2,3} = {a,b}s0 = 0F = {3} start b b a 0 1 2 3 b
Transition Table • The mapping of an NFA can be represented in a transition table (0,a) = {0,1}(0,b) = {0}(1,b) = {2}(2,b) = {3}
The Language Defined by an NFA • An NFA accepts an input string x if and only if there is some path with edges labeled with symbols from x in sequence from the start state to some accepting state in the transition graph • A state transition from one state to another on the path is called a move • The language defined by an NFA is the set of input strings it accepts, such as (ab)*abb for the example NFA
From Regular Expression to -NFA (Thompson’s Construction) start i f start a a i f N(r1) start r1r2 i f N(r2) start r1r2 i N(r1) N(r2) f start r* i N(r) f
Combining the NFAs of a Set of Regular Expressions start a 1 2 a { action1 }abb { action2 }a*b+ { action3 } start a b b 3 4 5 6 a b start 7 8 b a 1 2 start a b b 3 0 4 5 6 a b 7 8 b
Deterministic Finite Automata • A deterministic finite automaton is a special case of an NFA • No state has an -transition • For each state s and input symbol a there is at most one edge labeled a leaving s • Each entry in the transition table is a single state • At most one path exists to accept a string • Simulation algorithm is simple
Example DFA A DFA that accepts (ab)*abb b b a start b b a 0 1 2 3 a a
Conversion of an NFA into a DFA • The subset construction algorithm converts an NFA into a DFA using:-closure(s) = {s} {ts … t} -closure(T) = sT-closure(s)move(T,a) = {ts a t and s T} • The algorithm produces:Dstates is the set of states of the new DFA consisting of sets of states of the NFADtran is the transition table of the new DFA
-closure and move Examples -closure({0}) = {0,1,3,7}move({0,1,3,7},a) = {2,4,7}-closure({2,4,7}) = {2,4,7}move({2,4,7},a) = {7}-closure({7}) = {7}move({7},b) = {8}-closure({8}) = {8}move({8},a) = a 1 2 start a b b 3 0 4 5 6 a b 7 8 b a a b a none Also used to simulate NFAs
Subset Construction Example 1 a 2 3 start a b b 0 1 6 7 8 9 10 b 4 5 b DstatesA = {0,1,2,4,7}B = {1,2,3,4,6,7,8}C = {1,2,4,5,6,7}D = {1,2,4,5,6,7,9}E = {1,2,4,5,6,7,10} C b a b start a b b A B D E a a a
Subset Construction Example 2 a a1 1 2 start a b b a2 3 0 4 5 6 a b a3 7 8 b b DstatesA = {0,1,3,7}B = {2,4,7}C = {8}D = {7}E = {5,8}F = {6,8} a3 C a b b b start A D a a b b B E F a1 a3 a2 a3