1 / 31

Lecture # 3

Lecture # 3. Chapter #3: Lexical Analysis. Role of Lexical Analyzer. It is the first phase of compiler Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis Reasons to make it a separate phase are:

noura
Download Presentation

Lecture # 3

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture # 3 Chapter #3: Lexical Analysis

  2. Role of Lexical Analyzer • It is the first phase of compiler • Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis • Reasons to make it a separate phase are: • Simplifies the design of the compiler • Provides efficient implementation • Improves portability

  3. Interaction of the Lexical Analyzer with the Parser (Fig 3.1) Token,tokenval LexicalAnalyzer Parser SourceProgram Get nexttoken error error Symbol Table

  4. Tokens, Patterns, and Lexemes • A token is a classification of lexical units • For example: id and num • Lexemes are the specific character strings that make up a token • For example: abc and 123 • Patterns are rules describing the set of lexemes belonging to a token • For example: “letter followed by letters and digits” and “non-empty sequence of digits”

  5. Diff b/w Token, Lexeme and Pattern

  6. Attributes of Tokens Lexical analyzer y := 31 + 28*x <id, “y”> <assign, :=> <num, 31> <operator , +> <num, 28> <operator, *> <id, “x”> token Parser tokenval(token attribute)

  7. Section # 3.3: Specification of Tokens • Alphabet: Finite, nonempty set of symbols Example: Example: • Strings: Finite sequence of symbols from an alphabet e.g. 0011001 • Empty String: The string with zero occurrences of symbols from alphabet. The empty string is denoted by

  8. Continue… • Length of String: Number of positions for symbols in the string. |w| denotes the length of string w Example |0110| = 4; | | = 0 • Powers of an Alphabet: = = the set of strings of length k with symbols from Example:

  9. Continue.. • The set of all strings over is denoted

  10. Continue.. • Language: is a specific set of strings over some fixed alphabet  Example: The set of legal English words The set of strings consisting of n 0's followed by n 1’s LP = the set of binary numbers whose value is prime

  11. Concatenation and Exponentiation • The concatenation of two strings x and y is denoted by xy • The exponentiationof a string s is defined by s0 = si = si-1s for i > 0note that s = s = s

  12. Language Operations • Union L  M = {s  s  L or s  M} • Concatenation LM = {xy x  L and y  M} • Exponentiation L0 = {}; Li = Li-1L • Kleene closure L* = i=0,…, Li • Positive closureL+ = i=1,…, Li

  13. Regular Expressions • Basis symbols: •  is a regular expression denoting language {} • a   is a regular expression denoting {a} • If r and s are regular expressions denoting languages L(r) and M(s) respectively, then • rs is a regular expression denoting L(r)  M(s) • rs is a regular expression denoting L(r)M(s) • r* is a regular expression denoting L(r)* • (r) is a regular expression denoting L(r) • A language defined by a regular expression is called a Regular set or a Regular Language

  14. Regular Definitions • Regular definitions introduce a naming convention: d1 r1d2 r2…dn rnwhere each ri is a regular expression over  {d1, d2, …, di-1}

  15. Example:letter  AB…Zab…z digit  01…9id  letter ( letterdigit)* • The following shorthands are often used:r+ = rr*r? = r [a-z] = abc…z • Examples:digit  [0-9]num  digit+ (. digit+)? ( E (+-)? digit+ )?

  16. Regular Definitions and Grammars Grammar stmt ifexprthenstmtif exprthenstmtelsestmtexpr term reloptermtermterm  idnum Regular definitions if if then then else elserelop <<=<>>>== id letter ( letter | digit )* num digit+ (. digit+)? ( E (+-)? digit+ )?

  17. Coding Regular Definitions in Transition Diagrams relop <<=<>>>== start < = 0 1 2 return(relop, LE) > 3 return(relop, NE) other * 4 return(relop, LT) = return(relop, EQ) 5 > = 6 7 return(relop, GE) other * 8 return(relop, GT) id letter ( letterdigit )* letter or digit start letter other * 9 10 11 return(gettoken(),install_id())

  18. Section 3.6 : Finite Automata • Finite Automata are used as a model for: • Software for designing digital circuits • Lexical analyzer of a compiler • Searching for keywords in a file or on the web. • Software for verifying finite state systems, such as communication protocols.

  19. Design of a Lexical Analyzer Generator • Translate regular expressions to NFA • Translate NFA to an efficient DFA Optional regularexpressions NFA DFA Simulate NFAto recognizetokens Simulate DFAto recognizetokens

  20. Nondeterministic Finite Automata • An NFA is a 5-tuple (S, , , s0, F) whereS is a finite set of states is a finite set of symbols, the alphabet is a mapping from S   to a set of statess0  S is the start stateF  S is the set of accepting (or final) states

  21. Transition Graph • An NFA can be diagrammatically represented by a labeled directed graph called a transition graph a S = {0,1,2,3} = {a,b}s0 = 0F = {3} start b b a 0 1 2 3 b

  22. Transition Table • The mapping  of an NFA can be represented in a transition table (0,a) = {0,1}(0,b) = {0}(1,b) = {2}(2,b) = {3}

  23. The Language Defined by an NFA • An NFA accepts an input string x if and only if there is some path with edges labeled with symbols from x in sequence from the start state to some accepting state in the transition graph • A state transition from one state to another on the path is called a move • The language defined by an NFA is the set of input strings it accepts, such as (ab)*abb for the example NFA

  24. From Regular Expression to -NFA (Thompson’s Construction)  start  i f start a a i f   N(r1) start r1r2 i f   N(r2) start r1r2 i N(r1) N(r2) f    start r* i N(r) f 

  25. Combining the NFAs of a Set of Regular Expressions start a 1 2 a { action1 }abb { action2 }a*b+ { action3 } start a b b 3 4 5 6 a b start 7 8 b a 1 2  start  a b b 3 0 4 5 6 a b  7 8 b

  26. Deterministic Finite Automata • A deterministic finite automaton is a special case of an NFA • No state has an -transition • For each state s and input symbol a there is at most one edge labeled a leaving s • Each entry in the transition table is a single state • At most one path exists to accept a string • Simulation algorithm is simple

  27. Example DFA A DFA that accepts (ab)*abb b b a start b b a 0 1 2 3 a a

  28. Conversion of an NFA into a DFA • The subset construction algorithm converts an NFA into a DFA using:-closure(s) = {s}  {ts  …  t} -closure(T) = sT-closure(s)move(T,a) = {ts a t and s  T} • The algorithm produces:Dstates is the set of states of the new DFA consisting of sets of states of the NFADtran is the transition table of the new DFA

  29. -closure and move Examples -closure({0}) = {0,1,3,7}move({0,1,3,7},a) = {2,4,7}-closure({2,4,7}) = {2,4,7}move({2,4,7},a) = {7}-closure({7}) = {7}move({7},b) = {8}-closure({8}) = {8}move({8},a) =  a 1 2  start  a b b 3 0 4 5 6 a b  7 8 b a a b a none Also used to simulate NFAs

  30. Subset Construction Example 1  a 2 3     start a b b 0 1 6 7 8 9 10   b 4 5  b DstatesA = {0,1,2,4,7}B = {1,2,3,4,6,7,8}C = {1,2,4,5,6,7}D = {1,2,4,5,6,7,9}E = {1,2,4,5,6,7,10} C b a b start a b b A B D E a a a

  31. Subset Construction Example 2 a a1 1 2  start  a b b a2 3 0 4 5 6 a b  a3 7 8 b b DstatesA = {0,1,3,7}B = {2,4,7}C = {8}D = {7}E = {5,8}F = {6,8} a3 C a b b b start A D a a b b B E F a1 a3 a2 a3

More Related