Chapter 2 Chang Chi-Chung 2007.3.15
Lexical Analyzer
• The tasks of the lexical analyzer:
  • Remove white space and comments
  • Encode constants as tokens
  • Recognize keywords and identifiers
  • Store identifier names in a symbol table
Lexical Analyzer
• Input: if (peek == '\n') line = line + 1
• Lexical analyzer, Lexer(), produces tokens with attributes:
    <if> <(> <id, "peek"> <eq> <const, '\n'> <)> <id, "line"> <assign> <id, "line"> <+> <num, 1> <;>
• The tokens are passed to the parser or syntax-directed translator, Parser()
Remove white space and comments
• White space and comments can be handled in one of two ways:
  • They are eliminated by the lexical analyzer.
  • The grammar is modified to incorporate them into the syntax (not easy).
    for ( ; ; peek = next character ) {
        if ( peek is a blank or a tab ) do nothing;
        else if ( peek is a newline ) line = line + 1;
        else break;
    }
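The loop on this slide handles blanks, tabs and newlines but not comments. As a minimal sketch, assuming the whole input is held in a String and that comments run from `//` to the end of the line (both are assumptions of this example, not something the slide specifies), the loop could be extended as follows. The class name `BlankSkipper` and its method `next` are hypothetical.

```java
final class BlankSkipper {
    private final String src;   // whole input in memory (assumption of this sketch)
    private int pos = 0;        // index of the next unread character
    int line = 1;               // current line number, as on the slide

    BlankSkipper(String src) { this.src = src; }

    // Returns the next significant character, or -1 at end of input.
    int next() {
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (c == ' ' || c == '\t') {
                pos++;                                  // do nothing
            } else if (c == '\n') {
                line = line + 1;
                pos++;
            } else if (c == '/' && pos + 1 < src.length()
                       && src.charAt(pos + 1) == '/') {
                // Skip to the end of the line; the newline itself is
                // counted by the branch above on the next iteration.
                while (pos < src.length() && src.charAt(pos) != '\n') pos++;
            } else {
                pos++;
                return c;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        BlankSkipper s = new BlankSkipper("  // note\n\t x");
        int c = s.next();
        System.out.println((char) c + " on line " + s.line);   // prints "x on line 2"
    }
}
```

A lone `/` is returned as an ordinary character, so a division operator would still reach the parser.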
Encode constants as tokens
• For a sequence of digits, the lexical analyzer must pass a token to the parser.
• The token consists of the terminal num along with an integer-valued attribute computed from the digits.
• Example
  • 31 + 28 + 29
  • <num, 31> <+> <num, 28> <+> <num, 29>
    if ( peek holds a digit ) {
        v = 0;
        do {
            v = v * 10 + integer value of digit peek;
            peek = next input character;
        } while ( peek holds a digit );
        return token <num, v>;
    }
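The digit loop can be tried out on the slide's own example. The following is a sketch only: it assumes the input is a String, represents each token as its printed form, and treats every non-digit, non-blank character as a one-character token. The class `NumberScanner` and its method `tokens` are names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class NumberScanner {
    // Turns a string such as "31 + 28 + 29" into printed token forms.
    static List<String> tokens(String src) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < src.length()) {
            char peek = src.charAt(pos);
            if (Character.isDigit(peek)) {
                int v = 0;
                do {
                    v = v * 10 + Character.digit(peek, 10);
                    pos++;
                    // A blank stands in for "no more input" in this sketch.
                    peek = pos < src.length() ? src.charAt(pos) : ' ';
                } while (Character.isDigit(peek));
                out.add("<num, " + v + ">");
            } else if (peek == ' ') {
                pos++;                          // skip blanks between tokens
            } else {
                out.add("<" + peek + ">");      // single-character token
                pos++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [<num, 31>, <+>, <num, 28>, <+>, <num, 29>]
        System.out.println(tokens("31 + 28 + 29"));
    }
}
```

The sketch does not guard against integer overflow for very long digit sequences.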
Recognize Keywords and Identifiers
• Keyword
  • A fixed character string used as a punctuation mark or to identify a construct.
  • Example: for, while, if
• Identifier
  • Used to name variables, arrays, functions, and the like.
  • The parser treats identifiers as terminals.
  • Example
    • count = count + increment;
    • <id, "count"> <=> <id, "count"> <+> <id, "increment"> <;>
Recognize Keywords and Identifiers
• The lexical analyzer uses a table to hold character strings.
• A string table can be implemented by a hash table.
• The table serves two purposes:
  • Single representation: each string is stored once, and the rest of the compiler works with references to the table entry.
  • Reserved words: the table is initialized with the reserved strings and their tokens.
    if ( peek holds a letter ) {
        collect letters or digits into a buffer b;
        s = string formed from the characters in b;
        w = token returned by words.get(s);
        if ( w is not null ) return w;
        else {
            enter the key-value pair (s, <id, s>) into words;
            return token <id, s>;
        }
    }
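A small sketch of the table lookup, assuming a `java.util.HashMap` as the hash table and, for brevity, tokens represented by their printed form. The class `WordTable` and its method `lookup` are hypothetical names. The reserved words are entered before any lookup, so they can never be entered as identifiers.

```java
import java.util.HashMap;
import java.util.Map;

final class WordTable {
    private final Map<String, String> words = new HashMap<>();

    WordTable() {
        // Reserved words are entered before scanning starts.
        for (String kw : new String[] {"for", "while", "if"}) {
            words.put(kw, "<" + kw + ">");
        }
    }

    // Returns the token for lexeme s, entering <id, s> on first sight.
    String lookup(String s) {
        String w = words.get(s);
        if (w != null) return w;
        w = "<id, \"" + s + "\">";
        words.put(s, w);
        return w;
    }

    public static void main(String[] args) {
        WordTable t = new WordTable();
        System.out.println(t.lookup("if"));       // prints <if>
        System.out.println(t.lookup("count"));    // prints <id, "count">
        // Single representation: the second lookup returns the stored object.
        System.out.println(t.lookup("count") == t.lookup("count"));   // prints true
    }
}
```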
Create a Lexical Analyzer
    Token scan() {
        skip white space.                            (A)
        handle numbers.                              (B)
        handle reserved words and identifiers.       (C)
        Token t = new Token(peek);
        peek = blank;                                (D)
        return t;
    }
Complete Lexical Analyzer (1)
    package lexer;

    // Each public class below goes in its own source file.
    public class Token {
        public final int tag;
        public Token(int t) { tag = t; }
    }

    public class Tag {
        public final static int
            NUM = 256, ID = 257, TRUE = 258, FALSE = 259;
    }

    public class Num extends Token {
        public final int value;
        public Num(int v) { super(Tag.NUM); value = v; }
    }

    public class Word extends Token {
        public final String lexeme;
        public Word(int t, String s) { super(t); lexeme = new String(s); }
    }
• Class diagram: class Token (+int tag) with subclasses class Num (+int value) and class Word (+String lexeme)
Complete Lexical Analyzer (2)
    package lexer;
    import java.io.*;
    import java.util.*;

    public class Lexer {
        public int line = 1;
        private char peek = ' ';
        private Hashtable words = new Hashtable();

        void reserve(Word t) { words.put(t.lexeme, t); }

        public Lexer() {
            reserve( new Word(Tag.TRUE, "true") );
            reserve( new Word(Tag.FALSE, "false") );
        }
        // scan() follows on the next slides
Complete Lexical Analyzer (3)
        public Token scan() throws IOException {
            // (A) skip white space
            for ( ; ; peek = (char) System.in.read() ) {
                if ( peek == ' ' || peek == '\t' ) continue;
                else if ( peek == '\n' ) line = line + 1;
                else break;
            }
            // (B) handle numbers
            if ( Character.isDigit(peek) ) {
                int v = 0;
                do {
                    v = v * 10 + Character.digit(peek, 10);
                    peek = (char) System.in.read();
                } while ( Character.isDigit(peek) );
                return new Num(v);
            }
            // parts (C) and (D) of scan() continue on the next slide
Complete Lexical Analyzer (4)
        public Token scan() throws IOException {    // continued from the previous slide
            // (C) handle reserved words and identifiers
            if ( Character.isLetter(peek) ) {
                StringBuffer b = new StringBuffer();
                do {
                    b.append(peek);
                    peek = (char) System.in.read();
                } while ( Character.isLetterOrDigit(peek) );
                String s = b.toString();
                Word w = (Word) words.get(s);
                if ( w != null ) return w;
                w = new Word(Tag.ID, s);
                words.put(s, w);
                return w;
            }
            // (D) any other character is returned as a token by itself
            Token t = new Token(peek);
            peek = ' ';
            return t;
        }
    }
Symbol Tables
• Symbol tables are data structures used by compilers to hold information about source-program constructs.
• Scope of identifier x
  • Really refers to the scope of a particular declaration of x.
• Scope
  • A portion of a program that is the scope of one or more declarations.
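Symbol Tables
• Example (the subscript on a name is the number of the block that declares it):
    { int x1; int y1;
        { int w2; bool y2; int z2;
            w2; x1; y2; z2;
        }
        w0; x1; y1;
    }
• Chained symbol tables, one per block, each linked to the table of the enclosing block:
  • B0: w
  • B1: x int, y int
  • B2: w int, y bool, z int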
Symbol Tables
    package symbols;
    import java.util.*;

    // Symbol is assumed to be declared elsewhere in this package.
    public class Env {
        private Hashtable table;
        protected Env prev;     // table of the enclosing scope, or null

        public Env(Env p) {
            table = new Hashtable();
            prev = p;
        }

        public void put(String s, Symbol sym) {
            table.put(s, sym);
        }

        // Search this scope first, then each enclosing scope in turn.
        public Symbol get(String s) {
            for ( Env e = this; e != null; e = e.prev ) {
                Symbol found = (Symbol)(e.table.get(s));
                if ( found != null ) return found;
            }
            return null;
        }
    }
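To see the chaining at work on the nested-block example, here is a sketch with a compact stand-in for Env. It assumes a symbol is just the name of its type (a String) instead of a Symbol object, and it leaves out the outermost table B0. The wrapper class `ScopeDemo` is a made-up name.

```java
import java.util.HashMap;
import java.util.Map;

final class ScopeDemo {
    // Stand-in for the slides' Env: maps a name to its type name.
    static final class Env {
        private final Map<String, String> table = new HashMap<>();
        private final Env prev;             // enclosing scope, or null

        Env(Env p) { prev = p; }

        void put(String name, String type) { table.put(name, type); }

        // Innermost declaration wins; null if the name is undeclared.
        String get(String name) {
            for (Env e = this; e != null; e = e.prev) {
                String found = e.table.get(name);
                if (found != null) return found;
            }
            return null;
        }
    }

    public static void main(String[] args) {
        Env b1 = new Env(null);             // { int x; int y;
        b1.put("x", "int");
        b1.put("y", "int");
        Env b2 = new Env(b1);               //   { int w; bool y; int z;
        b2.put("w", "int");
        b2.put("y", "bool");
        b2.put("z", "int");

        System.out.println(b2.get("y"));    // prints bool (inner y hides outer y)
        System.out.println(b2.get("x"));    // prints int  (found in the enclosing table)
        System.out.println(b1.get("y"));    // prints int
        System.out.println(b1.get("z"));    // prints null (z is not visible outside)
    }
}
```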
The Use of Symbol Tables
    program → { top = null; } block
    block   → '{'               { saved = top; top = new Env(top); print("{ "); }
              decls stmts '}'   { top = saved; print("} "); }
    decls   → decls decl | ε
    decl    → type id ;         { s = new Symbol; s.type = type.lexeme; top.put(id.lexeme, s); }
    stmts   → stmts stmt | ε
    stmt    → block
            | factor ;          { print("; "); }
    factor  → id                { s = top.get(id.lexeme); print(id.lexeme); print(":"); print(s.type); }
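The translation scheme can be exercised with a small recursive-descent sketch. Several things here are assumptions of the example, not part of the slide: tokens are separated by white space, the type names are limited to int, char, bool and float, the left-recursive decls and stmts productions are handled by loops, a symbol is just its type name, and an undeclared identifier raises an exception. The names `BlockTranslator` and `translate` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class BlockTranslator {
    // Stand-in for Env: maps a name to its type name.
    static final class Env {
        final Map<String, String> table = new HashMap<>();
        final Env prev;
        Env(Env p) { prev = p; }
        String get(String s) {
            for (Env e = this; e != null; e = e.prev) {
                String found = e.table.get(s);
                if (found != null) return found;
            }
            return null;
        }
    }

    private static final Set<String> TYPES = Set.of("int", "char", "bool", "float");
    private final String[] toks;
    private int pos = 0;
    private Env top = null;                          // program → { top = null; } block
    private final StringBuilder out = new StringBuilder();

    private BlockTranslator(String src) { toks = src.trim().split("\\s+"); }

    static String translate(String src) {
        BlockTranslator t = new BlockTranslator(src);
        t.block();
        return t.out.toString();
    }

    private String next() {
        if (pos >= toks.length) throw new IllegalStateException("unexpected end of input");
        return toks[pos++];
    }

    private void expect(String s) {
        String t = next();
        if (!t.equals(s)) throw new IllegalStateException("expected " + s + ", saw " + t);
    }

    private void block() {
        expect("{");
        Env saved = top;
        top = new Env(top);
        out.append("{ ");
        while (pos < toks.length && TYPES.contains(toks[pos])) {     // decls
            String type = next();
            String id = next();
            expect(";");
            top.table.put(id, type);
        }
        while (pos < toks.length && !toks[pos].equals("}")) {        // stmts
            if (toks[pos].equals("{")) {
                block();
            } else {                                                 // factor ;
                String id = next();
                String type = top.get(id);
                if (type == null) throw new IllegalStateException("undeclared: " + id);
                out.append(id).append(":").append(type);
                expect(";");
                out.append("; ");
            }
        }
        expect("}");
        top = saved;
        out.append("} ");
    }

    public static void main(String[] args) {
        // prints { { x:int; y:bool; } x:int; y:char; }
        System.out.println(translate("{ int x ; char y ; { bool y ; x ; y ; } x ; y ; }"));
    }
}
```

Declarations produce no output; each use of an identifier is printed with the type found by searching the chain of tables from the innermost block outward.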
Intermediate Code Generation
• The two most important intermediate representations:
  • Trees
    • Parse trees, syntax trees (abstract syntax trees)
    • Example: while ( expr ) stmt
      • A node op with children E1 and E2, where op: while, E1: expr, E2: stmt
  • Linear representations
    • Three-address code, with instructions of the form x = y op z
    • Examples
      • ifFalse x goto L
      • ifTrue x goto L
      • goto L
      • x [ y ] = z
      • x = y [ z ]
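A syntax-tree node of the kind described here (an operator with children E1 and E2) might be sketched as below. The class `Node` and its prefix rendering are illustrative choices, not taken from the slides.

```java
final class Node {
    final String op;        // operator or leaf label
    final Node[] kids;      // children E1, E2, ... (empty for a leaf)

    Node(String op, Node... kids) {
        this.op = op;
        this.kids = kids;
    }

    // Prefix rendering: a leaf prints its label, an interior node
    // prints (op child1 child2 ...).
    @Override
    public String toString() {
        if (kids.length == 0) return op;
        StringBuilder sb = new StringBuilder("(").append(op);
        for (Node k : kids) sb.append(' ').append(k);
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        Node loop = new Node("while", new Node("expr"), new Node("stmt"));
        System.out.println(loop);       // prints (while expr stmt)
    }
}
```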
Intermediate Code Generation
• Input: if (peek == '\n') line = line + 1
• The parser or syntax-directed translator, Parser(), produces either a syntax tree:
    if
        eq
            peek
            (int)
                '\n'
        assign
            line
            +
                line
                1
• or three-address code:
    1: t1 = (int) '\n'
    2: ifFalse peek == t1 goto 4
    3: line = line + 1
    4:
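One way to produce a numbered listing like the one on this slide is to emit a placeholder for the conditional jump, emit the body, and then fill in the jump target once the number of the following instruction is known. The sketch below assumes instruction numbers serve as jump targets, as on the slide; the class `ThreeAddress` and its methods `emit`, `genIf` and `listing` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class ThreeAddress {
    private final List<String> code = new ArrayList<>();

    // Appends an instruction and returns its 1-based number.
    int emit(String instr) {
        code.add(instr);
        return code.size();
    }

    // Code for "if (cond) body": a conditional jump around the body.
    void genIf(String cond, Runnable body) {
        int jump = emit(null);                  // placeholder, patched below
        body.run();                             // emits the body's instructions
        int after = code.size() + 1;            // first instruction after the body
        code.set(jump - 1, "ifFalse " + cond + " goto " + after);
    }

    // Numbered listing, ending with the empty slot that follows the code.
    String listing() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < code.size(); i++) {
            sb.append(i + 1).append(": ").append(code.get(i)).append('\n');
        }
        sb.append(code.size() + 1).append(":\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        ThreeAddress g = new ThreeAddress();
        g.emit("t1 = (int) '\\n'");
        g.genIf("peek == t1", () -> g.emit("line = line + 1"));
        System.out.print(g.listing());
    }
}
```

Run as shown, the sketch reproduces the four-line listing from the slide.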
Syntax Trees
[Figure: part of a syntax tree for a statement sequence, built from seq, while, if and null nodes, with subtrees labeled "some tree for an expression".]
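Static Checking
• Static checks are consistency checks that are done during compilation.
  • Syntactic checking
  • Type checking
• L-values and R-values
  • i = 5 (the i on the left side denotes a location: an l-value)
  • i = i + 1 (the i on the right side denotes a value: an r-value)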
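A static check on assignments can be sketched over a tiny expression tree. This is an illustration under simplifying assumptions: only identifiers count as l-values (array elements and the like are left out), and the names `StaticCheck`, `Id`, `Num`, `Add` and `isValidAssign` are invented for the example.

```java
final class StaticCheck {
    interface Expr {}

    static final class Id implements Expr {
        final String name;
        Id(String name) { this.name = name; }
    }

    static final class Num implements Expr {
        final int value;
        Num(int value) { this.value = value; }
    }

    static final class Add implements Expr {
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    // An assignment is accepted only if its left side denotes a location.
    // In this sketch, identifiers are the only l-values; any expression
    // may appear on the right side.
    static boolean isValidAssign(Expr lhs, Expr rhs) {
        return lhs instanceof Id;
    }

    public static void main(String[] args) {
        Expr i = new Id("i");
        System.out.println(isValidAssign(i, new Num(5)));               // i = 5     → true
        System.out.println(isValidAssign(i, new Add(i, new Num(1))));   // i = i + 1 → true
        System.out.println(isValidAssign(new Num(5), i));               // 5 = i     → false
    }
}
```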