Chapter 2

Chang Chi-Chung, 2007.3.15

Presentation Transcript


  1. Chapter 2. Chang Chi-Chung, 2007.3.15

  2. Lexical Analyzer
     • The tasks of the lexical analyzer:
       • Remove white space and comments
       • Encode constants as tokens
       • Recognize keywords and identifiers
       • Store identifier names in a symbol table

  3. Lexical Analyzer
     • Input: if (peek == '\n') line = line + 1;
     • The lexical analyzer, Lexer(), turns the input into tokens, some carrying an attribute:
       <if> <(> <id, "peek"> <eq> <const, '\n'> <)> <id, "line"> <assign> <id, "line"> <+> <num, 1> <;>
     • The tokens are passed to the parser or syntax-directed translator, Parser().

  4. Remove white space and comments
     • Two ways to handle white space and comments:
       • Have the lexical analyzer eliminate them.
       • Modify the grammar to incorporate them into the syntax (not easy).

        for ( ; ; peek = next character ) {
            if ( peek is a blank or a tab ) do nothing;
            else if ( peek is a newline ) line = line + 1;
            else break;
        }
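The loop on this slide can be tried out with a small Java sketch. It assumes the input is held in a String rather than read from a stream, and the class name WhitespaceSkipper and the helper next() are made up for illustration. Comments are not handled here.

```java
// Hypothetical helper; the slide gives the loop only as pseudocode.
class WhitespaceSkipper {
    private final String input;
    private int pos = 0;
    int line = 1;      // current line number, as on the slide
    int peek = ' ';    // lookahead; an int so that -1 can mean end of input

    WhitespaceSkipper(String input) { this.input = input; }

    // Assumed helper: next input character, or -1 at end of input.
    private int next() {
        return pos < input.length() ? input.charAt(pos++) : -1;
    }

    // Advances peek to the next character that is not a blank, tab, or newline.
    int skip() {
        for ( ; ; peek = next()) {
            if (peek == ' ' || peek == '\t') continue;
            else if (peek == '\n') line = line + 1;
            else break;
        }
        return peek;
    }

    public static void main(String[] args) {
        WhitespaceSkipper s = new WhitespaceSkipper(" \t\n\n x");
        int c = s.skip();
        System.out.println((char) c + " on line " + s.line); // prints "x on line 3"
    }
}
```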

  5. Encode constants as tokens
     • For a sequence of digits, the lexical analyzer must pass a token to the parser.
     • The token consists of the terminal num along with an integer-valued attribute computed from the digits.
     • Example
       • Input: 31 + 28 + 29
       • Tokens: <num, 31> <+> <num, 28> <+> <num, 29>

        if ( peek holds a digit ) {
            v = 0;
            do {
                v = v * 10 + integer value of digit peek;
                peek = next input character;
            } while ( peek holds a digit );
            return token <num, v>;
        }
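As a rough illustration of the digit loop, the following Java sketch turns the example input into the token sequence shown above. The class NumTokens and the method tokenize are hypothetical names. The sketch treats every character that is neither a blank nor a digit as a one-character token, and it ignores integer overflow.

```java
// Hypothetical demo class; only the do-while loop comes from the slide.
class NumTokens {
    // Returns the tokens of the input as text, e.g. "<num, 31> <+> <num, 28>".
    static String tokenize(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char peek = input.charAt(i);
            if (peek == ' ') { i++; continue; }        // skip blanks
            if (out.length() > 0) out.append(' ');
            if (Character.isDigit(peek)) {
                int v = 0;
                do {
                    v = v * 10 + Character.digit(peek, 10);
                    i++;
                    peek = i < input.length() ? input.charAt(i) : '\0';
                } while (Character.isDigit(peek));
                out.append("<num, ").append(v).append('>');
            } else {
                out.append('<').append(peek).append('>'); // one-character token
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "<num, 31> <+> <num, 28> <+> <num, 29>"
        System.out.println(tokenize("31 + 28 + 29"));
    }
}
```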

  6. Recognize Keywords and Identifiers
     • Keywords
       • Fixed character strings used as punctuation marks or to identify constructs.
       • Example: for, while, if
     • Identifiers
       • Used to name variables, arrays, functions, and the like.
       • The parser treats identifiers as terminals.
       • Example
         • Input: count = count + increment;
         • Tokens: <id, "count"> <=> <id, "count"> <+> <id, "increment"> <;>

  7. Recognize Keywords and Identifiers
     • The lexical analyzer uses a table to hold character strings.
     • A string table can be implemented by a hash table.
     • The table serves two purposes:
       • Single representation: each string is stored once, so the rest of the compiler can work with references to table entries.
       • Reserved words: the table is initialized with the reserved strings and their tokens.

        if ( peek holds a letter ) {
            collect letters or digits into a buffer b;
            s = string formed from the characters in b;
            w = token returned by words.get(s);
            if ( w is not null ) return w;
            else {
                enter the key-value pair (s, <id, s>) into words;
                return token <id, s>;
            }
        }
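The table lookup above might look like this in Java. This is a sketch only: the names WordTable, Word, and lookup are invented here, tags are plain strings for brevity, and the keywords for, while, and if from the previous slide stand in for the reserved words.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical string table; the slide calls the table "words".
class WordTable {
    // Minimal token for the sketch: a tag plus the lexeme it was built from.
    static final class Word {
        final String tag;
        final String lexeme;
        Word(String tag, String lexeme) { this.tag = tag; this.lexeme = lexeme; }
        @Override public String toString() {
            return tag.equals("id") ? "<id, \"" + lexeme + "\">" : "<" + tag + ">";
        }
    }

    private final Map<String, Word> words = new HashMap<>();

    // Reserved words: enter them before any identifier is seen.
    WordTable() {
        for (String kw : new String[] {"for", "while", "if"}) {
            words.put(kw, new Word(kw, kw));
        }
    }

    // Returns the existing entry for s, or enters (s, <id, s>) and returns it.
    Word lookup(String s) {
        Word w = words.get(s);
        if (w != null) return w;
        w = new Word("id", s);
        words.put(s, w);
        return w;
    }

    public static void main(String[] args) {
        WordTable t = new WordTable();
        System.out.println(t.lookup("if"));     // prints "<if>"
        System.out.println(t.lookup("count"));  // prints "<id, "count">"
        // Single representation: the same lexeme yields the same object.
        System.out.println(t.lookup("count") == t.lookup("count")); // prints "true"
    }
}
```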

  8. Create a Lexical Analyzer

        Token scan() {
            skip white space;                         (A)
            handle numbers;                           (B)
            handle reserved words and identifiers;    (C)
            Token t = new Token(peek);
            peek = blank;                             (D)
            return t;
        }

  9. Complete Lexical Analyzer (1)
     • Class diagram: class Token (+int tag) is extended by class Num (+int value) and class Word (+String lexeme).

        package lexer;

        // Each public class goes in its own file in package lexer.
        public class Token {
            public final int tag;
            public Token(int t) { tag = t; }
        }

        public class Tag {
            public final static int
                NUM = 256, ID = 257, TRUE = 258, FALSE = 259;
        }

        public class Num extends Token {
            public final int value;
            public Num(int v) { super(Tag.NUM); value = v; }
        }

        public class Word extends Token {
            public final String lexeme;
            public Word(int t, String s) { super(t); lexeme = new String(s); }
        }

  10. Complete Lexical Analyzer (2)

        package lexer;
        import java.io.*;
        import java.util.*;

        public class Lexer {
            public int line = 1;
            private char peek = ' ';
            private Hashtable words = new Hashtable();

            void reserve(Word t) { words.put(t.lexeme, t); }

            public Lexer() {
                reserve( new Word(Tag.TRUE, "true") );
                reserve( new Word(Tag.FALSE, "false") );
            }
            // scan() follows on slides 11 and 12

  11. Complete Lexical Analyzer (3)

        public Token scan() throws IOException {
            // (A) skip white space
            for ( ; ; peek = (char) System.in.read() ) {
                if ( peek == ' ' || peek == '\t' ) continue;
                else if ( peek == '\n' ) line = line + 1;
                else break;
            }
            // (B) handle numbers
            if ( Character.isDigit(peek) ) {
                int v = 0;
                do {
                    v = v * 10 + Character.digit(peek, 10);
                    peek = (char) System.in.read();
                } while ( Character.isDigit(peek) );
                return new Num(v);
            }
            // (C) and (D) are shown on slide 12
        }
        }

  12. Complete Lexical Analyzer (4)

        public Token scan() throws IOException {
            // (A) and (B) are shown on slide 11
            // (C) handle reserved words and identifiers
            if ( Character.isLetter(peek) ) {
                StringBuffer b = new StringBuffer();
                do {
                    b.append(peek);
                    peek = (char) System.in.read();
                } while ( Character.isLetterOrDigit(peek) );
                String s = b.toString();
                Word w = (Word) words.get(s);
                if (w != null) return w;
                w = new Word(Tag.ID, s);
                words.put(s, w);
                return w;
            }
            // (D) any other character becomes a token by itself
            Token t = new Token(peek);
            peek = ' ';
            return t;
        }
        }

  13. Symbol Tables
      • Symbol tables are data structures used by compilers to hold information about source-program constructs.
      • Scope of identifier x
        • Really refers to the scope of a particular declaration of x.
      • Scope
        • A portion of a program that is the scope of one or more declarations.

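  14. Symbol Tables
      • Chained symbol tables, one per block:
        • B0: w
        • B1: x int, y int
        • B2: w int, y bool, z int
      • The number after each name gives the block whose declaration the name refers to:

        { int x1; int y1;
            { int w2; bool y2; int z2;
                w2; x1; y2; z2;
            }
            w0; x1; y1;
        }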

  15. Symbol Tables
      • Class Env implements the chained symbol tables shown on slide 14.

        package symbols;
        import java.util.*;

        public class Env {
            private Hashtable table;
            protected Env prev;

            public Env(Env p) {
                table = new Hashtable();
                prev = p;
            }

            public void put(String s, Symbol sym) {
                table.put(s, sym);
            }

            public Symbol get(String s) {
                for (Env e = this; e != null; e = e.prev) {
                    Symbol found = (Symbol)(e.table.get(s));
                    if (found != null) return found;
                }
                return null;
            }
        }
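To see how the chain of tables resolves names, here is a small self-contained Java sketch modeled on Env. As assumptions for brevity, it stores type names as plain strings instead of Symbol objects, and it nests Env inside a hypothetical ScopeDemo class. The declarations are those of the two nested blocks from slide 14.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo; Env here is a simplified stand-in for the slide's class.
class ScopeDemo {
    static final class Env {
        private final Map<String, String> table = new HashMap<>();
        private final Env prev;   // table of the enclosing block, or null

        Env(Env prev) { this.prev = prev; }

        void put(String name, String type) { table.put(name, type); }

        // Searches this block first, then the enclosing blocks.
        String get(String name) {
            for (Env e = this; e != null; e = e.prev) {
                String found = e.table.get(name);
                if (found != null) return found;
            }
            return null;
        }
    }

    public static void main(String[] args) {
        Env outer = new Env(null);     // { int x; int y;
        outer.put("x", "int");
        outer.put("y", "int");
        Env inner = new Env(outer);    //   { int w; bool y; int z;
        inner.put("w", "int");
        inner.put("y", "bool");
        inner.put("z", "int");

        System.out.println(inner.get("y")); // prints "bool": the inner y hides the outer one
        System.out.println(inner.get("x")); // prints "int": found in the enclosing block
        System.out.println(outer.get("y")); // prints "int"
        System.out.println(outer.get("w")); // prints "null": w is not declared in this chain
    }
}
```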

  16. The Use of Symbol Tables

        program → { top = null; } block

        block   → '{' { saved = top; top = new Env(top); print("{ "); }
                  decls stmts '}' { top = saved; print("} "); }

        decls   → decls decl | ε

        decl    → type id ; { s = new Symbol; s.type = type.lexeme; top.put(id.lexeme, s); }

        stmts   → stmts stmt | ε

        stmt    → block
                | factor ; { print("; "); }

        factor  → id { s = top.get(id.lexeme); print(id.lexeme); print(":"); print(s.type); }

  17. Intermediate Code Generation
      • The two most important intermediate representations:
        • Trees
          • Parse trees, syntax trees (abstract syntax trees)
          • Example: while ( expr ) stmt
            • A node op with children E1 and E2, where op: while, E1: expr, E2: stmt
        • Linear representations
          • Three-address code
          • Examples
            • x = y op z
            • ifFalse x goto L
            • ifTrue x goto L
            • goto L
            • x [ y ] = z
            • x = y [ z ]

  18. Intermediate Code Generation
      • Input: if (peek == '\n') line = line + 1
      • The parser or syntax-directed translator, Parser(), produces either
        • a syntax tree: if( eq( peek, (int) '\n' ), assign( line, +( line, 1 ) ) )
        • or three-address code:

        1: t1 = (int) '\n'
        2: ifFalse peek == t1 goto 4
        3: line = line + 1
        4:
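One way to picture how the jump target in instruction 2 is found is the following Java sketch. It is illustrative only: ThreeAddressDemo and genIf are made-up names, instructions are kept as plain strings, and the fragment is assumed to start at instruction number 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical generator for "if (cond) body" in three-address form.
class ThreeAddressDemo {
    // condSetup: instructions that compute the operands of the condition.
    // cond:      the condition tested by ifFalse.
    // body:      instructions of the statement guarded by the condition.
    static List<String> genIf(List<String> condSetup, String cond, List<String> body) {
        List<String> code = new ArrayList<>(condSetup);
        // The jump lands on the first instruction after the body (numbering from 1).
        int after = condSetup.size() + 1 + body.size() + 1;
        code.add("ifFalse " + cond + " goto " + after);
        code.addAll(body);
        return code;
    }

    public static void main(String[] args) {
        List<String> code = genIf(
            Arrays.asList("t1 = (int) '\\n'"),
            "peek == t1",
            Arrays.asList("line = line + 1"));
        for (int i = 0; i < code.size(); i++) {
            System.out.println((i + 1) + ": " + code.get(i));
        }
        System.out.println((code.size() + 1) + ":"); // the empty slot the jump points at
    }
}
```

Running main reproduces the four numbered lines of the slide.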

  19. Syntax Trees

  20. Syntax Trees
      [Figure: a syntax tree for a statement sequence, built from three nested seq nodes with a null leaf, an if node, and a while node; the subtrees under if and while are labeled "some tree for an expression".]

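  21. Static Checking
      • Static checks are consistency checks that are done during compilation.
        • Syntactic checking
        • Type checking
      • L-values and R-values
        • i = 5
        • i = i + 1
        • On the left side of an assignment, i denotes a location (l-value); on the right side, it denotes the value stored there (r-value).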
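A static check for l-values could be sketched as follows. The tiny tree classes and the rule that only identifiers are l-values are assumptions made for this example; a fuller language would also accept array elements and the like.

```java
// Hypothetical mini syntax tree with a check on the left side of an assignment.
class LValueCheck {
    interface Expr {}

    static final class Id implements Expr {
        final String name;
        Id(String name) { this.name = name; }
    }

    static final class Num implements Expr {
        final int value;
        Num(int value) { this.value = value; }
    }

    static final class Add implements Expr {
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    // Assumption: in this tiny language only an identifier names a location.
    static boolean isLValue(Expr e) { return e instanceof Id; }

    // Check for "target = source": any expression may supply the r-value,
    // but the target must be an l-value.
    static boolean checkAssign(Expr target, Expr source) {
        return isLValue(target);
    }

    public static void main(String[] args) {
        Expr i = new Id("i");
        System.out.println(checkAssign(i, new Num(5)));             // i = 5      prints "true"
        System.out.println(checkAssign(i, new Add(i, new Num(1)))); // i = i + 1  prints "true"
        System.out.println(checkAssign(new Num(5), i));             // 5 = i      prints "false"
    }
}
```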
