Elaboration or: Semantic Analysis
Compiler
Baojian Hua
bjhua@ustc.edu.cn
Front End
source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> semantic analyzer -> IR
Elaboration
• Also known as type-checking, semantic analysis, or context-sensitive analysis
• Checks the well-formedness of programs:
  • every variable is declared before use
  • every expression has a proper type
  • function calls conform to definitions
  • all other context-sensitive information (highly language-dependent)
  • …
• Translates the AST into intermediate or machine code
Elaboration Example
    void f (int *p) {
      x += 4;
      p (23);
      "hello" + "world";
    }
    int main () {
      f () + 5;
    }
What errors can be detected here?
Terminology
• Scope
• Lifetime
• Storage class
• Name space
Terminology: Scope
    int x;
    int f () {
      if (4) {
        int x;
        x = 6;
      }
      else {
        int x;
        x = 5;
      }
      x = 8;
    }
Terminology: Lifetime
    static int x;
    int f () {
      int x, *p;
      x = 6;
      p = malloc (sizeof (*p));
      if (3) {
        static int x;
        x = 5;
      }
    }
Terminology: Storage class
    extern int x;
    int f () {
      extern int x;
      x = 6;
      if (3) {
        extern int x;
        x = 5;
      }
    }
Terminology: Name space
    struct list {
      int x;
      struct list *list;
    } *list;

    void walk (struct list *list) {
    list:
      printf ("%d\n", list->x);
      if (list = list->list)
        goto list;
    }
Moral
• Elaboration must take care of all of these TOGETHER:
  • Scope
  • Lifetime
  • Storage class
  • Name space
  • …
• All these details are handled by symbol tables!
Symbol Tables
• To keep track of types and other information, we maintain a finite map from program symbols to that information
  • symbols: variables, function names, etc.
• Such a mapping is called a symbol table, or sometimes an environment
• Notation: {x1: t1, x2: t2, …, xn: tn}
  • where each xi: ti (1 ≤ i ≤ n) is called a binding
Scope
• How to handle lexical scope?
• It’s easy: we insert and remove bindings during elaboration, as we enter and leave a local scope
Scope
    int x;                σ = {x:int}
    int f ()              σ1 = σ + {f:…} = {x:int, f:…}
    {
      if (4) {
        int x;            σ2 = σ1 + {x:int} = {x:…, f:…, x:…}
        x = 6;
      }                   σ1
      else {
        int x;            σ4 = σ1 + {x:int} = {x:…, f:…, x:…}
        x = 5;
      }                   σ1
      x = 8;
    }                     σ1
Shadowing: “+” is not commutative!
Implementation
• Must be efficient!
  • lots of variables, functions, etc.
• Two basic approaches:
  • Functional
    • the symbol table is implemented as a functional data structure (e.g., a red-black tree), with no tables ever destroyed or modified
  • Imperative
    • a single table, modified for every binding added or removed
• This choice is largely independent of the implementation language
Functional Symbol Table
• Basic idea:
  • when implementing σ2 = σ1 + {x:t}, create a new table σ2 instead of modifying σ1
  • when deleting, restore the old table
• A good data structure for this is a BST or a red-black tree
BST Symbol Table
[Figure: a binary search tree of bindings; node labels c: int, c: int, e: int, a: char, b: double]
Possible Functional Interface
    signature SYMBOL_TABLE =
    sig
      type 'a t
      type key
      val empty: 'a t
      val insert: 'a t * key * 'a -> 'a t
      val lookup: 'a t * key -> 'a option
    end
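The functional interface can be sketched in C as a persistent binary search tree. This is only an illustration under stated assumptions: the names Node, EMPTY, mk, insert and lookup are hypothetical, keys and type names are plain strings, the tree is unbalanced (a red-black tree would add rebalancing), and nodes are never freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One immutable BST node: a binding from a key to a type name. */
typedef struct Node {
    const char *key;
    const char *type;
    const struct Node *left, *right;
} Node;

/* The empty table is the NULL tree. */
static const Node *const EMPTY = NULL;

static const Node *mk(const char *k, const char *t,
                      const Node *l, const Node *r) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->key = k; n->type = t; n->left = l; n->right = r;
    return n;
}

/* Returns a NEW table. Only the nodes on the search path are copied;
   the old table stays valid and shares every other subtree. */
const Node *insert(const Node *t, const char *k, const char *ty) {
    if (t == NULL) return mk(k, ty, NULL, NULL);
    int c = strcmp(k, t->key);
    if (c < 0) return mk(t->key, t->type, insert(t->left, k, ty), t->right);
    if (c > 0) return mk(t->key, t->type, t->left, insert(t->right, k, ty));
    return mk(k, ty, t->left, t->right);   /* same key: shadow the old binding */
}

/* Returns the type bound to k, or NULL if k is unbound. */
const char *lookup(const Node *t, const char *k) {
    while (t != NULL) {
        int c = strcmp(k, t->key);
        if (c == 0) return t->type;
        t = c < 0 ? t->left : t->right;
    }
    return NULL;
}
```

Leaving a scope needs no delete operation here: the elaborator simply goes back to using the pointer to the older table.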
Imperative Symbol Tables
• The imperative approach almost always involves hash tables
• Entries must be deleted to revert to the previous environment
  • made simpler because deletes follow a stack discipline
  • a stack of entered symbols can be maintained, so that they can later be popped and removed from the hash table
Possible Imperative Interface
    signature SYMBOL_TABLE =
    sig
      type 'a t
      type key
      val insert: 'a t * key * 'a -> unit
      val lookup: 'a t * key -> 'a option
      val delete: 'a t * key -> unit
      val beginScope: unit -> unit
      val endScope: unit -> unit
    end
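A minimal C sketch of the imperative approach, assuming a single global table with fixed capacities (NBUCKETS and MAXSYMS are arbitrary) and string keys. All names are hypothetical. A NULL entry on the stack marks where a scope began, so endScope can pop back to it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64
#define MAXSYMS  256

typedef struct Entry {
    const char *key, *type;
    struct Entry *next;              /* older entry in the same bucket */
} Entry;

static Entry *buckets[NBUCKETS];
static const char *stack[MAXSYMS];   /* entered keys; NULL marks a scope start */
static int top = 0;

static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % NBUCKETS;
}

void insert(const char *key, const char *type) {
    Entry *e = malloc(sizeof *e);
    assert(e != NULL && top < MAXSYMS);
    unsigned h = hash(key);
    e->key = key; e->type = type; e->next = buckets[h];
    buckets[h] = e;                  /* newest first, so it shadows older ones */
    stack[top++] = key;
}

const char *lookup(const char *key) {
    for (Entry *e = buckets[hash(key)]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0) return e->type;
    return NULL;
}

void beginScope(void) {
    assert(top < MAXSYMS);
    stack[top++] = NULL;
}

void endScope(void) {
    while (top > 0) {
        const char *key = stack[--top];
        if (key == NULL) break;      /* reached the scope marker */
        unsigned h = hash(key);
        /* Stack discipline: the entry being removed is the most recent
           insertion still in its bucket, so it sits at the bucket head. */
        Entry *e = buckets[h];
        assert(e != NULL && strcmp(e->key, key) == 0);
        buckets[h] = e->next;
        free(e);
    }
}
```

Because removals happen in reverse order of insertion, endScope never has to search a bucket chain.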
Name Space
• Handling name spaces is easy:
  • use one symbol table for each name space
• Take C as an example:
  • several different name spaces
    • labels
    • tags
    • variables
• So …
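One table per name space can be sketched as an array of tables indexed by the name space. The sketch below assumes only the three name spaces listed on the slide and uses small fixed-size arrays; NameSpace, ns_insert and ns_lookup are hypothetical names.

```c
#include <assert.h>
#include <string.h>

typedef enum { NS_LABEL, NS_TAG, NS_VAR, NS_COUNT } NameSpace;

#define MAXSYMS 32

typedef struct { const char *key, *info; } Binding;

/* One independent table per name space. */
static Binding tables[NS_COUNT][MAXSYMS];
static int sizes[NS_COUNT];

void ns_insert(NameSpace ns, const char *key, const char *info) {
    assert(sizes[ns] < MAXSYMS);
    tables[ns][sizes[ns]++] = (Binding){ key, info };
}

/* Searches only the table of the requested name space. */
const char *ns_lookup(NameSpace ns, const char *key) {
    for (int i = sizes[ns] - 1; i >= 0; i--)   /* newest binding first */
        if (strcmp(tables[ns][i].key, key) == 0) return tables[ns][i].info;
    return NULL;
}
```

With this layout the identifier list from the earlier walk example can be bound as a tag, a variable and a label at the same time without any clash, since the syntactic position of each use tells the elaborator which table to consult.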
Implementation of Symbols
• For several reasons, it will be useful at some point to represent symbols as elements of a small, densely packed set of identities
  • fast comparisons (equality)
  • for dataflow analysis, we will want sets of variables and fast set operations
    • it will be critically important to use bit strings to represent the sets
    • for example, in your liveness analysis algorithm
• More on this later
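As a rough illustration, symbols can be interned to small consecutive integers, and a set of symbols then fits in a bit string. The sketch assumes at most 64 symbols so that one uint64_t holds a whole set, and it interns by linear search for brevity (a real compiler would more likely hash). The names intern, SymSet and the set_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAXSYMS 64                 /* one uint64_t holds any set of symbols */

static const char *names[MAXSYMS];
static int nsyms = 0;

/* Returns the dense id of name, allocating a new id on first sight.
   After interning, symbol equality is a single integer comparison. */
int intern(const char *name) {
    for (int i = 0; i < nsyms; i++)
        if (strcmp(names[i], name) == 0) return i;
    assert(nsyms < MAXSYMS);
    names[nsyms] = name;
    return nsyms++;
}

typedef uint64_t SymSet;           /* bit i is set iff symbol i is a member */

SymSet set_add(SymSet s, int sym)    { return s | (UINT64_C(1) << sym); }
int    set_has(SymSet s, int sym)    { return (int)((s >> sym) & 1); }
SymSet set_union(SymSet a, SymSet b) { return a | b; }
SymSet set_diff(SymSet a, SymSet b)  { return a & ~b; }
```

Union and difference are the operations a liveness computation repeats most often, and here each one is a single machine instruction.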
Types
• The representation of types is highly language-dependent
• Some key considerations:
  • name vs. structural equivalence
  • mutually recursive type definitions
  • dealing with errors
Name vs. Structural Equivalence
    struct A { int i; } x;
    struct B { int i; } y;
    x = y;
• In a language with structural equivalence, this program is legal
• But not in a language with name equivalence (e.g., C)
• For name equivalence, generate a unique symbol for each defined type
• For structural equivalence, recursively compare the types
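The two notions of equivalence can be compared side by side in a small sketch. It assumes a toy type representation with only int and struct types, a unique integer id per struct definition, and a structural comparison that also requires matching field names (that last point is a language design choice, not something the slide fixes). All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { T_INT, T_STRUCT } Kind;

typedef struct Type Type;
typedef struct Field { const char *name; const Type *type; } Field;

struct Type {
    Kind kind;
    int id;                  /* unique symbol generated per struct definition */
    const Field *fields;
    size_t nfields;
};

/* Name equivalence: struct types are equal only if they come from the
   same definition, so comparing the unique ids is enough. */
int name_equal(const Type *a, const Type *b) {
    if (a->kind != b->kind) return 0;
    if (a->kind == T_INT) return 1;
    return a->id == b->id;
}

/* Structural equivalence: compare the definitions field by field,
   recursing into field types. This sketch would not terminate on
   recursive types; handling those needs a set of visited pairs. */
int struct_equal(const Type *a, const Type *b) {
    if (a->kind != b->kind) return 0;
    if (a->kind == T_INT) return 1;
    if (a->nfields != b->nfields) return 0;
    for (size_t i = 0; i < a->nfields; i++) {
        if (strcmp(a->fields[i].name, b->fields[i].name) != 0) return 0;
        if (!struct_equal(a->fields[i].type, b->fields[i].type)) return 0;
    }
    return 1;
}
```

For the struct A and struct B of the slide, struct_equal accepts the assignment while name_equal rejects it.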
Mutually recursive type definitions
• Processing recursive and mutually recursive type definitions requires a placeholder
  • in ML, an option ref
  • in C, a pointer
  • in Java, bind method (read Appel)
    struct A {
      int data;
      struct A *next;
      struct B *b;
    };
    struct B {…};
Error Diagnostic
• To recover from errors, it is useful to have an “any” type
  • makes it possible to continue type-checking after an error
  • in practice, use “int” or guess one
• Similarly, a “void” type can be used for expressions that return no value
• Source locations are annotated in the AST!
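The effect of an “any” type can be shown with a toy checker for addition. The sketch assumes a tiny expression language (integer literals, string literals, +), a line number stored in each AST node, and a rule that + requires two int operands. TY_ANY, report and the other names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { TY_INT, TY_STRING, TY_ANY } Ty;   /* TY_ANY: an earlier error */
typedef enum { E_INT, E_STR, E_PLUS } ExpKind;

typedef struct Exp {
    ExpKind kind;
    int line;                        /* source location carried by the AST */
    const struct Exp *e1, *e2;       /* operands, used by E_PLUS only */
} Exp;

static int nerrors = 0;

static void report(int line, const char *msg) {
    fprintf(stderr, "line %d: %s\n", line, msg);
    nerrors++;
}

Ty elabExp(const Exp *e) {
    switch (e->kind) {
    case E_INT: return TY_INT;
    case E_STR: return TY_STRING;
    case E_PLUS: {
        Ty t1 = elabExp(e->e1);
        Ty t2 = elabExp(e->e2);
        /* TY_ANY is compatible with everything, so an operand that has
           already been reported does not trigger a second message. */
        if (t1 != TY_INT && t1 != TY_ANY)
            report(e->line, "left operand of + should be int");
        if (t2 != TY_INT && t2 != TY_ANY)
            report(e->line, "right operand of + should be int");
        return (t1 == TY_INT && t2 == TY_INT) ? TY_INT : TY_ANY;
    }
    }
    return TY_ANY;                   /* not reached for well-formed ASTs */
}
```

Checking ("hello" + 1) + 2 with this sketch reports one error for the inner addition and none for the outer one, and checking continues with the rest of the program.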
Organization of the Elaborator
• Module structure:
    elabProg: Ast.Program.t -> unit
    elabStm: Ast.Stm.t * tenv * venv -> unit
    elabDec: Ast.Dec.t * venv * tenv -> tenv * venv
    elabTy: Ast.Type.t * tenv -> ty
    elabExp: Ast.Exp.t * venv -> ty
    elabLVal: Ast.Lval.t * venv -> ty
• It will be extended to also do translation
• For now, let’s concentrate on type-checking
Elaborate Expressions
• Checks that expressions are correctly typed
• Valid expressions are defined in the C specification
• e: t means that e is a valid expression of type t
• venv is a symbol table (environment)
Elaborate Expressions
    venv |- e1: int    venv |- e2: int
    ----------------------------------
           venv |- e1+e2: int

    fun elabExp (e, venv) =
      case e of
        BinaryExp (PLUS, e1, e2) =>
          let val t1 = elabExp (e1, venv)
              val t2 = elabExp (e2, venv)
          in  case (t1, t2) of
                (Int, Int) => Int
              | (Int, _) => error ("e2 should be int")
              | (_, Int) => error ("e1 should be int")
              | _ => error ("should both be int")
          end
Elaborate Types
• Elaborating types is straightforward, except for recursive types
• Need to do “knot-tying”:
  • extend tenv with bindings for all of the new type names
  • bind new names to “dummy” bodies
  • process each definition, replacing the dummy bodies with real definitions
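The knot-tying steps can be sketched in two passes over a group of type definitions. The sketch assumes a simplified surface form in which every field is a pointer to a named struct type, at most four fields per struct, and a small global tenv array. Ty, Def, tenv_lookup and elabTypeDefs are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MAXTYPES  16
#define MAXFIELDS 4

typedef struct Ty {
    const char *name;
    int defined;                        /* 0 while the body is still a dummy */
    const struct Ty *fields[MAXFIELDS]; /* resolved targets of pointer fields */
    int nfields;
} Ty;

/* Surface definition: a struct name plus the type names of its fields. */
typedef struct Def {
    const char *name;
    const char *fields[MAXFIELDS];
    int nfields;
} Def;

static Ty tenv[MAXTYPES];
static int ntypes = 0;

Ty *tenv_lookup(const char *name) {
    for (int i = 0; i < ntypes; i++)
        if (strcmp(tenv[i].name, name) == 0) return &tenv[i];
    return NULL;
}

/* Returns 0 on success, -1 if a field names an unknown type. */
int elabTypeDefs(const Def *defs, int n) {
    /* Pass 1: bind every new name to a dummy body. */
    for (int i = 0; i < n; i++) {
        assert(ntypes < MAXTYPES && defs[i].nfields <= MAXFIELDS);
        tenv[ntypes++] = (Ty){ .name = defs[i].name, .defined = 0 };
    }
    /* Pass 2: replace each dummy with the real body. Every name in the
       group already resolves, including forward and self references. */
    for (int i = 0; i < n; i++) {
        Ty *t = tenv_lookup(defs[i].name);
        for (int j = 0; j < defs[i].nfields; j++) {
            const Ty *f = tenv_lookup(defs[i].fields[j]);
            if (f == NULL) return -1;
            t->fields[j] = f;
        }
        t->nfields = defs[i].nfields;
        t->defined = 1;
    }
    return 0;
}
```

In this sketch the pointer to a tenv slot plays the role of the placeholder: it can be handed out before the slot's body has been filled in.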
Elaborate Declarations
• elabDec extends the symbol tables with a new binding:
    int a;
  adds {a: int} to the environment
• Remember that environments have to take the scope of variables into account!
Elaborate Statements, Lvals, Programs
• All follow the same structure as expressions or types
• elabProg calls the other functions in order to type-check each component of the program (declarations, statements, expressions, …)
Labs
• For lab #4, your job is to implement an elaborator for C--
  • you may proceed in two steps:
    • first type-checking
    • then generating target code
• At every step, check the output carefully to make sure your compiler works correctly
Summary
• Elaboration checks the well-formedness of programs
  • must take care of the semantics of source programs
  • and may translate them into lower-level forms
• Usually the largest (most complex) part of a compiler!