
Elaboration or: Semantic Analysis

This article explores the elaboration and semantic analysis of programming languages, including concepts such as type-checking, context-sensitive analysis, and symbol tables. Examples and rules are provided to demonstrate these concepts.

cvirgil

Presentation Transcript


  1. Elaboration or: Semantic Analysis. Compiler. Baojian Hua, bjhua@ustc.edu.cn

  2. Front End: source code → lexical analyzer → tokens → parser → abstract syntax tree → semantic analyzer → IR

  3. Elaboration
     • Also known as type-checking, semantic analysis, or context-sensitive analysis
     • Checks the context-sensitive properties of programs (ASTs):
       • every variable is declared before use
       • every expression has a proper type
       • function calls conform to definitions
       • all other context-sensitive information (highly language-dependent)
       • …

  4. Elaboration Example
     // Sample C code:
     void f (int *p)
     {
       x += 4;
       p (23);
       "hello" + "world";
     }
     int main ()
     {
       f () + 5;
       break;
     }
     What errors can be detected here?

  5. Conceptually: AST → Elaborator → Intermediate Code, with the Language Semantics as a second input to the Elaborator

  6. Semantics
     • Traditionally, semantics takes the form of a natural-language specification
       • e.g., for the “+” operator, both the left and right operands should be of “integer” type
       • refer to the various language specifications
     • But research has shown that semantics can also be specified mathematically
       • rigorous and clean

  7. Semantics
     • Now let’s turn to MacQueen’s note…
     • How to implement these rules?

  8. Language T-SLP
     // Let’s make the SLP typed: T-SLP
     P  -> DS S                 // variable declarations followed by statements
     DS -> T id; DS
         |                      // (empty)
     T  -> bool | int           // two types: “bool” and “int”
     S  -> S ; S
         | id := E
         | print (E)            // print an “integer” value
         | printBool (E)        // print a “bool” value
     E  -> id | num | E+E
         | E&&E                 // both sub-expressions must be booleans
         | true | false

  9. Symbol Tables
     • To keep track of types and other information, we maintain a finite map from program symbols to information
       • symbols: variables, function names, etc.
     • Such a mapping is called a symbol table, or sometimes an environment
     • Notation: {x1: b1, x2: b2, …, xn: bn}
       • where each bi (1 ≤ i ≤ n) is called a binding

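  10. Type System
      • Next, we write the symbol table as ∑
        • ∑ = T1 x1; T2 x2; T3 x3; …
        • a list of (T id) tuples
        • may be empty
      • Each rule takes the form (premises above the line, conclusion below):
        $$\frac{\Sigma \vdash P_1 : T_1 \quad \cdots \quad \Sigma \vdash P_n : T_n}{\Sigma \vdash C : T}$$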

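  11. Type System: exp
      $$\Sigma \vdash num : \mathtt{int} \qquad \Sigma \vdash \mathtt{true} : \mathtt{bool} \qquad \Sigma \vdash \mathtt{false} : \mathtt{bool}$$
      $$\frac{T\ id \in \Sigma}{\Sigma \vdash id : T}$$
      $$\frac{\Sigma \vdash E_1 : \mathtt{int} \quad \Sigma \vdash E_2 : \mathtt{int}}{\Sigma \vdash E_1 + E_2 : \mathtt{int}}$$
      $$\frac{\Sigma \vdash E_1 : \mathtt{bool} \quad \Sigma \vdash E_2 : \mathtt{bool}}{\Sigma \vdash E_1 \mathbin{\&\&} E_2 : \mathtt{bool}}$$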

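  12. Type System: stm
      $$\frac{\Sigma \vdash id : T \quad \Sigma \vdash E : T}{\Sigma \vdash id := E : \mathtt{OK}}$$
      $$\frac{\Sigma \vdash E : \mathtt{int}}{\Sigma \vdash \mathtt{print}(E) : \mathtt{OK}}$$
      $$\frac{\Sigma \vdash E : \mathtt{bool}}{\Sigma \vdash \mathtt{printBool}(E) : \mathtt{OK}}$$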

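  13. Type System: dec, prog
      Declarations (a name may be declared only once, so the side condition is non-membership; ε is the empty declaration list):
      $$\frac{id \notin dom(\Sigma) \quad \Sigma;\ T\ id \vdash DS : \Sigma'}{\Sigma \vdash T\ id;\ DS : \Sigma'}$$
      $$\Sigma \vdash \epsilon : \Sigma$$
      Programs (declarations are elaborated starting from the empty table):
      $$\frac{\vdash DS : \Sigma \quad \Sigma \vdash S : \mathtt{OK}}{\vdash DS\ S : \mathtt{OK}}$$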

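  14. Example
      // Is the following program well-typed?
      int x; int y; print (x+y);
      Derivation, writing ∑ for the table “int x; int y”:
      1. $\mathtt{int\ x;\ int\ y} \vdash \epsilon : \Sigma$ (empty declaration list)
      2. $\mathtt{int\ x} \vdash \mathtt{int\ y;} : \Sigma$ (from 1)
      3. $\vdash \mathtt{int\ x;\ int\ y;} : \Sigma$ (from 2)
      4. $\mathtt{int\ x} \in \Sigma$, so $\Sigma \vdash x : \mathtt{int}$
      5. $\mathtt{int\ y} \in \Sigma$, so $\Sigma \vdash y : \mathtt{int}$
      6. $\Sigma \vdash x + y : \mathtt{int}$ (from 4 and 5)
      7. $\Sigma \vdash \mathtt{print}(x+y) : \mathtt{OK}$ (from 6)
      8. $\vdash \mathtt{int\ x;\ int\ y;\ print(x+y)} : \mathtt{OK}$ (from 3 and 7)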

  15. Elaboration of Expressions
      T elab_exp (sigma, num) =
        return Int
      Rule: $\Sigma \vdash num : \mathtt{int}$

  16. Elaboration of Expressions
      T elab_exp (sigma, true) =
        return Bool
      Rule: $\Sigma \vdash \mathtt{true} : \mathtt{bool}$

  17. Elaboration of Expressions
      T elab_exp (sigma, false) =
        return Bool
      Rule: $\Sigma \vdash \mathtt{false} : \mathtt{bool}$

  18. Elaboration of Expressions
      T elab_exp (sigma, id) =
        T ty = Table_lookup (sigma, id);
        if (ty == NULL)
          error ("variable not declared");
        return ty;
      Rule: $$\frac{T\ id \in \Sigma}{\Sigma \vdash id : T}$$

  19. Elaboration of Expressions
      T elab_exp (sigma, e1+e2) =
        T t1 = elab_exp (sigma, e1)
        T t2 = elab_exp (sigma, e2)
        switch (t1, t2) {
          case (Int, Int): return Int;
          case (Int, _):   error ("e2 should be int")
          case (_, Int):   error ("e1 should be int")
          default:         error ("should both be int")
        }
      Rule: $$\frac{\Sigma \vdash e_1 : \mathtt{int} \quad \Sigma \vdash e_2 : \mathtt{int}}{\Sigma \vdash e_1 + e_2 : \mathtt{int}}$$

  20. Elaboration of Expressions
      T elab_exp (sigma, e1&&e2) =
        T t1 = elab_exp (sigma, e1)
        T t2 = elab_exp (sigma, e2)
        switch (t1, t2) {
          case (Bool, Bool): return Bool;
          case (Bool, _):    error ("e2 should be bool")
          case (_, Bool):    error ("e1 should be bool")
          default:           error ("should both be bool")
        }
      Rule: $$\frac{\Sigma \vdash e_1 : \mathtt{bool} \quad \Sigma \vdash e_2 : \mathtt{bool}}{\Sigma \vdash e_1 \mathbin{\&\&} e_2 : \mathtt{bool}}$$

  21. Elaboration of Statements
      void elab_stm (sigma, x:=e) =
        T t1 = elab_exp (sigma, x);
        T t2 = elab_exp (sigma, e);
        if (t1 != t2)
          error ("different types in assignment");
      Rule: $$\frac{\Sigma \vdash x : T \quad \Sigma \vdash e : T}{\Sigma \vdash x := e : \mathtt{OK}}$$

  22. Elaboration of Statements
      void elab_stm (sigma, print(e)) =
        T ty = elab_exp (sigma, e)
        if (ty != Int)
          error ("type should be int");
      Rule: $$\frac{\Sigma \vdash e : \mathtt{int}}{\Sigma \vdash \mathtt{print}(e) : \mathtt{OK}}$$

  23. Elaboration of Statements
      void elab_stm (sigma, printBool(e)) =
        T ty = elab_exp (sigma, e)
        if (ty != Bool)
          error ("type should be bool");
      Rule: $$\frac{\Sigma \vdash e : \mathtt{bool}}{\Sigma \vdash \mathtt{printBool}(e) : \mathtt{OK}}$$

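  24. Elaboration of Declarations
      Sigma elab_decs (sigma, decs) =
        if (decs == [])
          return sigma;
        // decs = type ID; decs’
        if (ID ∈ dom(sigma))
          error ("duplicated decl");
        new_sigma = enter_table (sigma, type ID)
        return elab_decs (new_sigma, decs’);
      Rules (the side condition is non-membership, matching the duplicate check above):
      $$\frac{ID \notin dom(\Sigma) \quad \Sigma;\ type\ ID \vdash decs : \Sigma'}{\Sigma \vdash type\ ID;\ decs : \Sigma'} \qquad \Sigma \vdash \epsilon : \Sigma$$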

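  25. Elaboration of Programs
      void elab_prog (decs stm) =
        sigma = elab_decs ({}, decs);   // start from the empty table
        elab_stm (sigma, stm)
      Rule: $$\frac{\vdash decs : \Sigma \quad \Sigma \vdash stm : \mathtt{OK}}{\vdash decs\ stm : \mathtt{OK}}$$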
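
The elaboration functions on slides 15 to 25 can be put together into one runnable sketch. The version below is written in Python rather than the slides' pseudocode, and several things in it are assumptions of the sketch, not part of the slides: the AST is encoded as nested tuples, the symbol table is a plain dict, errors are raised as a hypothetical `ElabError` exception, and an operand error is reported as soon as the first bad operand is found.

```python
# AST encoding (an assumption of this sketch): nested tuples.
#   expressions:  ("num", 3) ("id", "x") ("true",) ("false",)
#                 ("add", e1, e2) ("and", e1, e2)
#   statements:   ("seq", s1, s2) ("assign", "x", e) ("print", e) ("printBool", e)
#   declarations: a list of (type, name) pairs, e.g. [("int", "x")]

class ElabError(Exception):
    """Raised when a program is not well-typed."""

def elab_exp(sigma, e):
    tag = e[0]
    if tag == "num":
        return "int"
    if tag in ("true", "false"):
        return "bool"
    if tag == "id":
        if e[1] not in sigma:
            raise ElabError(f"variable not declared: {e[1]}")
        return sigma[e[1]]
    if tag in ("add", "and"):
        want = "int" if tag == "add" else "bool"
        for side, sub in (("e1", e[1]), ("e2", e[2])):
            if elab_exp(sigma, sub) != want:
                raise ElabError(f"{side} should be {want}")
        return want
    raise ElabError(f"unknown expression: {tag}")

def elab_stm(sigma, s):
    tag = s[0]
    if tag == "seq":
        elab_stm(sigma, s[1])
        elab_stm(sigma, s[2])
    elif tag == "assign":
        if elab_exp(sigma, ("id", s[1])) != elab_exp(sigma, s[2]):
            raise ElabError("different types in assignment")
    elif tag == "print":
        if elab_exp(sigma, s[1]) != "int":
            raise ElabError("type should be int")
    elif tag == "printBool":
        if elab_exp(sigma, s[1]) != "bool":
            raise ElabError("type should be bool")
    else:
        raise ElabError(f"unknown statement: {tag}")

def elab_decs(sigma, decs):
    for ty, name in decs:
        if name in sigma:
            raise ElabError(f"duplicated decl: {name}")
        sigma = {**sigma, name: ty}  # build a new table, leave the old one intact
    return sigma

def elab_prog(decs, stm):
    sigma = elab_decs({}, decs)      # start from the empty table
    elab_stm(sigma, stm)

# The program from slide 14: int x; int y; print (x+y);
elab_prog([("int", "x"), ("int", "y")],
          ("print", ("add", ("id", "x"), ("id", "y"))))
```

Each branch corresponds to one typing rule: the premises become recursive calls, and a failed premise becomes an error.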

  26. Moral
      • There may be other information associated with identifiers, not just types, say:
        • Scope
        • Storage class
        • Access control information
        • …
      • All these details are handled by symbol tables (∑)!
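
As a small illustration of a binding that carries more than a type, the sketch below stores a record per identifier. The `Binding` class, its field names, and the sample entries are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Binding:
    ty: str                 # the type, e.g. "int"
    scope_level: int        # 0 = global, 1 = function body, ...
    storage: str = "auto"   # storage class, e.g. "auto", "static", "extern"

# A symbol table mapping names to full bindings instead of bare types.
sigma = {
    "counter": Binding("int", 0, "static"),
    "tmp": Binding("bool", 1),
}

# A later phase can consult whichever field it needs.
globals_only = [name for name, b in sigma.items() if b.scope_level == 0]
```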

  27. Implementation
      • Must be efficient!
        • lots of variables, functions, etc.
      • Two basic approaches:
        • Functional
          • the symbol table is implemented as a functional data structure (e.g., a red-black tree), with no table ever destroyed or modified
        • Imperative
          • a single table, modified for every binding added or removed
      • This choice is largely independent of the implementation language

  28. Functional Symbol Table
      • Basic idea:
        • when implementing σ2 = σ1 + {x:t}, create a new table σ2 instead of modifying σ1
        • when deleting, restore the old table
      • A good data structure for this is a BST or a red-black tree

  29. BST Symbol Table [Figure: an old and a new BST symbol table; node labels: c: int, c: int, e: int, a: char, b: double]

  30. Possible Functional Interface
      signature SYMBOL_TABLE =
      sig
        type 'a t
        type key
        val empty: 'a t
        val insert: 'a t * key * 'a -> 'a t
        val lookup: 'a t * key -> 'a option
      end
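
A minimal sketch of this functional interface, in Python, using a plain unbalanced binary search tree (a red-black tree would add rebalancing on top of the same idea). The `Node` type and the use of `None` for both the empty table and a failed lookup are assumptions of the sketch.

```python
from typing import Any, NamedTuple, Optional

class Node(NamedTuple):
    key: str
    value: Any
    left: Optional["Node"]
    right: Optional["Node"]

empty = None  # the empty table

def insert(t, key, value):
    """Return a new table; only the nodes on the search path are copied."""
    if t is None:
        return Node(key, value, None, None)
    if key < t.key:
        return t._replace(left=insert(t.left, key, value))
    if key > t.key:
        return t._replace(right=insert(t.right, key, value))
    return t._replace(value=value)

def lookup(t, key):
    """Return the bound value, or None if key is unbound."""
    while t is not None:
        if key == t.key:
            return t.value
        t = t.left if key < t.key else t.right
    return None

s1 = insert(insert(insert(empty, "c", "int"), "a", "char"), "e", "int")
s2 = insert(s1, "b", "double")   # s1 is still valid and unchanged
```

Because `insert` never mutates a node, "deleting" the binding for `b` is just a matter of going back to `s1`. The two tables share every subtree that the insertion did not touch.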

  31. Imperative Symbol Tables
      • The imperative approach almost always involves the use of hash tables
      • Need to delete entries to revert to the previous environment
        • made simpler because deletes follow a stack discipline
        • can maintain a stack of entered symbols, so that they can later be popped and removed from the hash table

  32. Possible Imperative Interface
      signature SYMBOL_TABLE =
      sig
        type 'a t
        type key
        val insert: 'a t * key * 'a -> unit
        val lookup: 'a t * key -> 'a option
        val delete: 'a t * key -> unit
        val beginScope: unit -> unit
        val endScope: unit -> unit
      end
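
One way the imperative approach could look in Python is sketched below: a single hash table (a dict) plus a stack of entered names. The class name, the method names, and the use of `None` as a scope marker are assumptions of this sketch, and the explicit `delete` operation of the interface is folded into `end_scope`.

```python
class SymbolTable:
    def __init__(self):
        self.table = {}    # name -> list of bindings, innermost last
        self.entered = []  # stack of entered names; None marks a scope start

    def begin_scope(self):
        self.entered.append(None)

    def insert(self, key, value):
        self.table.setdefault(key, []).append(value)
        self.entered.append(key)

    def lookup(self, key):
        bindings = self.table.get(key)
        return bindings[-1] if bindings else None

    def end_scope(self):
        # Pop the names entered since the matching begin_scope and unbind them.
        while self.entered:
            key = self.entered.pop()
            if key is None:
                return
            self.table[key].pop()
            if not self.table[key]:
                del self.table[key]

st = SymbolTable()
st.insert("x", "int")
st.begin_scope()
st.insert("x", "bool")      # shadows the outer x
inner = st.lookup("x")
st.end_scope()              # the outer x is visible again
outer = st.lookup("x")
```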

  33. Implementation of Symbols
      • For several reasons, it will be useful at some point to represent symbols as elements of a small, densely packed set of identities
        • fast comparisons (equality)
        • for dataflow analysis, we will want sets of variables and fast set operations
      • It will be critically important to use bit strings to represent the sets
        • for example, in your liveness analysis algorithm
      • More on this later
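
The idea can be sketched as follows: intern each name to a small integer, then represent a set of symbols as an integer whose bit i is set when symbol i is a member. `SymbolPool`, `bitset`, and `members` are hypothetical names for this sketch, and Python's arbitrary-precision integers stand in for bit strings.

```python
class SymbolPool:
    def __init__(self):
        self.ids = {}     # name -> small integer
        self.names = []   # small integer -> name

    def intern(self, name):
        """Return the dense integer identity of name, allocating it if new."""
        if name not in self.ids:
            self.ids[name] = len(self.names)
            self.names.append(name)
        return self.ids[name]

def bitset(symbols):
    """Pack an iterable of symbol ids into one integer."""
    bits = 0
    for s in symbols:
        bits |= 1 << s
    return bits

def members(bits, pool):
    """List the names whose bits are set, in id order."""
    return [pool.names[i] for i in range(len(pool.names)) if (bits >> i) & 1]

pool = SymbolPool()
x, y, z = (pool.intern(n) for n in "xyz")
live_a = bitset([x, y])
live_b = bitset([y, z])
union = live_a | live_b                  # set union is one bitwise OR
intersection = live_a & live_b           # set intersection is one bitwise AND
without_x = live_a & ~bitset([x])        # set difference
```

Symbol equality becomes an integer comparison, and the set operations a dataflow analysis needs become single bitwise operations.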

  34. Scope
      • How to handle lexical scope?
      • Many choices:
        • One table: insert and remove bindings during elaboration, as we enter and leave a local scope
        • Stack of tables: insertion and removal always operate on the stack top
          • the dragon compiler makes use of this

  35. One-table approach
      int x;                σ = {x:int}
      int f ()              σ1 = σ + {f:…} = {x:int, f:…}
      {
        if (4) {
          int x;            σ2 = σ1 + {x:int} = {x:…, f:…, x:…}
          x = 6;
        }                   σ1
        else {
          int x;            σ4 = σ1 + {x:int} = {x:…, f:…, x:…}
          x = 5;
        }                   σ1
        x = 8;
      }                     σ1
      Shadowing: “+” is not commutative!
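
The shadowing behaviour of “+” can be tried out with Python's standard `collections.ChainMap`, which looks a key up in the newest layer first. This is only an illustration of the semantics of “+”, not the slides' implementation; the `extend` helper and the descriptive strings stored as bindings are assumptions of the sketch.

```python
from collections import ChainMap

def extend(sigma, name, info):
    """sigma + {name: info}: a new table in which the newest binding wins."""
    return sigma.new_child({name: info})

sigma = ChainMap({"x": "int (global)"})
sigma1 = extend(sigma, "f", "function")
sigma2 = extend(sigma1, "x", "int (then-branch)")   # shadows the global x
sigma4 = extend(sigma1, "x", "int (else-branch)")

# Leaving a block simply means continuing with sigma1, which was never modified.
after_block = sigma1["x"]

# "+" is not commutative: the binding added last is the one lookup sees.
ab = extend(extend(sigma, "v", "A"), "v", "B")
ba = extend(extend(sigma, "v", "B"), "v", "A")
```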

  36. Name Space
      struct list {
        int x;
        struct list *list;
      } *list;

      void walk (struct list *list)
      {
      list:
        printf ("%d\n", list->x);
        if (list = list->list)
          goto list;
      }

  37. Name Space
      • It’s trivial to handle name spaces
        • one symbol table for each name space
      • Take C as an example:
        • Several different name spaces
          • labels
          • tags
          • variables
      • So …
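
A minimal sketch of "one symbol table for each name space", using the three name spaces listed above. The `tables` dict, the `declare` helper, and the strings stored as bindings are hypothetical; scopes are ignored here to keep the example short.

```python
# One independent table per name space.
tables = {"labels": {}, "tags": {}, "variables": {}}

def declare(space, name, info):
    """Enter name into the given name space; clashes are per name space only."""
    if name in tables[space]:
        raise KeyError(f"duplicate {space[:-1]}: {name}")
    tables[space][name] = info

# The identifier "list" is used as a struct tag, a variable, and a label.
declare("tags", "list", "struct list")
declare("variables", "list", "struct list *")
declare("labels", "list", "label in walk")
```

The three declarations do not conflict because each lookup or insertion names the table it goes to; declaring a second tag called `list` would still be rejected.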

  38. Types
      • The representation of types is highly language-dependent
      • Some key considerations:
        • name vs. structural equivalence
        • mutually recursive type definitions
        • error handling

  39. Name vs. Structural Equivalence
      struct A { int i; } x;
      struct B { int i; } y;
      x = y;
      • In a language with structural equivalence, this program is legal
      • But not in a language with name equivalence (e.g., C)
      • For name equivalence, we can generate a unique symbol for each defined type
      • For structural equivalence, we need to compare the types recursively
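
The two notions of equivalence can be compared side by side in a short sketch. It assumes a hypothetical `Struct` representation in which every definition receives a fresh `uid`, treats two structs as structurally equal when their field names and field types match in order, and assumes the types are not recursive (a recursive type would need a visited set).

```python
import itertools
from dataclasses import dataclass, field

_next_id = itertools.count()

@dataclass(frozen=True)
class Struct:
    tag: str
    fields: tuple   # ((field name, field type), ...); a type is "int" or a Struct
    uid: int = field(default_factory=lambda: next(_next_id))  # unique per definition

def name_equal(t1, t2):
    """Name equivalence: two struct types are equal only if they are the same definition."""
    if isinstance(t1, Struct) and isinstance(t2, Struct):
        return t1.uid == t2.uid
    return t1 == t2

def struct_equal(t1, t2):
    """Structural equivalence: compare the two types recursively, field by field."""
    if isinstance(t1, Struct) and isinstance(t2, Struct):
        return (len(t1.fields) == len(t2.fields) and
                all(n1 == n2 and struct_equal(f1, f2)
                    for (n1, f1), (n2, f2) in zip(t1.fields, t2.fields)))
    return t1 == t2

A = Struct("A", (("i", "int"),))   # struct A { int i; }
B = Struct("B", (("i", "int"),))   # struct B { int i; }
```

With these definitions, the assignment `x = y` from the slide is accepted under `struct_equal(A, B)` and rejected under `name_equal(A, B)`.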

  40. Mutually recursive type definitions
      • To process recursive and mutually recursive type definitions, we need a placeholder
        • in ML, an option ref
        • in C, a pointer
        • in Java, a bind method (read Appel)
      struct A {
        int data;
        struct A *next;
        struct B *b;
      };
      struct B {…};
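
The placeholder technique can be sketched as two passes: first enter an empty placeholder for every struct tag, then fill in the fields, resolving each pointer to the placeholder of its target. The `StructType` class, the `elab_structs` function, and the string encoding of field types are assumptions of this sketch. The slide elides the body of `struct B`, so the single field given to it below is purely hypothetical.

```python
class StructType:
    def __init__(self, tag):
        self.tag = tag
        self.fields = None   # placeholder: filled in by the second pass

def elab_structs(defs):
    """defs maps a tag to a list of (field name, type name) pairs, where a
    type name is "int" or "struct T *" for some tag T."""
    env = {tag: StructType(tag) for tag in defs}     # pass 1: placeholders
    for tag, fields in defs.items():                 # pass 2: fill in fields
        resolved = {}
        for fname, tname in fields:
            if tname == "int":
                resolved[fname] = "int"
            else:
                target = tname.split()[1]            # "struct A *" -> "A"
                if target not in env:
                    raise KeyError(f"undefined struct: {target}")
                resolved[fname] = ("ptr", env[target])
        env[tag].fields = resolved
    return env

env = elab_structs({
    "A": [("data", "int"), ("next", "struct A *"), ("b", "struct B *")],
    "B": [("a", "struct A *")],   # hypothetical body; the slide elides it
})
```

Since every tag already has an entry before any field is examined, the forward reference from `A` to `B` and the self-reference in `A` both resolve without special cases.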

  41. Error Diagnostic
      • To recover from errors, it is useful to have an “any” type
        • makes it possible to continue type-checking
        • in practice, use “int” or guess one
      • Similarly, a “void” type can be used for expressions that return no value
      • Source locations are annotated in the AST!
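
A sketch of how an “any” type lets checking continue after an error, restricted to numbers, identifiers, and “+”. The tuple AST encoding, the `ANY` constant, the global `errors` list, and the `expect` helper are assumptions of this sketch; source locations are left out.

```python
ANY = "any"
errors = []

def expect(actual, wanted, message):
    """Record a mismatch, unless the operand already failed (ANY fits everything)."""
    if actual != wanted and actual != ANY:
        errors.append(message)

def elab_exp(sigma, e):
    tag = e[0]
    if tag == "num":
        return "int"
    if tag == "id":
        if e[1] not in sigma:
            errors.append(f"variable not declared: {e[1]}")
            return ANY            # recover: do not report this operand again
        return sigma[e[1]]
    if tag == "add":
        expect(elab_exp(sigma, e[1]), "int", "e1 should be int")
        expect(elab_exp(sigma, e[2]), "int", "e2 should be int")
        return "int"              # the result type is known even after an error
    raise ValueError(f"unknown expression: {tag}")

# x + (y + b), where x is undeclared and b is a bool.
sigma = {"y": "int", "b": "bool"}
result = elab_exp(sigma, ("add", ("id", "x"), ("add", ("id", "y"), ("id", "b"))))
```

Both problems are collected in one pass, and the undeclared `x` produces a single message instead of a second, misleading "should be int" complaint.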

  42. Summary
      • Elaboration checks the context-sensitive properties of programs
        • must take care of the semantics of source programs
        • and may translate them into lower-level forms
      • Usually the largest (most complex) part of a compiler!
