1 / 38

Abstract Syntax Trees

Abstract Syntax Trees. Compiler Baojian Hua bjhua@ustc.edu.cn. Front End. lexical analyzer. source code. tokens. abstract syntax tree. parser. semantic analyzer. IR. Recap. Lexer Program source to token sequence Parser token sequence, and answer Y or N Today’s topic:

tglover
Download Presentation

Abstract Syntax Trees

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Abstract Syntax Trees Compiler Baojian Hua bjhua@ustc.edu.cn

  2. Front End lexical analyzer source code tokens abstract syntax tree parser semantic analyzer IR

  3. Recap • Lexer • Program source to token sequence • Parser • token sequence, and answer Y or N • Today’s topic: • abstract syntax trees

  4. E E * E 15 ( E ) E + E 3 4 Abstract Syntax Trees • Parse trees encodes the grammatical structure of the source program • However, they contain a lot of unnecessary information • What are essential here?

  5. E E * E 15 ( E ) E + E 3 4 Abstract Syntax Trees • For the compiler to understand an expression, it only need to know operators and operands • punctuations, parentheses, etc. are not needed • Similar for statements, functions, etc.

  6. E E * E 15 ( E ) E + E 3 4 Abstract Syntax Trees * 15 + 3 4 Parse tree Abstract syntax tree

  7. Concrete vs Abstract Syntax • Concrete Syntax is needed for parsing • includes punctuation symbols, factoring, elimination of left recursion, depends on the format of the input • Abstract Syntax is a simpler, more convenient internal representation • clean interface between the parser and the later phases of the compiler

  8. S E E + T T T * F x F F 2 3 Concrete and Abstract Syntax 2 + 3 * x E ::= E + T | T T ::= T * F | F F ::= id | num | ( E )

  9. Concrete and Abstract Syntax 2 + 3 * x E ::= id | num | E + E | E * E | ( E ) + 2 * 3 x

  10. AST Data Structures • In the compiler, abstract syntax makes use of the implementation language to represent aspects of the grammatical structure • Highly target and implementation languages dependent • art more than science

  11. AST in C

  12. AST for “exp” in C /* data structures */ typedef struct E *E; enum kind {ID, INT, ADD, TIMES}; struct E { enum kind kind; union { char *id; int num; struct {E e1; E e2;} add; struct {E e1; E e2;} times; } u; }; // This technique is called tagged-union. E -> id | num | E + E | E * E | ( E )

  13. AST in C /* sample program “2+3*x” */ E e1 = malloc (sizeof (*e1)); e1->kind = INT; e1->u.num = 3; E e2 = malloc (sizeof (*e2)); e2->kind = ID; e2->u.id = “x”; E e3 = malloc (sizeof (*e3)); e3->kind = TIMES; e3->u.times.e1 = e1; e2->u.times.e2 = e2; … /* boring and error-prone :-( */ E ::= id | num | E + E | E * E | ( E )

  14. AST for “stm” in C /* data structures */ typedef List<S> SS; typedef struct S *S; enum kind {ASSIGN, PRINT}; struct S { enum kind kind; union { struct {char *id; E e;} assign; E print; } u; }; (* to encode “x:=3; print(x) *) prog = …; // leave to you… SS -> S | SS ; S S -> id := E | print (E)

  15. Operations are tree-walkings (* pretty printing *) int pp (E e){ switch (e->kind) { case INT: printf (“%d”, e->u.num); return; case ID: printf (“%s”, e->u.id); return; case ADD: printf (“(“); pp (e->u.add.e1); printf (“)”); printf (“ + “); printf (“(“); pp (e->u.add.e2); printf (“)”); return; case TIMES: /* similar */ default: error (“compiler bug”); } }

  16. Operations are tree-walkings (* number of nodes in an AST *) int numNodes (E e) { switch (e->kind) { case INT: return 1; case ID: return 1; case ADD: case TIMES: return 1 + numNodes (e->u.add.e1) + numNodes (e->u.add.e2); default: error (“compiler bug”); } } C compiler is stupid!

  17. AST in F#

  18. AST for “exp” in F# (* data structures *) type E = Int of int | Id of string | Add of E * E | Times of E * E E ::= id | num | E + E | E * E | ( E ) (* to encode “2+3*x” *) val prog = Add (Int 2 , Times (Int 3 , Id “x”)) (* Easy and happy! *)

  19. AST for “stm” in SML /* data structures */ type SS = S list type S = Assign of string * exp | Print of exp (* to encode “x:=3; print(x)” *) val prog = [Assign (“x”, 3), Print (“x”)] SS -> S | SS ; S S -> x := E | print (E)

  20. AST in F# (* number of nodes *) let rec numNodes e = match e with Int _ => 1 | Id _ => 1 | Add (e1, e2) => 1 + (numNodes e1) + (numNodes e2) | Times (e1, e2) => 1 + (numNodes e1) + (numNodes e2) (* Note this may be too inefficient, why? *)

  21. AST in F# (* tail-recursion using caching *) let rec numNodes (e, n) = match e with Int _ => 1 + n | Id _ => 1 + n | Add (e1, e2) => let val n’ = numNodes (e1, n) in numNodes (e2, 1+n’) end | Times (e1, e2) => …(*similar)

  22. AST in F# (* yet another version using reference *) val nodes = ref 0; val op ++ = fn x => x := !x + 1 let rec numNodes e = match e with Int _ => ++ nodes | Id _ => ++ nodes | Add (e1, e2) => (numNodes e1 ; ++ nodes ; numNodes e2) ) | Times (e1, e2) => … (* similar *)

  23. AST in Java

  24. AST for “exp” in Java /* data structures */ abstract class Exp {} class IntExp extends Exp { int i; IntExp (int i){ this.i = i; } } // contructors omitted from the following classes class IdExp extends Exp {String id;} class AddExp extends Exp {Exp e1; Exp e2;} class TimesExp extends Exp {Exp e1; Exp e2;} E ::= id | num | E + E | E * E | ( E )

  25. Local Class Hierarchy Exp E ::= id | num | E + E | E * E | ( E ) IntExp IdExp AddExp TimesExp /* to encode “2+3*x” */ Exp e = new AddExp(new IntExp (2) , new TimesExp (new IntExp (3) , new IdExp (“x”))) /* Not so ugly as that in C, but still boring */

  26. AST for “stm” in Java /* data structures */ class Stms{ LinkedList<Stm> stms; } class Stm{} class AssignStm extends Stm{ String x; Exp e; } class PrintStm extends Stm {Exp e;} (* to encode “x:=3; print(x)” *) val prog = LinkedList.addAll(new AssignStm (…) , new PrintStm(…)); stms -> stm | stms ; stm stm -> x := e | print (e)

  27. AST in Java (* number of nodes again *) int numNodes (Exp e) { if (e instanceof IntExp) return 1; else if (e instanceof IdExp) return 1; else if (e instanceof AddExp) { AddExp f = (Add)e; return 1 + numNodes(f.e1) + numNodes(f.e2); } … } But this break the modularity of Java. A better way is to use the so-called visitor pattern. Read Tiger chap 4 and do lab 2.

  28. How to construct AST automatically?

  29. AST Generations • Attribute-grammar scheme • each nonterminal may have a semantic value v associated with it • when the parser recognizes rule X -> s1 … sn • a semantic action will be executed • uses semantic values from symbols in si • when parsing completes successfully • parser returns semantic value associated with the start symbol • usually an abstract syntax tree

  30. AST Generations in tools • In a top-down parser, ASTs are returned (recursively) as values • Yacc-like tools encode this strategy in semantics actions

  31. List<S> S E /* AST generation in recursive decedent parser */ List<S> parse_stms () = List<S> list = new List<S>(); S stm = parse_stm (); list.addLast(stm); while (current_token == ;) eat (;); stm = parse_stm (); list.addLast (stm); return list; SS -> S | S ; SS S -> id := E | print (E) E -> id | num | E + E | E * E

  32. List<S> S E /* AST generation in recursive decedent parser */ S parse_stm () = switch (current_token) case ID: String x = current_token; // remember the “x” eat(ASSIGN); E exp = parse_exp (); S stm = new AssignStm (x, exp); return stm; case PRINT: eat(PRINT); eat (LPAREN); E exp = parse_exp (); eat (RPAREN); S stm = new PrintStm (exp); return stm; SS -> S | S ; SS S -> id := E | print (E) E -> id | num | E + E | E * E

  33. List<S> S E /* AST generation in recursive decedent parser */ E parse_addexp () = E e1 = parse_timesexp (); while (current_token == +) eat (+); E e2 = parse_timesexp (); E e3 = new AddExp (e1, e2); return e3; E parse_timesexp () = E e1 = parse_atom (); while (current_token == *) eat (*); E e2 = parse_atom (); E e3 = new AddExp (e1, e2); return e3; SS -> S | S ; SS S -> id := E | print (E) E -> id | num | E + E | E * E

  34. List<S> S E /* AST generation in recursive decedent parser */ E parse_atom () = switch (current_token) case ID: return new IdExp (current_token); case NUM: return new NumExp (current_token); default: error (“want ID or NUM”, but got …); SS -> S | S ; SS S -> id := E | print (E) E -> id | num | E + E | E * E

  35. E AST generation in LR parser 2 F T E E + E + 3 E + F E + T 2 + 3 * 4 + 3 * 4 + 3 * 4 + 3 * 4 + 3 * 4 3 * 4 * 4  * 4  * 4 S + * 2 + T E 3 T * F 2 T 4 Each nonterminal is associated with a tree. 3 F 4 2 F 4 3 2 3 2

  36. AST generation in LR parser e -> e PLUS e ($$ = Add ($1, $3)) | e TIMES e ($$ = Times ($1, $3)) | ID ($$ = Id ($1)) | NUM ($$ = Num ($1))

  37. Source Position • In one-pass compiler, error messages are precise • early compilers never worry about this • But in a multi-pass compiler, source positions must be stored in AST for later use (* Example *) class AddExp{ Exp left; Exp right; int lineNum; int columnNum; }

  38. Summary • Abstract syntax trees are compiler internal data structures of source programs • interface between front-end and compiler later parts • Abstract syntax trees design is language-dependent • more art than science

More Related