1 / 109

Abstract Syntax

Abstract Syntax. CMSC CS431. Abstract Syntax Trees. So far a parser traces the derivation of a sequence of tokens The rest of the compiler needs a structural representation of the program Abstract syntax trees Like parse trees but ignore some details Abbreviated as AST.

hera
Download Presentation

Abstract Syntax

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Abstract Syntax CMSC CS431

  2. Abstract Syntax Trees • So far a parser traces the derivation of a sequence of tokens • The rest of the compiler needs a structural representation of the program • Abstract syntax trees • Like parse trees but ignore some details • Abbreviated as AST

  3. Abstract Syntax Tree. (Cont.) • Consider the grammar E  int | ( E ) | E + E • And the string 5 + (2 + 3) • After lexical analysis (a list of tokens) int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’ • During parsing we build a parse tree …

  4. Evaluation of semantic rules • Parse-tree methods (compile time) • Build a parse tree for each input • Build a dependency graph from the parse tree • Obtain evaluation order from a topological order of the dependency graph • Rule-based methods (compiler-construction time) • Predetermine the order of attribute evaluation for each production • Oblivious methods (compiler-construction time) • Evaluation order is independent of semantic rules • Evaluation order forced by parsing methods • Restrictive in acceptable attribute definitions

  5. Compile-time semantic evaluation Source Program Lexical Analyzer Program input Tokens Syntax Analyzer Parse tree / Abstract syntax tree Semantic Analyzer Results interpreters compilers Intermediate Code Generator Attributed AST Code Optimizer Code Generator Target Program

  6. Abstract Syntax vs. Concrete Syntax • Concrete syntax: the syntax programmers write • Example: different notations of expressions • Prefix + 5 * 15 20 • Infix 5 + 15 * 20 • Postfix 5 15 20 * + • Abstract syntax: the syntax recognized by compilers • Identifies only the meaningful components • The operation • The components of the operation e Parse Tree for 5+15*20 Abstract Syntax Tree for 5 + 15 * 20 e e + + e * e 5 * 5 20 20 15 15

  7. Abstract syntax trees • Condensed form of parse tree for representing language constructs • Operators and keywords do not appear as leaves • They define the meaning of the interior (parent) node • Chains of single productions may be collapsed S If-then-else B THEN S1 ELSE S2 IF B S1 S2 E + + T E 3 5 5 T 3

  8. Constructing AST • Use syntax-directed definitions • Problem: construct an AST for each expression • Attribute grammar approach • Associate each non-terminal with an AST • Each AST: a pointer to a node in AST E.nptr T.nptr • Definitions: how to compute attribute? • Bottom-up: synthesized attribute if we know the AST of each child, how to compute the AST of the parent? Grammar: E ::= E + T | E – T | T T ::= (E) | id | num

  9. Constructing AST for expressions (by Hand) • Associate each non-terminal with an AST • E.nptr, T.nptr: a pointer to ASTtree • Synthesized attribute definition: • If we know the AST of each child, how to compute the AST of the parent?

  10. Example: constructing AST Bottom-up parsing: evaluate attribute at each reduction 1. reduce 5 to T1 using T::=num: T1.nptr = leaf(5) 2. reduce T1 to E1 using E::=T: E1.nptr = T1.nptr = leaf(5) 3. reduce 15 to T2 using T::=num: T2.nptr=leaf(15) 4. reduce T2 to E2 using E::=T: E2.nptr=T2.nptr = leaf(15) 5. reduce b to T3 using T::=num: T3.nptr=leaf(b) 6. reduce E2-T3 to E3 using E::=E-T: E3.nptr=node(‘-’,leaf(15),leaf(b)) 7. reduce (E3) to T4 using T::=(E): T4.nptr=node(‘-’,leaf(15),leaf(b)) 8. reduce E1+T4 to E5 using E::=E+T: E5.nptr=node(‘+’,leaf(5), node(‘-’,leaf(15),leaf(b))) Parse tree for 5+(15-b) E5 E1 + T4 T1 ( E3 ) E2 - T3 5 T2 b 15

  11. Implementing AST in C E ::= E + T | E – T | T T ::= (E) | id | num • Define different kinds of AST nodes • typedef enum {PLUS, MINUS, ID, NUM} ASTNodeTag; • Define AST node typedef struct ASTnode { AstNodeTag kind; union { symbol_table_entry* id_entry; int num_value; struct ASTnode* opds[2]; } description; }; • Define AST node construction routines • ASTnode* mkleaf_id(symbol_table_entry* e); • ASTnode* mkleaf_num(int n); • ASTnode* mknode_plus(struct ASTnode* opd1, struct ASTNode* opd2); • ASTnode* mknode_minus(struct ASTnode* opd1, struct ASTNode* opd2); Grammar:

  12. Implementing AST in Java E ::= E + T | E – T | T T ::= (E) | id | num • Define different kinds of AST nodes • typedef enum {PLUS, MINUS, ID, NUM} ASTNodeTag; • Define AST node class ASTexpression { public ASTNodeTag kind(); }; class ASTidentifier inherit ASTexpression { private symbol_table_entry* id_entry; … } class ASTvalue inherit ASTexpression { private int num_value; … } class ASTplus inherit ASTexpression { private ASTnode* opds[2]; … } Class ASTminus inherit ASTexpression { private ASTnode* opds[2]; ... } • Define AST node construction routines • ASTexpression* mkleaf_id(symbol_table_entry* e); • ASTexpression* mkleaf_num(int n); • ASTexpression* mknode_plus(struct ASTnode* opd1, struct ASTNode* opd2); • ASTexpression* mknode_minus(struct ASTnode* opd1, struct ASTNode* opd2); Grammar:

  13. More ASTs Abstract syntax: S::= if-else E S S | while E S | E | ε E::= var | num | true | false | E bop E | uop E bop ::= < | <= | > | >= | && | = | + | * | …. uop ::= - | * | & | … class ASTstmt {…} class ASTifElse inherit ASTstmt { private ASTexpr* cond; private ASTstmt* tbranch; private ASTstmt* fbranch; …} class ASTwhile inherit ASTstmt { private ASTexpr* cond; private ASTstmt* body;…} class ASTexpr inherit ASTstmt {…} class ASTvar inherit ASTexpr {…} Abstract syntax tree if-else < while a b = < * b a 100 2 a

  14. AST Covered • We built AST by hand in the 1st Project • Lets start to see about an automated approach. • Lets see what the Galles text (see slide 15-42) comes to say about AST • Later we will look at some code

  15. Semantic Actions: An Example • Consider the grammar E  int | E + E | ( E ) • For each symbol X define an attribute X.val • For terminals, val is the associated lexeme • For non-terminals, val is the expression’s value (and is computed from values of subexpressions) • We annotate the grammar with actions: E  int { E.val = int.val } | E1 + E2 { E.val = E1.val + E2.val } | ( E1 ) { E.val = E1.val }

  16. Semantic Actions: An Example (Cont.) • String: 5 + (2 + 3) • Tokens: int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’ Productions Equations E  E1 + E2 E.val = E1.val + E2.val E1 int5 E1.val = int5.val = 5 E2 ( E3) E2.val = E3.val E3 E4 + E5 E3.val = E4.val + E5.val E4  int2 E4.val = int2.val = 2 E5 int3 E5.val = int3.val = 3

  17. Semantic Actions: Notes • Semantic actions specify a system of equations • Order of resolution is not specified • Example: E3.val = E4.val + E5.val • Must compute E4.val and E5.val before E3.val • We say that E3.val depends on E4.val and E5.val • The parser must find the order of evaluation

  18. Dependency Graph + E • Each node labeled E has one slot for the val attribute • Note the dependencies E2 E1 + int5 5 ( E3 + ) + E4 E5 2 int2 int3 3

  19. Evaluating Attributes • An attribute must be computed after all its successors in the dependency graph have been computed • In previous example attributes can be computed bottom-up • Such an order exists when there are no cycles • Cyclically defined attributes are not legal

  20. Semantic Actions: Notes (Cont.) • Synthesized attributes • Calculated from attributes of descendents in the parse tree • E.val is a synthesized attribute • Can always be calculated in a bottom-up order • Grammars with only synthesized attributes are called S-attributed grammars • Most frequent kinds of grammars

  21. Semantic Actions :Top-down Approach • Recursive-descent interpreter • Consider this grammar S -> E $ E -> T E’ E’-> +T E’ E’ -> - T E’ E-> T -> F T’ T’ -> * F T’ T’ -> / F T’ T’ -> F -> id F -> num F -> ( E ) • Needs “type” of non-terminals and tokens

  22. Recursive-descent interpreter int T() { switch (tok.kind) { case ID: case NUM: case LPAREN return Tprime( F() ); default:print(“expected ID, NUM, or left-paren”); skipto(T_follow); return 0; }} int Tprime(int a) {switch (tok.kind) { case TIMES: eat(TIMES); return Tprime(a*F()); case DIVIDE: eat(DIVIDE); return Tprime(a/F()); case PLUS: case MINUS: case RPAREN: case EOF: return a; default: /* error handling */ …… }}

  23. JavaCC version • Grammar S -> E $ E -> T ( + T | - T)* T -> F ( * F | - F)* F -> id | num | ( E ) Note: E –> T E’ E’ -> + T E’ | - T E’ | e

  24. JavaCC version – void Start() : { int i; } { i=Exp() <EOF> {System.out.println(i); } } int Exp() : { int a, i; } { a=Term() ( “+” i=Term() { a=a+i; } | “-” i=Term() { a=a+i; } )* { return a; } } Int Factor() : { Token t; int i; } { t = <IDENTIFIER > {return lookup(t.image); } | t=<INTEGER_LITERAL> {return Integer.parseInt(t.image);} | “(“ i=Exp() “)” {return i; } }

  25. Semantic Actions – Reduce and Shift • We can now illustrate how semantic actions are implemented for LR parsing • Keep attributes on the stack • On shift a, push attribute for a on stack • On reduce X ® a • pop attributes for a • compute attribute for X • and push it on the stack

  26. Performing Semantic Actions. Example • Recall the example from previous lecture E ® T + E1 { E.val = T.val + E1.val } | T { E.val = T.val } T ® int * T1 { T.val = int.val * T1.val } | int { T.val = int.val } • Consider the parsing of the string 3 * 5 + 8

  27. Performing Semantic Actions. Example |int * int + int shift int3| * int + int shift int3 * | int + int shift int3 * int5| + int reduce T ® int int3 * T5| + int reduce T ® int * T T15| + int shift T15 + | int shift T15 + int8| reduce T ® int T15 + T8|reduce E ® T T15 + E8|reduce E ® T + E E23|accept

  28. Inherited Attributes • Another kind of attribute • Calculated from attributes of parent and/or siblings in the parse tree • Example: a line calculator

  29. A Line Calculator • Each line contains an expression E  int | E + E • Each line is terminated with the = sign L  E = | + E = • In second form the value of previous line is used as starting value • A program is a sequence of lines P   | P L

  30. Attributes for the Line Calculator • Each E has a synthesized attribute val • Calculated as before • Each L has a synthesized attribute val L  E = { L.val = E.val } | + E = { L.val = E.val + L.prev } • We need the value of the previous line • We use an inherited attributeL.prev

  31. Attributes for the Line Calculator (Cont.) • Each P has a synthesized attribute val • The value of its last line P   { P.val = 0 } | P1 L { P.val = L.val; L.prev = P1.val } • Each L has an inherited attribute prev • L.prev is inherited from sibling P1.val • Example …

  32. Example of Inherited Attributes P • val synthesized • prev inherited • All can be computed in depth-first order + L P = + E3 +  0 + E4 E5 2 int2 int3 3

  33. Semantic Actions: Notes (Cont.) • Semantic actions can be used to build ASTs • And many other things as well • Also used for type checking, code generation, … • Process is called syntax-directed translation • Substantial generalization over CFGs

  34. Constructing An AST • We first define the AST data type • Supplied by us for the project • Consider an abstract tree type with two constructors: n mkleaf(n) = PLUS mkplus( , ) = T1 T2 T1 T2

  35. Constructing a Parse Tree • We define a synthesized attribute ast • Values of ast values are ASTs • We assume that int.lexval is the value of the integer lexeme • Computed using semantic actions E  int E.ast = mkleaf(int.lexval) | E1 + E2 E.ast = mkplus(E1.ast, E2.ast) | ( E1 ) E.ast = E1.ast

  36. PLUS PLUS 5 2 3 Parse Tree Example • Consider the string int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’ • A bottom-up evaluation of the ast attribute: E.ast = mkplus(mkleaf(5), mkplus(mkleaf(2), mkleaf(3))

  37. Review • We can specify language syntax using CFG • A parser will answer whether s  L(G) • … and will build a parse tree • … which we convert to an AST • … and pass on to the rest of the compiler

  38. Abtract Parse Trees : Expression Grammar Abstract Syntax E -> E + E E -> E – E E -> E * E E -> E / E E -> id E -> num

  39. AST : Node types public abstract class Exp { public abstract int eval(): } public class PlusExp extends Exp { private Exp e1, e2; public PlusExp(Exp a1, Exp a2) { e1=a1; d2=a2; } public int eval() { return e1.eval()+e2.eval(): } } public class Identifier extends Exp {private String f0; public Indenfifier(String n0) { f0 = n0; } public int eval() { return lookup(f0); } } public class IntegerLiteral extends Exp {private String f0; public IntegerLiteral(String n0) { f0 = n0; } public int eval() { return Integer.parseInt(f0); } }

  40. JavaCC Example for AST construction Exp Start() : { Exp e; } { e=Exp() { return e; }} Exp Exp() : { Exp e1, e2; } { e1=Term() ( “+” e2=Term() { e1=new PlusExp(e1,e2); } | “-” e2=Term() { e1=new MinusExp(e1,e2); } )* { return a; } } Exp Factor() : { Token t; Exp e; } { t = <IDENTIFIER > {return new Identifier(t.image); } | t=<INTEGER_LITERAL> {return new IntegerLiteral(t.image);} | “(“ e=Exp() “)” {return e; } }

  41. Positions • Must remember the position in the source file • Lexical analysis, parsing and semantic analysis are not done simultaneously. • Necessary for error reporting • AST must keep the pos fields, which indicate the position within the original source file. • Lexer must pass the information to the parser. • Ast node constructors must be augmented to init the pos fields.

  42. JavaCC : Class Token • Each Token object has the following fields: • int kind; • int beginLine, beginColumn, endLine, endColumn; • String image; • Token next; • Token specialToken; • static final Token newToken(int ofKind); • Unfortunately, ….

  43. Visitors • “syntax separate from interpretation “style of programming • Vs. object-oriented style of programming • “Visitor pattern” • Visitor implements an interpretation. • Visitor object contains a visit method for each syntax-tree class. • Syntax-tree classes contain “accept” methods. • Visitor calls “accept”(what is your class?). Then “accept” calls the “visit” of the visitor.

  44. Example :Expression Classes public abstract class Exp { public abstract int accept(Visitor v): } public class PlusExp extends Exp { private Exp e1, e2; public PlusExp(Exp a1, Exp a2) { e1=a1; d2=a2; } public int accept(Visitor v) { return v.visit(this) ; } } public class Identifier extends Exp {private String f0; public Indenfifier(String n0) { f0 = n0; } public int accept(Visitor v) { return v.visit(this) ; } } public class IntegerLiteral extends Exp {private String f0; public IntegerLiteral(String n0) { f0 = n0; } public int accept(Visitor v) { return v.visit(this) ; } }

  45. An interpreter visitor public interface Visitor { public int visit(PlusExp n); public int visit(Identifier n); public int visit(IntegerLiteral n); } public class Interpreter implements Visitor { public int visit(PlusExp n) { return n.e1.accept(this) + n.e2.accept(this); } public int visit(Identifier n) { return looup(n.f0); } public int visit(IntegerLiteral n) { return Integer.parseInt(n.f0); }

  46. Abstract Syntax for MiniJava (I) Package syntaxtree; Program(MainClass m, ClassDecList c1) MainClass(Identifier i1, Identifier i2, Statement s) ---------------------------- abstract class ClassDecl ClassDeclSimple(Identifier i, VarDeclList vl, methodDeclList m1) ClassDeclExtends(Identifier i, Identifier j, VarDecList vl, MethodDeclList ml) ----------------------------- VarDecl(Type t, Identifier i) MethodDecl(Type t, Identifier I, FormalList fl, VariableDeclList vl, StatementList sl, Exp e) Formal(Type t, Identifier i)

  47. Abstract Syntax for MiniJava (II) abstract class type IntArrayType() BooleanType() IntegerType() IndentifierType(String s) --------------------------- abstract class Statement Block(StatementList sl) If(Exp e, Statement s1, Statement s2) While(Exp e, Statement s) Print(Exp e) Assign(Identifier i, Exp e) ArrayAssign(Identifier i, Exp e1, Exp e2) -------------------------------------------

  48. Abstract Syntax for MiniJava (III) abstract class Exp And(Exp e1, Exp e2) LessThan(Exp e1, Exp e2) Plus(Exp e1, Exp e2) Minus(Exp e1, Exp e2) Times(Exp e1, Exp e2) Not(Exp e) ArrayLookup(Exp e1, Exp e2) ArrayLength(Exp e) Call(Exp e, Identifier i, ExpList el) IntergerLiteral(int i) True() False() IdentifierExp(String s) This() NewArray(Exp e) NewObject(Identifier i) ------------------------------------------------- Identifier(Sting s) --list classes------------------------- ClassDecList() ExpList() FormalList() MethodDeclList() StatementLIst() VarDeclList()

  49. Syntax Tree Nodes - Details package syntaxtree; import visitor.Visitor; import visitor.TypeVisitor; public class Program { public MainClass m; public ClassDeclList cl; public Program(MainClass am, ClassDeclList acl) { m=am; cl=acl; } public void accept(Visitor v) { v.visit(this); } public Type accept(TypeVisitor v) { return v.visit(this); } }

  50. ClassDecl.java package syntaxtree; import visitor.Visitor; import visitor.TypeVisitor; public abstract class ClassDecl { public abstract void accept(Visitor v); public abstract Type accept(TypeVisitor v); }

More Related