Leonidas Fegaras Abstract Syntax
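Abstract Syntax Tree (AST)
• A parser typically generates an Abstract Syntax Tree (AST):
  [Figure: source file → scanner → parser → AST. The parser asks the scanner for the next token ("get token"); the scanner reads the source file one character at a time ("get next character").]
• A parse tree is not an AST
  [Figure: the parse tree for id(x) + id(y) * id(z), built from the non-terminals E, T, and F, next to the corresponding AST: + with children x and *, where * has children y and z.]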
Building Abstract Syntax Trees in Java
    abstract class Exp { }
    class IntegerExp extends Exp {
      public int value;
      public IntegerExp ( int n ) { value=n; }
    }
    class TrueExp extends Exp {
      public TrueExp () {}
    }
    class FalseExp extends Exp {
      public FalseExp () {}
    }
    class VariableExp extends Exp {
      public String value;
      public VariableExp ( String n ) { value=n; }
    }
Exp (cont.)
    class BinaryExp extends Exp {
      public String operator;
      public Exp left;
      public Exp right;
      public BinaryExp ( String o, Exp l, Exp r ) { operator=o; left=l; right=r; }
    }
    class UnaryExp extends Exp {
      public String operator;
      public Exp operand;
      public UnaryExp ( String o, Exp e ) { operator=o; operand=e; }
    }
Exp (cont.)
    import java.util.List;

    class CallExp extends Exp {
      public String name;
      public List<Exp> arguments;
      public CallExp ( String nm, List<Exp> s ) { name=nm; arguments=s; }
    }
    class ProjectionExp extends Exp {
      public Exp value;
      public String attribute;
      public ProjectionExp ( Exp v, String a ) { value=v; attribute=a; }
    }
Exp (cont.)
    import java.util.List;
    import java.util.Map;

    class RecordElement {
      public String attribute;
      public Exp value;
      public RecordElement ( String a, Exp v ) { attribute=a; value=v; }
    }
    class RecordExp extends Exp {
      public List<RecordElement> elements;
      public RecordExp ( List<RecordElement> el ) { elements=el; }
    }
• … or better:
    class RecordExp extends Exp {
      public Map<String,Exp> elements;
      public RecordExp ( Map<String,Exp> el ) { elements=el; }
    }
Examples
• The AST for the input (x-2)+3
    new BinaryExp("+",
                  new BinaryExp("-", new VariableExp("x"), new IntegerExp(2)),
                  new IntegerExp(3))
• The AST for the input f(x.A,true)
    new CallExp("f",
                Arrays.asList(new ProjectionExp(new VariableExp("x"), "A"),
                              new TrueExp()))
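To check that the two example ASTs above are well formed, the following minimal sketch rebuilds them and turns them back into text. The node classes are trimmed copies of the ones on the earlier slides; `AstPrinter` and its `show` method are hypothetical helpers introduced only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Trimmed copies of the AST classes from the slides.
abstract class Exp { }
class IntegerExp extends Exp { int value; IntegerExp(int n) { value = n; } }
class TrueExp extends Exp { }
class VariableExp extends Exp { String value; VariableExp(String n) { value = n; } }
class BinaryExp extends Exp {
    String operator; Exp left; Exp right;
    BinaryExp(String o, Exp l, Exp r) { operator = o; left = l; right = r; }
}
class CallExp extends Exp {
    String name; List<Exp> arguments;
    CallExp(String nm, List<Exp> s) { name = nm; arguments = s; }
}
class ProjectionExp extends Exp {
    Exp value; String attribute;
    ProjectionExp(Exp v, String a) { value = v; attribute = a; }
}

// Hypothetical helper (not from the slides): walks an AST and prints it.
class AstPrinter {
    static String show(Exp e) {
        if (e instanceof IntegerExp) return Integer.toString(((IntegerExp) e).value);
        if (e instanceof TrueExp) return "true";
        if (e instanceof VariableExp) return ((VariableExp) e).value;
        if (e instanceof BinaryExp) {
            BinaryExp b = (BinaryExp) e;
            // Binary nodes are always parenthesized, so no precedence logic is needed.
            return "(" + show(b.left) + b.operator + show(b.right) + ")";
        }
        if (e instanceof ProjectionExp) {
            ProjectionExp p = (ProjectionExp) e;
            return show(p.value) + "." + p.attribute;
        }
        if (e instanceof CallExp) {
            CallExp c = (CallExp) e;
            String args = c.arguments.stream().map(AstPrinter::show)
                                     .collect(Collectors.joining(","));
            return c.name + "(" + args + ")";
        }
        throw new IllegalArgumentException("unknown AST node");
    }

    static Exp firstExample() {   // (x-2)+3
        return new BinaryExp("+",
                   new BinaryExp("-", new VariableExp("x"), new IntegerExp(2)),
                   new IntegerExp(3));
    }

    static Exp secondExample() {  // f(x.A,true)
        return new CallExp("f",
                   Arrays.asList(new ProjectionExp(new VariableExp("x"), "A"),
                                 new TrueExp()));
    }

    public static void main(String[] args) {
        System.out.println(show(firstExample()));   // prints ((x-2)+3)
        System.out.println(show(secondExample()));  // prints f(x.A,true)
    }
}
```

The chain of `instanceof` tests is the plain-Java counterpart of pattern matching; it is one reason the Scala case classes are more convenient for this kind of tree walk.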
Building ASTs in Scala
Use case classes:
    sealed abstract class Exp
    case class TrueExp () extends Exp
    case class FalseExp () extends Exp
    case class IntegerExp ( value: Int ) extends Exp
    case class StringExp ( value: String ) extends Exp
    case class VariableExp ( name: String ) extends Exp
    case class BinaryExp ( operator: String, left: Exp, right: Exp ) extends Exp
    case class UnaryExp ( operator: String, operand: Exp ) extends Exp
    case class CallExp ( name: String, arguments: List[Exp] ) extends Exp
    case class ProjectionExp ( record: Exp, attribute: String ) extends Exp
    case class RecordExp ( arguments: List[(String,Exp)] ) extends Exp
For example, the AST for the input (x-2)+3 is
    BinaryExp("+",BinaryExp("-",VariableExp("x"),IntegerExp(2)),IntegerExp(3))
and the AST for the input f(x.A,true) is
    CallExp("f",List(ProjectionExp(VariableExp("x"),"A"),TrueExp()))
Adding Semantic Actions to a Parser
• Right-associative grammar:
    E ::= T + E | T - E | T
    T ::= num
• After left factoring:
    E ::= T E'
    E' ::= + E | - E | ε
    T ::= num
• Recursive descent parser:
    int E () {
      int left = T();
      if (current_token == '+') { read_next_token(); return left + E(); }
      else if (current_token == '-') { read_next_token(); return left - E(); }
      else return left;   // E' ::= ε: a lone T is a complete expression
    }
    int T () {
      if (current_token == 'num') { int n = num_value; read_next_token(); return n; }
      else error();
    }
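A runnable Java version of this right-associative parser might look like the sketch below. It assumes that E' also has an empty alternative, so that a lone number is a complete expression, and that tokens are unsigned integers, '+', and '-' with no whitespace. The class name `RightAssocParser`, the character-level scanner, and the `eval` helper are hypothetical and not part of the slides.

```java
// Minimal sketch of a recursive descent parser whose semantic actions
// compute the value of the expression while parsing.
class RightAssocParser {
    private final String input;
    private int pos = 0;

    RightAssocParser(String s) { input = s; }

    // The current character stands in for current_token; '$' marks end of input.
    private char current() { return pos < input.length() ? input.charAt(pos) : '$'; }

    // E ::= T E'   with   E' ::= + E | - E | (empty)
    int E() {
        int left = T();
        if (current() == '+') { pos++; return left + E(); }
        else if (current() == '-') { pos++; return left - E(); }
        else return left;
    }

    // T ::= num
    int T() {
        if (!Character.isDigit(current()))
            throw new IllegalStateException("number expected at position " + pos);
        int n = 0;
        while (Character.isDigit(current())) { n = n * 10 + (current() - '0'); pos++; }
        return n;
    }

    static int eval(String s) {
        RightAssocParser p = new RightAssocParser(s);
        int v = p.E();
        if (p.current() != '$')
            throw new IllegalStateException("unexpected input at position " + p.pos);
        return v;
    }

    public static void main(String[] args) {
        System.out.println(eval("1-5-2"));  // prints -2, i.e. 1-(5-2)
    }
}
```

Because the recursive call to E() consumes everything to the right of the operator before the subtraction happens, 1-5-2 is grouped as 1-(5-2).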
Adding Semantic Actions to a Parser
• Left-associative grammar:
    E ::= E + T | E - T | T
    T ::= num
• After left recursion elimination:
    E ::= T E'
    E' ::= + T E' | - T E' | ε
    T ::= num
• Recursive descent parser:
    int E () { return Eprime(T()); }
    int Eprime ( int left ) {
      if (current_token == '+') { read_next_token(); return Eprime(left + T()); }
      else if (current_token == '-') { read_next_token(); return Eprime(left - T()); }
      else return left;
    }
    int T () {
      if (current_token == 'num') { int n = num_value; read_next_token(); return n; }
      else error();
    }
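The left-associative parser can be sketched in runnable Java in the same way. The value computed so far is passed down as the `left` argument of `Eprime`, which is what makes the grouping left-associative. As an assumption for this illustration, tokens are unsigned integers, '+', and '-' with no whitespace; the class name `LeftAssocParser` and the `eval` helper are hypothetical.

```java
// Minimal sketch: recursive descent with an inherited value (left) passed
// to Eprime, so operators group to the left.
class LeftAssocParser {
    private final String input;
    private int pos = 0;

    LeftAssocParser(String s) { input = s; }

    // The current character stands in for current_token; '$' marks end of input.
    private char current() { return pos < input.length() ? input.charAt(pos) : '$'; }

    // E ::= T E'
    int E() { return Eprime(T()); }

    // E' ::= + T E' | - T E' | (empty)
    int Eprime(int left) {
        if (current() == '+') { pos++; return Eprime(left + T()); }
        else if (current() == '-') { pos++; return Eprime(left - T()); }
        else return left;
    }

    // T ::= num
    int T() {
        if (!Character.isDigit(current()))
            throw new IllegalStateException("number expected at position " + pos);
        int n = 0;
        while (Character.isDigit(current())) { n = n * 10 + (current() - '0'); pos++; }
        return n;
    }

    static int eval(String s) {
        LeftAssocParser p = new LeftAssocParser(s);
        int v = p.E();
        if (p.current() != '$')
            throw new IllegalStateException("unexpected input at position " + p.pos);
        return v;
    }

    public static void main(String[] args) {
        System.out.println(eval("1-5-2"));  // prints -6, i.e. (1-5)-2
    }
}
```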
Table-Driven Predictive Parsers
• They use the parse stack to push/pop both actions and symbols, but they use a separate semantic stack to execute the actions
    push(S);
    read_next_token();
    repeat
      X = pop();
      if (X is a terminal or '$')
        if (X == current_token)
          read_next_token();
        else error();
      else if (X is an action)
        perform the action;
      else if (M[X,current_token] == "X ::= Y1 Y2 ... Yk")
      { push(Yk);
        ...
        push(Y1);
      }
      else error();
    until X == '$';
Example
• Need to embed actions { code; } in the grammar rules
• Suppose that pushV and popV are the functions to manipulate the semantic stack
• The following is the grammar of an interpreter that uses the semantic stack to perform additions and subtractions:
    E ::= T E' $ { print(popV()); }
    E' ::= + T { pushV(popV() + popV()); } E'
         | - T { pushV(-popV() + popV()); } E'
         | ε
    T ::= num { pushV(num); }
• For example, for 1+5-2, we have the following sequence of actions:
    pushV(1); pushV(5); pushV(popV()+popV()); pushV(2); pushV(-popV()+popV()); print(popV());
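The interpreter grammar above can be run through a small table-driven parser to confirm the action sequence. The sketch below is an illustration under stated assumptions, not the slides' implementation: actions are encoded as parse-stack symbols starting with '#', the parse table M is hard-coded in `rule`, the loop runs until the parse stack is empty (so that the print action placed after '$' still executes), and the printed value is stored in a field instead of being written to the console. All names (`TableDrivenCalc`, `trace`, `lastNum`) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class TableDrivenCalc {
    private final Deque<String> parse = new ArrayDeque<>();    // symbols and actions
    private final Deque<Integer> values = new ArrayDeque<>();  // semantic stack
    private final List<String> tokens = new ArrayList<>();
    private int next = 0;
    private int lastNum;                 // value of the most recently matched num
    private Integer printed;             // what print(popV()) produced
    final List<String> trace = new ArrayList<>();

    TableDrivenCalc(String input) {      // tiny scanner: numbers, '+', '-', then '$'
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isDigit(c)) {
                int j = i;
                while (j < input.length() && Character.isDigit(input.charAt(j))) j++;
                tokens.add(input.substring(i, j));
                i = j;
            } else if (c == '+' || c == '-') { tokens.add(String.valueOf(c)); i++; }
            else throw new IllegalArgumentException("unexpected character: " + c);
        }
        tokens.add("$");
    }

    private static String kind(String tok) {
        return Character.isDigit(tok.charAt(0)) ? "num" : tok;
    }

    // The parse table M[X, token], with actions embedded in the right-hand sides.
    private static String[] rule(String x, String tok) {
        switch (x + " " + tok) {
            case "E num": return new String[] {"T", "E'", "$", "#print"};
            case "E' +":  return new String[] {"+", "T", "#add", "E'"};
            case "E' -":  return new String[] {"-", "T", "#sub", "E'"};
            case "E' $":  return new String[0];
            case "T num": return new String[] {"num", "#push"};
            default: throw new IllegalStateException("syntax error at " + tok);
        }
    }

    private void perform(String action) {
        switch (action) {
            case "#push":  values.push(lastNum); trace.add("pushV(" + lastNum + ")"); break;
            case "#add":   values.push(values.pop() + values.pop());
                           trace.add("pushV(popV()+popV())"); break;
            case "#sub":   values.push(-values.pop() + values.pop());
                           trace.add("pushV(-popV()+popV())"); break;
            case "#print": printed = values.pop(); trace.add("print(popV())"); break;
            default: throw new IllegalStateException("unknown action " + action);
        }
    }

    int run() {
        parse.push("E");
        while (!parse.isEmpty()) {
            String x = parse.pop();
            String tok = next < tokens.size() ? tokens.get(next) : "$";
            if (x.startsWith("#")) {
                perform(x);
            } else if (x.equals("E") || x.equals("E'") || x.equals("T")) {
                String[] rhs = rule(x, kind(tok));
                for (int k = rhs.length - 1; k >= 0; k--) parse.push(rhs[k]);  // Yk ... Y1
            } else {                                   // terminal or '$'
                if (!x.equals(kind(tok))) throw new IllegalStateException("syntax error at " + tok);
                if (x.equals("num")) lastNum = Integer.parseInt(tok);
                next++;
            }
        }
        return printed;
    }

    public static void main(String[] args) {
        TableDrivenCalc calc = new TableDrivenCalc("1+5-2");
        System.out.println(calc.run());   // prints 4
        System.out.println(calc.trace);
    }
}
```

For the input 1+5-2 the recorded trace matches the action sequence listed on the slide. The subtraction action relies on Java evaluating operands left to right: `-values.pop()` takes the right operand off the top of the semantic stack first.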
Bottom-Up Parsers
• Bottom-up parsers can only perform an action after a reduction
• We can only have rules of the form
    X ::= Y1 ... Yn { action }
  where the action is always at the end of the rule; this action is evaluated after the rule X ::= Y1 ... Yn is reduced
• How? In addition to state numbers, the parser pushes values onto the parse stack
• If we want to put an action in the middle of the right-hand side of a rule, we use a dummy non-terminal, called a marker
  For example,
    X ::= a { action } b
  is equivalent to
    X ::= M b
    M ::= a { action }
CUP
• Both terminals and non-terminals are associated with typed values
  • these values are instances of the Object class (or of some subclass of the Object class)
  • the value associated with a terminal is in most cases an Object, except for an identifier, which is a String, for an integer, which is an Integer, etc.
  • the typical values associated with non-terminals in a compiler are ASTs, lists of ASTs, etc.
• You can retrieve the value of a symbol s at the right-hand side of a rule by using the notation s:x, where x is a variable name that hasn't appeared elsewhere in this rule
• The value of the non-terminal defined by a rule is called RESULT and should always be assigned a value in the action
  • e.g., if the non-terminal E is associated with an Integer object, then
      E ::= E:n PLUS E:m {: RESULT = n+m; :}
Machinery
• The parse stack elements are of type struct( state: int, value: Object )
  • int is the state number
  • Object is the value
• When a reduction occurs, the RESULT value is calculated from the values in the stack and is pushed along with the GOTO state
• Example: after the reduction by
    E ::= E:n PLUS E:m {: RESULT = n+m; :}
  the RESULT value is
    stack[top-2].value + stack[top].value
  which is the new value pushed onto the stack along with the GOTO state
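The reduction step described above can be sketched as a few lines of Java. This is only a model of the idea, not CUP's actual runtime: `StackEntry`, `ReduceDemo`, and `reducePlus` are hypothetical names, the state numbers are made up, and the GOTO state is passed in as a parameter instead of being looked up in a GOTO table.

```java
import java.util.ArrayList;
import java.util.List;

// One parse stack element: struct( state: int, value: Object ).
class StackEntry {
    final int state;
    final Object value;
    StackEntry(int state, Object value) { this.state = state; this.value = value; }
}

class ReduceDemo {
    // Reduce by  E ::= E:n PLUS E:m {: RESULT = n+m; :}
    static void reducePlus(List<StackEntry> stack, int gotoState) {
        int top = stack.size() - 1;
        Integer n = (Integer) stack.get(top - 2).value;  // value of the left E
        Integer m = (Integer) stack.get(top).value;      // value of the right E
        Object result = n + m;                           // the semantic action
        // Pop the three right-hand-side symbols: E, PLUS, E.
        for (int i = 0; i < 3; i++) stack.remove(stack.size() - 1);
        // Push RESULT together with the GOTO state.
        stack.add(new StackEntry(gotoState, result));
    }

    public static void main(String[] args) {
        List<StackEntry> stack = new ArrayList<>();
        stack.add(new StackEntry(0, null));  // initial state
        stack.add(new StackEntry(3, 4));     // E with value 4
        stack.add(new StackEntry(5, null));  // PLUS carries no useful value
        stack.add(new StackEntry(7, 6));     // E with value 6
        reducePlus(stack, 1);
        System.out.println(stack.get(stack.size() - 1).value);  // prints 10
    }
}
```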
ASTs in CUP (calc.cup)
• Need to associate each non-terminal symbol with an AST type
• Using Scala case classes in Java (!)
    non terminal Expr exp;
    non terminal List expl;
    exp ::= exp:e1 PLUS exp:e2    {: RESULT = new BinOpExp("+",e1,e2); :}
          | exp:e1 MINUS exp:e2   {: RESULT = new BinOpExp("-",e1,e2); :}
          | id:nm LP expl:el RP   {: RESULT = new CallExp(nm,el); :}
          | INT:n                 {: RESULT = new IntConst(n); :}
          ;
    expl ::= expl:el COMMA exp:e  {: RESULT = append(e,el); :}
           | exp:e                {: RESULT = cons(e,nil); :}
           ;