Course Overview
PART I: overview material
  1 Introduction
  2 Language processors (tombstone diagrams, bootstrapping)
  3 Architecture of a compiler
PART II: inside a compiler
  4 Syntax analysis
  5 Contextual analysis
  6 Runtime organization
  7 Code generation
PART III: conclusion
  • Interpretation
  • Review
Abstract Syntax Trees
• So far we have talked about how to build a recursive descent parser that recognizes a language described by an LL(1) EBNF grammar.
• Now we will look at
  • how to represent ASTs as data structures;
  • how to modify the parser to construct an AST data structure.
• We make heavy use of object-oriented programming (classes, inheritance, dynamic method binding)!
AST Representation: Possible Tree Shapes
The possible shapes of AST structures are determined by an AST grammar (as described earlier in Chapter 1).
Example: recall the Mini-Triangle abstract syntax:
Command ::= V-name := Expression                      AssignCmd
          | Identifier ( Expression )                 CallCmd
          | if Expression then Command else Command   IfCmd
          | while Expression do Command               WhileCmd
          | let Declaration in Command                LetCmd
          | Command ; Command                         SequentialCmd
AST Representation: Possible Tree Shapes
Example: recall the Mini-Triangle abstract syntax (excerpt below):
Command ::= V-name := Expression   AssignCmd
          | ...
Tree shape: an AssignCommand node with two subtrees, V (the V-name) and E (the Expression).
AST Representation: Possible Tree Shapes
Example: recall the Mini-Triangle abstract syntax (excerpt below):
Command ::= ...
          | Identifier ( Expression )   CallCmd
          | ...
Tree shape: a CallCommand node with two subtrees, an Identifier (a terminal node holding its spelling) and E (the Expression).
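AST Representation: Possible Tree Shapes
Example: recall the Mini-Triangle abstract syntax (excerpt below):
Command ::= ...
          | if Expression then Command else Command   IfCmd
          | ...
Tree shape: an IfCommand node with three subtrees, E (the condition), C1 (the then-branch) and C2 (the else-branch).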
AST Representation: Java (or C++) Data Structures
Example: Java classes to represent Mini-Triangle ASTs
1) A common (abstract) superclass for all AST nodes:
   public abstract class AST { ... }
2) A Java class for each "type" of node, covering abstract as well as concrete node types. For a rule
   LHS ::= ...   Tag1
         | ...   Tag2
   LHS becomes an abstract subclass of AST, and each tag (Tag1, Tag2, …) becomes a concrete subclass of LHS.
Example: Mini-Triangle Command ASTs
Command ::= V-name := Expression                      AssignCmd
          | Identifier ( Expression )                 CallCmd
          | if Expression then Command else Command   IfCmd
          | while Expression do Command               WhileCmd
          | let Declaration in Command                LetCmd
          | Command ; Command                         SequentialCmd

public abstract class Command extends AST { ... }
public class AssignCommand extends Command { ... }
public class CallCommand extends Command { ... }
public class IfCommand extends Command { ... }
etc.
Example: Mini-Triangle Command ASTs
Command ::= V-name := Expression        AssignCmd
          | Identifier ( Expression )   CallCmd
          | ...

public class AssignCommand extends Command {
    public Vname V;        // variable on left side of :=
    public Expression E;   // expression on right side of :=
    ...
}
public class CallCommand extends Command {
    public Identifier I;   // procedure name
    public Expression E;   // actual parameter
    ...
}
...
AST Terminal Nodes
public abstract class Terminal extends AST {
    public String spelling;
    ...
}
public class Identifier extends Terminal { ... }
public class IntegerLiteral extends Terminal { ... }
public class Operator extends Terminal { ... }
AST Construction
Of course, every concrete AST class needs a constructor. Examples:
public class AssignCommand extends Command {
    public Vname V;        // left-side variable
    public Expression E;   // right-side expression
    public AssignCommand (Vname V, Expression E) {
        this.V = V;
        this.E = E;
    }
    ...
}
public class Identifier extends Terminal {
    public Identifier (String spelling) {
        this.spelling = spelling;
    }
    ...
}
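To make the hierarchy and constructors concrete, here is a minimal, compilable sketch covering a handful of Mini-Triangle node types. The classes SimpleVname, VnameExpression and IntegerExpression, and the toString( ) methods used to print a tree's shape, are illustrative assumptions rather than part of the slides.

```java
// Minimal sketch of a Mini-Triangle AST hierarchy (subset only).
// SimpleVname, VnameExpression, IntegerExpression and the toString()
// format are illustrative assumptions, not taken from the slides.
abstract class AST { }

abstract class Terminal extends AST {
    public String spelling;
    public String toString() { return spelling; }
}

class Identifier extends Terminal {
    public Identifier(String spelling) { this.spelling = spelling; }
}

class IntegerLiteral extends Terminal {
    public IntegerLiteral(String spelling) { this.spelling = spelling; }
}

abstract class Vname extends AST { }

// Assumed: a V-name consisting of just an identifier, e.g. x
class SimpleVname extends Vname {
    public Identifier I;
    public SimpleVname(Identifier I) { this.I = I; }
    public String toString() { return I.toString(); }
}

abstract class Expression extends AST { }

// Assumed: an expression that simply reads a variable
class VnameExpression extends Expression {
    public Vname V;
    public VnameExpression(Vname V) { this.V = V; }
    public String toString() { return V.toString(); }
}

// Assumed: an expression consisting of one integer literal
class IntegerExpression extends Expression {
    public IntegerLiteral IL;
    public IntegerExpression(IntegerLiteral IL) { this.IL = IL; }
    public String toString() { return IL.toString(); }
}

abstract class Command extends AST { }

class AssignCommand extends Command {
    public Vname V;        // left side of :=
    public Expression E;   // right side of :=
    public AssignCommand(Vname V, Expression E) { this.V = V; this.E = E; }
    public String toString() { return "AssignCommand(" + V + ", " + E + ")"; }
}

class IfCommand extends Command {
    public Expression E;     // condition
    public Command C1, C2;   // then-branch and else-branch
    public IfCommand(Expression E, Command C1, Command C2) {
        this.E = E; this.C1 = C1; this.C2 = C2;
    }
    public String toString() { return "IfCommand(" + E + ", " + C1 + ", " + C2 + ")"; }
}
```

For example, building if b then x := 1 else x := 2 by hand with these constructors and printing it gives IfCommand(b, AssignCommand(x, 1), AssignCommand(x, 2)), mirroring the IfCommand tree shape with subtrees E, C1 and C2.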
AST Construction
We will now show how to refine our recursive descent parser to actually construct an AST. For a production N ::= X, the parsing method returns the AST it builds:
private N parseN( ) {   // note that the return type is N
    N theAST;
    // parse X and simultaneously construct theAST
    return theAST;
}
Example: Constructing Mini-Triangle ASTs
Command ::= single-Command ( ; single-Command )*

// old (recognizing-only) version:
private void parseCommand( ) {
    parseSingleCommand( );
    while (currentToken.kind == Token.SEMICOLON) {
        acceptIt( );
        parseSingleCommand( );
    }
}

// AST-generating version:
private Command parseCommand( ) {
    Command theAST;
    theAST = parseSingleCommand( );
    while (currentToken.kind == Token.SEMICOLON) {
        acceptIt( );
        Command extraCmd = parseSingleCommand( );
        theAST = new SequentialCommand (theAST, extraCmd);
    }
    return theAST;
}
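The AST-generating loop can be exercised on its own. The sketch below is a self-contained version that assumes pre-split string tokens and a hypothetical NameCommand node standing in for a full single-Command; only the loop body mirrors the slide.

```java
// Sketch of the AST-generating parseCommand loop. To keep it short, a
// single-Command is simplified to one name token (hypothetical NameCommand),
// and tokens are pre-split strings rather than Token objects.
abstract class Command { }

class NameCommand extends Command {   // hypothetical stand-in node
    final String name;
    NameCommand(String name) { this.name = name; }
    public String toString() { return name; }
}

class SequentialCommand extends Command {
    final Command C1, C2;
    SequentialCommand(Command C1, Command C2) { this.C1 = C1; this.C2 = C2; }
    public String toString() { return "Seq(" + C1 + ", " + C2 + ")"; }
}

class SeqParser {
    private final String[] tokens;
    private int pos = 0;

    SeqParser(String... tokens) { this.tokens = tokens; }

    private String current() { return pos < tokens.length ? tokens[pos] : "<eot>"; }
    private void acceptIt() { pos++; }

    // Command ::= single-Command ( ; single-Command )*
    Command parseCommand() {
        Command theAST = parseSingleCommand();
        while (current().equals(";")) {
            acceptIt();
            Command extraCmd = parseSingleCommand();
            // the tree built so far becomes the left subtree
            theAST = new SequentialCommand(theAST, extraCmd);
        }
        return theAST;
    }

    private Command parseSingleCommand() {
        String name = current();
        if (name.equals(";") || name.equals("<eot>"))
            throw new IllegalStateException("syntax error at token " + pos);
        acceptIt();
        return new NameCommand(name);
    }
}
```

Because each new SequentialCommand takes the tree built so far as its left child, the token sequence a ; b ; c produces Seq(Seq(a, b), c), a left-nested tree.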
Example: Constructing Mini-Triangle ASTs
single-Command ::= Identifier ( := Expression | ( Expression ) )
                 | if Expression then single-Command else single-Command
                 | while Expression do single-Command
                 | let Declaration in single-Command
                 | begin Command end

private Command parseSingleCommand( ) {
    Command comAST;
    // parse it and construct the AST
    return comAST;
}
Example: Constructing Mini-Triangle ASTs
private Command parseSingleCommand( ) {
    Command comAST;
    switch (currentToken.kind) {
        case Token.IDENTIFIER:
            // parse Identifier ( := Expression | ( Expression ) )
            break;
        case Token.IF:
            // parse if Expression then single-Command else single-Command
            break;
        case Token.WHILE:
            // parse while Expression do single-Command
            break;
        case Token.LET:
            // parse let Declaration in single-Command
            break;
        case Token.BEGIN:
            // parse begin Command end
            break;
        default:
            // report syntax error
    }
    return comAST;
}
Example: Constructing Mini-Triangle ASTs
    ...
    case Token.IDENTIFIER: {
        // parse Identifier ( := Expression | ( Expression ) )
        Identifier idAST = parseIdentifier( );
        switch (currentToken.kind) {
            case Token.BECOMES: {
                acceptIt( );
                Expression expAST = parseExpression( );
                // AssignCommand expects a Vname, so the identifier is wrapped first;
                // SimpleVname is a hypothetical Vname subclass holding one Identifier
                comAST = new AssignCommand (new SimpleVname (idAST), expAST);
                break;
            }
            case Token.LPAREN: {
                acceptIt( );
                Expression expAST = parseExpression( );
                comAST = new CallCommand (idAST, expAST);
                accept(Token.RPAREN);
                break;
            }
            default:
                report syntax error
        }
        break;
    }
    ...
Example: Constructing Mini-Triangle ASTs
        ...
        break;
    case Token.IF: {
        // parse if Expression then single-Command
        //       else single-Command
        acceptIt( );
        Expression expAST = parseExpression( );
        accept(Token.THEN);
        Command thenAST = parseSingleCommand( );
        accept(Token.ELSE);
        Command elseAST = parseSingleCommand( );
        comAST = new IfCommand (expAST, thenAST, elseAST);
        break;
    }
    case Token.WHILE:
        ...
Example: Constructing Mini-Triangle ASTs
        ...
        break;
    case Token.BEGIN:
        // parse begin Command end
        acceptIt( );
        comAST = parseCommand( );
        accept(Token.END);
        break;
    default:
        report syntax error
    }
    return comAST;
}
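Putting the cases together, the following sketch is a runnable recursive descent parser for a subset of single-Command (the while and let cases are left out). It assumes pre-split string tokens, reduces Expression to a single identifier or integer literal, and uses a hypothetical Leaf class for those terminals; exceptions stand in for "report syntax error".

```java
// Sketch: AST-building recursive descent parser for a subset of Mini-Triangle
// (while and let omitted). Assumptions: tokens are pre-split strings, and an
// Expression is one identifier or integer literal (hypothetical Leaf class).
//   single-Command ::= Identifier ( := Expression | ( Expression ) )
//                    | if Expression then single-Command else single-Command
//                    | begin Command end
class Leaf {
    final String spelling;
    Leaf(String spelling) { this.spelling = spelling; }
    public String toString() { return spelling; }
}
abstract class Command { }
class AssignCommand extends Command {
    final Leaf V, E;
    AssignCommand(Leaf V, Leaf E) { this.V = V; this.E = E; }
    public String toString() { return "Assign(" + V + ", " + E + ")"; }
}
class CallCommand extends Command {
    final Leaf I, E;
    CallCommand(Leaf I, Leaf E) { this.I = I; this.E = E; }
    public String toString() { return "Call(" + I + ", " + E + ")"; }
}
class IfCommand extends Command {
    final Leaf E; final Command C1, C2;
    IfCommand(Leaf E, Command C1, Command C2) { this.E = E; this.C1 = C1; this.C2 = C2; }
    public String toString() { return "If(" + E + ", " + C1 + ", " + C2 + ")"; }
}
class SequentialCommand extends Command {
    final Command C1, C2;
    SequentialCommand(Command C1, Command C2) { this.C1 = C1; this.C2 = C2; }
    public String toString() { return "Seq(" + C1 + ", " + C2 + ")"; }
}

class MiniParser {
    private final String[] tokens;
    private int pos = 0;
    MiniParser(String... tokens) { this.tokens = tokens; }

    private String current() { return pos < tokens.length ? tokens[pos] : "<eot>"; }
    private void acceptIt() { pos++; }
    private void accept(String expected) {   // "report syntax error" becomes an exception
        if (!current().equals(expected))
            throw new IllegalStateException("syntax error: expected " + expected + ", found " + current());
        pos++;
    }

    // Command ::= single-Command ( ; single-Command )*
    Command parseCommand() {
        Command theAST = parseSingleCommand();
        while (current().equals(";")) {
            acceptIt();
            theAST = new SequentialCommand(theAST, parseSingleCommand());
        }
        return theAST;
    }

    Command parseSingleCommand() {
        Command comAST;
        switch (current()) {
            case "if": {
                acceptIt();
                Leaf expAST = parseExpression();
                accept("then");
                Command thenAST = parseSingleCommand();
                accept("else");
                Command elseAST = parseSingleCommand();
                comAST = new IfCommand(expAST, thenAST, elseAST);
                break;
            }
            case "begin":
                acceptIt();
                comAST = parseCommand();
                accept("end");
                break;
            default: {   // Identifier ( := Expression | ( Expression ) )
                Leaf idAST = parseLeaf(Character.isLetter(current().charAt(0)), "identifier");
                if (current().equals(":=")) {
                    acceptIt();
                    comAST = new AssignCommand(idAST, parseExpression());
                } else {
                    accept("(");
                    comAST = new CallCommand(idAST, parseExpression());
                    accept(")");
                }
            }
        }
        return comAST;
    }

    private Leaf parseExpression() {
        return parseLeaf(Character.isLetterOrDigit(current().charAt(0)), "expression");
    }

    private Leaf parseLeaf(boolean ok, String what) {
        if (!ok) throw new IllegalStateException("syntax error: expected " + what + ", found " + current());
        Leaf leaf = new Leaf(current());
        acceptIt();
        return leaf;
    }
}
```

For example, the tokens of begin x := 1 ; if b then p ( x ) else y := 2 end produce Seq(Assign(x, 1), If(b, Call(p, x), Assign(y, 2))). Wrapping each case body in braces gives every expAST its own scope, which is what lets several cases declare a variable with the same name.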
Syntax Analysis: Scanner
Dataflow chart:
Source Program (stream of characters)
  → Scanner (→ error reports)
  → Stream of "tokens"
  → Parser (→ error reports)
  → Abstract Syntax Tree
Scanner
Remember:
public class Parser {
    private Token currentToken;

    private void accept (byte expectedKind) {
        if (currentToken.kind == expectedKind)
            currentToken = scanner.scan( );
        else
            report syntax error
    }

    private void acceptIt( ) {
        currentToken = scanner.scan( );
    }

    public void parse( ) {
        ...
    }
    ...
}
The scanner (scanner.scan( )) is the part we have not yet implemented.
Steps for Developing a Scanner
1) Express the "lexical" grammar in EBNF (doing any necessary transformations).
2) Implement the scanner based on this grammar (details explained later).
3) Modify the scanner to keep track of the spelling and kind of the currently scanned token.
To save some time we'll do steps 2 and 3 together.
Developing a Scanner
Express the "lexical" grammar in EBNF:
Token ::= Identifier | Integer-Literal | Operator | ; | : | := | ~ | ( | ) | eot
Identifier ::= Letter (Letter | Digit)*
Integer-Literal ::= Digit Digit*
Operator ::= + | - | * | / | < | > | =
Separator ::= Comment | space | eol
Comment ::= ! Graphic* eol
Next, perform substitution and left factorization (ε denotes the empty string):
Token ::= Letter (Letter | Digit)*
        | Digit Digit*
        | + | - | * | / | < | > | =
        | ; | : (= | ε) | ~ | ( | ) | eot
Separator ::= ! Graphic* eol | space | eol
Developing a Scanner
Now implement the scanner:
public class Scanner {
    private char currentChar;
    private StringBuffer currentSpelling;
    private byte currentKind;

    private void take (char expectedChar) { ... }   // analogous to accept
    private void takeIt( ) { ... }                  // analogous to acceptIt

    // other private auxiliary methods and scanning methods go here

    public Token scan( ) { ... }
}
Developing a Scanner
The scanner will return instances of Token:
public class Token {
    byte kind;
    String spelling;

    final static byte IDENTIFIER = 0, INTLITERAL = 1, OPERATOR = 2,
                      BEGIN = 3, CONST = 4, ...;
    // an enum type can improve this (in C++, and in Java 5 and later)

    public Token (byte kind, String spelling) {
        this.kind = kind;
        this.spelling = spelling;
        // if spelling matches a keyword, change kind automatically
        // (e.g. "begin" => 3, "const" => 4, ...)
    }
    ...
}
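One way to realize "if spelling matches a keyword, change kind automatically" is a lookup table consulted in the constructor. In this sketch the constants after CONST (END, IF) and the keyword table itself are assumptions; a full Mini-Triangle scanner would list all of its keywords.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a Token whose constructor reclassifies identifiers spelled like
// keywords. END, IF and the KEYWORDS table are illustrative assumptions.
class Token {
    final byte kind;
    final String spelling;

    static final byte IDENTIFIER = 0, INTLITERAL = 1, OPERATOR = 2,
                      BEGIN = 3, CONST = 4, END = 5, IF = 6;

    private static final Map<String, Byte> KEYWORDS = new HashMap<>();
    static {
        KEYWORDS.put("begin", BEGIN);
        KEYWORDS.put("const", CONST);
        KEYWORDS.put("end", END);
        KEYWORDS.put("if", IF);
    }

    Token(byte kind, String spelling) {
        this.spelling = spelling;
        // An identifier whose spelling is a keyword takes that keyword's kind
        if (kind == IDENTIFIER && KEYWORDS.containsKey(spelling))
            this.kind = KEYWORDS.get(spelling);
        else
            this.kind = kind;
    }
}
```

Only tokens scanned as identifiers are looked up, so scanToken needs no separate cases for keywords: "begin" is scanned as an identifier and reclassified here, while "beginning" stays an IDENTIFIER.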
Developing a Scanner
public class Scanner {
    private char currentChar = get first source char;
    private StringBuffer currentSpelling;
    private byte currentKind;

    private void take (char expectedChar) {   // analogous to accept
        if (currentChar == expectedChar) {
            currentSpelling.append (currentChar);
            currentChar = get next source char;
        } else
            report lexical error
    }

    private void takeIt( ) {                  // analogous to acceptIt
        currentSpelling.append (currentChar);
        currentChar = get next source char;
    }
    ...
Developing a Scanner
    ...
    public Token scan( ) {
        // get rid of potential separators before scanning a token
        while ((currentChar == '!') || (currentChar == ' ') || (currentChar == '\n'))
            scanSeparator( );
        currentSpelling = new StringBuffer( );
        currentKind = scanToken( );
        return new Token (currentKind, currentSpelling.toString( ));
    }

    private void scanSeparator( ) { ... }

    private byte scanToken( ) { ... }
    ...
scanSeparator and scanToken are developed in much the same way as the parsing methods.
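The body of scanSeparator is not spelled out here. A minimal sketch following Separator ::= ! Graphic* eol | space | eol could look like this; the String-based source, the EOT sentinel character and the advance( ) helper are assumptions. Unlike takeIt( ), advance( ) does not record skipped characters, since separators are not part of any token's spelling.

```java
// Sketch of separator skipping for
//   Separator ::= ! Graphic* eol | space | eol
// Assumptions: the source is a String, eol is '\n', and end of text is the
// hypothetical sentinel EOT. Any non-eol character counts as a comment
// character, and a comment may also end at end of text.
class SeparatorSkipper {
    static final char EOT = '\u0000';
    private final String source;
    private int pos = 0;
    char currentChar;

    SeparatorSkipper(String source) {
        this.source = source;
        currentChar = source.isEmpty() ? EOT : source.charAt(0);
    }

    // move to the next source character without recording it
    private void advance() {
        pos++;
        currentChar = pos < source.length() ? source.charAt(pos) : EOT;
    }

    // the loop at the start of scan()
    void skipSeparators() {
        while (currentChar == '!' || currentChar == ' ' || currentChar == '\n')
            scanSeparator();
    }

    private void scanSeparator() {
        switch (currentChar) {
            case '!':   // comment: skip up to and including eol
                advance();
                while (currentChar != '\n' && currentChar != EOT) advance();
                if (currentChar == '\n') advance();
                break;
            case ' ':
            case '\n':
                advance();
                break;
        }
    }
}
```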
Developing a Scanner
Token ::= Letter (Letter | Digit)*
        | Digit Digit*
        | + | - | * | / | < | > | =
        | ; | : (= | ε) | ~ | ( | ) | eot

private byte scanToken( ) {
    switch (currentChar) {
        case 'a': case 'b': ... case 'z':
        case 'A': case 'B': ... case 'Z':
            scan Letter (Letter | Digit)*
            return Token.IDENTIFIER;
        case '0': ... case '9':
            scan Digit Digit*
            return Token.INTLITERAL;
        case '+': case '-': ... case '=':
            takeIt( );
            return Token.OPERATOR;
        ...etc...
    }
}
Developing a Scanner
Look at the identifier case in more detail. The pseudocode "scan Letter (Letter | Digit)*" is refined step by step:
Step 1: split the sequence into its two parts:
    scan Letter
    scan (Letter | Digit)*
Step 2: "scan Letter" consumes a character already known to be a letter, so it becomes takeIt( ):
    takeIt( );
    scan (Letter | Digit)*
Step 3: the repetition ( ... )* becomes a while loop that continues as long as the current character can start (Letter | Digit):
    takeIt( );
    while (isLetter(currentChar) || isDigit(currentChar))
        scan (Letter | Digit)
Step 4: inside the loop the character is already known to be a letter or digit, so "scan (Letter | Digit)" also becomes takeIt( ). The finished case:
        ...
        return ...
    case 'a': case 'b': ... case 'z':
    case 'A': case 'B': ... case 'Z':
        takeIt( );
        while (isLetter(currentChar) || isDigit(currentChar))
            takeIt( );
        return Token.IDENTIFIER;
    case '0': ... case '9':
        ...
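Applying the same refinements to every alternative of Token gives a complete scanner. The sketch below is one possible result, not the course's reference implementation: it assumes a String source, an EOT sentinel, int kind constants (with hypothetical names such as BECOMES and IS), ASCII-only letters, and an exception in place of "report lexical error". It tests letters and digits with isLetter/isDigit rather than listing case 'a' … case 'Z'.

```java
// Sketch of a complete scanner for the lexical grammar:
//   Token ::= Letter (Letter | Digit)* | Digit Digit*
//           | + | - | * | / | < | > | = | ; | :(= | ε) | ~ | ( | ) | eot
// Assumptions: String source, EOT sentinel, int kinds with hypothetical
// names, ASCII letters only, exceptions for lexical errors.
class MiniScanner {
    static final char EOT = '\u0000';
    static final int IDENTIFIER = 0, INTLITERAL = 1, OPERATOR = 2, SEMICOLON = 3,
                     COLON = 4, BECOMES = 5, IS = 6, LPAREN = 7, RPAREN = 8, EOT_KIND = 9;

    static final class Token {
        final int kind;
        final String spelling;
        Token(int kind, String spelling) { this.kind = kind; this.spelling = spelling; }
    }

    private final String source;
    private int pos = 0;
    private char currentChar;
    private StringBuilder currentSpelling;

    MiniScanner(String source) {
        this.source = source;
        currentChar = source.isEmpty() ? EOT : source.charAt(0);
    }

    private void advance() {   // next char, not recorded
        pos++;
        currentChar = pos < source.length() ? source.charAt(pos) : EOT;
    }
    private void takeIt() { currentSpelling.append(currentChar); advance(); }

    private static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    Token scan() {
        while (currentChar == '!' || currentChar == ' ' || currentChar == '\n') scanSeparator();
        currentSpelling = new StringBuilder();
        int kind = scanToken();
        return new Token(kind, currentSpelling.toString());
    }

    private void scanSeparator() {
        if (currentChar == '!') {                       // ! Graphic* eol
            advance();
            while (currentChar != '\n' && currentChar != EOT) advance();
        }
        if (currentChar == ' ' || currentChar == '\n') advance();   // space | eol
    }

    private int scanToken() {
        if (isLetter(currentChar)) {                    // Letter (Letter | Digit)*
            takeIt();
            while (isLetter(currentChar) || isDigit(currentChar)) takeIt();
            return IDENTIFIER;
        }
        if (isDigit(currentChar)) {                     // Digit Digit*
            takeIt();
            while (isDigit(currentChar)) takeIt();
            return INTLITERAL;
        }
        switch (currentChar) {
            case '+': case '-': case '*': case '/': case '<': case '>': case '=':
                takeIt(); return OPERATOR;
            case ';': takeIt(); return SEMICOLON;
            case ':':                                   // : (= | ε)
                takeIt();
                if (currentChar == '=') { takeIt(); return BECOMES; }
                return COLON;
            case '~': takeIt(); return IS;
            case '(': takeIt(); return LPAREN;
            case ')': takeIt(); return RPAREN;
            case EOT: return EOT_KIND;
            default:
                throw new IllegalStateException("lexical error at position " + pos + ": '" + currentChar + "'");
        }
    }
}
```

The left factorization : (= | ε) shows up directly in the ':' case: after taking ':', one character of lookahead decides between := and :.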