CMPE 152: Compiler Design October 22 Class Meeting

Department of Computer Engineering, San Jose State University, Fall 2019. Instructor: Ron Mak, www.cs.sjsu.edu/~mak

  2. ANTLR 4 Review
  • Feed ANTLR a .g4 grammar file.
  • ANTLR generates (in Java or C++):
      • a parser
      • a lexer (scanner)
      • parse tree utilities
  • Therefore, for your compiler projects, you don’t have to write that code.
  • You must have a correct grammar file.

  3. ANTLR 4 Plugin for Eclipse
  • If you use the ANTLR 4 plugin, Eclipse will automatically generate a syntax diagram from the grammar.
  • See the tutorials at: http://www.cs.sjsu.edu/~mak/tutorials/index.html
  • Especially: http://www.cs.sjsu.edu/~mak/tutorials/InstallANTLR4Cpp.pdf
  • The plugin will generate a parse tree from a source program, according to the grammar.

  4. ANTLR 4 Plugin for Eclipse, cont’d
  • Do an External Tools Configuration to specify -Dlanguage=Cpp, which generates a parser and a lexer written in C++.
      • Otherwise, the default is Java.
  • Create a standard Eclipse C++ project and put the grammar file in it.
  • Right-click the grammar file and select Run As > Generate ANTLR Recognizer.
  • Eclipse can also regenerate the recognizer automatically whenever the grammar file changes.

  5. ANTLR Workflow
  [Figure: ANTLR workflow diagram. Source: The Definitive ANTLR 4 Reference by Terence Parr, The Pragmatic Programmers, 2012]

  6. Syntax Error Handling
  • An ANTLR-generated parser has basic syntax error handling and recovery.
  • You can improve the error handling.

  Input:

      193
      a = 5
      b = 6
      (a+b*2
      (1+2)*3

  Error message:

      line 4:6 missing ')' at '\n'

  Parse tree (Lisp format):

      (prog
        (stat (expr 193) \n)
        (stat a = (expr 5) \n)
        (stat b = (expr 6) \n)
        (stat (expr ( (expr (expr a) + (expr (expr b) * (expr 2))) <missing ')'>) \n)
        (stat (expr (expr ( (expr (expr 1) + (expr 2)) )) * (expr 3)) \n))
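  The "missing ')'" recovery shown above can be imitated by hand. The following is a minimal stand-alone C++ sketch, not ANTLR-generated code: the class name LineParser and its small expression grammar are hypothetical, and it assumes one expression per line with no spaces. When the closing parenthesis is absent, it reports the error and continues as if the token had been there.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical recursive-descent recognizer for one line of the form
//   expr : term (('+'|'-') term)* ;
//   term : atom (('*'|'/') atom)* ;
//   atom : INT | ID | '(' expr ')' ;
// Assumption: the line contains no whitespace.
class LineParser {
public:
    LineParser(const std::string &text, int lineNo)
        : src(text), line(lineNo), pos(0) {}

    // Parse the line and return the error messages that were reported.
    std::vector<std::string> parse() {
        expr();
        return errors;
    }

private:
    std::string src;
    int line;
    std::size_t pos;
    std::vector<std::string> errors;

    bool at(char c) const { return pos < src.size() && src[pos] == c; }

    void expr() {
        term();
        while (at('+') || at('-')) { ++pos; term(); }
    }

    void term() {
        atom();
        while (at('*') || at('/')) { ++pos; atom(); }
    }

    void atom() {
        if (at('(')) {
            ++pos;
            expr();
            if (at(')')) {
                ++pos;
            } else {
                // Single-token insertion: report the error, then carry on
                // as if the ')' had been present.
                std::string found =
                    pos < src.size() ? std::string(1, src[pos]) : "\\n";
                errors.push_back("line " + std::to_string(line) + ":" +
                                 std::to_string(pos) + " missing ')' at '" +
                                 found + "'");
            }
            return;
        }
        // INT or ID: consume a run of letters and digits.
        while (pos < src.size() &&
               std::isalnum(static_cast<unsigned char>(src[pos]))) {
            ++pos;
        }
    }
};
```

  Calling LineParser("(a+b*2", 4).parse() yields a single message in the same format as the one on the slide, while LineParser("(1+2)*3", 5).parse() yields none.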

  7. Resolving Ambiguities
  • Is f() a function call as a standalone statement, or a function call in an expression?

      stat: expr ';'
          | ID '(' ')' ';'
          ;

      expr: ID '(' ')'
          | INT
          ;

  Source: The Definitive ANTLR 4 Reference by Terence Parr, The Pragmatic Programmers, 2012

  8. Resolving Ambiguities, cont’d
  • Is begin a reserved word or an identifier?
  • Both rules below match the input begin. ANTLR resolves the ambiguity by choosing the first alternative in the grammar, so BEGIN must be listed before ID.

      BEGIN : 'begin' ;
      ID    : [a-z]+ ;
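  The tie-breaking rule can be illustrated with a small stand-alone C++ sketch. It is not ANTLR code: Rule, matchBegin, matchId, and classify are hypothetical names. It models the lexer policy that the longest match wins and that, on a tie, the rule listed first in the grammar wins.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// A hypothetical lexer rule: a name plus a matcher that returns how many
// leading characters of the input the rule can match (0 means no match).
struct Rule {
    std::string name;
    std::function<std::size_t(const std::string &)> match;
};

// Models  BEGIN : 'begin' ;
std::size_t matchBegin(const std::string &s) {
    const std::string kw = "begin";
    return s.compare(0, kw.size(), kw) == 0 ? kw.size() : 0;
}

// Models  ID : [a-z]+ ;
std::size_t matchId(const std::string &s) {
    std::size_t n = 0;
    while (n < s.size() && s[n] >= 'a' && s[n] <= 'z') ++n;
    return n;
}

// Pick the rule with the longest match; on a tie, keep the earlier rule.
std::string classify(const std::string &input) {
    // Rule order mirrors the grammar: BEGIN is listed before ID.
    const std::vector<Rule> rules = {{"BEGIN", matchBegin}, {"ID", matchId}};
    std::string best = "<no match>";
    std::size_t bestLen = 0;
    for (const Rule &r : rules) {
        std::size_t len = r.match(input);
        if (len > bestLen) {  // strictly greater, so ties go to the first rule
            bestLen = len;
            best = r.name;
        }
    }
    return best;
}
```

  With this ordering, classify("begin") returns "BEGIN" because both rules match five characters and BEGIN comes first, while classify("beginner") returns "ID" because the ID rule matches more characters.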

  9. ANTLR Parse Trees
  • A token stream is the “pipe” between the lexer and the parser.
  • Each token object records the start and stop character indexes into the character stream.
  [Figure. Source: The Definitive ANTLR 4 Reference by Terence Parr, The Pragmatic Programmers, 2012]
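  To make the start and stop indexes concrete, here is a minimal C++ sketch of a whitespace-splitting tokenizer. It is not the ANTLR runtime: Tok and tokenize are hypothetical names, and the sketch assumes that the stop index is inclusive (the index of the token's last character).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token record: the text plus inclusive character indexes
// into the original input.
struct Tok {
    std::string text;
    std::size_t start;  // index of the first character
    std::size_t stop;   // index of the last character (inclusive)
};

// Split the input on whitespace, recording where each token begins and ends.
std::vector<Tok> tokenize(const std::string &input) {
    std::vector<Tok> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        if (std::isspace(static_cast<unsigned char>(input[i]))) {
            ++i;  // whitespace is skipped and produces no token
            continue;
        }
        std::size_t start = i;
        while (i < input.size() &&
               !std::isspace(static_cast<unsigned char>(input[i]))) {
            ++i;
        }
        tokens.push_back({input.substr(start, i - start), start, i - 1});
    }
    return tokens;
}
```

  For the input "alpha := 9", the three tokens cover the index ranges 0 to 4, 6 to 7, and 9 to 9. Because each token keeps only indexes, the original text can always be recovered from the character stream.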

  10. ANTLR Parse Trees, cont’d
  • ANTLR generates a RuleNode subclass for each grammar rule.
  • They are called context objects because they record everything about the recognition phase of a rule.
  [Figure. Source: The Definitive ANTLR 4 Reference by Terence Parr, The Pragmatic Programmers, 2012]

  11. ANTLR Parse Trees, cont’d
  • The ANTLR-generated parser has corresponding parse tree node class names.
  [Figure. Source: The Definitive ANTLR 4 Reference by Terence Parr, The Pragmatic Programmers, 2012]

  12. ANTLR Pcl
  • Pcl, a tiny subset of Pascal.
  • Use ANTLR to generate a Pcl parser and lexer and integrate them with our Pascal interpreter’s symbol table code.
      • ANTLR doesn’t do symbol tables.
  • Parse a Pcl program and print the symbol table.
  • Sample program sample.pas:

      PROGRAM sample;

      VAR
          i, j : integer;
          alpha, beta5x : real;

      BEGIN
          REPEAT
              j := 3;
              i := 2 + 3*j
          UNTIL i >= j + 2;

          IF i <= j THEN i := j;

          IF j > i THEN i := 3*j
          ELSE BEGIN
              alpha := 9;
              beta5x := alpha/3 - alpha*2;
          END
      END.
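  Since ANTLR leaves symbol tables to you, the sketch below shows roughly what the declarations in sample.pas would put into one. It is a hypothetical stand-in, not the course's Pascal interpreter code: SymTab, enter, lookup, toString, and declare are illustrative names, and types are stored as plain strings.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal symbol table: variable name -> type name.
class SymTab {
public:
    // Enter a variable with its type name; returns false if already declared.
    bool enter(const std::string &name, const std::string &type) {
        return entries.emplace(name, type).second;
    }

    // Return the type name of a declared variable, or "" if undeclared.
    std::string lookup(const std::string &name) const {
        auto it = entries.find(name);
        return it == entries.end() ? "" : it->second;
    }

    // One "name : type" line per entry, sorted by name.
    std::string toString() const {
        std::string out;
        for (const auto &e : entries) {
            out += e.first + " : " + e.second + "\n";
        }
        return out;
    }

private:
    std::map<std::string, std::string> entries;
};

// Enter every name of one declaration such as "i, j : integer".
void declare(SymTab &tab, const std::vector<std::string> &names,
             const std::string &type) {
    for (const auto &n : names) {
        tab.enter(n, type);
    }
}
```

  Calling declare(tab, {"i", "j"}, "integer") and declare(tab, {"alpha", "beta5x"}, "real") mirrors the VAR section of the sample program. In a real compiler these calls would be driven by the parse tree rather than written by hand.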

  13. Pcl.g4

      grammar Pcl;  // A tiny subset of Pascal

      program : header block '.' ;
      header  : PROGRAM IDENTIFIER ';' ;
      block   : declarations compound_stmt ;

      declarations : VAR decl_list ';' ;
      decl_list    : decl ( ';' decl )* ;
      decl         : var_list ':' type_id ;
      var_list     : var_id ( ',' var_id )* ;
      var_id       : IDENTIFIER ;
      type_id      : IDENTIFIER ;

      compound_stmt : BEGIN stmt_list END ;

      stmt : compound_stmt    # compoundStmt
           | assignment_stmt  # assignmentStmt
           | repeat_stmt      # repeatStmt
           | if_stmt          # ifStmt
           |                  # emptyStmt
           ;

  14. Pcl.g4, cont’d

      stmt_list       : stmt ( ';' stmt )* ;
      assignment_stmt : variable ':=' expr ;
      repeat_stmt     : REPEAT stmt_list UNTIL expr ;
      if_stmt         : IF expr THEN stmt ( ELSE stmt )? ;

      variable : IDENTIFIER ;

      expr : expr mul_div_op expr     # mulDivExpr
           | expr add_sub_op expr     # addSubExpr
           | expr rel_op expr         # relExpr
           | number                   # numberConst
           | IDENTIFIER               # identifier
           | '(' expr ')'             # parens
           ;

      number : sign? INTEGER ;
      sign   : '+' | '-' ;

      mul_div_op : MUL_OP | DIV_OP ;
      add_sub_op : ADD_OP | SUB_OP ;
      rel_op     : EQ_OP | NE_OP | LT_OP | LE_OP | GT_OP | GE_OP ;
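  In the expr rule, the order of the alternatives matters: ANTLR 4 gives alternatives listed earlier in a left-recursive rule higher precedence, so multiplication and division bind tighter than addition and subtraction. The following C++ sketch shows the same effect with a hand-written precedence-climbing evaluator. It is not ANTLR output: the Evaluator class is hypothetical, it covers only the arithmetic operators, it uses integer arithmetic, and it treats undeclared names and division by zero as 0 purely to stay short.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical evaluator. '*' and '/' get a higher binding level than
// '+' and '-', mirroring the order of the expr alternatives.
class Evaluator {
public:
    Evaluator(const std::string &text, const std::map<std::string, int> &variables)
        : src(text), vars(variables), pos(0) {}

    int evaluate() { return parse(1); }

private:
    std::string src;
    std::map<std::string, int> vars;
    std::size_t pos;

    void skipSpaces() {
        while (pos < src.size() &&
               std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
    }

    // Binding level of an operator; 0 means "not a binary operator".
    static int level(char op) {
        if (op == '*' || op == '/') return 2;
        if (op == '+' || op == '-') return 1;
        return 0;
    }

    // INTEGER | IDENTIFIER | '(' expr ')'
    int primary() {
        skipSpaces();
        if (pos < src.size() && src[pos] == '(') {
            ++pos;
            int v = parse(1);
            skipSpaces();
            if (pos < src.size() && src[pos] == ')') ++pos;
            return v;
        }
        std::size_t start = pos;
        while (pos < src.size() &&
               std::isalnum(static_cast<unsigned char>(src[pos]))) ++pos;
        std::string word = src.substr(start, pos - start);
        if (!word.empty() && std::isdigit(static_cast<unsigned char>(word[0]))) {
            return std::stoi(word);
        }
        auto it = vars.find(word);
        return it == vars.end() ? 0 : it->second;  // assumption: unknown name is 0
    }

    // Precedence climbing: consume operators whose level is at least minLevel.
    int parse(int minLevel) {
        int left = primary();
        for (;;) {
            skipSpaces();
            if (pos >= src.size()) return left;
            char op = src[pos];
            int lv = level(op);
            if (lv == 0 || lv < minLevel) return left;
            ++pos;
            int right = parse(lv + 1);  // lv + 1 keeps operators left-associative
            switch (op) {
                case '*': left *= right; break;
                case '/': left = (right != 0) ? left / right : 0; break;
                case '+': left += right; break;
                default:  left -= right; break;
            }
        }
    }
};
```

  With j bound to 3, the expression 2 + 3*j from sample.pas evaluates to 11 rather than 15, because the multiplication is grouped first.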

  15. Pcl.g4, cont’d

      PROGRAM : 'PROGRAM' ;
      BEGIN   : 'BEGIN' ;
      END     : 'END' ;
      VAR     : 'VAR' ;
      REPEAT  : 'REPEAT' ;
      UNTIL   : 'UNTIL' ;
      IF      : 'IF' ;
      THEN    : 'THEN' ;
      ELSE    : 'ELSE' ;

      IDENTIFIER : [a-zA-Z][a-zA-Z0-9]* ;
      INTEGER    : [0-9]+ ;

      MUL_OP : '*' ;
      DIV_OP : '/' ;
      ADD_OP : '+' ;
      SUB_OP : '-' ;

      EQ_OP : '=' ;
      NE_OP : '<>' ;
      LT_OP : '<' ;
      LE_OP : '<=' ;
      GT_OP : '>' ;
      GE_OP : '>=' ;

      NEWLINE : '\r'? '\n' -> skip ;
      WS      : [ \t]+ -> skip ;

  16-20. Pcl Syntax Diagrams
  [Figures: syntax diagrams for the Pcl grammar, spanning five slides]

  21. Assignment #6
  • Write the first draft of the ANTLR 4 grammar file for your source language.
  • Use the Eclipse ANTLR plugin.
      • Generate a syntax diagram from the grammar.
      • Generate a parse tree from the source program.
      • Generate the parser and lexer.
  • For the External Tool Configuration, use:

      -no-listener -visitor -encoding UTF-8 -Dlanguage=Cpp

  • Compile a sample source program.
  • Due: Friday, November 1.

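  22. Starter Main Program for Assignment #6

      #include <iostream>
      #include <fstream>

      #include "antlr4-runtime.h"
      #include "PclLexer.h"
      #include "PclParser.h"
      #include "PclBaseVisitor.h"

      using namespace std;
      using namespace antlrcpp;
      using namespace antlr4;

      int main(int argc, const char *args[])
      {
          // The source file path is expected as the first argument.
          if (argc < 2)
          {
              cerr << "Usage: " << args[0] << " <source file>" << endl;
              return 1;
          }

          ifstream ins;
          ins.open(args[1]);
          if (!ins.is_open())
          {
              cerr << "Cannot open " << args[1] << endl;
              return 1;
          }

          // The lexer reads characters; the token stream feeds the parser.
          ANTLRInputStream input(ins);
          PclLexer lexer(&input);
          CommonTokenStream tokens(&lexer);

          // Parse starting from the grammar's program rule.
          PclParser parser(&tokens);
          tree::ParseTree *tree = parser.program();

          // Walk the parse tree with the generated base visitor.
          PclBaseVisitor compiler;
          compiler.visit(tree);

          return 0;
      }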

