
CMPE 152: Compiler Design October 22 Class Meeting




  1. CMPE 152: Compiler Design, October 22 Class Meeting. Department of Computer Engineering, San Jose State University, Fall 2019. Instructor: Ron Mak, www.cs.sjsu.edu/~mak

  2. ANTLR 4 Review
     • Feed ANTLR a .g4 grammar file.
     • ANTLR generates (in Java or C++):
         • a parser
         • a lexer (scanner)
         • parse tree utilities
     • Therefore, for your compiler projects, you don’t have to write that code.
     • You must have a correct grammar file.

  3. ANTLR 4 Plugin for Eclipse
     • If you use the ANTLR 4 plugin, Eclipse will automatically generate a syntax diagram from the grammar.
     • See the tutorials at: http://www.cs.sjsu.edu/~mak/tutorials/index.html
     • Especially: http://www.cs.sjsu.edu/~mak/tutorials/InstallANTLR4Cpp.pdf
     • The plugin will generate a parse tree from a source program, according to the grammar.

  4. ANTLR 4 Plugin for Eclipse, cont’d
     • Do an External Tools Configuration to specify -Dlanguage="Cpp" to generate a parser and a lexer written in C++.
         • Otherwise, the default is Java.
     • Create a standard Eclipse C++ project and put the grammar file in it.
     • Right-click the grammar file and select Run As > Generate ANTLR Recognizer.
     • Eclipse can also regenerate automatically whenever the grammar file changes.

  5. ANTLR Workflow
     [Figure: ANTLR workflow, from The Definitive ANTLR 4 Reference by Terence Parr, The Pragmatic Programmers, 2012]

  6. Syntax Error Handling
     • An ANTLR-generated parser has basic syntax error handling and recovery.
     • You can improve the error handling.

     Input:
         193
         a = 5
         b = 6
         (a+b*2
         (1+2)*3

     Parse tree (Lisp format):
         (prog
           (stat (expr 193) \n)
           (stat a = (expr 5) \n)
           (stat b = (expr 6) \n)
           (stat (expr ( (expr (expr a) + (expr (expr b) * (expr 2))) <missing ')'>) \n)
           (stat (expr (expr ( (expr (expr 1) + (expr 2)) )) * (expr 3)) \n))

     Error message:
         line 4:6 missing ')' at '\n'

  7. Resolving Ambiguities
     • Is f() a function call as a standalone statement, or a function call in an expression?

         stat: expr ';'
             | ID '(' ')' ';'
             ;
         expr: ID '(' ')'
             | INT
             ;

     (The Definitive ANTLR 4 Reference by Terence Parr, The Pragmatic Programmers, 2012)

  8. Resolving Ambiguities, cont’d
     • Is begin a reserved word or an identifier?
     • ANTLR resolves an ambiguity by choosing the first alternative in the grammar.

         BEGIN : 'begin' ;
         ID    : [a-z]+ ;
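The tie-breaking rule can be sketched outside ANTLR. The C++ below is a hand-written illustration, not ANTLR's generated lexer: `Rule`, `classify`, and `pclLikeRules` are names invented for this example. It tries each rule at the start of the input, prefers the longest match, and breaks ties in favor of the rule listed first, which is how `begin` ends up as BEGIN rather than ID.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for a lexer rule: a token name plus a pattern.
struct Rule {
    std::string name;
    std::regex  pattern;
};

// Return the name of the rule that wins for the text at the start of
// `input`: the longest match wins, and on a tie the earlier rule wins.
std::string classify(const std::vector<Rule>& rules, const std::string& input) {
    std::string winner = "<no match>";
    std::size_t best = 0;
    for (const Rule& rule : rules) {
        std::smatch m;
        // match_continuous anchors the match at the first character.
        if (std::regex_search(input, m, rule.pattern,
                              std::regex_constants::match_continuous)) {
            std::size_t len = static_cast<std::size_t>(m.length(0));
            if (len > best) {   // strictly longer only, so earlier rules win ties
                best = len;
                winner = rule.name;
            }
        }
    }
    return winner;
}

// Rules in the same order as the grammar fragment: BEGIN before ID.
std::vector<Rule> pclLikeRules() {
    return { {"BEGIN", std::regex("begin")},
             {"ID",    std::regex("[a-z]+")} };
}
```

With these rules, `classify(pclLikeRules(), "begin")` gives BEGIN because both rules match five characters and BEGIN is listed first, while `"beginner"` gives ID because ID matches more characters. Listing ID first would turn every `begin` into an identifier.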

  9. ANTLR Parse Trees
     • A token stream is the “pipe” between the lexer and the parser.
     • Each token object records the start and stop character indexes into the character stream.
     (The Definitive ANTLR 4 Reference by Terence Parr, The Pragmatic Programmers, 2012)
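As a rough picture of what such a token records, here is a minimal sketch in plain C++. It does not use the ANTLR runtime: `Tok` and `tokenize` are hypothetical names, the tokenizer only splits on whitespace, and the stop index is treated as inclusive (the index of the token's last character).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token record: the text plus inclusive start/stop
// indexes into the original character stream.
struct Tok {
    std::string text;
    std::size_t start;
    std::size_t stop;
};

// Split `chars` into whitespace-separated tokens, remembering where
// each one came from in the input.
std::vector<Tok> tokenize(const std::string& chars) {
    std::vector<Tok> tokens;
    std::size_t i = 0;
    while (i < chars.size()) {
        if (std::isspace(static_cast<unsigned char>(chars[i]))) {
            ++i;                       // skip blanks between tokens
            continue;
        }
        std::size_t start = i;
        while (i < chars.size() &&
               !std::isspace(static_cast<unsigned char>(chars[i]))) {
            ++i;                       // extend to the end of the token
        }
        tokens.push_back({chars.substr(start, i - start), start, i - 1});
    }
    return tokens;
}
```

For the input `sum = 42`, this yields three tokens: `sum` at 0..2, `=` at 4..4, and `42` at 6..7. Because the indexes point back into the character stream, an error message or a later tool can recover the exact source text of any token.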

  10. ANTLR Parse Trees, cont’d
     • ANTLR generates a RuleNode subclass for each grammar rule.
     • They are called context objects because they record everything about the recognition phase of a rule.
     (The Definitive ANTLR 4 Reference by Terence Parr, The Pragmatic Programmers, 2012)

  11. ANTLR Parse Trees, cont’d
     • The ANTLR-generated parser has corresponding parse tree node class names.
     (The Definitive ANTLR 4 Reference by Terence Parr, The Pragmatic Programmers, 2012)
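To make the tree shape concrete, the following sketch builds a tiny parse tree by hand and prints it in the Lisp format used on the syntax error slide. It is an illustration only: `Node`, `toLisp`, and `sampleStat` are hypothetical names, and a single generic node type stands in for the per-rule context classes that ANTLR generates.

```cpp
#include <string>
#include <vector>

// Hypothetical parse tree node. Interior nodes carry a rule name and
// children; leaves carry token text and have no children.
struct Node {
    std::string label;
    std::vector<Node> children;
};

// Render a tree in Lisp format: "(rule child child ...)" for interior
// nodes, plain token text for leaves.
std::string toLisp(const Node& node) {
    if (node.children.empty()) {
        return node.label;
    }
    std::string out = "(" + node.label;
    for (const Node& child : node.children) {
        out += " " + toLisp(child);
    }
    return out + ")";
}

// Hand-built tree for the statement "a = 5" followed by a newline,
// where the newline token is shown as the two characters \n.
Node sampleStat() {
    return Node{"stat", {
        Node{"a", {}},
        Node{"=", {}},
        Node{"expr", { Node{"5", {}} }},
        Node{"\\n", {}}
    }};
}
```

Calling `toLisp(sampleStat())` produces `(stat a = (expr 5) \n)`, the same form as the second `stat` subtree in the error handling example.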

  12. ANTLR Pcl
     • Pcl, a tiny subset of Pascal.
     • Use ANTLR to generate a Pcl parser and lexer and integrate them with our Pascal interpreter’s symbol table code.
         • ANTLR doesn’t do symbol tables.
     • Parse a Pcl program and print the symbol table.
     • Sample program sample.pas:

         PROGRAM sample;

         VAR
             i, j : integer;
             alpha, beta5x : real;

         BEGIN
             REPEAT
                 j := 3;
                 i := 2 + 3*j
             UNTIL i >= j + 2;

             IF i <= j THEN i := j;

             IF j > i THEN i := 3*j
             ELSE BEGIN
                 alpha := 9;
                 beta5x := alpha/3 - alpha*2;
             END
         END.
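Since ANTLR leaves symbol tables to the compiler writer, it may help to see how little is needed for the sample program's declarations. The sketch below is not the course's Pascal interpreter symbol table: `SymTab`, `declare`, `typeOf`, `dump`, and `sampleSymTab` are hypothetical names, there is only one scope, and names are stored as written (a real Pascal front end would also fold case).

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical single-scope symbol table: variable name -> type name.
// std::map keeps the entries sorted by name, which makes printing easy.
class SymTab {
public:
    // Enter every name from one declaration, e.g. "i, j : integer".
    // Returns false if any of the names was already declared.
    bool declare(const std::vector<std::string>& names, const std::string& type) {
        bool ok = true;
        for (const std::string& name : names) {
            // emplace does not overwrite an existing entry.
            ok = entries.emplace(name, type).second && ok;
        }
        return ok;
    }

    // Type of a declared name, or an empty string if it is undeclared.
    std::string typeOf(const std::string& name) const {
        auto it = entries.find(name);
        return it == entries.end() ? "" : it->second;
    }

    // One "name : type" line per entry, in name order.
    std::string dump() const {
        std::ostringstream out;
        for (const auto& entry : entries) {
            out << entry.first << " : " << entry.second << "\n";
        }
        return out.str();
    }

private:
    std::map<std::string, std::string> entries;
};

// The VAR section of sample.pas, entered by hand.
SymTab sampleSymTab() {
    SymTab table;
    table.declare({"i", "j"}, "integer");
    table.declare({"alpha", "beta5x"}, "real");
    return table;
}
```

In a real project, calls like `declare` would presumably be made while walking the `decl` subtrees of the parse tree rather than by hand. For the sample, `dump()` lists `alpha : real`, `beta5x : real`, `i : integer`, and `j : integer`, one per line.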

  13. Pcl.g4

         grammar Pcl;  // A tiny subset of Pascal

         program : header block '.' ;
         header  : PROGRAM IDENTIFIER ';' ;
         block   : declarations compound_stmt ;

         declarations : VAR decl_list ';' ;
         decl_list    : decl ( ';' decl )* ;
         decl         : var_list ':' type_id ;
         var_list     : var_id ( ',' var_id )* ;
         var_id       : IDENTIFIER ;
         type_id      : IDENTIFIER ;

         compound_stmt : BEGIN stmt_list END ;

         stmt : compound_stmt    # compoundStmt
              | assignment_stmt  # assignmentStmt
              | repeat_stmt      # repeatStmt
              | if_stmt          # ifStmt
              |                  # emptyStmt
              ;

  14. Pcl.g4, cont’d

         stmt_list       : stmt ( ';' stmt )* ;
         assignment_stmt : variable ':=' expr ;
         repeat_stmt     : REPEAT stmt_list UNTIL expr ;
         if_stmt         : IF expr THEN stmt ( ELSE stmt )? ;

         variable : IDENTIFIER ;

         expr : expr mul_div_op expr     # mulDivExpr
              | expr add_sub_op expr     # addSubExpr
              | expr rel_op expr         # relExpr
              | number                   # numberConst
              | IDENTIFIER               # identifier
              | '(' expr ')'             # parens
              ;

         number : sign? INTEGER ;
         sign   : '+' | '-' ;

         mul_div_op : MUL_OP | DIV_OP ;
         add_sub_op : ADD_OP | SUB_OP ;
         rel_op     : EQ_OP | NE_OP | LT_OP | LE_OP | GT_OP | GE_OP ;
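The order of the alternatives in `expr` matters: in a left-recursive ANTLR 4 rule, earlier alternatives bind more tightly, so `mul_div_op` beats `add_sub_op`, which beats `rel_op`, and binary operators associate to the left by default. The sketch below mimics that behavior with a hand-written precedence-climbing evaluator. It is not ANTLR-generated code: `ExprEval` and `evalExpr` are hypothetical names, only unsigned integer literals and parentheses are supported (no identifiers or signs), `/` is integer division, and malformed input is not reported.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hand-written precedence-climbing evaluator for integer expressions.
class ExprEval {
public:
    explicit ExprEval(const std::string& text) : src(text), pos(0) {}
    long evaluate() { return parse(1); }

private:
    std::string src;
    std::size_t pos;

    void skipBlanks() {
        while (pos < src.size() && src[pos] == ' ') ++pos;
    }

    // Look at the next operator without consuming it; "" if there is none.
    std::string peekOp() {
        skipBlanks();
        if (pos >= src.size()) return "";
        std::string two = src.substr(pos, 2);
        if (two == "<=" || two == ">=" || two == "<>") return two;
        char c = src[pos];
        if (std::string("*/+-<>=").find(c) != std::string::npos) {
            return std::string(1, c);
        }
        return "";
    }

    // Binding strength mirrors the alternative order in the expr rule.
    static int level(const std::string& op) {
        if (op == "*" || op == "/") return 3;   // mulDivExpr
        if (op == "+" || op == "-") return 2;   // addSubExpr
        return 1;                               // relExpr
    }

    static long apply(const std::string& op, long a, long b) {
        if (op == "*")  return a * b;
        if (op == "/")  return b != 0 ? a / b : 0;  // yields 0 on divide-by-zero
        if (op == "+")  return a + b;
        if (op == "-")  return a - b;
        if (op == "<")  return a < b;
        if (op == "<=") return a <= b;
        if (op == ">")  return a > b;
        if (op == ">=") return a >= b;
        if (op == "=")  return a == b;
        return a != b;                              // "<>"
    }

    // An integer literal or a parenthesized expression.
    long primary() {
        skipBlanks();
        if (pos < src.size() && src[pos] == '(') {
            ++pos;
            long value = parse(1);
            skipBlanks();
            if (pos < src.size() && src[pos] == ')') ++pos;
            return value;
        }
        long value = 0;
        while (pos < src.size() &&
               std::isdigit(static_cast<unsigned char>(src[pos]))) {
            value = value * 10 + (src[pos++] - '0');
        }
        return value;
    }

    // Parse operators whose level is at least minLevel.
    long parse(int minLevel) {
        long left = primary();
        for (;;) {
            std::string op = peekOp();
            if (op.empty() || level(op) < minLevel) return left;
            pos += op.size();
            // level + 1 on the right side makes operators left-associative.
            long right = parse(level(op) + 1);
            left = apply(op, left, right);
        }
    }
};

long evalExpr(const std::string& text) {
    return ExprEval(text).evaluate();
}
```

For example, `evalExpr("2 + 3*3")` is 11 rather than 15, `evalExpr("10 - 4 - 3")` is 3 (left-associative), and `evalExpr("11 >= 3 + 2")` is 1, which corresponds to the `UNTIL i >= j + 2` test in sample.pas after the first loop iteration.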

  15. Pcl.g4, cont’d

         PROGRAM : 'PROGRAM' ;
         BEGIN   : 'BEGIN' ;
         END     : 'END' ;
         VAR     : 'VAR' ;
         REPEAT  : 'REPEAT' ;
         UNTIL   : 'UNTIL' ;
         IF      : 'IF' ;
         THEN    : 'THEN' ;
         ELSE    : 'ELSE' ;

         IDENTIFIER : [a-zA-Z][a-zA-Z0-9]* ;
         INTEGER    : [0-9]+ ;

         MUL_OP : '*' ;
         DIV_OP : '/' ;
         ADD_OP : '+' ;
         SUB_OP : '-' ;

         EQ_OP : '=' ;
         NE_OP : '<>' ;
         LT_OP : '<' ;
         LE_OP : '<=' ;
         GT_OP : '>' ;
         GE_OP : '>=' ;

         NEWLINE : '\r'? '\n' -> skip ;
         WS      : [ \t]+ -> skip ;

  16. Pcl Syntax Diagrams

  17. Pcl Syntax Diagrams, cont’d

  18. Pcl Syntax Diagrams, cont’d

  19. Pcl Syntax Diagrams, cont’d

  20. Pcl Syntax Diagrams, cont’d

  21. Assignment #6
     • Write the first draft of the ANTLR 4 grammar file for your source language.
     • Use the Eclipse ANTLR plugin.
     • Generate a syntax diagram from the grammar.
     • Generate a parse tree from the source program.
     • Generate the parser and lexer.
     • For the External Tool Configuration, use:

         -no-listener -visitor -encoding UTF-8 -Dlanguage=Cpp

     • Compile a sample source program.
     • Due: Friday, November 1.

  22. Starter Main Program for Assignment #6

         #include <iostream>
         #include <fstream>

         #include "antlr4-runtime.h"
         #include "PclLexer.h"
         #include "PclParser.h"
         #include "PclBaseVisitor.h"

         using namespace std;
         using namespace antlrcpp;
         using namespace antlr4;

         int main(int argc, const char *args[])
         {
             // Open the source file named by the first command-line argument.
             ifstream ins;
             ins.open(args[1]);

             // Character stream -> lexer -> token stream -> parser.
             ANTLRInputStream input(ins);
             PclLexer lexer(&input);
             CommonTokenStream tokens(&lexer);
             PclParser parser(&tokens);

             // Parse starting at the grammar's top-level rule.
             tree::ParseTree *tree = parser.program();

             // Walk the parse tree with the generated base visitor.
             PclBaseVisitor compiler;
             compiler.visit(tree);

             return 0;
         }
