Learn how to use the ANTLR 4 plugin in Eclipse for automated syntax diagram generation, error handling enhancements, and resolving ambiguities in compiler projects. Comprehensive guide with examples.
CMPE 152: Compiler Design
October 22 Class Meeting
Department of Computer Engineering
San Jose State University
Fall 2019
Instructor: Ron Mak
www.cs.sjsu.edu/~mak

ANTLR 4 Review
• Feed ANTLR a .g4 grammar file.
• ANTLR generates (in Java or C++):
  • a parser
  • a lexer (scanner)
  • parse tree utilities
• Therefore, for your compiler projects, you don’t have to write that code.
• You must have a correct grammar file.

ANTLR 4 Plugin for Eclipse
• If you use the ANTLR 4 plugin, Eclipse will automatically generate a syntax diagram from the grammar.
  • See the tutorials at: http://www.cs.sjsu.edu/~mak/tutorials/index.html
  • Especially: http://www.cs.sjsu.edu/~mak/tutorials/InstallANTLR4Cpp.pdf
• The plugin will generate a parse tree from a source program, according to the grammar.

ANTLR 4 Plugin for Eclipse, cont’d
• Do an External Tools Configuration to specify -Dlanguage="Cpp" to generate a parser and a lexer written in C++.
  • Otherwise, the default is Java.
• Create a standard Eclipse C++ project and put the grammar file in it.
• Right-click the grammar file and select Run As > Generate ANTLR Recognizer.
• Eclipse can also regenerate the parser and lexer automatically whenever the grammar file changes.

ANTLR Workflow
[Figure: ANTLR workflow diagram]
Source: The Definitive ANTLR 4 Reference by Terence Parr, The Pragmatic Programmers, 2012

Syntax Error Handling
• An ANTLR-generated parser has basic syntax error handling and recovery.
• You can improve the error handling.

Input:
    193
    a = 5
    b = 6
    (a+b*2
    (1+2)*3

Parse tree (Lisp format):
    (prog
      (stat (expr 193) \n)
      (stat a = (expr 5) \n)
      (stat b = (expr 6) \n)
      (stat (expr ( (expr (expr a) + (expr (expr b) * (expr 2))) <missing ')'>) \n)
      (stat (expr (expr ( (expr (expr 1) + (expr 2)) )) * (expr 3)) \n))

Error message:
    line 4:6 missing ')' at '\n'

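The "missing ')'" message on this slide reflects a recovery idea often called single-token insertion: when an expected token is absent, the parser reports it and continues as if the token had been there. The following is a minimal C++ sketch of that idea, not ANTLR's own implementation. The names LineParser and checkLine are hypothetical, and the mini-grammar is an assumption that covers only +, *, parentheses, and alphanumeric operands with no whitespace.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mini-parser for one expression line (illustration only).
class LineParser {
public:
    LineParser(std::string text, int line) : text_(std::move(text)), line_(line) {}

    // Parses the line and returns the error messages that were collected.
    std::vector<std::string> parse() {
        expr();
        return errors_;
    }

private:
    std::string text_;
    int line_;
    std::size_t pos_ = 0;
    std::vector<std::string> errors_;

    // Current character, or '\n' once the end of the line is reached.
    char peek() const { return pos_ < text_.size() ? text_[pos_] : '\n'; }

    // expr : term ('+' term)*
    void expr() {
        term();
        while (peek() == '+') { ++pos_; term(); }
    }

    // term : atom ('*' atom)*
    void term() {
        atom();
        while (peek() == '*') { ++pos_; atom(); }
    }

    // atom : '(' expr ')' | INT | ID
    void atom() {
        if (peek() == '(') {
            ++pos_;
            expr();
            expect(')');
        } else {
            while (std::isalnum(static_cast<unsigned char>(peek()))) ++pos_;
        }
    }

    // Single-token insertion: if the wanted token is absent, report it
    // and return without consuming input, as if it had been present.
    void expect(char wanted) {
        if (peek() == wanted) { ++pos_; return; }
        std::string seen = (peek() == '\n') ? "\\n" : std::string(1, peek());
        errors_.push_back("line " + std::to_string(line_) + ":" +
                          std::to_string(pos_) + " missing '" +
                          std::string(1, wanted) + "' at '" + seen + "'");
    }
};

// Convenience wrapper: check one line of input with a given line number.
std::vector<std::string> checkLine(const std::string& text, int line) {
    return LineParser(text, line).parse();
}
```

Calling checkLine("(a+b*2", 4) yields one message in the same shape as the slide's output, while checkLine("(1+2)*3", 5) yields none.
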
Resolving Ambiguities
• Is f() a function call as a standalone statement, or a function call in an expression?

    stat: expr ';'
        | ID '(' ')' ';'
        ;
    expr: ID '(' ')'
        | INT
        ;

Source: The Definitive ANTLR 4 Reference by Terence Parr, The Pragmatic Programmers, 2012

Resolving Ambiguities, cont’d
• Is begin a reserved word or an identifier?

    BEGIN : 'begin' ;
    ID    : [a-z]+ ;

• ANTLR resolves an ambiguity by choosing the first alternative in the grammar. Here both lexer rules match the input begin, so the rule listed first, BEGIN, wins.

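The begin example can be sketched in C++ as a tiny rule table. This is a hedged illustration, not ANTLR-generated code: the LexerRule struct and classify function are hypothetical names, and std::regex stands in for the generated lexer. The sketch assumes the usual lexer policy of taking the longest match and breaking ties in favor of the rule listed first.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for a lexer rule: a name plus a pattern.
struct LexerRule {
    std::string name;
    std::regex pattern;
};

// Returns the name of the rule chosen for the text at the start of input,
// or an empty string if no rule matches.
std::string classify(const std::string& input) {
    // Rule order mirrors the grammar on the slide: BEGIN before ID.
    static const std::vector<LexerRule> rules = {
        {"BEGIN", std::regex("begin")},
        {"ID", std::regex("[a-z]+")},
    };

    std::string best;
    std::size_t bestLength = 0;
    for (const auto& rule : rules) {
        std::smatch match;
        // match_continuous anchors the match at the start of the input.
        if (std::regex_search(input, match, rule.pattern,
                              std::regex_constants::match_continuous)) {
            std::size_t length = static_cast<std::size_t>(match.length(0));
            // Strictly longer wins; on a tie the earlier rule is kept.
            if (length > bestLength) {
                best = rule.name;
                bestLength = length;
            }
        }
    }
    return best;
}
```

With this ordering, classify("begin") picks BEGIN because of the tie-break, while classify("beginning") picks ID because the longer match wins. Swapping the two rules in the table would make begin an ordinary identifier.
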
ANTLR Parse Trees
• A token stream is the “pipe” between the lexer and the parser.
• Each token object records the start and stop character indexes into the character stream.
Source: The Definitive ANTLR 4 Reference by Terence Parr, The Pragmatic Programmers, 2012

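A rough C++ sketch of tokens that remember their position in the character stream follows. The Token struct and tokenize function are hypothetical and far simpler than ANTLR's token classes. The sketch assumes that alphanumeric runs form one token, that every other non-space character is its own token, and that the stop index is inclusive.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token: its text plus inclusive start/stop character indexes.
struct Token {
    std::string text;
    std::size_t start;
    std::size_t stop;
};

// Splits the character stream into tokens and records where each one sits.
std::vector<Token> tokenize(const std::string& chars) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < chars.size()) {
        unsigned char c = static_cast<unsigned char>(chars[i]);
        if (std::isspace(c)) {  // whitespace is skipped, not tokenized
            ++i;
            continue;
        }
        std::size_t start = i;
        if (std::isalnum(c)) {
            // Consume a whole run of letters and digits.
            while (i < chars.size() &&
                   std::isalnum(static_cast<unsigned char>(chars[i]))) {
                ++i;
            }
        } else {
            ++i;  // any other character is a one-character token
        }
        tokens.push_back({chars.substr(start, i - start), start, i - 1});
    }
    return tokens;
}
```

For the input "sp = 100;" the sketch produces four tokens, with 100 spanning indexes 5 through 7. Because each token keeps its indexes, a later phase can map it back to the exact source text.
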
ANTLR Parse Trees, cont’d
• ANTLR generates a RuleNode subclass for each grammar rule.
• They are called context objects because they record everything about the recognition phase of a rule.
Source: The Definitive ANTLR 4 Reference by Terence Parr, The Pragmatic Programmers, 2012

ANTLR Parse Trees, cont’d
• The ANTLR-generated parser has corresponding parse tree node class names.
Source: The Definitive ANTLR 4 Reference by Terence Parr, The Pragmatic Programmers, 2012

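To make the shape of a parse tree concrete, here is a small C++ sketch of rule nodes and token leaves, with a printer for the Lisp format used on the Syntax Error Handling slide. Node, leaf, rule, and toStringTree are hypothetical stand-ins, not the classes ANTLR generates. Real context objects carry much more information, such as their start and stop tokens.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical parse tree node: a rule node has children, a token leaf has none.
struct Node {
    std::string label;           // rule name, or token text for a leaf
    std::vector<Node> children;  // empty for token leaves
};

// Builds a leaf node that holds a token's text.
Node leaf(std::string text) {
    return Node{std::move(text), {}};
}

// Builds a rule node with the given children.
Node rule(std::string name, std::vector<Node> children) {
    return Node{std::move(name), std::move(children)};
}

// Renders the tree in Lisp format: (rule child child ...).
// A node without children is printed as its bare label.
std::string toStringTree(const Node& node) {
    if (node.children.empty()) {
        return node.label;
    }
    std::string out = "(" + node.label;
    for (const auto& child : node.children) {
        out += " " + toStringTree(child);
    }
    return out + ")";
}
```

Building rule("stat", {leaf("a"), leaf("="), rule("expr", {leaf("5")}), leaf("\\n")}) and printing it gives the same text as the second statement in the earlier parse tree listing.
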
ANTLR Pcl
• Pcl, a tiny subset of Pascal.
• Use ANTLR to generate a Pcl parser and lexer and integrate them with our Pascal interpreter’s symbol table code.
  • ANTLR doesn’t do symbol tables.
• Parse a Pcl program and print the symbol table.
• Sample program sample.pas:

    PROGRAM sample;

    VAR
        i, j : integer;
        alpha, beta5x : real;

    BEGIN
        REPEAT
            j := 3;
            i := 2 + 3*j
        UNTIL i >= j + 2;

        IF i <= j THEN i := j;

        IF j > i THEN i := 3*j
        ELSE BEGIN
            alpha := 9;
            beta5x := alpha/3 - alpha*2;
        END
    END.

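Since ANTLR leaves symbol tables to you, the declarations in sample.pas have to be entered by your own code. The sketch below is a deliberately simple C++ symbol table, not the course's Pascal interpreter code. The SymbolTable class and declareAll function are hypothetical, types are stored as plain strings, and names are not case-folded, although Pascal identifiers are case-insensitive.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical symbol table that maps a variable name to its type name.
class SymbolTable {
public:
    // Enters a name; returns false if the name was already declared.
    bool declare(const std::string& name, const std::string& type) {
        return entries_.emplace(name, type).second;
    }

    // Returns the type name, or an empty string if the name is undeclared.
    std::string lookup(const std::string& name) const {
        auto it = entries_.find(name);
        return it == entries_.end() ? std::string() : it->second;
    }

    // Formats the table one entry per line, sorted by name.
    std::string format() const {
        std::string out;
        for (const auto& entry : entries_) {
            out += entry.first + " : " + entry.second + "\n";
        }
        return out;
    }

private:
    std::map<std::string, std::string> entries_;
};

// Enters every variable of one declaration such as `i, j : integer`.
void declareAll(SymbolTable& table, const std::vector<std::string>& names,
                const std::string& type) {
    for (const auto& name : names) {
        table.declare(name, type);
    }
}
```

In a real compiler, a visitor over the decl subtrees would be the natural place to call something like declareAll, once per declaration.
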
Pcl.g4

    grammar Pcl; // A tiny subset of Pascal

    program : header block '.' ;
    header  : PROGRAM IDENTIFIER ';' ;
    block   : declarations compound_stmt ;

    declarations : VAR decl_list ';' ;
    decl_list    : decl ( ';' decl )* ;
    decl         : var_list ':' type_id ;
    var_list     : var_id ( ',' var_id )* ;
    var_id       : IDENTIFIER ;
    type_id      : IDENTIFIER ;

    compound_stmt : BEGIN stmt_list END ;

    stmt : compound_stmt    # compoundStmt
         | assignment_stmt  # assignmentStmt
         | repeat_stmt      # repeatStmt
         | if_stmt          # ifStmt
         |                  # emptyStmt
         ;

Pcl.g4, cont’d

    stmt_list       : stmt ( ';' stmt )* ;
    assignment_stmt : variable ':=' expr ;
    repeat_stmt     : REPEAT stmt_list UNTIL expr ;
    if_stmt         : IF expr THEN stmt ( ELSE stmt )? ;

    variable : IDENTIFIER ;

    expr : expr mul_div_op expr  # mulDivExpr
         | expr add_sub_op expr  # addSubExpr
         | expr rel_op expr      # relExpr
         | number                # numberConst
         | IDENTIFIER            # identifier
         | '(' expr ')'          # parens
         ;

    number : sign? INTEGER ;
    sign   : '+' | '-' ;

    mul_div_op : MUL_OP | DIV_OP ;
    add_sub_op : ADD_OP | SUB_OP ;
    rel_op     : EQ_OP | NE_OP | LT_OP | LE_OP | GT_OP | GE_OP ;

Pcl.g4, cont’d

    PROGRAM : 'PROGRAM' ;
    BEGIN   : 'BEGIN' ;
    END     : 'END' ;
    VAR     : 'VAR' ;
    REPEAT  : 'REPEAT' ;
    UNTIL   : 'UNTIL' ;
    IF      : 'IF' ;
    THEN    : 'THEN' ;
    ELSE    : 'ELSE' ;

    IDENTIFIER : [a-zA-Z][a-zA-Z0-9]* ;
    INTEGER    : [0-9]+ ;

    MUL_OP : '*' ;
    DIV_OP : '/' ;
    ADD_OP : '+' ;
    SUB_OP : '-' ;

    EQ_OP : '=' ;
    NE_OP : '<>' ;
    LT_OP : '<' ;
    LE_OP : '<=' ;
    GT_OP : '>' ;
    GE_OP : '>=' ;

    NEWLINE : '\r'? '\n' -> skip ;
    WS      : [ \t]+ -> skip ;

Assignment #6
• Write the first draft of the ANTLR 4 grammar file for your source language.
• Use the Eclipse ANTLR plugin.
  • Generate a syntax diagram from the grammar.
  • Generate a parse tree from the source program.
  • Generate the parser and lexer.
• For the External Tool Configuration, use:
    -no-listener -visitor -encoding UTF-8 -Dlanguage=Cpp
• Compile a sample source program.
• Due: Friday, November 1.

Starter Main Program for Assignment #6

    #include <iostream>
    #include <fstream>

    #include "antlr4-runtime.h"
    #include "PclLexer.h"
    #include "PclParser.h"
    #include "PclBaseVisitor.h"

    using namespace std;
    using namespace antlrcpp;
    using namespace antlr4;

    int main(int argc, const char *args[])
    {
        // Open the source file named by the first command-line argument.
        ifstream ins;
        ins.open(args[1]);

        // The lexer reads characters; the token stream feeds the parser.
        ANTLRInputStream input(ins);
        PclLexer lexer(&input);
        CommonTokenStream tokens(&lexer);

        // Parse the input, starting at the grammar's program rule.
        PclParser parser(&tokens);
        tree::ParseTree *tree = parser.program();

        // Walk the parse tree with the generated base visitor.
        PclBaseVisitor compiler;
        compiler.visit(tree);

        return 0;
    }