CS 3304 Comparative Languages • Lecture 8a: Using ANTLR • 9 February 2012
Introduction • Discuss how to use ANTLR. • Some material is taken from: • A book by Terence Parr, “The Definitive ANTLR Reference: Building Domain-Specific Languages”: http://www.pragprog.com/titles/tpantlr/ • An article by R. Mark Volkmann, “ANTLR 3”: http://jnb.ociweb.com/jnb/jnbJun2008.htm
Why ANTLR? • ANTLR is a parser generator used to implement language interpreters, compilers, and other translators. • Most often used to build translators and interpreters for domain-specific languages (DSLs). • DSLs are usually very high-level languages used for specific tasks and particularly effective in a specific domain. • DSLs provide a more natural, high-fidelity, robust, and maintainable means of encoding a problem compared to a general-purpose language.
Definitions • Lexer: converts a stream of characters to a stream of tokens (ANTLR token objects know their start/stop character stream index, line number, index within the line, and more). • Parser: processes a stream of tokens, possibly creating an AST. • Abstract Syntax Tree (AST): an intermediate tree representation of the parsed input that is simpler to process than the stream of tokens and can be efficiently processed multiple times. • Tree Parser: processes an AST. • StringTemplate: a library that supports using templates with placeholders for outputting text (e.g., Java source code).
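To make the lexer definition concrete, here is a minimal hand-written sketch in plain Java of what "a stream of characters to a stream of tokens" can look like for the token rules used later in this lecture (ID, INT, NEWLINE, WS). It is only an illustration: ANTLR generates its lexer from the grammar, and the names SketchLexer and Token, as well as the field names, are hypothetical and not part of the ANTLR runtime.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration; ANTLR generates its lexer from the grammar instead.
class SketchLexer {
    // Hypothetical token type: records what was matched and where.
    static final class Token {
        final String type;   // "ID", "INT", "NEWLINE", or a quoted literal such as "'='"
        final String text;
        final int start;     // index into the character stream
        final int line;      // 1-based line number
        final int column;    // 0-based index within the line

        Token(String type, String text, int start, int line, int column) {
            this.type = type;
            this.text = text;
            this.start = start;
            this.line = line;
            this.column = column;
        }

        @Override
        public String toString() {
            String shown = text.replace("\r", "\\r").replace("\n", "\\n");
            return type + " \"" + shown + "\" at " + line + ":" + column;
        }
    }

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0, line = 1, lineStart = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (c == ' ' || c == '\t') {             // WS: skipped, no token emitted
                i++;
                continue;
            }
            String type;
            if (isDigit(c)) {                         // INT: '0'..'9'+
                while (i < input.length() && isDigit(input.charAt(i))) i++;
                type = "INT";
            } else if (isLetter(c)) {                 // ID: ('a'..'z'|'A'..'Z')+
                while (i < input.length() && isLetter(input.charAt(i))) i++;
                type = "ID";
            } else if (c == '\n') {                   // NEWLINE: '\n'
                i++;
                type = "NEWLINE";
            } else if (c == '\r' && i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                i += 2;                               // NEWLINE: '\r' '\n'
                type = "NEWLINE";
            } else if ("=+-*()".indexOf(c) >= 0) {    // literal tokens used by the parser rules
                i++;
                type = "'" + c + "'";
            } else {
                throw new IllegalArgumentException("unexpected character at index " + i);
            }
            tokens.add(new Token(type, input.substring(start, i), start, line, start - lineStart));
            if (type.equals("NEWLINE")) {             // later tokens belong to the next line
                line++;
                lineStart = i;
            }
        }
        return tokens;
    }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    private static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }

    public static void main(String[] args) {
        for (Token t : tokenize("x = 42\n")) {
            System.out.println(t);
        }
    }
}
```

Running main on the input "x = 42" followed by a newline produces four tokens (ID, '=', INT, NEWLINE); the whitespace is consumed without producing a token, which is the effect of {skip();} in the grammars shown later.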
Overall ANTLR Flow • A translator maps each input sentence of a language to an output sentence. • The overall translation problem consists of smaller problems mapped to well-defined translation phases (lexing, parsing, and tree parsing). • The communication between phases uses well-defined data types and structures (characters, tokens, trees, and ancillary structures). • Often the translation requires multiple passes, so an intermediate form is needed to pass the input between phases. • An abstract syntax tree (AST) is such an intermediate form: a highly processed, condensed version of the input.
How to Write ANTLR Grammar I • Write the grammar using one or more files. A common approach is to use three grammar files, each focusing on a specific aspect of the processing: • The first is the lexer grammar, which creates tokens from text input. • The second is the parser grammar, which creates an AST from tokens. • The third is the tree parser grammar, which processes an AST.
How to Write ANTLR Grammar II • This results in three relatively simple grammar files as opposed to one complex grammar file. • Optionally write StringTemplate templates for producing output. • Debug the grammar using ANTLRWorks. • Generate classes from the grammar. These validate that text input conforms to the grammar and execute target language “actions” specified in the grammar. • Write an application that uses the generated classes.
ANTLR Grammar: Program.g
    grammar Program;

    program: statement+ ;

    statement: expression NEWLINE
        | ID '=' expression NEWLINE
        | NEWLINE
        ;

    expression: multiplicationExpression (('+'|'-') multiplicationExpression)* ;

    multiplicationExpression: atom ('*' atom)* ;

    atom: INT
        | ID
        | '(' expression ')'
        ;

    ID: ('a'..'z'|'A'..'Z')+ ;
    INT: '0'..'9'+ ;
    NEWLINE: '\r'? '\n' ;
    WS: (' '|'\t')+ {skip();} ;
Using ANTLR • In ANTLRWorks, select Generate Code from the Generate menu. • Three files are generated in the gencode subdirectory: • Program.tokens: the list of token-name, token-type assignments. • ProgramLexer.java: the lexer (scanner) generated from Program.g. • ProgramParser.java: the parser generated from Program.g. • Create a tester class (with main), e.g. RunProgram.java. • Compile with: javac RunProgram.java ProgramParser.java ProgramLexer.java • Run with: java RunProgram • Make sure that the ANTLR jar file is on your class path or included in your Java installation. • ProgramEvaluation.g adds evaluation actions (in Java) to Program.g, turning it into an attribute grammar.
Main Class: RunProgram.java
    import org.antlr.runtime.*;

    public class RunProgram {
        public static void main(String[] args) throws Exception {
            ProgramParser parser = new ProgramParser(
                new CommonTokenStream(
                    new ProgramLexer(
                        new ANTLRInputStream(System.in)
                    )
                )
            );
            parser.program();
        }
    }
Evaluate: ProgramEvaluation.g I
    grammar ProgramEvaluation;

    @header {
    import java.util.HashMap;
    }

    @members {
    HashMap symbolTable = new HashMap();
    }

    program: statement+ ;

    statement: expression NEWLINE {System.out.println($expression.value);}
        | ID '=' expression NEWLINE
          {symbolTable.put($ID.text, new Integer($expression.value));}
        | NEWLINE
        ;
Evaluate: ProgramEvaluation.g II
    expression returns [int value]
        : e=multiplicationExpression {$value = $e.value;}
          ( '+' e=multiplicationExpression {$value += $e.value;}
          | '-' e=multiplicationExpression {$value -= $e.value;}
          )*
        ;

    multiplicationExpression returns [int value]
        : e=atom {$value = $e.value;}
          ( '*' e=atom {$value *= $e.value;} )*
        ;
Evaluate: ProgramEvaluation.g III
    atom returns [int value]
        : INT {$value = Integer.parseInt($INT.text);}
        | ID
          {
            Integer v = (Integer)symbolTable.get($ID.text);
            if ( v!=null ) $value = v.intValue();
            else System.err.println("undefined variable "+$ID.text);
          }
        | '(' expression ')' {$value = $expression.value;}
        ;

    ID: ('a'..'z'|'A'..'Z')+ ;
    INT: '0'..'9'+ ;
    NEWLINE: '\r'? '\n' ;
    WS: (' '|'\t'|'\n'|'\r')+ {skip();} ;
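To see how the actions in the evaluation grammar compute a value while parsing, the following is a hand-written recursive-descent sketch in plain Java with one method per rule (expression, multiplicationExpression, atom). It is an analogue written for illustration, not the parser ANTLR generates; the class name SketchEvaluator and its helper methods are hypothetical, and it scans characters directly instead of using a separate lexer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hand-written analogue of ProgramEvaluation.g; not the code ANTLR generates.
class SketchEvaluator {
    private final Map<String, Integer> symbolTable = new HashMap<>();
    private String src = "";
    private int pos;

    // program: statement+ ; returns the values the grammar's actions would print.
    List<Integer> run(String input) {
        List<Integer> printed = new ArrayList<>();
        for (String line : input.split("\r?\n")) {
            src = line;
            pos = 0;
            skipWs();
            if (pos == src.length()) continue;       // statement: NEWLINE
            int save = pos;
            String id = readId();
            skipWs();
            if (id != null && peek() == '=') {       // statement: ID '=' expression NEWLINE
                pos++;
                symbolTable.put(id, expression());
            } else {                                  // statement: expression NEWLINE
                pos = save;
                printed.add(expression());
            }
            skipWs();
            if (pos != src.length()) {
                throw new IllegalArgumentException("unexpected input in: " + line);
            }
        }
        return printed;
    }

    // expression returns [int value]: the running value is updated per operator.
    private int expression() {
        int value = multiplicationExpression();
        while (true) {
            skipWs();
            char c = peek();
            if (c == '+') { pos++; value += multiplicationExpression(); }
            else if (c == '-') { pos++; value -= multiplicationExpression(); }
            else return value;
        }
    }

    // multiplicationExpression returns [int value]
    private int multiplicationExpression() {
        int value = atom();
        while (true) {
            skipWs();
            if (peek() != '*') return value;
            pos++;
            value *= atom();
        }
    }

    // atom: INT | ID | '(' expression ')'
    private int atom() {
        skipWs();
        char c = peek();
        if (c >= '0' && c <= '9') {
            int start = pos;
            while (peek() >= '0' && peek() <= '9') pos++;
            return Integer.parseInt(src.substring(start, pos));
        }
        String id = readId();
        if (id != null) {
            Integer v = symbolTable.get(id);
            if (v == null) {
                System.err.println("undefined variable " + id);
                return 0; // assumption: an undefined variable evaluates to 0
            }
            return v;
        }
        if (c != '(') throw new IllegalArgumentException("unexpected input at column " + pos);
        pos++;
        int value = expression();
        skipWs();
        if (peek() != ')') throw new IllegalArgumentException("expected ')' at column " + pos);
        pos++;
        return value;
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    private void skipWs() { while (peek() == ' ' || peek() == '\t') pos++; }

    private String readId() {
        int start = pos;
        while ((peek() >= 'a' && peek() <= 'z') || (peek() >= 'A' && peek() <= 'Z')) pos++;
        return pos > start ? src.substring(start, pos) : null;
    }

    public static void main(String[] args) {
        String program = "x = 4\ny = 3\nz = x - y\nz\nw = z * (x + y) - 10\nw\n";
        System.out.println(new SketchEvaluator().run(program)); // prints [1, -3]
    }
}
```

Because multiplicationExpression is called from inside expression, multiplication binds tighter than addition and subtraction, which is the same precedence the grammar encodes through its rule nesting.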
Tree Grammar • A parse tree represents the sequence of rule invocations used to match an input stream. • An abstract syntax tree (AST) is an intermediate representation: a tree that records not only the input symbols but also the relationships between those symbols as dictated by the grammatical structure. • All nodes in the AST are input symbol nodes. • Example: 3 + 4 * 5 yields the AST (+ 3 (* 4 5)), with + at the root and the * subtree as its second child.
Building AST with Grammars • Add AST construction rules to the parser grammar that indicate what tree shape you want to build. • With the following options, ANTLR will build a tree node for every token matched on the input stream: options { output=AST; ASTLabelType=CommonTree; } • To specify a tree structure, annotate tokens with tree operators: • ! marks tokens that should be excluded from the tree. • ^ marks tokens that should be considered operators (subtree roots). • Rewrite rules (->) specify the tree shape explicitly using tree rewrite syntax.
Rewrite Rules • In the statement rule, the rewrite ^('=' ID expression) makes a tree with the operator at the root, the identifier as the first child, and the expression as the second child:
    statement: expression NEWLINE -> expression
        | ID '=' expression NEWLINE -> ^('=' ID expression)
        | NEWLINE ->
        ;
• The symbol -> begins each rewrite rule. • Rewrite rules for AST construction are parser-grammar-to-tree-grammar mappings. • When an error occurs within a rule, ANTLR catches the exception, reports the error, attempts to recover (possibly by consuming more tokens), and then returns from the rule.
Tree Parser: ProgramTree.g I
    grammar ProgramTree;

    options {
        output=AST;
        ASTLabelType=CommonTree;
    }

    program: ( statement {System.out.println($statement.tree.toStringTree());} )+ ;

    statement: expression NEWLINE -> expression
        | ID '=' expression NEWLINE -> ^('=' ID expression)
        | NEWLINE ->
        ;

    expression: multiplicationExpression (('+'^|'-'^) multiplicationExpression)* ;
Tree Parser: ProgramTree.g II
    multiplicationExpression: atom ('*'^ atom)* ;

    atom: INT
        | ID
        | '('! expression ')'!
        ;

    ID: ('a'..'z'|'A'..'Z')+ ;
    INT: '0'..'9'+ ;
    NEWLINE: '\r'? '\n' ;
    WS: (' '|'\t')+ {skip();} ;
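The effect of the ^ and ! operators in the grammar above can be sketched by hand in plain Java: each operator token becomes the root of the tree built so far, and parentheses are dropped because the tree shape already records the grouping. This is an illustration only, not what ANTLR generates; SketchAstBuilder and Node are hypothetical names (the real runtime uses CommonTree), and the sketch handles one statement without its NEWLINE.

```java
import java.util.Arrays;
import java.util.List;

// Hand-written analogue of ProgramTree.g for a single statement; not ANTLR output.
class SketchAstBuilder {
    static final class Node {
        final String text;
        final List<Node> children;

        Node(String text, Node... children) {
            this.text = text;
            this.children = Arrays.asList(children);
        }

        // LISP-style form: a leaf prints as its text, a subtree as (root child...).
        String toStringTree() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node child : children) sb.append(' ').append(child.toStringTree());
            return sb.append(')').toString();
        }
    }

    private final String src;
    private int pos;

    private SketchAstBuilder(String src) { this.src = src; }

    static Node parse(String statement) {
        SketchAstBuilder p = new SketchAstBuilder(statement);
        Node tree = p.statement();
        p.skipWs();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at column " + p.pos);
        }
        return tree;
    }

    // statement: ID '=' expression -> ^('=' ID expression) | expression -> expression
    private Node statement() {
        skipWs();
        int save = pos;
        String id = readWhile(true);
        skipWs();
        if (id != null && peek() == '=') {
            pos++;
            return new Node("=", new Node(id), expression());
        }
        pos = save;
        return expression();
    }

    // ('+'^|'-'^): the operator becomes the root; the tree so far is its first child.
    private Node expression() {
        Node left = multiplicationExpression();
        while (true) {
            skipWs();
            char c = peek();
            if (c != '+' && c != '-') return left;
            pos++;
            left = new Node(String.valueOf(c), left, multiplicationExpression());
        }
    }

    // '*'^ works the same way one precedence level down.
    private Node multiplicationExpression() {
        Node left = atom();
        while (true) {
            skipWs();
            if (peek() != '*') return left;
            pos++;
            left = new Node("*", left, atom());
        }
    }

    // atom: INT | ID | '('! expression ')'! ; the parentheses add no nodes.
    private Node atom() {
        skipWs();
        String leaf = readWhile(false);           // INT
        if (leaf == null) leaf = readWhile(true); // ID
        if (leaf != null) return new Node(leaf);
        if (peek() != '(') throw new IllegalArgumentException("unexpected input at column " + pos);
        pos++;
        Node inner = expression();
        skipWs();
        if (peek() != ')') throw new IllegalArgumentException("expected ')' at column " + pos);
        pos++;
        return inner;
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    private void skipWs() { while (peek() == ' ' || peek() == '\t') pos++; }

    // Reads a run of ASCII letters (letters == true) or digits; null if none.
    private String readWhile(boolean letters) {
        int start = pos;
        while (letters ? isLetter(peek()) : (peek() >= '0' && peek() <= '9')) pos++;
        return pos > start ? src.substring(start, pos) : null;
    }

    private static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }

    public static void main(String[] args) {
        System.out.println(parse("3 + 4 * 5").toStringTree());            // (+ 3 (* 4 5))
        System.out.println(parse("w = z * (x + y) - 10").toStringTree()); // (= w (- (* z (+ x y)) 10))
    }
}
```

Note that (x + y) appears in the second tree as the subtree (+ x y) with no parenthesis nodes, which is what the ! suffix requests in the atom rule.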
Building Tree Grammars • The ANTLR notation for a tree grammar is identical to the notation for a regular grammar except for the introduction of a two-dimensional tree construct. • Make a tree grammar by copying the parser grammar and removing the recognition elements to the left of each -> operator, leaving the AST rewrite fragments. • ANTLR grammar file: tree grammar ProgramWalker; options { tokenVocab=ProgramTree; ASTLabelType=CommonTree; }
Tree Parser: ProgramWalker.g I
    tree grammar ProgramWalker;

    options {
        tokenVocab=ProgramTree;
        ASTLabelType=CommonTree;
    }

    @header {
    import java.util.HashMap;
    }

    @members {
    HashMap symbolTable = new HashMap();
    }

    program: statement+ ;
Tree Parser: ProgramWalker.g II
    statement: expression {System.out.println($expression.value);}
        | ^('=' ID expression)
          {symbolTable.put($ID.text, new Integer($expression.value));}
        ;

    expression returns [int value]
        : ^('+' a=expression b=expression) {$value = a+b;}
        | ^('-' a=expression b=expression) {$value = a-b;}
        | ^('*' a=expression b=expression) {$value = a*b;}
        | ID
          {
            Integer v = (Integer)symbolTable.get($ID.text);
            if ( v!=null ) $value = v.intValue();
            else System.err.println("undefined variable "+$ID.text);
          }
        | INT {$value = Integer.parseInt($INT.text);}
        ;
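The tree grammar above matches tree shapes rather than token sequences. As a rough illustration of what that walk does, here is a hand-written recursive evaluator over a small tree type in plain Java. It is a sketch under stated assumptions: SketchTreeWalker, Node, and sampleProgram are hypothetical names, the trees are built by hand instead of coming from a parser, and an undefined variable is assumed to evaluate to 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hand-written analogue of ProgramWalker.g; not the tree parser ANTLR generates.
class SketchTreeWalker {
    static final class Node {
        final String text;
        final List<Node> children;

        Node(String text, Node... children) {
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    private final Map<String, Integer> symbolTable = new HashMap<>();

    // program: statement+ ; returns the values the grammar's actions would print.
    List<Integer> walk(List<Node> statements) {
        List<Integer> printed = new ArrayList<>();
        for (Node s : statements) {
            if (s.text.equals("=")) {
                // statement: ^('=' ID expression)
                symbolTable.put(s.children.get(0).text, expression(s.children.get(1)));
            } else {
                // statement: expression
                printed.add(expression(s));
            }
        }
        return printed;
    }

    // expression returns [int value]
    private int expression(Node n) {
        if (n.children.size() == 2) {
            // ^(op a=expression b=expression)
            int a = expression(n.children.get(0));
            int b = expression(n.children.get(1));
            switch (n.text) {
                case "+": return a + b;
                case "-": return a - b;
                case "*": return a * b;
                default: throw new IllegalArgumentException("unknown operator " + n.text);
            }
        }
        char first = n.text.charAt(0);
        if (first >= '0' && first <= '9') {
            return Integer.parseInt(n.text);       // INT
        }
        Integer v = symbolTable.get(n.text);       // ID
        if (v == null) {
            System.err.println("undefined variable " + n.text);
            return 0; // assumption: an undefined variable evaluates to 0
        }
        return v;
    }

    // The six statement trees from the lecture's sample run, built by hand.
    static List<Node> sampleProgram() {
        Node x = new Node("x"), y = new Node("y"), z = new Node("z"), w = new Node("w");
        return Arrays.asList(
            new Node("=", x, new Node("4")),
            new Node("=", y, new Node("3")),
            new Node("=", z, new Node("-", x, y)),
            z,
            new Node("=", w,
                new Node("-", new Node("*", z, new Node("+", x, y)), new Node("10"))),
            w);
    }

    public static void main(String[] args) {
        System.out.println(new SketchTreeWalker().walk(sampleProgram())); // prints [1, -3]
    }
}
```

The walker needs no knowledge of precedence or parentheses: that information is already encoded in the tree shape, which is why the tree grammar has a single expression rule where the parser grammar needed three.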
Main Program
    import org.antlr.runtime.*;
    import org.antlr.runtime.tree.*;

    public class RunProgramWalker {
        public static void main(String[] args) throws Exception {
            // Parse standard input and build the AST.
            ProgramTreeParser parser = new ProgramTreeParser(
                new CommonTokenStream(
                    new ProgramTreeLexer(
                        new ANTLRInputStream(System.in))));
            ProgramTreeParser.program_return r = parser.program();

            // Walk the AST with the tree parser to evaluate it.
            ProgramWalker walker = new ProgramWalker(
                new CommonTreeNodeStream((CommonTree)r.getTree()));
            walker.program();
        }
    }
Program Output
Input:
    x = 4
    y = 3
    z = x - y
    z
    w = z * (x + y) - 10
    w
Output:
    (= x 4)
    (= y 3)
    (= z (- x y))
    z
    (= w (- (* z (+ x y)) 10))
    w
    1
    -3
Summary • ANTLR is a free, open source parser generator tool. • ANTLR supports LL(*): it can use arbitrary lookahead to select the rule alternative that matches the portion of the input stream being evaluated. • Check the online documentation at: http://www.antlr.org/ and http://www.antlr.org/wiki/display/ANTLR3/ANTLR+3+Wiki+Home