
CS 3304 Comparative Languages

CS 3304 Comparative Languages. Lecture 8a: Using ANTLR, 9 February 2012. Introduction. Discuss how to use ANTLR. Some material is taken from: a book by Terence Parr, “The Definitive ANTLR Reference: Building Domain-Specific Languages”, http://www.pragprog.com/titles/tpantlr/

gshoemaker




Presentation Transcript


  1. CS 3304 Comparative Languages • Lecture 8a: Using ANTLR • 9 February 2012

  2. Introduction • Discuss how to use ANTLR. • Some material is taken from: • A book by Terence Parr, “The Definitive ANTLR Reference: Building Domain-Specific Languages”, http://www.pragprog.com/titles/tpantlr/ • An article by R. Mark Volkmann, “ANTLR 3”, http://jnb.ociweb.com/jnb/jnbJun2008.htm

  3. Why ANTLR? • ANTLR is a parser generator used to implement language interpreters, compilers, and other translators. • Most often used to build translators and interpreters for domain-specific languages (DSLs). • DSLs are usually very high-level languages used for specific tasks and particularly effective in a specific domain. • DSLs provide a more natural, high-fidelity, robust, and maintainable means of encoding a problem compared to a general-purpose language.

  4. Definitions • Lexer: converts a stream of characters to a stream of tokens (ANTLR token objects know their start/stop character stream index, line number, index within the line, and more). • Parser: processes a stream of tokens, possibly creating an AST. • Abstract Syntax Tree (AST): an intermediate tree representation of the parsed input that is simpler to process than the stream of tokens and can be efficiently processed multiple times. • Tree Parser: processes an AST. • StringTemplate: a library that supports using templates with placeholders for outputting text (e.g., Java source code).
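To make the lexer definition concrete, here is a hand-written Java sketch of what a lexer does. It is not ANTLR-generated code and does not use ANTLR's API. It turns characters into tokens for the token set of the Program.g grammar on slide 8: ID, INT, NEWLINE, skipped whitespace, and the single-character literals. The class TinyLexer and the Token record are hypothetical names. Like ANTLR's token objects, each token records its text, its line, and its position within the line. The record assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer for the tokens of Program.g (not ANTLR output).
public class TinyLexer {
    // A token records its type, its text, and where it started in the input.
    record Token(String type, String text, int line, int charPositionInLine) {}

    static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0, line = 1, lineStart = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            String type;
            if (c == ' ' || c == '\t') {               // WS: consumed but never emitted, like skip()
                i++;
                continue;
            } else if (isLetter(c)) {                  // ID: ('a'..'z'|'A'..'Z')+
                while (i < input.length() && isLetter(input.charAt(i))) i++;
                type = "ID";
            } else if (isDigit(c)) {                   // INT: '0'..'9'+
                while (i < input.length() && isDigit(input.charAt(i))) i++;
                type = "INT";
            } else if (c == '\n' || (c == '\r' && i + 1 < input.length() && input.charAt(i + 1) == '\n')) {
                i += (c == '\r') ? 2 : 1;              // NEWLINE: '\r'? '\n'
                type = "NEWLINE";
            } else if ("=+-*()".indexOf(c) >= 0) {     // literals used by the parser rules
                i++;
                type = "'" + c + "'";
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' on line " + line);
            }
            tokens.add(new Token(type, input.substring(start, i), line, start - lineStart));
            if (type.equals("NEWLINE")) {              // keep line numbers current, as ANTLR tokens do
                line++;
                lineStart = i;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("x = 4\nz = x - 3\n")) {
            System.out.println(t);
        }
    }
}
```

The parser then works only with this token list and never looks at individual characters, which is the division of labor the definitions above describe.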

  5. Overall ANTLR Flow • A translator maps each input sentence of a language to an output sentence. • The overall translation problem consists of smaller problems mapped to well-defined translation phases (lexing, parsing, and tree parsing). • The communication between phases uses well-defined data types and structures (characters, tokens, trees, and ancillary structures). • Often the translation requires multiple passes, so an intermediate form is needed to pass the input between phases. • An abstract syntax tree (AST) is a highly processed, condensed version of the input.

  6. How to Write ANTLR Grammar I • Write the grammar using one or more files. A common approach is to use three grammar files, each focusing on a specific aspect of the processing: • The first is the lexer grammar, which creates tokens from text input. • The second is the parser grammar, which creates an AST from tokens. • The third is the tree parser grammar, which processes an AST.

  7. How To Write ANTLR Grammar II • This results in three relatively simple grammar files as opposed to one complex grammar file. • Optionally write StringTemplate templates for producing output. • Debug the grammar using ANTLRWorks. • Generate classes from the grammar. These validate that text input conforms to the grammar and execute target language “actions” specified in the grammar. • Write an application that uses the generated classes.

  8. ANTLR Grammar: Program.g

```
grammar Program;

program
    : statement+
    ;

statement
    : expression NEWLINE
    | ID '=' expression NEWLINE
    | NEWLINE
    ;

expression
    : multiplicationExpression (('+'|'-') multiplicationExpression)*
    ;

multiplicationExpression
    : atom ('*' atom)*
    ;

atom
    : INT
    | ID
    | '(' expression ')'
    ;

ID      : ('a'..'z'|'A'..'Z')+ ;
INT     : '0'..'9'+ ;
NEWLINE : '\r'? '\n' ;
WS      : (' '|'\t')+ {skip();} ;
```

  9. Using ANTLR • In ANTLRWorks, select the Generate Code item from the Generate menu. • Three files are generated in the gencode subdirectory: • Program.tokens: the list of token-name/token-type assignments. • ProgramLexer.java: the lexer (scanner) generated from Program.g. • ProgramParser.java: the parser generated from Program.g. • Create a tester class (with main), e.g. RunProgram.java. • Compile with javac RunProgram.java ProgramParser.java ProgramLexer.java and run with java RunProgram. • Make sure that the ANTLR jar file is on your class path or included in your Java installation. • ProgramEvaluation.g adds evaluation actions (in Java) to Program.g, turning it into an attribute grammar.

  10. Main Class: RunProgram.java

```java
import org.antlr.runtime.*;

public class RunProgram {
    public static void main(String[] args) throws Exception {
        // Characters from standard input -> lexer -> token stream -> parser
        ProgramParser parser = new ProgramParser(
            new CommonTokenStream(
                new ProgramLexer(
                    new ANTLRInputStream(System.in))));
        // Start parsing at the grammar's start rule
        parser.program();
    }
}
```

  11. Evaluate: ProgramEvaluation.g I

```
grammar ProgramEvaluation;

@header {
import java.util.HashMap;
}

@members {
HashMap symbolTable = new HashMap();
}

program
    : statement+
    ;

statement
    : expression NEWLINE {System.out.println($expression.value);}
    | ID '=' expression NEWLINE
        {symbolTable.put($ID.text, new Integer($expression.value));}
    | NEWLINE
    ;
```

  12. Evaluate: ProgramEvaluation.g II

```
expression returns [int value]
    : e=multiplicationExpression {$value = $e.value;}
      ( '+' e=multiplicationExpression {$value += $e.value;}
      | '-' e=multiplicationExpression {$value -= $e.value;}
      )*
    ;

multiplicationExpression returns [int value]
    : e=atom {$value = $e.value;}
      ('*' e=atom {$value *= $e.value;})*
    ;
```

  13. Evaluate: ProgramEvaluation.g III

```
atom returns [int value]
    : INT {$value = Integer.parseInt($INT.text);}
    | ID
        {
        Integer v = (Integer)symbolTable.get($ID.text);
        if ( v!=null ) $value = v.intValue();
        else System.err.println("undefined variable "+$ID.text);
        }
    | '(' expression ')' {$value = $expression.value;}
    ;

ID      : ('a'..'z'|'A'..'Z')+ ;
INT     : '0'..'9'+ ;
NEWLINE : '\r'? '\n' ;
// Newlines are not skipped here: the statement rule needs NEWLINE tokens.
WS      : (' '|'\t')+ {skip();} ;
```
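The actions in ProgramEvaluation.g can be read as a recursive-descent evaluator: each rule becomes a method that returns the rule's int value attribute, and the actions run while parsing. The following is a hand-written Java sketch of that idea, not the code ANTLR generates. The class ProgramEvaluator and its printed list are hypothetical names. For brevity it reads characters directly instead of a separate token stream, and it chooses between the ID '=' expression and expression alternatives of statement by looking at what follows the identifier, then rewinding if it is not '='.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical hand-written sketch of ProgramEvaluation.g (not ANTLR-generated code):
// one method per rule, each returning the rule's "value" attribute.
public class ProgramEvaluator {
    private final String in;
    private int pos = 0;
    private final Map<String, Integer> symbolTable = new HashMap<>();
    final List<Integer> printed = new ArrayList<>();   // values printed by "expression NEWLINE"

    ProgramEvaluator(String input) { this.in = input; }

    private static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // Skips WS (' '|'\t')+ and returns the next character, or '\0' at end of input.
    private char peek() {
        while (pos < in.length() && (in.charAt(pos) == ' ' || in.charAt(pos) == '\t')) pos++;
        return pos < in.length() ? in.charAt(pos) : '\0';
    }

    private void match(char expected) {
        if (peek() != expected) throw new IllegalStateException("expected '" + expected + "' at index " + pos);
        pos++;
    }

    private void newline() {                           // NEWLINE: '\r'? '\n'
        if (peek() == '\r') pos++;
        match('\n');
    }

    private String run(boolean letters) {              // reads an ID (letters) or INT (digits)
        peek();
        int start = pos;
        while (pos < in.length() && (letters ? isLetter(in.charAt(pos)) : isDigit(in.charAt(pos)))) pos++;
        return in.substring(start, pos);
    }

    void program() {                                   // program: statement+
        do { statement(); } while (peek() != '\0');
    }

    private void statement() {
        char c = peek();
        if (c == '\r' || c == '\n') { newline(); return; }   // NEWLINE alternative
        int mark = pos;
        if (isLetter(c)) {
            String name = run(true);
            if (peek() == '=') {                       // ID '=' expression NEWLINE
                match('=');
                int value = expression();
                newline();
                symbolTable.put(name, value);
                return;
            }
            pos = mark;                                // not an assignment: rewind
        }
        int value = expression();                      // expression NEWLINE
        newline();
        System.out.println(value);
        printed.add(value);
    }

    private int expression() {
        int value = multiplicationExpression();
        while (peek() == '+' || peek() == '-') {
            char op = in.charAt(pos++);
            int e = multiplicationExpression();
            value = (op == '+') ? value + e : value - e;
        }
        return value;
    }

    private int multiplicationExpression() {
        int value = atom();
        while (peek() == '*') { pos++; value *= atom(); }
        return value;
    }

    private int atom() {
        char c = peek();
        if (isDigit(c)) return Integer.parseInt(run(false));
        if (isLetter(c)) {
            String name = run(true);
            Integer v = symbolTable.get(name);
            if (v != null) return v;
            System.err.println("undefined variable " + name);
            return 0;                                  // value keeps its default of 0
        }
        match('(');
        int value = expression();
        match(')');
        return value;
    }

    public static void main(String[] args) {
        new ProgramEvaluator("x = 4\ny = 3\nz = x - y\nz\nw = z * (x + y) - 10\nw\n").program();
    }
}
```

Running main prints 1 and then -3, the same values as the program output on slide 23. The manual rewind in statement is the kind of lookahead decision that ANTLR computes automatically from the grammar.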

  14. Tree Grammar • A parse tree represents the sequence of rule invocations used to match an input stream. • An abstract syntax tree (AST) is an intermediate representation: a tree that records not only the input symbols but also the relationships among those symbols, as dictated by the grammatical structure. • All nodes in the AST are input symbol nodes. • Example: for 3 + 4 * 5, the AST has + at the root, with 3 and the subtree for 4 * 5 as its children; in ANTLR's toStringTree() notation this is (+ 3 (* 4 5)).

  15. Building AST with Grammars • Add AST construction rules to the parser grammar that indicate what tree shape you want to build. • With the following options, ANTLR builds a tree node for every token matched on the input stream: options { output=AST; ASTLabelType=CommonTree; } • To specify a tree structure, annotate tokens in the parser grammar with suffix operators: • ! excludes a token from the tree. • ^ makes a token an operator, i.e., the root of the current subtree. • Alternatively, use rewrite rules (->), written in tree rewrite syntax.

  16. Rewrite Rules • In the second alternative below, the rewrite rule makes a tree with the '=' operator at the root, the identifier as its first child, and the expression as its second child:

```
statement
    : expression NEWLINE        -> expression
    | ID '=' expression NEWLINE -> ^('=' ID expression)
    | NEWLINE                   ->
    ;
```

• The symbol -> begins each rewrite rule; the empty rewrite after NEWLINE builds no tree for blank lines. • Rewrite rules for AST construction are parser-grammar-to-tree-grammar mappings. • When an error occurs within a rule, ANTLR catches the exception, reports the error, attempts to recover (possibly by consuming more tokens), and then returns from the rule.

  17. Tree Parser: ProgramTree.g I

```
grammar ProgramTree;

options {
    output=AST;
    ASTLabelType=CommonTree;
}

program
    : ( statement {System.out.println($statement.tree.toStringTree());} )+
    ;

statement
    : expression NEWLINE        -> expression
    | ID '=' expression NEWLINE -> ^('=' ID expression)
    | NEWLINE                   ->
    ;

expression
    : multiplicationExpression (('+'^|'-'^) multiplicationExpression)*
    ;
```

  18. Tree Parser: ProgramTree.g II

```
multiplicationExpression
    : atom ('*'^ atom)*
    ;

atom
    : INT
    | ID
    | '('! expression ')'!
    ;

ID      : ('a'..'z'|'A'..'Z')+ ;
INT     : '0'..'9'+ ;
NEWLINE : '\r'? '\n' ;
WS      : (' '|'\t')+ {skip();} ;
```
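The ^ and ! suffixes in ProgramTree.g can be mimicked by hand. When an operator marked with ^ is matched, it becomes the new root and the tree built so far becomes its first child, which yields left-associative trees; tokens marked with ! are matched but never added. The Java sketch below is hypothetical (the class AstBuilder and its Node type are not ANTLR's CommonTree API). It builds trees this way for a single statement and prints them in the same LISP-like notation as toStringTree().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written sketch of ProgramTree.g's tree construction (not ANTLR output).
public class AstBuilder {
    // Minimal AST node: a token's text plus its children.
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text, Node... kids) {
            this.text = text;
            children.addAll(List.of(kids));
        }

        // LISP-like format used by toStringTree(): "(root child1 child2 ...)"
        String toStringTree() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node c : children) sb.append(' ').append(c.toStringTree());
            return sb.append(')').toString();
        }
    }

    private final String in;
    private int pos = 0;

    private AstBuilder(String input) { this.in = input; }

    private static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // Skips WS and returns the next character, or '\0' at end of input.
    private char peek() {
        while (pos < in.length() && (in.charAt(pos) == ' ' || in.charAt(pos) == '\t')) pos++;
        return pos < in.length() ? in.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("expected '" + c + "' at index " + pos);
        pos++;
    }

    // statement: expression -> expression | ID '=' expression -> ^('=' ID expression)
    private Node statement() {
        int mark = pos;
        if (isLetter(peek())) {
            Node id = atom();
            if (peek() == '=') {
                pos++;
                return new Node("=", id, expression());
            }
            pos = mark;                                     // not an assignment: rewind
        }
        return expression();
    }

    private Node expression() {
        Node tree = multiplicationExpression();
        while (peek() == '+' || peek() == '-') {
            String op = String.valueOf(in.charAt(pos++));
            tree = new Node(op, tree, multiplicationExpression());   // op^ becomes the new root
        }
        return tree;
    }

    private Node multiplicationExpression() {
        Node tree = atom();
        while (peek() == '*') {
            pos++;
            tree = new Node("*", tree, atom());                      // '*'^ becomes the new root
        }
        return tree;
    }

    private Node atom() {
        char c = peek();
        if (isLetter(c) || isDigit(c)) {                    // ID or INT becomes a leaf
            boolean letters = isLetter(c);
            int start = pos;
            while (pos < in.length() && (letters ? isLetter(in.charAt(pos)) : isDigit(in.charAt(pos)))) pos++;
            return new Node(in.substring(start, pos));
        }
        expect('(');                                        // '('! is matched but not kept
        Node e = expression();
        expect(')');                                        // ')'! is matched but not kept
        return e;
    }

    // Builds the AST for one statement (without its NEWLINE) and renders it.
    static String treeFor(String statement) {
        AstBuilder b = new AstBuilder(statement);
        Node tree = b.statement();
        if (b.peek() != '\0') throw new IllegalArgumentException("unexpected input at index " + b.pos);
        return tree.toStringTree();
    }

    public static void main(String[] args) {
        System.out.println(treeFor("3 + 4 * 5"));           // (+ 3 (* 4 5))
        System.out.println(treeFor("w = z * (x + y) - 10"));
    }
}
```

For example, treeFor("w = z * (x + y) - 10") returns (= w (- (* z (+ x y)) 10)), the same tree that appears in the program output on slide 23; the parentheses shape the tree but do not appear in it.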

  19. Building Tree Grammars • The ANTLR notation for a tree grammar is identical to the notation for a regular grammar except for the introduction of a two-dimensional tree construct, ^(root children...). • To make a tree grammar, copy the parser grammar and remove the recognition elements to the left of each -> operator, leaving the AST rewrite fragments. • The tree grammar file begins with a header such as:

```
tree grammar ProgramWalker;

options {
    tokenVocab=ProgramTree;
    ASTLabelType=CommonTree;
}
```

• tokenVocab=ProgramTree makes the walker reuse the token types that ANTLR wrote to ProgramTree.tokens, so the walker and the parser agree on token types.

  20. Tree Parser: ProgramWalker.g I

```
tree grammar ProgramWalker;

options {
    tokenVocab=ProgramTree;
    ASTLabelType=CommonTree;
}

@header {
import java.util.HashMap;
}

@members {
HashMap symbolTable = new HashMap();
}

program
    : statement+
    ;
```

  21. Tree Parser: ProgramWalker.g II

```
statement
    : expression {System.out.println($expression.value);}
    | ^('=' ID expression)
        {symbolTable.put($ID.text, new Integer($expression.value));}
    ;

expression returns [int value]
    : ^('+' a=expression b=expression) {$value = a+b;}
    | ^('-' a=expression b=expression) {$value = a-b;}
    | ^('*' a=expression b=expression) {$value = a*b;}
    | ID
        {
        Integer v = (Integer)symbolTable.get($ID.text);
        if ( v!=null ) $value = v.intValue();
        else System.err.println("undefined variable "+$ID.text);
        }
    | INT {$value = Integer.parseInt($INT.text);}
    ;
```
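A tree grammar like ProgramWalker.g amounts to a recursive walk that dispatches on each subtree's root token. The Java sketch below mirrors its rules by hand; the TreeWalker class and Node record are hypothetical and are not ANTLR's tree-parser API. statement distinguishes ^('=' ID expression) from a bare expression, and expression matches on +, -, * or a leaf INT/ID. The sample input is the six ASTs from the program output on slide 23, built by hand. It assumes Java 16 or later (record, switch expression).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical hand-written sketch of ProgramWalker.g (not ANTLR's tree-parser API).
public class TreeWalker {
    // Minimal AST node: root token text plus children, playing the role of a CommonTree.
    record Node(String text, List<Node> children) {
        static Node leaf(String text) { return new Node(text, List.of()); }
        static Node tree(String root, Node... kids) { return new Node(root, List.of(kids)); }
    }

    private final Map<String, Integer> symbolTable = new HashMap<>();
    final List<Integer> printed = new ArrayList<>();   // values printed by the expression alternative

    private static Node child(Node t, int i) { return t.children().get(i); }

    // program: statement+
    void program(List<Node> statements) {
        for (Node s : statements) statement(s);
    }

    // statement: expression {println} | ^('=' ID expression) {symbolTable.put}
    void statement(Node t) {
        if (t.text().equals("=")) {
            symbolTable.put(child(t, 0).text(), expression(child(t, 1)));
        } else {
            int value = expression(t);
            System.out.println(value);
            printed.add(value);
        }
    }

    // expression returns [int value]: dispatch on the subtree's root token
    int expression(Node t) {
        return switch (t.text()) {
            case "+" -> expression(child(t, 0)) + expression(child(t, 1));
            case "-" -> expression(child(t, 0)) - expression(child(t, 1));
            case "*" -> expression(child(t, 0)) * expression(child(t, 1));
            default -> leafValue(t.text());
        };
    }

    // INT leaf: parse it; ID leaf: look it up in the symbol table.
    private int leafValue(String text) {
        if (Character.isDigit(text.charAt(0))) return Integer.parseInt(text);
        Integer v = symbolTable.get(text);
        if (v != null) return v;
        System.err.println("undefined variable " + text);
        return 0;
    }

    // The six ASTs from the slide 23 output, built by hand.
    static List<Node> sampleProgram() {
        Node x = Node.leaf("x"), y = Node.leaf("y"), z = Node.leaf("z"), w = Node.leaf("w");
        return List.of(
            Node.tree("=", x, Node.leaf("4")),
            Node.tree("=", y, Node.leaf("3")),
            Node.tree("=", z, Node.tree("-", x, y)),
            z,
            Node.tree("=", w, Node.tree("-", Node.tree("*", z, Node.tree("+", x, y)), Node.leaf("10"))),
            w);
    }

    public static void main(String[] args) {
        new TreeWalker().program(sampleProgram());
    }
}
```

Walking these trees prints 1 and then -3. The walker never sees parentheses, NEWLINE tokens, or operator precedence: all of that was resolved when the parser built the tree, which is why an AST can be processed repeatedly at low cost.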

  22. Main Program

```java
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class RunProgramWalker {
    public static void main(String[] args) throws Exception {
        // Phase 1: lex and parse standard input; ProgramTree.g prints each statement's AST
        ProgramTreeParser parser = new ProgramTreeParser(
            new CommonTokenStream(
                new ProgramTreeLexer(
                    new ANTLRInputStream(System.in))));
        ProgramTreeParser.program_return r = parser.program();

        // Phase 2: walk the resulting AST with the tree grammar to evaluate it
        ProgramWalker walker = new ProgramWalker(
            new CommonTreeNodeStream((CommonTree) r.getTree()));
        walker.program();
    }
}
```

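  23. Program Output • Input:

```
x = 4
y = 3
z = x - y
z
w = z * (x + y) - 10
w
```

• ASTs printed by the parser (ProgramTree.g):

```
(= x 4)
(= y 3)
(= z (- x y))
z
(= w (- (* z (+ x y)) 10))
w
```

• Values printed by the tree walker (ProgramWalker.g):

```
1
-3
```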

  24. Summary • ANTLR is a free, open-source parser generator. • ANTLR supports arbitrary lookahead for selecting the rule alternative that matches the portion of the input stream being evaluated; that is, ANTLR supports LL(*) parsing. • Check the online documentation at http://www.antlr.org/ and http://www.antlr.org/wiki/display/ANTLR3/ANTLR+3+Wiki+Home
