1. ANTLR v3 Overview (for ANTLR v2 users)
Terence Parr
University of San Francisco
3. Block Info Flow Diagram
4. Grammar Syntax
5. Grammar improvements
Single-element EBNF, like ID*
Combined parser/lexer grammars
Allows 'c' and "literal" literals
Multiple parameters and return values
Labels do not have to be unique: (x=ID|x=INT) {…$x…}
For combined grammars, warns when tokens are not defined
6. Example Grammar
7. Using the parser
8. Improved grammar warnings
They happen less often ;)
Internationalized (templates again!)
Give the (smallest) sample input sequence that triggers the problem
Better recursion warnings
9. Recursion Warnings
10. Nondeterminisms
t.g:2:5: Decision can match input such as "A B" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
t.g:2:5: The following alternatives are unreachable: 2
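These messages say the nondeterminism is resolved in favor of the lowest-numbered alternative. Alternative 2 loses "A B" and has no input left of its own. Below is a minimal Java sketch of that policy. It assumes each alternative is summarized by a finite set of sample token sequences, which is a big simplification of ANTLR's real LL(*) lookahead analysis. `AmbiguityDemo`, `predict`, and `unreachable` are hypothetical names.

```java
import java.util.*;

class AmbiguityDemo {
    /** Lowest-numbered alternative whose sample inputs include this input (0 = no viable alt). */
    static int predict(List<Set<String>> alts, String input) {
        for (int i = 0; i < alts.size(); i++) {
            if (alts.get(i).contains(input)) return i + 1;
        }
        return 0;
    }

    /** 1-based numbers of alternatives left with no input of their own once earlier alts win. */
    static List<Integer> unreachable(List<Set<String>> alts) {
        Set<String> claimed = new HashSet<>();
        List<Integer> dead = new ArrayList<>();
        for (int i = 0; i < alts.size(); i++) {
            Set<String> own = new HashSet<>(alts.get(i));
            own.removeAll(claimed);            // input already predicted by an earlier alternative
            if (own.isEmpty()) dead.add(i + 1);
            claimed.addAll(alts.get(i));
        }
        return dead;
    }

    public static void main(String[] args) {
        // Hypothetical decision: both alternatives match the token sequence "A B".
        List<Set<String>> alts = List.of(Set.of("A B"), Set.of("A B"));
        System.out.println("\"A B\" predicts alt " + predict(alts, "A B")); // "A B" predicts alt 1
        System.out.println("unreachable: " + unreachable(alts));            // unreachable: [2]
    }
}
```

If alternative 2 also matched some other input, such as "A C", it would only be disabled for "A B" rather than reported as unreachable.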
11. Runtime Objects of Interest
Lexer passes all tokens to the parser, but the parser listens to only a single “channel”; channel 99, for example, where I place WS tokens, is ignored
Tokens have start/stop indices into a single text input buffer
Token is an abstract class
TokenSource: anything answering nextToken()
TokenStream: stream pulling from a TokenSource; LT(i), …
CharStream: source of characters for a lexer; LT(i), …
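The next sketch shows how these objects fit together. Tokens hold only indices into one shared buffer, the source hands out every token, and the stream's LT(i) skips tokens on other channels. This is a minimal Java sketch under those assumptions. `SimpleToken`, `SimpleTokenSource`, `WordLexer`, and `ChannelTokenStream` are hypothetical simplifications, not the org.antlr.runtime classes.

```java
import java.util.*;

/** Simplified token: type, channel, and inclusive [start, stop] indices into one shared buffer. */
class SimpleToken {
    static final int EOF = -1;
    final int type, channel, start, stop;
    final String buffer;
    SimpleToken(int type, int channel, int start, int stop, String buffer) {
        this.type = type; this.channel = channel;
        this.start = start; this.stop = stop; this.buffer = buffer;
    }
    /** Text is sliced from the shared buffer on demand rather than copied into each token. */
    String getText() { return type == EOF ? "<EOF>" : buffer.substring(start, stop + 1); }
}

/** Anything answering nextToken(). */
interface SimpleTokenSource { SimpleToken nextToken(); }

/** Hypothetical lexer: words are ID tokens on channel 0, whitespace goes to channel 99. */
class WordLexer implements SimpleTokenSource {
    static final int ID = 4, WS = 5, DEFAULT = 0, HIDDEN = 99;
    private final String in;
    private int p = 0;
    WordLexer(String in) { this.in = in; }
    public SimpleToken nextToken() {
        if (p >= in.length()) return new SimpleToken(SimpleToken.EOF, DEFAULT, p, p - 1, in);
        int start = p;
        boolean ws = Character.isWhitespace(in.charAt(p));
        // Extend the token while characters stay on the same side of the whitespace/non-whitespace split.
        while (p < in.length() && Character.isWhitespace(in.charAt(p)) == ws) p++;
        return new SimpleToken(ws ? WS : ID, ws ? HIDDEN : DEFAULT, start, p - 1, in);
    }
}

/** Buffers every token from the source; LT(i) and consume() only see one channel (plus EOF). */
class ChannelTokenStream {
    private final List<SimpleToken> tokens = new ArrayList<>();
    private final int channel;
    private int p = 0;
    ChannelTokenStream(SimpleTokenSource src, int channel) {
        this.channel = channel;
        SimpleToken t;
        do { t = src.nextToken(); tokens.add(t); } while (t.type != SimpleToken.EOF);
    }
    private boolean visible(SimpleToken t) { return t.channel == channel || t.type == SimpleToken.EOF; }
    /** i-th visible token ahead, 1-based; returns EOF once the input is exhausted. */
    SimpleToken LT(int i) {
        int seen = 0;
        for (int j = p; j < tokens.size(); j++) {
            if (visible(tokens.get(j)) && ++seen == i) return tokens.get(j);
        }
        return tokens.get(tokens.size() - 1);
    }
    /** Steps past the current visible token (a no-op at EOF). */
    void consume() {
        while (p < tokens.size() - 1 && !visible(tokens.get(p))) p++;
        if (p < tokens.size() - 1) p++;
    }
    int size() { return tokens.size(); }   // counts hidden-channel tokens too
}

class TokenStreamDemo {
    public static void main(String[] args) {
        ChannelTokenStream ts = new ChannelTokenStream(new WordLexer("a b  c"), WordLexer.DEFAULT);
        while (ts.LT(1).type != SimpleToken.EOF) {
            System.out.println(ts.LT(1).getText());   // a, then b, then c
            ts.consume();
        }
    }
}
```

The whitespace tokens are still buffered, so a stream tuned to channel 99 would see them. The parser's stream on channel 0 simply never returns them.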
12. Error Recovery
ANTLR v3 does what Josef Grosch does in Cocktail
Does single-token insertion or deletion if necessary to keep going
Computes a context-sensitive FOLLOW set to do the insert/delete
Proper context is passed to each rule invocation
Knows precisely what can follow this particular reference to r, rather than what could follow any reference to r (per Wirth, circa 1970)
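The following minimal Java sketch shows single-token deletion and insertion. It assumes token types are plain strings, and each rule receives the FOLLOW set of its invocation as a parameter, which stands in for the runtime's FOLLOW stack. `RecoveringParser`, the toy `prog`/`assign` rules, and the error strings are hypothetical.

```java
import java.util.*;

class RecoveringParser {
    private final List<String> input;    // token types; the last one must be "EOF"
    private int p = 0;
    final List<String> errors = new ArrayList<>();

    RecoveringParser(List<String> input) { this.input = input; }

    String LA(int i) { return input.get(Math.min(p + i - 1, input.size() - 1)); }

    /** Match ttype; follow holds what may come next at THIS reference, supplied by the caller. */
    void match(String ttype, Set<String> follow) {
        if (LA(1).equals(ttype)) { p++; return; }
        if (LA(2).equals(ttype)) {          // single-token deletion: LA(1) is extraneous
            errors.add("extraneous " + LA(1));
            p += 2;                         // drop the junk token and match the expected one
            return;
        }
        if (follow.contains(LA(1))) {       // single-token insertion: act as if ttype were present
            errors.add("missing " + ttype);
            return;                         // consume nothing
        }
        throw new IllegalStateException("mismatched " + LA(1) + ", expected " + ttype);
    }

    /** prog : assign+ EOF ; here an assign may be followed by another ID or by EOF. */
    void prog() {
        do { assign(Set.of("ID", "EOF")); } while (LA(1).equals("ID"));
        match("EOF", Set.of());
    }

    /** assign : ID '=' INT ';' ; the FOLLOW for ';' is whatever the caller says follows assign. */
    void assign(Set<String> follow) {
        match("ID", Set.of("="));
        match("=", Set.of("INT"));
        match("INT", Set.of(";"));
        match(";", follow);
    }

    public static void main(String[] args) {
        RecoveringParser parser = new RecoveringParser(List.of("ID", "INT", ";", "EOF"));
        parser.prog();
        System.out.println(parser.errors);   // [missing =]
    }
}
```

Passing `follow` down from `prog` means that a missing `;` is detected using what follows this particular `assign` call. No global union of every place `assign` can appear is needed.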
13. Example Error Recovery
14. Attributes
New label syntax and multiple return values
Unified token, rule, parameter, return value, and tree reference syntax in actions
Dynamically scoped attributes!
15. Label properties
Token label reference properties:
text, type, line, pos, channel, index, tree
Rule label reference properties:
start, stop: indices of token boundaries
tree
text: text matched for the whole rule
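Tokens only record start/stop indices into a single input buffer. Given a rule's start and stop tokens, its text is therefore one substring, including any hidden whitespace in between. This minimal Java sketch illustrates that idea under this assumption, not ANTLR's actual implementation. `RuleTextDemo`, `Tok`, and `ruleText` are hypothetical.

```java
import java.util.*;

class RuleTextDemo {
    /** Minimal token: inclusive [start, stop] character indices into the shared buffer. */
    static final class Tok {
        final int start, stop;
        Tok(int start, int stop) { this.start = start; this.stop = stop; }
    }

    /** Text of a rule that matched token indices startIndex..stopIndex (inclusive). */
    static String ruleText(String buffer, List<Tok> tokens, int startIndex, int stopIndex) {
        return buffer.substring(tokens.get(startIndex).start, tokens.get(stopIndex).stop + 1);
    }

    public static void main(String[] args) {
        String buffer = "x = 3 + 4;";
        // To keep it short, every character is its own token, spaces included
        // (a real lexer would put those on a hidden channel).
        List<Tok> tokens = new ArrayList<>();
        for (int i = 0; i < buffer.length(); i++) tokens.add(new Tok(i, i));
        // Suppose a rule expr matched "3 + 4": start = token 4, stop = token 8.
        System.out.println(ruleText(buffer, tokens, 4, 8)); // 3 + 4
    }
}
```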
16. Rule Scope Attributes
A rule may define a scope of attributes visible to any rule it invokes, directly or indirectly; it operates like a stacked global variable
Avoids having to pass a value down
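The "stacked global variable" behavior can be emulated by hand. Each `block` invocation pushes a frame on entry and pops it on exit. `decl` never receives the list as a parameter and writes to whatever frame is on top. This is a minimal Java sketch assuming space-separated input with balanced braces. `RuleScopeDemo` and `BlockScope` are hypothetical, not generated code.

```java
import java.util.*;

class RuleScopeDemo {
    /** One frame per active invocation of block(); holds block's scoped attribute. */
    static final class BlockScope { final List<String> symbols = new ArrayList<>(); }

    private final Deque<BlockScope> blockStack = new ArrayDeque<>();
    private final List<String> tokens;
    private int p = 0;
    final List<String> log = new ArrayList<>();

    RuleScopeDemo(String input) { tokens = Arrays.asList(input.split(" ")); }

    /** block : '{' ( block | decl )* '}' ;  (assumes balanced braces) */
    void block() {
        expect("{");
        blockStack.push(new BlockScope());            // rule entry: push a fresh frame
        try {
            while (!tokens.get(p).equals("}")) {
                if (tokens.get(p).equals("{")) block();
                else decl();
            }
            expect("}");
            log.add(blockStack.peek().symbols.toString());
        } finally {
            blockStack.pop();                         // rule exit: pop it, even on error
        }
    }

    /** decl : ID ;  no parameter needed; it writes to the innermost block's frame. */
    void decl() { blockStack.peek().symbols.add(tokens.get(p++)); }

    private void expect(String t) {
        if (!tokens.get(p).equals(t)) throw new IllegalStateException("expected " + t);
        p++;
    }

    public static void main(String[] args) {
        RuleScopeDemo d = new RuleScopeDemo("{ x { y } z }");
        d.block();
        System.out.println(d.log); // [[y], [x, z]]
    }
}
```

The nested block collects only `y`. The outer block's `x` and `z` land in its own frame, although `decl` has no idea which block it is in.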
17. Global Scope Attributes
Named scopes; rules must explicitly request access
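A named scope differs from a rule scope in one way. Its single stack is shared by every rule that requests it, so frames pushed by different rules interleave. Lookups can search from the innermost frame outward. This minimal Java sketch uses a hypothetical scope called "Symbols". `GlobalScopeDemo`, `enter`, `exit`, `define`, and `lookup` are illustrative names, not ANTLR's generated code.

```java
import java.util.*;

class GlobalScopeDemo {
    /** A frame of the hypothetical named scope "Symbols". */
    static final class SymbolsFrame {
        final String owner;
        final Set<String> names = new HashSet<>();
        SymbolsFrame(String owner) { this.owner = owner; }
    }

    /** One stack shared by every rule that requests the Symbols scope. */
    private final Deque<SymbolsFrame> symbols = new ArrayDeque<>();

    // Only rules that request Symbols call enter/exit; other rules leave the stack alone.
    void enter(String rule) { symbols.push(new SymbolsFrame(rule)); }
    void exit() { symbols.pop(); }
    void define(String name) { symbols.peek().names.add(name); }

    /** Innermost-to-outermost search (ArrayDeque iterates from the top of the stack). */
    String lookup(String name) {
        for (SymbolsFrame f : symbols) {
            if (f.names.contains(name)) return f.owner;
        }
        return null;
    }

    public static void main(String[] args) {
        GlobalScopeDemo s = new GlobalScopeDemo();
        s.enter("method"); s.define("arg");   // method requests Symbols
        s.enter("block");  s.define("x");     // so does a nested block
        System.out.println(s.lookup("arg") + " " + s.lookup("x")); // method block
        s.exit();
        System.out.println(s.lookup("x")); // null
    }
}
```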
18. Tree Support
TreeAdaptor: how to create and navigate trees (like ASTFactory from v2); ANTLR assumes tree nodes are of type Object
Tree: used by support code
BaseTree: list of children, without payload (no more child-sibling trees)
CommonTree: node wrapping a Token as its payload
ParseTree: used by the interpreter to build trees
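Compared with v2's first-child/next-sibling links, each v3 node keeps an explicit list of children plus a payload. The Java sketch below is a minimal illustration of that layout, loosely following the BaseTree/CommonTree split. `TreeDemo.Node` is hypothetical and uses a String where CommonTree would hold a Token. A null payload plays the role of a "nil" list root, and printing uses the LISP-like notation from these slides.

```java
import java.util.*;

class TreeDemo {
    /** Simplified node: a token payload (null = "nil" list root) plus a list of children. */
    static final class Node {
        final String payload;                         // stands in for the wrapped Token
        final List<Node> children = new ArrayList<>();
        Node(String payload) { this.payload = payload; }
        Node add(Node child) { children.add(child); return this; }

        /** "(root c1 c2 ...)" for subtrees, bare text for leaves, no parens around a nil root. */
        String toStringTree() {
            if (children.isEmpty()) return payload == null ? "nil" : payload;
            StringBuilder sb = new StringBuilder();
            if (payload != null) sb.append('(').append(payload).append(' ');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) sb.append(' ');
                sb.append(children.get(i).toStringTree());
            }
            if (payload != null) sb.append(')');
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        Node plus = new Node("+").add(new Node("1")).add(new Node("2"));
        System.out.println(plus.toStringTree()); // (+ 1 2)
        Node list = new Node(null)
            .add(new Node("a").add(new Node("1")))
            .add(new Node("b").add(new Node("2")));
        System.out.println(list.toStringTree()); // (a 1) (b 2)
    }
}
```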
19. Tree Construction
Automatic mechanism is the same as in v2, except ^ is now ^^:
expr : atom ( '+'^^ atom )* ;
^ now makes its token the root of the tree for the enclosing subrule:
a : ( ID^ INT )* ; builds (a 1) (b 2) …
Token labels are $label, not #label, and rule invocation tree results are $ruleLabel.tree
Turn on with options {output=AST;} (one can imagine output=text for templates)
Option: ASTLabelType=CommonTree;
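With ^^ on '+', each operator becomes the root of whatever tree the rule has built so far. Input `1+2+3` therefore yields a left-associative tree. Below is a minimal Java sketch of that effect. It assumes pre-tokenized input and leaves out the TreeAdaptor. `AutoAstDemo` and its `Node` are hypothetical, not generated parser code.

```java
import java.util.*;

class AutoAstDemo {
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();
        Node(String text) { this.text = text; }
        String toStringTree() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node c : children) sb.append(' ').append(c.toStringTree());
            return sb.append(')').toString();
        }
    }

    /** Mimics expr : atom ( '+'^^ atom )* ; over tokens like [1, +, 2, +, 3]. */
    static Node expr(List<String> toks) {
        int p = 0;
        Node result = new Node(toks.get(p++));            // atom: the leaf is the tree so far
        while (p < toks.size() && toks.get(p).equals("+")) {
            Node root = new Node(toks.get(p++));          // '+'^^: becomes the new root...
            root.children.add(result);                    // ...of everything built so far
            root.children.add(new Node(toks.get(p++)));   // the next atom is appended as a child
            result = root;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(expr(List.of("1", "+", "2", "+", "3")).toStringTree()); // (+ (+ 1 2) 3)
    }
}
```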
20. Tree Rewrite Rules
Map an input grammar fragment to an output tree grammar fragment
21. Mixed Rewrite/Auto Trees
Alternatives without a -> rewrite use the automatic mechanism
22. Rewrites and labels
Labels disambiguate element references or are used to construct imaginary nodes
Concatenation (+=) labels are useful too.
23. Loops in Rewrites
Repeated element: ID ID -> ^(VARS ID+) yields ^(VARS a b)
Repeated tree: ID ID -> ^(VARS ID)+ yields ^(VARS a) ^(VARS b)
Multiple elements in a loop must have the same size: ID INT ID INT -> ^( R ID ^( S INT) )+ yields (R a (S 1)) (R b (S 2))
Checks cardinality of + and * loops
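Here is a minimal Java sketch of the three loop shapes. It assumes the matched elements have already been collected into lists, and it renders trees directly as LISP-style strings to stay short. `RewriteLoopDemo` is hypothetical. The exceptions stand in for the cardinality check, and treating an empty + loop as an error is an assumption.

```java
import java.util.*;

class RewriteLoopDemo {
    /** ID ID -> ^(VARS ID+) : one tree; the loop repeats only the ID element. */
    static String varsOneTree(List<String> ids) {
        requireNonEmpty(ids);
        return "(VARS " + String.join(" ", ids) + ")";
    }

    /** ID ID -> ^(VARS ID)+ : the loop repeats the whole tree, once per ID. */
    static String varsTreePerId(List<String> ids) {
        requireNonEmpty(ids);
        List<String> trees = new ArrayList<>();
        for (String id : ids) trees.add("(VARS " + id + ")");
        return String.join(" ", trees);
    }

    /** ID INT ID INT -> ^(R ID ^(S INT))+ : every element in the loop must have the same size. */
    static String pairs(List<String> ids, List<String> ints) {
        requireNonEmpty(ids);
        if (ids.size() != ints.size())
            throw new IllegalArgumentException("loop element sizes differ: " + ids.size() + " vs " + ints.size());
        List<String> trees = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) trees.add("(R " + ids.get(i) + " (S " + ints.get(i) + "))");
        return String.join(" ", trees);
    }

    /** A + loop needs at least one element to iterate over. */
    private static void requireNonEmpty(List<String> xs) {
        if (xs.isEmpty()) throw new IllegalArgumentException("+ loop with no elements");
    }

    public static void main(String[] args) {
        List<String> ids = List.of("a", "b");
        System.out.println(varsOneTree(ids));                 // (VARS a b)
        System.out.println(varsTreePerId(ids));               // (VARS a) (VARS b)
        System.out.println(pairs(ids, List.of("1", "2")));    // (R a (S 1)) (R b (S 2))
    }
}
```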
24. Preventing cyclic structures
Repeated elements get duplicated:
a : INT -> INT INT ; // dups INT!
a : INT INT -> INT+ INT+ ; // 4 INTs!
Repeated rule references get duplicated:
a : atom -> ^(atom atom) ; // no cycle!
Duplicates the whole tree for all but the first reference to an element; here the 2nd reference to atom results in a duplicated atom tree
*Useful example: "int x,y" -> "^(int x) ^(int y)"
decl : type ID (',' ID)* -> ^(type ID)+ ;
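Without duplication, `^(atom atom)` would add a node under itself, and `^(type ID)+` would hang every ID off one shared `type` node. This minimal Java sketch shows duplicate-all-but-the-first with a simple node type. `DupDemo`, `dupTree`, `atomAtom`, and `decl` are hypothetical names, not ANTLR's generated code or TreeAdaptor API.

```java
import java.util.*;

class DupDemo {
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();
        Node(String text) { this.text = text; }
        String toStringTree() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node c : children) sb.append(' ').append(c.toStringTree());
            return sb.append(')').toString();
        }
    }

    /** Deep copy, so a tree can appear twice in the output without sharing nodes. */
    static Node dupTree(Node t) {
        Node copy = new Node(t.text);
        for (Node c : t.children) copy.children.add(dupTree(c));
        return copy;
    }

    /** a : atom -> ^(atom atom) ; the 1st reference is the original, the 2nd a copy. */
    static Node atomAtom(Node atom) {
        Node child = dupTree(atom);   // copy before mutating, so the copy has no self-reference
        atom.children.add(child);
        return atom;
    }

    /** decl : type ID (',' ID)* -> ^(type ID)+ ; type is duplicated for every ID after the first. */
    static List<Node> decl(Node type, List<String> ids) {
        Node pristine = dupTree(type);   // snapshot before the first ID is attached
        List<Node> trees = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            Node root = (i == 0) ? type : dupTree(pristine);
            root.children.add(new Node(ids.get(i)));
            trees.add(root);
        }
        return trees;
    }

    public static void main(String[] args) {
        System.out.println(atomAtom(new Node("x")).toStringTree()); // (x x)
        for (Node t : decl(new Node("int"), List.of("x", "y"))) {
            System.out.println(t.toStringTree());                    // (int x), then (int y)
        }
    }
}
```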
25. Predicated rewrites
Use a semantic predicate to indicate which rewrite to choose
26. Misc Rewrite Elements
Arbitrary actions:
a : atom -> ^({adaptor.createToken(INT,"9")} atom) ;
A rewrite always sets the rule's AST, not the subrule's
Reference to previous value (useful?)
27. Tree Grammars
Syntax is the same as for parser grammars, plus a ^(root children…) tree element
Uses LL(*) too; even derives from the same superclass! The tree is serialized to include DOWN and UP imaginary tokens that encode its 2D structure for the serial parser
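Flattening is what lets a tree grammar reuse ordinary token parsing. A tree element like ^(PLUS INT INT) simply matches the sequence PLUS DOWN INT INT UP. This is a minimal Java sketch of the serialization with a simple node type. `TreeSerializeDemo` is hypothetical, and details such as end-of-input handling are left out.

```java
import java.util.*;

class TreeSerializeDemo {
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();
        Node(String text, Node... kids) { this.text = text; children.addAll(Arrays.asList(kids)); }
    }

    /** Pre-order walk: the node, then DOWN, its children, UP; leaves get no DOWN/UP. */
    static void serialize(Node t, List<String> out) {
        out.add(t.text);
        if (t.children.isEmpty()) return;
        out.add("DOWN");
        for (Node c : t.children) serialize(c, out);
        out.add("UP");
    }

    public static void main(String[] args) {
        Node tree = new Node("+", new Node("1"), new Node("*", new Node("2"), new Node("3")));
        List<String> out = new ArrayList<>();
        serialize(tree, out);
        System.out.println(String.join(" ", out)); // + DOWN 1 * DOWN 2 3 UP UP
    }
}
```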
28. Code Generation
Uses StringTemplate to specify how each abstract ANTLR concept maps to code; wildly successful!
Separates code gen logic from output; not a single character of output in the Java code
Java.stg: 140 templates, 1300 lines
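The separation can be shown with a tiny stand-in. This is not StringTemplate itself, only a mimic of its `<attr>` holes. The generator picks a template by concept name and passes attributes, while all output text lives in the template group. Swapping groups changes the output without touching the generator. `TemplateDemo`, `render`, and both template groups are hypothetical; they are not the contents of Java.stg.

```java
import java.util.*;
import java.util.regex.*;

class TemplateDemo {
    private static final Pattern ATTR = Pattern.compile("<(\\w+)>");

    /** Hypothetical template groups (concept name -> text); NOT the contents of Java.stg. */
    static final Map<String, String> JAVA_LIKE = Map.of(
        "tokenRef", "match(input, <type>);",
        "ruleRef", "<rule>();");
    static final Map<String, String> C_LIKE = Map.of(
        "tokenRef", "match_token(ctx, <type>);",
        "ruleRef", "<rule>(ctx);");

    /** Fills <name> holes from attrs; the generator supplies data, the group supplies all text. */
    static String render(Map<String, String> group, String template, Map<String, String> attrs) {
        Matcher m = ATTR.matcher(group.get(template));
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(attrs.getOrDefault(m.group(1), "")));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> attrs = Map.of("type", "ID");
        System.out.println(render(JAVA_LIKE, "tokenRef", attrs)); // match(input, ID);
        System.out.println(render(C_LIKE, "tokenRef", attrs));    // match_token(ctx, ID);
    }
}
```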
29. Sample code gen templates
30. Internationalization
ANTLR v3 uses StringTemplate to display all errors
Senses locale to load messages; en.stg: 76 templates
ErrorManager error number constants map to a template name; e.g.,
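Message lookup is a chain: error number, then template name, then the locale's message group, with a fallback to English, then the filled-in message. This minimal Java sketch of that chain uses plain maps in place of StringTemplate groups such as en.stg and a single `<arg>` hole. `ErrorMessagesDemo`, the constant, the template name, and the message text are all hypothetical.

```java
import java.util.*;

class ErrorMessagesDemo {
    // Hypothetical error number constant and template name; not ANTLR's real tables.
    static final int UNDEFINED_TOKEN = 1;
    static final Map<Integer, String> TEMPLATE_NAMES = Map.of(UNDEFINED_TOKEN, "undefinedToken");

    // One message group per language; the text is made up for illustration.
    static final Map<String, Map<String, String>> GROUPS = Map.of(
        "en", Map.of("undefinedToken", "token <arg> is used but never defined"),
        "de", Map.of("undefinedToken", "Token <arg> wird verwendet, aber nie definiert"));

    /** Error number -> template name -> locale's group (falling back to "en") -> message. */
    static String message(Locale locale, int errorNumber, String arg) {
        Map<String, String> group = GROUPS.getOrDefault(locale.getLanguage(), GROUPS.get("en"));
        return group.get(TEMPLATE_NAMES.get(errorNumber)).replace("<arg>", arg);
    }

    public static void main(String[] args) {
        System.out.println(message(Locale.getDefault(), UNDEFINED_TOKEN, "INT"));
    }
}
```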
31. Runtime Support
Better organized, separated into packages:
org.antlr.runtime
org.antlr.runtime.tree
org.antlr.runtime.debug
Clean: Parser has only an input pointer (except for the error recovery FOLLOW stack); Lexer also has only an input pointer
4500 lines of Java code, excluding the BSD headers
32. Summary
v3 kicks ass
It sort of works!
http://www.antlr.org/download/…
ANTLRWorks progressing in parallel