Grammar Variation in Compiler Design
Carl Wu
Three topics
• Syntax grammar vs. AST
• Component(?)-based grammar
• Aspect-oriented grammar
Grammar vs. AST (I)
How can a tree be generated automatically from a grammar?
Grammar vs. AST (I)
    Stmt ::= Block | "if" Exp "then" Stmt | IdUse ":=" Exp
Grammar vs. AST (I)
    Stmt ::= Block | "if" Exp "then" Stmt | IdUse ":=" Exp
JastAdd specification (tree):
    abstract Stmt;
    BlockStmt : Stmt ::= Block;
    IfStmt : Stmt ::= Exp Stmt;
    AssignStmt : Stmt ::= IdUse Exp;
Grammar vs. AST (I)
Restricted CFG (RCFG) definition:
    A ::= B C D        √  => aggregation
    A ::= B | C | D    √  => inheritance
    A ::= B C | D      ×  (mixes a sequence with a choice; not allowed)
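The restriction can be checked mechanically. A minimal sketch, assuming a hypothetical representation in which a production is a list of alternatives and each alternative is a list of symbol names (`ProductionKind` and `RcfgChecker` are illustrative names, not part of any tool). Under this check, the `Stmt` grammar on the earlier slide is rejected because it mixes a single symbol (`Block`) with sequences, which is why it is rewritten in RCFG form.

```java
import java.util.List;

// Possible outcomes of the restricted-CFG check.
enum ProductionKind { AGGREGATION, INHERITANCE, INVALID }

final class RcfgChecker {
    // Hypothetical representation: each alternative is a list of symbol names.
    //  - exactly one alternative (a pure sequence, A ::= B C D) -> aggregation
    //  - several alternatives, each a single symbol (A ::= B | C | D) -> inheritance
    //  - anything mixing sequence and choice (A ::= B C | D) -> rejected
    static ProductionKind classify(List<List<String>> alternatives) {
        if (alternatives.isEmpty()) {
            return ProductionKind.INVALID;
        }
        if (alternatives.size() == 1) {
            return ProductionKind.AGGREGATION;
        }
        for (List<String> alt : alternatives) {
            if (alt.size() != 1) {
                return ProductionKind.INVALID; // a sequence inside a choice
            }
        }
        return ProductionKind.INHERITANCE;
    }
}
```

For example, `RcfgChecker.classify(List.of(List.of("B", "C"), List.of("D")))` returns `INVALID`.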
Grammar vs. AST (I)
RCFG specification:
    Stmt       :: Block | IfStmt | AssignStmt
    IfStmt     :: "if" Exp "then" Stmt
    AssignStmt :: IdUse ":=" Exp
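With an RCFG specification, the tree classes follow directly from the grammar: choice productions become abstract superclasses and sequence productions become node classes with one child per nonterminal. The sketch below illustrates that mapping under stated assumptions: terminals are written in double quotes and dropped, each field is named after its lowercased type, repeated child types are not handled, productions are assumed to have already passed the RCFG check, and `TreeGenerator` is a hypothetical name. It is not JastAdd's actual code generator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical generator: emits Java class skeletons for RCFG productions.
final class TreeGenerator {
    // Quoted symbols such as "\"if\"" are terminals and produce no tree child.
    private static boolean isTerminal(String symbol) {
        return symbol.startsWith("\"");
    }

    static List<String> generate(Map<String, List<List<String>>> grammar) {
        // Pass 1: each symbol listed in a choice production inherits from its left-hand side.
        Map<String, String> superclassOf = new HashMap<>();
        for (Map.Entry<String, List<List<String>>> p : grammar.entrySet()) {
            if (p.getValue().size() > 1) {
                for (List<String> alt : p.getValue()) {
                    superclassOf.put(alt.get(0), p.getKey());
                }
            }
        }
        // Pass 2: choice -> abstract class; sequence -> class with one field per nonterminal.
        List<String> classes = new ArrayList<>();
        for (Map.Entry<String, List<List<String>>> p : grammar.entrySet()) {
            String name = p.getKey();
            String parent = superclassOf.get(name);
            String header = (parent == null) ? "" : " extends " + parent;
            if (p.getValue().size() > 1) {
                classes.add("abstract class " + name + header + " {}");
            } else {
                StringBuilder body = new StringBuilder();
                for (String symbol : p.getValue().get(0)) {
                    if (!isTerminal(symbol)) {
                        // Field name = type name with a lowercase first letter.
                        body.append(" ").append(symbol).append(" ")
                            .append(Character.toLowerCase(symbol.charAt(0)))
                            .append(symbol.substring(1)).append(";");
                    }
                }
                classes.add("class " + name + header + " {" + body + " }");
            }
        }
        return classes;
    }
}
```

Fed the three productions above in order (for example through a `LinkedHashMap`), it yields `abstract class Stmt {}`, `class IfStmt extends Stmt { Exp exp; Stmt stmt; }` and `class AssignStmt extends Stmt { IdUse idUse; Exp exp; }`. `Block` gets no class because it has no production in this fragment.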
Grammar vs. AST (II)
Parse tree vs. IR tree
Grammar vs. AST (II)
• In an IDE, there are multiple visitors (more than 12!) for the same source code.
• They place different requirements on the tree structure:
  • Syntax vs. semantics
  • Immutable vs. transformable (optimization)
  • Parse tree vs. IR tree
Grammar vs. AST (II)
• Generate two tree structures from the same grammar!
• One immutable, strongly typed, concrete parse tree: read only!
• One transformable, untyped, abstract IR tree: read and write!
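Grammar vs. AST (II)
    IfStmt :: "if" Exp "then" Stmt

    class ASTNode {
        // untyped children: the IR tree view
        protected ASTNode[] children;
    }

    class IfStmt extends ASTNode {
        // typed, final fields: the read-only parse tree view
        protected final Token token_if;
        protected final Exp exp;          // assumes Exp extends ASTNode
        protected final Token token_then;
        protected final Stmt stmt;        // assumes Stmt extends ASTNode

        IfStmt(Token token_if, Exp exp, Token token_then, Stmt stmt) {
            // parse tree construction
            this.token_if = token_if;
            this.exp = exp;
            this.token_then = token_then;
            this.stmt = stmt;
            // IR tree construction: keep only the semantic children, no tokens
            children = new ASTNode[] { exp, stmt };
        }
    }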
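To see how the two views behave differently, here is a self-contained sketch with hypothetical stand-ins for `Token`, `Exp` and `Stmt`. A generic IR pass walks only the untyped `children` array, and an IR transformation overwrites a child slot, while the `final` typed parse-tree fields keep pointing at the originally parsed nodes. The helper names `countNodes` and `replaceChild` are illustrative assumptions, not part of the original design.

```java
// Hypothetical minimal node hierarchy so the sketch compiles on its own.
class ASTNode {
    // Leaves default to an empty child array.
    protected ASTNode[] children = new ASTNode[0];

    // Generic IR pass: visits nodes through the untyped children only.
    int countNodes() {
        int n = 1;
        for (ASTNode child : children) {
            if (child != null) {
                n += child.countNodes();
            }
        }
        return n;
    }

    // IR transformation: the children array is writable.
    void replaceChild(int index, ASTNode replacement) {
        children[index] = replacement;
    }
}

final class Token {
    final String text;
    Token(String text) { this.text = text; }
}

class Exp extends ASTNode {}
class Stmt extends ASTNode {}

class IfStmt extends ASTNode {
    // Read-only parse tree view.
    protected final Token token_if;
    protected final Exp exp;
    protected final Token token_then;
    protected final Stmt stmt;

    IfStmt(Token token_if, Exp exp, Token token_then, Stmt stmt) {
        this.token_if = token_if;
        this.exp = exp;
        this.token_then = token_then;
        this.stmt = stmt;
        children = new ASTNode[] { exp, stmt }; // IR view without tokens
    }
}
```

After `replaceChild(1, replacement)` an IR visitor sees the new statement, but the typed `stmt` field still refers to the parsed one, so syntax-oriented visitors are unaffected by the optimization.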
Component vs. module
• What is the difference between a component and a module?
• What is a modularized grammar?
• What is an ideal component-based grammar?
Component vs. module
[Figure: modularized grammar (grammar modules combined and fed to a single parser) vs. component-based grammar (each grammar component with its own parser)]
Benefits
• Benefits of a modularized grammar:
  • Easy to read, write, and change
  • Eliminates naming conflicts
• Additional benefits of a component-based grammar:
  • Each component can be designed, developed, and tested individually.
  • A change to one component does not require recompiling the other components.
  • Different types of grammars/parsing algorithms can be used for different components, e.g., one component can be LL and another LALR.
Difficulties in designing a component-based grammar
• No clear guards (boundaries) between two components.
  • Should control switch to a new parser or stay in the same one?
  • Suitable for embedded languages, e.g., JScript in HTML
  • Not suitable for an integral language, e.g., Java
• Too much coupling between two components.
  • A component is not just reused as a whole; its internal productions and symbols may be reused too.
  • Not applicable to LR parsers: once the table is built, you can't reuse its internal productions (there is no way to jump into a table).
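Where a clear guard exists, switching control between components is straightforward. A minimal sketch, assuming a hypothetical `GrammarComponent` interface and treating `<script>` / `</script>` as the guard between an HTML host component and an embedded script component. Both "parsers" are string-level stand-ins that only label fragments; they are not real HTML or JScript parsers, and each could in principle be built with a different parsing algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical component interface: each grammar component owns its own parser.
interface GrammarComponent {
    List<String> parse(String source);
}

// Stand-in for an independently developed script parser.
final class ScriptComponent implements GrammarComponent {
    public List<String> parse(String source) {
        List<String> stmts = new ArrayList<>();
        for (String s : source.split(";")) {
            if (!s.trim().isEmpty()) {
                stmts.add("script-stmt:" + s.trim());
            }
        }
        return stmts;
    }
}

// Host component: the tags act as clear guards, so control can be handed
// to the embedded component and returned without ambiguity.
final class HtmlComponent implements GrammarComponent {
    private static final String OPEN = "<script>";
    private static final String CLOSE = "</script>";
    private final GrammarComponent embedded;

    HtmlComponent(GrammarComponent embedded) {
        this.embedded = embedded;
    }

    public List<String> parse(String source) {
        List<String> result = new ArrayList<>();
        int pos = 0;
        while (pos < source.length()) {
            int open = source.indexOf(OPEN, pos);
            if (open < 0) {
                result.add("html:" + source.substring(pos)); // no more embedded code
                break;
            }
            if (open > pos) {
                result.add("html:" + source.substring(pos, open));
            }
            int close = source.indexOf(CLOSE, open + OPEN.length());
            if (close < 0) {
                throw new IllegalArgumentException("unterminated <script>");
            }
            // Switch control to the embedded component for the guarded region only.
            result.addAll(embedded.parse(source.substring(open + OPEN.length(), close)));
            pos = close + CLOSE.length();
        }
        return result;
    }
}
```

An integral language such as Java has no comparable delimiter between its sub-grammars, so there is no obvious point at which to hand control to another parser.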
Aspect-oriented grammar
• Join point: grammar patterns that crosscut multiple productions
  • Punctuation, identifiers, modifiers…
Example
• ";" appears 25 times in one of the Java grammars
• "." appears 74 times in one of the COBOL grammars
• Every one of them must be placed carefully!
    <Sentence> ::= <Accept Stm> '.'
                 | <Add Stm> '.'
                 | <Add Stm Ex> <End-Add Opt> '.'
                 | <Call Stm> '.'
                 | <Call Stm Ex> <End-Call Opt> '.'
                 | <Close Stm> '.'
                 | <Compute Stm> '.'
                 | <Compute Stm Ex> <End-Compute Opt> '.'
                 | <Display Stm> '.'
                 | <Divide Stm> '.'
                 | <Divide Stm Ex> <End-Divide Opt> '.'
                 | <Evaluate Stm> <End-Evaluate Opt> '.'
                 | <If Stm> <End-If Opt> '.'
                 | <Move Stm> '.'
                 | <Move Stm Ex> <End-Move Opt> '.'
                 | <Multiply Stm> '.'
                 | <Multiply Stm Ex> <End-Multiply Opt> '.'
                 | <Open Stm> '.'
                 | <Perform Stm> '.'
                 | <Perform Stm Ex> <End-Perform Opt> '.'
                 | <Read Stm> '.'
                 | <Read Stm Ex> <End-Read Opt> '.'
                 | <Release Stm> '.'
                 | <Rewrite Stm> '.'
                 | <Rewrite Stm Ex> <End-Rewrite Opt> '.'
                 | <Set Stm> '.'
                 | <Start Stm> '.'
                 | <Start Stm Ex> <End-Start Opt> '.'
                 | <String Stm> '.'
                 | <String Stm Ex> <End-String Opt> '.'
                 | <Subtract Stm> '.'
                 | <Subtract Stm Ex> <End-Substract Opt> '.'
                 | <Write Stm> '.'
                 | <Write Stm Ex> <End-Write Opt> '.'
                 | <Unstring Stm> '.'
                 | <Unstring Stm Ex> <End-Unstring Opt> '.'
                 | <Misc Stm> '.'

    pointcut PreDot(): <Sentence>;
    after PreDot(): '.'
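One way to read the `after` advice: the base grammar lists the `<Sentence>` alternatives without the trailing dot, and weaving appends `'.'` to every alternative of each production the pointcut selects. A minimal sketch, assuming a pointcut is simply a set of nonterminal names and a grammar is a map from nonterminal to its alternatives (`AfterWeaver` and `weaveAfter` are hypothetical names, not part of an existing tool).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class AfterWeaver {
    // Returns a woven copy of the grammar; the base grammar is not modified.
    static Map<String, List<List<String>>> weaveAfter(
            Map<String, List<List<String>>> base, Set<String> pointcut, String terminal) {
        Map<String, List<List<String>>> woven = new LinkedHashMap<>();
        for (Map.Entry<String, List<List<String>>> p : base.entrySet()) {
            List<List<String>> alts = new ArrayList<>();
            for (List<String> alt : p.getValue()) {
                List<String> copy = new ArrayList<>(alt);
                if (pointcut.contains(p.getKey())) {
                    copy.add(terminal); // after-advice: append to every matched alternative
                }
                alts.add(copy);
            }
            woven.put(p.getKey(), alts);
        }
        return woven;
    }
}
```

Returning a new grammar rather than mutating the base one keeps the base grammar reusable with different aspects.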
Another example
    pointcut Content(): … …
    before Content(): "(";
    after Content(): ")";
Guarantees that they always match!
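The pointcut body is elided on the slide, so the alternatives used here are hypothetical. The sketch applies the `before` and `after` advice together to each selected alternative, which is what guarantees that every inserted `"("` gets its `")"`; `balanced` only double-checks the woven result. `WrapWeaver`, `wrap` and `balanced` are illustrative names, assuming the same list-of-symbols representation as a plain grammar.

```java
import java.util.ArrayList;
import java.util.List;

final class WrapWeaver {
    // Applies before- and after-advice in one step to every alternative,
    // so an opening token is never woven without its closing token.
    static List<List<String>> wrap(List<List<String>> alternatives, String before, String after) {
        List<List<String>> woven = new ArrayList<>();
        for (List<String> alt : alternatives) {
            List<String> copy = new ArrayList<>();
            copy.add(before);
            copy.addAll(alt);
            copy.add(after);
            woven.add(copy);
        }
        return woven;
    }

    // Verifies that open/close tokens are balanced and properly nested.
    static boolean balanced(List<String> alt, String open, String close) {
        int depth = 0;
        for (String symbol : alt) {
            if (symbol.equals(open)) {
                depth++;
            } else if (symbol.equals(close) && --depth < 0) {
                return false; // a close token before its open token
            }
        }
        return depth == 0;
    }
}
```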
Grammar weaving
[Figure: base grammar + grammar aspect → result grammar → parser]