Grammar Variation in Compiler Design

Grammar Variation in Compiler Design Carl Wu

Three topics
• Syntax Grammar vs. AST
• Component(?)-based grammar
• Aspect-oriented grammar

Grammar vs. AST (I) How to automatically generate a tree from a grammar?

Grammar vs. AST (I)
Stmt ::= Block | “if” Exp “then” Stmt | IdUse “:=” Exp

Grammar vs. AST (I)
Stmt ::= Block | “if” Exp “then” Stmt | IdUse “:=” Exp
JastAdd Specification (Tree)
abstract Stmt;
BlockStmt : Stmt ::= Block;
IfStmt : Stmt ::= Exp Stmt;
AssignStmt : Stmt ::= IdUse Exp;

Grammar vs. AST (I)
Restricted CFG Definition
A ::= B C D      √  => aggregation
A ::= B | C | D  √  => inheritance
A ::= B C | D    ×
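The restriction above can be checked mechanically. The following is a minimal sketch, not part of the original slides: `RcfgClassifier` and its `Kind` enum are hypothetical names, and the code assumes symbols are separated by whitespace and alternatives by a standalone `|`.

```java
// Hypothetical helper: classifies one production under the restricted-CFG rule.
final class RcfgClassifier {
    enum Kind { AGGREGATION, INHERITANCE, REJECTED }

    // Expects text of the form "A ::= rhs". Assumes whitespace-separated
    // symbols and a standalone "|" between alternatives.
    static Kind classify(String production) {
        String[] sides = production.split("::=", 2);
        if (sides.length != 2 || sides[1].trim().isEmpty()) {
            throw new IllegalArgumentException("not a production: " + production);
        }
        int alternatives = 1;
        int symbolsInAlternative = 0;
        boolean hasSequenceInsideChoice = false;
        for (String token : sides[1].trim().split("\\s+")) {
            if (token.equals("|")) {
                alternatives++;
                symbolsInAlternative = 0;
            } else if (++symbolsInAlternative > 1) {
                hasSequenceInsideChoice = true;
            }
        }
        if (alternatives == 1) {
            return Kind.AGGREGATION;      // pure sequence: children become fields
        }
        // A choice is only allowed between single symbols (subclasses).
        return hasSequenceInsideChoice ? Kind.REJECTED : Kind.INHERITANCE;
    }

    public static void main(String[] args) {
        System.out.println(classify("A ::= B C D"));
        System.out.println(classify("A ::= B | C | D"));
        System.out.println(classify("A ::= B C | D"));
    }
}
```

A production that mixes sequence and choice is rejected because it maps to neither a plain field list nor a plain subclass list.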

Grammar vs. AST (I)
RCFG Specification
Stmt :: Block | IfStmt | AssignStmt
IfStmt :: “if” Exp “then” Stmt
AssignStmt :: IdUse “:=” Exp
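One way to picture the tree generation is a small generator that turns each RCFG rule into a class skeleton: a choice rule becomes an abstract superclass, a sequence rule becomes a class with one field per symbol. This is only a sketch under stated assumptions. `RcfgToClasses`, the root class name `ASTNode`, the `Token` type for terminals, and the field names `c0`, `c1`, ... are all hypothetical, and the rules are assumed to use the `::` notation with whitespace-separated symbols.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical generator: emits Java class skeletons from RCFG rules.
final class RcfgToClasses {
    static String generate(String... rules) {
        Map<String, String[]> rhsByName = new LinkedHashMap<>();
        Map<String, String> parentOf = new LinkedHashMap<>();
        // Pass 1: record each rule and the superclass implied by choice rules.
        for (String rule : rules) {
            String[] sides = rule.split("::", 2);
            if (sides.length != 2) {
                throw new IllegalArgumentException("not a rule: " + rule);
            }
            String name = sides[0].trim();
            String[] rhs = sides[1].trim().split("\\s+");
            rhsByName.put(name, rhs);
            if (isChoice(rhs)) {
                for (String token : rhs) {
                    if (!token.equals("|")) parentOf.put(token, name);
                }
            }
        }
        // Pass 2: emit one class per rule.
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String[]> entry : rhsByName.entrySet()) {
            String name = entry.getKey();
            String parent = parentOf.getOrDefault(name, "ASTNode");
            if (isChoice(entry.getValue())) {
                out.append("abstract class ").append(name)
                   .append(" extends ").append(parent).append(" {}\n");
            } else {
                out.append("class ").append(name)
                   .append(" extends ").append(parent).append(" {");
                int index = 0;
                for (String token : entry.getValue()) {
                    String type = isTerminal(token) ? "Token" : token;
                    out.append(" ").append(type).append(" c").append(index++).append(";");
                }
                out.append(" }\n");
            }
        }
        return out.toString();
    }

    private static boolean isChoice(String[] rhs) {
        for (String token : rhs) {
            if (token.equals("|")) return true;
        }
        return false;
    }

    // Terminals are quoted, with either straight or curly opening quotes.
    private static boolean isTerminal(String token) {
        return token.startsWith("\"") || token.startsWith("\u201C");
    }

    public static void main(String[] args) {
        System.out.print(generate(
            "Stmt :: Block | IfStmt | AssignStmt",
            "IfStmt :: \"if\" Exp \"then\" Stmt",
            "AssignStmt :: IdUse \":=\" Exp"));
    }
}
```

Rules that are referenced but not defined (here `Block` and `Exp`) produce no class in this sketch; a real generator would have to report or resolve them.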

Grammar vs. AST (II) Parse tree vs. IR tree

Grammar vs. AST (II)
• In an IDE, there are multiple visitors for the same source code (>12!).
• Different requirements for the tree structure:
  • Syntax vs. semantics
  • Immutable vs. transformable (optimization)
  • Parse tree vs. IR tree

Grammar vs. AST (II)
• Generate two tree structures from the same grammar!
• One immutable, strongly typed, concrete parse tree – Read only!
• One transformable, untyped, abstract IR tree – Read and write!

Grammar vs. AST (II)
IfStmt :: “if” Exp “then” Stmt
class ASTNode {
    // IR tree view: untyped, mutable child list
    protected ASTNode[] children;
}
class IfStmt extends ASTNode {
    // Parse tree view: typed, final fields (read only)
    protected final Token token_if;
    protected final Exp exp;
    protected final Token token_then;
    protected final Stmt stmt;
    IfStmt(Token token_if, Exp exp, Token token_then, Stmt stmt) {
        // parse tree construction
        this.token_if = token_if;
        this.exp = exp;
        this.token_then = token_then;
        this.stmt = stmt;
        // IR tree construction (tokens are dropped; assumes Exp and Stmt extend ASTNode)
        children = new ASTNode[] { exp, stmt };
    }
}
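To illustrate how the two views behave once the tree is transformed, here is a small self-contained sketch. The `Token`, `Exp`, and `Stmt` stubs, the accessors `getChild`/`setChild`, and `TwoViewsDemo` are hypothetical; the slides do not define them.

```java
// Hypothetical stubs; the slide does not define Token, Exp, or Stmt.
class Token {
    final String text;
    Token(String text) { this.text = text; }
}

class ASTNode {
    protected ASTNode[] children = new ASTNode[0];
    ASTNode getChild(int i) { return children[i]; }
    void setChild(int i, ASTNode node) { children[i] = node; } // IR view is writable
}

class Exp extends ASTNode {
    final String text;
    Exp(String text) { this.text = text; }
}

class Stmt extends ASTNode { }

class IfStmt extends Stmt {
    // Parse tree view: typed and final, so it can only be read.
    final Token tokenIf;
    final Exp exp;
    final Token tokenThen;
    final Stmt stmt;

    IfStmt(Token tokenIf, Exp exp, Token tokenThen, Stmt stmt) {
        this.tokenIf = tokenIf;
        this.exp = exp;
        this.tokenThen = tokenThen;
        this.stmt = stmt;
        // IR tree view: untyped children, keyword tokens dropped.
        children = new ASTNode[] { exp, stmt };
    }
}

class TwoViewsDemo {
    public static void main(String[] args) {
        IfStmt ifStmt = new IfStmt(
            new Token("if"), new Exp("1 < 2"), new Token("then"), new Stmt());
        // An optimizer rewrites the condition in the IR view only.
        ifStmt.setChild(0, new Exp("true"));
        System.out.println(ifStmt.exp.text);                   // parse tree view
        System.out.println(((Exp) ifStmt.getChild(0)).text);   // IR view
    }
}
```

After the rewrite the two views no longer describe the same tree, which is the intended trade-off in this sketch: tools that need the original source read the typed fields, while optimizations work on `children`.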

Component(?)-based grammar

Component vs. module
• What is the difference between a component and a module?
• What is a modularized grammar?
• What is an ideal component-based grammar?

Component vs. module
[Diagram. Modularized grammar: Grammar Module + Grammar Module → Grammar → Parser. Component-based grammar: Grammar Component → Parser, Grammar Component → Parser.]

Benefits
• Benefits of a modularized grammar
  • Easy to read, write, and change
  • Eliminates naming conflicts
• Additional benefits of a component-based grammar
  • Each component can be designed, developed, and tested individually.
  • A change to one component does not require recompiling the other components.
  • Different types of grammars/parsing algorithms can be used for different components, e.g., one component can be LL and another LALR.

Difficulty in designing component-based grammar
• No clear guards between two components.
  • Switch control to a new parser or stay in the same one?
  • Suitable for embedded languages, e.g., JScript in HTML
  • Not suitable for an integral language, e.g., Java
• Too much coupling between two components.
  • Components are not just reused as a whole; their internal productions and symbols may also be reused.
  • Not applicable to LR parsers: once the table is built, the internal productions cannot be reused (there is no way to jump into a table).
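Where the guards are explicit, as with a script embedded in HTML, switching control between component parsers is straightforward. The sketch below shows only that easy case. `ComponentParser`, `GuardedDispatcher`, and the guard strings are hypothetical, and each "parser" simply returns a string so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical component interface: each component owns its own parser.
interface ComponentParser {
    String parse(String source);
}

// Hands text between an opening and closing guard to the embedded
// component, and everything else to the host component.
final class GuardedDispatcher {
    private final String open;
    private final String close;
    private final ComponentParser host;
    private final ComponentParser embedded;

    GuardedDispatcher(String open, String close,
                      ComponentParser host, ComponentParser embedded) {
        this.open = open;
        this.close = close;
        this.host = host;
        this.embedded = embedded;
    }

    List<String> parse(String input) {
        List<String> results = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int start = input.indexOf(open, pos);
            if (start < 0) {                       // no more guards: host takes the rest
                results.add(host.parse(input.substring(pos)));
                break;
            }
            if (start > pos) {
                results.add(host.parse(input.substring(pos, start)));
            }
            int bodyStart = start + open.length();
            int end = input.indexOf(close, bodyStart);
            if (end < 0) {
                throw new IllegalArgumentException("unclosed guard: " + open);
            }
            results.add(embedded.parse(input.substring(bodyStart, end)));
            pos = end + close.length();            // control returns to the host
        }
        return results;
    }

    public static void main(String[] args) {
        GuardedDispatcher dispatcher = new GuardedDispatcher(
            "<script>", "</script>",
            s -> "html(" + s + ")",
            s -> "js(" + s + ")");
        System.out.println(dispatcher.parse("<p>hi</p><script>x=1;</script><p>bye</p>"));
    }
}
```

For an integral language such as Java there is no pair of guard strings to search for, which is why this approach does not carry over.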

Ideal vs. reality

Suggestions?

Aspect-oriented grammar

Aspect-oriented grammar
• Join-point: grammar patterns that crosscut multiple productions
  • Punctuation, identifiers, modifiers…

Example
• “;” appears 25 times in one of the Java grammars
• “.” appears 74 times in one of the Cobol grammars
• Every one of them should be carefully placed!

Another example
pointcut Content(): … …
before Content(): “(”;
after Content(): “)”;
Guarantee they match!

Grammar weaving
[Diagram: Base Grammar + Grammar Aspect → Result grammar → Parser]
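Weaving before/after advice into a base grammar can be sketched as a rewrite of each production's right-hand side. This is an illustration only: `GrammarWeaver` is a hypothetical name, the pointcut is modeled as a plain predicate over symbols because the slides leave its definition elided, and the example production `Id Args` is invented for the demo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical weaver: wraps every symbol selected by the pointcut
// with the "before" and "after" terminals.
final class GrammarWeaver {
    static List<String> weave(List<String> rightHandSide,
                              Predicate<String> pointcut,
                              String before,
                              String after) {
        List<String> woven = new ArrayList<>();
        for (String symbol : rightHandSide) {
            if (pointcut.test(symbol)) {
                // Both terminals are emitted together, so they always match.
                woven.add(before);
                woven.add(symbol);
                woven.add(after);
            } else {
                woven.add(symbol);
            }
        }
        return woven;
    }

    public static void main(String[] args) {
        // Invented base production: Call ::= Id Args
        List<String> base = Arrays.asList("Id", "Args");
        List<String> result = weave(base, s -> s.equals("Args"), "\"(\"", "\")\"");
        System.out.println("Call ::= " + String.join(" ", result));
    }
}
```

Because the opening and closing terminals come from one piece of advice, they cannot be placed inconsistently across productions.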

What do you think?
