Banana Algebra: Syntactic Language Extension via an Algebra of Languages and Transformations Jacob Andersen [ jacand@cs.au.dk ] Aarhus University Claus Brabrand [ brabrand@itu.dk ] IT University of Copenhagen
Abstract: We propose an algebra of languages and transformations as a means for extending languages syntactically. The algebra provides a layer of high-level abstractions built on top of languages (captured by CFGs) and transformations (captured by constructive catamorphisms). The algebra is self-contained in that any term of the algebra specifying a transformation can be reduced to a constant catamorphism before the transformation is run. Thus, the algebra comes "for free" without sacrificing the strong safety and efficiency properties of constructive catamorphisms. The entire algebra as presented in the paper is implemented as the Banana Algebra Tool, which may be used to syntactically extend languages in an incremental and modular fashion via algebraic composition of previously defined languages and transformations. We demonstrate and evaluate the tool via several kinds of extensions.
Outline • Introduction: "What is a Banana?" • Bananas for Language Transformation • Language Extension Pattern • Banana Algebra • Examples • Implementation • Related Work • Conclusion
What is a 'Banana'?
• Datatype "list":
    list = Num N | Cons N * list
• Banana "sum-of-list" (aka. "catamorphism"):
    [list -> N]
    [Num n] = n
    [Cons n l] = n + [l]
  In banana-bracket notation: (| λn.n, λ(n,l).n+l |)
• Separation of recursion and evaluation
• Implicit recursion on input structure
• Bottom-up re-combination of intermediate results
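The separation of recursion and evaluation can be sketched in Python. This is only an illustration and not part of the Banana Algebra Tool; the tuple encoding of lists and the names `cata` and `sum_of_list` are assumptions made for this example.

```python
# Lists are encoded as nested tuples: ("Num", n) or ("Cons", n, rest).
# This encoding is an assumption made for illustration.

def cata(on_num, on_cons):
    """Build a catamorphism over lists from one function per constructor."""
    def go(lst):
        if lst[0] == "Num":
            return on_num(lst[1])
        _, n, rest = lst
        # The recursion lives here, once; on_cons only sees the result for `rest`.
        return on_cons(n, go(rest))
    return go

# The "sum-of-list" banana from the slide: one case per constructor.
sum_of_list = cata(lambda n: n, lambda n, acc: n + acc)

example = ("Cons", 1, ("Cons", 2, ("Num", 3)))
print(sum_of_list(example))  # prints 6
```

The two lambdas never call themselves, which is what makes termination evident from the shape of the definition.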
Language Transformation
• Bananas (statically typed): (| LS -> LT [τ] c |)
  • Source language: 'LS'
  • Target language: 'LT'
  • Nonterminal typing: 'τ'
  • Reconstructors: 'c'
• Example:
    LS: list = Num N | Cons N * list
    LT: tree = Nil | Leaf N | Node N * tree * tree
    [list -> tree]
    [Num n] = Leaf n
    [Cons n l] = Node n (Nil) [l]
• Type-check'able!
Banana Properties
• Banana properties:
  • Simple (corresponds to "simple recursion")
  • Safe (syntactically safe + always terminates)
  • Efficient (linear time in size of input + output)
  • (Expressive) (…enough for interesting extensions)
• Statically reduce: Banana Algebra (term) → Banana (term)
• Banana Algebra "for free" (16 banana ops):
  • Modular
  • Incremental
  • Simple
  • Safe
  • Efficient
  • (Expressive)
• "Growing Languages with Metamorphic Syntax Macros" [ Claus Brabrand | Michael Schwartzbach ] ( PEPM 2002 )
• "The metafront System: Safe and Extensible Parsing and Transformation" [ Claus Brabrand | Michael Schwartzbach ] ( LDTA 2003, SCP J. 2007 )
Language Extension Pattern
Numeral extension of the lambda calculus, using a very simple numeral encoding.
• Source language 'LS' (lambda calculus with numerals):
    Exp : var Id
        : lam Id * Exp
        : app Exp * Exp
        : zero
        : succ Exp
        : pred Exp
• Target language 'LT' (lambda calculus):
    Exp : var Id
        : lam Id * Exp
        : app Exp * Exp
• Nonterminal typing 'τ':
    [Exp -> Exp]
• Reconstructors 'c':
    [var V] = var V
    [lam V E] = lam V [E]
    [app E1 E2] = app [E1] [E2]
    [zero] = lam z (var z)
    [succ E] = lam s [E]
    [pred E] = app [E] (lam z (var z))
• Catamorphism: (| LS -> LT [τ] c |)
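The numeral extension above can be mimicked in a few lines of Python. This is a hedged sketch of the pattern, not the tool's implementation: terms are encoded as tuples, and the names `RECONSTRUCTORS` and `transform` are hypothetical.

```python
# Terms are tuples headed by a production name, e.g. ("succ", ("zero",)).
# Identifiers are plain strings. This encoding is an assumption.

ID_FUN = ("lam", "z", ("var", "z"))  # lam z (var z), the encoding of zero

# One reconstructor per source production. Each receives the already
# transformed subterms and returns a term of the target language.
RECONSTRUCTORS = {
    "var":  lambda v: ("var", v),
    "lam":  lambda v, e: ("lam", v, e),
    "app":  lambda e1, e2: ("app", e1, e2),
    "zero": lambda: ID_FUN,
    "succ": lambda e: ("lam", "s", e),
    "pred": lambda e: ("app", e, ID_FUN),
}

def transform(term):
    """Run the catamorphism: recurse first, then reconstruct bottom-up."""
    if isinstance(term, str):  # identifiers are copied unchanged
        return term
    head, *subterms = term
    return RECONSTRUCTORS[head](*[transform(t) for t in subterms])

print(transform(("pred", ("succ", ("zero",)))))
```

Because every source production has exactly one reconstructor and the recursion follows the input structure, the transformation visits each input node once.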
Algebraic Solution
• Languages:
    l  = { Exp : var Id : lam Id * Exp : app Exp * Exp }
    ln = { Exp : zero : succ Exp : pred Exp }
• Transformation (ln + l -> l), built as a sum:
    idx(l)                        (identity, l -> l)
    +
    (| ln -> l [Exp -> Exp]       (numerals, ln -> l)
       [zero] = lam z (var z)
       [succ E] = lam s [E]
       [pred E] = app [E] ...
    |)
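Banana Algebra
• Languages (L):
    L ::= l                   constant language: { CFG }
        | v                   language variable
        | L \ L               restriction
        | L + L               addition
        | src(X)              source language of a transformation
        | tgt(X)              target language of a transformation
        | let v = L in L
        | letx w = X in L
• Transformations (X):
    X ::= x                   constant transformation: (| L -> L [τ] c |)
        | w                   transformation variable
        | X \ L               restriction
        | X + X               addition
        | X o X               composition
        | idx(L)              identity transformation
        | let v = L in X
        | letx w = X in X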
Algebraic Laws
• Idempotency of '+':     L ≡ L + L
• Commutativity of '+':   L1 + L2 ≡ L2 + L1
• Associativity of '+':   L1 + (L2 + L3) ≡ (L1 + L2) + L3
• Source-identity:        L ≡ src(idx(L))
• Target-identity:        L ≡ tgt(idx(L))
• …
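These laws can be checked on a toy model. The sketch below assumes a constant language is just a dictionary from production names to right-hand sides and that an identity transformation is reduced to its source and target; the helper names `add`, `idx`, `src`, and `tgt` mirror the algebra but are written for this example only. The right-hand sides for the Boolean productions are illustrative.

```python
LAMBDA = {
    "Exp.var": ("Id",),
    "Exp.lam": ("\\", "Id", ".", "Exp"),
    "Exp.app": ("(", "Exp", "Exp", ")"),
}
NUMERALS = {
    "Exp.zero": ("zero",),
    "Exp.succ": ("succ", "(", "Exp", ")"),
    "Exp.pred": ("pred", "(", "Exp", ")"),
}
BOOLEANS = {  # right-hand sides here are made up for the example
    "Exp.true": ("true",),
    "Exp.false": ("false",),
    "Exp.if": ("if", "Exp", "Exp", "Exp"),
}

def add(l1, l2):
    """L1 + L2: union of productions; shared names must agree."""
    for name in l1.keys() & l2.keys():
        if l1[name] != l2[name]:
            raise ValueError(f"incompatible definitions for {name}")
    return {**l1, **l2}

def idx(l):
    """Identity transformation on l, modeled only by its source and target."""
    return {"src": l, "tgt": l}

def src(x):
    return x["src"]

def tgt(x):
    return x["tgt"]

assert add(LAMBDA, LAMBDA) == LAMBDA                                # idempotency
assert add(LAMBDA, NUMERALS) == add(NUMERALS, LAMBDA)               # commutativity
assert add(LAMBDA, add(NUMERALS, BOOLEANS)) == add(add(LAMBDA, NUMERALS), BOOLEANS)  # associativity
assert src(idx(LAMBDA)) == LAMBDA and tgt(idx(LAMBDA)) == LAMBDA    # source/target identity
print("all laws hold on these examples")
```

Checking a few instances is of course not a proof; it only shows what the laws say in a concrete setting.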
Example Revisited

--- "l.l" ---
{ Id = [a-z] [a-z0-9]* ;
  Exp.var : Id ;
  Exp.lam : "\\" Id "." Exp ;
  Exp.app : "(" Exp Exp ")" ; }

--- "ln.l" ---
{ Exp.zero : "zero" ;
  Exp.succ : "succ" "(" Exp ")" ;
  Exp.pred : "pred" "(" Exp ")" ; }

--- "ln2l.x" ---
let l = "l.l" in
let ln = "ln.l" in
  idx(l) +
  (| ln -> l [Exp -> Exp]
     Exp.zero = '\z.z' ;
     Exp.succ = '\s.$1' ;
     Exp.pred = '($1 \z.z)' ;
  |)
Numerals + Booleans
• …with Nums:          (ln + l -> l) = idx(l) + (ln -> l)
• …with Bools:         (lb + l -> l) = idx(l) + (lb -> l)
• …with Nums & Bools?  (l + ln + lb -> l) = (ln + l -> l) + (lb + l -> l)
Java + Repeat

--- "java.l" --- (575 lines)
{ Java ...
  "try" Stm "catch" ...
  Name.id : Id ; }

--- "repeat.l" ---
{ Stm.repeat : "repeat" Stm "until" "(" Exp ")" ";" ; }

--- "repeat2java.x" ---
let java = "java.l" in
let repeat = "repeat.l" in
  idx(java) +
  (| repeat -> java [Exp -> Exp, Stm -> Stm]
     Stm.repeat = 'do $1 while (!($2));' ;
  |)

7 lines!
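The reconstructor for `Stm.repeat` is a template with numbered holes. As a rough illustration of how such a template is filled, the Python sketch below substitutes strings for `$1` and `$2`. This is a simplification made for the example: the tool works on parsed syntax trees rather than raw strings, and the function name `instantiate` is hypothetical.

```python
import re

REPEAT_TEMPLATE = "do $1 while (!($2));"

def instantiate(template, args):
    """Replace each $n in the template by the n-th (1-based) argument."""
    return re.sub(r"\$(\d+)", lambda m: args[int(m.group(1)) - 1], template)

# repeat { x++; } until (x > 10);   with $1 = the body and $2 = the condition
print(instantiate(REPEAT_TEMPLATE, ["{ x++; }", "x > 10"]))
# prints: do { x++; } while (!(x > 10));
```

The extra parentheses around `$2` in the template keep the negation applied to the whole condition, whatever expression is plugged in.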
Concrete vs. Abstract Syntax
• Concrete syntax:
    Stm.repeat = 'do $1 while (!($2));' ;
• Abstract syntax:
    Stm.repeat = Stm.do(<1>, Exp.exp1(Exp1.exp2(Exp2.exp3(Exp3.exp4(Exp4.exp5(Exp5.exp6(Exp6.exp7(Exp7.neg(Exp8.par(<2>)))))))))) ;
• Exp (with explicit assoc./prec.):
    Exp.or    : Exp1 "||" Exp ;
       .exp1  : Exp1 ;
    Exp1.and  : Exp2 "&&" Exp1 ;
        .exp2 : Exp2 ;
    Exp2.add  : Exp3 "+" Exp2 ;
        .exp3 : Exp3 ;
    Exp7.neg  : "!" Exp8 ;
        .exp8 : Exp8 ;
    Exp8.par  : "(" Exp ")" ;
        .var  : Id ;
        .num  : IntConst ;
• (unambiguous: concrete ↔ abstract)
• NB: The tool supports BOTH!
"FUN" Example
• The "FUN" language: used for teaching functional programming (at Aarhus University)
• Basically the lambda calculus with: numerals, booleans, arithmetic, boolean logic, local definitions, pairs, literals, lists, signs, comparisons, dynamic types, fixed-point combinators, …
• Fun grammar transform (diagram):
  • Literals: Literals→Nums
  • Unsigned arithmetic + booleans + definitions + pairs: Nums→λ + Bools→λ + Defs→λ + Pairs→λ
  • Lambda Calculus
"FUN" Example: Component re-use
• Fun + FunSigned
• Fun grammar transform + FunSigned GT (diagram):
  • Literals: Literals→Nums, Signed arith→Nums
  • Unsigned arithmetic + booleans + definitions + pairs: Nums→λ + Bools→λ + Defs→λ + Pairs→λ
  • Lambda Calculus
"FUN" Example
• Fun + FunSigned + FunCompare + FunTypesafe
• Fun GT + FunSigned GT + FunCompare GT + FunTypesafe GT
• 245x Banana Algebra ops → 4 MB Banana!
• Shared layer (diagram):
  • Unsigned arithmetic + booleans + definitions + pairs: Nums→λ + Bools→λ + Defs→λ + Pairs→λ
  • Lambda Calculus
"FUN" Usage Statistics
• Usage statistics (245x operators) in "FUN":
    58x  { …cfg… }            Constant languages
    51x  "file.l"             Language inclusions
    28x  L + L                Language additions
    23x  v                    Language variables
    17x  (| L -> L [τ] c |)   Constant transformations
    17x  X + X                Transformation additions
    14x  "file.x"             Transformation inclusions
    10x  let-in               Local definitions
     9x  idx(L)               Identity transformations
     8x  X o X                Compositions
     4x  L \ L                Language restriction
     4x  w                    Transformation variables
     2x  src(X)               Source extractions
Other Examples
• Self-application (the tool on itself!):
    [L1 << L2] = '(L1 \ L2) + L2'
    [X1 << X2] = '(X1 \ src(X2)) + X2'
• SQL embedding (in <bigwig>):
    Stm.select = 'factor (<2>) { if (<3>) return ( # \+ (<1>) ); }'
• My-Java (endless variations):
    java ( + sql) ( \ loops) o syntaxe_francais
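The derived operator `L1 << L2` reads as "L2 overrides L1". A small Python sketch shows why the restriction is needed before the addition. It assumes that a language is a dictionary of named productions, that `L1 \ L2` drops the productions of L1 whose names also occur in L2, and that `+` rejects conflicting definitions; these modeling choices and the function names are assumptions for illustration, not the tool's definitions.

```python
def restrict(l1, l2):
    """L1 \\ L2 (assumed meaning): drop productions of l1 that l2 also names."""
    return {name: rhs for name, rhs in l1.items() if name not in l2}

def add(l1, l2):
    """L1 + L2: union of productions; shared names must agree."""
    for name in l1.keys() & l2.keys():
        if l1[name] != l2[name]:
            raise ValueError(f"incompatible definitions for {name}")
    return {**l1, **l2}

def override(l1, l2):
    """[L1 << L2] = (L1 \\ L2) + L2"""
    return add(restrict(l1, l2), l2)

# Two spellings of 'succ' that appear in the slides: with and without parentheses.
with_parens = {"Exp.zero": ("zero",), "Exp.succ": ("succ", "(", "Exp", ")")}
without_parens = {"Exp.succ": ("succ", "Exp")}

try:
    add(with_parens, without_parens)       # plain '+' refuses the clash
except ValueError as err:
    print("plain addition fails:", err)

print(override(with_parens, without_parens))
```

Under these assumptions the override keeps `Exp.zero` from the first language and takes `Exp.succ` from the second.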
Implementation
• The 'Banana Algebra' Tool (3,600 lines of O'Caml): [ http://www.itu.dk/people/brabrand/banana-algebra/ ]
• Uses (underlying technologies):
  • 'dk.brics.grammar': for parsing, unparsing, and ambiguity analysis!
  • 'XSugar': for transformation "concrete syntax ↔ abstract XML syntax"
  • 'XSLT': for transformation "XML → XML"
Related Work (I/III) • Macro Systems: "Growing Languages with Metamorphic Syntax Macros"[ Claus Brabrand | Michael Schwartzbach ] ( PEPM 2002 ) "The metafront System: Safe and Extensible Parsing and Transformation"[ Claus Brabrand | Michael Schwartzbach ] ( LDTA 2003 , SCP J. 2007 )
Related Work (II/III)
• Attribute Grammars:
  • Language transformation (and extension)…
  • …via computation on ASTs (using "inherited" or "synthesized" or … attributes)
  • E.g., Eli, JastAdd, Silver, …
• Rewrite Systems:
  • Language transformation (and extension)…
  • …via syntactic rewriting, using encodings…:
  • gradually rewrite "S-syntax" to "T-syntax" (S → T)
  • E.g., Elan, TXL, ASF+SDF, Stratego/XT, …
• Both, compared to bananas:
  • More ambitious (expressivity)
  • No termination guarantees (safety)
  • Transformation "indirect" (simplicity)
Related Work (III/III)
• Functional Programming:
  • Catas mimicked by "disciplined style" of functional programming
  • …aided by:
    • Traversal functions (auto-synthesized from datatypes)
    • Combinator libraries
    • "Shortcut fusion" (to eliminate ' ' at compile-time)
  • Basically ye olde issue: GPL vs. DSL
• Category Theory:
  • A lot of this work can be viewed as Category Theory
Conclusion
• IF bananas are sufficiently:
  • (Expressive) ("Niche")
• THEN you get…:
  • Banana Algebra "for free" (16 banana ops):
    • Incremental
    • Modular
    • Simple
    • Safe
    • Efficient
• Statically reduce: Banana Algebra (term) → Banana (term)
BONUS SLIDES - Reduction Semantics - If you want all the details: "Syntactic Language Extension via an Algebra of Languages and Transformations"[ Jacob Andersen | Claus Brabrand ] ( ITU Technical Report, Dec. 2008 )
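Reduction Semantics
• Environments:
    ENV_L = VAR_L → EXP_L     (environment of languages)
    ENV_X = VAR_X → EXP_X     (environment of transformations)
• Reduction relations:
    ⇒_L ⊆ ENV_L × ENV_X × EXP_L × EXP_L
    ⇒_X ⊆ ENV_L × ENV_X × EXP_X × EXP_X
• Abbreviations:
    α,β ⊢ L ⇒_L l     as a short-hand for: (α,β,L,l) ∈ ⇒_L
    α,β ⊢ X ⇒_X x     as a short-hand for: (α,β,X,x) ∈ ⇒_X
Semantics (L)
    [CON_L]   ⊢wfl l
              ------------------------------
              α,β ⊢ l ⇒_L l

    [VAR_L]   ------------------------------
              α,β ⊢ v ⇒_L α(v)

    [RES_L]   α,β ⊢ L ⇒_L l     α,β ⊢ L' ⇒_L l'
              ------------------------------
              α,β ⊢ L \ L' ⇒_L l \_l l'

    [ADD_L]   α,β ⊢ L ⇒_L l     α,β ⊢ L' ⇒_L l'     l ~_l l'
              ------------------------------
              α,β ⊢ L + L' ⇒_L l +_l l'
Semantics (L)
    [SRC_L]   α,β ⊢ X ⇒_X (| lS -> lT [τ] c |)
              ------------------------------
              α,β ⊢ src(X) ⇒_L lS

    [TGT_L]   α,β ⊢ X ⇒_X (| lS -> lT [τ] c |)
              ------------------------------
              α,β ⊢ tgt(X) ⇒_L lT

    [LET_L]   α,β ⊢ L ⇒_L l     α[v=l],β ⊢ L' ⇒_L l'
              ------------------------------
              α,β ⊢ let v = L in L' ⇒_L l'
Semantics (X)
    [CON_X]   α,β ⊢ LS ⇒_L lS     α,β ⊢ LT ⇒_L lT     ⊢wfx (| lS -> lT [τ] c |)
              ------------------------------
              α,β ⊢ (| LS -> LT [τ] c |) ⇒_X (| lS -> lT [τ] c |)

    [VAR_X]   ------------------------------
              α,β ⊢ w ⇒_X β(w)

    [RES_X]   α,β ⊢ X ⇒_X x     α,β ⊢ L ⇒_L l
              ------------------------------
              α,β ⊢ X \ L ⇒_X x \_x l

    [ADD_X]   α,β ⊢ X ⇒_X x     α,β ⊢ X' ⇒_X x'     x ~_x x'
              ------------------------------
              α,β ⊢ X + X' ⇒_X x +_x x'
Semantics (X)
    [COMP_X]  α,β ⊢ X ⇒_X (| lS -> lT [τ] c |)     α,β ⊢ X' ⇒_X (| lS' -> lT' [τ'] c' |)     lT ⊆_l lS'
              ------------------------------
              α,β ⊢ X' o X ⇒_X (| lS -> lT' [τ' o τ] c' o c |)

    [IDX_X]   α,β ⊢ L ⇒_L l
              ------------------------------
              α,β ⊢ idx(L) ⇒_X (| l -> l [id_τ(l)] id_c(l) |)

    [LET_X]   α,β ⊢ X ⇒_X x     α,β[w=x] ⊢ X' ⇒_X x'
              ------------------------------
              α,β ⊢ letx w = X in X' ⇒_X x'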
BONUS SLIDES - More Examples -
Numeral & Boolean Extension
• Numeral Extension (catamorphism):
    Source: Exp : var Id : lam Id * Exp : app Exp * Exp : zero : succ Exp : pred Exp
    Target: Exp : var Id : lam Id * Exp : app Exp * Exp
    [var V] = var [V]
    [lam V E] = lam [V] [E]
    [app E1 E2] = app [E1] [E2]
    [zero] = lam z (var z)
    [succ E] = lam s [E]
    [pred E] = app [E] (lam z (var z))
• Boolean Extension (catamorphism):
    Source: Exp : var Id : lam Id * Exp : app Exp * Exp : true : false : if Exp Exp Exp
    Target: Exp : var Id : lam Id * Exp : app Exp * Exp
    [var V] = var [V]
    [lam V E] = lam [V] [E]
    [app E1 E2] = app [E1] [E2]
    [true] = lam a (lam b (var a))
    [false] = lam a (lam b (var b))
    [if E1 E2 E3] = app (app [E1] [E2]) [E3]
Lambda with Booleans
• Languages:
    l  = { Exp : var Id : lam Id * Exp : app Exp * Exp }
    lb = { Exp : true : false : if Exp Exp Exp }
• Transformation (lb + l -> l), built as a sum:
    idx(l)                        (identity, l -> l)
    +
    (| lb -> l [Exp -> Exp]       (booleans, lb -> l)
       [true] = '\a.\b.a'
       [false] = '\a.\b.b'
       [if E1 E2 E3] = '(([E1] [E2]) [E3])'
    |)
Incremental Development

--- "l.l" ---
{ Id = [a-z] [a-z0-9]* ;
  Exp.var : Id ;
  Exp.lam : "\\" Id "." Exp ;
  Exp.app : "(" Exp Exp ")" ; }

--- "li.l" ---
{ Exp.id : "id" ; }

--- "li2l.x" ---
let l = "l.l" in
  idx(l) +
  (| "li.l" -> l [Exp -> Exp]
     Exp.id : '\z.z' ;
  |)

--- "ln.l" ---
{ Exp.zero : "zero" ;
  Exp.succ : "succ" Exp ;
  Exp.pred : "pred" Exp ; }

--- "ln2l.x" ---
let l = "l.l" in
  idx(l) +
  (| "ln.l" -> l [Exp -> Exp]
     Exp.zero : '\z.z' ;
     Exp.succ : '\x.$1' ;
     Exp.pred : '($1 \z.z)' ;
  |)

--- "ln2li.x" ---
let l = "l.l" in
  idx(l) +
  (| "ln.l" -> l + "li.l" [Exp -> Exp]
     Exp.zero : 'id' ;
     Exp.succ : '\x.$1' ;
     Exp.pred : '($1 id)' ;
  |)

--- "ln2l.x" ---
"li2l.x" o "ln2li.x"
Example cont'd
• Both statically reduce to the same catamorphism:
(| { Id = [a-z] [0-9a-z]* ;
     Exp.app : "(" Exp Exp ")" ;
     Exp.lam : "\" Id "." Exp ;
     Exp.pred : "pred" Exp ;
     Exp.succ : "succ" Exp ;
     Exp.var : Id ;
     Exp.zero : "zero" ; }
   ->
   { Id = [a-z] [0-9a-z]* ;
     Exp.app : "(" Exp Exp ")" ;
     Exp.lam : "\" Id "." Exp ;
     Exp.var : Id ; }
   [Exp -> Exp, Id -> Id]
   Exp.app : Exp.app($1, $2) ;
   Exp.lam : Exp.lam($1, $2) ;
   Exp.pred : Exp.app($1, Exp.lam(Id("z"), Exp.var(Id("z")))) ;
   Exp.succ : Exp.lam(Id("x"), $1) ;
   Exp.var : Exp.var($1) ;
   Exp.zero : Exp.lam(Id("z"), Exp.var(Id("z"))) ;
|)
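The claim that a composition reduces to a single catamorphism before any input is seen can be sketched in Python. The sketch assumes reconstructors are templates with numbered holes and composes two catamorphisms by pushing the outer one through the inner one's templates. The encoding and the names `hole`, `instantiate`, `run`, and `compose` are assumptions for this example, not the tool's internals.

```python
def hole(i):
    """The i-th (1-based) argument position in a template, like $i."""
    return ("$", i)

def instantiate(template, args):
    """Fill the holes of a template with the given subterms."""
    if isinstance(template, str):
        return template
    if template[0] == "$":
        return args[template[1] - 1]
    return (template[0],) + tuple(instantiate(t, args) for t in template[1:])

def run(cata, term):
    """Apply a catamorphism (dict: production name -> template) to a term."""
    if isinstance(term, str):
        return term
    head, *subterms = term
    return instantiate(cata[head], [run(cata, t) for t in subterms])

def compose(outer, inner):
    """Statically reduce 'outer o inner' to one catamorphism."""
    def push(template):
        if isinstance(template, str) or template[0] == "$":
            return template  # identifiers and holes pass through unchanged
        head, *subterms = template
        return instantiate(outer[head], [push(t) for t in subterms])
    return {head: push(template) for head, template in inner.items()}

IDENTITY = {
    "var": ("var", hole(1)),
    "lam": ("lam", hole(1), hole(2)),
    "app": ("app", hole(1), hole(2)),
}
ID_FUN = ("lam", "z", ("var", "z"))

li2l = {**IDENTITY, "id": ID_FUN}
ln2li = {**IDENTITY, "zero": ("id",),
         "succ": ("lam", "x", hole(1)), "pred": ("app", hole(1), ("id",))}
ln2l = {**IDENTITY, "zero": ID_FUN,
        "succ": ("lam", "x", hole(1)), "pred": ("app", hole(1), ID_FUN)}

# The composition and the direct definition are the same table of templates.
assert compose(li2l, ln2li) == ln2l
print(run(ln2l, ("pred", ("succ", ("zero",)))))
```

The composed table is computed once, so running it costs the same as running the hand-written `ln2l`.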
Usage Scenarios
• Programmers:
  • May extend existing languages (~ syntax macros)
• Developers:
  • May embed DSLs into host languages (SQL in Java)
• Developers (and teachers):
  • May incrementally specify multi-layered languages
• Compiler writers:
  • May rely on the tool and implement only a small core
  • (and then specify the rest externally as extensions)
BONUS SLIDES - Parsing & Error Reporting -
Parsing
• Parsing (XSugar):
  • Variant of Earley's algorithm: O(|w|^3) in the length of the input w
  • Can parse any context-free grammar
  • Closed under union of languages
  • Support for production priority
• Tool easily adapts to other parsing algorithms
Ambiguity: parsing vs. unparsing
• Parsing (L → AST_L / ~_L):
  • Grammar ambiguity
• Unparsing (AST_L / ~_L → L):
  • Canonical whitespace
Ambiguity Analysis
• Ambiguity Analysis:
  • Using the "dk.brics.grammar" implementation [ by Anders Møller ] on:
    • Source language;
    • Target language; and/or
    • …all intermediate languages (somewhat expensive)
  • (Note: Ambiguity analysis comes with the XSugar tool)
• "Analyzing Ambiguity of Context-Free Grammars" [ Claus Brabrand | Robert Giegerich | Anders Møller ] ( CIAA 2007 )
Error Reporting (Prototype)
• Static parse-error (O'Caml-lex):
    *** In ln2l.x (4,4)-(4,7): Parse error at "Exp"
• Static transformation error (XSugar):
  • (is actually a parse-error in a cata reconstructor)
    *** Parse error at character 6 (line 1, column 7) in /tmp/shape84e645.txt
  • Could be improved
• Dynamic parse-error (XSugar):
    *** Parse error at character 23 (line 1, column 24) in /dev/stdin
• Dynamic transformation error:
  • impossible :-)