Typed Compilation of Recursive Datatypes

Joseph C. Vanderwaart, Derek Dreyer, Leaf Petersen, Karl Crary, Robert Harper, and Perry Cheng
Carnegie Mellon University
TLDI 2003

SML Datatypes
• Elegant mechanism for defining recursive variant types, such as:
    datatype intlist = Nil | Cons of int * intlist
• Constructor applications and pattern matching must be implemented efficiently
• Subject of this talk:
  • How to implement SML datatypes efficiently in a type-preserving compiler

Formal Framework
• Harper and Stone’s type-theoretic interpretation of Standard ML (HS):
  • “Elaborates” SML programs into a type theory
• Reasons for using HS:
  • Models the first phase of a type-preserving compiler, in particular the TILT compiler (developed at CMU)
  • Can explain datatype semantics in terms of type theory

Overview
• Three interpretations of datatypes:
  • Harper-Stone interpretation
  • Transparent interpretation
  • Coercion interpretation
• Comparison on three axes:
  • Efficiency
  • Fidelity to the Definition of SML
  • Meta-theoretic complexity

The Harper-Stone Interpretation

Datatype Semantics
• SML datatypes are generative:
  • Identical datatype declarations in separate modules yield distinct (abstract) types
• HS elaborates datatypes as modules providing:
  • The datatype itself, defined as a recursive sum type
  • Functions to construct and destruct values of the datatype
• HS models generativity by “sealing” the datatype module with an abstract signature
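
Generativity can be observed directly in source-level SML. The following is a minimal sketch; the structure names A and B and the helper lenA are illustrative and do not come from the talk. Two structures declare textually identical datatypes, yet the type checker treats A.t and B.t as distinct types.

```sml
structure A = struct
  datatype t = Nil | Cons of int * t
end

structure B = struct
  datatype t = Nil | Cons of int * t
end

(* lenA accepts only values of type A.t. *)
fun lenA A.Nil = 0
  | lenA (A.Cons (_, rest)) = 1 + lenA rest

val two = lenA (A.Cons (1, A.Cons (2, A.Nil)))

(* Rejected by the type checker, because B.t is a different type from A.t:
   val bad = lenA (B.Cons (1, B.Nil))
*)
```

Uncommenting the last binding should produce a type error even though the two declarations are identical, which is the behavior the sealing in the HS interpretation is meant to model.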

ExpDec Example
    datatype exp = VarExp of var | LetExp of dec * exp
    and dec = ValDec of var * exp | SeqDec of dec * dec
• Intended reading of the constructors:
    VarExp(v) ≈ “v”
    LetExp(d,e) ≈ “let d in e”
    ValDec(v,e) ≈ “val v = e”
    SeqDec(d1,d2) ≈ “d1; d2”

ExpDec Implementation
    structure ExpDec :> EXPDEC = struct
      type exp = μ1(α,β).(var + β * α, var * α + β * β)
      type dec = μ2(α,β).(var + β * α, var * α + β * β)
      fun exp_in x = roll_exp(x)
      fun exp_out x = unroll_exp(x)
      fun dec_in x = roll_dec(x)
      fun dec_out x = unroll_dec(x)
    end

ExpDec Interface
    signature EXPDEC = sig
      type exp
      type dec
      val exp_in : var + (dec * exp) -> exp
      val exp_out : exp -> var + (dec * exp)
      val dec_in : (var * exp) + (dec * dec) -> dec
      val dec_out : dec -> (var * exp) + (dec * dec)
    end

Elaborating Constructor Calls
• The client of the datatype does the injection into the sum, then calls the datatype’s “in” function:
    VarExp(v) ⇝ ExpDec.exp_in(inj1(v))
    LetExp(d,e) ⇝ ExpDec.exp_in(inj2(d,e))
    ValDec(v,e) ⇝ ExpDec.dec_in(inj1(v,e))
    SeqDec(d1,d2) ⇝ ExpDec.dec_in(inj2(d1,d2))
• But a function call to an in function at every constructor application is too expensive.
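
The elaboration can be approximated in source-level SML. This is only a sketch under stated assumptions: SML has no anonymous μ types, so the hidden constructors RollExp and RollDec stand in for roll, the sum datatype with Inj1 and Inj2 stands in for binary sums, var is taken to be string, and the client names varX, decY, letYX and isVar are illustrative.

```sml
(* Binary sums; Inj1/Inj2 play the role of inj1/inj2 (names assumed). *)
datatype ('a, 'b) sum = Inj1 of 'a | Inj2 of 'b

type var = string  (* assumption: variables are represented as strings *)

signature EXPDEC = sig
  type exp
  type dec
  val exp_in  : (var, dec * exp) sum -> exp
  val exp_out : exp -> (var, dec * exp) sum
  val dec_in  : (var * exp, dec * dec) sum -> dec
  val dec_out : dec -> (var * exp, dec * dec) sum
end

(* Sealing with :> hides the representation, so clients see exp and dec
   as abstract types and can only go through the in/out functions. *)
structure ExpDec :> EXPDEC = struct
  datatype exp = RollExp of (var, dec * exp) sum
       and dec = RollDec of (var * exp, dec * dec) sum
  fun exp_in x = RollExp x
  fun exp_out (RollExp x) = x
  fun dec_in x = RollDec x
  fun dec_out (RollDec x) = x
end

(* Client code: inject into the sum, then call the "in" function. *)
val varX  = ExpDec.exp_in (Inj1 "x")              (* VarExp("x") *)
val decY  = ExpDec.dec_in (Inj1 ("y", varX))      (* ValDec("y", x) *)
val letYX = ExpDec.exp_in (Inj2 (decY, varX))     (* LetExp(val y = x, x) *)

(* Pattern matching goes through the "out" function. *)
fun isVar e =
  case ExpDec.exp_out e of
      Inj1 _ => true
    | Inj2 _ => false
```

Every constructor application in the client is a call to exp_in or dec_in, which is the cost the talk sets out to remove.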

Inlining the Constructor Calls
• We would like to inline the rolls to avoid calling the exp_in and dec_in functions:
    VarExp(v) ⇝ roll_{ExpDec.exp}(inj1(v))
    LetExp(d,e) ⇝ roll_{ExpDec.exp}(inj2(d,e))
    ValDec(v,e) ⇝ roll_{ExpDec.dec}(inj1(v,e))
    SeqDec(d1,d2) ⇝ roll_{ExpDec.dec}(inj2(d1,d2))
• But the definitions of exp and dec are not known outside of ExpDec, so inlining the rolls is ill-typed!

Separate Compilation
• Not a problem if the client of the datatype is defined in the same compilation unit:
  • Unseal the datatype ⇒ the rolls become well-typed
• Is a problem if the client of the datatype is defined in a separately compiled module:
  • Datatype is an abstract import of the client
  • Can’t assume knowledge of its implementation
• Similar problem for datatypes in functor arguments

A Transparent Interpretation

Making Datatypes Transparent
• Expose the implementation of a datatype as a recursive sum type in its interface:
    signature EXPDEC = sig
      type exp = μ1(α,β).(var + β * α, var * α + β * β)
      type dec = μ2(α,β).(var + β * α, var * α + β * β)
      (* in and out function specs as before *)
    end
• Inlining calls to the in and out functions is now well-typed outside of ExpDec

Implications of Transparency
• Datatypes are no longer generative
  • Identically defined datatypes are “visibly” equal
  • More types are equivalent, so more programs may typecheck
• Matching a datatype specification is harder
  • To match a datatype spec, a datatype must now be implemented as a particular recursive sum type
  • Depending on how you define recursive type equivalence, fewer programs may typecheck!

Transparent Matching Example
    struct
      datatype exp = VarExp of var | LetExp of dec * exp
      and dec = ValDec of var * exp | SeqDec of dec * dec
    end
    :>
    sig
      type exp
      datatype dec = ValDec of var * exp | SeqDec of dec * dec
    end
• Does the structure match the signature?

Transparent Matching Example
    struct
      type exp = μ1(α,β).(var + β * α, var * α + β * β)
      type dec = μ2(α,β).(var + β * α, var * α + β * β)
    end
    :>
    sig
      type exp
      datatype dec = ValDec of var * exp | SeqDec of dec * dec
    end
• Does the structure match the signature?

Transparent Matching Example
    struct
      type exp = μ1(α,β).(var + β * α, var * α + β * β)
      type dec = μ2(α,β).(var + β * α, var * α + β * β)
    end
    :>
    sig
      type exp
      type dec = μ1(β).(var * exp + β * β)
    end
• Does the structure match the signature? That is, are the two definitions of dec equivalent?

Notation
• Use δ to stand for a recursive type, i.e.:
    δ ::= μk(α1,...,αn).(τ1,...,τn)   (k ∈ 1..n)
• Expansion of a recursive type: expand(δ)
• For example, if intlist = μα. 1 + int * α, then expand(intlist) = 1 + int * intlist
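
As a worked instance of expansion on the running example, assume the standard definition in which expand substitutes, for each bound variable, the corresponding component of the recursive type (the substitution notation is added here for illustration):

```latex
\begin{align*}
\mathit{exp} &= \mu_1(\alpha,\beta).(\mathit{var} + \beta \times \alpha,\ \mathit{var} \times \alpha + \beta \times \beta) \\
\mathit{dec} &= \mu_2(\alpha,\beta).(\mathit{var} + \beta \times \alpha,\ \mathit{var} \times \alpha + \beta \times \beta) \\
\mathrm{expand}(\mathit{exp}) &= (\mathit{var} + \beta \times \alpha)[\mathit{exp}/\alpha,\ \mathit{dec}/\beta]
  = \mathit{var} + \mathit{dec} \times \mathit{exp} \\
\mathrm{expand}(\mathit{dec}) &= (\mathit{var} \times \alpha + \beta \times \beta)[\mathit{exp}/\alpha,\ \mathit{dec}/\beta]
  = \mathit{var} \times \mathit{exp} + \mathit{dec} \times \mathit{dec}
\end{align*}
```

These expansions are exactly the argument types of exp_in and dec_in in the EXPDEC signature.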

Iso-Recursive Types
• Iso-recursive equivalence is purely structural:
  • δ ≠ expand(δ), but the two are isomorphic
  • roll_δ : expand(δ) → δ
  • unroll_δ : δ → expand(δ)
• Works fine for HS with abstract datatypes, but…
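
The isomorphism between δ and expand(δ) can be mimicked in source-level SML for intlist. This is a sketch, not the internal language of the talk: the constructor Roll, the sum datatype, and the helpers empty, cons and total are names assumed for illustration.

```sml
(* Binary sums; Inj1/Inj2 are assumed names for the two injections. *)
datatype ('a, 'b) sum = Inj1 of 'a | Inj2 of 'b

(* intlist = mu a. 1 + int * a, with an explicit Roll constructor
   mediating between intlist and its expansion. *)
datatype intlist = Roll of (unit, int * intlist) sum

(* roll   : expand(intlist) -> intlist *)
fun roll x = Roll x

(* unroll : intlist -> expand(intlist) *)
fun unroll (Roll x) = x

val empty = roll (Inj1 ())
fun cons (h, t) = roll (Inj2 (h, t))

(* Consumers must unroll before they can inspect the sum. *)
fun total l =
  case unroll l of
      Inj1 () => 0
    | Inj2 (h, t) => h + total t
```

Here intlist and its expansion are different types, and every conversion between them is written out explicitly, which is the defining feature of the iso-recursive treatment.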

Transparent Matching Example
    struct
      type exp = μ1(α,β).(var + β * α, var * α + β * β)
      type dec = μ2(α,β).(var + β * α, var * α + β * β)
    end
    :>
    sig
      type exp
      type dec = μ1(β).(var * exp + β * β)
    end
• ✗ Under iso-recursive equivalence the two definitions of dec are not equal, so the matching fails

Equi-Recursive Types
• Another form of recursive type equivalence:
  • δ = expand(δ)
  • μα.τ(α) represents the unique solution of α = τ(α)
  • δ = μα.τ(α) iff δ = τ(δ)
• Equi-recursive equivalence is sufficient:
  • dec matches its specification
  • Enables the transparent interpretation to accept all valid SML datatype matchings
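
The claim that dec matches its specification can be checked by a short derivation using only the two equi-recursive facts stated on this slide. The subscripts impl and spec are labels introduced here to tell the two definitions of dec apart.

```latex
\begin{align*}
\mathit{dec}_{\mathrm{impl}} &= \mu_2(\alpha,\beta).(\mathit{var} + \beta \times \alpha,\ \mathit{var} \times \alpha + \beta \times \beta) \\
\mathit{dec}_{\mathrm{spec}} &= \mu_1(\beta).(\mathit{var} \times \mathit{exp} + \beta \times \beta) \\[4pt]
\mathit{dec}_{\mathrm{impl}} &= \mathrm{expand}(\mathit{dec}_{\mathrm{impl}})
  = \mathit{var} \times \mathit{exp} + \mathit{dec}_{\mathrm{impl}} \times \mathit{dec}_{\mathrm{impl}} \\
&\Longrightarrow\ \mathit{dec}_{\mathrm{impl}} \text{ solves } \beta = \mathit{var} \times \mathit{exp} + \beta \times \beta \\
&\Longrightarrow\ \mathit{dec}_{\mathrm{impl}} = \mu_1(\beta).(\mathit{var} \times \mathit{exp} + \beta \times \beta) = \mathit{dec}_{\mathrm{spec}}
\end{align*}
```

The first step uses δ = expand(δ); the last uses the uniqueness of the solution of β = τ(β). Neither step is available under iso-recursive equivalence, which is why the same matching fails there.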

A Hybrid Equivalence
• Equi-recursive equivalence is overkill:
  • Unnecessary to equate a recursive type with a non-recursive type (its expansion)
• Hybrid of iso- and equi-recursive equivalence:
  • Based on the FLINT intermediate language [League and Shao]
  • Restriction of the Amadio-Cardelli algorithm
  • Only equates δ’s with δ’s
• The paper gives details of the hybrid algorithm, along with a formal argument that it is sufficient

Complications
• Strong versions of type equivalence are not well studied outside the simply typed λ-calculus. (TILT ILs have higher-order constructors, singleton kinds…)
• Conflicts with SML semantics:
  • Datatypes are no longer generative.
  • Problems involving datatypes in sharing and where type constraints.
• To implement SML, these issues must be handled another way.

The Coercion Interpretation

Those in and out Functions
• Recall the definitions given during elaboration:
    fun in(x) = roll(x)
    fun out(x) = unroll(x)
• Consider the roll and unroll operations.
  • Commonly implemented as “no-ops”. That is, the values v and roll(v) have the same representation.
  • So roll and unroll are just “retyping” operators, or coercions.
• The untyped machine code for in/out is the same as for the identity function.

ExpDec Revisited
• At run time, exp_in and exp_out act as the identity, but:
  • This cannot be recognized from their -> types
• New type constructor: τ1 ⇒ τ2
  • Inhabited only by coercive terms
  • Coerciveness of exp_in, exp_out is reflected in the type
  • Applications can be ignored at run time
• The interface, with -> replaced by ⇒:
    signature EXPDEC = sig
      type exp
      type dec
      val exp_in : var + (dec * exp) ⇒ exp
      val exp_out : exp ⇒ var + (dec * exp)
      val dec_in : (var * exp) + (dec * dec) ⇒ dec
      val dec_out : dec ⇒ (var * exp) + (dec * dec)
    end

Coercions
• New constructs for the internal language:
  • Coercion values fold/unfold replace roll_δ/unroll_δ
  • Special type τ1 ⇒ τ2 distinguishes them from functions.
  • Special application syntax: v @ e
• Define in/out using coercions:
    val in : expand(δ) ⇒ δ = fold
    val out : δ ⇒ expand(δ) = unfold
• Define constructor applications using coercion applications:
    VarExp(x) ⇝ ExpDec.exp_in @ (inj1(x))

Coercion Erasure
• Why are coercion applications better than function applications? Because:
  • A closed value of coercion type can only be fold or unfold.
  • No work is required at run time to apply either fold or unfold.
  • To compile v @ e, generate the same code as for e.
• Safety argument (in the paper)
  • Formalized via a translation into an untyped target calculus.
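
The erasure rule for v @ e can be sketched as a function over a toy syntax tree. Everything in this block is hypothetical: the term and uterm datatypes are a miniature invented for illustration, not the internal language or target calculus of the paper, and mapping a standalone coercion value to a UDummy placeholder is an assumption of the sketch.

```sml
(* Hypothetical miniature of a typed internal language with coercions. *)
datatype term =
    Var of string
  | Inj of int * term          (* inj_i(e) *)
  | Pair of term * term
  | Fold                       (* coercion value *)
  | Unfold                     (* coercion value *)
  | CoApp of term * term       (* coercion application v @ e *)
  | App of term * term         (* ordinary function application *)

(* Hypothetical untyped target. *)
datatype uterm =
    UVar of string
  | UInj of int * uterm
  | UPair of uterm * uterm
  | UApp of uterm * uterm
  | UDummy                     (* assumed placeholder for a bare coercion value *)

fun erase (Var x) = UVar x
  | erase (Inj (i, e)) = UInj (i, erase e)
  | erase (Pair (a, b)) = UPair (erase a, erase b)
  | erase Fold = UDummy
  | erase Unfold = UDummy
  | erase (CoApp (_, e)) = erase e          (* v @ e compiles like e *)
  | erase (App (f, a)) = UApp (erase f, erase a)

(* VarExp(x) elaborated as ExpDec.exp_in @ inj1(x), then erased. *)
val erased = erase (CoApp (Var "ExpDec.exp_in", Inj (1, Var "x")))
```

In this sketch the coercion application leaves no trace in the output, whereas an ordinary application of the same function would still produce a UApp node.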

Performance
• Run times of benchmarks under the three interpretations:
  • Harper-Stone ≈ 37% slower than the others
  • Coercion interpretation about the same as transparent.
• The coercion interpretation is faithful to SML semantics and requires only a simple extension to the type theory.

Conclusion
