Formalization of Generics for the .NET Common Language Runtime
Dachuan Yu (Yale University)
Andrew Kennedy, Don Syme (Microsoft Research Cambridge)
Introduction
• Upcoming revision of the Microsoft .NET platform includes support for parametric polymorphism (“generics”) in
  • Programming languages: C#, Visual Basic, Managed C++
  • Common Language Runtime (the “virtual machine”)
  • Visual Studio (Integrated Development Environment)
  • Libraries
• Previous work (PLDI’01) described implementation techniques used in the CLR
• Now we formalize the polymorphic intermediate language and aspects of the implementation
CLR: The big picture
[Figure: C#, Visual Basic and SML.NET programs pass through their respective compilers to IL. Common Language Runtime components shown: loader & JIT front-end, JIT IL, JIT code-gen (producing machine code), native interop (with native binaries), remoting, garbage collector, security, threads, exception handling.]
High-level design of generics
• Type parameterization for all declarations
  • classes, e.g. class Set<T>
  • interfaces, e.g. interface IComparable<T>
  • structs, e.g. struct HashBucket<K,D>
  • methods, e.g. static void Reverse<T>(T[] arr)
  • delegates (“first-class methods”), e.g. delegate void Action<T>(T arg)
Good design => Tricky Implementation
• Unrestricted instantiation
    List<string> ls = new List<string>();   // reference types
    List<double> ld = …                     // primitive types
    List<Pair<string,double>> lsd = …       // struct types
• Full support for run-time types
    if (x is Set<string>) { ... }           // type-test
    y = (List<T>) z;                        // checked cast
• Recursion in instantiations
    class List<T> : ICloneable<List<T>>     // finite
    class C<T> { C<C<T>> fld; }             // infinite
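The three requirements above can be observed directly from C#. The following is a minimal sketch, not taken from the slides: `Pair<K, V>` is a hypothetical struct standing in for the slide's `Pair`, and `IsListOf<T>` is a helper invented here to show a type test that mentions a method's own type parameter.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical struct, standing in for the Pair type named on the slide.
public struct Pair<K, V>
{
    public K First;
    public V Second;
}

public static class RuntimeTypesDemo
{
    // Type test against an instantiation that mentions the type parameter T.
    public static bool IsListOf<T>(object x) => x is List<T>;

    public static void Main()
    {
        object ls = new List<string>();                 // reference-type instantiation
        object ld = new List<double>();                 // primitive-type instantiation
        object lsd = new List<Pair<string, double>>();  // struct-type instantiation

        Console.WriteLine(IsListOf<string>(ls));  // True: exact instantiation
        Console.WriteLine(IsListOf<object>(ls));  // False: List<string> is not a List<object>
        Console.WriteLine(IsListOf<double>(ld));  // True

        try
        {
            var bad = (List<double>)lsd;          // checked cast, fails at run time
            Console.WriteLine(bad.Count);
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("cast rejected");
        }
    }
}
```

Because each instantiation keeps its exact type at run time, the test in `IsListOf<T>` needs to know what `T` is when it executes, which is the implementation problem the rest of the talk addresses.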
Why formalize?
• In previous work (POPL’01, Gordon & Syme) the aim was a type soundness proof for a subset of IL (Baby IL)
• Our aims are different:
  • Implementation techniques used in the CLR product are subtle and difficult to get right (=> bugs, perhaps security holes)
    • We’d like to validate those techniques
  • Current JIT- and pre-compilers are not type-preserving
    • Our formalization provides a basis for typed compiler intermediate languages for more capable and robust compilers
  • It’s also difficult to express and apply optimizations
    • Formalization makes this easier
• By-product is a generic variant of Baby IL
Formalization: the big picture
• BILG classes and methods
  • BILG = “Baby IL with Generics”
  • A tiny subset of MS-IL
• Translation steps from BILG to BILC:
  • Specialize generic classes and methods
  • Share instantiations w.r.t. data representation
  • Introduce types-as-values
  • Optimize use of types-as-values
• BILC classes and methods
  • BILC = “Baby IL with Constrained generics”
  • A typed intermediate language more suitable for code generation
Illustrative example, in C#
    class ArrayUtils {
      static List<T> ArrayToList<T>(T[] arr) { …new List<T>()… }
    }
    class List<T> {
      virtual List<T> Append(object obj) { …(List<T>) obj… …new ListCell<T>… }
    }
• ArrayToList: want to share generated code for ArrayToList over different instantiations of T
  • Pass type parameters at runtime?
  • Look up type representations at runtime?
• List: want to share generated code for List over different instantiations of T
  • Look up type representations at runtime?
  • How do we know what T is?
Source Language: BILG
• “Baby IL with Generics”
• Purely functional, à la Featherweight Java (Igarashi, Pierce, Wadler)
• Primitive types & generic classes
• Inheritance-based subtyping
• Generic methods (static and virtual)
• Type-case operation (isinst) inspects run-time type of object
• No overloading, no interfaces, no abstract methods, no structs (“value classes”), no delegates, no boxing, no null values, no heap, no bounded polymorphism
• Just enough to demonstrate most of the implementation techniques!
• Typing rules & big-step semantics in paper
  • Easier to work with big-step
  • $\neg\exists v.\ e \Downarrow v$ taken as definition of divergence
Source language: BILG
    (type)        T,U ::= X | int32 | int64 | I
    (inst type)   I   ::= C<T1,…,Tn>
    (class def)   cd  ::= class C<X1,…,Xn> : I { T1 f1; …; Tm fm; md1 … mdk }
    (method def)  md  ::= static T m<X1,…,Xn>(T1,…,Tm) { e; }
                        | virtual T m<X1,…,Xn>(T1,…,Tm) { e; }
    (method ref)  M   ::= I::m<T1,…,Tn>
    (expr)        e   ::= ldc.i4 i4 | ldc.i8 i8 | ldarg x
                        | e1 … en newobj I
                        | e ldfld I::f
                        | e1 … en call M
                        | e e1 … en callvirt M
                        | e isinst I or e
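BILG typing and evaluation for isinst
Typing:
$$\frac{E \vdash e : I' \qquad E \vdash e' : I}{E \vdash e\ \mathsf{isinst}\ I\ \mathsf{or}\ e' : I}$$
Evaluation, when the test succeeds:
$$\frac{\mathit{fr} \vdash e \Downarrow I'(f_1{=}v_1,\ldots,f_n{=}v_n) \qquad \vdash I' <: I}{\mathit{fr} \vdash e\ \mathsf{isinst}\ I\ \mathsf{or}\ e' \Downarrow I'(f_1{=}v_1,\ldots,f_n{=}v_n)}$$
Evaluation, when the test fails:
$$\frac{\mathit{fr} \vdash e \Downarrow I'(f_1{=}v_1,\ldots,f_n{=}v_n) \qquad \vdash \neg(I' <: I) \qquad \mathit{fr} \vdash e' \Downarrow v'}{\mathit{fr} \vdash e\ \mathsf{isinst}\ I\ \mathsf{or}\ e' \Downarrow v'}$$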
BILG typing and evaluation for isinst: observations
• Types affect evaluation
• They cannot be erased
• They serve static and dynamic purposes
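The behaviour of `e isinst I or e'` can be approximated in C# with a pattern match. This is only a sketch under stated assumptions: `IsInstOr` is a helper invented here, the alternative is passed as a `Func<I>` so that it is evaluated only when the test fails (as in the big-step rules), and C# objects stand in for BILG values.

```csharp
using System;
using System.Collections.Generic;

public static class IsInstDemo
{
    // Models "e isinst I or e'": yield e if its run-time type is a subtype of I,
    // otherwise evaluate and yield the alternative e'.
    public static I IsInstOr<I>(object e, Func<I> alternative) where I : class
        => e is I matched ? matched : alternative();

    public static void Main()
    {
        var strings = new List<string> { "a" };
        var fallback = new List<string>();

        // Run-time type List<string> <: List<string>: the success rule applies.
        Console.WriteLine(ReferenceEquals(
            IsInstOr<List<string>>(strings, () => fallback), strings));        // True

        // Run-time type List<int> is not a subtype of List<string>: the failure rule applies.
        Console.WriteLine(ReferenceEquals(
            IsInstOr<List<string>>(new List<int>(), () => fallback), fallback)); // True
    }
}
```

The result depends on the type argument `I` supplied at the call, which is the sense in which BILG types cannot be erased.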
Target Language: BILC
• Similar to BILG, but adds
  • Representation constraints on type parameters
    • ref: “must be a reference type”
    • i4: “must be a 32-bit integer”
    • i8: “must be a 64-bit integer”
  • Types-as-values
    • R_T is a value representing closed type T
    • The value R_T has singleton type Rep(T), interpreted as “is a value representing the type T”
  • Construct reps for open types
    • mkrep_{C<T1,…,Tn>}(e1,…,en) creates a type-rep for C<T1,…,Tn> given type-reps for T1,…,Tn
• Semantics given by small-step reduction relation
Target language: BILC (subset)
    (type)           T,U ::= X | int32 | int64 | I
    (inst type)      I   ::= C<T1,…,Tn>
    (extended type)  τ   ::= T | Rep(T)
    (constraint)     s   ::= ref | i4 | i8
    (class def)      cd  ::= class C<X1:s1,…,Xn:sn> : I { T1 f1; …; Tm fm; md1 … mdk }
    (method def)     md  ::= static T m<X1:s1,…,Xn:sn>(τ1,…,τk) { e; }
                           | virtual T m<X1:s1,…,Xn:sn>(τ1,…,τk) { e; }
    (method ref)     M   ::= I::m<T1,…,Tn>
    (expr)           e   ::= i4 | i8 | x
                           | I(e,e1,…,en)
                           | e ldfld I::f
                           | e1 … en call M
                           | e e1 … en callvirt M
                           | e isinst_I e or e
                           | R_T
                           | mkrep_{C<T1,…,Tn>}(e1,…,en)
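Some typing and reduction rules
Typing of mkrep:
$$\frac{E \vdash C\langle T_1,\ldots,T_n\rangle\ \mathsf{ok} \qquad E \vdash e_1 : \mathsf{Rep}(T_1) \ \cdots\ E \vdash e_n : \mathsf{Rep}(T_n)}{E \vdash \mathsf{mkrep}_{C\langle T_1,\ldots,T_n\rangle}(e_1,\ldots,e_n) : \mathsf{Rep}(C\langle T_1,\ldots,T_n\rangle)}$$
Typing of isinst:
$$\frac{E \vdash e : I' \qquad E \vdash e' : \mathsf{Rep}(I) \qquad E \vdash e'' : I}{E \vdash e\ \mathsf{isinst}_I\ e'\ \mathsf{or}\ e'' : I}$$
“Reflected subtyping”: $R_I \prec R_{I'}$ iff $I <: I'$
Reduction of isinst, where $w$ is the type-rep stored in the object and $w'$ is the type-rep being tested against:
$$\frac{v = I(w,v_1,\ldots,v_n) \qquad w \prec w'}{\vdash (v\ \mathsf{isinst}_T\ w'\ \mathsf{or}\ v') \to v}$$
$$\frac{v = I(w,v_1,\ldots,v_n) \qquad w \not\prec w'}{\vdash (v\ \mathsf{isinst}_T\ w'\ \mathsf{or}\ v') \to v'}$$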
Some typing and reduction rules: observations
• Types do not affect evaluation
• They can be erased
• They serve only static purposes
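To make the BILC rules concrete, here is a small C# model of type-reps as ordinary values. It is a sketch under several assumptions that are not in the slides: `TypeRep`, `Obj`, `Mkrep`, `IsSubRepOf` and the class `ArrayList<T> : List<T>` are all hypothetical names, reps are cached by printed name, and reflected subtyping is implemented by walking a superclass chain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical run-time type representation (the slides' R_T).
public sealed class TypeRep
{
    private static readonly Dictionary<string, TypeRep> Cache = new Dictionary<string, TypeRep>();

    public string Key { get; }
    public TypeRep Parent { get; }   // rep of the superclass instantiation; null at the root

    private TypeRep(string key, TypeRep parent) { Key = key; Parent = parent; }

    // Models mkrep_{C<T1,…,Tn>}(e1,…,en): look up or create the rep for C<T1,…,Tn>
    // from the reps of its arguments. parentOf computes the superclass rep.
    public static TypeRep Mkrep(string className, TypeRep[] args, Func<TypeRep[], TypeRep> parentOf)
    {
        string key = args.Length == 0
            ? className
            : className + "<" + string.Join(",", args.Select(a => a.Key)) + ">";
        if (!Cache.TryGetValue(key, out TypeRep rep))
        {
            rep = new TypeRep(key, parentOf(args));
            Cache[key] = rep;
        }
        return rep;
    }

    // Reflected subtyping: this rep is below 'other' iff the represented types are subtypes.
    public bool IsSubRepOf(TypeRep other)
    {
        for (TypeRep r = this; r != null; r = r.Parent)
            if (ReferenceEquals(r, other)) return true;
        return false;
    }
}

// An object value I(w, v1,…,vn), keeping only its type-rep field w.
public sealed class Obj
{
    public TypeRep Rep { get; }
    public Obj(TypeRep rep) { Rep = rep; }
}

public static class BilcDemo
{
    // Models "v isinst w' or v'": decided by comparing rep values only.
    public static Obj IsInst(Obj v, TypeRep tested, Obj alternative)
        => v.Rep.IsSubRepOf(tested) ? v : alternative;

    private static readonly TypeRep[] None = new TypeRep[0];
    public static readonly TypeRep ObjectRep = TypeRep.Mkrep("object", None, _ => null);
    public static readonly TypeRep StringRep = TypeRep.Mkrep("string", None, _ => ObjectRep);

    public static TypeRep ListOf(TypeRep t) => TypeRep.Mkrep("List", new[] { t }, _ => ObjectRep);

    // Hypothetical subclass: class ArrayList<T> : List<T>.
    public static TypeRep ArrayListOf(TypeRep t) =>
        TypeRep.Mkrep("ArrayList", new[] { t }, a => ListOf(a[0]));

    public static void Main()
    {
        var v = new Obj(ArrayListOf(StringRep));
        var alt = new Obj(ListOf(StringRep));

        Console.WriteLine(ReferenceEquals(IsInst(v, ListOf(StringRep), alt), v));   // True
        Console.WriteLine(ReferenceEquals(IsInst(v, ListOf(ObjectRep), alt), alt)); // True
    }
}
```

`IsInst` never consults a static type: the outcome is determined by the rep stored in the object and the rep passed as an argument, which mirrors the observation that BILC types serve only static purposes.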
Example
• Static generic method in BILG:
    static List<T> Conv<T>(object a) { …a isinst List<T>… }
• Translated to BILC:
    static List_i Conv_i(object a) { …a isinst_{List_i} R_{List_i}… }
    static List_l Conv_l(object a) { …a isinst_{List_l} R_{List_l}… }
    static List_r<T> Conv_r<T:ref>(Rep(T) r, object a) { …a isinst_{List_r<T>} (mkrep_{List_r<T>}(r))… }
  • Conv_i: specialized code for T = int32
  • Conv_l: specialized code for T = int64
  • Conv_r: code shared for reference types
    • Extra parameter r representing T
    • Lookup/create type rep at runtime
We need more…
• So far:
  • specialization, sharing, and separation of run-time types from static types
  • but mkrep is a costly operation, requiring type-rep creation at runtime
• Idea: instead of passing representations for type parameters, pass representations of the types that we actually need:
    static List_r<T> Conv_r<T:ref>(Rep(List_r<T>) r, object a) { …a isinst_{List_r<T>} (r)… }
  • Extra parameter r representing List<T>
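The cost difference between the two translations of Conv can be illustrated by counting rep constructions. This is a deliberately simplified sketch: `Rep`, `Obj`, `MkrepList` and both `Conv…` methods are hypothetical, reps are modelled as plain names, and the type test is exact name equality rather than reflected subtyping.

```csharp
using System;

public static class RepPassingDemo
{
    // Minimal stand-in for a type-rep: just the printed name of the type.
    public sealed class Rep
    {
        public string Name { get; }
        public Rep(string name) { Name = name; }
    }

    // An object carrying the rep of its run-time type.
    public sealed class Obj
    {
        public Rep RuntimeRep { get; }
        public Obj(Rep runtimeRep) { RuntimeRep = runtimeRep; }
    }

    public static int MkrepCalls;   // counts run-time rep constructions

    // Stand-in for mkrep_{List<T>}(r).
    public static Rep MkrepList(Rep arg)
    {
        MkrepCalls++;
        return new Rep("List<" + arg.Name + ">");
    }

    // First translation: receives Rep(T) and builds Rep(List<T>) on every call.
    public static bool ConvWithParamRep(Rep repT, Obj a)
        => a.RuntimeRep.Name == MkrepList(repT).Name;

    // Refined translation: receives Rep(List<T>) directly.
    public static bool ConvWithNeededRep(Rep repListT, Obj a)
        => a.RuntimeRep.Name == repListT.Name;

    public static void Main()
    {
        var repString = new Rep("string");
        var list = new Obj(new Rep("List<string>"));

        for (int i = 0; i < 1000; i++) ConvWithParamRep(repString, list);
        Console.WriteLine(MkrepCalls);       // 1000: one construction per call

        MkrepCalls = 0;
        Rep needed = MkrepList(repString);   // built once by the caller
        for (int i = 0; i < 1000; i++) ConvWithNeededRep(needed, list);
        Console.WriteLine(MkrepCalls);       // 1
    }
}
```

Passing the rep that the body actually needs moves the construction out of the callee, so a caller that invokes the method repeatedly pays for it once.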
We need more…
• In general, we need many type-reps in a single method body
  • So we pass around dictionaries of type-reps
• What type does a dictionary of type-reps have?
  • At its simplest, it is just a tuple, e.g. Rep(List<X>) × Rep(Vec<Vec<X>>) is the type of a two-slot dictionary containing type-reps for List<X> and Vec<Vec<X>>
  • In general, dictionaries may contain cycles (e.g. for mutually recursive methods), so we need recursive values and their types
  • Worse still, polymorphic recursion requires “infinite” dictionaries
• Simpler: use name-based types for dictionaries
  • reps for methods: Rep(M), R_M, mkrep_M(e1,…,en)
  • statically: each Rep-type determines a particular tuple of other Rep-types
  • dynamically: each type-rep R_T or method-rep R_M determines a tuple of type-rep/method-rep values
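The name-based dictionaries described above can be sketched in C# with lazily filled slots. Everything here is an assumption made for illustration: the methods F, G and H, their slot layouts, and the helper names `Dict`, `Slot` and `MethodRep` do not come from the slides, and type-reps are modelled as strings. F and G are taken to be mutually recursive (a cyclic dictionary), and H<X> is taken to call H<Vec<X>> (polymorphic recursion).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DictionaryDemo
{
    // Hypothetical dictionary: a named tuple of slots, each computed on first use.
    public sealed class Dict
    {
        public string Name { get; }
        private readonly Lazy<object>[] slots;

        public Dict(string name, params Func<object>[] makers)
        {
            Name = name;
            slots = makers.Select(m => new Lazy<object>(m)).ToArray();
        }

        // Fetch slot i, in the spirit of mdict_i(e).
        public object Slot(int i) => slots[i].Value;
    }

    private static readonly Dictionary<string, Dict> Table = new Dictionary<string, Dict>();

    // Name-based lookup: the name of a method instantiation determines its dictionary.
    public static Dict MethodRep(string method, string typeArg)
    {
        string key = method + "<" + typeArg + ">";
        if (!Table.TryGetValue(key, out Dict d))
        {
            switch (method)
            {
                case "F":   // slots: Rep(List<X>), dictionary of G<X>
                    d = new Dict(key, () => "List<" + typeArg + ">", () => MethodRep("G", typeArg));
                    break;
                case "G":   // slots: Rep(Vec<Vec<X>>), dictionary of F<X>
                    d = new Dict(key, () => "Vec<Vec<" + typeArg + ">>", () => MethodRep("F", typeArg));
                    break;
                default:    // H<X>: one slot, the dictionary of H<Vec<X>>
                    d = new Dict(key, () => MethodRep("H", "Vec<" + typeArg + ">"));
                    break;
            }
            Table[key] = d;
        }
        return d;
    }

    public static void Main()
    {
        Dict f = MethodRep("F", "string");
        Dict g = (Dict)f.Slot(1);
        Console.WriteLine(f.Slot(0));                     // List<string>
        Console.WriteLine(ReferenceEquals(g.Slot(1), f)); // True: the dictionaries form a cycle

        Dict h = MethodRep("H", "int");
        Console.WriteLine(((Dict)((Dict)h.Slot(0)).Slot(0)).Name); // H<Vec<Vec<int>>>
    }
}
```

Because slots are filled on demand and dictionaries are found by name, the cyclic case needs no explicitly recursive value, and the "infinite" dictionary for H is only unrolled as far as the program actually goes.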
Target language: BILC (full)
    (type)         T,U ::= X | int32 | int64 | I
    (inst type)    I   ::= C<T1,…,Tn>
    (ext type)     τ   ::= T | Rep(T) | Rep(M)
    (constraint)   s   ::= ref | i4 | i8
    (class def)    cd  ::= class C<X1:s1,…,Xn:sn> : I { T1 f1; …; Tm fm; md1 … mdk } with τ1,…,τp
    (method def)   md  ::= static T m<X1:s1,…,Xn:sn>(τ1,…,τk) { e; } with τ1,…,τp
                         | virtual T m<X1:s1,…,Xn:sn>(τ1,…,τk) { e; }
    (method ref)   M   ::= I::m<T1,…,Tn>
    (expr)         e   ::= i4 | i8 | x
                         | I(e,e1,…,en)
                         | e ldfld I::f
                         | e1 … en call M
                         | e e1 … en callvirt M
                         | e isinst_I e or e
                         | R_T | R_M
                         | mkrep_{C<T1,…,Tn>}(e1,…,en)
                         | mkrep_{C<T1,…,Tn>::m<U1,…,Uk>}(e1,…,en,e'1,…,e'k)
                         | objdict_i e
                         | mdict_i e
Translation scheme
• Static generic methods:
  • Extra dictionary parameter associated with method
  • Accessed using mdict_i(e)
• Virtual methods in generic classes:
  • Obtain dictionary through type of object
  • Accessed using objdict_i(e)
• Generic virtual methods:
  • Dictionary type not known statically (body could be overridden)
  • So pass reps for type parameters and construct type-reps at runtime using mkrep
In the paper…
• Complete formalization of BILG, BILC, and a translation
• Theorems:
  • Translation preserves types
  • Translation preserves behaviour
• And in forthcoming technical report:
  • Full proofs
  • Type erasure theorem: types in BILC do not affect evaluation
Future work
• Extend BILG and the translation to cover more features
  • Value classes (structs)
    • Would satisfy representation constraint of form [s1,…,sn] where s1,…,sn are constraints on the fields’ representations
    • Now have unbounded number of specializations
    • All methods on generic structs whose code is shared take a dictionary parameter
    • Need treatment of boxing
• Flexible specialization policies
  • Less sharing: e.g. full specialization of selected types
  • More sharing: e.g. share all instantiations of C<T> by boxing and unboxing appropriately (cf. ML)
Future work: structural typing
• Flexible specialization interacts badly with run-time types based on name-equivalence
• Instead, describe dictionaries using structural typing:
  • Products: Rep(List<X>) × Rep(X) is a two-slot dictionary with type-reps for List<X> and X
  • Circular dictionaries => Recursive types, e.g. μD. Rep(Vec<X>) × (Rep(Set<X>) × D)
  • Polymorphic recursion in code => Higher-kinded recursive types, e.g. (μD. λX. Rep(Vec<X>) × D(Set<X>)) string
Related work
• Rep(T)
  • Crary, Weirich, Morrisett: “Intensional polymorphism in type-erasure semantics”
• Dictionary-passing for polymorphism implementation
  • Saha and Shao (ML)
  • Viroli and Natali (Java)