Innovation in the Real World: Making Generics Mainstream • Jim Miller, Architect, Common Language Runtime • Andrew Kennedy, Microsoft Research • Don Syme, Microsoft Research • A remark of Archimedes quoted by Pappus of Alexandria, c. AD 340
Agenda • Introduction • What are Generics? • Phase I: Research Prototype • Phase II: Joint Development • Phase III: New Feature! • Phase IV: What’s Missing? • Phase V: Moving the World
What This Talk Is About • Research • One New Feature: Generics • Product Development
In English . . . Instead of defining StackOfInt, StackOfString, etc., use
    class Stack<T> {
        void Push(T item) { … }
        T Pop() { … }
        T TopOfStack() { … }
    }
    static Stack<int> IntStack;
    static Stack<string> StringStack;
• Type safe (compile and design time support) • Shared code (better perf, easier maintenance)
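A filled-in version of the Stack<T> outline may make the type-safety point concrete. This is only a sketch: the list-backed storage and the empty-stack exception are assumptions, since the slide elides the method bodies.

```csharp
using System;

// Hypothetical completion of the slide's Stack<T>; the bodies are assumptions.
public class Stack<T>
{
    // Backing storage, fully qualified to avoid confusion with the library's own Stack<T>.
    private readonly System.Collections.Generic.List<T> items =
        new System.Collections.Generic.List<T>();

    public void Push(T item) { items.Add(item); }

    public T Pop()
    {
        T top = TopOfStack();
        items.RemoveAt(items.Count - 1);
        return top;
    }

    public T TopOfStack()
    {
        if (items.Count == 0) throw new InvalidOperationException("stack is empty");
        return items[items.Count - 1];
    }
}

public static class StackDemo
{
    public static void Main()
    {
        var intStack = new Stack<int>();
        intStack.Push(1);
        intStack.Push(2);
        Console.WriteLine(intStack.Pop());            // prints 2

        var stringStack = new Stack<string>();
        stringStack.Push("generics");
        Console.WriteLine(stringStack.TopOfStack());  // prints generics

        // stringStack.Push(42);  // rejected at compile time: int is not a string
    }
}
```

One class definition serves both instantiations, and the commented-out line shows the kind of mistake the compiler now catches.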
Polymorphic Programming Languages Standard ML Eiffel O’Caml C++ Ada Clu GJ Haskell Mercury Miranda Pizza
By 2005: Managed C++ C# Visual Basic Java Cobol, Fortran, …?
Design for multiple languages • C++: Give me template specialization • C++: Can I write class C<T> : T • C#: Just give me decent collection classes • C++: And template meta-programming • Java: Run-time types please • Visual Basic: Don’t confuse me! • Eiffel: All generic types covariant please • Haskell: Rank-n types? Existentials? Kinds? Type classes? • ML: Functors are cool! • Scheme: Why should I care? • COBOL: Change my call syntax!?!?
Simplicity => no odd restrictions
    interface IComparable<T> { int CompareTo(T other); }
    class Set<T> : IEnumerable<T> where T : IComparable<T> {
        private TreeNode<T> root;
        public static Set<T> empty = new Set<T>();
        public void Add(T x) { … }
        public bool HasMember(T x) { … }
    }
    Set<Set<int>> s = new Set<Set<int>>();
• Interfaces and superclass can be instantiated • Constraints can reference type parameter (“F-bounded polymorphism”) • Even statics can use type parameter • Type arguments can be value or reference types
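Two of these points, constraints that mention the type parameter and statics that depend on it, can be tried in a few lines. The sketch below uses the library's System.IComparable<T>; the names Temperature, Ordering and Registry are hypothetical and not from the talk.

```csharp
using System;

// A user-defined type that is comparable with itself.
public sealed class Temperature : IComparable<Temperature>
{
    public readonly double Degrees;
    public Temperature(double degrees) { Degrees = degrees; }
    public int CompareTo(Temperature other) { return Degrees.CompareTo(other.Degrees); }
}

public static class Ordering
{
    // F-bounded constraint: the bound on T mentions T itself.
    public static T Max<T>(T a, T b) where T : IComparable<T>
    {
        return a.CompareTo(b) >= 0 ? a : b;
    }
}

// A static that depends on the type parameter: each instantiation has its own copy.
public static class Registry<T>
{
    public static int Count;
}

public static class ConstraintDemo
{
    public static void Main()
    {
        Console.WriteLine(Ordering.Max(3, 7));    // prints 7 (value-type argument)

        Temperature warmer = Ordering.Max(new Temperature(18.5), new Temperature(21.0));
        Console.WriteLine(warmer.Degrees);        // prints 21 (reference-type argument)

        Registry<int>.Count = 1;
        Registry<string>.Count = 5;
        Console.WriteLine(Registry<int>.Count);   // prints 1: not affected by Registry<string>
    }
}
```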
Non-goals • C++ style template meta-programming: leave this to source-language compilers • Higher-order polymorphism, existentials: let’s get the basics right first!
MSR Prototype (May 1999 – Feb. 2000) • Started with not-yet-completed V1 sources • Private copy, no reason to stay in sync • Discussed approach with product team • Modified key CLR data structures • Object layout • Runtime type representation • Virtual dispatch tables • Modified the JIT compiler • Modified the C# compiler • ~9 months, 2 researchers
Compiling polymorphism, as was Two main techniques: • Specialize code for each instantiation • C++ templates, MLton & SML.NET monomorphization • good performance • code bloat (though not a problem with modern C++ impls) • Share code for all instantiations • Either use a single representation for all types (ML, Haskell) • Or restrict instantiations to “pointer” types (Java) • no code bloat • poor performance (extra boxing operations required on primitive values)
Compiling polymorphism in the Common Language Runtime • Polymorphism is built into the intermediate language (IL) and the execution engine • CLR performs “just-in-time” type specialization • Code sharing avoids bloat • Performance is (almost) as good as hand-specialized code
Code sharing • Rule: share field layout and code if type arguments have same representation • Examples: • Representation and code for methods in Set<string> can also be used for Set<object> (string and object are both 32-bit GC-traced pointers) • Representation and code for Set<long> are different from those for Set<int> (int uses 32 bits, long uses 64 bits)
Exact run-time types • We want to support if (x is Set<string>) { ... } else if (x is Set<Component>) { ... } • But representation and code are shared between compatible instantiations, e.g. Set<string> and Set<Component> • So there’s a conflict to resolve… • …and we don’t want to add lots of overhead to languages that don’t use run-time types (ML, Haskell)
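The requirement can be checked on a shipped runtime. The sketch below swaps in the library's List<T> for the slide's Set<T> and Uri for Component (both substitutions are assumptions made to keep the example self-contained): two instantiations over reference types remain distinct run-time types.

```csharp
using System;
using System.Collections.Generic;

public static class RuntimeTypeDemo
{
    // Exact run-time type tests on instantiated generic types.
    public static string Describe(object x)
    {
        if (x is List<string>) return "list of string";
        if (x is List<Uri>) return "list of Uri";
        return "something else";
    }

    public static void Main()
    {
        Console.WriteLine(Describe(new List<string>()));   // prints list of string
        Console.WriteLine(Describe(new List<Uri>()));      // prints list of Uri

        // Both type arguments are reference types, yet the run-time types differ.
        Console.WriteLine(typeof(List<string>) == typeof(List<Uri>));   // prints False
    }
}
```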
Object representation in the CLR • Normal object representation: vtable ptr, fields (type = vtable pointer) • Array representation: vtable ptr, element type, no. of elements, elements (type is inside object)
Object representation for generics • Array-style: store the instantiation directly in the object? • extra word (possibly more for multi-parameter types) per object instance • e.g. every list cell in ML or Haskell would use an extra word • Alternative: make vtable copies, store instantiation info in the vtable • extra space (vtable size) per type instantiation • expect no. of instantiations << no. of objects • so we chose this option
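A back-of-the-envelope calculation shows why the vtable option wins when instantiations are far fewer than objects. Only the 4-byte word comes from the deck (32-bit pointers); the object count, instantiation count and vtable size below are hypothetical figures chosen for illustration.

```csharp
using System;

public static class LayoutCost
{
    // Array-style: one extra word stored in every object instance.
    public static long PerObjectCost(long objects, int wordBytes)
    {
        return objects * wordBytes;
    }

    // Vtable-style: one extra vtable copy per type instantiation.
    public static long PerInstantiationCost(long instantiations, int vtableBytes)
    {
        return instantiations * vtableBytes;
    }

    public static void Main()
    {
        const int wordBytes = 4;         // 32-bit pointers, as assumed elsewhere in the deck
        const long objects = 1000000;    // hypothetical number of live generic objects
        const long instantiations = 20;  // hypothetical number of distinct instantiations
        const int vtableBytes = 200;     // hypothetical size of one vtable copy

        Console.WriteLine(PerObjectCost(objects, wordBytes));                  // prints 4000000
        Console.WriteLine(PerInstantiationCost(instantiations, vtableBytes));  // prints 4000
    }
}
```

With these made-up numbers the per-object scheme costs about a thousand times more space; the ratio depends entirely on the workload.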
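Object representation for generics • [Figure: x : Set<string> and y : Set<object> each consist of a vtable ptr and fields. Each instantiation has its own vtable with slots Add, HasMember, ToArray, … and its type argument (string or object); the slots of both vtables refer to the same code for Add, HasMember and ToArray.]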
Selling The Results • Presented prototype to product teams • Reviewed design with product teams • Reviewed code with product teams Sold! Provided • researchers port their work to the active code base • … and complete missing items • … and train the product team on the new code • … and remain on-board to answer questions
What’s in the design? • Type parameterization for all declarations • classes e.g. class Set<T> • interfaces e.g. interface IComparable<T> • structs e.g. struct HashBucket<K,D> • methods e.g. static void Reverse<T>(T[] arr) • delegates (“first-class methods”) e.g. delegate void Action<T>(T arg)
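The declaration kinds listed above can be exercised together in one short program. This is a sketch with assumed bodies; the delegate is named Visitor<T> here (a hypothetical name) rather than Action<T>, only to avoid clashing with the library's System.Action<T>.

```csharp
using System;

// Generic struct with two type parameters.
public struct HashBucket<K, D>
{
    public K Key;
    public D Data;
    public HashBucket(K key, D data) { Key = key; Data = data; }
}

// Generic delegate ("first-class method").
public delegate void Visitor<T>(T arg);

public static class Arrays
{
    // Generic method: reverses an array of any element type in place.
    public static void Reverse<T>(T[] arr)
    {
        for (int i = 0, j = arr.Length - 1; i < j; i++, j--)
        {
            T tmp = arr[i];
            arr[i] = arr[j];
            arr[j] = tmp;
        }
    }

    public static void ForEach<T>(T[] arr, Visitor<T> visit)
    {
        foreach (T x in arr) visit(x);
    }
}

public static class DeclarationsDemo
{
    public static void Main()
    {
        int[] xs = { 1, 2, 3 };
        Arrays.Reverse(xs);                                 // T inferred as int
        Arrays.ForEach(xs, x => Console.Write(x + " "));    // prints 3 2 1
        Console.WriteLine();

        var bucket = new HashBucket<string, int>("answer", 42);
        Console.WriteLine(bucket.Key + " -> " + bucket.Data);   // prints answer -> 42
    }
}
```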
Life Is Hell (Feb. 2000 – Nov. 2002) • In a live tree with 150 other developers! • Especially if you are 6000 miles away, • in a time zone that’s off by 8 hours, • and connected by a slow Internet connection • There were “some issues” with the prototype • Additional work to flesh out design • Reflection • Debugging • Performance • Pre-compilation (“Ngen”)
Precompilation (ngen) • JIT compilation is flexible, but • can lead to slow startup times • increases working set (must load JIT compiler, code pages can’t be shared between processes) • Instead, we can pre-compile • .NET CLR has “ngen” tool for native generation • IL is compiled to x86 up-front • runtime data structures (vtables etc) are persisted in native image • read-only pages (e.g. code) can be shared between processes • loader now responsible only for “link” step (cross-module fix-ups)
Ngen for generics • For non-generic code, to ngen an assembly: • just compile every class and method in the assembly • perhaps inline a little across assemblies • For generic code: • compile every generic class and method, but at what instantiations? • just reference types? (code is shared) • or some “commonly-used” types? (e.g. int) • we don’t know statically what instantiations will be used • it’s a “separate compilation” problem
Ngen all instantiations • Our approach: • always compile generic code for reference-type instantiations • for value type instantiations, compute the transitive closure of instantiations used by the assembly • compile code for those instantiations not already present in other linked ngen images • leads to code duplication • at load-time, just pick one • has some interesting interactions with app-domain code-sharing policy (see SPACE’04 paper on Don Syme’s home page)
NGen: example • MyCollections: class List<T>, class Set<T>, …Set<int>… ⇨ ngen: x86 for List<object>, x86 for Set<object>, x86 for Set<int> • Client1: struct Point, …List<Point>…, …Set<int>…, …List<int>… ⇨ ngen: x86 for List<Point>, x86 for List<int> • Client2: class Window, …List<Window>…, …List<int>… ⇨ ngen: x86 for List<int>
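The selection rule can be modelled in a few lines to see how it yields this outcome for MyCollections, Client1 and Client2. This is a toy model, not the ngen tool: AssemblyModel and NgenModel are hypothetical names, instantiations are plain strings, the transitive closure of used instantiations is assumed to be given as input, and only direct references are checked.

```csharp
using System;
using System.Collections.Generic;

// Toy model of one assembly and its native image. All names are hypothetical.
public sealed class AssemblyModel
{
    public readonly string Name;
    public readonly List<AssemblyModel> References = new List<AssemblyModel>();
    public readonly List<string> GenericTypesDefined = new List<string>();
    public readonly List<string> ValueTypeInstantiationsUsed = new List<string>();
    public readonly List<string> NativeImage = new List<string>();

    public AssemblyModel(string name) { Name = name; }
}

public static class NgenModel
{
    public static void Compile(AssemblyModel asm)
    {
        // Rule 1: generic code is always compiled for reference-type
        // instantiations, modelled here as the shared <object> version.
        foreach (string generic in asm.GenericTypesDefined)
            asm.NativeImage.Add(generic + "<object>");

        // Rule 2: value-type instantiations are compiled unless a linked
        // native image already contains them.
        foreach (string inst in asm.ValueTypeInstantiationsUsed)
        {
            bool alreadyLinked = false;
            foreach (AssemblyModel reference in asm.References)
                if (reference.NativeImage.Contains(inst)) alreadyLinked = true;

            if (!alreadyLinked && !asm.NativeImage.Contains(inst))
                asm.NativeImage.Add(inst);
        }
    }

    public static void Main()
    {
        var collections = new AssemblyModel("MyCollections");
        collections.GenericTypesDefined.AddRange(new[] { "List", "Set" });
        collections.ValueTypeInstantiationsUsed.Add("Set<int>");

        var client1 = new AssemblyModel("Client1");
        client1.References.Add(collections);
        client1.ValueTypeInstantiationsUsed.AddRange(new[] { "List<Point>", "Set<int>", "List<int>" });

        var client2 = new AssemblyModel("Client2");
        client2.References.Add(collections);
        client2.ValueTypeInstantiationsUsed.Add("List<int>");

        foreach (AssemblyModel asm in new[] { collections, client1, client2 })
        {
            Compile(asm);
            Console.WriteLine(asm.Name + ": " + string.Join(", ", asm.NativeImage));
        }
        // MyCollections: List<object>, Set<object>, Set<int>
        // Client1: List<Point>, List<int>
        // Client2: List<int>
    }
}
```

List<int> ends up in both client images, which is the code duplication the approach accepts; List<Window> does not appear because Window is a reference type and uses the shared List<object> code.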
NGen: when we can’t • JIT is still required for • instantiations requested through reflection (“late-bound”), e.g. typeof(List<>).BindGenericParameters(typeof(int)) • generic virtual methods • double dispatch, on instantiation and class of object • polymorphic recursion (unbounded number of instantiations)
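Two of these cases are easy to reproduce. The sketch below assumes a current .NET runtime, where the late-binding call is spelled Type.MakeGenericType rather than the BindGenericParameters name on the slide; Nest<T> is a hypothetical method written to show polymorphic recursion.

```csharp
using System;
using System.Collections.Generic;

public static class LateBoundDemo
{
    // Polymorphic recursion: every level calls itself at a new instantiation
    // (T, List<T>, List<List<T>>, ...), so the set of instantiations needed
    // depends on a run-time value and cannot be enumerated ahead of time.
    public static Type Nest<T>(int depth)
    {
        return depth == 0 ? typeof(T) : Nest<List<T>>(depth - 1);
    }

    public static void Main()
    {
        // Late-bound instantiation: the type argument is supplied at run time.
        Type listOfInt = typeof(List<>).MakeGenericType(typeof(int));
        Console.WriteLine(listOfInt == typeof(List<int>));   // prints True

        object list = Activator.CreateInstance(listOfInt);
        Console.WriteLine(list is List<int>);                // prints True

        int depth = 2;   // could equally come from user input
        Console.WriteLine(Nest<int>(depth) == typeof(List<List<int>>));   // prints True
    }
}
```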
Issues and Resolutions • Getting the Results Out • MSR wanted to share their work • CLR wouldn’t allow live source out ⇨ Port work to Rotor and release in source form • Remote Development Issues ⇨ One researcher, two months in Redmond ⇨ Coordinate check-in times • Transfer of Ownership ⇨ Code reviews ⇨ Phone calls, email lists, and accountability
Plan Of Record • Generics will be in the CLR in “Whidbey” • Generics will be in C# in Whidbey • Class libraries will ship a separate “generic collections” class • Not part of mscorlib, the lowest-level library • Generic interfaces to be added to a select few basic types (arrays implement IList<T>, etc.) • Generics are not CLS compliant in Whidbey • Give time to other languages to implement them • Not required in base libraries for Whidbey • Generics must be “forward compatible” • Old runtimes can execute new code, provided they don’t use generics
Forward Compatibility • Doesn’t seem too bad at first • But what if a program uses Reflection? • If the underlying system uses generics, the application program will see them even if it doesn’t use them • And what about debugging? • What if an old debugger tries to debug a program that uses generics? • And what about serialization? • It’s risky, and it’s fragile • And for other reasons we abandoned it … • A security-related change to the metadata • But it’s too late to change the basic design
Generics Are “In the Build” (Nov. 2002 – May 2004) • C# implements them fully • VB does user acceptance testing • Users like the feature • But they find it confusing • VB reworks the language design, retests, and finds them “usable by Mort” • Longhorn library developers start to use them • Managed C++ provides support for them
Announcement! • Anders Hejlsberg announces generics in C# • No backing off the feature now! • Early customer feedback (inside and outside Microsoft) is very positive • But customers report bugs and design problems • Performance is a serious issue for internal users • Product team takes primary ownership • But still needs support from MSR, especially on design issues • Like the constraint language
What’s in the design (2)? Constraints on type parameters • class constraint (“must extend”), e.g. class Grid<T> where T : Control • interface constraints (“must implement”), e.g. class Set<T> where T : IComparable<T> • type parameter constraints (“must subtype”), e.g. class List<T> { void AddList<U>(List<U> items) where U : T } • 3 special cases • Can be instantiated (“new”) • Can be null (“nullable”) • Must be a value type (“struct”)
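A compact sketch of these constraint kinds, written in the syntax of shipped C#, where the special cases are spelled new(), class and struct. Reading the slide's “nullable” case as the class constraint is an assumption, and Shape, Circle, Factory and Bag are hypothetical names, not types from the talk.

```csharp
using System;
using System.Collections.Generic;

public class Shape { public virtual string Name { get { return "shape"; } } }
public class Circle : Shape { public override string Name { get { return "circle"; } } }

public static class Factory
{
    // Special case "new": T must have a public parameterless constructor.
    public static T Create<T>() where T : new() { return new T(); }

    // Class constraint ("must extend"): members of Shape are available on T.
    public static string Describe<T>(T item) where T : Shape { return item.Name; }

    // Special case "struct": T must be a value type, so T? means Nullable<T>.
    public static T? FirstOrNull<T>(T[] items) where T : struct
    {
        return items.Length > 0 ? items[0] : (T?)null;
    }
}

public class Bag<T>
{
    private readonly List<T> items = new List<T>();

    public int Count { get { return items.Count; } }

    // Type parameter constraint ("must subtype"): U must be T or derive from T.
    public void AddAll<U>(IEnumerable<U> more) where U : T
    {
        foreach (U u in more) items.Add(u);
    }
}

public static class ConstraintKindsDemo
{
    public static void Main()
    {
        Circle c = Factory.Create<Circle>();
        Console.WriteLine(Factory.Describe(c));                        // prints circle

        var shapes = new Bag<Shape>();
        shapes.AddAll(new List<Circle> { c });                         // U = Circle, T = Shape
        Console.WriteLine(shapes.Count);                               // prints 1

        Console.WriteLine(Factory.FirstOrNull(new int[0]).HasValue);   // prints False
    }
}
```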
And What About Perf? • Do generics really provide performance? • It depends on how you ask the question… • And who is asking the question • Or at least why they are really asking the question
My Perf Measurements • Note: • First three columns are based on my “natural” implementation of QuickSort(Array). • Second three are based on Andrew Kennedy’s QuickSort(Array, ComparisonOperation)
What’s Our Recommendation? • Performance numbers are never • Simple • Complete • Repeatable • “Apples to apples” isn’t always the question • Sometimes absolute performance is paramount • Sometimes ease-of-use is paramount • Usually it’s a combination of both • Guidelines differ based on the audience
Early Adopters • Pre-Beta releases circulated to select customers • Feedback is very positive • Lots of suggestions • Including a complete rewrite of collections • Including many previously requested features • We’re ready for Beta 1 • So we can only do a few items, and only the most important
Remember… • C++: Give me template specialization • C++: Can I write class C<T> : T • C#: Just give me decent collection classes • C++: And template meta-programming • Java: Run-time types please • Visual Basic: Don’t confuse me! • Eiffel: All generic types covariant please • Haskell: Rank-n types? Existentials? Kinds? Type classes? • ML: Functors are cool! • Scheme: Why should I care? • COBOL: Change my call syntax!?!?
What’s in the design (3)? • Variance annotations on type parameters (CLR only) • covariant subtyping: interface IEnumerator<+T> { T get_Current(); bool MoveNext(); } so IEnumerator<string> assignable to IEnumerator<object> • contravariant subtyping: interface IComparer<-T> { int Compare(T x, T y); } so IComparer<object> assignable to IComparer<string>
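The +T and -T marks above are CLR-level notation. C# itself later surfaced the same idea with the out and in modifiers on interface type parameters, and the sketch below uses that later syntax. IProducer, IJudge and the implementing classes are hypothetical stand-ins for IEnumerator and IComparer.

```csharp
using System;

// Covariant: T only flows out of the interface.
public interface IProducer<out T> { T Next(); }

// Contravariant: T only flows into the interface.
public interface IJudge<in T> { int Compare(T x, T y); }

public sealed class Greeter : IProducer<string>
{
    public string Next() { return "hello"; }
}

// Orders any two objects by the length of their string form.
public sealed class LengthJudge : IJudge<object>
{
    public int Compare(object x, object y)
    {
        return x.ToString().Length.CompareTo(y.ToString().Length);
    }
}

public static class VarianceDemo
{
    public static void Main()
    {
        // Covariance: a producer of strings is usable as a producer of objects.
        IProducer<object> producer = new Greeter();
        Console.WriteLine(producer.Next());                  // prints hello

        // Contravariance: a judge of objects is usable as a judge of strings.
        IJudge<string> judge = new LengthJudge();
        Console.WriteLine(judge.Compare("ab", "abc") < 0);   // prints True
    }
}
```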
Standards and the CLS • All changes for generics submitted to ECMA • And later to ISO • Common Language Specification • A “deal” between compiler writers and library designers • Remember the plan of record? • No generics in the CLS • Schedules are readjusted • Longhorn (OS) will ship Whidbey • later version had been planned • Compilers enforce CLS rules • Windows API (WinFX) uses generics heavily • Library teams want generics in the CLS
Remember… • C++: Give me template specialization • C++: Can I write class C<T> : T • C#: Just give me decent collection classes • C++: And template meta-programming • Java: Run-time types please • Visual Basic: Don’t confuse me! • Eiffel: All generic types covariant please • Haskell: Rank-n types? Existentials? Kinds? Type classes? • ML: Functors are cool! • Scheme: Why should I care? • COBOL: Change my call syntax!?!?
Moving the World • The CLS is the lever . . . • Languages sign up to be “consumers” or “extenders” • Library designers sign up to live within the rules • But it isn’t well placed • Nothing requires languages to live up to their part • Not all rules can be mechanically checked • And Microsoft doesn’t have central enforcement • And using it is painful • Will languages move forward or pull out? • It depends on their customers (developers) • And the importance to them of the libraries • And the complexity/cost of implementation • Will libraries stay within the bounds? • It depends on their customers (developers) • And the complexity/cost of implementation
Pointers • If you’re interested: • “Design and Implementation of Generics for .NET”, PLDI’01 • “Formalization of Generics for .NET”, POPL’04 • “Transposing F to C#”, CCPE, 2004 • “Generics, Pre-compilation and Sharing”, SPACE’04 • http://research.microsoft.com/~akenn • Download Whidbey Beta1: • http://msdn.microsoft.com/vs2005 • Download prototype generics implementation (Gyro) extending the Shared Source CLI: • http://research.microsoft.com/projects/clrgen