Java Generics. Robert “Corky” Cartwright, Rice University, 19 Jan 2005
Motivation In 1998, Java represented a quantum leap forward in mainstream programming technology. How could the PL community make it better? Enriching the data model and associated type system. “Adding genericity” in the terminology of OO language designers.
Rules of the Game for Extending Java • Upward compatibility: old program binaries behave as before (excluding programs that make extensive use of reflection) • No changes to the JVM (except libraries) • Interoperability between old and new code • Extension through revised compiler (javac) and class loader • Coherence with the existing language design
Potential changes • Moving primitive types into the object type hierarchy (not done correctly in C#). • Parametric polymorphism for classes, interfaces, and methods. • Full “first class” genericity: allow type variables as superclasses supporting abstraction with respect to the superclass of a class definition (OO mixins).
Blueprint for Extending Java • Design a coherent extension of the existing language. • Implement the extension without changing the JVM, including the run-time libraries (except for additions). • Extensions must be supported entirely by the source language compiler (javac) and extensions to the class loader. • Ensure that the overhead is low. • Key tricks: • A source program may generate extra class files (precedent: inner classes). • Class files may be augmented by new attributes.
Adding Genericity (Parametric Polymorphism) • What is parametric polymorphism? The parameterization of types, e.g., adding a parameter to the List type so that List<Integer> designates a list containing only integers. • In Java, coherence is a challenging problem: • Array types are already generic with co-variant subtyping (Integer[] is a subtype of Number[]). • Co-variant subtyping conflicts with flexible static type checking (updates are contra-variant). • Supporting generic run-time types requires significant new execution machinery. • Container classes should easily migrate to corresponding generic classes, e.g., Vector → Vector<T>.
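The conflict between co-variant array subtyping and updates can be seen in ordinary Java. The following is a minimal sketch (the class name ArrayCovarianceDemo is made up for illustration): the assignment type-checks statically, and the JVM has to reject the update at run time.

```java
// Sketch: co-variant arrays defer a type check to run time.
class ArrayCovarianceDemo {
    // Returns true if storing a Double into an Integer[] (viewed as Number[]) fails.
    static boolean storeFails() {
        Integer[] ints = new Integer[1];
        Number[] nums = ints;                 // legal: Integer[] is a subtype of Number[]
        try {
            nums[0] = Double.valueOf(3.14);   // compiles, but the array really holds Integers
            return false;
        } catch (ArrayStoreException e) {
            return true;                      // the JVM rejects the update
        }
    }

    public static void main(String[] args) {
        System.out.println("store rejected at run time: " + storeFails());
    }
}
```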
Generics in Java 1.5 (Odersky, Wadler, et al) • Any class or method can be parameterized by type, which introduces type variables just as λ introduces data variables in Scheme. class C<T> { … /* T can be used almost anywhere an ordinary type is used */ } class D { <T> T first(List<T> l) { /* The scope of T is the method definition */ … } … } • Each type parameter has an upper bound (Object by default) specified by an extends clause, e.g., class E<T extends Number> { … } • Type parameters are non-variantly subtyped, e.g., Vector<Number> is unrelated to Vector<Integer>. • Parametric classes and methods are implemented using “type erasure”; every reference to a generic type variable is replaced by its bound. All of the instantiations of a parametric (generic) class are implemented by a single erased class. Similarly, all of the instantiations of a polymorphic method are implemented by a single erased method.
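A small sketch of these features in released Java may help. The names Box and ErasureDemo are invented for illustration; the last method shows that two instantiations of a generic class share one run-time class, which is the observable effect of erasure.

```java
import java.util.ArrayList;
import java.util.List;

// Bounded type parameter: the bound is Number instead of the default Object.
class Box<T extends Number> {
    private final T value;
    Box(T value) { this.value = value; }
    T get() { return value; }
    double asDouble() { return value.doubleValue(); }   // allowed because T extends Number
}

class ErasureDemo {
    // Polymorphic method: the scope of T is this method definition.
    static <T> T first(List<T> list) { return list.get(0); }

    // All instantiations of a generic class are implemented by one erased class.
    static boolean sameRuntimeClass() {
        List<Integer> ints = new ArrayList<Integer>();
        List<String> strs = new ArrayList<String>();
        return ints.getClass() == strs.getClass();
    }

    public static void main(String[] args) {
        System.out.println(new Box<Integer>(7).asDouble());
        System.out.println(sameRuntimeClass());
    }
}
```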
Understanding Type Erasure • In essence, type erasure translates parameterized code to the standard idiom used to simulate genericity in ordinary Java, e.g., Vector<Integer> → Vector augmented by casts where required; these generated casts never fail. • Technical complications: the compiler must generate bridge methods to connect parametric and erased signatures for a method. The parametric signature appears in byte code when a class A extends an instantiated generic class B<E>, e.g., class Environment extends Vector<Binding> { … public Binding elementAt(int i) { … } }
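The bridge method can be observed with reflection. This sketch fills in the slide's Environment example with a trivial body (the Binding class and the BridgeDemo helper are assumptions added for illustration) and counts the compiler-generated bridge that connects the erased signature Object elementAt(int) to the parametric one.

```java
import java.lang.reflect.Method;
import java.util.Vector;

class Binding {
    final String name;
    Binding(String name) { this.name = name; }
}

class Environment extends Vector<Binding> {
    // Parametric signature: returns Binding, not the erased Object.
    @Override
    public synchronized Binding elementAt(int i) { return super.elementAt(i); }
}

class BridgeDemo {
    // Counts the elementAt methods declared in Environment that javac marked as bridges.
    static int countBridges() {
        int n = 0;
        for (Method m : Environment.class.getDeclaredMethods()) {
            if (m.getName().equals("elementAt") && m.isBridge()) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("bridge methods for elementAt: " + countBridges());
    }
}
```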
What Java 5.0 Generics Omit • The absence of run-time types is inconsistent with naked type parameters and the built-in array type: new T(), new T[], new T[][], … are all invalid. • The absence of run-time types is inconsistent with the run-time type tests provided by Java: instanceof Vector<T> is invalid, and the casts (Vector<T>) and (T) cannot be checked at run time (javac accepts them only with an unchecked warning). Exception types cannot be parametric. • Per-class-instantiation static fields are not an option.
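One common way to code around the missing new T() and new T[] is to pass an explicit Class<T> token. The sketch below is an illustration of that workaround, not something taken from the slides; Factory is a hypothetical name, and the approach assumes T has an accessible no-argument constructor.

```java
import java.lang.reflect.Array;

class Factory<T> {
    private final Class<T> type;   // explicit run-time representation of T

    Factory(Class<T> type) { this.type = type; }

    // Stands in for "new T()"; assumes T has an accessible no-argument constructor.
    T newInstance() {
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stands in for "new T[n]"; the cast is unchecked but safe here,
    // because the array is created with component type T.
    @SuppressWarnings("unchecked")
    T[] newArray(int n) {
        return (T[]) Array.newInstance(type, n);
    }
}
```

A caller writes new Factory<StringBuilder>(StringBuilder.class), which is the kind of API distortion that run-time generic types would make unnecessary.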
Do Run-time Generic Types Matter? Yes. It is awkward to code around the absence of: • Isolated parametric allocation [hacked APIs]: new T(), new T[], new T[][], ... • Parametric casts [JSR14]: (T) ..., (T[]) …, (Vector<T>) …, … • Instantiated casts [cloning, integration of legacy code]: (Vector<Integer>) ..., (List<Number>) …, … • Per-class-instantiation static fields (singletons!)
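The static-field limitation is easy to demonstrate under erasure. In this sketch (Counter and StaticFieldDemo are illustrative names), a single static field is shared by every instantiation of the generic class, so a per-instantiation singleton or counter cannot be expressed directly.

```java
class Counter<T> {
    static int created = 0;          // one field shared by ALL instantiations
    Counter() { created++; }
}

class StaticFieldDemo {
    // Creates one Counter<String> and one Counter<Integer> and reports
    // how much the single shared field grew.
    static int createdAfterTwoInstantiations() {
        int before = Counter.created;
        new Counter<String>();
        new Counter<Integer>();
        return Counter.created - before;
    }

    public static void main(String[] args) {
        System.out.println(createdAfterTwoInstantiations());
    }
}
```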
Co-variant Wild Card Types • New form of parameterized type that allows a wildcard (“*”) as a type argument in a parameterized type, e.g., Vector<*>. • Every usage of the wildcard operator has an upper bound (Object by default). • Contra-variant form is analogous but rarely used.
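The slide writes the wildcard as “*”; the released Java 5 syntax spells it “?”. A minimal sketch in the released syntax (WildcardDemo is an illustrative name) shows the co-variant form with an upper bound and the contra-variant form with a lower bound.

```java
import java.util.ArrayList;
import java.util.List;

class WildcardDemo {
    // Co-variant use: any List whose element type is a subtype of Number can be read.
    static double sum(List<? extends Number> xs) {
        double total = 0;
        for (Number x : xs) total += x.doubleValue();
        return total;
    }

    // Contra-variant use: any List whose element type is a supertype of Integer can be written.
    static void fill(List<? super Integer> sink, int n) {
        for (int i = 1; i <= n; i++) sink.add(i);
    }

    static double demo() {
        List<Number> nums = new ArrayList<Number>();
        fill(nums, 3);                     // List<Number> is accepted as List<? super Integer>
        List<Integer> ints = new ArrayList<Integer>();
        fill(ints, 3);
        return sum(nums) + sum(ints);      // both lists are accepted as List<? extends Number>
    }
}
```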
More General Approach: NextGen (Allen, Cartwright, and Steele) • Supports exactly the same extension syntax as Java 1.5, less the restrictions. • All types are available at run-time for casting and instanceof tests. • Lightweight homogeneous (code shared across parametric instantiations) implementation. • Performance of prototype compiler is encouraging.
NextGen Implementation Strategy Augment the GJ implementation, which relies on type erasure. • Use lightweight “instantiation” classes (generated on demand) to specify run-time types. • Replace type-dependent operations in base classes by abstract methods (snippets) and override them in instantiation classes.
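The strategy can be approximated by hand in plain Java. The sketch below is only an illustration of the idea; the class names, the naming scheme for instantiation classes, and the particular snippet methods are all hypothetical and are not actual NextGen compiler output.

```java
// Plays the role of the erased base class for a generic Cell<T>.
abstract class Cell {
    protected final Object value;
    Cell(Object value) { this.value = value; }

    // "Snippets": type-dependent operations left abstract in the base class.
    protected abstract boolean isElement(Object o);   // stands in for "o instanceof T"
    protected abstract Object[] newArray(int n);      // stands in for "new T[n]"

    // Shared code calls the snippets instead of performing the operations itself.
    boolean accepts(Object o) { return isElement(o); }
    Object[] toArray() {
        Object[] a = newArray(1);
        a[0] = value;
        return a;
    }
}

// Lightweight instantiation class standing for Cell<Integer>.
final class Cell_Integer extends Cell {
    Cell_Integer(Integer v) { super(v); }
    protected boolean isElement(Object o) { return o instanceof Integer; }
    protected Object[] newArray(int n) { return new Integer[n]; }
}

// Lightweight instantiation class standing for Cell<String>.
final class Cell_String extends Cell {
    Cell_String(String v) { super(v); }
    protected boolean isElement(Object o) { return o instanceof String; }
    protected Object[] newArray(int n) { return new String[n]; }
}
```

Because each instantiation has its own class, a test such as c instanceof Cell_Integer distinguishes Cell<Integer> from Cell<String> at run time while the bulk of the code stays in the shared base class.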
Observations • Performance difference between different JVM’s is much greater than difference between GJ, NextGen, and Java. • Implementation tuning of JIT can eliminate essentially all of the performance penalty through code specialization and method inlining. • Specialization provides opportunity for performance gains! Explicit generic type information provides guidance on how code should be specialized.
Beyond NextGen • Object inlining of boxed primitive types (easy to do with new wrapper classes). • Full Genericity: using parameterized types anywhere that they are sensible. Only significant restriction on the use of generic types in NextGen: a type variable cannot be used as a superclass, as in class C<T implements I> extends T
Why Mixins • Mixins allow programs to abstract directly over uniform class extensions [the decorator pattern is the Java workaround for this limitation]. class AddScrollBar<T implements Window> extends T implements ScrollableWindow { … } • Mixins provide the machinery for defining components within the language as generic classes: class Module<B> { [static] class A extends B { … } … }
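For comparison, the decorator workaround mentioned above looks roughly like this in ordinary Java. The interface members, the PlainWindow class, and the choice to make ScrollableWindow extend Window are assumptions made for the sketch; the slide does not specify them.

```java
interface Window { String draw(); }
interface ScrollableWindow extends Window { void scrollTo(int line); }

class PlainWindow implements Window {
    public String draw() { return "window"; }
}

// Decorator: wraps a Window and forwards to it, instead of extending a type variable T.
class AddScrollBar implements ScrollableWindow {
    private final Window inner;
    private int line = 0;

    AddScrollBar(Window inner) { this.inner = inner; }

    public String draw() { return inner.draw() + " + scrollbar@" + line; }
    public void scrollTo(int line) { this.line = line; }
}
```

The decorated object is not a subclass of the wrapped class, so every method of Window must be forwarded by hand; the mixin form would inherit them.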
Semantics of Mixins: Two Options • Raw macro-expansion (C++ templates) • Performed on demand (lazily) by class loader • Lacking in hygiene • Hygienic macro-expansion • Methods in superclass argument are renamed to avoid accidental overriding • Example: class AddHiddenProperty<T implements Widget> extends T implements Hideable { private boolean isHidden = false; public boolean isHidden() { return isHidden; } public void setHidden(boolean b) { isHidden = b; } } What if T already contains the method public boolean isHidden()?
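The hazard raised by that question can be reproduced by expanding the mixin by hand for one argument. In this sketch, LegacyWidget and its paint method are hypothetical, and AddHiddenProperty_LegacyWidget is a manual, non-hygienic expansion of AddHiddenProperty<LegacyWidget>.

```java
interface Widget { }
interface Hideable { boolean isHidden(); void setHidden(boolean b); }

class LegacyWidget implements Widget {
    // Pre-existing method with an unrelated meaning, e.g. "covered by another window".
    public boolean isHidden() { return false; }

    // Existing code that relies on LegacyWidget's own isHidden().
    String paint() { return isHidden() ? "skipped" : "painted"; }
}

// Manual raw expansion of the mixin body with T = LegacyWidget.
class AddHiddenProperty_LegacyWidget extends LegacyWidget implements Hideable {
    private boolean isHidden = false;
    public boolean isHidden() { return isHidden; }       // accidentally overrides LegacyWidget.isHidden()
    public void setHidden(boolean b) { isHidden = b; }
}
```

After setHidden(true), paint() in the superclass silently changes behavior, which is exactly the accidental overriding that hygienic renaming is meant to prevent.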
Type Checking for Mixins • Hygienic formulation is straightforward; legality of a mixin application only depends on whether the type arguments satisfy their specified bounds. • Non-hygienic case is more difficult; a type argument may contain a method that conflicts with a method introduced in a mixin. It is doubtful that these constraints can be checked by a class compiler (like javac) because type arguments can flow across a program via type application.
Challenging issue in mapping mixin genericity onto the JVM Constraints: • Compatible with existing Java binaries. • Must enforce mixin hygiene by systematically renaming some methods in the class loader to avoid accidental overriding. • Extension of the existing NextGen implementation.
Strategy In the class loader, rename all methods m in all classes by prefixing them with the mangled name of the class in which they are introduced. Example: method name value in class interp.Interp becomes $interp$Interp$value. Complication: bridge methods for interface methods must forward method dispatches on interface types to the corresponding methods in classes.
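The renaming rule in the example can be written as a one-line function. This is a sketch inferred from the single example on the slide (dots replaced by $, with a leading $); the Mangler class is hypothetical, and the real class loader may treat nested classes or other cases differently.

```java
class Mangler {
    // Prefixes a method name with the mangled name of the class that introduces it.
    static String mangle(String className, String methodName) {
        return "$" + className.replace('.', '$') + "$" + methodName;
    }

    public static void main(String[] args) {
        System.out.println(mangle("interp.Interp", "value"));
    }
}
```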
Implementation Subtleties • Same method signature may appear in different interfaces, e.g., interface I { …; void next(); … } interface J { …; void next(); … } class C implements I { } class Foo<T> extends T implements J { } Consider Foo<C>: it implements both I.next() and J.next(), and must include forwarding methods for both. Solution: the class loader prefixes the names of methods in interfaces by the name of the interface where they are introduced. This extension will also enable us to support multiple “per-interface” definitions for a given method signature in a class in Java source.
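A hand-renamed approximation of Foo<C> may make the forwarding methods concrete. Everything here is an assumption made for illustration: the prefix format, the method bodies, the log field, and the name Foo_C standing for the instantiation Foo<C>. It is not the output of the actual class loader.

```java
// Interface methods prefixed by the interface that introduces them.
interface I { void $I$next(); }
interface J { void $J$next(); }

class C implements I {
    final StringBuilder log = new StringBuilder();
    void $C$next() { log.append("C.next;"); }        // next() as introduced in C
    public void $I$next() { $C$next(); }             // forwarder for dispatch on type I
}

// Stands for Foo<C>: extends its type argument and implements J.
class Foo_C extends C implements J {
    void $Foo$next() { log.append("Foo.next;"); }    // next() as introduced in the mixin body
    public void $J$next() { $Foo$next(); }           // forwarder for dispatch on type J
}
```

With distinct prefixed names, a call through I reaches C's definition and a call through J reaches Foo's definition, so neither accidentally overrides the other.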
Implementation Subtleties (cont.) • In principle, several different instantiations of the same generic interface could be implemented by a class in a source program. Java 5.0 and NextGen disallow these programs because we cannot distinguish the methods after erasure. It rarely happens in practice, but it is a corner case that we must handle. (Failure on class loading is not a very satisfactory solution.) Question: can we eliminate this restriction in our extension of NextGen to support mixins?
Supporting Multiple Instantiations of the Same Interface We can modify the NextGen compiler to use instantiated interfaces instead of erased interfaces in both type declarations and the prefixing of method names in interfaces. This approach introduces some extra code because it significantly reduces code sharing. Every interface method call with a receiver type that is an instantiated interface with a free type variable must be implemented by a snippet (since the snippet code will call different methods for different receiver types).
Status of Project • Beta release of NextGen compiler should be available within the next month from www.cs.rice.edu/~javaplt. Based on Sun Java 1.5 compiler. Distribution is binary only at this point. • Prototype of MixGen (NextGen + Mixins) will be ready for internal testing by the end of Spring. • Beta release of MixGen during the summer.