
Titanium: A High Performance Java-Based Language



Presentation Transcript


  1. Titanium: A High Performance Java-Based Language Katherine Yelick, Alex Aiken, Phillip Colella, David Gay, Susan Graham, Paul Hilfinger, Arvind Krishnamurthy, Ben Liblit, Carleton Miyamoto, Geoff Pike, Luigi Semenzato

  2. Talk Outline • Motivation • Extensions for uniprocessor performance • Extensions for parallelism • A framework for domain-specific languages • Status and performance

  3. Programming Challenges on Millennium • Large scale computations • Optimized simulation algorithms are complex • Use of hierarchical parallel machine • Cost-conscious programming (Figure labels: minimization algorithms, unstructured meshes, adaptive meshes.)

  4. Titanium Approach • Performance is primary goal • High uniprocessor performance • Designed for shared and distributed memory • Parallelism constructs with programmer control • Optimizing compiler for caches, communication scheduling, etc. • Expressiveness secondary goal • Based on safe language: Java • Safety simplifies programming and compiler analysis • Framework for domain-specific language extensions

  5. New Language Features • Immutable classes • Multidimensional arrays • also: points and index sets as first-class values • multidimensional iterators • Memory management • semi-automated zone-based allocation • Scalable parallelism • SPMD model of execution with global address space • Language-level synchronization • Support for grid-based computation

  6. Java Objects • Primitive scalar types: boolean, double, int, etc. • access is fast • Objects: user-defined and from the standard library • have an implicit level of indirection (pointer to the object) • arrays are objects • all objects can be checked for equality and support a few other operations (Figure: the scalar values 3 and true, and an object with fields r: 7.1 and i: 4.3.)

  7. Immutable Classes in Titanium • For small objects, one would sometimes prefer • to avoid the level of indirection • to pass by value • This extends the idea of primitive values (1, 4.2, etc.) to user-defined values • Titanium introduces immutable classes • all fields are final (implicitly) • cannot inherit from (extend) or be inherited by other classes • must have a 0-argument constructor, e.g., Complex() • Example: immutable class Complex { ... } Complex c = new Complex(7.1, 4.3);
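The closest plain-Java approximation of the slide's immutable class is a final class with final fields. The sketch below is only that approximation: it models the value semantics (no subclassing, no mutation, a 0-argument constructor) but not the unboxed, indirection-free layout that Titanium aims for. The accessor and method names (re, im, plus, times) are assumptions for illustration and do not come from the talk.

```java
import java.util.Objects;

// Plain-Java approximation of the slide's "immutable class Complex".
// Method names (re, im, plus, times) are illustrative assumptions.
final class Complex {                 // final: cannot be extended
    private final double re;          // all fields are final
    private final double im;

    // 0-argument constructor, as the slide requires for immutable classes.
    Complex() { this(0.0, 0.0); }

    Complex(double re, double im) { this.re = re; this.im = im; }

    double re() { return re; }
    double im() { return im; }

    // Operations return new values instead of mutating the receiver.
    Complex plus(Complex o) { return new Complex(re + o.re, im + o.im); }

    Complex times(Complex o) {
        return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
    }

    // Value-style equality: two Complex values are equal if their fields are.
    @Override public boolean equals(Object obj) {
        if (!(obj instanceof Complex)) return false;
        Complex o = (Complex) obj;
        return Double.compare(re, o.re) == 0 && Double.compare(im, o.im) == 0;
    }

    @Override public int hashCode() { return Objects.hash(re, im); }

    public static void main(String[] args) {
        Complex c = new Complex(7.1, 4.3);
        Complex product = c.times(new Complex(0.0, 1.0)); // multiply by i
        System.out.println(product.re() + " + " + product.im() + "i");
    }
}
```

In standard Java each Complex is still a heap object reached through a pointer, which is exactly the cost the slide says Titanium's immutable classes avoid.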

  8. Arrays in Java • Arrays in Java are objects • Only 1D arrays are directly supported • Array bounds are checked (as in Fortran) • Multidimensional arrays built as arrays of arrays are slow and cannot be transformed into contiguous memory

  9. Titanium Arrays • Fast, expressive arrays • multidimensional • lower bound, upper bound, stride • concise indexing: A[p] instead of A(i, j, k) • Points • tuple of integers as primitive type • Domains • rectangular sets of points (bounds and stride) • arbitrary sets of points • Multidimensional iterators

  10. Example: Point, RectDomain, Array Point<2> lb = [1, 1]; Point<2> ub = [10, 20]; RectDomain<2> R = [lb : ub : [2, 2]]; double [2d] A = new double[R]; … foreach (p in A.domain()) { A[p] = B[2 * p]; } • Standard optimizations: • strength reduction • common subexpression elimination • invariant code motion • removing bounds checks from body
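To make the array example concrete, here is a rough plain-Java sketch of a 2D array defined over a strided rectangular domain and stored in one contiguous block. It is not Titanium's implementation. The class StridedGrid2D, its methods, and the PointBody callback are hypothetical names, and the sketch assumes the upper bound in [lb : ub : stride] is inclusive.

```java
// Hypothetical plain-Java stand-in for a Titanium "double [2d]" array over a
// strided RectDomain. All names here are assumptions for illustration.
final class StridedGrid2D {
    /** Loop body applied at each point (i, j) of the domain. */
    interface PointBody { void at(int i, int j); }

    private final int lo0, lo1, hi0, hi1, s0, s1;
    private final int n0, n1;      // number of points along each dimension
    private final double[] data;   // one contiguous, row-major block

    // Bounds are treated as inclusive (an assumption about [lb : ub : stride]).
    StridedGrid2D(int lo0, int lo1, int hi0, int hi1, int s0, int s1) {
        if (s0 <= 0 || s1 <= 0 || hi0 < lo0 || hi1 < lo1) {
            throw new IllegalArgumentException("bad domain");
        }
        this.lo0 = lo0; this.lo1 = lo1;
        this.hi0 = hi0; this.hi1 = hi1;
        this.s0 = s0;   this.s1 = s1;
        this.n0 = (hi0 - lo0) / s0 + 1;
        this.n1 = (hi1 - lo1) / s1 + 1;
        this.data = new double[n0 * n1];
    }

    int size() { return n0 * n1; }

    // Maps a domain point to its slot; rejects points outside the domain.
    private int offset(int i, int j) {
        boolean inBox = i >= lo0 && i <= hi0 && j >= lo1 && j <= hi1;
        if (!inBox || (i - lo0) % s0 != 0 || (j - lo1) % s1 != 0) {
            throw new IndexOutOfBoundsException("(" + i + ", " + j + ")");
        }
        return ((i - lo0) / s0) * n1 + (j - lo1) / s1;
    }

    double get(int i, int j) { return data[offset(i, j)]; }
    void set(int i, int j, double v) { data[offset(i, j)] = v; }

    // Rough analogue of "foreach (p in A.domain())".
    void forEachPoint(PointBody body) {
        for (int i = lo0; i <= hi0; i += s0) {
            for (int j = lo1; j <= hi1; j += s1) {
                body.at(i, j);
            }
        }
    }

    public static void main(String[] args) {
        StridedGrid2D a = new StridedGrid2D(1, 1, 10, 20, 2, 2);
        StridedGrid2D b = new StridedGrid2D(2, 2, 18, 38, 4, 4);
        b.forEachPoint((i, j) -> b.set(i, j, i + j));
        // Analogue of A[p] = B[2 * p] from the slide.
        a.forEachPoint((i, j) -> a.set(i, j, b.get(2 * i, 2 * j)));
        System.out.println(a.size() + " points copied");
    }
}
```

Here every access pays for a bounds check and an index computation. The optimizations listed on the slide (strength reduction, invariant code motion, bounds-check removal) are what would let a compiler reduce the foreach body to a simple pointer walk.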

  11. Memory Management • Java implemented with garbage collection • Distributed GC too unpredictable • Compile-time analysis can improve performance • Zone-based memory management • extends existing model • good performance • safe • easy to use

  12. Zone-Based Memory Management • Allocate objects in zones • Release zones manually Zone Z1 = new Zone(); Zone Z2 = new Zone(); T x = new(Z1) T(); T y = new(Z2) T(); x.field = y; x = y; delete Z1; delete Z2; // error (Figure: zones Z1 and Z2, with references x and y.)

  13. Sequential Performance Times in seconds (lower is better).

  14. Model of Parallelism • Single Program, Multiple Data • fixed number of processes • each process has own local data • global synchronization (barrier) (Figure: n processes run from start to end, meeting at each barrier.)
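The SPMD pattern on this slide can be imitated in standard Java with threads and java.util.concurrent.CyclicBarrier. The sketch below is a shared-memory analogy, not Titanium code: a fixed number of threads run the same body, each writes its own slot, and a barrier separates the phases. The class name SpmdSketch and the work done in each phase are assumptions chosen for illustration.

```java
import java.util.concurrent.CyclicBarrier;

// Shared-memory analogy of the SPMD model using Java threads.
// SpmdSketch and the per-phase work are illustrative assumptions.
final class SpmdSketch {
    // Runs the same body on n threads and returns what each thread computed.
    static long[] run(int n) throws InterruptedException {
        CyclicBarrier barrier = new CyclicBarrier(n);
        long[] partial = new long[n];   // slot p is written only by thread p
        long[] total = new long[n];
        Thread[] threads = new Thread[n];

        for (int p = 0; p < n; p++) {
            final int me = p;           // plays the role of thisProc()
            threads[p] = new Thread(() -> {
                try {
                    partial[me] = (me + 1) * 10L;  // phase 1: local work
                    barrier.await();               // all phase-1 writes done
                    long sum = 0;                  // phase 2: read everyone's data
                    for (long v : partial) sum += v;
                    total[me] = sum;
                    barrier.await();               // end of phase 2
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            threads[p].start();
        }
        for (Thread t : threads) t.join();
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long[] totals = run(4);
        System.out.println("process 0 saw total " + totals[0]);
    }
}
```

Every thread reaches the same two barrier calls in the same order, which is the discipline the later slides on synchronization analysis enforce statically.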

  15. Global Address Space • Each process has its own heap • References can span process boundaries class T { … } T gv; T lv = null; if (thisProc() == 0) { lv = new T(); // allocate locally } gv = broadcast lv from 0; // distribute … gv.field ... (Figure: Process 0 and the other processes each have a local heap; every process holds a local reference lv and a global reference gv.)
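As a loose illustration of the broadcast on this slide, the Java sketch below has thread 0 allocate an object and publish the reference so that every thread ends up holding the same gv. Java threads already share one heap, so this shows only the control flow, not the cross-process references of a real global address space. BroadcastSketch, the shared slot, and the field value are assumptions.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicReference;

// Control-flow analogy of "gv = broadcast lv from 0" using Java threads.
// BroadcastSketch, the shared slot, and the field value are assumptions.
final class BroadcastSketch {
    static final class T { int field = 42; }

    // Returns the reference each of the n threads received.
    static T[] run(int n) throws InterruptedException {
        AtomicReference<T> slot = new AtomicReference<>();
        CyclicBarrier barrier = new CyclicBarrier(n);
        T[] received = new T[n];
        Thread[] threads = new Thread[n];

        for (int p = 0; p < n; p++) {
            final int me = p;                 // plays the role of thisProc()
            threads[p] = new Thread(() -> {
                try {
                    T lv = null;
                    if (me == 0) {
                        lv = new T();         // allocate "locally" on thread 0
                        slot.set(lv);         // publish for the others
                    }
                    barrier.await();          // wait until thread 0 has published
                    T gv = slot.get();        // everyone reads the same reference
                    received[me] = gv;
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            threads[p].start();
        }
        for (Thread t : threads) t.join();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        T[] gv = run(3);
        System.out.println("all share one object: " + (gv[1] == gv[0] && gv[2] == gv[0]));
    }
}
```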

  16. Global vs. Local References • Global references may be slow • distributed memory: overhead of a few instructions when using a global reference to access a local object • shared memory: no performance implications • Solution: use local qualifier • statically restrict references to local objects • example: T local lv = null; • use only in critical sections

  17. Global Synchronization Analysis • In Titanium, processes must synchronize at the same textual instances of barrier() doThis(); barrier(); boolean x = someCondition(); if (x) { doThat(); barrier(); } doSomeMore(); barrier();

  18. Global Synchronization Analysis • In Titanium, processes must synchronize at the same textual instances of barrier() • Singleness analysis statically guarantees correctness by restricting the values of variables that control program flow doThis(); barrier(); boolean single x = someCondition(); if (x) { doThat(); barrier(); } doSomeMore(); barrier();

  19. Support for Grid-Based Computation • Gauss-Seidel relaxation with red-black ordering Point<2> lb = [0, 0]; Point<2> ub = [6, 4]; RectDomain<2> R = [lb : ub : [2, 2]]; … Domain<2> red = R + (R + [1, 1]); foreach (p in red) { … } (Figure: R spans (0, 0) to (6, 4); R + [1, 1] spans (1, 1) to (7, 5); red, their union, spans (0, 0) to (7, 5).)
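The domain arithmetic on this slide can be checked with a small Java sketch that represents a domain as an explicit set of points. This is a deliberately naive model under two assumptions: bounds are inclusive, and "+" means translation when the right operand is a point and union when it is a domain. RedBlackDomains and its helper methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Naive set-of-points model of the slide's domain arithmetic.
// RedBlackDomains and its helpers are illustrative assumptions.
final class RedBlackDomains {
    // All points (i, j) with lo <= (i, j) <= hi on the given stride (inclusive).
    static Set<List<Integer>> rect(int lo0, int lo1, int hi0, int hi1, int s0, int s1) {
        Set<List<Integer>> points = new LinkedHashSet<>();
        for (int i = lo0; i <= hi0; i += s0) {
            for (int j = lo1; j <= hi1; j += s1) {
                points.add(Arrays.asList(i, j));
            }
        }
        return points;
    }

    // Domain + point: translate every point by (d0, d1).
    static Set<List<Integer>> shift(Set<List<Integer>> domain, int d0, int d1) {
        Set<List<Integer>> shifted = new LinkedHashSet<>();
        for (List<Integer> p : domain) {
            shifted.add(Arrays.asList(p.get(0) + d0, p.get(1) + d1));
        }
        return shifted;
    }

    // Domain + domain: set union.
    static Set<List<Integer>> union(Set<List<Integer>> a, Set<List<Integer>> b) {
        Set<List<Integer>> result = new LinkedHashSet<>(a);
        result.addAll(b);
        return result;
    }

    // red = R + (R + [1, 1]) with R = [[0, 0] : [6, 4] : [2, 2]].
    static Set<List<Integer>> red() {
        Set<List<Integer>> r = rect(0, 0, 6, 4, 2, 2);
        return union(r, shift(r, 1, 1));
    }

    public static void main(String[] args) {
        System.out.println("red has " + red().size() + " points");
    }
}
```

Under these assumptions every point in red has an even coordinate sum, which is the checkerboard colouring that red-black Gauss-Seidel relies on: red points depend only on black neighbours, so all red points can be updated independently.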

  20. Implementation • Strategy • compile Titanium into C (currently C++) • Posix threads for SMPs (currently Solaris threads) • Lightweight Active Messages for communication • Status • runs on Sun Enterprise 8-way SMP • runs on Berkeley NOW • trivial ports to half a dozen other architectures • tuning for sequential performance

  21. Titanium Status • Titanium language definition complete. • Titanium compiler running. • Compiles for uniprocessors, NOW; others soon. • Application developments ongoing. • Many research opportunities.

  22. Parallel Performance • Numbers from UltraSPARC SMP • Parallel efficiency good • EM3D (unstructured kernel) • 3D AMR limited by algorithm (Figure: speedup versus number of processors.)

  23. Future Directions • Use of framework for domain-specific languages • Fluids and AMR done • Unstructured meshes and sparse solvers • Better programming tools • debuggers, performance analysis • Optimizations • analysis of parallel code and synchronization done • optimizations for caches on uniprocessors and SMPs underway • load balancing on clusters of SMPs

  24. Conclusions • Performance • sequential performance consistently close to C/FORTRAN • currently: 80% slower to 25% faster • sequential efficiency very high • Expressiveness • safety of Java with small set of performance features • extensible to new application domains • Portability, compatibility, etc. • no gratuitous departures from Java standard • compilation model easily supports new platforms
