
Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator)

Vienna University of Technology, August 2nd. Dr. Ian Rogers, Research Fellow, The University of Manchester, ian.rogers@manchester.ac.uk.


Presentation Transcript


  1. Vienna University of Technology, August 2nd • Dr. Ian Rogers, Research Fellow, The University of Manchester • ian.rogers@manchester.ac.uk • Highly Parallel, Object-Oriented Computer Architecture (also the Jikes RVM and PearColator)

  2. Presentation Outline • The JAMAICA project • What are the problems? • Hardware design solutions • Software solutions • Where we are • The Jikes RVM – behind the scenes • PearColator – a quick overview

  3. Problems • “The job of an engineer is to identify problems and find better ways to solve them”, James Dyson (and, I'm sure, many others) • There are many problems currently in Computer Science and more on the horizon • Problem solving is ad hoc, and many good solutions aren't successful • Let's look at the problems from the bottom up

  4. Problems in Process • Heat and power are problems • Smaller sizes (<45nm) lead to problems with process variations, degradation over time, more transient errors. • 3D or stacked designs have significant problems • Simulation must be repeated for all possible design and environment variations so statistically a design should work

  5. Problems in Architecture • Speed of interconnect • Die area is very large • Knowing your market: “tricks” are key to realising performance – especially in the embedded space • Move away from general purpose design – GPUs, physics processors

  6. Problems in Systems Software • Lag of systems software behind new hardware • How to virtualise systems with minimal cost • Lots of momentum in existing solutions • Problems with natively executable code: • needs to be run in a hardware/software sandbox • no dynamic optimisation of code with libraries and operating system • cost of hardware to support virtualization

  7. Problems in Compilers • The hardware target keeps moving • The notion of stacked languages and virtual machines isn't popular • Why aren't we better at instruction selection? • embedded designs have vast amounts of assembler code targeting exotic registers and ISAs • How to parallelize for thousands of contexts • Machine learning?

  8. Problems for Applications • Application writers have an abundance of tools and wisdom to listen to, but the wisdom often conflicts • Application concerns: • performance • maintainability (evolution?) • time to implement • elegance of solution

  9. Problems for Consumers • Cost • Migration of systems • Legacy support

  10. Recap • Process: lots of transistors, lots of problems • Architecture: speed of interconnect, complexity • Systems software: momentum • Compilers: stacking, using new architectures • Applications: lots of tools and wisdom, concerns • Consumers: how much does it cost? What about my current business systems and processes?

  11. Oh dear, it's a horrible problem, what we need is a new programming language • Why? • parallel programming is done badly at the moment • we should teach CS undergraduates this language • we will then inherently get parallel software • Why not? • CSP has already been here [Hoare 1978] • clusters are already solving similar problems using existing languages • it's easy to blame the world's problems on C

  12. Do we need another programming language? • What I think is needed are domain specific languages, or domain specific uses of a language: • extracting parallelism from Java implies work that isn't necessary at a mathematical level of abstraction – MatlabP, Mathematica, Fortran 90 • codecs, graphics pipelines, network processors – languages devised here should express just what's necessary to do the job • message passing to avoid use of shared memory

  13. Virtual Machine Abstraction • We don't need another programming language, we need common abstractions • This abstraction will be inherently parallel but not inherently shared memory • Java is a reasonable choice and brings momentum

  14. Architecture • Point-to-point asynchronous interconnect • Simultaneous Multi-Threading to hide latencies (e.g. Sun Niagara, Intel HyperThreading) • Object-Oriented – improve GC, simplify directories • Transactional – remove locks, enable speculation

  15. Object-Oriented Hardware • Proposed in the Mushroom project from Manchester • Recent papers by Wright and Wolczko • Address the cache using object ID and offset [Figure: an object ID and offset addressing the L1 data cache]

  16. Object-Oriented Hardware • On a cache miss the object ID is translated to the object’s address [Figure: L1 data cache miss on an object ID and offset]

  17. Object-Oriented Hardware • We can re-use the TLB • Having a map allows objects to be moved without altering references • Only object headers will contain locks [Figure: object ID, object-to-address map, TLB, virtual memory]
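The object-to-address map on slides 15 to 17 can be modelled in software. What follows is a minimal sketch, not the JAMAICA hardware: `ObjectTable`, its methods and the 1024-word memory are all hypothetical names and sizes chosen for illustration. It shows why references held as object IDs stay valid when an object is moved: only the map entry changes.

```java
import java.util.HashMap;
import java.util.Map;

// Software model of an object-to-address map. All names here are hypothetical;
// the slides describe this translation happening in hardware on a cache miss.
class ObjectTable {
    private final int[] memory = new int[1024];                    // flat, word-addressed memory
    private final Map<Integer, Integer> baseOf = new HashMap<>();  // object ID -> base address
    private final Map<Integer, Integer> sizeOf = new HashMap<>();  // object ID -> size in words
    private int nextFree = 0;
    private int nextId = 1;

    // Allocate an object and hand back its ID; callers never see the address.
    int allocate(int words) {
        int id = nextId++;
        baseOf.put(id, nextFree);
        sizeOf.put(id, words);
        nextFree += words;
        return id;
    }

    int load(int id, int offset) {
        return memory[translate(id, offset)];
    }

    void store(int id, int offset, int value) {
        memory[translate(id, offset)] = value;
    }

    // The (object ID, offset) -> address step.
    private int translate(int id, int offset) {
        if (offset < 0 || offset >= sizeOf.get(id)) {
            throw new IndexOutOfBoundsException("offset " + offset + " outside object " + id);
        }
        return baseOf.get(id) + offset;
    }

    // Copy the object to fresh space, as a copying collector might.
    // Only the map entry is updated; every holder of the ID is unaffected.
    void move(int id) {
        int words = sizeOf.get(id);
        System.arraycopy(memory, baseOf.get(id), memory, nextFree, words);
        baseOf.put(id, nextFree);
        nextFree += words;
    }

    int addressOf(int id) {
        return baseOf.get(id);
    }

    public static void main(String[] args) {
        ObjectTable table = new ObjectTable();
        int obj = table.allocate(4);
        table.store(obj, 2, 42);
        table.move(obj);
        System.out.println(table.load(obj, 2)); // same ID, new address, same field value
    }
}
```

The sketch never reclaims the old copy; a real collector would, and the hardware version would cache translations in the TLB rather than consult a hash map.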

  18. Transactional Hardware • Threads reading the same object can do so regardless of their epoch • (based on [Ananian et al., 2005]) [Figure: transaction and object ID]

  19. Transactional Hardware • When a later epoch thread writes to an object a clone is made [Figure: transaction and object ID]

  20. Transactional Hardware • If an earlier thread writes to an object the later threads using that object rollback [Figure: transaction and object ID]
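The three rules on slides 18 to 20 (reads are free, a later-epoch write clones, an earlier-epoch write rolls back later users) can be sketched in software. This is a simplified model under stated assumptions, not the hardware scheme or the Ananian et al. design: epochs are plain integers with lower meaning earlier, `VersionedObject` is a hypothetical name, and conflicts are tracked per object rather than per field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical per-object version store. Lower epoch number = earlier in program order.
class VersionedObject {
    private final TreeMap<Integer, int[]> versions = new TreeMap<>(); // epoch -> that epoch's clone
    private final TreeSet<Integer> users = new TreeSet<>();           // epochs that touched the object

    VersionedObject(int fields) {
        versions.put(Integer.MIN_VALUE, new int[fields]);             // committed state
    }

    // A read sees the newest version written at or before the reader's epoch.
    int read(int epoch, int field) {
        users.add(epoch);
        return versions.floorEntry(epoch).getValue()[field];
    }

    // A write clones the visible version on first use by this epoch.
    // Returns the later epochs that had used the object and so must roll back.
    List<Integer> write(int epoch, int field, int value) {
        List<Integer> rollback = new ArrayList<>(users.tailSet(epoch, false));
        for (Integer later : rollback) {                              // discard their speculative state
            versions.remove(later);
            users.remove(later);
        }
        int[] mine = versions.get(epoch);
        if (mine == null) {
            mine = versions.floorEntry(epoch).getValue().clone();
            versions.put(epoch, mine);
        }
        mine[field] = value;
        users.add(epoch);
        return rollback;
    }

    public static void main(String[] args) {
        VersionedObject o = new VersionedObject(1);
        o.read(2, 0);                                                 // later epoch uses the object
        System.out.println(o.write(1, 0, 5));                         // epochs that must roll back
    }
}
```

Tracking conflicts per object is conservative: a later epoch is rolled back even if it touched a different field. Commit and retirement of old versions are left out.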

  21. Object-Oriented and Transactional Hardware • Again the TLB can remove some of the cost [Figure: transaction, object ID, map, TLB, virtual memory]

  22. Speculation • Speculative threads created with predicted input values and expected not to interact with other non-speculative threads • Transaction can complete if we didn’t rollback and inputs to thread were as predicted • Can speculate at: • Method calls • Loops
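Speculating at a method call can be illustrated with ordinary Java threads. This is a rough sketch only: `Speculation.speculate` is a hypothetical helper, the continuation is assumed to be side-effect free (which is what the transactional hardware would otherwise have to guarantee), and the thread pool stands in for hardware thread contexts.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.IntSupplier;
import java.util.function.IntUnaryOperator;

// Hypothetical helper: run the code after a call speculatively on a predicted
// return value while the call itself is still executing.
class Speculation {
    static int speculate(IntSupplier call, int predicted, IntUnaryOperator continuation) {
        // Start the continuation early, using the predicted input.
        CompletableFuture<Integer> speculative =
                CompletableFuture.supplyAsync(() -> continuation.applyAsInt(predicted));

        int actual = call.getAsInt();          // the non-speculative work

        if (actual == predicted) {
            return speculative.join();         // prediction held: commit the speculative result
        }
        speculative.cancel(true);              // prediction failed: discard ("roll back")
        return continuation.applyAsInt(actual); // and re-execute with the real input
    }

    public static void main(String[] args) {
        System.out.println(speculate(() -> 10, 10, x -> x * 2)); // prediction correct
        System.out.println(speculate(() -> 11, 10, x -> x * 2)); // prediction wrong, re-executed
    }
}
```

Either way the caller sees the same answer as sequential execution; only the time taken differs.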

  23. Operating Systems • Supporting an object based and virtual memory view of a system implies extra controls in our system • Therefore, we want the whole system software stack inside the VM

  24. Operating Systems

  25. Where we are • Jikes RVM and JNode based Java operating systems • Open source dynamic binary translator (arguably state-of-the-art performance) • Simulated architecture • Parallelizing JVM, working for do-all loops, new work on speculation and loop pipelining • Lots of work on the other things I've talked about

  26. The Jikes RVM • Overview of the adaptive compilation system: • Methods recompiled based on their predicted future execution time and the time taken to compile • Some optimisation levels are skipped
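A cost-benefit decision of this kind can be sketched as follows. The numbers in `SPEEDUP` and `COMPILE_MS_PER_BYTECODE` are invented for illustration, the class and method names are hypothetical, and the estimate that a method will run for as long again as it has run so far is an assumption of this sketch rather than something stated on the slide.

```java
// Hypothetical recompilation policy: pick the level that minimises
// (time to compile) + (predicted future execution time at that level).
class RecompilationPolicy {
    // Invented constants. Index 0 is baseline code, 1..3 are optimisation levels.
    static final double[] SPEEDUP = {1.0, 4.0, 6.0, 7.0};
    static final double[] COMPILE_MS_PER_BYTECODE = {0.0, 0.02, 0.05, 0.15};

    static int chooseLevel(int currentLevel, double timeSoFarMs, int bytecodes) {
        double futureMs = timeSoFarMs;           // assumed: the future resembles the past
        int best = currentLevel;
        double bestCost = futureMs;              // cost of leaving the method alone
        for (int level = currentLevel + 1; level < SPEEDUP.length; level++) {
            double compileMs = COMPILE_MS_PER_BYTECODE[level] * bytecodes;
            double runMs = futureMs * SPEEDUP[currentLevel] / SPEEDUP[level];
            if (compileMs + runMs < bestCost) {
                bestCost = compileMs + runMs;
                best = level;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The same 1000-bytecode method at three different ages.
        System.out.println(chooseLevel(0, 1.0, 1000));
        System.out.println(chooseLevel(0, 100.0, 1000));
        System.out.println(chooseLevel(0, 10000.0, 1000));
    }
}
```

With these made-up constants a cold method stays at baseline, a warm one goes to the first level, and a very hot one jumps straight to the highest level, which is one way intermediate levels end up being skipped.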

  27. The baseline compiler • Used to compile code the first time it’s invoked • Very simple code generation, one fixed sequence per bytecode: • iload_0: Load t0, [locals + 0]; Store [stack+0], t0 • iload_1: Load t0, [locals + 4]; Store [stack+4], t0 • iadd: Load t0, [stack+0]; Load t1, [stack+4]; Add t0, t0, t1; Store [stack+0], t0 • istore_0: Load t0, [stack+0]; Store [locals + 0], t0
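The template-per-bytecode idea on slide 27 can be sketched as a tiny emitter. This is a toy model, not the Jikes RVM baseline compiler: `BaselineSketch` is a hypothetical name, only three bytecode families are handled, and the output is the pseudo-assembly notation from the slide rather than real machine code. The operand stack depth is tracked at compile time so each stack slot becomes a fixed offset.

```java
import java.util.ArrayList;
import java.util.List;

// Toy baseline compiler: every bytecode expands to a fixed load/store template.
class BaselineSketch {
    static final int WORD = 4; // bytes per local or stack slot

    static List<String> compile(String... bytecodes) {
        List<String> out = new ArrayList<>();
        int depth = 0; // operand stack depth in slots, known statically
        for (String bc : bytecodes) {
            if (bc.startsWith("iload_")) {
                int n = Integer.parseInt(bc.substring("iload_".length()));
                out.add("Load t0, [locals + " + n * WORD + "]");
                out.add("Store [stack+" + depth * WORD + "], t0");
                depth++;                                   // push
            } else if (bc.equals("iadd")) {
                out.add("Load t0, [stack+" + (depth - 2) * WORD + "]");
                out.add("Load t1, [stack+" + (depth - 1) * WORD + "]");
                out.add("Add t0, t0, t1");
                out.add("Store [stack+" + (depth - 2) * WORD + "], t0");
                depth--;                                   // pop two, push one
            } else if (bc.startsWith("istore_")) {
                int n = Integer.parseInt(bc.substring("istore_".length()));
                depth--;                                   // pop
                out.add("Load t0, [stack+" + depth * WORD + "]");
                out.add("Store [locals + " + n * WORD + "], t0");
            } else {
                throw new IllegalArgumentException("unsupported bytecode: " + bc);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        compile("iload_0", "iload_1", "iadd", "istore_0").forEach(System.out::println);
    }
}
```

Running it on the four bytecodes from the slide reproduces the ten pseudo-instructions listed there. The redundant store-then-reload pairs are visible in the output, which is the price of compiling each bytecode in isolation.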

  28. The baseline compiler • Pros: • Easy to port – just write emit code for each bytecode • Minimal work needed to port runtime and garbage collector • Cons: • Very slow

  29. The boot image • Hijack the view of memory (mapping of objects to addresses) • Compile list of primordial classes • Write view of memory to disk (the boot image) • The boot image runner loads the disk image and branches into the code block for VM.boot

  30. The boot image • Problems: • Difference of views between: • Jikes RVM • Classpath • Bootstrap JVM • Fix by writing null to some fields • Jikes RVM runtime needs to keep pace with Classpath

  31. The runtime • M-of-N threading • Thread yields are GC points • Native code can deadlock the VM • JNI written in Java with knowledge of C layout • Classpath interface written in Java

  32. The Jikes RVM • Overview of the adaptive compilation system: • Methods recompiled based on their predicted future execution time and the time taken to compile • Some optimisation levels are skipped

  33. The optimizing compiler • Structured from compiler phases based on HIR, LIR and MIR phases from Muchnick • IR object holds instructions in linked lists in a control flow graph • Instructions are an object with: • One operator • Variable number of use operands • Variable number of def operands • Support for def/use operands • Some operands and operators are virtual
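The shape of the IR described on slide 33 can be sketched with a few small classes. The names `Operand`, `Instruction` and `BasicBlock` below are illustrative and are not claimed to match the Jikes RVM class names or fields; operands that are both defined and used are not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative IR skeleton: one operator per instruction, variable-length
// def and use operand lists, instructions doubly linked inside a basic block.
class Operand {
    final String name;
    Operand(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

class Instruction {
    final String operator;
    final List<Operand> defs = new ArrayList<>();
    final List<Operand> uses = new ArrayList<>();
    Instruction prev, next;                     // neighbours in the block's list

    Instruction(String operator) { this.operator = operator; }
    Instruction def(Operand o) { defs.add(o); return this; }
    Instruction use(Operand o) { uses.add(o); return this; }
    @Override public String toString() { return defs + " = " + operator + " " + uses; }
}

class BasicBlock {
    Instruction first, last;
    final List<BasicBlock> successors = new ArrayList<>(); // control flow graph edges

    void append(Instruction i) {
        if (last == null) {
            first = last = i;
        } else {
            last.next = i;
            i.prev = last;
            last = i;
        }
    }

    // Unlinking is constant time, which suits passes that delete instructions.
    void remove(Instruction i) {
        if (i.prev != null) i.prev.next = i.next; else first = i.next;
        if (i.next != null) i.next.prev = i.prev; else last = i.prev;
        i.prev = i.next = null;
    }

    public static void main(String[] args) {
        BasicBlock bb = new BasicBlock();
        Operand t0 = new Operand("t0"), t1 = new Operand("t1"), t2 = new Operand("t2");
        bb.append(new Instruction("int_add").def(t2).use(t0).use(t1));
        for (Instruction i = bb.first; i != null; i = i.next) {
            System.out.println(i);
        }
    }
}
```

A linked list makes insertion and deletion in the middle of a block cheap, at the cost of having no random access by index.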

  34. The optimizing compiler • HIR: • Infinite registers • Operators correspond to bytecodes • SSA phase performed • LIR: • Load/store operators • Java specific operators expanded • GC barrier operators • SSA phase performed • MIR: • Fixed number of registers • Machine operators

  35. The optimizing compiler • Factored control flow graph: • Don’t terminate blocks on Potentially Excepting Instructions (PEIs) • Bound check • Null check • Checks define guards which are used by: • Putfield, getfield, array load/store, invokevirtual • Eliminating guards requires propagation of use

  36. The optimizing compiler • Java – can we capture and benefit from strong type information? • Extended Array SSA: • Single assignment • Array – Fortran style - a float and an int array can’t alias • Extended – different fields and different objects can’t alias • Phi operator – for registers, heaps and exceptions • Pi operator – define points where knowledge of a variable is exposed. E.g. A = new int[100], later uses of A can know the array length is 100 (ABCD)
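The Pi operator example from slide 36 can be made concrete with ordinary Java source. The comments mark where an optimiser holding this information could, in principle, prove a bounds check redundant; they describe what is provable, not what any particular Jikes RVM build actually removes. The class name is hypothetical.

```java
// Source-level view of where pi nodes expose facts about `a` and `i`.
class BoundsCheckExample {
    static int sum() {
        int[] a = new int[100];            // fact exposed here: a != null, a.length == 100

        for (int i = 0; i < a.length; i++) {
            // Inside the loop: i starts at 0 and only increases, and the branch
            // i < a.length was taken, so 0 <= i < a.length holds. The null check
            // and both bounds checks on a[i] are provably redundant.
            a[i] = i;
        }

        int total = 0;
        for (int i = 0; i < 100; i++) {
            // Here the loop bound is the constant 100. Proving a[i] safe needs
            // the allocation-site fact that a.length == 100.
            total += a[i];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum());
    }
}
```

The second loop is the case the slide calls out: without the fact recorded at `new int[100]`, the compiler could not relate the constant 100 to the array length.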

  37. The optimizing compiler • HIR: Simplification, tail recursion elimination, estimate execution frequencies, loop unrolling, branch optimizations, (simple) escape analysis, local copy and constant propagation, local common sub-expression elimination • SSA in HIR: load/store elimination, redundant branch elimination, global constant propagation, loop versioning • AOS framework

  38. The optimizing compiler • LIR: Simplification, estimate execution frequencies, basic block reordering, branch optimizations, (simple) escape analysis, local copy and constant propagation, local common sub-expression elimination • SSA in LIR: global code placement, live range splitting • AOS framework

  39. The optimizing compiler • MIR: instruction selection, register allocation, scheduling, simplification, branch optimizations • Fix-ups for runtime

  40. Speculative Optimisations • Often in a JVM there’s potentially not a complete picture, in particular for dynamic class loading • On-stack replacement allows optimisation to proceed with a get-out clause • On-stack replacement is a virtual Jikes RVM instruction

  41. Applications of on-stack replacement • Safe invalidation for speculative optimisation • Class hierarchy-based inlining • Deferred compilation • Don’t compile uncommon cases • Improve dataflow optimization and improve compile time • Debug optimised code via dynamic deoptimisation • At break-point, deoptimize activation to recover program state • Runtime optimization of long-running activities • Promote long-running loops to higher optimisation levels
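Class hierarchy-based inlining with a get-out clause can be approximated in plain Java. Note that this sketch substitutes an explicit guard for true on-stack replacement, which transfers a running activation to new code and cannot be expressed in Java source. `Shape`, `Square`, `Triangle` and `SpeculativeCallSite` are hypothetical names.

```java
// Guarded speculative inlining: assume Square is the only Shape until proven otherwise.
interface Shape {
    int area(int side);
}

class Square implements Shape {
    public int area(int side) { return side * side; }
}

class Triangle implements Shape {
    public int area(int side) { return side * side / 2; }
}

class SpeculativeCallSite {
    private boolean assumeOnlySquare = true;  // the class-hierarchy assumption
    int deoptimisations = 0;

    int call(Shape receiver, int side) {
        if (assumeOnlySquare) {
            if (receiver instanceof Square) {
                return side * side;           // inlined body of Square.area, no dispatch
            }
            // Assumption violated: take the get-out clause and stop speculating.
            assumeOnlySquare = false;
            deoptimisations++;
        }
        return receiver.area(side);           // general path: virtual dispatch
    }

    public static void main(String[] args) {
        SpeculativeCallSite site = new SpeculativeCallSite();
        System.out.println(site.call(new Square(), 3));
        System.out.println(site.call(new Triangle(), 4));
        System.out.println(site.deoptimisations);
    }
}
```

In a VM with on-stack replacement the invalidation would be triggered by class loading rather than by a test on every call, which is what lets the fast path drop the guard.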

  42. PearColator

  43. PearColator • Decoder: • Disassembler • Interpreter (Java threaded) • Translator • Generic components: • Loaders • System calls • Memory

  44. PearColator

  45. Thanks and… any questions?
