Towards a Java Multiprocessor
Christof Pitter, Martin Schoeberl
Vienna University of Technology, Austria
27 September 2007
Motivation
• Chip multiprocessing (CMP)
• Current trend in server & desktop systems
• Embedded systems
• Challenge for hard real-time systems
• RT-Java: a promising topic for the future
Examples pictured: LEON3, ARM11 MPCore
Our Goal
• Chip-multiprocessor (CMP)
• Global shared memory
• Java Optimized Processor (JOP) = Java VM in hardware
• Time predictable
• Still good performance
• Implementation in FPGA
Our Goal II
Agenda
• CMP Architecture
• Memory Model
• Cache Memory
• Synchronization
• FPGA Implementation
• Benchmark Results
• Conclusion
• Future Work
Memory Model
• Shared Memory
• Distributed Shared Memory
Why Shared Memory?
• JVM memory areas
• Two shared data areas: heap, method area
Why no NoC?
• No use for a network
• Multiple masters to a slave
• Masters communicate through memory
• May introduce long latencies
• Hardware overhead
Chosen instead: SoC bus
Cache Memory
• Cache coherence conflicts avoided by architecture
• Stack cache: private data for each thread
• Method cache: read-only memory
• Heap not cached
Synchronization
• Protect parallel access to shared objects
• JVM: associates a lock with each object
• JOP: activation & deactivation of interrupts
• CMP:
    • Use of one global lock for the heap
    • Future work: multiple locks
• Avoidance of priority inversion
• Priority inheritance locks
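
The "one global lock for the heap" idea can be illustrated at the library level. This is only a sketch: in the talk the lock lives in the JVM and hardware, whereas here a single `ReentrantLock` stands in for it, and the names `GlobalHeapLock`, `Counter`, and `runDemo` are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Library-level analogy for a single global lock; not the JOP mechanism itself. */
final class GlobalHeapLock {
    // Every shared object is protected by this one lock (assumption for illustration).
    private static final ReentrantLock GLOBAL = new ReentrantLock();

    static final class Counter {
        int value;
    }

    /** Update a shared object while holding the global lock. */
    static void increment(Counter c) {
        GLOBAL.lock();
        try {
            c.value++;
        } finally {
            GLOBAL.unlock();
        }
    }

    /** Start several threads that all update the same object, then return the final count. */
    static int runDemo(int threads, int incrementsPerThread) {
        Counter shared = new Counter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < incrementsPerThread; n++) {
                    increment(shared);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join(); // join makes the workers' writes visible to this thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return shared.value;
    }
}
```

With one lock for everything, threads that touch unrelated objects still serialize on the same lock, which is presumably why multiple locks are listed as future work.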
Proposed Architecture
FPGA Implementation
• Up to 3 JOPs
• Memory arbiter
• SoC bus (SimpCon)
• External shared memory
• Development board:
    • Altera Cyclone EP1C12
    • 1 MB SRAM
Simple SoC Interconnect (SimpCon)
• Synchronous SoC bus
• Point-to-point communication
• Master-slave interconnection
• Signals only valid for 1 cycle
• Master can continue execution
• Signal rdy_cnt:
    • Informs master of available data
• Fast data transfer due to pipelining
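
A small software model may help picture how a readiness counter lets the master keep running instead of stalling. The slide only states that `rdy_cnt` informs the master of available data; the countdown-to-zero behaviour, the fixed latency, and all names below (`RdyCntModel`, `startRead`, `tick`, `cyclesUntilData`) are assumptions made for this sketch, not the SimpCon specification.

```java
/** Simplified, hypothetical model of a slave that reports readiness via a counter. */
final class RdyCntModel {
    private final int latencyCycles; // assumed fixed read latency of the slave
    private int rdyCnt;              // in this model, 0 means the result is available
    private int result;

    RdyCntModel(int latencyCycles) {
        if (latencyCycles < 0) {
            throw new IllegalArgumentException("latency must be >= 0");
        }
        this.latencyCycles = latencyCycles;
    }

    /** Issue a read request; the request itself is only presented for one cycle. */
    void startRead(int valueInMemory) {
        result = valueInMemory;
        rdyCnt = latencyCycles;
    }

    /** Advance the model by one clock cycle. */
    void tick() {
        if (rdyCnt > 0) {
            rdyCnt--;
        }
    }

    int rdyCnt() {
        return rdyCnt;
    }

    /** Read the result; only legal once the counter has reached zero. */
    int readData() {
        if (rdyCnt != 0) {
            throw new IllegalStateException("data not ready");
        }
        return result;
    }

    /** Master side: count the cycles that pass (free for other work) before data is ready. */
    static int cyclesUntilData(int latencyCycles) {
        RdyCntModel slave = new RdyCntModel(latencyCycles);
        slave.startRead(42);
        int cycles = 0;
        while (slave.rdyCnt() > 0) {
            slave.tick();
            cycles++;
        }
        return cycles;
    }
}
```

Because the master can see in advance when the data will arrive, it could issue the next request before the current one completes, which is one way to read the pipelining bullet.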
Memory Arbiter I
• Resolves conflicts of competing memory requests
• SimpCon interface:
    • Masters with arbiter
    • Arbiter with slave
• Scalable to a variable number of CPUs
Memory Arbiter II
• Fixed priority arbitration scheme
    • Priority established by unique CPU ID
    • Lowest ID is top priority
• Zero-cycle arbitration:
    • Arbitration happens in the same cycle as the request
    • No bus request phase (unlike AMBA)
    • Increases memory bandwidth
• Will it scale? Reduces fmax
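
The fixed-priority rule above can be sketched in a few lines. This is a software model of the decision only, not the hardware arbiter from the talk; the class name `FixedPriorityArbiter` and the `NO_GRANT` constant are hypothetical.

```java
/** Hypothetical software model of the fixed-priority grant decision. */
final class FixedPriorityArbiter {
    static final int NO_GRANT = -1;

    /**
     * @param requests requests[id] is true if the CPU with that ID requests memory this cycle
     * @return the lowest requesting CPU ID, or NO_GRANT if no CPU is requesting
     */
    static int grant(boolean[] requests) {
        for (int id = 0; id < requests.length; id++) {
            if (requests[id]) {
                return id; // lowest ID has top priority
            }
        }
        return NO_GRANT;
    }

    public static void main(String[] args) {
        System.out.println(grant(new boolean[] {false, true, true}));   // prints 1
        System.out.println(grant(new boolean[] {true, true, true}));    // prints 0
        System.out.println(grant(new boolean[] {false, false, false})); // prints -1
    }
}
```

Under this rule a CPU with a high ID can wait indefinitely while lower IDs keep requesting, which makes its worst-case memory latency hard to bound.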
Experiments
• Performance measurements on real hardware
• Benchmark JavaBenchEmbedded
• Real-world application tasks:
    • Lift (lift controller in an automation factory)
    • Kfl (node of a distributed motor control system)
• One task per CPU
• Performance measured in iterations/s
Benchmark Results I
• Comparison of dual JOP against single JOP
• Same frequency (80 MHz)
• Single JOP result:
    • Lift 13138 iterations/s
• Dual JOP result:
Benchmark Results II
• Comparison of triple JOP against single JOP
• Maximum frequencies
• Single JOP result at 100 MHz:
    • Lift 16425 iterations/s
• Triple JOP result at 75 MHz:
Speedup
Resource Consumption
• Cyclone EP1C12Q240 by Altera (12060 LEs, 29.25 KB on-chip memory)
Maximum Frequency
Conclusion
• Proposed Java CMP with shared memory
• Verification of CMP architecture
• Dual JOP & triple JOP prototypes running on real hardware
• Performance measurements:
    • Dual JOP: 1.58 times the single-JOP performance at fmax
    • Triple JOP: 2.1 times the single-JOP performance at fmax
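
The figures at fmax mix two effects: more cores and a lower clock. A rough way to separate them is sketched below, under two assumptions that are not stated in the talk: that throughput scales linearly with clock frequency, and that the 2.1 figure compares the triple JOP at 75 MHz with the single JOP at 100 MHz, as the Benchmark Results II slide suggests. The two single-JOP Lift results are consistent with the first assumption, since both work out to about 164.2 iterations/s per MHz. The class and method names are hypothetical.

```java
/** Back-of-the-envelope estimate; the result is derived, not a measurement from the talk. */
final class SpeedupEstimate {
    /** Throughput per MHz, used to check that performance tracks the clock. */
    static double iterationsPerMHz(double iterationsPerSecond, double mhz) {
        return iterationsPerSecond / mhz;
    }

    /** Speedup the CMP would show if both systems ran at the same clock (linear scaling assumed). */
    static double frequencyNormalizedSpeedup(double speedupAtFmax, double singleMHz, double cmpMHz) {
        return speedupAtFmax * singleMHz / cmpMHz;
    }

    public static void main(String[] args) {
        System.out.println(iterationsPerMHz(13138, 80));   // single JOP, Lift, 80 MHz
        System.out.println(iterationsPerMHz(16425, 100));  // single JOP, Lift, 100 MHz
        System.out.println(frequencyNormalizedSpeedup(2.1, 100, 75)); // triple JOP estimate
    }
}
```

Under these assumptions the triple JOP would reach roughly 2.8 times the single-JOP throughput at equal clock, so part of the gap between 2.1 and 3 would come from the reduced fmax rather than from memory contention alone.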
Future Work
• Synchronization: multiple locks
• Improvement of memory arbiter:
    • Different arbitration schemes for time predictability
    • Zero-cycle latency?
• Experiments with more cores on FPGA
• RT-Scheduling for CMP
Thank You! Questions & Comments