Just-in-Time Compilation for FPGA Processor Cores
Andrew Becker (1), Scott Sirowy (2), Frank Vahid
Department of Computer Science and Engineering, University of California, Riverside
{abecker | ssirowy | vahid}@cs.ucr.edu
(1) Now at EPFL. (2) Now at ESRI.
This work was supported in part by the National Science Foundation (CNS1016792) and by the Semiconductor Research Corporation (GRC 2143.001).
Motivation
• SystemC useful capture language
  • Concurrency, structure, timing
• Simulation typical, but in-system I/O often useful
• Design/synthesis to FPGA may take hours/days and require advanced tools
[Figure: simulation versus in-system I/O (switches/LEDs, cameras/displays)]
Background
• Want rapid design iteration with in-system I/O
• Compile design description; avoid design/synthesis
• Previously: hybrid approach using SystemC bytecode
[Figure: a compiler turns SystemC code into bytecode; the alternatives are a simulator (no in-system I/O) or design/synthesis (time-consuming)]
SystemC code:
    class CLK_GEN : public sc_module {
      sc_in<bool> clock;
      …
      CLK_GEN(){ …
Bytecode:
    process(clock)
    READ $1 dataRdy
    BGT $1 $0 Start
    J Done
    Start: ADDI $2 $2 1
    ADDI $3 $0 7
    …
Source: Portable SystemC-on-a-chip, Sirowy [CODES+ISSS ’09]
Background
• Emulate bytecode in engine on FPGA
• Fast compilation
• Bytecode also portable (FPGA-device independent)
[Figure: a compiler turns the CLK_GEN SystemC code into bytecode, which runs on an emulation engine on the FPGA with in-system I/O]
Source: Portable SystemC-on-a-chip, Sirowy [CODES+ISSS ’09]
Emulation Engine
• Discrete event simulator
• C code on a processor
  • (Currently Microblaze soft-core; could be hard-core)
• Support circuits for architectural features, peripheral I/O
[Figure: emulation engine block diagram. Components: processor core running the event kernel, instruction memory, peripheral bus, UART, LEDs, buttons, frame buffer, read signal memory, write signal memory]
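A minimal sketch of what a discrete event kernel of this kind might look like, written in Python for brevity (the slides say the actual engine is C code on a Microblaze). It follows the standard SystemC evaluate/update scheme: processes read committed values and their writes are only committed at the end of each delta cycle. The names `Kernel`, `read_mem`, `write_mem`, and `led_follows_button` are hypothetical; the split into a read memory and a write memory is only suggested by the block diagram labels and is an assumption here.

```python
from collections import deque


class Kernel:
    """Toy two-phase (evaluate/update) discrete-event kernel."""

    def __init__(self):
        self.read_mem = {}       # committed values that processes read
        self.write_mem = {}      # values written during the current delta cycle
        self.sensitive = {}      # signal name -> processes woken when it changes
        self.runnable = deque()  # processes to run in the next evaluate phase

    def add_signal(self, name, value=0):
        self.read_mem[name] = value
        self.sensitive.setdefault(name, [])

    def add_process(self, proc, sensitivity):
        for name in sensitivity:
            self.sensitive[name].append(proc)

    def read(self, name):
        return self.read_mem[name]

    def write(self, name, value):
        self.write_mem[name] = value

    def run(self, max_deltas=100):
        """Run delta cycles until nothing is pending; return how many ran."""
        deltas = 0
        while (self.runnable or self.write_mem) and deltas < max_deltas:
            # Evaluate phase: run every process that was woken.
            while self.runnable:
                self.runnable.popleft()(self)
            # Update phase: commit writes and wake processes on changed signals.
            pending, self.write_mem = self.write_mem, {}
            for name, value in pending.items():
                if self.read_mem[name] != value:
                    self.read_mem[name] = value
                    for proc in self.sensitive[name]:
                        if proc not in self.runnable:
                            self.runnable.append(proc)
            deltas += 1
        return deltas


def led_follows_button(k):
    # Hypothetical process: copy the button signal to the LED signal.
    k.write("led", k.read("button"))


k = Kernel()
k.add_signal("button")
k.add_signal("led")
k.add_process(led_follows_button, ["button"])
k.write("button", 1)  # stands in for an external input event
deltas = k.run()
```

Pressing the button takes two delta cycles here: one to commit the input and wake the process, and one to commit the LED value the process wrote.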
Caveat Emptor
• Emulation is slow
• On soft-core, is even slower than PC simulation
• Won't meet many real-time constraints
This work: speed up emulator
• First analyzed emulator performance
Low-Hanging Fruit
• 69% of time spent emulating bytecode
• Two strategies to reduce
  • Reduce each instruction’s emulation time
  • Reduce instruction memory latency
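To get a feel for what the 69% figure implies, the sketch below applies Amdahl's law. The slides do not mention Amdahl's law; it is used here as an assumption, together with the simplification that the remaining 31% of run time stays unchanged. The function name `amdahl_speedup` is hypothetical.

```python
def amdahl_speedup(fraction, local_speedup):
    """Overall speedup when `fraction` of run time gets `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


BYTECODE_FRACTION = 0.69  # share of time spent emulating bytecode (from the slide)

for s in (2, 5, 10, 100):
    overall = amdahl_speedup(BYTECODE_FRACTION, s)
    print(f"{s}x faster bytecode emulation -> {overall:.2f}x overall")

# Limit as the bytecode part becomes free: 1 / 0.31, roughly 3.2x.
upper_bound = 1.0 / (1.0 - BYTECODE_FRACTION)
```

Under these assumptions, speeding up bytecode emulation alone cannot give much more than about 3.2x overall, so larger gains would have to come from the rest of the run time as well.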
First Step
• Reduce instruction emulation time
• Optimize event kernel?
• Just-in-time (JIT) compile bytecode to native processor code, done transparently by event kernel
[Figure: emulation engine block diagram. Components: processor core running the event kernel, instruction memory, peripheral bus, UART, LEDs, buttons, frame buffer, read signal memory, write signal memory]
Just-in-Time Compilation of Bytecode
• Implemented SystemC-bytecode to Microblaze JIT compiler
• 3x speedup; still portable
• Tunable delay/jitter
• Still want more speed
[Figure: inside the emulation engine, the event kernel's JIT turns bytecode into machine code]
Bytecode:
    process(clock)
    READ $1 dataRdy
    BGT $1 $0 Start
    J Done
    Start: ADDI $2 $2 1
    ADDI $3 $0 7
    …
Machine code (Microblaze):
    IMM 0xDEAD
    LWI $11 $0 0xBEEF
    BGTI $11 Start
    BRAI Done
    Start: …
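The difference between emulating bytecode and translating it once can be sketched as follows. This is an illustration only, not the authors' compiler: a real JIT emits native Microblaze instructions, whereas here each instruction is translated into a Python closure so the decode work happens once instead of on every execution. The instruction semantics (`READ` loads a signal, `BGT` branches if greater, `$0` holds zero), the tuple encoding with resolved branch targets, and the names `interpret` and `translate` are all assumptions.

```python
# Hypothetical encoding of the slide's fragment; branch targets are indices.
PROGRAM = [
    ("READ", 1, "dataRdy"),  # 0: $1 = signal dataRdy
    ("BGT", 1, 0, 3),        # 1: if $1 > $0 goto Start (index 3)
    ("J", 5),                # 2: goto Done (index 5, past the end)
    ("ADDI", 2, 2, 1),       # 3: Start: $2 = $2 + 1
    ("ADDI", 3, 0, 7),       # 4: $3 = $0 + 7
]


def interpret(program, regs, signals):
    """Plain emulation: decode every instruction each time it executes."""
    pc = 0
    while pc < len(program):
        op, *a = program[pc]
        if op == "READ":
            regs[a[0]] = signals[a[1]]
            pc += 1
        elif op == "ADDI":
            regs[a[0]] = regs[a[1]] + a[2]
            pc += 1
        elif op == "BGT":
            pc = a[2] if regs[a[0]] > regs[a[1]] else pc + 1
        elif op == "J":
            pc = a[0]
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs


def translate(program):
    """Translate once: each instruction becomes a closure returning the next pc."""
    def compile_one(pc, instr):
        op, *a = instr
        if op == "READ":
            r, sig = a
            def step(regs, signals):
                regs[r] = signals[sig]
                return pc + 1
        elif op == "ADDI":
            d, s, imm = a
            def step(regs, signals):
                regs[d] = regs[s] + imm
                return pc + 1
        elif op == "BGT":
            x, y, target = a
            def step(regs, signals):
                return target if regs[x] > regs[y] else pc + 1
        elif op == "J":
            (target,) = a
            def step(regs, signals):
                return target
        else:
            raise ValueError(f"unknown opcode {op}")
        return step

    steps = [compile_one(pc, instr) for pc, instr in enumerate(program)]

    def run(regs, signals):
        pc = 0
        while pc < len(steps):
            pc = steps[pc](regs, signals)
        return regs

    return run


compiled = translate(PROGRAM)  # done once, then reused on every activation
```

Both paths should leave the registers in the same state for the same inputs, for example `interpret(PROGRAM, [0] * 4, {"dataRdy": 1})` and `compiled([0] * 4, {"dataRdy": 1})`.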
Further Improvement
• Reduce instruction memory latency
• Add dedicated small, fast memory for JIT code on a fast, local bus
• Unique JIT possibility due to FPGA configurability
Architecture Changes
[Figure: emulation engine block diagram. Components: processor core, peripheral bus, local memory bus, JIT memory, instruction memory, UART, LEDs, buttons, frame buffer, read signal memory, write signal memory]
Even Further Improvement
• 23% of time spent maintaining signal queue
• What can be done?
  • Optimize signal queue maintenance code?
Common Denominator
• FPGA offers configurability
• Engine designer can make tradeoffs
  • Trade hardware resources for speed
  • Add another soft-core?
[Figure: FPGA holding the emulation engine, shown with and without extra resources]
Even Further Improvement
• 23% of time spent maintaining signal queue
• What can be done?
  • Optimize signal queue maintenance code?
  • Offload job to coprocessor
• Again, unique JIT option due to FPGA configurability
Architecture Changes
[Figure: emulation engine block diagram. Components: processor core, peripheral bus, local memory bus, signal queue, emulation memory controller, JIT memory, instruction memory, UART, LEDs, buttons, frame buffer, read signal memory, write signal memory]
Conclusions
• Approach: rapid design iteration with in-system I/O
• Uses
  • Education (typically loose timing constraints)
  • System prototypes that can tolerate real-time slowdown (e.g., slow frame rate)
• Portable and flexible
  • Engine design sets speed, not compiler or CAD flow
• This work: 15x speedup via normal JIT (3x) combined with FPGA-specific JIT (5x)
  • But still orders of magnitude slower than design/synthesis
• Future work: bytecode accelerators, JIT synthesis