This project involves creating a custom ASIC/FPGA circuit for edge detection in images using SystemC and HDL languages. The system allows for precise communication and execution among components, supporting both software and hardware aspects. Emulation on various platforms is made possible by SystemC bytecode. Pinapa Front End extracts architecture features, while the bytecode back end generates executable code. The process involves mapping, synthesis, and in-system emulation to ensure efficient implementation.
Portable SystemC-on-a-Chip
Scott Sirowy, Bailey Miller, and Frank Vahid†
Department of Computer Science and Engineering, University of California, Riverside
{ssirowy,bmiller,vahid}@cs.ucr.edu
†Also with the Center for Embedded Computer Systems at UC Irvine
This work was supported in part by the National Science Foundation and the Office of Naval Research
Introduction: Prototyping Circuits and Systems
Task: Create a custom ASIC/FPGA circuit to detect edges in an image
[Figure: Edge Detector (inputs s1-s4 and s6-s9; adders, subtractors, and a MIN against 255; output Pixel Value) connected to a Memory Controller through go, address, and data]
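The datapath in the figure reads eight neighboring pixels (s1-s4 and s6-s9), combines them with adders and subtractors, and clamps the result with a MIN against 255. The exact operator is not recoverable from the slide, so the following is only a sketch that assumes a Sobel-style gradient over a row-major 3x3 window; the names `Window` and `edgePixel` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <cstdlib>

// Hypothetical 3x3 window, row-major: s[0]..s[8] stand for s1..s9.
// The center pixel s5 (s[4]) is unused, matching the slide's signal list.
using Window = std::array<int, 9>;

// Assumed Sobel-style operator; the slide does not name the kernel.
std::uint8_t edgePixel(const Window& s) {
    // Horizontal gradient: right column minus left column.
    int gx = (s[2] + 2 * s[5] + s[8]) - (s[0] + 2 * s[3] + s[6]);
    // Vertical gradient: bottom row minus top row.
    int gy = (s[6] + 2 * s[7] + s[8]) - (s[0] + 2 * s[1] + s[2]);
    // |gx| + |gy| approximates the gradient magnitude.
    int mag = std::abs(gx) + std::abs(gy);
    // MIN against 255 keeps the result in the 8-bit pixel range.
    return static_cast<std::uint8_t>(std::min(mag, 255));
}
```

A uniform window yields 0, while a sharp vertical boundary saturates at 255, which is the role the MIN block plays in the figure.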
Introduction: Prototyping Circuits and Systems
Capture in HDL
-- VHDL/Verilog File
entity Edge_Detector is
  port ( clk : in std_logic;
         rst : in std_logic;
         data: in std_logic_vec … );
…
Introduction: Prototyping Circuits and Systems
Capture in HDL
• SystemC
  • C++ based
  • Creation, instantiation, and connection of components
  • Precisely timed communication and execution among concurrently executing components
  • Supports both “software” and “hardware” constructs and semantics
class EDGE_DETECTOR : public sc_module {
  //signal declarations
  …
  EDGE_DETECTOR() {
    SC_METHOD(mainComp); sensitive << dataReady;
    SC_METHOD(getPixel); sensitive << clock.pos();
Introduction: Prototyping Circuits and Systems
• Simulation
  • Requires environment modeling
  • Sometimes hard!
  • Does not interact with real I/O
[Figure: design flow from Capture in HDL to Simulation on Desktop PC]
Introduction: Prototyping Circuits and Systems
• Implementation
  • Mapping to microprocessor / coprocessor system
  • Interfacing Issues
  • Synthesis Issues
  • Size Constraints
[Figure: design flow from Capture in HDL to Mapping & Synthesis]
Introduction: Prototyping Circuits and Systems
• In-System Emulation
  • Quickly-obtained simulation interaction with real I/O
  • Prior to time-consuming mapping and synthesis
  • But slower
[Figure: design flow from Capture in HDL to Emulation]
In-System Emulation of SystemC
• How?
  • Port publicly available SystemC libraries to target platforms
  • SystemC executable has built-in event kernel
  • Libraries are large and require OS support
[Figure: a SystemC Description targeting two Processor platforms and an FPGA]
Bytecode
• Modern portability approach
  • Java, C#
• Virtual Machine (VM): Program that executes bytecode
  • May JIT compile to native architecture
[Figure: Java, C# source goes through a Compiler to Bytecode, which runs on VMs for Opteron, Pentium, and Atom]
SystemC Bytecode?
[Figure: SystemC source goes through a Compiler to SystemC Bytecode, which runs on VMs for Opteron + FPGA, Pentium, and FPGA]
Portable SystemC-on-a-Chip
Task: Create a custom circuit to detect edges in an image
SystemC bytecode can run on any platform that supports the SystemC emulation engine, without the need for recompilation or synthesis
[Figure: a SystemC Description goes through the SystemC Bytecode Compiler to SystemC Bytecode, which runs on an Emulation Engine on Processor and Processor + FPGA platforms; the FPGA hosts Emulation Accelerators]
SystemC Bytecode Compiler
• Pinapa Front End (Moy, EMSOFT’05)
  • Extracts architectural features and behavior of each process
  • Uses modified versions of GCC and the SystemC kernel
• Bytecode Back End
  • Flattens original SystemC circuit
  • Generates SystemC bytecode that preserves architecture and behavioral information
  • Output is a human-readable text file
[Figure: SystemC Description, then Pinapa Front End (AST, Link, ELAB), then Bytecode Back End (Register Allocation, Code Generation), then SystemC Bytecode]
SystemC Description:
class EDGE_DETECTOR : public sc_module {
  //signal declarations
  …
  EDGE_DETECTOR() {
    SC_METHOD(mainComp); sensitive << dataReady;
    SC_METHOD(getPixel); sensitive << clock.pos();
  }
SystemC Bytecode
• Sequential Instructions
  • Based on the RISC MIPS instruction set
  • Efficient emulation (Davis 2003)
• Spatial Instructions
  • Includes meta instructions for defining architectural features, bit width specific computations, and reading and writing signals
[Listing labels: Spatial Constructs; MIPS-like sequential instructions]
--header
signal clock : 1
signal reset : 1
signal memory_in : 32
signal fb_data : 32
signal leds : 4
process(clock)
  READ $1 memory_in
  ADD $2 $0 3
  ADD $3 $2 $1
  WRITE $3 s1
  ADDI $1 $0 1
  WRITE $1 dataReady
END
process(dataReady)
  READ $5 val6
  SW $5 24($0)
  READ $5 val7
  …
  ADDI $10 $0 0
  ADDI $7 $0 0
  ADDI $13 $0 8
  …
END
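To make the mix of spatial and sequential instructions concrete, here is a minimal sketch of how a virtual machine might execute one process body. It is not the authors' emulation engine: the `Instr` encoding, the `runProcess` function, and the `clockProcess` helper are hypothetical, signal bit widths are ignored, and the listing's `ADD $2 $0 3` (which carries an immediate) is encoded as `ADDI` for simplicity.

```cpp
#include <array>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

enum class Op { READ, WRITE, ADD, ADDI, END };

// Hypothetical decoded form (not the real bytecode encoding):
//   READ rd sig | WRITE rs sig | ADD rd rs rt | ADDI rd rs imm
struct Instr {
    Op op;
    int a;
    int b;
    int c;
    std::string signal;
};

using Signals = std::map<std::string, std::uint32_t>;

// Executes one process body until END. $0 stays zero, as in MIPS.
void runProcess(const std::vector<Instr>& code, Signals& signals) {
    std::array<std::uint32_t, 32> reg{};
    for (const Instr& in : code) {
        switch (in.op) {
            case Op::READ:  reg[in.a] = signals[in.signal]; break;
            case Op::WRITE: signals[in.signal] = reg[in.a]; break;
            case Op::ADD:   reg[in.a] = reg[in.b] + reg[in.c]; break;
            case Op::ADDI:  reg[in.a] = reg[in.b] + static_cast<std::uint32_t>(in.c); break;
            case Op::END:   return;
        }
        reg[0] = 0;  // restore $0 in case an instruction targeted it
    }
}

// The process(clock) body from the listing, in the assumed encoding.
std::vector<Instr> clockProcess() {
    return {
        {Op::READ, 1, 0, 0, "memory_in"},
        {Op::ADDI, 2, 0, 3, ""},
        {Op::ADD, 3, 2, 1, ""},
        {Op::WRITE, 3, 0, 0, "s1"},
        {Op::ADDI, 1, 0, 1, ""},
        {Op::WRITE, 1, 0, 0, "dataReady"},
        {Op::END, 0, 0, 0, ""},
    };
}
```

Calling `runProcess(clockProcess(), signals)` with `memory_in` set to 10 leaves `s1` at 13 and `dataReady` at 1. A fuller engine would also mask each write to the signal's declared width and wake any process sensitive to `dataReady`.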
SystemC Emulation Engine
• Must support a basic SystemC interface
  • Clock
  • Reset
  • 16 I/O pins
  • 8KB Input Memory
  • 8KB Output Memory
  • UART
• Platforms with more advanced I/O might support more features
  • Increased Memory
  • Extended General Purpose I/O
[Figure: SystemC Circuit with ports Clock, Reset, Input I/O, Output I/O, Input Mem Addr, Input Mem Data, Input Mem Stream, Output Mem Addr, Output Mem Data, UART Rx, UART Tx]
SystemC Emulation Engine
• Real I/O Peripherals
  • Representative of many systems
• Emulation Engine Kernel
  • Virtual Machine
  • Discrete Event Kernel
  • Peripheral Access and Hooks
• Optional USB Download Interface
[Figure: Emulation Engine block diagram with Main Processor, Input Memory, Output Memory, Instruction Memory, UART, Read Signal Memory, Write Signal Memory, Buttons, LEDs, and USB Interface, grouped into Emulation Engine Kernel and Support Peripherals, and I/O Peripherals]
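The discrete event kernel is what lets processes run only when a signal they are sensitive to changes. The sketch below illustrates that idea under simplifying assumptions: the `EventKernel` class and its methods are hypothetical, each process has a single sensitivity signal, and writes take effect immediately rather than following SystemC's separate evaluate and update phases.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event kernel; not the engine described in the slides.
class EventKernel {
public:
    using Process = std::function<void(EventKernel&)>;

    // Register a process together with the signal it is sensitive to.
    void addProcess(const std::string& signal, Process p) {
        sensitive_[signal].push_back(processes_.size());
        processes_.push_back(std::move(p));
    }

    // Unwritten signals read as 0.
    std::uint32_t read(const std::string& signal) const {
        auto it = values_.find(signal);
        return it == values_.end() ? 0 : it->second;
    }

    // A write that changes the value schedules every sensitive process.
    void write(const std::string& signal, std::uint32_t v) {
        if (read(signal) == v) return;
        values_[signal] = v;
        auto it = sensitive_.find(signal);
        if (it == sensitive_.end()) return;
        for (std::size_t id : it->second) ready_.push(id);
    }

    // Run until no process is runnable; returns the activation count.
    int run() {
        int activations = 0;
        while (!ready_.empty()) {
            std::size_t id = ready_.front();
            ready_.pop();
            processes_[id](*this);
            ++activations;
        }
        return activations;
    }

private:
    std::map<std::string, std::uint32_t> values_;
    std::map<std::string, std::vector<std::size_t>> sensitive_;
    std::vector<Process> processes_;
    std::queue<std::size_t> ready_;
};
```

With one process sensitive to `clock` that writes `dataReady`, and a second sensitive to `dataReady`, a single clock edge triggers both in order. In the real engine the process bodies would presumably be bytecode executed by the virtual machine rather than C++ callables.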
Emulation Engine Acceleration
• For some SystemC applications, emulation can be slow
  • An Edge Detection circuit required ~10 minutes to process a 320x240 image *
* on a 100 MHz/SRAM Microblaze SystemC Emulation Engine implementation
[Figure: Emulation Engine block diagram running SystemC bytecode]
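A quick back-of-the-envelope calculation shows what the quoted figure implies per pixel. The sketch treats "~10 minutes" as exactly 600 seconds, which is an assumption, so the results are rough estimates; the constant and function names are hypothetical.

```cpp
#include <cstdint>

// Figures quoted on the slide; the run time is approximate there.
constexpr std::int64_t kPixels = 320 * 240;  // 76,800 pixels
constexpr double kSeconds = 10.0 * 60.0;     // assumed exactly 600 s
constexpr double kClockHz = 100e6;           // 100 MHz processor clock

// Average emulation time spent on one pixel.
constexpr double secondsPerPixel() {
    return kSeconds / static_cast<double>(kPixels);
}

// Average processor cycles spent on one pixel.
constexpr double cyclesPerPixel() {
    return secondsPerPixel() * kClockHz;
}
```

Under these assumptions the engine spends about 7.8 ms, or roughly 780,000 processor cycles, on each pixel, which gives a sense of why hardware acceleration is attractive.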
Emulation Engine Acceleration
• If available, use platform FPGA to create bytecode accelerators
  • Execute SystemC bytecode natively
• FPGA accelerators speed up emulation
[Figure: Emulation Engine block diagram with Accelerator 1, Accelerator 2, and Accelerator 3 attached]
SystemC Bytecode Accelerators
• MIPS-like multicycle RISC datapath
• Communicates with the core emulator via memory-mapped registers
• Number of accelerators is limited to the number of masters allowed on the bus
[Figure: Accelerator internals (Register File; Bus, start, load logic; RISC Datapath; Local Mem) and Accelerators 1-3 in the FPGA attached to the Emulation Engine]
SystemC-on-a-Chip Implementation
[Table: values listed in extraction order; column alignment is not recoverable]
• Platform: Virtex5 VLX110T *, Virtex4 ML403, Xilinx Spartan 3E
• Main Processor: PowerPC (50 MHz), Microblaze (100 MHz), Microblaze (50 MHz)
• Bus Platform: PLB, PLB, OPB
• Main Memory: SRAM+BRAM, SRAM, BRAM
• # Emulation Accelerators: >3, 1-2, 0-1
* Currently building
* Demo
SystemC-on-a-Chip Implementation
• SystemC Bytecode compiler
  • 3,500 lines of code + Pinapa (20,000 lines)
• SystemC Emulation Engine
  • 3,000 lines of C + 8,000 lines of VHDL
SystemC-on-a-Chip Implementation
• SystemC Bytecode Accelerator
  • 2,000 lines of VHDL
  • Area: ~3000 Slices
  • Clock Frequency: 50-100 MHz
SystemC-on-a-Chip Experiments
• Competitive with SystemC PC Simulation, but with the benefits of real I/O
[Figure: chart of Execution Time, normalized to SystemC running on a 2.8 GHz Intel Xeon; series: Base Emulation on Virtex 4, Base Emulation on Virtex 5, Emulation + Accelerators (Virtex 4), Emulation + Accelerators (Virtex 5)]
Conclusions
• Introduced SystemC Bytecode as a means to emulate SystemC for prototyping
• For platforms with FPGA resources, introduced bytecode accelerators to speed up SystemC performance
  • Outperforms base emulation by over 100X
• As proof of concept, built 3 test platforms and tested multiple SystemC circuits without having to recompile or synthesize
• Future Directions
  • Emulation architecture improvements
  • Synthesizing SystemC just-in-time?