Evaluating and Improving an OpenMP-based Circuit Design Tool
Tim Beatty, Dr. Ken Kent, Dr. Eric Aubanel
Faculty of Computer Science, University of New Brunswick
FPGAs
• A field-programmable gate array (FPGA) is a programmable logic device that can be configured to implement any logical function
• They are made up of:
    • configurable logic blocks (CLBs)
    • programmable interconnects
• FPGAs are programmed with a schematic or a hardware description language (HDL) design
[Figures: CLB architecture; CLB pin layout. Images: http://en.wikipedia.org/wiki/fpga]

Design Flow
• FPGAs and application-specific integrated circuits (ASICs) are designed according to the HDL hardware design flow
• Traditional HDLs include VHDL and Verilog

The Handel-C Language
• Handel-C is a behavioral HDL by Celoxica
• It is made up of:
    • A subset of ANSI C language elements
    • Extensions for concurrency
    • A set of variable-width primitive types
    • A set of architectural types such as interfaces and RAMs
• Each assignment statement takes one clock cycle
• Example 8-bit multiplier in Handel-C:

    set clock = external;

    void main (void)
    {
        int 8 result;
        interface bus_in (int 8 a, int 8 b) input ();
        interface bus_out () output (int 8 data_out = result);

        result = input.a * input.b;
    }

Shared Memory and OpenMP
• A shared memory system has multiple processing cores with access to a common, shared memory
• Shared memory can be accessed by each processor simultaneously
• Communication and synchronization are achieved through shared variables
• OpenMP is an API for shared memory parallel programming in C/C++
• Parallelism is specified explicitly through a set of pragma directives
• Run-time library functions control environment settings such as the number of threads
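As a small illustration of the pragma-based parallelism described under Shared Memory and OpenMP, the sketch below parallelizes a loop with a single directive and combines per-thread partial sums through a reduction on a shared variable. The function name sum_of_squares is hypothetical and is not taken from the poster; the example is ordinary OpenMP C and makes no claim about what the OpenMP-Handel-C translator accepts.

```c
/* Sum of squares 1^2 + ... + n^2, parallelized with one OpenMP directive.
   Built with an OpenMP flag (e.g. -fopenmp for GCC) the loop iterations are
   split across threads; without the flag the pragma is ignored and the loop
   runs serially with the same result.
   Hypothetical example; n is assumed small enough that the sum fits in int. */
int sum_of_squares(int n)
{
    int total = 0;   /* shared result; each thread keeps a private copy
                        that is added into it when the loop ends */
    int i;

    #pragma omp parallel for reduction(+:total)
    for (i = 1; i <= n; i++) {
        total += i * i;
    }
    return total;
}
```

Calling sum_of_squares(10) adds the squares of 1 through 10. No run-time library function is used here; the thread count is left to the environment.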
Representing a C program
• Source code is parsed and represented as an abstract syntax tree (AST)

    int is_even (int x)
    {
        if (x % 2 == 0)
            return 1;
        else
            return 0;
    }

Example source program and AST representation [AST figure omitted]

OpenMP-Handel-C Translator
• Leow et al. created the OpenMP-Handel-C translator [1]
• It is based on C-Breeze, a C compiler infrastructure
• Their modifications include:
    • Addition of new abstract syntax tree nodes for OpenMP pragmas
    • Addition of the OpenMP grammar to the GNU Flex/Bison-based parser
    • Modifications to C-Breeze's built-in C-to-C translator, enabling C-to-Handel-C translation based on a set of porting rules
• The OpenMP abstract syntax tree nodes generate Handel-C code that implements the supported OpenMP directives
• Data types supported for translation are: int, char, and long

Translator Limitations
• No OpenMP run-time library functions
• Number of threads is fixed at compile time
• Nested parallelism is not supported
• Parallel reduction variables must be 32-bit integers
• All variables of type int map to 32-bit registers, which may use more resources than necessary

[1] Leow, Y.Y.; Ng, C.Y.; Wong, W.F. Generating Hardware from OpenMP Programs. IEEE International Conference on Field-Programmable Technology 2006 (FPT 2006), pp. 73-80.
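The AST idea from "Representing a C program" can be sketched in a few lines of C. This is only an illustrative toy: the node kinds, the Node struct, and the helper names (mk, build_is_even, eval, free_tree) are all hypothetical and do not reflect C-Breeze's actual node classes. The tree built here corresponds to the body of is_even, and eval walks it to show that the tree carries the same meaning as the source.

```c
#include <stdlib.h>

/* Hypothetical node kinds for a toy AST; not C-Breeze's real classes. */
typedef enum { N_CONST, N_VAR, N_MOD, N_EQ, N_IF_RETURN } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                /* literal value, used by N_CONST only */
    struct Node *a, *b, *c;   /* children; unused ones are NULL */
} Node;

/* Allocate one node; aborts if memory is exhausted. */
static Node *mk(NodeKind kind, int value, Node *a, Node *b, Node *c)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->kind = kind; n->value = value;
    n->a = a; n->b = b; n->c = c;
    return n;
}

/* Build the tree for: if (x % 2 == 0) return 1; else return 0; */
Node *build_is_even(void)
{
    Node *x    = mk(N_VAR,   0, NULL, NULL, NULL);
    Node *two  = mk(N_CONST, 2, NULL, NULL, NULL);
    Node *zero = mk(N_CONST, 0, NULL, NULL, NULL);
    Node *cond = mk(N_EQ, 0, mk(N_MOD, 0, x, two, NULL), zero, NULL);
    return mk(N_IF_RETURN, 0, cond,
              mk(N_CONST, 1, NULL, NULL, NULL),
              mk(N_CONST, 0, NULL, NULL, NULL));
}

/* Evaluate the tree for a given value of the single variable x. */
int eval(const Node *n, int x)
{
    switch (n->kind) {
    case N_CONST:     return n->value;
    case N_VAR:       return x;
    case N_MOD:       return eval(n->a, x) % eval(n->b, x);
    case N_EQ:        return eval(n->a, x) == eval(n->b, x);
    case N_IF_RETURN: return eval(n->a, x) ? eval(n->b, x) : eval(n->c, x);
    }
    return 0;
}

/* Release a tree built by build_is_even. */
void free_tree(Node *n)
{
    if (n == NULL)
        return;
    free_tree(n->a); free_tree(n->b); free_tree(n->c);
    free(n);
}
```

A translator in this style would walk the same tree and emit target-language text for each node instead of evaluating it.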
Benchmark Methodology
• An initial set of tests has been developed:
    • A Mandelbrot set generator
    • Miller-Rabin primality test
    • Systolic sequence alignment
• The translated OpenMP programs are compiled to VHDL in Celoxica's DK 5.0, and then the VHDL is synthesized into hardware using Xilinx's ISE 9.1
• Resource usage and performance data are recorded

Variable Bit Width
• Better control over resource usage should lead to better performance
• A new compiler directive was implemented to allow variable bit width
• Register widths are automatically adjusted when translating expressions whose widths don't match

    #pragma handelc width 8
    int x;
    #pragma handelc function return 8 params (8, 16)
    int my_function (int param1, int param2);

Example C program fragment with bit width annotations

    int 8 x;
    inline int 8 my_function (int 8 param1, int 16 param2);

Translated Handel-C program fragment

Preliminary Results
• The Mandelbrot set was generated with a resolution of 640x480 pixels
• Varying bit width settings were used for program variables
• Resulting resource usage and performance data were collected
• The 48-bit version ran out of hardware resources beyond 6 threads
• Resource usage and execution time decreased

Future Work
• Complete the remaining benchmark tests
• Implementation of OpenMP library functions such as omp_get_thread_num()
• Study the feasibility of a tool that determines the optimal number of threads
• Integrate the improved translator with other tools being developed by the Reconfigurable Computing Research Group
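To make the Mandelbrot benchmark more concrete, here is a minimal sketch of an integer-only Mandelbrot kernel with an OpenMP loop. It is not the benchmark code from the poster: the fixed-point format (FRAC_BITS), the iteration cap (MAX_ITER), the viewing region, and the function names are all assumptions chosen for illustration. Integer arithmetic is used because the translator supports only integer types, and the fixed-point scale is the kind of parameter that interacts with variable bit widths.

```c
#include <stdint.h>

#define FRAC_BITS 12                  /* assumed fixed-point scale: 2^12 */
#define FIX_ONE   (1 << FRAC_BITS)    /* fixed-point representation of 1.0 */
#define MAX_ITER  64                  /* assumed iteration cap */

/* Number of iterations of z = z*z + c completed before |z|^2 exceeds 4,
   for c = (cr, ci) given in fixed point. Returns MAX_ITER if z never
   escapes within the cap. 64-bit intermediates avoid product overflow. */
int mandel_iters(int32_t cr, int32_t ci)
{
    int32_t zr = 0, zi = 0;
    int iter;

    for (iter = 0; iter < MAX_ITER; iter++) {
        int64_t zr2 = (int64_t)zr * zr / FIX_ONE;
        int64_t zi2 = (int64_t)zi * zi / FIX_ONE;
        if (zr2 + zi2 > 4 * FIX_ONE)
            break;
        int64_t cross = 2 * (int64_t)zr * zi / FIX_ONE;
        zr = (int32_t)(zr2 - zi2 + cr);
        zi = (int32_t)(cross + ci);
    }
    return iter;
}

/* Count grid points treated as inside the set over the assumed region
   [-2, 1] x [-1, 1]. Rows are distributed across threads and the
   per-thread counts are combined with an integer reduction. */
int count_inside(int width, int height)
{
    int inside = 0;
    int py;

    #pragma omp parallel for reduction(+:inside)
    for (py = 0; py < height; py++) {
        int px;
        for (px = 0; px < width; px++) {
            int32_t cr = -2 * FIX_ONE
                       + (int32_t)((int64_t)3 * FIX_ONE * px / width);
            int32_t ci = -1 * FIX_ONE
                       + (int32_t)((int64_t)2 * FIX_ONE * py / height);
            if (mandel_iters(cr, ci) == MAX_ITER)
                inside++;
        }
    }
    return inside;
}
```

A call such as count_inside(640, 480) matches the resolution quoted in the preliminary results. Each row is independent, so the reduction variable is the only shared state.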