Program Development Environments: Languages & Tools
Kris Gaj, George Mason University
Acknowledgements
Companies, centers, and sponsors
• AMI
• Cray
• Mitrion
• NCSA
• SGI
• SRC
• Star Bridge
• DoD/LUCITE
Acknowledgements
GWU/GMU students
• Esmail Chitalwala (GWU/Star Bridge)
• Hatim Diab (GWU)
• Esam El-Araby (GWU)
• Miaoqing Huang (GWU)
• Hoang Le (GMU)
• Allen Michalski (GMU/USC)
• Nandkishore Sastry (GMU)
• Chang Shu (GMU)
• Mohamed Taher (GWU)
• Proshanta Saha (GWU)
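SRC Programming Model
[Figure: main.c, written in ANSI C, runs on the microprocessor and calls function_1() and function_2(). function_1 and function_2 are written in MAP C (a subset of ANSI C) and run on the FPGA. function_1 calls macro_1(a, b, c), macro_2(b, d), and macro_2(c, e); function_2 calls macro_3(s, t), macro_1(n, b), and macro_4(t, k). The macros (macro_1 through macro_4) come from libraries of macros written in VHDL. In the dataflow view of function_1, input a feeds Macro_1, whose outputs b and c feed two Macro_2 instances that produce d and e.]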
SRC Program Partitioning
[Figure: μP system: C function for μP (HLL). FPGA system: C function for MAP (HLL) and VHDL macro (HDL).]
SRC Compilation Process
[Figure: compilation flow with two paths.
• Microprocessor path: application sources (.c or .f files) → μP Compiler → object files (.o files) → Linker.
• MAP path: application sources (.mc or .mf files) → MAP Compiler → .v files → Logic synthesis → netlists (.ngo files) → Place & Route → configuration bitstreams (.bin files) → Linker. Macro sources (HDL sources, .vhd or .v files) also go through logic synthesis.
• The Linker produces the application executable.]
SRC Libraries of Hardware Macros
Vendor libraries of hardware macros
• basic integer and floating-point arithmetic
• digital signal processing
User libraries of hardware macros
• developed by GWU/GMU/USC 2002-2006
• Secret-key cipher encryption & breaking
• Binary Galois Field arithmetic (polynomial basis & normal basis representation)
• Elliptic Curve Arithmetic
• Long integer modular arithmetic (RSA)
• Sorting
• Image processing
• Bioinformatics
See http://hpc.gwu.edu/library
Star Bridge Programming Environment - Viva
[Figure: screenshot of the Viva environment. Labeled elements: Sheets, Library, Object.]
Star Bridge Compilation Process
[Figure: user input → Graphical User Interface (VIVA) → netlists (.ngo files) → Xilinx Place & Route → configuration bitstreams (.bin files) → application executable.]
Cray XD1 Programming Flows
[Figure: several design-entry routes converge on Xilinx Place & Route.
• High-level flow: MATLAB/Simulink (The MathWorks) through Xilinx System Generator, or Mitrion-C through Mitrion, producing VHDL or Verilog. High-level example: int mask (a, m) { return (a & m); }
• Standard flow: VHDL or Verilog, for example: process (a, m) is begin z <= a and m; end process; followed by VHDL/Verilog synthesis (Mentor Graphics, Synopsys, Synplicity, Xilinx) to gate-level EDIF (an AND gate with inputs a and m and output z).
• Xilinx Place & Route then produces the configuration bitstream.]
Source: [Cray, MAPLD05]
HDL-based SGI Altix Programming Flow
[Figure: design flow with design iterations. Stages and tools shown:
• Design Entry (Verilog, VHDL)
• Design Verification: Behavioral Simulation (VCS, Modelsim)
• Design Synthesis (Synplify Pro, Amplify)
• Design Implementation (ISE)
• Static Timing Analysis (ISE Timing Analyzer)
• Metadata Processing (Python)
• Device Programming (RASC Abstraction Layer, Device Manager, Device Driver)
• Real-time Verification (gdb)
Machines shown: IA-32 Linux Machine, Altix. File types on the arrows: .v, .vhd; .edf; .ncd, .pcf; .cfg; .bin; .c.]
HLL-based SGI Altix Programming Flow
[Figure: design flow. Stages and tools shown:
• HLL Design Entry (Handel-C, Mitrion C, Viva)
• RTL Generation and Integration with Core Services
• Design Verification: Behavioral Simulation (VCS, Modelsim)
• Design Synthesis (Synplify Pro, Amplify)
• Design Implementation (ISE)
• Static Timing Analysis (ISE Timing Analyzer)
• Metadata Processing (Python)
• Device Programming (RASC Abstraction Layer, Device Manager, Device Driver)
• Real-time Verification (gdb)
Machines shown: IA-32 Linux Machine, Altix. File types on the arrows: .v, .vhd; .edf; .ncd, .pcf; .cfg; .bin; .c.]
Mitrion-C Programming Model for Cray & SGI
[Figure: on the microprocessor, main.c (ANSI C based on the Mitrion API) contains the calls function_1(in1), start_fpga(), function_1(in2), start_fpga(). On the FPGA, the application runs on the Mitrion distributed processor, with RAM and input & output (I/O). The application code, written in Mitrion-C, is platform independent; the Mitrion Distributed Processor Architecture, delivered as VHDL, is platform dependent. The Mitrion Compiler & Configurator sits between the two.]
Compiling a Mitrion Program
[Figure: Mitrion-C source code enters the Mitrion Software Development Kit. Elements shown: Compiler, processor machine-code, Simulator & Debugger, processor architecture, Processor Configurator, processor hardware design (VHDL IP core), FPGA.]
The Mitrion Platform
1) The Mitrion Virtual Processor
• A fine-grain massively parallel, configurable soft-core processor
• 10-30 times faster than traditional CPUs
2) The Mitrion-C programming language
• An intrinsically parallel C-family language
3) The Mitrion Software Development Kit
• Compiler
• Debugger/Simulator
• Processor configurator
A New Processor Architecture Specifically for FPGAs
Architecture design goals:
• High silicon utilization
• Take advantage of FPGA re-configurability
Goals achieved by:
• Allowing the processor to be massively parallel
• Allowing the processor to be fully adapted to the algorithm
Example Mitrion-C program shown on the slide:

    int:48<30> main()
    {
      int:48 prev = 1;
      int:48 fib = 1;
      int:48<30> fibonnacci = for(i in <1..30>)
      {
        fib = fib+prev;
        prev = fib;
      } <>fib;
    } fibonnacci;
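For readers more used to sequential C, the Mitrion-C loop on this slide can be approximated as below. This is only a sketch. It assumes that inside the loop body every right-hand side reads the value carried over from the previous iteration, which the slide itself does not state. The function name fib_list, the constant FIB_COUNT, and the use of uint64_t with a 48-bit mask to stand in for int:48 are choices made for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define FIB_COUNT 30  /* matches the <1..30> range on the slide */

/* Fills out[0..FIB_COUNT-1] with one value per loop iteration,
 * ASSUMING both updates read the values from the previous iteration. */
void fib_list(uint64_t out[FIB_COUNT])
{
    /* Emulates the int:48 width; the values here stay far below 2^47. */
    const uint64_t mask48 = (UINT64_C(1) << 48) - 1;
    uint64_t prev = 1;
    uint64_t fib = 1;

    for (size_t i = 0; i < FIB_COUNT; i++) {
        uint64_t next_fib = (fib + prev) & mask48;  /* fib = fib + prev */
        prev = fib;                                 /* prev = old value of fib */
        fib = next_fib;
        out[i] = fib;                               /* one list element per iteration */
    }
}
```

Under that assumption the list starts 2, 3, 5, 8. A plain sequential reading of the two assignments would instead double the value on every iteration, so the carried-value interpretation matters.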
Processor Architecture: A Cluster-On-A-Chip
• Non-Von Neumann architecture
• Processor architecture more like a cluster
• Very fine-grain parallelism
• Normal clusters run a block of code on each PE (processing element)
• Mitrion runs a single instruction on each PE
• Each PE adapted to optimally run its instruction
• Network topology specific for algorithm
• No instruction stream, instead data stream
A C-family Language
• Basic syntax is the same as for other C-family languages
• Examples:
  • Blocks are surrounded by { }
  • Assignment with =
  • Statements end with ;
  • if, for, while
  • Most of the usual C operators
  • C-style comments (though nestable)
Types
• Basic types
    int / uint     signed / unsigned integer
    boolean        boolean value (true/false)
    float          floating-point real value
    bits           bit vector
• Free bit width
    int:24         24-bit signed integer
    uint:19        19-bit unsigned integer
    float:24.8     IEEE-754 single-precision float
• Collections
    int:24[100]    vector (indexable collection)
    int:14<100>    list (no index)
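To get a feel for what a freely chosen bit width such as int:24 or uint:19 means when modeled in ordinary C, the sketch below reduces a wider value to the low bits that a register of that width could hold. The helpers wrap_signed and wrap_unsigned are hypothetical names introduced here. The slide does not say how Mitrion-C treats overflow, so the two's complement wraparound shown is an assumption about fixed-width hardware in general, not a statement about the language.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: interpret the low `bits` bits of x as a
 * two's complement signed integer. Requires 1 <= bits <= 32. */
int32_t wrap_signed(int64_t x, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    uint64_t mask = (UINT64_C(1) << bits) - 1;
    uint64_t low = (uint64_t)x & mask;
    uint64_t sign = UINT64_C(1) << (bits - 1);
    /* When the sign bit is set, subtract 2^bits to get the negative value. */
    if (low & sign)
        return (int32_t)((int64_t)low - (int64_t)(mask + 1));
    return (int32_t)low;
}

/* Hypothetical helper: keep only the low `bits` bits (unsigned view).
 * Requires 1 <= bits <= 32. */
uint32_t wrap_unsigned(uint64_t x, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    return (uint32_t)(x & ((UINT64_C(1) << bits) - 1));
}
```

For example, the largest 24-bit signed value is 8388607; adding 1 and wrapping to 24 bits gives -8388608, and 524288 (2^19) wrapped to 19 unsigned bits gives 0.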
Language constructs
• Operators
• if(a>b) ...
• while(i<10) ...
• for(i in <0..999>) ...
• foreach (e in vector) ...
• int:8 function(int:8 a) ...
A C-family Language
• Important differences
  • No pointers
  • No dynamic allocation
  • Static general recursion only
  • Though loop structures may be dynamic
Program Entry for FPGA Accelerator Boards
[Figure: entry methods HLL, Graphical Data Flow Diagram, and HDL, placed along two axes: increased productivity and increased capability to describe parallel execution. Rows: Traditional (Software, Hardware) and Extended, e.g. Corefire (Software, Hardware).]
Program Entry for Reconfigurable Computers
[Figure: entry methods HLL, Graphical Data Flow Diagram, and HDL, placed along two axes: increased productivity and increased capability to describe parallel execution. Rows: Star Bridge (Software, Hardware; labels: COM objects, porting EDIF) and SRC (Software, Hardware; label: HDL macros).]
Program Entry for Reconfigurable Computers
[Figure: entry methods HLL, Graphical Data Flow Diagram, and HDL, placed along two axes: increased productivity and increased capability to describe parallel execution. Rows: Cray XD1 with Simulink (Software, Hardware; labels: Simulink, Xilinx System Generator) and SGI or Cray with Mitrion (Software, Hardware; labels: Mitrion Processor, Mitrion-C).]
General hierarchy of library files suggested by SRC Computers Inc.
Structure of the SRC macro repository
[Figure: directory tree. Levels shown: < top of repository >; < macros >; <lib # 1>, <lib # 2>, <lib # 3>; macro1, macro2, macro3; common, rev_d, rev_e, rev_f. Files shown: InfoFile, BlkBoxFile, DebugCodeFile, DataSheet, hdlfile.]
Files describing an SRC macro
Platform independent
• HDL file: macro.v or macro.vhd. Verilog or VHDL code defining the macro.
• Debug code file: macro.c. Provides the equivalent C functionality for the macro.
• Data sheet file: datasheet. Contains the documentation for the macro.
Platform dependent
• Black box file: blackbox.v. Interface (black box) definition for the macro in Verilog.
• Info file: info. Info file entry for this macro.
Library Development - SRC
[Figure: languages used by role and target. Roles: Application Programmer, Library Developer. Targets: μP system, FPGA system. Languages shown: HLL (C, Fortran), LLL (ASM), HDL (VHDL, Verilog).]
Library Development - Star Bridge
[Figure: languages used by role and target. Roles: Application Programmer, Library Developer. Targets: μP system, FPGA system. Languages shown: GDF (Viva), HLL and LLL (C++, ASM), HDL (VHDL, Verilog).]
Software libraries and their role in the development of SRC libraries
Roles of software libraries
• source of test vectors for VHDL macros
• emulation of hardware during debugging
• performance comparison
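The first two roles can be illustrated with a small sketch: a software reference function generates test vectors, and the same vectors are then replayed against any other implementation with the same interface. Everything named here (test_vector, block_fn, ref_model, make_vectors, count_mismatches) is hypothetical, and ref_model is a toy function rather than a real cipher or any function from an actual library.

```c
#include <stddef.h>
#include <stdint.h>

/* One test vector: inputs plus the output expected from the macro. */
typedef struct {
    uint64_t data;
    uint64_t key;
    uint64_t expected;
} test_vector;

/* Interface shared by the software reference and the implementation under test. */
typedef uint64_t (*block_fn)(uint64_t data, uint64_t key);

/* Toy software reference model (hypothetical, NOT a real cipher). */
uint64_t ref_model(uint64_t data, uint64_t key)
{
    return (data ^ key) + 1u;
}

/* Generates n test vectors from the software reference. */
void make_vectors(test_vector *vec, size_t n, uint64_t key, block_fn ref)
{
    for (size_t i = 0; i < n; i++) {
        vec[i].data = (uint64_t)i * UINT64_C(0x0101010101010101);
        vec[i].key = key;
        vec[i].expected = ref(vec[i].data, key);
    }
}

/* Replays the vectors against an implementation and counts mismatches. */
size_t count_mismatches(const test_vector *vec, size_t n, block_fn impl)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        if (impl(vec[i].data, vec[i].key) != vec[i].expected)
            bad++;
    }
    return bad;
}
```

In a real flow the expected values would typically be written to a file read by the HDL testbench; here they are only kept in memory to keep the sketch short.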
How to approach porting your application to reconfigurable computers?
1. Identify class of applications
2. Identify basic operations required by your applications
3. Determine the existence of the RC library of such operations
4. Determine the existence of the microprocessor library of such operations
5. Determine the right granularity for the required library operations
Classes of applications
1. input/output intensive applications
  • bulk data encryption (DES, IDEA, and RC5 encryption)
2. computationally intensive applications
  • secret-key cipher breaking based on the exhaustive key search (DES, IDEA, RC5 breakers)
  • public-key cipher breaking based on factoring
3. latency-critical applications
  • cipher key agreement and signature (ECC schemes, RSA)
Example 1 Cryptography: High-throughput encryption
Cipher
[Figure: a cipher takes a message and a cryptographic key of K bits and produces the ciphertext.]
Secret-key ciphers
[Figure: Alice and Bob communicate over a network, with encryption on one side and decryption on the other. Both sides use the same key of Alice and Bob, KAB.]
High-Throughput Encryption
[Figure: a stream of message blocks ..., Mi+2, Mi+1, Mi is encrypted under key K0 into a stream of ciphertext blocks ..., Ci+2, Ci+1, Ci.]
Encryption algorithms: DES, 3DES, AES, RC5, IDEA, etc.
Fully Pipelined Architecture
• Loop unrolling
• Pipeline stages inside of cipher rounds
• New input & new output every clock cycle
[Figure: Round 1, Round 2, ..., Round k connected in sequence.]
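The effect of a fully pipelined design, one new input and one new output per clock cycle after an initial fill latency, can be modeled in software. The sketch below is a cycle-by-cycle simulation under simplifying assumptions: one pipeline register per round, a fixed depth STAGES chosen arbitrarily, and a toy_round function that only stands in for a cipher round and has no cryptographic meaning. All names are introduced for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define STAGES 4  /* pipeline depth; arbitrary value for this sketch */

/* Toy stand-in for one cipher round. NOT a real cipher round. */
static uint64_t toy_round(uint64_t x, unsigned stage)
{
    return ((x << 1) | (x >> 63)) ^ (UINT64_C(0x9E3779B97F4A7C15) + stage);
}

/* Clocks n blocks through the pipeline, accepting one new input per cycle.
 * Returns the number of clock cycles until the last output has appeared. */
size_t run_pipeline(const uint64_t *in, uint64_t *out, size_t n)
{
    uint64_t reg[STAGES] = {0};  /* register at the output of each stage */
    int valid[STAGES] = {0};     /* whether that register holds real data */
    size_t produced = 0;
    size_t cycle = 0;

    while (produced < n) {
        /* Update from the last stage backwards, so each register is
         * read before it is overwritten (models a simultaneous clock edge). */
        for (int s = STAGES - 1; s > 0; s--) {
            reg[s] = toy_round(reg[s - 1], (unsigned)s);
            valid[s] = valid[s - 1];
        }
        if (cycle < n) {
            reg[0] = toy_round(in[cycle], 0);
            valid[0] = 1;
        } else {
            valid[0] = 0;  /* input exhausted: feed bubbles to drain the pipe */
        }
        cycle++;
        if (valid[STAGES - 1])
            out[produced++] = reg[STAGES - 1];
    }
    return cycle;
}
```

In this model, n blocks take n + STAGES - 1 cycles, so for large n the cost approaches one cycle per block regardless of the pipeline depth.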
Encryption on SRC-6 - No streaming: encryption.mc (1)

    #include <libmap.h>

    void encryption (uint64_t sdata[], uint64_t key,
                     uint64_t *hardware_timein, uint64_t *hardware_timeprocess,
                     uint64_t *hardware_timeout, int mapnum)
    {
        OBM_BANK_A (S1OBM, uint64_t, MAX_OBM_SIZE)
        OBM_BANK_B (S2OBM, uint64_t, MAX_OBM_SIZE)
        OBM_BANK_C (S3OBM, uint64_t, MAX_OBM_SIZE)
        OBM_BANK_D (S4OBM, uint64_t, MAX_OBM_SIZE)
        OBM_BANK_E (S5OBM, uint64_t, MAX_OBM_SIZE)
        OBM_BANK_F (S6OBM, uint64_t, MAX_OBM_SIZE)

        uint32_t encrypt_decrypt;   // 0: encrypt, 1: decrypt
        int i, nbytes;
        uint64_t t1, t2, t3, t4;
Encryption on SRC-6 - No streaming: encryption.mc (2)

        encrypt_decrypt = 0;
        nbytes = MAX_OBM_SIZE * 8 * 3;   // three banks of 8-byte words

        start_timer();
        read_timer(&t1);

        // Transfer the input data into OBM banks A, B, and C
        DMA_CPU(CM2OBM, S1OBM, MAP_OBM_stripe(1,"A,B,C"), sdata, 1, nbytes, 0);
        wait_DMA(0);
        read_timer(&t2);

        // Process one word from each input bank per iteration
        for (i = 0; i < MAX_OBM_SIZE; i++) {
            des (S1OBM[i], key, encrypt_decrypt, &S4OBM[i]);
            des (S2OBM[i], key, encrypt_decrypt, &S5OBM[i]);
            des (S3OBM[i], key, encrypt_decrypt, &S6OBM[i]);
        }
        read_timer(&t3);
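Encryption on SRC-6 - No streaming: encryption.mc (3)

        // Transfer the results from OBM banks D, E, and F back into sdata
        DMA_CPU(OBM2CM, S4OBM, MAP_OBM_stripe(1,"D,E,F"), sdata, 1, nbytes, 5);
        wait_DMA(5);
        read_timer(&t4);

        // Report the three phases separately: input DMA, processing, output DMA
        *hardware_timein = t2 - t1;
        *hardware_timeprocess = t3 - t2;
        *hardware_timeout = t4 - t3;
    }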
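The three timer differences returned by this function become more useful once converted to seconds and throughput. The helpers below are a sketch under the assumption that read_timer() counts cycles of a fixed-frequency clock; the slides do not state the frequency, so clock_hz is a parameter that must be taken from the platform documentation. The function names are hypothetical.

```c
#include <stdint.h>

/* Converts a tick count to seconds.
 * clock_hz is an ASSUMED parameter: the frequency of the timer clock. */
double ticks_to_seconds(uint64_t ticks, double clock_hz)
{
    return (double)ticks / clock_hz;
}

/* Throughput in bytes per second for nbytes handled in `ticks` cycles.
 * Returns 0 when ticks is 0 to avoid dividing by zero. */
double throughput_bytes_per_s(uint64_t nbytes, uint64_t ticks, double clock_hz)
{
    if (ticks == 0)
        return 0.0;
    return (double)nbytes * clock_hz / (double)ticks;
}
```

For instance, 800 bytes processed in 100 ticks of a 100 MHz clock correspond to 800 million bytes per second. Applying the same helper to hardware_timein, hardware_timeprocess, and hardware_timeout separately shows whether data transfer or computation dominates.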
Encryption on SRC-6 - No streaming: des_blkbx.v

    module des ( desOut, desIn, keyin, decrypt, clk )
        /* synthesis syn_black_box syn_noprune=1 */ ;
        output [63:0] desOut;
        input  [63:0] desIn;
        input  [63:0] keyin;
        input         decrypt;
        input         clk /* synthesis syn_noclockbuf=1 */ ;
    endmodule
Encryption on SRC-6 - No streaming: des.info (1)

    BEGIN_DEF "des"
        MACRO = "des";
        LATENCY = 17;
        STATEFUL = NO;
        EXTERNAL = NO;
        PIPELINED = YES;
        INPUTS = 3:
            I0 = INT 64 BITS (desIn[63:0])
            I1 = INT 64 BITS (keyin[63:0])
            I2 = INT 32 BITS (decrypt)
            ;
        OUTPUTS = 1:
            O0 = INT 64 BITS (desOut[63:0])
            ;
        IN_SIGNAL : 1 BITS "clk" = "CLOCK";
Encryption on SRC-6 - No streaming: des.info (2)

        DEBUG_HEADER = $
            void des__dbg (long long desin, long long keyin,
                           int decrypt, long long *desout);
        $;
        DEBUG_FUNC = $
            #include <des.h>
            void des__dbg (long long desin, long long keyin,
                           int decrypt, long long *desout)
            {
                des_(desout, &desin, &keyin, &decrypt);
            }
        $;
    END_DEF
Encryption on SRC-6 - With streaming: encryption.mc (1)

    #include <libmap.h>

    void encryption (uint64_t sdata[], uint64_t key,
                     uint64_t *hardware_timeprocess, uint64_t *hardware_timeout,
                     int mapnum)
    {
        OBM_BANK_A (S1OBM, uint64_t, MAX_OBM_SIZE)
        OBM_BANK_B (S2OBM, uint64_t, MAX_OBM_SIZE)
        OBM_BANK_D (S4OBM, uint64_t, MAX_OBM_SIZE)
        OBM_BANK_E (S5OBM, uint64_t, MAX_OBM_SIZE)

        uint32_t encrypt_decrypt;   // 0: encrypt, 1: decrypt
        int i, nbytes;
        uint64_t t1, t2, t3;
        Stream_64 S0, S1;
        uint64_t v0, v1;

        encrypt_decrypt = 0;
        nbytes = MAX_OBM_SIZE * 8 * 2;
Encryption on SRC-6 - With streaming: encryption.mc (2)

        start_timer();
        read_timer(&t1);

        #pragma src parallel sections
        {
            // Section 1: DMA the input data into two streams
            #pragma src section
            {
                stream_dma_cpu_dual (&S0, &S1, PORT_TO_STREAM, S1OBM, DMA_A_B,
                                     sdata, 1, nbytes);
            }
            // Section 2: consume one word from each stream per iteration
            #pragma src section
            {
                for (i = 0; i < MAX_OBM_SIZE; i++) {
                    get_stream (&S0, &v0);
                    get_stream (&S1, &v1);
                    des (v0, key, encrypt_decrypt, &S4OBM[i]);
                    des (v1, key, encrypt_decrypt, &S5OBM[i]);
                }
            }
        }