CREC: A Novel Reconfigurable Computing Design Methodology. Octavian Cret, Kalman Pusztai, Cristian Vancea, Balint Szente. Technical University of Cluj-Napoca, Romania.
Introduction • CREC: low-cost general-purpose reconfigurable computer; • Dynamically generated architecture; • Built using a hardware/software co-design approach; • Based on FPGA devices, the VHDL language and a high-level language (Java); • No need for integration in a dedicated VLSI chip.
CREC’s Main Features • Reconfigurable RISC computer; • Parallel computer: each register has an associated Execution Unit (EU); • All the EUs have an identical structure, and each one is able to execute any kind of instruction from the CREC Instruction Set; • Having a greater number of EUs has the advantage of introducing Instruction Level Parallelism.
The Parallel Compiler (I.) • Parses the CREC-RISC source code; • Makes the key decisions about the execution system that will be generated; • Divides a sequentially written program into portions of code that can be executed at the same time; • Determines the minimal number of program slices; • Determines which instructions will be executed in parallel in each slice.
The Parallel Compiler (II.) • Uses a set of rules; • An example: each slice can contain at most one Load, Store or Jump instruction; • Reads the application source code (in CREC assembly language) and generates a file in a specific format, giving a description of the tailored CREC; • The resulting CREC architecture contains only the hardware needed to execute the subset of instructions used in the program.
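The slicing step can be illustrated with a small greedy scheduler. This is only a sketch under stated assumptions, not the authors' algorithm: it handles straight-line code only (control flow is not modelled), it assumes each destination register selects its own EU, and it assumes all EUs read their operands before any result is written in a slice. The names `Instr`, `schedule` and `EXCLUSIVE_OPS` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Rule quoted on the slide: at most one Load, Store or Jump per slice.
EXCLUSIVE_OPS = {"LOAD", "STORE", "JMP"}


@dataclass(frozen=True)
class Instr:
    op: str
    dst: Optional[str] = None        # register written (assumed to select the EU)
    srcs: Tuple[str, ...] = ()       # registers read


def schedule(program):
    """Place each instruction, in program order, into the earliest legal slice."""
    slices, last_write, last_read = [], {}, {}
    last_exclusive = -1
    for ins in program:
        earliest = 0
        for reg in ins.srcs:
            # A value must have been written in an earlier slice before it is read.
            earliest = max(earliest, last_write.get(reg, -1) + 1)
        if ins.dst is not None:
            # One EU per register: two writes to a register never share a slice.
            earliest = max(earliest, last_write.get(ins.dst, -1) + 1)
            # A write may share a slice with an earlier read (assumed read-before-write).
            earliest = max(earliest, last_read.get(ins.dst, 0))
        if ins.op in EXCLUSIVE_OPS:
            earliest = max(earliest, last_exclusive + 1)
            last_exclusive = earliest
        while len(slices) <= earliest:
            slices.append([])
        slices[earliest].append(ins)
        for reg in ins.srcs:
            last_read[reg] = max(last_read.get(reg, -1), earliest)
        if ins.dst is not None:
            last_write[ins.dst] = earliest
    return slices


if __name__ == "__main__":
    demo = [
        Instr("MOV", "R1"), Instr("MOV", "R2"), Instr("MOV", "R3"),
        Instr("ADD", "R1", ("R1", "R2")), Instr("DEC", "R3", ("R3",)),
    ]
    for i, s in enumerate(schedule(demo)):
        print(i, [f"{x.op} {x.dst}" for x in s])
```

With this sketch, three independent register initialisations share one slice and the two dependent operations share the next, while two loads are always placed in different slices.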
Results of the Parallel Compiler • The size of the various functional parts; • The subset of instructions involved; • The number of execution units (N); • The sequence of instructions making up the program; • The resulting CREC architecture contains only the hardware needed to execute the subset of instructions used in the program.
Slices • The instructions assigned to the EUs for execution at the same time make up a program slice; • The whole program is divided into slices; • The size of a slice depends on the number of execution units designed for program execution.
Program Example • Classical, non-optimal multiplication of two integers, without overflow check, using three EUs; • Program sequence and instruction scheduling: • [1] MOV R1,2 • [2] MOV R2,3 • [3] MOV R3,3 • [4] ADD R1,R2 • [5] DEC R3 • [6] JNZ R3,[4] • [7] MOV STORB,R1 • [8] STORE [200]
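To follow the listing step by step, it can be traced with a tiny sequential interpreter. This is a sketch under assumed semantics that the slide does not spell out: `ADD Rx,Ry` adds Ry into Rx, `JNZ Rx,[n]` jumps to line [n] while Rx is non-zero, and `STORE [addr]` writes the STORB buffer to that address. Register widths and flags are not modelled, and the `run` helper is hypothetical.

```python
PROGRAM = [
    ("MOV", "R1", 2),
    ("MOV", "R2", 3),
    ("MOV", "R3", 3),
    ("ADD", "R1", "R2"),
    ("DEC", "R3"),
    ("JNZ", "R3", 4),
    ("MOV", "STORB", "R1"),
    ("STORE", 200),
]


def run(program, max_steps=1000):
    """Execute the listing one instruction at a time (no parallel slices)."""
    regs = {"R1": 0, "R2": 0, "R3": 0, "STORB": 0}
    mem = {}
    pc = 1  # the listing labels its lines starting at [1]
    steps = 0
    while pc <= len(program) and steps < max_steps:
        op, *args = program[pc - 1]
        next_pc = pc + 1
        if op == "MOV":
            dst, src = args
            regs[dst] = regs[src] if isinstance(src, str) else src
        elif op == "ADD":
            dst, src = args
            regs[dst] += regs[src]
        elif op == "DEC":
            regs[args[0]] -= 1
        elif op == "JNZ":
            reg, target = args
            if regs[reg] != 0:
                next_pc = target
        elif op == "STORE":
            mem[args[0]] = regs["STORB"]
        else:
            raise ValueError(f"unknown instruction {op}")
        pc = next_pc
        steps += 1
    return regs, mem


if __name__ == "__main__":
    regs, mem = run(PROGRAM)
    print(regs["R1"], mem[200])  # prints "11 11"
```

Under these assumptions the loop body runs three times, so address 200 receives the initial value of R1 plus three times R2, that is 2 + 3·3 = 11.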
VHDL Source Code Generator • VHDL files contain an already written source code, where the main architecture’s parameters are given as generics and constants; • The following components can be tailored: • The number of EUs; • The register’s width in all the EUs; • The size of the Instructions Memory and Operands Memory for each EU; • The size of the Data Stack and Slice Stack Memory; • The slice-mapping block, containing instructions.
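One way to picture the generator is as a small program that turns the tailored parameters into VHDL constants. The sketch below is illustrative only: the class `CrecParams`, the function `to_vhdl_package`, and the constant names such as `N_EU` are hypothetical and are not taken from the CREC sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrecParams:
    """Tailorable parameters listed on the slide (field names are assumptions)."""
    num_eus: int
    reg_width: int
    instr_mem_depth: int
    operand_mem_depth: int
    data_stack_depth: int
    slice_stack_depth: int


# Hypothetical mapping from parameter fields to VHDL constant names.
_VHDL_NAMES = {
    "num_eus": "N_EU",
    "reg_width": "REG_WIDTH",
    "instr_mem_depth": "INSTR_MEM_DEPTH",
    "operand_mem_depth": "OPERAND_MEM_DEPTH",
    "data_stack_depth": "DATA_STACK_DEPTH",
    "slice_stack_depth": "SLICE_STACK_DEPTH",
}


def to_vhdl_package(params, package_name="crec_params"):
    """Render the parameters as a VHDL package of natural constants."""
    lines = [f"package {package_name} is"]
    for field, vhdl_name in _VHDL_NAMES.items():
        lines.append(f"  constant {vhdl_name} : natural := {getattr(params, field)};")
    lines.append(f"end {package_name};")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_vhdl_package(CrecParams(4, 16, 256, 256, 64, 64)))
```

Keeping the pre-written VHDL generic and emitting only a parameter package like this would leave the hand-optimised architecture untouched between builds.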
The Hardware Architecture • The N Execution Units; • Instruction Memories; • Data Stack Memory (for Push and Pop); • Slice Stack Memory (for Call and Return); • A Slice Program Counter; • A Slice-mapping Memory; • Store Buffer and Load Buffer; • Data Memory (external or internal); • Operand Memories.
The Instruction Set • Relatively large instruction set, with more instructions than typical microcontrollers offer; • Every instruction operates only on unsigned integers; • Each EU is potentially able to execute any kind of instruction from the CREC Instruction Set.
Data Manipulation Instructions • Addition with or without Carry; • Subtraction with or without Borrow, and Compare; • Logical functions: And, Or, Xor, Not and Bit Test; • Arithmetic and logical shifts to the left/right; • Rotate and rotate through Carry to the left/right; • Increment/Decrement and 2’s Complement.
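The role of the Carry in addition and subtraction on unsigned words can be sketched as follows. The flag convention is an assumption borrowed from common processors (Carry set on overflow of an addition and on borrow out of a subtraction, Zero set when the result is 0); the slides do not define it, and the function names are hypothetical.

```python
def add(a, b, width=8, carry_in=0):
    """Unsigned addition; returns (result, carry_flag, zero_flag)."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask) + carry_in
    result = total & mask
    return result, total >> width, int(result == 0)


def sub(a, b, width=8, borrow_in=0):
    """Unsigned subtraction; the carry flag reports a borrow (assumed convention)."""
    mask = (1 << width) - 1
    diff = (a & mask) - (b & mask) - borrow_in
    result = diff & mask
    return result, int(diff < 0), int(result == 0)


def add16_with_8bit_units(a, b):
    """Chain two 8-bit additions through the carry to add 16-bit values."""
    low, carry, _ = add(a & 0xFF, b & 0xFF)
    high, carry, _ = add(a >> 8, b >> 8, carry_in=carry)
    return (high << 8) | low, carry


if __name__ == "__main__":
    print(add(200, 100))                          # (44, 1, 0)
    print(sub(3, 5))                              # (254, 1, 0)
    print(hex(add16_with_8bit_units(0x01FF, 1)[0]))  # 0x200
```

The "with Carry" and "with Borrow" variants are what make multi-word arithmetic possible on narrow registers, as the last helper shows.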
Instruction Format and Example • “G” defines the Instruction Group (Data Manipulation); • “Code” is the operation code (e.g. Add, Sub); • “Type” specifies the operation type (e.g. with/without Carry); • “Load” contains the load signals for the register and for the Carry and Zero flags; • “D” is the Register/Data selection for the second operand.
Program Control Instructions • Slice counter manipulation: Jump, Call and Return; • Data movement: Move; • Stack manipulation: Push and Pop; • Input from and Output to port: In and Out; • Load from and Store to external memory; • For greater flexibility, every instruction also exists in a conditional form: C (Carry), Z (Zero), E (Equal), A (Above), AE (Above or Equal), B (Below), BE (Below or Equal), each also available in negated form.
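Since the Execution Unit holds only a Carry and a Zero flag, every listed condition has to be derived from those two bits. The sketch below assumes the usual unsigned-comparison meaning of the names after a subtraction or compare (Carry set when the first operand is below the second); that mapping and the "N" prefix for negated forms are assumptions, not taken from the slides.

```python
# Each condition is a function of (carry_flag, zero_flag).
CONDITIONS = {
    "C": lambda cf, zf: cf,
    "Z": lambda cf, zf: zf,
    "E": lambda cf, zf: zf,
    "A": lambda cf, zf: not cf and not zf,
    "AE": lambda cf, zf: not cf,
    "B": lambda cf, zf: cf,
    "BE": lambda cf, zf: cf or zf,
}


def compare(a, b):
    """Flags after an unsigned compare a ? b, modelled as a - b (assumed)."""
    return a < b, a == b  # (carry_flag, zero_flag)


def condition_holds(name, cf, zf):
    """Evaluate a condition code; a leading 'N' selects the negated form."""
    negated = name.startswith("N")
    base = name[1:] if negated else name
    value = bool(CONDITIONS[base](cf, zf))
    return not value if negated else value


if __name__ == "__main__":
    cf, zf = compare(3, 5)
    print(condition_holds("B", cf, zf))   # True
    print(condition_holds("AE", cf, zf))  # False
    print(condition_holds("NE", cf, zf))  # True
```

A conditional instruction would then execute only when its condition evaluates to true for the current flag values.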
Instruction Format and Example • “G” defines the Instruction Group (Program Control); • “Code” is the operation code (e.g. Jump, Call); • “Conditions” contains the code of the condition that validates the execution of a given instruction; • “R” is the load signal for the Register (e.g. Move); • “D” is the Register/Data selection for the second operand.
The Execution Unit • Decoding Unit – decodes the instruction code; • Control Unit – generates the control signals for the Program Control Instruction group; • Multiplexer Unit – selects the second operand of the binary instructions; • Operating Unit – performs the data manipulation operations; • Accumulator Unit – stores the instruction result; • Flag Unit – contains the two flag bits: the Carry Flag (CF) and the Zero Flag (ZF).
The Optimized Operating Unit • Symmetrical organization: the binary instruction blocks are on the right side and the unary operation blocks (which operate only on the accumulator) are on the left side; • The blocks use only one level of FPGA slices; • All four subunits use the same number of slices; • Takes advantage of the Fast Carry Lines; • The size of the Operating Unit grows linearly with the word length.
Virtex Optimized Arithmetic Unit • The basic 2-bit ADD/SUB cell using the Fast Carry Lines consumes only one Xilinx VirtexE slice.
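A behavioural model helps show how such 2-bit cells compose into a full-width adder/subtractor. The sketch below is an assumption about the arithmetic only (subtraction as addition of the inverted operand with an initial carry of 1, the carry rippling from cell to cell); it says nothing about the actual LUT contents, and the function names are hypothetical.

```python
def cell_2bit(a, b, carry_in, subtract):
    """One 2-bit ADD/SUB cell: returns (2-bit sum, carry out)."""
    if subtract:
        b ^= 0b11  # invert the second operand for two's-complement subtraction
    total = a + b + carry_in
    return total & 0b11, total >> 2


def add_sub(a, b, width, subtract=False):
    """Chain width/2 cells through the carry; returns (result, carry, cells used)."""
    if width % 2:
        raise ValueError("width must be a multiple of 2")
    carry = 1 if subtract else 0  # the +1 of the two's complement
    result = 0
    cells = 0
    for shift in range(0, width, 2):
        bits, carry = cell_2bit((a >> shift) & 0b11, (b >> shift) & 0b11,
                                carry, subtract)
        result |= bits << shift
        cells += 1
    return result, carry, cells


if __name__ == "__main__":
    print(add_sub(200, 100, 8))               # (44, 1, 4)
    print(add_sub(5, 3, 8, subtract=True))    # (2, 1, 4)
    print(add_sub(1000, 24, 16)[2])           # 8
```

In this model an 8-bit unit needs 4 cells and a 16-bit unit needs 8, which matches the linear growth with word length described for the Operating Unit.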
Arithmetic and Logic Opcodes • [Table: opcodes of the arithmetic unit, where L is the “Not Load” signal and S is the “Subtract” signal] • [Table: opcodes of the logic unit]
Virtex Optimized Shift Left Unit • The basic 2-bit SHL/ROL/NEG/INC/DEC cell using the Fast Carry Lines consumes only one slice.
Virtex Optimized Shift Right Unit • The basic 2-bit SHR/ROR/NOT cell using the Fast Carry Lines consumes only one Xilinx VirtexE slice.
Shift Left and Right Opcodes • [Table: opcodes of the shift left unit, where S is the “Shift” signal and D is the “Decrement” signal] • [Table: opcodes of the shift right unit, where S is the “Shift” signal and N is the “Not” signal]
Shift and Rotate Operations • SHL – Shift Left; • SAL – Shift Arithmetic Left; • ROL – Rotate Left; • RCL – Rotate through Carry Left; • SHR – Shift Right; • SAR – Shift Arithmetic Right; • ROR – Rotate Right; • RCR – Rotate through Carry Right.
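The eight operations differ only in which bit enters the vacated position. The single-bit model below assumes the conventional meaning of these mnemonics (the bit shifted out always goes to the Carry; SAL behaves like SHL; SAR replicates the top bit); the slides do not give the exact definitions, and the helper names are hypothetical.

```python
def shift_left(op, x, width=8, carry_in=0):
    """SHL, SAL, ROL or RCL by one bit; returns (result, carry_out)."""
    mask = (1 << width) - 1
    msb = (x >> (width - 1)) & 1
    fill = {"SHL": 0, "SAL": 0, "ROL": msb, "RCL": carry_in}[op]
    return ((x << 1) | fill) & mask, msb


def shift_right(op, x, width=8, carry_in=0):
    """SHR, SAR, ROR or RCR by one bit; returns (result, carry_out)."""
    mask = (1 << width) - 1
    x &= mask
    lsb = x & 1
    msb = (x >> (width - 1)) & 1
    fill = {"SHR": 0, "SAR": msb, "ROR": lsb, "RCR": carry_in}[op]
    return (x >> 1) | (fill << (width - 1)), lsb


if __name__ == "__main__":
    x = 0b10010011
    for op in ("SHL", "ROL", "RCL"):
        print(op, shift_left(op, x))
    for op in ("SHR", "SAR", "ROR", "RCR"):
        print(op, shift_right(op, x))
```

Because only the fill bit changes between variants, all four operations of one direction can share the same shifting structure and differ in a small selection of the incoming bit.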
Execution Unit Resources • A complete Execution Unit (with all subunits generated) with an 8-bit wide accumulator consumes 20 CLBs, approximately 0.6% of a Xilinx Virtex600E FPGA chip; • An Execution Unit with a 16-bit wide register consumes 35 CLBs, approximately 1% of the available CLBs.
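The quoted percentages can be cross-checked with a short calculation. It assumes that the device has 3,456 CLBs (a 48 x 72 CLB array); this total does not appear in the slides and should be verified against the device data sheet. The helper `clb_share` is hypothetical.

```python
TOTAL_CLBS = 48 * 72  # assumed CLB count of the device (3456); verify in the data sheet


def clb_share(clbs_used, total=TOTAL_CLBS):
    """Percentage of the device's CLBs taken by a block of the given size."""
    return 100.0 * clbs_used / total


if __name__ == "__main__":
    print(round(clb_share(20), 1))  # 0.6
    print(round(clb_share(35), 1))  # 1.0
```

Under that assumption, 20 CLBs and 35 CLBs come to roughly 0.58% and 1.01% of the device, consistent with the rounded figures on the slide.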
Experimental Results • Functional Parallel compiler; • Execution Units optimized for Xilinx VirtexE device; • Slice Memory and Stack Memory under test; • A CREC architecture having 4 EUs with 4-bit wide registers occupies 4% of the CLBs and 5% of the BlockRAMs in the Virtex600E device; • A CREC architecture having 4 EUs with 16-bit wide registers occupies 18% of the CLBs and 20% of the BlockRAMs in the Virtex600E device; • The operating clock frequency is 100 MHz.
Performance Evaluation • The performance indices show how many times faster a given algorithm executes on an optimised CREC system than with a classical execution flow.
Conclusions and Further Work • Creating the possibility of writing high-level programs for CREC; • Extending the functionality of the Parallel Compiler, then creating a C or Pascal compiler for CREC applications; • Several variants of the CREC architecture; • Hardware distributed computing, using FPGA configuration over the Internet.