Training Software version v2.2. Levels of the Design Flow. A|RT Builder in its design environment. Generated HDL Design: Architecture. C description is mapped in a “one-to-one” fashion into a sequential HDL description
Training Software version v2.2
Generated HDL Design: Architecture • C description is mapped in a “one-to-one” fashion into a sequential HDL description • For a single C function, the generated HDL Component is a Mealy machine with 2 parts: • Compute part • Reset/Update part
Generated HDL Design: Architecture • The function hierarchy in C is preserved by the generation of a corresponding component hierarchy in the HDL • Logic synthesis can be performed bottom-up • Interchangeability of components with similar behavior
Generated HDL Design: Compute Process • Purely combinatorial • Calculates outputs and next state, based upon inputs and current state (Mealy machine) • Signals are exchanged with lower-level components • Input signals and current state are copied into local variables first • Calculations are done using local variables • Next state and outputs are assigned to output signals
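The two-part structure can be sketched in plain C++. This is only a model of the idea, not A|RT Builder output: the names `Signals`, `compute` and `update` are hypothetical, and a simple accumulator stands in for a real design. `compute` is purely combinatorial (it copies inputs and current state into locals, calculates, then assigns the output and next-state signals), while `update` models the clocked reset/update part.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; this models the structure, not the generated HDL.
struct Signals {
    int32_t out;         // output signal
    int32_t next_state;  // next-state signal
};

// Compute part: purely combinatorial, depends on input AND current state (Mealy).
Signals compute(int32_t in, int32_t state) {
    int32_t in_v = in;           // copy input signal into a local variable
    int32_t acc_v = state;       // copy current state into a local variable
    int32_t sum = acc_v + in_v;  // calculation uses local variables only
    return Signals{sum, sum};    // assign output and next state
}

// Reset/update part: models what happens on a clock edge.
int32_t update(bool reset, int32_t next_state) {
    return reset ? 0 : next_state;
}
```

One clock cycle then corresponds to one call of `compute` followed by one call of `update` with the returned `next_state`.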
Pure Combinatorial Logic : Example

    #include <fxp.h>

    void adv_comb(
        const Int<4> a,
        const Int<4> b,
        const Int<4> c,
        Int<4>& z
    )
    {
    #pragma OUT(z)
        Int<4> tmp = a*b;
        z = tmp-c;
    }
DSP data types • Integer and fractional numbers are a special case of fixed point • fix<p,q> (ART designer & SystemC): p bits in total, of which q are fractional bits • Example: fix<8,3> with bit pattern 11101.101 = -19/8 = -2.375 • Bit weights of fix<8,3>, MSB to LSB: -2^4, 2^3, 2^2, 2^1, 2^0, 2^-1, 2^-2, 2^-3 • Figure labels: scale factor 1/8, negative weight (MSB), 2's complement, quantization error • The same ALU handles fix<8,1>, fix<8,2>, fix<8,3>, ... • If q=0 then integer, e.g. int<8,0> • If q=p-1 then fractional, e.g. int<8,7>
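The fix<8,3> example above can be checked with a few lines of standard C++. This is a sketch only: `fix_value` is a hypothetical helper, not part of A|RT Library or SystemC, and it assumes that p is the total word length and q the number of fractional bits.

```cpp
#include <cassert>
#include <cstdint>

// Value of a p-bit two's complement word with q fractional bits (p <= 32).
// Hypothetical helper for illustration only.
double fix_value(uint32_t bits, int p, int q) {
    // Keep only the p least significant bits.
    int64_t raw = static_cast<int64_t>(bits & ((uint64_t{1} << p) - 1));
    // The MSB carries the negative weight -2^(p-1).
    if (raw & (int64_t{1} << (p - 1)))
        raw -= (int64_t{1} << p);
    // Apply the scale factor 1/2^q.
    return static_cast<double>(raw) / static_cast<double>(int64_t{1} << q);
}
```

For instance, `fix_value(0xED, 8, 3)` interprets the bit pattern 11101101 and yields -2.375, matching the slide.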
DSP data types • Multiplication example: Int<8,3> 11101101 (-19/8) × Int<8,4> 01100001 (97/16) = 1111100011001101 (-1843/128) • Some processors (C54) have special instructions for fractional numbers: s x x x × s y y y = s s z z z z z z, which becomes s z z z z z z 0 if FRCT = 1
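The multiplication on this slide can be reproduced with ordinary integer arithmetic. The sketch below uses hypothetical helper names (`mul_full`, `mul_frct`) and plain `int8_t`/`int16_t` in place of the A|RT types. The operands are the raw integers behind the fixed-point values: the product of words with q1 and q2 fractional bits has q1+q2 fractional bits, and the fractional mode is modelled as a left shift by one that removes the duplicated sign bit.

```cpp
#include <cassert>
#include <cstdint>

// Full-precision product of two 8-bit two's complement words.
// With q1 and q2 fractional bits in the operands, the result has q1+q2.
int16_t mul_full(int8_t x, int8_t y) {
    return static_cast<int16_t>(static_cast<int16_t>(x) * static_cast<int16_t>(y));
}

// Fractional mode (modelled on the FRCT = 1 behaviour described above):
// shift the product left by one bit so that <8,7> x <8,7> gives <16,15>.
// The corner case -128 * -128 overflows, as it does for any fractional multiply.
int16_t mul_frct(int8_t x, int8_t y) {
    return static_cast<int16_t>(static_cast<uint16_t>(mul_full(x, y)) << 1);
}
```

Here `mul_full(-19, 97)` returns -1843, which read with 3+4 = 7 fractional bits is -1843/128; `mul_frct(64, 64)` returns 8192, which is 0.25 with 15 fractional bits (0.5 × 0.5).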
Data Type Conversion • The standard C types are mapped into A|RT Library types before being mapped into HDL types

    C Type                     A|RT Library Type
    [signed] char              Fix<8,0>
    unsigned char              Ufix<8,0>
    [signed] short [int]       Fix<16,0>
    unsigned short [int]       Ufix<16,0>
    [signed] int               Fix<32,0>
    unsigned int               Ufix<32,0>
    [signed] long [int]        Fix<32,0>
    unsigned long [int]        Ufix<32,0>
    bool                       Uint<1>
    void                       Not mapped
Overview Introduction Key Concepts Operating Procedures C Subset Exercise A|RT style guide Verification and logic synthesis Advanced Use Exercises
Design Flow • [Figure: C code → Compile → Flow Graph → Build → VHDL / Verilog; the flow also produces a C Test Bench and an HDL Test Bench for Verification]
Compile Step • [Figure: C code → Compile → Flow Graph → Build → VHDL / Verilog] • Compiles C Code into internal Flow Graph • ANSI C • optionally extended with A|RT Library types • Optimizations • Constant Propagation • Dead Code Elimination • Renaming Elimination • Block Flattening • Results • Optimized Data Dependency Graph • C(++) testbench
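To make the listed optimizations concrete, here is a hand-written before/after pair in plain C++. It is an illustration of what constant propagation, dead code elimination and renaming elimination mean in general; the function names are hypothetical and the slides do not show what A|RT Builder actually produces internally.

```cpp
#include <cassert>

// As written by the designer: a constant, a renamed copy and an unused value.
int scale_before(int x) {
    const int gain = 4;   // constant: can be propagated into its uses
    int copy = x;         // renaming: 'copy' is just another name for 'x'
    int unused = x * 17;  // dead code: the result is never read
    (void)unused;
    return copy * gain + (gain - 4);  // (gain - 4) folds to 0
}

// The same function after applying the three optimizations by hand.
int scale_after(int x) {
    return x * 4;
}
```

Both functions return the same value for every input, which is the property an optimizing compile step has to preserve.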
Build Step • [Figure: C code → Compile → Flow Graph → Build → VHDL / Verilog] • Builds HDL Description • VHDL IEEE 1076-1987 • Verilog IEEE 1364-1995 • Options • HDL configuration • Test-bench configuration • Results • Synthesizable HDL description • HDL testbench
Cross-Highlighting Between Source and HDL Using the right mouse button
C subset • This part specifies the subset of the ANSI C language that is supported by A|RT Builder. • On top of the C subset, the fixed-point datatypes and operators provided by A|RT Library are fully supported. The ANSI C constructs that are NOT supported by A|RT Builder will be listed. For a number of constructs, the text describes in what way they are supported. Constructs that are not mentioned are fully supported.
Exercise 1: A New Project • A|RT Builder is organized in projects • Select Project>New (or <New> button in the toolbar) • Enter a name for your project
Exercise 2: Import, Compile & Build • Select Source>Import… • Browse to the subdirectory that contains the training examples. • Select file ex1.cxx and press <open> • The Import dialog box appears • Compile and Build the design
Exercise 3: Inspecting the HDL • Look at the HDL design by selecting Reports>Design… • Use cross-highlighting for more transparent analysis. • Notice the following: • Parameters of the ‘A|RT Builder run’ and cross-reference information • Definition of the HDL component • Translation of the body of the C function to COMPUTE_PROC process • Signal casting
Purpose of the Test Benches • C Test Bench • created during compile step • helps in validating the design as entered and modified in A|RT Builder • HDL Test Bench • created during build step • allows you to verify whether the behavior of HDL output is identical to the C description • Requirements • input file(s) • reference file to compare with generated output file(s)
Bench Behavior • Input files • Binary representations (or decimal values for the C bench) of the input signals, saved in ASCII format. 1 and 0 are the only valid characters. One input word per line. • Every input argument in the main C function must have a corresponding input file with filename: <name_of_C_input>.INP • The wordsize of the representation in the file must match the datatype wordsize • Output files • .OUT files are generated by the C test bench and the HDL test bench • Besides 0 and 1, they can contain X, Z and - characters
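A test-bench input word as described above (ASCII characters 0 and 1 only, one word per line, width equal to the datatype wordsize) could be produced with a helper like the one below. This is a sketch: `to_inp_word` is a hypothetical name, and writing the most significant bit first is an assumption, since the slides do not state the bit order.

```cpp
#include <cassert>
#include <string>

// Format 'value' as a 'width'-bit two's complement string (width <= 64).
// Assumption: the most significant bit is written first.
std::string to_inp_word(long long value, int width) {
    std::string word(width, '0');
    unsigned long long bits = static_cast<unsigned long long>(value);
    for (int i = 0; i < width; ++i) {
        if ((bits >> i) & 1ULL) word[width - 1 - i] = '1';  // bit i, counted from the LSB
    }
    return word;
}
```

For an input argument `a` of type Int<4>, each stimulus value would be written as one such line to the file `a.INP`; for example, -3 becomes `1101`.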
Pipelining • From input to output, a design contains a series of logical and arithmetic operations. • Suppose a specific clock rate is imposed on a design. For a long chain of logic, this will not always be feasible, depending on the gate delay parameters of the target technology. • Pipelining is achieved by introducing pipeline registers in order to reduce the number of logic gates between registers. • The outputs are then delayed by a number of clock cycles equal to the number of pipeline stages. Valid outputs are obtained after an initial startup phase.
Pipelining • Achieved by introducing pipeline registers in order to reduce the number of logic gates between registers • Valid output is now obtained after a startup phase of 1 clock cycle • Max clock rate = 20 MHz without the pipeline register, max clock rate = 33 MHz with it
Pipelining : Example

Pure combinatorial logic:

    #include <fxp.h>

    void adv_comb(
        const Int<4> a,
        const Int<4> b,
        const Int<4> c,
        Int<4>& z
    )
    {
    #pragma OUT(z)
        Int<4> tmp = a*b;
        z = tmp-c;
    }

Pipelined:

    #include <fxp.h>

    void adv_pipe(
        const Int<4> a,
        const Int<4> b,
        const Int<4> c,
        Int<4>& z
    )
    {
    #pragma OUT(z)
        static Int<4> tmp=0;      // pipeline register
        z = tmp-c;                // uses the product stored in the previous cycle
        Int<4> tmp_nxt = a*b;
        tmp = tmp_nxt;            // store the product for the next cycle
    }
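The one-cycle latency of the pipelined version can be observed with a small plain C++ model. This is a sketch under stated assumptions: `wrap4` stands in for Int<4> and assumes wrap-around on overflow (the overflow behaviour of the real type is not given in the slides), the `AdvPipe` struct is a hypothetical replacement for the `static` variable, and one call of `step` represents one clock cycle.

```cpp
#include <cassert>

// Wrap a value to 4-bit two's complement; stands in for Int<4>.
// Assumption: overflow wraps around.
int wrap4(int v) {
    v &= 0xF;
    return (v & 0x8) ? v - 16 : v;
}

// Model of the combinatorial version: result available in the same cycle.
int adv_comb(int a, int b, int c) {
    int tmp = wrap4(a * b);
    return wrap4(tmp - c);
}

// Model of the pipelined version: one call of step() == one clock cycle.
struct AdvPipe {
    int tmp = 0;  // pipeline register, reset value 0
    int step(int a, int b, int c) {
        int z = wrap4(tmp - c);  // uses the product registered in the previous cycle
        tmp = wrap4(a * b);      // register the new product
        return z;
    }
};
```

In this model the first call returns a value based on the reset content of the register; the product of a given `a` and `b` only shows up in the output of the following call. Note that `c` is taken from the current cycle, so the pipelined output matches `adv_comb` only if `c` is held stable or delayed accordingly.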
Resource Sharing • Trades speed for area by using multiple clock cycles to execute the algorithm once. • This makes it possible to share resources between different clock cycles. • However, the efficiency of the resource sharing depends on the synthesis tool used. • Synthesis can be steered in the right direction by describing the resource sharing explicitly in the C description.
Resource Sharing : Example • Purely combinatorial • [Figure: val1*val2 and val3*val4 are computed by two multipliers and added to give out]

    #include <fxp.h>

    void mac2(
        const Int<32> val1,
        const Int<32> val2,
        const Int<32> val3,
        const Int<32> val4,
        Int<32>& out
    )
    {
    #pragma OUT(out)
        Int<32> prod1 = val1*val2;
        Int<32> prod2 = val3*val4;
        out = prod1 + prod2;
    }
Resource Sharing : Example • Implicit resource sharing (2 cycles) • A new input is supplied every 2 cycles • A valid output is obtained every second cycle • Resource sharing depends on the intelligence of the synthesis tools • [Figure: val1*val2 is stored in a DFF, then added to val3*val4 to give out]

    #include <fxp.h>

    void mac2_mult(
        const Int<32> val1,
        const Int<32> val2,
        const Int<32> val3,
        const Int<32> val4,
        Int<32>& out
    )
    {
    #pragma OUT(out)
        static Int<32> prod1 = 0;
        static Uint<1> cycle = 0u;
        switch (cycle) {
        case 0: // process 1st cycle
            { prod1 = val1*val2; }
            break;
        case 1: // process 2nd cycle
            { Int<32> prod2 = val3*val4;
              out = prod1 + prod2; }
            break;
        }
        ++cycle; // update cycle
        if (cycle==2) cycle=0;
    }
Resource Sharing : Example (2) • Explicit resource sharing (execution in 2 cycles) • [Figure: a single multiplier, a DFF and an adder]

    #include <fxp.h>

    void mac2_multi(
        const Int<4> val1,
        const Int<4> val2,
        const Int<4> val3,
        const Int<4> val4,
        Int<4>& out
    )
    {
    #pragma OUT(out)
        static Int<4> prod1=0;
        static Uint<1> cycle = 0u;
        Int<4> inp1, inp2;      // operands of the shared multiplier
        switch (cycle) {        // select the operands for this cycle
        case 0: // 1st cycle
            inp1=val1; inp2=val2;
            break;
        case 1: // 2nd cycle
            inp1=val3; inp2=val4;
            break;
        }
        Int<4> result=inp1*inp2; // the only multiplication in the design
        switch (cycle) {
        case 0: // 1st cycle
            prod1=result;
            break;
        case 1: // 2nd cycle
            out = prod1 + result;
            break;
        }
        ++cycle;
        if (cycle==2) cycle=0;
    }
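The two-cycle behaviour of the explicit version can be traced with a plain C++ model. This is a sketch, not the A|RT code: `Mac2Shared` is a hypothetical struct, `int32_t` replaces the fixed-point types (so no 4-bit overflow is modelled), and it is assumed that `out` keeps its previous value during the first cycle, which the slides do not specify.

```cpp
#include <cassert>
#include <cstdint>

// One call of step() == one clock cycle.
struct Mac2Shared {
    int32_t prod1 = 0;  // register (DFF) holding the first product
    int cycle = 0;      // 0 = 1st cycle, 1 = 2nd cycle
    int32_t out = 0;    // assumption: holds its old value during the 1st cycle

    void step(int32_t v1, int32_t v2, int32_t v3, int32_t v4) {
        // Operand selection: which pair feeds the multiplier this cycle.
        int32_t inp1 = (cycle == 0) ? v1 : v3;
        int32_t inp2 = (cycle == 0) ? v2 : v4;
        int32_t result = inp1 * inp2;  // the single, shared multiplier
        if (cycle == 0) {
            prod1 = result;            // 1st cycle: store the product
        } else {
            out = prod1 + result;      // 2nd cycle: add and produce the output
        }
        cycle = (cycle + 1) % 2;
    }
};
```

With the inputs held at (2, 3, 4, 5) for two consecutive calls, `out` becomes 2*3 + 4*5 = 26 after the second call, using one multiplication per cycle instead of two in parallel.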
Resource Sharing of For Loops • Hardware within loops will be generated as many times as there are loop iterations • costly • not feasible for high clockrates • When creating a state machine to execute the algorithm • the hardware within the loop can be shared over all loop iterations • every loop iteration will be performed in a single clock cycle
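The difference between an unrolled loop and a state machine that executes one iteration per clock cycle can be sketched in plain C++. The dot product, the constant `N` and the names `dot_unrolled` and `DotShared` are hypothetical choices for illustration; the slides do not show this particular example.

```cpp
#include <array>
#include <cassert>

constexpr int N = 4;

// Loop as written: in hardware, the loop body (one multiplier, one adder)
// would be generated N times.
int dot_unrolled(const std::array<int, N>& a, const std::array<int, N>& b) {
    int acc = 0;
    for (int i = 0; i < N; ++i) acc += a[i] * b[i];
    return acc;
}

// State machine: one loop iteration per call of step(), i.e. per clock cycle,
// so a single multiplier and adder are shared over all iterations.
struct DotShared {
    int i = 0;    // loop counter kept as state
    int acc = 0;  // running sum kept as state

    // Returns true in the cycle in which 'out' becomes valid.
    bool step(const std::array<int, N>& a, const std::array<int, N>& b, int& out) {
        int sum = (i == 0 ? 0 : acc) + a[i] * b[i];  // restart the sum at i == 0
        bool done = (i == N - 1);
        if (done) out = sum;
        acc = sum;
        i = done ? 0 : i + 1;
        return done;
    }
};
```

The shared version needs N calls (clock cycles) to deliver the result that the unrolled version computes in one pass, which is the speed-for-area trade described above.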