Training Software Version v2.2
Training Overview
• Key Concepts
• Edit and Compile Source
• Create Architecture
• Map to Architecture
• Schedule Operations
• Build the RT-Level
• Verify the Design
• Create and Use a User Library
• Supported C subset
Key Concepts
Electronic Product Design
[Figure: a chip combining ROM, RAM, an embedded core, I/O logic and custom logic. Drivers shown: high-complexity applications; time-to-market and time-to-profit; power-efficient, high-performance, cost-effective, flexible architectures; low-cost, low-power deep-sub-micron silicon assembly.]
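Design Flow
[Figure: abstraction levels from algorithm through architecture (branch logic, ALU, MULT, IN, OUT, RAM, ROM) and RT-level to gates and layout. Behavioral synthesis takes the algorithm to an architecture, RT-level synthesis produces gates, and layout generation turns gates into layout.]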
Time-to-Market
• Raising the abstraction level from RT-level to behavior (C subset)
• Code compactness
  - Algorithmic description of an FIR filter: 100 lines of C code
  - RT-level description of an FIR filter: 5,200 lines of HDL
• "Blackbox"
• Better simulation performance
• Easier design transfer and re-use
Flexibility
• Optimal area for application
• Low-power design
• More processing power/throughput
• Same starting point (behavior in a C subset) for:
  - FPGA
  - ASIC
• Cheaper custom solution
Flexibility: Example
[Figure: two bar charts of gate count before and after. Behavioral synthesis: 8000 before, 4000 after (reduction: 50%). RT-level synthesis: 80 before, 70 after (reduction: 13%).]
Application Area
• Data path elements are shared over clock cycles
• Moderate decision making is involved
[Figure: a controller (FSM) exchanges control and flags with a data path made of cores, register files, RAM/ROM and address/data registers.]
Typical Applications
• ASSP: Application Specific Standard Product
• Relatively complex data/signal processing
• GSM, DECT, wireless LAN
• Speech recognition, compression, processing
• JPEG, image processing
• Portable medical electronics
• ...
Design Constraints
• Design considerations at the algorithm level:
  - Frame rate
    Frame = 1 execution of your algorithm
    1 frame consumes 1 value for each input and produces 1 value for each output
    e.g. GSM LTP: 1 data frame (160 samples) every 20 ms
  - Maximal latency = delay on the signal caused by the algorithm
• Design considerations at the RT-level:
  - Clock rate, e.g. 50 MHz clock
  - Cycle budget = Clock rate / Frame rate
    The number of clock cycles available to execute one frame
    e.g. for GSM LTP: 4000 cycles
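The cycle budget relation above can be sketched in a few lines of C++. The helper names `cycle_budget` and `frame_rate_hz` and the use of integer hertz values are assumptions made for illustration; they are not part of the tool.

```cpp
#include <cassert>
#include <cstdint>

// Frame rate in frames per second, from a frame period in milliseconds
// (e.g. one frame every 20 ms gives 50 frames/s). Hypothetical helper.
inline std::uint64_t frame_rate_hz(std::uint64_t frame_period_ms) {
    assert(frame_period_ms > 0);
    return 1000 / frame_period_ms;
}

// Cycle budget = clock rate / frame rate: the number of clock cycles
// available to execute one frame. Both rates are in hertz.
inline std::uint64_t cycle_budget(std::uint64_t clock_hz, std::uint64_t frame_hz) {
    assert(frame_hz > 0);
    return clock_hz / frame_hz;  // integer division: a partial cycle is not usable
}
```

With a 50 MHz clock and one frame every 20 ms, `cycle_budget(50000000, frame_rate_hz(20))` gives 1,000,000 cycles. A budget of 4000 cycles at 50 frames per second corresponds to a 200 kHz clock, so the GSM LTP figure presumably refers to a different clock than the 50 MHz example.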
Target Processor Architecture
[Figure: branch logic, ALU, MULT, IN, OUT, RAM and ROM blocks.]
Internal Design Flow
[Figure: the system specification splits into embedded software and datapath resources (arithmetic, memory). ANSI C source goes through Edit/Compile, Map to Architecture, Schedule Operations and Build RTL code, followed by logic synthesis for FPGA or ASIC. Create Architecture draws on HW resource libraries, which can be built from legacy HDL and vendor HDL. Feedback loops: source code tuning, architecture optimization, performance analysis.]
Internal Design Flow (2)
[Figure: five numbered steps around a central data structure: Compilation (from C), Architecture Creation (from resource libraries), Mapping, Scheduling, and Building (to VHDL or Verilog). Pragmas feed into several of the steps.]
Defaults, Options and Pragmas
• Increasing order of priority:
  - Tool defaults
  - Option settings (if any)
  - Pragmas for specific cases
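The priority order can be pictured as a simple fallback chain. The sketch below is a hypothetical model, not the tool's implementation: `resolve_setting` is an invented name, and a null pointer stands for "not specified".

```cpp
#include <cassert>
#include <string>

// Resolve one setting from three sources, in increasing order of priority:
// tool default < option setting < pragma. Illustrative model only.
inline std::string resolve_setting(const std::string& tool_default,
                                   const char* option,   // nullptr if no option is set
                                   const char* pragma) { // nullptr if no pragma applies
    if (pragma) return pragma;   // a pragma wins for the specific case it names
    if (option) return option;   // otherwise an explicit option setting applies
    return tool_default;         // otherwise fall back to the tool default
}
```

For example, `resolve_setting("1", nullptr, "2")` returns `"2"`, in the same way that a pragma can overrule a default port count.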
Hardware Libraries
• Default library
  - Supplied by Frontier
  - Two versions: one for the Xilinx FPGA flow, one for the ASIC flow
  - Sufficient to map all supported C operators
• User libraries
  - Existing hardware blocks
  - Custom hardware blocks for a better speed/area/power trade-off
Project organization artd_cache
Edit and Compile Source
Key Concept
• In a first step, A|RT Designer converts the behavioral description of your algorithm into an internal representation. While doing so, it checks whether the code is C/C++ compliant and whether non-synthesizable constructs are present.
• You can describe your algorithm in C/C++, optionally enriched with A|RT Library fixed-point types in C style or SystemC style.
• To use A|RT Library types:
  #include <fxp.h>     /* C/C++ version */
  #include <sc_fxp.h>  /* SystemC version */
C Compiler Optimizations
• Dead code elimination
• Constant propagation
  - Only for temporary expressions with constants
  - b = a + 2 + 3 => b = a + 5
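The two optimizations can be illustrated with ordinary C++ functions. This is only a sketch of the effect on the source; the function names are invented, and it says nothing about how the compiler implements the transformations internally.

```cpp
#include <cassert>

// As written in the source: two constants added to a variable.
inline int before_propagation(int a) { return a + 2 + 3; }

// After constant propagation: the two constants are combined into one.
inline int after_propagation(int a) { return a + 5; }

// Dead code: `scaled` is computed but never contributes to the result, so
// the statement can be removed without changing what the function returns.
inline int with_dead_code(int a) {
    int scaled = a * 7;  // candidate for dead code elimination
    scaled = 0;          // overwritten and still never used in the result
    return a + 5;
}
```

All three functions return the same value for every input, which is what makes both rewrites safe.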
C Compiler Options (1)
• Name of the function to be compiled
  - Default = last function in the C source
• Specification of the include search path
  - Multiple entries are separated by semicolons
  - Specification is relative to the project subdirectory
  - Example: /home/john/include;..;$MY_INCLUDES/include
• Macros to be defined/undefined
  - Semicolon separated
  - Example for defines: FXPTRACE;MY_DEFINE=1
• Enables C test bench generation
  - I/O can be read in binary or decimal format
• Saves the source file obtained after CPP processing
• Enables strict ANSI C compliance
C Compiler Options (2)
• Data flow analysis identifies and accurately represents the parallelism of the C code by determining the exact data dependencies between the variables, to achieve:
  - better performance
  - optimal use of the target processor
Data Flow Analysis
  void calc_address(const T_AD i, const T_AD j, T_AD& address)
  {
    address = const1*i + const2*j;
  }

  void mydesign(…)
  {
    ...
    for (i=0; i<16; i++) {
      for (j=0; j<16; j++) {
        calc_address(i,j,address);
        a = A[address];
        …..  // calculation of b
        A[address] = b;
      }
    }
  }
• DFA checks, for every iteration, whether the write address differs from the read address. This determines how much loop folding can be performed.
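The question DFA has to answer for this loop nest can be illustrated by brute force: enumerate every (i, j) pair and see whether two iterations ever touch the same element of A. This is a sketch for intuition only, not the analysis the tool performs; `c1` and `c2` stand in for `const1` and `const2`, and `addresses_are_distinct` is an invented name.

```cpp
#include <cassert>
#include <set>

// Address computation from the slide, with the constants passed in.
inline int calc_address(int c1, int c2, int i, int j) { return c1 * i + c2 * j; }

// Returns true when every iteration of the 16x16 loop nest accesses a
// different element of A, so no iteration reads what another one writes.
inline bool addresses_are_distinct(int c1, int c2) {
    std::set<int> seen;
    for (int i = 0; i < 16; ++i)
        for (int j = 0; j < 16; ++j)
            if (!seen.insert(calc_address(c1, c2, i, j)).second)
                return false;  // this address was already used by an earlier iteration
    return true;
}
```

With `c1 = 16` and `c2 = 1` all 256 addresses are distinct, so iterations are independent and can overlap freely. With `c1 = 1` and `c2 = 1`, iterations (0, 1) and (1, 0) hit the same address, which limits how far the loop can be folded.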
Pragmas in C Source
• #pragma OUT <var_name_1> <var_name_2> …
• Used to indicate function arguments that are strictly outputs
• This is not checked by the compiler!
Example:
  void array(const Int<16> in[4], Int<16> out1[4], Int<16> out2[4])
  {
  #ifdef __SYNTHESIS__
  #pragma OUT out1 out2
  #endif
    int i;
    for (i=0; i<4; i++) {
      out1[i] = in[i] - i;
      out2[i] = in[i] + i;
    }
  }

  void addsub(const Int<8> a, const Int<8> b, Int<8>& c, Int<8>& d)
  {
  #ifdef __SYNTHESIS__
  #pragma OUT c d
  #endif
    c = a + b;
    d = a - b;
  }
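Because the compiler does not check the OUT declaration, a small host-side check can help. The sketch below assumes that "strictly outputs" means the function only writes these arguments and never reads their incoming values. `Int8` is a plain `std::int8_t` stand-in for `Int<8>`, since the A|RT Library types are not used here, and `outputs_ignore_initial_values` is an invented helper.

```cpp
#include <cassert>
#include <cstdint>

typedef std::int8_t Int8;  // stand-in for Int<8>, an assumption for this sketch

void addsub(const Int8 a, const Int8 b, Int8& c, Int8& d) {
#ifdef __SYNTHESIS__
#pragma OUT c d
#endif
    c = static_cast<Int8>(a + b);
    d = static_cast<Int8>(a - b);
}

// Call the function twice with different initial contents in c and d.
// If the results differ, the function read an argument declared as OUT.
inline bool outputs_ignore_initial_values(Int8 a, Int8 b) {
    Int8 c1 = 0, d1 = 0;
    Int8 c2 = 127, d2 = -128;
    addsub(a, b, c1, d1);
    addsub(a, b, c2, d2);
    return c1 == c2 && d1 == d2;
}
```

A check like this only samples behaviour for the inputs it is given, so it can reveal a violation but cannot prove that none exists.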
Create Architecture
Key Concept
• In this step, you instantiate the hardware resources that define the target architecture you want to use
• You only have to instantiate the central elements of hardware clusters (auxiliary resources like register files, muxes and tristate buffers are generated automatically at a later step):
  - Cores (ALU, MULT, …)
  - Memories (RAM, ROM, …)
  - Ports (INPORT, OUTPORT)
• You also instantiate one type of controller
Instantiating Resources
• Resources can be instantiated from:
  - The default library: artd_library (for the ASIC flow) or artd_xilinx_library (for the Xilinx FPGA flow)
  - A user library
• The libraries must have been selected in the Create Architecture options
Resources in the Default Library (1)
• Cores
  - alu, alusat
  - mult, multp, mac2, mac3
  - acu
• Memories
  - rom, ram
  - romctrl
  - dpram_r_w, dpram_r_rw, dpram_w_rw, dpram_rw_rw
  - dprom, dpromctrl
Resources in the Default Library (2)
• Ports
  - inport, inport_nohs, inport_noaddr, inport_noaddr_nohs
  - outport, outport_nohs, outport_noaddr, outport_noaddr_nohs
• Controllers
  - mbc_11, mbc_12, mbc_22, mbc_23
Pragma Syntax Table
• I: integer (e.g. 10)
• IL: integer list (e.g. [10,20,6])
• IW: integer or wildcard (e.g. 10 or * or _)
• C: quoted string (e.g. "acu")
• CL: quoted string list (e.g. ["in1:8","in2:10"])
• EXPR: expression (e.g. _*_)
Pragmas (1)
• instantiate(C, C, C);
• instantiate("libraryName", "resourceName", "instanceName");
• This pragma instantiates a resource defined in a library
• The default library is called artd_library or artd_xilinx_library
• Multiple instances of the same resource can be created
• EXAMPLE:
  instantiate("artd_xilinx_library","multp","multp_1");
  instantiate("artd_library","mbc_12","ctrl");
  instantiate("my_own_library","multiplier","mymult");
Pragmas (2)
• instantiate_function(C, C);
• instantiate_function("functionName", "instanceName");
• This pragma instantiates a virtual resource that is not defined in a library
• All calls to the named function will be mapped on this virtual resource as single-cycle operations
• Only a single function can be associated with a virtual resource
• Allows design exploration without actually having to create a library element
• EXAMPLE:
  instantiate_function("cordic","cordic_1");
Pragmas (3)
• merge_regfiles(CL, C);
• merge_regfiles(["registerfileName"], "newRegisterfileName");
• Merges a list of register files into a new register file with the specified name
• May lead to fewer registers but possibly a longer schedule
• EXAMPLE:
  merge_regfiles(["reg_a_ram_1","reg_dx_acu_1"], "addr_reg");
[Figure: ram_1 and acu_1 before and after merging. The separate register files reg_a (on ram_1) and reg_dx (on acu_1) are replaced by the single register file addr_reg; reg_d and reg_dz are unchanged.]
Pragmas (4)
• set_regfileports(C, [IN,OUT], I);
• set_regfileports("regFileName", IN|OUT, nrports);
• This pragma allows you to generate multiport register files
• This pragma overrules the default register file settings of one input port and one output port
• EXAMPLE:
  set_regfileports("merged_reg",IN,2);
  set_regfileports("merged_reg",OUT,2);
  This results in a multiport register file called "merged_reg" with two input ports and two output ports
Pragmas (5)
• connect_bus(C, CL, CL);
• connect_bus("busName", ["writer"], ["reader"]);
• Allows you to define a bus and its connections.
• With this pragma you can restrict resources from writing to specific buses, or you can merge a number of buses into one single bus.
• By using multiple connect_bus pragmas you can define a partial or a complete bus network. An output port of a resource that still has no bus connection after the last connect_bus pragma automatically receives a private bus.
• EXAMPLE:
  connect_bus("ram2_bus",["acu_2:dout"],["reg_a_ram_2:d0","reg_dx_acu_2:d0"]);
  Defines a bus called "ram2_bus" that is written to by the output of acu_2 and read by the address port of ram_2 and the first input port of acu_2
Pragmas (6)
• no_connection(C, CL);
• no_connection("writer", ["reader"]);
• With this pragma you can forbid connections between one output of a resource (given by the first argument) and a list of inputs.
• EXAMPLE:
  no_connection("romctrl_1:dout",["reg_a_ram_2:d0","reg_dx_acu_2:d0"]);
  With this pragma, no connection will be present between the output of romctrl_1 and either the address register of ram_2 or the first input of acu_2
Default Architecture
• The following resources from the default (ASIC) library are automatically instantiated when a new project is created:
  - alu, mult
  - acu
  - romctrl
  - ram, rom
  - inport, outport
  - mbc_23
Example Pragma File
  //INPORT and OUTPORT without address generation
  instantiate("artd_library","inport_noaddr","inport_1");
  instantiate("artd_library","outport_noaddr","outport_1");
  //ACU and ROMCTRL for RAM and ROM addressing
  instantiate("artd_library","acu","acu_ram");
  instantiate("artd_library","acu","acu_rom");
  instantiate("artd_library","romctrl","romctrl_ram");
  instantiate("artd_library","romctrl","romctrl_rom");
  //Cores and Memories
  instantiate("my_library","mac","my_mac");
  instantiate("artd_library","rom","rom_1");
  instantiate("artd_library","ram","ram_1");
  //Controller
  instantiate("artd_library","mbc_23","ctrl");
  //Dedicated address generation cluster
  connect_bus("bus_romctrl_rom",["romctrl_rom:dout"],["reg_*_acu_rom:d0"]);
  connect_bus("bus_dout_acu_rom",["acu_rom:dout"],["reg_*_acu_rom:d0","reg_a_acu_rom:d0"]);
  no_connection("acu_ram:dout",["reg_a_rom_1:*"]);
  no_connection("romctrl_ram:dout",["reg_*_acu_rom:*","reg_a_rom_1:*"]);
Views
• Architecture view
  - Graphical representation of the selected architecture
  - In this view you can select and highlight individual components and resources. You can also jump to the architecture report for a detailed textual overview
Reports
• Architecture report
  - Lists all selected resource instances and their registers
  - Lists for each instance/register:
    input ports and connected register files/muxes
    output ports and connected buses
  - Resources from the default library are listed with unspecified types and with their complete instruction set
  - Resources from user libraries are listed with types and instruction list as specified in the library
Map to Architecture
Key Concepts
• In the mapping step, the following tasks are performed:
  - Memory management: variables and temporary variables (introduced by the compilation step) are allocated to the available memory resources
  - Core resource assignment: operations from the design are assigned to corresponding core resources and translated into RTs (register transfers)
  - Multiplexer introduction: muxes are introduced if more than one bus is connected to the input of a register, or if two or more variables with different types are transferred to that input over a bus connected to it
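The multiplexer rule stated above can be written as a small predicate. This is a simplified model for illustration, not the tool's data structure: `Transfer` and `needs_mux` are invented names, and each transfer is reduced to the bus it arrives on and the type of the value it carries.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// One transfer into a register input: the bus it arrives on and its data type.
struct Transfer {
    std::string bus;
    std::string type;
};

// A mux is needed when more than one bus feeds the input, or when values of
// two or more different types are transferred to it.
inline bool needs_mux(const std::vector<Transfer>& transfers) {
    std::set<std::string> buses;
    std::set<std::string> types;
    for (const Transfer& t : transfers) {
        buses.insert(t.bus);
        types.insert(t.type);
    }
    return buses.size() > 1 || types.size() > 1;
}
```

A single bus carrying a single type needs no mux; adding a second bus, or a second type on the same bus, makes the predicate true.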
Memory Management
[Figure: memory resources placed by access speed against area per memory location. Addressed by the controller: RegFile (scalars) and ROMCTRL (constants). Addressed by the data path: RAM and ROM (arrays) and INPORT/OUTPORT.]
Core Resource Assignment
• Resource assignment is completely determined by a set of internal mapping rules and by user pragmas.
• The rules are divided into two groups:
  - The first set applies to the mapping of the core resources in the default library. This set of rules is transparent for the user but not accessible
  - The second set applies to the mapping of operations on user-defined resources and is an essential part of the pragmas of the corresponding user-defined library
Mapping Rules
• Operations or instructions on resources from the standard library are handled as taking one clock cycle. Exception: MAC (has a pipeline register)
• By default, operations and implicit operations are mapped to the first instance of a resource that can execute the operation
  - "First" means: first instantiated in the pragma file of the previous step
• Implicit operations:
  - ROM/RAM addressing: initialize address, compute next address
  - FOR loops: initialize loop counter, update, test
  - Implicit constants for all instances
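The default "first instance" rule amounts to a first-fit search in instantiation order. The sketch below is a hypothetical model of that rule only: the `Instance` structure, the operation names and the instance names are invented for illustration and do not reflect the tool's internals.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A resource instance and the operations it can execute (illustrative model).
struct Instance {
    std::string name;
    std::vector<std::string> ops;
};

// Map an operation to the first instance, in pragma-file instantiation order,
// that can execute it. Returns an empty string if no instance can.
inline std::string map_operation(const std::vector<Instance>& in_pragma_order,
                                 const std::string& op) {
    for (const Instance& inst : in_pragma_order)
        for (const std::string& supported : inst.ops)
            if (supported == op)
                return inst.name;
    return "";
}
```

With two ALUs instantiated as `alu_1` then `alu_2`, every addition lands on `alu_1` under this rule, which is why the order of the instantiate pragmas matters.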
Multiplexer Introduction
• In the last stage of the mapping step, muxes are introduced where needed.
• Their function is threefold:
  - bus selection
  - data alignment
  - type manipulation: performed by coding cast operations