ENG6530 Reconfigurable Computing Systems High Level Languages “Electronic System Level (ESL) Design”
Topics • Issues with Reconfigurable Computing • Complexity of Applications … • Complexity of the Design Cycle • Electronic System Level (ESL) • Motivation, Why? • Advantages/Disadvantages • Summary ENG6530 RCS
References • “Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation”, by S. Hauck and A. DeHon, 2008. • “Leading Languages: Is There a Future Beyond RTL”, FPGA Journal 2005. • “The Challenges of Synthesizing Hardware from C-like Languages”, by Stephen Edwards. • “Design of a high-level language for Custom Computing Machines”, C. Van Reeuwijk, 2002. • “Comparison of VHDL, Verilog and SystemVerilog”, Stephen Bailey, Model Technology. • http://www.SystemC.org (System-C) • http://www.celoxica.com (Handel-C) • http://www.mentor.com (Catapult-C) • http://www.xilinx.com (AutoESL)
Key Markets for HPC How are we going to manage Design Complexity? ENG6530 RCS
Managing Complexity • One important practical approach to handle complexity is to raise the level of abstraction • We can take guidance from previous shifts in methodology which raised the level of abstraction - from schematics to HDLs - from assembler code to HLLs ENG6530 RCS
Complexity of Design Cycle ENG6530 RCS
Why do companies face these problems? • You can’t get your hardware designs done quickly enough • Designs are getting too complex to handle (SOC) • You haven’t enough experienced hardware designers • Errors in design or unimplemented features cost $ • ASICs and development tools costly $ • Software development stalls waiting for the hardware ENG6530 RCS
A need for a new design Language • Verilog and VHDL work very well for HW implementation flows but … • They are too complicated for casual use. • Systems are becoming more complex, pushing us to design and verify at higher levels of abstraction. • Designers often implement today’s systems as a mix of hardware and software (which parts should be HW and which SW?) • It is essential that new design flows support early software development, integration with existing C/C++ code, and HW/SW co-design. Using a single language like C simplifies the migration task! • If we synthesize hardware from C-like languages we can effectively turn every C programmer into a hardware designer!!
High-level Synthesis • Wouldn’t it be nice to write high-level code? • Ratio of C to VHDL developers (10000:1 ?) • + Easier to specify • + Separates function from architecture • + More portable • - Hardware potentially slower • Similar to assembly code era • Programmers could always beat compiler • But, no longer the case • Hopefully, high-level synthesis will catch up to manual effort
Abstraction: Advantages ENG6530 RCS
Why not a Software Language for Design Entry? • The semantics of C and similar languages are distant from hardware (different execution models) • Software follows a sequential model • Hardware is fundamentally concurrent • C has no support for user-specified parallelism • So either the synthesis tool must find the parallelism (a difficult task) • Or the designer must use language extensions and insert explicit parallelism (the programmer has to think differently to design hardware) • Techniques for synthesizing hardware from C either • generate inefficient hardware or • propose a language that merely adopts parts of C syntax
Advantages of HLLs for Hardware Design • Designs are often specified by a C/C++ executable • Some problems are better expressed as a software algorithm • Software Reference designs can be utilized • Enables much higher speed verification • Faster Simulation at architecture level than gate level • Reduce Risks by enabling early verification of the entire system. • Software development techniques can be used • Simplifies hardware-software partitioning • Brings hardware and software teams closer together ENG6530 RCS
Requirements for New Language? • Don’t invent a new language! Build on C/C++ so that: • Extensive C/C++ infrastructure (compilers, debuggers, language standards, books, etc.) can be re-used. • Users’ existing knowledge of C/C++ can be leveraged. • Integration with existing C/C++ code is easy • It must support specification and refinement to detailed implementation of both software and hardware. • It must support verification through all stages of the design process. • It must provide a very general set of modeling constructs to cleanly support the wide range of abstraction levels and models of computation used in system design.
Semiconductor Design 1970’s 1980’s 1990’s 2000’s Hand Crafted Schematic Capture VHDL / Verilog System Level Design In house Cut rubies (manual) Daisy Mentor Valid Calma Internal Synopsys Cadence Mentor Dracula Cadence Avant! FRONT END BACK END Handel-C SystemC SystemVerilog CatapultC ImpulseC ENG6530 RCS AutoESL
Ease of Use vs. Efficiency [Figure: languages plotted by efficiency (low to high) against ease of use (easy to difficult): VHDL, Verilog, SystemVerilog, Vivado HLS, SystemC, Handel-C, CatapultC, ImpulseC]
Contrasting ESLs Software Hardware Mitrion C VIVA Impulse-C Handel-C HDL SystemC VIVADO HLS Explicit Par Statements Memory Statements Channels, … Pure C/C++ statements with Pragmas inserted
Specification Model Software Model Design HW SW C/C++ Testbench AL C for HW CA COMMS BSP BSP C to FPGA Accelerated System Function & Architecture Algorithm Design System Model Partitioning Architecture Exploration API’s/Libraries Mixed Simulation Design Analysis Optimization C-Based Synthesis Implementation EDIF RTL OBJ Synthesis P&R FPGA Processor
Defense & Security Consumer Automotive & Industrial Commercial RC Applications …using C-based design • Well established in embedded systems: • Digital Video Technology and Image Processing • “PROCESSING AT THE SENSOR” versus local and/or remote processing • 3D LCD display development and test • Real-time verification of HDTV image processing algorithms • Robust image matching - product tracking and production line control • Digital Signal Processing • Engine control unit for 3-phase motors • Radar and sonar beam forming and spatial filtering • Computer aided tomography security system • Communications and Networking • Internet reconfigurable multimedia terminal, MP3, VoIP etc. • Ground traffic simulation test bed for broadband satellite network communications • Satellite based Internet data tracking system • Rapid Systems Prototyping • Automotive safety system incorporating sensor fusion • Robotic vision system for object detection and robot guidance
Summary • Systems are too complicated today to rely on Hardware Description Languages such as VHDL or Verilog. • New languages have emerged such as SystemC, Handel-C, CatapultC, ImpulseC, … • Some of these languages are • Suitable for system verification (speed up the simulation of the system). • Suitable for synthesis • Suitable for architecture exploration • Suitable for Hardware/Software Co-design • Challenges: • Efficiency of synthesizers (Performance, Area, Power) • Learning curve
ENG6530 Reconfigurable Computing Systems High Level Synthesis ENG6530 RCS
CAD for FPGAs: Synthesis Design Entry Synthesis Logic Optimization Placement Packing LUTs to CLBs Mapping to k-LUT Routing Simulation Configure an FPGA ENG6530 RCS
FPGA Processor FPGA Tool Flow with ESL C/C++, Java, etc. High-level Synthesis HDL RT Synthesis Technology Mapping Netlist Placement Physical Design Routing Bitfile
High Level Synthesis [Figure: the algorithm WHILE G < K LOOP F := E*(A+B); G := (A+B)*(C+D); END LOOP; together with a component library (+, -, *, <) and constraints (area; time: clock period, number of clock steps; power) is the input to high-level synthesis, which produces a controller (PLA and latches) and a datapath]
High-level Synthesis • First, consider how to manually convert high-level code into circuit • Steps • 1) Build FSM for controller • 2) Build datapath based on FSM acc = 0; for (i=0; i < 128; i++) acc += a[i];
Manual Example • Build a FSM (controller) • Decompose code into states acc = 0; for (i=0; i < 128; i++) acc += a[i]; acc=0, i = 0 if (i < 128) Done load a[i] acc += a[i] i++
Manual Example • Build a datapath • Allocate resources for each state acc=0, i = 0 if (i < 128) a[i] Done addr acc i load a[i] 1 128 1 acc += a[i] + + < + i++ acc = 0; for (i=0; i < 128; i++) acc += a[i];
Manual Example • Build a datapath • Determine register inputs In from memory acc=0, i = 0 &a 0 0 if (i < 128) 2x1 2x1 2x1 a[i] Done addr acc i load a[i] 1 128 1 acc += a[i] + + < + i++ acc = 0; for (i=0; i < 128; i++) acc += a[i];
Manual Example • Build a datapath • Add outputs In from memory acc=0, i = 0 &a 0 0 if (i < 128) 2x1 2x1 2x1 a[i] Done addr acc i load a[i] 1 128 1 acc += a[i] + + < + i++ acc = 0; for (i=0; i < 128; i++) acc += a[i]; acc Memory address
Manual Example • Build a datapath • Add control signals In from memory acc=0, i = 0 &a 0 0 if (i < 128) 2x1 2x1 2x1 a[i] Done addr acc i load a[i] 1 128 1 acc += a[i] + + < + i++ acc = 0; for (i=0; i < 128; i++) acc += a[i]; acc Memory address
Manual Example • Combine controller+datapath In from memory Controller &a 0 0 2x1 2x1 2x1 a[i] addr acc i 1 128 1 + + < + acc = 0; for (i=0; i < 128; i++) acc += a[i]; Done Memory Read acc Memory address
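The controller and datapath built up in the manual example can be mimicked in software. The following is a minimal sketch of a cycle-by-cycle C model, assuming one clock cycle per FSM state and a memory that answers a read in the same cycle. The state names, the Accumulator struct and the step/run helpers are illustrative names, not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { N = 128 };  /* loop bound from the slide's example */

/* One state per box of the controller FSM (names are illustrative). */
typedef enum { S_INIT, S_TEST, S_LOAD, S_ACC, S_INC, S_DONE } State;

/* Datapath registers: addr, i, a[i] and acc. */
typedef struct {
    State   state;
    int     addr;  /* memory address register, kept as an index from &a */
    int     i;     /* loop counter */
    int32_t a_i;   /* value read from memory */
    int32_t acc;   /* accumulator */
} Accumulator;

/* Advance the modelled circuit by one clock cycle. */
void step(Accumulator *m, const int32_t *mem)
{
    assert(m != NULL && mem != NULL);
    switch (m->state) {
    case S_INIT: m->acc = 0; m->i = 0; m->addr = 0; m->state = S_TEST; break;
    case S_TEST: m->state = (m->i < N) ? S_LOAD : S_DONE;              break;
    case S_LOAD: m->a_i = mem[m->addr];             m->state = S_ACC;  break;
    case S_ACC:  m->acc += m->a_i;                  m->state = S_INC;  break;
    case S_INC:  m->i++; m->addr++;                 m->state = S_TEST; break;
    case S_DONE: break;  /* Done is asserted; registers hold their values */
    }
}

/* Run from reset until Done and return the number of clock cycles used. */
long run(Accumulator *m, const int32_t *mem)
{
    long cycles = 0;
    m->state = S_INIT;
    while (m->state != S_DONE) {
        step(m, mem);
        cycles++;
    }
    return cycles;
}
```

Under these assumptions each loop iteration costs four cycles (test, load, accumulate, increment), which is the kind of cost a scheduler would try to reduce by overlapping operations that do not depend on each other.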
Manual Example • Comparison with high-level synthesis • Determining when to perform each operation • => Scheduling • Allocating resource for each operation • => Resource allocation • Mapping operations onto resources • => Binding
ENG6530 RCS Behavioral Synthesis I/O Behavior Target Library Algorithm • Resource Allocation • Scheduling • Binding Behavioral Synthesis RTL Design Logic Synthesis Classic RTL Design Flow Gate level Netlist
HLS: Main Steps High-level Code Converts code to intermediate representation - allows all following steps to use language independent format. Front-end Syntactic Analysis Intermediate Representation Optimization Determines when each operation will execute, and resources used Scheduling/Resource Allocation Back-end Maps operations onto physical resources Binding/Resource Sharing Controller + Datapath
Intermediate Representation • Parser converts tokens to intermediate representation • Usually, an abstract syntax tree Assign x = 0; if (y < z) x = 1; d = 6; x if 0 assign cond assign y z < x d 1 6
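To make the tree concrete, here is a small sketch of how the slide's fragment (x = 0; if (y < z) x = 1; d = 6;) might be held as an abstract syntax tree in C. The Node layout, the node kinds and the eval walker are assumptions made for illustration; real HLS front ends use richer representations. The walker simply interprets the tree, which shows that the tree carries the same meaning as the source text.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { N_CONST, N_VAR, N_LT, N_ASSIGN, N_IF, N_SEQ } Kind;

typedef struct Node {
    Kind kind;
    int value;                 /* N_CONST: literal; N_VAR, N_ASSIGN: variable slot */
    const struct Node *a, *b;  /* children (unused ones are NULL) */
} Node;

enum { X, Y, Z, D, N_VARS };   /* variable slots for x, y, z, d */

/* Leaves of the tree. */
static const Node lit_0 = { N_CONST, 0, NULL, NULL };
static const Node lit_1 = { N_CONST, 1, NULL, NULL };
static const Node lit_6 = { N_CONST, 6, NULL, NULL };
static const Node var_y = { N_VAR, Y, NULL, NULL };
static const Node var_z = { N_VAR, Z, NULL, NULL };

/* Interior nodes: x = 0; if (y < z) x = 1; d = 6; */
static const Node x_eq_0  = { N_ASSIGN, X, &lit_0, NULL };
static const Node y_lt_z  = { N_LT, 0, &var_y, &var_z };
static const Node x_eq_1  = { N_ASSIGN, X, &lit_1, NULL };
static const Node if_stmt = { N_IF, 0, &y_lt_z, &x_eq_1 };
static const Node d_eq_6  = { N_ASSIGN, D, &lit_6, NULL };
static const Node tail    = { N_SEQ, 0, &if_stmt, &d_eq_6 };
const Node program        = { N_SEQ, 0, &x_eq_0, &tail };

/* Walk the tree, reading and writing variables in env[]. */
int eval(const Node *n, int *env)
{
    assert(n != NULL);
    switch (n->kind) {
    case N_CONST:  return n->value;
    case N_VAR:    return env[n->value];
    case N_LT:     return eval(n->a, env) < eval(n->b, env);
    case N_ASSIGN: return env[n->value] = eval(n->a, env);
    case N_IF:     if (eval(n->a, env)) eval(n->b, env); return 0;
    case N_SEQ:    eval(n->a, env); return eval(n->b, env);
    }
    return 0;
}
```

A synthesis back end would traverse the same structure to build control and data flow graphs instead of executing it.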
Intermediate Representation • Why use intermediate representation? • Easier to analyze/optimize than source code • Theoretically can be used for all languages • Makes synthesis back end language independent Java Perl C Code Syntactic Analysis Syntactic Analysis Syntactic Analysis Intermediate Representation Scheduling, resource allocation, binding, independent of source language - sometimes optimizations too Back End
Scheduling • Scheduling assigns a start time to each operation in DFG • Start times must not violate dependencies in DFG • Start times must meet performance constraints • Alternatively, resource constraints • Performed on the DFG of each CFG node • => Can’t execute multiple CFG nodes in parallel
Scheduling Examples [Figure: three example schedules of a DFG of additions over inputs a, b, c, d; two schedules span Cycle1 to Cycle3 and one spans Cycle1 to Cycle2]
Scheduling Problems • Several types of scheduling problems • Usually some combination of performance and resource constraints • Problems: • Unconstrained • Not very useful, every schedule is valid • Minimum latency • Latency constrained • Minimum-latency, resource constrained • i.e. find the schedule with the shortest latency that uses no more than a specified number of resources • NP-Complete • Minimum-resource, latency constrained • i.e. find the schedule that meets the latency constraint (which may be anything) and uses the minimum number of resources • NP-Complete
Minimum Latency Scheduling • ASAP (as soon as possible) algorithm • Find a candidate node • Candidate is a node whose predecessors have been scheduled and completed (or has no predecessors) • Schedule node one cycle later than max cycle of predecessor • Repeat until all nodes scheduled c d e a f b g h - < Cycle1 + + Cycle2 * Cycle3 * + Cycle4 Minimum possible latency - 4 cycles
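The ASAP steps can be sketched in C as follows, assuming every operation takes exactly one cycle and the DFG is given as a list of dependency edges. The Edge type and the asap_schedule name are illustrative and not from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* A dependency edge: operation `to` consumes the result of operation `from`. */
typedef struct { int from, to; } Edge;

/* ASAP scheduling with unit-latency operations.
 * cycle[i] receives the 1-based cycle of operation i.
 * Returns the latency in cycles, or -1 if the graph contains a cycle. */
int asap_schedule(int n_ops, const Edge *edges, size_t n_edges, int *cycle)
{
    assert(n_ops > 0);
    for (int i = 0; i < n_ops; i++) cycle[i] = 0;  /* 0 = not yet scheduled */

    int scheduled = 0, latency = 0;
    while (scheduled < n_ops) {
        int progress = 0;
        for (int v = 0; v < n_ops; v++) {
            if (cycle[v] != 0) continue;
            /* v is a candidate only if all of its predecessors are scheduled. */
            int ready = 1, max_pred = 0;
            for (size_t e = 0; e < n_edges; e++) {
                if (edges[e].to != v) continue;
                int p = cycle[edges[e].from];
                if (p == 0) { ready = 0; break; }
                if (p > max_pred) max_pred = p;
            }
            if (!ready) continue;
            cycle[v] = max_pred + 1;  /* one cycle after the latest predecessor */
            if (cycle[v] > latency) latency = cycle[v];
            scheduled++;
            progress = 1;
        }
        if (!progress) return -1;     /* no candidate left: cyclic dependencies */
    }
    return latency;
}
```

For a hypothetical seven-operation graph with edges 0→2, 1→2, 2→4, 3→4, 4→6 and 5→6, this assigns cycles 1, 1, 2, 1, 3, 1, 4 to operations 0 to 6 and reports a latency of 4.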
Minimum Latency Scheduling • ALAP (as late as possible) algorithm • Run ASAP, get minimum latency L • Find a candidate • Candidate is node whose successors are scheduled (or has none) • Schedule node one cycle before min cycle of successor • Nodes with no successors scheduled to cycle L • Repeat until all nodes scheduled e f g h b c d a - Cycle1 < Cycle4 + + Cycle3 Cycle2 * Cycle3 * + Cycle4 L = 4 cycles
Minimum Latency Scheduling • ALAP (as late as possible) algorithm • Run ASAP, get minimum latency L • Find a candidate • Candidate is node whose successors are scheduled (or has none) • Schedule node one cycle before min cycle of successor • Nodes with no successors scheduled to cycle L • Repeat until all nodes scheduled c d e a f b g h Cycle1 + + Cycle2 * Cycle3 - * < + Cycle4 L = 4 cycles
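A matching sketch of ALAP in C, assuming unit-latency operations, a DFG given as a list of dependency edges, and a latency bound L supplied by the caller (typically the ASAP latency). The Edge type and the alap_schedule name are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* A dependency edge: operation `to` consumes the result of operation `from`. */
typedef struct { int from, to; } Edge;

/* ALAP scheduling with unit-latency operations under latency bound `latency`.
 * cycle[i] receives the 1-based cycle of operation i.
 * Returns 0 on success, -1 if the bound is too tight or the graph is cyclic. */
int alap_schedule(int n_ops, const Edge *edges, size_t n_edges,
                  int latency, int *cycle)
{
    assert(n_ops > 0 && latency > 0);
    for (int i = 0; i < n_ops; i++) cycle[i] = 0;  /* 0 = not yet scheduled */

    int scheduled = 0;
    while (scheduled < n_ops) {
        int progress = 0;
        for (int u = n_ops - 1; u >= 0; u--) {
            if (cycle[u] != 0) continue;
            /* u is a candidate only if all of its successors are scheduled.
             * With no successors, min_succ stays at latency + 1, so u lands
             * in cycle `latency`. */
            int ready = 1, min_succ = latency + 1;
            for (size_t e = 0; e < n_edges; e++) {
                if (edges[e].from != u) continue;
                int s = cycle[edges[e].to];
                if (s == 0) { ready = 0; break; }
                if (s < min_succ) min_succ = s;
            }
            if (!ready) continue;
            if (min_succ - 1 < 1) return -1;  /* would need a cycle before 1 */
            cycle[u] = min_succ - 1;          /* one cycle before earliest successor */
            scheduled++;
            progress = 1;
        }
        if (!progress) return -1;             /* cyclic dependencies */
    }
    return 0;
}
```

For a hypothetical seven-operation graph with edges 0→2, 1→2, 2→4, 3→4, 4→6 and 5→6 and L = 4, this assigns cycles 1, 1, 2, 2, 3, 3, 4 to operations 0 to 6. Operations 3 and 5 have no predecessors, so they are free to move later than cycle 1.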
Minimum Latency Scheduling • ALAP • Has to run ASAP first, seems pointless • But, many heuristics need the mobility/slack of each operation • ASAP gives the earliest possible time for an operation • ALAP gives the latest possible time for an operation • Slack = difference between earliest and latest possible schedule • Slack = 0 implies operation has to be done in the current scheduled cycle • The larger the slack, the more options a heuristic has to schedule the operation
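As a small sketch of the slack computation, assuming the ASAP and ALAP cycles of each operation are already available as arrays (the compute_slack name and signature are illustrative):

```c
#include <assert.h>

/* slack[i] = alap[i] - asap[i] for each operation.
 * Returns the number of operations with zero slack, i.e. those that
 * must stay in their scheduled cycle. */
int compute_slack(int n_ops, const int *asap, const int *alap, int *slack)
{
    int critical = 0;
    for (int i = 0; i < n_ops; i++) {
        assert(alap[i] >= asap[i]);  /* latest time cannot precede earliest */
        slack[i] = alap[i] - asap[i];
        if (slack[i] == 0) critical++;
    }
    return critical;
}
```

With hypothetical ASAP cycles 1, 1, 2, 1, 3, 1, 4 and ALAP cycles 1, 1, 2, 2, 3, 3, 4, the slacks are 0, 0, 0, 1, 0, 2, 0. Five operations are fixed, and a heuristic only has freedom in placing the other two.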
Binding • During scheduling, we determined: • When ops will execute • How many resources are needed • We still need to decide which ops execute on which resources • => Binding • If multiple ops use the same resource • =>Resource Sharing
Binding • Basic Idea - Map operations onto resources such that operations in same cycle don’t use same resource 2 ALUs (+/-), 2 Multipliers + Cycle1 2 3 * + 1 4 - Cycle2 * 6 * 5 Cycle3 + 7 - Cycle4 8 ALU2 Mult2 ALU1 Mult1
Binding • Many possibilities • Bad binding may increase resources, require huge steering logic, reduce clock frequency, etc. 2 ALUs (+/-), 2 Multipliers + Cycle1 2 3 * + 1 4 - Cycle2 * 6 * 5 Cycle3 + 7 - Cycle4 8 ALU2 Mult2 ALU1 Mult1
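One simple way to obtain a legal binding is first-fit: within each cycle, give every operation the lowest-numbered free unit of its kind. The sketch below assumes the schedule is already fixed. The Op and ResKind types and the bind_ops name are illustrative. It only guarantees that operations in the same cycle never share a unit; it makes no attempt to minimise steering logic, which a production binder would also weigh.

```c
#include <assert.h>

typedef enum { RES_ALU, RES_MUL, RES_KINDS } ResKind;

typedef struct {
    ResKind kind;   /* which kind of resource the operation needs */
    int     cycle;  /* cycle assigned by the scheduler */
} Op;

/* First-fit binding: unit[i] receives the index of the resource instance
 * (0, 1, ...) of kind ops[i].kind that executes operation i.
 * Returns 0 on success, -1 if some cycle needs more units than avail[]. */
int bind_ops(int n_ops, const Op *ops, const int avail[RES_KINDS], int *unit)
{
    assert(n_ops >= 0);
    for (int i = 0; i < n_ops; i++) {
        /* Count earlier operations competing for the same kind in this cycle;
         * they already hold units 0 .. used-1. */
        int used = 0;
        for (int j = 0; j < i; j++)
            if (ops[j].kind == ops[i].kind && ops[j].cycle == ops[i].cycle)
                used++;
        if (used >= avail[ops[i].kind]) return -1;
        unit[i] = used;
    }
    return 0;
}
```

With 2 ALUs and 2 multipliers and a hypothetical schedule of eight operations (two ALU operations and one multiply in cycle 1, one ALU operation and two multiplies in cycle 2, one ALU operation each in cycles 3 and 4), the units come out as ALU0, ALU1, MUL0, ALU0, MUL0, MUL1, ALU0, ALU0.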
ENG6530 Reconfigurable Computing Systems Xilinx Vivado High Level Synthesis (HLS) Or AutoESL ENG6530 RCS
AutoESL or Vivado HLS High-Level Synthesis: HLS • High-Level Synthesis • Creates an RTL implementation from C level source code • Extracts control and dataflow from the source code • Implements the design based on defaults and user applied directives • Many implementations are possible from the same source description • Smaller designs, faster designs, optimal designs • Enables manual design exploration [Figure: AutoESL takes C, C++ or SystemC source, a test bench and constraints/directives, and produces RTL (VHDL, Verilog, SystemC), an RTL wrapper and a script with constraints for RTL simulation and RTL synthesis]
Vivado HLS GUI Toolbar • The primary commands have toolbar buttons • Easy access for standard tasks • Button highlights when the option is available • E.g. cannot perform C/RTL simulation before synthesis Create a new Project Open Analysis Viewer Change Project Settings Compare Reports Open Reports Create a new Solution Export RTL Change Solution Settings Run C/RTL Cosimulation Run C Simulation Run C Synthesis ENG6530 RCS
Design Exploration with Directives … loop: for (i=3;i>=0;i--) { if (i==0) { acc+=x*c[0]; shift_reg[0]=x; } else { shift_reg[i]=shift_reg[i-1]; acc+=shift_reg[i]*c[i]; } } …. One body of code: Many hardware outcomes Before we get into details, let’s look under the hood …. • The same hardware is used for each iteration of the loop: • Small area • Long latency • Low throughput • Different hardware is used for each iteration of the loop: • Higher area • Short latency • Better throughput • Different iterations are executed concurrently: • Higher area • Short latency • Best throughput
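A sketch of how the loop could sit inside a complete function with a directive attached. The function name fir, the TAPS constant, the static shift register and the int data types are assumptions, since the slide only shows the loop body. UNROLL and PIPELINE are Vivado HLS pragmas, but which directive produces which of the three outcomes above is an interpretation, not something the slide states. An ordinary C compiler ignores the pragma, so the function can be checked in software first.

```c
#include <assert.h>
#include <stddef.h>

#define TAPS 4  /* assumed filter length; the slide's loop runs i = 3..0 */

/* One output sample of a 4-tap FIR filter per call. */
int fir(int x, const int *c)
{
    static int shift_reg[TAPS];  /* delay line, kept between calls */
    int acc = 0;

    assert(c != NULL);
loop:
    for (int i = TAPS - 1; i >= 0; i--) {
        /* No directive: one shared multiplier and adder reused every iteration.
         * UNROLL (below): separate hardware for each iteration.
         * Replacing it with "#pragma HLS PIPELINE" asks the tool to overlap
         * successive iterations instead. */
#pragma HLS UNROLL
        if (i == 0) {
            acc += x * c[0];
            shift_reg[0] = x;
        } else {
            shift_reg[i] = shift_reg[i - 1];
            acc += shift_reg[i] * c[i];
        }
    }
    return acc;
}
```

Feeding a single 1 followed by zeros returns the coefficients one per call, which is a quick software check before trying different directives on the same source.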
Analysis Perspective • Perspective for design analysis • Allows interactive analysis