Some Trends in High-level Synthesis Research Tools
Tanguy Risset
Compsys, LIP, ENS-Lyon
http://www.ens-lyon.fr/COMPSYS
Outline
• Context: Why high-level synthesis?
• HLS hard problems
• Some solutions in existing tools
• Some on-going projects
Context: Embedded Computing Systems design
• A SoC or MPSoC for multimedia applications will soon include:
  • A network on chip
  • Dozens of initiators (CPU, DMA, …)
  • Mbytes of code
  • Operating systems
  • Shared-memory coherency protocols
  • …
• SoC design problems:
  • Time to market
  • Design space exploration
  • Software complexity
Some envisaged solutions
• Time to market
  • IP re-use
  • High-level design
• Design space exploration
  • Fast prototyping and performance evaluation, refinement methodology (specification, algorithm, TLM, CABA)
• Software complexity
  • Tools for embedded code generation / embedded OS
• High-level synthesis is only a small part of the « high-level design » process
Definition of High Level Synthesis
• HLS: generates a register-transfer level (RTL) description from a behavioral specification, in an automatic or semi-automatic way.
• Input:
  • A behavioral specification
  • Design constraints
  • A library of available RTL components
• Output:
  • An RTL description
  • Performance evaluations
Refinement: from algorithm to hardware
[Figure: refinement flow from algorithm to hardware. Labels: system application design (Matlab, C; algorithm domain, algorithmic exploration); SoC platform design (abstract architecture, Transaction-Level Modeling, virtual prototype, SoC intermediate representation); IP block design (Architecture Description Language, block specification, block implementation; RTL synthesis, VHDL, Verilog).]
Abstraction levels for HLS
• AL = Algorithm
  • prior to HW/SW partition
• TLM = Transaction-Level Model
  • after HW/SW partition
  • models bit-true behavior, register bank, data transfers, system synchronisation
  • no timing needed
• T-TLM = Timed TLM (also PVT)
  • TLM + timing annotation
  • refined communication model
• CABA = Cycle Accurate-Bit Accurate
  • models state at each clock edge
• RT = Register Transfer (ASIC flow entry point)
  • synthesisable model
Pros and cons
• « Traditional » motivations:
  • Fast design
  • Safe design: formal refinement approach
  • « Must be used » to cope with Moore's law
• But!
  • Commercial tools are not there yet
  • A new tool is a big investment
  • Designers have managed without it
New motivations?
• IP re-use
  • Slightly change design parameters to re-use an IP
• New target technologies and languages (FPGA, SystemC, etc.)
  • Tools can easily re-target the designs
• CAD tool companies are investing a lot in « high level-like » synthesis tools
  • Monet, Behavioral Compiler, VCC, …
• Technological advantage
  • Traditional RTL design will be offshored to Asia
Outline
• Context: Why high-level synthesis?
• HLS hard problems
• Some solutions in existing tools
• Some on-going projects
HLS Hard Problems
• Huge design space
  • Complex design space exploration
  • Multi-criteria optimization techniques
• Integration into a design environment
  • Lack of a standard interchange format
  • SoC simulation time is a crucial issue
• Acceptance by the designers
  • Find a language common to SoC designers and tool designers
• Refinement technical problems
  • (detailed hereafter)
HLS technical problems
• Compilation occurs when the target architecture is precisely known
• In HLS, the target architecture is only partially specified. Examples:
  • Data-flow architecture / systolic arrays: pure RTL description
  • FSM + data path: closer to a processor description
• HLS technical problems:
  • Initial specification format / language
  • Specification refinement: fixed-point arithmetic
  • Scheduling/mapping refinement: resource constraints
  • Technological mapping refinement
Initial specification format
• Restrictions on the expressivity of the input language are necessary
• … but designers hate new languages
• C-like languages (Handel-C, Silicon-C, Hardware-C, etc.) are actually hardware description languages
• Main problems:
  • How to express parallelism/sequentiality
    • Data-flow, CSP-like, process network, event-driven
  • How to express both algorithmic and RTL descriptions
  • How much expressivity
    • Dynamic control, loops
  • How to introduce constraints/hints
Fixed point arithmetic
• Problem: translate a floating-point computation into a fixed-point computation
• Most tools start from an initial fixed-point specification found by extensive simulation.
• Automatic techniques do not handle loops
• For signal processing applications, signal processing theory can help (the transfer function is used to compute the signal-to-noise ratio).
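To make the floating-point to fixed-point translation concrete, here is a minimal C sketch of a Q-format conversion and multiply. The helper names (`to_fixed`, `to_real`, `fixed_mul`) and the choice of a 32-bit word are assumptions made for illustration; they do not come from any of the tools discussed here.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a real value to signed fixed point with `frac_bits` fractional
 * bits, rounding to nearest. Hypothetical helper; the caller must make
 * sure the scaled value fits in 32 bits. */
int32_t to_fixed(double x, int frac_bits)
{
    assert(frac_bits >= 0 && frac_bits < 31);
    double scaled = x * (double)((int64_t)1 << frac_bits);
    return (int32_t)(scaled + (scaled >= 0.0 ? 0.5 : -0.5));
}

/* Convert back to a real value, e.g. to measure the quantization error. */
double to_real(int32_t q, int frac_bits)
{
    assert(frac_bits >= 0 && frac_bits < 31);
    return (double)q / (double)((int64_t)1 << frac_bits);
}

/* Fixed-point multiply: the 64-bit product carries 2*frac_bits fractional
 * bits, so it is divided by 2^frac_bits to return to the operand format.
 * Integer division truncates toward zero. */
int32_t fixed_mul(int32_t a, int32_t b, int frac_bits)
{
    assert(frac_bits >= 0 && frac_bits < 31);
    return (int32_t)(((int64_t)a * (int64_t)b) / ((int64_t)1 << frac_bits));
}
```

With 15 fractional bits, 0.5 is stored as 16384 and `fixed_mul(16384, 16384, 15)` gives 8192, which is 0.25. Choosing `frac_bits` for each variable so that the output noise stays acceptable is the part that tools usually settle by simulation.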
Scheduling/Mapping
• For a « basic block », resource-constrained scheduling is NP-hard, but widely studied.
• Computations
  • Currently, two ways to handle loops:
    • Unroll them
    • Keep them sequential
  • Other solutions:
    • Use software pipelining theory
    • Use the polyhedral model
• Memory and communication
  • Memory mapping is usually strongly guided by the user
    • Highly active research field (Catthoor, Darte)
  • Communication refinement is also an important issue
    • Highly dependent on the chosen computation model (Gajski, Kienhuis)
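Since the exact problem is NP-hard, tools typically rely on heuristics. The sketch below is a plain greedy list scheduler for one basic block, written under strong simplifying assumptions: every operation takes one cycle, all operators are identical, and ready operations are picked in index order rather than by a critical-path priority. The names `list_schedule` and `MAX_OPS` are illustrative only.

```c
#include <assert.h>

#define MAX_OPS 16

/* Greedy resource-constrained list scheduling of a basic block.
 * dep[j][i] != 0 means operation j must complete before operation i starts.
 * Each operation takes one cycle on one of `units` identical operators.
 * Fills start[i] with the cycle assigned to operation i and returns the
 * schedule length in cycles. Assumes the dependence graph is acyclic. */
int list_schedule(int n, int dep[MAX_OPS][MAX_OPS], int units, int start[MAX_OPS])
{
    assert(n >= 0 && n <= MAX_OPS && units > 0);
    int done = 0, cycle = 0;
    for (int i = 0; i < n; i++) start[i] = -1;

    while (done < n) {
        int used = 0;                              /* operators busy this cycle */
        for (int i = 0; i < n && used < units; i++) {
            if (start[i] >= 0) continue;           /* already scheduled */
            int ready = 1;
            for (int j = 0; j < n; j++) {
                /* every predecessor must sit in a strictly earlier cycle */
                if (dep[j][i] && (start[j] < 0 || start[j] >= cycle)) ready = 0;
            }
            if (ready) { start[i] = cycle; used++; done++; }
        }
        cycle++;
    }
    return cycle;
}
```

For a 4-tap FIR data-flow graph (four multiplications feeding a chain of three additions), this heuristic yields 7 cycles with one operator, 5 with two and 4 with four, which shows the resource/latency trade-off that the user constraints steer.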
Technological mapping refinement
• Fine technological mappings are very target-dependent
  • Predefined libraries are not precise enough
    • Delays on wires
    • Power consumption
  • VLSI designers' « tricks » are difficult to integrate in tools
• Sub-micron technology constraints are changing too fast for high-level tools
  • Cross talk
  • Capacitance
Outline
• Context: Why high-level synthesis?
• HLS hard problems
• Some solutions in existing tools
• Some on-going projects
Some solutions in existing tools
• Digital signal processing circuits:
  • Gaut: http://lester.univ-ubs.fr:8080
    • Source: signal processing (one infinite loop)
    • Target: RTL + FSM
• FSM + data path
  • Ugh: http://www-asim.lip6.fr/recherche/disydent/
    • Source: restricted C
    • Target: FSM + data path
• Regular computation and polyhedral model
  • MMAlpha: http://www.irisa.fr/cosi/ALPHA/
    • Source: functional specification
    • Target: systolic-like architectures
GAUT: Génération Automatique d'Unité de Traitement (automatic generation of processing units)
• Developed first at LASTI (Lannion) and then at LESTER (Lorient): free
• Generates an RTL description from a behavioral description for signal processing algorithms
• Kernel technology: highly optimized resource-constrained scheduling
• Inputs are
  • a behavioral VHDL description (one process repeated infinitely)
  • libraries of pre-characterized operators
  • some design constraints
• Outputs are
  • a synthesizable RTL VHDL description (data path, memory, and communication units)
  • a Gantt chart for I/O specification
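Gaut design flow
[Figure: Gaut design flow. Inputs: behavioral VHDL description, operator library, and user constraints (latency, clock frequency, operators, allocation, etc.). Compiling (analysis, loop unrolling) produces the GC graph; synthesis (selection, scheduling, mapping) produces the RTL description (data path + control) and the memory and I/O specifications. File labels in the figure: .src, .lib, .gc (graph), .vhd (RTL description), .mem (memory and I/O specifications).]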
Gaut: VHDL input code
• Sequential instructions in one single process (no clock, no reset, no sensitivity list)

    ENTITY fir IS
      PORT (xn: IN INTEGER; yn: OUT INTEGER);
    END fir;

    ARCHITECTURE behavioral OF fir IS
      ...
    BEGIN
      PROCESS
        VARIABLE H, x: vecteur;
        VARIABLE tmp: INTEGER;
        VARIABLE i: CONTROL;
      BEGIN
        tmp := xn * H(0);
        FOR i IN 1 TO N-1 LOOP
          tmp := tmp + x(i) * H(i);
        END LOOP;
        yn <= tmp;
        FOR i IN N-1 DOWNTO 2 LOOP
          x(i) := x(i-1);
        END LOOP;
        x(1) := xn;
        WAIT FOR cadence;
      END PROCESS;
    END behavioral;
Gaut: input code
• Types
  • Bit, Boolean, Std_Logic, Integer (single size), Bit_Vector, Std_Logic_Vector
  • Arrays (to be inlined)
• Sequential instructions
  • Signal and variable assignments
  • Only one level of if
  • For and While loops (to be inlined)
  • Procedure calls (to be inlined)
  • Function calls corresponding to library elements
Gaut step 1: source code transformation
• Control dependence elimination
• Loop unrolling (shown for n = 4)

    Before:
      y(0) := x(0) * h(0);
      for i in 1 to n-1 loop
        y(i) := y(i-1) + x(i) * h(i);
      end loop;

    After:
      y(0) := x(0) * h(0);
      y(1) := y(1-1) + x(1) * h(1);
      y(2) := y(2-1) + x(2) * h(2);
      y(3) := y(3-1) + x(3) * h(3);

• Procedure inlining
• Static single assignment

    Before:            After:
      b := x + z;        b := x + z;
      a := b + c;        a := b + c;
      b := e + f;        b0001 := e + f;
      y := b;            y := b0001;
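The effect of unrolling combined with single assignment can be sketched in C as follows. This is only an illustration of the transformation, not Gaut's implementation (Gaut works on VHDL); the names `fir_rolled`, `fir_unrolled` and `FIR_TAPS` are made up for the example.

```c
#include <assert.h>

#define FIR_TAPS 4   /* known at compile time, so the loop can be fully unrolled */

/* Rolled form: the loop counter introduces control that a pure
 * data-flow scheduler would have to handle. */
int fir_rolled(const int x[FIR_TAPS], const int h[FIR_TAPS])
{
    assert(x && h);
    int y = x[0] * h[0];
    for (int i = 1; i < FIR_TAPS; i++)
        y = y + x[i] * h[i];
    return y;
}

/* Fully unrolled, single-assignment form: each value is written exactly
 * once, so every variable corresponds to one edge of the data-flow graph. */
int fir_unrolled(const int x[FIR_TAPS], const int h[FIR_TAPS])
{
    assert(x && h);
    int y0 = x[0] * h[0];
    int y1 = y0 + x[1] * h[1];
    int y2 = y1 + x[2] * h[2];
    int y3 = y2 + x[3] * h[3];
    return y3;
}
```

Both functions compute the same value; the unrolled one exposes four multiplications and three additions directly to the scheduler, at the price of code size growing with the number of taps.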
Gaut step 1: source code transformation
• Simple expression generation

    Before:            After:
      b := x + z * u;    tmp := z * u;
                         b := x + tmp;

• Constant propagation
• Generation of the GC graph (data-flow graph format of synchronous programming)
GAUT step 2: Scheduling/Mapping
• In addition to throughput and clock cycle, the user can give:
  • Resource constraints and mapping constraints
  • Memory constraints
  • I/O constraints
  • Optimization type
• The result is an architecture and Gantt charts
  • For computations
  • For I/O
  • For memory
Gaut step 3: memory and communication synthesis
• Optimizing memory layout and minimizing buses
[Figure: ASIC architecture with an I/O communication unit, control, data path and memory unit.]
Gaut: summary
• Advantages
  • Advanced development status (still a research tool)
  • User-guided synthesis
  • Open library
  • Active research team: memory optimization, communication synthesis
• Drawbacks
  • Loop flattening (complexity problem)
  • Predefined timing characteristics
  • Hard to get out of 1D signal processing
Ugh: User Guided High Level Synthesis
• Developed at LIP6 (Paris), as part of the Disydent project (Digital System Design Environment): open source
• Behavioral-level synthesis tool for control-dominated coprocessors
• Emphasis on precise timing estimation
• Kernel technology: resource-constrained scheduling and (GNU-like) compiler construction technology
• Inputs are
  • a C or VHDL behavioral description with KPN communication primitives
  • a draft data path
  • a cycle time constraint TC
• Outputs are
  • a synthesizable RTL VHDL model
  • a cycle-accurate simulation model
Coprocessor System Environment
[Figure: R3000 processor with ICache and DCache, PI-BUS with its bus controller unit, RAM, and the coprocessor attached through an M/S interface.]
UGH Structure
[Figure: UGH tool structure. Labels: Ugh C and draft data path (inputs); coarse grain scheduler (UGH-CGS); VHDL data path; synthesis + timing characterization with a cell library (Synopsys), a step that depends on the synthesis tool; annotations; CK; fine grain scheduler (UGH-FGS); FSM in VHDL and C; CABA simulation model of data path + FSM.]
Input 1: UGH-C C description

    #include <ughc.h>

    ugh_inChannel32 work2hcfa;
    ugh_inChannel32 work2hcfb;
    ugh_outChannel32 hcf2work;
    uint32 a, b;

    void hcf(void)
    {
      while (a != b)
        if (a < b) b = b - a;
        else       a = a - b;
    }

    int ugh_main()
    {
      while (1) {
        channelRead(work2hcfa, &a);
        channelRead(work2hcfb, &b);
        hcf();
        channelWrite(hcf2work, &a);
      }
    }

VHDL entity:

    library IEEE;
    use ieee.std_logic_arith.all;
    entity HCF is
      port (CK    : in bit;
            DINA  : in integer;
            READA : out bit;
            ROKA  : in bit;
            DINB  : in integer;
            READB : out bit;
            ROKB  : in bit;
            DOUT  : out integer;
            WRITE : out bit;
            WOK   : in bit);
    end HCF;
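The behavior of the `hcf` coprocessor can be checked on a workstation with a plain C version. In this sketch the UGH channel primitives are replaced by ordinary function arguments and a return value, which is an assumption made for testing and not part of UGH-C; the name `hcf_sub` is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Highest common factor by repeated subtraction, the same loop as hcf()
 * in the UGH-C description, with the channels replaced by arguments. */
uint32_t hcf_sub(uint32_t a, uint32_t b)
{
    /* If either operand is 0 the subtraction loop never terminates,
     * so the precondition is checked here. */
    assert(a != 0 && b != 0);
    while (a != b) {
        if (a < b) b = b - a;
        else       a = a - b;
    }
    return a;
}
```

For example, `hcf_sub(12, 18)` returns 6. The number of loop iterations depends on the data, which is why this kind of control-dominated code is scheduled as an FSM rather than unrolled.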
Input 2: draft data path

    model Hcf(sofifo hcf2work; sififo work2hcfa, work2hcfb)
    {
      DFFl a, b;
      SUB subst;
      subst.A = a.Q, b.Q;
      subst.B = a.Q, b.Q;
      a.D = subst.S, work2hcfa;
      b.D = subst.S, work2hcfb;
      hcf2work = subst.S;
    }

[Figure: draft data path with registers a and b (ports D, Q) fed by work2hcfa and work2hcfb, a subtractor Subst (inputs A, B; output S), and the output hcf2work.]
Output 2: FSM for control
[Figure: control FSM. State and signal labels: RESET, START, READY, WHILE, IF, S1, S2, ROKA, READA, ROKB, READB, WRITE, WOK.]
Ugh summary
• Advantages
  • Precise timing information
  • Multi-cycle operations
  • Almost a compiler approach (restricted target architecture)
  • Interfacing (integrated in a SoC design environment)
• Drawbacks
  • Development status (research tool)
  • Low-level information given by the user
  • Highly dependent on a commercial tool (Synopsys)
  • Dedicated to control-oriented applications
MMAlpha
• Developed at Irisa (Rennes): open source
• High-level synthesis of highly pipelined accelerators
• Kernel technology: polyhedral model and systolic design methodology
• Emphasis on loop transformations
• Input:
  • functional specification (Alpha language)
• Output:
  • RTL description of a systolic-like architecture (Alpha or VHDL)
MMAlpha design flow
[Figure: MMAlpha design flow. Labels: loop nest (For i=1:1:N, For j=1:1:N), Alpha, uniformization, scheduling, RTL derivation, generated C and VHDL, host, bus, FPGA.]
What is the polyhedral model?
• Abstracts a loop nest by the polyhedron described by the loop indices during execution of the loop
• Can be used for any index-based structure: memory (arrays), communications (accesses), etc.
• Example: convolution (FIR filter)

    for (i = N; i <= M; i++) {
      y[i] = 0;
      for (n = 0; n <= N-1; n++) {
        y[i] = y[i] + H[n] * x[i-n];
      }
    }
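As a small illustration, the iteration domain of the FIR loop nest is the set of integer points D = { (i, n) | N <= i <= M, 0 <= n <= N-1 }. The C sketch below describes that domain by its affine inequalities and runs the loop nest over it; the function names (`in_domain`, `domain_size`, `fir`) are assumptions for this example and are unrelated to MMAlpha's own interface.

```c
#include <assert.h>

/* Membership test for the FIR iteration domain, using only the affine
 * inequalities that define the polyhedron. */
int in_domain(int i, int n, int N, int M)
{
    return i >= N && i <= M && n >= 0 && n <= N - 1;
}

/* Number of integer points in the domain: (M - N + 1) * N for M >= N >= 1. */
long domain_size(int N, int M)
{
    if (N < 1 || M < N) return 0;
    return (long)(M - N + 1) * (long)N;
}

/* FIR filter over the domain: y[i] = sum over n of H[n] * x[i - n].
 * x and y must hold at least M + 1 elements, H must hold N elements. */
void fir(const int *x, const int *H, int *y, int N, int M)
{
    assert(N >= 1 && x && H && y);
    for (int i = N; i <= M; i++) {
        y[i] = 0;
        for (int n = 0; n <= N - 1; n++)
            y[i] = y[i] + H[n] * x[i - n];
    }
}
```

The point of the abstraction is that N and M stay symbolic: the domain, and any schedule derived from it, is described by the same inequalities whatever the filter size.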
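FIR: iteration space
[Figure: iteration space of the FIR loop nest in the (i, n) plane, labelled with inputs x(N), x(N+1), coefficients H(0) to H(N-1), and outputs y(N), y(N+1).]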
FIR polyhedral representation (MMAlpha input language)
[Figure: the same (i, n) iteration space, labelled with x(N), x(N+1), H(0), H(N-1), y(N), y(N+1).]
MMAlpha polyhedral scheduling
[Figure: the (i, n) iteration space annotated with schedule fronts t=4, 5, 6.]
MMAlpha space-time transformation
[Figure: the iteration space redrawn in (t, p) coordinates, with fronts t=4, 5, 6 and labels x(N), x(N+1), H(0), H(N-1), y(N).]
MMAlpha mapping
[Figure: mapping of the (t, p) space onto an array of cells, with data streams x, H and y; labels x(N), x(N+1), H(0), H(N-1), y(N), t=4, 5, 6.]
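The scheduling, space-time transformation and mapping steps can be illustrated on the FIR domain with a small C check. The affine functions used here, t(i, n) = i + n for the schedule and p(i, n) = n for the allocation, are an assumption chosen for illustration; the slides do not state which functions MMAlpha actually derives. The names `SpaceTime`, `map_point` and `mapping_is_valid` are likewise hypothetical.

```c
#include <assert.h>

typedef struct { int t; int p; } SpaceTime;

/* Assumed space-time transformation for the FIR domain:
 *   t = i + n   time step at which point (i, n) is computed
 *   p = n       processor (cell) index, one cell per coefficient H[n] */
SpaceTime map_point(int i, int n)
{
    SpaceTime st = { i + n, n };
    return st;
}

/* Checks, over the domain N <= i <= M, 0 <= n <= N-1, that
 *  (1) the accumulation dependence (i, n-1) -> (i, n) moves forward in time,
 *  (2) no two distinct points are mapped to the same (t, p) pair, i.e. no
 *      cell is asked to perform two computations in the same time step.
 * Returns 1 if both properties hold, 0 otherwise. */
int mapping_is_valid(int N, int M)
{
    assert(N >= 1);
    for (int i = N; i <= M; i++) {
        for (int n = 0; n <= N - 1; n++) {
            SpaceTime a = map_point(i, n);
            if (n > 0 && map_point(i, n - 1).t >= a.t) return 0;
            for (int i2 = N; i2 <= M; i2++) {
                for (int n2 = 0; n2 <= N - 1; n2++) {
                    SpaceTime b = map_point(i2, n2);
                    if ((i2 != i || n2 != n) && a.t == b.t && a.p == b.p)
                        return 0;
                }
            }
        }
    }
    return 1;
}
```

Under this assumed mapping, each partial sum of y(i) moves from cell n-1 to cell n in one time step, which gives the pipelined, systolic-like structure the figures describe. A real tool proves validity symbolically for all N and M instead of enumerating points.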
MMAlpha current features
• Tool box for designers:
  • Powerful analysis tools
  • Pipelining, change of basis, multi-dimensional scheduling, control signal generation
  • Code generation (C, VHDL)
  • Hierarchical design methodology
• Work in progress:
  • Resource-constrained scheduling (extension to Z-polyhedra)
  • Multi-dimensional scheduling and memory synthesis
MMAlpha summary
• Advantages
  • Design tool integrating loop transformations
  • Parameterised design (N, the size of the filter, is not fixed until VHDL generation)
  • Formal approach for refinement (functional to operational)
  • A real language that syntactically captures HLS input restrictions
• Drawbacks
  • Does not yet handle resource constraints
  • A language (Alpha) and design methodology very different from designers' habits
  • Implementation status (research tool)
Some design results
• Ugh compares IDCT with CoWare and Gaut, but the results are highly dependent upon design parameters
• MMAlpha demonstrates a real implementation on an FPGA co-processor board (DLMS algorithm)
Outline
• Context: Why high-level synthesis?
• HLS hard problems
• Some solutions in existing tools
• Conclusion and on-going projects
HLS conclusion
• HLS tools are not mature enough to produce the famous « C-to-VHDL » magic tool
• Most tool designers agree that a highly « user guided » approach is mandatory
• CAD tool companies are still actively developing tools (Mentor: Catapult-C, CoWare: Cocentric, …)
• Some progress has been made
  • Domain-specific constraints are more clearly identified (control oriented or data flow)
  • Interfacing is studied together with the synthesis
  • Fast simulation is an important issue addressed by HLS tools