Asynchronous Circuit Design, GALS Systems, Synchronous and GALS NoCs - DAAD Workshop, Nis, Serbia, July 2009 - Dr. Miloš Krstić
Overview • Motivation • Problems of the synchronous design • Asynchronous circuit design • GALS - State of the Art • Synchronous and GALS NoCs
Challenges with Synchronous Design • Most digital systems today operate synchronously. • However, the complexity of electronic systems keeps growing enormously.
Classical Synchronous Paradigm • Digital circuits are usually designed to work synchronously [Figure: registers R1-R4 and combinational logic blocks CL3, CL4, all driven by a common clock CLK]
Synchronous communication • Clock edges determine the time instants at which data must be sampled • Data wires may glitch between clock edges (setup/hold times must be satisfied) • Data are transmitted at a fixed rate, set by the clock frequency [Figure: clocked data waveform carrying the bit sequence 1 1 0 0 1 0]
Problems with Synchronous Design • As clock speeds increase, clock distribution becomes difficult: clock skew must be minimized. There is an upper limit to clock speed that depends on the material properties of the device. It is not possible to propagate a signal from one side of the chip to the other within a single clock cycle • Worst-case performance. • Sensitive to variations in process, voltage, and temperature. • Not modular (fixed clock rate: poor match for reusability of components). • The clock burns a large fraction of chip power (~40-70%) • Synchronization failure.
What is Asynchronous Design? (I) • Synchronization is achieved without a global clock. • Asynchronous communication: handshake mechanisms [Figure: sender and receiver connected by request, acknowledge, and data signals]
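What is Asynchronous Design? (II) [Figure: example with registers R1-R4 and combinational logic CL3, CL4; each register has a handshake controller (CTL), and neighbouring controllers are connected by REQ/ACK links (channels) bundled with DATA, along which tokens flow]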
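Asynchronous design styles (I) • Bundled data (single rail), 4-phase protocol • This style is very widely used because its asynchronous controllers are very small and fast [Figure: sender and receiver connected by REQ, ACK, and n DATA wires; timing diagram of the 4-phase protocol, in which the REQ/ACK sequence is always the same while the data-validity window has some variations]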
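As a rough illustration of the 4-phase bundled-data protocol, the following Python sketch walks one channel through the return-to-zero sequence REQ+, ACK+, REQ-, ACK- for each data word. The `Channel` class and `transfer_four_phase` function are hypothetical names introduced here; wire delays and the receiver's internal logic are abstracted away.

```python
class Channel:
    """Hypothetical model of a bundled-data channel: data wires plus REQ and ACK."""

    def __init__(self):
        self.req = 0
        self.ack = 0
        self.data = None


def transfer_four_phase(channel, words):
    """Send each word with one full 4-phase handshake; return (received, trace)."""
    received, trace = [], []
    for word in words:
        channel.data = word              # data must be stable before REQ rises
        channel.req = 1
        trace.append("REQ+")
        received.append(channel.data)    # receiver latches data while REQ is high
        channel.ack = 1
        trace.append("ACK+")
        channel.req = 0                  # return-to-zero half of the cycle
        trace.append("REQ-")
        channel.ack = 0
        trace.append("ACK-")
    return received, trace


channel = Channel()
received, trace = transfer_four_phase(channel, [5, 9])
print(received)
print(trace)
```

Each word costs four control transitions, and both REQ and ACK are back at 0 between transfers, which is what keeps the controllers simple.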
Bundled data • Validity signal • Similar to an aperiodic local clock • n-bit data communication requires n+1 wires • Data wires may glitch when not valid [Figure: data waveform carrying the bit sequence 1 1 0 0 1 0 together with its validity signal]
Asynchronous design styles (II) • Bundled data (single rail), 2-phase protocol • This style looks simpler and faster than 4-phase, but the controllers are more complex [Figure: sender and receiver connected by REQ, ACK, and n DATA wires; timing diagram of the 2-phase protocol]
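A minimal sketch of the 2-phase idea, assuming the usual transition-signalling reading: every transition on REQ (rising or falling) announces a new data word, and every transition on ACK acknowledges it. The function name `transfer_two_phase` is hypothetical, and delays are again ignored.

```python
def transfer_two_phase(words):
    """Send each word with one 2-phase handshake; return (received, trace, wires)."""
    req = ack = 0
    received, trace = [], []
    for word in words:
        data = word                      # data stable before the REQ event
        req ^= 1                         # any REQ transition is a request
        trace.append("REQ+" if req else "REQ-")
        # A request is pending exactly while REQ and ACK differ.
        if req != ack:
            received.append(data)
        ack ^= 1                         # any ACK transition is an acknowledge
        trace.append("ACK+" if ack else "ACK-")
    return received, trace, (req, ack)


received, trace, wires = transfer_two_phase([5, 9, 7])
print(received)
print(trace)
print(wires)
```

Only two control transitions are needed per word, but the controller must react to both signal edges, which is one reason 2-phase controllers tend to be more complex.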
Asynchronous design styles (III) • 4-phase dual-rail protocol • Each data bit is encoded onto 2 wires • Allows generation of delay-insensitive circuits • Introduces a very large area overhead
VALUE       d.t  d.f
EMPTY        0    0
VALID “0”    0    1
VALID “1”    1    0
Not used     1    1
[Figure: sender and receiver connected by 2n DATA wires and ACK; the data alternate between EMPTY and VALID codewords, each acknowledged on ACK]
Dual rail • Two wires per bit • “00” = spacer, “01” = 0, “10” = 1 [Figure: dual-rail waveform with spacers between valid codewords]
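The dual-rail code can be sketched in a few lines of Python. The pair order (d.t, d.f) follows the encoding table of the 4-phase dual-rail slide; the function names and the least-significant-bit-first word order are assumptions made for this illustration.

```python
EMPTY = (0, 0)  # spacer codeword


def encode_bit(bit):
    """Return the (d.t, d.f) pair for a valid 0 or 1."""
    return (1, 0) if bit else (0, 1)


def decode_bit(pair):
    """Return 0 or 1 for a valid pair, None for the spacer; (1, 1) is illegal."""
    if pair == EMPTY:
        return None
    if pair == (1, 1):
        raise ValueError("codeword (1, 1) is not used in dual-rail encoding")
    return 1 if pair == (1, 0) else 0


def encode_word(value, width):
    """Encode an integer as a list of dual-rail pairs, least significant bit first."""
    return [encode_bit((value >> i) & 1) for i in range(width)]


def decode_word(pairs):
    """Return the integer value, or None if any bit is still a spacer."""
    bits = [decode_bit(p) for p in pairs]
    if any(b is None for b in bits):
        return None
    return sum(b << i for i, b in enumerate(bits))


word = encode_word(5, 3)
print(word)
print(decode_word(word))
```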
Asynchronous modules • Signaling protocol: reqin+ start+ [computation] done+ reqout+ ackout+ ackin+ reqin- start- [reset] done- reqout- ackout- ackin- [Figure: DATA PATH block with Data IN, Data OUT, start, and done; CONTROL block with req in, ack in, req out, and ack out]
Asynchronous components • Asynchronous design requires additional components and special logic • Such components are not available in standard synchronous design kits • The critical components are the C-element and the mutex
Muller C-element
a  b  z
0  0  0
0  1  no change
1  0  no change
1  1  1
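The truth table of the C-element translates directly into a small state-holding model. This is a behavioural sketch only (the class name `CElement` and the initial output value 0 are assumptions), not a gate-level implementation.

```python
class CElement:
    """Behavioural Muller C-element: output follows the inputs only when they agree."""

    def __init__(self, initial=0):
        self.z = initial

    def update(self, a, b):
        if a == b:          # both 0 -> z = 0, both 1 -> z = 1
            self.z = a
        return self.z       # inputs differ -> no change


c = CElement()
outputs = [c.update(a, b) for a, b in [(0, 1), (1, 1), (1, 0), (0, 0)]]
print(outputs)
```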
Mutual Exclusion element • ME prevents multiple events from propagating at the same time • ME is used for arbitration
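A behavioural sketch of a two-input mutual exclusion element, assuming 4-phase requests: at most one grant is high at any time, and a grant is held until its request is withdrawn. A real ME element resolves simultaneous requests through an analog metastability filter; in this hypothetical model the tie is simply broken in favour of request 1.

```python
class Mutex:
    """Hypothetical behavioural model of a 2-input mutual exclusion (ME) element."""

    def __init__(self):
        self.owner = None   # None, 1, or 2

    def update(self, r1, r2):
        requests = {1: r1, 2: r2}
        # The current owner keeps its grant until it withdraws its request.
        if self.owner is not None and not requests[self.owner]:
            self.owner = None
        # If the resource is free, grant one pending request (tie -> request 1).
        if self.owner is None:
            if r1:
                self.owner = 1
            elif r2:
                self.owner = 2
        return (int(self.owner == 1), int(self.owner == 2))


me = Mutex()
grants = [me.update(r1, r2) for r1, r2 in [(1, 0), (1, 1), (0, 1), (0, 0)]]
print(grants)
```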
Dual-rail logic • Dual-rail logic requires additional logic for each logical operation [Figure: dual-rail AND gate with inputs A.t, A.f, B.t, B.f and outputs C.t, C.f]
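To give a feel for the extra logic, here is a sketch of one common way to build a dual-rail AND gate: four C-elements, one per input combination, with the three "result is 0" outputs ORed together. This construction is an assumption chosen for illustration and is not necessarily the gate drawn on the slide; the class names are hypothetical.

```python
class CElement:
    """Behavioural Muller C-element (output changes only when inputs agree)."""

    def __init__(self):
        self.z = 0

    def update(self, a, b):
        if a == b:
            self.z = a
        return self.z


class DualRailAnd:
    """Dual-rail AND: inputs and output are (t, f) pairs, (0, 0) is the spacer."""

    def __init__(self):
        self.c_tt = CElement()  # A=1, B=1 -> C=1
        self.c_ff = CElement()  # A=0, B=0 -> C=0
        self.c_ft = CElement()  # A=0, B=1 -> C=0
        self.c_tf = CElement()  # A=1, B=0 -> C=0

    def update(self, a, b):
        a_t, a_f = a
        b_t, b_f = b
        c_t = self.c_tt.update(a_t, b_t)
        c_f = (self.c_ff.update(a_f, b_f)
               | self.c_ft.update(a_f, b_t)
               | self.c_tf.update(a_t, b_f))
        return (c_t, c_f)


gate = DualRailAnd()
steps = [
    ((0, 0), (0, 0)),  # both inputs empty
    ((1, 0), (0, 0)),  # only A valid: output must stay empty
    ((1, 0), (1, 0)),  # A=1, B=1: output becomes valid "1"
    ((0, 0), (1, 0)),  # A returns to empty first: output is held
    ((0, 0), (0, 0)),  # both empty: output returns to empty
    ((0, 1), (1, 0)),  # A=0, B=1: output becomes valid "0"
]
results = [gate.update(a, b) for a, b in steps]
print(results)
```

The output only becomes valid once both inputs are valid and only returns to the spacer once both inputs are empty, at the cost of four C-elements and an OR gate for a single AND.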
Completion detection (dual-rail) [Figure: completion detection tree; the per-bit validity signals are combined through a C-element into done]
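Dual-rail completion detection can be sketched as follows: each bit is valid when either of its rails is high, and `done` rises when all bits are valid and falls when all bits are empty. The tree of C-elements is abstracted here as a single n-input C-element, which is an assumption that holds when all bits move monotonically between empty and valid; the class name is hypothetical.

```python
class CompletionDetector:
    """Abstract dual-rail completion detector with hysteresis on `done`."""

    def __init__(self):
        self.done = 0

    def update(self, word):
        """`word` is a list of (d.t, d.f) pairs; returns the current `done`."""
        valid = [t | f for t, f in word]   # per-bit OR of the two rails
        if all(valid):
            self.done = 1                  # every bit carries a valid codeword
        elif not any(valid):
            self.done = 0                  # every bit is back at the spacer
        return self.done                   # otherwise keep the previous value


detector = CompletionDetector()
sequence = [
    [(0, 0), (0, 0)],   # all empty
    [(1, 0), (0, 0)],   # partially valid
    [(1, 0), (0, 1)],   # all valid
    [(0, 0), (0, 1)],   # partially empty
    [(0, 0), (0, 0)],   # all empty
]
done_values = [detector.update(word) for word in sequence]
print(done_values)
```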
Completion detection (bundled-data) • Conventional logic + matched delay [Figure: logic block with a matched delay element between start and done]
Muller pipeline • “The” delay-insensitive handshake machine • C[i] accepts 1/0 from C[i-1] only if C[i+1]=0/1 • Think of 1010101.. as waves: 10 10 10 1.. • The C-elements propagate waves precisely • Timing depends on local delays and may vary along the pipe • If RIGHT is quiet, the pipe will fill and stall
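The rule "C[i] accepts 1/0 from C[i-1] only if C[i+1]=0/1" and the stall behaviour can be tried out with a small simulation. This is a sketch under simplifying assumptions: all stages are updated in lock-step sweeps (real gates switch at their own pace), the left environment issues a new request whenever the first stage has acknowledged the previous one, and the right environment never acknowledges. The function names are hypothetical.

```python
def sweep(stages, left, right):
    """One lock-step update: stage i copies its predecessor if it differs from its successor."""
    n = len(stages)
    updated = []
    for i in range(n):
        pred = left if i == 0 else stages[i - 1]
        succ = right if i == n - 1 else stages[i + 1]
        # C-element with inputs pred and NOT succ: fires only when they agree.
        updated.append(pred if pred != succ else stages[i])
    return updated


def run_until_stall(n_stages, max_sweeps=100):
    """Feed the pipe from the left with a silent right side until nothing changes."""
    stages = [0] * n_stages
    left, right = 1, 0                    # first request pending, RIGHT is quiet
    for _ in range(max_sweeps):
        new_stages = sweep(stages, left, right)
        new_left = 1 - stages[0]          # environment toggles after each acknowledge
        if new_stages == stages and new_left == left:
            return stages, left
        stages, left = new_stages, new_left
    raise RuntimeError("pipeline did not settle")


final_stages, final_left = run_until_stall(4)
print(final_stages, final_left)
```

With four stages the pipe settles into an alternating pattern and the pending request on the left can no longer enter, which is the "fill and stall" case from the last bullet.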
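Micropipelines (Sutherland 89) [Figure: pipeline of latches (L) separated by logic blocks; a chain of C-elements with matched delays forms the control path, with Rin/Ain on the input side and Rout/Aout on the output side]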
Abstract Pipeline • Bubbles • Tokens: Valid (0 or 1, who cares) and Empty tokens [Figure: pipeline state E V V E E]
Abstract Rings • 3 stages, 1 bubble: • 3 steps for token round • 6 steps to cycle [Figure: successive states of a 3-stage ring holding valid (V) and empty (E) tokens and one bubble]
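The step counts for the 3-stage ring can be explored with a very abstract model. The sketch below assumes one valid token (V), one empty token (E), and one bubble (written `_`), and assumes that in each step the token directly behind the bubble moves into it. This encoding is an assumption made for illustration and may differ from the drawing on the slide.

```python
def ring_step(ring):
    """Move the token just behind the bubble forward into the bubble."""
    n = len(ring)
    bubble = ring.index("_")
    source = (bubble - 1) % n            # stage feeding the bubble
    new_ring = list(ring)
    new_ring[bubble] = ring[source]
    new_ring[source] = "_"
    return new_ring


def run(ring, steps):
    """Return the list of ring states, starting with the initial one."""
    states = [list(ring)]
    for _ in range(steps):
        states.append(ring_step(states[-1]))
    return states


states = run(["V", "E", "_"], 6)
for state in states:
    print(" ".join(state))
```

In this model the bubble is back at its starting stage after 3 steps, and the complete ring state repeats after 6 steps.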
Building Blocks • Latch • Source • Sink • Fork • Join (wait for all) • Merge (wait for one) • MUX • DEMUX • Function Block (Join; CL; Fork)
Describing Asynchronous Circuits - STGs [Figure: signal transition graph with transitions A+, B+, A-, B-; A is an input, B is an output]
Control specification - C element [Figure: C-element with inputs A, B and output C; STG with transitions A+, B+, C+, A-, B-, C-]
Control specification - FIFO Controller [Figure: FIFO controller with ports Ri, Ai, Ro, Ao, built from C-elements; STG with transitions Ri+, Ro+, Ao+, Ai+, Ri-, Ro-, Ao-, Ai-]
A simple filter: specification
y := 0;
loop
  x := READ (IN);
  WRITE (OUT, (x+y)/2);
  y := x;
end loop
[Figure: filter block with input channel IN (Rin, Ain) and output channel OUT (Rout, Aout)]
J. Cortadella - Introduction to asynchronous circuit design: specification and synthesis
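The filter specification is a two-sample moving average. A direct Python transcription, assuming ordinary (non-integer) division and modelling the IN and OUT channels as an iterable and a generator, could look like this; the function name is hypothetical.

```python
def averaging_filter(samples):
    """Yield (x + y) / 2 for each input x, where y is the previous input (initially 0)."""
    y = 0
    for x in samples:          # x := READ(IN)
        yield (x + y) / 2      # WRITE(OUT, (x + y) / 2)
        y = x                  # remember the current sample for the next round


outputs = list(averaging_filter([2, 4, 6]))
print(outputs)
```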
A simple filter: block diagram • x and y are level-sensitive latches (transparent when R=1) • + is a bundled-data adder (matched delay between Ra and Aa) • Rin indicates the validity of IN • After Ain+ the environment is allowed to change IN • (Rout,Aout) control a level-sensitive latch at the output [Figure: latches x and y, adder +, and a control block with the handshake pairs (Rin, Ain), (Rx, Ax), (Ry, Ay), (Ra, Aa), (Rout, Aout)]
A simple filter: control spec. [Figure: the filter block diagram together with the STG of the control block, over the rising and falling transitions of Rin, Ain, Rx, Ax, Ry, Ay, Ra, Aa, Rout, Aout]
A simple filter: control impl. [Figure: control block implemented with a C-element and wiring among the handshake signals, shown next to its STG]
Taking delays into account • Delay assumptions: • Environment: 3 time units • Gates: 1 time unit
events: x+  x’-  y+  z+  z’-  x-  x’+  z-  z’+  y-
time:   3   4    5   6   7    9   10   12  13   14
[Figure: circuit with signals x, x’, y, z, z’ and its STG over x+, y+, z+, x-, y-, z-]
Taking delays into account • Delay assumptions: unbounded delays
events: x+  x’-  y+  z+  x-  x’+  y-  (failure!)
time:   3   4    5   6   9   10   11
[Figure: the same circuit and STG, with one gate marked as very slow]
Gate vs wire delay models • Gate delay model: delays in gates, no delays in wires • Wire delay model: delays in gates and wires
Delay models for async. circuits • Bounded delays (BD): realistic for gates and wires. • Technology mapping is easy, verification is difficult • Speed independent (SI): Unbounded (pessimistic) delays for gates and “negligible” (optimistic) delays for wires. • Technology mapping is more difficult, verification is easy • Delay insensitive (DI): Unbounded (pessimistic) delays for gates and wires. • DI class (built out of basic gates) is almost empty • Quasi-delay insensitive (QDI): Delay insensitive except for critical wire forks (isochronic forks). • Formally, it is the same as speed independent • In practice, different synthesis strategies are used [Figure: diagram relating the BD, SI, QDI, and DI circuit classes]
Desynchronization - concept • Start with synchronous design • Replace clock with local handshake • Use standard CAD tools • Does not change datapath • Guaranteed correctness * Eyal Friedman, Desynchronization - From Synchronous to Asynchronous design, Seminar in VLSI Architecture, Technion, Israel, Spring 2008
Desynchronization - flow steps • Main assumptions: • Normal combinatorial logic and DFFs • Single clock • Single clock edge
Desynchronization flow step #1 • Replace each DFF by a pair of master and slave (M+S) latches
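Step #1 relies on the fact that an edge-triggered flip-flop behaves like two level-sensitive latches in series. A behavioural sketch, assuming the master is transparent while the clock is low and the slave while it is high (the class names are hypothetical):

```python
class Latch:
    """Level-sensitive latch: transparent while enabled, holds otherwise."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


class MasterSlaveFlipFlop:
    """Rising-edge flip-flop modelled as a master latch followed by a slave latch."""

    def __init__(self):
        self.master = Latch()
        self.slave = Latch()

    def update(self, d, clk):
        m = self.master.update(d, enable=not clk)   # master follows d while clk = 0
        return self.slave.update(m, enable=bool(clk))  # slave follows master while clk = 1


ff = MasterSlaveFlipFlop()
stimulus = [(1, 0), (0, 1), (0, 1), (0, 0), (0, 1)]   # (d, clk) pairs
q_values = [ff.update(d, clk) for d, clk in stimulus]
print(q_values)
```

The output only takes the value that was present on d just before the clock rose; in the desynchronized circuit the two latch enables come from handshake controllers instead of the clock phases.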
Desynchronization flow step #2 • Add matched delays • Respect the bundling assumption • Delay > Tpd (propagation delay) of the combinational logic (CL) • The delay serves as the completion signal
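The bundling constraint "Delay > Tpd of CL" amounts to a simple inequality check per logic block. The sketch below sizes a matched delay with a safety margin; the 10% default margin, the nanosecond unit, and the function names are assumptions for illustration, not values from the slides.

```python
def matched_delay(tpd_cl_ns, margin=0.1):
    """Return a delay that exceeds the worst-case logic delay by a relative margin."""
    if tpd_cl_ns < 0 or margin <= 0:
        raise ValueError("delay must be non-negative and margin must be positive")
    return tpd_cl_ns * (1 + margin)


def bundling_ok(delay_ns, tpd_cl_ns):
    """Bundling assumption: the request must arrive after the data has settled."""
    return delay_ns > tpd_cl_ns


tpd = 2.0                       # assumed worst-case propagation delay of one CL block
delay = matched_delay(tpd)
print(delay, bundling_ok(delay, tpd))
```

A delay exactly equal to Tpd does not satisfy the strict inequality, so some margin is always needed; how much depends on process, voltage, and temperature variation.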
Desynchronization flow step #3 • Replace clock by local handshake controllers
Why Asynchronous Design? • We are used to synchronous design • Its logic and timing assumptions are simpler, but they do not hold in reality • Currently it is very hard to solve the big problems of synchronous design such as clock skew, high power consumption, process variability ... • Common arguments for asynchronous design: • Low power ? • High speed ? • Low emission ? • Low sensitivity to PVT (Process, Voltage, Temperature) variations ? • High modularity (SoC) ? • No clock distribution and timing problems (works) ? • Secure chips ?
Why not Asynchronous Design? • Overhead (area, speed, power) • Hard to design • Non-decomposable to small combinatorial logic blocks • Converting a synchronous design to asynchronous typically fails • Few CAD tools • No real, complete design flow is available • There is only one commercial async EDA vendor (Handshake Solutions), with a very specific design flow (HASTE) • Hard to test • Asynchronous test methods do not exist yet (or are not mature enough), and it is difficult to go into production without proper testing
Available tools • Several tools are available for automating asynchronous design • Most tools are developed at universities • Two groups of tools: for synthesis of asynchronous controllers and for design of the systems • Group I: Minimalist, Petrify, 3D • Group II: BALSA, TAST, HASTE
Minimalist • Developed at Columbia University • “Burst-mode” synthesis package • Based on synthesis of asynchronous FSMs • Integrates synthesis, testability, and verification tools • Good side: produces hazard-free control circuits; contains several different synthesis algorithms; can provide generalized C-element based mapping and also behavioral Verilog • Bad side: does not support arbitration or EBM (extended burst-mode); no optimal algorithm selection
Petrify • Designed by J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, A. Yakovlev • Synthesis of asynchronous controllers defined as Petri nets or Signal Transition Graphs (STGs) • Good side: produces optimal hazard-free control circuits; can provide generalized C-element based mapping, complex-gate mapping, and mapping to technology libraries • Bad side: supports only asynchronous design, not mixed sync-async; synthesis time grows exponentially with the number of signals; suitable only for relatively small controllers
3D • Produced by Kenneth Yun • “Extended Burst-Mode” synthesis package • Good side: produces hazard-free control circuits; supports restricted multiple-input changes (input bursts) with don't-care inputs; supports input choices based on sampling possibly glitchy signals; suitable for mixed sync-async systems (like GALS) • Bad side: no technology mapping; no optimal algorithm selection; no support or further development
TAST • Produced by TIMA Laboratory, France • TAST is a compiler/synthesizer of asynchronous digital circuits from a high-level communication description language • Input is the CHP language, which can describe Petri nets • It uses VHDL as the format for behavioral and post-synthesis simulation • Produces QDI (dual-rail, 1-M code rail) circuits • Good side: produces a complete asynchronous system and provides a full design flow • Bad side: uses the QDI style, which gives a very large area overhead; output circuits are not optimized; not available at the moment
BALSA • Produced by University of Manchester • BALSA is a compiler/synthesizer of asynchronous digital circuits from a high-level communication description language • Input is the BALSA language, developed specially for this package • Produces bundled-data, dual-rail, and 1-M code rail circuits • Good side: produces a complete asynchronous system and provides a full design flow • Bad side: gives large overhead compared with manual design (up to 300%); all tools are not freely available