
Dr. Miloš Krstić

Asynchronous Circuit Design, GALS Systems, Synchronous and GALS NoCs. DAAD Workshop, Nis, Serbia, July 2009. Overview: motivation, problems of the synchronous design, asynchronous circuit design, GALS state of the art, synchronous and GALS NoCs.


Presentation Transcript


  1. Asynchronous Circuit Design, GALS Systems, Synchronous and GALS NoCs • DAAD Workshop, Nis, Serbia, July 2009 • Dr. Miloš Krstić

  2. Overview • Motivation • Problems of the synchronous design • Asynchronous circuit design • GALS - State of the Art • Synchronous and GALS NoCs

  3. Challenges with Synchronous Design • Most digital systems today operate synchronously. • However, the complexity of electronic systems keeps growing enormously.

  4. Classical Synchronous Paradigm • Digital circuits are usually designed to work synchronously • [Figure: registers R1 to R4 and combinational logic blocks CL3, CL4, driven by a common clock CLK]

  5. Synchronous communication • Clock edges determine the time instants at which data must be sampled • Data wires may glitch between clock edges (setup/hold times must be satisfied) • Data are transmitted at a fixed rate, the clock frequency • [Figure: example bit sequence 1 1 0 0 1 0]

  6. Problems with Synchronous Design • As clock speeds increase, clock distribution becomes difficult: clock skew must be minimized; there is an upper limit to clock speed that depends on the material properties of the device; a signal cannot propagate from one side of the chip to the other within a single clock cycle • Worst-case performance: the clock period must accommodate the slowest path • Sensitive to variations in process, voltage and temperature • Not modular (a fixed clock rate is a poor match for reusability of components) • The clock burns a large fraction of chip power (~40-70%) • Synchronization failure

  7. What is Asynchronous Design? (I) • Synchronization is achieved without a global clock • Asynchronous communication: handshake mechanisms • [Figure: sender and receiver connected by request, acknowledge and data wires]

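  8. What is Asynchronous Design? (II) • [Figure: example in which registers R1 to R4 and combinational logic blocks CL3, CL4 are connected by links/channels carrying DATA, REQ and ACK; each register has a handshake controller (CTL), and tokens flow along the channels]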

  9. Asynchronous design styles (I) • Bundled data (single rail), 4-phase protocol • This style is very widely used because its asynchronous controllers are very small and fast • [Figure: channel with REQ, ACK and n DATA wires, with the 4-phase waveform; REQ and ACK always follow the same sequence, while the DATA timing has some variations]
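
The 4-phase handshake can be sketched in Python as an ordered event trace. This is a minimal behavioural sketch, not a circuit model: the function name `four_phase_transfer` and the event labels are illustrative assumptions, and the receiver is assumed to latch the data as soon as it sees the request go high.

```python
def four_phase_transfer(values):
    """Return (event trace, received values) for a 4-phase bundled-data channel."""
    trace, received = [], []
    req = ack = 0
    for value in values:
        data = value            # sender drives DATA before raising REQ (bundling)
        req = 1
        trace.append("req+")
        received.append(data)   # receiver latches DATA while REQ is high
        ack = 1
        trace.append("ack+")
        req = 0                 # return-to-zero half of the cycle
        trace.append("req-")
        ack = 0
        trace.append("ack-")
    assert req == 0 and ack == 0  # both wires are idle between transfers
    return trace, received


trace, received = four_phase_transfer([5, 7])
print(trace)
print(received)
```

Each transfer costs four signal transitions, and both wires are back at zero before the next transfer starts.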

  10. Bundled data • Validity signal • Similar to an aperiodic local clock • n-bit data communication requires n+1 wires • Data wires may glitch while the data are not valid • [Figure: example bit sequence 1 1 0 0 1 0]

  11. Asynchronous design styles (II) • Bundled data (single rail), 2-phase protocol • This style looks simpler and faster than 4-phase, but its controllers are more complex • [Figure: channel with REQ, ACK and n DATA wires, with the 2-phase waveform]
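
For comparison, a similar hedged sketch of the 2-phase protocol, in which every transition on a wire (rising or falling) is an event. The function name `two_phase_transfer` and the trace format are assumptions made for illustration.

```python
def two_phase_transfer(values):
    """Return (event trace, received values) for a 2-phase bundled-data channel."""
    trace, received = [], []
    req = ack = 0
    for value in values:
        data = value              # sender drives DATA first
        req ^= 1                  # any transition on REQ announces new data
        trace.append(("req", req))
        received.append(data)     # receiver latches DATA
        ack ^= 1                  # any transition on ACK confirms reception
        trace.append(("ack", ack))
    return trace, received


trace, received = two_phase_transfer([1, 2, 3])
print(trace)
print(received)
```

Only two transitions are needed per transfer, but a controller now has to react to both edges of each wire, which is one way to see why 2-phase controllers tend to be more complex.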

  12. Asynchronous design styles (III) • 4-phase dual-rail protocol • Each data bit is encoded on 2 wires • Allows delay-insensitive circuits to be built • Introduces a very large area overhead
      VALUE      d.t  d.f
      EMPTY      0    0
      VALID “0”  0    1
      VALID “1”  1    0
      Not used   1    1
      [Figure: channel with ACK and 2n DATA wires; the data alternate between EMPTY and VALID codewords]

  13. Dual rail • Two wires per bit • “00” = spacer, “01” = 0, “10” = 1 • [Figure: example bit sequence 1 1 1 0 0 0]
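
The dual-rail code can be written down directly. The sketch below follows the (d.t, d.f) encoding of the table on the previous slide; the function names and the least-significant-bit-first word order are assumptions made for illustration.

```python
EMPTY = (0, 0)  # spacer codeword: no data present


def encode_bit(bit):
    """Encode one bit as a (d.t, d.f) pair."""
    return (1, 0) if bit else (0, 1)


def decode_pair(pair):
    """Return 0 or 1 for a valid codeword, None for the spacer."""
    if pair == EMPTY:
        return None
    if pair == (1, 0):
        return 1
    if pair == (0, 1):
        return 0
    raise ValueError("(1, 1) is not a legal dual-rail codeword")


def encode_word(value, width):
    """Encode an integer as `width` dual-rail pairs, least significant bit first."""
    return [encode_bit((value >> i) & 1) for i in range(width)]


def decode_word(pairs):
    """Return the integer value, or None if any bit is still EMPTY."""
    bits = [decode_pair(p) for p in pairs]
    if any(b is None for b in bits):
        return None
    return sum(b << i for i, b in enumerate(bits))


print(encode_word(5, 3))
print(decode_word(encode_word(5, 3)))
```

An n-bit word needs 2n data wires, which is where the area overhead of the dual-rail style comes from.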

  14. Asynchronous modules • Signaling protocol: reqin+ start+ [computation] done+ reqout+ ackout+ ackin+ reqin- start- [reset] done- reqout- ackout- ackin- • [Figure: DATA PATH block (Data IN, Data OUT) exchanging start/done with a CONTROL block that has req in, ack in, req out and ack out ports]

  15. Asynchronous components • Asynchronous design requires additional components and special logic • Such components are not available in standard synchronous design kits • The critical components are the C-element and the mutex

  16. Muller C-element
      a  b  z
      0  0  0
      0  1  no change
      1  0  no change
      1  1  1
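
The truth table above translates into a small state-holding model. This is a behavioural sketch only; the class name `CElement` and its `update` method are illustrative assumptions.

```python
class CElement:
    """Behavioural model of a two-input Muller C-element."""

    def __init__(self, initial=0):
        self.z = initial

    def update(self, a, b):
        # The output follows the inputs only when they agree;
        # otherwise it keeps its previous value ("no change").
        if a == b:
            self.z = a
        return self.z


c = CElement()
for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]:
    print(a, b, c.update(a, b))
```

The output rises only after both inputs have risen and falls only after both have fallen, which is what makes the element useful for joining handshake signals.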

  17. Mutual Exclusion element • The ME element prevents multiple events from propagating at the same time • It is used for arbitration

  18. Dual-rail logic • Dual-rail logic requires additional logic for each logical operation • [Figure: dual-rail AND gate with inputs A.t, A.f, B.t, B.f and outputs C.t, C.f]
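
The circuit in the slide's figure is not recoverable from the transcript, so the sketch below uses one well-known construction as an assumption: one C-element per input combination, with the three "result is 0" combinations ORed onto the false rail. The class names `C` and `DualRailAnd` are hypothetical.

```python
class C:
    """Two-input Muller C-element (holds its output when the inputs differ)."""

    def __init__(self):
        self.z = 0

    def __call__(self, a, b):
        if a == b:
            self.z = a
        return self.z


class DualRailAnd:
    """Dual-rail AND built from four C-elements, one per input combination."""

    def __init__(self):
        self.c_tt, self.c_tf, self.c_ft, self.c_ff = C(), C(), C(), C()

    def update(self, a, b):
        (a_t, a_f), (b_t, b_f) = a, b
        tt = self.c_tt(a_t, b_t)   # a = 1, b = 1  -> result 1
        tf = self.c_tf(a_t, b_f)   # a = 1, b = 0  -> result 0
        ft = self.c_ft(a_f, b_t)   # a = 0, b = 1  -> result 0
        ff = self.c_ff(a_f, b_f)   # a = 0, b = 0  -> result 0
        return (tt, tf | ft | ff)  # (C.t, C.f)


gate = DualRailAnd()
print(gate.update((1, 0), (1, 0)))  # both inputs valid "1"
```

In this construction the output becomes valid only after both inputs are valid, and returns to EMPTY only after both inputs are EMPTY again.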

  19. Completion detection (dual-rail) • [Figure: completion detection tree whose C-element output is the done signal]
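
A behavioural sketch of dual-rail completion detection, assuming the usual scheme: each bit is valid when one of its two rails is high, and the per-bit validity flags are combined with C-element behaviour. The class name `CompletionDetector` is hypothetical, and the tree of two-input C-elements is collapsed into a single n-input one.

```python
class CompletionDetector:
    """done rises when every bit is valid and falls when every bit is EMPTY."""

    def __init__(self):
        self.done = 0

    def update(self, word):
        # One OR gate per bit: a bit is valid when d.t or d.f is high.
        valid = [t | f for (t, f) in word]
        if all(valid):
            self.done = 1
        elif not any(valid):
            self.done = 0
        # Mixed valid/empty bits: hold the previous value (C-element behaviour).
        return self.done


cd = CompletionDetector()
for word in ([(0, 0), (0, 0)], [(1, 0), (0, 0)], [(1, 0), (0, 1)], [(0, 0), (0, 0)]):
    print(word, cd.update(word))
```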

  20. Completion detection (bundled-data) • Conventional logic + matched delay • [Figure: logic block in parallel with a delay element between start and done]

  21. Muller pipeline • “The” delay-insensitive handshake machine • C[i] accepts 1/0 from C[i-1] only if C[i+1] = 0/1 • Think of 1010101... as waves: 10 10 10 1... • The C-elements propagate the waves precisely • Timing depends on local delays and may vary along the pipe • If RIGHT is quiet, the pipe will fill and stall
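
The update rule and the stall behaviour can be tried out with a small simulation. This is a sketch under stated assumptions: stages are updated one at a time in left-to-right sweeps, the left environment keeps handshaking by driving its request to the inverse of the first stage's output, and the right environment never acknowledges. The function names are hypothetical.

```python
def sweep(c, req, ack_right):
    """Update every stage once, left to right; return True if anything changed."""
    changed = False
    n = len(c)
    for i in range(n):
        prev = req if i == 0 else c[i - 1]
        nxt = ack_right if i == n - 1 else c[i + 1]
        # C-element with inputs prev and NOT nxt: copy prev only if they agree.
        if prev != nxt and c[i] != prev:
            c[i] = prev
            changed = True
    return changed


def fill_until_stall(n_stages, max_sweeps=100):
    """Feed the pipeline from the left while the right side stays quiet."""
    c = [0] * n_stages
    for _ in range(max_sweeps):
        req = 1 - c[0]            # left environment: 4-phase handshake
        if not sweep(c, req, ack_right=0):
            break                 # nothing can move any more: stalled
    return c


print(fill_until_stall(4))
```

With four stages the pipeline settles into the alternating pattern 0 1 0 1: the waves pile up against the quiet right end and the left environment is left waiting for an acknowledge.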

  22. Micropipelines (Sutherland 89) • [Figure: latches (L) separated by logic blocks; C-elements and delay elements form the control path with request signals Rin, Rout and acknowledge signals Ain, Aout]

  23. Abstract Pipeline • Bubbles • Tokens: Valid (0 or 1, who cares) and Empty tokens • [Figure: pipeline stages holding E V V E E]

  24. Abstract Rings • 3 stages, 1 bubble: 3 steps for a token round, 6 steps to cycle • [Figure: successive ring states showing the valid (V) token, the empty (E) token and the bubble]
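
The 6-step cycle can be reproduced with a small token-flow model. The abstraction used here is an assumption, not taken from the slide: a stage counts as a bubble when it holds the same value as its successor (its token has already been copied forward), and a bubble may copy the value of its predecessor. The function names are hypothetical.

```python
def ring_step(ring):
    """Let the first enabled stage copy its predecessor; return the new state."""
    n = len(ring)
    for i in range(n):
        pred, succ = ring[i - 1], ring[(i + 1) % n]
        if ring[i] == succ and ring[i] != pred:   # stage i is a bubble
            nxt = list(ring)
            nxt[i] = pred
            return nxt
    raise RuntimeError("no stage can move: the ring is deadlocked")


def ring_trace(ring, steps):
    """Return the list of states visited in `steps` steps, starting state included."""
    states = [list(ring)]
    for _ in range(steps):
        states.append(ring_step(states[-1]))
    return states


for state in ring_trace(["V", "E", "E"], 6):
    print(" ".join(state))
```

Under this model the ring returns to its starting state after 6 steps. After 3 steps the bubble has visited every stage once and the V and E tokens have each advanced by one stage, which is one reading of the slide's "3 steps".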

  25. Building Blocks • Latch • Source • Sink • Fork • Join (wait for all) • Merge (wait for one) • MUX • DEMUX • Function Block (Join; CL; Fork)

  26. Describing Asynchronous Circuits - STGs • [Figure: signal transition graph with the transitions A+, B+, A–, B–, where A is an input and B is an output]

  27. Control specification – C element • [Figure: C-element with inputs A, B and output C, and its STG with the transitions A+, B+, C+, A-, B-, C-]

  28. Control specification – FIFO Controller • [Figure: FIFO controller with ports Ri, Ai, Ro, Ao; its STG over the transitions Ri+, Ri-, Ai+, Ai-, Ro+, Ro-, Ao+, Ao-; and an implementation using C-elements]

  29. A simple filter: specification
      y := 0;
      loop
        x := READ (IN);
        WRITE (OUT, (x+y)/2);
        y := x;
      end loop
      [Figure: filter block with input channel IN (Rin, Ain) and output channel OUT (Rout, Aout)]
      Source: J. Cortadella, Introduction to asynchronous circuit design: specification and synthesis
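
The specification is a two-sample moving average. A direct Python transcription, assuming READ and WRITE are modelled as a generator that consumes an input sequence and yields output values (the name `averaging_filter` is hypothetical):

```python
def averaging_filter(samples):
    """Yield (x + y) / 2 for each input x, where y is the previous input."""
    y = 0                      # y := 0
    for x in samples:          # x := READ (IN)
        yield (x + y) / 2      # WRITE (OUT, (x+y)/2)
        y = x                  # y := x


print(list(averaging_filter([2, 4, 6])))
```

The sketch uses floating-point division; a hardware datapath would more likely use an integer adder followed by a one-bit shift, which is an assumption not stated on the slide.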

  30. A simple filter: block diagram • x and y are level-sensitive latches (transparent when R=1) • + is a bundled-data adder (matched delay between Ra and Aa) • Rin indicates the validity of IN • After Ain+ the environment is allowed to change IN • (Rout, Aout) control a level-sensitive latch at the output • [Figure: datapath with latches x and y, adder + and ports IN, OUT; control block with handshake pairs (Rin, Ain), (Rx, Ax), (Ry, Ay), (Ra, Aa), (Rout, Aout)]

  31. A simple filter: control spec. • [Figure: the filter block diagram together with the STG of the control block, over the rising and falling transitions of Rin, Ain, Rx, Ax, Ry, Ay, Ra, Aa, Rout, Aout]

  32. A simple filter: control impl. • [Figure: the same STG and its implementation with a C-element, connecting Rin, Ain, Rx, Ax, Ry, Ay, Ra, Aa, Rout, Aout]

  33. Taking delays into account • Delay assumptions: environment 3 time units, gates 1 time unit • Events (time): x+ (3), x’- (4), y+ (5), z+ (6), z’- (7), x- (9), x’+ (10), z- (12), z’+ (13), y- (14) • [Figure: circuit with signals x, x’, y, z, z’ and its STG over x+, x-, y+, y-, z+, z-]

  34. Taking delays into account • Delay assumptions: unbounded delays • Events (time): x+ (3), x’- (4), y+ (5), z+ (6), x- (9), x’+ (10), y- (11): failure! • [Figure: the same circuit and STG, with one element marked “very slow”]

  35. Gate vs wire delay models • Gate delay model: delays in gates, no delays in wires • Wire delay model: delays in gates and wires

  36. Delay models for async. circuits • Bounded delays (BD): realistic for gates and wires. Technology mapping is easy, verification is difficult. • Speed independent (SI): unbounded (pessimistic) delays for gates and “negligible” (optimistic) delays for wires. Technology mapping is more difficult, verification is easy. • Delay insensitive (DI): unbounded (pessimistic) delays for gates and wires. The DI class (built out of basic gates) is almost empty. • Quasi-delay insensitive (QDI): delay insensitive except for critical wire forks (isochronic forks). Formally, it is the same as speed independent; in practice, different synthesis strategies are used. • [Figure: diagram relating the classes DI, SI, QDI and BD]

  37. Desynchronization - concept • Start with a synchronous design • Replace the clock with local handshakes • Use standard CAD tools • Does not change the datapath • Guaranteed correctness • Source: Eyal Friedman, Desynchronization - From Synchronous to Asynchronous design, Seminar in VLSI Architecture, Technion, Israel, Spring 2008

  38. Desynchronization - flow steps • Main assumptions: normal combinatorial logic and D flip-flops (DFF); a single clock; a single clock edge

  39. Desynchronization flow step #1 • Replace each DFF by a master + slave (M+S) pair of latches
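
Why the replacement is legitimate can be seen from a behavioural sketch: two level-sensitive latches that are transparent on opposite clock levels behave like an edge-triggered flip-flop. The class names and the choice of a rising-edge flip-flop (master transparent while the clock is low) are assumptions made for illustration.

```python
class Latch:
    """Level-sensitive latch: transparent while enable is true."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


class MasterSlaveFlipFlop:
    """Rising-edge flip-flop modelled as a master latch feeding a slave latch."""

    def __init__(self):
        self.master, self.slave = Latch(), Latch()

    def update(self, d, clk):
        m = self.master.update(d, enable=not clk)   # transparent while clk = 0
        return self.slave.update(m, enable=clk)     # transparent while clk = 1


ff = MasterSlaveFlipFlop()
for d, clk in [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]:
    print(d, clk, ff.update(d, clk))
```

Once the two latches are separate, each can be given its own local enable signal, which is what the later flow steps rely on.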

  40. Desynchronization flow step #2 • Add matched delays • Respect the bundling assumption: delay > Tpd of the combinational logic (CL) • The delay serves as the completion signal
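
The constraint "delay > Tpd of CL" can be turned into a small sizing calculation. This is only an illustrative sketch: the function name, the idea of building the delay line from identical cells, and the safety margin are all assumptions rather than part of the flow described on the slide.

```python
def matched_delay_cells(tpd_ps, cell_delay_ps, margin_percent=10):
    """Smallest number of delay cells whose total delay strictly exceeds
    the worst-case logic delay plus a safety margin (all times in ps)."""
    required_ps = tpd_ps * (100 + margin_percent) / 100
    # floor division + 1 guarantees a strictly larger total delay
    return int(required_ps // cell_delay_ps) + 1


cells = matched_delay_cells(tpd_ps=2000, cell_delay_ps=250)
print(cells, cells * 250)
```

The margin is there because the delay line and the logic it matches do not track each other perfectly across process, voltage and temperature.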

  41. Desynchronization flow step #3 • Replace the clock by local handshake controllers

  42. Why Asynchronous Design? • We are used to synchronous design • Its logic and timing assumptions are simpler, but not true in reality • Currently it is very hard to solve the big problems of synchronous design, such as clock skew, high power consumption and process variability • Common arguments for asynchronous design: • Low power? • High speed? • Low emission? • Low sensitivity to PVT (Process, Voltage, Temperature) variations? • High modularity (SoC)? • No clock distribution and timing problems (works)? • Secure chips?

  43. Why not Asynchronous Design? • Overhead (area, speed, power) • Hard to design: not decomposable into small combinatorial logic blocks; converting a synchronous design to asynchronous typically fails • Few CAD tools: no real complete design flow is available; there is only one commercial async EDA vendor (Handshake Solutions), with a very specific design flow (HASTE) • Hard to test: asynchronous test methods are not available yet (or not mature enough), and it is difficult to go into production without proper testing

  44. Available tools • Several tools are available for the automation of asynchronous design • Most of the tools were developed at universities • Two groups of tools: for synthesis of asynchronous controllers, and for design of complete systems • Group I: Minimalist, Petrify, 3D • Group II: BALSA, TAST, HASTE

  45. Minimalist • Developed at Columbia University • “Burst-mode” synthesis package • Based on synthesis of asynchronous FSMs • Integrates synthesis, testability and verification tools • Good side: produces hazard-free control circuits; contains several different algorithms for synthesis; can provide generalized C-element based mapping and also behavioral Verilog • Bad side: doesn’t support arbitration or extended burst-mode (EBM); no optimal algorithm selection

  46. Petrify • Designed by J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, A. Yakovlev • Synthesis of asynchronous controllers defined as Petri nets or Signal Transition Graphs (STG) • Good side: produces optimal hazard-free control circuits; can provide generalized C-element based mapping, complex-gate mapping and mapping to technology libraries • Bad side: supports only asynchronous design, not mixed sync-async; synthesis time grows exponentially with the number of signals, so it is suitable only for relatively small controllers

  47. 3D • Produced by Kenneth Yun • “Extended Burst-Mode” synthesis package • Good side: produces hazard-free control circuits; supports restricted multiple-input change (input burst) with don't-care inputs; supports input choices based on sampling possibly glitchy signals; suitable for mixed sync-async systems (like GALS) • Bad side: no technology mapping; no optimal algorithm selection; no support and no further development

  48. TAST • Produced by TIMA Laboratory, France • TAST is a compiler/synthesizer of asynchronous digital circuits from a high-level communication description language • Input is the CHP language, which can describe Petri nets • Uses VHDL as the format for behavioral and post-synthesis simulation • Produces QDI (dual-rail, 1-M code rail) circuits • Good side: produces a complete asynchronous system and provides a full design flow • Bad side: uses the QDI style, which gives a very large area overhead; the output circuits are not optimized; not available at the moment

  49. TAST Design flow

  50. BALSA • Produced by the University of Manchester • BALSA is a compiler/synthesizer of asynchronous digital circuits from a high-level communication description language • Input is the BALSA language, developed specially for this package • Produces bundled-data, dual-rail and 1-M code rail circuits • Good side: produces a complete asynchronous system and provides a full design flow • Bad side: gives a large overhead compared with manual design (up to 300 %); not all tools are freely available
