Bridging the gap between asynchronous design and designers
Hao Zheng
Outline
• What is an asynchronous circuit?
• Asynchronous communication
• Asynchronous design styles (Micropipelines)
• Asynchronous logic building blocks
• Control specification and implementation
• Delay models and classes of async circuits
• Why asynchronous circuits?
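Synchronous circuit
[Figure: registers (R) alternating with combinational logic (CL) blocks, all registers driven by a global clock CLK]
• Implicit (global) synchronization between blocks
• Clock period > Max Delay (CL + R)
• Time is an independent physical variable (quantity)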
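The clock-period constraint can be illustrated with a minimal sketch. The delay values and the names `cl_delays_ns`, `register_delay_ns`, and `min_clock_period` are hypothetical, chosen only to show the arithmetic:

```python
# Illustrative numbers only: these delays are assumptions, not data from the slides.
cl_delays_ns = [2.1, 3.4, 2.8]   # worst-case delay of each combinational block (CL)
register_delay_ns = 0.5          # assumed register overhead (R)


def min_clock_period(cl_delays, register_delay):
    """Lower bound on the period: the clock period must exceed max delay (CL + R)."""
    return max(cl_delays) + register_delay


period_ns = min_clock_period(cl_delays_ns, register_delay_ns)
print(f"clock period must exceed {period_ns:.1f} ns")
```

Every stage is clocked at the pace of the slowest block, even the stages that finish earlier.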
Asynchronous circuit
[Figure: registers (R) alternating with combinational logic (CL) blocks, linked by Req and Ack handshake signals]
• Explicit (local) synchronization: Req / Ack handshakes
• Time = events + quantity
• Time does not exist if nothing happens (Aristotle)
Motivation for Asynchronous
• Asynchronous design is often unavoidable:
  • Asynchronous interfaces, arbiters, etc.
• Modern clocking is multi-phase and distributed, and virtually ‘asynchronous’ (cf. GALS, next slide):
  • Mesochronous (clock travels together with data)
  • Local (possibly stretchable) clock generation
• Robust asynchronous design flow is coming (e.g. VLSI programming from Philips, NCL from Theseus Logic, fine-grain pipelining from Fulcrum)
Motivation (Technology Aspects)
• Low power
  • Automatic clock gating
• Electromagnetic compatibility
  • No peak currents around clock edges
• Security
  • No ‘electromagnetic difference’ between logical ‘0’ and ‘1’ in dual-rail code
• Robustness
  • High immunity to technology and environment variations (temperature, power supply, ...)
Motivation (Designer’s View)
• Modularity for system-on-chip design
  • Plug-and-play interconnectivity
• Average-case performance
  • No worst-case delay synchronization
• Many interfaces are asynchronous
  • Buses, networks, ...
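Globally Async Locally Sync (GALS)
[Figure: a clocked domain (registers R, combinational logic CL, local CLK) inside an async-to-sync wrapper, connected to the asynchronous world through the handshake pairs Req1/Ack1, Req2/Ack2, Req3/Ack3, Req4/Ack4]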
Key Design Differences
• Synchronous logic design:
  • Proceeds without taking timing correctness (hazards, signal ack-ing, etc.) into account
  • Combinational logic and memory latches (registers) are built separately
  • Static timing analysis of CL is sufficient to determine the Max Delay (clock period)
  • Fixed set-up and hold conditions for latches
Key Design Differences
• Asynchronous logic design:
  • Must ensure hazard-freedom, signal ack-ing, local timing constraints
  • Combinational logic and memory latches (registers) are often mixed in “complex gates”
  • Dynamic timing analysis of logic is needed to determine relative delays between paths
  • To avoid complex issues, circuits may be built as Delay-insensitive and/or Speed-independent (Muller’s theory vs Huffman asynchronous automata)
Verification and Testing Differences
• Synchronous logic verification and testing:
  • Only the functional correctness aspect is verified and tested
  • Testing can be done with standard ATE and at low speed
• Asynchronous logic verification and testing:
  • In addition to functional correctness, the temporal aspect is crucial: e.g. causality and order, deadlock-freedom
  • Testing must cover faults in complex gates (logic + memory) and must proceed at normal operation rate
  • Delay fault testing may be needed
Synchronous communication
[Figure: clocked data waveform carrying the bit sequence 1 1 0 0 1 0]
• Clock edges determine the time instants where data must be sampled
• Data wires may glitch between clock edges (set-up/hold times must be satisfied)
• Data are transmitted at a fixed rate (clock frequency)
Dual Rail
[Figure: dual-rail waveform with bit labels 1 1 1 and 0 0 0]
• Two wires with L (low) and H (high) per bit
  • “LL” = “spacer”, “LH” = “0”, “HL” = “1”
• n-bit data communication requires 2n wires
• Each bit is self-timed
• Other delay-insensitive codes exist (e.g. k-of-n) and event-based signalling (choice criteria: pin and power efficiency)
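The dual-rail code can be sketched as a small encoder and decoder. The function names are hypothetical, and treating "HH" as an illegal code word is an assumption based on the slide listing only three codes:

```python
# Dual-rail code from the slide: "LL" = spacer, "LH" = 0, "HL" = 1.
SPACER = ("L", "L")
ENCODE = {0: ("L", "H"), 1: ("H", "L")}
DECODE = {pair: bit for bit, pair in ENCODE.items()}


def encode_word(bits):
    """Encode n bits as n wire pairs, i.e. 2n wires in total."""
    return [ENCODE[b] for b in bits]


def decode_word(pairs):
    """Return the bits, or None while any pair is still a spacer (word not yet valid)."""
    for pair in pairs:
        if pair != SPACER and pair not in DECODE:
            # Assumption: "HH" is unused by the code, so it is reported as an error.
            raise ValueError(f"illegal dual-rail code word: {pair}")
    if any(pair == SPACER for pair in pairs):
        return None
    return [DECODE[pair] for pair in pairs]


word = encode_word([1, 0, 1])
print(word, decode_word(word))
```

Because validity is carried by the data wires themselves, the receiver needs no separate clock or validity signal to know when a bit has arrived.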
Bundled Data
[Figure: data waveform carrying the bit sequence 1 1 0 0 1 0, accompanied by a validity signal]
• Validity signal
  • Similar to an aperiodic local clock
• n-bit data communication requires n+1 wires
• Data wires may glitch while the validity signal is not asserted
• Signaling protocols
  • level sensitive (latch)
  • transition sensitive (register): 2-phase / 4-phase
Example: Memory Read Cycle
• Transition signaling, 4-phase
[Figure: timing diagram with Address (valid address, signal A) and Data (valid data, signal D)]
Example: Memory Read Cycle
• Transition signaling, 2-phase
[Figure: timing diagram with Address (valid address, signal A) and Data (valid data, signal D)]
Asynchronous Modules
[Figure: a DATA PATH block (Data IN, Data OUT, start, done) driven by a CONTROL block (req in, ack in, req out, ack out)]
• Signaling protocol: reqin+ start+ [computation] done+ reqout+ ackout+ ackin+ reqin- start- [reset] done- reqout- ackout- ackin- (more concurrency is also possible)
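The signaling protocol can be written down as an event trace and checked mechanically. This is a minimal sketch that enforces only the fully sequential ordering listed on the slide (the slide notes that more concurrent orderings are possible). The helper names are hypothetical, and the internal steps [computation] and [reset] are left out because they are not signal transitions:

```python
# One full 4-phase cycle of the module, in the order given on the slide.
CYCLE = ["reqin+", "start+", "done+", "reqout+", "ackout+", "ackin+",
         "reqin-", "start-", "done-", "reqout-", "ackout-", "ackin-"]


def follows_protocol(trace):
    """True if the trace is a prefix of the cycle repeated indefinitely."""
    return all(ev == CYCLE[i % len(CYCLE)] for i, ev in enumerate(trace))


def signal_levels(trace):
    """Replay a trace and return the final level (0 or 1) of every signal."""
    levels = {ev[:-1]: 0 for ev in CYCLE}
    for ev in trace:
        levels[ev[:-1]] = 1 if ev.endswith("+") else 0
    return levels


print(follows_protocol(CYCLE + CYCLE[:3]))   # a legal trace
print(follows_protocol(["reqin+", "done+"]))  # done+ before start+ is rejected
print(signal_levels(CYCLE))                   # all signals are back at 0
```

Replaying a complete cycle returns every signal to 0, which is the return-to-zero property of 4-phase signaling.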
Asynchronous Latches: C element
[Figure: C-element symbol with inputs A, B and output Z; transistor-level circuit between Vdd and Gnd]
Static Logic Implementation [van Berkel 91]
A B | Z+
0 0 | 0
0 1 | Z
1 0 | Z
1 1 | 1
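The C-element truth table translates directly into a next-state function. The sketch below is behavioral only, with no delays or transistor-level detail, and the names `c_element` and `run` are hypothetical:

```python
def c_element(a, b, z):
    """Next output Z+ of a C-element: follow the inputs when they agree, else hold Z."""
    if a == b:
        return a
    return z


def run(inputs, z=0):
    """Apply a sequence of (a, b) input pairs and collect the output after each step."""
    outputs = []
    for a, b in inputs:
        z = c_element(a, b, z)
        outputs.append(z)
    return outputs


# Output rises only after both inputs are 1, and falls only after both are 0.
print(run([(1, 0), (1, 1), (0, 1), (0, 0)]))
```

The hold behavior is what makes the C-element a latch: the output changes only once both inputs have made the same transition.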
C-element: Other Implementations
[Figure: two transistor-level C-element circuits between Vdd and Gnd with inputs A, B and output Z: Dynamic, and Quasi-Static (with a weak inverter)]
Dual-Rail Logic
[Figure: dual-rail AND gate with inputs A.t, A.f, B.t, B.f and outputs C.t, C.f]
• Valid behavior for monotonic environment
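A dual-rail AND gate can be sketched at the logic level. The gate equations used here (C.t = A.t AND B.t, C.f = A.f OR B.f) are an assumption, one common realization rather than necessarily the circuit in the slide's figure. Each signal is a (t, f) pair, with (0, 0) as the spacer:

```python
SPACER = (0, 0)
TRUE, FALSE = (1, 0), (0, 1)   # (t, f) rails for logical 1 and 0


def dual_rail_and(a, b):
    """Assumed realization: C.t = A.t AND B.t, C.f = A.f OR B.f."""
    a_t, a_f = a
    b_t, b_f = b
    return (a_t & b_t, a_f | b_f)


# Valid inputs give a valid output equal to the Boolean AND.
for a in (FALSE, TRUE):
    for b in (FALSE, TRUE):
        print(a, b, "->", dual_rail_and(a, b))

# Spacer on both inputs gives spacer on the output.
print(dual_rail_and(SPACER, SPACER))
```

With these equations the output can become valid as soon as one input is 0, before the other input arrives. That is why the environment must be monotonic: inputs only move from spacer to a valid code word and back, never directly between code words.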
Completion Detection
[Figure: outputs of a dual-rail logic block feeding a completion detection tree of C-elements that produces done]
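Completion detection for dual-rail data can be sketched behaviorally. The sketch assumes that each bit's validity is the OR of its two rails, and it models the whole C-element tree as a single n-input C-element. The class name `CompletionDetector` is hypothetical:

```python
class CompletionDetector:
    """Behavioral model: per-bit OR of the rails, combined by an n-input C-element."""

    def __init__(self):
        self.done = 0

    def update(self, word):
        """word is a list of (t, f) rail pairs; returns the new value of done."""
        valid = [t | f for t, f in word]
        if all(valid):          # every bit holds a code word: data is complete
            self.done = 1
        elif not any(valid):    # every bit is a spacer: reset is complete
            self.done = 0
        # otherwise the C-element holds its previous value
        return self.done


det = CompletionDetector()
for word in ([(0, 0), (0, 0)], [(1, 0), (0, 0)], [(1, 0), (0, 1)],
             [(0, 0), (0, 1)], [(0, 0), (0, 0)]):
    print(word, "-> done =", det.update(word))
```

The hold case matters: done stays low until the last bit arrives and stays high until the last bit has returned to the spacer.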
Differential Cascode Voltage Switch Logic
[Figure: 3-input AND/NAND gate; N-type transistor network with inputs A.t, A.f, B.t, B.f, C.t, C.f, outputs Z.t, Z.f, and control signals start and done]
Examples of Dual-Rail Design
• Asynchronous dual-rail ripple-carry adder (A. Martin, 1991)
  • Critical delay is proportional to log N (N = number of bits)
  • 32-bit adder delay (1.6 µm MOSIS CMOS): 11 ns versus 40 ns for synchronous
  • Async cell transistor count = 34 versus synchronous = 28
• More recent success stories of dual-rail logic (modularity and automatic synthesis): Null Convention Logic from Theseus Logic
Bundled-Data Logic Blocks
[Figure: single-rail logic block with a delay element from start to done]
• Conventional logic + matched delay
Micropipelines (Sutherland 89)
• Micropipeline (2-phase) control blocks: Request-Grant-Done (RGD) Arbiter, Join, Merge, Call, Select, Toggle
[Figure: block symbols with port labels r1, g1, d1, r2, g2, d2, a1, a2, r, a, in, sel, out0, out1, outf, outt; Join drawn as a C-element]
Micropipelines (Sutherland 89)
[Figure: pipeline of latches (L) alternating with logic blocks; control chain of C-elements and delay elements connecting Rin/Ain to Rout/Aout]
Data Path / Control
[Figure: data path of latches (L) alternating with logic blocks, driven by a CONTROL block with handshake signals Rin, Ain, Rout, Aout]
• Synthesis of control is a major challenge
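Control specification
[Figure: block with input A and output B; specification with transitions A+, B+, A-, B-]
Control specification
[Figure: block with input A and output B; specification with transitions A+, B-, A-, B+]
Control specification
[Figure: C-element with inputs A, B and output C; specification with transitions A+, B+, C+, A-, B-, C-]
Control specification
[Figure: C-element with inputs A, B and output C; specification with transitions A+, B+, C+, A-, B-, C-]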
Control Specification
[Figure: FIFO cntrl block with ports Ri, Ai, Ro, Ao; specification with transitions Ri+, Ro+, Ai+, Ao+, Ri-, Ro-, Ai-, Ao-; implementation using two C-elements]
Gate vs Wire delay models
• Gate delay model: delays in gates, no delays in wires
• Wire delay model: delays in gates and wires
Delay Models for Async. Circuits
[Figure: diagram of the circuit classes labeled DI, QDI, SI, BD]
• Bounded delays (BD): realistic for gates and wires.
  • Technology mapping is easy, verification is difficult
• Speed independent (SI): Unbounded (pessimistic) delays for gates and “negligible” (optimistic) delays for wires.
  • Technology mapping is more difficult, verification is easy
• Delay insensitive (DI): Unbounded (pessimistic) delays for gates and wires.
  • DI class (built out of basic gates) is almost empty
• Quasi-delay insensitive (QDI): Delay insensitive except for critical wire forks (isochronic forks).
  • In practice it is the same as speed independent
Environment models
• Slow enough environment = Fundamental mode (inputs change AFTER the system has settled)
• Reactive environment = I/O mode (inputs may change once the first output changes)
Correctness of a Circuit wrt Delay Assumptions
[Figure: gate-level circuit and waveforms for signals a, b, z]
• C-element: z = ab + zb + za
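The equation z = ab + zb + za can be checked exhaustively against the usual C-element behavior (output follows the inputs when they agree, holds otherwise). This is a zero-delay functional check only, with hypothetical function names. It does not address the hazards that can appear once real gate and wire delays are assumed:

```python
from itertools import product


def c_next_equation(a, b, z):
    """Next-state equation from the slide: z = ab + zb + za."""
    return (a & b) | (z & b) | (z & a)


def c_next_behavior(a, b, z):
    """Reference behavior: follow the inputs when they agree, else hold z."""
    if a == b:
        return a
    return z


# Compare the two functions on all 8 combinations of (a, b, z).
equivalent = all(
    c_next_equation(a, b, z) == c_next_behavior(a, b, z)
    for a, b, z in product((0, 1), repeat=3)
)
print("equation matches C-element behavior:", equivalent)
```

Functional equivalence holds in all eight cases. Whether a particular gate-level decomposition of the equation is correct still depends on the delay model chosen.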
Resistance
• Concurrent models for specification
  • CSP, Petri nets, ...: no more FSMs
• Difficult to design
  • Hazards, synchronization
• Complex timing analysis
  • Difficult to estimate performance
• Difficult to test
  • No way to stop the clock
But ... some success stories
• Philips
• AMULET microprocessors
• Sharp
• Intel (RAPPID)
• Start-up companies:
  • Theseus Logic, Fulcrum, Self-Timed Solutions
• Recent blurb: It's Time for Clockless Chips, by Claire Tristram (MIT Technology Review, v. 104, no. 8, October 2001: http://www.technologyreview.com/magazine/oct01/tristram.asp)
• ….