Asynchronous Circuits Kent Orthner Wed. March 2nd, 2005 Presentation for: High speed and Low Power VLSI, Dr. Maitham Shams
Agenda • What are Asynchronous Circuits? • Advantages & Disadvantages • Example Asynchronous Circuit • GasP • FPGAs • Design Project
What are Asynchronous Circuits? • Synchronous Circuits • Everything is synchronized to a global clock • Clock edges determine the instants at which data is sampled • Register inputs are sampled on the rising clock edge • Data wires may glitch between clock edges • “Worst-case” operation: • The clock frequency is limited by the speed of the slowest stage. • The clock must be slow enough for the circuit to work under worst-case process, voltage and temperature (PVT) conditions and worst-case data. [Figure: clocked pipeline with stage logic delays of 9 ns, 4 ns and 6 ns; every stage is allotted a full 10 ns clock period]
What are Asynchronous Circuits? • Asynchronous Circuits • Eliminate the global clock signal • States are defined in terms of input values and internal actions • Data transfer is synchronized by other means • Handshaking, flow control • “Average-case” performance: each block takes only as long as its own actual delay. [Figure: handshaking pipeline with Req/Ack signals between stages; stage delays labelled 4 ns, 6 ns, 9 ns, 10 ns, 5 ns, 7 ns]
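As a rough illustration of worst-case versus average-case timing, the sketch below compares the latency of the three-stage example under both styles. It reads the figure values as stage delays of 9, 4 and 6 ns with a 10 ns clock period; that interpretation, the function names and the optional per-stage handshake overhead are assumptions, not part of the slides.

```python
from typing import Sequence

# Assumed reading of the slide figure: logic delay per stage (ns) and clock period (ns).
STAGE_DELAYS_NS = [9, 4, 6]
CLOCK_PERIOD_NS = 10  # must cover the slowest stage under worst-case PVT and data


def synchronous_latency(n_stages: int, clock_period: float) -> float:
    """Every stage is allotted a full clock period, whatever its real delay."""
    return n_stages * clock_period


def asynchronous_latency(stage_delays: Sequence[float],
                         handshake_overhead: float = 0.0) -> float:
    """Each stage takes its actual delay plus an assumed fixed handshake cost."""
    return sum(d + handshake_overhead for d in stage_delays)


if __name__ == "__main__":
    # The clock period is only valid if it covers the worst stage.
    assert max(STAGE_DELAYS_NS) <= CLOCK_PERIOD_NS
    print(synchronous_latency(len(STAGE_DELAYS_NS), CLOCK_PERIOD_NS))  # 30
    print(asynchronous_latency(STAGE_DELAYS_NS))                       # 19.0
```

With zero overhead the item leaves the asynchronous pipeline after 19 ns instead of 30 ns. This models latency only: in a real handshaking pipeline, throughput is still bounded by the slowest stage plus the handshake cost.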
Micropipelines • Each data channel is associated with two abstract control signals • Rdy – indicates when the upstream stage has data. • Ack – indicates when the downstream stage is finished with the previous data. • Data moves through a stage when the upstream stage has data available and the downstream stage is ready for new data. • If no logic processing is performed, the circuit acts as an elastic FIFO. [Figure: micropipeline control built from Muller C-elements (C), with request signals Rin, R1, R2, R3, Rout, acknowledge signals Ain, A1, A2, A3, Aout, and a data path from Din to Dout]
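The Rdy/Ack rule above can be sketched at the level of abstract events. In this minimal model (the function names and the one-round-per-step timing are assumptions), a stage passes its item downstream only if it holds data and the next stage was empty at the start of the round; when the consumer stalls, items pile up, which is the elastic-FIFO behaviour described on the slide.

```python
from typing import List, Optional

Stage = Optional[int]  # None means the stage is empty


def step(stages: List[Stage], source: List[int], sink: List[int],
         sink_ready: bool = True) -> None:
    """One round of handshakes; every decision uses the state at the start of the round."""
    snapshot = list(stages)
    # Last stage hands its item to the consumer if the consumer acknowledges.
    if sink_ready and snapshot[-1] is not None:
        sink.append(snapshot[-1])
        stages[-1] = None
    # Stage i -> i+1 when i has data (Rdy) and i+1 had already released its old data (Ack).
    for i in range(len(stages) - 2, -1, -1):
        if snapshot[i] is not None and snapshot[i + 1] is None:
            stages[i + 1] = snapshot[i]
            stages[i] = None
    # The producer fills the first stage if it was empty at the start of the round.
    if source and snapshot[0] is None:
        stages[0] = source.pop(0)


def run(items: List[int], n_stages: int = 4, max_rounds: int = 100) -> List[int]:
    """Push items through an initially empty FIFO with an always-ready consumer."""
    stages: List[Stage] = [None] * n_stages
    source, sink = list(items), []
    for _ in range(max_rounds):
        if not source and all(s is None for s in stages):
            break
        step(stages, source, sink)
    return sink


if __name__ == "__main__":
    print(run([1, 2, 3]))  # [1, 2, 3]: order preserved, no logic applied
    stalled: List[Stage] = [None] * 4
    pending, consumed = [10, 11, 12, 13, 14], []
    for _ in range(20):
        step(stalled, pending, consumed, sink_ready=False)  # consumer stalls
    print(stalled)  # [13, 12, 11, 10]: the FIFO has absorbed four items
```

Because every decision uses a snapshot of the previous round, a freshly vacated stage is refilled one round later, so a stalled FIFO in this model ends up holding one item per stage.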
Advantages • Technology Scaling Potential • No circuit retiming/re-pipelining necessary • Technology-independent, in some ways • Automatic adaptation to physical properties (process, voltage, temperature) • Lower EMI • Activity in synchronous circuits is concentrated at clock edges, producing predictable EMI peaks at the clock frequency and its harmonics • Ease of composition • Easier to interface heterogeneous IP cores • No timing assumptions necessary • Performance • Average-case instead of worst-case • Low Power • Clock accounts for 30 – 50% of chip dynamic power • Automatic clock gating in asynchronous designs: idle blocks do not switch • Escape from Metastability • No concern about clock-domain crossings: circuits are metastable-safe by design • Easier Circuit Synthesis • No clock distribution, no clock skew, no clock buffer tree analysis • No timing-driven placement necessary
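The low-power argument can be made concrete with the standard CMOS switching-power model (a textbook formula, not taken from the slides). Here r_i is the rate of 0→1 transitions on net i, C_i its capacitance and V_DD the supply voltage; in a synchronous design r_i = α_i f, where α_i is the probability of a 0→1 transition per clock cycle.

```latex
P_{\text{dyn}} = \sum_i r_i \, C_i \, V_{DD}^2,
\qquad r_i = \alpha_i f \ \text{(synchronous)},
\qquad r_{\text{clock}} = f \ (\alpha_{\text{clock}} = 1)
```

The clock net toggles every cycle whether or not useful work is done, which is how it can reach the 30 – 50% share quoted above. An idle asynchronous block makes no transitions (r_i ≈ 0), so it ideally draws no dynamic power: the "automatic clock gating" effect.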
Disadvantages • Vulnerable to circuit hazards & glitches • Circuits are larger • More area for control & handshaking logic, encoding scheme, hazard avoidance • More difficult & less mature than synchronous design • Benefits largely unexplored in large-scale VLSI • Synchronous designs • are well understood: it’s easier to think sequentially than concurrently • provide a simple way to deal with noise and hazards • are tolerant to glitches • CAD Tools • Synchronous tools are quite mature • No comparably established asynchronous tools
Example Asynchronous Circuit • TOKYO, Japan, February 9, 2005: Epson Develops the World's First Flexible 8-Bit Asynchronous Microprocessor • Seiko Epson Corp. ("Epson") has announced that it has developed the world's first flexible 8-bit asynchronous microprocessor using low-temperature polysilicon thin-film transistors (LTPS-TFTs) on a plastic substrate • With energy consumption reduced by 70% compared to the synchronous microprocessors now in everyday use, Epson is now researching potential applications for its invention. • Using asynchronous circuit design technology, Epson has been able to: • Make a stable 8-bit microprocessor composed of 32,000 LTPS-TFTs, • Achieve energy consumption 70% lower than the synchronous design, • Reduce electromagnetic radiation by 20 dB.
GasP • A family of asynchronous circuits that provide control for: • simple pipelines • branching and joining • scatter & gather • join on demand with arbitration • In excess of 1.5 G data items/second in 0.35 µm technology • A single wire (the state conductor) carries both Req and Ack messages: its level indicates whether the stage it connects is empty or full. • Relies on careful choice of transistor widths to equalize delays in logic gates.
GasP Circuit If the upstream state conductor is full (low), and the downstream state conductor is empty (high), b and x both conduct, driving the voltage at (1) low. This causes transistor p to turn on, making the data latch momentarily transparent.
GasP Circuit • The low voltage at node (2) turns on transistor d, driving the downstream state conductor low (full). • It also turns on transistor y, driving the upstream state conductor high (empty). • Transistor t turns on, resetting the top of the NAND gate high, which turns pass transistor p off.
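The firing rule described on the last two slides can be abstracted into a few lines of Python. This is a hypothetical event-level model, not a circuit simulation: each controller sits between an upstream and a downstream state conductor, fires when upstream is full (low) and downstream is empty (high), then makes downstream full and upstream empty while copying the data. The self-resetting pulse through transistor t is collapsed into one atomic firing, and all names are illustrative.

```python
from typing import List, Optional

FULL, EMPTY = 0, 1  # slide convention: a low state conductor is full, high is empty


def can_fire(upstream: int, downstream: int) -> bool:
    """The NAND condition: upstream full and downstream empty."""
    return upstream == FULL and downstream == EMPTY


def fire_round(wires: List[int], data: List[Optional[str]]) -> List[int]:
    """Fire every enabled controller; controller i sits between wires[i] and wires[i + 1]."""
    enabled = [i for i in range(len(wires) - 1) if can_fire(wires[i], wires[i + 1])]
    # Neighbouring controllers can never both be enabled (they need opposite states
    # on their shared wire), so applying all firings in one pass is safe.
    for i in enabled:
        data[i + 1] = data[i]   # latch briefly transparent (pass transistor p)
        data[i] = None
        wires[i + 1] = FULL     # transistor d pulls the downstream conductor low
        wires[i] = EMPTY        # transistor y pulls the upstream conductor high
    return enabled


if __name__ == "__main__":
    wires = [FULL, EMPTY, EMPTY, EMPTY]
    data: List[Optional[str]] = ["x", None, None, None]
    while fire_round(wires, data):  # the token advances one conductor per round
        print(wires, data)
```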
GasP Circuit • Data propagates forward through the circuit in four gate delays per stage (path a, b, c, d). • Transistors in any logic functions must be sized so that the logic takes no more than these four gate delays. • Holes (empty slots) propagate in the reverse direction in two gate delays per stage (path x, y).
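The gate-delay counts give a quick back-of-the-envelope model. Assuming a stage's cycle time is the forward delay plus the reverse delay (4 + 2 = 6 gate delays, the origin of the name "6-4 GasP"), the sketch below converts between gate delay, throughput and latency. Because the GasP slide quotes "in excess of" 1.5 G items/s, the gate delay it implies is an upper bound; the function names are illustrative.

```python
FORWARD_GATE_DELAYS = 4  # path a, b, c, d
REVERSE_GATE_DELAYS = 2  # path x, y
# Assumption: one full cycle = forward propagation of data + reverse propagation of a hole.
CYCLE_GATE_DELAYS = FORWARD_GATE_DELAYS + REVERSE_GATE_DELAYS


def throughput_items_per_s(gate_delay_s: float) -> float:
    """Items per second for a given gate delay."""
    return 1.0 / (CYCLE_GATE_DELAYS * gate_delay_s)


def implied_gate_delay_s(items_per_s: float) -> float:
    """Gate delay needed to reach a given throughput."""
    return 1.0 / (CYCLE_GATE_DELAYS * items_per_s)


def forward_latency_s(n_stages: int, gate_delay_s: float) -> float:
    """Time for one item to cross an empty FIFO of n_stages."""
    return n_stages * FORWARD_GATE_DELAYS * gate_delay_s


if __name__ == "__main__":
    t_gate = implied_gate_delay_s(1.5e9)
    print(f"gate delay <= {t_gate * 1e12:.0f} ps")                            # gate delay <= 111 ps
    print(f"4-stage latency <= {forward_latency_s(4, t_gate) * 1e9:.2f} ns")  # 1.78 ns
```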
FPGAs • Commonly built of 4-input look-up tables (LUTs) • Effectively a small RAM block with 1 data bit and 16 memory locations. • Any logic function with up to 4 inputs can be made from a 4-input LUT. • Combinations of LUTs are used to create larger logic functions. • The RAM is programmed at configuration time, or during operation. • A register for each logic element • Connected with a ‘sea of programmable interconnect’ • SRAM holds the configuration, loaded at start-up time
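The "16-location, 1-bit RAM" view of a LUT translates directly into code. In this sketch (the class name and the bit ordering, with A as the most significant address bit, are assumptions chosen to match the Design Project slides), configuring the LUT means filling in a 16-bit truth table, and evaluating it is a single memory read.

```python
from typing import Callable

Fn4 = Callable[[int, int, int, int], int]


class Lut4:
    """4-input LUT: 16 one-bit memory locations addressed by the inputs (A is the MSB)."""

    def __init__(self, config: int) -> None:
        if not 0 <= config < (1 << 16):
            raise ValueError("a 4-input LUT holds exactly 16 bits")
        self.config = config

    @classmethod
    def from_function(cls, f: Fn4) -> "Lut4":
        """'Program' the RAM by evaluating f at all 16 addresses."""
        config = 0
        for addr in range(16):
            a, b, c, d = (addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1
            if f(a, b, c, d):
                config |= 1 << addr
        return cls(config)

    def __call__(self, a: int, b: int, c: int, d: int) -> int:
        """Evaluate: one memory read, the same cost for any stored function."""
        addr = (a << 3) | (b << 2) | (c << 1) | d
        return (self.config >> addr) & 1


if __name__ == "__main__":
    xor2 = Lut4.from_function(lambda a, b, c, d: a ^ b)  # uses only 2 of the 4 inputs
    and4 = Lut4.from_function(lambda a, b, c, d: a & b & c & d)
    print(hex(and4.config))                    # 0x8000: only address 15 (ABCD = 1111) holds a 1
    print(xor2(1, 0, 0, 0), xor2(1, 1, 1, 1))  # 1 0
```

The read costs the same whether the stored function is a 2-input XOR or a 4-input function, which is the constant-delay point made on the next slide.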
FPGAs • Almost exclusively synchronous • Frequency is limited by the worst-case path from a register, through one or more lookup tables, through the routing matrix, and into the next register. • The delay through a LUT is constant (and worst case!) • A 2-input XOR function takes as much time as a complex 4-input function. • The path from one register to the next is very granular: delay grows in whole-LUT steps • If the logic function has 5 inputs, the propagation delay is almost doubled over the 4-input case (two levels of LUTs are needed). • High power • The clock distribution network goes everywhere. • Power is consumed driving logic elements that aren’t used in a given design
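Why does a fifth input roughly double the delay? One standard way to map a 5-input function onto 4-input LUTs is Shannon expansion on the fifth input e: two LUTs compute f with e = 0 and e = 1, and a third LUT, used as a 2:1 multiplexer, picks between them. The sketch below (function names are illustrative) builds that two-level network and checks it against the original function.

```python
from itertools import product
from typing import Callable, List, Tuple

Fn4 = Callable[[int, int, int, int], int]
Fn5 = Callable[[int, int, int, int, int], int]


def make_lut4(f: Fn4) -> List[int]:
    """16-entry truth table indexed by a<<3 | b<<2 | c<<1 | d."""
    return [f((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1) & 1 for i in range(16)]


def read_lut4(table: List[int], a: int, b: int, c: int, d: int) -> int:
    return table[(a << 3) | (b << 2) | (c << 1) | d]


def map_5_input(f: Fn5) -> Tuple[Fn5, int]:
    """Shannon expansion on e: f = e ? f(a,b,c,d,1) : f(a,b,c,d,0). Returns (evaluator, LUT depth)."""
    lut_e0 = make_lut4(lambda a, b, c, d: f(a, b, c, d, 0))
    lut_e1 = make_lut4(lambda a, b, c, d: f(a, b, c, d, 1))
    # Third LUT configured as a 2:1 mux with inputs (select e, x0, x1, unused).
    lut_mux = make_lut4(lambda s, x0, x1, _unused: x1 if s else x0)

    def evaluate(a: int, b: int, c: int, d: int, e: int) -> int:
        x0 = read_lut4(lut_e0, a, b, c, d)       # level 1
        x1 = read_lut4(lut_e1, a, b, c, d)       # level 1 (in parallel)
        return read_lut4(lut_mux, e, x0, x1, 0)  # level 2

    return evaluate, 2


if __name__ == "__main__":
    parity5: Fn5 = lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e
    evaluate, depth = map_5_input(parity5)
    assert all(evaluate(*bits) == parity5(*bits) for bits in product((0, 1), repeat=5))
    print("LUT levels:", depth)  # LUT levels: 2, versus 1 for any 4-input function
```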
Design Project • 16:1 pipelined multiplexer in four stages, using GasP pipeline control • Essentially a 4-input LUT • Compare with an equivalent synchronous design with the same gate sizes • Performance, Power & Energy per cycle, Circuit Size • SPICE simulations in 0.13 µm technology • using TSMC models from MOSIS • Example: [Figure: four-stage tree of 2:1 multiplexers with a delay between stages; inputs In0–In15 (example configuration: In15 = 1, all others 0); stage 1 selected by D (Sel0), stage 2 by C (Sel1), stage 3 by B (Sel2), stage 4 by A (Sel3); output Out = ABCD]
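The four-stage multiplexer tree in the figure can be modelled functionally. This sketch ignores timing and GasP control entirely (in the project, each stage's output would be held in a GasP-controlled latch); it only shows that reducing the 16 inputs pairwise with D, then C, then B, then A selects In[8A + 4B + 2C + D], so the input values act as the LUT's 16-bit configuration. Names are hypothetical.

```python
from typing import List, Sequence


def mux_stage(values: Sequence[int], sel: int) -> List[int]:
    """One pipeline stage: a bank of 2:1 muxes picks values[2k + sel] from each pair."""
    return [values[2 * k + sel] for k in range(len(values) // 2)]


def mux16(inputs: Sequence[int], a: int, b: int, c: int, d: int) -> int:
    """16:1 mux in four stages: D (Sel0) first, then C (Sel1), B (Sel2), A (Sel3)."""
    if len(inputs) != 16:
        raise ValueError("expected 16 inputs")
    values = list(inputs)
    for sel in (d, c, b, a):  # 16 -> 8 -> 4 -> 2 -> 1 values
        values = mux_stage(values, sel)
    return values[0]


if __name__ == "__main__":
    # Example configuration from the figure: In15 = 1, all other inputs 0 (Out = ABCD).
    config = [0] * 15 + [1]
    for a, b, c, d in [(1, 1, 1, 1), (1, 1, 1, 0), (0, 0, 0, 0)]:
        print(a, b, c, d, "->", mux16(config, a, b, c, d))
```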
Design Project • Motivation • The pipeline is shortened when some inputs are not used, leading to reduced propagation delay. • If GasP latches are placed at each stage within the LUT, the flip-flop after each LUT is not required • The effective operating frequency is then set by the propagation between GasP stages, not by the LUTs. • Performance can be further increased by incorporating GasP FIFO stages into the routing network. • Example: [Figure: the same multiplexer tree configured for the 2-input function Z = AB (In12–In15 = 1, all other inputs 0); C and D are unused]
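The slide does not say how the pipeline is shortened. One plausible reading, sketched below as an assumption, is that a stage whose select input the configured function ignores can be bypassed, so the number of stages an item must traverse equals the number of inputs the function actually depends on. The helper names are hypothetical; for the Z = AB example only the B and A stages remain.

```python
from typing import List, Sequence

# Input names by bit position in the index: A = bit 3 (Sel3), ..., D = bit 0 (Sel0).
BIT_NAMES = {3: "A", 2: "B", 1: "C", 0: "D"}


def depends_on(table: Sequence[int], bit: int) -> bool:
    """True if flipping this input changes the output for at least one index."""
    return any(table[i] != table[i ^ (1 << bit)] for i in range(16))


def used_inputs(table: Sequence[int]) -> List[str]:
    return [BIT_NAMES[bit] for bit in (3, 2, 1, 0) if depends_on(table, bit)]


def stages_needed(table: Sequence[int]) -> int:
    """Under the bypass assumption: one mux stage per input the function uses."""
    return len(used_inputs(table))


if __name__ == "__main__":
    z_ab = [1 if i >= 12 else 0 for i in range(16)]  # In12..In15 = 1, as in the figure
    print(used_inputs(z_ab), stages_needed(z_ab))    # ['A', 'B'] 2
```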
Classification: Timing • Delay-Insensitive (DI) • Designed to operate correctly regardless of the delays on gates & wires • An “unbounded” gate & wire delay model is assumed. • The class of DI circuits that can be built out of simple basic gates is almost empty • Practical DI circuits can be built with complex components that use timing assumptions within the component. • Example: C-Element • Quasi-Delay-Insensitive (QDI) • Same as DI, but with an isochronic fork delay assumption • An isochronic fork is a forked wire where all branches have the same or a bounded delay • The weakest compromise to true DI needed to build practical circuits. • Speed-Independent (SI) • Unbounded delays for gates and “negligible” (optimistic) delays for wires. • Self-timed • The circuit contains a number of regions (elements), each of which may be SI internally. • Communication between regions is assumed to be delay-insensitive.
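The C-element named above (and used in the micropipeline control) has a simple behavioural specification: its output follows the inputs when they all agree and holds its previous value otherwise. The class below is a behavioural sketch under that definition; it says nothing about the internal timing assumptions that make a real C-element work, and the class name is illustrative.

```python
class CElement:
    """Behavioural Muller C-element with any number of inputs."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, *inputs: int) -> int:
        # Output changes only when every input has reached the same value.
        if inputs and all(x == inputs[0] for x in inputs):
            self.out = inputs[0]
        return self.out


if __name__ == "__main__":
    c = CElement()
    # Rendezvous of two signals: the output rises only after both inputs rise,
    # and falls only after both fall.
    for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
        print(a, b, "->", c.update(a, b))
```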
Classification: Signaling • Control Signaling • Request/Acknowledge (self-timed) is popular • Four-phase / return-to-zero / level signalling • Req↑ Ack↑ Req↓ Ack↓: 1 cycle. • Two-phase / non-return-to-zero / transition signalling • Req↑ Ack↑: 1 cycle. Req↓ Ack↓: 1 cycle. • Data Signaling • Bundled data • Normal wires, one wire per bit. • Control signals indicate when the data is valid. • Dual-rail data • 2 wires per bit; the encoding itself indicates data validity • 00 = no data, 01 = 0, 10 = 1, 11 = invalid • Simple acknowledge control wire
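The two control protocols and the dual-rail code above can be sketched in a few lines. The event lists count wire transitions per transfer (four for return-to-zero, two for transition signalling), and the dual-rail helpers follow the slide's encoding, written as (first wire, second wire) so that 01 means 0 and 10 means 1. Completion detection, which lets the receiver raise its acknowledge, checks that no bit is still 00. Function names are illustrative.

```python
from typing import List, Optional, Sequence, Tuple

Rails = Tuple[int, int]  # (first wire, second wire), as written on the slide


def four_phase(n_transfers: int) -> List[str]:
    """Return-to-zero: Req and Ack both rise and then fall for every transfer."""
    return ["Req+", "Ack+", "Req-", "Ack-"] * n_transfers


def two_phase(n_transfers: int) -> List[str]:
    """Transition signalling: any edge on Req (or Ack) marks one event."""
    events, level = [], 0
    for _ in range(n_transfers):
        level ^= 1
        edge = "+" if level else "-"
        events += ["Req" + edge, "Ack" + edge]
    return events


def encode(bit: Optional[int]) -> Rails:
    """None is the spacer (00) placed between words in return-to-zero operation."""
    return {None: (0, 0), 0: (0, 1), 1: (1, 0)}[bit]


def decode(rails: Rails) -> Optional[int]:
    if rails == (1, 1):
        raise ValueError("11 is an invalid dual-rail code")
    return {(0, 0): None, (0, 1): 0, (1, 0): 1}[rails]


def is_complete(word: Sequence[Rails]) -> bool:
    """Completion detection: every bit carries valid data (no 00 left)."""
    return all(decode(r) is not None for r in word)


if __name__ == "__main__":
    print(len(four_phase(3)), len(two_phase(3)))  # 12 6
    word = [encode(b) for b in (1, 0, 1)]
    print(word, is_complete(word))                 # [(1, 0), (0, 1), (1, 0)] True
    print(is_complete([(1, 0), (0, 0), (1, 0)]))   # False
```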