
High Performance Asynchronous ASIC Back-End Design Using Single-Track Full-Buffer Standard Cells

This paper discusses the design flow and implementation of a high-performance asynchronous ASIC back-end design using single-track full-buffer standard cells.

Presentation Transcript


  1. High Performance Asynchronous ASIC Back-End Design Flow Using Single-Track Full-Buffer Standard Cells. Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel. Department of Electrical Engineering Systems, University of Southern California.

  2. Key to High-Speed Async Design • Completion detection demands 2-D pipelining. [Figure: bundle-data pipeline compared with a 2-D pipeline. Labels: control logic, latches, datapath, async. channels, pipeline stages] USC Asynchronous CAD/VLSI Group

  3. Asynchronous Channels. [Figure: sender/receiver diagrams for four channel types. Bundle-data channel: control channel (Req, Ack), single-rail data, latches, data stable. 1-of-N channel: 1-of-N data, Acknowledge. 1-of-N single-track channel: 1-of-N data. GasP bundle-data channel.]

  4. GasP (Sutherland et al.’01) • fw = 4, t = 6 (includes latch setup time and delay) • Bundled-data pipeline using single-track control. [Figure labels: self-resetting NAND, staticizer, pulse to data latches, latches, datapath, A, B, L, R]

  5. Precharge Half-Buffer (Lines’98) • fw = 2, t = 14+ • 2-D pipeline using 1-of-N delay-insensitive channels and QDI cells. [Figure: Precharge Half-Buffer template and schematic for each output rail. Labels: LCD, RCD, C, Pc, Eval, Le, Re, Rx, Sx, L, R, NMOS transistor stack]

  6. Single-Track Asynchronous Pulsed Logic (Nyström’01) • fw = 2, t = 10 • STAPL uses pulse generators to control driver activation timing. [Figure: STAPL template and schematic for dual-rail output. Labels: pulse generators, RCD, NMOS transistor stack, Reset, xv, re, Re, S0, S1, R0, R1, R4, L01…L0n, L11…L1n, L, R]

  7. Single-Track Full-Buffer (Ferretti’02) • fw = 2, t = 6 • Small and fast. [Figure: block diagram, schematic for dual-rail output, and timing diagram (L, S, A, B, R). Labels: SCD, RCD, NMOS transistor stack, Reset, A, B, C, S0, S1, R0, R1, L01…L0n, L11…L1n]

  8. STFB: Tradeoff Speed for Robustness • Features of STFB • 3x faster than QDI and about half the size • Smaller and faster than STAPL • Smaller forward latency and fewer timing assumptions than GasP. [Figure: performance versus robustness plot showing GasP (Sutherland - Sun), STFB (Ferretti - USC), STAPL (Nyström - Caltech) and QDI (Lines - Caltech)]

  9. Motivation and Goals • Develop a methodology to design STFB-based asynchronous circuits using conventional CAD tools • Create an STFB standard-cell library • Make the library publicly available • Design and fabricate a demonstration test chip • Evaluate the results • Ultimate Goal: Full-custom Performance with ASIC Design Times

  10. Outline • STFB standard-cell design • Backend design flow • Demonstration test chip • Conclusions

  11. STFB Standard-Cell Design • Transistor sizing • STFB channels are point-to-point (no forked wires) • One size per cell in the library is adequate

  12. STFB Standard-Cell Design • Transistor sizing • 2x min. size N-stack strength • 1:4-5 drive ratio • Up to 1 mm long wire (L ≤ 1 mm). [Figure: two STFB stages joined by channel L, with 2x and 8x strength annotations. Labels: NMOS transistor stack, Sx, Rx, C, RCD, SCD, A, B; annotated widths 2.8, 10, 5 and Wn] TSMC 0.25 µm, widths in µm and all lengths 0.24 µm

  13. STFB Standard-Cell Design • Balanced response SCD/RCD • SCD balanced NAND (2x) • RCD balanced NOR (1x) • Data-independent timing assumptions. [Figure: transistor-level schematics. Labels: R0, R1, S0, S1, A, B; annotated widths 1.2, 1.4 and 2.8] TSMC 0.25 µm, widths in µm and all lengths 0.24 µm

  14. STFB Standard-Cell Design • STFB_POUT sub-cell • Yields less load on B and faster S reset. [Figure: STFB_POUT sub-cell schematic and layout. Annotations: fast S reset; fights charge-sharing; fights leakage current; staticizer. Labels: B, S, R, NR; annotated sizes 0.3, 0.6, 1.2, 1.4/0.6, 2.8 and 10] TSMC 0.25 µm, widths in µm and all lengths 0.24 µm

  15. STFB Standard-Cell Design • Reset transistors • 2-input NAND → less load on S. [Figure: reset transistors, reset inverter and NAND layout (from STFB_XOR2 cell). Variants shown: initial idea (3-input NAND); 1-of-2 cell (2-input NAND + inverter); 1-of-3 cell (two 2-input NAND). Labels: /Reset, A, A1, A2, S0, S1, S2, L01, L11…] TSMC 0.25 µm, widths in µm and all lengths 0.24 µm

  16. STFB Standard-Cell Design • Direct-path current analysis • Average direct-path current is similar to that of an inverter. [Figure: voltage and direct-path current (Idp) waveforms over time for an inverter (Vin, Vout, M1, M2, Ipeak) and for an STFB stage (A, Sx, M1, M2, Ipeak1, Ipeak2), with levels VDD, VDD - Vtp, Vtn, 0 V and 0 A]

  17. Outline • STFB standard cell design • Backend design flow • Demonstration test chip • Conclusions

  18. Standard-Cell Library Development (Ozdag’04) • Same tools and flow as synchronous. [Flow diagram boxes: template specifications; cell specifications; standard cell specifications; Symbol, Schematic and Functional (Virtuoso, Emacs); Simulation (Verilog, Hspice); Layout (Virtuoso); LVS/DRC (Dracula/Diva); Cell Abstract (Envisia); Asynchronous Cell Library (Symbol, Schematic, Functional, Layout, Abstract)]

  19. Asynchronous ASIC Design Flow (Ozdag’04) • Same tools and flow as synchronous. [Flow diagram boxes: design specifications; Schematic (Virtuoso); Simulation (Verilog, Nanosim); Place & Route (Silicon Ensemble); Chip Assembly (Virtuoso); LVS/DRC (Dracula/Diva); Chip Fabrication; Asynchronous Cell Library (Symbol, Schematic, Functional, Abstract, Layout)]

  20. Cell Layout Example: STFB2_XOR2 • Each cell comprises an entire STFB pipeline stage. [Figure: STFB2_XOR2 schematic and layout with two STFB_POUT sub-cells. Labels: SCD, RCD, C, Reset, /Reset, A, B, S, R, S0, S1, R0, R1, a0, a1, b0, b1, A0, A1, B0, B1]

  21. Outline • STFB standard cell design • Backend design flow • Demonstration test chip • Conclusions

  22. Prefix Adder (Goldovsky’99). [Figure: 8-bit prefix adder with inputs a7…a0, b7…b0 and c-1, and outputs s7…s0 and c7; dimension annotations 3 + ⌈log2 n⌉ and 2*n + 1] • Cells: STFB2_FORK (fork stage) • STFB2_BUFFER (buffer stage) • STFB2_XOR2 (2-input xor stage) • STFB3_AB_KPG and STFB3_AB_KPG2 • STFB3_KPG2_KPG and STFB3_KPG2_KPG2 • STFB3_KPGC_C and STFB3_KPGC_C2
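The KPG cell names on this slide suggest a kill/propagate/generate parallel-prefix scheme. The Python sketch below is a functional model only, not the STFB netlist. It assumes a Kogge-Stone style scan (the slide does not say which prefix network is used) and it assumes that the annotation 3 + ⌈log2 n⌉ counts pipeline stages; all function names are hypothetical.

```python
from math import ceil, log2

K, P, G = "K", "P", "G"  # kill, propagate, generate


def pipeline_depth(n: int) -> int:
    """Evaluate the slide's annotation 3 + ceil(log2 n) (assumed to be a stage count)."""
    return 3 + ceil(log2(n))


def kpg(a_bit: int, b_bit: int) -> str:
    """Encode one bit position as kill, propagate or generate."""
    if a_bit and b_bit:
        return G
    if not a_bit and not b_bit:
        return K
    return P


def combine(hi: str, lo: str) -> str:
    """Merge two adjacent groups; a propagating upper group passes the lower one through."""
    return lo if hi == P else hi


def prefix_add(a: int, b: int, cin: int, n: int) -> tuple[int, int]:
    """Add two n-bit numbers with a KPG prefix scan; returns (sum, carry_out)."""
    codes = [kpg((a >> i) & 1, (b >> i) & 1) for i in range(n)]
    # Position 0 of the scan is the carry-in (c-1), encoded as G or K.
    prefix = [G if cin else K] + codes
    dist = 1
    while dist < len(prefix):
        prefix = [
            combine(prefix[i], prefix[i - dist]) if i >= dist else prefix[i]
            for i in range(len(prefix))
        ]
        dist *= 2
    # After the scan, prefix[i] is the carry into bit i (never P, since c-1 is K or G).
    carries = [1 if code == G else 0 for code in prefix]
    total = 0
    for i in range(n):
        total |= (((a >> i) & 1) ^ ((b >> i) & 1) ^ carries[i]) << i
    return total, carries[n]


if __name__ == "__main__":
    print(pipeline_depth(64))
    print(prefix_add(5, 3, 0, 8))
```

The scan treats the carry-in as an extra position below bit 0, so every carry resolves to kill or generate once the scan completes.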

  23. 64-bit Adder Block • Silicon Ensemble P&R • Steps: floor plan; plan power; pins and cell placement; filler cell; routing • 129 rows • 70% area utilization • M4 and M5 power grid • Input pins on the left (A64, B64 and C) • Output pins on the right (S64 and C). [Flow context: Schematic (Virtuoso), Place & Route (Silicon Ensemble)]

  24. Input Generator Block • Flexible and fast input generation. [Figure labels: 8x8 single-rail to single-track converter; 12x STFB2_SRST; 8x8 data d0…d7; 4 address a0…a3; 4 levels STFB2_SPLIT; A (64): 64x9-stage ring; B (64): 64x9-stage ring; Cin (carry in): STFB2_SRST and 9-stage ring; bus widths 64, 8, 4 and 1]

  25. Output Sampler Block • Flexible and fast output sampler. [Figure: three sampling stages labeled 1:10, 1:100 and 1:1000 on the 65-bit result (64 bit sum + Cout); each stage shows 65x STFB2_SPLIT, 65x STFB2_BUCKET and a 30-stage ring (BB). Ring patterns: 0010000000 1000000000; 0000100000 1000000000; 0000000100 1000000000. Index annotations: 1,10,…; 1,100,…; 1,1000,…; 3,13,…; 43,143,…; 843,1843,…]

  26. Simulation Results: Loading • Nanosim. [Waveform labels: Carry in, 3x A64, 3x B64, Go!] • Sampler: 10x4x4 = 160

  27. Simulation Results: Running • Nanosim. [Waveform labels: Go!, Carry out, Sum] • 112.9 ns / 160 = 0.706 ns • 1 / 0.706 ns = 1.4 GHz
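The throughput figure on this slide follows from two divisions. As a minimal sketch, the Python below repeats that arithmetic using only the numbers on the slide (112.9 ns and 160); the function names are hypothetical, and reading 160 as the number of cycles in the measured window is an assumption.

```python
def cycle_time_ns(window_ns: float, cycles: int) -> float:
    """Average time per cycle over the measured window."""
    return window_ns / cycles


def throughput_ghz(cycle_ns: float) -> float:
    """A period in nanoseconds converts directly to a rate in GHz (1 / ns)."""
    return 1.0 / cycle_ns


if __name__ == "__main__":
    cycle = cycle_time_ns(112.9, 160)
    print(f"{cycle:.3f} ns per cycle, {throughput_ghz(cycle):.1f} GHz")
```

Rounded to the slide's precision, this gives 0.706 ns and 1.4 GHz.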

  28. Simulation Results

  29. Demonstration chip • Top layout: TSMC 0.25 µm, MOSIS Mar/22/04, 20.5 mm², 132 pins • Chip contents: STFB 64-bit Adder and QDI Sequential Decoder (Session VI, 10:30am, Thu, Apr/22) • STFB 64-bit Adder: 3.3 mm², 257k transistors, 2.9 A @ 1.4 GHz • INPUTGEN129BY9: 1.36 mm², 105k transistors, 1.3 A @ 1.4 GHz • ADDER64: 1.13 mm², 89k transistors, 1.3 A @ 1.4 GHz • SAMPLER65BY1000: 0.85 mm², 62k transistors, 0.3 A @ 1.4 GHz • Library: ~6 months/man • Design: ~6 months/man. [Dimension labels: 3733 µm and 5483 µm (top layout); 1963 µm, 1700 µm, 801 µm, 663 µm and 499 µm (adder blocks)]
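The dimension labels and the quoted areas on this slide can be cross-checked. The Python sketch below assumes the linear dimensions are in micrometres and the areas in mm², and it assumes the pairing of block widths (801, 663, 499) with block names and a shared height of 1700; the slide text does not state that pairing, but the products agree with the quoted areas. All names in the code are illustrative.

```python
UM2_PER_MM2 = 1_000_000  # 1 mm^2 = 10^6 um^2


def area_mm2(width_um: float, height_um: float) -> float:
    """Rectangle area in mm^2 from dimensions in micrometres."""
    return width_um * height_um / UM2_PER_MM2


# Assumed pairing of width labels with block names (not stated on the slide).
BLOCK_WIDTHS_UM = {
    "INPUTGEN129BY9": 801,
    "ADDER64": 663,
    "SAMPLER65BY1000": 499,
}
BLOCK_HEIGHT_UM = 1700  # assumed common block height

block_areas = {
    name: area_mm2(width, BLOCK_HEIGHT_UM)
    for name, width in BLOCK_WIDTHS_UM.items()
}
adder_area = area_mm2(1963, BLOCK_HEIGHT_UM)  # whole STFB 64-bit adder
chip_area = area_mm2(3733, 5483)              # top layout

if __name__ == "__main__":
    for name, area in block_areas.items():
        print(f"{name}: {area:.2f} mm^2")
    print(f"adder: {adder_area:.1f} mm^2, chip: {chip_area:.1f} mm^2")
```

Under these assumptions the three block widths sum to 1963, and the computed areas round to the values quoted on the slide.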

  30. Summary and Conclusions • Performance: STFB 2-D pipelining yields ultra-high performance • Design time: back-end flow achieves ASIC design time • Availability: cell library has been made freely available • Future work: characterize and extend library; static timing analysis and sign-off

  31. Efharisto! (Thank you!)

  32. STFB Standard-Cell Design • Dynamic worst-case direct-path current analysis (STFB buffer pipeline at 2 GHz) • Non-overlap drive = less direct-path current than an inverter. [Figure: four STFB buffer stages. Labels: L, R, A, Sx, RCD, 1 mm] TSMC 0.25 µm, widths in µm and all lengths 0.24 µm

  33. Input Generator Block • 9-stage ring. [Figure: ring with in, out and go ports, bit pattern 1,0,0,1,0,0…, and BG blocks. Legend: STFB2_FORK (fork stage); STFB2_BUFFER (buffer stage); STFB2_XOR2 (2-input xor stage); STFB2_BITGEN (bit generator); STFB2_MERGENC (non-conditional merge stage)]

  34. Et² • Comparison STFB x WCHB • STFB buffer is ~3x more efficient than WCHB buffer

  35. Demonstration chip • Top layout: TSMC 0.25 µm, MOSIS Mar/22/04 • STFB 64-bit Adder: 3.3 mm², 257k transistors, 2.9 A @ 1.4 GHz • INPUTGEN129BY9: 1.36 mm², 105k transistors, 1.3 A @ 1.4 GHz • ADDER64: 1.13 mm², 89k transistors, 1.3 A @ 1.4 GHz • SAMPLER65BY1000: 0.85 mm², 62k transistors, 0.3 A @ 1.4 GHz • Pins: 12 In/Out, 8 Input and 3 pad supply pins, plus two groups of 7 Vdd and 7 Gnd pins; total: 51 pins. [Dimension labels: 1963 µm, 1700 µm, 801 µm, 663 µm and 499 µm]

  36. Test chip design • Top chip layout: TSMC 0.25 µm, MOSIS Mar/22/04, 20.5 mm², 132 pins • STFB 64-bit Adder • QDI Sequential Decoder (Session VI, 10:30am, Thu). [Dimension labels: 3733 µm and 5483 µm]
