230 likes | 470 Views
Clockless Logic: Asynchronous Pipelines. MOUSETRAP: Ultra-High-Speed Transition-Signaling Asynchronous Pipelines Singh and Nowick, Intl. Conf. on Computer Design (ICCD), September 2001. MOUSETRAP Pipelines. Simple asynchronous implementation style, uses… transparent latches
E N D
Clockless Logic: Asynchronous Pipelines MOUSETRAP: Ultra-High-Speed Transition-Signaling Asynchronous Pipelines Singh and Nowick, Intl. Conf. on Computer Design (ICCD), September 2001
MOUSETRAP Pipelines Simple asynchronous implementation style, uses… • transparent latches • simple control:1 gate/pipeline stage Target datapath = static logic blocks “MOUSETRAP”: uses a “capture protocol” Latches … • are normally transparent: beforenew data arrives • become opaque: afterdata arrives (“capture” data) Control Signaling:transition-signaling = 2-phase • simple protocol: req/ack = only 2 events per handshake (not 4) • no “return-to-zero” • each transition (up/down) signals a distinct operation Our Goal: very fast cycle time • simple inter-stage communication
MOUSETRAP: A Basic FIFO Stages communicate usingtransition-signaling: Latch Controller 1 transition per data item! ackN-1 ackN En doneN reqN reqN+1 Data in Data out Data Latch Stage N-1 Stage N Stage N+1 2nd data item flowing through the pipeline 1st data item flowing through the pipeline 1st data item flowing through the pipeline
MOUSETRAP: A Basic FIFO (contd.) Latch is disabled when current stage is “done” Latch is re-enabled when next stage is “done” Latch controller (XNOR) acts as “phase converter”: • 2 distinct transitions (up or down) pulsed latch enable Latch Controller 2 transitions per latch cycle ackN-1 ackN En reqN reqN+1 doneN Data in Data out Data Latch Stage N-1 Stage N Stage N+1
MOUSETRAP: FIFO Cycle Time 3 Latch Controller 2 ackN-1 ackN En reqN reqN+1 doneN 1 2 Data in Data out Data Latch Fast self-loop: N disables itself Stage N-1 Stage N Stage N+1 Cycle Time = N re-enabled to compute N+1 computes N computes
Detailed Controller Operation • One pulse per data item flowing through: • down transition:caused by“done” of N • up transition:caused by“done” of N+1 • No minimum pulse width constraint! • simply, down transition should start “early enough” • can be “negative width” (no pulse!) Stage N’s Latch Controller ackfrom N+1 donefrom N to Latch
MOUSETRAP: Pipeline With Logic logic logic logic Logic Blocks:can use standard single-rail (non-hazard-free) “Bundled Data” Requirement: • each“req”must arrive after data inputs valid and stable Simple Extension to FIFO: insert logic block + matching delay in each stage Latch Controller ackN-1 ackN reqN+1 reqN delay delay delay doneN Data Latch Stage N-1 Stage N Stage N+1
Special Case: Using “Clocked Logic” pull-up network A B “keeper” “keeper” logic inputs logic inputs En En logic output logic output En En pull-down network A B A General C2MOS gate Clocked-CMOS = C2MOS: eliminate explicit latches • latch folded into logic itself C2MOS AND-gate
Gate-Level MOUSETRAP: with C2MOS En,En 2 2 2 (ack,ack’) (done,done’) (En,En’) Use C2MOS:eliminate explicit latches New Control Optimization =“Dual-Rail XNOR” • eliminate 2 inverters from critical path Latch Controller ackN-1 ackN 2 2 2 doneN 2 2 reqN reqN+1 pair of bit latches C2MOS logic Stage N-1 Stage N+1 Stage N
Complex Pipelining: Forks & Joins fork join Non-Linear Pipelining: has forks/joins Contribution: introduce efficient circuit structures • Forks: distributedata + controlto multiple destinations • Joins: mergedata + controlfrom multiple sources • Enabling technology for building complex async systems Problems with Linear Pipelining: • handles limited applications; real systems are more complex
Forks and Joins: Implementation ack1 C ack ack2 req1 C req req req2 Stage N Stage N Join:merge multiple requests Fork:merge multiple acknowledges
Related Protocols Day/Woods (’97), and Charlie Boxes (’00) Similarities: all use… • transition signaling for handshakes • phase conversion for latch signals Differences: MOUSETRAP has… • higher throughput • ability to handlefork/joindatapaths • more aggressive timing, less insensitivity to delays
Performance, Timing and Optzn. Stage Latency = Cycle Time = Stage Latency = Cycle Time = MOUSETRAP with Logic: MOUSETRAP Using C2MOS Gates:
Timing Analysis Latch Controller ackN-1 ackN reqN+1 reqN delay delay doneN logic logic Data Latch Stage N Stage N-1 Main Timing Constraint: avoid “data overrun” Data must be safely “captured” by Stage N before new inputs arrive fromStage N-1 • simple 1-sided timing constraint: fast latch disable • Stage N’s “self-loop” faster than entire path through previous stage
Timing Optzn: Reducing Cycle Time Analytical Cycle Time = Goal:shorten (in steady-state operation) Steady-state = no undue pipeline congestion Observation: • XNOR switches twice per data item: • only 2nd (up) transition criticalfor performance: Solution: reduce XNOR output swing • degrade “slew” for start of pulse • allows quick pulse completion: faster rise time Still safe when congested:pulse starts on time • pulse maintained until congestion clears
Timing Optzn (contd.) “optimized” XNOR output “unoptimized” XNOR output N’s latch disabled N’s latch re-enabled N “done” N+1 “done” latch only partly disabled; recovers quicker! (no pulse width requirement)
Comparison with Wave Pipelining Two Scenarios: • Steady State: • both MOUSETRAP and wave pipelines act like transparent “flow through” combinational pipelines • Congestion: • right environment stalls: each MOUSETRAP stage safely captures data • internal stage slow: MOUSETRAP stages to its left safely capture data congestion properly handled in MOUSETRAP Conclusion: MOUSETRAP has potential of… • speed of wave pipelining • greater robustness and flexibility
Timing Issues: Handling Wide Datapaths En En reqN doneN reqN+1 reqN doneN reqN+1 Stage N-1 Stage N Stage N-1 Stage N Buffers inserted to amplify latch signals (En): Reducing Impact of Buffers: • control uses unbuffered signals buffer delay off of critical path! • datapath skewed w.r.t. control Timing assumption: buffer delays roughly equal
En reqN doneN reqN+1 Stage N-1 Stage N
Preliminary Results Pre-Layout Simulations of FIFO’s: • do not account for wire delays, parasitics, etc. • careful transistor sizing/verification of timing constraints
Conclusions and Future Work Introduced a new asynchronous pipeline style: • Static logic blocks • Simple latches and control: • transparent latches, or C2MOS gates • single gate control = 1 XNOR gate/stage • Highly concurrent event-driven protocol • High throughputs obtained: • 3.5 GHz in 0.25, 1.9 GHz in 0.6 • comparable to wave pipelines; yet more robust/less design effort • Correctly handle forks and joins in datapaths • Timing constrains: local, 1-sided, easily met Ongoing Work: • more realistic performance measurement (incl. parasitics) • layout and fabrication