
Clockless Computing Lecture 3

This lecture provides an introduction to clockless computing, focusing on the basics of pipelining: breaking a complex operation into simpler sequential operations, and the resulting impact on throughput and latency. It surveys application areas where fine-grain asynchronous pipelining is attractive, presents the MOUSETRAP pipeline as an example of an ultra-high-speed transition-signaling asynchronous pipeline, and explains its implementation and control signaling. It also introduces forks and joins, discusses their implementation, and notes the limitations of purely linear pipelining.

Presentation Transcript


  1. Clockless Computing, Lecture 3. Montek Singh. Thu, Aug 30, 2007

  2. Handshaking Example: Asynchronous Pipelines
  • Pipelining basics
  • Fine-grain pipelining
  • Example approach: MOUSETRAP pipelines

  3. Background: Pipelining
  What is pipelining? Breaking up a complex operation on a stream of data into simpler sequential operations, separated by storage elements (latches/registers).
  [Figure: a "coarse-grain" pipeline, e.g. a simple processor with fetch, decode, and execute stages, vs. a "fine-grain" pipeline, e.g. a pipelined adder]
  Performance impact:
  + Throughput: significantly increased (# data items processed/second)
  – Latency: somewhat degraded (# seconds from input to output)
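
  A rough first-order model of those two effects (my notation, not the lecture's): for an n-stage pipeline whose slowest stage takes t_stage and whose storage elements add an overhead t_reg per stage,

  $$\text{throughput} \approx \frac{1}{t_{\mathrm{stage}} + t_{\mathrm{reg}}}, \qquad \text{latency} \approx n\,(t_{\mathrm{stage}} + t_{\mathrm{reg}})$$

  so finer-grain stages raise throughput, while the accumulated storage-element overhead is what degrades latency.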

  4. Focus of Asynchronous Community
  A key focus: extremely fine-grain pipelines
  • "gate-level" pipelining = use the narrowest possible stages
  • each stage consists of only a single level of logic gates
  • some of the fastest existing digital pipelines to date
  Application areas:
  • general-purpose microprocessors: instruction pipelines, often 20-40 stages
  • multimedia hardware (graphics accelerators, video DSPs, ...): naturally pipelined systems where throughput is critical and the input is "bursty"
  • optical networking: serializing/deserializing FIFOs
  • string matching(?): KMP-style string matching with variable skip lengths

  5. MOUSETRAP: Ultra-High-Speed Transition-Signaling Asynchronous Pipelines
  Singh and Nowick, Intl. Conf. on Computer Design (ICCD), September 2001, and IEEE Trans. VLSI, June 2007

  6. MOUSETRAP Pipelines
  Simple asynchronous implementation style, which uses:
  • standard logic implementation: Boolean gates, transparent latches
  • simple control: 1 gate per pipeline stage
  MOUSETRAP uses a "capture protocol." Latches...
  • are normally transparent: before new data arrives
  • become opaque: after data arrives ("capture" the data)
  Control signaling: transition signaling = 2-phase
  • simple protocol: req/ack = only 2 events per handshake (not 4)
  • no "return-to-zero"
  • each transition (up/down) signals a distinct operation
  Our goal: very fast cycle time
  • simple inter-stage communication
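
  To make the transition-signaling idea concrete, here is a minimal Python sketch (my own illustration, not code from the lecture; the function name is invented). With 2-phase signaling each data item costs exactly one req transition and one ack transition, there is no return-to-zero, and a stage can tell that an unacknowledged request is pending simply by comparing the req and ack levels.

```python
def pending_request(req: int, ack: int) -> bool:
    """Transition signaling: a request is outstanding whenever the req and
    ack wires are at different levels; they match again once the receiver
    has toggled ack to acknowledge the item."""
    return req != ack

# Event cost per data item (illustrative):
#   2-phase:  req toggles, ack toggles                      -> 2 events
#   4-phase:  req up, ack up, req down, ack down (RTZ)      -> 4 events

req = ack = 0
req ^= 1                       # sender signals a new item (up transition)
assert pending_request(req, ack)
ack ^= 1                       # receiver acknowledges (up transition)
assert not pending_request(req, ack)

req ^= 1                       # next item: a DOWN transition also means "new data"
assert pending_request(req, ack)
```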

  7. MOUSETRAP: A Basic FIFO
  Stages communicate using transition signaling: 1 transition per data item!
  [Figure: three FIFO stages (N-1, N, N+1), each with a data latch and a latch controller; signals reqN, doneN, reqN+1, ackN-1, ackN, and the latch enable En; the 1st and 2nd data items are shown flowing through the pipeline]

  8. MOUSETRAP: A Basic FIFO (contd.)
  • The latch is disabled when the current stage is "done"
  • The latch is re-enabled when the next stage is "done"
  • The latch controller (an XNOR) acts as a "protocol converter": 2 distinct transitions (up or down) → a pulsed latch enable
  [Figure: the same FIFO stage, showing 2 transitions per latch cycle on En]
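
  Since the controller is described only in words above, here is a small untimed behavioral sketch in Python (my own illustration, not code from the lecture; the Stage/step names are invented): each stage's enable is the XNOR of its own done with the ack from the next stage, the latch is transparent while the enable is high, and the stage captures (goes opaque) as soon as its own done toggles.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    done: int = 0          # latched copy of the incoming req (toggles once per item)
    data: object = None    # value currently held in / flowing through the data latch

def enable(done: int, ack_from_next: int) -> bool:
    """MOUSETRAP latch controller: a single XNOR per stage.
    The latch is transparent when this stage and its successor agree,
    i.e. the successor has already acknowledged the previous item."""
    return done == ack_from_next

def step(stages, env_req, env_data) -> bool:
    """One settling pass over the pipeline (crude, untimed)."""
    fired = False
    for i, st in enumerate(stages):
        req_in  = env_req  if i == 0 else stages[i - 1].done
        data_in = env_data if i == 0 else stages[i - 1].data
        ack_in  = stages[i + 1].done if i + 1 < len(stages) else st.done  # always-ready consumer
        if enable(st.done, ack_in) and st.done != req_in:
            st.done, st.data = req_in, data_in   # latch transparent: data flows through, then
            fired = True                         # done != ack drops En and the latch captures
    return fired

# Push two items through a 3-stage FIFO.
stages = [Stage(), Stage(), Stage()]
env_req = 0
for item in ["A", "B"]:
    env_req ^= 1                                 # one transition per data item
    while step(stages, env_req, item):
        pass                                     # keep settling until nothing changes
print([s.data for s in stages])                  # -> ['B', 'B', 'B']
```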

  9. MOUSETRAP: FIFO Cycle Time
  [Figure: the same FIFO stage, with three numbered arcs forming the critical cycle, plus the fast self-loop by which stage N disables its own latch]
  Cycle time = (1) N computes + (2) N+1 computes + (3) N re-enabled to compute
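
  Reading the three arcs in terms of component delays (my notation, not the lecture's): arc 1 is data passing through stage N's latch, arc 2 is that transition passing through stage N+1's latch to become ackN, and arc 3 is stage N's XNOR re-enabling its latch, so for a plain FIFO one would expect roughly

  $$T_{\mathrm{FIFO}} \approx t_{\mathrm{latch},N} + t_{\mathrm{latch},N+1} + t_{\mathrm{XNOR}\uparrow,N} \approx 2\,t_{\mathrm{latch}} + t_{\mathrm{XNOR}}$$

  (the exact analytical expression is given in the ICCD/TVLSI paper).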

  10. Detailed Controller Operation
  One pulse on the latch enable per data item flowing through:
  • down transition: caused by the "done" of stage N
  • up transition: caused by the "done" of stage N+1
  [Figure: stage N's latch controller, with "done" from N and "ack" from N+1 as inputs and the latch enable as output]

  11. MOUSETRAP: Pipeline With Logic
  Simple extension to the FIFO: insert a logic block + a matching delay in each stage.
  Logic blocks: can use standard single-rail (non-hazard-free) logic.
  "Bundled data" requirement:
  • each "req" must arrive after its data inputs are valid and stable
  [Figure: stages N-1, N, N+1, each with a logic block feeding its data latch and a matched delay on the req path between doneN and reqN+1]
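
  The bundled-data requirement amounts to a simple one-sided constraint on each matched delay (my notation; the slide states it only in words): the delayed req accompanying a stage's data must not arrive before the slowest output of that stage's logic block has settled,

  $$t_{\mathrm{delay},N} \;\ge\; t_{\mathrm{logic},N}^{\max} + t_{\mathrm{margin}}$$

  where t_margin is an assumed allowance for latch setup time and delay-matching uncertainty.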

  12. Complex Pipelining: Forks & Joins
  [Figure: a pipeline fragment containing a fork and a join]
  Non-linear pipelining: has forks/joins
  Contribution: introduce efficient circuit structures
  • Forks: distribute data + control to multiple destinations
  • Joins: merge data + control from multiple sources
  • Enabling technology for building complex async systems
  Problems with linear pipelining:
  • handles limited applications; real systems are more complex

  13. Forks and Joins: Implementation
  • Join: merge the multiple incoming requests (req1, req2 → req) with a C-element
  • Fork: merge the multiple incoming acknowledges (ack1, ack2 → ack) with a C-element
  [Figure: stage N with a C-element combining req1/req2 at a join, and stage N with a C-element combining ack1/ack2 at a fork]
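
  The "C" gates in the figure are Müller C-elements. As a reminder (a behavioral sketch in Python, not lecture code): a C-element drives its output high only when all inputs are high, drives it low only when all inputs are low, and otherwise holds its previous value, which is exactly the "wait for every incoming transition" behavior a join needs for its requests and a fork needs for its acknowledges in a transition-signaling pipeline.

```python
class CElement:
    """Müller C-element: the output follows the inputs only when they all agree."""
    def __init__(self, n_inputs: int, init: int = 0):
        self.n = n_inputs
        self.out = init

    def update(self, *inputs: int) -> int:
        assert len(inputs) == self.n
        if all(v == 1 for v in inputs):
            self.out = 1          # all inputs high -> output goes high
        elif all(v == 0 for v in inputs):
            self.out = 0          # all inputs low  -> output goes low
        # otherwise the inputs disagree -> hold the previous value
        return self.out

# A join must wait for BOTH incoming requests to toggle before forwarding one request.
join = CElement(2)
print(join.update(1, 0))   # 0: only req1 has arrived, keep waiting
print(join.update(1, 1))   # 1: both arrived, the merged req toggles
print(join.update(0, 1))   # 1: held until both toggle back down
print(join.update(0, 0))   # 0: merged req toggles again (next data item)
```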

  14. Performance, Timing and Optzn.
  MOUSETRAP with logic:
  • Stage latency = (analytical expression; see the paper)
  • Cycle time = (analytical expression; see the paper)
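
  A plausible reconstruction of those two expressions, obtained by combining the cycle breakdown of slide 9 with the logic block and matched delay of slide 11 (my derivation and notation, not quoted from the paper):

  $$L_{\mathrm{stage}} \approx t_{\mathrm{logic}} + t_{\mathrm{latch}}, \qquad T \approx t_{\mathrm{logic}} + 2\,t_{\mathrm{latch}} + t_{\mathrm{XNOR}\uparrow}$$

  i.e. the forward latency is one logic block plus one transparent latch, and the cycle adds the successor's matched delay and latch (producing the ack) plus the XNOR re-enabling the latch; consult the paper for the exact published formulas.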

  15. Timing Analysis
  Main timing constraint: avoid "data overrun" (a hold-time constraint).
  Data must be safely "captured" by stage N before new inputs arrive from stage N-1:
  • simple one-sided timing constraint: fast latch disable
  • stage N's "self-loop" must be faster than the entire path through the prior stage
  [Figure: stages N-1 and N with logic blocks, data latches, matched delays, and latch controllers; the self-loop runs from doneN through stage N's XNOR back to its latch enable]
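
  One way to write that one-sided constraint explicitly (my notation; the slide states it only in words). Both race paths start at the same event, doneN toggling: on one side it must close stage N's latch via the XNOR, and on the other side it acts as ackN-1, reopens stage N-1's latch, and lets new data race through stage N's logic:

  $$t_{\mathrm{XNOR}\downarrow,N} + t_{\mathrm{hold},N} \;<\; t_{\mathrm{XNOR}\uparrow,N-1} + t_{\mathrm{latch},N-1} + t_{\mathrm{logic},N}^{\min}$$

  Because the left side is essentially one XNOR delay (the fast latch disable), the constraint is easy to satisfy in practice.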

  16. Experimental Results
  • Simulations of FIFOs: ~3 GHz (in a 0.13µm IBM process)
  • Recently fabricated chip: GCD
    • ~2 GHz simulated speed
    • chips tested to be fully functional
    • will show a demo later

  17. In-Class Exercise
  • Modify MOUSETRAP to remove the "data overrun" timing constraint
  • How is the performance affected?

  18. Homework #3 (due Tue Sep 11, 2007)
  • Read the MOUSETRAP paper [TVLSI Jun '07]
  • Modify MOUSETRAP to reduce power consumption:
    • make the latches normally opaque
    • latches become transparent only when new data arrives at their inputs
    • this prevents glitchy/garbage data from propagating
  • How is the performance (throughput, latency) affected?

  19. MOUSETRAP Advanced Topics

  20. Special Case: Using "Clocked Logic"
  Clocked-CMOS = C2MOS: eliminate explicit latches
  • the latch is folded into the logic itself
  [Figure: a general C2MOS gate, with pull-up and pull-down networks gated by En/En' and a "keeper" on the output, alongside a C2MOS AND-gate with inputs A and B]
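
  Behaviorally, a C2MOS gate is simply a logic gate whose output updates only while En is asserted and is held by the keeper otherwise, i.e. the gate and the latch are merged. A tiny Python sketch of that behavior (my illustration, not from the slides):

```python
class C2MOSGate:
    """Behavioral model of a clocked-CMOS (C2MOS) gate: it computes its logic
    function while En=1 and holds the previous output (the keeper) while En=0."""
    def __init__(self, func, init: int = 0):
        self.func = func
        self.out = init

    def eval(self, en: int, *inputs: int) -> int:
        if en:
            self.out = self.func(*inputs)   # driven: behaves like transparent logic
        return self.out                     # En=0: the keeper holds the old value

and_gate = C2MOSGate(lambda a, b: a & b)
print(and_gate.eval(1, 1, 1))   # 1: enabled, computes A AND B
print(and_gate.eval(0, 0, 1))   # 1: disabled, output held despite new inputs
print(and_gate.eval(1, 0, 1))   # 0: re-enabled, output tracks the inputs again
```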

  21. Gate-Level MOUSETRAP: with C2MOS
  Use C2MOS: eliminate explicit latches
  New control optimization = "dual-rail XNOR"
  • eliminates 2 inverters from the critical path
  [Figure: stage N built from C2MOS logic plus a pair of bit latches; the controller signals are dual-rail pairs (ack, ack'), (done, done'), (En, En')]

  22. Timing Optzn: Reducing Cycle Time
  Goal: shorten the analytical cycle time (in steady-state operation); steady state = no undue pipeline congestion.
  Observation:
  • the XNOR switches twice per data item
  • only the 2nd (up) transition is critical for performance
  Solution: reduce the XNOR output swing
  • degrade the "slew" at the start of the pulse
  • allows quick pulse completion: faster rise time
  Still safe when congested: the pulse starts on time
  • the pulse is maintained until the congestion clears

  23. Timing Optzn (contd.)
  [Figure: waveforms of the "unoptimized" vs. "optimized" XNOR output between N's "done" and N+1's "done"; N's latch is disabled and then re-enabled; with the reduced swing the latch is only partly disabled and recovers quicker (there is no pulse-width requirement)]

  24. Comparison with Wave Pipelining
  Two scenarios:
  • Steady state: both MOUSETRAP and wave pipelines act like transparent "flow-through" combinational pipelines
  • Congestion:
    • right environment stalls: each MOUSETRAP stage safely captures data
    • an internal stage is slow: the MOUSETRAP stages to its left safely capture data
    • congestion is properly handled in MOUSETRAP
  Conclusion: MOUSETRAP has the potential of...
  • the speed of wave pipelining
  • greater robustness and flexibility

  25. Timing Issues: Handling Wide Datapaths
  Buffers are inserted to amplify the latch enable signals (En).
  Reducing the impact of the buffers:
  • the control uses the unbuffered signals → buffer delay is off the critical path!
  • the datapath is skewed w.r.t. the control
  • timing assumption: the buffer delays are roughly equal
  [Figure: stages N-1 and N shown with and without buffered En, and signals reqN, doneN, reqN+1]
