This lecture explores fine-grain pipelining using the MOUSETRAP approach: an asynchronous pipeline style that combines simple logic blocks, transparent latches, and high-speed transition signaling to increase throughput and to enable complex asynchronous systems. It covers the implementation, control signaling, timing constraints, and benefits of MOUSETRAP pipelines, application areas such as microprocessors, multimedia hardware, and optical networking, and experimental results for this logic design methodology.
COMP290-084: Clockless Logic and Silicon Compilers, Lecture 3. Montek Singh. Tue, Jan 24, 2006.
Handshaking Example: Asynchronous Pipelines
• Pipelining basics
• Fine-grain pipelining
• Example approach: MOUSETRAP pipelines
Background: Pipelining
What is pipelining? Breaking up a complex operation on a stream of data into simpler sequential operations, separated by storage elements (latches/registers).
• A "coarse-grain" pipeline (e.g., a simple processor: fetch, decode, execute stages)
• A "fine-grain" pipeline (e.g., a pipelined adder)
Performance impact:
+ Throughput: significantly increased (# data items processed per second)
– Latency: somewhat degraded (# seconds from input to output)
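The throughput/latency trade-off can be made concrete with a rough worked example. The sketch below is plain Python with made-up delay numbers and an assumed per-stage latch overhead (none of these figures come from the slides); it shows throughput improving while latency degrades slightly.

```python
# Rough back-of-the-envelope with illustrative numbers (not from the slides):
# a 3 ns operation split into three 1 ns stages.
unpipelined_delay = 3.0             # ns for the whole operation, unpipelined
stage_delays = [1.0, 1.0, 1.0]      # ns per stage after pipelining
latch_overhead = 0.2                # ns added at each stage boundary (assumed)

# Throughput is set by the slowest stage plus the latch overhead.
cycle = max(stage_delays) + latch_overhead
print(f"throughput: {1.0 / cycle:.2f} items/ns "
      f"(vs {1.0 / unpipelined_delay:.2f} unpipelined)")

# Latency is the sum of all stage delays plus overheads -- slightly worse.
latency = sum(d + latch_overhead for d in stage_delays)
print(f"latency: {latency:.1f} ns (vs {unpipelined_delay:.1f} ns unpipelined)")
```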
Focus of Asynchronous Community
A key focus: extremely fine-grain pipelines
• "gate-level" pipelining = use the narrowest possible stages
• each stage consists of only a single level of logic gates
• some of the fastest existing digital pipelines to date
Application areas:
• general-purpose microprocessors: instruction pipelines, often 20-40 stages
• multimedia hardware (graphics accelerators, video DSPs, ...): naturally pipelined systems where throughput is critical and the input is "bursty"
• optical networking: serializing/deserializing FIFOs
• string matching? KMP-style string matching with variable skip lengths
MOUSETRAP: Ultra-High-Speed Transition-Signaling Asynchronous Pipelines
Singh and Nowick, Intl. Conf. on Computer Design (ICCD), September 2001
MOUSETRAP Pipelines
Simple asynchronous implementation style, which uses:
• standard logic implementation: Boolean gates, transparent latches
• simple control: 1 gate per pipeline stage
MOUSETRAP uses a "capture protocol." Latches ...
• are normally transparent: before new data arrives
• become opaque: after data arrives ("capture" the data)
Control signaling: transition signaling = 2-phase
• simple protocol: req/ack = only 2 events per handshake (not 4)
• no "return-to-zero"
• each transition (up or down) signals a distinct operation
Our goal: very fast cycle time, with simple inter-stage communication
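To make the 2-phase handshake concrete, here is a minimal behavioral sketch in plain Python (the function and variable names are ours, not from the MOUSETRAP sources): each data item costs exactly one req transition and one ack transition, with no return-to-zero phase.

```python
# Minimal sketch of 2-phase (transition) signaling on one channel.
req = 0
ack = 0

def send_item(item):
    """Sender: toggle req once per item; the single transition is the event."""
    global req
    req ^= 1                      # one event: a req transition
    print(f"sent {item}: req={req}")

def receive_item():
    """Receiver: toggle ack once to acknowledge; no return-to-zero phase."""
    global ack
    ack ^= 1                      # one event: an ack transition
    print(f"        acked: ack={ack}")

for item in ["A", "B", "C"]:
    send_item(item)               # 2 events per handshake total (req + ack),
    receive_item()                # versus 4 in a return-to-zero protocol
```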
MOUSETRAP: A Basic FIFO
Stages communicate using transition signaling: 1 transition per data item!
[Figure: three FIFO stages (N-1, N, N+1), each with a data latch and a latch controller; signals reqN, doneN, reqN+1, ackN-1, ackN, and latch enable En; successive data items flow through the pipeline]
MOUSETRAP: A Basic FIFO (contd.)
The latch is disabled when the current stage is "done," and re-enabled when the next stage is "done."
Latch controller (XNOR) acts as a "protocol converter":
• 2 distinct transitions (up or down) → pulsed latch enable
• 2 transitions per latch cycle
[Figure: same FIFO structure, highlighting the XNOR latch controller driving En from doneN and ackN]
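A small behavioral model can illustrate the capture behavior described above. This is a sketch we wrote from the slide's description (latch enable = XNOR of this stage's "done" and the next stage's "done"); it ignores gate delays and real latch transparency, and the class and driver names are ours, not from the paper.

```python
# Behavioral sketch of a MOUSETRAP stage's control (a software model for
# illustration; real stages are a transparent latch plus one XNOR gate).

def xnor(a, b):
    return 1 if a == b else 0

class Stage:
    def __init__(self):
        self.done = 0            # latched copy of the incoming req (phase bit)
        self.data = None

    def enabled(self, ack_from_next):
        # Latch is transparent while this stage and its successor agree:
        # before new data arrives, and again after the successor takes the data.
        return xnor(self.done, ack_from_next) == 1

    def maybe_latch(self, req_in, data_in, ack_from_next):
        # A new item is signaled by req_in differing from our current phase.
        if self.enabled(ack_from_next) and req_in != self.done:
            self.data = data_in   # capture the data ...
            self.done = req_in    # ... and flip done: the latch goes opaque

# Tiny driver: items ripple through three initially empty stages.  The last
# stage's ack is tied to its own done, modeling an environment that accepts
# output immediately.
stages = [Stage(), Stage(), Stage()]
req = 0
for item in ["A", "B"]:
    req ^= 1                                   # one transition per data item
    stages[0].maybe_latch(req, item, stages[1].done)
    stages[1].maybe_latch(stages[0].done, stages[0].data, stages[2].done)
    stages[2].maybe_latch(stages[1].done, stages[1].data, stages[2].done)
    print([s.data for s in stages])            # each item flows straight through
```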
MOUSETRAP: FIFO Cycle Time
Cycle Time = (1) N computes + (2) N+1 computes + (3) N re-enabled to compute
Fast self-loop: N disables itself.
[Figure: the three cycle-time components annotated on FIFO stages N-1, N, N+1]
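As a quick sanity check on this breakdown, the sketch below plugs made-up delay values into the slide's three components. Treating components (1) and (2) as one latch delay each, and (3) as one XNOR delay, is our reading of the plain FIFO case; the delay figures are assumptions, not numbers from the paper.

```python
# Illustrative only: plugging assumed delay numbers into the slide's
# three-part cycle-time breakdown for the FIFO (no logic blocks).
t_latch = 0.15   # ns through a transparent latch (assumed value)
t_xnor  = 0.10   # ns through the latch-controller XNOR (assumed value)

t_1 = t_latch    # (1) stage N latches the item ("N computes")
t_2 = t_latch    # (2) stage N+1 latches it and flips doneN+1 ("N+1 computes")
t_3 = t_xnor     # (3) doneN+1 re-enables stage N's latch

cycle_time = t_1 + t_2 + t_3
print(f"cycle time ~ {cycle_time:.2f} ns  ->  ~ {1.0 / cycle_time:.1f} GHz")
```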
Detailed Controller Operation
Stage N's latch controller produces one pulse per data item flowing through:
• down transition: caused by "done" of N
• up transition: caused by "done" of N+1
[Figure: stage N's latch controller, with "done" from N and "ack" from N+1 as inputs, driving the enable to the latch]
MOUSETRAP: Pipeline With Logic
Simple extension to the FIFO: insert a logic block + matching delay in each stage.
Logic blocks: can use standard single-rail (non-hazard-free) logic.
"Bundled data" requirement:
• each "req" must arrive after the data inputs are valid and stable
[Figure: stages N-1, N, N+1 with logic blocks and matched delays on the req lines, plus data latches and latch controllers]
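The bundled-data requirement is essentially a one-line inequality. The sketch below checks it the way a designer or timing script might; all delay numbers and the margin are hypothetical, chosen only to illustrate the check.

```python
# Sketch of the bundled-data check: the matched delay on the req line must
# exceed the worst-case logic delay, so "req" arrives only after the data is
# valid and stable.  All numbers (and the margin) are illustrative assumptions.
logic_path_delays_ns = [0.42, 0.55, 0.61]   # worst-case paths through the logic block
matched_delay_ns = 0.70                     # delay element inserted on the req line
margin_ns = 0.05                            # safety margin (process/voltage/temperature)

assert matched_delay_ns >= max(logic_path_delays_ns) + margin_ns, \
    "bundling violated: req could arrive before the data is stable"
print("bundled-data requirement satisfied")
```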
Complex Pipelining: Forks & Joins
Problems with linear pipelining:
• handles limited applications; real systems are more complex (non-linear pipelining has forks and joins)
Contribution: introduce efficient circuit structures
• Forks: distribute data + control to multiple destinations
• Joins: merge data + control from multiple sources
• Enabling technology for building complex async systems
Forks and Joins: Implementation
• Join: merge multiple requests (req1 and req2 combined by a C-element into a single req into stage N)
• Fork: merge multiple acknowledges (ack1 and ack2 combined by a C-element into a single ack back to stage N)
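The "C" gates in the slide's figure are Müller C-elements. A minimal behavioral model (ours, for illustration only) shows why they work as the merge: with transition signaling, the merged output changes only once every input has made its transition.

```python
# Behavioral sketch of a Mueller C-element, the gate used in the figure for
# merging requests (join) and acknowledges (fork).  The class is ours.
class CElement:
    def __init__(self):
        self.out = 0
    def update(self, *inputs):
        # The output switches only when all inputs agree; otherwise it holds.
        if all(x == 1 for x in inputs):
            self.out = 1
        elif all(x == 0 for x in inputs):
            self.out = 0
        return self.out

join_req = CElement()
print(join_req.update(1, 0))   # 0: waits until both upstream requests arrive
print(join_req.update(1, 1))   # 1: the merged request fires
```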
Performance, Timing and Optimization
MOUSETRAP with logic:
• Stage Latency = …
• Cycle Time = …
Timing Analysis
Main timing constraint: avoid "data overrun."
Data must be safely "captured" by stage N before new inputs arrive from stage N-1:
• simple 1-sided timing constraint: fast latch disable
• stage N's "self-loop" must be faster than the entire path through the previous stage
[Figure: stages N-1 and N with logic blocks, matched delays, data latches, and latch controllers; the self-loop and the path through the previous stage are highlighted]
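The "self-loop vs. previous stage" constraint can be written as a single inequality. The sketch below uses assumed delay values, and the decomposition of the overrun path follows the slide's description rather than exact figures from the paper.

```python
# One-sided constraint from the slide, with illustrative (assumed) delays.
# Both races start when doneN flips:
#   self-loop:    doneN -> XNOR_N -> latch N disabled
#   overrun path: doneN (= ackN-1) -> XNOR_{N-1} re-enables latch N-1
#                 -> new data through latch N-1 and stage N's logic/delay
t_self_loop_ns = 0.10
t_overrun_path_ns = 0.10 + 0.15 + 0.50   # XNOR + latch + logic/matched delay

if t_self_loop_ns < t_overrun_path_ns:
    print("safe: stage N captures its data before new inputs can arrive")
else:
    print("data-overrun risk: speed up the latch-disable self-loop")
```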
Experimental Results
• Simulations of FIFOs: ~3 GHz (in a 0.13 µm IBM process)
• Recently fabricated chip (GCD): ~2 GHz simulated speed; chips awaited