Recap: Lectures 5 & 6 Classic Pipeline Styles

Classic Pipeline Styles:
• Williams and Horowitz’s PS0 pipeline
• Sutherland’s micropipelines

Different Points in the Design Space
Williams/Horowitz’s PS0:
• Dual-rail
• Data-dependent completion
• Dynamic logic
• No extra latches
• “Zero-overhead” latency
• 4-phase handshakes: resetting overhead
Sutherland’s micropipelines:
• Single-rail
• Worst-case matched delay
• Static logic
• Explicit latches
• Latch latencies = overhead
• Elegant transition signaling

PS0 Protocol
• PRECHARGE N: when N+1 completes evaluation
  • i.e., delete data after the next stage has copied it
• EVALUATE N: when N+1 completes precharging
  • i.e., accept new data after the next stage is emptied
Complete cycle: 6 events
• Evaluate → Precharge: 3 events
• Precharge → Evaluate: another 3 events
[Figure: stages N, N+1, N+2 with the six numbered events of one cycle (evaluations, “done” indications, precharge)]

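PS0 Performance
Cycle Time = $3\,t_{Eval} + 2\,t_{CD} + t_{Prech}$
(six events per cycle: three evaluations, two completion detections, one precharge)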
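A minimal sketch of where the PS0 cycle time comes from: it lists the six events of stage N's cycle under the standard reading of the PS0 protocol (the original figure's numbering is lost, so this ordering is an assumption) and sums their delays. The delay values are hypothetical placeholders, not measurements.

```python
# Hypothetical per-event delays in arbitrary time units (placeholders, not measured data).
T_EVAL = 3   # evaluation delay of one stage's dynamic logic
T_PRECH = 2  # precharge delay of one stage
T_CD = 1     # completion-detector delay

# Assumed critical cycle of stage N in PS0: N's data ripples into N+1 and N+2;
# N+2's "done" lets N+1 precharge; N+1's "precharged" lets N evaluate again.
PS0_CYCLE = [
    ("N evaluates", T_EVAL),
    ("N+1 evaluates", T_EVAL),
    ("N+2 evaluates", T_EVAL),
    ("N+2 completion detector signals done", T_CD),
    ("N+1 precharges", T_PRECH),
    ("N+1 completion detector signals precharged", T_CD),
]


def cycle_time(events):
    """Sum the delays along one traversal of the critical cycle."""
    return sum(delay for _, delay in events)


if __name__ == "__main__":
    for i, (name, delay) in enumerate(PS0_CYCLE, start=1):
        print(f"{i}. {name} (+{delay})")
    print("cycle time:", cycle_time(PS0_CYCLE))
```

Grouping the six terms by kind gives three evaluations, two completion detections and one precharge, which is exactly the PS0 cycle-time expression.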

Drawbacks of PS0 Pipelining
• Poor throughput:
  • long cycle time: 6 events per cycle
  • data “tokens” are forced far apart in time
• Limited storage capacity:
  • at most 50% of stages can hold distinct tokens
  • data tokens must be separated by at least one spacer
Our Research Goals: address both issues
• while still maintaining very low latency
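The 50% capacity limit follows directly from the spacer rule. As a small sketch, assuming a simplified steady-state model in which the pipeline is a ring of n stages and each stage holds either a token ('T') or a spacer ('S'), brute force over all occupancy patterns confirms that no more than half the stages can hold distinct tokens when every token must be followed by a spacer.

```python
from itertools import product


def valid_ps0_occupancy(ring):
    """True if no two adjacent stages (cyclically) both hold distinct data tokens.

    Simplified model: 'T' = stage holds a data token, 'S' = stage holds a spacer.
    """
    n = len(ring)
    return all(not (ring[i] == "T" and ring[(i + 1) % n] == "T") for i in range(n))


def max_tokens(n):
    """Brute-force the largest number of tokens a ring of n stages can hold."""
    return max(
        occ.count("T")
        for occ in product("TS", repeat=n)
        if valid_ps0_occupancy(occ)
    )


if __name__ == "__main__":
    for n in range(2, 9):
        print(f"{n} stages -> at most {max_tokens(n)} distinct tokens")
```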

Lecture 7: Recent Approaches

Recent Approaches
3 novel styles for high-speed async pipelining:
• “Lookahead Pipelines” (LP) [Singh/Nowick, Async-00]
• “High-Capacity Pipelines” (HC) [Singh/Nowick, WVLSI-00]
• MOUSETRAP Pipelines [Singh/Nowick, TAU-00]
Goal: significantly improve the throughput of PS0
Two Distinct Strategies:
• LP: introduce protocol optimizations
  • “shave off” components from the critical cycle
• HC: a fundamentally new protocol
  • greater concurrency: “loosely-coupled” stages

Outline
• New Asynchronous Pipelines:
  • Dynamic circuit style:
    • Lookahead Pipelines (LP)
    • High-Capacity Pipelines (HC)
  • Static circuit style:
    • MOUSETRAP Pipelines

Lookahead Pipelines: Strategy #1
Use non-neighbor communication:
• a stage receives information from multiple later stages
• allows “early evaluation”
Benefit: the stage gets a head start on the next cycle

Lookahead Pipelines: Strategy #2
Use early completion detection:
• completion detector moved before the stage (not after)
• the stage indicates “early done” in parallel with its computation
[Figure: early completion detector]
Benefit: again, the stage gets a head start on the next cycle

Lookahead Pipelines: Overview
5 New Designs:
• “Dual-Rail” Data Signaling:
  • LP3/1: “early evaluation”
  • LP2/2: “early done”
  • LP2/1: “early evaluation” + “early done”
• “Single-Rail” Bundled-Data Signaling:
  • LPSR2/2: “early done”
  • LPSR2/1: “early evaluation” + “early done”

Dual-Rail Design #1: LP3/1
Optimization = “early evaluation”
• each stage has two control inputs: from stages N+1 and N+2
Idea: shorten the precharge phase
• terminate precharge early: when N+2 is done evaluating
[Figure: stages N, N+1, N+2, each a processing block plus completion detector; stage N's PC input comes from N+1 and its Eval input from N+2]

LP3/1 Protocol
• PRECHARGE N: when N+1 completes evaluation
• EVALUATE N: when N+2 completes evaluation (new!)
  • enables “early evaluation”!
[Figure: N, N+1 and N+2 evaluate in turn; N+1 and N+2 each indicate “done”; events numbered 1 to 4]

LP3/1: Comparison with PS0
• Both: PRECHARGE N when N+1 completes evaluation
• PS0: EVALUATE N when N+1 completes precharging → 6 events in cycle
• LP3/1: EVALUATE N when N+2 completes evaluation → only 4 events in cycle! (enables “early evaluation”)
[Figure: side-by-side numbered event sequences for stages N, N+1, N+2 in PS0 and LP3/1]

LP3/1 Performance
Cycle Time = $3\,t_{Eval} + t_{CD}$
Savings over PS0 (the “saved path”): 1 Precharge + 1 Completion Detection
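To check the two protocols against each other, here is a simplified max-plus timing recurrence (a sketch, not the authors' analysis) for a linear dual-rail dynamic pipeline. It encodes only the slide's control rules: both styles precharge N when N+1 is detected done; PS0 evaluates N when N+1 is detected precharged, LP3/1 when N+2 is detected done. Assumptions: an ideal source and sink, fixed hypothetical delays, and that LP3/1's timing constraints hold; with the placeholder values, precharge (2) is shorter than evaluation (3), so a stage's own precharge never becomes the bottleneck.

```python
NEG_INF = float("-inf")


def simulate(rule, n_stages=10, n_tokens=60, t_eval=3.0, t_prech=2.0, t_cd=1.0):
    """Return e_hist[i][k]: time at which stage i finishes evaluating token k.

    rule == "PS0":   stage N evaluates once N+1 is detected precharged.
    rule == "LP3/1": stage N evaluates once N+2 is detected done evaluating.
    Both rules: stage N precharges once N+1 is detected done evaluating.
    Indices >= n_stages model an ideal sink that consumes and resets instantly.
    """
    total = n_stages + 2                 # two sink slots cover the N+1 / N+2 lookups
    prev_d = [0.0] * total               # evaluation detected done, previous token
    prev_p = [0.0] * total               # precharge finished, previous token
    prev_dp = [0.0] * total              # precharge detected done, previous token
    e_hist = [[] for _ in range(n_stages)]
    for _ in range(n_tokens):
        e = [0.0] * total
        for i in range(total):
            if i >= n_stages:            # ideal sink: data arrives with no extra delay
                e[i] = e[i - 1]
                continue
            data_in = e[i - 1] if i > 0 else NEG_INF  # ideal source: input always valid
            control = prev_dp[i + 1] if rule == "PS0" else prev_d[i + 2]
            start = max(control, prev_p[i])           # own precharge must also be finished
            e[i] = max(start, data_in) + t_eval
            e_hist[i].append(e[i])
        d = [e[i] + t_cd if i < n_stages else e[i] for i in range(total)]
        p, dp = [0.0] * total, [0.0] * total
        for i in range(total):
            if i < n_stages:
                p[i] = d[i + 1] + t_prech  # precharge after successor is detected done
                dp[i] = p[i] + t_cd
            else:
                p[i] = dp[i] = d[i]        # ideal sink resets instantly
        prev_d, prev_p, prev_dp = d, p, dp
    return e_hist


def steady_period(rule, stage=4, window=10, **delays):
    """Average time between successive tokens at a mid-pipeline stage."""
    e = simulate(rule, **delays)[stage]
    return (e[-1] - e[-1 - window]) / window


if __name__ == "__main__":
    for rule in ("PS0", "LP3/1"):
        print(rule, steady_period(rule))
```

With the defaults the measured periods are 13 for PS0 and 10 for LP3/1, i.e. the difference is exactly one precharge plus one completion detection.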

LP3/1: Inside a Stage
Merging 2 control inputs: a NAND gate merges the 2 control inputs:
• Precharge when PC=1 (and Eval=0)
• Evaluate “early” when Eval=1 (or PC=0)
• Problem: the “early” Eval=1 is non-persistent!
  • it may be de-asserted before the stage completes evaluation!
[Figure: NAND merging PC (from stage N+1, “old Eval”) and Eval (from stage N+2, “early Eval”)]
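The two bullets define a single Boolean function: the stage precharges only for PC=1, Eval=0, and evaluates otherwise. As a sketch, that is NAND(PC, NOT Eval) if the gate's output is read as “evaluate”; the inversion on Eval is an assumption about signal polarity, since the slide does not show it.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    return 1 - (a & b)


def lp31_evaluate(pc: int, ev: int) -> int:
    """1 = stage evaluates, 0 = stage precharges.

    Assumed polarity: the NAND sees PC and the complement of Eval, so its output is
    high (evaluate) whenever Eval=1 or PC=0, and low (precharge) only for PC=1, Eval=0.
    """
    return nand(pc, 1 - ev)


if __name__ == "__main__":
    print("PC Eval -> action")
    for pc, ev in product((0, 1), repeat=2):
        print(pc, ev, "->", "evaluate" if lp31_evaluate(pc, ev) else "precharge")
```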

LP3/1 Timing Constraints: Example
Problem (cont.): the “early” Eval=1 is non-persistent
Observation: PC=0 follows soon after Eval=1, and is persistent
Solution: no change! Use PC as a safe “takeover” for Eval!
Timing Constraint: PC=0 must arrive before Eval is de-asserted
• simple one-sided timing requirement
• other constraints as well… all easily satisfied in practice
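A small waveform check makes the one-sided constraint concrete. This sketch assumes hypothetical integer times (say, in picoseconds): the early Eval pulse is high on [eval_rise, eval_fall), PC is high until pc_fall and stays low afterwards, and the stage evaluates whenever Eval=1 or PC=0. It scans for the first instant the stage would wrongly fall back into precharge.

```python
def merged_evaluate(t, eval_rise, eval_fall, pc_fall):
    """Evaluate when Eval=1 or PC=0 (the merge rule of the LP3/1 control gate)."""
    eval_high = eval_rise <= t < eval_fall   # non-persistent "early" Eval pulse from N+2
    pc_low = t >= pc_fall                    # PC from N+1 falls and then stays low
    return eval_high or pc_low


def precharge_glitch(eval_rise, eval_fall, pc_fall, horizon):
    """First time in [eval_rise, horizon) at which the stage would re-enter precharge, else None."""
    for t in range(eval_rise, horizon):
        if not merged_evaluate(t, eval_rise, eval_fall, pc_fall):
            return t
    return None


if __name__ == "__main__":
    print(precharge_glitch(0, 50, 30, 200))  # PC=0 arrives before Eval drops: safe takeover
    print(precharge_glitch(0, 50, 70, 200))  # PC=0 arrives too late: glitch when Eval drops
```

A glitch appears exactly when pc_fall > eval_fall, which is the constraint “PC=0 must arrive before Eval is de-asserted”.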

Dual-Rail Design #2: LP2/2
Optimization = “early done”
• Idea: move the completion detector before the processing block
• the stage indicates when it is “about to” precharge/evaluate
[Figure: “early done” completion detector placed ahead of the processing block, between Data in and Data out]

LP2/2 Protocol
Completion detection is performed in parallel with the evaluation/precharge of the stage
[Figure: stages N, N+1, N+2 with numbered events, including “early done” of N+1 eval, “early done” of N+1 prech, and “early done” of N+2 eval]

LP2/2 Performance
Cycle Time = $2\,t_{Eval} + 2\,t_{CD}$
LP2/2 savings over PS0: 1 Evaluation + 1 Precharge
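The savings quoted for LP3/1 and LP2/2 can be checked by subtracting them from the PS0 cycle. A minimal sketch with hypothetical delay values; the `Delays` type and the function names are illustrative, not from the original work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delays:
    t_eval: float   # evaluation delay
    t_prech: float  # precharge delay
    t_cd: float     # completion-detection delay


def ps0_cycle(d):
    return 3 * d.t_eval + 2 * d.t_cd + d.t_prech


def lp31_cycle(d):
    # LP3/1 saves one precharge and one completion detection relative to PS0
    return ps0_cycle(d) - d.t_prech - d.t_cd


def lp22_cycle(d):
    # LP2/2 saves one evaluation and one precharge relative to PS0
    return ps0_cycle(d) - d.t_eval - d.t_prech


if __name__ == "__main__":
    d = Delays(t_eval=3.0, t_prech=2.0, t_cd=1.0)  # hypothetical placeholder values
    for name, f in (("PS0", ps0_cycle), ("LP3/1", lp31_cycle), ("LP2/2", lp22_cycle)):
        t = f(d)
        print(f"{name}: cycle {t}, throughput vs PS0 x{ps0_cycle(d) / t:.2f}")
```

Comparing the two remainders, LP2/2 beats LP3/1 exactly when $t_{CD} < t_{Eval}$, since $2\,t_{Eval} + 2\,t_{CD} < 3\,t_{Eval} + t_{CD}$ reduces to that inequality.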

Dual-Rail Design #3: LP2/1
Hybrid of LP3/1 and LP2/2… Combines:
• early evaluation of LP3/1
• early done of LP2/2
Cycle time: the best of our dual-rail lookahead pipelines…
