Clockless Logic
Recap: Lookahead Pipelines
High-Capacity Pipelines
Recap: Lookahead Pipeline Styles
Two strategies:
• Early Evaluation
• Early Done
Lookahead Pipelines: Strategy #1
Use non-neighbor communication:
• stage receives information from multiple later stages
• allows “early evaluation”
Benefit: stage gets a head start on the next cycle
Lookahead Pipelines: Strategy #2
Use early completion detection:
• completion detector moved before the stage (not after it)
• stage indicates “early done” in parallel with its computation
[Figure: stage with an early completion detector]
Benefit: again, stage gets a head start on the next cycle
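For reference, here is a minimal behavioral sketch of what a dual-rail completion detector computes. It assumes the usual (rail1, rail0) encoding and the common structure of one OR gate per bit feeding a C-element tree. The names `DualRailBit` and `completion_detector` are hypothetical, and this is not the detector circuit from the original designs.

```python
from typing import Sequence, Tuple

# Dual-rail bit as (rail1, rail0): (1, 0) = logic 1, (0, 1) = logic 0,
# (0, 0) = empty (spacer, e.g. after precharge).
DualRailBit = Tuple[int, int]


def completion_detector(bits: Sequence[DualRailBit], prev_done: int) -> int:
    """Behavioral model of an OR-per-bit + C-element-tree completion detector."""
    ors = [r1 | r0 for r1, r0 in bits]  # one OR gate per dual-rail bit
    if all(ors):          # every bit holds a valid value: "done" rises
        return 1
    if not any(ors):      # every bit has been reset to empty: "done" falls
        return 0
    return prev_done      # mixed state: the C-element tree holds its output
```

Following the slide, Strategy #2 would apply the same check to the stage's inputs rather than its outputs, so "done" can be reported while the stage is still computing.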
Single-Rail Styles
[Figure: single-rail stages (data bits 1 to n, bits 1 to m) with matched delays; request/done indicate valid data]
Adapt dual-rail styles to single-rail:
• replace dual-rail function blocks with single-rail blocks
• replace completion detectors with matched delays
Example: LPSR2/2
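Replacing a completion detector with a matched delay only works if the delay is at least as long as the slowest path through the single-rail block. A minimal sketch of that bundled-data check follows. The function name, the per-path delays, and the 10 ps safety margin are all hypothetical values chosen for illustration.

```python
from typing import Dict


def matched_delay_is_safe(block_path_delays_ps: Dict[str, float],
                          matched_delay_ps: float,
                          margin_ps: float = 10.0) -> bool:
    """Bundled-data timing check (hypothetical helper).

    The request/done signal passes through the matched delay, so it must not
    arrive before the slowest data path has settled, plus a safety margin.
    """
    worst_ps = max(block_path_delays_ps.values())
    return matched_delay_ps >= worst_ps + margin_ps


# Hypothetical path delays for a small single-rail block.
paths = {"bit 1": 120.0, "bit n": 150.0}
print(matched_delay_is_safe(paths, 170.0))  # delay covers 150 + 10 ps
print(matched_delay_is_safe(paths, 155.0))  # too short: data may not be valid yet
```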
Single-Rail Styles (contd.)
[Figure: single-rail pipeline with matched delays]
Example: LPSR2/1
High-Capacity Pipelines
Singh/Nowick, WVLSI-00, ISSCC-02, Async-02
Recent Approaches
Three novel styles for high-speed asynchronous pipelining:
• “Lookahead Pipelines” (LP) [Singh/Nowick, Async-00]
• “High-Capacity Pipelines” (HC) [Singh/Nowick, WVLSI-00]
• MOUSETRAP Pipelines [Singh/Nowick, TAU-00]
Goal: significantly improve the throughput of PS0
Two distinct strategies:
• LP: introduce protocol optimizations
  • “shave off” components from the critical cycle
• HC: a fundamentally new protocol
  • greater concurrency: “loosely coupled” stages
High-Capacity Pipeline: HC
[Figure: stages N, N+1, N+2, each with a stage controller (pc, eval), ack signals and delays]
Key idea: decouple control of the pull-up and pull-down
• increases pipeline concurrency: a stage initiates its next cycle early
• once N+1 evaluates, it can enter the “isolate” (hold) phase
• stage N is then allowed to complete its entire next cycle!
Inside an HC Stage
Decoupled control: the pull-up and pull-down stacks are independently controllable
[Figure: HC stage with “keeper”, precharge control (pc), pull-down stack with evaluation control (eval), data inputs and data outputs]
• pc asserted (active low, pc=0): precharge
• eval asserted: evaluate
• both de-asserted: enter the “isolate” (hold) phase
Cycle of an LPHC Stage
[Figure: phase cycles of stage N and stage N+1]
Evaluate (pc=1, eval=1) → Isolate (pc=1, eval=0) → Precharge (pc=0, eval=0) → Evaluate
• Only a single backward synchronization arc:
  • once stage N+1 has completed Evaluate, N can perform its entire next cycle!
  • why safe? N+1 enters the isolate phase, where it holds its output regardless of what N does next … key to greater concurrency
  • almost all existing approaches require two backward arcs
• One (natural) forward synchronization arc:
  • stage N+1 evaluates new data only after N has evaluated
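The control encoding above can be captured in a small lookup. This minimal sketch uses only the (pc, eval) values from the slide, with pc active low. The names `Phase`, `phase`, and `CYCLE` are hypothetical, and the combination pc=0, eval=1 is treated as invalid simply because it is not part of the cycle shown.

```python
from enum import Enum


class Phase(Enum):
    EVALUATE = "evaluate"
    ISOLATE = "isolate"
    PRECHARGE = "precharge"


# (pc, eval) -> phase, per the slide; pc is active low.
_PHASES = {
    (1, 1): Phase.EVALUATE,   # pull-up off, pull-down enabled
    (1, 0): Phase.ISOLATE,    # both stacks off: the keeper holds the output
    (0, 0): Phase.PRECHARGE,  # pull-up on, pull-down disabled
}

# One full cycle of a stage, in order.
CYCLE = [(1, 1), (1, 0), (0, 0)]


def phase(pc: int, eval_: int) -> Phase:
    """Decode the two control signals of an HC stage into its phase."""
    try:
        return _PHASES[(pc, eval_)]
    except KeyError:
        raise ValueError(f"pc={pc}, eval={eval_} is not part of the HC cycle")


print([phase(pc, ev).value for pc, ev in CYCLE])
```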
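Formal Specification of Controller
[Figure: signal transition graph of the stage N controller; S = completion signal of stage N, T = completion signal of stage N+1]
• pc+, eval+: start evaluate
• S+: evaluate complete
• T+: evaluate of N+1 complete
• eval-: isolate
• pc-: start precharge
• S-: precharge complete
• T-: precharge of N+1 complete
Problem: the specification is too concurrent for direct synthesis
• desired precharge condition: N and N+1 have evaluated the same data
• problem: this condition is not uniquely captured by the given signals!
• N may evaluate the next data item while N+1 is stuck on the current item; S and T are then both high, yet the two stages hold different items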
Modified Specification of Controller
[Figure: signal transition graph with events pc+, eval+, S+, T+, eval-, T-, pc-, S-, plus the new events ok2pc+ and ok2pc-]
Solution: add a state variable, ok2pc
• ok2pc records whether N+1 has “absorbed” N’s data item
• ok2pc resets immediately when N deletes its item (N precharges)
• ok2pc is set when N+1 deletes the item (N+1 precharges)
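Controller Implementation
[Figure: controller gates: pc from a NAND3 (inputs S, T, ok2pc), ok2pc from an asymmetric C-element (aC, “+” input), eval from an inverter on S]
Controller implementation is very simple:
• each signal is implemented using a single gate
• ok2pc is typically off the critical path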
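The following is a behavioral sketch of the controller, reconstructed from the figure labels and the ok2pc description. It is an assumption, not the published netlist. The sketch takes pc = NAND3(S, T, ok2pc) and eval = INV(S), and models ok2pc as an asymmetric C-element that rises when S=1 and T=0 (N holds an item and N+1 has precharged) and falls as soon as S=0 (N has precharged). The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class HCController:
    """Behavioral sketch of one HC stage controller (reconstruction).

    S: completion signal of stage N; T: completion signal of stage N+1.
    Returns (pc, eval); pc is active low.
    """
    ok2pc: int = 0

    def update(self, S: int, T: int) -> Tuple[int, int]:
        # Assumed asymmetric C-element: set needs S=1 and T=0, reset needs only S=0.
        if S and not T:
            self.ok2pc = 1
        elif not S:
            self.ok2pc = 0
        # otherwise ok2pc holds its previous value
        pc = 0 if (S and T and self.ok2pc) else 1  # NAND3(S, T, ok2pc)
        ev = 0 if S else 1                         # INV(S)
        return pc, ev


# Trace of (S, T): N evaluates, N+1 evaluates, N precharges, N evaluates the
# next item while N+1 still holds the old one, then N+1 precharges and evaluates.
ctrl = HCController()
for S, T in [(0, 0), (1, 0), (1, 1), (0, 1), (1, 1), (1, 0), (1, 1)]:
    print((S, T), ctrl.update(S, T), "ok2pc =", ctrl.ok2pc)
```

The fifth step shows the point of ok2pc. S and T are both high, but ok2pc is still 0, so stage N isolates instead of precharging away an item that N+1 has not yet absorbed.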
Performance
[Figure: cycle-time analysis over stages N, N+1, N+2; events: N evaluates, N+1 evaluates, N isolates, N precharges, N enables itself for next evaluation]
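Ripple-Carry Adder: One Stage
[Figure: full-adder stage with dual-rail inputs A (a1, a0), B (b1, b0) and carry-in (cin1, cin0), requests reqab and reqc, dual-rail carry-out (cout1, cout0), single-rail sum, and a done output]
Mixed dual-rail/single-rail datapath:
• single-rail: sum
• dual-rail: A, B, carry-in and carry-out
• must implement binate functions using unate dynamic logic; e.g. the sum (an XOR) is built from both rails of each dual-rail input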
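This sketch shows the last bullet in concrete terms. Dynamic (domino) gates are non-inverting, so each output is written as an AND-OR expression over the rails, with no inversions. The example assumes the (rail1, rail0) encoding and the standard full-adder equations. It ignores the reqab/reqc/done handshake, and the function name is hypothetical.

```python
from typing import Tuple

DualRail = Tuple[int, int]  # (rail1, rail0): (1, 0) = 1, (0, 1) = 0, (0, 0) = empty


def full_adder_stage(a: DualRail, b: DualRail,
                     c: DualRail) -> Tuple[DualRail, int]:
    """Unate (inversion-free) full adder: dual-rail carry-out, single-rail sum."""
    a1, a0 = a
    b1, b0 = b
    c1, c0 = c
    # Carry-out: majority function on the 1-rails, and on the 0-rails.
    cout1 = (a1 & b1) | (a1 & c1) | (b1 & c1)
    cout0 = (a0 & b0) | (a0 & c0) | (b0 & c0)
    # Sum = a XOR b XOR c is binate; with both rails available it becomes a
    # unate sum of the four odd-parity minterms.
    s = (a1 & b0 & c0) | (a0 & b1 & c0) | (a0 & b0 & c1) | (a1 & b1 & c1)
    # A single-rail 0 looks the same as "not yet evaluated", so (assumption)
    # validity of the sum must be signaled separately, e.g. by the done output.
    return (cout1, cout0), s


print(full_adder_stage((1, 0), (1, 0), (0, 1)))  # 1 + 1 + 0
```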
Final Adder Architecture
[Figure: chain of adder stages from least significant to most significant, each passing carryout to the next stage's carryin; shift-registers provide operand bits A, B; shift-registers accumulate sum bits]
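This is a cycle-level sketch of how such an architecture can keep every stage busy at once. The sketch makes several assumptions. Stage i handles bit position i. Operation k reaches stage i at step k + i, and this skewed timing stands in for the operand and sum shift-registers. The carry into the least significant stage is taken as 0. The function name is hypothetical.

```python
from typing import List, Tuple


def pipelined_ripple_add(pairs: List[Tuple[int, int]],
                         n_bits: int = 32) -> List[int]:
    """Bit-level pipelined ripple-carry adder (sketch); results are mod 2**n_bits."""
    n_ops = len(pairs)
    carries = [0] * n_ops  # carry currently rippling for each in-flight operation
    sums = [0] * n_ops     # stands in for the shift-registers accumulating sum bits
    for step in range(n_ops + n_bits - 1):
        # All stages work concurrently, each on a different operation.
        for i in range(n_bits):
            k = step - i
            if 0 <= k < n_ops:
                a_bit = (pairs[k][0] >> i) & 1  # operand bit i from the input registers
                b_bit = (pairs[k][1] >> i) & 1
                c = carries[k]
                sums[k] |= (a_bit ^ b_bit ^ c) << i
                carries[k] = (a_bit & b_bit) | (a_bit & c) | (b_bit & c)
    return sums


print(pipelined_ripple_add([(1, 2), (0xFFFFFFFF, 1), (123456789, 987654321)]))
```

Once the pipeline fills, a new addition can enter every step, even though each individual addition still takes n_bits steps to ripple through all stages.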
Results
Designed/simulated the adder in each pipeline style
Experimental setup:
• design: 32-bit ripple-carry adder
• technology: 0.6 µm HP CMOS, at 3.3 V and 300 K
New LPHC style: 10% faster than LPSR2/1
Conclusions
Introduced two new asynchronous adders:
• Use novel pipeline protocols:
  • observe events from multiple later stages
  • decouple control of pull-up/pull-down
• Especially suitable for fine-grain (gate-level) pipelining
• Very high throughputs obtained:
  • 0.93-1.02 GHz in 0.6 µm
  • expected to outperform the best (IPCMOS: 3.3-4.5 GHz in 0.18 µm)
• LPHC doubles the typical storage capacity
• Robustly handle arbitrary-speed environments
  • useful as IP blocks
Future work: layout/fabrication, application to DSPs