Wagging Logic: Moore's Law will eventually fix it. Charlie Brej, APT Group, University of Manchester. Introduction: Quasi-Delay-Insensitive (QDI) approach; prove the high performance potential. What is performance? Latency, throughput. Why is async better? Average case performance.
Wagging Logic: Moore's Law will eventually fix it
Charlie Brej
APT Group, University of Manchester
Group Talk
Introduction
• Quasi-Delay-Insensitive (QDI) approach
• Prove the high performance potential
• What is performance?
  • Latency
  • Throughput
• Why is async better?
  • Average case performance
  • Variability and data-dependent
  • Bit level pipelining
QDI Forward Safe Guarding
• Ensure all wire pairs are cycled up and down
(Figure: circuit diagram; label: C)
Behaviour
• Viewpoint of a single output
• Many inputs
Behaviour
• All or nothing
• Synchronises inputs together
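The all-or-nothing behaviour matches that of a Muller C-element, the component the later slides count delays for. The sketch below is a minimal behavioural model, not the author's simulator: the output rises only when every input is high, falls only when every input is low, and otherwise holds its previous value. The class name is hypothetical.

```python
class CElement:
    """Behavioural model of a Muller C-element."""

    def __init__(self, initial=0):
        self.output = initial

    def update(self, *inputs):
        if all(v == 1 for v in inputs):
            self.output = 1          # all inputs high: output rises
        elif all(v == 0 for v in inputs):
            self.output = 0          # all inputs low: output falls
        # mixed inputs: hold the previous output
        return self.output


c = CElement()
trace = [c.update(a, b) for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]
print(trace)  # [0, 0, 1, 1, 0]
```

The output changes only after the slowest input arrives, which is how the element synchronises its inputs.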
Why is it so slow?
• Delays:
  • Gate: 1, C-element: 2
  • Stage data propagation: X
• Cycle time (times 2 for set and reset):
  • Forward guarding: 2X
    • C-element for each gate
  • Acknowledge propagation: 2X
    • C-element for each fork (fork depth ~ gate depth)
• About eight times slower than worst case!
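One way to read the arithmetic on this slide is sketched below. It assumes that forward guarding replaces each gate delay with a C-element delay (2X in total), that acknowledge propagation costs another 2X, and that the set and reset phases cost the same. The function name is hypothetical and the model is only an interpretation of the bullet points.

```python
GATE_DELAY = 1
C_ELEMENT_DELAY = 2


def four_phase_cycle_time(gate_depth):
    """Return (cycle time, slowdown versus a plain stage of the same depth)."""
    plain_stage = gate_depth * GATE_DELAY           # X
    forward = gate_depth * C_ELEMENT_DELAY          # 2X: C-element per gate
    acknowledge = gate_depth * C_ELEMENT_DELAY      # 2X: C-element per fork
    cycle = 2 * (forward + acknowledge)             # times 2 for set and reset
    return cycle, cycle / plain_stage


print(four_phase_cycle_time(10))  # (80, 8.0)
```

Under these assumptions the ratio is 8 for any gate depth, which agrees with the "about eight times slower" figure.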
Why is four-phase so slow?
• Low latency
• Low throughput
• Only 1/8th of the system doing useful work
• Rest is resetting/completing
(Figure: row of stages labelled Workie, Sleepy ×7, Workie, Sleepy)
Solutions
• Ultra/Hyper/Super Pipelining
  • Need 8 times finer pipelining
  • Impossible
  • Each latch adds to the latency
• Faster completion detection
  • Balanced trees of C-elements
  • Arranging to suit arrival order
• Backward guarding
• Not even close to 8x improvement
Inspiration: Wagging Latches
• Alternate latch read/write
• Capacity of two latches
• Depth of one latch
Wagging Logic
• Apply same method to the logic
• Alternate logic allowing one to set while the other resets (precharges)
(Figure: two rows of stages labelled Reset/Set/Reset/Set and Set/Reset/Set/Reset)
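The alternation can be pictured as a round-robin schedule. The sketch below is a simplification with discrete steps and hypothetical names; the real circuit has no global step and the phases overlap asynchronously. For each incoming data token it shows which copy of the logic is setting while the others reset.

```python
def wagging_schedule(num_tokens, ways=2):
    """For each token, list the phase of every copy of the logic."""
    schedule = []
    for token in range(num_tokens):
        active = token % ways  # copy that evaluates this token
        phases = ["set" if copy == active else "reset" for copy in range(ways)]
        schedule.append((token, phases))
    return schedule


for token, phases in wagging_schedule(4):
    print(token, phases)
```

With two copies, even tokens go to the first copy and odd tokens to the second, so the reset of one copy is hidden behind the evaluation of the other.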
Wagging Logic
• Between wagging stages
  • No need to wag
  • No need to synchronize
• Wag only when communicating with non-wagging logic
Non-FIFO Example
Duplicate the Logic
Connect to Complementary
A Harder Example
Duplicate the Logic
Connect to Complementary
Triplicate the Logic
Connect to the next on the list
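The slide figures are not available, so the following is only one reading of "connect to the next on the list": a signal that fed back into the same logic before wagging is instead routed from each copy to its cyclic successor. With two copies this reduces to "connect to complementary". The function name is hypothetical.

```python
def wagged_connections(ways):
    """Map each copy of the logic to the copy that receives its output."""
    return {copy: (copy + 1) % ways for copy in range(ways)}


print(wagged_connections(2))  # {0: 1, 1: 0}
print(wagged_connections(3))  # {0: 1, 1: 2, 2: 0}
```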
Other example
Proof of the pudding
• Simple gate level simulation
  • My own simulator
  • Delays: C-element=2, Gate=1
• Example circuits
  • Fibonacci sequence generators
    • Vertically pipelined 64bit ripple carry adder
    • Non-pipelined 8bit ripple carry adder
  • 16 input XOR
  • Backward and Forward guarded
• Relative measurements of Speed, Power, Area
• 10,000 gate delays simulation
64bit Fibonacci Performance
(Chart; annotation: Synchronous Worst Case: 74)
8bit Fibonacci Performance
(Chart; annotation: Synchronous Worst Case: 500)
XOR Performance
(Chart; annotations: Synchronous Worst/Best Case: 1250 (8 gate delays); Inc. Timing margins; Inc. Flip-Flop: 1000 (10 gate delays))
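The chart annotations appear consistent with counting operations completed during the 10,000 gate delay simulation, although the slides do not state this. The sketch below checks that reading against the two annotations that give both a count and a cycle time; the function name is hypothetical.

```python
SIMULATION_LENGTH = 10_000  # gate delays, from the simulation setup slide


def operations_completed(cycle_gate_delays):
    """Operations finished in the simulation window at a fixed cycle time."""
    return SIMULATION_LENGTH // cycle_gate_delays


print(operations_completed(8))   # 1250
print(operations_completed(10))  # 1000
```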
Power Consumption
(Chart; annotation: Synchronous: 610)
Area
(Chart)
Future work
• Larger and more complex designs
  • Small CPU
• Layout
  • Silicon?
• Improve completion time
  • Current optimal wagging ~ 5
  • Target ~ 3
• Fully automated flow
  • Verilog Input & Output
  • Partitioning
Conclusions
• Matching and surpassing synchronous performance every time
• DI logic for performance
• Very Expensive
  • 20 times more power
  • 5 times bigger (times wagging)
• Fastest logic on the planet!
  • Discounting increase in wire delays
  • Assuming other things will be able to keep up