
Asynchronous Datapath Design


lowri




Presentation Transcript


  1. Asynchronous Datapath Design • Adders • Comparators • Multipliers • Registers • Completion Detection • Bus • Pipeline • …..

  2. Asynchronous Adder Design • Motivation • Background: Sync and Async adders • Delay-insensitive carry-lookahead adders • Complexity Analysis • Conclusions

  3. Motivation • Integer addition is one of the most important operations in digital computer systems. • Statistics show that in a prototypical RISC machine (DLX), 72% of the instructions perform additions (or subtractions) in the datapath. • In ARM processors the figure reaches 80%. • Processor performance is therefore significantly influenced by the speed of the adder.

  4. Background • Adders: synchronous or asynchronous • Synchronous adders: worst-case performance • Asynchronous adders: average-case performance • For example: • Ripple-Carry Adders (synchronous): O(n) • Carry-Completion Sensing Adders (asynchronous): O(log n) on average

  5. Background: Binary Addition
      • Worst case (the carry ripples through every bit):
            00000001
          + 11111111
          ----------
          S 00000000
          C 11111111
          ----------
           100000000
      • Best case (no carries at all):
            00000000
          + 00000000
          ----------
          S 00000000
          C 00000000
          ----------
           000000000
      • Adders can therefore exhibit average-case behavior: the actual delay depends on how far the carries propagate for the given operands.
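The worst and best cases above differ only in how far the carry has to travel. As a rough illustration (a sketch, not taken from the slides), the hypothetical Python helper `longest_carry_chain` below measures that distance for two n-bit operands. It assumes a chain starts at a generate position (both bits 1) and continues through propagate positions (exactly one bit 1):

```python
def longest_carry_chain(a: int, b: int, n: int) -> int:
    """Length of the longest run of bit positions a carry travels through.

    A run starts at a generate position (a_i = b_i = 1) and continues through
    propagate positions (a_i != b_i). A kill (a_i = b_i = 0) ends it, and a
    new generate starts a fresh run.
    """
    longest = 0
    current = 0  # length of the chain currently carrying a 1 (0 if none)
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        if ai & bi:                   # generate: a new chain starts here
            current = 1
        elif (ai ^ bi) and current:   # propagate an incoming carry
            current += 1
        else:                         # kill, or propagate with no carry in
            current = 0
        longest = max(longest, current)
    return longest


print(longest_carry_chain(0b00000001, 0b11111111, 8))  # 8 (worst case)
print(longest_carry_chain(0b00000000, 0b00000000, 8))  # 0 (best case)
```

A synchronous adder must always budget for the full 8-position chain. An asynchronous adder's delay follows the value actually returned for its operands.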

  6. Background • Ripple-Carry Adders: • One-stage full adder: • Logic complexity: O(n) • Time complexity: O(n)

  7. Background • Carry-Sensing Completion Detection Adders: • (asynchronous version of RCA)

  8. Background • One-stage CSCD Adder: • Carry-Sensing Completion Detection Adders: • Logic complexity: O(n) • Time complexity: O(log n)
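The O(log n) time for completion-sensing adders is an average over random operands, not a worst case. The Python sketch below illustrates why under a simple unit-delay model, which is an assumption here and not the slides' circuit timing. In that model, a position whose operand bits are equal resolves its carry at once, and a propagate position resolves one step after the position below it. The helper names `completion_steps` and `average_steps` are hypothetical.

```python
import random


def completion_steps(a: int, b: int, n: int) -> int:
    """Unit-delay steps until the carry out of every position is resolved.

    Step 1: positions with a_i == b_i (kill or generate) resolve at once,
    and so does position 0, whose carry-in is 0. In every later step, a
    propagate position resolves once the position below it has resolved.
    """
    resolved = [False] * n
    steps = 0
    while not all(resolved):
        steps += 1
        snapshot = resolved[:]  # all positions update in parallel
        for i in range(n):
            if snapshot[i]:
                continue
            ai, bi = (a >> i) & 1, (b >> i) & 1
            if ai == bi or i == 0 or snapshot[i - 1]:
                resolved[i] = True
    return steps


def average_steps(n: int, trials: int = 2000, seed: int = 1) -> float:
    """Mean completion time over uniformly random n-bit operand pairs."""
    rng = random.Random(seed)
    total = sum(
        completion_steps(rng.getrandbits(n), rng.getrandbits(n), n)
        for _ in range(trials)
    )
    return total / trials


for n in (8, 16, 32, 64):
    print(n, round(average_steps(n), 2))
```

The worst case (00000001 + 11111111) still takes n steps. The printed averages, however, grow only slowly with n, because the longest run of propagate positions in random operands grows roughly like log2 n.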

  9. Background • Delay-Insensitive Ripple-Carry Adders: • (DI version of RCA):

  10. Background • One-stage DIRCA: • DIRCA Adders: • Logic complexity: O(n) • Time complexity: O(log n) • One of the most robust adders

  11. Background • Completion detection for asynchronous adders:

  12. Background • DI adder VS Bundling Constraint adder:

  13. Carry-Lookahead Adders • An RCA requires n stage-propagation delays, which is undesirable for high-speed processors. • One way to improve adder performance is to compute the carries in parallel. • That is why Carry-Lookahead Adders (CLAs) were introduced. • CLAs: • Logic complexity: O(n) • Time complexity: O(log n)

  14. Carry-Lookahead Adders

  15. Carry-Lookahead Adders • A module: • B module:

  16. DI Carry-Lookahead Adders • Delay-Insensitive Carry-Lookahead Adders (DICLA) can be implemented with delay-insensitive codes:
      • 1. Dual-rail signaling for the inputs, sums, and carry bits:
          a. No data: A1=0, A0=0
          b. Valid 0: A1=0, A0=1
          c. Valid 1: A1=1, A0=0
          d. Illegal: A1=1, A0=1
      • 2. One-hot code for the internal signals:
          a. No data: 000
          b. 001
          c. 010
          d. 100
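Both codes can be modeled in a few lines of Python. This sketch assumes the wire order (A1, A0) from the table, with A1 high meaning 1. The mapping of the three one-hot codewords to k, g, and p is an arbitrary choice made here for illustration, because the slides do not say which wire carries which signal. The hypothetical `is_valid` helper shows the idea behind completion detection: a code word is complete once exactly one of its wires is high.

```python
from typing import Optional, Tuple

NO_DATA = (0, 0)  # dual-rail spacer: no data yet


def dual_rail_encode(bit: int) -> Tuple[int, int]:
    """Encode a bit as (A1, A0): 0 -> (0, 1), 1 -> (1, 0)."""
    return (1, 0) if bit else (0, 1)


def dual_rail_decode(rails: Tuple[int, int]) -> Optional[int]:
    """Return the bit value, or None while the spacer (0, 0) is present."""
    if rails == NO_DATA:
        return None
    if rails == (1, 1):
        raise ValueError("illegal dual-rail code (1, 1)")
    return rails[0]  # A1 high means the value 1


# Illustrative assumption: which one-hot codeword stands for k, g, p.
ONE_HOT = {"k": (0, 0, 1), "g": (0, 1, 0), "p": (1, 0, 0)}


def one_hot_decode(wires: Tuple[int, int, int]) -> Optional[str]:
    """Map a 3-wire one-hot word to 'k', 'g' or 'p'; None means no data."""
    if wires == (0, 0, 0):
        return None
    for name, code in ONE_HOT.items():
        if wires == code:
            return name
    raise ValueError(f"illegal one-hot code {wires}")


def is_valid(word: Tuple[int, ...]) -> bool:
    """Completion check for either code: exactly one wire is high."""
    return sum(word) == 1
```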

  17. QDI Carry-Lookahead Adders • DI C module: 1. internal signals: one-hot code (k, g, p); 2. input and sum bits: dual-rail signals • CLA A module

  18. QDI Carry-Lookahead Adders • DI D module: 1. internal signals: one-hot code (K, G, P); 2. carry bits: dual-rail signals • CLA B module

  19. DI Carry-Lookahead Adders

  20. DI Carry-Lookahead Adders • k3, g3: if A3 = B3, then C3 is a carry kill (A3 = B3 = 0) or a carry generate (A3 = B3 = 1), known without waiting for the incoming carry.
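This observation can be sketched as a per-bit classification feeding a ripple loop. In the Python sketch below, the function names are hypothetical and positions are indexed from the least significant bit. `classify` returns kill or generate exactly when the two operand bits are equal, so the carry out of that position needs no carry in:

```python
from typing import List


def classify(a_bit: int, b_bit: int) -> str:
    """'k' (kill), 'g' (generate) or 'p' (propagate) for one bit position."""
    if a_bit == b_bit:
        return "g" if a_bit else "k"  # carry out is known without any carry in
    return "p"                        # carry out equals the carry in


def ripple_carries(a: int, b: int, n: int) -> List[int]:
    """Carry out of each position 0..n-1, using only the k/g/p classes."""
    carries: List[int] = []
    c = 0  # carry into bit 0
    for i in range(n):
        kgp = classify((a >> i) & 1, (b >> i) & 1)
        c = {"k": 0, "g": 1}.get(kgp, c)  # 'p' passes the carry through
        carries.append(c)
    return carries


a, b = 0b1101, 0b1001
print([classify((a >> i) & 1, (b >> i) & 1) for i in range(4)])  # ['g', 'k', 'p', 'g']
print(ripple_carries(a, b, 4))                                    # [1, 0, 0, 1]
```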

  21. DI Carry-Lookahead Adders • k3, g3; K3,2, G3,2 • The group signals G3,2 and K3,2 can also be used to speed up the carry computation.
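The group signals are built from the per-bit ones. Bits 3 and 2 together kill (generate) a carry if bit 3 kills (generates), or if bit 3 propagates and bit 2 kills (generates). Below is a Python sketch of that rule on one-hot (k, g, p) triples, with hypothetical helper names, checked exhaustively against a two-bit ripple:

```python
from itertools import product


def bit_kgp(a: int, b: int) -> tuple:
    """One-hot (k, g, p) wires for a single bit position."""
    return int(a == b == 0), int(a == b == 1), a ^ b


def group_kgp(hi: tuple, lo: tuple) -> tuple:
    """One-hot (K, G, P) for two adjacent positions, e.g. bits 3 and 2."""
    k_hi, g_hi, p_hi = hi
    k_lo, g_lo, p_lo = lo
    K = k_hi | (p_hi & k_lo)   # killed at bit 3, or propagated from a kill at bit 2
    G = g_hi | (p_hi & g_lo)   # generated at bit 3, or propagated from bit 2
    P = p_hi & p_lo            # both bits propagate the incoming carry
    return K, G, P


# Exhaustive check: c is the carry into bit 2.
for a3, b3, a2, b2, c in product((0, 1), repeat=5):
    K, G, P = group_kgp(bit_kgp(a3, b3), bit_kgp(a2, b2))
    carry_2 = (a2 & b2) | ((a2 ^ b2) & c)        # carry out of bit 2
    carry_3 = (a3 & b3) | ((a3 ^ b3) & carry_2)  # carry out of bit 3
    assert K + G + P == 1                        # still a one-hot code word
    assert carry_3 == (G | (P & c))              # K -> 0, G -> 1, P -> carry in
print("group rule agrees with the two-bit ripple")
```

Because the result is again a one-hot triple, the same rule can be applied to larger groups.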

  22. Speeding Up DICLA • Idea: send the carry-generate and carry-kill signals to every stage that needs this information, so that those stages can compute their carries immediately. • D module with speed-up circuitry

  23. Speeding Up DICLA • General form: • D module with speed-up circuitry • for carry-kill • for carry-generate: $g_{j-1} + g_{j-2}p_{j-1} + \cdots + g_0 p_1 p_2 \cdots p_{j-1}$ • This is in fact the full carry-lookahead scheme.
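The carry-generate expression can be checked numerically. The Python sketch below uses hypothetical names. It evaluates g_{j-1} + g_{j-2}p_{j-1} + … + g_0 p_1…p_{j-1} as one flat OR of AND terms, assuming g_i = a_i AND b_i, p_i = a_i XOR b_i, and a carry of 0 into bit 0. It then compares the result with the carry taken from ordinary integer addition. Term i needs j - i inputs and the OR needs j, which is the fan-in growth that makes the full scheme impractical for wide words.

```python
import random


def lookahead_carry(a: int, b: int, j: int) -> int:
    """Carry into bit j from the flat sum-of-products form."""
    def g(i: int) -> int:
        return (a >> i) & (b >> i) & 1        # generate: a_i AND b_i

    def p(i: int) -> int:
        return ((a >> i) ^ (b >> i)) & 1      # propagate: a_i XOR b_i

    carry = 0
    for i in range(j):                        # term whose generate is at bit i
        term = g(i)
        for m in range(i + 1, j):             # propagated through bits i+1..j-1
            term &= p(m)
        carry |= term
    return carry


def reference_carry(a: int, b: int, j: int) -> int:
    """Carry into bit j taken from ordinary addition of the low j bits."""
    mask = (1 << j) - 1
    return ((a & mask) + (b & mask)) >> j


rng = random.Random(0)
n = 16
for _ in range(1000):
    a, b = rng.getrandbits(n), rng.getrandbits(n)
    for j in range(n + 1):
        assert lookahead_carry(a, b, j) == reference_carry(a, b, j)
print("flat lookahead carries match integer addition")
```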

  24. Speeding Up DICLA • Problems of the full carry-lookahead scheme: • practical limits on fan-in and fan-out, • irregular structure and many long wires, • logic complexity that grows faster than linearly. • Solution: exploit the properties of a tree-like structure. • New speed-up circuitry:
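The tree-like alternative can be illustrated by reducing per-bit k/g/p signals pairwise, one level at a time. Every node has fan-in 2, and a signal for a whole n-bit word is ready after ceil(log2 n) levels. This Python sketch uses hypothetical names and shows only the plain binary-tree reduction. It does not model the speed-up circuitry itself.

```python
from typing import List, Tuple


def combine(upper: str, lower: str) -> str:
    """Two-input k/g/p operator: a propagating upper block passes the lower value."""
    return lower if upper == "p" else upper


def tree_group(signals_lsb_first: List[str]) -> Tuple[str, int]:
    """Reduce per-bit k/g/p signals with a balanced binary tree.

    Returns (group signal for the whole word, number of tree levels used).
    """
    level = list(signals_lsb_first)
    depth = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            # level[i + 1] is the more significant of the two blocks
            nxt.append(combine(level[i + 1], level[i]))
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired block moves up unchanged
        level = nxt
        depth += 1
    return level[0], depth


# 00000001 + 11111111: bit 0 generates, bits 1..7 propagate.
print(tree_group(["g"] + ["p"] * 7))  # ('g', 3)
```

A sequential ripple through the same eight signals would take seven combining steps instead of three levels.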

  25. SP focuses on the root node of a subtree. • All leftmost root nodes of its right subtree

  26. Power of Speed-up Circuitry • x: carry-chain length • x’: part of the chain in the right subtree • x - x’: part of the chain in the left subtree

  27. Power of Speed-up Circuitry Without Speed-up circuitry

  28. Power of Speed-up Circuitry With Speed-up circuitry

  29. Optimization: • Simplified D module • Simplified D’ module • Better logic complexity • Still delay-insensitive

  30. Complexity Analysis • DICLASP • Logic complexity: Θ(n) • Time complexity: Θ(log log n) • Best area-time efficiency: Θ(n log log n)

  31. Complexity Analysis

  32. CMOS: C module

  33. CMOS: SD module

  34. CMOS: SD’ module

  35. SPICE Simulation: • The SPICE simulation has two parts: • Random-number inputs: 10000 randomly generated input pairs • Statistical data: examples run on a 32-bit ARM emulator

  36. SPICE Simulation: • Random number input distribution

  37. SPICE Simulation: • SPICE simulation results: random number inputs • Speedup: DIRCA vs RCA: 6.39 • DICLASP vs CLA: 2.64

  38. SPICE Simulation: • Breakdown of addition/subtraction operations, obtained by running three benchmark programs (Dhrystone f1, Dhrystone f2 and Espresso dc2) on a 32-bit ARM simulator

  39. SPICE Simulation: dynamic traces

  40. SPICE Simulation: • Dynamic traces • 83.92% of instructions have a carry chain shorter than 17 bits (|carry chain| < 17)

  41. SPICE Simulation: • SPICE simulation results: dynamic traces • Average computation time: • DIRCA: 9.61 ns • DICLASP: 5.25 ns • Speedup: DIRCA vs RCA: 4.1 • DICLASP vs CLA: 2.2

  42. Conclusion • DICLASP • Best area-time efficiency: Θ(n log log n) • Correctness: no adder is more robust than DICLASP. • Cost (logic complexity): no parallel adder is cheaper than DICLASP (Θ(n)). • Speed (time complexity): no adder is faster than DICLASP (Θ(log log n)). • Suitable for VLSI implementation.
