Practical Design and Performance Evaluation of Completion Detection Circuits

Practical Design and Performance Evaluation of Completion Detection Circuits. Fu-Chiung Cheng Department of Computer Science Columbia University. Reading 4. Outline. Motivation Previous Work New Completion Detection Circuit Performance Evaluation Conclusion. Motivation.

Presentation Transcript


  1. Practical Design and Performance Evaluation of Completion Detection Circuits Fu-Chiung Cheng Department of Computer Science Columbia University Reading 4

  2. Outline • Motivation • Previous Work • New Completion Detection Circuit • Performance Evaluation • Conclusion

  3. Motivation • Circuits: synchronous or asynchronous. • Synchronization: • Sync: a global clock • Async: start and completion mechanisms

  4. Motivation • Potential advantages of async. design: • No clock skew problem • Low power consumption • Average-case performance • Modularity, composability and reusability • Easier technology migration • The promise of high performance is especially attractive.

  5. Motivation • High-performance async. design: • 1. Fast self-timed components with good average-case performance • 2. Fast completion detection circuits that detect the completion • [Figure: self-timed component with dual-rail inputs A, B and dual-rail outputs S for bits 0 to n-1; one OR gate per output bit generates Ack0 … Ackn-1, and a C-element combines them into DoneReset]

  7. Motivation • Fast self-timed components: • 1. Delay-insensitive carry-lookahead adders • 2. Delay-insensitive comparators

  8. Motivation • Fast completion detection circuits: • 1. Completion detection circuits (CDCs) are considered the major overhead. • 2. This paper addresses the design of fast completion detection circuits.

  9. Previous Work: • Self-timed components may use • 1. bundled-data protocol • 2. dual-rail signaling

  10. Previous Work: • CDCs for bundled-data components • 1. Delay elements (an inverter chain): delay > worst-case delay. • 2. Speculative completion [Nowick97]: performance depends on • A. number of matched delays and • B. associated abort detection network • 3. Current-sensing completion detection [Dean94, Grass96] • A. consumes substantial power • B. requires several gate delays

  11. Previous Work: • CDCs for dual-rail self-timed components • 1. General model: • A. n two-input ORs • B. one n-input C-element • 2. Operation: • A. computation cycle: DoneReset = 1 • B. reset cycle: DoneReset = 0 • [Figure: self-timed component with dual-rail inputs A, B and dual-rail outputs S for bits 0 to n-1; one OR gate per output bit generates Ack0 … Ackn-1, and an n-input C-element combines them into DoneReset]
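The general model on slide 11 can be sketched behaviourally. This is a minimal logic-level sketch, not the authors' circuit: the names `ack_signals`, `c_element` and `run_cycle` are hypothetical, each output bit is assumed to be a pair of rails that are both low when reset and have exactly one rail high when valid, and gate delays are ignored.

```python
from typing import List, Sequence, Tuple


def ack_signals(rails: Sequence[Tuple[int, int]]) -> List[int]:
    """One 2-input OR per output bit: Ack_i is high once either rail is high."""
    return [r1 | r0 for r1, r0 in rails]


def c_element(inputs: Sequence[int], prev: int) -> int:
    """n-input C-element: 1 when all inputs are 1, 0 when all are 0, else hold."""
    if all(inputs):
        return 1
    if not any(inputs):
        return 0
    return prev


def run_cycle(states: Sequence[Sequence[Tuple[int, int]]], done_reset: int = 0) -> List[int]:
    """Apply a sequence of dual-rail output states and record DoneReset after each."""
    trace = []
    for rails in states:
        done_reset = c_element(ack_signals(rails), done_reset)
        trace.append(done_reset)
    return trace


# Two output bits: reset, partial result, full result, partial reset, reset.
states = [
    [(0, 0), (0, 0)],  # all rails low (reset state)
    [(1, 0), (0, 0)],  # bit 0 valid, bit 1 still computing
    [(1, 0), (0, 1)],  # all bits valid: computation complete
    [(0, 0), (0, 1)],  # bit 0 reset, bit 1 still valid
    [(0, 0), (0, 0)],  # all rails low again: reset complete
]
print(run_cycle(states))  # [0, 0, 1, 1, 0]
```

DoneReset rises only after every bit is valid and falls only after every bit has returned to zero, which is the hold behaviour that makes the C-element suitable here.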

  12. Previous Work: • n-input C-element: a tree of 2-input C-elements • 1. long delay • 2. large variance • [Figure: tree of 2-input C-elements combining Ack0, Ack1, …, Ackn-2, Ackn-1]
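The long delay of the tree follows from its depth. The sketch below assumes a balanced binary tree, which the slide does not state explicitly; `c_tree_depth` and `c_tree_size` are hypothetical helper names.

```python
def c_tree_depth(n: int) -> int:
    """Levels of 2-input C-elements on the longest path, i.e. ceil(log2(n))."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1).bit_length()


def c_tree_size(n: int) -> int:
    """Number of 2-input C-elements needed to merge n inputs into one output."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n - 1


for n in (2, 8, 32, 64):
    print(n, c_tree_depth(n), c_tree_size(n))
```

For a 32-bit result this gives five C-element delays in series, on top of the per-bit OR gates.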

  13. Previous Work: • n-input C-element: • 1. More efficient implementation: DoneReset = done + reset · DoneReset • A. done circuit: an n-input AND, done = Ack0 · Ack1 · … · Ackn-1 • B. reset circuit: an n-input OR, reset = Ack0 + Ack1 + … + Ackn-1 • C. a 2-input C-element • 2. Delay and variance: better than the tree of 2-input C-elements • [Figure: n-input AND (done) and n-input OR (reset) of Ack0 … Ackn-1 feeding a 2-input C-element that outputs DoneReset]
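The equation DoneReset = done + reset · DoneReset can be checked against the definition of an n-input C-element by exhaustive enumeration. This is an illustrative sketch at the Boolean level only; the function names are hypothetical and timing is not modelled.

```python
from itertools import product
from typing import Sequence


def done_reset_next(acks: Sequence[int], prev: int) -> int:
    """Next DoneReset from the slide's equation: done + reset * DoneReset."""
    done = int(all(acks))   # n-input AND of the Ack signals
    reset = int(any(acks))  # n-input OR of the Ack signals
    return done | (reset & prev)


def c_element(acks: Sequence[int], prev: int) -> int:
    """Reference n-input C-element: set on all ones, clear on all zeros, else hold."""
    if all(acks):
        return 1
    if not any(acks):
        return 0
    return prev


def equivalent(n: int) -> bool:
    """Compare both definitions for every input pattern and previous output."""
    return all(
        done_reset_next(acks, prev) == c_element(acks, prev)
        for acks in product((0, 1), repeat=n)
        for prev in (0, 1)
    )


print(all(equivalent(n) for n in range(1, 9)))  # True
```

The check confirms that an AND gate, an OR gate and one state-holding stage reproduce the n-input C-element function without a multi-level tree.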

  14. Previous Work: • Wuu’s CDCs [Wuu93]: • A. done circuit: a tree of NANDs • B. reset circuit: a tree of NORs • C. long delay • D. small variance • E. uses static gates • [Figure: done and reset circuits]

  15. Previous Work: • Yun’s CDCs [Yun97]: • A. done circuit: a tree of domino logic • B. no reset circuit • C. variable delay • D. large variance • E. uses dynamic CMOS

  16. Our Design • Computation-completion detection circuit • (dynamic n-input NOR) • (static 2-input NOR)

  17. Our Design • Reset-completion detection circuit • (dynamic 2n-input OR)

  18. Our Design • Computation cycle: • For the done signal, • 1. the PMOS transistor (Acki) will be closed (conducting) and • 2. all NMOS transistors will be open (non-conducting). • 3. Thus, the done signal will be turned on.

  19. Our Design • Computation cycle: • For the reset signal, • the reset signal is turned on as soon as • any Acki signal goes high

  20. Our Design • Reset cycle: • For the done signal, • the done signal is turned off as soon as • any Acki signal is turned off

  21. Our Design • Reset cycle: • For the reset signal, • the reset signal is turned off only after all • Acki signals are turned off.
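The behaviour described on slides 18 to 21 can be traced at the logic level. This sketch abstracts away the dynamic gates entirely and assumes only what the slides state: done is high once all Acki are high and falls as soon as any goes low, while reset rises as soon as any Acki is high and falls only after all are low. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def done_signal(acks: Sequence[int]) -> int:
    """High only when every Ack is high; low as soon as any Ack is low."""
    return int(all(acks))


def reset_signal(acks: Sequence[int]) -> int:
    """High as soon as any Ack is high; low only when every Ack is low."""
    return int(any(acks))


def trace(sequence: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Return (done, reset) for each Ack pattern in the sequence."""
    return [(done_signal(a), reset_signal(a)) for a in sequence]


# Three Ack signals rising one by one (computation), then falling (reset).
ack_sequence = [
    (0, 0, 0),
    (1, 0, 0),
    (1, 1, 0),
    (1, 1, 1),
    (0, 1, 1),
    (0, 0, 1),
    (0, 0, 0),
]
for acks, (done, reset) in zip(ack_sequence, trace(ack_sequence)):
    print(acks, "done =", done, "reset =", reset)
```

In the trace, done and reset agree only at the two end points of each cycle (all Acks high, or all low) and disagree while the Acks are mixed.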

  22. Our Design • done + reset circuits • = dual-rail multi-input C-element • done + reset circuits + 2-input C-element • = single-rail multi-input C-element • Implementation of 2-input C-element:

  23. DIRCA With CDC: part 1

  24. DIRCA With CDC: part 2

  25. Our Design • The PMOS in the pull-up circuit of the done circuit saves power in non-operation mode. • In a quiescent state, all Acki signals are zero. • All pull-down transistors are closed (conducting). • To save power, the pull-up transistor is open (non-conducting) to cut off the path from Vdd to ground.

  26. Our Design • If the input low arrives too early, power is wasted. • If the input low arrives too late, it takes longer to turn on the done signal. • Low power consumption → latest Acki signal • High performance → any not-latest Acki signal

  27. SPICE Output: done circuit • Delay = 0.55 ns • ChengDone0: • 1. Ack0 is the latest signal. • 2. input pulses: 3 and 4 • 3. buffered input: 1004 • 4. Ack0: 100 • 5. Done: 24680 • 6. DoneReset: 200

  28. SPICE Output: done circuit • Delay = 0.22 ns • ChengDone1: • 1. Ack1 is the latest signal. • 2. input pulses: 5 and 6 • 3. buffered input: 1006 • 4. Ack1: 101 • 5. Done: 24680 • 6. DoneReset: 200

  29. SPICE Output: done circuit • Delay = 0.64 ns • ChengDone37: • 1. All Ack signals arrive at the same time. • 2. Done: 24680 • 3. DoneReset: 200

  30. SPICE Output: reset circuit • Delay = 1.23 ns • ChengReset0: • 1. Ack0 is the latest signal. • 2. input pulses: 3 and 4 • 3. buffered input: 1004 • 5. Reset: 13579 • 6. DoneReset: 200

  31. SPICE Output: reset circuit • Delay = 0.87 ns • ChengReset1: • 1. Ack0 is the latest signal. • 2. input pulses: 3 and 4 • 3. buffered input: 1004 • 5. Reset: 13579 • 6. DoneReset: 200

  32. SPICE Output: reset circuit • Delay = 1.34 ns • ChengReset37: • 1. All Ack signals reset at the same time. • 2. Done: 24680 • 3. DoneReset: 200

  33. Our Design • Constraint: the output must remain low while the pull-up transistor is conducting and only one pull-down transistor is conducting. • This can be achieved by properly sizing the transistors.

  34. Logic Complexity # of transistors

  35. Performance Evaluation • SPICE simulation: • 1. uses MOSIS 2-micron CMOS level-2 parameters • 2. W = 3 µm, L = 2 µm (buffer delay 0.4 ns, 2-input NOR delay 0.18 ns) • Computation-completion detection circuits: 38 typical cases (for Wuu, Yun and Cheng) • The measured delay includes the delay of the OR gate for Acki. • Reset-completion detection circuits: 38 typical cases (Wuu and Cheng)

  36. Performance Evaluation

  37. Performance Evaluation

  38. Conclusions • A new completion detection circuit for dual-rail self-timed components: • 1. very fast computation-completion detection • 2. very fast reset-completion detection • A low-overhead, very fast completion detection circuit is crucial for high-performance self-timed circuits.

  39. Conclusions • SPICE simulation results: • 1. our computation-completion detection circuit is 9 times faster than Wuu's and Yun's • 2. our reset-completion detection circuit is 4 times faster than Wuu's.
