High Rate Wave-pipelined Asynchronous On-chip Bit-serial Data Link

Technion – Israel Institute of Technology, Electrical Engineering Department, VLSI Lab. R. Dobkin, T. Liran, Y. Perelman, A. Kolodny, R. Ginosar. March 12, 2007.

Presentation Transcript


  1. Technion – Israel Institute of Technology, Electrical Engineering Department, VLSI Lab • High Rate Wave-pipelined Asynchronous On-chip Bit-serial Data Link • R. Dobkin, T. Liran, Y. Perelman, A. Kolodny, R. Ginosar • March 12, 2007

  2. Presentation Outline • Why Serial Link? • Fast Asynchronous Serial Link • Transmitter, Fast LEDR Encoder • Receiver, Fast Toggle Circuit • Channel, Current Mode Async Signaling • Performance • Summary

  3. Why Serial Link? • Less interconnect area • Less routing congestion • Less coupling • Less power (depends on range) • The relative improvement grows with technology scaling • The example in the figure compares a single-gate-delay serial link with a fully shielded parallel link that has an 8-gate-delay clock cycle, at equal bit rate and word width N=8 • [Figure: Serial Link Employment Benefits. Axes: Link Length [mm], Technology Node [nm]. Regions: serial link dissipates less power / parallel link dissipates less power; serial link requires less area / parallel link requires less area]

  4. Serial Link Applications • P2P long-range interconnect • Long range NoC links • Pin-limited on-chip module interfaces • Presently chips are pin-limited, and that will migrate inside • Cross-bar • Simpler routing and congestion • Communications inside many-core CMPs

  5. Serial Link – Top Structure • Transition signaling instead of sampling: two-phase NRZ Level Encoded Dual Rail (LEDR) asynchronous protocol, a.k.a. data-strobe (DS) • Acknowledge per word instead of per bit • Wave-pipelining over channel • Differential encoding (DS-DE, IEEE1355-95) • Low-latency synchronizers

  6. Encoding – Two Phase NRZ LEDR • Two Phase Non-Return-to-Zero Level Encoded Dual Rail • “Delta” encoding (one transition per bit) • [Figure: waveforms of the uncoded bit (B), phase bit (P), and state bit (S); bit labels 0 0 0 0 1 0 1 1 0 0]
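The one-transition-per-bit property can be illustrated with a small behavioral model. This is a minimal sketch, not the authors' circuit: it assumes the common data-strobe convention in which the state rail S carries the bit value and the phase rail P toggles whenever S does not change. The function names `ledr_encode` and `ledr_decode`, the initial rail values, and the example bit sequence are all illustrative assumptions.

```python
def ledr_encode(bits, p=0, s=0):
    """Encode bits as (phase, state) pairs; exactly one rail toggles per bit.

    Assumption: S carries the data value, P toggles when S stays the same.
    """
    symbols = []
    for b in bits:
        if b != s:
            s = b       # data changed: the state rail makes the transition
        else:
            p ^= 1      # data repeated: the phase rail makes the transition
        symbols.append((p, s))
    return symbols


def ledr_decode(symbols):
    """Under the assumed convention the data is simply the state rail."""
    return [s for _, s in symbols]


bits = [0, 0, 0, 0, 1, 0, 1, 1, 0, 0]   # arbitrary example sequence
symbols = ledr_encode(bits)
print(symbols)
print(ledr_decode(symbols) == bits)
```

Because exactly one rail changes per bit, the XOR of the two rails alternates on every bit, which is what lets a receiver detect bit arrival from transitions rather than by sampling with a clock.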

  7. Transmitter – Fast SR Approach • Targeted speed: one gate delay between bits • [Figure label: Transition Generator]

  8. Fast Asynchronous Shift Register

  9. Wave-pipelined Control Characteristics • The highest speed (the single gate-delay cycle) relates to the pole of the Bode diagram • This operating point results in signal degradation along the inverter chain • [Figure label: Single Gate Delay Rate]

  10. Splitter Architecture • The shift-register is partitioned into M shift-registers • Each shift-register operates M× slower • Signal is no longer degraded • Single gate-delay operation is localized to the output (input) stage only
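The partitioning idea can be sketched behaviorally. This is only an illustration under an assumption the slide does not state: that bits are distributed over the M shift-registers round-robin, so the output stage takes one bit from each register in turn. The names `split_word` and `merge_lanes` are hypothetical.

```python
def split_word(word_bits, m):
    """Distribute a word round-robin over m lanes.

    Each lane holds every m-th bit, so it only needs to shift once per
    m bit slots (m times slower than the serial output).
    """
    return [word_bits[i::m] for i in range(m)]


def merge_lanes(lanes):
    """Output stage: take one bit from each lane in turn (full-rate part)."""
    m = len(lanes)
    n = sum(len(lane) for lane in lanes)
    return [lanes[i % m][i // m] for i in range(n)]


word = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]   # 16-bit example
lanes = split_word(word, 4)
print(lanes)
print(merge_lanes(lanes) == word)
```

In this model only `merge_lanes` has to run at the full serial rate, which mirrors the slide's point that single gate-delay operation is confined to the output (or, on the receive side, input) stage.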

  11. Transmitter Splitter Architecture

  12. Transmitter – SPICE Simulation (65nm node) Simulations done at

  13. Receiver

  14. Receiver Splitter Architecture

  15. Toggle Circuit • Straightforward implementation (fundamental asynchronous state machine) is too slow (supports only ~1.5 gate delay cycle) • Novel toggle: • Single gate delay operation support • Internal and output latches

  16. Channel • Four transmission lines (DS-DE) • High metal layers utilization • Metals 5-8 of 65nm process • RLC modeled • Careful layout • Small crosstalk • Small relative variations

  17. LEDR Interconnect Layout • [Figure: wire arrangement with rails labeled P S P S P S]

  18. Differential Channel Driver and Receiver • Current mode differential low-swing signaling • Currents in opposite directions • Controllable current return path • [Figure labels: P / S, P / S]

  19. Channel Characteristic Impedance • Z depends on F • Voltage changes with F • Fast changes → voltage drifts • The drifts bound the operating speed • [Figure: Z versus F, based on data from BPTM, drawn for constant R, L, C]
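The frequency dependence of Z can be illustrated with the standard lossy transmission-line expression Z0 = sqrt((R + jωL) / (jωC)), with shunt conductance neglected. The sketch below is not taken from the presentation: the per-unit-length values of R, L, and C are hypothetical round numbers chosen only to show the trend, and the function name `z0` is an assumption.

```python
import cmath
import math

# Hypothetical per-unit-length line parameters (illustrative only).
R = 50e3      # ohm/m   (50 ohm/mm)
L = 0.5e-6    # H/m     (0.5 nH/mm)
C = 0.2e-9    # F/m     (0.2 pF/mm)


def z0(freq_hz, r=R, l=L, c=C):
    """Characteristic impedance of an RLC line at one frequency (G = 0)."""
    w = 2 * math.pi * freq_hz
    return cmath.sqrt((r + 1j * w * l) / (1j * w * c))


for f in (1e8, 1e9, 1e10, 1e11):
    print(f"{f:.0e} Hz: |Z0| = {abs(z0(f)):.1f} ohm")

# High-frequency limit, where the line looks like a lossless LC line.
print(f"sqrt(L/C) = {math.sqrt(L / C):.1f} ohm")
```

With constant R, L, and C the magnitude falls from an RC-dominated value at low frequency toward sqrt(L/C) at high frequency, so a driver sourcing a fixed current sees a different voltage swing depending on how quickly the data toggles.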

  20. Channel Driver with Adaptive Control • Compensates for Z changes • Turned on for low frequencies • [Figure labels: Adaptive Control, Inertial Delay]

  21. Adaptive Control – Simulation Example • SPICE simulation setup: • 65nm technology, 4mm range, 67Gbps data rate • RLC modeled channel (using Raphael-like three-dimensional field solver) • Adaptive control is turned on only for low frequencies

  22. Channel Receiver Amplifier

  23. Performance • SPICE simulations show correct operation at the target data cycle of 15ps (65nm technology node) • Power for a 67Gbps, 4mm, 16-bit word link under 100% utilization: • Total power: 150mW • Channel differential pair: 18mW • Leakage power: 4mW (due to the use of low-VT transistors) • Power reduction • Deeper split (M power reduction) • Circuit optimizations • Circuit shut down during idle states
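The quoted figures are mutually consistent, as a short calculation shows. The sketch below uses only numbers from this slide; the derived quantities (word time, energy per bit, channel share of power) are not stated in the presentation and follow from simple arithmetic, assuming the 150mW total applies at 100% utilization as the slide says.

```python
bit_time_s = 15e-12        # target data cycle: 15 ps
word_bits = 16             # 16-bit word
total_power_w = 150e-3     # total link power: 150 mW
channel_power_w = 18e-3    # channel differential pair: 18 mW

bit_rate_bps = 1 / bit_time_s                     # about 66.7 Gbps
word_time_s = word_bits * bit_time_s              # 240 ps per word
energy_per_bit_j = total_power_w * bit_time_s     # 2.25 pJ per bit
channel_share = channel_power_w / total_power_w   # 12% of total power

print(f"bit rate:       {bit_rate_bps / 1e9:.1f} Gbps")
print(f"word time:      {word_time_s * 1e12:.0f} ps")
print(f"energy per bit: {energy_per_bit_j * 1e12:.2f} pJ")
print(f"channel share:  {channel_share:.0%}")
```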

  24. In-Die Variations • Splitter architecture • High-speed operation localized to input and output stages • High-speed components design and verification • Monte-Carlo simulations (>5) • 26 PVT Corners • Iterative design with legging and sizing for sensitive transistors • Asynchronous structure • Supports any slow down • Minimal time separation between successive bits must be provided!

  25. Summary • A high-speed Serial Link requires special circuits: • Fast serializers and de-serializers • Wave-pipelined control • Splitter architecture: • Long word transmission • Power reduction • On-the-fly LEDR encoding • Adaptive control for handling fast asynchronous signals • Low-crosstalk interconnect layout • Support for a data cycle of a single FO4 inverter delay (15ps on 65nm process, 67 Gbps) • The Serial Link is preferred over the Parallel Link thanks to: • Reduced interconnect and active area • Easier routing, less coupling • Reduced power for long on-chip interconnects

  26. The End • Thank you