Technion – Israel Institute of Technology, Electrical Engineering Department – VLSI Lab. High Rate Wave-pipelined Asynchronous On-chip Bit-serial Data Link. R. Dobkin, T. Liran, Y. Perelman, A. Kolodny, R. Ginosar. March 12, 2007.
Presentation Outline • Why Serial Link? • Fast Asynchronous Serial Link • Transmitter, Fast LEDR Encoder • Receiver, Fast Toggle Circuit • Channel, Current Mode Async Signaling • Performance • Summary
Why Serial Link? • Less interconnect area • Less routing congestion • Less coupling • Less power (depends on range) • The relative improvement grows with technology scaling • The example on the right refers to: • Single-gate-delay serial link • Fully shielded parallel link with an 8-gate-delay clock cycle • Equal bit rate • Word width N = 8 [Figure: Serial Link Employment Benefits. Axes: link length [mm] and technology node [nm]. Regions mark where the serial link or the parallel link dissipates less power and requires less area.]
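The equal-bit-rate condition in the example can be checked with one line of arithmetic. The symbol t_g (one gate delay) and the rate names are introduced here for illustration only. The parallel link moves N = 8 bits per 8-gate-delay clock cycle, and the serial link moves one bit per gate delay:

```latex
\[
R_{\text{parallel}} = \frac{N}{8\,t_g} = \frac{8}{8\,t_g} = \frac{1}{t_g} = R_{\text{serial}}
\]
```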
Serial Link Applications • P2P long-range interconnect • Long-range NoC links • Pin-limited on-chip module interfaces • Chips are pin-limited today, and the same limitation will migrate on-chip • Cross-bar • Simpler routing and less congestion • Communications inside many-core CMPs
Serial Link – Top Structure • Transition signaling instead of sampling: two-phase NRZ Level Encoded Dual Rail (LEDR) asynchronous protocol, a.k.a. data-strobe (DS) • Acknowledge per word instead of per bit • Wave-pipelining over the channel • Differential encoding (DS-DE, IEEE 1355-1995) • Low-latency synchronizers
Encoding – Two-Phase NRZ LEDR • Two-Phase Non-Return-to-Zero Level Encoded Dual Rail • “delta” encoding (one transition per bit) [Figure: waveforms of the uncoded data (B), the phase bit (P), and the state bit (S); the bit labels in the figure read 0 0 0 0 1 0 1 1 0 0]
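The one-transition-per-bit property can be illustrated with a small behavioral sketch. The slide does not spell out the rail assignment, so the code assumes a data-strobe style mapping: the state rail S carries the bit value and the phase rail P toggles whenever S does not change. The function names and the initial rail values (0, 0) are hypothetical.

```python
def ledr_encode(bits, s0=0, p0=0):
    """Encode bits onto (S, P) rail levels, one rail transition per bit.

    Assumption (not stated on the slide): S carries the bit value and
    P toggles whenever S would otherwise stay unchanged.
    """
    s, p = s0, p0
    levels = []
    for b in bits:
        if b != s:
            s = b          # data changed: the S rail makes the transition
        else:
            p ^= 1         # data unchanged: the P rail makes the transition
        levels.append((s, p))
    return levels


def ledr_decode(levels):
    """Recover the bits; under the assumed mapping S is the data itself."""
    return [s for s, _ in levels]


def rail_transitions(levels, s0=0, p0=0):
    """Count how many rails changed for each transmitted bit."""
    counts = []
    prev = (s0, p0)
    for cur in levels:
        counts.append((cur[0] != prev[0]) + (cur[1] != prev[1]))
        prev = cur
    return counts


bits = [0, 0, 0, 0, 1, 0, 1, 1, 0, 0]
levels = ledr_encode(bits)
decoded = ledr_decode(levels)
transitions = rail_transitions(levels)
# S XOR P alternates on every bit, which is what lets a receiver
# detect each new bit without a separate clock.
phase = [s ^ p for s, p in levels]
```

Because exactly one rail changes per bit, the receiver sees a new event for every bit even when the data stays constant.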
Transmitter – Fast SR Approach • Targeted speed: one gate delay between bits [Figure label: Transition Generator]
Wave-pipelined Control Characteristics • The highest speed (the single-gate-delay cycle) corresponds to the pole of the Bode diagram • This operating point results in signal degradation along the inverter chain [Figure label: Single Gate Delay Rate]
Splitter Architecture • The shift-register is partitioned into M shift-registers • Each shift-register operates M times slower • The signal is no longer degraded • Single-gate-delay operation is localized to the output (input) stage only
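The partitioning idea can be sketched behaviorally. This is a minimal illustration, assuming round-robin interleaving of the M sub-registers (the slide does not state the ordering); `split_word` and `merge_lanes` are hypothetical names, and the model ignores all circuit-level timing.

```python
def split_word(bits, m):
    """Distribute a word over m sub-registers in round-robin order.

    Sub-register i holds bits i, i+m, i+2m, ... so it only has to
    shift once every m bit slots.
    """
    return [bits[i::m] for i in range(m)]


def merge_lanes(lanes):
    """Interleave the sub-registers back into one serial stream.

    Only this final stage has to run at the full bit rate.
    """
    m = len(lanes)
    total = sum(len(lane) for lane in lanes)
    return [lanes[i % m][i // m] for i in range(total)]


word = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]  # 16-bit example word
lanes = split_word(word, 4)
stream = merge_lanes(lanes)
```

With M = 4 and a 16-bit word, each sub-register holds four bits and advances once per four bit slots, while the merged stream reproduces the original bit order.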
Transmitter – SPICE Simulation (65nm node) Simulations done at
Toggle Circuit • A straightforward implementation (fundamental asynchronous state machine) is too slow (supports only a ~1.5-gate-delay cycle) • Novel toggle: • Supports single-gate-delay operation • Internal and output latches
Channel • Four transmission lines (DS-DE) • High metal layers utilization • Metals 5-8 of 65nm process • RLC modeled • Careful layout • Small crosstalk • Small relative variations
LEDR Interconnect Layout [Figure: interconnect layout with wires labeled P S P S P S]
Differential Channel Driver and Receiver • Current-mode differential low-swing signaling • Currents in opposite directions • Controllable current return path [Figure labels: P / S, P / S]
Channel Characteristic Impedance • Z depends on F • Voltage changes with F • Fast changes cause voltage drifts • The drifts bound the operating speed [Figure: Z versus F. Based on data from BPTM. Drawn for constant R, L, C]
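The frequency dependence of Z can be illustrated with the standard transmission-line expression Z0 = sqrt((R + jωL) / (jωC)) for a line with negligible shunt conductance. This is a sketch only: the per-unit-length R, L, C values below are hypothetical placeholders, not the BPTM data behind the slide's plot.

```python
import cmath
import math

# Hypothetical per-unit-length parameters (illustration only, not BPTM data).
R_PER_M = 50e3      # ohm/m  (50 ohm/mm)
L_PER_M = 0.5e-6    # H/m    (0.5 nH/mm)
C_PER_M = 0.2e-9    # F/m    (0.2 pF/mm)


def characteristic_impedance(freq_hz, r=R_PER_M, l=L_PER_M, c=C_PER_M):
    """Z0 of an RLC line with zero shunt conductance, at one frequency."""
    w = 2 * math.pi * freq_hz
    return cmath.sqrt((r + 1j * w * l) / (1j * w * c))


freqs_hz = [1e8, 1e9, 1e10, 1e11]
magnitudes = [abs(characteristic_impedance(f)) for f in freqs_hz]
high_freq_limit = math.sqrt(L_PER_M / C_PER_M)  # lossless limit sqrt(L/C)

for f, z in zip(freqs_hz, magnitudes):
    print(f"{f:.0e} Hz: |Z0| = {z:.1f} ohm")
```

At low frequency the resistive term dominates and |Z0| is large; as frequency rises |Z0| falls toward sqrt(L/C). A driver sized for one end of that range therefore produces a different voltage swing at the other end, which is the drift the slide refers to.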
Channel Driver with Adaptive Control • Compensates for Z changes • Turned on for low frequencies [Figure labels: Adaptive Control, Inertial Delay]
Adaptive Control – Simulation Example • SPICE simulation setup: • 65nm technology, 4mm range, 67Gbps data rate • RLC modeled channel (using Raphael-like three-dimensional field solver) • Adaptive control is turned on only for low frequencies
Performance • SPICE simulations show correct operation at the target data cycle of 15 ps (65 nm technology node) • Power for a 67 Gbps, 4 mm, 16-bit word link under 100% utilization: • Total power: 150 mW • Channel differential pair: 18 mW • Leakage power: 4 mW (due to the use of low-VT transistors) • Power reduction: • Deeper split (power reduction that depends on M) • Circuit optimizations • Circuit shut-down during idle states
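The headline figures are mutually consistent, which a few lines of arithmetic confirm. The sketch below uses only the numbers quoted on the slide; the derived energy per bit and the word serialization time are not stated in the presentation and are computed here for illustration.

```python
CYCLE_S = 15e-12         # data cycle from the slide: 15 ps
WORD_BITS = 16           # word width from the slide
TOTAL_POWER_W = 150e-3   # total power at 100% utilization: 150 mW

# One bit per data cycle gives the bit rate.
bit_rate_bps = 1.0 / CYCLE_S
bit_rate_gbps = bit_rate_bps / 1e9

# Derived quantities (not on the slide).
energy_per_bit_pj = TOTAL_POWER_W * CYCLE_S * 1e12
word_time_ps = WORD_BITS * CYCLE_S * 1e12  # serialization time only

print(f"bit rate        : {bit_rate_gbps:.1f} Gbps")
print(f"energy per bit  : {energy_per_bit_pj:.2f} pJ")
print(f"16-bit word time: {word_time_ps:.0f} ps")
```

A 15 ps cycle corresponds to about 66.7 Gbps, which the slides round to 67 Gbps. The word time covers serialization only and excludes the per-word acknowledge.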
In-Die Variations • Splitter architecture • High-speed operation localized to input and output stages • High-speed components design and verification • Monte-Carlo simulations (>5) • 26 PVT Corners • Iterative design with legging and sizing for sensitive transistors • Asynchronous structure • Supports any slow down • Minimal time separation between successive bits must be provided!
Summary • A high-speed Serial Link requires special circuits: • Fast serializers and de-serializers • Wave-pipelined control • Splitter architecture: • Long word transmission • Power reduction • On-the-fly LEDR encoding • Adaptive control for handling fast asynchronous signals • Low-crosstalk interconnect layout • Single FO4 inverter delay data cycle support (15 ps on a 65 nm process, 67 Gbps) • The Serial Link is preferred over the Parallel Link thanks to: • Reduced interconnect and active area • Easier routing, less coupling • Reduced power for long on-chip interconnects
The End • Thank you