Low-Power Chip-to-Chip I/O

John Poulton, 4/18/13

Outline Part I • Motivation • Why low-power chip-to-chip signaling is needed • Signaling fundamentals • Channels, termination, equalization • Clocking • Signaling budgets • Signaling methods, power tradeoffs Part II • Ground-referenced Signaling • An experimental low-power link

Scaling Trends • Gates x GHz: doubled every 16 months, now every 30 months • Signal pins x GHz: doubled every 28 months, now every 4 years [Figure: both trends on a log scale from 10 to 10^9, plotted against year, 1978 to 2008]

The Bandwidth Gap (5 years ago) • Rent’s Rule (IBM, 1960): $\text{Number of pins} = K \times \text{Gates}^{a}$ • K = 0.82, a = 0.45 for early microprocessors • Bandwidth gap: ~500 x and growing! [Figure: pins x GHz from Rent’s Rule compared with the actual trend, log scale from 10 to 10^9, 1978 to 2008]
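A minimal sketch of the Rent’s Rule estimate, using the K and a values quoted on the slide. The gate counts are illustrative assumptions, not data from the talk.

```python
def rent_pins(gates: float, k: float = 0.82, a: float = 0.45) -> float:
    """Pin count predicted by Rent's Rule: pins = K * gates**a."""
    return k * gates ** a

# Illustrative gate counts (assumed, not from the slides)
for gates in (1e4, 1e6, 1e8):
    print(f"{gates:.0e} gates -> about {rent_pins(gates):,.0f} pins")
```

Because a < 1, the predicted pin count grows far more slowly than the gate count.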

The Computer Architecture Crisis

Present Day Situation • Power is flat (150W, air cooled) • Frequency is flat (see power) • Performance per processor is flat • Transistor count still going up.. • ..but transistors much more expen$ive • Can’t run all of them at highest speed (see power) • Today’s challenges: • Parallel computing (programming) • Energy efficiency

By 2015.. …Will need 20X today’s off-chip bandwidth • Packaging improving only at 10%/year • Maybe 1,400 signal pins/chip • Power already at limits of air cooling • For processors and GPUs, will need: • 1 TB/sec, through 350 transceivers, @ <10W • 25 Gb/sec transceivers at <1 mW/Gb/sec (pJ/bit) • Portable electronics is a major market • Low active power important, but.. • Low standby power is critical (battery life).
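The per-link targets follow from the aggregate numbers by simple arithmetic. A quick check, using only the figures on the slide (the variable names are ours):

```python
target_bytes_per_s = 1e12   # 1 TB/sec off-chip bandwidth
transceivers = 350
power_budget_w = 10.0       # total I/O power budget

bits_per_s = target_bytes_per_s * 8
rate_per_link_gbps = bits_per_s / transceivers / 1e9
energy_pj_per_bit = power_budget_w / bits_per_s * 1e12   # 1 mW/(Gb/sec) == 1 pJ/bit

print(f"{rate_per_link_gbps:.1f} Gb/sec per transceiver")
print(f"{energy_pj_per_bit:.2f} pJ/bit at the full power budget")
```

The exact quotients are about 23 Gb/sec per link and 1.25 pJ/bit, so the slide’s 25 Gb/sec and <1 pJ/bit read as rounded targets with some headroom.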

Transistor Technology Changing.. • Pre-2005 (constant-field scaling): Gm/Gds large in saturation; good for “analog” (current-mode ckts) • Now (constant-voltage scaling): Gm/Gds small; bad for current-mode [Figures: Id versus Vds curves for each era, with linear and saturation regions marked and the saturation slope labeled 1/Gds]

In the (near) Future.. • FinFETs will offer • Slightly better Gm/Gds • Limited choice of L and W • Short-L and slightly-less-short L • W replaced by “N” • No long-channel devices at all.. • ..except maybe legacy high-voltage FETs • Current-mode analog problematic • Paradox: • limited ability to mold transistors to our needs, but circuit design increasingly important

Signaling Fundamentals

What’s a signaling system? • Transmitter: converts on-chip bits (..0100110..) into a signal (a continuous function of voltage or current in time) • Channel: physical structure (packages, connectors, cables, PC-board traces) that delivers the signal from transmitter to receiver • Receiver: converts the signal (degraded by the channel) back into bits (..0100110..)

Termination • Channels • Equalization • Clocking • Signaling Budgets • Signaling Styles

Termination • Unlike on-chip wires, which are RC transmission lines, chip-to-chip interconnect structures are LC T-lines • LC lines have a ‘characteristic impedance’ $Z_0 = \sqrt{L/C}$ that relates instantaneous voltage and current on the line [Figure: incremental line model with elements Lincr, Rincr, Gincr, Cincr]
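A quick numerical check of the characteristic-impedance formula. The per-meter inductance and capacitance below are assumed round numbers chosen to land on 50 Ω; they are not values from the talk.

```python
import math

def characteristic_impedance(l_per_m: float, c_per_m: float) -> float:
    """Z0 = sqrt(L/C) for a lossless LC transmission line."""
    return math.sqrt(l_per_m / c_per_m)

# Assumed example: 350 nH/m and 140 pF/m
z0 = characteristic_impedance(350e-9, 140e-12)
print(f"Z0 = {z0:.1f} ohms")
```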

Termination.. • Must terminate one, or both, ends of an LC line • Termination must be resistive at Z0 to prevent unwanted reflections (which look like noise at the receiver!). • Termination must be built into transmitters and receivers. [Figure: line of impedance Z0 terminated in R = Z0]

Termination.. • Z0 = 50Ω line, 50Ω termination, 0.3pF of shunt capacitance • At 20 Gb/sec (10 GHz), 0.3pF ~ 50Ω • This termination will generate a 20-30% reflection. • Shunt capacitance is bad! Also, unavoidable..
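A rough sketch of why shunt capacitance hurts, assuming the simplest possible model: an ideal resistor in parallel with an ideal capacitor terminating an ideal line. The model and function names are ours; real pads and packages are more complicated.

```python
import math

def cap_reactance(f_hz: float, c: float = 0.3e-12) -> float:
    """Magnitude of the capacitor's impedance, 1/(2*pi*f*C), in ohms."""
    return 1.0 / (2 * math.pi * f_hz * c)

def gamma_rc_termination(f_hz: float, r: float = 50.0,
                         c: float = 0.3e-12, z0: float = 50.0) -> complex:
    """Reflection coefficient of R in parallel with C at the end of a Z0 line."""
    y_load = 1 / r + 1j * 2 * math.pi * f_hz * c
    z_load = 1 / y_load
    return (z_load - z0) / (z_load + z0)

print(f"|Xc| at 10 GHz: {cap_reactance(10e9):.1f} ohms")
for f in (2.5e9, 5e9, 10e9):
    print(f"{f / 1e9:>4.1f} GHz: |gamma| = {abs(gamma_rc_termination(f)):.2f}")
```

In this idealized model the reflection grows with frequency: roughly 0.23 at 5 GHz and 0.43 at 10 GHz, so the size of the reflection seen in practice depends on where the signal’s energy sits.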

Channels [Figures: typical backplane interconnect, with package, PC boards, connector, and via labeled; typical chip-to-chip interconnect]

Frequency-dependent Loss • Conductors and dielectrics both lossy • Skin effect ~ F^(1/2) • Dielectric loss ~ F • For low-cost PC boards, dielectric loss dominates • In exotic materials, 2 effects are comparable [Figure: loss in 0.5m of typical FR-4 versus frequency, showing skin effect, dielectric loss, and overall loss]
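The two scaling laws can be sketched as a toy loss model. The coefficients below are made-up placeholders chosen only to show the shape of the curves; they are not measured FR-4 data.

```python
import math

def channel_loss_db(f_ghz: float, skin_coeff: float = 2.0,
                    dielectric_coeff: float = 1.0):
    """Return (skin, dielectric, total) loss in dB for hypothetical coefficients."""
    skin = skin_coeff * math.sqrt(f_ghz)    # grows as sqrt(f)
    dielectric = dielectric_coeff * f_ghz   # grows as f
    return skin, dielectric, skin + dielectric

for f in (1.0, 4.0, 16.0):
    skin, diel, total = channel_loss_db(f)
    print(f"{f:>5.1f} GHz: skin {skin:.1f} dB, dielectric {diel:.1f} dB, total {total:.1f} dB")
```

Whatever the coefficients, the linear term eventually overtakes the square-root term; with these placeholder values the crossover is at 4 GHz.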

Backplane vs Chip-to-chip Channels • Connectors and vias introduce reflections, ‘notches’, crosstalk • < 100 MHz: flat channel, no attn • 0.1 - 1 GHz: moderate attn, few dB/octave • > 1 GHz: strong attn, > 5 dB/octave • Chip-to-chip channels more like bare boards, and generally less than 20” [Figure: frequency response of a bare 20” FR4 trace compared with a 20” backplane channel]

Why is attenuation bad? • Frequency-dependent attenuation in channel produces inter-symbol interference (ISI) • Lone 1 in stream of 0’s undetectable • Equalization required when attenuation more than a few dB per octave [Figure: Vin and Vout waveforms]

Equalization • Filter can go at either end of the channel..

Transmitter Equalization ..or “Transmitter Pre-emphasis” Equivalent to a 2-tap FIR filter. Dally, Poulton, Tell, “Transmitter Equalization for 4 Gb/s Signalling,” Hot Interconnects, 1996
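A minimal sketch of a 2-tap FIR transmit equalizer acting on a bit stream. The tap weights are hypothetical, chosen so that |w0| + |w1| = 1, which keeps the peak drive level at the unequalized maximum.

```python
def tx_fir(bits, taps):
    """Drive levels for a 0/1 bit stream after an FIR transmit equalizer.

    Bits are mapped to -1/+1 symbols; out[n] = sum_k taps[k] * symbol[n-k].
    """
    symbols = [2 * b - 1 for b in bits]
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for k, w in enumerate(taps):
            if n - k >= 0:
                acc += w * symbols[n - k]
        out.append(acc)
    return out

taps = [0.75, -0.25]                       # hypothetical main and post-cursor weights
levels = tx_fir([0, 0, 1, 1, 1, 0], taps)
print(levels)
```

Transitions are driven at full swing (±1.0) while repeated bits drop to ±0.5: the equalizer works by attenuating low-frequency content rather than boosting high-frequency content.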

More General Tx Equalizer Multi-tap FIR Filter A flexible realization

Setting the tap weights.. • Measure response to unit impulse • Compute tap weights [Figure: transmitter equalizer output, original unequalized waveform, and equalized response]
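One common way to go from a measured response to tap weights is zero-forcing: pick the taps so the first few samples of the equalized response are [1, 0, 0, ...]. The sketch below assumes that approach and a made-up pulse response sampled once per unit interval; it is not necessarily the procedure used in the talk.

```python
def zero_forcing_taps(pulse, n_taps):
    """Taps that force the first n_taps samples of the equalized response to [1, 0, 0, ...]."""
    taps = []
    for k in range(n_taps):
        target = 1.0 if k == 0 else 0.0
        # ISI already contributed at sample k by the taps chosen so far
        isi = sum(pulse[j] * taps[k - j] for j in range(1, k + 1) if j < len(pulse))
        taps.append((target - isi) / pulse[0])
    return taps

def convolve(a, b):
    """Plain discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

pulse = [1.0, 0.5, 0.25]            # hypothetical measured response, one sample per UI
taps = zero_forcing_taps(pulse, 2)
equalized = convolve(taps, pulse)
print(taps, equalized)
```

With two taps the first two post-cursor samples of this example are cancelled, leaving a small residual further out in the tail.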

Are more taps worth the trouble..? • 4-tap reduces undershoot significantly compared with 2-tap • But.. rapidly diminishing return [Figure: equalized responses for 2-tap and 4-tap equalizers]

Tx EQ is effective.. [Figure: eye diagrams with equalization OFF and equalization ON] • 0.18µ I/O, 1999: 3.125Gb/s; 50cm FR-4 + conn’s; approx -8 dB atten; 2-tap equalizer; 2^10 PRBS pattern; 160 mV eye height; 150 psec P-P jitter • 0.13µ I/O, 2003: 6.25Gb/s; 73cm FR-4 + conn’s; approx -15 dB atten; 4-tap equalizer; 2^10 PRBS pattern; 100 mV eye height; 63 psec P-P jitter

Problems with Tx Equalization Problem #1: Can’t drive higher than “max”, only lower! High-attenuation channels require lots of low-frequency attenuation ..so, not much signal into the line.

Problems with Tx EQ.. Problem #2: • EQ requirements ‘known’ only at the receiver • Automatic EQ adjustment requires a ‘back channel’ [Figure: link with an EQ adjust path]

Rx equalizer also possible: • EQ at Rx: • Simplest possible transmitter • Straightforward ‘adaptation’ • No back-channel needed [Figure: link with EQ and EQ adjust at the receiver]

Receiver Equalizers • Source-degenerated amp is a popular implementation • About as effective as a 2-tap Tx EQ • Problems: • Amplifies crosstalk as well as signal • Power-hungry [Figure: source-degenerated differential amplifier (In+, In-, Out+, Out-, variable R) and its gain versus frequency for min. R and max. R, annotated with 3 dB/octave and 8-10 dB of ‘boost’]

Decision-Feedback Equalization • Biases receiver threshold in a way that depends on previous data • Depends on making a correct decision in slicer • Generates an error function that can be minimized by control system • Doesn’t amplify crosstalk! • But: area and power hog. • Generally used only in backplane transceivers, difficult channels. [Figure: slicer with decision feedback]
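A minimal sketch of a 1-tap DFE, assuming a channel with a single post-cursor ISI term of known size. Subtracting the predicted ISI from the sample is equivalent to shifting the slicer threshold by the same amount. The symbol sequence and the ISI value are made up for illustration.

```python
def slice_with_dfe(samples, post_cursor):
    """1-tap DFE: remove the ISI predicted from the previous decision, then slice at 0."""
    decisions, margins = [], []
    prev = 0                               # no prior decision before the first sample
    for y in samples:
        corrected = y - post_cursor * prev
        d = 1 if corrected >= 0 else -1
        decisions.append(d)
        margins.append(abs(corrected))     # distance from the slicer threshold
        prev = d
    return decisions, margins

symbols = [1, 1, -1, 1, -1, -1, 1]
h1 = 0.5                                   # hypothetical post-cursor ISI (fraction of main cursor)
received = [s + h1 * p for s, p in zip(symbols, [0] + symbols[:-1])]

decisions, margins = slice_with_dfe(received, h1)
print("margin without DFE:", min(abs(y) for y in received))
print("margin with DFE:   ", min(margins))
```

In this noise-free example the worst-case margin doubles, from 0.5 to 1.0. A wrong decision would feed the wrong correction forward, which is why the technique depends on the slicer deciding correctly.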

Clocking • Reference clock • Limited to < 200 MHz • Quartz crystal or ceramic resonator • Multiply ref clock up to bit rate • Typically in a Phase Locked Loop (PLL) • Large area, high power, complex design [Figure: clock source distributing ref clk to PLLs (8x – 50x), each producing a bit clk]

Typical SERDES Clocking.. • A single multiplying PLL • Distribute the bit clock • + Share the PLL, reducing area • - Distributing multi-GHz clock costs power [Figure: clock source sending ref clk to one PLL (8x – 50x), which distributes the bit clk]

Multiplying PLL’s • Conflicting design goals: • Low loop bandwidth to filter ref clk jitter • High loop bandwidth to filter on-chip noise • Increasing use of on-chip inductors • Power, jitter ~ 1/Q • Can get Q’s of 5-10 or so on-chip • Inductors are huge area hogs • So, plays well with shared PLL idea

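2:1 Multiplexing Tx and Rx • Half-bit-rate (2:1 mux’ing) most common SERDES architecture [Figure: a PLL (8x – 25x) generates the half-bit-rate clock; the Tx 2:1 mux serializes parallel tx data, one bit per UI, with a divider clocking the parallel side; the Rx samples data with the phase-adjusted sclk from the CDR, and a divider produces parallel rx data]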

Clock and Data Recovery (CDR) • Most common architecture: Alexander Phase Detector • Two quadrature clocks, edge and data • Sample bit stream on both edges of both clocks • We want the data clock to sample in the middle of each unit interval (UI), with the edge clock sampling at the data transitions • Only simple logic needed to figure out how to adjust clocks [Figure: data eye, 1 UI wide, with data clock and edge clock sample points]

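CDR.. [Figure: sample sequences E0 D0 E1 D1 E0’ D0’ E1’ D1’ shown for early clocks and for late clocks]
/* D0 and D1 are consecutive data samples; E1 is the edge sample between them */
if (D1 != D0) {                /* edge detected */
    if (E1 != D1) { early = TRUE;  late = FALSE; }
    else          { early = FALSE; late = TRUE;  }
} else {                       /* no transition: no timing information */
    early = FALSE; late = FALSE;
}
..and similarly for E0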

Digital CDR.. • Most common CDR architecture • Many other ways to do this.. [Figure: rx data sampled by the data clock (D0, D1, deserialized data) and by the edge clock (E0, E1); early/late logic feeds a digital filter, whose digital phase setting adjusts the phase of the half-bit-rate ref clock]
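A sketch of the early/late logic followed by a very simple digital filter. The up/down counter, its threshold, and the sign convention for the phase setting are all assumptions made for illustration; real filters are more elaborate.

```python
def early_late(d0, e1, d1):
    """Alexander phase-detector vote: -1 = clock early, +1 = clock late, 0 = no transition."""
    if d0 == d1:
        return 0                     # no data transition, so no timing information
    return -1 if e1 != d1 else +1    # edge sample still shows the old bit: clock is early

class PhaseAccumulator:
    """Up/down counter standing in for the digital filter (hypothetical design)."""

    def __init__(self, threshold=4):
        self.threshold = threshold
        self.count = 0
        self.phase = 0               # digital phase setting, in interpolator steps

    def update(self, vote):
        self.count += vote
        if self.count >= self.threshold:       # consistently late: sample earlier
            self.phase -= 1
            self.count = 0
        elif self.count <= -self.threshold:    # consistently early: sample later
            self.phase += 1
            self.count = 0
        return self.phase

acc = PhaseAccumulator(threshold=4)
for d0, e1, d1 in [(0, 0, 1), (1, 1, 0), (0, 0, 1), (1, 1, 0)]:   # four "early" transitions
    phase = acc.update(early_late(d0, e1, d1))
print("phase setting after four early votes:", phase)
```

Requiring several consistent votes before stepping the phase keeps the loop from chasing every individual noisy transition.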

Timing regimes • Mesochronous • Communicating chips get same reference clock • Receiver ‘knows’ the frequency, but not the phase • CDR must recover phase from the data stream • CDR is literally a phase-locked loop • Low-bandwidth “first-order” loop will do the job • Plesiochronous • Communicating chips get different clocks • Frequencies are close, but not exactly the same • CDR must recover frequency and phase • Requires higher bandwidth, “2nd-order” loop • Much more complex logic than Meso..
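To see why a plesiochronous CDR has to track frequency as well as phase, consider how quickly the phase drifts for a small frequency offset. The 100 ppm offset below is an assumed example value, not a number from the talk.

```python
def bits_per_ui_slip(ppm_offset: float) -> float:
    """Bit times needed for the phase to drift by one full UI at a given frequency offset."""
    return 1e6 / ppm_offset

ppm = 100.0                       # assumed offset between the two reference clocks
bit_rate = 6.25e9                 # bits/sec, for converting to time
bits = bits_per_ui_slip(ppm)
print(f"{ppm:.0f} ppm: one UI of drift every {bits:.0f} bits "
      f"({bits / bit_rate * 1e6:.2f} microseconds at {bit_rate / 1e9} Gb/s)")
```

In the mesochronous case the offset is zero, so the loop only has to follow slow phase wander.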

Signaling Budgets

How much signal… • ..do you need at the receiver to detect bits correctly with high probability? • Noise in on-chip logic: • Bounded sources, noise margins are high • Noise in chip-to-chip links: • Unbounded, random processes (“thermal” noise) • This is a good thing, believe it or not! • Want to drive signaling power down to the lowest possible value, but… • Nature (thermodynamics) will intervene eventually

Bit Error Rate (BER) • Any signaling system will fail; question is: “How often?” • Probability of making an error: $\mathrm{BER} < \exp\left(-\frac{\mathrm{VSNR}^2}{2}\right)$ ..is a good approximation • $\mathrm{VSNR} = V_M / V_R$ • VM: gross voltage margin after all bounded sources of noise are accounted for • VR: RMS sum of all random, unbounded noise sources • BER = 2e-15 (one error/day @ 6 Gb/s) => VSNR = 8.2 • BER = 5e-18 (one error/year @ 6 Gb/s) => VSNR = 8.9
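The two worked numbers on this slide can be reproduced from the exponential bound BER < exp(-VSNR²/2). The function names below are ours.

```python
import math

def ber_bound(vsnr: float) -> float:
    """Upper bound on bit error rate for a given voltage signal-to-noise ratio."""
    return math.exp(-vsnr ** 2 / 2)

def vsnr_for_ber(ber: float) -> float:
    """VSNR at which the bound equals the target BER."""
    return math.sqrt(-2 * math.log(ber))

def seconds_between_errors(ber: float, bit_rate: float) -> float:
    """Mean time between errors at a given bit rate."""
    return 1.0 / (ber * bit_rate)

for ber in (2e-15, 5e-18):
    days = seconds_between_errors(ber, 6e9) / 86400
    print(f"BER {ber:.0e}: VSNR = {vsnr_for_ber(ber):.2f}, "
          f"one error every {days:.1f} days at 6 Gb/s")
```

Going from one error per day to one error per year costs less than a 10% increase in VSNR, because the bound falls off so steeply.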

VSNR: Gross noise margin • $V_M = V_S (1 - K) - V_N$, where VS is the signal voltage • Proportional noise sources (K): • Cross-talk • Reflections • Un-equalized ISI • Correlated power supply noise • Timing noise (jitter) • Fixed noise sources (VN): • Receiver sensitivity • Receiver offset • Receiver reference offset • Transmitter offset • Uncorrelated power supply noise

VR: Unbounded Random Noise • Anything that dissipates power generates thermal noise • “White” noise: $V_T = \sqrt{4 k_B T R B}$ (RMS), where kB is the Boltzmann constant, T the temperature in deg K, R the resistance, and B the bandwidth • Ideal 50 Ω resistor, 10 GHz b/w, room temp: ~150 µVRMS [Figure: flat noise density versus frequency over bandwidth B]

VT in CMOS.. • For a resistance R driving a capacitance C, $V_T = \sqrt{k_B T / C}$ ..so doesn’t depend on R! • VT ~ 2mV for 1fF
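A short sketch evaluating the two thermal-noise expressions, the resistor form sqrt(4·kB·T·R·B) and the capacitor-limited form sqrt(kB·T/C). The 300 K temperature is an assumption standing in for "room temp".

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K

def resistor_noise_vrms(r_ohms: float, bandwidth_hz: float, temp_k: float = 300.0) -> float:
    """Thermal noise of a resistor over a given bandwidth."""
    return math.sqrt(4 * K_B * temp_k * r_ohms * bandwidth_hz)

def ktc_noise_vrms(c_farads: float, temp_k: float = 300.0) -> float:
    """Thermal noise on a capacitor charged through any resistance (kT/C noise)."""
    return math.sqrt(K_B * temp_k / c_farads)

print(f"50 ohm, 10 GHz: {resistor_noise_vrms(50.0, 10e9) * 1e6:.0f} uV RMS")
print(f"1 fF:           {ktc_noise_vrms(1e-15) * 1e3:.1f} mV RMS")
```

Under these assumptions the resistor case comes out at roughly 90 µV RMS and the 1 fF case at roughly 2 mV RMS. The resistance drops out of the second expression because the RC noise bandwidth, 1/(4RC), shrinks as R grows.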

Unbounded Random Noise.. Flicker noise: • Equal noise per decade • AKA “1/f noise” • Noise “knee” frequency: 1/f noise dominates below it, white noise above it • Voltages range from 10nV/decade – 1mV/decade • Knee frequency for typical FET is about 1 MHz [Figure: VF (RMS) versus frequency, showing the 1/f noise region, the white noise floor, and the ‘knee’]

Unbounded Random Noise.. Shot Noise: • Arises from quantization of charge • “White” noise source • Proportional to current • Important mainly in junctions • Associated with gate leakage, though! • $I_{RMS} = \sqrt{2 q I B}$, where q is the electron charge, I the current, and B the bandwidth

Unbounded Random Noise.. • In practice, always dominated by “white” noise • Always seems to be ~1mV RMS • VM needs to be about 10mV for reasonable BER • ..which in turn means voltage at receiver input terminals needs to be 30-50mV • Fundamental lower limit to signaling energy

Signaling Styles

Differential vs. Single-Ended • Differential (line pair terminated in 2Z0): Isig = +I/2 (“1”) or -I/2 (“0”); Vsig = Isig x 2Z0; P = 1/2 I^2 Z0; |Vrx| = Vsig = I Z0 • Single-ended (line terminated in Z0, Vref = Vsig/2): Isig = I (“1”) or 0 (“0”); Vsig = Isig x Z0; P = 1/2 I^2 Z0; |Vrx| = Vsig/2 = I Z0/2 • For same investment in power, differential signaling delivers twice the voltage to the receiver!
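The comparison can be checked numerically. The sketch below follows the slide’s expressions directly; the 2 mA drive current is an assumed example value.

```python
def differential(i: float, z0: float):
    """Return (power, |Vrx|) for differential signaling with +/- I/2 into 2*Z0."""
    i_sig = i / 2
    v_sig = i_sig * 2 * z0
    power = i_sig ** 2 * 2 * z0      # identical for a "1" and a "0"
    return power, abs(v_sig)

def single_ended(i: float, z0: float):
    """Return (average power, |Vrx|) for single-ended signaling with I or 0 into Z0."""
    v_sig = i * z0                   # level for a "1"; a "0" drives no current
    power = 0.5 * i ** 2 * z0        # averaged over equally likely 1s and 0s
    return power, v_sig / 2          # receiver compares against Vref = Vsig/2

i, z0 = 2e-3, 50.0                   # assumed: 2 mA into a 50 ohm system
p_diff, v_diff = differential(i, z0)
p_se, v_se = single_ended(i, z0)
print(f"differential: {p_diff * 1e3:.2f} mW, {v_diff * 1e3:.0f} mV at receiver")
print(f"single-ended: {p_se * 1e3:.2f} mW, {v_se * 1e3:.0f} mV at receiver")
```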

Differential vs. Single-Ended Noise
Proportional noise sources:
• Cross-talk: single-ended: large, signals strongly coupled (E/M) and by common Z; differential: smaller, partly cancelled, no common signal impedance
• Correlated power supply noise: single-ended: large, simultaneous switching noise; differential: very small, easily cancelled
Fixed noise sources:
• Receiver reference offset: single-ended: large; differential: zero, no reference needed
• Uncorrelated power supply noise: single-ended: difficult to cancel; differential: easier to cancel since mainly common-mode

Differential vs. Single-ended.. • Single-ended • Self-generates significant noise (SSN) • Uncertainty in receiver reference large source of fixed noise • Vulnerable to externally and internally coupled noise • Half the power efficiency of differential • Differential • Tx consumes constant current from PS, so little self-generated noise • No receiver reference needed • Most external sources of noise are common mode • Easy to achieve 20-30 dB of CMRR • Bottom line: single-ended not a good candidate for low-power signaling.. • ..or, is it?

“Current-Mode” vs “Voltage-Mode” • Current-mode: Vrx = Isig×Z0/2 • Voltage-mode: Vrx = Vsig/2 [Figures: current-mode driver (Isig) and voltage-mode driver (Vsig), each with Z0 terminations on a Z0 line]

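Current Mode [Figure: differential current-mode driver. The tail device, biased by Vbias, operates in saturation and sets the output current. The switch devices, driven by datP and datN, also operate in saturation (high-impedance, isolating the current source from the line) and drive lineP and lineN]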
