
Low-Power Chip-to-Chip I/O

Low-Power Chip-to-Chip I/O. John Poulton 4/18/13. Outline. Part I Motivation Why low-power chip-to-chip signaling is needed Signaling fundamentals Channels, termination, equalization Clocking Signaling budgets Signaling methods, power tradeoffs Part II Ground-referenced Signaling


Presentation Transcript


  1. Low-Power Chip-to-Chip I/O John Poulton 4/18/13

  2. Outline Part I • Motivation • Why low-power chip-to-chip signaling is needed • Signaling fundamentals • Channels, termination, equalization • Clocking • Signaling budgets • Signaling methods, power tradeoffs Part II • Ground-referenced Signaling • An experimental low-power link

  3. Scaling Trends • Gates x GHz: doubled every 16 months, now every 30 months • Signal pins x GHz: doubled every 28 months, now every 4 years [Figure: log-scale plot of both trends, 10 to 10^9, versus year, 1978 to 2008]

  4. The Bandwidth Gap (5 years ago) • Rent’s Rule (IBM, 1960): Number of pins = $K \times \mathrm{Gates}^{a}$, with K = 0.82, a = 0.45 for early microprocessors • Bandwidth gap: ~500x and growing! [Figure: pins x GHz predicted by Rent’s Rule plotted against the scaling trends, log scale 10 to 10^9, 1978 to 2008]
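The Rent's Rule estimate on this slide can be sketched numerically. This is only an illustration: the gate counts in the loop are hypothetical, and K and a are the early-microprocessor values quoted on the slide.

```python
# Rent's Rule as quoted on the slide: pins = K * gates**a.
K = 0.82  # constant quoted for early microprocessors
A = 0.45  # exponent quoted for early microprocessors


def rent_pins(gates: float, k: float = K, a: float = A) -> float:
    """Estimate the signal pin count for a chip with `gates` gates."""
    return k * gates ** a


if __name__ == "__main__":
    for gates in (1e4, 1e6, 1e8):  # hypothetical gate counts
        print(f"{gates:.0e} gates -> about {rent_pins(gates):.0f} pins")
```

Because a is less than 1, pin demand grows much more slowly than gate count, yet the available pin bandwidth has grown more slowly still, which is the gap the slide points at.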

  5. The Computer Architecture Crisis

  6. Present Day Situation • Power is flat (150W, air cooled) • Frequency is flat (see power) • Performance per processor is flat • Transistor count still going up.. • ..but transistors much more expen$ive • Can’t run all of them at highest speed (see power) • Today’s challenges: • Parallel computing (programming) • Energy efficiency

  7. By 2015.. …Will need 20X today’s off-chip bandwidth • Packaging improving only at 10%/year • Maybe 1,400 signal pins/chip • Power already at limits of air cooling • For processors and GPU’s, will need: • 1 TB/sec, through 350 transceivers, @ <10W • 25 Gb/sec transceivers at <1 mW/Gb/sec (pJ/bit) • Portable electronics is major market • Low active power important, but.. • Low standby power is critical (battery life).
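The bandwidth and energy targets on this slide follow from simple arithmetic, sketched below. It assumes 1 TB/sec means 10^12 bytes per second and that the load is spread evenly across the transceivers; the function name is illustrative.

```python
def link_budget(total_bytes_per_s: float, n_transceivers: int, power_w: float):
    """Return (per-transceiver rate in Gb/s, energy in pJ/bit)."""
    total_bits_per_s = total_bytes_per_s * 8
    per_lane_gbps = total_bits_per_s / n_transceivers / 1e9
    energy_pj_per_bit = power_w / total_bits_per_s * 1e12
    return per_lane_gbps, energy_pj_per_bit


if __name__ == "__main__":
    lane, energy = link_budget(1e12, 350, 10.0)
    print(f"{lane:.1f} Gb/s per transceiver, {energy:.2f} pJ/bit at 10 W")
```

Under these assumptions each transceiver carries a little under 23 Gb/s, which is why 25 Gb/sec parts are called for, and a 10 W ceiling allows 1.25 pJ/bit, so the stated goal of under 1 mW/Gb/sec leaves some headroom.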

  8. Transistor Technology Changing.. • Pre-2005 (constant-field scaling): Gm/Gds large in saturation; good for “analog” (current-mode ckts) • Now (constant-voltage scaling): Gm/Gds small; bad for current-mode [Figure: Id vs. Vds curves for both cases, showing the linear and saturation regions, annotated with Gm and 1/Gds]

  9. In the (near) Future.. • FinFETs will offer • Slightly better Gm/Gds • Limited choice of L and W • Short-L and slightly-less-short L • W replaced by “N” • No long-channel devices at all.. • ..except maybe legacy high-voltage FETs • Current-mode analog problematic • Paradox: • limited ability to mold transistors to our needs, but circuit design increasingly important

  10. Signaling Fundamentals

  11. What’s a signaling system? • Transmitter: converts on-chip bits into a signal (a continuous function of voltage or current in time) • Channel: physical structure (packages, connectors, cables, PC-board traces) that delivers the signal from transmitter to receiver • Receiver: converts the signal (degraded by the channel) back into bits [Figure: bit stream ..0100110.. entering the transmitter and emerging from the receiver]

  12. Termination (Sections: Termination, Channels, Equalization, Clocking, Signaling Budgets, Signaling Styles) • Unlike on-chip wires, which are RC transmission lines, chip-to-chip interconnect structures are LC T-lines • LC lines have a ‘characteristic impedance’ $Z_0 = (L/C)^{1/2}$ that relates instantaneous voltage and current on the line [Figure: incremental line model with elements Lincr, Rincr, Gincr, Cincr]

  13. Termination.. • Must terminate one, or both, ends of an LC line • Termination must be resistive at Z0 to prevent unwanted reflections (which look like noise at the receiver!). • Termination must be built into transmitters and receivers. [Figure: line of impedance Z0 terminated in R = Z0]

  14. Termination.. • At 20 Gb/sec (10 GHz), 0.3 pF ~ 50 Ω • This termination will generate a 20-30% reflection. • Shunt capacitance is bad! Also, unavoidable.. [Figure: Z0 = 50 Ω line terminated in 50 Ω with 0.3 pF of shunt capacitance]
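A quick numerical check of the "0.3 pF is about 50 ohms at 10 GHz" statement, plus a sketch of how the reflection from a capacitively loaded termination could be evaluated. The load model (ideal resistor in parallel with an ideal capacitor) and the function names are assumptions for illustration; the reflected fraction depends on which frequency you evaluate it at.

```python
import math


def cap_reactance(freq_hz: float, cap_f: float) -> float:
    """Magnitude of a capacitor's reactance, 1 / (2*pi*f*C), in ohms."""
    return 1.0 / (2 * math.pi * freq_hz * cap_f)


def reflection_magnitude(freq_hz: float, z0: float = 50.0,
                         r_term: float = 50.0, c_shunt: float = 0.3e-12) -> float:
    """|Gamma| for a termination resistor in parallel with a shunt capacitor."""
    y_load = 1.0 / r_term + 1j * 2 * math.pi * freq_hz * c_shunt
    z_load = 1.0 / y_load
    gamma = (z_load - z0) / (z_load + z0)
    return abs(gamma)


if __name__ == "__main__":
    print(f"Xc at 10 GHz: {cap_reactance(10e9, 0.3e-12):.1f} ohms")
    for f in (1e9, 2e9, 5e9, 10e9):
        print(f"{f / 1e9:.0f} GHz: |Gamma| = {reflection_magnitude(f):.2f}")
```

With no capacitance the matched resistor reflects nothing; the capacitor pulls the load impedance down as frequency rises, so the reflection grows with frequency.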

  15. Channels [Figures: typical backplane interconnect (package, PC boards, connector, via) and typical chip-to-chip interconnect]

  16. Frequency-dependent Loss • Conductors and dielectrics both lossy • Skin effect ~ $f^{1/2}$ • Dielectric loss ~ $f$ • For low-cost PC boards, dielectric loss dominates • In exotic materials, the 2 effects are comparable [Figure: loss in 0.5 m of typical FR-4, showing skin effect, dielectric loss, and overall loss versus frequency]

  17. Backplane vs Chip-to-chip Channels • Connectors and vias introduce reflections, ‘notches’, crosstalk • < 100 MHz: flat channel, no attenuation • 0.1 - 1 GHz: moderate attenuation, a few dB/octave • > 1 GHz: strong attenuation, > 5 dB/octave • Chip-to-chip channels are more like bare boards, and generally less than 20” [Figure: frequency response of a bare 20” FR4 trace vs. a 20” backplane channel]

  18. Why is attenuation bad? • Frequency-dependent attenuation in the channel produces inter-symbol interference (ISI) • A lone 1 in a stream of 0’s becomes undetectable • Equalization required when attenuation is more than a few dB per octave [Figure: Vin and Vout waveforms]

  19. Equalization • Filter can go at either end of the channel..

  20. Transmitter Equalization ..or “Transmitter Pre-emphasis” Equivalent to a 2-tap FIR filter. Dally, Poulton, Tell, “Transmitter Equalization for 4 Gb/s Signalling,” Hot Interconnects, 1996
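A minimal sketch of 2-tap transmitter pre-emphasis follows. The channel here is a hypothetical single-pole low-pass model, not the channel from the talk, chosen because a 2-tap FIR inverts it exactly. The tap values are scaled so the transmitter never exceeds its full swing of 1.

```python
def tx_fir(symbols, c0, c1):
    """2-tap FIR: out[n] = c0*x[n] + c1*x[n-1]; the line idles at x[0] before n=0."""
    out, prev = [], symbols[0]
    for s in symbols:
        out.append(c0 * s + c1 * prev)
        prev = s
    return out


def channel(x, a):
    """Hypothetical lossy channel: y[n] = (1-a)*x[n] + a*y[n-1]."""
    y, prev = [], 0.0
    for s in x:
        prev = (1 - a) * s + a * prev
        y.append(prev)
    return y


A = 0.7                                  # assumed channel pole
C0, C1 = 1 / (1 + A), -A / (1 + A)       # taps with |C0| + |C1| = 1 (peak-limited)
PATTERN = [-1] * 50 + [1] + [-1] * 5     # lone 1 in a stream of 0's (as -1/+1)

raw = channel(PATTERN, A)
eq = channel(tx_fir(PATTERN, C0, C1), A)

if __name__ == "__main__":
    print(f"lone 1 without EQ: {raw[50]:+.3f}")
    print(f"lone 1 with EQ:    {eq[50]:+.3f}")
```

Without equalization the lone 1 never crosses the zero threshold. With the 2-tap filter it does, but the received swing shrinks to (1-a)/(1+a) of full scale, since the filter can only attenuate low frequencies, never boost beyond the maximum drive.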

  21. More General Tx Equalizer Multi-tap FIR Filter A flexible realization

  22. Setting the tap weights.. • Measure response to unit impulse... • Compute tap weights [Figure: original unequalized waveform, transmitter equalizer output, and equalized response]

  23. Are more taps worth the trouble..? • 4-tap reduces undershoot significantly • But.. rapidly diminishing return [Figure: 2-tap vs. 4-tap equalized responses]

  24. Tx EQ is effective.. [Figures: eye diagrams with equalization OFF and ON]
      • 0.18µ I/O, 1999: 3.125 Gb/s, 50 cm FR-4 + conn’s, approx -8 dB atten, 2-tap equalizer, 2^10 PRBS pattern, 160 mV eye height, 150 psec P-P jitter
      • 0.13µ I/O, 2003: 6.25 Gb/s, 73 cm FR-4 + conn’s, approx -15 dB atten, 4-tap equalizer, 2^10 PRBS pattern, 100 mV eye height, 63 psec P-P jitter

  25. Problems with Tx Equalization Problem #1: Can’t drive higher than “max”, only lower! High-attenuation channels require lots of low-frequency attenuation ..so, not much signal into the line.

  26. Problems with Tx EQ.. Problem #2: • EQ requirements ‘known’ only at the receiver • Automatic EQ adjustment requires a ‘back channel’ [Figure: EQ adjust path from the receiver back to the transmitter equalizer]

  27. Rx equalizer also possible: • EQ at Rx: • Simplest possible transmitter • Straightforward ‘adaptation’ • No back-channel needed [Figure: equalizer and EQ adjust both located at the receiver]

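  28. Receiver Equalizers • Source-degenerated amp is a popular implementation • About as effective as a 2-tap Tx EQ • Problems: ° Amplifies crosstalk as well as signal ° Power-hungry [Figure: source-degenerated differential amplifier (In+, In-, Out+, Out-, degeneration R) and its gain vs. frequency, annotated with a 3 dB/octave slope and 8-10 dB of ‘boost’ between the max. R and min. R curves]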

  29. Decision-Feedback Equalization • Biases receiver threshold in a way that depends on previous data • Depends on making a correct decision in slicer • Generates an error function that can be minimized by control system • Doesn’t amplify crosstalk! • But: area and power hog. • Generally used only in backplane transceivers, difficult channels. [Figure: slicer with decision feedback]
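The idea of biasing the threshold by the previous decision can be sketched with a one-tap DFE. The channel is a hypothetical two-tap pulse response (main cursor 0.5, post-cursor 0.6), chosen so that a plain slicer fails; the names and values are illustrative, not from the talk.

```python
def slicer(v):
    """Decide +1 or -1 from a voltage sample."""
    return 1 if v > 0 else -1


def channel(symbols, main=0.5, post=0.6, first_prev=-1):
    """Hypothetical channel with one post-cursor: y[n] = main*x[n] + post*x[n-1]."""
    out, prev = [], first_prev
    for s in symbols:
        out.append(main * s + post * prev)
        prev = s
    return out


def receive_plain(samples):
    """Fixed threshold at zero, no equalization."""
    return [slicer(v) for v in samples]


def receive_dfe(samples, post_cursor, first_prev=-1):
    """One-tap DFE: subtract the ISI predicted from the previous decision."""
    decisions, prev = [], first_prev
    for v in samples:
        prev = slicer(v - post_cursor * prev)
        decisions.append(prev)
    return decisions


if __name__ == "__main__":
    tx = [-1, -1, 1, -1, 1, 1, -1]
    rx = channel(tx)
    print("sent: ", tx)
    print("plain:", receive_plain(rx))
    print("dfe:  ", receive_dfe(rx, post_cursor=0.6))
```

Because the correction is built from decided bits rather than from the analog input, crosstalk on the input is not amplified. The flip side is visible in the code too: one wrong decision feeds a wrong correction into the next bit.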

  30. Clocking • Reference clock • Limited to < 200 MHz • Quartz crystal or ceramic resonator • Multiply ref clock up to bit rate • Typically in a Phase Locked Loop (PLL) • Large area, high power, complex design [Figure: clock source distributing ref clk to a PLL (8x – 50x) in each chip, each producing bit clk]

  31. Typical SERDES Clocking.. • A single multiplying PLL • Distribute the bit clock • Pro: share the PLL, reducing area • Con: distributing multi-GHz clock costs power [Figure: clock source feeding ref clk to one PLL (8x – 50x) whose bit clk is distributed]

  32. Multiplying PLL’s • Conflicting design goals: • Low loop bandwidth to filter ref clk jitter • High loop bandwidth to filter on-chip noise • Increasing use of on-chip inductors • Power, jitter ~ 1/Q • Can get Q’s of 5-10 or so on-chip • Inductors are huge area hogs • So, plays well with shared PLL idea

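  33. 2:1 Multiplexing Tx and Rx • Half-bit-rate (2:1 mux’ing) is the most common SERDES architecture [Figure: transmitter with PLL (8x – 25x), clock divider, and 2:1 mux serializing parallel tx data, 1 UI per bit; receiver with CDR, phase-adjusted half-bit-rate sample clock (sclk), and divider producing parallel rx data]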

  34. Clock and Data Recovery (CDR) • Most common architecture: Alexander Phase Detector • Two quadrature clocks, edge and data • Sample bit stream on both edges of both clocks • We want the data clock to sample in the middle of each unit interval (UI), with the edge clock sampling at the data transitions • Only simple logic needed to figure out how to adjust clocks [Figure: data waveform over 1 UI with the data clock and edge clock sample points]

  35. CDR.. [Figure: sample sequences … E0 D0 E1 D1 E0’ D0’ E1’ D1’ shown for early clocks and for late clocks] if (D1 != D0) { /* edge detected */ if (E1 != D1) { early = TRUE; late = FALSE; } else { late = TRUE; early = FALSE; } } else { early = FALSE; late = FALSE; } ..and similarly for E0
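The early/late decision on this slide can be written as a small runnable function. This is a sketch of the same logic for the (D0, E1, D1) sample triple; the `vote` helper, which sums decisions over several triples, is a hypothetical stand-in for the digital filter that follows the phase detector.

```python
def early_late(d0, e1, d1):
    """Return (early, late) for data samples d0, d1 and the edge sample e1 between them."""
    if d1 != d0:              # a transition occurred between the two data samples
        if e1 != d1:          # edge sample still shows the old bit: clock is early
            return (True, False)
        return (False, True)  # edge sample already shows the new bit: clock is late
    return (False, False)     # no transition, no timing information


def vote(triples):
    """Net count over many triples: positive means mostly early, negative mostly late."""
    total = 0
    for d0, e1, d1 in triples:
        early, late = early_late(d0, e1, d1)
        total += int(early) - int(late)
    return total


if __name__ == "__main__":
    print(early_late(0, 0, 1))   # edge sample equals the old bit
    print(early_late(0, 1, 1))   # edge sample equals the new bit
    print(vote([(0, 0, 1), (1, 1, 0), (0, 1, 1)]))
```

Bits without a transition carry no phase information, so the loop only updates on edges; a long run of identical bits leaves the phase setting untouched.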

  36. Digital CDR.. • Most common CDR architecture • Many other ways to do this.. [Figure: rx data sampled by the data clock (D0, D1, deserialized data) and the edge clock (E0, E1); early/late logic feeds a digital filter whose digital phase setting adjusts the phase of the half-bit-rate ref clock]

  37. Timing regimes • Mesochronous • Communicating chips get same reference clock • Receiver ‘knows’ the frequency, but not the phase • CDR must recover phase from the data stream • CDR is literally a phase-locked loop • Low-bandwidth “first-order” loop will do the job • Plesiochronous • Communicating chips get different clocks • Frequencies are close, but not exactly the same • CDR must recover frequency and phase • Requires higher bandwidth, “2nd-order” loop • Much more complex logic than Meso..

  38. How much signal… • ..do you need at the receiver to detect bits correctly with high probability? • Noise in on-chip logic: • Bounded sources, noise margins are high • Noise in chip-to-chip links: • Unbounded, random processes (“thermal” noise) • This is a good thing, believe it or not! • Want to drive signaling power down to the lowest possible value, but… • Nature (thermodynamics) will intervene eventually

  39. Bit Error Rate (BER) Any signaling system will fail; question is: “How often?” • $\mathrm{BER} < \exp(-\mathrm{VSNR}^2/2)$ ..is a good approximation for the probability of making an error • $\mathrm{VSNR} = V_M / V_R$ • VM: gross voltage margin after all bounded sources of noise are accounted for • VR: RMS sum of all random, unbounded noise sources • BER = 2e-15 (one error/day @ 6 Gb/s) => VSNR = 8.2 • BER = 5e-18 (one error/year @ 6 Gb/s) => VSNR = 8.9
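The two numerical examples on this slide can be reproduced from the bound BER < exp(-VSNR^2/2). The sketch below treats the bound as an equality to solve for VSNR, which is an approximation; the function names are illustrative.

```python
import math


def ber_bound(vsnr: float) -> float:
    """Upper bound on bit error rate for a given voltage signal-to-noise ratio."""
    return math.exp(-(vsnr ** 2) / 2)


def vsnr_for_ber(ber: float) -> float:
    """VSNR at which the bound equals the target BER."""
    return math.sqrt(-2 * math.log(ber))


def seconds_per_error(ber: float, bit_rate: float) -> float:
    """Mean time between errors at a given bit rate (bits/s)."""
    return 1 / (ber * bit_rate)


if __name__ == "__main__":
    for target in (2e-15, 5e-18):
        days = seconds_per_error(target, 6e9) / 86400
        print(f"BER {target:.0e}: VSNR ~ {vsnr_for_ber(target):.1f}, "
              f"one error every {days:.1f} days at 6 Gb/s")
```

The steepness of the exponential is the point: going from one error per day to one error per year costs less than 10% more voltage margin.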

  40. VSNR: Gross noise margin $V_M = V_S (1 - K) - V_N$, where VS is the signal voltage
      Proportional noise sources (K): • Cross-talk • Reflections • Un-equalized ISI • Correlated power supply noise • Timing noise (jitter)
      Fixed noise sources (VN): • Receiver sensitivity • Receiver offset • Receiver reference offset • Transmitter offset • Uncorrelated power supply noise

  41. VR: Unbounded Random Noise • Anything that dissipates power generates thermal noise • “White” noise: flat versus frequency across the bandwidth B • $V_T = (4 k_B T R B)^{1/2}$ (RMS), where kB is the Boltzmann constant, T the temperature in kelvin, R the resistance, and B the bandwidth • Ideal 50 Ω resistor, 10 GHz b/w, room temp: ~150 µV RMS
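The thermal-noise formula can be evaluated directly. This sketch assumes room temperature is 300 K and takes the quantities in SI units; the function name is illustrative.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def thermal_noise_vrms(r_ohm: float, bandwidth_hz: float, temp_k: float = 300.0) -> float:
    """RMS thermal noise voltage of a resistor: sqrt(4 * kB * T * R * B)."""
    return math.sqrt(4 * K_B * temp_k * r_ohm * bandwidth_hz)


if __name__ == "__main__":
    v = thermal_noise_vrms(50.0, 10e9)
    print(f"50 ohm, 10 GHz, 300 K: {v * 1e6:.0f} uV RMS")
```

Under the 300 K assumption this evaluates to roughly 90 microvolts RMS, the same order of magnitude as the figure quoted on the slide. Note the square-root dependence: quadrupling the bandwidth only doubles the noise voltage.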

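  42. VT in CMOS.. • The noise of a resistance R is band-limited by the capacitance C it drives • Integrating over the RC bandwidth gives $V_T = (k_B T / C)^{1/2}$ ..so doesn’t depend on R! • VT ~ 2 mV for 1 fF [Figure: RC circuit]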
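The reason R drops out can be shown in one line. This is the standard kT/C derivation, sketched under the assumption of a single-pole RC low-pass filtering the resistor's white noise, with T = 300 K for the numerical check.

```latex
\overline{V_T^{2}}
  = \int_{0}^{\infty} \frac{4 k_B T R}{1 + (2\pi f R C)^{2}}\, df
  = 4 k_B T R \cdot \frac{1}{4 R C}
  = \frac{k_B T}{C},
\qquad
V_T = \sqrt{\frac{k_B T}{C}}
    \approx \sqrt{\frac{4.14 \times 10^{-21}\,\mathrm{J}}{10^{-15}\,\mathrm{F}}}
    \approx 2\,\mathrm{mV}
```

A larger R raises the noise density but lowers the bandwidth by the same factor, so the total depends only on C and temperature.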

  43. Unbounded Random Noise.. Flicker noise: • Equal noise per decade • AKA “1/f noise” • Noise “knee” frequency • Voltages range from 10nV/decade – 1mV/decade • Knee frequency for typical FET is about 1 MHz [Figure: VF (RMS) vs. frequency, showing the 1/f noise region, the white noise region, and the frequency ‘knee’ between them]

  44. Unbounded Random Noise.. Shot Noise: • Arises from quantization of charge • “White” noise source • Proportional to current • Important mainly in junctions • Associated with gate leakage, though! • $I_{RMS} = (2 q I B)^{1/2}$, where q is the electron charge, I the current, and B the bandwidth

  45. Unbounded Random Noise.. • In practice, always dominated by “white” noise • Always seems to be ~1mV RMS • VM needs to be about 10mV for reasonable BER • ..which in turn means voltage at receiver input terminals needs to be 30-50mV • Fundamental lower limit to signaling energy
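The chain of reasoning on this slide (about 1 mV of random noise, so about 10 mV of margin, so 30-50 mV at the receiver) can be sketched with the margin equation VM = VS(1 - K) - VN from the earlier slide. The values K = 0.5 and VN = 10 mV below are purely hypothetical, picked only to show how a 10 mV margin can turn into a few tens of millivolts of signal.

```python
def required_margin(vsnr: float, v_r: float) -> float:
    """Gross margin VM needed for a target VSNR given RMS random noise VR."""
    return vsnr * v_r


def required_signal(v_m: float, k: float, v_n: float) -> float:
    """Solve VM = VS*(1 - K) - VN for the signal voltage VS."""
    return (v_m + v_n) / (1 - k)


if __name__ == "__main__":
    vm = required_margin(8.9, 1e-3)           # VSNR for about one error/year at 6 Gb/s
    vs = required_signal(10e-3, 0.5, 10e-3)   # hypothetical K and VN
    print(f"VM ~ {vm * 1e3:.1f} mV, VS ~ {vs * 1e3:.0f} mV")
```

With these assumed values the receiver needs about 40 mV, inside the range the slide quotes; different noise budgets move the answer within that range.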

  46. Differential vs. Single-Ended
      Differential: • Termination 2Z0 across the pair • Isig = +I/2 (“1”) or -I/2 (“0”) • Vsig = Isig x 2Z0 • $P = \tfrac{1}{2} I^2 Z_0$ • |Vrx| = Vsig = I Z0
      Single-ended: • Termination Z0 • Vref = Vsig/2 • Isig = I (“1”) or 0 (“0”) • Vsig = Isig x Z0 • $P = \tfrac{1}{2} I^2 Z_0$ • |Vrx| = Vsig/2 = I Z0/2
      For same investment in power, differential signaling delivers twice the voltage to the receiver!
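The power and voltage comparison can be checked numerically. This sketch follows the slide's expressions; the single-ended power is taken as the average over equally likely 1s and 0s, and the 4 mA drive current is a hypothetical example value.

```python
def differential(i: float, z0: float):
    """Return (power, |Vrx|) for +/- I/2 driven into a 2*Z0 termination."""
    i_sig = i / 2
    v_sig = i_sig * 2 * z0          # magnitude of Vsig; the sign carries the bit
    power = i_sig ** 2 * 2 * z0     # the same for a 1 and a 0
    return power, v_sig             # receiver sees the full Vsig


def single_ended(i: float, z0: float):
    """Return (average power, |Vrx|) for I or 0 driven into Z0."""
    v_sig = i * z0
    power = 0.5 * i ** 2 * z0       # I^2 * Z0 for a 1, zero for a 0, equally likely
    return power, v_sig / 2         # receiver compares against Vref = Vsig/2


if __name__ == "__main__":
    p_d, v_d = differential(4e-3, 50.0)
    p_s, v_s = single_ended(4e-3, 50.0)
    print(f"differential: {p_d * 1e3:.2f} mW, |Vrx| = {v_d * 1e3:.0f} mV")
    print(f"single-ended: {p_s * 1e3:.2f} mW, |Vrx| = {v_s * 1e3:.0f} mV")
```

Both schemes dissipate the same power in the termination, but the single-ended receiver only sees half the swing because its threshold sits in the middle of the signal range.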

  47. Differential vs. Single-Ended Noise
      Proportional noise sources:
      • Cross-talk. Single-ended: large, signals strongly coupled (E/M) and by common Z. Differential: smaller, partly cancelled, no common signal impedance.
      • Correlated power supply noise. Single-ended: large, simultaneous switching noise. Differential: very small, easily cancelled.
      Fixed noise sources:
      • Receiver reference offset. Single-ended: large. Differential: zero, no reference needed.
      • Uncorrelated power supply noise. Single-ended: difficult to cancel. Differential: easier to cancel since mainly common-mode.

  48. Differential vs. Single-ended.. • Single-ended • Self-generates significant noise (SSN) • Uncertainty in receiver reference large source of fixed noise • Vulnerable to externally and internally coupled noise • Half the power efficiency of differential • Differential • Tx consumes constant current from PS, so little self-generated noise • No receiver reference needed • Most external sources of noise are common mode • Easy to achieve 20-30 dB of CMRR • Bottom line: single-ended not a good candidate for low-power signaling.. • ..or, is it?

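  49. “Current-Mode” vs “Voltage-Mode” • Current-mode: Vrx = Isig×Z0/2 • Voltage-mode: Vrx = Vsig/2 [Figure: a current source Isig and a voltage source Vsig, each driving a Z0 line with Z0 termination]

  50. Current Mode [Figure: current-mode driver. The bias device (Vbias) operates in saturation and sets the output current; the switch devices (datP, datN) operate in saturation (high-impedance, isolating the current source from the line) and drive lineP and lineN]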
