1 / 53

Design of High-Speed Links: A look at Modern VLSI Design

Design of High-Speed Links: A look at Modern VLSI Design. Vladimir Stojanovi ć. Chip design is changing. Best systems trade-off circuits, architecture and system issues. Becoming constrained by power Not so much by area/density. Pentium 3M transistors 30mW/mm 2 0.6um tech 4W 0.1GHz.

urban
Download Presentation

Design of High-Speed Links: A look at Modern VLSI Design

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Design of High-Speed Links: A look at Modern VLSI Design Vladimir Stojanović

  2. Chip design is changing • Best systems trade-off circuits, architecture and system issues • Becoming constrained by power • Not so much by area/density Pentium 3M transistors 30mW/mm2 0.6um tech 4W 0.1GHz Pentium 4 125M transistors 850mW/mm2 90nm tech 103W 3.4GHz

  3. Power-performance system optimization • Complex, many levels of hierarchy and variables

  4. D D Q Q Logic Clk Clk Power-performance system optimization • Complex, many levels of hierarchy and variables Individual components Flops & latches (power and timing critical)

  5. Q D Logic A Logic B D D Q Q Clk Logic Clk Clk D Q Logic A Logic B D D D Q Q Q Clk Logic A Logic B Clk Clk Clk Power-performance system optimization • Complex, many levels of hierarchy and variables Vdd1, Vth1 Individual components Flops & latches (power and timing critical) Vdd2, Vth2 Vdd3, Vth3 Vdd5, Vth5 Vdd4, Vth4 • System level, VLSI blocks and circuits • Physical (Vdd, Vth, Sizing) • Logic • uArchitecture (parallelism, pipelining)

  6. Q D Logic A Logic B D D Q Q Clk Logic Clk Clk D Q Logic A Logic B D D D Q Q Q Clk Logic A Logic B Clk Clk Clk Power-performance system optimization • Complex, many levels of hierarchy and variables Interfaces (Digital, Analog and Mixed-Signal) Vdd1, Vth1 Individual components Flops & latches (power and timing critical) Vdd2, Vth2 Vdd3, Vth3 Channel Receiver Vdd5, Vth5 Transmitter Vdd4, Vth4 • System level, VLSI blocks and circuits • Physical (Vdd, Vth, Sizing) • Logic • uArchitecture (paralellism, pipelining)

  7. Look at sub-problem: links • Seems pretty simple: • Challenging multi-disciplinary area • Circuits • Communications • Optimization Channel Receiver Transmitter

  8. What makes it challenging • Now, the bandwidth limit is in wires High speed link chip > 2 GHz signals

  9. New link design Dealing with bandwidth limited channels • This is an old research area • Textbooks on digital communications • Think modems, DSL • But can’t directly apply their solutions • Standard approach requires high-speed A/Ds and digital signal processing • 20Gs/s A/Ds are expensive • (Un)fortunately need to rethink issues

  10. Outline • Show system level optimization for links • Create a framework to evaluate trade-offs • Background on high-speed links • High-speed link modeling • System level optimization • Practical implementation issues • Current / future work

  11. Backplane environment • Line attenuation • Reflections from stubs (vias)

  12. Backplane channel • Loss is variable • Same backplane • Different lengths • Different stubs • Top vs. Bot • Attenuation is large • >30dB @ 3GHz • But is that bad? • Required signal amplitude set by noise

  13. Inter-symbol interference (ISI) • Channel is low pass • Our nice short pulse gets spread out • Dispersion – short latency(skin-effect, dielectric loss) • Reflections – long latency(impedance mismatches – connectors, via stubs, device parasitics, package)

  14. Error! ISI • Middle sample is corrupted by 0.2 trailing ISI (from the previous symbol), and 0.1 leading ISI (from the next symbol) resulting in 0.3 total ISI • As a result middle symbol is detected in error

  15. The right sub-system model • Need accurate models • To relate the power/complexity to performance • Main system impairments • Interference • Various noise sources • Voltage (thermal, supply, offsets, quantization noise) • Timing (jitter, offset)

  16. Problem with current models • Worst case analysis • Can be too pessimistic • If probability of worst case very small • Gaussian distributions • Works well near mean • Often way off at tails • e.g. ISI distribution is bounded • Use direct noise and interference statistics

  17. Effect of timing noise • Need to map from time to voltage Jittered sampling Ideal sampling Voltage noise when receiver clock is off Voltage noise • The effect is going to depend on the size of the jitter, the input sequence, and the channel

  18. Example: Effect of transmitter jitter Jittered pulse decomposition ideal • Decompose output into ideal and noise • Noise are pulses at front and end of symbol • Width of pulse is equal to jitter • Approximate with deltas on bandlimited channels noise

  19. kRx kRx Jitter effect on voltage noise • Transmitter jitter • High frequency (cycle-cycle) jitter is bad • Changes the energy (area) of the symbol • No correlation of noise sources that sum • Low frequency jitter is less bad • Effectively shifts waveform • Correlated noise give partial cancellation • Receive jitter • Modeled by shift of transmit sequence • Same as low frequency transmitter jitter • Bandwidth of the jitter is critical • It sets the magnitude of the noise created

  20. Jitter source from PLL clocks • Noise sources • Reference clock phase noise • VCO supply noise • Clock buffer supply noise M. Mansuri and C-K.K. Yang, "Jitter optimization based on phase-locked loop design parameters," IEEE Journal Solid-State Circuits, Nov. 2002

  21. dn en (late) dn-1 2x Oversampled bang-bang CDR dn en • Generate early/late from dn,dn-1,en • Simple 1st order loop, cancels receiver setup time • Now need jitter on data Clk, not PLL output • Base linear PLL jitter • Add non-linear phase selector noise from CDR

  22. Bang-bang CDR model • Model CDR loop as a state machine – Markov chain • Gives the probability distribution of phase • Which is the CDR jitter distribution A.E. Payzin, "Analysis of a Digital Bit Synchronizer," IEEE Transactions on Communications, April 1983.

  23. Outline • Show system level optimization for links • Create a framework to evaluate trade-offs • Background on high-speed links • High-speed link modeling • System level optimization • Limits – What is the capacity of these links? • Improving today’s baseband signaling • Practical implementation issues • Current / future work

  24. Baseline channels • Legacy (FR4) - lots of reflections • Microwave engineered (NELCO)

  25. Capacity with link-specific noise NELCO FR4 • Effective noise from phase noise • Proportional to signal energy • Decreases expected gains • Still, capacity much higher than data rates in today’s links

  26. Today’s links • Exclusively baseband • Biggest problem is ISI • Starting to use equalization • Thinking about multi-level modulation • Constrained by speed and power • Large number of links on a chip • Model links to find efficient implementations

  27. Transmit and Receive Equalization Changes signal to correct for ISI Often easier to work at transmitter DACs easier than ADCs Baseband links - removing ISI Linear transmit equalizer Decision-feedback equalizer J. Zerbe et al, "Design, Equalization and Clock Recovery for a 2.5-10Gb/s 2-PAM/4-PAM Backplane Transceiver Cell," IEEE Journal Solid-State Circuits, Dec. 2003.

  28. Peak power constraint Transmit equalization – headroom constraint Amplitude of equalized signal depends on the channel • Transmit DAC has limited voltage headroom • Unknown target signal levels • Hard to formulate error or objective function • Need to tune the equalizer and receive comparator levels

  29. Optimization example: Power constrained linear precoding • Add variable gain to amplify to known target level • Formulate the objective function from error • SINR is not concave in win general • Change objective to quasiconcave

  30. Optimal linear precoding • Still, does this objective really relate to link performance? • Need to look at noise and interference distributions • Minimize BER • Residual dispersion into peak distortion • Reflections into mean distortion • Includes all link-specific noise sources 2=wTS0TXw+wTS0RXw+2thermal

  31. Feedback equalization Including feedback equalization • Feedback equalization (DFE) • Subtracts error from input • No attenuation • Problem with DFE • Need to know interfering bits • ISI must be causal • Problem - latency in the decision circuit • Receive latency + DAC settling < bit time • Can increase allowable time by loop unrolling • Receive next bit before the previous is resolved

  32. 1 bit loop unrolling • Instead of subtracting the error • Move the slicer level to include the noise • Slice for each possible level, since previous value unknown K.K. Parhi, "High-Speed architectures for algorithms with quantizer loops," IEEE International Symposium on Circuits and Systems, May 1990

  33. Residual error • Cannot correct all the ISI • Equalizers are finite length • EQ coefficients quantized • ISI-noise enhancement tradeoff • The error affects both voltage and timing • Need accurate distribution of this error • Random data • Standard textbook methods for distribution of the sum of weighted random variables

  34. Comparison with Gaussian model Cumulative ISI distribution Impact on CDR phase • Gaussian model only good down to 10-3 probability • Way pessimistic for much lower probabilities

  35. BER contours 5 tap Tx Eq 5 tap Tx Eq + 1 tap DFE • Voltage margin • Min. distance between the receiver threshold and contours with same BER

  36. Pulse amplitude modulation • Binary (NRZ) • 1 bit / symbol • Symbol rate = bit rate • PAM4 • 2 bits / symbol • Symbol rate = bit rate/2 00 01 1 0 11 10

  37. Multi-level: Offset and jitter are crucial thermal noise + offset+ jitter thermal noise + offset thermal noise • To make better use of available bandwidth, need better circuits

  38. Full ISI compensation too costly thermal noise + offset thermal noise + offset+ jitter w. thermal noise • Today’s links cannot afford to compensate all ISI • Limits today’s maximum achievable data rates

  39. Outline • Show system level optimization for links • Create a framework to evaluate trade-offs • Background on high-speed links • High-speed link modeling • System level optimization • Practical implementation issues • Low-cost adaptation • Dual-mode link (hardware re-use) • Current / future work

  40. TX PLL RX Fully adaptive dual-mode link • Reconfigurable dual-mode PAM2/PAM4 link • Adaptive equalization • Transmit and receive equalization • DFE with loop unrolling • PAM2/PAM4 • 2-10Gb/s • 0.13µm • 40mW/Gb

  41. Adaptation with minimum overhead • Adaptive sampler • Generates the error signal at reference level • Monitors the link • Adjustable voltage and time reference • On-chip sampling scope • Can replace any other sampler - calibration

  42. dLevinit dLevmid dLevend … … Initial eye Mid-way equalized Equalized Dual-loop adaptive algorithm • Data level reference loop errorinitp-p • Equalizer loop • Scale the equalizer - output Tx constraint

  43. Dual loop convergence – 4 tap example PAM2, 5Gb/s, 4taps Tx Equalization • Hard to estimate analytically • Experimental results show • Both loops are stable within wide range 0.1 – 10x of relative speeds

  44. thresh(+) 0 thresh(-) Hardware re-use: Dual-mode receiver • PAM4

  45. 0 Hardware re-use: Dual-mode receiver • PAM4 • PAM2

  46. thresh(+) thresh(-) Hardware re-use: Dual-mode receiver • PAM4 • PAM2 with loop-unrolled DFE tap • Leverage multi-level properties of signals in loop-unrolling

  47. Improvements with loop-unrolling • Signal as seen by the receiver (on-chip scope)

  48. Model and measurements • PAM4, 3taps of transmit equalization, 5Gb/s

  49. Outline • Show system level optimization for links • Create a framework to evaluate trade-offs • Background on high-speed links • High-speed link modeling • System level optimization • Practical implementation issues • Current / future work • Bridging the gap to link capacity

  50. Bridging the gap: Multi-tone link

More Related