High-Speed Digital Architectures

Chris Allen (callen@eecs.ku.edu)
Course website: people.eecs.ku.edu/~callen/713/EECS713.htm

Overview
• Topics include:
  • Pipelining
  • Latency
  • Demultiplexing
  • Multiplexing
  • Clock fanout and distribution
  • Clock skew and fine timing adjustments
  • Clock signal sources

Logic devices and high-speed designs
• Pipelining & latency
• Consider the multi-bit adder, A + B
  • We want to add two 18-bit binary numbers (unsigned binary): A17 A16 … A1 A0 + B17 B16 … B1 B0
  • A0 and B0 are the least significant bits (LSBs); A17 and B17 are the most significant bits (MSBs)
• How fast can we add two 18-bit numbers?
• A 6-bit ECL adder will be the building block for this design
  • Inputs: A5:A0, B5:B0, Cin (carry input)
  • Outputs: F5:F0, Cout (carry output)
  • A: 0 to 63; B: 0 to 63; Cin: 0 or 1
  • F: 0 to 63; Cout: 0 or 1
  • Maximum case: Cin + A + B = 1 + 63 + 63 = 127 → F = 63, Cout = 1
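
A behavioral sketch of this arrangement, assuming ideal 6-bit blocks (the names `add6` and `add18` are hypothetical, chosen for illustration): each block returns its 6-bit sum F and its carry Cout, and the 18-bit adder feeds each block's Cout into the next block's Cin. It models only the logic function, not the timing.

```python
def add6(a: int, b: int, cin: int) -> tuple[int, int]:
    """Hypothetical model of one 6-bit adder block: returns (F, Cout)."""
    if not (0 <= a < 64 and 0 <= b < 64 and cin in (0, 1)):
        raise ValueError("A and B must be 6-bit values; Cin must be 0 or 1")
    total = a + b + cin
    return total & 0x3F, total >> 6  # F = low 6 bits, Cout = bit 6


def add18(a: int, b: int) -> tuple[int, int]:
    """Cascade three 6-bit blocks; the carry ripples from bits 5:0 up to 17:12."""
    result, carry = 0, 0
    for k in range(3):  # k = 0 handles bits 5:0, k = 2 handles bits 17:12
        f_k, carry = add6((a >> (6 * k)) & 0x3F, (b >> (6 * k)) & 0x3F, carry)
        result |= f_k << (6 * k)
    return result, carry  # 18-bit sum and final carry out


print(add6(63, 63, 1))  # (63, 1): worst case for a single block
```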

Pipelining & latency
• 18-bit adder built from three cascaded 6-bit adders
• Consider the propagation delays (typical):
  • An, Bn, Cin → Fn: 3 ns
  • An, Bn, Cin → Cout: 2.5 ns
• How to find the add time, T: identify the critical path (longest delay)
  • Bits 5:0: A, B → F: 3 ns; A, B → Cout: 2.5 ns
  • Bits 11:6: A, B → F: 3 ns; Cin → F: 5.5 ns; A, B → Cout: 5 ns
  • Bits 17:12: A, B → F: 3 ns; Cin → F: 8 ns; A, B → Cout: 7.5 ns
Note: the adder is combinational logic, not sequential; there is no clock signal.
For this configuration, the output is stable after 8 ns → 125 MHz is the maximum rate for this 18-bit adder.
A greater number of bits (e.g., a 36-bit adder) would further increase the delay, reducing the add rate.
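
The critical-path arithmetic can be sketched as a small function, assuming every block has the typical delays listed (3 ns to F, 2.5 ns to Cout) and that the carry chain dominates; `ripple_add_time` is a hypothetical name. The carry reaches block k after (k − 1) × 2.5 ns, and that block's F outputs settle 3 ns later.

```python
def ripple_add_time(n_blocks: int, t_f: float = 3.0, t_cout: float = 2.5) -> float:
    """Worst-case settling time (ns) of a ripple-carry chain of adder blocks.

    The carry into the last block arrives after (n_blocks - 1) carry delays;
    that block's F outputs then settle t_f later.
    """
    return (n_blocks - 1) * t_cout + t_f


for bits in (18, 36):
    t = ripple_add_time(bits // 6)  # 6-bit blocks
    print(f"{bits}-bit: {t} ns -> {1000 / t:.1f} MHz")
```

For 18 bits this gives 8.0 ns (125.0 MHz), matching the slide; for 36 bits it gives 15.5 ns (64.5 MHz).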

Pipelining & latency
• Multiplexed 18-bit adder
  • Twice as much hardware produces results twice as fast
  • The propagation delay for each channel is still 8 ns
  • 2:1 multiplexing cuts the time per result in half → 4 ns (250 MHz)
  • 4:1 multiplexing cuts it by four → 2 ns (500 MHz)
  • But the latency is still 8 ns
A clock signal has been added to synchronize the registers.
In this configuration, the adding rate is 250 MHz and the latency is 8 ns.
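
A sketch of this throughput/latency trade-off, assuming M identical adders that take turns on successive operand pairs (the function name is hypothetical): throughput scales with M, but each individual result still takes the full add time.

```python
def interleaved_adder(t_add_ns: float, m: int) -> tuple[float, float]:
    """Return (rate in MHz, latency in ns) for m time-interleaved adders."""
    rate_mhz = m * 1000.0 / t_add_ns  # a new result every t_add / m on average
    latency_ns = t_add_ns             # each operand pair still needs the full add time
    return rate_mhz, latency_ns


for m in (1, 2, 4):
    print(m, interleaved_adder(8.0, m))
```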

Pipelining & latency
• Pipelined 18-bit adder
  • While a single 6-bit add takes only 3 ns (333 MHz), the propagation of the carry bit slows the 18-bit addition to 8 ns (125 MHz)
  • A pipelined architecture allows the adder to operate with a 3-ns add time plus 0.5 ns for setup and propagation through the register → 3.5-ns cycle time (286 MHz)
  • This scheme is expandable to N-bit adds at the same rate
In this configuration, the adding rate is 286 MHz and the latency is 6 clock cycles, or 21 ns.
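
The pipelined numbers can be checked the same way. This sketch takes the stage delay, the register overhead, and the latency in clock cycles as inputs; the 6-cycle latency is taken from the slide rather than derived, since the register arrangement is not shown here, and `pipelined_adder` is a hypothetical name.

```python
def pipelined_adder(t_stage_ns: float, t_reg_ns: float,
                    latency_cycles: int) -> tuple[float, float, float]:
    """Return (cycle time in ns, rate in MHz, latency in ns) for a pipeline."""
    cycle_ns = t_stage_ns + t_reg_ns  # slowest stage plus register setup/propagation
    rate_mhz = 1000.0 / cycle_ns
    latency_ns = latency_cycles * cycle_ns
    return cycle_ns, rate_mhz, latency_ns


cycle, rate, latency = pipelined_adder(3.0, 0.5, latency_cycles=6)
print(f"{cycle} ns cycle, {rate:.0f} MHz, {latency} ns latency")
```

This prints a 3.5-ns cycle, 286 MHz, and 21.0 ns of latency, agreeing with the slide.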

Pipelining & latency
• The price to be paid for achieving this speed is:
  • Circuit complexity
  • Data latency
• Some applications can tolerate large latency
  • Examples include one-way data transfers, such as TV broadcast signals
• Other applications cannot tolerate much latency
  • Examples include two-way data exchanges, such as voice communications (calls via satellite have a latency of ~0.5 s)
• Techniques to further speed up the add process
  • If a 6-bit add takes 3 ns, a 2-bit add should take ~1 ns and a 1-bit add ~0.8 ns
  • Theoretically, we could do adds as fast as 1.5 ns (667 MHz) with 18 add stages and 36 clock cycles of latency
  • Note also that this approach requires a large number of clock signals (not shown)

High-speed digital design examples
• Consider a data acquisition (DAQ) system
  • Analog signals are digitized and recorded
  • Example applications include oscilloscopes and radar receivers
• The maximum bandwidth of the acquired signal is limited by the ADC clock frequency
• The precision of the digitized signal is limited by the number of bits in the ADC
• The length of the data record is limited by the memory size (2^N words)
(Figure label: WE = write enable)

High-speed digital design examples
• Consider an arbitrary waveform generator (AWG) system
  • Analog waveforms are produced from stored digital records
  • Example application: radar waveform generation
• The maximum bandwidth of the output signal is limited by the DAC clock frequency
• The precision of the output signal is limited by the number of bits in the DAC
• The length of the waveform is limited by the memory size (2^N words)
• In both cases (DAQ and AWG), the maximum data vector size is X × 2^N bits, i.e., an X-bit-wide word and a vector length of 2^N words

High-speed digital design examples
• Data acquisition system
  • Consider the case where X = 8 bits and N = 16 → 64k-word vector; with a 1-GHz clock rate, the maximum record length is 65.5 μs
• DAQ high-level timing
  • Within the 1-ns clock period:
    • The acquired data must stabilize
    • The memory must be addressed
    • The Write Enable line must toggle
  • All in compliance with the memory's timing requirements, e.g., setup and hold times for Data and Address relative to Write Enable
• Key to the DAQ operation are the address generator and the memory
  • This design requires a 16-bit synchronous counter with preset inputs (not a ripple counter)
  • The Addr_CLK must be the system clock (1 GHz)
  • The memory write cycle time must be < 1 ns
• It is difficult to achieve the required timing with available technology
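
The record-length figure follows from writing one word per clock: 2^N words at 1 ns each. A minimal sketch (function names hypothetical):

```python
def record_length_us(n_addr_bits: int, f_clk_hz: float) -> float:
    """Longest record (microseconds) when one word is written per clock."""
    return 2 ** n_addr_bits / f_clk_hz * 1e6


def memory_bits(x_bits: int, n_addr_bits: int) -> int:
    """Total storage for an X-bit word and a 2**N-word vector."""
    return x_bits * 2 ** n_addr_bits


print(record_length_us(16, 1e9))  # about 65.5 microseconds
print(memory_bits(8, 16) // 8)    # 65536 bytes
```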

High-speed digital design examples
• The timing requirements can be relaxed with demultiplexing
• The SPT7760 ADC has an integrated 1:2 demux that reduces the effective output data rate (per channel) to 500 MSa/s

High-speed digital design examples
• The ADC's 1:2 demux doubles the time available for each memory write to 2 ns
• Consider a design using the following devices:
  • 8-bit, 1-GSa/s ADC with 1:2 demux
  • 700-MHz, 8-bit synchronous counter
  • 1k × 4 RAM with 5-ns write cycle time
• Since the RAM's 5-ns write cycle time exceeds the ADC's 2-ns demultiplexed update time, further demultiplexing is required
• A 4:1 demux stretches the update period to 8 ns, a rate the RAM can accommodate
• One cost of this approach is the added complexity, both in terms of added hardware and in terms of signal formatting for output
(Figure: 4:1 DEMUX)
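
The demultiplexing choice amounts to finding the smallest extra demux factor that stretches each RAM's update period to at least its write cycle time. A sketch, assuming the board-level demux factor is restricted to powers of two (an assumption for simple decoding, not stated on the slide; a 3:1 demux would also meet the 5-ns requirement); the function name is hypothetical.

```python
def board_demux_factor(t_sample_ns: float, adc_demux: int,
                       t_write_ns: float) -> tuple[int, float]:
    """Smallest power-of-two demux so each RAM is written no faster than t_write_ns.

    Returns (board demux factor, resulting update period per RAM in ns).
    """
    t_update = t_sample_ns * adc_demux  # e.g., 1 ns x 2 = 2 ns per ADC output port
    factor = 1
    while t_update * factor < t_write_ns:
        factor *= 2
    return factor, t_update * factor


print(board_demux_factor(1.0, 2, 5.0))  # (4, 8.0): a 4:1 demux, 8-ns updates
```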

High-speed digital design examples
• High-level timing for a system with a 2:1 ADC demux and a 4:1 demux on the PCB

High-speed digital design examples
• Just as demultiplexing relaxed the DAQ timing requirements, multiplexing eases the arbitrary waveform generator's timing challenges
• A 4:1 mux reduces the data rate required from each memory device by a factor of 4
(Figure: 4:1 MUX)
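
Both the DAQ demux and the AWG mux distribute samples round-robin across parallel memory banks, so each bank sees only every nth sample. A minimal behavioral sketch, assuming the stream length is a multiple of the number of lanes (the function names are hypothetical):

```python
def demux(samples: list, n: int) -> list:
    """1:n demux: sample i goes to lane i % n."""
    return [samples[k::n] for k in range(n)]


def mux(lanes: list) -> list:
    """n:1 mux: read the lanes round-robin to rebuild the original stream."""
    out = []
    for group in zip(*lanes):  # one word from each lane, in lane order
        out.extend(group)
    return out


lanes = demux(list(range(8)), 4)
print(lanes)       # [[0, 4], [1, 5], [2, 6], [3, 7]]
print(mux(lanes))  # [0, 1, 2, 3, 4, 5, 6, 7]
```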

High-speed digital design examples
• Integrating a multiplexer in the digital-to-analog converter allows the converter to operate at higher rates
(Figure: integrated 1:2 mux)

High-speed digital design examples
• General design rules for these high-speed applications:
  • Keep line lengths matched within a data bus to ensure equal signal latency
  • Keep analog signal lines away from digital lines; digital lines carry significant broadband 'noise' that can degrade the analog signal through crosstalk
  • Clock signal distribution design is critical to achieve maximum operating speed
  • Jitter in the clock signal (due to the clock generator circuit) will result in phase noise in the data

Clock signal issues
• Clock signals provide a time reference for the entire system
• Issues to consider regarding clock signals:
  • Clock fanout and distribution
  • Clock skew and fine timing adjustments
  • Clock division: fCLK/2, fCLK/4, …
  • Clock signal generation
• Clock fanout
  • Consider the case where multiple registers must be clocked simultaneously
  • However, the fanout limit of the technology is ~5 (3 to 10)
• Clock fanout buffers
  • Intended to provide multiple copies of the clock signal with equal latencies
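
How many buffer levels does a fanout limit imply? A sketch assuming every driver (including the clock source) can drive at most `fanout` inputs, so buffers are added level by level until the source can drive the top of the tree; `fanout_tree` is a hypothetical name.

```python
def fanout_tree(n_loads: int, fanout: int = 5) -> tuple[int, int]:
    """Return (buffer levels, total buffers) needed to clock n_loads registers."""
    levels, buffers, drivers_needed = 0, 0, n_loads
    while drivers_needed > fanout:  # the source alone cannot drive them
        drivers_needed = -(-drivers_needed // fanout)  # ceiling division
        buffers += drivers_needed
        levels += 1
    return levels, buffers


print(fanout_tree(64))  # (2, 16): 13 buffers feed the registers, 3 feed those 13
```

Each added level inserts another buffer delay into every clock path, and with it another source of buffer-to-buffer skew.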

Clock skew
• Clock skew describes timing signals arriving at different components at different times
• Possible causes include:
  • Clock buffer skew
  • Mismatched trace propagation delay
  • Capacitive loading or coupling
• Clock buffer skew, e.g., gate-to-gate skew: 20 ps (typ), 50 ps (max)

Clock skew
• Even with low-skew clock buffers, some clock skew will remain
• Timing variations can compound as devices are cascaded, leading to increased uncertainty
• Impact?
  • System timing variations → reduced timing margin
• How to compensate for clock skew?
  • For critical timing applications, we can employ delay adjustments
  • Delay line (passive): delay depends on length
  • Gate delay (active): delay depends on gate characteristics
• Example
  • Consider two clock (or data) lines we wish to synchronize using delay line variations
  • By changing jumper connections we can make tB < tA, tB = tA, or tB > tA
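
To size a delay-line adjustment, it helps to convert a length difference into time. The propagation delay per unit length is √εeff / c; this sketch assumes εeff ≈ 4.3 (a rough figure for a stripline in FR-4, used only as an illustrative default) and uses hypothetical function names.

```python
import math

C_MM_PER_NS = 299.792458  # speed of light in vacuum, mm/ns


def delay_ps_per_mm(eps_eff: float) -> float:
    """Propagation delay per millimetre of trace: sqrt(eps_eff) / c."""
    return math.sqrt(eps_eff) / C_MM_PER_NS * 1000.0


def skew_ps(len_a_mm: float, len_b_mm: float, eps_eff: float = 4.3) -> float:
    """tB - tA for two traces of different length (positive: B arrives later)."""
    return (len_b_mm - len_a_mm) * delay_ps_per_mm(eps_eff)


print(f"{delay_ps_per_mm(4.3):.2f} ps/mm")  # roughly 6.9 ps/mm
print(f"{skew_ps(50.0, 53.0):.1f} ps")      # skew from a 3-mm mismatch
```

Under this assumption a 3-mm length mismatch already costs about as much as the 20-ps typical buffer skew quoted earlier.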

Clock skew
• Similar schemes can be used to vary signal delay.

Clock skew
• Problem: jumpers may cause impedance mismatch → reflections
  • Using surface-mount strips to close the gaps helps reduce mismatches
• Problem: occupies significant board area
  • Hard to implement at the chip level or in an MCM
• Implement jumper selection electronically
  • Consider implementing the variable delay with a simple gate (OR, XOR, AND, …; 300 to 1500 ps) and a multiplexer
  • The delay is controlled electronically by bits S1 and S0:

    S1  S0  F  Delay
    0   0   A  0
    0   1   B  Tp
    1   0   C  2Tp
    1   1   D  3Tp
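
The select table can be modeled directly: the 4:1 mux picks one tap of a three-gate chain, so the added delay is (2·S1 + S0)·Tp. A sketch, assuming equal gate delays and ignoring the mux's own delay, which is common to all four paths; the function name is hypothetical.

```python
def programmable_delay(s1: int, s0: int, tp_ps: float) -> tuple[str, float]:
    """Return (selected tap, added delay in ps) for select bits S1, S0."""
    sel = 2 * s1 + s0  # binary select value 0..3
    return "ABCD"[sel], sel * tp_ps


for s1 in (0, 1):
    for s0 in (0, 1):
        print(s1, s0, programmable_delay(s1, s0, tp_ps=300.0))
```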

Clock skew
• Single-chip programmable delay lines are available

Clock division
• Subharmonics of the clock signal (fCLK/2, fCLK/4, …) can be produced using simple flip-flops configured as clock frequency dividers
  • The output signals have a 50% duty cycle regardless of the input signal's duty cycle
• Shift registers (known as ring counters or Johnson counters) can be used to divide the signal frequency by other integers
  • Various duty cycles can be produced from these configurations
(Figure: ring counter; Johnson counter)
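
The three divider types can be compared with a cycle-level simulation that samples each output once per input clock edge. This is a sketch under that assumption (duty cycle is resolved only to whole input periods), with hypothetical function names. A toggle flip-flop divides by 2; an n-stage ring counter (one circulating 1) divides by n with a 1/n duty cycle; an n-stage Johnson counter (inverted feedback) divides by 2n with a 50% duty cycle.

```python
def toggle_ff(n_clocks: int) -> list[int]:
    """Divide-by-2: Q toggles on every input clock edge."""
    q, out = 0, []
    for _ in range(n_clocks):
        q = 1 - q
        out.append(q)
    return out


def ring_counter(n_ff: int, n_clocks: int) -> list[int]:
    """Shift register with the last stage fed back directly; one-hot start."""
    state, out = [1] + [0] * (n_ff - 1), []
    for _ in range(n_clocks):
        state = [state[-1]] + state[:-1]
        out.append(state[-1])
    return out


def johnson_counter(n_ff: int, n_clocks: int) -> list[int]:
    """Shift register with the last stage fed back inverted; all-zero start."""
    state, out = [0] * n_ff, []
    for _ in range(n_clocks):
        state = [1 - state[-1]] + state[:-1]
        out.append(state[-1])
    return out


def period_and_duty(seq: list[int]) -> tuple[int, float]:
    """Smallest repeating period (in input clocks) and fraction of time high."""
    for p in range(1, len(seq) // 2 + 1):
        if all(seq[i] == seq[i + p] for i in range(len(seq) - p)):
            return p, sum(seq[:p]) / p
    raise ValueError("no repeating period found")


print(period_and_duty(toggle_ff(24)))           # (2, 0.5)
print(period_and_duty(ring_counter(4, 24)))     # (4, 0.25)
print(period_and_duty(johnson_counter(3, 24)))  # (6, 0.5)
```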

Clock sources
• Clock signals are used to provide a timing reference
• Typically only one clock oscillator is used per system
  • In computers, higher-frequency signals may be derived from a single oscillator through frequency multiplication (e.g., a PLL)
  • In radar systems, the radar frequency, the A/D sample clock, and other timing and frequency signals are derived from a master clock oscillator (an exception would be the clock that drives the DSP, which operates asynchronously from the rest of the system)
• When specifying the clock oscillator for digital applications, consider several parameters:
  • Output voltage level: TTL or ECL (not a zero-mean sinusoid)
  • Frequency (MHz, GHz): nominal operating frequency at nominal temperature and voltage
  • Stability (ppm): long-term frequency drift driven by temperature, aging, and voltage
  • Rise/fall time (ps)
  • Waveform symmetry (%): you may want to use CLK and its complement for split-phase timing
  • Environmental factors: temperature range, shock/vibration
  • Package: DIP vs. SMT; metal vs. plastic or ceramic
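
A stability figure in ppm translates directly into a frequency offset and an accumulated time error. A small sketch (hypothetical function names; the 100-MHz, ±50-ppm values are illustrative, not from a particular data sheet):

```python
def freq_offset_hz(f_nominal_hz: float, ppm: float) -> float:
    """Worst-case frequency error for a given stability spec."""
    return f_nominal_hz * ppm / 1e6


def time_error_s(interval_s: float, ppm: float) -> float:
    """Time error accumulated over an interval at a constant fractional offset."""
    return interval_s * ppm / 1e6


print(freq_offset_hz(100e6, 50))  # 5000.0 Hz
print(time_error_s(86_400, 50))   # about 4.3 s per day
```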

Clock sources
• Stability factors
  • Temperature: quartz crystals are used as resonant elements
    • Piezoelectric effect
    • Resonance frequency is determined by physical dimensions; temperature induces expansion/contraction → frequency changes
    • Several varieties:
      • Non-compensated: large Δf/ΔT
      • Temperature-compensated: smaller Δf/ΔT
      • Oven-controlled: T is held constant
  • Short-term frequency variations: characterized in terms of phase noise or timing jitter
• Phase noise refers to a random, uncorrelated clock-period variation
  • Introduces timing variations that reduce timing margin
  • Frequency multiplication amplifies the phase noise
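
The last point can be quantified: an ideal ×N frequency multiplier scales phase deviations by N, so phase-noise power rises by 20·log10(N) dB, while the jitter expressed in seconds is unchanged. A sketch assuming an ideal, noiseless multiplier (a real PLL adds noise of its own); the function name is hypothetical.

```python
import math


def multiplied_phase_noise_dbc(l_ref_dbc: float, n: float) -> float:
    """Phase noise at the output of an ideal xN multiplier, same offset frequency."""
    return l_ref_dbc + 20 * math.log10(n)


print(multiplied_phase_noise_dbc(-120.0, 10))  # -100.0
print(multiplied_phase_noise_dbc(-120.0, 2))   # about -114 (roughly +6 dB)
```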

Clock sources
• Various methods are available to characterize clock jitter (phase noise)
• Spectral analysis
  • An ideal clock signal has spectral energy at the fundamental and harmonic frequencies only
  • Jitter (phase noise) causes a broadening of the spectral lines
  • Phase noise is specified as the power level below the fundamental at a given frequency offset, e.g., -50 dBc @ 100 kHz

Clock sources
• Converting measured phase noise to jitter: W. Kester, "Converting Oscillator Phase Noise to Time Jitter," MT-008 Tutorial, Rev. A, Oct. 2008, Analog Devices, Inc.
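
In outline, the conversion in the Kester tutorial integrates the single-sideband phase noise L(f) over an offset range, doubles the result to account for both sidebands, takes the square root to get RMS phase in radians, and divides by 2π fo. The sketch below approximates the integral with the trapezoidal rule on linear power (so it needs reasonably dense points where L(f) slopes); the function name and the flat -120 dBc/Hz example are hypothetical.

```python
import math


def rms_jitter_s(f0_hz: float, offsets_hz: list, l_dbc_hz: list) -> float:
    """RMS time jitter from SSB phase noise L(f), given in dBc/Hz at offsets_hz."""
    lin = [10 ** (db / 10) for db in l_dbc_hz]  # dBc/Hz -> linear (1/Hz)
    area = 0.0                                  # integrated SSB phase noise
    for i in range(len(offsets_hz) - 1):
        df = offsets_hz[i + 1] - offsets_hz[i]
        area += 0.5 * (lin[i] + lin[i + 1]) * df
    phase_rms_rad = math.sqrt(2 * area)         # x2: both sidebands
    return phase_rms_rad / (2 * math.pi * f0_hz)


# Flat -120 dBc/Hz from 1 kHz to 1 MHz offset on a 100-MHz clock
j = rms_jitter_s(100e6, [1e3, 1e6], [-120.0, -120.0])
print(f"{j * 1e12:.2f} ps")  # 2.25 ps
```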

Clock sources
• Delay line method of characterizing clock jitter
  • Beat a sample of the clock signal with a delayed version of itself
  • The mixer produces sum and difference terms; the LPF rejects the sum term, leaving v0
  • For a fixed delay τ and a stable fo, v0 varies as the phase φ changes
• To relate φ to time: Δt = φ / (2π fo)
  • Example: for fo = 300 MHz, φ = 2° (35 mrad) → jitter = 18.5 ps; this value will vary with delay line length
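
The conversion used in the example is the phase expressed as a fraction of one clock period, Δt = φ / (2π fo). A quick check of the 300-MHz example (function name hypothetical):

```python
import math


def phase_to_time_s(phase_rad: float, f0_hz: float) -> float:
    """Time shift equivalent to a phase shift at frequency f0_hz."""
    return phase_rad / (2 * math.pi * f0_hz)


phi = math.radians(2.0)  # 2 degrees
print(f"{phi * 1e3:.1f} mrad")                         # 34.9 mrad
print(f"{phase_to_time_s(phi, 300e6) * 1e12:.1f} ps")  # 18.5 ps
```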

Clock sources • Example data sheet

Clock sources
• For testing purposes, it is useful to vary the clock frequency, e.g., to find the maximum operating clock frequency
• In laboratory testing we can use a variable clock generator (if you have one); older versions have a maximum clock output frequency of 250 MHz
• However, we can use a standard laboratory (sinusoidal) oscillator if we set the amplitude to ΔV (logic levels) and apply a DC bias equal to the threshold voltage
• Example, with ECL and GaAs devices:
  • C1: AC-couples the oscillator to the circuit
  • RT: provides impedance matching and level shifting to VBB
  • L: DC-couples VBB to the CLK line while blocking the AC clock signal from VBB
  • C2: AC path for return current
• Component values:
  • RT = Zo (50 Ω)
  • C1, C2: (2πfC)^-1 << RT (< 1 Ω)
  • L: 2πfL >> RT (> 1 kΩ)
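
The component rules can be turned into minimum values. Because a capacitor's reactance rises and an inductor's falls at lower frequencies, the sketch evaluates them at the lowest test frequency; the 1-Ω and 1-kΩ limits are the rule-of-thumb targets from the slide, and the function name is hypothetical.

```python
import math


def bias_network(f_min_hz: float, zo: float = 50.0,
                 xc_max: float = 1.0, xl_min: float = 1000.0) -> dict:
    """Return RT, minimum C1/C2, and minimum L for the AC-coupled clock input."""
    return {
        "RT_ohm": zo,                                      # match the line impedance
        "C_min_F": 1 / (2 * math.pi * f_min_hz * xc_max),  # 1/(2*pi*f*C) <= xc_max
        "L_min_H": xl_min / (2 * math.pi * f_min_hz),      # 2*pi*f*L >= xl_min
    }


vals = bias_network(100e6)
print(f"C >= {vals['C_min_F'] * 1e9:.2f} nF, L >= {vals['L_min_H'] * 1e6:.2f} uH")
```

At 100 MHz this gives C ≥ 1.59 nF and L ≥ 1.59 μH; testing down to lower frequencies scales both values up proportionally.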
