High-Speed Digital Architectures. Chris Allen (callen@eecs.ku.edu). Course website URL: people.eecs.ku.edu/~callen/713/EECS713.htm
Overview
• Topics include
  • Pipelining
  • Latency
  • Demultiplexing
  • Multiplexing
  • Clock fanout and distribution
  • Clock skew and fine timing adjustments
  • Clock signal sources
Logic devices and high-speed designs
• Pipelining & latency
• Consider the multi-bit adder, A + B
  • We want to add two 18-bit unsigned binary numbers: A17 A16 … A1 A0 + B17 B16 … B1 B0
  • A0 and B0 are the least significant bits (LSBs)
  • A17 and B17 are the most significant bits (MSBs)
• How fast can we add two 18-bit numbers?
  • A 6-bit ECL adder will be the building block for this design
  • Inputs: A5:A0, B5:B0, and a carry input
  • Outputs: F5:F0 and a carry output
  • Ranges: A: 0 to 63, B: 0 to 63, carry input: 0 or 1, F: 0 to 63, carry output: 0 or 1
  • Maximum case (carry input + A + B = 1 + 63 + 63 = 127): F = 63, carry output = 1
Pipelining & latency
• 18-bit adder built from three cascaded 6-bit adders
• Consider the propagation delay (typical)
  • An, Bn, carry input to Fn: 3 ns
  • An, Bn, carry input to carry output: 2.5 ns
• How to find T: identify the critical path (longest delay)

  Bits    A, B to F   F valid   Carry output valid
  5:0     3 ns        3 ns      2.5 ns
  11:6    3 ns        5.5 ns    5 ns
  17:12   3 ns        8 ns      7.5 ns

• Note: the adder is combinational logic, not sequential; there is no clock signal
• For this configuration, the output is stable after 8 ns
• 125 MHz is the maximum rate for this 18-bit adder
• A greater number of bits (e.g., a 36-bit adder) would further increase the delay, reducing the add rate
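The critical-path bookkeeping on this slide can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function names are hypothetical, and it assumes that all A and B inputs and the first carry input are valid at t = 0 and that every 6-bit stage has the typical delays quoted above.

```python
# Typical 6-bit adder delays quoted on the slide, in nanoseconds.
T_SUM_NS = 3.0     # any input (A, B, or carry input) to sum outputs F
T_CARRY_NS = 2.5   # any input (A, B, or carry input) to carry output


def ripple_adder_timing(num_stages, t_sum=T_SUM_NS, t_carry=T_CARRY_NS):
    """Return (sum_valid, carry_valid) times in ns for cascaded adder stages."""
    sum_valid, carry_valid = [], []
    carry_in_arrival = 0.0  # assumption: first carry input is valid at t = 0
    for _ in range(num_stages):
        # Each stage can only settle after its carry input has arrived.
        sum_valid.append(carry_in_arrival + t_sum)
        carry_valid.append(carry_in_arrival + t_carry)
        carry_in_arrival = carry_valid[-1]
    return sum_valid, carry_valid


def max_rate_mhz(delay_ns):
    """Maximum operation rate in MHz for a given settling time in ns."""
    return 1000.0 / delay_ns


sums, carries = ripple_adder_timing(3)          # 18-bit adder: three 6-bit stages
critical_path_ns = max(sums[-1], carries[-1])   # last output to settle
print(sums, carries, critical_path_ns, max_rate_mhz(critical_path_ns))
```

Calling `ripple_adder_timing(6)` models the 36-bit case mentioned above under the same assumptions.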
Pipelining & latency
• Multiplexed 18-bit adder
  • Twice as much hardware to produce results twice as fast
  • Propagation delay for each channel is still 8 ns
  • 2:1 multiplexing cuts the time per result in half: 4 ns (250 MHz)
  • 4:1 multiplexing cuts the time per result by four: 2 ns (500 MHz)
  • But the latency is still 8 ns
• A clock signal has been added to synchronize the registers
• In this configuration, the adding rate is 250 MHz and the latency is 8 ns
Pipelining & latency
• Pipelined 18-bit adder
  • While a single 6-bit add takes only 3 ns (333 MHz), the propagation of the carry bit slows the 18-bit addition to 8 ns (125 MHz)
  • A pipelined architecture allows the adder to operate with a 3-ns add time plus 0.5 ns for setup and propagation through the register, giving a 3.5-ns cycle time (286 MHz)
  • This scheme is expandable to N-bit adds at the same rate
• In this configuration, the adding rate is 286 MHz and the latency is 6 clock cycles, or 21 ns
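The rate and latency figures for the multiplexed and pipelined adders can be reproduced with a short sketch. The function names below are hypothetical, and the latency in clock cycles is taken directly from the slide rather than derived from the circuit.

```python
def rate_mhz(period_ns):
    """Convert a result period in ns to a rate in MHz."""
    return 1000.0 / period_ns


def multiplexed_adder(prop_delay_ns, ways):
    """Interleave `ways` identical adders: rate scales up, latency does not."""
    result_period_ns = prop_delay_ns / ways
    return rate_mhz(result_period_ns), prop_delay_ns


def pipelined_adder(stage_add_ns, register_overhead_ns, latency_cycles):
    """Clock period is one stage delay plus register setup/propagation."""
    cycle_ns = stage_add_ns + register_overhead_ns
    return rate_mhz(cycle_ns), latency_cycles * cycle_ns


for ways in (2, 4):
    rate, latency = multiplexed_adder(8.0, ways)
    print(f"{ways}:1 multiplexed: {rate:.0f} MHz, latency {latency} ns")

rate, latency = pipelined_adder(3.0, 0.5, latency_cycles=6)
print(f"pipelined: {rate:.0f} MHz, latency {latency} ns")
```

Both approaches raise the add rate without shortening the time any single result takes to emerge; the pipelined version trades longer latency for the higher rate.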
Pipelining & latency
• The price to be paid for achieving this speed is
  • Circuit complexity
  • Data latency
• Some applications can tolerate large latency
  • Examples include one-way data transfers such as TV broadcast signals
• Other applications cannot tolerate much latency
  • Examples include two-way data exchanges such as voice communications (calls via satellite have a latency of ~0.5 s)
• Techniques to further speed up the add process
  • If a 6-bit add takes 3 ns, a 2-bit add should take ~1 ns and a 1-bit add should take ~0.8 ns
  • Theoretically, adds could be done as fast as 1.5 ns (667 MHz) with 18 add stages and 36 clock cycles of latency
  • Note also that this approach requires a large number of clock signals (not shown)
High-speed digital design examples
• Consider a data acquisition (DAQ) system
  • Analog signals are digitized and recorded
  • Example applications include oscilloscopes and radar receivers
  • The maximum bandwidth of the acquired signal is limited by the ADC clock frequency
  • The precision of the digitized signal is limited by the number of bits in the ADC
  • The length of the data record is limited by the memory size (2^N)
• [Figure legend] WE: write enable
High-speed digital design examples
• Consider an arbitrary waveform generator (AWG) system
  • Analog waveforms are produced from stored digital records
  • Example application: radar waveform generation
  • The maximum bandwidth of the output signal is limited by the DAC clock frequency
  • The precision of the output signal is limited by the number of bits in the DAC
  • The length of the waveform is limited by the memory size (2^N)
• In both cases, the maximum data vector size is X × 2^N, i.e., an X-bit-wide word and a vector length of 2^N words
High-speed digital design examples
• Data acquisition system
  • Consider the case where X = 8 bits and N = 16 (a 64k-word vector); at a 1-GHz clock rate, the maximum record length is 65.5 µs
• DAQ high-level timing
  • Within the 1-ns clock period
    • The acquired data must stabilize
    • The memory must be addressed
    • The Write Enable line must toggle
  • All in compliance with the memory’s timing requirements
    • Setup and hold times for Data and Address relative to Write Enable
• Key to the DAQ operation are the address generator and the memory
  • This design requires a 16-bit synchronous counter with preset inputs (not a ripple counter)
  • The Addr_CLK must be the system clock (1 GHz)
  • The memory write cycle time must be < 1 ns
• It is difficult to achieve the required timing with available technology
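The record-length figure follows directly from the memory depth and the sample clock. A minimal sketch of that arithmetic, with hypothetical function names:

```python
def record_length_s(address_bits, clock_hz):
    """Time needed to fill a 2**address_bits-word memory at one word per clock."""
    return (2 ** address_bits) / clock_hz


def memory_size_bits(word_bits, address_bits):
    """Total storage for an X-bit-wide, 2**N-word data vector."""
    return word_bits * 2 ** address_bits


length = record_length_s(address_bits=16, clock_hz=1e9)
print(f"record length: {length * 1e6:.1f} microseconds")
print(f"memory size: {memory_size_bits(8, 16)} bits")
```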
High-speed digital design examples • The timing requirements can be relaxed with demultiplexing • The SPT7760 ADC has an integrated 1:2 demux that reduces the effective output data rate (per channel) to 500-MSa/s
High-speed digital design examples
• The ADC’s 1:2 demux doubles the time available for each memory write to 2 ns
• Consider a design using the following devices:
  • 8-bit, 1 GSa/s ADC with 1:2 demux
  • 700 MHz, 8-bit sync counter
  • 1k x 4 RAM with 5-ns write cycle time
• Since the RAM’s 5-ns write cycle time > the ADC’s 2-ns demux’d update time, further demultiplexing is required
• A 4:1 demux will stretch the update period to 8 ns, a rate the RAM can accommodate
• One cost of this approach is the added complexity, both in terms of added hardware and in terms of signal formatting for output
• [Figure: 4:1 DEMUX]
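The choice of demultiplexing depth on this slide can be sketched as a small search. The function name is hypothetical, and restricting the factor to powers of two is an assumption made here for illustration; the slides only give the 1:2 and 4:1 cases.

```python
def min_demux_factor(update_period_ns, write_cycle_ns):
    """Smallest power-of-two demux factor that gives the memory enough time.

    Assumption: demux factors are powers of two (2, 4, 8, ...).
    """
    factor = 1
    while update_period_ns * factor < write_cycle_ns:
        factor *= 2
    return factor


adc_update_ns = 2.0   # ADC output period after its integrated 1:2 demux
ram_write_ns = 5.0    # RAM write cycle time

factor = min_demux_factor(adc_update_ns, ram_write_ns)
print(f"additional demux: {factor}:1, "
      f"per-channel update period {adc_update_ns * factor} ns")
```

A factor of 2 would leave only 4 ns per write, which is shorter than the 5-ns write cycle, so the next step up is needed.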
High-speed digital design examples • High-level timing for system with 2:1 ADC demux and 4:1 demux on PCB
High-speed digital design examples
• Just as demultiplexing relaxed the DAQ timing requirements, multiplexing eases the arbitrary waveform generator’s timing challenges
• A 4:1 mux will reduce the data rate from each memory device by a factor of 4
• [Figure: 4:1 MUX]
High-speed digital design examples
• Integrating a multiplexer in the digital-to-analog converter allows the converter to operate at higher rates
• [Figure: Integrated 1:2 Mux]
High-speed digital design examples
• General design rules for these high-speed applications
  • Keep uniform line lengths within a data bus to ensure constant signal latency
  • Keep analog signal lines away from digital lines; digital lines contain significant broadband ‘noise’ that can degrade the analog signal through crosstalk
  • Clock signal distribution design is critical to achieve maximum operating speed
  • Jitter in the clock signal (due to the clock generator circuit) will result in phase noise in the data
Clock signal issues
• Clock signals provide a time reference for the entire system
• Issues to consider regarding clock signals
  • Clock fanout and distribution
  • Clock skew and fine timing adjustments
  • Clock division: fCLK/2, fCLK/4, …
  • Clock signal generation
• Clock fanout
  • Consider the case where multiple registers must be clocked simultaneously
  • However, the fanout limit of the technology is ~5 (3 to 10)
• Clock fanout buffers
  • Intended to provide multiple copies of the clock signal with equal latencies
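One way to see the effect of the fanout limit is to count how many levels of buffering a balanced distribution tree would need. The tree arrangement and the function name are assumptions made for illustration; the slides only state the fanout limit and the purpose of fanout buffers.

```python
def buffer_levels(num_loads, fanout):
    """Levels of buffers needed so one clock source can drive num_loads inputs.

    Assumption: a balanced tree in which every output drives at most
    `fanout` inputs, so every load sees the same number of buffer delays.
    """
    levels = 0
    reachable = 1  # the source alone can drive one input directly
    while reachable < num_loads:
        reachable *= fanout
        levels += 1
    return levels


for loads in (5, 16, 100):
    print(f"{loads} registers, fanout 5: {buffer_levels(loads, 5)} level(s)")
```

Keeping the tree balanced is what preserves equal latencies to every register.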
Clock skew
• Clock skew describes when timing signals arrive at different components at different times
• Possible causes include
  • Clock buffer skew
  • Mismatched trace propagation delay
  • Capacitive loading or coupling
• Clock buffer skew
  • Gate-to-gate skew: 20 ps (typ), 50 ps (max)
Clock skew
• Even with low-skew clock buffers, some clock skew will remain
• Timing variations can compound as devices are cascaded, leading to increased uncertainty
• Impact?
  • System timing variations lead to reduced timing margin
• How to compensate for clock skew?
  • For critical timing applications, we can employ delay adjustments
  • Delay line (passive): delay depends on length
  • Gate delay (active): delay depends on gate characteristics
• Example
  • Consider two clock (or data) lines we wish to synchronize using delay line variations
  • By changing jumper connections we can make tB < tA, tB = tA, or tB > tA
Clock skew • Similar schemes for varying signal delay.
Clock skew
• Problem
  • Jumpers may cause impedance mismatches, which produce reflections
  • Using surface mount strips to close the gaps helps reduce mismatches
• Problem
  • Occupies significant board area
  • Hard to implement at chip level or in an MCM
• Implement jumper selection electronically
  • Consider implementing the variable delay with simple gates (OR, XOR, AND, …; 300 to 1500 ps) and a multiplexer
  • The delay is controlled electronically by bits S0 and S1

  S1  S0  F  Delay
  0   0   A  0
  0   1   B  Tp
  1   0   C  2Tp
  1   1   D  3Tp
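The select-bit table can be modelled as a lookup of how many gate delays sit in front of the selected multiplexer input. This is a behavioural sketch only: the function name is hypothetical, the 300-ps gate delay is just the low end of the range quoted above, and the multiplexer's own delay is ignored.

```python
TAP_NAMES = ("A", "B", "C", "D")  # mux inputs, in order of increasing delay


def selected_tap(s1, s0, gate_delay_ps):
    """Return (tap name, added delay in ps) for select bits S1 and S0."""
    gates_in_path = (s1 << 1) | s0   # 0, 1, 2, or 3 gate delays (Tp each)
    return TAP_NAMES[gates_in_path], gates_in_path * gate_delay_ps


for s1 in (0, 1):
    for s0 in (0, 1):
        name, delay = selected_tap(s1, s0, gate_delay_ps=300)
        print(f"S1={s1} S0={s0}: F = {name}, delay = {delay} ps")
```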
Clock skew • Single-chip programmable delay lines available
Clock division
• Subharmonics of the clock signal (fCLK/2, fCLK/4, …) can be produced using simple flip-flops configured as clock frequency dividers
  • The output signals have a 50% duty cycle regardless of the input signal’s duty cycle
• Shift registers can be used to divide the signal frequency by other integer factors (known as ring counters or Johnson counters)
  • Various duty cycles can be produced from these configurations
• [Figures: Ring counter; Johnson counter]
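The behaviour of these dividers can be checked with a small cycle-by-cycle simulation. This is a sketch under simplifying assumptions: ideal flip-flops, one output sample per rising edge of the input clock, and initial register states chosen here for illustration (all function names are hypothetical).

```python
def toggle_divider(num_clocks):
    """Flip-flop that toggles on every input clock edge: divide by 2."""
    q, out = 0, []
    for _ in range(num_clocks):
        q ^= 1
        out.append(q)
    return out


def ring_counter(n, num_clocks):
    """n-stage shift register with a single circulating 1: divide by n."""
    state = [1] + [0] * (n - 1)
    out = []
    for _ in range(num_clocks):
        out.append(state[0])
        state = [state[-1]] + state[:-1]      # last stage feeds the first
    return out


def johnson_counter(n, num_clocks):
    """n-stage shift register with inverted feedback: divide by 2n."""
    state = [0] * n
    out = []
    for _ in range(num_clocks):
        out.append(state[0])
        state = [1 - state[-1]] + state[:-1]  # inverted last stage feeds the first
    return out


print(toggle_divider(4))       # repeats every 2 input clocks
print(ring_counter(3, 6))      # repeats every 3 input clocks, high 1/3 of the time
print(johnson_counter(2, 8))   # repeats every 4 input clocks, high 1/2 of the time
```

In this model the ring counter output is high for one clock in every n, while the Johnson counter output is high for n clocks in every 2n, which illustrates the different duty cycles mentioned above.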
Clock sources
• Clock signals are used to provide a timing reference
• Typically only one clock oscillator is used per system
  • In computers, higher frequency signals may be derived from a single oscillator through frequency multiplication (e.g., PLL)
  • In radar systems, the radar frequency, the A/D sample clock, and other timing and frequency signals are derived from a master clock oscillator (an exception would be the clock that drives the DSP, which operates asynchronously from the rest of the system)
• When specifying the clock oscillator for digital applications, consider several parameters
  • Output voltage level (TTL or ECL, not sinusoidal with zero mean)
  • Frequency (MHz, GHz): nominal operating frequency at nominal temperature and voltage
  • Stability (ppm): long-term frequency drift driven by temperature, aging, and voltage
  • Rise/fall time (ps)
  • Waveform symmetry (%): may want to use CLK and its complement for split-phase timing
  • Environmental factors: temperature range, shock/vibration
  • Package: DIP vs. SMT; metal vs. plastic or ceramic
Clock sources
• Stability factors
  • Temperature: quartz crystals are used as resonant elements
    • Piezoelectric effect
    • Resonance frequency is determined by physical dimensions
    • Temperature induces expansion/contraction, which changes the frequency
  • Several varieties
    • Non-compensated: large Δf/ΔT
    • Temperature compensated: smaller Δf/ΔT
    • Oven-controlled: T is constant
• Short-term frequency variations
  • Characterized in terms of phase noise or timing jitter
  • Phase noise refers to a random, uncorrelated clock-period variation
  • Introduces timing variations that reduce timing margin
  • Frequency multiplication amplifies the phase noise
Clock sources
• Various methods are available to characterize clock jitter (phase noise)
• Spectral analysis
  • An ideal clock signal has spectral energy at the fundamental and harmonic frequencies only
  • Jitter (phase noise) causes a broadening of the spectral lines
  • Specified as the power level below the fundamental at a frequency offset Δf
  • For example, -50 dBc @ 100 kHz
Clock sources
• Converting measured phase noise to jitter
• Reference: W. Kester, “Converting Oscillator Phase Noise to Time Jitter,” MT-008 Tutorial, Rev. A, Oct. 2008, Analog Devices, Inc.
Clock sources
• Delay line method of characterizing clock jitter
  • Beat a sample of the clock signal with a delayed version of itself
  • The mixer produces sum-frequency (2fo) and difference-frequency terms; the LPF rejects the sum-frequency term, leaving v0
  • For a fixed delay value, τ, v0 is steady when the clock is stable and varies as the clock phase, φ, changes
  • To relate the phase variation Δφ to time: Δt = Δφ / (2π fo)
  • Example: for fo = 300 MHz and Δφ = 2° (35 mrad), jitter = 18.5 ps; this value will vary with delay line length
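The example on this slide is consistent with converting a phase deviation to a time deviation by dividing by 2π times the clock frequency. A minimal sketch of that conversion, with a hypothetical function name:

```python
import math


def phase_to_jitter_s(delta_phi_rad, f0_hz):
    """Time deviation corresponding to a phase deviation at frequency f0."""
    return delta_phi_rad / (2.0 * math.pi * f0_hz)


delta_phi = math.radians(2.0)              # 2 degrees, about 35 mrad
jitter = phase_to_jitter_s(delta_phi, 300e6)
print(f"{delta_phi * 1e3:.0f} mrad at 300 MHz -> {jitter * 1e12:.1f} ps")
```

Equivalently, 2° is 1/180 of a cycle, and 1/180 of the 3.33-ns clock period is about 18.5 ps.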
Clock sources • Example data sheet
Clock sources
• For testing purposes, it is useful to vary the clock frequency
  • Finding the maximum operating clock frequency
• In laboratory testing we can use a variable clock generator (if you have one)
  • Older versions have a maximum clock output frequency of 250 MHz
• However, we can use a standard laboratory oscillator (sinusoidal) if we set the amplitude to match the logic levels and apply a DC bias equal to the threshold voltage
• Example, with ECL and GaAs devices
  • C1: AC-couples the oscillator to the circuit
  • RT: provides impedance matching and level shifting to VBB
  • L: DC-couples VBB to CLK while blocking AC
  • C2: AC path for return current
• Component values
  • RT = Zo (50 Ω)
  • C1, C2: (2πfC)^-1 << RT (< 1 Ω)
  • L: 2πfL >> RT (> 1 kΩ)
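The reactance targets for the coupling network translate into minimum component values once a lowest test frequency is chosen. The sketch below uses hypothetical function names, and the 100-MHz lowest test frequency is an assumption picked only for illustration; it is not a value from the slides.

```python
import math


def min_capacitance_f(freq_hz, max_reactance_ohm=1.0):
    """Smallest C whose reactance 1/(2*pi*f*C) stays below the target."""
    return 1.0 / (2.0 * math.pi * freq_hz * max_reactance_ohm)


def min_inductance_h(freq_hz, min_reactance_ohm=1000.0):
    """Smallest L whose reactance 2*pi*f*L stays above the target."""
    return min_reactance_ohm / (2.0 * math.pi * freq_hz)


f_low = 100e6  # assumed lowest clock frequency to be tested
print(f"C1, C2 >= {min_capacitance_f(f_low) * 1e9:.2f} nF")
print(f"L      >= {min_inductance_h(f_low) * 1e6:.2f} uH")
```

Both limits are set by the lowest frequency of interest, since capacitive reactance rises and inductive reactance falls as the frequency decreases.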