ECE 699 Digital Signal Processing Hardware Implementations Lecture 7

1. ECE 699 Digital Signal Processing Hardware Implementations, Lecture 7: Midterm Review, 3/17/09

2. Outline: Midterm specifics; Midterm review

3. Midterm Specifics

4. Midterm Specifics: The midterm will take place in class on Wednesday, March 25, during the class period from 4:30 pm to 7:10 pm. The midterm will be open-book and open-notes, and you can bring any textbooks. No electronic devices (cell phones, PDAs, laptops, etc.) are allowed, so you must PRINT OUT all notes on paper. Do not come to class with your notes in electronic form on your laptop, as you will not be allowed to use your laptop.

5. Midterm Content: The midterm will have between 4 and 6 questions. Some questions will be calculation-oriented, some will be VHDL/Matlab/design-oriented, and some will be short answer. The midterm will cover all material through today's lecture, including slides, notes from class, textbook, etc. If you missed notes from class, try to get them from one of your classmates. The following slides highlight topics from each lecture. Of course, all material (discussed today or not) is covered by the midterm.

6. Lecture 1 Highlights

7. Lecture 1 Highlights: Lecture 1 gave an introduction to DSP systems in hardware and covered fixed-point representations. It covered unsigned vs. signed representations and maximum representable range, discussed addition and overflow concerns and how to model these in VHDL, and discussed multiplication range and VHDL coding.

8. Unsigned Binary Representation: K integer bits, L fractional bits. The more integer bits, the larger the maximum representable value; the larger the L, the greater the precision. Notation UN.L: U indicates unsigned, N = number of total bits (i.e. K + L), L = number of fractional bits. This is the notation used in fixed-point Matlab. In hardware, there is no such thing as a "binary point" (the binary counterpart of a decimal point); the designer must keep track of the appropriate binary point.

9. Maximum Representable Range: A UN.L unsigned number has decimal range: Minimum: 0. Maximum: $2^K - 2^{-L} = (2^N - 1)/2^L$, which is obtained when $x_i = 1$ for all i. Exact representable range: $0 \le X \le 2^K - 2^{-L}$. Rule of thumb: the range of a number X in UN.L notation is $0 \le X < 2^K$. The number of integer bits K largely determines the maximum representable range.

10. Unsigned Binary Maximum Representable Range Examples:
U7.0 → rule of thumb: $0 \le X < 128$ (K=7); min: 0000000 = $(0)_{10}$; max: 1111111 = $(2^7-1)/2^0 = 127$
U5.3 → rule of thumb: $0 \le X < 4$ (K=2); min: 00.000 = $(0)_{10}$; max: 11.111 = $(2^5-1)/2^3 = 3.875$
U5.4 → rule of thumb: $0 \le X < 2$ (K=1); min: 0.0000 = $(0)_{10}$; max: 1.1111 = $(2^5-1)/2^4 = 1.9375$
U5.7 → rule of thumb: $0 \le X < 0.25$ (K=-2); min: .[00]00000 = $(0)_{10}$; max: .[00]11111 = $(2^5-1)/2^7 = 0.2421875$
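The four examples above can be checked with a short Python sketch. The helper name `unsigned_range` is hypothetical (it is not from the lecture); it simply evaluates the exact maximum (2^N - 1)/2^L and the rule-of-thumb bound 2^K with exact fractions.

```python
from fractions import Fraction

def unsigned_range(n_bits, frac_bits):
    """Exact (min, max, rule-of-thumb bound) of a UN.L number."""
    k = n_bits - frac_bits                      # integer bits K, may be negative
    maximum = Fraction(2**n_bits - 1, 2**frac_bits)
    bound = Fraction(2) ** k                    # rule of thumb: 0 <= X < 2^K
    return Fraction(0), maximum, bound

for n, l in [(7, 0), (5, 3), (5, 4), (5, 7)]:
    lo, hi, bound = unsigned_range(n, l)
    print(f"U{n}.{l}: {float(lo)} <= X <= {float(hi)}, rule of thumb X < {float(bound)}")
```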

11. Two's Complement Notation: K integer bits, L fractional bits. The more integer bits, the larger the maximum representable value; the larger the L, the greater the precision. Notation SN.L: S indicates signed two's complement, N = number of total bits (i.e. K + L), L = number of fractional bits. This is the notation used in fixed-point Matlab. In hardware, there is no such thing as a "binary point" (the binary counterpart of a decimal point); the designer must keep track of the appropriate binary point.

12. Maximum Representable Range: An SN.L signed number has decimal range: Minimum: $-2^{K-1} = -2^{N-L-1}$, which is obtained when $x_i = 1$ for $i = K-1$ and $x_i = 0$ otherwise. The minimum value is the most negative value. Maximum: $2^{K-1} - 2^{-L} = (2^{N-1} - 1)/2^L$, which is obtained when $x_i = 0$ for $i = K-1$ and $x_i = 1$ otherwise. The maximum value is the largest positive value. Exact representable range: $-2^{K-1} \le X \le 2^{K-1} - 2^{-L}$. The range is asymmetric. Rule of thumb: the range of a number X in SN.L notation is $-2^{K-1} \le X < 2^{K-1}$, where K = N - L is the number of integer bits. The number of integer bits K largely determines the maximum representable range.
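As a quick check of the signed range formulas, here is a minimal Python sketch. The function name `signed_range` is an assumption made for illustration; it returns the exact minimum and maximum of an SN.L two's complement number.

```python
from fractions import Fraction

def signed_range(n_bits, frac_bits):
    """Exact (min, max) of an SN.L two's complement number."""
    scale = 2**frac_bits
    minimum = Fraction(-(2**(n_bits - 1)), scale)     # -2^(N-1) / 2^L
    maximum = Fraction(2**(n_bits - 1) - 1, scale)    # (2^(N-1) - 1) / 2^L
    return minimum, maximum

lo, hi = signed_range(4, 3)
print(float(lo), float(hi))          # S4.3 range
print(-lo - hi == Fraction(1, 8))    # asymmetry equals one LSB, 2^-L
```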

13. Unsigned Addition vs. Signed Addition: Hardware for an unsigned adder is the same as for a signed adder (except overflow detection)!

14. Out of Range/Overflow Detection

15. Multiplication Output Range: Multiplication: A x B = C.
Unsigned multiplication: UN.L x UN'.L' = U(N+N').(L+L') number, e.g. U4.3 x U5.4 = U9.7 number. Example: Binary: 1.101 x 1.1001 = 10.1000101; Decimal: 1.625 x 1.5625 = 2.5390625.
Signed multiplication (two's complement): SN.L x SN'.L' = S(N+N').(L+L') number, e.g. S4.3 x S4.3 = S8.6 number. Examples: Binary: 1.000 x 1.000 = 01.000000; Decimal: -1 x -1 = 1. Binary: 0.111 x 0.111 = 00.110001; Decimal: 0.875 x 0.875 = 0.765625. Binary: 1.000 x 0.111 = 11.001000; Decimal: -1 x 0.875 = -0.875.
NOTE: All K+K' integer bits are only needed when A and B are both at their most negative allowable values. If A and B can be restricted such that they do not both reach their most negative allowable values, then only K+K'-1 integer bits are needed, i.e. the output result is S(N+N'-1).(L+L'). Save one MSB: this is a useful trick often used in DSP systems to reduce wordlengths!
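The note about saving one MSB can be verified exhaustively for S4.3 x S4.3. The following Python sketch works on raw two's complement integers; the helper `fits_signed` and the variable names are hypothetical, chosen only for this illustration.

```python
def fits_signed(value, n_bits):
    """True if the integer value is representable in n_bits two's complement."""
    return -(1 << (n_bits - 1)) <= value <= (1 << (n_bits - 1)) - 1

N, L = 4, 3
operands = range(-(1 << (N - 1)), 1 << (N - 1))   # raw S4.3 integers, -8..7

# Operand pairs whose product does NOT fit in N+N'-1 = 7 bits
needs_full_width = [(a, b) for a in operands for b in operands
                    if not fits_signed(a * b, 2 * N - 1)]
# Every product fits in the full N+N' = 8 bits
all_fit_full = all(fits_signed(a * b, 2 * N) for a in operands for b in operands)

print(needs_full_width)             # only the most-negative x most-negative case
print((-8 * -8) / 2 ** (2 * L))     # raw product scaled by 2^-(L+L')
```

Only the pair of most negative operands needs the extra MSB, which is why restricting the inputs allows an S(N+N'-1).(L+L') result.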

16. Multiplication Example in VHDL

17. Lecture 2 Highlights

18. Lecture 2 Highlights: Lecture 2 continued the study of fixed-point representations, particularly two's complement subtraction, the modulo property of two's complement (very important), and wordlength determination for adding multiple signals. The lecture continued with a discussion of quantization of three types: truncation, round-to-nearest, and symmetric rounding. Finally, it discussed the effects of quantization on SQNR. Notes in class were given to derive the error of each quantizer and its effect on the SQNR.

19. Two's Complement Wraparound Property: Temporary wraparounds are fine as long as the final value is in the correct dynamic range. Example S4.0: add (-8 + -6) + 7 = -7. Step 1: 1000 + 1010 = 0010; should be $(-14)_{10}$, not $(+2)_{10}$ → wraparound/overflow. Step 2: 0010 + 0111 = 1001; final result is correct: $(-7)_{10}$. If the final result is guaranteed to be in the correct dynamic range [-8,+7], then intermediate wraparounds are fine.

20. Modulo Addition: This works because of the modulo property. Consider addition with modulus M: Y = (A + B) mod M = (A mod M + B mod M) mod M; Y = (A + B + C) mod M = ((A + B) mod M + C mod M) mod M = ((A mod M + B mod M) mod M + C mod M) mod M, etc. Two's complement has an $M = 2^K$ modulus(-like) property. Specifically, if the correct result of a sum is out of range: on overflow, subtract modulus M from the correct result to obtain the represented two's complement number; on underflow, add modulus M to the correct result to obtain the represented two's complement number. See slide 98 of Lecture 1. In a series of additions/subtractions (multiplication is simply a series of additions/subtractions), temporary overflows can be ignored as long as the final result is within the output dynamic range. Say Y is an SN'.L' number (K' = N' - L') and is the result of an addition chain. As long as it is guaranteed that $-2^{K'-1} \le Y \le 2^{K'-1} - 2^{-L'}$, temporary overflows do not affect the final output. Rule of thumb: K' determines the maximum representable range, since $-2^{K'-1} \le Y < 2^{K'-1}$. Intermediate truncations/rounding in the additions and multiplications only add quantization noise to the result. More on this later.
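The S4.0 example of slide 19 can be replayed in a few lines of Python. This is a sketch on raw integers (so the modulus is 2^N with N = 4 total bits); the names `wrap` and `wrapped_sum` are hypothetical helpers, not part of the lecture material.

```python
def wrap(value, n_bits):
    """Map an integer onto the n_bits two's complement range (modulo 2^n_bits)."""
    modulus = 1 << n_bits
    half = modulus >> 1
    return (value + half) % modulus - half

def wrapped_sum(terms, n_bits):
    """Accumulate terms, wrapping after every addition like a fixed-width adder."""
    acc = 0
    for term in terms:
        acc = wrap(acc + term, n_bits)
    return acc

print(wrap(-8 + -6, 4))              # intermediate result wraps around
print(wrapped_sum([-8, -6, 7], 4))   # final result is back in range and correct
```

Because the true sum -7 lies inside [-8, +7], the wrapped accumulator ends at the correct value even though the first addition overflowed.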

21. FIR Filter Example: Preventing Overflow

22. FIR Filter Example: Bounded Output: Assume y(n) is guaranteed to be bounded within $-1 \le y(n) < 1$. The final y(n) is an S12.11 number. We will deal with quantization of fractional bits later. Save several bits on each adder! Large savings are seen for longer filters.

23. Computing Wordlength of Y: Unsigned Addition: The minimum value Y can attain is 0. Step 1: The maximum value Y can attain is $Y_{max} = X_{1(max)} + X_{2(max)} + \ldots + X_{M(max)}$. Step 2: Maximum representable value of a UN'.L' number: to preserve precision in addition, set $L' = \max(L_i)$; the maximum representable value of a UN'.L' number is $(2^{N'} - 1)/2^{L'}$. Step 3: Set the maximum representable value of the UN'.L' number equal to $Y_{max}$ and solve for N': $2^{N'} = Y_{max} \cdot 2^{L'} + 1$, rounding N' up to the next integer.

24. Example: X1 = U7.4, X2 = U8.4, X3 = U9.3, X4 = U6.2. Solution: L' = 4; $Y_{max}$ = 7.9375 + 15.9375 + 63.875 + 15.75 = 103.5. Solving the equation, $2^{N'} = 1657$ → N' = 11 (rounding $\log_2 1657 \approx 10.69$ up). Y is a U11.4 number.
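The three steps of the unsigned wordlength calculation can be sketched in Python as follows. The function name `unsigned_sum_format` is an assumption for illustration; inputs are (N, L) pairs, and exact fractions avoid rounding issues.

```python
from fractions import Fraction

def unsigned_sum_format(formats):
    """Return (N', L', Ymax) for the sum of unsigned UN.L inputs given as (N, L)."""
    frac = max(l for _, l in formats)                           # L' = max(Li)
    y_max = sum(Fraction(2**n - 1, 2**l) for n, l in formats)   # Step 1
    raw_max = int(y_max * 2**frac)          # Ymax expressed in units of 2^-L'
    total = raw_max.bit_length()            # smallest N' with 2^N' - 1 >= raw_max
    return total, frac, y_max

n_out, l_out, y_max = unsigned_sum_format([(7, 4), (8, 4), (9, 3), (6, 2)])
print(f"Ymax = {float(y_max)}, Y is U{n_out}.{l_out}")
```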

25. Computing Wordlength of Y: Signed Addition: Step 1a: The maximum (most positive) value Y can attain is $Y_{max} = X_{1(max)} + X_{2(max)} + \ldots + X_{M(max)}$. Step 1b: The minimum (most negative) value Y can attain is $Y_{min} = X_{1(min)} + X_{2(min)} + \ldots + X_{M(min)}$. Step 2: Max/min representable value of an SN'.L' number: to preserve precision in addition, set $L' = \max(L_i)$. Maximum representable value of an SN'.L' number: $(2^{N'-1} - 1)/2^{L'}$. Minimum representable value of an SN'.L' number: $-2^{N'-1}/2^{L'}$. Step 3: Set the maximum representable value equal to $Y_{max}$ and the minimum representable value equal to $Y_{min}$. Solve for the smallest N' that satisfies both $(2^{N'-1} - 1)/2^{L'} \ge Y_{max}$ and $-2^{N'-1}/2^{L'} \le Y_{min}$.
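A signed counterpart of the procedure can be sketched the same way. The function name `signed_sum_format` and the example formats are assumptions made for illustration (they do not come from the slides); the loop simply searches for the smallest N' that covers both Ymax and Ymin.

```python
from fractions import Fraction

def signed_sum_format(formats):
    """Return (N', L') for the sum of signed SN.L inputs given as (N, L) pairs."""
    frac = max(l for _, l in formats)                                  # L' = max(Li)
    y_max = sum(Fraction(2**(n - 1) - 1, 2**l) for n, l in formats)    # Step 1a
    y_min = sum(Fraction(-(2**(n - 1)), 2**l) for n, l in formats)     # Step 1b
    raw_max = int(y_max * 2**frac)     # both bounds in units of 2^-L'
    raw_min = int(y_min * 2**frac)
    total = 1
    # Step 3: smallest N' whose range [-2^(N'-1), 2^(N'-1) - 1] holds both bounds
    while not (-(2**(total - 1)) <= raw_min and raw_max <= 2**(total - 1) - 1):
        total += 1
    return total, frac

print(signed_sum_format([(4, 3), (4, 3)]))   # two S4.3 inputs
print(signed_sum_format([(4, 3), (5, 2)]))   # S4.3 plus S5.2
```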

26. TYPE 1: Truncation: Easy, just ignore (i.e. truncate) the fractional digits from L'+1 to L. Example with L'=0: $x_{k-1} x_{k-2} \ldots x_1 x_0 . x_{-1} x_{-2} \ldots x_{-L}$ → $y_{k-1} y_{k-2} \ldots y_1 y_0 .$ Truncation in two's complement results in a number trunc(x) that is never larger than x. This is also called round towards $-\infty$ or downward-directed rounding: 011.10 $(3.5)_{10}$ → 011 $(3)_{10}$, Error = -0.5; 100.10 $(-3.5)_{10}$ → 100 $(-4)_{10}$, Error = -0.5.

27. Statistics of Truncation: Error distribution (assuming random input): Range: $-(2^{-L'} - 2^{-L}) \le E_t \le 0$. Mean: $-(2^{-L'} - 2^{-L})/2$. Assuming L >> L': Range: approx. $-2^{-L'} \le E_t \le 0$. Define $\Delta = 2^{-L'}$. $\mu$ = Mean = $E[E_t] = -\Delta/2$; $E[E_t^2] = \Delta^2/3$; $Var[E_t] = E[(E_t - \mu)^2] = \Delta^2/12$. Noise power is determined by the variance, but the non-zero mean creates a DC bias.
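The truncation statistics can be checked by brute force over every input code. The Python sketch below is illustrative only: the function name `truncation_error_stats` and the choice L = 8, L' = 2 over the range [-1, 1) are assumptions. Python's floor division rounds toward minus infinity, which matches two's complement truncation.

```python
from fractions import Fraction

def truncation_error_stats(l_in, l_out):
    """Exact mean and variance of the truncation error over all codes in [-1, 1)."""
    scale_in = 2**l_in
    step = 2**(l_in - l_out)                 # input codes per output LSB
    errors = []
    for raw in range(-scale_in, scale_in):
        truncated = (raw // step) * step     # floor: round toward -infinity
        errors.append(Fraction(truncated - raw, scale_in))
    mean = sum(errors) / len(errors)
    var = sum((e - mean) ** 2 for e in errors) / len(errors)
    return mean, var

mean, var = truncation_error_stats(8, 2)
delta = Fraction(1, 2**2)
print(float(mean), float(-(delta - Fraction(1, 2**8)) / 2))   # measured vs formula
print(float(var), float(delta**2 / 12))                       # measured vs Delta^2/12
```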

28. TYPE 2: Round to nearest: Round to nearest (rtn) is what we normally think of when we say "round". rtn in two's complement: 010.01 $(2.25)_{10}$ → 010 $(2)_{10}$, Error = -0.25; 101.11 $(-2.25)_{10}$ → 110 $(-2)_{10}$, Error = +0.25.

29. Implementing round to nearest (rtn) in hardware: Two methods. Method 1: Add '1' in the position one digit to the right of the new LSB (i.e. digit L'+1) and keep only L' fractional bits. Example with L'=0: $x_{k-1} x_{k-2} \ldots x_1 x_0 . x_{-1} x_{-2} \ldots x_{-L}$ plus a '1' at digit -1 → $y_{k-1} y_{k-2} \ldots y_1 y_0 .$ Method 2: Add the value of the digit one position to the right of the new LSB (i.e. digit L'+1) into the new LSB digit (i.e. digit L') and keep only L' fractional bits. Example with L'=0: $x_{k-1} x_{k-2} \ldots x_1 x_0 .$ plus $x_{-1}$ at digit 0 → $y_{k-1} y_{k-2} \ldots y_1 y_0 .$
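Both hardware methods can be modeled on raw integers and compared. In the Python sketch below, `shift` stands for L - L' (the number of fractional bits dropped); the function names are hypothetical. Python's `>>` on negative integers is an arithmetic shift, so it behaves like two's complement hardware.

```python
def rtn_method1(raw, shift):
    """Add a '1' one position right of the new LSB, then drop `shift` bits."""
    return (raw + (1 << (shift - 1))) >> shift

def rtn_method2(raw, shift):
    """Truncate, then add the first discarded bit into the new LSB."""
    return (raw >> shift) + ((raw >> (shift - 1)) & 1)

# Slide 28 examples with L = 2, L' = 0: 010.01 is raw 9, 101.11 is raw -9
print(rtn_method1(9, 2), rtn_method2(9, 2))
print(rtn_method1(-9, 2), rtn_method2(-9, 2))

# The two methods agree for every input and every shift tried here
agree = all(rtn_method1(r, s) == rtn_method2(r, s)
            for s in (1, 2, 3) for r in range(-64, 64))
print(agree)
```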

30. Bias in two's complement round to nearest: Assuming all combinations of positive and negative values of x are equally possible, the average error is +0.125 in this example. This is a smaller average error than truncation, but the error is still not symmetric. We have a problem with the midway value, i.e. a value exactly at 2.5 or -2.5 always leads to a positive error bias. There is also the problem that overflow can occur if only K' = K integer bits are allocated. Example: rtn(011.10) → 100: overflow! This overflow only occurs on positive numbers near the maximum positive value, not on negative numbers. As long as the final output value (of the final addition chain) is in the representable range, this temporary overflow is fine. If this is the final stage, or a stage where the correct value is necessary, then overflow detection or saturation logic is needed.

31. Statistics of Round to Nearest: Error distribution (assuming random input): Range: $-(\tfrac{1}{2} 2^{-L'} - 2^{-L}) \le E_t \le \tfrac{1}{2} 2^{-L'}$. Mean: $2^{-L}/2$. CAVEAT: the mean depends heavily on how often the input lies exactly HALFWAY BETWEEN two representable values. This depends on the relative distance between L and L'. The experimental mean may be smaller (in absolute value) due to non-occurrence of values exactly halfway between two representable values. Assuming L >> L': Range: approx. $-\tfrac{1}{2} 2^{-L'} \le E_t \le \tfrac{1}{2} 2^{-L'}$. Define $\Delta = 2^{-L'}$. $\mu$ = Mean = $E[E_t] = 0$ (this is an approximation when L is large); $E[E_t^2] = \Delta^2/12$; $Var[E_t] = E[(E_t - \mu)^2] = \Delta^2/12$. Noise power is determined by the variance, but the zero mean indicates no DC bias.

32. TYPE 3: Symmetric/Balanced Rounding: Round to nearest has almost no bias (i.e. it is zero mean) when L is large, and relatively no bias (compared to precision) when L >> L'. When L' and L are close, round to nearest becomes more and more relatively "biased". The problem occurs when the number to be rounded with L fractional bits is EXACTLY HALFWAY between two numbers which can be represented by L' fractional bits. Example: S4.1 to become S3.0: 001.1 = $(1.5)_{10}$ → lies exactly between 001 and 010. Using round-to-nearest forces this value to take the larger value (toward $+\infty$); see previous slides. When L and L' are close, this halfway point occurs more frequently. Worst-case scenario: L' = L - 1:
000.0 → 000. error = 0
000.1 → 001. error = +0.5
001.0 → 001. error = 0
001.1 → 010. error = +0.5
Mean error = +0.25 → large DC bias. Also remember that truncation does the opposite when the number to be rounded with L fractional bits is EXACTLY HALFWAY between two numbers which can be represented by L' fractional bits: truncation forces this value to take the smaller value (toward $-\infty$). When L' is near L, use symmetric or balanced rounding schemes. Takeaway: When L >> L', round-to-nearest typically does not show a significant bias. When L and L' are close, you see a bias because the exact halfway value occurs regularly.
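The worst-case bias (L' = L - 1) can be reproduced numerically. The Python sketch below compares plain round-to-nearest with one possible symmetric scheme, round half away from zero. That choice of symmetric scheme and all function names are assumptions for illustration; the slides do not specify which balanced scheme is meant. Inputs are raw S4.1 integers in units of 0.5.

```python
from fractions import Fraction

def round_half_up(raw):
    """Round-to-nearest for L=1 -> L'=0: add half an LSB, then truncate."""
    return (raw + 1) >> 1

def round_half_away_from_zero(raw):
    """Symmetric variant (assumed): ties move away from zero for both signs."""
    magnitude = (abs(raw) + 1) >> 1
    return magnitude if raw >= 0 else -magnitude

def mean_error(rounder, raws):
    """Mean of (rounded value - true value), with true value = raw / 2."""
    errors = [Fraction(2 * rounder(r) - r, 2) for r in raws]
    return sum(errors) / len(errors)

all_codes = range(-8, 8)     # every S4.1 code, -4.0 .. +3.5
print(mean_error(round_half_up, all_codes))               # DC bias of rtn
print(mean_error(round_half_away_from_zero, all_codes))   # bias of symmetric scheme
```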

33. Xilinx DSP48 Symmetric Rounding

34. SQNR Analysis: The signal-to-quantization noise ratio (SQNR) determines the effect of quantization noise on the system. The SQNR should be above the required SNR of the communication system; you do not want fixed-point effects determining the performance of your system. SQNR = $20 \log_{10}(\mathrm{std}[y_{float}(n)] / \mathrm{std}[e(n)])$ dB, where std = standard deviation, = $10 \log_{10}(\mathrm{var}[y_{float}(n)] / \mathrm{var}[e(n)])$ dB, where var = variance. Mean error = mean[e(n)]. Can also use $y_{fixed}(n)$ as the reference level.
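A minimal Python sketch of the SQNR measurement is given below. The test signal (uniform on [-1, 1), 20000 samples, fixed seed), the choice of 11 fractional bits, and all names are assumptions for illustration and are not the experiment from the lecture. For this uniform input the signal variance is 1/3, so the theoretical SQNR with noise variance Delta^2/12 is 10 log10((1/3) / (Delta^2/12)).

```python
import math
import random

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def quantize_rtn(x, frac_bits):
    """Round to nearest with frac_bits fractional bits (same form as the Matlab model)."""
    return math.floor(x * 2**frac_bits + 0.5) / 2**frac_bits

def sqnr_db(reference, quantized):
    errors = [q - r for r, q in zip(reference, quantized)]
    return 10 * math.log10(variance(reference) / variance(errors))

rng = random.Random(0)                      # fixed seed for repeatability
frac_bits = 11
y_float = [rng.uniform(-1.0, 1.0) for _ in range(20000)]
y_fixed = [quantize_rtn(y, frac_bits) for y in y_float]

measured = sqnr_db(y_float, y_fixed)
delta = 2.0 ** -frac_bits
theoretical = 10 * math.log10((1.0 / 3.0) / (delta ** 2 / 12.0))
print(f"measured {measured:.2f} dB, theoretical {theoretical:.2f} dB")
```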

35. Output (Single Quantizer):
Round-to-nearest: Measured mean: -2.30e-008; Theoretical mean: 0 (based on equations from earlier slides, assuming L >> L' and L large). Measured SQNR: 90.89 dB; Theoretical SQNR: 91.18 dB (based on equations from earlier slides).
Truncation: Measured mean: -1.52e-005; Theoretical mean: -1.53e-005 (based on equations from earlier slides, assuming L >> L'). Measured SQNR: 91.46 dB; Theoretical SQNR: 91.18 dB (based on equations from earlier slides).
This shows that the earlier calculations can be used to model an actual DSP system.


37. Lecture 3 Highlights: Lecture 3 began with a discussion of modeling fixed-point systems in Matlab, including quantization and wraparound. Lecture 3 also discussed FIR filters in detail, covering 7 FIR structures: 1) Direct Form FIR Filters, 2) Linear-Phase FIR Filters, 3) Transpose / Data Broadcast FIR Filters, 4) Pipelined FIR Filters, 5) Parallel FIR Filters, 6) Fast Parallel FIR Filters (Duhamel), 7) Serial/Multi-Cycle FIR Filters. Along with these structures, other notions were introduced in the slides or class notes: importance/derivation of linear phase, SFGs and transposition, cutset pipelining, and using parallel and pipeline structures in ASICs to reduce power.

38. Quantization: Fixed Point to Fixed Point: Quantize a fixed point value Sinf.L1 to a fixed point value Sinf.L2, where inf = infinite number of integer bits (hence infinite total bits). Obviously not infinite, but used to denote the fact that we do not take integer bits into account. Matlab:
% A_fxp1 = fixed point signal with L1 frac bits
% L2 = number of fractional bits of quantized result
A_fxp2 = floor(A_fxp1*2^L2)/2^L2;        % truncation
A_fxp2 = floor(A_fxp1*2^L2 + 0.5)/2^L2;  % round to nearest
This looks the same as for floating point conversion. There is no dependence on L1 (as long as L1 >= L2).

39. Wraparound Example:
% multiplication example
A = -1; B = 0.875;
N = 4; L = 3; K = N-L;
C = A * B;                          % compute multiplication to produce C = Sinf.6 number
C_quant = floor(C*2^L + 0.5)/2^L;   % round to C_quant = Sinf.3 number
% check wraparound
C_quant_wrap = C_quant;
while C_quant_wrap < -(2^(K-1))
    C_quant_wrap = C_quant_wrap + 2^K;
end
while C_quant_wrap > (2^(K-1) - 2^-L)
    C_quant_wrap = C_quant_wrap - 2^K;
end

40. FIR Filter Difference Equation: An FIR filter is defined by the difference equation $y(n) = \sum_{i=0}^{M-1} h(i)\, x(n-i)$. FIR = finite impulse response. An M-tap filter has M "taps" or coefficients. Often h(i) is written as $h_i$. There are different ways of implementing an FIR filter in hardware.

41. 1) Direct Form FIR Filters: M-tap FIR filter in direct form. Critical path: $T_A$ = delay through adder, $T_M$ = delay through multiplier; critical path delay: $T_M + (M-1) T_A$. Area: M-1 registers, M multipliers, M-1 adders. Latency: latency is the number of cycles between x(0) and y(0), x(1) and y(1), etc.; here 0 cycles latency. Arithmetic complexity of an M-tap filter is modeled as: M multiplications/sample + M-1 adds/sample.
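A behavioral Python sketch of the direct form structure is shown below. It is a software model rather than hardware: the M-1 entry delay line plays the role of the M-1 registers, and each output uses M multiplications and M-1 additions. The function name `fir_direct_form` is an assumption for illustration.

```python
def fir_direct_form(h, x):
    """Direct form FIR: y(n) = sum over i of h(i) * x(n - i), zero initial state."""
    delay_line = [0] * (len(h) - 1)          # M-1 registers holding past inputs
    y = []
    for sample in x:
        taps = [sample] + delay_line         # x(n), x(n-1), ..., x(n-M+1)
        y.append(sum(c * t for c, t in zip(h, taps)))   # M multiplies, M-1 adds
        delay_line = taps[:-1]               # shift the delay line by one sample
    return y

print(fir_direct_form([1, 2, 3], [1, 0, 0, 0]))   # impulse response equals h
print(fir_direct_form([1, 2, 3], [1, 1, 0, 0]))
```

The output y(n) depends on x(n) in the same cycle, which corresponds to the 0 cycles of latency noted above.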

42. 2) Linear Phase FIR Filters: A linear phase filter occurs when h(n) = +/- h(M-1-n); M can be odd or even. Linear phase filters are used when constant group delay is needed. Linear phase structures can be designed to save area. Example: M even. Critical path: $T_A$ = delay through adder, $T_M$ = delay through multiplier; critical path delay: $T_M + (M/2) T_A$. Area: M-1 registers, M/2 multipliers, M-1 adders.

43. 3) Direct Form Transpose Filters: An FIR filter can be decomposed into a signal flow graph (SFG) consisting of nodes and edges. SFG transposition rule: "Reversing the direction of an SFG and interchanging the input and output ports preserves the functionality of the system." Applying transposition to the direct form filter results in the direct form transpose filter, also called the data broadcast structure.

44. 4) Pipelined FIR Filters: Example: coarse-grain pipelining for a direct form filter. Pipelining is generally only valid for feed-forward cutsets of an SFG. Feedback structures will be covered later.

45. 5) Parallel FIR Filters: Parallel processing maintains overall sample throughput while reducing the clock rate. Useful when input/output bottlenecks exist.

46. 6) Fast Parallel FIR Filters: Direct form and transpose form structures (running at the same rate) with M taps require M multiplications/sample and M-1 adds/sample. Methods exist to reduce this complexity by parallel processing and subexpression sharing. In the 2-parallel structure above, two inputs arrive at half the original clock rate and are processed in parallel by three ceil(M/2)-tap filters [ceil() is the ceiling function]. Arithmetic complexity of the 2-parallel filter is approximately: 3 x (M/2) multiplications / two samples + 3 x (M/2 - 1) adds / two samples + 4 adds / two samples = (3/4)M multiplications/sample + (3M/4 + 1/2) adds/sample. If power is dominated by multipliers, this gives 25% power savings over traditional structures!
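The 2-parallel fast FIR idea can be sketched in Python as follows. This is an illustrative software model under stated assumptions: even filter and input lengths, zero initial state, and hypothetical names (`convolve`, `fast_two_parallel_fir`). The three half-length subfilters are H0, H1 and H0+H1, and the cross terms are recovered by subtraction, which is the subexpression sharing.

```python
def convolve(h, x):
    """Causal FIR output of the same length as x (zero initial state)."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]

def fast_two_parallel_fir(h, x):
    """2-parallel fast FIR using three half-length subfilters (even lengths assumed)."""
    h0, h1 = h[0::2], h[1::2]                 # even / odd indexed coefficients
    x0, x1 = x[0::2], x[1::2]                 # even / odd indexed input samples
    a = convolve(h0, x0)                      # H0 * X0
    b = convolve(h1, x1)                      # H1 * X1
    c = convolve([p + q for p, q in zip(h0, h1)],
                 [p + q for p, q in zip(x0, x1)])   # (H0 + H1) * (X0 + X1)
    y = []
    for k in range(len(x0)):
        y_even = a[k] + (b[k - 1] if k > 0 else 0)  # H0 X0 + delayed H1 X1
        y_odd = c[k] - a[k] - b[k]                  # equals H0 X1 + H1 X0
        y += [y_even, y_odd]
    return y

h = [1, 2, 3, 4]
x = [1, -1, 2, 0, 5, 3]
print(fast_two_parallel_fir(h, x) == convolve(h, x))
```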

47. 7) Serial / Multi-Cycle: Trade off area for speed. Parallel filter: M multipliers, output ready in one cycle. Serial filter: 1 multiplier, output ready in M cycles.


49. Lecture 4 Highlights: Lecture 4 began with a discussion of FIR implementation issues, either in slides or in class notes. Some issues include: FIR scaling (3 methods), with an example done in class; CSD filters, carry-save arithmetic, and the combination of CSD with carry-save for efficient fixed-coefficient FIR filters in ASICs. Lecture 4 also discussed DSP-specific hardware for Xilinx FPGAs, particularly the DSP48 block: block diagram, functionality of arithmetic components, muxes, and registers; configurability of DSP48 blocks to implement different math structures; FIR filter structures using DSP48 blocks: MACC/serial FIR filter, transpose FIR filter, systolic FIR filter, multirate FIR filters.

50. FIR Scaling: Scale input x(n) to prevent overflow on y(n). Three methods: Method 1: Prevent Overflow (L1 Scaling); Method 2: Narrowband signal scaling; Method 3: Energy scaling (L2 Scaling). To find the scale factor: decide on the number of integer bits for Y, say K' integer bits; find Ymax based on one of the three methods; scale the input such that Ymax fits in K' integer bits. In terms of SQNR: Method 1 < Method 2 < Method 3, i.e. Method 1 introduces the most noise, then Method 2, then Method 3.

51. Scaling Example: How can we guarantee y(n) requires only one integer bit? Must prescale x(n) such that its maximum amplitude after scaling (Ax) fits one of the scaling techniques.

52. Ripple-Carry Carry Propagate Adder (CPA): Critical path is n full adders.

53. Carry Save Adder (CSA): Critical path is 1 full adder, regardless of the number of bits n! Also called a 3:2 compressor.

54. CSD Conversion Example: Number is $-0.55078125 = -2^{-1} - 2^{-4} + 2^{-6} - 2^{-8}$.
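One common way to obtain canonic signed digit (CSD) digits is the non-adjacent form recoding sketched below in Python. The function name `to_csd` and the scaling of the coefficient by 2^8 to an integer (-141) are assumptions for illustration; the conversion method used in class may differ.

```python
def to_csd(value):
    """Return CSD digits (-1, 0, +1) of an integer, least significant digit first."""
    digits = []
    while value != 0:
        if value % 2:                     # odd: choose +1 or -1 so the next bit is 0
            digit = 2 - (value % 4)       # +1 if value = 1 mod 4, -1 if value = 3 mod 4
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value //= 2
    return digits

raw = int(-0.55078125 * 2**8)             # -141, coefficient with 8 fractional bits
digits = to_csd(raw)
terms = [f"{'+' if d > 0 else '-'}2^{i - 8}" for i, d in enumerate(digits) if d]
print(" ".join(reversed(terms)))
```

The printed terms match the slide's expansion, and no two adjacent digits are both non-zero, which is the defining property of CSD.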

55. DSP48 Slice: Virtex 4

56. Direct Form Transpose FIR Parallel Filter: The transpose filter is well suited to the DSP48. It uses dedicated routing channels between slices.

57. Systolic FIR Parallel Filter: Xilinx notation: "systolic filter". This is a direct form FIR filter plus some additional cutset pipeline stages.


59. Lecture 5 Highlights: Lecture 5 studied IIR filters and showed various structures for IIR filters. IIR filters are feedback structures and cannot be pipelined naively. Lecture 5 also discussed the iteration bound, which sets the minimum clock period a recursive DFG can operate at. The critical path delay cannot be smaller than the iteration bound for a recursive DFG. Two ways of computing the iteration bound: 1) Longest Path Matrix algorithm, 2) Minimum Cycle Mean algorithm. You should be able to calculate the iteration bound using both methods. Lecture 5 also discussed methods to pipeline higher-order IIR filters. Two methods were introduced: clustered lookahead and scattered lookahead. Lecture 5 concluded with class notes on 3 IIR implementation issues: datapath quantization, coefficient quantization, and limit cycles.

60. IIR Filter Difference Equation: An IIR filter is defined by a difference equation. IIR = infinite impulse response. N feedback coefficients (N = order of the system) determine the poles in the system. Poles of H(z) outside the unit circle cause instability. M+1 feedforward coefficients. In the z domain:

61. Loop Bound: In this lecture we focus on recursive DFGs. Loop bound: a recursive DFG has one or more loops. The loop bound for the L-th loop is defined as $t_L / w_L$, where $t_L$ is the loop computation time and $w_L$ is the number of delays in the loop. Iteration bound $T_\infty$: the iteration bound is the maximum loop bound of all loops in the DFG. The loop that gives the iteration bound is called the critical loop. The iteration bound determines the minimum critical path of a recursive system represented by that DFG structure! In other words, no matter how you pipeline or retime the DFG, you cannot get a circuit with a lower critical path than the iteration bound! Conversely, if the current critical path of your DFG is greater than the iteration bound then (assuming pipelining and/or retiming can break any computational block cleanly) you can transform your DFG such that the critical path exactly equals the iteration bound.

62. Example of Iteration Bound: Loops: Loop 1: ADBA, loop bound = 4/2; Loop 2: AECBA, loop bound = 5/3; Loop 3: AFCBA, loop bound = 5/4. Critical loop: Loop 1. Iteration bound: max{4/2, 5/3, 5/4} = 4/2 = 2, so $T_\infty = 2$ units of time. That is the minimum clock period (maximum frequency) this circuit can operate at after pipelining and retiming.
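When the loops have already been enumerated, the iteration bound is just the largest loop bound. The Python sketch below takes the (computation time, delay count) pairs from this slide; the function name `iteration_bound` and the dictionary layout are assumptions for illustration. It does not enumerate loops from a graph, which is what the LPM and MCM algorithms address.

```python
from fractions import Fraction

def iteration_bound(loops):
    """loops maps a loop name to (computation time t_L, number of delays w_L)."""
    bounds = {name: Fraction(t, w) for name, (t, w) in loops.items()}
    critical = max(bounds, key=bounds.get)      # loop with the largest t_L / w_L
    return bounds[critical], critical

loops = {"Loop 1": (4, 2), "Loop 2": (5, 3), "Loop 3": (5, 4)}
bound, critical_loop = iteration_bound(loops)
print(f"Iteration bound = {bound} time units, critical loop = {critical_loop}")
```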

63. Computation of Iteration Bound: The iteration bound can be computed by hand, as described earlier. However, in systems with a large number of loops, this can be prohibitive. Two methods compute the iteration bound using matrix manipulation: 1) Longest Path Matrix algorithm, 2) Minimum Cycle Mean algorithm. The following slides come from Dr. Parhi at: http://www.ece.umn.edu/users/parhi/slides.html

64. Pipelining in higher order IIR filters

65. 2) Pipelining in higher order IIR filters: Clustered Lookahead

66. 3) Pipelining in higher order IIR filter: Scattered Lookahead

67. 3) Pipelining in higher order IIR filter: Scattered Lookahead

68. Datapath quantization: Analysis performed in class. Takeaway: The location of the IIR filter poles affects the impact of datapath (i.e. multiplier) quantization on the system SQNR. Poles closer to the unit circle may negatively affect SQNR depending on the scaling method.

69. Coefficient quantization: Analysis performed in class. Takeaway: To reduce sensitivity to coefficient quantization in IIR filters, use a cascade of second order sections (not the direct form structure or its variants)! FIR coefficient quantization does not have as detrimental an effect (only the magnitude response is affected).

70. Limit Cycles: Analysis performed in class. Takeaway: There may exist dead-bands in IIR filters in which the filter output never goes to zero, even after the input becomes zero, but instead oscillates around a certain small value forever → limit cycle.


72. Lecture 6 Highlights: Lecture 6 began with a study of the Discrete Time Fourier Transform (DTFT) and continued to a sampled version of the DTFT, called the Discrete Fourier Transform (DFT). The FFT was introduced as a computationally efficient mechanism to implement the DFT: Radix-2 FFT (DIF and DIT) and Radix-4 FFT (DIF and DIT). Finally, various implementation issues were discussed, including FFT architectures (serial, parallel, pipeline, etc.) and bit-level issues.

73. DFT Properties: The N-point DFT maps N inputs in the time domain, x(0) through x(N-1), to N outputs in the frequency domain, X(0) to X(N-1). Inherent block processing. If x(n) is real then X(N-k) = X*(k) for k=1 to N/2-1, where * denotes complex conjugate; |X(k)| is symmetric about k=N/2. Even if x(n) is real, X(k) is usually complex (except for X(0) and X(N/2)). X(0) denotes the DC frequency, X(N/2) denotes the Fs/2 frequency.

74. Discrete Fourier Transform Computations: The DFT equation: $X(k) = \sum_{n=0}^{N-1} x(n) W_N^{nk}$, $k = 0, 1, \ldots, N-1$, where $W_N = e^{-j 2\pi / N}$. The $W_N$'s are also called "twiddle factors", which are complex values around the unit circle in the complex plane. Multiplication by a twiddle factor serves to "rotate" a value around the unit circle in the complex plane. Computations: to compute X(0), N complex multiplications are required; to compute X(1), N complex multiplications are required; ...; to compute X(N-1), N complex multiplications are required. Total: $N^2$ complex multiplications and $N^2 - N$ complex additions to compute the N-point DFT.
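A direct evaluation of the DFT makes the N-squared cost visible: the inner sum has N products and is repeated for each of the N outputs. The Python sketch below assumes the standard definition with twiddle factor W_N = exp(-j 2 pi / N); the function name `dft` and the test vector are chosen only for illustration.

```python
import cmath

def dft(x):
    """Direct N-point DFT: N complex multiplications for each of the N outputs."""
    n = len(x)
    w = cmath.exp(-2j * cmath.pi / n)            # twiddle factor W_N
    return [sum(x[m] * w ** (m * k) for m in range(n)) for k in range(n)]

x = [1, 2, 3, 4, 0, -1, -2, 5]                   # real input, N = 8
X = dft(x)
print(abs(X[0] - sum(x)) < 1e-9)                 # X(0) is the DC term
# Conjugate symmetry for real input: X(N-k) = X*(k)
print(all(abs(X[8 - k] - X[k].conjugate()) < 1e-9 for k in range(1, 4)))
```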

75. Fast Fourier Transform: Shared twiddle factor properties (i.e. sub-expression sharing) can be exploited to reduce the number of multiplications in the DFT. This class of algorithms is called Fast Fourier Transforms. An FFT is simply an efficient implementation of the DFT; mathematically FFT = DFT. The FFT exploits two properties of the twiddle factors: Symmetry property: $W_N^{k+N/2} = -W_N^k$. Periodicity property: $W_N^{k+N} = W_N^k$. FFTs use a divide and conquer approach, breaking an N-point DFT into several smaller DFTs. N can be factored as $N = r_1 r_2 r_3 \cdots r_v$, where the $\{r_i\}$ are prime. Particular focus is on $r_1 = r_2 = \ldots = r_v = r$, where r is called the radix of the FFT algorithm. In this case $N = r^v$ and the FFT has a regular pattern. We will study radix-2 (r=2) and radix-4 (r=4) FFTs in this class.

76. Decimation-in-time Radix-2 FFT: Split x(n) into even and odd samples and perform smaller FFTs: $f_1(n) = x(2n)$, $f_2(n) = x(2n+1)$, $n = 0, 1, \ldots, N/2-1$. Derivation performed in class. Radix-2 Decimation-in-time (DIT) algorithm: in radix-2, the "butterfly" element takes in 2 inputs and produces 2 outputs; the butterfly implements a 2-point FFT. Computations: $(N/2)\log_2 N$ complex multiplications, $N \log_2 N$ complex additions.
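The even/odd split can be sketched as a recursive Python function. This is a software illustration, not the in-class derivation: the names `fft_dit` and `dft` are assumptions, the input length must be a power of 2, and the counter includes trivial twiddle multiplications (such as by 1) so that it matches the (N/2) log2 N figure.

```python
import cmath

def fft_dit(x):
    """Radix-2 DIT FFT. Returns (X, number of twiddle multiplications)."""
    n = len(x)
    if n == 1:
        return list(x), 0
    even, mults_even = fft_dit(x[0::2])          # f1(n) = x(2n)
    odd, mults_odd = fft_dit(x[1::2])            # f2(n) = x(2n+1)
    out = [0j] * n
    for k in range(n // 2):                      # one butterfly per k
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t            # symmetry: W^(k+N/2) = -W^k
    return out, mults_even + mults_odd + n // 2

def dft(x):
    """Direct DFT used only as a reference for checking the FFT."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

x = [1, 2, 3, 4, 0, -1, -2, 5]
X_fft, mults = fft_dit(x)
print(mults)                                     # (N/2) log2 N for N = 8
print(max(abs(a - b) for a, b in zip(X_fft, dft(x))) < 1e-9)
```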

77. Decimation-in-time Radix-2 FFT (N=8)

78. Decimation-in-frequency Radix-2 FFT: Decompose X(k) such that it is split into the FFT of points 0 to N/2-1 and points N/2 to N-1. Then decimate X(k) into even and odd numbered samples. Derivation performed in class. Radix-2 Decimation-in-frequency (DIF) algorithm: in radix-2, the "butterfly" element takes in 2 inputs and produces 2 outputs; the butterfly implements a 2-point FFT. Computations: $(N/2)\log_2 N$ complex multiplications, $N \log_2 N$ complex additions.

79. Decimation-in-frequency Radix-2 FFT (N=8)

80. Radix-4 FFT: In radix-2 you have $\log_2 N$ stages. Can also implement radix-4 and now have $\log_4 N$ stages. Radix-4 Decimation-in-time: split x(n) into four time sequences instead of two. Derivation performed in class. Split x(n) into four decimated sample streams: $f_1(n) = x(4n)$, $f_2(n) = x(4n+1)$, $f_3(n) = x(4n+2)$, $f_4(n) = x(4n+3)$, $n = 0, 1, \ldots, N/4-1$. Radix-4 Decimation-in-time (DIT) algorithm: in radix-4, the "butterfly" element takes in 4 inputs and produces 4 outputs; the butterfly implements a 4-point FFT. Computations: $(3N/4)\log_4 N = (3N/8)\log_2 N$ complex multiplications → decrease from radix-2 algorithms; $(3N/2)\log_2 N$ complex additions → increase from radix-2 algorithms. Downside: can only deal with FFT sizes that are a power of 4, such as N = 4, 16, 64, 256, 1024, etc.
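The operation counts quoted on the radix-2 and radix-4 slides can be tabulated with a small Python sketch. The function name `fft_operation_counts` and the returned dictionary layout are assumptions for illustration; the formulas are taken directly from the slides.

```python
def fft_operation_counts(n):
    """Complex multiply/add counts from the slide formulas; n must be a power of 4."""
    log2n = n.bit_length() - 1
    if n < 4 or 2**log2n != n or log2n % 2:
        raise ValueError("n must be a power of 4")
    return {
        "radix2_mults": (n // 2) * log2n,        # (N/2) log2 N
        "radix2_adds": n * log2n,                # N log2 N
        "radix4_mults": (3 * n * log2n) // 8,    # (3N/8) log2 N
        "radix4_adds": (3 * n * log2n) // 2,     # (3N/2) log2 N
    }

for size in (16, 256, 1024):
    print(size, fft_operation_counts(size))
```

For every valid size the radix-4 multiplication count is 75% of the radix-2 count, while the addition count is 50% higher.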

81. Parallel Implementation: Implement the entire FFT structure in a parallel fashion. Advantages: control is easy (i.e. no controller), low latency (i.e. 0 cycles in this example), each twiddle factor can be customized as a multiplication by a constant. Disadvantages: huge area, routing congestion.

82. Serial/In-Place FFT Implementation: Implement a single butterfly. Use that butterfly and some memory to compute the entire FFT. Advantages: small area. Disadvantages: large latency, complex controller.

83. Pipeline FFT: The pipeline FFT is very common for communication systems (OFDM, DMT). It implements an entire "slice" of the FFT and reuses hardware to perform other slices. Advantages: particularly good for systems in which x(n) comes in serially (i.e. no block assembly required), very fast, more area efficient than parallel, can be pipelined. Disadvantages: controller can become complicated, large intermediate memories may be required between stages, latency of N cycles (more if pipelining is introduced).
