1. ECE 699: Digital Signal Processing Hardware Implementations, Lecture 7
Midterm Review
3/17/09
2. Outline Midterm specifics
Midterm review
3. Midterm Specifics
4. Midterm Specifics The midterm will take place in class on Wednesday, March 25, during the class period from 4:30 pm – 7:10 pm.
The midterm will be open-book, open-notes.
You can bring any textbooks
No electronic devices (cell phones, PDAs, laptops, etc.) are allowed; you must PRINT OUT all notes on paper.
Do not come to class with your notes in electronic form on your laptop, as you will not be allowed to use your laptop.
5. Midterm Content The midterm will have between 4 and 6 questions
Some questions will be calculation-oriented
Some questions will be VHDL/Matlab/design-oriented
Some questions will be short answer
The midterm will cover all material through today’s lecture including slides, notes from class, textbook, etc.
If you missed notes from class, try to get them from one of your classmates.
In the following slides we will highlight topics from each lecture
Of course, all material (discussed today or not) will be covered on the midterm
6. Lecture 1 Highlights
7. Lecture 1 Highlights Lecture 1 gave an introduction to DSP systems in hardware and covered fixed-point representations
Covered unsigned vs. signed representations and maximum representable range
Discussed addition and overflow concerns and how to model these in VHDL
Discussed multiplication range and VHDL coding
8. Unsigned Binary Representation K integer bits, L fractional bits
The more integer bits (larger K), the larger the maximum representable value
The larger L is, the greater the precision
Notation UN.L
U indicates unsigned
N = number of total bits (i.e. K + L)
L = number of fractional bits
This is the notation used in fixed-point Matlab
In hardware, there is no such thing as a "binary point" (the binary analogue of the decimal point)
The designer must keep track of the position of the binary point
9. Maximum Representable Range A UN.L unsigned number has decimal range:
Minimum: 0
Maximum: $2^K - 2^{-L} = (2^N - 1)/2^L$, which is obtained when $x_i = 1$ for all $i$
Exact representable range: $0 \le X \le 2^K - 2^{-L}$
Rule of thumb: range of number X in UN.L notation
$0 \le X < 2^K$
The number of integer bits K largely determines the maximum representable range
10. Unsigned Binary Maximum Representable Range Examples U7.0 → rule of thumb: $0 \le X < 128$ (K=7)
min: 0000000 = $(0)_{10}$
max: 1111111 = $(2^7 - 1)/2^0 = (127)_{10}$
U5.3 → rule of thumb: $0 \le X < 4$ (K=2)
min: 00.000 = $(0)_{10}$
max: 11.111 = $(2^5 - 1)/2^3 = (3.875)_{10}$
U5.4 → rule of thumb: $0 \le X < 2$ (K=1)
min: 0.0000 = $(0)_{10}$
max: 1.1111 = $(2^5 - 1)/2^4 = (1.9375)_{10}$
U5.7 → rule of thumb: $0 \le X < 0.25$ (K=-2)
min: .[00]00000 = $(0)_{10}$
max: .[00]11111 = $(2^5 - 1)/2^7 = (0.2421875)_{10}$
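The unsigned examples above can be reproduced with a few lines of Python. This is only a sketch: the helper name `unsigned_range` is made up for illustration, and exact fractions are used so that no floating-point rounding hides the result.

```python
from fractions import Fraction

def unsigned_range(n_bits: int, frac_bits: int):
    """Exact (min, max) value of a UN.L number with N = n_bits and L = frac_bits."""
    return Fraction(0), Fraction((1 << n_bits) - 1, 1 << frac_bits)

for n_bits, frac_bits in [(7, 0), (5, 3), (5, 4), (5, 7)]:
    lo, hi = unsigned_range(n_bits, frac_bits)
    print(f"U{n_bits}.{frac_bits}: {float(lo)} .. {float(hi)}")
```

Note that L may exceed N (as in U5.7); the formula still holds because K = N - L is simply negative.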
11. Two's Complement Notation K integer bits, L fractional bits
The more integer bits (larger K), the larger the maximum representable value
The larger L is, the greater the precision
Notation SN.L
S indicates signed two's complement
N = number of total bits (i.e. K + L)
L = number of fractional bits
This is the notation used in fixed-point Matlab
In hardware, there is no such thing as a "binary point" (the binary analogue of the decimal point)
The designer must keep track of the position of the binary point
12. Maximum Representable Range An SN.L signed number has decimal range:
Minimum: $-2^{K-1} = -2^{N-L-1}$, which is obtained when $x_i = 1$ for $i = K-1$ and $x_i = 0$ otherwise
Minimum value is the most negative value
Maximum: $2^{K-1} - 2^{-L} = (2^{N-1} - 1)/2^L$, which is obtained when $x_i = 0$ for $i = K-1$ and $x_i = 1$ otherwise
Maximum value is the largest positive value
Exact representable range: $-2^{K-1} \le X \le 2^{K-1} - 2^{-L}$
Range is asymmetric
Rule of thumb: range of number X in SN.L notation
$-2^{K-1} \le X < 2^{K-1}$
K = N - L, the number of integer bits
The number of integer bits K largely determines the maximum representable range
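As a quick check of the signed range formulas, here is a small Python sketch. The helper name `signed_range` is hypothetical; the formats in the loop are the ones that appear elsewhere in these slides.

```python
from fractions import Fraction

def signed_range(n_bits: int, frac_bits: int):
    """Exact (min, max) value of a two's complement SN.L number."""
    scale = 1 << frac_bits
    most_negative = Fraction(-(1 << (n_bits - 1)), scale)
    most_positive = Fraction((1 << (n_bits - 1)) - 1, scale)
    return most_negative, most_positive

for n_bits, frac_bits in [(4, 0), (4, 3), (12, 11)]:
    lo, hi = signed_range(n_bits, frac_bits)
    print(f"S{n_bits}.{frac_bits}: {float(lo)} .. {float(hi)}")
```

The asymmetry is visible directly: the magnitude of the minimum exceeds the maximum by exactly one LSB.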
13. Unsigned Addition vs. Signed Addition Hardware for unsigned adder is the same as for signed adder (except overflow detection)!
14. Out of Range/Overflow Detection
15. Multiplication Output Range Multiplication: A x B = C
Unsigned multiplication
UN.L x UN'.L' = U(N+N').(L+L') number
U4.3 x U5.4 = U9.7 number
Example:
Binary: 1.101 x 1.1001 = 10.1000101
Decimal: 1.625 x 1.5625 = 2.5390625
Signed multiplication (two's complement)
SN.L x SN'.L' = S(N+N').(L+L') number
S4.3 x S4.3 = S8.6 number
Example:
Binary: 1.000 x 1.000 = 01.000000
Decimal: -1 x -1 = 1
Binary: 0.111 x 0.111 = 00.110001
Decimal: 0.875 x 0.875 = 0.765625
Binary: 1.000 x 0.111 = 11.001000
Decimal: -1 x 0.875 = -0.875
NOTE: All K+K' integer bits are needed only when A and B both take their most negative allowable values
If A and B can be restricted such that they do not reach their most negative allowable values, then only K+K'-1 integer bits are needed, i.e. the output result is an S(N+N'-1).(L+L') number
Saving one MSB is a useful trick often used in DSP systems to reduce wordlengths!
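The note about saving one MSB can be checked by brute force for S4.3 x S4.3. The sketch below works on raw integers (real value = raw / 2^3); the function name `signed_bits_needed` is an illustrative choice, not something from the lecture.

```python
def signed_bits_needed(value: int) -> int:
    """Smallest two's complement width N with -2**(N-1) <= value < 2**(N-1)."""
    n = 1
    while not (-(1 << (n - 1)) <= value < (1 << (n - 1))):
        n += 1
    return n

N = 4
raw = range(-(1 << (N - 1)), 1 << (N - 1))        # raw S4.3 codes: -8 .. 7

# All operand pairs, including (-1) x (-1) = +1, which forces the extra MSB
full = max(signed_bits_needed(a * b) for a in raw for b in raw)
# Operands restricted so that neither reaches the most negative code
restricted = max(signed_bits_needed(a * b) for a in raw for b in raw
                 if a != -8 and b != -8)
print(full, restricted)  # 8 7
```

The product always carries L+L' = 6 fractional bits, so the two widths correspond to S8.6 and S7.6.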
16. Multiplication Example in VHDL
17. Lecture 2 Highlights
18. Lecture 2 Highlights Lecture 2 continued the study of fixed-point representations, particularly
Two’s complement subtraction
The modulo property of two’s complement—very important
wordlength determination for adding multiple signals
The lecture continued with a discussion of quantization of three types
Truncation
Round-to-nearest
Symmetric rounding
Finally it discussed the effects of quantization on SQNR
Notes in class were given to derive the error of each quantizer and its effect on the SQNR
19. Two's Complement Wraparound Property Temporary wraparounds are fine as long as final value is in the correct dynamic range:
Example S4.0: add (-8 + -6) + 7 = -7
Step 1: 1000 + 1010 = 0010
Should be $(-14)_{10}$, not $(+2)_{10}$ → wraparound/overflow
Step 2: 0010 + 0111 = 1001
Final result is correct: $(-7)_{10}$
If the final result is guaranteed to be in the representable range [-8,+7], then intermediate wraparounds are fine
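The S4.0 example can be replayed in Python by wrapping every intermediate sum to 4 bits. This is a minimal sketch; `wrap` is a hypothetical helper that models what a 4-bit two's complement adder does to its result.

```python
def wrap(value: int, bits: int) -> int:
    """Reduce value modulo 2**bits and reinterpret it as a signed two's complement integer."""
    modulus = 1 << bits
    value %= modulus
    return value - modulus if value >= modulus >> 1 else value

step1 = wrap(-8 + -6, 4)      # true sum is -14, which does not fit in S4.0
step2 = wrap(step1 + 7, 4)    # the second addition wraps back into range
print(step1, step2)  # 2 -7
```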
20. Modulo Addition This works because of the modulo property
Consider addition with modulus M
Y = (A + B) mod M = (A mod M + B mod M) mod M
Y = (A + B + C) mod M = ((A + B) mod M + C mod M) mod M = ((A mod M + B mod M) mod M + C mod M) mod M
etc.
Two's complement has an $M = 2^K$ modulus(-like) property
Specifically, if correct result of sum is out of range
Overflow: subtract modulus M from correct result to obtain represented two's complement number
Underflow: add modulus M to correct result to obtain represented two's complement number
See slide 98 of Lecture 1
In a series of additions/subtractions (multiplication is simply a series of additions/subtractions), temporary overflows can be ignored as long as the final result is within the output dynamic range
Say Y is an SN'.L' number (K' = N' - L') and is the result of an addition chain
As long as it is guaranteed that $-2^{K'-1} \le Y \le 2^{K'-1} - 2^{-L'}$, temporary overflows do not affect the final output
Rule of thumb: K' determines the maximum representable range, since $-2^{K'-1} \le Y < 2^{K'-1}$
Intermediate truncations/rounding in the addition and multiplication only add quantization noise to the result
More on this later
21. FIR Filter Example: Preventing Overflow
22. FIR Filter Example: Bounded Output Assume y(n) is guaranteed to be bounded within $-1 \le y(n) < 1$
Final y(n) is an S12.11 number
We will deal with quantization of fractional bits later
Save several bits on each adder!
Large savings seen for longer filters
23. Computing wordlength of Y: Unsigned Addition Minimum value Y can attain is 0
Step 1: Maximum value Y can attain is:
Ymax = X1(max) + X2(max) + … + XM(max)
Step 2: Maximum representable value of UN'.L' number:
To preserve precision in addition, set L' = max(Li)
Maximum representable value of a UN'.L' number is $(2^{N'} - 1)/2^{L'}$
Step 3: Require the maximum representable value of the UN'.L' number to be at least Ymax and solve for the smallest integer N': $N' = \lceil \log_2(Y_{max} \cdot 2^{L'} + 1) \rceil$
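24. Example X1 = U7.4, X2 = U8.4, X3 = U9.3, X4 = U6.2
Solution:
L' = 4
Ymax = 7.9375 + 15.9375 + 63.875 + 15.75 = 103.5
Solving the equation, $2^{N'} = 103.5 \cdot 2^4 + 1 = 1657 \Rightarrow N' = \lceil \log_2 1657 \rceil = 11$
Y is a U11.4 number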
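The three steps for unsigned addition can be sketched in Python as follows. The function name `unsigned_sum_format` is hypothetical; it scales Ymax to an integer number of output LSBs and uses `int.bit_length()` to find the smallest N' that can hold it.

```python
from fractions import Fraction

def unsigned_sum_format(formats):
    """formats: list of (N, L) pairs describing UN.L inputs. Returns (N', L') of their sum."""
    l_out = max(l for _, l in formats)                      # Step 2: L' = max(Li)
    y_max = sum(Fraction((1 << n) - 1, 1 << l) for n, l in formats)   # Step 1
    raw_max = int(y_max * (1 << l_out))                     # Ymax in units of 2**-L'
    return raw_max.bit_length(), l_out                      # Step 3: smallest N'

print(unsigned_sum_format([(7, 4), (8, 4), (9, 3), (6, 2)]))  # (11, 4)
```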
25. Computing Wordlength of Y: Signed Addition Step 1a: Maximum (most positive) value Y can attain is:
Ymax = X1(max) + X2(max) + … + XM(max)
Step 1b: Minimum (most negative) value Y can attain is:
Ymin = X1(min) + X2(min) + … + XM(min)
Step 2: Max/min representable value of SN'.L' number:
To preserve precision in addition, set L' = max(Li)
Maximum representable value of an SN'.L' number: $(2^{N'-1} - 1)/2^{L'}$
Minimum representable value of an SN'.L' number: $-2^{N'-1}/2^{L'}$
Step 3: Require the maximum representable value to be at least Ymax and the minimum representable value to be at most Ymin. Solve for the smallest N' that satisfies both.
26. TYPE 1: Truncation Easy, just ignore (i.e. truncate) the fractional digits from L'+1 to L. Example L'=0
$x_{K-1} x_{K-2} \ldots x_1 x_0 . x_{-1} x_{-2} \ldots x_{-L}$
= $y_{K-1} y_{K-2} \ldots y_1 y_0 .$
Truncation in two's complement results in a number trunc(x) that is never larger than x. This is also called round towards -∞ or downward-directed rounding:
011.10 = $(3.5)_{10}$ → 011 = $(3)_{10}$
Error = -0.5
100.10 = $(-3.5)_{10}$ → 100 = $(-4)_{10}$
Error = -0.5
27. Statistics of Truncation Error distribution (assuming random input)
Range: $-(2^{-L'} - 2^{-L}) \le E_t \le 0$
Mean: $-(2^{-L'} - 2^{-L})/2$
Assuming L >> L'…
Range: approx. $-2^{-L'} \le E_t \le 0$
Define $\Delta = 2^{-L'}$
$\mu$ = Mean = $E[E_t] = -\Delta/2$
$E[E_t^2] = \Delta^2/3$
$\mathrm{Var}[E_t] = E[(E_t - \mu)^2] = \Delta^2/12$
Noise power determined by variance but non-zero mean creates DC bias
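These truncation statistics can be verified exhaustively for a small case. The sketch below assumes every pattern of dropped bits is equally likely; the function name and the choice L = 12, L' = 4 are illustrative only.

```python
from fractions import Fraction

def truncation_error_stats(L: int, L_out: int):
    """Exact mean and variance of trunc(x) - x over all patterns of the dropped bits."""
    d = L - L_out
    # Dropped bits r = 0 .. 2**d - 1 are worth r * 2**-L; truncation removes exactly that.
    errors = [Fraction(-r, 1 << L) for r in range(1 << d)]
    mean = sum(errors) / len(errors)
    var = sum((e - mean) ** 2 for e in errors) / len(errors)
    return mean, var

mean, var = truncation_error_stats(L=12, L_out=4)
delta = Fraction(1, 1 << 4)
print(float(mean), float(-delta / 2))        # exact mean vs. the L >> L' approximation
print(float(var), float(delta ** 2 / 12))    # exact variance vs. delta^2 / 12
```

The exact mean matches $-(2^{-L'} - 2^{-L})/2$, and the variance differs from $\Delta^2/12$ only by a factor of $1 - 2^{-2(L-L')}$.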
28. TYPE 2: Round to nearest Round to nearest is what we normally think of when we say "round"
Round to nearest (rtn) in two's complement
010.01 = $(2.25)_{10}$ → 010 = $(2)_{10}$
Error = -0.25
101.11 = $(-2.25)_{10}$ → 110 = $(-2)_{10}$
Error = +0.25
29. Implementing round to nearest (rtn) in hardware Two methods
Method 1: Add '1' in the position one digit to the right of the new LSB (i.e. digit L'+1) and keep only L' fractional bits
$x_{K-1} x_{K-2} \ldots x_1 x_0 . x_{-1} x_{-2} \ldots x_{-L}$
+ 1 (aligned with digit $x_{-1}$ when L'=0)
= $y_{K-1} y_{K-2} \ldots y_1 y_0 .$
Method 2: Add the value of the digit one position to the right of the new LSB (i.e. digit L'+1) into the new LSB digit (i.e. digit L') and keep only L' fractional bits
$x_{K-1} x_{K-2} \ldots x_1 x_0 . x_{-1} x_{-2} \ldots x_{-L}$
+ $x_{-1}$ (aligned with digit $x_0$ when L'=0)
= $y_{K-1} y_{K-2} \ldots y_1 y_0 .$
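Both hardware methods can be modeled on raw integers, where `d` is the number of fractional bits being dropped (L - L'). The function names are hypothetical, and Python's `>>` is used as an arithmetic (sign-preserving) shift, which is an assumption about how the hardware shift would behave.

```python
def rtn_method1(raw: int, d: int) -> int:
    """Method 1: add 1 one position below the new LSB, then drop the d low bits."""
    return (raw + (1 << (d - 1))) >> d

def rtn_method2(raw: int, d: int) -> int:
    """Method 2: drop the d low bits, then add the first dropped bit into the new LSB."""
    return (raw >> d) + ((raw >> (d - 1)) & 1)

# 010.01 (2.25) has raw value 9 and 101.11 (-2.25) has raw value -9 with 2 fractional bits
print(rtn_method1(9, 2), rtn_method1(-9, 2))  # 2 -2
print(rtn_method2(9, 2), rtn_method2(-9, 2))  # 2 -2
```

The two methods agree for every input; the choice between them is a hardware question (an extra constant in the adder versus reusing a carry input).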
30. Bias in two's complement round to nearest Assuming all combinations of positive and negative values of x equally possible, average error is +0.125 in this example
Smaller average error than truncation, but still not symmetric error
We have a problem with the midway value: an input exactly at 2.5 or -2.5 is always rounded up, which leads to a positive error bias
There is also the problem that overflow can occur if only K' = K integer bits are allocated
Example: rtn(011.10) → 100 : overflow!
This overflow only occurs on positive numbers near the maximum positive value, not on negative numbers
As long as final output value (of final addition chain) is in representable range, this temporary overflow is fine. If this is the final stage, or a stage where correct value is necessary, then overflow detect or saturation logic is needed.
31. Statistics of Round to Nearest Error distribution (assuming random input)
Range: $-(2^{-L'}/2 - 2^{-L}) \le E_t \le 2^{-L'}/2$
Mean: $2^{-L}/2$
CAVEAT: the mean depends heavily on how often the input lies exactly HALFWAY BETWEEN two representable values. This depends on the relative distance between L and L'
Experimental mean may be smaller (in absolute value) due to non-occurrence of values exactly halfway between two representable values
Assuming L >> L'
Range: approx. $-2^{-L'}/2 \le E_t \le 2^{-L'}/2$
Define $\Delta = 2^{-L'}$
$\mu$ = Mean = $E[E_t] = 0$ (this is an approximation when L is large)
$E[E_t^2] = \Delta^2/12$
$\mathrm{Var}[E_t] = E[(E_t - \mu)^2] = \Delta^2/12$
Noise power determined by variance but zero mean indicates no DC bias
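The round-to-nearest statistics can also be checked exhaustively. As before, this is a sketch that assumes all patterns of dropped bits are equally likely; the function name and the wordlengths are illustrative.

```python
from fractions import Fraction

def rtn_error_stats(L: int, L_out: int):
    """Exact (mean, min, max, variance) of rtn(x) - x over all patterns of the dropped bits."""
    d = L - L_out
    half = 1 << (d - 1)
    # r = value of the dropped bits; round up (error 2**d - r) when r >= half, else down (error -r)
    errors = [Fraction((1 << d) - r if r >= half else -r, 1 << L) for r in range(1 << d)]
    mean = sum(errors) / len(errors)
    var = sum((e - mean) ** 2 for e in errors) / len(errors)
    return mean, min(errors), max(errors), var

mean, lo, hi, var = rtn_error_stats(L=12, L_out=4)
print(mean, lo, hi)                   # mean is 2**-L / 2, i.e. a small positive bias
print(rtn_error_stats(1, 0)[0])       # worst case L' = L - 1
```

With L = 12 and L' = 4 the bias is tiny relative to the output LSB, while in the case L' = L - 1 it grows to a quarter of the output LSB.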
32. TYPE 3: Symmetric/Balanced Rounding Round to nearest has almost no bias (i.e. it is zero mean) when L is large
Relatively no bias (compared to precision) when L>>L'
When L' and L are close, round to nearest becomes more and more relatively "biased"
Problem occurs when number to be rounded with L fractional bits is EXACTLY HALFWAY between two numbers which can be represented by L' fractional bits
Example: S4.1 to become S3.0
001.1 = $(1.5)_{10}$ → lies exactly between 001 and 010
Using round-to-nearest forces this value to take the larger value (toward +∞)
See previous slides
When L and L' are close, this halfway point occurs more frequently
Worst-case scenario: L' = L - 1
000.0 → 000. error = 0
000.1 → 001. error = +0.5
001.0 → 001. error = 0
001.1 → 010. error = +0.5
Mean error = +0.25 → large DC bias
Also remember truncation does the opposite when the number to be rounded with L fractional bits is EXACTLY HALFWAY between two numbers which can be represented by L' fractional bits
Using truncation forces this value to take the smaller value (toward -∞)
When L' is near L use symmetric or balanced rounding schemes
Takeaway: When L >> L' typically do not see round-to-nearest show a significant bias
When L and L' are close you see a bias because exact halfway value occurs regularly
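The bias in the worst case L' = L - 1 can be compared against one symmetric scheme. The slides do not spell out the exact rule, so the sketch below assumes "ties go away from zero", which is one common definition; the function names and the choice of a symmetric input set are also assumptions.

```python
from fractions import Fraction

def round_half_up(raw: int, d: int) -> int:
    """Plain round to nearest: ties always go toward +infinity."""
    return (raw + (1 << (d - 1))) >> d

def round_symmetric(raw: int, d: int) -> int:
    """Round to nearest with ties going away from zero (assumed symmetric rule)."""
    half = 1 << (d - 1)
    return (raw + half) >> d if raw >= 0 else -((-raw + half) >> d)

def mean_error(rounder, raws, d):
    """Mean of (rounded - exact), in units of the output LSB."""
    return sum(Fraction(rounder(r, d)) - Fraction(r, 1 << d) for r in raws) / len(raws)

raws = range(-7, 8)   # S4.1 raw codes for -3.5 .. +3.5 (most negative code left out)
print(mean_error(round_half_up, raws, 1), mean_error(round_symmetric, raws, 1))  # 4/15 0
```

Over a symmetric set of inputs the positive and negative halfway cases cancel, which removes the DC bias.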
33. Xilinx DSP48 Symmetric Rounding
34. SQNR Analysis Signal-to-quantization noise ratio determines the effect of quantization noise on the system
The SQNR should be above the required SNR of the communication system
You do not want fixed-point effects determining the performance of your system
SQNR = $20 \log_{10}(\mathrm{std}[y_{float}(n)] / \mathrm{std}[e(n)])$ dB, where std = standard deviation
= $10 \log_{10}(\mathrm{var}[y_{float}(n)] / \mathrm{var}[e(n)])$ dB, where var = variance
Mean error = mean[e(n)]
Can also use yfixed(n) as reference level
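A measured SQNR can be compared with the $\Delta^2/12$ noise model in a few lines. This sketch is not the experiment from the lecture: the input is assumed to be uniform random in [-1, 1), the quantizer is round-to-nearest to 8 fractional bits, and all names are illustrative.

```python
import math
import random

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def sqnr_db(reference, quantized):
    """10*log10(var[reference] / var[error]), with error = quantized - reference."""
    errors = [q - r for r, q in zip(reference, quantized)]
    return 10 * math.log10(variance(reference) / variance(errors))

random.seed(0)
L_out = 8
y_float = [random.uniform(-1, 1) for _ in range(20000)]
y_fixed = [math.floor(y * 2**L_out + 0.5) / 2**L_out for y in y_float]   # round to nearest

measured = sqnr_db(y_float, y_fixed)
# Uniform input on [-1, 1) has variance 4/12; the noise model gives delta^2/12
theory = 10 * math.log10((4 / 12) / (2 ** (-2 * L_out) / 12))
print(f"measured {measured:.2f} dB, model {theory:.2f} dB")
```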
35. Output (Single Quantizer) Round-to-nearest
Measured mean: -2.30e-008
Theoretical mean: 0 (based on equations from earlier slides, assuming L>>L' and L large)
Measured SQNR: 90.89 dB
Theoretical SQNR: 91.18 dB (based on equations from earlier slides)
Truncation
Measured mean: -1.52e-005
Theoretical mean: -1.53e-005 (based on equations from earlier slides, assuming L>>L')
Measured SQNR: 91.46 dB
Theoretical SQNR: 91.18 dB (based on equations from earlier slides)
Shows that the earlier calculations can be used to model an actual DSP system
36. Lecture 3 Highlights
37. Lecture 3 Highlights Lecture 3 began with a discussion of modeling fixed point systems in Matlab, including quantization and wraparound
Lecture 3 also discussed FIR filters in detail, discussing 7 FIR structures:
1) Direct Form FIR Filters, 2) Linear-Phase FIR Filters, 3) Transpose / Data Broadcast FIR Filters, 4) Pipelined FIR Filters, 5) Parallel FIR Filters, 6) Fast Parallel FIR Filters (Duhamel), 7) Serial/Multi-Cycle FIR Filters
In these structures other notions were introduced in the slides or class notes:
Importance/derivation of linear phase, SFGs and transposition, cutset pipelining, using parallel and pipeline structures in ASICs to reduce power
38. Quantization: Fixed Point to Fixed Point Quantize a fixed point value Sinf.L1 to a fixed point value Sinf.L2, where inf = infinite number of integer bits (hence infinite total bits)
Obviously not infinite, but used to denote fact that we do not take into account integer bits
Matlab
A_fxp1 = fixed point signal with L1 frac bits
L2 = number of fractional bits of quantized result
A_fxp2 = floor(A_fxp1*2^L2)/2^L2; % truncation
A_fxp2 = floor(A_fxp1*2^L2 + 0.5)/2^L2; % round to nearest
Looks the same as for floating point conversion
No dependence on L1 (as long as L1 >= L2)
39. Wraparound Example % multiplication example
A = -1; B = 0.875; N = 4; L = 3; K=N-L;
C = A * B; % compute multiplication to produce C = Sinf.6 number
C_quant = floor(C*2^L + 0.5)/2^L; % round to C_quant = Sinf.3 number
%check wraparound
C_quant_wrap = C_quant;
while C_quant_wrap < -(2^(K-1))
C_quant_wrap = C_quant_wrap + 2^K;
end
while C_quant_wrap > (2^(K-1) - 2^-L)
C_quant_wrap = C_quant_wrap - 2^K;
end
40. FIR Filter Difference Equation An FIR filter is defined by the difference equation $y(n) = \sum_{i=0}^{M-1} h(i)\, x(n-i)$
FIR = finite impulse response
M-tap filter
M "taps" or coefficients
Often h(i) is written as $h_i$
Different ways of implementing FIR filter in hardware
41. 1) Direct Form FIR Filters M-tap FIR filter in direct form
Critical path:
$T_A$ = delay through adder
$T_M$ = delay through multiplier
Critical path delay: $T_M + (M-1) T_A$
Area:
M-1 registers
M multipliers
M-1 adders
Latency:
Latency is number of cycles between x(0) and y(0), x(1) and y(1), etc.
0 cycles latency
Arithmetic complexity of M-tap filter modeled as:
M multiplications/sample + M-1 adds/sample
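A behavioral model of the direct form structure is sketched below. It is a software illustration only (floating point, no wordlength effects), and the function name `fir_direct` is hypothetical.

```python
def fir_direct(h, x):
    """Direct form FIR: y(n) = sum_i h[i] * x(n - i), with x(n) = 0 for n < 0."""
    taps = [0.0] * len(h)            # taps[i] holds x(n - i); taps[1:] are the M-1 registers
    y = []
    for sample in x:
        taps = [sample] + taps[:-1]  # shift the delay line by one sample
        y.append(sum(hi * xi for hi, xi in zip(h, taps)))   # M multiplies, M-1 adds
    return y

print(fir_direct([1, 2, 3], [1, 0, 0, 0]))   # impulse in, coefficients out
```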
42. 2) Linear Phase FIR Filters Linear phase filter occurs when h(n) = +/- h(M-1-n). M can be odd or even.
Linear phase filters are used when constant group delay is needed
Linear phase structures can be designed to save area
Example: M even
Critical path:
$T_A$ = delay through adder
$T_M$ = delay through multiplier
Critical path delay: $T_M + (M/2) T_A$
Area:
M-1 registers
M/2 multipliers
M-1 adders
43. 3) Direct Form Transpose Filters FIR filter can be decomposed into a signal flow graph
Nodes
Edges
SFG transposition rule: "Reversing the direction of an SFG and interchanging the input and output ports preserves the functionality of the system."
Transposition to direct form filter results in direct form transpose filter, also called data broadcast structure
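The transposed (data broadcast) structure can be modeled in the same behavioral style. In this sketch the registers sit between the adders instead of on the input; the function name and the register update order are illustrative, not taken from the slides.

```python
def fir_transpose(h, x):
    """Transposed FIR: each input sample is broadcast to all multipliers at once."""
    m = len(h)
    state = [0.0] * (m - 1)          # M-1 registers located between the adders
    y = []
    for sample in x:
        products = [hi * sample for hi in h]
        y.append(products[0] + (state[0] if state else 0.0))
        # Register i is loaded with product i+1 plus the contents of register i+1
        state = [products[i + 1] + (state[i + 1] if i + 1 < m - 1 else 0.0)
                 for i in range(m - 1)]
    return y

print(fir_transpose([1, 2, 3], [1, 0, 0, 0]))
```

The output matches the direct form sample for sample, but the critical path is now one multiplier plus one adder regardless of M.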
44. 4) Pipelined FIR Filters Example: coarse-grain pipelining for direct form filter
Pipelining generally only valid for feed-forward cutsets of a SFG
Feedback structures will be covered later
45. 5) Parallel FIR Filters Parallel processing maintains overall sample throughput while reducing clock rate
Useful: when input/output bottlenecks exist
46. 6) Fast Parallel FIR Filters Direct form and transpose form structures (running at the same rate) with M taps require M multiplications/sample and M-1 adds/sample
Methods exist to reduce this complexity by parallel processing and subexpression sharing.
In the 2-parallel structure above, two inputs arrive at half the original clock rate and are processed in parallel by three ceil(M/2)-tap filters [ceil() is the ceiling function]
Arithmetic complexity of the 2-parallel filter is approximately:
3 x M/2 multiplications / two samples + 3 x (M/2-1) adds / two samples + 4 adds / two samples
= 3/4 M multiplications/sample + (3M/4 + 1/2) adds/sample
If power is dominated by multipliers, 25% power savings over traditional structures!
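The algebra behind the 2-parallel fast FIR structure can be checked with a block-based sketch. It assumes even-length h and x and computes the whole convolution at once rather than streaming; the three subfilters are H0, H1 and H0+H1, and all function names are hypothetical.

```python
def conv(a, b):
    """Plain linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def fast_fir_2parallel(h, x):
    """Convolution of h and x using three half-length subfilters (len(h), len(x) even)."""
    h0, h1 = h[0::2], h[1::2]                          # even / odd coefficients
    x0, x1 = x[0::2], x[1::2]                          # even / odd input samples
    a = conv(h0, x0)                                   # H0 * X0
    b = conv(h1, x1)                                   # H1 * X1
    c = conv([p + q for p, q in zip(h0, h1)],
             [p + q for p, q in zip(x0, x1)])          # (H0 + H1) * (X0 + X1)
    y_even = [ai + bi for ai, bi in zip(a + [0.0], [0.0] + b)]   # A plus B delayed by one
    y_odd = [ci - ai - bi for ai, bi, ci in zip(a, b, c)]        # C - A - B
    y = []
    for even, odd in zip(y_even, y_odd + [0.0]):
        y += [even, odd]
    return y[:len(h) + len(x) - 1]

h = [1, 2, 3, 4]
x = [1, -1, 2, 0.5, 3, -2]
print(fast_fir_2parallel(h, x) == conv(h, x))
```

Each pair of outputs costs three M/2-tap subfilters instead of two full M-tap filters, which is where the 3M/4 multiplications per sample come from.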
47. 7) Serial / Multi-Cycle Trade off area for speed
Parallel filter: M multipliers, output ready in one cycle
Serial filter: 1 multiplier, output ready in M cycles
48. Lecture 4 Highlights
49. Lecture 4 Highlights Lecture 4 began with a discussion of FIR implementation issues, either in slides or in class notes. Some issues include:
FIR scaling (3 methods)—an example done in class
CSD filters, carry-save arithmetic, and the combination of CSD with carry-save for efficient fixed-coefficient FIR filters in ASICs
Lecture 4 also discussed DSP-specific hardware for Xilinx FPGAs, particularly the DSP48 block
Block diagram, functionality of arithmetic components, muxes, and registers
Configurability of DSP48 blocks to implement different math structures
FIR filter structures using DSP48 blocks: MACC/serial FIR filter, transpose FIR filter, systolic FIR filter, multirate FIR filters
50. FIR Scaling Scale input x(n) to prevent overflow on y(n)
Three methods:
Method 1: Prevent Overflow (L1 Scaling)
Method 2: Narrowband signal scaling
Method 3: Energy scaling (L2 Scaling)
To find scale factor
Decide on the number of integer bits for Y, say K' integer bits
Find Ymax based on one of the three methods
Scale the input such that Ymax fits in K' integer bits
In terms of SQNR: Method 1 < Method 2 < Method 3
i.e. Method 1 introduces most noise, then method 2, then method 3
51. Scaling Example How can we guarantee y(n) requires only one integer bit?
Must prescale x(n) such that its maximum amplitude after scaling (Ax) fits one of the scaling techniques
52. Ripple-Carry Carry Propagate Adder (CPA) Critical path is n full adders
53. Carry Save Adder (CSA) Critical path is 1 full adder, regardless of the number of bits n!
Also called 3:2 compressor
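A bitwise model of the carry save adder shows why no carry propagates between bit positions. The sketch assumes non-negative integer operands, and the function name is hypothetical.

```python
def carry_save_add(a: int, b: int, c: int):
    """3:2 compressor on non-negative integers: returns (sum_word, carry_word)."""
    sum_word = a ^ b ^ c                                # per-bit sum, no carry propagation
    carry_word = ((a & b) | (a & c) | (b & c)) << 1     # per-bit majority, weighted by 2
    return sum_word, carry_word

s, c = carry_save_add(13, 7, 9)
print(s, c, s + c)  # 3 26 29
```

Only the final `s + c` needs a carry propagate adder; a chain of additions can stay in carry-save form until the very end.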
54. CSD Conversion Example Number is $-0.55078125 = -2^{-1} - 2^{-4} + 2^{-6} - 2^{-8}$
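The CSD digits for this number can be generated with the standard non-adjacent-form recoding, which may differ from the procedure used in class. The sketch assumes the coefficient is first scaled to an integer with 8 fractional bits; the function name is hypothetical.

```python
def to_csd(value: int):
    """Non-adjacent-form digits of an integer, least significant first (each -1, 0 or +1)."""
    digits = []
    while value != 0:
        if value % 2:
            d = 2 - (value % 4)      # +1 if value is 1 mod 4, -1 if value is 3 mod 4
            value -= d
        else:
            d = 0
        digits.append(d)
        value //= 2
    return digits

frac_bits = 8
raw = int(-0.55078125 * 2**frac_bits)           # -141
terms = [(d, i - frac_bits) for i, d in enumerate(to_csd(raw)) if d]
print(terms)  # [(-1, -8), (1, -6), (-1, -4), (-1, -1)]
```

Each pair is (sign, power of two), so the result reads $-2^{-8} + 2^{-6} - 2^{-4} - 2^{-1}$: four nonzero digits, with no two adjacent.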
55. DSP48 Slice: Virtex 4
56. Direct Form Transpose FIR Parallel Filter Transpose filter is suited to DSP48
Uses dedicated rounding channels between slices
57. Systolic FIR Parallel Filter Xilinx notation: "systolic filter"
This is a direct form FIR filter plus some additional cutset pipeline stages
58. Lecture 5 Highlights
59. Lecture 5 Highlights Lecture 5 studied IIR filters and showed various structures for IIR filters
IIR filters are feedback structures and cannot be pipelined naively
Lecture 5 also discussed iteration bound, which sets the minimum clock period a recursive DFG can operate at
Critical path delay cannot be smaller than iteration bound for a recursive DFG
Two ways of computing iteration bound: 1) Longest Path Matrix algorithm 2) Minimum Cycle Mean Algorithm. You should be able to calculate iteration bound using both methods.
Lecture 5 also discussed methods to pipeline higher-order IIR filters. Two methods were introduced: clustered lookahead and scattered lookahead
Lecture 5 concluded with class notes on 3 IIR implementation issues: datapath quantization, coefficient quantization, and limit cycles
60. IIR Filter Difference Equation IIR filter defined by difference equation
IIR = infinite impulse response
N feedback coefficients (N = order of the system)
determine poles in system
Poles outside unit circle of H(z) cause instability
M +1 feedforward coefficients
In the z domain:
61. Loop Bound In this lecture we focus on recursive DFGs
Loop bound
A recursive DFG has one or more loops
A loop bound for the L-th loop is defined as $t_L / w_L$
$t_L$ is the loop computation time
$w_L$ is the number of delays in the loop
Iteration bound $T_\infty$
Iteration bound is the maximum loop bound of all loops in the DFG
The loop that gives the iteration bound is called the critical loop
The iteration bound determines the minimum critical path of a recursive system represented by that DFG structure!
In other words, no matter how you pipeline or retime the DFG, you cannot get a circuit with lower critical path than the iteration bound!
Conversely, if the current critical path of your DFG is greater than the iteration bound then (assuming pipelining and/or retiming can break any computational block cleanly) you can transform your DFG such that the critical path exactly equals the iteration bound
62. Example of Iteration Bound Loops
Loop 1: ADBA
Loop bound = 4/2
Loop 2: AECBA
Loop bound = 5/3
Loop 3: AFCB
Loop bound = 5/4
Critical Loop
Loop 1
Iteration Bound
Max{4/2,5/3,5/4} = 4/2 = 2
$T_\infty = 2$ units of time. That is the minimum clock period (max frequency) this circuit can operate at after pipelining and retiming
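Once the loops have been enumerated, the iteration bound is just the largest loop bound. A minimal sketch using the three loops of this example (the dictionary layout is an illustrative choice):

```python
from fractions import Fraction

# (loop computation time, number of delays) for each loop in the example
loops = {"Loop 1": (4, 2), "Loop 2": (5, 3), "Loop 3": (5, 4)}

bounds = {name: Fraction(t, w) for name, (t, w) in loops.items()}
critical = max(bounds, key=bounds.get)
print(critical, bounds[critical])  # Loop 1 2
```

For DFGs with many loops this enumeration becomes impractical, which is the motivation for the matrix-based algorithms.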
63. Computation of Iteration Bound Iteration bound can be computed by hand, as described earlier
However, in systems with a large number of loops, this can be prohibitive
Two methods to compute iteration bound using matrix manipulation
1) Longest Path Matrix algorithm
2) Minimum Cycle Mean Algorithm
The following slides come from Dr. Parhi at:
http://www.ece.umn.edu/users/parhi/slides.html
64. Pipelining in higher order IIR filters
65. 2) Pipelining in higher order IIR filters: Clustered Lookahead
66. 3) Pipelining in higher order IIR filter: Scattered Lookahead
67. 3) Pipelining in higher order IIR filter: Scattered Lookahead
68. Datapath quantization Analysis performed in class
Takeaway:
Location of the IIR filter poles affects the impact of datapath (i.e. multiplier) quantization on the system SQNR.
Poles closer to the unit circle may negatively affect SQNR depending on scaling method.
69. Coefficient quantization Analysis performed in class
Takeaway:
To reduce sensitivity to coefficient quantization in IIR filters, use a cascade of second order sections (not direct form structure or its variants)!
FIR coefficient quantization does not have as detrimental an effect (only the magnitude response is affected)
70. Limit Cycles Analysis performed in class
Takeaway:
There may exist dead-bands in IIR filters in which the filter output never goes to zero, even after the input becomes zero, but oscillates around a certain small value forever → limit cycle
71. Lecture 6 Highlights
72. Lecture 6 Highlights Lecture 6 began with a study of the Discrete Time Fourier Transform (DTFT) and continued to a sampled version of the DTFT, called the Discrete Fourier Transform (DFT)
The FFT was introduced as a computationally-efficient mechanism to implement the DFT
Radix-2 FFT (DIF and DIT)
Radix-4 FFT (DIF and DIT)
Finally, various implementation issues were discussed including FFT architectures (serial, parallel, pipeline, etc.) and bit-level issues
73. DFT Properties N-point DFT maps
N inputs in time domain: x(0) through x(N-1)
to N outputs in frequency domain: X(0) to X(N-1)
Inherent block processing
If x(n) is real then X(N-k) = X*(k) for k=1 to N/2-1 where * denotes complex conjugate
|X(k)| is symmetric about k=N/2
Even if x(n) is real, X(k) is usually complex (except for X(0) and X(N/2))
X(0) denotes DC frequency, X(N/2) denotes Fs/2 frequency
74. Discrete Fourier Transform Computations The DFT equation: $X(k) = \sum_{n=0}^{N-1} x(n) W_N^{nk}$ for k = 0 to N-1, where $W_N = e^{-j2\pi/N}$
The $W_N^{nk}$ values are also called "twiddle factors", which are complex values on the unit circle in the complex plane
Multiplication by a twiddle factor serves to "rotate" a value around the unit circle in the complex plane
Computations:
To compute X(0), require N complex multiplications
To compute X(1), require N complex multiplications
..To compute X(N-1), require N complex multiplications
Total: $N^2$ complex multiplications and $N^2 - N$ complex additions to compute N-point DFT
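A direct DFT makes the operation count concrete. The sketch below assumes the usual convention $W_N = e^{-j2\pi/N}$; the input values are arbitrary and the function name is hypothetical.

```python
import cmath

def dft(x):
    """Direct N-point DFT: X(k) = sum_n x(n) * W_N**(n*k), with W_N = exp(-2j*pi/N)."""
    n_pts = len(x)
    w = cmath.exp(-2j * cmath.pi / n_pts)
    return [sum(x[n] * w ** (n * k) for n in range(n_pts)) for k in range(n_pts)]

x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]    # arbitrary real test input, N = 8
X = dft(x)
n_pts = len(x)
print("complex multiplications:", n_pts ** 2)      # N per output, N outputs
print("complex additions:", n_pts ** 2 - n_pts)    # N-1 per output, N outputs
# For real input, X(N-k) is the complex conjugate of X(k)
print(all(abs(X[n_pts - k] - X[k].conjugate()) < 1e-9 for k in range(1, n_pts // 2)))
```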
75. Fast Fourier Transform Can exploit shared twiddle factor properties (i.e. sub-expression sharing) to reduce the number of multiplications in DFT
This class of algorithms is called Fast Fourier Transforms
An FFT is simply an efficient implementation of the DFT
Mathematically FFT = DFT
FFT exploits two properties in the twiddle factors:
Symmetry Property: $W_N^{k+N/2} = -W_N^k$
Periodicity Property: $W_N^{k+N} = W_N^k$
FFTs use a divide and conquer approach, breaking an N-point DFT into several smaller DFTs
N can be factored as $N = r_1 r_2 r_3 \cdots r_v$ where the $\{r_i\}$ are prime
Particular focus on $r_1 = r_2 = \ldots = r_v = r$, where r is called the radix of the FFT algorithm
In this case $N = r^v$ and the FFT has a regular pattern
We will study radix-2 (r=2) and radix-4 (r=4) FFTs in this class
76. Decimation-in-time Radix-2 FFT Split x(n) into even and odd samples and perform smaller FFTs
f1(n) = x(2n)
f2(n) = x(2n+1)
n=0, 1, … N/2-1
Derivation performed in class
Radix-2 Decimation-in-time (DIT) algorithm
In radix-2, the "butterfly" element takes in 2 inputs and produces 2 outputs
Butterfly implements 2-point FFT
Computations:
$(N/2)\log_2 N$ complex multiplications
$N \log_2 N$ complex additions
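The even/odd split translates directly into a recursive implementation. This is a sketch of the textbook radix-2 DIT recursion, not of any particular hardware architecture; the function name is hypothetical and the input length is assumed to be a power of two.

```python
import cmath

def fft_radix2_dit(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    f1 = fft_radix2_dit(x[0::2])      # DFT of even samples f1(n) = x(2n)
    f2 = fft_radix2_dit(x[1::2])      # DFT of odd samples  f2(n) = x(2n+1)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * f2[k]   # twiddle factor times odd branch
        out[k] = f1[k] + t                              # butterfly, upper output
        out[k + n // 2] = f1[k] - t                     # butterfly, lower output
    return out

x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]    # arbitrary test input, N = 8
X = fft_radix2_dit(x)
print([round(abs(v), 4) for v in X])
```

Each recursion level performs N/2 butterflies with one twiddle multiplication and two additions each, and there are $\log_2 N$ levels, which gives the counts listed above.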
77. Decimation-in-time Radix-2 FFT (N=8)
78. Decimation-in-frequency Radix-2 FFT Decompose X(k) such that it is split into FFT of points 0 to N/2-1 and points N/2 to N-1
Then decimate X(k) into even and odd numbered samples
Derivation performed in class
Radix-2 Decimation-in-frequency (DIF) algorithm
In radix-2, the "butterfly" element takes in 2 inputs and produces 2 outputs
Butterfly implements 2-point FFT
Computations:
$(N/2)\log_2 N$ complex multiplications
$N \log_2 N$ complex additions
79. Decimation-in-frequency Radix-2 FFT (N=8)
80. Radix-4 FFT In radix-2 you have $\log_2 N$ stages
Can also implement radix-4 and now have $\log_4 N$ stages
Radix-4 Decimation-in-time: split x(n) into four time sequences instead of two
Derivation performed in class
Split x(n) into four decimated sample streams
f1(n) = x(4n)
f2(n) = x(4n+1)
f3(n) = x(4n+2)
f4(n) = x(4n+3)
n=0, 1, .. N/4-1
Radix-4 Decimation-in-time (DIT) algorithm
In radix-4, the "butterfly" element takes in 4 inputs and produces 4 outputs
Butterfly implements 4-point FFT
Computations:
$(3N/4)\log_4 N = (3N/8)\log_2 N$ complex multiplications → decrease from radix-2 algorithms
$(3N/2)\log_2 N$ complex additions → increase from radix-2 algorithms
Downside: can only handle FFT sizes that are a power of 4, such as N=4, 16, 64, 256, 1024, etc.
81. Parallel Implementation Implement entire FFT structure in a parallel fashion
Advantages: Control is easy (i.e. no controller), low latency (i.e. 0 cycles in this example), customize each twiddle factor as a multiplication by a constant
Disadvantages: Huge Area, Routing congestion
82. Serial/In-Place FFT Implementation Implement a single butterfly. Use that butterfly and some memory to compute entire FFT
Advantages: Small area
Disadvantages: Large latency, complex controller
83. Pipeline FFT Pipeline FFT is very common for communication systems (OFDM, DMT)
Implements an entire "slice" of the FFT and reuses hardware to perform other slices
Advantages: Particularly good for systems in which x(n) comes in serially (i.e. no block assembly required), very fast, more area efficient than parallel, can be pipelined
Disadvantages: Controller can become complicated, large intermediate memories may be required between stages, latency of N cycles (more if pipelining introduced)