Zero Padding Most implementations of the FFT require that the length of x(n) be an integer power of 2 (e.g., 4, 8, 16, 32, …). What if the length of x(n) is not an integer power of 2? No problem! Just append zeros to x(n) until a power of 2 is reached. This is called zero padding. It can also be used to sample the spectrum on a finer frequency grid (often loosely described as increasing the FFT frequency resolution) even though the number of nonzero samples is limited.
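As a minimal sketch of the padding step, assuming the samples are held in a Python list (the helper names `next_power_of_two` and `zero_pad` are hypothetical, chosen only for this illustration):

```python
def next_power_of_two(n):
    """Return the smallest power of two that is >= n (n must be positive)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    p = 1
    while p < n:
        p *= 2
    return p


def zero_pad(x, length=None):
    """Append zeros to x until it has `length` samples.

    If `length` is omitted, pad up to the next power of two.
    """
    target = next_power_of_two(len(x)) if length is None else length
    if target < len(x):
        raise ValueError("target length is shorter than the sequence")
    return list(x) + [0.0] * (target - len(x))


# A 15-sample sequence (placeholder values) is padded with one zero to reach 16.
h = [1.0] * 15
padded = zero_pad(h)
print(len(padded))
```

Passing an explicit `length` (for example 32 or 512) pads further than the minimum required by the FFT.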
Zero Padding To see that this does in fact work, consider a sequence x(n) consisting of L samples, where L < N. Its DTFT is
$$X(\omega) = \sum_{n=0}^{L-1} x(n)\, e^{-j\omega n}$$
Since x(n) is zero for samples beyond L-1, we can extend the summation to N-1 without changing the result:
$$X(\omega) = \sum_{n=0}^{N-1} x(n)\, e^{-j\omega n}$$
Zero Padding This is the DTFT of the zero-padded sequence, and is exactly the same as the DTFT of the non-zero-padded sequence! We can denote the zero-padded sequence x_zp(n):
$$x_{zp}(n) = \begin{cases} x(n), & 0 \le n \le L-1 \\ 0, & L \le n \le N-1 \end{cases}$$
Zero Padding The DFT of the zero-padded sequence is given by
$$X_{zp}(k) = \sum_{n=0}^{N-1} x_{zp}(n)\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \ldots, N-1$$
and consists of N samples of the DTFT, spaced at intervals of 2π/N from 0 to 2π.
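The claim that the DFT of the zero-padded sequence samples the DTFT of the original sequence can be checked numerically. The sketch below uses a direct O(N²) DFT in place of an FFT, and the three sample values are arbitrary, chosen only for illustration:

```python
import cmath


def dtft(x, omega):
    """Evaluate the DTFT of the finite sequence x at frequency omega (rad/sample)."""
    return sum(xn * cmath.exp(-1j * omega * n) for n, xn in enumerate(x))


def dft(x):
    """Direct N-point DFT (a slow stand-in for an FFT)."""
    N = len(x)
    return [
        sum(xn * cmath.exp(-2j * cmath.pi * k * n / N) for n, xn in enumerate(x))
        for k in range(N)
    ]


x = [1.0, 2.0, 3.0]                      # L = 3 samples (illustrative values)
N = 8                                    # padded length
x_zp = x + [0.0] * (N - len(x))          # zero-padded sequence

X_zp = dft(x_zp)

# Compare each DFT bin with the DTFT of the unpadded sequence at omega = 2*pi*k/N.
max_err = max(abs(X_zp[k] - dtft(x, 2 * cmath.pi * k / N)) for k in range(N))
print(max_err)                           # expected to be at rounding-error level
```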
Zero Padding Example: Suppose we have an FIR filter, whose impulse response sequence has a length of 15 samples. The impulse response is given by: h(n) is plotted on the next slide:
Zero Padding We can use the FFT to find the frequency response of this filter, but first we have to pad it with one zero to make its length 16. This is shown next, followed by the absolute value of the FFT result.
Zero Padding This looks like a lowpass filter, but we’d like better frequency resolution. Let’s zero-pad the impulse response to make its length 32. The new (zero-padded) impulse response, and the magnitude of the FFT, are shown next:
Zero Padding The improvement is obvious. Let’s try 512 samples:
Zero Padding Much better! Of course, a longer sequence means more multiplications to compute the FFT, so the process slows down. Notice that the zero-padded impulse response sequence is no longer symmetric about its center point, but the symmetry of the amplitude response is unchanged. Because of this symmetry, we need only plot the amplitude response from 0 to N/2:
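The symmetry of the amplitude response follows from the impulse response being real: |H(k)| = |H(N-k)|. The sketch below checks this for a hypothetical 15-tap triangular impulse response (not the filter from the slides, whose coefficients are not reproduced here), zero-padded to 32 samples:

```python
import cmath


def dft_magnitude(x):
    """Magnitude of the direct N-point DFT of x (a slow stand-in for an FFT)."""
    N = len(x)
    return [
        abs(sum(xn * cmath.exp(-2j * cmath.pi * k * n / N) for n, xn in enumerate(x)))
        for k in range(N)
    ]


# Hypothetical real, 15-tap impulse response: 1, 2, ..., 8, ..., 2, 1.
h = [float(min(n + 1, 15 - n)) for n in range(15)]

N = 32
h_zp = h + [0.0] * (N - len(h))          # no longer symmetric about its center
mag = dft_magnitude(h_zp)

# Bin k and bin N-k have the same magnitude for a real sequence.
symmetric = all(abs(mag[k] - mag[N - k]) < 1e-9 for k in range(1, N))
print(symmetric)

# So only bins 0 .. N/2 need to be plotted.
half = mag[: N // 2 + 1]
print(len(half))
```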
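DFT and Convolution Let f(n) and g(n) be sequences of length N. They have N-point DFTs, F(k) and G(k) respectively:
$$F(k) = \sum_{n=0}^{N-1} f(n)\, e^{-j 2\pi k n / N}, \qquad G(k) = \sum_{n=0}^{N-1} g(n)\, e^{-j 2\pi k n / N}$$
A new sequence Y(k) can be obtained by multiplying F(k) and G(k) point-by-point:
$$Y(k) = F(k)\, G(k), \qquad k = 0, 1, \ldots, N-1$$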
DFT and Convolution Then, the sequence y(n) can be obtained by computing the N-point inverse DFT of Y(k):
$$y(n) = \frac{1}{N} \sum_{k=0}^{N-1} Y(k)\, e^{j 2\pi k n / N}, \qquad n = 0, 1, \ldots, N-1$$
We obtained Y(k) by multiplying the DFTs of f(n) and g(n) together, then we got y(n) by computing the inverse DFT of Y(k). Remembering that time-domain convolution is equivalent to frequency-domain multiplication, we would expect that y(n) obtained in this way is the same as the sequence we would have gotten by convolving f(n) with g(n).
DFT and Convolution However, the sequence obtained by convolving f(n) with g(n) should be 2N-1 samples in length. Our computation of y(n) yielded a sequence of N samples. Therefore,
$$y(n) \neq f(n) * g(n)$$
What gives? It turns out that the process we used to compute y(n) is similar to convolution, but not quite the same. It’s called circular convolution.
DFT and Convolution We’ll use a different operator for circular convolution: Circular convolution can be described by the following equation:
$$y(n) = \sum_{m=0}^{N-1} f(m)\, g\big((n-m) \bmod N\big), \qquad n = 0, 1, \ldots, N-1$$
DFT and Convolution Here’s what “modulo” means in this context. If P and N are integers, we evaluate “P modulo N” by adding to or subtracting from P enough integer multiples of N so the resulting number is in the range [0, N-1]. For example,
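A minimal sketch of the modulo rule and of circular convolution, in Python. For a positive N, Python's `%` operator already returns a value in the range [0, N-1], even when P is negative. The function name `circular_convolve` and the sample values are illustrative only:

```python
def circular_convolve(f, g):
    """N-point circular convolution of two equal-length sequences."""
    N = len(f)
    if len(g) != N:
        raise ValueError("both sequences must have the same length N")
    # The index (n - m) is wrapped into [0, N-1] by the modulo operation.
    return [sum(f[m] * g[(n - m) % N] for m in range(N)) for n in range(N)]


# "P modulo N" for N = 8:
print(11 % 8)    # 11 - 8 = 3
print(-3 % 8)    # -3 + 8 = 5

f = [1, 2, 3, 4]
g = [1, 1, 0, 0]
print(circular_convolve(f, g))   # [5, 3, 5, 7]
```

Ordinary convolution of these two sequences gives 1, 3, 5, 7, 4 (followed by zeros). In the circular result the fifth sample, 4, wraps around and is added to the first, giving 5.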
DFT and Convolution To refresh your memory, here’s the formula for ordinary convolution:
$$y(n) = \sum_{m=-\infty}^{\infty} f(m)\, g(n-m)$$
It turns out that we can compute the regular convolution of two finite-duration sequences like f(n) and g(n) by multiplying their DFTs and taking the inverse DFT if we first zero pad both sequences.
DFT and Convolution Let x(n) be a sequence of length L (its last sample is at L-1) and let q(n) be a sequence of length P. Let
$$y(n) = x(n) * q(n)$$
The length of y(n) is L+P-1. We can compute y(n) using the DFT as follows:
• Zero pad x(n) and q(n) to a length of N samples, where N is at least the length of y(n) given above. If using the FFT, N must be an integer power of 2.
• Compute the N-point FFTs of x(n) and q(n). Denote these X(k) and Q(k), respectively.
DFT and Convolution
• Multiply X(k) and Q(k), point-by-point. Denote the result Y(k).
• Compute the inverse N-point FFT of Y(k) to obtain y(n).
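The procedure can be sketched as follows. To stay self-contained, the sketch uses a direct O(N²) DFT instead of an FFT, so it illustrates the steps rather than the speed. The function names and test sequences are illustrative assumptions:

```python
import cmath


def dft(x, inverse=False):
    """Direct N-point DFT or inverse DFT (a slow stand-in for an FFT)."""
    N = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]
    return [v / N for v in out] if inverse else out


def linear_convolve(x, q):
    """Ordinary (linear) convolution, used here only as a reference."""
    y = [0.0] * (len(x) + len(q) - 1)
    for i, xi in enumerate(x):
        for j, qj in enumerate(q):
            y[i + j] += xi * qj
    return y


def fast_convolve(x, q):
    """Linear convolution via zero padding, DFT, multiply, inverse DFT."""
    out_len = len(x) + len(q) - 1        # L + P - 1
    N = 1
    while N < out_len:                   # power of 2, as an FFT would require
        N *= 2
    X = dft(list(x) + [0.0] * (N - len(x)))
    Q = dft(list(q) + [0.0] * (N - len(q)))
    Y = [a * b for a, b in zip(X, Q)]    # point-by-point product
    y = dft(Y, inverse=True)
    return [v.real for v in y[:out_len]] # imaginary parts are rounding noise


x = [1.0, 2.0, 3.0, 4.0]                 # L = 4
q = [1.0, 1.0, 1.0]                      # P = 3
print(fast_convolve(x, q))
print(linear_convolve(x, q))
```

Because N is at least L+P-1, the wrap-around of circular convolution falls only on zeros, so the two results agree to within rounding error.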
DFT and Convolution If we wish to use fast convolution to filter a signal continuously streaming through a process (greater than N samples in length), it must be divided into subsequences of N samples which are processed individually and then reassembled. The techniques for reassembling them are called overlap-save and overlap-add. Cartinhour mentions these, but does not explain how they work or how they are used. I may present an example later in the semester, if time permits.
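For orientation only, here is a minimal sketch of the general overlap-add idea. It is not taken from Cartinhour or from the lecture. Each block is convolved with the filter separately, and the tail of each block's output is added into the region it shares with the next block. For brevity each block is convolved directly. In a fast-convolution implementation that per-block step would be the zero-padded FFT method instead. The names and the block length are illustrative assumptions:

```python
def linear_convolve(x, h):
    """Ordinary (linear) convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def overlap_add(x, h, block_len):
    """Filter x with h one block at a time, adding the overlapping tails."""
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block_len):
        block = x[start:start + block_len]
        yb = linear_convolve(block, h)   # block output is len(block)+len(h)-1 long
        for i, v in enumerate(yb):
            y[start + i] += v            # the last len(h)-1 samples overlap the next block
    return y


x = [float(n) for n in range(1, 11)]     # a 10-sample "stream"
h = [1.0, -1.0, 2.0]                     # a 3-tap filter
print(overlap_add(x, h, block_len=4))
print(linear_convolve(x, h))             # same result, computed in one pass
```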
A DSP Chip The Analog Devices ADSP-2181. We’re finally going to take a look at how a Digital Signal Processor chip works, and how to use it. First, though, let’s take a look at a general-purpose microprocessor, and why it’s not very suitable for DSP applications.
A DSP Chip Here’s the architecture of a first-generation von Neumann processor:
[Block diagram. Labeled blocks: Memory (Program and Data), Data register, Instruction register, TMP, Registers, ALU, Accumulator, Program Counter, Address Register, Control & Timing; connected by a Data Bus and an Address Bus.]
A DSP Chip
• Notice that the von Neumann processor has one memory space which is used for both program and data.
• It has one data bus and one address bus.
• To execute an instruction, do the following:
  • Fetch the instruction.
  • If an operand is needed, fetch it.
  • If a second operand is needed, fetch it.
  • Perform the operation.
  • Write the result to memory.
A DSP Chip This worst-case, two-operand instruction requires five cycles and must access memory four times. This limits throughput, which is very important in DSP. Using a single memory space for both program and data causes a bottleneck. One way of speeding things up is to separate the program memory from the data memory.
A DSP Chip [Block diagram. Labeled blocks: Computation Unit, Data Memory, Program Memory, Sequencer, DAG; connected by a Data Bus, an Instruction Bus, a data address bus, and a program address bus.]
A DSP Chip This is called Harvard architecture. An instruction and an operand may be fetched at the same time. A result can be written to Data Memory while the next instruction is being fetched from Program Memory. The DAG (Data Address Generator) calculates addresses, a job that would otherwise fall to the ALU. This reduces the load on the computation unit. The architecture of a DSP chip is optimized for executing an algorithm repeatedly, very fast.
A DSP Chip A top-level block diagram of the 2181 is shown on the next slide. Note that in addition to an ALU, it has a dedicated Multiplier/Accumulator (MAC) for multiplying data samples by weighting factors, and a dedicated shifter for scaling data to prevent overflow and underflow. It also has two Data Address Generators, which are useful for implementing circular buffers.
A DSP Chip Some example code – an FIR filter
/* ADSP-2181 FIR Filter Routine
   - serial port 0 used for I/O
   - internally generated serial clock
   - 40.000 MHz processor clock rate is divided to generate a 1.5385 MHz serial clock
   - serial clock divided to 8 kHz frame sampling rate */
#include <def2181.h>
#define taps 15
#define taps_less_one 14
.section/dm dm_data;
.var/circ data_buffer[taps];    /* dm data buffer */
A DSP Chip
/* The following lines set up a circular buffer in data memory,
   which is used as a delay line of samples */
.section/dm dm_data;
.var/circ data_buffer[taps];    /* dm data buffer */
/* This section sets up a circular buffer in program memory which
   will hold the filter coefficients. The data width is 24 bits.
   This buffer is loaded from the named file by the linker. */
.section/pm pm_data;
.var/circ/init24 coefficient[taps] = "coeff.dat";
A DSP Chip
/* The following lines of code set up the interrupt table. */
.section/pm Interrupts;
start:
    jump main; rti; rti; rti;       /* 0x0000: ~Reset vector */
    rti; rti; rti; rti;             /* 0x0004: ~IRQ2 */
    rti; rti; rti; rti;             /* 0x0008: ~IRQL1 */
    rti; rti; rti; rti;             /* 0x000c: ~IRQL0 */
    rti; rti; rti; rti;             /* 0x0010: SPORT0 Transmit */
    jump fir_start; rti; rti; rti;  /* 0x0014: SPORT0 Receive */
    rti; rti; rti; rti;             /* 0x0018: ~IRQE */
    rti; rti; rti; rti;             /* 0x001c: BDMA */
    rti; rti; rti; rti;             /* 0x0020: SPORT1 Transmit or ~IRQ1 */
    rti; rti; rti; rti;             /* 0x0024: SPORT1 Receive or ~IRQ0 */
    rti; rti; rti; rti;             /* 0x0028: Timer */
    rti; rti; rti; rti;             /* 0x002c: Power Down (non-maskable) */
A DSP Chip
/* This portion of the code sets up the DAG registers for the two
   circular buffers */
.section/pm pm_code;
main:
    l0 = length (data_buffer);      /* setup circular buffer length */
    l4 = length (coefficient);      /* setup circular buffer */
    m0 = 1;                         /* modify = 1 for increment */
    m4 = 1;                         /* through buffers */
    i0 = data_buffer;               /* point to start of buffer */
    i4 = coefficient;               /* point to start of buffer */
    ax0 = 0;
    cntr = length(data_buffer);     /* initialize loop counter */
    do clear until ce;
clear:
    dm(i0,m0) = ax0;                /* clear data buffer */
A DSP Chip
/* This slide and the next contain code which sets up the control
   registers. The following lines set up the internal serial clock,
   and the receive frame sync rate. */
    /* setup divide value for 8 kHz RFS */
    ax0 = 0x00c0;
    dm(Sport0_Rfsdiv) = ax0;
    /* 1.5385 MHz internal serial clock */
    ax0 = 0x000c;
    dm(Sport0_Sclkdiv) = ax0;
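The two divider values can be checked with a little arithmetic. The sketch below assumes the divider relations usually given for the ADSP-218x serial ports, SCLK = processor clock / (2 × (SCLKDIV + 1)) and frame rate = SCLK / (RFSDIV + 1). These formulas are not stated on the slides, so they should be confirmed against the hardware reference. The 40 MHz clock and the two register values are taken from the code above:

```python
PROCESSOR_CLOCK_HZ = 40_000_000   # 40.000 MHz, from the routine's header comment
SCLKDIV = 0x000C                  # value written to Sport0_Sclkdiv
RFSDIV = 0x00C0                   # value written to Sport0_Rfsdiv

# Assumed relations (verify against the ADSP-218x hardware reference):
sclk_hz = PROCESSOR_CLOCK_HZ / (2 * (SCLKDIV + 1))
frame_rate_hz = sclk_hz / (RFSDIV + 1)

print(round(sclk_hz / 1e6, 4))    # serial clock in MHz
print(round(frame_rate_hz, 1))    # frame sync rate in Hz
```

Under these assumptions the serial clock works out to about 1.5385 MHz, matching the comment in the code, and the frame rate to about 7971 Hz, which is the nominal 8 kHz sampling rate.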
A DSP Chip
/* more control register setup */
/* multichannel disabled, internally generated sclk,
   receive frame sync required, receive width = 0,
   transmit frame sync required, transmit width = 0,
   external transmit frame sync, internal receive frame sync,
   u-law companding, 8-bit words */
    ax0 = 0x69b7;
    dm(Sport0_Ctrl_Reg) = ax0;
    ax0 = 0x1000;                   /* enable sport0 */
    dm(Sys_Ctrl_Reg) = ax0;
    icntl = 0x00;                   /* disable interrupt nesting */
    imask = 0x0060;                 /* enable sport0 rx and tx interrupts only */
A DSP Chip
/* The processor will sit in this loop, until data is received
   from SPORT0 */
mainloop:
    idle;                           /* wait here for interrupt */
    jump mainloop;                  /* jump back to idle after rti */
A DSP Chip
fir_start:
    si = rx0;                       /* read from sport0 */
    dm(i0,m0) = si;                 /* transfer data to buffer */
    mr = 0, my0 = pm(i4,m4), mx0 = dm(i0,m0);   /* setup multiplier for loop */
    cntr = taps_less_one;           /* perform loop taps-1 times */
    do convolution until ce;
convolution:
    mr = mr + mx0 * my0 (ss), my0 = pm(i4,m4), mx0 = dm(i0,m0);
                                    /* perform MAC and fetch next values */
    mr = mr + mx0 * my0 (rnd);      /* Nth pass of loop with rounding of result */
    if mv sat mr;
    tx0 = mr1;                      /* write result to sport0 tx */
    rti;                            /* return from interrupt */
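To make the data flow of the interrupt routine easier to follow, here is a rough Python model of it. This is a sketch under stated assumptions, not a bit-accurate simulation. It uses floating point instead of the chip's fixed-point arithmetic, omits rounding and saturation, and replaces the DAG's circular addressing with an explicit modulo. The class name `CircularFir` and the three-tap coefficients are hypothetical:

```python
class CircularFir:
    """Floating-point model of an FIR filter that uses a circular delay line."""

    def __init__(self, coefficients):
        self.coeffs = list(coefficients)       # plays the role of the pm coefficient buffer
        self.taps = len(self.coeffs)
        self.buffer = [0.0] * self.taps        # plays the role of data_buffer, cleared at startup
        self.index = 0                         # plays the role of i0

    def process(self, sample):
        """Handle one input sample, as the SPORT0 receive interrupt would."""
        # Store the new sample over the oldest one, then post-increment.
        self.buffer[self.index] = sample
        self.index = (self.index + 1) % self.taps   # now points at the oldest sample

        # Multiply-accumulate over all taps, oldest sample first.
        acc = 0.0
        for c in self.coeffs:
            acc += c * self.buffer[self.index]
            self.index = (self.index + 1) % self.taps

        # After `taps` increments the index is back at the oldest sample,
        # which is where the next input will be written.
        return acc


fir = CircularFir([1.0, 2.0, 3.0])             # hypothetical coefficients
print([fir.process(s) for s in [1.0, 0.0, 0.0, 0.0]])   # [3.0, 2.0, 1.0, 0.0]
```

In this model the first coefficient multiplies the oldest sample, so an impulse produces the coefficients in reverse order. For a filter with symmetric coefficients the ordering makes no difference.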
A DSP Chip Highly Recommended: Read Chapter 2 of the ADSP-218x DSP Instruction Set Reference