EE 445S Real-Time Digital Signal Processing Lab Fall 2011

Lab #3.1: Digital Filters
Debarati Kundu (with the help of Mr. Eric Wilbur, TI)

Outline
• Discrete-Time Convolution
• FIR Filter Design
• Convolution Using Circular Buffer
• FIR Filter Implementation
• FIR Filter Block Processing
• Code Optimization

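Discrete-Time Convolution
• Represented by the following equation:
  $y[n] = x[n] * h[n] = \sum_{m=-\infty}^{\infty} x[m]\, h[n-m] = \sum_{m=-\infty}^{\infty} h[m]\, x[n-m]$
• Filter implementations will use the second version (hold h[n] in place and flip-and-slide x[n] about h[n])
• Z-transform of convolution: $Y(z) = H(z)\, X(z)$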
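The second form of the convolution sum can be sketched directly in C for finite-length sequences. This is only an illustrative sketch: the function name convolve_direct and its argument layout are assumptions made here, not the lab's convolve function, and samples outside the input array are treated as zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the lab code).
 * Computes y[n] = sum_{k=0}^{nh-1} h[k] * x[n-k] for n = 0 .. nx+nh-2,
 * treating x[m] as zero when m is outside 0 .. nx-1.
 * The caller must provide room for nx + nh - 1 output samples in y. */
void convolve_direct(const float *x, size_t nx,
                     const float *h, size_t nh,
                     float *y)
{
    assert(nx > 0 && nh > 0);
    for (size_t n = 0; n < nx + nh - 1; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < nh; k++) {
            /* Only accumulate when the index n-k lands inside x. */
            if (n >= k && n - k < nx)
                acc += h[k] * x[n - k];
        }
        y[n] = acc;
    }
}
```

For example, with x = {1, 2, 3} and h = {1, 1}, the four output samples are 1, 3, 5, 3.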

Discrete-Time Sinusoidal Response
• Input two-sided complex sinusoid: $x[n] = e^{j \omega n}$
• LTI system has impulse response h[n]
• Output $y[n] = x[n] * h[n] = H(\omega)\, e^{j \omega n}$
• $H(\omega)$ is the frequency response of the LTI system
• Filters are stable, so $H(\omega) = H(z)|_{z = e^{j \omega}}$
• Multiplying by $H(\omega) = A(\omega)\, e^{j \theta(\omega)}$ causes change in magnitude by $A(\omega)$ and change in phase by $\theta(\omega)$

FIR Filters Design & Implementation
• An FIR filter does discrete-time convolution
• $z^{-1}$ indicates delay elements and hence we need a buffer
• We shall implement FIR filters using circular buffers

FIR Filters Design & Implementation
• Implementation
• Use the Filter Design & Analysis Tool (fdatool) to get the coefficients.
• Specifications are given in the task list.
• Use the convolve function (explained in subsequent slides) to implement the FIR filter, given the coefficients from fdatool.

Convolution Using Circular Buffer
• Always choose the size of the circular buffer to be larger than N.
• Make sure that the size of the circular buffer is a power of 2.
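One common reason for a power-of-2 buffer size is that index wrap-around can then be done with a bitwise AND instead of a comparison or the % operator. The sketch below illustrates that idea; BUF_SIZE, circ_buf_t, circ_init, circ_put, and circ_fir are hypothetical names introduced here and are not part of the lab code.

```c
#include <assert.h>

#define BUF_SIZE 8u                 /* assumed size; must be a power of 2 */
#define BUF_MASK (BUF_SIZE - 1u)    /* low bits select a valid index */

typedef struct {
    float    data[BUF_SIZE];
    unsigned newest;                /* index of the most recent sample */
} circ_buf_t;

void circ_init(circ_buf_t *cb)
{
    /* The mask trick only works when BUF_SIZE is a power of 2. */
    assert((BUF_SIZE & BUF_MASK) == 0u);
    for (unsigned i = 0; i < BUF_SIZE; i++)
        cb->data[i] = 0.0f;
    cb->newest = 0u;
}

void circ_put(circ_buf_t *cb, float sample)
{
    /* Circular increment: the AND wraps BUF_SIZE back to 0. */
    cb->newest = (cb->newest + 1u) & BUF_MASK;
    cb->data[cb->newest] = sample;
}

float circ_fir(const circ_buf_t *cb, const float *h, unsigned nh)
{
    assert(nh <= BUF_SIZE);
    float y = 0.0f;
    unsigned idx = cb->newest;
    for (unsigned k = 0; k < nh; k++) {
        y += h[k] * cb->data[idx];
        /* Circular decrement: unsigned wrap plus mask maps 0 - 1 to BUF_SIZE - 1. */
        idx = (idx - 1u) & BUF_MASK;
    }
    return y;
}
```

A typical use is to call circ_put once per incoming sample and then circ_fir to get the corresponding output sample.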

Convolution Using Circular Buffer

    main()
    {
        int x_index = 0;
        float y, xcirc[N];
        ...
        /* circularly increment newest (no % operator needed) */
        ++newest;
        if (newest == N) newest = 0;

        /* put new sample in delay line */
        xcirc[newest] = newsample;

        /* do convolution sum, starting from the newest sample */
        y = 0;
        x_index = newest;
        for (k = 0; k < No_of_coeff; k++) {
            y += h[k] * xcirc[x_index];
            /* circularly decrement x_index */
            --x_index;
            if (x_index == -1) x_index = N - 1;
        }
        ...
    }

Block Processing using Ping-Pong Buffer
[Figure: PING buffer (rcvPingL.hist, rcvPingL.data) and PONG buffer (rcvPongL.hist, rcvPongL.data), each shown as a hist region followed by a data region]
• This lab uses a double-buffered (PING/PONG), channel-sorted (L/R) buffering scheme.
• An FIR algorithm requires “history” to be preserved over calls to the algorithm.
• FIR_process() must first copy the history, then process the data.
• Processing of the last data block (PONG) starts from the top of hist and runs down through data for DATA_SIZE items.
• This leaves the last ORDER-1 data items NOT processed.
• Therefore, the user must copy the history of the last processed buffer (PONG) to the new buffer (PING), then filter.
• Repeat the process…
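The copy-history-then-filter sequence can be sketched as follows. This is a simplified illustration, not the lab's FIR_process(): the single-array layout, the type fir_buf_t, the function fir_block_process, and the values chosen for ORDER and DATA_SIZE are all assumptions, with ORDER taken to mean the number of coefficients so that ORDER-1 history samples are needed.

```c
#include <assert.h>
#include <string.h>

#define ORDER     4                 /* assumed: number of filter coefficients */
#define DATA_SIZE 8                 /* assumed: samples per block */
#define HIST_SIZE (ORDER - 1)       /* samples carried over between blocks */

/* One buffer (PING or PONG): history region followed by data region. */
typedef struct {
    float s[HIST_SIZE + DATA_SIZE]; /* [ hist | data ] */
} fir_buf_t;

/* Hypothetical block FIR: cur->s + HIST_SIZE must already hold the new data. */
void fir_block_process(const fir_buf_t *prev, fir_buf_t *cur,
                       const float *h, float *out)
{
    assert(prev != cur);

    /* Step 1: copy the last HIST_SIZE samples of the previous buffer
     * into the history region of the current buffer. */
    memcpy(cur->s, prev->s + DATA_SIZE, HIST_SIZE * sizeof(float));

    /* Step 2: filter. Output j uses data sample j and the ORDER-1
     * samples before it, which may reach back into the history. */
    for (int j = 0; j < DATA_SIZE; j++) {
        float sum = 0.0f;
        for (int i = 0; i < ORDER; i++)
            sum += h[i] * cur->s[HIST_SIZE + j - i];
        out[j] = sum;
    }
}
```

Calling this alternately with (PONG, PING) and (PING, PONG) as (prev, cur) should give the same output as filtering the whole stream sample by sample, because each block starts with the tail of the one before it.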

[Figure: system block diagram. Labels: AIC3106 Audio Codec (ADC, DAC), McASP, HWI (isrAudio), rcvBufs, xmtBufs, SEM_post(), TSK (FIR or COPY), SW8, LED, PRD1 and PRD2 (100 ms, 500 ms), CLK, SR11, SR12]

Code Optimization
$Y = \sum_{i=1}^{count} coeff_i \cdot x_i$
Goals:
• A typical goal of any system’s algorithm is to meet real-time
• You might also want to approach or achieve “CPU Min” in order to maximize the number of channels processed
CPU Min (the “limit”):
• The minimum number of cycles the algorithm takes based on architectural limits (e.g. data size, number of loads, math operations required)
Real-time vs. CPU Min
• Often, meeting real-time only requires setting a few compiler options
• However, achieving “CPU Min” often requires extensive knowledge of the architecture (harder, requires more time)

“Debug” vs “Optimized” Benchmarks

    for (j = 0; j < nr; j++) {
        sum = 0;
        for (i = 0; i < nh; i++)
            sum += x[i + j] * h[i];
        r[j] = sum >> 15;
    }

• Debug: get your code LOGICALLY correct first (no optimization)
• “Opt”: increase performance using compiler options (easier)
• “CPU Min”: it depends. Could require extensive time
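The shift by 15 in the benchmark loop suggests Q15 fixed-point data: the product of two Q15 values is Q30, so shifting the accumulated sum right by 15 brings it back to Q15. The sketch below wraps the loop in a function with explicit types to make that visible. The name fir_q15 and the choice of int16_t/int32_t are assumptions; note also that right-shifting a negative signed value is implementation-defined in C (an arithmetic shift on most compilers).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical typed version of the benchmark loop.
 * x must hold at least nr + nh - 1 samples; h holds nh coefficients;
 * r receives nr outputs. All values are assumed to be Q15. */
void fir_q15(const int16_t *x, const int16_t *h, int16_t *r, int nh, int nr)
{
    assert(nh > 0 && nr > 0);
    for (int j = 0; j < nr; j++) {
        int32_t sum = 0;                      /* Q30 accumulator */
        for (int i = 0; i < nh; i++)
            sum += (int32_t)x[i + j] * h[i];  /* Q15 * Q15 = Q30 */
        r[j] = (int16_t)(sum >> 15);          /* back to Q15 */
    }
}
```

For example, with both coefficients equal to 16384 (0.5 in Q15) and x = {100, 200, 300}, the two outputs are 150 and 250, the averages of adjacent input pairs.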

“Debug” (-g, NO opt): Get Code Logically Correct
• Provides the best “debug” environment with full symbolic support, no “code motion”, easy to single step
• Code is NOT optimized, i.e. very poor performance
• Create test vectors on FUNCTION boundaries (use same vectors as Opt Env)
“Release” (-o3, -g): Increase Performance
• Higher levels of optimization result in code motion; functions become “black boxes” (hence the use of FXN vectors)
• Optimizer can find “errors” in your code (use volatile)
• Highly optimized code (can reach “CPU Min” with some algorithms)
• Each level of optimization increases the optimizer’s “scope”…

Levels of Optimization
Option      Scope      Optimizes
-o0, -o1    LOCAL      single block
-o2         FUNCTION   across blocks
-o3         FILE       across functions
-pm -o3     PROGRAM    across files (e.g. FILE1.C, FILE2.C)

DSPLIB
• Optimized DSP Function Library for C programmers using C62x/C67x and C64x devices
• These routines are typically used in computationally intensive real-time applications where optimal execution speed is critical.
• By using these routines, you can achieve execution speeds considerably faster than equivalent code written in standard ANSI C. These ready-to-use functions can also significantly shorten your development time.
• The DSP library features:
  • C-callable
  • Hand-coded assembly-optimized
  • Tested against C model and existing run-time-support functions
