DSP—Why So Hard? November 2010
Who ? • Peter.Eastty@Oxford-Digital.com • Design and sell processor cores and matching programming environments. • Program strange algorithms onto stranger processors with the strangest tools. • Customers • NDAs • Client lists. • You
Why ? • People expensive, Silicon cheap. • People slow, Silicon fast. • People slow, Computers fast. • Programmer efficiency is everything, it gets you time to market which gets you the market.
Why ? • Targets move, right up to the last minute. • Never, ever build a fixed function device. • Three stories. • Third party algorithms will have to be adopted whether public domain or highly secret.
What is an audio signal? • Known bandwidth • Known resolution • Known number of channels • So why don’t I enumerate them?
Where ? • Large mixing consoles • Cell Phones • Hi Fi • TVs • iPod docks • PCs • etc.
What processing do we want to do to the audio? • Continuously varying value against time • Filtering • Polynomial • Non-linear • Decisions • DSP = Low Delay • Not block structured
What hardware resources are available? • Memory • Multipliers • Connections • Adders etc.
Data types for DSP • See RBJ on headroom and floating point • Want a fixed point data type. • Computer word lengths (16, 32 etc.) have nothing to do with audio; set up the word length to suit the audio, not the computer.
Languages for DSP • C is not a DSP language, the data types are all wrong and it has no concept of time. • C++ could be a DSP language but it doesn’t want to be one, it too has no concept of time.
Languages for DSP • With modern hardware design and compiler technology there is never a need for assembler. NEVER EVER. • Of course if you’re tied to old hardware for legacy code reasons you still might have to hack in assembler.
Languages for DSP main (…) { ASM(…); ASM(…); ASM(…); } /* This is not C. */
Languages for DSP main (…) { Clear_Acc(); MAC(…); Store_Acc_to_Register(…); } /* This isn’t either. */
Languages for DSP main (…) { Multiply_by_Coefficient(); Biquad(…); Do_FFT(…); } /* Neither is this. */
Languages for DSP • Beware of ‘optional extensions’. • They can become mandatory. • There is still at least one University teaching DSP using FORTRAN and assembler … • ...sad to say they apologized about the FORTRAN.
Languages for DSP • I don’t know the perfect DSP language. • But any high level language is better than any machine specific language.
Multiple Memory Banks • If there are multiple memories then memory allocation is NOT the programmer’s job; the tool-chain should do this for you. • But it might be nice to be able to do some if you want to.
Multiply-Add • Source level individual operations (add, multiply etc.) should be independent, hardware instructions can combine multiple operations (like Multiply-Add). • Make sure the operations in a combined instruction are exactly the same as those in individual instructions.
Limiting • Whatever number system you use it will have a range, even floating point. • Limiting will be required after every operation that can exceed the range, multiply, add, subtract and absolute value. • This includes the multiply in a multiply-add. • -1 x -1 = -1 ????????
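A minimal sketch of the -1 x -1 case, assuming a 16-bit Q15 format (the slides do not name a word length) and a compiler that shifts negative values arithmetically. The type and function names are hypothetical. Without limiting, the product of the two most negative values wraps back to -1; with limiting it saturates just below +1.

```c
#include <stdint.h>

/* Q15 fixed point: value = raw / 32768, range [-1.0, +1.0 - 2^-15].
 * Assumption: 16-bit words, chosen only for illustration. */
typedef int16_t q15_t;

/* Clamp a wide intermediate result into the Q15 range. */
static q15_t q15_limit(int32_t wide)
{
    if (wide > INT16_MAX) return INT16_MAX;
    if (wide < INT16_MIN) return INT16_MIN;
    return (q15_t)wide;
}

/* Multiply with no limiting: only the low 16 bits of the result survive,
 * so -1 x -1 (raw 32768) wraps round to -1 (raw -32768).
 * Assumes >> on a negative value is an arithmetic shift. */
static q15_t q15_mul_wrap(q15_t a, q15_t b)
{
    int32_t product = ((int32_t)a * (int32_t)b) >> 15;
    uint16_t bits = (uint16_t)product;
    return (q15_t)(bits >= 0x8000u ? (int32_t)bits - 65536 : (int32_t)bits);
}

/* Multiply with limiting: -1 x -1 saturates to the largest positive value. */
static q15_t q15_mul_sat(q15_t a, q15_t b)
{
    return q15_limit(((int32_t)a * (int32_t)b) >> 15);
}

/* Add with limiting, so a sum that exceeds the range clips instead of wrapping. */
static q15_t q15_add_sat(q15_t a, q15_t b)
{
    return q15_limit((int32_t)a + (int32_t)b);
}
```

The same limiter would sit after the multiply inside a multiply-add, not only after the final add.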
Pipelines • User should never have to think about pipelines. • Variable pipelines are wrong. • Pipeline is not a panacea for timing problems, it limits the processing in a loop. • Pushing code through a branch. • Using the pipe for parameter passing.
Pipelines • Definition of pipeline length: the count between the instruction that generates an item and an instruction that may use it. • Short circuiting the pipe: useful, but not very useful. • Can unwind the execution by having the pipeline length relatively prime to the instruction count, but this adds to delay, which in turn adds to the storage requirement.
Branching • If you can find another way, avoid branches. • If you have to have jumps and a pipeline, keep it all away from the programmer. • If you do have jumps they’ll likely break the guaranteed timing.
Conditional Execution • Conditional execution doesn’t break pipeline etc. • But you’ll need as many condition code stores as you have pipeline length. • Timing is identical for conditional execution and multiplexer.
Multiplexer • y = (a < 0.0) ? b : c; • Timing is identical for conditional execution and multiplexer. • With multiplexer you can use any variable as a control so no condition code store is required. • y = (a <= 0.0) ? b : c;
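A sketch of the multiplexer written as plain arithmetic, assuming 32-bit integer samples; the function names are hypothetical. The control is just a value derived from `a`, so nothing resembling a condition code has to be stored, and the same instructions run whichever input is selected.

```c
#include <stdint.h>

/* y = (a < 0) ? b : c, computed without a branch.
 * int32_t is two's complement, so -1 is all ones and can act as a bit mask. */
static int32_t mux_negative(int32_t a, int32_t b, int32_t c)
{
    int32_t mask = -(int32_t)(a < 0);   /* -1 when a is negative, else 0 */
    return (b & mask) | (c & ~mask);
}

/* y = (a <= 0) ? b : c, the second form on the slide. */
static int32_t mux_nonpositive(int32_t a, int32_t b, int32_t c)
{
    int32_t mask = -(int32_t)(a <= 0);  /* -1 when a is zero or negative */
    return (b & mask) | (c & ~mask);
}
```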
ABS • For simple bends in an input/output relationship, Absolute Value plus some Addition and Subtraction is more economical than most other methods.
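Two examples of a bend built from Absolute Value plus addition and subtraction, as a sketch only. The function names are hypothetical, the threshold `t` is assumed non-negative, and the final halving would be a one-bit shift or a coefficient of 0.5 on real hardware. `abs64` stands in for a hardware ABS instruction.

```c
#include <stdint.h>

/* Stand-in for a hardware absolute-value instruction. */
static int64_t abs64(int64_t v)
{
    return v < 0 ? -v : v;
}

/* Half-wave bend at zero: max(x, 0) = (x + |x|) / 2. */
static int32_t half_wave(int32_t x)
{
    return (int32_t)(((int64_t)x + abs64(x)) / 2);
}

/* Symmetric clip to [-t, +t]: (|x + t| - |x - t|) / 2.
 * The two absolute values differ by an even amount, so the division is exact. */
static int32_t clip_sym(int32_t x, int32_t t)
{
    int64_t upper = abs64((int64_t)x + t);
    int64_t lower = abs64((int64_t)x - t);
    return (int32_t)((upper - lower) / 2);
}
```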
Truncation, Rounding, Dither and Noise Shaping • For every instruction that needs it … … and just for Output • Assume fixed point • Floating point is hard
Truncation, Rounding and Truncation Towards Zero! • Truncation is easy but has DC offset • Truncation Towards Zero! • ½ LSB offset number systems • Rounding wins and is not much more complex.
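The three options can be compared with a small sketch, assuming 8 bits are dropped from a 32-bit intermediate and that the compiler shifts negative values arithmetically. All names are hypothetical. Averaged over a symmetric ramp, plain truncation shows a mean error close to half an output LSB, while rounding costs one extra add and leaves a mean error close to zero.

```c
#include <stdint.h>

/* Drop 8 low bits by truncation (rounds towards minus infinity).
 * Assumes >> on a negative value is an arithmetic shift. */
static int32_t requant_truncate(int32_t x)
{
    return x >> 8;
}

/* Drop 8 low bits by truncation towards zero: no DC offset for a symmetric
 * signal, but every input in (-256, 256) maps to zero. */
static int32_t requant_toward_zero(int32_t x)
{
    return x / 256;
}

/* Drop 8 low bits with rounding: add half an output LSB, then truncate. */
static int32_t requant_round(int32_t x)
{
    return (x + 128) >> 8;
}

/* Mean error, in output LSBs, over the ramp -4096 .. 4095. */
static double mean_error_lsb(int32_t (*requant)(int32_t))
{
    double sum = 0.0;
    for (int32_t x = -4096; x < 4096; ++x) {
        sum += (double)requant(x) - x / 256.0;
    }
    return sum / 8192.0;
}
```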
Dither • How do we make it? • Truly random, pseudo random, hash? • What colour do we want it to be? • What PDF do we want? • Make sure it’s un-correlated. • Want repeatability for test. • Problems with infinite gain components. • Rounding wins.
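One possible answer to "how do we make it", as a sketch rather than a recommendation: a seeded pseudo-random generator gives repeatability for test, and the difference of two uniform values gives a triangular PDF. The generator here is a plain 32-bit linear congruential generator using the well-known Numerical Recipes constants; the struct, the function names and the choice of dropping 8 bits are assumptions made for illustration.

```c
#include <stdint.h>

/* Seeded generator state: the same seed always gives the same dither. */
typedef struct {
    uint32_t state;
} dither_gen;

/* 32-bit linear congruential generator; unsigned arithmetic wraps by design. */
static uint32_t lcg_next(dither_gen *g)
{
    g->state = g->state * 1664525u + 1013904223u;
    return g->state;
}

/* Triangular-PDF dither in [-255, 255], in units of 1/256 of an output LSB.
 * Built from the top 8 bits of two successive generator outputs. */
static int32_t tpdf_dither(dither_gen *g)
{
    int32_t u1 = (int32_t)(lcg_next(g) >> 24);
    int32_t u2 = (int32_t)(lcg_next(g) >> 24);
    return u1 - u2;
}

/* Add dither, then round while dropping 8 bits.
 * Assumes >> on a negative value is an arithmetic shift. */
static int32_t requant_dithered(int32_t x, dither_gen *g)
{
    return (x + 128 + tpdf_dither(g)) >> 8;
}
```

Each channel or instruction that needs un-correlated dither would need its own generator state, or a different seed.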
Noise shaping • What shape? • What order? • Want repeatability for test. • Problems with infinite gain components. • Rounding wins. • Make sure your instruction set can do dither and noise shaping.
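A first-order error-feedback quantiser is about the simplest shape and order one could pick, and is sketched here only to make the questions concrete. The names are hypothetical and 8 bits are assumed to be dropped. The quantisation error of each sample is subtracted from the next input, so the error spectrum is shaped by (1 - z^-1) and a DC level below one output LSB survives on average, where plain rounding would lose it.

```c
#include <stdint.h>

/* State for a first-order noise shaper: the previous quantisation error,
 * measured in input LSBs. */
typedef struct {
    int32_t error;
} shaper1;

/* Requantise x by dropping 8 bits, feeding back the previous error.
 * Assumes >> on a negative value is an arithmetic shift. */
static int32_t shape_requant(shaper1 *s, int32_t x)
{
    int32_t v = x - s->error;        /* input corrected by the last error */
    int32_t y = (v + 128) >> 8;      /* round to the output word length */
    s->error = y * 256 - v;          /* error introduced on this sample */
    return y;
}
```

With a constant input of 100 (about 0.39 of an output LSB), 256 output samples sum to exactly 100, whereas plain rounding returns zero every time. Nothing is random here, so the output is repeatable for test.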
Coefficient Interpolation • Coefficients as a sampled system • SRC called interpolation • HW or SW, 2-3 instructions to feed one. • Only in exceptional circumstances is it worth a hardware solution. • Linear is possible, first order filter is easy and works for many applications.
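The first order filter case might look like the sketch below: one subtract, one shift and one add per coefficient per sample, consistent with the 2-3 instructions quoted above. The function name and the shift-based time constant are assumptions, coefficients are assumed small enough that the difference fits in 32 bits, and negative values are assumed to shift arithmetically.

```c
#include <stdint.h>

/* Move a coefficient one step towards its target with a first-order filter:
 * current += (target - current) / 2^shift.
 * A larger shift gives a slower, smoother glide. */
static int32_t coeff_smooth(int32_t current, int32_t target, int shift)
{
    return current + ((target - current) >> shift);
}
```

Run once per sample, this treats the coefficient as a signal sampled at the audio rate. One property of this simple form: when approaching from below it stops up to 2^shift - 1 short of the target, so a real design would either carry extra fraction bits or snap to the target at the end.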
Coefficient Synchronization • Coefficient synchronisation. • Lots of people ignore it or treat it on a per use basis. • Can be done for linear or first order filters with ease. • This is really a synchronous sampling problem.
Coefficient Synchronization, Synchronization Jitter
Scaling, multiprocessors, synchronisation & segmentation • Not all solutions fit in a single processor. • Automatic segmentation of programs across multiple processors is possible. • But it is hard. • If the processors are not identical, and identically connected it’s very, very hard.
Scaling, multiprocessors, synchronisation & segmentation • If you have multiple processors and no branches then you can run them in lockstep, many examples. • For data transfer between processors simply send from one processor and receive by the other at the same time. • Disastrous for assembler, easy for compiler.
Scaling, multiprocessors, synchronisation & segmentation • How do you connect multiple processors, series or parallel? • If you choose either then you can’t do some algorithms. Use mesh or router instead. • Small routers are actually cheap and relatively easy to generate code for. • Multiple processors I/O, dedicated processor connections or is I/O a full member of the clan?
Constant folding and common code removal. • Easy in a compiler, often missed by an assembly language programmer. • Keep everything as source until the last possible moment. • That way common parts can be taken advantage of, constants, but more importantly data and instructions. • Leads to documentation of library functions requiring “at most X data memories and Y instructions”.
Libraries • Binary libraries don’t work well with pipelined processors, the cost of getting into or out of them is usually too great. • A binary library (like a dll) is NOT a secure method of distributing intellectual property. • Encrypted source going through a trusted tool-chain to generate encrypted binaries is the way to go.
Hardware with problems…. • Let’s just have one continuous data type (and maybe one integer type). • Different widths for different memories makes horrible problems. • Private instruction sets and ‘Useful’ instructions.
Hardware with problems…. • Do not chisel a digital analogue of an analogue circuit out of digits. • Sample rate to silicon clock ratio
Hardware with problems…. • Bi-quad coefficient ranges. • Feedback coefficient ranges need to be big enough. • Feed-forward coefficient ranges are not limited, they can get big. If there’s nowhere in your system to make gain, you’re in trouble.
Hardware with problems…. • The accumulator is dead. When hardware was expensive and DSP engineers were cheap it made sense to get performance this way, but that is no longer true. • Most of today’s algorithms aren’t sums of products anyway. • And it makes a high level description difficult.
Hardware with problems…. • Double precision is probably not the right thing for LF filters. • Choosing the right filter structure and adding a few bits is a financially better solution.
Hardware with problems…. • If you must have an accumulator make sure you can load and store it!
Hardware with problems…. • Shifting is required to get gain into the system. • There are few reasons for a shift of greater than 2^7 and very few for more than 2^15. • Shift after the multiplier, it’s the only place where there are the bits to shift. • Shift in the wrong place is common.
Hardware with problems…. • If a standard 5 coefficient bi-quad takes more than 5 instructions there’s something wrong. • A simple Z-1 delay, and cascades thereof should not consume instructions. • Simple rotating memory, and language support.
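For reference, the arithmetic of a standard 5 coefficient bi-quad is five multiplies summed together, which is where the figure of 5 instructions comes from. The sketch below is an assumption-laden C rendering, not a statement about any particular processor: samples are taken to be 24-bit values held in 32-bit words, coefficients are Q2.30 so that feedback terms can reach nearly ±2, the shift comes after the multiplies, and the output is rounded and limited. All names are hypothetical. In C the two Z-1 delays cost explicit moves; hardware with rotating memory would not spend instructions on them.

```c
#include <stdint.h>

#define COEFF_FRAC_BITS 30          /* Q2.30 coefficients: range [-2, +2) */
#define SAMPLE_MAX  8388607         /* 24-bit sample range (assumption) */
#define SAMPLE_MIN (-8388608)

typedef struct {
    int32_t b0, b1, b2;             /* feed-forward coefficients */
    int32_t a1, a2;                 /* feedback coefficients */
    int32_t x1, x2;                 /* previous two inputs */
    int32_t y1, y2;                 /* previous two outputs */
} biquad;

/* y = b0*x + b1*x1 + b2*x2 - a1*y1 - a2*y2 (direct form I).
 * Assumes >> on a negative value is an arithmetic shift. */
static int32_t biquad_step(biquad *f, int32_t x)
{
    /* Five multiply-accumulates into a wide sum. */
    int64_t acc = (int64_t)f->b0 * x;
    acc += (int64_t)f->b1 * f->x1;
    acc += (int64_t)f->b2 * f->x2;
    acc -= (int64_t)f->a1 * f->y1;
    acc -= (int64_t)f->a2 * f->y2;

    /* Round and shift after the multiplies, then limit. */
    int64_t y = (acc + ((int64_t)1 << (COEFF_FRAC_BITS - 1))) >> COEFF_FRAC_BITS;
    if (y > SAMPLE_MAX) y = SAMPLE_MAX;
    if (y < SAMPLE_MIN) y = SAMPLE_MIN;

    /* Z-1 delays, explicit here. */
    f->x2 = f->x1;
    f->x1 = x;
    f->y2 = f->y1;
    f->y1 = (int32_t)y;
    return (int32_t)y;
}
```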
Hardware with problems…. • A pipeline needs to be started cleanly. • This is not always easy.
Debugging • Source level debugging is perfectly standard in almost every general purpose processor toolset, why is it missing from DSP toolsets?
Debugging • If you do add a debugger, remember that the objects you are processing are signals, thus they vary with time. • A numerical display of a signal is generally useless, like using a DVM to analyse audio, necessary but not sufficient. • Provide a scope and signal generator.
Debugging • Debugging Input or Output is a signal. • Easiest done by the instruction NOT the location.
How do we make DSP easier? • Get the algorithm away from the hardware • Use DSP that is compiler compatible