
Ch. 11 Digital Signal Processing Using General-Purpose Processors



Presentation Transcript


  1. Ch. 11 Digital Signal Processing Using General-Purpose Processors Kathy Grimes

  2. Signals • Types of signals • Electrical • Mechanical • Acoustic • Most real-world signals are analog: they vary continuously over time • Analog has many limitations • Repeatability • Tolerances • Difficulty storing information or implementing certain operations • This leads us to DSP…

  3. Digital Signal Processing (DSP) • Represent signals by sequences of numbers • Pros • Repeatable • Accuracy can be controlled • Time-varying operations are easier to implement • Cons • Sampling causes loss of information • Round-off errors • Requires A/D and D/A mixed-signal hardware

  4. Digital Signal Processing (DSP) • Analog-to-digital converter • Converts a continuous-time signal to a discrete-time signal • Figure 11.1 shows the sampling of a signal • Common signals • Step discontinuity (Figure 11.2) • Impulse (Figure 11.3) • FIGURE 11.1 Discrete Time Signals. FIGURE 11.2 Step Function. FIGURE 11.3 Impulse Function.
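The step and impulse signals named on this slide can be written as discrete-time sequences. The following is a minimal sketch in C, not taken from the presentation; the function names `unit_impulse`, `unit_step`, and `fill_sequence` are hypothetical, and unit amplitude is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Unit impulse: 1 at n == 0, 0 everywhere else. */
int unit_impulse(int n)
{
    return n == 0 ? 1 : 0;
}

/* Unit step: 0 for n < 0, 1 for n >= 0. */
int unit_step(int n)
{
    return n >= 0 ? 1 : 0;
}

/* Fill out[0..len-1] with signal(start), signal(start + 1), ...
 * (hypothetical helper, used only to tabulate a sequence). */
void fill_sequence(int (*signal)(int), int start, int *out, size_t len)
{
    assert(signal != NULL && out != NULL);
    for (size_t i = 0; i < len; ++i)
        out[i] = signal(start + (int)i);
}
```

Calling `fill_sequence(unit_step, -2, buf, 5)` tabulates the step for n = -2 to 2, which makes the discontinuity at n = 0 visible in the output array.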

  5. DSP Building Blocks • Based on three basic functions: • Delay • Add • Multiply • The raw performance of a DSP algorithm is usually measured by the number of operations needed to execute it • FIGURE 11.4 Add Function. FIGURE 11.5 Multiply Function. FIGURE 11.6 Delay Function.

  6. DSP Building Blocks • Feedforward and feedback systems, used in combination, can implement any discrete difference equation • FIGURE 11.7 Feedforward System. FIGURE 11.8 Feedback System.
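As an illustration of how delay, add, and multiply combine, here is a minimal sketch of one first-order feedforward system and one first-order feedback system. The specific equations, the coefficient names `b0`, `b1`, `a1`, and the zero initial state are assumptions chosen for the example; they are not read from Figures 11.7 and 11.8.

```c
#include <assert.h>
#include <stddef.h>

/* Feedforward: y[n] = b0*x[n] + b1*x[n-1], with x[-1] assumed to be 0. */
void feedforward(const double *x, double *y, size_t len, double b0, double b1)
{
    assert(x != NULL && y != NULL);
    double delayed = 0.0;                 /* delay element holding x[n-1] */
    for (size_t n = 0; n < len; ++n) {
        double xn = x[n];
        y[n] = b0 * xn + b1 * delayed;    /* two multiplies, one add */
        delayed = xn;                     /* update the delay element */
    }
}

/* Feedback: y[n] = x[n] + a1*y[n-1], with y[-1] assumed to be 0. */
void feedback(const double *x, double *y, size_t len, double a1)
{
    assert(x != NULL && y != NULL);
    double delayed = 0.0;                 /* delay element holding y[n-1] */
    for (size_t n = 0; n < len; ++n) {
        y[n] = x[n] + a1 * delayed;       /* one multiply, one add */
        delayed = y[n];                   /* the output feeds the delay */
    }
}
```

Feeding an impulse through each function shows the difference: the feedforward output dies out after two samples, while the feedback output decays geometrically and never becomes exactly zero.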

  7. Fixed-Point and Floating-Point Implementations • Fixed-point DSPs perform integer operations • Fixed range: 16 bits = 65,536 distinct values • Floating-point DSPs perform integer and floating-point operations • Dynamic operating range • Analog real-world signals have effectively infinite precision • Floating point mimics the “infinite” range better • Easier to implement; reduces scaling and overflow problems (round-off error still exists) • Why not always use floating point? • Cost, availability, and performance • Precision: with the same number of bits, floating point resolves small values finely but larger values more coarsely
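The fixed-range and overflow issues above can be made concrete with 16-bit fixed-point arithmetic. This is a minimal sketch that assumes the common Q15 convention (an `int16_t` value v stands for v / 32768); the presentation does not name a specific format, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 32-bit intermediate result into the 16-bit range. */
int16_t q15_saturate(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Saturating add: overflow clamps instead of wrapping around. */
int16_t q15_add(int16_t a, int16_t b)
{
    return q15_saturate((int32_t)a + (int32_t)b);
}

/* Multiply two Q15 values. The 32-bit product is in Q30, so it is
 * rounded and shifted right by 15 to return to Q15. Assumes an
 * arithmetic right shift for negative values, which is
 * implementation-defined in C but typical on mainstream compilers. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;  /* Q30 */
    product += 1 << 14;                         /* round to nearest */
    return q15_saturate(product >> 15);         /* back to Q15 */
}
```

For example, `q15_mul(16384, 16384)` multiplies 0.5 by 0.5 and yields 8192 (0.25), while `q15_add(30000, 10000)` saturates at 32767 rather than wrapping to a negative value. A floating-point implementation needs none of this bookkeeping, which is the ease-of-implementation point made on the slide.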

  8. Single Instruction Multiple Data • SIMD microarchitecture and instructions • In one clock cycle, 1 instruction operates on 4 data values instead of on 1 value • Increases performance of low-level DSP functions such as multiply-accumulate (MAC) • FIGURE 11.10 SIMD Instruction.

  9. Microarchitecture Considerations • Processor clock speed • Cache size • DSP architectures usually partition the memory space manually to reduce the number of accesses to external memory • Latency is costly in terms of time and resources • Intel architectures have large caches that can hide the fast/slow memory split; however, all data starts out in the “far” levels of the memory hierarchy • Output data should be generated sequentially • Accessing memory in a scattered pattern (while using threads) should be avoided

  10. Implementation Options for Intel • Intrinsics • Vectorization • Intel Performance Primitives

  11. Intrinsics and Data Types • Intrinsics: C code that calls special built-in compiler capabilities that map closely to the underlying SSE instruction set • Added data types • __m64, __m128, __m128d, __m128i • Intrinsic operation types • Arithmetic (fixed- and floating-point) • Shift • Logical • Compare • Set • Shuffle • Concatenation • Example: an intrinsic add takes the four FP values packed into each of a and b and performs four additions in one instruction
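The packed add described above might look like the following sketch. It uses the SSE intrinsics `_mm_loadu_ps`, `_mm_add_ps`, and `_mm_storeu_ps` from `<xmmintrin.h>`; the wrapper name `add4` and the scalar fallback for non-SSE targets are assumptions added for portability, not part of the presentation's listing.

```c
#include <assert.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_add_ps, ... */
#endif

/* out[i] = a[i] + b[i] for i = 0..3. */
void add4(const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a);              /* load 4 floats (unaligned OK) */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));   /* 4 additions, 1 instruction */
#else
    /* Scalar fallback when SSE is not available at compile time. */
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

The unaligned load and store variants are used so the sketch works with any `float` array; when the data is known to be 16-byte aligned, `_mm_load_ps` and `_mm_store_ps` are the aligned counterparts.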

  12. Vectorization • The compiler applies vectorization techniques to loops in the data-processing code: it looks for opportunities to convert loops from a scalar implementation to a vector-based one, so that multiple operands are processed at the same time • As with GCC, the generated code targets the SIMD instruction set • Use #pragma directives to guide the compiler and avoid overheads such as assumed data dependences • Listing 11.4 Explicitly Don’t Vectorize Loop. Listing 11.7 Memory Alignment Property and Discarding Assumed Data Dependences.
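A rough sketch of the kind of compiler guidance this slide refers to follows. It is not a reproduction of Listings 11.4 or 11.7. The `ivdep` and `novector` pragmas are Intel compiler directives and are guarded so that other compilers skip them; the C99 `restrict` qualifier is a portable way to state that the arrays do not overlap. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* dst[i] = gain * a[i] + b[i]. 'restrict' promises the compiler that the
 * three arrays do not overlap, so it need not assume a data dependence. */
void scale_add(float *restrict dst, const float *restrict a,
               const float *restrict b, float gain, size_t len)
{
    assert(dst != NULL && a != NULL && b != NULL);
#if defined(__INTEL_COMPILER)
#pragma ivdep        /* Intel compiler: ignore assumed vector dependences */
#endif
    for (size_t i = 0; i < len; ++i)
        dst[i] = gain * a[i] + b[i];
}

/* In-place running sum: each element depends on the previous one, so the
 * loop carries a real dependence and is marked as not to be vectorized. */
void running_sum(float *data, size_t len)
{
    assert(data != NULL);
#if defined(__INTEL_COMPILER)
#pragma novector     /* Intel compiler: explicitly do not vectorize */
#endif
    for (size_t i = 1; i < len; ++i)
        data[i] += data[i - 1];
}
```

The first loop is a good vectorization candidate because every iteration is independent; the second is not, because iteration i needs the result of iteration i - 1.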

  13. Vectorization • Performance comparison • The results would be vastly different if the memory were not already aligned

  14. Performance Primitives • Intel libraries: highly optimized implementations for many different applications (including audio codecs, image processing, data compression, etc.) • The libraries take full advantage of the CPU and SIMD • The libraries are threaded and can obtain performance gains by parallelizing the algorithm • Domains covered include: • Signal processing: convolution and correlation, finite impulse response (FIR) filter, FIR coefficient generation functions, infinite impulse response (IIR) filter, transforms • Image processing • Small matrices and realistic rendering • Cryptography

  15. Finite Impulse Response Filter • FIR filter equation • y[n] = a·x[n] + b·x[n-1] + c·x[n-2] • Listing 11.8 FIR Filter C Code Example. Listing 11.9 FIR Using Intel Performance Primitives.
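The three-tap FIR equation on this slide can be implemented directly in plain C. The sketch below is not Listing 11.8; the function name `fir3` is hypothetical, and samples before the start of the input are assumed to be zero.

```c
#include <assert.h>
#include <stddef.h>

/* y[n] = a*x[n] + b*x[n-1] + c*x[n-2], with x[-1] = x[-2] = 0 assumed. */
void fir3(const float *x, float *y, size_t len, float a, float b, float c)
{
    assert(x != NULL && y != NULL);
    for (size_t n = 0; n < len; ++n) {
        float x1 = (n >= 1) ? x[n - 1] : 0.0f;   /* x[n-1] */
        float x2 = (n >= 2) ? x[n - 2] : 0.0f;   /* x[n-2] */
        y[n] = a * x[n] + b * x1 + c * x2;       /* 3 multiplies, 2 adds */
    }
}
```

Passing an impulse as input returns the coefficients a, b, c in order, followed by zeros, which is what gives the filter its "finite impulse response" name.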

  16. FIR Ex: Intel SSE • Loop unrolling removes data dependences between iterations • By rearranging the data elements, we can reduce the number of times data must be read
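One way to read the unrolling idea, sketched in plain C rather than SSE: compute four outputs per loop iteration and keep the recent input samples in local variables, so each input is loaded from memory only once. This is an illustrative assumption about the technique, not the presentation's SSE listing; the name `fir3_unrolled` is hypothetical, and samples before the start of the input are taken as zero.

```c
#include <assert.h>
#include <stddef.h>

/* Three-tap FIR, y[n] = a*x[n] + b*x[n-1] + c*x[n-2], unrolled by four. */
void fir3_unrolled(const float *x, float *y, size_t len,
                   float a, float b, float c)
{
    assert(x != NULL && y != NULL);
    float x1 = 0.0f, x2 = 0.0f;      /* x[n-1] and x[n-2] carried over */
    size_t n = 0;

    /* Main loop: four independent outputs per iteration. */
    for (; n + 4 <= len; n += 4) {
        float s0 = x[n], s1 = x[n + 1], s2 = x[n + 2], s3 = x[n + 3];
        y[n]     = a * s0 + b * x1 + c * x2;
        y[n + 1] = a * s1 + b * s0 + c * x1;
        y[n + 2] = a * s2 + b * s1 + c * s0;
        y[n + 3] = a * s3 + b * s2 + c * s1;
        x1 = s3;                     /* history for the next block */
        x2 = s2;
    }

    /* Tail loop for the remaining 0..3 samples. */
    for (; n < len; ++n) {
        float s0 = x[n];
        y[n] = a * s0 + b * x1 + c * x2;
        x2 = x1;
        x1 = s0;
    }
}
```

The four output statements in the main loop do not depend on each other, which is the property that lets them map onto a single four-wide SIMD operation.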

  17. Medical Ultrasound Imaging • Computationally intensive • Needs a significant amount of embedded computational performance • Follows the same basic algorithmic pattern even though physical configurations, parameters, and functionality differ • Beam forming • Envelope extraction • Polar-to-Cartesian coordinate translation

  18. FIGURE 11.12 Block Diagram of a Typical Ultrasound Imaging Application.

  19. Envelope Detector FIGURE 11.15 Block Diagram of the Envelope Detector.

  20. Envelope Detector FIGURE 11.16 Polar-to-Cartesian Conversion of a Hypothetically Scanned Rectangular Object. Listing 11.11 Code Sample for Envelope Detector.

  21. Performance Results • Why such a large difference?

  22. Summary • Digital signal processing on general-purpose processors • Extends processing capabilities • Simplifies the overall application when a platform requires control, communications, and general-purpose processing along with DSP • An Intel system can be improved in several ways: special C code (intrinsics), vectorization, and optimized libraries • Performance is greatly enhanced when DSP is implemented properly
