Introduction to Digital Signal Processors (DSPs)

jbuffington

Presentation Transcript


  1. Introduction to Digital Signal Processors (DSPs)

  2. Outline/objectives • Identify the most important DSP processor architecture features and how they relate to DSP applications • Understand the types of code appropriate for DSP implementation

  3. What is a DSP? • A specialized microprocessor for real-time DSP applications, such as: • Digital filtering (FIR and IIR) • FFT • Convolution, matrix multiplication, etc.

  4. Hardware used in DSP

  5. Common DSP features • Harvard architecture • Dedicated single-cycle Multiply-Accumulate (MAC) instruction (hardware MAC units) • Single Instruction, Multiple Data (SIMD) • Very Long Instruction Word (VLIW) architecture • Pipelining • Saturation arithmetic • Zero overhead looping • Hardware circular addressing • Cache • DMA

  6. Harvard Architecture • Physically separate memories and paths for instruction and data

  7. Single-Cycle MAC unit • Can compute a sum of n products in n cycles
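
The sum-of-products pattern that a MAC unit accelerates can be sketched in plain C. This is only an illustration: the function name `mac_sum`, the 16-bit sample type, and the 64-bit accumulator are assumptions chosen for the sketch, not details from the slides. Each loop iteration corresponds to one multiply-accumulate, so n products need n iterations.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of n products, one multiply-accumulate per iteration.
 * The 64-bit accumulator is an assumption that stands in for the
 * wide accumulator register of a hardware MAC unit. */
int64_t mac_sum(const int16_t *x, const int16_t *h, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)x[i] * (int32_t)h[i];  /* multiply, then accumulate */
    }
    return acc;
}
```

On a processor with a single-cycle MAC unit, a compiler can typically map the loop body to one MAC instruction per element; on a processor without one, the multiply and the add are separate steps.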

  8. Single Instruction - Multiple Data (SIMD) • A technique for data-level parallelism by employing a number of processing elements working in parallel
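
As a rough illustration of data-level parallelism, the loop below applies the same operation to groups of independent elements. The names `vec_add` and `LANES` are hypothetical, and the code is ordinary scalar C: it only shows the shape of work that a SIMD unit with `LANES` processing elements could perform in one step.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4  /* hypothetical number of processing elements */

/* Element-wise a[i] = b[i] + c[i]. Within each group of LANES elements
 * the additions do not depend on one another, so a SIMD unit could
 * carry out the whole group with a single instruction. */
void vec_add(const int32_t *b, const int32_t *c, int32_t *a, size_t n)
{
    size_t i = 0;
    for (; i + LANES <= n; i += LANES) {
        for (size_t lane = 0; lane < LANES; lane++) {
            a[i + lane] = b[i + lane] + c[i + lane];
        }
    }
    for (; i < n; i++) {  /* leftover elements, handled one at a time */
        a[i] = b[i] + c[i];
    }
}
```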

  9. Very Long Instruction Word (VLIW) • A technique for instruction-level parallelism by executing instructions without dependencies (known at compile-time) in parallel • Example of a single VLIW instruction: F=a+b; c=e/g; d=x&y; w=z*h;
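
The "no dependencies" condition can be made concrete with a small check. The sketch below is deliberately conservative and uses a hypothetical three-operand instruction model (`Instr`, `independent`, and `can_bundle` are illustrative names). It rejects any pair in which one instruction reads or writes the other's destination; real VLIW machines may relax some of these cases.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical three-operand instruction: dst = src1 op src2.
 * Registers are identified by small integers. */
typedef struct {
    int dst;
    int src1;
    int src2;
} Instr;

/* Conservative test: true if neither instruction reads or writes
 * the other's destination register. */
bool independent(Instr first, Instr second)
{
    bool raw = (second.src1 == first.dst) || (second.src2 == first.dst);
    bool war = (first.src1 == second.dst) || (first.src2 == second.dst);
    bool waw = (first.dst == second.dst);
    return !(raw || war || waw);
}

/* True if every pair among the n instructions is independent,
 * i.e. they could all be placed in one instruction word. */
bool can_bundle(const Instr *instrs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (!independent(instrs[i], instrs[j])) {
                return false;
            }
        }
    }
    return true;
}
```

With each variable of the slide's example mapped to its own register number, the four operations touch disjoint registers and pass the check.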

  10. CISC vs. RISC vs. VLIW

  11. Pipelining • DSPs commonly feature deep pipelines • TMS320C6x processors have 3 pipeline stages, each with a number of phases (cycles): • Fetch: Program Address Generate (PG), Program Address Send (PS), Program ready wait (PW), Program receive (PR) • Decode: Dispatch (DP), Decode (DC) • Execute: 6 to 10 phases

  12. Saturation Arithmetic • Operations such as addition and multiplication have a fixed result range • Results that would overflow or underflow are clamped to the maximum or minimum representable value, respectively • Associativity and distributivity no longer hold • Examples with one signed byte (range -128 to 127): • 64 + 69 = 127 • -127 - 5 = -128 • (64 + 70) - 25 = 102 ≠ 64 + (70 - 25) = 109
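
The signed one-byte case can be reproduced in software with a clamp on a wider intermediate result. This is a minimal sketch; the helper names `sat8`, `sat_add8`, and `sat_sub8` are made up for illustration, and a DSP would do the clamping in hardware.

```c
#include <stdint.h>

/* Clamp a wide intermediate result to the signed 8-bit range. */
int8_t sat8(int32_t v)
{
    if (v > INT8_MAX) return INT8_MAX;  /* overflow  -> +127 */
    if (v < INT8_MIN) return INT8_MIN;  /* underflow -> -128 */
    return (int8_t)v;
}

/* Saturating add and subtract: compute in 32 bits, then clamp. */
int8_t sat_add8(int8_t a, int8_t b) { return sat8((int32_t)a + b); }
int8_t sat_sub8(int8_t a, int8_t b) { return sat8((int32_t)a - b); }
```

Evaluating `sat_sub8(sat_add8(64, 70), 25)` clamps the inner sum to 127 before subtracting, while `sat_add8(64, sat_sub8(70, 25))` never leaves the range, which is why the two groupings give different results.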

  13. Examples • Perform the following operations using one-byte saturation arithmetic • 0x77 + 0x99 = • 0x4*0x42= • 0x3*0x51=

  14. Zero Overhead Looping • Hardware support for loops with a constant number of iterations using hardware loop counters and loop buffers • No branching • No loop overhead • No pipeline stalls or branch prediction • No need for loop unrolling

  15. Hardware Circular Addressing • Hardware support for circular buffers: a fixed-length queue of fixed-size objects in which items are added at the head and removed from the tail, with addresses wrapping around at the end of the buffer • Requires at least 2 pointers (head and tail) • Extensively used in digital filtering: y[n] = a_0 x[n] + a_1 x[n-1] + … + a_k x[n-k]
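
A software version of the same idea shows what the hardware saves. The sketch below is an assumption-laden illustration: `FirState`, `fir_step`, and `TAPS` are hypothetical names, and because the delay line is always full it tracks only a head index instead of separate head and tail pointers. The explicit wrap-around arithmetic is the part that circular addressing hardware performs for free.

```c
#include <stddef.h>

#define TAPS 4  /* hypothetical filter length (k + 1 coefficients) */

typedef struct {
    float delay[TAPS];  /* the last TAPS input samples */
    size_t head;        /* index of the most recent sample */
} FirState;

/* Store one input sample and return y[n] = sum over i of a[i] * x[n - i].
 * The state must start zero-initialized, e.g. FirState s = {0}; */
float fir_step(FirState *s, const float a[TAPS], float x)
{
    s->head = (s->head + 1) % TAPS;  /* advance with wrap-around */
    s->delay[s->head] = x;           /* overwrite the oldest sample */

    float y = 0.0f;
    size_t idx = s->head;
    for (size_t i = 0; i < TAPS; i++) {
        y += a[i] * s->delay[idx];              /* a[i] * x[n - i] */
        idx = (idx == 0) ? TAPS - 1 : idx - 1;  /* step back with wrap */
    }
    return y;
}
```

Feeding a unit impulse (1 followed by zeros) into a zero-initialized state returns the coefficients in order, which is a quick way to check the indexing.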

  16. Direct Memory Access (DMA) • The feature that allows peripherals to access main memory without the intervention of the CPU • Typically, the CPU initiates DMA transfer, does other operations while the transfer is in progress, and receives an interrupt from the DMA controller once the operation is complete. • Can create cache coherency problems (the data in the cache may be different from the data in the external memory after DMA) • Requires a DMA controller

  17. Cache memory • Separate instruction and data L1 caches (Harvard architecture) • Cache coherence protocols required, since most systems use DMA

  18. DSP vs. Microcontroller • DSP: Harvard architecture; VLIW/SIMD (parallel execution units); no bit-level operations; hardware MACs; DSP applications • Microcontroller: mostly von Neumann architecture; single execution unit; flexible bit-level operations; no hardware MACs; control applications

  19. Examples • Estimate how long the following code fragment takes to execute on each of the processors below: for (i=0; i<8; i++) { a[i] = 2*i + 3; b[i] = 3*i + 5; } • A general-purpose processor with a 1 GHz operating frequency, five-stage pipelining, 5 cycles required for multiplication, and 1 cycle for addition • A DSP running at 500 MHz with zero overhead looping, 6 independent ALUs, and 2 independent single-cycle MAC units

  20. Review Questions • Which of the following code fragments is appropriate for SIMD implementation? (a) a[0]=b[0]+c[0]; a[2]=b[2]+c[2]; a[4]=b[4]+c[4]; a[6]=b[6]+c[6]; (b) a[0]=b[0]&c[0]; a[0]=b[0]%c[0]; a[0]=b[0]+c[0]; a[0]=b[0]/c[0]; • Can the following instructions be merged into one VLIW instruction? If not, how many VLIW instructions are needed? a=b+c; d=c/e; f=d&a; g=b%c;

  21. Review Questions • Which of the following is not a typical DSP feature? (a) Dedicated multiplier/MAC (b) Von Neumann memory architecture (c) Pipelining (d) Saturation arithmetic • Which implementation would you choose for lowest power consumption? (a) ASIC (b) FPGA (c) General-purpose processor (d) DSP

  22. Examples • How many VLIW instructions does the following program fragment require if there are two independent data paths (a, b), each with 3 ALUs and 1 MAC available, and 8 instructions per word? • How many cycles will it take to execute if these are the first instructions in the program and all instructions require 1 cycle, assuming the pipelining architecture of slide 11 with 6 phases of execution?
      ADD a1,a2,a3   ; a3 = a1 + a2
      SUB b1,b3,b4   ; b4 = b1 - b3
      MUL a2,a3,a5   ; a5 = a2 * a3
      MUL b3,b4,b2   ; b2 = b3 * b4
      AND a7,a0,a1   ; a1 = a7 AND a0
      MUL a3,a4,a5   ; a5 = a3 * a4
      OR  a6,a3,a2   ; a2 = a6 OR a3

  23. References • R. Chassaing, “DSP Applications Using C and the TMS320C6x DSK”, Wiley, 2002 • Texas Instruments, TMS320C64x datasheets • Analog Devices, ADSP-21xx Processors
