
Architectural Analysis of a DSP Device, the Instruction Set and the Addressing Modes

SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications. Miodrag Bolic.


Presentation Transcript


  1. Architectural Analysis of a DSP Device, the Instruction Set and the Addressing Modes SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications Miodrag Bolic

  2. Outline • FIR filter on ADSP-21xx • DSP Requirements • Fast Multiply-Accumulates (Data-path) • Extended Precision Accumulator Register (Data-path) • Dual Operand Fetch (Memory) • Circular Buffering (Addressing) • Zero-Overhead Looping (Instruction set) • Analog Devices Architectures and Programming • SHARC • Blackfin • Performance Optimization

  3. ADSP-21xx Copied from [Kester03]

  4. CALCULATING OUTPUTS OF 4-TAP FIR FILTER USING A CIRCULAR BUFFER

     Memory Location   0      1      2      3
     Read              x(0)   x(1)   x(2)   x(3)
     Write             x(4)
     Read              x(4)   x(1)   x(2)   x(3)
     Write                    x(5)
     Read              x(4)   x(5)   x(2)   x(3)

     y(3) = h(0) x(3) + h(1) x(2) + h(2) x(1) + h(3) x(0)
     y(4) = h(0) x(4) + h(1) x(3) + h(2) x(2) + h(3) x(1)
     y(5) = h(0) x(5) + h(1) x(4) + h(2) x(3) + h(3) x(2)

     Copied from [Kester03]
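The overwrite pattern on this slide can be mirrored in a few lines of C. This is a minimal sketch, not code from [Kester03]: the names `circ_buf_t` and `circ_write` are hypothetical, and the modulo operation stands in for the wrap-around that a DSP address generator would perform in hardware.

```c
#include <stddef.h>

#define TAPS 4  /* buffer length matches the 4-tap example */

/* Hypothetical circular buffer: `head` is the location the next sample overwrites. */
typedef struct {
    int    mem[TAPS];
    size_t head;
} circ_buf_t;

/* Overwrite the oldest sample, then advance the pointer with wrap-around. */
static void circ_write(circ_buf_t *cb, int sample)
{
    cb->mem[cb->head] = sample;
    cb->head = (cb->head + 1) % TAPS;
}
```

Writing x(0) through x(5) in order leaves memory locations 0 to 3 holding x(4), x(5), x(2), x(3), which is the final state listed on the slide.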

  5. FIR filter steps
     1. Obtain a sample with the ADC; generate an interrupt
     2. Detect and manage the interrupt
     3. Move the sample into the input signal's circular buffer
     4. Update the pointer for the input signal's circular buffer
     5. Zero the accumulator
     6. Control the loop through each of the coefficients
     7. Fetch the coefficient from the coefficient's circular buffer
     8. Update the pointer for the coefficient's circular buffer
     9. Fetch the sample from the input signal's circular buffer
     10. Update the pointer for the input signal's circular buffer
     11. Multiply the coefficient by the sample
     12. Add the product to the accumulator
     13. Move the output sample (accumulator) to a holding buffer
     14. Move the output sample from the holding buffer to the DAC
     Copied from [Kester03]
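Steps 3 to 13 can be sketched in C as a single per-sample routine. This is an illustrative sketch under stated assumptions, not the code from [Kester03]: `fir_state_t` and `fir_step` are hypothetical names, the coefficients sit in a plain array rather than a second circular buffer, and the ADC, interrupt and DAC handling (steps 1, 2 and 14) are left out.

```c
#include <stddef.h>

#define NTAPS 4

/* Hypothetical filter state: the input delay line and its write position. */
typedef struct {
    int    x[NTAPS];  /* input signal's circular buffer */
    size_t pos;       /* location the next sample will overwrite */
} fir_state_t;

/* Process one input sample and return y(n) = sum over k of h(k) * x(n-k). */
static long long fir_step(fir_state_t *f, const int h[NTAPS], int sample)
{
    size_t idx = f->pos;                     /* the newest sample will live here */
    f->x[idx] = sample;                      /* step 3: store the sample */
    f->pos = (f->pos + 1) % NTAPS;           /* step 4: advance the write pointer */

    long long acc = 0;                       /* step 5: zero the accumulator */
    for (size_t k = 0; k < NTAPS; k++) {     /* step 6: loop over the coefficients */
        acc += (long long)h[k] * f->x[idx];  /* steps 7, 9, 11, 12 */
        idx = (idx + NTAPS - 1) % NTAPS;     /* steps 8, 10: move to the next older sample */
    }
    return acc;                              /* step 13: hand the result to the caller */
}
```

On a DSP with circular addressing, dual operand fetch and a hardware loop, the body of the `for` loop collapses into one instruction per tap; in portable C each of those steps is spelled out.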

  6. FIR filter steps (cont.) ADSP21xx Example code:
         CNTR = N-1;
         DO convolution UNTIL CE;
         convolution: MR = MR + MX0 * MY0 (SS), MX0 = DM(I0,M1), MY0 = PM(I4,M5);
     The line labelled "convolution" is marked on the slide as a Single Cycle Instruction.
     Copied from [Kester03]

  7. Outline • FIR filter on ADSP-21xx • DSP Requirements • Fast Multiply-Accumulates (Data-path) • Extended Precision Accumulator Register (Data-path) • Dual Operand Fetch (Memory) • Circular Buffering (Addressing) • Zero-Overhead Looping (Instruction set) • Analog Devices Architectures and Programming • SHARC • Blackfin • Performance Optimization

  8. Copied from [Takala05]

  9. Copied from [Takala05]

  10. Motorola DSP5600X Copied from [Takala05]

  11. Copied from [Takala05]

  12. Copied from [Takala05]

  13. ADSP-21xx MAC www.analog.com/dsp

  14. Copied from [Takala05]

  15. SHARC Architecture ADSP-2106X Copied from [Takala05]

  16. Outline • FIR filter on ADSP-21xx • DSP Requirements • Fast Multiply-Accumulates (Data-path) • Extended Precision Accumulator Register (Data-path) • Dual Operand Fetch (Memory) • Circular Buffering (Addressing) • Zero-Overhead Looping (Instruction set) • Analog Devices Architectures and Programming • SHARC • Blackfin • Performance Optimization

  17. Copied from [Takala05]

  18. Copied from [Takala05]

  19. Copied from [Takala05]

  20. Outline • FIR filter on ADSP-21xx • DSP Requirements • Fast Multiply-Accumulates (Data-path) • Extended Precision Accumulator Register (Data-path) • Dual Operand Fetch (Memory) • Circular Buffering (Addressing) • Zero-Overhead Looping (Instruction set) • Analog Devices Architectures and Programming • SHARC • Blackfin • Performance Optimization

  21. Copied from [Takala05]

  22. Copied from [Takala05]

  23. Hardware loops
      • Software loop:
              MOVE #16,B              ; initialize loop counter B
        LOOP: MAC (R0)+,(R4)+,A       ; register-indirect addressing with post-increment
              DEC B
              JNE LOOP
      • Hardware loops: no time is spent on
        • Decrementing counters
        • Checking to see if the loop is finished
        • Branching back to the top of the loop
              RPT #16
              MAC (R0)+,(R4)+,A
      [Lapsley97]
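The saving can be made concrete by counting the instructions each listing on this slide executes. The sketch below is a simple model with hypothetical function names; it counts executed instructions only, and actual cycle counts depend on the processor and are not stated on the slide.

```c
/* Software loop on the slide: one MOVE to load the counter,
   then MAC + DEC + JNE on every iteration. */
static unsigned software_loop_instructions(unsigned iterations)
{
    return 1u + 3u * iterations;
}

/* Hardware loop on the slide: one RPT to set up the repeat,
   then a single MAC on every iteration. */
static unsigned hardware_loop_instructions(unsigned iterations)
{
    return 1u + iterations;
}
```

For the 16 iterations used on the slide, the model gives 49 executed instructions for the software loop and 17 for the hardware loop.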

  24. Copied from [Kester03]

  25. ADI General Purpose DSP Product Families (arranged by performance)
      TigerSHARC, High-Performance, $35 - $200 • Up to 4800 MMACS (16-bit) or 1200 MMACS (32-bit) • 2.5G/3G Infrastructure • Medical Imaging • Industrial Imaging • Multiprocessing
      Blackfin, Media Enabled, $5 - $30 • Up to 3000 MMACS • Image compression • Digital Still/Video Camera • MMOIP • Telematics • Biometrics
      SHARC, Low-Cost Floating Point, $10 - $100 • Up to 600 MMACS (32-bit) • Audio • Infotainment • Industrial
      ADSP-218x/9x, Power Efficient, $5 - $10 • Up to 160 MMACS • Wired Voice • Wireless Voice • VOIP/VON • Industrial Control
      www.analog.com/dsp

  26. Outline • FIR filter on ADSP-21xx • DSP Requirements • Fast Multiply-Accumulates (Data-path) • Extended Precision Accumulator Register (Data-path) • Dual Operand Fetch (Memory) • Circular Buffering (Addressing) • Zero-Overhead Looping (Instruction set) • Analog Devices Architectures and Programming • SHARC • Blackfin • Performance Optimization

  27. SHARC Architecture Copied from [Smith97]

  28. SHARC Architecture - Features • The Super Harvard ARChitecture • 100 MHz Core / 300 MFLOPS Peak • Parallel Operation of: Multiplier, ALU, 2 Address Generators & Sequencer • No Arithmetic Pipeline; All Computations Are Single-Cycle • High Precision and Extended Dynamic Range • 32/40-Bit IEEE Floating-Point Math • 32-Bit Fixed-Point MACs with 64-Bit Product & 80-Bit Accumulation • Single-Cycle Transfers with Dual-Ported Memory Structures • Supported by Cache Memory and Enhanced Harvard Architecture • Glueless Multiprocessing Features • JTAG Test and Emulation Port • DMA Controller, Serial Ports, Link Ports, External Bus, SDRAM Controller, Timers www.analog.com/dsp
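An 80-bit accumulator holding 64-bit products leaves 80 - 64 = 16 extra (guard) bits, so long sums of products do not overflow right away. The sketch below illustrates the same idea at a smaller scale that portable C can express: 16-bit operands, products that fit in 32 bits, and a 64-bit accumulator standing in for the wide hardware register. The function names are hypothetical, and the guard-bit estimate is the usual rule of thumb (about log2 of the number of accumulated products), not a figure taken from the slides.

```c
#include <stdint.h>

/* Sum n products of 16-bit operands in a 64-bit accumulator,
   which here stands in for an extended-precision hardware accumulator. */
static int64_t mac_wide(const int16_t *x, const int16_t *h, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc += (int32_t)x[i] * (int32_t)h[i];  /* each product fits in 32 bits */
    }
    return acc;
}

/* Rule-of-thumb guard bits for accumulating n products: ceil(log2(n)). */
static int guard_bits_needed(unsigned n)
{
    int bits = 0;
    while ((1u << bits) < n) {
        bits++;
    }
    return bits;
}
```

With four full-scale products the sum already exceeds what a 32-bit signed accumulator can hold, while 16 guard bits cover on the order of 65536 accumulations by the same rule of thumb.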

  29. ADSP-2106x Core Architecture [Block diagram: cache memory (32 x 48); JTAG test & emulation; flags; DAG 1 (8 x 4 x 32); DAG 2 (8 x 4 x 24); program sequencer; timer; PMA bus (24); DMA bus (32); PMD bus (48); DMD bus (40); bus connect; register file (16 x 40); 32-bit floating-point & fixed-point multiplier with fixed-point accumulator; barrel shifter; floating & fixed-point ALU] www.analog.com/dsp

  30. C code Example- Dot product Copied from [Smith97]
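The slide's own listing is not reproduced in this transcript. As a stand-in, a dot product in C might look like the following minimal sketch; the function name `dot_product` and its signature are assumptions, not the code from [Smith97].

```c
#include <stddef.h>

/* Multiply corresponding elements of x and y and sum the products. */
static float dot_product(const float *x, const float *y, size_t n)
{
    float result = 0.0f;
    for (size_t i = 0; i < n; i++) {
        result += x[i] * y[i];
    }
    return result;
}
```

The loop body is a single multiply-accumulate with two operand fetches, the operation that the DSP features listed in the outline are designed to speed up.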

  31. Example- Dot product - Assembly Copied from [Smith97]

  32. Example- Dot product - Assembly Copied from [Smith97]

  33. C or Assembly • How complicated is the program? • Are you pushing the maximum speed of the DSP? • How many programmers will be working together? • Which is more important, product cost or development cost? • What is your background? • What does the DSP's manufacturer suggest you use? Copied from [Smith97]

  34. Outline • FIR filter on ADSP-21xx • DSP Requirements • Fast Multiply-Accumulates (Data-path) • Extended Precision Accumulator Register (Data-path) • Dual Operand Fetch (Memory) • Circular Buffering (Addressing) • Zero-Overhead Looping (Instruction set) • Analog Devices Architectures and Programming • SHARC • Blackfin • Performance Optimization

  35. BLACKfin Processor Core • Two 16-bit Multipliers • Two 40-bit ALUs, Four 8-bit Video ALUs • Barrel Shifter • Sixteen 16-bit / Eight 32-bit Math Registers • Two DAGs, byte addressing • Eight 32-bit pointer registers • Four Sets of 32-bit Index, Modify, Length, Base • 16-bit Instructions, 32-bit Instructions • Multi-Issue, 64-bit Instructions • Interlocked Pipeline • Micro Signal Architecture, developed with Intel [Block diagram: Address Arithmetic Unit (SP, FP, P0-P5, I0-I3, M0-M3, L0-L3, B0-B3, DAG0, DAG1); Sequencer; Data Arithmetic Unit (R0-R7, Barrel Shifter, Acc0, Acc1)] www.analog.com/dsp

  36. ADSP-BF535 BLACKfin Processor Architecture
      Great Performance Value • Highest Frequency (350 MHz) • 1.0V to 1.6V • 260 PBGA
      High System Integration • Address range 768 Mbytes • SPORTs support 8 Channels of I2S Audio • (532 Mbps) I/O Bandwidth, DMA Bandwidth & Memory Bandwidth • Microcontroller features include WDT, PCI, USB 1.1, SDRAM controller
      [Block diagram labels: BLACKfin Processor Core, to 350 MHz; User Peripherals; System Peripherals; Dynamic Power Management; USB 1.1; SPORTs 2; PLL; SPI 2; Watchdog; UART 2; JTAG; Real Time Clock; Timers 3 (32-bit); GPIO 16; PCI; Memory: 308 Kbytes On-Chip SRAM, 48 Kbytes On-Chip Cache, 264 Kbytes On-Chip SRAM; Interfaces: FLASH/SRAM, DMA, SDRAM]
      www.analog.com/dsp

  37-50. Seminars about Blackfin
