Explore the architecture, instruction set, and addressing modes of a DSP device. Learn about FIR filtering, multiply-accumulates, circular buffering, and more. Understand Analog Devices' SHARC and Blackfin architectures.
Architectural Analysis of a DSP Device, the Instruction Set and the Addressing Modes
SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications
Miodrag Bolic
Outline
FIR filter on ADSP-21x
DSP Requirements
• Fast Multiply-Accumulates (Data-path)
• Extended Precision Accumulator Register (Data-path)
• Dual Operand Fetch (Memory)
• Circular Buffering (Addressing)
• Zero-Overhead Looping (Instruction set)
Analog Devices Architectures and Programming
• SHARC
• Blackfin
• Performance Optimization
ADSP-21x. Copied from [Kester03]
CALCULATING OUTPUTS OF 4-TAP FIR FILTER USING A CIRCULAR BUFFER

Memory Location    0       1       2       3
Read               x(0)    x(1)    x(2)    x(3)
Write              x(4)
Read               x(4)    x(1)    x(2)    x(3)
Write                      x(5)
Read               x(4)    x(5)    x(2)    x(3)

y(3) = h(0) x(3) + h(1) x(2) + h(2) x(1) + h(3) x(0)
y(4) = h(0) x(4) + h(1) x(3) + h(2) x(2) + h(3) x(1)
y(5) = h(0) x(5) + h(1) x(4) + h(2) x(3) + h(3) x(2)

Copied from [Kester03]
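The buffer contents in the table above can be reproduced with a short Python model. This is a minimal sketch, not code from [Kester03]: the class `CircularBuffer` and its methods `write` and `newest_first` are hypothetical names, and samples are represented by strings such as "x(4)" so that the result can be compared with the table directly.

```python
class CircularBuffer:
    """Fixed-size buffer whose write pointer wraps around (hypothetical helper)."""

    def __init__(self, initial):
        self.mem = list(initial)   # memory locations 0 .. N-1
        self.oldest = 0            # location that holds the oldest sample

    def write(self, sample):
        # Overwrite the oldest sample, then advance the pointer modulo N.
        self.mem[self.oldest] = sample
        self.oldest = (self.oldest + 1) % len(self.mem)

    def newest_first(self):
        # Walk backwards from the newest sample: x(n), x(n-1), ..., x(n-N+1).
        n = len(self.mem)
        newest = (self.oldest - 1) % n
        return [self.mem[(newest - k) % n] for k in range(n)]


buf = CircularBuffer(["x(0)", "x(1)", "x(2)", "x(3)"])
buf.write("x(4)")
state_after_x4 = list(buf.mem)   # x(4) replaces x(0) in location 0
buf.write("x(5)")
state_after_x5 = list(buf.mem)   # x(5) replaces x(1) in location 1
```

After the second write, `buf.newest_first()` returns the samples in the order in which they are paired with h(0) to h(3) in the expression for y(5).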
FIR filter steps
1. Obtain a sample with the ADC; generate an interrupt
2. Detect and manage the interrupt
3. Move the sample into the input signal's circular buffer
4. Update the pointer for the input signal's circular buffer
5. Zero the accumulator
6. Control the loop through each of the coefficients
7. Fetch the coefficient from the coefficient's circular buffer
8. Update the pointer for the coefficient's circular buffer
9. Fetch the sample from the input signal's circular buffer
10. Update the pointer for the input signal's circular buffer
11. Multiply the coefficient by the sample
12. Add the product to the accumulator
13. Move the output sample (accumulator) to a holding buffer
14. Move the output sample from the holding buffer to the DAC
Copied from [Kester03]
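Steps 3 to 13 can be sketched in Python as a per-sample routine. This is an illustrative model under stated assumptions, not the processor code: `make_fir` and `process` are hypothetical names, the ADC, interrupt, and DAC steps (1, 2, 14) are left out, and the coefficients are read from a plain list rather than a second circular buffer.

```python
def make_fir(h):
    """Return a per-sample FIR routine (hypothetical helper) for coefficients h."""
    n = len(h)
    x = [0.0] * n      # input signal's circular buffer, initially all zeros
    ptr = 0            # location where the next input sample is written

    def process(sample):
        nonlocal ptr
        x[ptr] = sample                    # step 3: store the new sample
        newest = ptr
        ptr = (ptr + 1) % n                # step 4: update the input pointer
        acc = 0.0                          # step 5: zero the accumulator
        for k in range(n):                 # step 6: loop over the coefficients
            coeff = h[k]                   # steps 7-8: fetch the next coefficient
            value = x[(newest - k) % n]    # steps 9-10: fetch sample, step back
            acc += coeff * value           # steps 11-12: multiply and accumulate
        return acc                         # step 13: the output sample

    return process


fir = make_fir([1.0, 2.0, 3.0, 4.0])
outputs = [fir(s) for s in [1.0, 0.0, 0.0, 0.0, 0.0]]   # impulse response
```

Feeding a unit impulse returns the coefficients one by one, followed by zero once the impulse has left the buffer, which is a quick way to check the pointer arithmetic.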
FIR filter steps (cont.)
ADSP-21xx example code (the loop body is a single-cycle instruction):

    CNTR = N-1;
    DO convolution UNTIL CE;
    convolution: MR = MR + MX0 * MY0(SS), MX0 = DM(I0,M1), MY0 = PM(I4,M5);

Copied from [Kester03]
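The loop instruction multiplies the operands already held in MX0 and MY0 while it fetches the next pair from data and program memory. The Python sketch below models that overlap. It assumes a prefetch before the loop and one final multiply-accumulate after it; those surrounding instructions are not shown in the example code, so they are labeled as assumptions in the comments. The function name `pipelined_dot` is hypothetical.

```python
def pipelined_dot(dm, pm):
    """Model a MAC loop that multiplies the previous operands while fetching the next."""
    n = len(dm)
    if n == 0 or n != len(pm):
        raise ValueError("dm and pm must be non-empty and of equal length")
    i0 = i4 = 0                    # stand-ins for the I0 and I4 index registers
    mr = 0                         # accumulator (MR)
    # Assumed prologue (not in the example code): prefetch the first operand pair.
    mx0, my0 = dm[i0], pm[i4]
    i0 += 1
    i4 += 1
    for _ in range(n - 1):         # CNTR = N-1
        mr += mx0 * my0            # MAC on the operands fetched one step earlier
        mx0, my0 = dm[i0], pm[i4]  # dual operand fetch for the next step
        i0 += 1
        i4 += 1
    # Assumed epilogue: one last MAC consumes the final operand pair.
    mr += mx0 * my0
    return mr


result = pipelined_dot([1, 2, 3, 4], [10, 20, 30, 40])   # 10 + 40 + 90 + 160
```

The loop count of N-1 follows from this arrangement: N products are needed, and one of them is formed outside the loop.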
Motorola DSP5600X Copied from [Takala05]
ADSP-21x MAC www.analog.com/dsp
SHARC Architecture ADSP-2106X Copied from [Takala05]
Hardware loops
• Software loop:

    MOVE #16,B                  ; Initialize loop counter B
    LOOP: MAC (R0)+,(R4)+,A     ; Register-indirect addressing with post-increment
    DEC B
    JNE LOOP

• Hardware loops: no time is spent on
  • Decrementing counters
  • Checking to see if the loop is finished
  • Branching back to the top of the loop

    RPT #16
    MAC (R0)+,(R4)+,A

[Lapsley97]
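The saving can be made concrete by counting the instructions each version executes. The sketch below is a rough model, assuming every instruction counts once; real cycle counts depend on the processor (branch penalties, for example) and are not modeled. Both function names are hypothetical.

```python
def software_loop_instructions(n):
    """Instructions executed by MOVE, then n passes of MAC / DEC / JNE (n >= 1)."""
    executed = 1              # MOVE #n,B
    b = n
    while True:
        executed += 1         # MAC (R0)+,(R4)+,A
        b -= 1
        executed += 1         # DEC B
        executed += 1         # JNE LOOP
        if b == 0:            # branch not taken: the loop ends
            break
    return executed


def hardware_loop_instructions(n):
    """Instructions executed by RPT #n followed by n repetitions of the MAC."""
    return 1 + n


sw = software_loop_instructions(16)   # 1 + 3 * 16 = 49
hw = hardware_loop_instructions(16)   # 1 + 16 = 17
```

In this model two of every three instructions in the software loop are overhead, which is what the hardware loop removes.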
ADI General Purpose DSP Product Families (families ordered by performance)

TigerSHARC: High-Performance, $35 - $200
• Up to 4800 MMACS (16-bit) or 1200 MMACS (32-bit)
• 2.5G/3G Infrastructure
• Medical Imaging
• Industrial Imaging
• Multiprocessing

Blackfin: Media Enabled, $5 - $30
• Up to 3000 MMACS
• Image compression
• Digital Still/Video Camera
• MMOIP
• Telematics
• Biometrics

SHARC: Low-Cost Floating Point, $10 - $100
• Up to 600 MMACS (32-bit)
• Audio
• Infotainment
• Industrial

ADSP-218x/9x: Power Efficient, $5 - $10
• Up to 160 MMACS
• Wired Voice
• Wireless Voice
• VOIP/VON
• Industrial Control

www.analog.com/dsp
SHARC Architecture Copied from [Smith97]
SHARC Architecture - Features
• The Super Harvard ARChitecture
• 100 MHz Core / 300 MFLOPS Peak
• Parallel Operation of: Multiplier, ALU, 2 Address Generators & Sequencer
• No Arithmetic Pipeline; All Computations Are Single-Cycle
• High Precision and Extended Dynamic Range
• 32/40-Bit IEEE Floating-Point Math
• 32-Bit Fixed-Point MACs with 64-Bit Product & 80-Bit Accumulation
• Single-Cycle Transfers with Dual-Ported Memory Structures
• Supported by Cache Memory and Enhanced Harvard Architecture
• Glueless Multiprocessing Features
• JTAG Test and Emulation Port
• DMA Controller, Serial Ports, Link Ports, External Bus, SDRAM Controller, Timers
www.analog.com/dsp
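The reason for accumulating 64-bit products in an 80-bit register can be shown with plain integer arithmetic. The sketch below is an illustration only: it models an accumulator as a two's-complement register that wraps on overflow, which is an assumption (real hardware may saturate instead), and the helper names `wrap_signed` and `accumulate` are hypothetical.

```python
def wrap_signed(value, bits):
    """Wrap an integer to a two's-complement register of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value


def accumulate(products, bits):
    """Sum products in a register of the given width, wrapping after every add."""
    acc = 0
    for p in products:
        acc = wrap_signed(acc + p, bits)
    return acc


big = (2**31 - 1) * (2**31 - 1)     # largest positive 32 x 32-bit signed product
products = [big] * 4
exact = sum(products)
acc64 = accumulate(products, 64)    # no headroom above the product width: wraps
acc80 = accumulate(products, 80)    # 16 extra bits absorb the growth of the sum
```

Here `acc80` equals the exact sum, while `acc64` has wrapped to a negative value after only four additions. The 16 bits above the 64-bit product leave room for many thousands of worst-case products.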
ADSP-2106x Core Architecture
[Block diagram: DAG 1 (8 x 4 x 32), DAG 2 (8 x 4 x 24), program sequencer, timer, cache memory (32 x 48), JTAG test & emulation, flags; PMA bus (24), DMA bus (32), PMD bus (48), DMD bus (40), bus connect; register file (16 x 40), floating- and fixed-point multiplier with fixed-point accumulator, 32-bit barrel shifter, floating-point and fixed-point ALU]
www.analog.com/dsp
C code example: dot product. Copied from [Smith97]
Assembly example: dot product. Copied from [Smith97]
Assembly example: dot product. Copied from [Smith97]
C or Assembly
• How complicated is the program?
• Are you pushing the maximum speed of the DSP?
• How many programmers will be working together?
• Which is more important, product cost or development cost?
• What is your background?
• What does the DSP's manufacturer suggest you use?
Copied from [Smith97]
BLACKfin Processor Core
• Two 16-bit Multipliers
• Two 40-bit ALUs, Four 8-bit Video ALUs
• Barrel Shifter
• Sixteen 16-bit / Eight 32-bit Math Registers
• Two DAGs, byte addressing
• Eight 32-bit pointer registers
• Four Sets of 32-bit Index, Modify, Length, Base
• 16-bit Instructions, 32-bit Instructions
• Multi-Issue, 64-bit Instructions
• Interlocked Pipeline
• Micro Signal Architecture, developed with Intel
[Block diagram: Address Arithmetic Unit (SP, FP, P0-P5; DAG0 and DAG1 with I0-I3, M0-M3, B0-B3, L0-L3), Sequencer, Data Arithmetic Unit (R0-R7, two 16-bit multipliers, four 8-bit video ALUs, barrel shifter, 40-bit accumulators Acc0 and Acc1)]
www.analog.com/dsp
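How one set of Index, Modify, Length, and Base registers produces circular addresses can be sketched as a toy Python model. This is an assumption-laden illustration, not the Blackfin's documented wrap rule: byte addressing is ignored, the wraparound is expressed with a modulo, and the class `Dag` with its method `post_modify` is a hypothetical name.

```python
class Dag:
    """Toy model of one Index/Modify/Length/Base register set (hypothetical)."""

    def __init__(self, base, length, modify):
        self.base = base        # B: start address of the circular buffer
        self.length = length    # L: number of locations in the buffer
        self.modify = modify    # M: step applied after each access
        self.index = base       # I: current address

    def post_modify(self):
        # Return the current address, then step I by M, wrapping inside
        # the range [base, base + length).
        addr = self.index
        offset = (self.index - self.base + self.modify) % self.length
        self.index = self.base + offset
        return addr


dag = Dag(base=100, length=4, modify=3)
addresses = [dag.post_modify() for _ in range(6)]
```

With a buffer of four locations starting at address 100 and a step of 3, the generated addresses are 100, 103, 102, 101, 100, 103, so the index never leaves the buffer and no compare-and-branch is needed in the program.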
ADSP-BF535 BLACKfin Processor Architecture
Great Performance Value
• Highest Frequency (350 MHz)
• 1.0V to 1.6V
• 260 PBGA
High System Integration
• Address range 768 Mbytes
• SPORTs support 8 Channels of I2S Audio
• (532 Mbps) I/O Bandwidth, DMA Bandwidth & Memory Bandwidth
• Microcontroller features include WDT, PCI, USB 1.1, SDRAM controller
[Block diagram: BLACKfin processor core (to 350 MHz), dynamic power management, PLL; memory: 308 Kbytes on-chip SRAM, 48 Kbytes on-chip cache, 264 Kbytes on-chip SRAM; user peripherals: USB 1.1, SPORTs (2), SPI (2), UART (2), timers (3, 32-bit), GPIO (16), PCI; system peripherals: watchdog, JTAG, real-time clock; interfaces: FLASH/SRAM, DMA, SDRAM]
www.analog.com/dsp