A Comparison of DSP Architectures
Blackfin ADSP-BFXXX Compute Unit
Based on an ENEL619.23 white paper prepared by Darrell Anklovitch
Blackfin Compute Unit REV B

Overview
• Architecture Overview
• Register Map
• ALU features and sample instructions
• Multiplier features and sample instructions
• Shifter features and sample instructions

References
• ADSP-BF535 Blackfin Processor Hardware Reference, Rev 2, April 2004, Analog Devices (Section 2)
• Blackfin Processor Instruction Set Reference, Rev 2, May 2003, Analog Devices (Sections 8 to 10, 14 and 15)
• A number of the figures in this presentation are based on figures found in the ADSP-BF535 Blackfin Processor Hardware Reference.

ADSP-2106x Core Architecture
[Figure: block diagram of the ADSP-2106x core. Labelled blocks: program sequencer, cache (32 x 48), DAG 1 (8 x 4 x 32), DAG 2 (8 x 4 x 24), timer, flags, JTAG test & emulation, bus connect, register file (16 x 40), floating & fixed-point multiplier with fixed-point accumulator, 32-bit barrel shifter, floating-point & fixed-point ALU. Labelled buses: PMA bus (24), DMA bus (32), PMD bus (48), DMD bus (40).]

Register File and COMPUTE Units
Key issues:
• 5 data paths FROM COMPUTE units
• 5 data paths TO COMPUTE units
• Highly parallel operations UNDER THE RIGHT CONDITIONS

BF533 Memory Accesses
Under the right conditions, 4 memory accesses at the same time:
• 64 bit instruction fetch
• 2 x 32 bit data loads
• 32 bit data store
PLUS up to 2 ALU (32 bit) and 2 MAC (16 bit) operations at the same time
PLUS background DMA activity

Compute Unit Architecture
[Figure: compute unit block diagram showing the register file, 2 multipliers, 2 ALUs, 1 shifter, and 1 set of video ALUs]

Register File
• Data registers: 8 x 32 bit OR 16 x 16 bit
• Accumulators: 2 x 40 bit
DATA REGISTER SYNTAX:
• R0, R1 etc. refer to 32 bit registers
• R0.L refers to the low 16 bits of the 32 bit R0 register
• R0.H refers to the high 16 bits of the R0 register
ACCUMULATOR SYNTAX:
• A0.L => low 16 bits
• A0.H => next 16 bits
• A0.W => least significant 32 bit word
• A0.X => most significant 8 bit extension
SHARC comparison: 16 32-bit data registers, integer and float. There is a pair of SHARC accumulator registers too.

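The register naming above can be illustrated with a small Python model. This is only a sketch: the function names `data_reg_halves` and `accumulator_fields` are hypothetical helpers invented for illustration, and registers are modelled as plain unsigned integers.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF
MASK40 = 0xFFFFFFFFFF


def data_reg_halves(r):
    """Split a 32 bit data register value into (R.H, R.L)."""
    r &= MASK32
    return (r >> 16) & MASK16, r & MASK16


def accumulator_fields(a):
    """Split a 40 bit accumulator value into its .L, .H, .W and .X fields."""
    a &= MASK40
    return {
        "L": a & MASK16,          # low 16 bits
        "H": (a >> 16) & MASK16,  # next 16 bits
        "W": a & MASK32,          # least significant 32 bit word
        "X": (a >> 32) & 0xFF,    # most significant 8 bit extension
    }


if __name__ == "__main__":
    hi, lo = data_reg_halves(0x12345678)
    print(hex(hi), hex(lo))
    print({k: hex(v) for k, v in accumulator_fields(0xAB12345678).items()})
```

Here R0 = 0x12345678 splits into R0.H = 0x1234 and R0.L = 0x5678, and a 40 bit accumulator value of 0xAB12345678 has A0.X = 0xAB and A0.W = 0x12345678.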
ALU Data Flow
• 2 x 32 bit paths to dual Multiplier/ALU units
• 2 x 32 bit paths back to register file

Sample instructions

ALU Features
ALU operations can be:
• Single 16 bit ops
• Dual 16 bit ops
• Dual 16 bit cross
• Single 32 bit ops
[Figure: source registers Rm and Rp feeding destination register Rn, shown as 16 bit halves and as full 32 bit registers]

ALU Sample Instructions
Instruction groups shown on the slide:
• Single 16 bit ops
• Dual 16 bit ops
• Quad 16 bit ops
• Single 32 bit ops
• Dual 32 bit ops
[Figure callouts on the example instructions: "Does not work in parallel"; "Must have this option"; operand labels A, B, C, D]
Notes:
• Operator order is important: + must come before -
• A & B registers must stay on the same side of the '|' for both instructions
• For dual and quad 16 bit operations the (CO) option causes the destination registers to cross

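The effect of a dual 16 bit operation and of the (CO) cross option can be sketched in Python. The function `dual16` is a hypothetical helper, not a Blackfin API. It assumes that the operator to the left of the '|' applies to the high halves and the one to the right applies to the low halves, and it uses simple 16 bit wraparound with no saturation.

```python
def dual16(a, b, op_hi, op_lo, cross=False):
    """Model of a dual 16 bit ALU operation on two 32 bit register values.

    op_hi / op_lo are '+' or '-' and act on the high / low halves
    (assumed ordering). cross=True mimics the (CO) option by swapping
    the two 16 bit results in the destination register.
    """
    a_h, a_l = (a >> 16) & 0xFFFF, a & 0xFFFF
    b_h, b_l = (b >> 16) & 0xFFFF, b & 0xFFFF

    def apply(op, x, y):
        # Wraparound arithmetic; saturation modes are not modelled here.
        return (x + y) & 0xFFFF if op == "+" else (x - y) & 0xFFFF

    hi = apply(op_hi, a_h, b_h)
    lo = apply(op_lo, a_l, b_l)
    if cross:
        hi, lo = lo, hi
    return (hi << 16) | lo


if __name__ == "__main__":
    r0, r1 = 0x00050003, 0x00020001
    print(hex(dual16(r0, r1, "+", "-")))              # halves in place
    print(hex(dual16(r0, r1, "+", "-", cross=True)))  # halves crossed
```

With R0 = 0x00050003 and R1 = 0x00020001, the `+|-` form gives 5 + 2 in the high half and 3 - 1 in the low half; with the cross option the two results swap places.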
Multiply Data Flow
• 2 x 32 bit paths to dual Multiplier/ALU units
• The multipliers share the same operand/result buses as the ALUs
• 2 x 40 bit accumulators
• 2 x 32 bit paths back to register file

Multiply Features
[Figure: pairings of the H and L halves of the two source registers as 16 bit multiplier operands]
• Multiplies are signed fractional by default
• A signed fractional multiply result is automatically left shifted 1 bit
• Signed fractional multiply != signed integer multiply
• Rounding is available on fractional multiplies and, as a special option, on integer multiplies

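The difference between the two multiply types can be checked with a short Python sketch. The helper names are hypothetical. Operands are treated as signed 16 bit values, and the fractional case assumes the usual 1.15 format, so the product is a 1.31 value once shifted left by 1 bit. Saturation (for example 0x8000 * 0x8000) is not modelled.

```python
def s16(x):
    """Interpret the low 16 bits of x as a signed 16 bit integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mult_integer(a, b):
    """Signed integer multiply: plain 16 x 16 -> 32 bit product."""
    return s16(a) * s16(b)


def mult_fractional(a, b):
    """Signed fractional multiply: product left shifted by 1 bit."""
    return (s16(a) * s16(b)) << 1


def as_fraction(value32):
    """Read a signed 32 bit result as a 1.31 fraction (for display only)."""
    return value32 / 2 ** 31


if __name__ == "__main__":
    half = 0x4000  # 0.5 in 1.15 format, 16384 as an integer
    print(hex(mult_integer(half, half)))     # 16384 * 16384
    print(hex(mult_fractional(half, half)))  # 0.5 * 0.5 in 1.31
    print(as_fraction(mult_fractional(half, half)))
```

The same bit pattern 0x4000 gives 0x10000000 as an integer product but 0x20000000 as a fractional product, which reads back as 0.25.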
Rounding
[Figure, 2 cases: a 32 bit multiplier result formed from Rm and Rp, and an accumulator value. In both, 0x8000 is added and the top 16 bits go to destination register Rd.]
Rounding adds 0x8000 to the 32 bit multiplier result or accumulator value before extracting a 16 bit value to the destination register.

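A minimal Python sketch of the rounding step described above. The function name `extract_high16` is hypothetical, and the sketch assumes the simple "add 0x8000, keep the top 16 bits" rule from the slide; overflow and saturation of the rounded value are not modelled.

```python
def extract_high16(value32, rnd=True):
    """Return the top 16 bits of a 32 bit value, optionally rounded.

    With rnd=True, 0x8000 (half of one 16 bit LSB) is added before the
    top 16 bits are taken; with rnd=False the value is truncated.
    """
    if rnd:
        value32 += 0x8000
    return (value32 >> 16) & 0xFFFF


if __name__ == "__main__":
    for v in (0x12347FFF, 0x12348000):
        print(hex(v), hex(extract_high16(v, rnd=False)), hex(extract_high16(v)))
```

A value whose low half is 0x8000 or more rounds up to the next 16 bit value, while anything below that gives the same result as truncation.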
Fractional Multiply
Fractional Multiply != Integer Multiply
• When extracting a 16 bit fractional value from an accumulator, the high 16 bits are taken
• Where in the destination register the 16 bit value goes depends on which accumulator is being extracted from

Integer Multiply
Fractional Multiply != Integer Multiply
• When extracting a 16 bit integer value from an accumulator, the low 16 bits are taken
• Where in the destination register the 16 bit value goes depends on which accumulator is being extracted from

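The two extraction rules can be put side by side in a Python sketch. The function `extract_to_half` is a hypothetical helper. It assumes that a value extracted from A0 lands in the low half of the destination register and a value from A1 lands in the high half, which matches the pairing `R3.H = (A1 ...)`, `R3.L = (A0 ...)` used in the multiply sample instructions. Rounding and saturation are left out.

```python
def extract_to_half(dest, acc_value, acc_index, fractional=True):
    """Place 16 bits taken from an accumulator into one half of dest.

    fractional=True  -> take the high 16 bits of the 32 bit word
    fractional=False -> take the low 16 bits (integer extraction)
    acc_index 0 writes dest.L, acc_index 1 writes dest.H (assumed mapping).
    """
    if fractional:
        half = (acc_value >> 16) & 0xFFFF
    else:
        half = acc_value & 0xFFFF
    if acc_index == 1:
        return (dest & 0x0000FFFF) | (half << 16)
    return (dest & 0xFFFF0000) | half


if __name__ == "__main__":
    acc = 0x12345678
    print(hex(extract_to_half(0, acc, 0)))                    # fractional, A0
    print(hex(extract_to_half(0, acc, 1)))                    # fractional, A1
    print(hex(extract_to_half(0, acc, 0, fractional=False)))  # integer, A0
```

For an accumulator word of 0x12345678, fractional extraction yields 0x1234 and integer extraction yields 0x5678; only the half of the destination register that receives it changes with the accumulator.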
Multiply Sample Instructions
Instruction groups shown on the slide:
• 16 bit extraction from ACC 0
• 16 bit extraction from ACC 1
• 32 bit extraction
Multi-issue MAC Instruction Examples
• A1 += R1.H * R2.L , A0 += R1.L * R2.L;
• R3.H = (A1 += R1.H * R2.L) , R3.L = (A0 += R1.L * R2.L);
  Any combination of .H and .L in the 2 operands is allowed
• R3 = (A1 += R1.H * R2.L) , R2 = (A0 += R1.L * R2.L);
  Destination registers must be paired as follows: R[1,0], R[3,2], R[5,4] and R[7,6]
• R3.H = (A1 += R1.H * R2.L) , A0 += R1.L * R2.L;

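As a rough illustration of what the multi-issue instruction `R3.H = (A1 += R1.H * R2.L) , R3.L = (A0 += R1.L * R2.L);` computes, here is a Python sketch. The function `dual_mac` is hypothetical. It assumes the default signed fractional mode (product left shifted by 1 bit), takes the high 16 bits of each accumulator word for the half-register results, and ignores rounding, saturation and the 8 bit accumulator extension.

```python
def s16(x):
    """Interpret the low 16 bits of x as a signed 16 bit integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def dual_mac(a1, a0, r1, r2):
    """Model of: R3.H = (A1 += R1.H * R2.L), R3.L = (A0 += R1.L * R2.L);

    Returns the updated (a1, a0) accumulators and the packed R3 value.
    """
    r1_h, r1_l = s16(r1 >> 16), s16(r1)
    r2_l = s16(r2)
    a1 += (r1_h * r2_l) << 1  # MAC 1: fractional multiply-accumulate
    a0 += (r1_l * r2_l) << 1  # MAC 0: fractional multiply-accumulate
    r3_h = (a1 >> 16) & 0xFFFF  # A1 result goes to the high half
    r3_l = (a0 >> 16) & 0xFFFF  # A0 result goes to the low half
    return a1, a0, (r3_h << 16) | r3_l


if __name__ == "__main__":
    # R1.H = 0.5, R1.L = 0.25, R2.L = 0.5 in 1.15 format
    a1, a0, r3 = dual_mac(0, 0, 0x40002000, 0x00004000)
    print(hex(a1), hex(a0), hex(r3))
```

Starting from cleared accumulators, A1 ends up holding 0.25 and A0 holds 0.125, and R3 packs the two 16 bit results side by side.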
Shifter Sample Instructions
Instruction groups shown on the slide:
• Arithmetic shift
• 3 op register shift
• 3 op immediate shift
• 2 operator register shifts
• 2 operator immediate shifts

Parallel Instruction Examples
• In general there are 16 and 32 bit versions of the arithmetic instructions
• Most of the 32 bit instructions can be executed in parallel with 2 x 16 bit memory/index operations
• Exceptions are DIVS, DIVQ and MULTIPLY with 32 bit operands
• || means parallel
Examples:
• A1 = R2.L * R1.L , A0 = R2.H * R1.H || R2.H = W[I2++] || [I3++] = R3;
• R2 = R2 +|+ R4 , R4 = R2 -|- R4 || I0 += M0 || R1 = [I0];