DSP C5000

DSP C5000 Chapter 5 Assembly Language

Assembly Language • Two main types of assembly language: • Algebraic • Mnemonic • Both the C54x and the C55x can use either type of assembly language.

C54x Assembly Language • The instruction set is divided into four basic types: • Arithmetic • Logic • Load, Store & Move • Program control • The C54x has a fixed-length instruction word. • An instruction must be encoded in a single 16-bit word in order to execute in one cycle.

Instructions and Operands • General syntax of an instruction: • Instr Op1,[Op2,[Op3,[…]]] • For the Instr field, refer to the TI documentation or to the following slides. • The syntax of the Op1,[Op2,[Op3,[…]]] field is given in each instruction's documentation; it specifies which addressing modes can be used for the operands.

Operand Syntax

Arithmetic Instructions • General purpose Arithmetic : • Addition/subtraction • Multiply (and accumulate) • Square • Divide • Application Specific Arithmetic : • Miscellaneous • Polynomial evaluation • Distance computation • Specific filters • Butterfly computation (Viterbi)

General Purpose Arithmetic • Addition/subtraction • field4 = field3 ± field1 × 2^field2 • The result is stored in field4 if present, otherwise in field3; the shift is applied according to field2 if present. • The shift field is detailed on the next slide.

Shift Field • Many instructions apply a shift to one operand. The shift is specified in the operand field and can be: • Immediate, if specified by the keywords: • -16 ≤ SHIFT ≤ 15 • 0 ≤ SHIFT1 ≤ 15 • Register indirect: • -16 ≤ ASM ≤ 15 (Accumulator Shift Mode field of ST1) • -16 ≤ TS ≤ 31 (TS is the 6 LSBs of the T register)
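
The shifted-operand rule on the two slides above can be sketched in Python. This is a minimal behavioural model, not C54x code: the function names are hypothetical, overflow and sign-extension modes are ignored, and a negative shift is assumed to be an arithmetic right shift.

```python
def shifted_operand(value: int, shift: int) -> int:
    """Return value * 2**shift; a negative shift is an arithmetic right shift."""
    if not -16 <= shift <= 31:
        raise ValueError("shift outside the -16..31 range listed on the slide")
    return value << shift if shift >= 0 else value >> -shift


def add_with_shift(acc: int, operand: int, shift: int = 0) -> int:
    """Model of 'dst = src + operand * 2**shift' (no overflow handling)."""
    return acc + shifted_operand(operand, shift)
```

For example, `add_with_shift(10, 3, 2)` adds 3 × 4 to 10.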

General Purpose Arithmetic • Addition/subtraction (special cases) • Arithmetic with unsigned operand: • field2 = field2 ± unsigned(field1) • Direct computation on memory: • With SXM=1: field2 = field2 + signed(field1), -32768 ≤ lk ≤ 32767 • With SXM=0: field2 = field2 + unsigned(field1), 0 ≤ lk ≤ 65535

General Purpose Arithmetic • Addition/subtraction (extended precision): • 32 bits: • 1: field2 = field2 ± field1 • 2: field2 = field1 - field2 • If C16=0, field1 and field2 are treated as 32-bit operands and a 32-bit addition/subtraction is performed. • If C16=1, field1 and field2 are treated as pairs of 16-bit operands and a SIMD computation takes place. • 64 bits: • field2 = field2 ± unsigned(field1) ± carry/borrow

General Purpose Arithmetic • Addition/subtraction (extended precision, contd) • 64-bit addition/subtraction is built from two 32-bit operations linked by the carry/borrow (see a code example).
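
The 64-bit case can be illustrated with a short Python sketch. It assumes each operand is held as two unsigned 32-bit halves and that the carry (or borrow) of the low halves is fed into the high halves, which is the usual scheme; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add64(a_hi: int, a_lo: int, b_hi: int, b_lo: int) -> tuple[int, int]:
    """Add two 64-bit values given as (high, low) 32-bit halves."""
    low_sum = a_lo + b_lo
    carry = low_sum >> 32                    # 1 if the low halves overflowed
    high_sum = (a_hi + b_hi + carry) & MASK32
    return high_sum, low_sum & MASK32


def sub64(a_hi: int, a_lo: int, b_hi: int, b_lo: int) -> tuple[int, int]:
    """Subtract b from a, both given as (high, low) 32-bit halves."""
    low_diff = a_lo - b_lo
    borrow = 1 if low_diff < 0 else 0        # low halves needed a borrow
    high_diff = (a_hi - b_hi - borrow) & MASK32
    return high_diff, low_diff & MASK32
```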

General Purpose Arithmetic • Multiply • 1,3: field3 = field1 × TREG (note 2) • 2,4: field3 = field1 × field2 • Multiply and Accumulate/Subtract • 1: field3 = field3 ± (field1 × TREG) (note 2) • 3: field4 = field3 ± (field1 × TREG) (note 1) • 2,4: field4 = field3 ± (field1 × field2) (notes 1, 2) • Note 1: the result is stored in field4 if present, otherwise in field3. • Note 2: [R] rounds the result to the 16 MSBs of dst; the 16 LSBs are zeroed.

General Purpose Arithmetic • Multiply and Accumulate (with program memory) • 1,2: field4 = field3 + (field1 × field2) • 2: the contents of the data memory location pointed to by the field1 operand (Smem) are also copied to the next higher data memory address. • An FIR filter needs both the scalar product and this delay of the data. • 1: TREG = field1, and the contents of the data memory location pointed to by field1 are copied to the next higher data memory address.

General Purpose Arithmetic • Multiply, Accumulate and Delay • In all cases the x(n-k) are in data memory. • Case 1: the h(k) are in program memory:
        RPT   #N-1
        MACD  *AR1-,coef,A
• Case 2: the h(k) are in data memory:
        RPTB  endLoop-1
        LTD   *AR1-
        MAC   *AR2+,A
endLoop:
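
What the multiply, accumulate and delay loop computes can be modelled in Python. This is a sketch under assumptions: `delay[k]` is taken to hold x(n-k), the samples are walked from oldest to newest as `*AR1-` would, and one spare word at the end of the delay line receives the oldest sample. All names are hypothetical.

```python
def fir_macd(delay: list, coeffs: list, new_sample):
    """One FIR output; delay[k] holds x(n-k) and is shifted in place."""
    n = len(coeffs)
    if len(delay) != n + 1:
        raise ValueError("delay line needs len(coeffs) + 1 words")
    delay[0] = new_sample                 # newest sample x(n)
    acc = 0
    for k in range(n - 1, -1, -1):        # oldest sample first
        acc += delay[k] * coeffs[k]       # multiply and accumulate
        delay[k + 1] = delay[k]           # delay: copy to the next address
    return acc


# Feeding an impulse returns the coefficients one after the other.
line = [0, 0, 0, 0]
impulse_response = [fir_macd(line, [1, 2, 3], s) for s in (1, 0, 0, 0)]
```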

General Purpose Arithmetic • Multiply (with accumulator) • 1: B = field1 × A(32-16) (note 1) • 2: field1 = TREG × A(32-16) (note 1) • Multiply and Accumulate/Subtract (with accumulator) • 1: B = B ± (field1 × A(32-16)) (notes 1, 2) • 2: field3 = field2 ± (TREG × A(32-16)) (notes 1, 2, 3) • Note 1: A(32-16) stands for the 16 MSBs of accumulator A; B stands for accumulator B. • Note 2: [R] rounds the result to the 16 MSBs of dst; the 16 LSBs are zeroed. • Note 3: the result is stored in field3 if present, otherwise in field2.

General Purpose Arithmetic • Extended precision multiplication • 1: field3 = unsigned(field1) × unsigned(TREG) • 2: field3 = field3 + (unsigned(field1) × signed(field2)) • MPYU is equivalent to MPY syntax 1, but with unsigned operands. • MACSU is equivalent to MAC syntax 2, but with the field1 operand unsigned.

General Purpose Arithmetic • Extended precision multiplication • Principle: (figure: the partial products are computed with MPYU, MACSU, MACSU and MAC) • See a code example.
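
The principle can be sketched in Python, assuming the usual decomposition of a signed 32-bit operand into a signed high word and an unsigned low word. The mapping of each partial product to MPYU, MACSU and MAC in the comments is an interpretation of the slide, and the helper names are hypothetical.

```python
def split32(value: int) -> tuple[int, int]:
    """Split a signed 32-bit value into (signed high word, unsigned low word)."""
    return value >> 16, value & 0xFFFF


def mul32x32(x: int, y: int) -> int:
    """Signed 32x32 -> 64-bit product built from four 16x16 partial products."""
    xh, xl = split32(x)
    yh, yl = split32(y)
    product = xl * yl                 # unsigned x unsigned (MPYU-like)
    product += (xl * yh) << 16        # unsigned x signed   (MACSU-like)
    product += (yl * xh) << 16        # unsigned x signed   (MACSU-like)
    product += (xh * yh) << 32        # signed x signed     (MAC-like)
    return product
```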

General Purpose Arithmetic • Square • 1: field2 = field1 × field1 • 2: field2 = A(32-16) × A(32-16) (note 1) • 3: field2 = field2 ± (field1 × field1) • Note 1: A(32-16) stands for the 16 MSBs of accumulator A.

General Purpose Arithmetic • Divide • Division is implemented by repeated conditional subtraction. • A single-cycle instruction performs one 1-bit step of an unsigned divide: • The dividend (numerator) is in the low part of src and the divisor in Smem; • once all the steps are done, the quotient is in the low part of src and the remainder in the high part of src. • Each step computes:
        (src) - (Smem) << 15 --> ALU output
        If ALU output ≥ 0
        Then (ALU output) << 1 + 1 --> src
        Else (src) << 1 --> src

General Purpose Arithmetic • Division Routine (more examples)
        LD    @den,16,A
        MPYA  @num          ; B = num*den (tells sign)
        ABS   A             ; strip sign of denominator
        STH   A,@den
        LD    @num,A
        ABS   A             ; strip sign of numerator
        RPT   #15           ; 16 iterations
        SUBC  @den,A        ; 1-bit divide
        XC    1,BLT         ; if result needs to be negative
        NEG   A             ; invert sign
        STL   A,@quot       ; store the quotient
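
The conditional-subtraction step and the 16-iteration routine can be checked with a small Python model. It is a sketch: the operand ranges are an assumption (positive 16-bit dividend, positive 15-bit divisor), the sign handling mirrors the routine above in spirit only, and the function names are hypothetical.

```python
def subc_step(acc: int, divisor: int) -> int:
    """One conditional-subtract step, following the slide's pseudo-code."""
    alu = acc - (divisor << 15)
    return (alu << 1) + 1 if alu >= 0 else acc << 1


def divide_unsigned(num: int, den: int) -> tuple[int, int]:
    """Return (quotient, remainder) after 16 conditional-subtract steps."""
    if not (0 <= num <= 0xFFFF and 0 < den <= 0x7FFF):
        raise ValueError("operands outside the range assumed by this sketch")
    acc = num
    for _ in range(16):
        acc = subc_step(acc, den)
    return acc & 0xFFFF, acc >> 16    # quotient in low part, remainder in high


def divide_signed(num: int, den: int) -> int:
    """Divide magnitudes, then negate if the signs of num and den differ."""
    quotient, _ = divide_unsigned(abs(num), abs(den))
    return -quotient if num * den < 0 else quotient
```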

Miscellaneous Arithmetic • ABS src,[dst] ; computes the absolute value of src and stores it in dst if specified, in src otherwise (dst=|src|). • NEG src,[dst] ; stores the two's complement of src in dst if specified, in src otherwise (dst=-src). • MAX dst, MIN dst ; store in dst the greater (respectively the lesser) of accumulators A and B (dst=MAX(A,B), dst=MIN(A,B)).

Miscellaneous Arithmetic • EXP/NORM: tools for fixed-point to (block) floating-point conversion (note 1) • They store the high part of the accumulator (A or B) in « Mantissa × 2^Exponent » form. • EXP src ; computes the number of shifts needed to normalize the high part of accumulator src and stores it in the T register (T=EXP(src)). In the slide's figure (accumulator drawn as guard bits, high part, low part), T=3 after the operation. Because of the guard bits, T can be negative after the operation. …/… • Note 1: see ch13 «Numerical Issues» for an in-depth explanation of the floating-point format.

Miscellaneous Arithmetic • NORM src,[dst] ; the contents of accumulator src are shifted according to the value in the T register and stored in dst if specified, in src otherwise (dst=src<<TS). • Example: starting from A = 1234h, the code below leaves Mantissa = 48D0h and Expo = 0012h (the figure shows the accumulator after the operation as guard bits, high part and low part).
        .bss  Mantissa,2,1
Expo    .set  Mantissa+1
        .text
; A = 1234h
format:
        LD    #Mantissa,DP
        EXP   A
        NORM  A
        ST    T,@Expo
        STH   A,@Mantissa
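
The EXP/NORM pair can be modelled in Python to reproduce the values on the slide. This sketch assumes a 40-bit signed accumulator (8 guard bits above a 32-bit value) and defines the exponent as the number of redundant sign bits minus 8; the function names are hypothetical.

```python
def to_signed40(value: int) -> int:
    """Wrap an integer to a signed 40-bit accumulator value."""
    value &= (1 << 40) - 1
    return value - (1 << 40) if value & (1 << 39) else value


def exp_count(acc: int) -> int:
    """Shift count that normalizes the high part (negative if guard bits are used)."""
    if acc == 0:
        return 0
    magnitude_bits = acc.bit_length() if acc > 0 else (~acc).bit_length()
    return 31 - magnitude_bits


def norm(acc: int, t: int) -> int:
    """Shift acc left by t, or arithmetically right by -t when t is negative."""
    shifted = acc << t if t >= 0 else acc >> -t
    return to_signed40(shifted)


t = exp_count(0x1234)                            # exponent for A = 1234h
mantissa = (norm(0x1234, t) >> 16) & 0xFFFF      # high part after NORM
```

With A = 1234h the model gives an exponent of 12h and a mantissa of 48D0h, matching the figure.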

Miscellaneous Arithmetic • Rounding and saturation deal with the finite precision and finite dynamic range of the number representation: • RND src,[dst] ; the high part of accumulator src is rounded and the result is stored in dst if specified, in src otherwise (dst=rnd(src)). • SAT src ; if src overflows the 32-bit signed range it is saturated: to 00 7FFF FFFFh on positive overflow (high part 32767, 007FFFh with guard bits) and to FF 8000 0000h on negative overflow (high part –32768, FF8000h with guard bits) (SATURATE(src)). • (Figure: accumulator before and after rounding.)
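
Rounding and saturation can be sketched in Python. The model assumes that rounding adds 2^15 so that the high part is rounded to nearest, and that saturation clamps to the 32-bit signed range; function names are hypothetical.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000


def rnd(acc: int) -> int:
    """Add 2**15 so that the high part (bits 31-16) is rounded to nearest."""
    return acc + 0x8000


def sat(acc: int) -> int:
    """Clamp the accumulator to the 32-bit signed range."""
    return max(INT32_MIN, min(INT32_MAX, acc))


def high_part(acc: int) -> int:
    """Bits 31-16 of the accumulator as an unsigned 16-bit word."""
    return (acc >> 16) & 0xFFFF
```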

Polynomial Evaluation • Considering the 3rd order polynomial: $y = a_3 x^3 + a_2 x^2 + a_1 x + a_0$ • It can be computed in an iterative way as: $y = ((a_3 x + a_2)\,x + a_1)\,x + a_0$ …/…

Polynomial Evaluation • Before using the POLY instruction, the T register must be loaded with the value of x. • POLY Smem ; the high part of accumulator A is multiplied by the T register, added to the high part of B and stored in A. The high part of accumulator B is then loaded with the contents of Smem (the current coefficient) (POLY(Smem)).
coef    .sect "COEF"
        .word 1234h,3456h
        .word 4567h,5678h
        .bss  y,1
        .text
; A(15-0) = 7FFCh (x)
PoEval:
        STLM  A,T
        STM   #coef,AR1     ; AR1 points to the coefficient table
        LD    *AR1+,16,A
        LD    *AR1+,16,B
        RPT   #2
        POLY  *AR1+
        STH   A,*(y)
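
The iteration can be sketched in Python, once as plain Horner evaluation and once with the two-register pipeline that the POLY description suggests (A holds the running value, B the next coefficient). Plain integers are used instead of the DSP's fractional format, and the names are hypothetical.

```python
def horner(coeffs: list, x):
    """Evaluate a polynomial; coeffs are ordered from highest to lowest degree."""
    value = coeffs[0]
    for coefficient in coeffs[1:]:
        value = value * x + coefficient
    return value


def poly_pipeline(coeffs: list, x):
    """Same result, mimicking the A/B register roles of the POLY loop."""
    stream = list(coeffs) + [0]       # the last step reads one extra word
    a, b = stream[0], stream[1]       # the two initial loads
    index = 2
    for _ in range(len(coeffs) - 1):  # three steps for a 3rd order polynomial
        a = a * x + b                 # multiply by x, add pending coefficient
        b = stream[index]             # fetch the next coefficient
        index += 1
    return a
```

Note that the last step fetches one word past the coefficient table, which the model pads with a zero.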

Distance Computation • ABDST Xmem,Ymem ; computes the L1 norm of the distance between 2 vectors (ABDST(Xmem,Ymem)) according to: $D = \sum_{i} |X_i - Y_i|$
        .bss  X,10
        .bss  Y,10
        .bss  D,1
        .text
dist:
        STM   #X,AR2
        STM   #Y,AR3
        RPT   #10
        ABDST *AR2+,*AR3+
        STH   B,*(D)

Distance Computation • SQDST Xmem,Ymem ; computes the squared L2 norm of the distance between 2 vectors (SQDST(Xmem,Ymem)) according to: $D = \sum_{i} (X_i - Y_i)^2$
        .bss  X,10
        .bss  Y,10
        .bss  D,1
        .text
dist:
        STM   #X,AR2
        STM   #Y,AR3
        RPT   #10
        SQDST *AR2+,*AR3+
        STH   B,*(D)
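
Both distance loops can be modelled in Python. The sketch assumes a one-step pipeline, in which each step first accumulates the difference left by the previous step and then computes a new difference, so N elements need N+1 steps with both accumulators starting at zero. Names are hypothetical.

```python
def abdst_loop(x: list, y: list):
    """L1 distance with the accumulate-then-subtract pipeline."""
    a = b = 0
    for xi, yi in zip(list(x) + [0], list(y) + [0]):   # N + 1 steps
        b += abs(a)          # accumulate the previous difference
        a = xi - yi          # difference used by the next step
    return b


def sqdst_loop(x: list, y: list):
    """Squared L2 distance with the same pipeline."""
    a = b = 0
    for xi, yi in zip(list(x) + [0], list(y) + [0]):   # N + 1 steps
        b += a * a
        a = xi - yi
    return b
```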

Specific Filters Instructions • Symmetric FIR filters (note 1): an even-length symmetric FIR filter can be computed according to: $y(n) = \sum_{k=0}^{N-1} h(k)\,x(n-k)$, which takes N multiplications. Because of the symmetry of h(k), $h(k) = h(N-1-k)$, the equation can be rewritten: $y(n) = \sum_{k=0}^{N/2-1} h(k)\,[x(n-k) + x(n-N+1+k)]$, which takes only N/2 multiplications. This optimization is handled by the FIRS instruction. …/… • Note 1: see ch14 «FIR filter implementation» for a full treatment of this topic.

Specific Filters Instructions • FIRS Xmem,Ymem,pmad ; the high part of accumulator A is multiplied by the contents of pmad and accumulated in accumulator B. Xmem and Ymem are added together and stored in the high part of accumulator A (FIRS(Xmem,Ymem,pmad)). • At each step, FIRS does the following computation: $y(n) \leftarrow y(n) + h(k)\cdot tmp$, then $tmp \leftarrow Xmem + Ymem$ (the next pair of symmetric samples), where y(n) is in accumulator B and tmp in accumulator A.
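
The saving offered by the symmetric form can be checked with a Python sketch that compares the direct sum with the folded sum. It assumes `x_hist[k]` holds x(n-k) and that only the first half of the coefficients is stored; the names are hypothetical.

```python
def fir_direct(h: list, x_hist: list):
    """Direct form: N multiplications, x_hist[k] holds x(n-k)."""
    return sum(hk * xk for hk, xk in zip(h, x_hist))


def fir_symmetric(h_half: list, x_hist: list):
    """Folded form: N/2 multiplications using h(k) = h(N-1-k)."""
    n = 2 * len(h_half)
    acc = 0
    for k, hk in enumerate(h_half):
        tmp = x_hist[k] + x_hist[n - 1 - k]   # pair of symmetric samples
        acc += hk * tmp                       # one multiply per pair
    return acc


h = [1, 2, 3, 3, 2, 1]
x_hist = [4, -1, 0, 7, 2, 5]
```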

Specific Filters Instructions • LMS algorithm (note 1): LMS adaptive filtering requires updating the coefficients of the filter according to an error signal e(n) while computing the output of the filter y(n). This involves the following computations: $y(n) = \sum_{k=0}^{N-1} h_n(k)\,x(n-k)$ and $h_{n+1}(k) = h_n(k) + \mu\,e(n)\,x(n-k)$, where $\mu$ is the adaptation step. At each step there are two computations: one for the filter tap and one for the update of the coefficient. …/… • Note 1: see ch16 «Adaptive Filter Implementation» for a full treatment of this topic.

Specific Filters Instructions • LMS Xmem,Ymem ; Xmem is added to the high part of accumulator A with rounding, while Xmem and Ymem are multiplied and accumulated into accumulator B (LMS(Xmem,Ymem)). • At each step LMS does the following computations: $y(n) \leftarrow y(n) + h(k)\,x(n-k)$ and $tmp \leftarrow tmp + h(k)$ (with rounding), where y(n) is in accumulator B and tmp in accumulator A. In addition, other instructions have to place the error times the adaptation step in accumulator A and store the updated coefficient value in Xmem (ST||MPY).
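
One LMS iteration can be sketched in Python with floating-point values. The error is assumed to be the difference between a desired sample and the filter output, which is the textbook definition; `desired` and `mu` are hypothetical parameter names, and this version uses the current error rather than any delayed form.

```python
def lms_step(h: list, x_hist: list, desired: float, mu: float):
    """Return (output, error, updated coefficients) for one LMS iteration."""
    y = sum(hk * xk for hk, xk in zip(h, x_hist))      # filter output y(n)
    e = desired - y                                    # error signal e(n)
    h_new = [hk + mu * e * xk for hk, xk in zip(h, x_hist)]
    return y, e, h_new


y, e, h_new = lms_step([0.0, 0.0], [1.0, 2.0], desired=5.0, mu=0.5)
```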

Butterfly Computation (note 1) • These instructions are only useful in dual 16-bit mode (C16=1) • 1: dst(31-16)=Lmem(31-16)+TREG, dst(15-0)=Lmem(15-0)-TREG • 2: dst(31-16)=Lmem(31-16)-TREG, dst(15-0)=Lmem(15-0)+TREG • 3: dst(31-16)=Lmem(31-16)-TREG, dst(15-0)=Lmem(15-0)-TREG • (Figure: butterflies between stage N and stage N+1, with branch metrics d and -d.) • Note 1: see ch22 «Viterbi Algorithm» for an in-depth explanation, and CMPS for other Viterbi-related instructions.
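
The three dual 16-bit operations can be modelled in Python on a packed 32-bit word. The sketch assumes that each half wraps around on 16 bits independently; the function names are hypothetical labels for syntaxes 1, 2 and 3 above.

```python
def _pack(high: int, low: int) -> int:
    """Pack two values into a 32-bit word, wrapping each to 16 bits."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


def _halves(word: int) -> tuple[int, int]:
    """Return the (high, low) 16-bit halves of a 32-bit word."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF


def add_sub(lmem: int, t: int) -> int:      # syntax 1: high + T, low - T
    high, low = _halves(lmem)
    return _pack(high + t, low - t)


def sub_add(lmem: int, t: int) -> int:      # syntax 2: high - T, low + T
    high, low = _halves(lmem)
    return _pack(high - t, low + t)


def sub_sub(lmem: int, t: int) -> int:      # syntax 3: high - T, low - T
    high, low = _halves(lmem)
    return _pack(high - t, low - t)
```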

Logic Instructions • Overview • Logic: AND, OR, XOR, CMPL, ANDM, ORM, XORM • Comparison and bit test: CMPM, CMPR, CMPS, BIT, BITF • Shift and rotate: SFTL, SFTA, SFTC, ROR, ROL, ROLTC

Logic Instructions • Logic operations on accumulators • 1,2,3: field4 = field3 op (field1 × 2^field2), where op is the logic operation (AND, OR or XOR) • The result is stored in field4 if present, otherwise in field3; the shift is applied according to field2 if present. • 4: field4 = field4 op (field1 × 2^field2) • field4 is used if present, otherwise field1 is used instead; the shift is applied according to field2 if present. • The shift field is described on the Shift Field slide. • Complement: field2 = NOT field1 (note 1) • The result is stored in field2 if present, otherwise in field1. • Note 1: bitwise complement.

Logic Instructions • Logic with memory • field2 = field1 op field2, where op is AND, OR or XOR • For ANDM, see also BITF.

Logic Instructions • Comparison (memory) • Equality test • TC=1 if field1==field2, else TC=0 • Comparison (auxiliary register) • Versatile comparison • ARx is compared against AR0 according to the condition code CC (field1), and TC is set if the comparison succeeds.

Logic Instructions • Compare, select, store (and remember) • Intended for the Viterbi algorithm (see Chapter 22 for an in-depth treatment; see DSADT and DADST for other Viterbi-related instructions). • Two paths arrive at a node of stage N+1 from stage N. Only one is retained, according to its weight x or y. • Assuming src(31-16)=x and src(15-0)=y: • If src(31-16) > src(15-0) then Smem=src(31-16), TRN=(TRN) << 1, TRN(0)=0, TC=0 • Else (src(31-16) ≤ src(15-0)) Smem=src(15-0), TRN=(TRN) << 1, TRN(0)=1, TC=1 • TRN is the transition register.
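
The compare, select and store step can be sketched in Python. The model assumes that both halves are compared as signed 16-bit values and that the transition register is 16 bits wide; the function names are hypothetical.

```python
def _signed16(word: int) -> int:
    """Interpret a 16-bit word as a signed value."""
    return word - 0x10000 if word & 0x8000 else word


def compare_select_store(src: int, trn: int) -> tuple[int, int, int]:
    """Return (stored word, new TRN, TC) for one compare-select-store step."""
    high = (src >> 16) & 0xFFFF
    low = src & 0xFFFF
    trn = (trn << 1) & 0xFFFF            # make room for the decision bit
    if _signed16(high) > _signed16(low):
        return high, trn, 0              # high half survives, TRN(0)=0
    return low, trn | 1, 1               # low half survives, TRN(0)=1
```

Each call records one decision bit in the transition register, which a traceback routine can later read back.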

Logic Instructions • Bit test • BIT and BITT set TC according to the value of one bit in the word specified by the operand in field1. The bit number is given either by BITC (syntax 1) or by bits T[3..0] of the T register (syntax 2). • Bit numbering is in reverse order, with 0 corresponding to the MSB and 15 to the LSB. • Bit field test • TC is set according to the result of (field1 AND field2). • For this instruction, see also ANDM.
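
The reversed bit numbering is easy to get wrong, so here is a small Python sketch of both tests. It follows the slide's statement that code 0 selects the MSB of a 16-bit word; the function names are hypothetical.

```python
def bit_test(word: int, bit_code: int) -> int:
    """Return the tested bit; bit_code 0 selects the MSB, 15 the LSB."""
    if not 0 <= bit_code <= 15:
        raise ValueError("bit_code must be in 0..15")
    return (word >> (15 - bit_code)) & 1


def bit_field_test(word: int, mask: int) -> int:
    """Return 1 if any masked bit of the 16-bit word is set, else 0."""
    return 1 if (word & mask & 0xFFFF) else 0
```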

Logic Instructions • Shift and rotate • field3 = field1 × 2^field2 • Field1 is shifted left or right according to the sign of SHIFT and stored in field3 if present, in field1 otherwise. • SFTL is a LOGICAL shift: the bits shifted in are 0. • SFTA is an ARITHMETIC shift: the low-order bits shifted in are 0 when SHIFT is positive; the high-order bits shifted in equal the sign bit (if SXM (note 1) is set) when SHIFT is negative. • SFTC (shift conditionally) applies to signed values: if src has a redundant sign bit, one left shift removes it and TC is reset; otherwise nothing is done and TC is set. • The shift field is described on the Shift Field slide. • Note 1: SXM = Sign Extension Mode.

Logic Instructions • Shift and rotate • ROR performs one right rotate through the carry C on src (guard bits=0, src(31)=C, C=src(0)). • ROL performs one left rotate through the carry C on src (guard bits=0, src(0)=C, C=src(31)). • ROLTC performs one left rotate with TC as input and C as output (guard bits=0, src(0)=TC, C=src(31)).
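
The rotate-through-carry behaviour can be modelled in Python on a 32-bit value, with the guard bits left out since the slide says they are cleared. The function names are hypothetical; each returns the new value and the new carry.

```python
MASK32 = 0xFFFFFFFF


def rotate_right(src: int, carry: int) -> tuple[int, int]:
    """Carry enters bit 31, bit 0 leaves into the carry."""
    new_carry = src & 1
    return ((carry << 31) | (src >> 1)) & MASK32, new_carry


def rotate_left(src: int, carry_in: int) -> tuple[int, int]:
    """carry_in enters bit 0 (C for ROL, TC for ROLTC), bit 31 leaves into C."""
    new_carry = (src >> 31) & 1
    return ((src << 1) | carry_in) & MASK32, new_carry
```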

Load, Store & Move Instructions • Load & Store accumulator • field3 = field1 × 2^field2 • The shift is applied according to field2 if present. • The SHIFT field is described on the Shift Field slide.

Load, Store & Move Instructions • Load & Store accumulator • 1: field2 = field1 × 2^16 + 2^15, i.e. dst(32-16) = field1 + 0.5 • 2: field2 = unsigned(field1), i.e. dst(32-16) = 0 and dst(15-0) = field1 • 3: particular case of syntax 1 of LD for memory-mapped registers.

Load, Store & Move Instructions • Load & Store accumulator • STL stores src(15-0) and STH stores src(32-16) • field3 = field1 × 2^field2 • The shift is applied according to field2 if present. • The SHIFT field is described on the Shift Field slide. • The memory-mapped register form is the same as syntax 1 of STL above, except that field3 is a memory-mapped register.

Load, Store & Move Instructions • Load & Store other registers • These instructions initialize T, DP or ASM either from memory (Smem) or from an immediate value. • #k3, #k5 and #k9 stand for 3-, 5- and 9-bit immediate values respectively. • ARP is only intended for 'C25 compatibility mode and is of no interest in native 'C54x software.

Load, Store & Move Instructions • Save other registers or write an immediate value to memory • field1=field2. • Syntax 3 initializes any data memory location with an immediate value. • A 16-bit immediate value can also be written into any memory-mapped register.

Load, Store & Move Instructions • Direct transfer from memory to memory (table: destination space versus source space)

Load, Store & Move Instructions • Data space  IO space • field2=field11 • Data space  Prog. space • 1,3: field2=field11 • 2: source prog. memory address is specified by A(15-0) • 4: destination prog. memory address is specified by A(15-0) 1 0  PA  65535, 0  pmad  65535

Load, Store & Move Instructions • Data space  Data space • Data space  MMR • MMR  MMR1 1MMR1,MMR2:AR0-AR7, SP only

Program Control Instructions • † Values for words (W) and cycles assume the use of DARAM for data. • ‡ Condition true • § Condition false • ¶ Delayed instruction
