DSP C5000 Chapter 5 Assembly Language
Assembly Language • Two main types of assembly language: • Algebraic • Mnemonic • Both the C54x and the C55x can use either type of assembly language. (Figure: C54 and C55, running C54 code on the C55.)
C54x Assembly Language • The instruction set is divided into four basic types: • Arithmetic • Logic • Load, Store & Move • Program control • The C54x has a fixed-length 16-bit instruction word • An instruction must be encoded in a single 16-bit word to execute in one cycle
Instructions and Operands • General syntax of an instruction: • Instr Op1,[Op2,[Op3,[…]]] • For the Instr field, refer to the TI documentation or the following slides • The syntax of the Op1,[Op2,[Op3,[…]]] field is given in the documentation of each instruction and specifies which addressing modes can be used for each operand.
Arithmetic Instructions • General-purpose arithmetic: • Addition/subtraction • Multiply (and accumulate) • Square • Divide • Application-specific arithmetic: • Miscellaneous • Polynomial evaluation • Distance computation • Specific filters • Butterfly computation (Viterbi)
General Purpose Arithmetic • Addition/subtraction • $\text{field4} = \text{field3} \pm \text{field1} \cdot 2^{\text{field2}}$ • The result is stored in field4 if present, otherwise in field3; the shift is applied according to field2 if present. • The shift field is detailed on the next slide.
Shift Field • Many instructions apply a shift to one operand. The shift is specified in the operand field and can be: • Immediate, given by the keywords: • $-16 \le \text{SHIFT} \le 15$ • $0 \le \text{SHIFT1} \le 15$ • Register indirect: • $-16 \le \text{ASM} \le 15$ (Accumulator Shift Mode field of ST1) • $-16 \le \text{TS} \le 31$ (TS is the 6 LSBs of the T register)
General Purpose Arithmetic • Addition/subtraction (special cases) • Arithmetic with an unsigned operand: • $\text{field2} = \text{field2} \pm \text{unsigned}(\text{field1})$ • Direct computation on memory: • With SXM = 1: $\text{field2} = \text{field2} + \text{signed}(\text{field1})$, $-32768 \le lk \le 32767$ • With SXM = 0: $\text{field2} = \text{field2} + \text{unsigned}(\text{field1})$, $0 \le lk \le 65535$
General Purpose Arithmetic • Addition/subtraction (extended precision): • 32 bits: • 1: $\text{field2} = \text{field2} \pm \text{field1}$ • 2: $\text{field2} = \text{field1} - \text{field2}$ • If C16 = 0, field1 and field2 are treated as 32-bit operands and a 32-bit operation is performed. • If C16 = 1, field1 and field2 are treated as pairs of 16-bit operands and a dual 16-bit (SIMD) operation takes place. • 64 bits: • $\text{field2} = \text{field2} \pm \text{unsigned}(\text{field1}) \pm \text{carry/borrow}$
General Purpose Arithmetic • Addition/subtraction (extended precision, continued) • 64-bit addition/subtraction is performed as follows (see the code example):
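The carry chaining that makes 64-bit addition possible can be modelled in C. This is a minimal sketch, not the TI routine. It assumes the 64-bit operands are held as four 16-bit words, least significant word first. The hypothetical helper `add_words_with_carry` shows how each word after the first consumes the carry produced by the previous one, which is the role ADDC plays. `split64` and `join64` are illustrative helpers.

```c
#include <stdint.h>

/* Add two 64-bit values stored as four 16-bit words (least significant
 * word first). Each word is treated as unsigned, as in ADDC, and the
 * carry out of one word is added into the next. Returns the carry out
 * of bit 63. Hypothetical helper for illustration. */
unsigned add_words_with_carry(const uint16_t x[4], const uint16_t y[4],
                              uint16_t sum[4])
{
    unsigned carry = 0;
    for (int i = 0; i < 4; ++i) {
        uint32_t t = (uint32_t)x[i] + y[i] + carry;
        sum[i] = (uint16_t)(t & 0xFFFFu);
        carry = (unsigned)(t >> 16);   /* 0 or 1 */
    }
    return carry;
}

/* Split a 64-bit value into four 16-bit words, least significant first. */
void split64(uint64_t v, uint16_t w[4])
{
    for (int i = 0; i < 4; ++i)
        w[i] = (uint16_t)(v >> (16 * i));
}

/* Rebuild a 64-bit value from four 16-bit words. */
uint64_t join64(const uint16_t w[4])
{
    uint64_t v = 0;
    for (int i = 3; i >= 0; --i)
        v = (v << 16) | w[i];
    return v;
}
```

For example, adding 1 to 0000FFFFFFFFFFFFh propagates a carry through three words and gives 0001000000000000h.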
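General Purpose Arithmetic • Multiply • 1, 3: $\text{field3} = \text{field1} \times \text{TREG}$ (note 2) • 2, 4: $\text{field3} = \text{field1} \times \text{field2}$ • Multiply and Accumulate/Subtract • 1: $\text{field3} = \text{field3} \pm (\text{field1} \times \text{TREG})$ (note 2) • 3: $\text{field4} = \text{field3} \pm (\text{field1} \times \text{TREG})$ (note 1) • 2, 4: $\text{field4} = \text{field3} \pm (\text{field1} \times \text{field2})$ (notes 1, 2) • Notes: 1 The result is stored in field4 if present, otherwise in field3. 2 [R]: the result is rounded to the 16 MSBs of dst and the 16 LSBs are zeroed.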
General Purpose Arithmetic • Multiply and Accumulate (with program memory) • 1, 2: $\text{field4} = \text{field3} + (\text{field1} \times \text{field2})$ • 2: the contents of the data memory location pointed to by the field1 operand (Smem) are copied to the next data memory address. • FIR filter computation needs both the scalar product and a delay of the data • 1: TREG = field1, and the contents of the data memory location pointed to by field1 are copied to the next data memory address.
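General Purpose Arithmetic • Multiply, Accumulate and Delay • In all cases, the samples x(n-k) are in data memory • Case 1: the coefficients h(k) are in program memory
        RPT   #N-1              ; N executions of the next instruction
        MACD  *AR1-,coef,A      ; A += x*h, then copy x to the next address
• Case 2: the coefficients h(k) are in data memory
        STM   #N-1,BRC          ; block repeat count (N passes)
        RPTB  endLoop-1
        LTD   *AR1-             ; T = x(n-k), then copy it to the next address
        MAC   *AR2+,A           ; A += T*h(k)
endLoop: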
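The multiply-accumulate-and-delay loop can be modelled in C as a sketch. The model walks the delay line from the oldest sample down to the newest, accumulates h(k)·x(n-k), and copies each sample one address up, so the buffer is already shifted for the next output. The layout is an assumption: `x[k]` holds x(n-k) and the buffer has one spare slot at the end. Coefficients are paired as `h[k]` with `x[k]`. The reversed coefficient order that MACD's incrementing pmad implies is not modelled. `fir_macd` is a hypothetical name.

```c
#include <stdint.h>

/* One output sample of an n_taps FIR filter, computed the way a repeated
 * MACD (or LTD/MAC pair) does it. x[k] holds x(n-k); x must have
 * n_taps + 1 entries because x[n_taps] receives the oldest sample.
 * Walking from the oldest sample downward means each copy never
 * overwrites a sample that has not been used yet. */
int64_t fir_macd(int16_t x[], const int16_t h[], int n_taps)
{
    int64_t acc = 0;                     /* accumulator A, cleared */
    for (int k = n_taps - 1; k >= 0; --k) {
        acc += (int32_t)x[k] * h[k];     /* multiply-accumulate part */
        x[k + 1] = x[k];                 /* delay part: copy to next address */
    }
    return acc;
}
```

After the call, the caller writes the next input sample into `x[0]`.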
General Purpose Arithmetic • Multiply (with accumulator) • 1: $B = \text{field1} \times A(32\text{-}16)$ (note 1) • 2: $\text{field1} = \text{TREG} \times A(32\text{-}16)$ (note 1) • Multiply and Accumulate/Subtract (with accumulator) • 1: $B = B \pm (\text{field1} \times A(32\text{-}16))$ (notes 1, 2) • 2: $\text{field3} = \text{field2} \pm (\text{TREG} \times A(32\text{-}16))$ (notes 1, 2, 3) • Notes: 1 A(32-16) stands for the high part (bits 32-16) of accumulator A; B stands for accumulator B. 2 [R]: the result is rounded to the 16 MSBs of dst and the 16 LSBs are zeroed. 3 The result is stored in field3 if present, otherwise in field2.
General Purpose Arithmetic • Extended precision multiplication • 1: $\text{field3} = \text{unsigned}(\text{field1}) \times \text{unsigned}(\text{TREG})$ • 2: $\text{field3} = \text{field3} + (\text{unsigned}(\text{field1}) \times \text{signed}(\text{field2}))$ • MPYU is equivalent to MPY syntax 1, but with unsigned operands. • MACSU is equivalent to MAC syntax 2, but with the field1 operand unsigned.
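General Purpose Arithmetic • Extended precision multiplication • Principle: a 32 × 32-bit product is built from four 16 × 16-bit partial products: MPYU for low × low, MACSU for each of the two cross terms (unsigned low × signed high), and MAC for high × high. Each partial product is weighted by 2^0, 2^16 or 2^32 according to its position (see the code example).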
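The decomposition above can be checked with a short C model. This is a sketch of the arithmetic only, not the TI code. Each operand is split as x = xh·2^16 + xl, with xl unsigned and xh signed. The four partial products carry the signedness of MPYU, MACSU and MAC, and are then combined with their weights. `mul32_from_16` is a hypothetical name.

```c
#include <stdint.h>

/* 32 x 32 -> 64-bit signed multiply from four 16 x 16 partial products. */
int64_t mul32_from_16(int32_t x, int32_t y)
{
    /* x = xh * 2^16 + xl, with xl unsigned and xh signed. */
    uint16_t xl = (uint16_t)((uint32_t)x & 0xFFFFu);
    uint16_t yl = (uint16_t)((uint32_t)y & 0xFFFFu);
    int32_t xh = (int32_t)(((int64_t)x - xl) / 65536);   /* exact division */
    int32_t yh = (int32_t)(((int64_t)y - yl) / 65536);

    int64_t p_ll = (int64_t)xl * yl;   /* MPYU:  unsigned x unsigned */
    int64_t p_lh = (int64_t)xl * yh;   /* MACSU: unsigned x signed   */
    int64_t p_hl = (int64_t)yl * xh;   /* MACSU: unsigned x signed   */
    int64_t p_hh = (int64_t)xh * yh;   /* MAC:   signed x signed     */

    /* Weight each partial product by 2^0, 2^16 or 2^32. */
    return p_hh * 65536 * 65536 + (p_lh + p_hl) * 65536 + p_ll;
}
```

On the DSP, the same weighting is obtained by storing the low 16 bits of the accumulator and shifting it right by 16 between steps, rather than by explicit multiplications.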
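General Purpose Arithmetic • Square • 1: $\text{field2} = \text{field1} \times \text{field1}$ • 2: $\text{field2} = A(32\text{-}16) \times A(32\text{-}16)$ (note 1) • 3: $\text{field2} = \text{field2} \pm (\text{field1} \times \text{field1})$ • Note: 1 A(32-16) stands for the high part (bits 32-16) of accumulator A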
General Purpose Arithmetic • Divide • Division is implemented by repeated conditional subtraction. • SUBC performs a single-cycle 1-bit unsigned divide step: • The dividend (numerator) is in the low part of src and the divisor is in Smem. • After 16 repetitions, the quotient is in the low part of src and the remainder is in the high part of src. • One step: (src) − (Smem) << 15 → ALU output; If ALU output ≥ 0 Then (ALU output) << 1 + 1 → src Else (src) << 1 → src
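The SUBC step described above is a shift-and-subtract (restoring) division, and it can be modelled in C as follows. This is a sketch that assumes a positive 16-bit dividend and a nonzero positive divisor. The 40-bit accumulator is modelled with an `int64_t`. `subc_step` and `subc_divide` are hypothetical names.

```c
#include <stdint.h>

/* One SUBC step on the model accumulator. */
static int64_t subc_step(int64_t acc, uint16_t den)
{
    int64_t alu = acc - ((int64_t)den << 15);  /* (src) - (Smem) << 15 */
    if (alu >= 0)
        return alu * 2 + 1;                    /* shift in a quotient bit 1 */
    return acc * 2;                            /* shift in a quotient bit 0 */
}

/* Sixteen repetitions (RPT #15 / SUBC): quotient in bits 15-0,
 * remainder in bits 31-16. Assumes den != 0. */
void subc_divide(uint16_t num, uint16_t den, uint16_t *quot, uint16_t *rem)
{
    int64_t acc = num;                         /* dividend in the low part */
    for (int i = 0; i < 16; ++i)
        acc = subc_step(acc, den);             /* acc stays non-negative */
    *quot = (uint16_t)(acc & 0xFFFF);
    *rem  = (uint16_t)((acc >> 16) & 0xFFFF);
}
```

For example, 1000 / 7 leaves 142 in the low part and 6 in the high part.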
General Purpose Arithmetic • Division Routine
        LD    @den,16,A     ; A(31-16) = den
        MPYA  @num          ; B = num*den (tells sign)
        ABS   A             ; strip sign of denominator
        STH   A,@den
        LD    @num,A
        ABS   A             ; strip sign of numerator
        RPT   #15           ; 16 iterations
        SUBC  @den,A        ; 1-bit divide
        XC    1,BLT         ; if result needs to be negative
        NEG   A             ; invert sign
        STL   A,@quot       ; store quotient (negated if needed)
Miscellaneous Arithmetic • ABS src,[dst] ; computes the absolute value of src and stores it in dst if specified, otherwise in src (dst = |src|). • NEG src,[dst] ; stores the 2's complement of src in dst if specified, otherwise in src (dst = −src). • MAX dst, MIN dst ; store in dst the greater (resp. the lower) of accumulators A and B (dst = MAX(A,B), dst = MIN(A,B)).
Miscellaneous Arithmetic • EXP/NORM: tools for fixed-point to (block) floating-point conversion (note 1) • They store the high part of the accumulator (A or B) in « Mantissa*2^Exponent » form. • EXP src ; computes the number of shifts needed to normalize the high part of accumulator src and stores it in the T register (T = EXP(src)). In the example shown in the figure, T = 3 after the operation. Because of the guard bits, T can be negative after the operation. (Figure: accumulator layout: guard bits, high part, low part.) 1 See ch13 «Numerical Issues» for an in-depth explanation of floating-point format
Miscellaneous Arithmetic • NORM src,[dst] ; the contents of accumulator src are shifted according to the value in the T register and stored in dst if specified, otherwise in src (dst = src << TS). (Figure: accumulator after the operation: guard bits, high part, low part.) Example with A = 1234h:
        .bss   Mantissa,2,1
Expo    .set   Mantissa+1
        .text
                                ; A = 1234h
format: LD     #Mantissa,DP
        EXP    A                ; T = number of normalizing shifts
        NORM   A                ; A = A << T
        ST     T,@Expo
        STH    A,@Mantissa
After execution, Mantissa = 48D0h and Expo = 0012h.
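The EXP/NORM pair can be modelled on a 40-bit accumulator held in an `int64_t`. In this sketch, EXP counts the redundant sign bits below bit 39 and subtracts the 8 guard bits. NORM shifts left for a positive T and arithmetically right for a negative T, keeping 40 bits. Two points are assumptions: that a zero accumulator gives T = 0, and that wrap-around beyond 40 bits matches the device. `exp40`, `norm40` and `sext40` are hypothetical names. The model reproduces the slide's example: 1234h gives T = 18 (12h) and a high part of 48D0h.

```c
#include <stdint.h>

#define MASK40 0xFFFFFFFFFFull

/* Interpret the low 40 bits of v as a two's-complement accumulator. */
static int64_t sext40(uint64_t v)
{
    v &= MASK40;
    return (v & 0x8000000000ull) ? (int64_t)v - 0x10000000000LL : (int64_t)v;
}

/* EXP model: redundant sign bits of the 40-bit value minus 8 guard bits.
 * Negative when significant bits reach into the guard bits.
 * Assumption: a zero accumulator yields 0. */
int exp40(int64_t acc)
{
    uint64_t u = (uint64_t)acc & MASK40;
    if (u == 0)
        return 0;
    uint64_t sign = (u >> 39) & 1u;
    int count = 0;
    for (int bit = 38; bit >= 0 && ((u >> bit) & 1u) == sign; --bit)
        ++count;
    return count - 8;
}

/* NORM model: shift by t (left if t >= 0, arithmetic right otherwise). */
int64_t norm40(int64_t acc, int t)
{
    acc = sext40((uint64_t)acc);
    if (t >= 0)
        return sext40((uint64_t)acc << t);
    /* Arithmetic right shift without relying on >> of negative values. */
    return acc >= 0 ? acc >> -t : ~((~acc) >> -t);
}
```

The mantissa stored by STH is then bits 31-16 of the `norm40` result.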
Miscellaneous Arithmetic • Rounding and saturation are intended for number representations with finite precision and finite dynamic range: • RND src,[dst] ; the high part of accumulator src is rounded (2^15 is added to src) and stored in dst if specified, otherwise in src (dst = rnd(src)). • SAT src ; if src is greater than 007FFFFFFFh (high part 32767), src is set to 007FFFFFFFh; if src is lower than FF80000000h (high part −32768), src is set to FF80000000h (SATURATE(src)). (Figure: accumulator before and after rounding.)
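Rounding and saturation can be modelled in a few lines of C, as a sketch that ignores 40-bit wrap-around. RND adds 8000h, so the high part (bits 31-16) rounds to nearest. SAT clamps the accumulator to the 32-bit range FF80000000h..007FFFFFFFh. `rnd`, `sat32` and `high_part` are hypothetical names.

```c
#include <stdint.h>

/* RND model: add 2^15 so that bits 31-16 are rounded to nearest. */
int64_t rnd(int64_t acc)
{
    return acc + 0x8000;
}

/* SAT model: clamp to [-2^31, 2^31 - 1], i.e. FF80000000h..007FFFFFFFh. */
int64_t sat32(int64_t acc)
{
    if (acc > INT32_MAX) return INT32_MAX;
    if (acc < INT32_MIN) return INT32_MIN;
    return acc;
}

/* High part (bits 31-16) of the accumulator, as stored by STH. */
uint16_t high_part(int64_t acc)
{
    return (uint16_t)(((uint64_t)acc >> 16) & 0xFFFFu);
}
```

For example, 12348000h rounds to a high part of 1235h, while 12347FFFh keeps 1234h.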
Polynomial Evaluation • Consider the 3rd-order polynomial $y = a_3x^3 + a_2x^2 + a_1x + a_0$. • It can be computed iteratively (Horner's scheme) as $y = ((a_3x + a_2)x + a_1)x + a_0$.
Polynomial Evaluation • Before using the POLY instruction, the T register must be loaded with the value of x. • POLY Smem ; the high part of accumulator A is multiplied by T, added to B, rounded and stored in A; in parallel, the high part of accumulator B is loaded with the contents of Smem (the next coefficient) (POLY(Smem)).
coef    .sect  "COEF"
        .word  1234h,3456h
        .word  4567h,5678h
        .bss   y,1
        .text
                               ; A(15-0) = 7FFCh (x)
PoEval: STLM   A,T             ; T = x
        STM    #coef,AR1       ; AR1 -> first coefficient
        LD     *AR1+,16,A      ; A(31-16) = a3
        LD     *AR1+,16,B      ; B(31-16) = a2
        RPT    #2              ; 3 POLY steps for a cubic
        POLY   *AR1+
        STH    A,*(y)          ; store y
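The way POLY pipelines the coefficients through A and B can be shown with a register-level C model. This is a sketch in plain integers: the Q15 scaling and the rounding that the hardware performs are left out. A starts with the leading coefficient and B with the next one. Each step computes A = A·x + B while B receives the following coefficient. `poly_eval` is a hypothetical name. The model requires `degree >= 1`, and `coef[]` lists coefficients from highest to lowest degree, as in the COEF table.

```c
#include <stdint.h>

/* Horner evaluation sequenced like the POLY loop. */
int64_t poly_eval(const int16_t coef[], int degree, int16_t x)
{
    int64_t a = coef[0];                 /* LD *AR1+,16,A */
    int64_t b = coef[1];                 /* LD *AR1+,16,B */
    for (int i = 1; i <= degree; ++i) {  /* RPT #degree-1 ; POLY *AR1+ */
        a = a * x + b;
        /* The hardware reads one word past the table on the last step;
         * the model loads 0 instead. */
        b = (i + 1 <= degree) ? coef[i + 1] : 0;
    }
    return a;
}
```

With coefficients {2, −3, 0, 5} and x = 4, the three steps give 5, 20 and then 85, which equals 2·4³ − 3·4² + 5.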
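Distance Computation • ABDST Xmem,Ymem ; adds to B the absolute value of the previous difference held in A(32-16), while A receives (Xmem − Ymem) << 16. Repeating ABDST therefore computes the L1 norm of the distance between two vectors, $D = \sum_{i=0}^{N-1} |X_i - Y_i|$ (ABDST(Xmem,Ymem)). Because of this one-step delay, N + 1 executions (RPT #N) are needed for N-element vectors, with A and B cleared first:
        .bss   X,10
        .bss   Y,10
        .bss   D,1
        .text
dist:   STM    #X,AR2
        STM    #Y,AR3
        LD     #0,A            ; clear A and B
        LD     #0,B
        RPT    #10             ; 11 executions for 10 elements
        ABDST  *AR2+,*AR3+
        STH    B,*(D)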
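Distance Computation • SQDST Xmem,Ymem ; adds to B the square of the previous difference held in A(32-16), while A receives (Xmem − Ymem) << 16. Repeating SQDST therefore computes the squared L2 norm of the distance between two vectors, $D = \sum_{i=0}^{N-1} (X_i - Y_i)^2$ (SQDST(Xmem,Ymem)). As with ABDST, N + 1 executions are needed, with A and B cleared first:
        .bss   X,10
        .bss   Y,10
        .bss   D,1
        .text
dist:   STM    #X,AR2
        STM    #Y,AR3
        LD     #0,A            ; clear A and B
        LD     #0,B
        RPT    #10             ; 11 executions for 10 elements
        SQDST  *AR2+,*AR3+
        STH    B,*(D)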
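The one-step delay in the ABDST and SQDST loops can be checked with a small C model. In this sketch, each step first accumulates the difference left in A by the previous step, then forms the new difference. That is why n + 1 steps are needed and why the last difference formed is never used. The << 16 scaling of the hardware is dropped. `dist_model` and its `squared` flag are hypothetical: `squared = 0` models ABDST and `squared = 1` models SQDST.

```c
#include <stdint.h>
#include <stdlib.h>

/* Pipelined distance loop: n + 1 steps for n-element vectors. */
int64_t dist_model(const int16_t x[], const int16_t y[], int n, int squared)
{
    int64_t a = 0, b = 0;                      /* A and B cleared */
    for (int i = 0; i <= n; ++i) {             /* RPT #n: n + 1 executions */
        b += squared ? a * a : llabs(a);       /* previous difference */
        /* On the DSP the extra step reads past the arrays; its difference
         * is never accumulated, so the model uses 0. */
        a = (i < n) ? (int64_t)x[i] - y[i] : 0;
    }
    return b;
}
```

With X = {5, −2, 7} and Y = {1, 3, 7}, the differences are 4, −5 and 0, so the model returns 9 for ABDST and 41 for SQDST.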
Specific Filter Instructions • Symmetric FIR filters (note 1): an even-length symmetric FIR filter can be computed as $y(n) = \sum_{k=0}^{N-1} h(k)\,x(n-k)$, which requires N multiplications. Because of the symmetry $h(k) = h(N-1-k)$, the equation can be rewritten as $y(n) = \sum_{k=0}^{N/2-1} h(k)\,[x(n-k) + x(n-N+1+k)]$, which requires only N/2 multiplications. The FIRS instruction implements this optimization. 1 See ch14 «FIR filter implementation» for a full treatment of this topic
Specific Filter Instructions • FIRS Xmem,Ymem,pmad ; the high part of accumulator A is multiplied by the content of pmad and accumulated in accumulator B; Xmem and Ymem are added together and stored in the high part of accumulator A (FIRS(Xmem,Ymem,pmad)). • At each step, FIRS performs $B \leftarrow B + A(32\text{-}16) \times h(k)$ and $A \leftarrow (\text{Xmem} + \text{Ymem}) \ll 16$. Here y(n) is in accumulator B, and tmp (the sum of the two samples that share a coefficient) is in accumulator A.
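The symmetric rewrite above can be checked numerically with a C sketch that compares the direct form with the folded form. The folded form pre-adds the two samples that share a coefficient, then performs N/2 multiplications. The layout is an assumption: `x[k]` holds x(n-k), N is even, and `h` is symmetric. FIRS's one-step pipelining between A and B is not modelled. `fir_direct` and `fir_symmetric` are hypothetical names.

```c
#include <stdint.h>

/* Direct form: y(n) = sum_{k=0}^{N-1} h(k) x(n-k). */
int64_t fir_direct(const int16_t x[], const int16_t h[], int n_taps)
{
    int64_t y = 0;
    for (int k = 0; k < n_taps; ++k)
        y += (int64_t)h[k] * x[k];
    return y;
}

/* Folded form for even N with h(k) = h(N-1-k): N/2 multiplications. */
int64_t fir_symmetric(const int16_t x[], const int16_t h[], int n_taps)
{
    int64_t y = 0;                                         /* accumulator B */
    for (int k = 0; k < n_taps / 2; ++k) {
        int64_t tmp = (int64_t)x[k] + x[n_taps - 1 - k];   /* accumulator A */
        y += tmp * h[k];
    }
    return y;
}
```

Both forms give 15 for h = {1, 2, 2, 1} and x = {4, 3, 2, 1}.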
Specific Filter Instructions • LMS algorithm (note 1): LMS adaptive filtering updates the filter coefficients according to an error signal e(n) while computing the filter output y(n). This involves the following computations: $y(n) = \sum_{k=0}^{N-1} h_n(k)\,x(n-k)$ and $h_{n+1}(k) = h_n(k) + \mu\,e(n)\,x(n-k)$, where $\mu$ is the adaptation step. Each step therefore involves two computations: one for the filter tap and one for the update of that tap's coefficient. 1 See ch16 «Adaptive Filter Implementation» for a full treatment of this topic
Specific Filter Instructions • LMS Xmem,Ymem ; Xmem is accumulated into the high part of accumulator A with rounding, while Xmem and Ymem are multiplied and accumulated into accumulator B (LMS(Xmem,Ymem)). • At each step, LMS therefore performs $y(n) \leftarrow y(n) + h(k)\,x(n-k)$ and $\text{tmp} \leftarrow \text{tmp} + h(k)$ (rounded). Here y(n) is in accumulator B and tmp is in accumulator A, with Xmem holding h(k) and Ymem holding x(n-k). In addition, other instructions have to load accumulator A with the error times the adaptation step times x(n-k), and store the updated coefficient back in Xmem (ST||MPY).
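One pass of the LMS computations above can be sketched in C with doubles instead of Q15 values. For each tap, the model accumulates the output with the current coefficient, which is the B part of the LMS instruction. It then updates that coefficient, which is the A part. Two things are assumptions here: `mu_e` is a hypothetical input holding μ·e(n), computed beforehand, and how e(n) is derived from y(n) is outside this sketch. `lms_pass` is also a hypothetical name.

```c
/* One LMS pass: x[k] holds x(n-k), h[k] holds h(k) and is updated in place.
 * Returns y(n), computed with the coefficients as they were before the
 * update. */
double lms_pass(const double x[], double h[], int n_taps, double mu_e)
{
    double y = 0.0;
    for (int k = 0; k < n_taps; ++k) {
        y += h[k] * x[k];        /* y(n) += h(k) x(n-k)        (accumulator B) */
        h[k] += mu_e * x[k];     /* h(k) += mu e(n) x(n-k)     (accumulator A) */
    }
    return y;
}
```

For example, h = {0.5, 0.25}, x = {1, 2} and μ·e = 0.5 give y = 1.0 and updated coefficients {1.0, 1.25}.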
Butterfly Computation (note 1) • These instructions are only useful in dual 16-bit mode (C16 = 1) • 1: dst(31-16) = Lmem(31-16) + TREG, dst(15-0) = Lmem(15-0) − TREG • 2: dst(31-16) = Lmem(31-16) − TREG, dst(15-0) = Lmem(15-0) + TREG • 3: dst(31-16) = Lmem(31-16) − TREG, dst(15-0) = Lmem(15-0) − TREG • (Figure: butterflies between stages N and N+1 with branch weights d and −d.) 1 See ch22 «Viterbi Algorithm» for an in-depth explanation, and CMPS for other Viterbi-related instructions
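The three dual 16-bit forms can be modelled in C, as a sketch in which the high and low halves of the long word are updated independently with TREG. Overflow behaviour (wrap-around or saturation) is not modelled, so the example values stay well inside the 16-bit range. `pair16` and `butterfly1` to `butterfly3` are hypothetical names and follow the numbering of the syntaxes above.

```c
#include <stdint.h>

/* The two 16-bit halves of a 32-bit long word in dual 16-bit mode. */
typedef struct { int16_t hi; int16_t lo; } pair16;

/* Form 1: high + TREG, low - TREG. */
pair16 butterfly1(pair16 l, int16_t t)
{
    pair16 r = { (int16_t)(l.hi + t), (int16_t)(l.lo - t) };
    return r;
}

/* Form 2: high - TREG, low + TREG. */
pair16 butterfly2(pair16 l, int16_t t)
{
    pair16 r = { (int16_t)(l.hi - t), (int16_t)(l.lo + t) };
    return r;
}

/* Form 3: high - TREG, low - TREG. */
pair16 butterfly3(pair16 l, int16_t t)
{
    pair16 r = { (int16_t)(l.hi - t), (int16_t)(l.lo - t) };
    return r;
}
```

In a Viterbi butterfly, the two halves hold the candidate path metrics and TREG holds the branch weight d. The candidates are then compared and selected by CMPS.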
Logic Instructions • Overview • Logic: AND, OR, XOR, CMPL, ANDM, ORM, XORM • Comparison and bit test: CMPM, CMPR, CMPS, BIT, BITF • Shift and rotate: SFTL, SFTA, SFTC, ROR, ROL, ROLTC
Logic Instructions • Logic operations on accumulators • 1, 2, 3: $\text{field4} = \text{field3}\ \text{op}\ (\text{field1} \cdot 2^{\text{field2}})$, with op = AND, OR or XOR • The result is stored in field4 if present, otherwise in field3; the shift is applied according to field2 if present. • 4: $\text{field4} = \text{field4}\ \text{op}\ (\text{field1} \cdot 2^{\text{field2}})$ • field4 is used if present, otherwise field1 is used instead; the shift is applied according to field2 if present. • The shift field is described on the Shift Field slide. • $\text{field2} = \overline{\text{field1}}$ (note 1) • The result is stored in field2 if present, otherwise in field1. 1 Bitwise (1's) complement
Logic Instructions • Logic with memory • $\text{field2} = \text{field1}\ \text{op}\ \text{field2}$, with op = AND, OR or XOR • For ANDM, see also BITF
Logic Instructions • Comparison (memory) • Equality test • TC = 1 if field1 == field2, else TC = 0 • Comparison (auxiliary register) • Versatile comparison • ARx is compared against AR0 according to the condition CC (field1), and TC is set if the comparison succeeds
Logic Instructions • Compare, select, store (and remember) • Intended for the Viterbi algorithm (see Chapter 22 for an in-depth treatment, and DSADT and DADST for other Viterbi-related instructions). (Figure: two paths with weights x and y going from stage N to a node of stage N+1.) Two paths arrive at a node of stage N+1 from stage N; only one is retained, according to its weight x or y. Assume src(31-16) = x and src(15-0) = y. Then:
• If src(31-16) > src(15-0): Smem = src(31-16), TRN = (TRN) << 1, TRN(0) = 0, TC = 0
• Else: Smem = src(15-0), TRN = (TRN) << 1, TRN(0) = 1, TC = 1
• TRN is the transition register.
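The compare-select-store rule above can be written as a short C model. This sketch assumes the two 16-bit halves of src are passed separately. It stores the larger one, shifts the decision bit into TRN, and returns TC. A tie takes the Else branch, so the low half is selected. `cmps` is a hypothetical name.

```c
#include <stdint.h>

/* CMPS model: select the larger half, remember the choice in TRN, return TC. */
int cmps(int16_t hi, int16_t lo, int16_t *smem, uint16_t *trn)
{
    if (hi > lo) {
        *smem = hi;                            /* Smem = src(31-16)   */
        *trn = (uint16_t)(*trn << 1);          /* TRN(0) = 0          */
        return 0;                              /* TC = 0              */
    }
    *smem = lo;                                /* Smem = src(15-0)    */
    *trn = (uint16_t)((*trn << 1) | 1u);       /* TRN(0) = 1          */
    return 1;                                  /* TC = 1              */
}
```

After a sequence of decisions, the bits in TRN record which path survived at each node. Traceback later uses them to recover the most likely sequence.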
Logic Instructions • Bit test • BIT and BITT set TC according to the value of one bit of the word specified by the field1 operand. The bit number is given either by BITC (syntax 1) or by T(3-0) (syntax 2). • Bit numbering is reversed: 0 corresponds to the MSB and 15 to the LSB. • Bit field test • TC is set according to the result of (field1 AND field2): TC = 1 if the result is nonzero, TC = 0 otherwise • For this instruction, see also ANDM
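The reversed bit numbering and the bit-field test are easy to get wrong, so here is a C sketch of both. With a bit code of 0, the MSB (bit 15) is tested. With a bit code of 15, the LSB is tested. The bit-field test sets TC when any masked bit is set. `bit_test` and `bitf` are hypothetical names. For BITT, the model assumes that the bit code is simply T & 15.

```c
#include <stdint.h>

/* BIT / BITT model: bit code 0 is bit 15 (MSB), bit code 15 is bit 0 (LSB).
 * Returns TC. */
int bit_test(uint16_t word, unsigned bit_code)
{
    return (int)((word >> (15u - (bit_code & 15u))) & 1u);
}

/* BITF model: TC = 1 if (word AND mask) is nonzero, 0 otherwise. */
int bitf(uint16_t word, uint16_t mask)
{
    return (word & mask) != 0;
}
```

For example, testing 8000h with bit code 0 sets TC, while the same word with bit code 15 clears it.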
Logic Instructions • Shift and rotate • $\text{field3} = \text{field1} \cdot 2^{\text{field2}}$ • field1 is shifted left or right according to the sign of SHIFT and stored in field3 if present, otherwise in field1 • SFTL stands for LOGICAL shift: the bits shifted in are 0. • SFTA stands for ARITHMETIC shift: the low-order bits shifted in are 0 when SHIFT is positive; when SHIFT is negative, the high-order bits shifted in are copies of the sign bit (if SXM (note 1) is set). • Shift conditionally (SFTC) applies to signed values: if src has a redundant sign bit (bits 31 and 30 equal and src ≠ 0), one left shift removes it and TC is reset; otherwise nothing is done and TC is set. • For the content of the shift field, see the Shift Field slide. 1 Sign Extension Mode
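SFTC is usually repeated until TC is set, which normalizes the value one bit at a time. The C sketch below models this on the 32-bit part of the accumulator and ignores the guard bits, which is a simplifying assumption. `sftc` is a hypothetical name. It returns TC.

```c
#include <stdint.h>

/* SFTC model: if src != 0 and bits 31 and 30 are equal (redundant sign bit),
 * shift left once and return TC = 0; otherwise leave src and return TC = 1. */
int sftc(int32_t *src)
{
    uint32_t u = (uint32_t)*src;
    if (u == 0 || (((u >> 31) ^ (u >> 30)) & 1u))
        return 1;
    *src *= 2;   /* cannot overflow: bits 31 and 30 are equal */
    return 0;
}
```

Looping `while (sftc(&v) == 0)` brings 00001000h to 40000000h after 18 shifts, and brings −1 to 80000000h after 31 shifts.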
Logic Instructions • Shift and rotate • ROR performs one right rotation of src through the carry C (guard bits = 0, src(31) = C, C = src(0)). • ROL performs one left rotation of src through the carry C (guard bits = 0, src(0) = C, C = src(31)). • ROLTC performs one left rotation with TC as input and C as output (guard bits = 0, src(0) = TC, C = src(31)).
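The three rotations through C can be modelled in C on the 32-bit part of the accumulator. The instructions clear the guard bits, so this sketch leaves them out. `ror`, `rol` and `roltc` are hypothetical names. `carry` holds C, and `tc` is the TC input of ROLTC.

```c
#include <stdint.h>

/* ROR: src(31) = old C, C = src(0). */
uint32_t ror(uint32_t src, unsigned *carry)
{
    unsigned out = src & 1u;
    src = (src >> 1) | ((uint32_t)(*carry & 1u) << 31);
    *carry = out;
    return src;
}

/* ROL: src(0) = old C, C = src(31). */
uint32_t rol(uint32_t src, unsigned *carry)
{
    unsigned out = (unsigned)(src >> 31) & 1u;
    src = (src << 1) | (*carry & 1u);
    *carry = out;
    return src;
}

/* ROLTC: src(0) = TC, C = src(31). */
uint32_t roltc(uint32_t src, unsigned tc, unsigned *carry)
{
    *carry = (unsigned)(src >> 31) & 1u;
    return (src << 1) | (tc & 1u);
}
```

Because C takes part in ROR and ROL, 33 rotations in the same direction bring both the accumulator and the carry back to their starting state.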
Load, Store & Move Instructions • Load & Store accumulator • $\text{field3} = \text{field1} \cdot 2^{\text{field2}}$ • The shift is applied according to field2 if present. • The SHIFT field is recalled on the Shift Field slide.
Load, Store & Move Instructions • Load & Store accumulator • 1: $\text{field2} = \text{field1} \cdot 2^{16} + 2^{15}$, i.e. dst(31-16) = field1 + 0.5 • 2: $\text{field2} = \text{unsigned}(\text{field1})$, i.e. dst(39-16) = 0 and dst(15-0) = field1 • 3: a particular case of LD syntax 1 for a memory-mapped register.
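The difference between a plain load, a load with rounding and an unsigned load can be seen in a C model of the 40-bit accumulator, held in an `int64_t`. This is a sketch. `ld` models a plain 16-bit load, with sign extension controlled by SXM. `ld_round` models syntax 1 above and assumes a sign-extended operand (SXM = 1). `ldu` models syntax 2. All three names are hypothetical.

```c
#include <stdint.h>

/* Plain load: sign-extended when SXM = 1, zero-extended when SXM = 0. */
int64_t ld(uint16_t smem, int sxm)
{
    if (sxm && (smem & 0x8000u))
        return (int64_t)smem - 0x10000;   /* negative 16-bit value */
    return smem;
}

/* Syntax 1 (load with rounding): signed(Smem) * 2^16 + 2^15. */
int64_t ld_round(int16_t smem)
{
    return (int64_t)smem * 65536 + 32768;
}

/* Syntax 2 (unsigned load): dst(39-16) = 0, dst(15-0) = Smem. */
int64_t ldu(uint16_t smem)
{
    return smem;
}
```

The 2^15 term is the "+ 0.5" in units of the high part. It makes a later truncation of the high part behave like rounding.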
Load, Store & Move Instructions • Load & Store accumulator • STL stores src(15-0) and STH stores src(31-16) • $\text{field3} = \text{field1} \cdot 2^{\text{field2}}$ • The shift is applied according to field2 if present. • The SHIFT field is recalled on the Shift Field slide. • STLM: same as STL syntax 1 above, except that field3 is a memory-mapped register.
Load, Store & Move Instructions • Load & Store other registers • These instructions initialize T, DP or ASM either from memory (Smem) or from an immediate value. • #k3, #k5 and #k9 stand for 3-, 5- and 9-bit immediate values respectively. • ARP is only intended for ‘C25 compatibility mode and is not of interest in native ‘C54x software.
Load, Store & Move Instructions • Save other registers or write an immediate value to memory • field1 = field2 • Syntax 3 allows any data memory location to be initialized with an immediate value. • STM writes an immediate 16-bit value into any memory-mapped register.
Load, Store & Move Instructions • Direct transfer from memory to memory (Table: move instructions by source space and destination space.)
Load, Store & Move Instructions • Data space ↔ I/O space • field2 = field1 (note 1) • Data space ↔ program space • 1, 3: field2 = field1 (note 1) • 2: the source program memory address is specified by A(15-0) • 4: the destination program memory address is specified by A(15-0) • 1 $0 \le \text{PA} \le 65535$, $0 \le \text{pmad} \le 65535$
Load, Store & Move Instructions • Data space → data space • Data space ↔ MMR • MMR → MMR (note 1) • 1 MMR1, MMR2: AR0-AR7 and SP only
Program Control Instructions • Notes: † Values for words (W) and cycles assume the use of DARAM for data. ‡ Condition true. § Condition false. ¶ Delayed instruction.