Chapter 3-1: ARM ISA
• ARM Instruction Set Architecture
• Next Lecture
  • ARM program examples
ARM processors
• Used in low-power and low-cost embedded applications
  • Cell phones, PDAs, modems
• Various simulation models available for embedded system design as well as low-power design
• Support both Big-endian and Little-endian
• All arithmetic and logic instructions operate only on data in processor registers
• Pipelining: 3 or 5 stages
  • Instruction Fetch (IF), Decode (ID), Execute (EX), Memory Access (Mem) and Write-back (WB)
• http://www.heyrick.co.uk/assembler/
Data Sizes and Instruction Sets
• The ARM is a 32-bit RISC architecture.
• When used in relation to the ARM:
  • Byte means 8 bits
  • Halfword means 16 bits (two bytes)
  • Word means 32 bits (four bytes)
• Most ARMs implement two instruction sets
  • 32-bit ARM Instruction Set
  • 16-bit Thumb Instruction Set
    • for tiny systems
• Jazelle cores can also execute Java bytecode
The ARM Register Set
[Figure: current visible registers and banked-out registers for each processor mode (User, FIQ, IRQ, SVC, Undef, Abort). Registers shown: r0-r12, r13 (sp), r14 (lr), r15 (pc), cpsr, spsr]
ARM Instruction Set
• Registers
  • 15 general purpose registers (R0-R14), 32 bits wide
  • R15 is the Program Counter (PC)
  • R14 is used as the Link Register (LR), R13 is the Stack Pointer (SP)
  • The Status Register (CPSR) holds the condition flags (N, Z, C, and V), the interrupt disable bits and the processor mode bits
  • There are 15 additional general purpose registers called the banked registers, which are used when the processor switches into Supervisor or Interrupt modes
[Figure: CPSR layout. Bits 31-28: condition flags N, Z, C, V; bits 7-6: interrupt disable bits; bits 4-0: processor mode bits]
ARM Instructions
  Bits:   31-28      27-20    19-16   15-12   11-4        3-0
  Field:  Condition  OP code  Rn      Rd      Other info  Rm
  Width:  4 bits     8 bits   4 bits  4 bits  8 bits      4 bits
• Each instruction is encoded into 32 bits
• Access to memory is through load and store only
  • In a load, the operand is transferred into the register named in the Rd field
  • In a store, the operand is transferred from Rd into memory
• If the operand is a byte, it is always located in the low-order byte position of the register, and on a load the higher-order bytes are filled with zeros.
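The field layout on this slide can be sketched as a few shifts and masks. The block below is only an illustrative Python model, not part of the lecture; `decode_fields` is a hypothetical helper name, and the sample word 0xE0810002 is the encoding of ADD R0, R1, R2 with the "always" condition.

```python
def decode_fields(word):
    """Split a 32-bit ARM instruction word into the fields shown on the slide."""
    return {
        "cond":   (word >> 28) & 0xF,   # bits 31-28: condition
        "opcode": (word >> 20) & 0xFF,  # bits 27-20: OP code
        "rn":     (word >> 16) & 0xF,   # bits 19-16: Rn
        "rd":     (word >> 12) & 0xF,   # bits 15-12: Rd
        "other":  (word >> 4) & 0xFF,   # bits 11-4: other info
        "rm":     word & 0xF,           # bits 3-0: Rm
    }


# ADD R0, R1, R2 (condition AL = 0b1110)
fields = decode_fields(0xE0810002)
print(fields)
```

Real data processing instructions subdivide the 8-bit OP code and "other info" fields further; the model only follows the simplified layout used here.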
Conditional Execution of Instructions
[Figure: 32-bit instruction format with the Condition field in bits 31-28; CPSR with flags N, Z, C, V in bits 31-28]
• All instructions are conditionally executed
• The instruction is executed only if the current state of the processor condition code flags satisfies the condition specified in bits b31-b28
• One of the conditions is used to indicate that the instruction is always executed
Setting Condition Code
• CMP Rn, Rm
  • Performs the operation [Rn] - [Rm] and sets the condition codes based on the result of the operation
• The arithmetic and logic instructions affect the condition code flags only if explicitly specified in the Opcode field
• Example
  • ADDS R0, R1, R2 ; sets the condition code flags
  • ADD R0, R1, R2 ; does not
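To make the effect of CMP concrete, here is a minimal Python sketch of how the four flags could be derived from a 32-bit subtraction. `cmp_flags` is a hypothetical name introduced for illustration. It relies on the ARM convention that C is set when the subtraction produces no borrow.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(rn, rm):
    """Return (N, Z, C, V) as CMP Rn, Rm would set them for 32-bit values."""
    rn &= MASK32
    rm &= MASK32
    result = (rn - rm) & MASK32
    n = result >> 31                            # sign bit of the result
    z = int(result == 0)                        # result is zero
    c = int(rn >= rm)                           # no borrow in unsigned subtraction
    v = ((rn ^ rm) & (rn ^ result)) >> 31       # signed overflow
    return n, z, c, v


print(cmp_flags(5, 5))            # equal operands
print(cmp_flags(3, 5))            # Rn lower than Rm
print(cmp_flags(0x80000000, 1))   # most negative value minus 1 overflows
```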
Conditional Execution and Flags
• ARM instructions can be made to execute conditionally by postfixing them with the appropriate condition code field.
  • This improves code density and performance by reducing the number of forward branch instructions.

    With a branch:              With conditional execution:
        CMP  r3,#0                  CMP   r3,#0
        BEQ  skip                   ADDNE r0,r1,r2
        ADD  r0,r1,r2
    skip

• By default, data processing instructions do not affect the condition code flags, but the flags can be optionally set by using "S". CMP does not need "S".

    loop
        …
        SUBS r1,r1,#1     ; decrement r1 and set flags
        BNE  loop         ; if Z flag clear then branch
Condition Codes
• The possible condition codes are listed below:
• Note: AL is the default and does not need to be specified

  Suffix   Description               Flags tested
  EQ       Equal                     Z=1
  NE       Not equal                 Z=0
  CS/HS    Unsigned higher or same   C=1
  CC/LO    Unsigned lower            C=0
  MI       Minus                     N=1
  PL       Positive or Zero          N=0
  VS       Overflow                  V=1
  VC       No overflow               V=0
  HI       Unsigned higher           C=1 & Z=0
  LS       Unsigned lower or same    C=0 or Z=1
  GE       Greater or equal          N=V
  LT       Less than                 N!=V
  GT       Greater than              Z=0 & N=V
  LE       Less than or equal        Z=1 or N!=V
  AL       Always
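The condition table can be read as a set of predicates over the N, Z, C and V flags. The following Python sketch encodes it that way; `CONDITIONS` and `condition_passed` are hypothetical names chosen for illustration, and the flag values in the usage lines are written out by hand.

```python
# Each suffix maps to a test over the flags (n, z, c, v), following the table.
CONDITIONS = {
    "EQ": lambda n, z, c, v: z == 1,
    "NE": lambda n, z, c, v: z == 0,
    "CS": lambda n, z, c, v: c == 1,
    "HS": lambda n, z, c, v: c == 1,
    "CC": lambda n, z, c, v: c == 0,
    "LO": lambda n, z, c, v: c == 0,
    "MI": lambda n, z, c, v: n == 1,
    "PL": lambda n, z, c, v: n == 0,
    "VS": lambda n, z, c, v: v == 1,
    "VC": lambda n, z, c, v: v == 0,
    "HI": lambda n, z, c, v: c == 1 and z == 0,
    "LS": lambda n, z, c, v: c == 0 or z == 1,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: z == 0 and n == v,
    "LE": lambda n, z, c, v: z == 1 or n != v,
    "AL": lambda n, z, c, v: True,
}


def condition_passed(suffix, n, z, c, v):
    """Return True if an instruction with this suffix would execute."""
    return CONDITIONS[suffix.upper()](n, z, c, v)


# Flags after comparing 3 with 5: N=1, Z=0, C=0, V=0
print(condition_passed("LT", 1, 0, 0, 0))
# Flags after comparing 5 with 5: N=0, Z=1, C=1, V=0
print(condition_passed("EQ", 0, 1, 1, 0))
```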
Examples of conditional execution
• Use a sequence of several conditional instructions

    if (a==0) func(1);

        CMP   r0,#0
        MOVEQ r0,#1
        BLEQ  func

• Set the flags, then use various condition codes

    if (a==0) x=0;
    if (a>0) x=1;     (else if)

        CMP   r0,#0
        MOVEQ r1,#0
        MOVGT r1,#1

• Use conditional compare instructions

    if (a==4 || a==10) x=0;

    Pop Quiz? Bonus 1pt on test
Basic Addressing Modes
[Figure: 32-bit instruction format; the offset is held in the low-order bits of the instruction]
• Basic load instruction: LDR Rd, [Rn, #offset]
  • Offset: a signed number in the immediate mode
  • EA = a signed offset + the contents of register Rn
  • Operation: Rd ← [[Rn] + offset]
  • The destination register is listed first
  • The magnitude of the offset is a 12 bit immediate value contained in the lower 12 bits of the instruction
• LDR Rd, [Rn, Rm] performs Rd ← [[Rn] + [Rm]]
  • The magnitude of the offset is the content of a third register Rm
• LDR Rd, [Rn] performs Rd ← [[Rn]]
Addressing Modes: Store
• STR Rd, [Rn] performs [[Rn]] ← [Rd]
  • i.e., transfers a word into the memory
• The STRB instruction transfers the byte contained in the low-order end of Rd
• Note the order of operands
Addressing Modes: Pre-indexed
• [Rn, #offset] or [Rn, ±Rm, shift]
• EA = [Rn] + offset, or EA = [Rn] ± [Rm] shifted
• Calculate operand address first, and then perform operation
• Example: STR R3, [R5, R6]
[Figure: base register R5 = 1000, offset register R6 = 100; the operand word (4 bytes) is at address 1000 + 100 = 1100]
Addressing Modes: Pre-indexed example
• STR R3, [R5, R10, LSL #2]
• EA = [R5] + [R10] × 4

    int C[100];
    …
    for (i=0; i<N; i++)
        C[i] = A[i] + B[i];

[Figure: base register R5 = 1000 points to C[0]; offset register R10 holds i = 25; offset = 25 × 4 = 100; the operand word (4 bytes) is at address 1100]
Pre-indexed with Write Back
• [Rn, #offset]! or [Rn, ±Rm, shift]!
  • EA = [Rn] + offset
  • EA = [Rn] ± [Rm] shifted
  • Then, EA is written back into Rn
• Example:
  • STR R0, [Rbase, Rindex]!
  • Store R0 at Rbase + Rindex, and write back the new address Rbase + Rindex to Rbase.
• ! in the Pre-indexed mode means that a write back is to be performed
Pre-indexed Addressing with write-back
• Push instruction: STR R0, [R5, #-4]!
  • EA = [R5] - 4, i.e., R5 ← [R5] - 4
  • Perform operation: Store
• R5 is used as the stack pointer
• R5 initially contains the address 2012 of the current TOS
• The immediate offset -4 is added to the content (2012) of R5 and written back into R5
• This new TOS location is used as the EA (2008) to store the contents of R0, 27
[Figure: after execution of the Push instruction, the base register (stack pointer) R5 holds 2008 and memory location 2008 holds 27, the contents of R0]
Addressing Modes: Post-indexed
• The EA of the operand is the contents of Rn
• Perform operation first with the operand
• Then add the offset to Rn (i.e., the result is written back into Rn)
• The post-indexed mode always involves a write back
• The pre-indexed and post-indexed modes are distinguished by the way the square brackets are used.
  • [Rn, #offset] vs. [Rn], #offset
• The offset may be given as an immediate value (range +/- 4095) or as the contents of the third register Rm
Post-indexed Example: used to access a column of elements of a 25x25 matrix
• LDR R1, [R2], R10, LSL #2

    for (i=1; i≤N; i++)
        sum += D[i,1];

[Figure: one word (4 bytes) per element; base register R2 = 1000; offset register R10 = 25, so the offset is 100 = 25x4; the column elements at addresses 1000, 1100 and 1200 hold 6, -17 and 321]
Pre or Post Indexed Addressing?
• Pre-indexed: STR r0,[r1,#12]
  [Figure: source register r0 = 0x5, base register r1 = 0x200, offset 12; 0x5 is stored at 0x20c and r1 remains 0x200]
  • Write-back (auto-update) form: STR r0,[r1,#12]!
• Post-indexed: STR r0,[r1],#12
  [Figure: source register r0 = 0x5, original base register r1 = 0x200, offset 12; 0x5 is stored at 0x200 and the updated base register r1 = 0x20c]
  • C analogue: int *ptr; x = *ptr++;
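The three indexing variants differ only in which address is used for the access and whether the base register changes. A small Python model, with the hypothetical helper `effective_address`, reproduces the numbers from the slides:

```python
def effective_address(base, offset, mode):
    """Return (effective address, base register value after the access).

    mode "offset": [Rn, #offset]   address = base + offset, base unchanged
    mode "pre":    [Rn, #offset]!  address = base + offset, base updated
    mode "post":   [Rn], #offset   address = base, then base updated
    """
    if mode == "offset":
        return base + offset, base
    if mode == "pre":
        return base + offset, base + offset
    if mode == "post":
        return base, base + offset
    raise ValueError(f"unknown mode: {mode}")


# STR r0,[r1,#12] and STR r0,[r1],#12 with r1 = 0x200
print(effective_address(0x200, 12, "offset"))
print(effective_address(0x200, 12, "post"))
# Push: STR R0,[R5,#-4]! with R5 = 2012
print(effective_address(2012, -4, "pre"))
# STR R3,[R5,R10,LSL #2] with R5 = 1000 and R10 = 25
print(effective_address(1000, 25 << 2, "offset"))
```

A register offset, scaled or not, is passed in as an already computed number, which mirrors the barrel shifter producing the offset before the address is formed.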
Recap: Pre, Post-indexed Modes
• LDR R0, [R1, -R2]!
  • R0 ← [[R1] - [R2]]; R1 ← [R1] - [R2]
• When the offset is given in a register, it may be scaled by a power of 2 by shifting to the right or to the left.
  • This is indicated with either LSL or LSR and the shift amount
  • The shift amount is in the range 0 to 31
• LDR R0, [R1, -R2, LSL #4]!
  • R0 ← [[R1] - 16 x [R2]]; R1 ← [R1] - 16 x [R2]
• The PC may be used as the base register Rn. The assembler determines the immediate offset as the signed distance between the address of the operand and the contents of the PC. (relative addressing mode)
Relative Addressing Mode
• When the effective address is calculated at instruction execution time, the contents of the PC will have been updated to the address two words (8 bytes) forward from the current instruction. Why?
[Figure: LDR R1, ITEM is at address 1000; the updated [PC] = 1008; offset = 52; the operand ITEM is at address 1060]
• The offset calculated by the assembler is 52 because the updated PC = 1008
• EA = 1060 = 1008 + 52
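The assembler's calculation can be sketched in a couple of lines of Python. `pc_relative_offset` and `PC_AHEAD` are hypothetical names; the value 8 reflects the slide's statement that the PC is two words ahead of the instruction being executed.

```python
PC_AHEAD = 8  # the PC reads as the current instruction address + 8 bytes


def pc_relative_offset(instr_addr, operand_addr):
    """Offset the assembler places in a PC-relative load at instr_addr."""
    return operand_addr - (instr_addr + PC_AHEAD)


offset = pc_relative_offset(1000, 1060)   # LDR R1, ITEM at address 1000
print(offset)                             # offset stored in the instruction
print(1000 + PC_AHEAD + offset)           # effective address at run time
```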
Multiple Load and Store
• The ARM can also load multiple operands
  • Called block transfer
  • LDM: load multiple
  • STM: store multiple
• The offset is always 4; thus it is not specified explicitly
• Assume R10 is the base register and it contains 1000
  • LDMIA R10!, {R0,R1,R6,R7}
  • transfers the words from locations 1000, 1004, 1008, 1012 into registers R0, R1, R6 and R7
• The suffix IA indicates increment after
  • IB: Increment Before, DA: Decrement After, DB: Decrement Before
LDM / STM operation
• Syntax: <LDM|STM>{<cond>}<addressing_mode> Rb{!}, <register list>
• 4 addressing modes:
  • LDMIA / STMIA increment after
  • LDMIB / STMIB increment before
  • LDMDA / STMDA decrement after
  • LDMDB / STMDB decrement before
[Figure: memory layout of r0, r1 and r4 relative to the base register (Rb) r10 for LDMxx r10, {r0,r1,r4} and STMxx r10, {r0,r1,r4} in the IA, IB, DA and DB modes, drawn with decreasing addresses]
Pop Quiz?
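A compact way to see the four block-transfer modes is to compute which address each register uses. The Python sketch below uses the hypothetical helper `block_transfer_addresses` and relies on the architectural rule that the lowest-numbered register always goes to the lowest address. The returned final base value is what would be written back when "!" is present.

```python
def block_transfer_addresses(base, registers, mode):
    """Return ({register number: address}, base value after write-back)."""
    regs = sorted(registers)      # lowest register number uses lowest address
    size = 4 * len(regs)          # one word (4 bytes) per register
    if mode == "IA":              # increment after
        start, final = base, base + size
    elif mode == "IB":            # increment before
        start, final = base + 4, base + size
    elif mode == "DA":            # decrement after
        start, final = base - size + 4, base - size
    elif mode == "DB":            # decrement before
        start, final = base - size, base - size
    else:
        raise ValueError(f"unknown mode: {mode}")
    return {reg: start + 4 * i for i, reg in enumerate(regs)}, final


# LDMIA R10!, {R0,R1,R6,R7} with R10 = 1000
print(block_transfer_addresses(1000, [0, 1, 6, 7], "IA"))
# All four modes for {r0,r1,r4} with r10 = 1000
for mode in ("IA", "IB", "DA", "DB"):
    print(mode, block_transfer_addresses(1000, [0, 1, 4], mode))
```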
Move Instructions
• MOV Rd, Rm
  • Rd ← [Rm]
• MOV R0, #76
  • R0 ← 76
Branch instructions
• Branch: B{<cond>} label
• Branch with Link: BL{<cond>} subroutine_label

  Bits:   31-28   27-25   24   23-0
  Field:  Cond    1 0 1   L    Offset

  • Cond: condition field
  • L: link bit, 0 = Branch, 1 = Branch with link
• The processor core shifts the offset field left by 2 positions, sign-extends it and adds it to the PC
  • ± 32 Mbyte range
  • How to perform longer branches?
Conditional Branch Instructions
• Conditional branch instructions contain a 2's complement 24 bit offset, which is first left-shifted by 2 and then added to the updated contents of the PC to generate the branch target.

  Bits:   31-28      27-24    23-0
  Field:  condition  OPcode   offset

• At the time the branch target address is calculated, the content of the PC has been updated to contain the address of the instruction that is two words beyond the branch instruction.
[Figure: BEQ LOCATION is at address 1000; the updated [PC] = 1008; offset = 92; the branch target LOCATION = 1100]
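The relationship between the byte offset (92 in the example) and the 24-bit field stored in the instruction can be sketched as follows. `encode_branch_offset` and `decode_branch_target` are hypothetical names used only for this illustration.

```python
def encode_branch_offset(instr_addr, target_addr):
    """Return the 24-bit offset field for a branch at instr_addr."""
    byte_offset = target_addr - (instr_addr + 8)   # PC is two words ahead
    if byte_offset % 4 != 0:
        raise ValueError("branch target must be word aligned")
    word_offset = byte_offset >> 2                 # field counts words, not bytes
    if not -(1 << 23) <= word_offset < (1 << 23):
        raise ValueError("target outside the ±32 Mbyte range")
    return word_offset & 0xFFFFFF                  # 2's complement, 24 bits


def decode_branch_target(instr_addr, field):
    """Recover the branch target from the 24-bit offset field."""
    if field & 0x800000:                           # sign-extend the field
        field -= 1 << 24
    return instr_addr + 8 + (field << 2)           # shift left by 2, add to PC


field = encode_branch_offset(1000, 1100)   # BEQ LOCATION at address 1000
print(field, field << 2)                   # word offset and byte offset
print(decode_branch_target(1000, field))
```

The ±32 Mbyte range follows from the field width: 24 signed bits of word offset give 26 signed bits of byte offset.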
ARM Branches and Subroutines
• B <label>
  • PC relative. ±32 Mbyte range.
• BL <subroutine>
  • Stores return address in LR
  • Returning implemented by restoring the PC from LR
  • For non-leaf functions, LR will have to be stacked

    caller                 func1                      func2
      :                      STMFD sp!,{regs,lr}        :
      :                      :                          :
      BL func1               BL func2                   :
      :                      :                          :
      :                      LDMFD sp!,{regs,pc}        MOV pc, lr
Data processing Instructions
• Consist of:
  • Arithmetic: ADD ADC SUB SBC RSB RSC
  • Logical: AND ORR EOR BIC
  • Comparisons: CMP CMN TST TEQ
  • Data movement: MOV MVN
• These instructions only work on registers, NOT memory.
• Syntax: <Operation>{<cond>}{S} Rd, Rn, Operand2
  • Comparisons set flags only - they do not change Rd
  • Data movement does not change Rn
• Second operand is sent to the ALU via barrel shifter.
Arithmetic Instructions
• Opcode Rd, Rn, Rm
• ADD R0, R2, R4
  • R0 ← [R2] + [R4]
• SUB R0, R6, R5
  • R0 ← [R6] - [R5]
• ADD R0, R3, #17
  • R0 ← [R3] + 17
• ADD R0, R1, R5, LSL #4
  • R0 ← [R1] + 16 x [R5]
• MUL R0, R1, R2
  • R0 ← [R1] x [R2]
  • Places the low-order 32 bits of the product in the destination register
  • High order bits of the product are discarded
• MLA R0, R1, R2, R3
  • R0 ← [R1] x [R2] + [R3]; multiply accumulate
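As a rough model of the register-transfer notation above, the following Python functions keep every result to 32 bits. The names `add`, `mul` and `mla` are illustrative stand-ins for the instructions, and the operand values are made up.

```python
MASK32 = 0xFFFFFFFF


def add(rn, operand2):
    """ADD Rd, Rn, Operand2: 32-bit wrap-around addition."""
    return (rn + operand2) & MASK32


def mul(rm, rs):
    """MUL Rd, Rm, Rs: keep only the low-order 32 bits of the product."""
    return (rm * rs) & MASK32


def mla(rm, rs, rn):
    """MLA Rd, Rm, Rs, Rn: multiply accumulate, low-order 32 bits."""
    return (rm * rs + rn) & MASK32


print(add(100, 5 << 4))        # ADD R0, R1, R5, LSL #4 with R1=100, R5=5
print(mul(6, 7))
print(mul(0x10000, 0x10000))   # product is 2**32, so the low 32 bits are 0
print(mla(6, 7, 8))
```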
Logic Instructions
• AND Rd, Rn, Rm
  • Rd ← [Rn] AND [Rm] ; logical bitwise AND
  • Example ; R0 = 02FA62CA and R1 = 0000FFFF
    • AND R0, R0, R1 ; R0 ← 000062CA
• BIC Rd, Rn, Rm
  • Bit clear, complements each bit in Rm and then performs AND with the bits in Rn
  • Example ; R0 = 02FA62CA and R1 = 0000FFFF
    • BIC R0, R0, R1 ; R0 ← 02FA0000
• MVN complements the bits of the source operand and places the result in Rd
  • R3 = 0F0F0F0F
  • MVN R0, R3 ; R0 ← F0F0F0F0
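The three examples on this slide can be checked with ordinary bitwise operators. This is a sketch in Python with hypothetical function names (`and_`, `bic`, `mvn`); results are masked to 32 bits because Python integers are unbounded.

```python
MASK32 = 0xFFFFFFFF


def and_(rn, rm):
    """AND Rd, Rn, Rm: bitwise AND."""
    return rn & rm & MASK32


def bic(rn, rm):
    """BIC Rd, Rn, Rm: clear in Rn every bit that is set in Rm."""
    return rn & ~rm & MASK32


def mvn(rm):
    """MVN Rd, Rm: bitwise complement of the source operand."""
    return ~rm & MASK32


print(f"{and_(0x02FA62CA, 0x0000FFFF):08X}")
print(f"{bic(0x02FA62CA, 0x0000FFFF):08X}")
print(f"{mvn(0x0F0F0F0F):08X}")
```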
Shift Operations
• LSL: Logical Shift Left
  • Multiplication by a power of 2
• LSR: Logical Shift Right
  • Division by a power of 2
• ASR: Arithmetic Shift Right
  • Division by a power of 2, preserving the sign bit
• ROR: Rotate Right
  • Bit rotate with wrap around from LSB to MSB
• RRX: Rotate Right Extended
  • Single bit rotate with wrap around from CF to MSB
• A barrel shifter is a hardware device that can shift a data word left or right by any number of bits in a single operation
[Figure: for each shift type, the movement of bits between the Destination register and the carry flag (CF)]
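The five shift types can be modelled on 32-bit values as below. This is a Python sketch with hypothetical helper names; it ignores the carry flag except for RRX, where the carry is an explicit input and output.

```python
MASK32 = 0xFFFFFFFF


def lsl(x, n):
    """Logical shift left: zeros enter at the LSB."""
    return (x << n) & MASK32


def lsr(x, n):
    """Logical shift right: zeros enter at the MSB."""
    return (x & MASK32) >> n


def asr(x, n):
    """Arithmetic shift right: the sign bit is replicated."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32              # reinterpret as a negative number
    return (x >> n) & MASK32


def ror(x, n):
    """Rotate right: bits leaving the LSB re-enter at the MSB."""
    n %= 32
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32


def rrx(x, carry):
    """Rotate right extended by one bit; returns (result, new carry)."""
    x &= MASK32
    return (carry << 31) | (x >> 1), x & 1


print(hex(lsl(3, 4)), hex(lsr(0x80000000, 4)), hex(asr(0x80000000, 4)))
print(hex(ror(0xFF, 8)), rrx(0x3, 1))
```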
Using the Barrel Shifter: The Second Operand
[Figure: Operand 1 goes directly to the ALU; Operand 2 passes through the barrel shifter before reaching the ALU; the ALU produces the result]
• Register, optionally with shift operation
  • Shift value can be either:
    • 5 bit unsigned integer
    • Specified in bottom byte of another register.
  • Used for multiplication by constant
• Immediate value
  • 8 bit number, with a range of 0-255.
    • Rotated right through even number of positions
  • Allows increased range of 32-bit constants to be loaded directly into registers
Immediate Constants (1)
• No ARM instruction can contain a 32 bit immediate constant
  • All ARM instructions are fixed as 32 bits long
• The data processing instruction format has 12 bits available for operand2
  [Figure: operand2 bits 11-8 hold rot and bits 7-0 hold immed_8; rot is multiplied by 2 and controls the shifter ROR applied to immed_8]
  • 4 bit rotate value (0-15) is multiplied by two to give range 0-30 in steps of 2
• Rule to remember is "8-bits shifted by an even number of bit positions".
• Quick Quiz: MOV r0,#255,8
Immediate Constants (2)
• Examples:

  Rotation   Range           Step
  ror #0     0-0x000000ff    0x00000001
  ror #8     0-0xff000000    0x01000000
  ror #30    0-0x000003fc    0x00000004

• The assembler converts immediate values to the rotate form:
  • MOV r0,#4096 ; uses 0x40 ror 26
  • ADD r1,r2,#0xFF0000 ; uses 0xFF ror 16
• The bitwise complements can also be formed using MVN:
  • MOV r0, #0xFFFFFFFF ; assembles to MVN r0,#0
• Values that cannot be generated in this way will cause an error.
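Whether a constant fits the "8 bits rotated right by an even amount" rule can be tested by trying all 16 rotations. The sketch below is an assumption-laden Python model, not the assembler's actual algorithm: `encode_immediate` returns the first encoding it finds, which may differ from the one an assembler picks when several are valid (4096 can be encoded as 0x40 ror 26 or as 0x01 ror 20).

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encode_immediate(value):
    """Return (rot, imm8) with value == imm8 ror (2*rot), or None."""
    value &= 0xFFFFFFFF
    for rot in range(16):
        # Rotating left by 2*rot undoes a rotate right by 2*rot.
        imm8 = ror32(value, (32 - 2 * rot) % 32)
        if imm8 <= 0xFF:
            return rot, imm8
    return None


def decode_immediate(rot, imm8):
    """Expand the 12-bit operand2 immediate back to a 32-bit value."""
    return ror32(imm8, 2 * rot)


print(decode_immediate(13, 0x40))        # 0x40 ror 26
print(hex(decode_immediate(8, 0xFF)))    # 0xFF ror 16
print(encode_immediate(0x55555555))      # not encodable
```

A value for which `encode_immediate` returns None would need MVN, a literal pool load, or several instructions.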
Loading 32 Bit Constants
• To allow larger constants to be loaded, the assembler offers a pseudo-instruction:
  • LDR rd, =const
• This will either:
  • Produce a MOV or MVN instruction to generate the value (if possible),
  or
  • Generate a LDR instruction with a PC-relative address to read the constant from a literal pool (constant data area embedded in the code).
• For example

    LDR r0,=0xFF          =>  MOV r0,#0xFF
    LDR r0,=0x55555555    =>  LDR r0,[PC,#Imm12]
                              …
                              …
                              DCD 0x55555555

• This is the recommended way of loading constants into a register
Multiply
• Syntax:
  • MUL{<cond>}{S} Rd, Rm, Rs                      Rd = Rm * Rs
  • MLA{<cond>}{S} Rd, Rm, Rs, Rn                  Rd = (Rm * Rs) + Rn
  • [U|S]MULL{<cond>}{S} RdLo, RdHi, Rm, Rs        RdHi,RdLo := Rm*Rs
  • [U|S]MLAL{<cond>}{S} RdLo, RdHi, Rm, Rs        RdHi,RdLo := (Rm*Rs)+RdHi,RdLo
• Cycle time
  • Basic MUL instruction
    • 2-5 cycles on ARM7TDMI
    • 1-3 cycles on StrongARM/XScale
    • 2 cycles on ARM9E/ARM102xE
  • +1 cycle for ARM9TDMI (over ARM7TDMI)
  • +1 cycle for accumulate (not on 9E though result delay is one cycle longer)
  • +1 cycle for "long"
• Above are "general rules" - refer to the TRM for the core you are using for the exact details
Single register data transfer

  Load     Store    Size
  LDR      STR      Word
  LDRB     STRB     Byte
  LDRH     STRH     Halfword
  LDRSB             Signed byte load
  LDRSH             Signed halfword load

• Memory system must support all access sizes
• Syntax:
  • LDR{<cond>}{<size>} Rd, <address>
  • STR{<cond>}{<size>} Rd, <address>
  e.g. LDREQB
Address Accessed
• Address accessed by LDR/STR is specified by a base register plus an offset
• For word and unsigned byte accesses, offset can be
  • An unsigned 12-bit immediate value (i.e. 0 - 4095 bytes).
      LDR r0,[r1,#8]
  • A register, optionally shifted by an immediate value
      LDR r0,[r1,r2]
      LDR r0,[r1,r2,LSL#2]
• This can be either added or subtracted from the base register:
      LDR r0,[r1,#-8]
      LDR r0,[r1,-r2]
      LDR r0,[r1,-r2,LSL#2]
• For halfword and signed halfword / byte, offset can be:
  • An unsigned 8 bit immediate value (i.e. 0-255 bytes).
  • A register (unshifted).
• Choice of pre-indexed or post-indexed addressing
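The immediate-offset limits above can be captured in a small range check. This is an illustrative Python sketch: `IMMEDIATE_LIMITS` and `offset_is_encodable` are hypothetical names, and the sign of the offset is treated as a separate add/subtract choice, as the slide describes.

```python
# Largest immediate offset magnitude for each access type, per the slide.
IMMEDIATE_LIMITS = {
    "word": 4095,             # LDR / STR
    "byte": 4095,             # LDRB / STRB
    "halfword": 255,          # LDRH / STRH
    "signed_halfword": 255,   # LDRSH
    "signed_byte": 255,       # LDRSB
}


def offset_is_encodable(offset, access):
    """True if the immediate offset fits the instruction for this access type."""
    return abs(offset) <= IMMEDIATE_LIMITS[access]


print(offset_is_encodable(8, "word"))         # LDR r0,[r1,#8]
print(offset_is_encodable(-8, "word"))        # LDR r0,[r1,#-8]
print(offset_is_encodable(256, "halfword"))   # too large for LDRH
```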