Embedded System Architecture and Software Design (using ARM): Day #3, #4, #5 Modules Outline
Course Introduction
• Day #3
  • Simple RISC Assembly Language
  • ARM Assembly Language
  • ARM Developer Suite hands-on practice
• Day #4
  • ARM Instruction Set
  • Important ASM Programming Skills
  • ARM/THUMB/C Interworking
• Day #5
  • ARM Exception Handler
  • Build ARM ROM Image
  • Use NET-Start! ucLinux BSP
References
• Steve Furber, ARM System-on-Chip Architecture, 2nd ed.
• Seal, ARM Architecture Reference Manual, 2nd ed.
• ARM Developer Suite: Getting Started
• ARM Developer Suite: Developer Guide
• ARM Developer Suite: Assembler Guide
• http://www.uclinux.org/
• 2002嵌入式系統開發經驗 (2002 Embedded System Development Experience)
• Building Powerful Platforms with Windows CE
• Software Engineering: A Practitioner's Approach, 3rd ed.
• Professional Symbian Programming
Embedded System Architecture and Software Design (using ARM), Module #3-1: Simple RISC Assembly Concept
RISC (Reduced Instruction Set Computer) vs. CISC (Complex Instruction Set Computer)
RISC:
• Hardware instruction decode logic
• Pipelined execution
• Single-cycle execution
CISC:
• Large microcode ROMs to decode instructions
• Allows little pipelining
• Many cycles to complete a single instruction
RISC advantages:
• A smaller die size
• A shorter development time
• A higher performance
RISC disadvantage:
• Poor code density
MU0: A Simple Processor
Hardware unit   Function
PC              Program counter
ACC             Accumulator
ALU             Arithmetic logic unit
IR              Instruction register
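MU0 Instruction Set and Datapath
Instruction   Opcode   Function
LDA S         0000     ACC = mem[S]
STO S         0001     mem[S] = ACC
ADD S         0010     ACC = ACC + mem[S]
SUB S         0011     ACC = ACC - mem[S]
JMP S         0100     PC = S
JGE S         0101     if ACC >= 0, PC = S
JNE S         0110     if ACC != 0, PC = S
STP           0111     stop
Instruction format: a 4-bit opcode followed by the operand address S.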
Instruction Execution Example
• ADD 0x16A    ; ACC := ACC + mem[0x16A]
Operation Example
C function:
    main() { C = A + B; }
MU0 machine instructions:
    LDA 0x100    ; ACC = mem[0x100]
    ADD 0x104    ; ACC = ACC + mem[0x104]
    STO 0x108    ; mem[0x108] = ACC
Exercise: Operation of the MU0 Microprocessor
    0x000  LDA 0x100
    0x002  SUB 0x104
    0x004  STO 0x100
    0x006  JNE 0x000
    0x008  STP
Describe what this program does, how the register values change, and how the data flows. Then write the same program in C.
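To trace the register values in the exercise, a small simulator can help. The following is a minimal sketch, not part of the original slides. It assumes each instruction is a 16-bit word with the opcode in the top 4 bits and S in the low 12 bits, that memory is modelled as one 16-bit cell per address, and that PC advances by 2 to match the listing. The names `Mu0`, `mu0_encode`, and `mu0_run` are hypothetical.

```c
#include <stdint.h>

/* Opcodes from the MU0 instruction table. */
enum { MU0_LDA = 0, MU0_STO = 1, MU0_ADD = 2, MU0_SUB = 3,
       MU0_JMP = 4, MU0_JGE = 5, MU0_JNE = 6, MU0_STP = 7 };

#define MU0_MEM_CELLS 4096u   /* one cell per 12-bit address (assumption) */

typedef struct {
    uint16_t pc;                  /* program counter */
    uint16_t acc;                 /* accumulator, two's complement */
    uint16_t mem[MU0_MEM_CELLS];  /* instructions and data share this memory */
} Mu0;

/* Pack a 4-bit opcode and a 12-bit address S into one instruction word. */
uint16_t mu0_encode(unsigned opcode, unsigned s)
{
    return (uint16_t)(((opcode & 0xFu) << 12) | (s & 0x0FFFu));
}

/* Fetch and execute until STP (or an undefined opcode) or max_steps.
   Returns the number of instructions executed, including STP. */
unsigned mu0_run(Mu0 *cpu, unsigned max_steps)
{
    unsigned steps = 0;
    while (steps < max_steps) {
        uint16_t ir = cpu->mem[cpu->pc & 0x0FFFu];       /* fetch into IR */
        unsigned opcode = ir >> 12;
        unsigned s = ir & 0x0FFFu;
        cpu->pc = (uint16_t)((cpu->pc + 2u) & 0x0FFFu);  /* listing steps by 2 */
        steps++;
        switch (opcode) {
        case MU0_LDA: cpu->acc = cpu->mem[s]; break;
        case MU0_STO: cpu->mem[s] = cpu->acc; break;
        case MU0_ADD: cpu->acc = (uint16_t)(cpu->acc + cpu->mem[s]); break;
        case MU0_SUB: cpu->acc = (uint16_t)(cpu->acc - cpu->mem[s]); break;
        case MU0_JMP: cpu->pc = (uint16_t)s; break;
        case MU0_JGE: if ((cpu->acc & 0x8000u) == 0) cpu->pc = (uint16_t)s; break;
        case MU0_JNE: if (cpu->acc != 0) cpu->pc = (uint16_t)s; break;
        default: return steps;    /* STP or undefined opcode: halt */
        }
    }
    return steps;
}
```

To use it, store the five encoded instructions at addresses 0x000 to 0x008, put test values in `mem[0x100]` and `mem[0x104]`, call `mu0_run`, and inspect `acc` and `mem[0x100]` afterwards.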
Embedded System Architecture and Software Design (using ARM), Module #3-2: ARM Assembly Language
ARM7TDMI Data Flow
• e.g. r3 := r4 + (r4 << 2)
    ADD r3, r4, r4, LSL #2
[Figure: ARM7TDMI datapath showing the A bus and B bus operand paths]
ARM Registers
• 30 general-purpose 32-bit registers
• 1 Program Counter (PC)
• 1 Current Program Status Register (CPSR)
• 5 Saved Program Status Registers (SPSR)
[Figure: register set r0 to r12, r13 (sp), r14 (lr), r15 (pc), and cpsr, shown for User, FIQ, IRQ, SVC, Abort, and Undefined modes]
Program Status Register
• CPSR: Current Program Status Register
• SPSR: Saved Program Status Register
Bit layout:
    Bit     31  30  29  28  27  24  7   6   5   4:0
    Field   N   Z   C   V   Q   J   I   F   T   mode
    (the remaining bits are undefined)
• Condition code flags
  • N: Negative result from ALU
  • Z: Zero result from ALU
  • C: ALU operation carried out
  • V: ALU operation overflowed
• Q: Sticky overflow flag
  • Architecture 5TE only
  • QADD, QSUB, …
• J: Processor in Jazelle state
  • Architecture 5TEJ only
• Interrupt disable bits
  • I: disables the IRQ
  • F: disables the FIQ
• T bit
  • Architecture xT only
  • T=0: ARM state
  • T=1: Thumb state
• Mode bits specify the processor mode
  • 10000 User
  • 10001 FIQ
  • 10010 IRQ
  • 10011 SVC
  • 10111 Abort
  • 11011 Undef
  • 11111 System
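The bit layout above can be turned into a small decoder. This is an illustrative sketch only. The struct `PsrFields` and the functions `psr_decode` and `psr_mode_name` are hypothetical names, and the bit positions are taken directly from the slide.

```c
#include <stdint.h>

/* Fields of a program status register, using the bit positions on the slide. */
typedef struct {
    int n, z, c, v;   /* condition code flags, bits 31..28 */
    int q;            /* sticky overflow, bit 27 */
    int j;            /* Jazelle state, bit 24 */
    int i, f;         /* IRQ and FIQ disable, bits 7 and 6 */
    int t;            /* Thumb state, bit 5 */
    unsigned mode;    /* processor mode, bits 4..0 */
} PsrFields;

PsrFields psr_decode(uint32_t psr)
{
    PsrFields p;
    p.n = (int)((psr >> 31) & 1u);
    p.z = (int)((psr >> 30) & 1u);
    p.c = (int)((psr >> 29) & 1u);
    p.v = (int)((psr >> 28) & 1u);
    p.q = (int)((psr >> 27) & 1u);
    p.j = (int)((psr >> 24) & 1u);
    p.i = (int)((psr >> 7) & 1u);
    p.f = (int)((psr >> 6) & 1u);
    p.t = (int)((psr >> 5) & 1u);
    p.mode = psr & 0x1Fu;
    return p;
}

/* Map the five mode bits to the mode names listed on the slide. */
const char *psr_mode_name(unsigned mode)
{
    switch (mode & 0x1Fu) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "SVC";
    case 0x17: return "Abort";
    case 0x1B: return "Undef";
    case 0x1F: return "System";
    default:   return "unknown";
    }
}
```

For example, decoding the value 0x600000D3 gives Z and C set, both interrupts disabled, ARM state, and SVC mode.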
Program Counter (R15)
• In ARM state:
  • All ARM instructions are four bytes long (one 32-bit word) and are always aligned on a word boundary.
  • The PC value is stored in bits [31:2], with bits [1:0] undefined.
• In Thumb state:
  • All instructions are 16 bits wide and halfword aligned.
  • The PC value is stored in bits [31:1], with bit [0] undefined.
• In Jazelle state:
  • All instructions are 8 bits wide.
  • The processor performs a word access to read 4 instructions at once.
Link Register (R14)
• Register 14 is the Link Register (LR).
• It holds the address of the next instruction after a Branch and Link (BL) instruction, which is the instruction used to make a subroutine call.
• At all other times, R14 can be used as a general-purpose register.
Other Registers (R0-R13)
• The remaining 14 registers have no special hardware purpose.
• Their uses are defined purely by software.
• By convention, ARM assembly language uses R13 as the Stack Pointer (SP).
• C and C++ compilers always use R13 as the Stack Pointer (SP).
Structure of an ARM Assembly Language Module
• AREA Sectionname{,attr}{,attr}…
  • Starts a new code or data section.
  • CODE: the section contains machine instructions.
  • READONLY: the section should not be written to.
  • Other attributes: DATA, NOINIT, READWRITE, …
• ENTRY: declares an entry point to the program.
• Labels.
• END: declares the end of the source file.
Calling Subroutines Using BL
• BL destination
  • destination is the label on the first instruction of the subroutine.
• BL does the following:
  • places the return address in the link register (R14)
  • sets PC to the address of the subroutine.
• In the subroutine:
  • we can use "MOV pc, lr" to return.
• By convention, R0-R3 are used to pass parameters.
Calling Subroutines Example
    ; Name this block of code
    ; Mark first instruction to execute
    ; Set up parameters
    ; Call subroutine
    ; angel_SWIreason_ReportException
    ; ADP_Stopped_ApplicationExit
    ; ARM semihosting SWI
    ; Subroutine code
    ; Return from subroutine
    ; Mark end of file
Constant Data Types
• Numbers: numeric constants are accepted in three forms:
  • Decimal, for example, 123
  • Hexadecimal, for example, 0x7B
  • n_XXX, where n is a base between 2 and 9 and XXX is a number in that base.
• Boolean: TRUE and FALSE must be written as {TRUE} and {FALSE}.
• Characters: character constants consist of opening and closing single quotes, 'X', enclosing either a single character or an escaped character, using the standard C escape characters.
• Strings: strings consist of opening and closing double quotes, "XXXX". If double quotes or dollar signs are used within a string as literal text characters, they must be represented by a pair of the appropriate character. For example, you must use $$ if you require a single $ in the string. The standard C escape sequences can be used within string constants.
Conditional ARM Instructions
• Almost all ARM instructions can be conditionally executed.
• A conditional instruction executes only if the N, Z, C, and V flags in the CPSR satisfy the condition specified in the instruction; otherwise it acts as a NOP.
• Format: XXXCC, where XXX is the instruction mnemonic and CC is the condition code.
• e.g.
    ADDS  r0, r1, r2
    ADDEQ r0, r1, r2
Conditional Execution
• Almost every ARM instruction can be executed conditionally on the state of the ALU status flags in the CPSR.
• Add an S suffix to an ARM data-processing instruction to make it update the ALU status flags in the CPSR.
  • e.g. ADDS r0, r1, r2 ; r0 = r1 + r2 and update the ALU status flags in the CPSR
• In ARM state, you can:
  • update the ALU status flags in the CPSR on the result of a data operation
  • execute several other data operations without updating the flags
  • execute following instructions or not, according to the state of the flags updated in the first operation.
• In Thumb state, most data operations always update the flags, and conditional execution can only be achieved using the conditional branch instruction (B).
• Do not use the S suffix with CMP, CMN, TST, or TEQ. These comparison instructions always update the flags.
ALU Status Flags in the CPSR
• N: Set when the result of the operation was Negative.
• Z: Set when the result of the operation was Zero.
• C: Set when the operation resulted in a Carry. A carry occurs:
  • if the result of an addition is greater than or equal to 2^32,
  • if the result of a subtraction is positive or zero (no borrow),
  • or as the result of an inline barrel shifter operation in a move or logical instruction.
• V: Set when the operation caused oVerflow.
  • Overflow occurs if the result of an add, subtract, or compare is greater than or equal to 2^31, or less than -2^31.
• Q: ARM architecture v5E only. Sticky flag.
  • Used to detect saturation in special saturating arithmetic instructions (e.g. QADD, QSUB, QDADD, and QDSUB),
  • or overflow in certain multiply instructions (SMLAxy and SMLAWy).
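The flag rules above can be checked with a small model of ADDS and SUBS. This is a sketch for illustration, not the processor's actual implementation. The names `AluResult`, `alu_adds`, and `alu_subs` are hypothetical, and the model treats C after a subtraction as "no borrow".

```c
#include <stdint.h>

/* Result of a flag-setting ALU operation. */
typedef struct {
    uint32_t result;
    int n, z, c, v;
} AluResult;

/* Model of ADDS: result = a + b, flags updated. */
AluResult alu_adds(uint32_t a, uint32_t b)
{
    AluResult r;
    uint64_t wide = (uint64_t)a + (uint64_t)b;   /* keep the carry out */
    r.result = (uint32_t)wide;
    r.n = (int)((r.result >> 31) & 1u);
    r.z = (r.result == 0u);
    r.c = (int)((wide >> 32) & 1u);              /* unsigned sum >= 2^32 */
    /* Signed overflow: operands share a sign that the result does not. */
    r.v = (int)(((~(a ^ b) & (a ^ r.result)) >> 31) & 1u);
    return r;
}

/* Model of SUBS and CMP: result = a - b, flags updated. */
AluResult alu_subs(uint32_t a, uint32_t b)
{
    AluResult r;
    r.result = a - b;
    r.n = (int)((r.result >> 31) & 1u);
    r.z = (r.result == 0u);
    r.c = (a >= b);                              /* set when there is no borrow */
    /* Signed overflow: operands differ in sign and the result's sign differs from a. */
    r.v = (int)((((a ^ b) & (a ^ r.result)) >> 31) & 1u);
    return r;
}
```

Adding 1 to 0x7FFFFFFF sets N and V but not C, while adding 1 to 0xFFFFFFFF sets Z and C but not V, which shows that C tracks the unsigned range and V the signed range.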
Conditional Code Examples
• ADD r0, r1, r2       ; r0 = r1 + r2, don't update flags
• ADDS r0, r1, r2      ; r0 = r1 + r2, and update flags
• ADDCSS r0, r1, r2    ; if C flag set then r0 = r1 + r2, and update flags
• CMP r0, r1           ; update flags based on r0 - r1
• Example code sequence:
        MOV R0, #0
    LOOP
        ADD R0, R0, #1
        CMP R0, #10
        BNE LOOP
        SUB R1, R1, R0
Writing Efficient, Compact Code with Conditional Instructions
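Exercise
Write the following program in ARM assembly and evaluate its execution cost in clock cycles. A branch takes 3 cycles; every other instruction takes 1.
Note: the program can be written using only the three instructions CMP, SUB, and B, together with condition codes.
    while (r1 != r2) {
        if (r1 > r2) r1 = r1 - r2;
        else r2 = r2 - r1;
    }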
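As a reference for checking an assembly solution, the loop above can be wrapped in a C function that also counts loop iterations. This is a minimal sketch. The function name `gcd_by_subtraction` and the iteration counter are assumptions added for illustration, and the cycle count of an assembly version still has to be worked out from the instructions actually used.

```c
#include <assert.h>
#include <stdint.h>

/* Runs the exercise loop on r1 and r2 and returns the final common value.
   If iterations is not a null pointer, it receives the number of loop passes. */
uint32_t gcd_by_subtraction(uint32_t r1, uint32_t r2, unsigned *iterations)
{
    unsigned count = 0;
    assert(r1 != 0u && r2 != 0u);   /* with a zero input the loop never ends */
    while (r1 != r2) {
        if (r1 > r2)
            r1 = r1 - r2;
        else
            r2 = r2 - r1;
        count++;
    }
    if (iterations)
        *iterations = count;
    return r1;
}
```

Starting from r1 = 48 and r2 = 18, the pair goes (30, 18), (12, 18), (12, 6), (6, 6), so the loop body runs four times and both registers end at 6.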
Embedded System Architecture and Software Design (using ARM), Module #3-3: ARM Developer Suite Hands-on Practice
ARM ADS 1.2
Others:
• C and C++ libraries
• ARM Firmware Suite
• ARM Applications Library
• RealMonitor: a real-time debug monitor
Implementation and Integration: by command line, makefile, or CodeWarrior
Pre-configured Project Stationery Files
• Debug
  • This build target is configured to build output binaries that are fully debuggable, at the expense of optimization.
• Release
  • This build target is configured to build output binaries that are fully optimized, at the expense of debug information.
• DebugRel
  • This build target is configured to build output binaries that provide adequate optimization and give a good debug view.
Reference
• ARM Developer Suite Version 1.2 Getting Started
• Follow Chapter 3 to practice using ADS.
Embedded System Architecture and Software Design (using ARM), Module #3-4: ARM Instruction Set
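Features of the ARM Instruction Set
• All instructions are 32 bits wide
  • e.g. ADD r0, r1, r2 ; r0 := r1 + r2
• Most instructions complete in a single cycle
• Instructions can be executed conditionally
• Load/store architecture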
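Thumb Instruction Set
• Thumb instructions are 16 bits long
• Optimized for code density: about 65% of the equivalent ARM code size
• Well suited to systems with little memory
• The functionality of the Thumb instruction set is a subset of the ARM instruction set
• The processor must switch to Thumb state at run time
• e.g. the ARM instruction ADDS r1, r1, #3 corresponds to the Thumb instruction ADD r1, #3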
Jazelle
• Jazelle technology lets an ARM processor execute 8-bit Java bytecode
• The hardware supports up to 95% of the bytecodes
• About five times faster than a typical software JVM
Classes of ARM Instructions
• Branch instructions
• Data-processing instructions
• Load and store instructions
• Status register transfer instructions
• Coprocessor instructions
• Exception-generating instructions
Branch Instructions
• B: Branch
• BL: Branch with link
  • Stores the return address in r14
• e.g.
        CMP  r2, #0
        BLEQ function
        …
    function
        …
        MOV  PC, r14
Branch Instruction Encoding
• The range of the branch instruction is +/- 32 Mbytes
• 'L': selects the branch and link variant
Assembly format:
    B{L}{<cond>} <target address>
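The +/- 32 Mbyte range follows from the encoding: an ARM B or BL instruction carries a signed 24-bit word offset, which is shifted left by 2 and added to the address of the branch plus 8. The sketch below illustrates that arithmetic under those assumptions. The function names are hypothetical, and `arm_encode_branch` does no range checking itself.

```c
#include <stdint.h>

/* Compute the target of an ARM B/BL instruction located at instr_addr. */
uint32_t arm_branch_target(uint32_t instr_addr, uint32_t instr)
{
    uint32_t imm24 = instr & 0x00FFFFFFu;
    if (imm24 & 0x00800000u)           /* sign-extend the 24-bit field */
        imm24 |= 0xFF000000u;
    /* Offset is in words; PC reads as the instruction address plus 8. */
    return instr_addr + 8u + (imm24 << 2);
}

/* Return 1 if a branch at address from can reach address to. */
int arm_branch_in_range(uint32_t from, uint32_t to)
{
    int64_t diff = (int64_t)to - ((int64_t)from + 8);
    return diff >= -33554432 && diff <= 33554428 && (diff % 4) == 0;
}

/* Build the instruction word: cond in bits 31..28, 101 in bits 27..25,
   L in bit 24, word offset in bits 23..0. */
uint32_t arm_encode_branch(unsigned cond, int link, uint32_t from, uint32_t to)
{
    uint32_t offset = to - (from + 8u);        /* wraps for backward branches */
    uint32_t imm24 = (offset >> 2) & 0x00FFFFFFu;
    return ((uint32_t)(cond & 0xFu) << 28) | (0x5u << 25) |
           ((link ? 1u : 0u) << 24) | imm24;
}
```

For instance, an unconditional branch to itself encodes as 0xEAFFFFFE, because the offset relative to the instruction address plus 8 is -8 bytes, or -2 words.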
Branch Instructions Example
• C:
    if (a == 0) function1();
    else …

    function1() { function2(); … }
    function2() { return; }
• ASM:
    function1
        STMFD r13!, {r0-r4, r14}
        BL    function2
        …
        LDMFD r13!, {r0-r4, pc}
    function2
        …
        MOV   pc, r14
Data-processing Instructions Encoding
Assembly format:
    <op>{<cond>}{S} Rd, Rn, #<32-bit immediate>
    <op>{<cond>}{S} Rd, Rn, Rm, {<shift>}
Data Processing Opcodes
Assembly format:
    <op>{<cond>}{S} Rd, Rn, #<32-bit immediate>
    <op>{<cond>}{S} Rd, Rn, Rm, {<shift>}
Opcode [24:21]   Mnemonic   Meaning                         Effect
0000             AND        Logical bit-wise AND            Rd := Rn AND Op2
0001             EOR        Logical bit-wise exclusive OR   Rd := Rn EOR Op2
0010             SUB        Subtract                        Rd := Rn - Op2
0011             RSB        Reverse subtract                Rd := Op2 - Rn
0100             ADD        Add                             Rd := Rn + Op2
0101             ADC        Add with carry                  Rd := Rn + Op2 + C
0110             SBC        Subtract with carry             Rd := Rn - Op2 + C - 1
0111             RSC        Reverse subtract with carry     Rd := Op2 - Rn + C - 1
1000             TST        Test                            Scc on Rn AND Op2
1001             TEQ        Test equivalence                Scc on Rn EOR Op2
1010             CMP        Compare                         Scc on Rn - Op2
1011             CMN        Compare negated                 Scc on Rn + Op2
1100             ORR        Logical bit-wise OR             Rd := Rn OR Op2
1101             MOV        Move                            Rd := Op2
1110             BIC        Bit clear                       Rd := Rn AND NOT Op2
1111             MVN        Move negated                    Rd := NOT Op2
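The Effect column of the table maps directly onto a switch statement. The following sketch models only the value written to Rd and ignores the condition flags. The function name `dp_execute` and its parameters are hypothetical and chosen for illustration.

```c
#include <stdint.h>

/* Apply the data-processing operation selected by opcode bits [24:21].
   Returns 1 and writes *rd when the operation has a destination register;
   returns 0 for TST, TEQ, CMP, and CMN, which only set condition codes. */
int dp_execute(unsigned opcode, uint32_t rn, uint32_t op2,
               unsigned carry, uint32_t *rd)
{
    uint32_t c = carry ? 1u : 0u;
    switch (opcode & 0xFu) {
    case 0x0: *rd = rn & op2;           return 1;  /* AND */
    case 0x1: *rd = rn ^ op2;           return 1;  /* EOR */
    case 0x2: *rd = rn - op2;           return 1;  /* SUB */
    case 0x3: *rd = op2 - rn;           return 1;  /* RSB */
    case 0x4: *rd = rn + op2;           return 1;  /* ADD */
    case 0x5: *rd = rn + op2 + c;       return 1;  /* ADC */
    case 0x6: *rd = rn - op2 + c - 1u;  return 1;  /* SBC */
    case 0x7: *rd = op2 - rn + c - 1u;  return 1;  /* RSC */
    case 0xC: *rd = rn | op2;           return 1;  /* ORR */
    case 0xD: *rd = op2;                return 1;  /* MOV */
    case 0xE: *rd = rn & ~op2;          return 1;  /* BIC */
    case 0xF: *rd = ~op2;               return 1;  /* MVN */
    default:                            return 0;  /* TST, TEQ, CMP, CMN */
    }
}
```

Note how SBC with the carry set behaves like a plain SUB, while with the carry clear it subtracts one more, which is what lets a SUBS and SBC pair implement a 64-bit subtraction.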
Example Data-processing Instructions
• Arithmetic operations
    ADD r0, r1, r2    ; r0 = r1 + r2
    SUB r0, r1, r2    ; r0 = r1 - r2
    RSB r0, r1, r2    ; r0 = r2 - r1
• Bit-wise logical operations
    AND r0, r1, r2    ; r0 = r1 and r2
    ORR r0, r1, r2    ; r0 = r1 or r2
    EOR r0, r1, r2    ; r0 = r1 xor r2
    BIC r0, r1, r2    ; r0 = r1 and not r2 (bit clear)
Example Data-processing Instructions (cont.)
• Register movement operations
    MOV r0, r2        ; r0 = r2
    MVN r0, r2        ; r0 = not r2
• Comparison operations (set condition code bits N, Z, C, V)
    CMP r1, r2        ; set cc on r1 - r2
• Immediate operands
    ADD r3, r3, #1    ; r3 = r3 + 1
    AND r8, r7, #&ff  ; r8 = r7[7:0]
  • The & prefix denotes base 16.
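Shifter
• LSL: Logical Shift Left (each bit position multiplies by 2)
• LSR: Logical Shift Right (each bit position divides an unsigned value by 2)
• ASR: Arithmetic Shift Right (preserves the sign bit)
• ROR: Rotate Right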
Shifter Applications
• e.g. #1
    ADD r3, r2, r1, LSL #3    ; r3 := r2 + 8 * r1
• e.g. #2: r0 = r1 * 5, computed as r0 = r1 + (r1 * 4)
    ADD r0, r1, r1, LSL #2
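The four shift types and the two examples above can be modelled in C as follows. This is an illustrative sketch. The function names are hypothetical, shift amounts are assumed to be in the range 0 to 31, and the special encodings of the real barrel shifter (such as RRX or shifts by 32) are not covered.

```c
#include <stdint.h>

/* Logical shift left: vacated low bits are filled with zeros. */
uint32_t shift_lsl(uint32_t x, unsigned n) { return x << (n & 31u); }

/* Logical shift right: vacated high bits are filled with zeros. */
uint32_t shift_lsr(uint32_t x, unsigned n) { return x >> (n & 31u); }

/* Arithmetic shift right: vacated high bits copy the sign bit. */
uint32_t shift_asr(uint32_t x, unsigned n)
{
    n &= 31u;
    if (x & 0x80000000u)
        return (x >> n) | ~(0xFFFFFFFFu >> n);
    return x >> n;
}

/* Rotate right: bits shifted out at the bottom re-enter at the top. */
uint32_t shift_ror(uint32_t x, unsigned n)
{
    n &= 31u;
    if (n == 0u)
        return x;
    return (x >> n) | (x << (32u - n));
}

/* Example #1: ADD r3, r2, r1, LSL #3 gives r2 + 8 * r1. */
uint32_t add_r2_plus_8_r1(uint32_t r2, uint32_t r1)
{
    return r2 + shift_lsl(r1, 3);
}

/* Example #2: ADD r0, r1, r1, LSL #2 gives r1 * 5. */
uint32_t times_five(uint32_t r1)
{
    return r1 + shift_lsl(r1, 2);
}
```

The design point the slide makes is that the shift is applied to the second operand inside a single data-processing instruction, so multiplying by 5 costs one ADD instead of a separate multiply.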
Multiply Instruction Binary Encoding
Assembly format:
    MUL{<cond>}{S} Rd, Rm, Rs
    MLA{<cond>}{S} Rd, Rm, Rs, Rn
    <mul>{<cond>}{S} RdLo, RdHi, Rm, Rs
• RdHi: the most significant 32 bits of the 64-bit number
• RdLo: the least significant 32 bits of the 64-bit number
Opcode [23:21]   Mnemonic   Meaning                                Effect
000              MUL        Multiply (32-bit result)               Rd := (Rm * Rs)[31:0]
001              MLA        Multiply-accumulate (32-bit result)    Rd := (Rm * Rs + Rn)[31:0]
100              UMULL      Unsigned multiply long                 RdHi:RdLo := Rm * Rs
101              UMLAL      Unsigned multiply-accumulate long      RdHi:RdLo += Rm * Rs
110              SMULL      Signed multiply long                   RdHi:RdLo := Rm * Rs
111              SMLAL      Signed multiply-accumulate long        RdHi:RdLo += Rm * Rs
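The difference between the 32-bit and 64-bit ("long") multiplies, and between the signed and unsigned variants, can be seen in a short C model. This is a sketch under the assumption that a 64-bit integer stands in for the RdHi:RdLo pair. The function and type names are hypothetical.

```c
#include <stdint.h>

/* A 64-bit result split across two 32-bit registers. */
typedef struct {
    uint32_t hi;   /* RdHi: most significant 32 bits */
    uint32_t lo;   /* RdLo: least significant 32 bits */
} RegPair;

static RegPair split64(uint64_t value)
{
    RegPair p;
    p.hi = (uint32_t)(value >> 32);
    p.lo = (uint32_t)value;
    return p;
}

/* Interpret a 32-bit register value as a signed two's complement number. */
static int64_t as_signed(uint32_t x)
{
    return (x & 0x80000000u) ? (int64_t)x - 4294967296LL : (int64_t)x;
}

/* MUL: only the low 32 bits of the product are kept. */
uint32_t mul32(uint32_t rm, uint32_t rs) { return rm * rs; }

/* MLA: low 32 bits of the product plus an accumulator. */
uint32_t mla32(uint32_t rm, uint32_t rs, uint32_t rn) { return rm * rs + rn; }

/* UMULL: full 64-bit unsigned product. */
RegPair umull(uint32_t rm, uint32_t rs)
{
    return split64((uint64_t)rm * (uint64_t)rs);
}

/* SMULL: full 64-bit signed product. */
RegPair smull(uint32_t rm, uint32_t rs)
{
    return split64((uint64_t)(as_signed(rm) * as_signed(rs)));
}

/* UMLAL: add the unsigned product to the existing 64-bit value in acc. */
RegPair umlal(RegPair acc, uint32_t rm, uint32_t rs)
{
    uint64_t current = ((uint64_t)acc.hi << 32) | acc.lo;
    return split64(current + (uint64_t)rm * (uint64_t)rs);
}
```

The same bit pattern 0xFFFFFFFF is 4294967295 to UMULL but -1 to SMULL, so the two instructions produce different high words even though MUL would give the same low word in both cases.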