Embedded Processor Architecture and Programming. 王建民, Institute of Information Science, Academia Sinica, July 2008
Contents • Introduction • Computer Architecture • ARM Architecture • Development Tools • GNU Development Tools • ARM Instruction Set • ARM Assembly Language • ARM Assembly Programming • GNU ARM ToolChain • Interrupts and Monitor
Outline • Coprocessor and Thumb Instructions • Assembly Language • Runtime Environment
Coprocessors (1) • The ARM architecture supports 16 coprocessors, for example: • System coprocessor • Floating-point coprocessor • Application-specific coprocessor • A coprocessor may be implemented • in hardware • in software (via the undefined instruction exception) • in both (common cases in hardware, the rest in software) • Each coprocessor's instruction set occupies part of the ARM instruction set.
Coprocessors (2) • There are three types of coprocessor instructions: • Coprocessor data processing • Coprocessor register transfers (to/from ARM registers) • Coprocessor memory transfers (load and store to/from memory) • Assembler macros can be used to transform custom coprocessor mnemonics into the generic mnemonics understood by the processor.
Coprocessor Data Processing • This instruction initiates a coprocessor operation • The operation is performed only on internal coprocessor state • For example, a floating-point multiply, which multiplies the contents of two registers and stores the result in a third register • Syntax: • CDP{<cond>} <cp_num>,<opc_1>,CRd,CRn,CRm{,<opc_2>}
Coprocessor Register Transfers • Instructions • MRC: Move to ARM Register from Coprocessor • MCR: Move to Coprocessor from ARM Register • An operation may also be performed on the data as it is transferred • e.g., a Floating Point Convert to Integer instruction can be implemented as a register transfer to ARM. • Syntax: <MRC|MCR>{<cond>} <cp_num>,<opc_1>,Rd,CRn,CRm,<opc_2> • Instruction encoding: bits 31-28 Cond | bits 27-24 = 1110 | bits 23-21 opc_1 | bit 20 L | bits 19-16 CRn | bits 15-12 Rd | bits 11-8 cp_num | bits 7-5 opc_2 | bit 4 = 1 | bits 3-0 CRm • Cond: condition code specifier; opc_1, opc_2: coprocessor opcodes; L: transfer to/from coprocessor (1 = MRC, 0 = MCR); CRn, CRm: coprocessor source/destination registers; Rd: ARM source/destination register; cp_num: coprocessor number
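A minimal C sketch of the field layout above: it packs the MRC/MCR fields into one 32-bit instruction word. The helper name `encode_mrc_mcr` and its parameter order are illustrative assumptions, not a toolchain API, and out-of-range arguments are simply masked to the field width.

```c
#include <stdint.h>

/* Hypothetical helper: pack the MRC/MCR bit fields into one instruction word.
   load = 1 encodes MRC (coprocessor -> ARM), load = 0 encodes MCR. */
uint32_t encode_mrc_mcr(unsigned cond, unsigned load, unsigned opc_1,
                        unsigned crn, unsigned rd, unsigned cp_num,
                        unsigned opc_2, unsigned crm)
{
    return ((uint32_t)(cond   & 0xFu) << 28)   /* bits 31-28: condition    */
         | ((uint32_t)0xEu << 24)              /* bits 27-24: 1110         */
         | ((uint32_t)(opc_1  & 0x7u) << 21)   /* bits 23-21: opcode 1     */
         | ((uint32_t)(load   & 0x1u) << 20)   /* bit 20: L (direction)    */
         | ((uint32_t)(crn    & 0xFu) << 16)   /* bits 19-16: CRn          */
         | ((uint32_t)(rd     & 0xFu) << 12)   /* bits 15-12: ARM Rd       */
         | ((uint32_t)(cp_num & 0xFu) << 8)    /* bits 11-8: coprocessor   */
         | ((uint32_t)(opc_2  & 0x7u) << 5)    /* bits 7-5: opcode 2       */
         | ((uint32_t)1u << 4)                 /* bit 4: always 1          */
         | (uint32_t)(crm & 0xFu);             /* bits 3-0: CRm            */
}
```

For example, `encode_mrc_mcr(0xE, 1, 0, 1, 0, 15, 0, 0)` describes `MRC p15, 0, r0, c1, c0, 0` with the "always" condition (0xE) and returns 0xEE110F10; clearing `load` gives the matching MCR, 0xEE010F10.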
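Coprocessor Memory Transfers (1) • Load from memory to coprocessor registers • Store to memory from coprocessor registers • Instruction encoding: bits 31-28 Cond | bits 27-25 = 110 | bit 24 P | bit 23 U | bit 22 N | bit 21 W | bit 20 L | bits 19-16 Rn | bits 15-12 CRd | bits 11-8 cp_num | bits 7-0 Offset • Cond: condition code specifier; P: pre/post increment; U: add/subtract offset; N: transfer length; W: base register writeback; L: load/store; Rn: base register; CRd: coprocessor source/destination register; cp_num: coprocessor number; Offset: address offset (8-bit, in words)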
Coprocessor Memory Transfers (2) • Syntax • <LDC|STC>{<cond>}{<L>} <cp_num>,CRd,<address> • A PC-relative offset is generated if possible; otherwise the assembler reports an error. • <LDC|STC>{<cond>}{<L>} <cp_num>,CRd,[Rn,offset]{!} • Pre-indexed form, with optional writeback of the base register • <LDC|STC>{<cond>}{<L>} <cp_num>,CRd,[Rn],offset • Post-indexed form • <L>, when present, causes a “long” transfer to be performed (N=1); otherwise a “short” transfer is performed (N=0). • The effect of this is coprocessor dependent.
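The three LDC/STC addressing forms can be modelled in a few lines of C. This is a sketch under stated assumptions: the names are hypothetical, the 8-bit offset field is treated as a word offset (scaled by 4, as in the ARM encoding), and for P = 0 only the post-indexed form with writeback is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t address;   /* address of the first word transferred */
    uint32_t new_base;  /* value of Rn after the instruction     */
} CoprocAddr;

/* Hypothetical model of LDC/STC addressing.
   offset8: 8-bit word offset from the encoding; u: add (true) or subtract;
   p: pre-indexed (true) or post-indexed; w: writeback in the pre-indexed form. */
CoprocAddr ldc_stc_address(uint32_t rn, unsigned offset8, bool u, bool p, bool w)
{
    uint32_t byte_offset = (uint32_t)(offset8 & 0xFFu) * 4u;   /* words -> bytes */
    uint32_t offset_addr = u ? rn + byte_offset : rn - byte_offset;
    CoprocAddr r;
    if (p) {                        /* pre-indexed: [Rn, offset]{!} */
        r.address  = offset_addr;
        r.new_base = w ? offset_addr : rn;
    } else {                        /* post-indexed: [Rn], offset (assumes W = 1) */
        r.address  = rn;
        r.new_base = offset_addr;
    }
    return r;
}
```

With Rn = 0x1000 and an offset field of 2, the pre-indexed form with `!` transfers from 0x1008 and leaves Rn = 0x1008, while the post-indexed form transfers from 0x1000 and then sets Rn = 0x1008.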
Thumb (1) • Thumb is a 16-bit instruction set • Optimized for code density from C code (~65% of ARM code size) • Improved performance from narrow memory (~160% of the performance of an equivalent ARM core connected to a 16-bit memory system) • A subset of the functionality of the ARM instruction set • The core has an additional execution state, Thumb • It can switch back and forth between 16-bit (Thumb) and 32-bit (ARM) instructions • Switch between ARM and Thumb states using the BX instruction
Thumb (2) • Most instructions generated by a compiler fit a 16-bit encoding, because for them: • Conditional execution is not used • Source and destination registers are identical • Only low registers (r0-r7) are used • Constants are of limited size • The inline barrel shifter is not used • Example: the 32-bit ARM instruction ADDS r2,r2,#1 (bits 31-0) has the 16-bit Thumb equivalent ADD r2,#1 (bits 15-0)
Outline • Coprocessor and Thumb Instructions • Assembly Language • Runtime Environment
The Programmer’s Model (1) • We will not be using the Thumb instruction set. • Memory formats • We will be using the little-endian format: • the lowest-numbered byte of a word is considered the word’s least significant byte, and the highest-numbered byte is considered the most significant byte. • Instruction length • All instructions are 32 bits long. • Data types • 8-bit bytes and 32-bit words.
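The little-endian rule above can be written down as two small C helpers that assemble and split a word byte by byte. They are a sketch (the names `load_le32` and `store_le32` are ours) and behave the same on any host, because they never reinterpret memory through a pointer cast.

```c
#include <stdint.h>

/* Read a 32-bit word stored in little-endian order: byte 0 is least significant. */
uint32_t load_le32(const uint8_t bytes[4])
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}

/* Store a 32-bit word in little-endian order: least significant byte first. */
void store_le32(uint8_t bytes[4], uint32_t word)
{
    bytes[0] = (uint8_t)(word & 0xFFu);
    bytes[1] = (uint8_t)((word >> 8) & 0xFFu);
    bytes[2] = (uint8_t)((word >> 16) & 0xFFu);
    bytes[3] = (uint8_t)(word >> 24);
}
```

Storing 0x12345678 places 0x78 at byte 0 (the lowest address) and 0x12 at byte 3.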
The Programmer’s Model (2) • Processor modes (of interest) • User: the “normal” program execution mode. • IRQ: used for general-purpose interrupt handling. • Supervisor: a protected mode for the operating system. • The register set • Registers R0-R15 + CPSR • R13: Stack Pointer • R14: Link Register • R15: Program Counter, in which bits [1:0] are ignored (why?)
The Programmer’s Model (3) • Program Status Registers • CPSR (Current Program Status Register) • holds information about the most recently performed ALU operation • contains the N (negative), Z (zero), C (carry) and V (overflow) bits • controls the enabling and disabling of interrupts • sets the processor operating mode • SPSR (Saved Program Status Register) • used by exception handlers • Exceptions • reset, undefined instruction, SWI, IRQ.
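As a sketch of the CPSR layout, the C code below extracts the fields listed above. The bit positions (flags in bits 31-28, I/F/T in bits 7-5, mode in bits 4-0) and the mode numbers follow the classic ARM architecture; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool n, z, c, v;    /* condition flags, bits 31-28 */
    bool irq_disabled;  /* I bit, bit 7                */
    bool fiq_disabled;  /* F bit, bit 6                */
    bool thumb;         /* T bit, bit 5                */
    unsigned mode;      /* mode field, bits 4-0        */
} Cpsr;

Cpsr decode_cpsr(uint32_t psr)
{
    Cpsr s;
    s.n = (psr >> 31) & 1u;
    s.z = (psr >> 30) & 1u;
    s.c = (psr >> 29) & 1u;
    s.v = (psr >> 28) & 1u;
    s.irq_disabled = (psr >> 7) & 1u;
    s.fiq_disabled = (psr >> 6) & 1u;
    s.thumb = (psr >> 5) & 1u;
    s.mode = psr & 0x1Fu;
    return s;
}

/* Names of the architecturally defined mode encodings. */
const char *mode_name(unsigned mode)
{
    switch (mode) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "Supervisor";
    case 0x17: return "Abort";
    case 0x1B: return "Undefined";
    case 0x1F: return "System";
    default:   return "reserved";
    }
}
```

A CPSR value of 0x600000D3 decodes as Z and C set, IRQ and FIQ disabled, ARM state, Supervisor mode.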
Assembly Language Basics (1) • “Load/store” architecture • 32-bit instructions • 32-bit and 8-bit data types • 32-bit addresses • 37 registers (30 general-purpose registers, 6 status registers and a PC) • only a subset is accessible at any point in time • No instruction to move a 32-bit constant to a register (why?)
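One answer to the “why?”: every instruction is 32 bits, so a data-processing instruction has room only for a 12-bit immediate field, interpreted as an 8-bit value rotated right by twice a 4-bit rotate amount. The hypothetical C helper below searches for such an encoding; constants it rejects are usually loaded from a literal pool (for example with GNU as's `ldr rd, =constant` pseudo-instruction) or built from several instructions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate left by n bits; undoes a rotate right by n. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* Hypothetical helper: if value == imm8 rotated right by 2*rot (imm8 <= 0xFF,
   rot <= 15), store imm8 and rot and return true; otherwise return false.
   The first (smallest) rotation found is reported. */
bool arm_immediate_encode(uint32_t value, unsigned *imm8, unsigned *rot)
{
    for (unsigned r = 0; r < 32; r += 2) {
        uint32_t candidate = rotl32(value, r);
        if (candidate <= 0xFFu) {
            if (imm8) *imm8 = (unsigned)candidate;
            if (rot)  *rot = r / 2;
            return true;
        }
    }
    return false;
}
```

So 0x3FC (0xFF rotated right by 30) and 0xF000000F (0xFF rotated right by 4) are encodable, while 0x101 and 0x12345678 are not.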
Assembly Language Basics (2) • Conditional execution • Barrel shifter • scaled addressing, multiplication by a small constant, and ‘constant’ generation • Loading constants into registers • Loading addresses into registers • Load and Store Multiple instructions • Jump tables • Coprocessor instructions (we will not use these)
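To illustrate conditional execution, this C sketch computes the N, Z, C and V flags that `CMP a, b` sets (it subtracts and discards the result) and then evaluates a few condition codes from them. Only a subset of the ARM conditions is modelled, and all names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } Flags;

/* Flags set by CMP a, b, i.e. the subtraction a - b with the result discarded. */
Flags cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    Flags f;
    f.n = (r >> 31) & 1u;                      /* sign bit of the result    */
    f.z = (r == 0);                            /* result is zero            */
    f.c = (a >= b);                            /* ARM carry = NOT borrow    */
    f.v = (((a ^ b) & (a ^ r)) >> 31) & 1u;    /* signed overflow           */
    return f;
}

typedef enum { COND_EQ, COND_NE, COND_GE, COND_LT, COND_GT, COND_LE, COND_AL } Cond;

/* Does an instruction with condition c execute, given flags f? */
bool cond_passes(Cond c, Flags f)
{
    switch (c) {
    case COND_EQ: return f.z;
    case COND_NE: return !f.z;
    case COND_GE: return f.n == f.v;
    case COND_LT: return f.n != f.v;
    case COND_GT: return !f.z && f.n == f.v;
    case COND_LE: return f.z || f.n != f.v;
    case COND_AL: return true;
    }
    return false;
}
```

This is the signed test behind the `bge` and `movlt` instructions in the examples that follow; note that LT still passes for 0x80000000 compared with 1, where the subtraction overflows.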
GNU ARM Assembler • You can assemble any ARM assembly language source file by running the arm-elf-as program: • arm-elf-as -mno-fpu -o filename.o filename.s • Although you can use the GNU linker to create the final executable, it is easier to let the GNU Compiler Collection driver do it, because it also links in the C startup code and libraries: • arm-elf-gcc -o filename.elf filename.s • To execute an ARM executable file (on the simulator): • arm-elf-run filename.elf
Assembly Language Syntax • Each assembly line has the following format: [<label:>] [<instruction or directive>] @ comment • A label can be any valid symbol followed by a colon (:) • Use only the alphabetic characters A-Z and a-z, the digits 0-9, and “_”, “.”, and “$” • An instruction, to be assembled into machine code • begins with a letter • A directive, which guides the work of the assembler • begins with a period (.) • A comment is anything that follows an @ • C-style comments (using “/*” and “*/”) are also allowed
Assembler Directives • Starting a new section .section name • Defining the code section of the program .text • Defining the initialized data section of the program .data • Defining the uninitialized data section of the program .bss • End of the assembly file (optional) .end
Assembler Directives • Making a symbol visible to other object files that are linked with it .global symbol • Declaring a symbol as externally defined (optional; GNU as treats all undefined symbols as external) .extern symbol • Aligning the address to a particular storage boundary, which is a power of 2 (on ARM, .align n aligns to 2^n bytes) .align expression • Declaring a common symbol, which may be merged with others of the same name .comm symbol,length,alignment
Assembler Directives • Defining / initializing storage locations .word expression @ 32 bits .hword expression @ 16 bits .byte expression @ 8 bits • Defining / initializing a string .ascii "string" .asciz "string" (like .ascii, but appends a terminating zero byte) • Reserving memory space .skip size .space size
Assembler Directives • Directives similar to the preprocessor lines that begin with “#” in the C programming language .include "file" .equ symbol, expression .set symbol, expression .if expression .ifdef symbol .ifndef symbol .else .endif
The Structure of Assembly Code
        .file    "sum2.s"
        .section .text          @ the code section
        .align   2              @ aligns the address to 4 bytes
        .global  sum2           @ gives the symbol external linkage
    sum2:
        add     r0, r0, r1      @ add input arguments
        mov     pc, lr          @ return from subroutine
        .end                    @ end of program
Slide callouts: sections are “chunks of code or data manipulated by the linker”; “minimum required block (why?)”; the add at sum2 is the “first instruction to be executed”.
Example #1: Finding the Larger One
C caller:
    #include <stdio.h>

    extern int max2(int a, int b);

    int main(void)
    {
        int a = 12345;
        int b = 6789;
        printf("The maximum of %d and %d is %d\n", a, b, max2(a, b));
        return 0;
    }
Assembly routine:
        .text
        .align  2
        .global max2
    max2:
        cmp     r0, r1          @ compare two numbers
        bge     done            @ if R0 contains the maximum
        mov     r0, r1          @ otherwise overwrite R0
    done:
        mov     pc, lr          @ return from subroutine
Example #2: Finding the Largest
C caller:
    #include <stdio.h>

    extern int maxn(int *a, int n);

    int a[6] = { 123, 34, 45, 56, 678, 9 };

    int main(void)
    {
        printf("The maximum of all numbers is %d\n", maxn(a, 6));
        return 0;
    }
Assembly routine:
        .text
        .align  2
        .global maxn
    maxn:
        mov     r2, r0          @ R2 = pointer to the next element
        mov     r3, r1          @ R3 = count (assumes n >= 1)
        ldr     r0, [r2], #4    @ R0 = first element, advance pointer
    loop:
        subs    r3, r3, #1      @ reduce the count by 1
        beq     done            @ test if finished
        ldr     r1, [r2], #4    @ put next number in R1
        cmp     r0, r1          @ if R0 contains the larger
        movlt   r0, r1          @ otherwise overwrite R0
        b       loop            @ continue
    done:
        mov     pc, lr          @ return from subroutine
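To connect each instruction of maxn to its effect, here is a C transcription (the name `maxn_ref` is hypothetical). Like the assembly, it decrements the count before testing it, so it assumes n >= 1; with n = 0 the assembly would keep looping and read beyond the end of the array.

```c
/* C transcription of maxn; each comment names the instruction the line mirrors.
   Assumes n >= 1, exactly like the assembly version. */
int maxn_ref(const int *a, int n)
{
    const int *p = a;              /* mov r2, r0                      */
    int count = n;                 /* mov r3, r1                      */
    int best = *p++;               /* ldr r0, [r2], #4                */
    while (--count != 0) {         /* loop: subs r3, r3, #1; beq done */
        int next = *p++;           /* ldr r1, [r2], #4                */
        if (best < next)           /* cmp r0, r1                      */
            best = next;           /* movlt r0, r1                    */
    }
    return best;                   /* done: mov pc, lr                */
}
```

The post-increment `*p++` plays the role of the post-indexed addressing mode `[r2], #4`.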
Does this work? • Instead of computing the larger number by itself, maxn may call max2 from Example #1 to find the larger number:
        .text
        .align  2
        .global maxn
    maxn:
        mov     r2, r0
        mov     r3, r1
        ldr     r0, [r2], #4
    loop:
        subs    r3, r3, #1      @ reduce the count by 1
        beq     done            @ test if finished
        ldr     r1, [r2], #4    @ put next number in R1
        bl      max2            @ call max2 to find the larger
        b       loop            @ continue
    done:
        mov     pc, lr          @ return from subroutine
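Calling Another Function • Be careful with the registers used in a function, especially the link register!
        .text
        .align  2
        .global maxn
    maxn:
        mov     r2, r0
        mov     r3, r1
        mov     r5, lr          @ save the link register
        ldr     r0, [r2], #4
    loop:
        subs    r3, r3, #1      @ reduce the count by 1
        beq     done            @ test if finished
        ldr     r1, [r2], #4    @ put next number in R1
        bl      max2            @ call max2 to find the larger
        b       loop            @ continue
    done:
        mov     lr, r5          @ restore the link register
        mov     pc, lr          @ return from subroutine
• Caveat: under the ARM procedure call standard, r5 is callee-saved, so this version still overwrites the caller's r5; it also relies on max2 leaving r2 and r3 untouched, which max2 does, although the standard allows a callee to change r0-r3 (and r12). Saving registers on the stack with STMFD/LDMFD, as Example #3 does, avoids both problems.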
Example #3: Computing Factorial
C caller:
    #include <stdio.h>

    extern int factor(int n);

    int main(void)
    {
        int n = 7;
        printf("The factorial of %d is %d\n", n, factor(n));
        return 0;
    }
Assembly routine:
        .text
        .align  2
        .global factor
    factor:                     @ assumes n >= 1
        stmfd   sp!, {r0, lr}   @ push n and the return address on the stack
        subs    r0, r0, #1
        moveq   r0, #1          @ (n-1)! = 1 if n-1 == 0
        blne    factor          @ compute (n-1)! if n-1 != 0
        ldmfd   sp!, {r1, lr}   @ pop n into R1 and restore LR
        mul     r0, r1, r0      @ n! = n * (n-1)! (before ARMv6, MUL needs Rd != Rm)
    done:
        mov     pc, lr          @ return from subroutine
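The same recursion in C, written so that each step mirrors one instruction of factor (the name `factor_ref` is hypothetical): the saved copy of n plays the role of the R0 value pushed by STMFD and popped into R1 by LDMFD. Like the assembly, it assumes n >= 1.

```c
/* C transcription of factor; assumes n >= 1, like the assembly. */
int factor_ref(int n)
{
    int saved_n = n;               /* stmfd sp!, {r0, lr}: save n              */
    int r0 = n - 1;                /* subs r0, r0, #1                          */
    if (r0 == 0)
        r0 = 1;                    /* moveq r0, #1: (n-1)! = 1                 */
    else
        r0 = factor_ref(r0);       /* blne factor: compute (n-1)!              */
    return saved_n * r0;           /* ldmfd sp!, {r1, lr}; mul: n * (n-1)!     */
}
```

`factor_ref(7)` returns 5040, the value the C caller above prints.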
Assembly code for if-statements
Source:
    if cond then
        then_statements
    else
        else_statements
    end if;
Generated code:
        t1 = cond
        if not t1 goto else_label
        codes for then_statements
        goto endif_label
    else_label:
        codes for else_statements
    endif_label:
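The if-then-else scheme can be tried out in C, which keeps goto and labels. The function below is a hypothetical example that computes the larger of two ints in exactly the lowered shape, reusing the slide's label names.

```c
/* if (a >= b) then result = a else result = b, in the lowered form. */
int max_lowered(int a, int b)
{
    int result;
    int t1 = (a >= b);          /* t1 = cond                  */
    if (!t1) goto else_label;   /* if not t1 goto else_label  */
    result = a;                 /* codes for then_statements  */
    goto endif_label;
else_label:
    result = b;                 /* codes for else_statements  */
endif_label:
    return result;
}
```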
Assembly code for else-if parts • For each alternative, place the current else_label in the code, and generate a new one.
Source:
    if cond1 then s1
    else if cond2 then s2
    else s4
    end if;
Generated code:
        t1 = cond1
        if not t1 goto else_label1
        codes for s1
        goto endif_label
    else_label1:
        t2 = cond2
        if not t2 goto else_label2
        codes for s2
        goto endif_label
    else_label2:
        codes for s4
    endif_label:
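Applying the else-if scheme to a three-way choice gives the following C sketch (the function is a made-up example): each alternative gets its own else_label, and all branches meet at endif_label.

```c
/* if (x > 0) s = 1; else if (x < 0) s = -1; else s = 0;  in lowered form. */
int sign_lowered(int x)
{
    int s, t1, t2;
    t1 = (x > 0);                /* t1 = cond1                  */
    if (!t1) goto else_label1;
    s = 1;                       /* codes for s1                */
    goto endif_label;
else_label1:
    t2 = (x < 0);                /* t2 = cond2                  */
    if (!t2) goto else_label2;
    s = -1;                      /* codes for s2                */
    goto endif_label;
else_label2:
    s = 0;                       /* codes for s4 (final else)   */
endif_label:
    return s;
}
```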
Assembly code for while loops • Create two labels: start_loop, end_loop
Source:
    while (cond) {
        s1;
        if (cond2) break;
        s2;
        if (cond3) continue;
        s3;
    };
Generated code:
    start_loop:
        if (!cond) goto end_loop
        codes for s1
        if (cond2) goto end_loop
        codes for s2
        if (cond3) goto start_loop
        codes for s3
        goto start_loop
    end_loop:
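A concrete instance of the while-loop scheme, in C with explicit labels. The loop body (summing even elements and stopping at the first negative one) is an invented example chosen so that both break and continue appear; s2 is empty here.

```c
/* Sum the even elements of a[0..n-1], stopping at the first negative element. */
int sum_even_until_negative(const int *a, int n)
{
    int i = 0, sum = 0, v;
start_loop:
    if (!(i < n)) goto end_loop;       /* if (!cond) goto end_loop          */
    v = a[i];                          /* codes for s1                      */
    i = i + 1;
    if (v < 0) goto end_loop;          /* break:    if (cond2) goto end_loop */
    if (v % 2 != 0) goto start_loop;   /* continue: if (cond3) goto start_loop */
    sum = sum + v;                     /* codes for s3                      */
    goto start_loop;
end_loop:
    return sum;
}
```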
Assembly code for numeric loops • Semantics: the loop is not executed if the range is null, so the code must test before the first pass.
Source:
    for J in expr1..expr2 loop
        S1
    end loop;
Generated code:
        J = expr1
    start_label:
        if J > expr2 goto end_label
        codes for S1
        J = J + 1
        goto start_label
    end_label:
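The numeric-loop scheme as a C sketch (an invented sum over J in lo..hi). Because the test comes before the first pass, an empty range runs the body zero times. One caveat of this naive lowering: if expr2 is the largest representable int, J = J + 1 overflows, so a production compiler needs a different exit test.

```c
/* Sum of J for J in lo..hi (inclusive), lowered with the test first. */
long sum_range_lowered(int lo, int hi)
{
    long total = 0;
    int j = lo;                        /* J = expr1                    */
start_label:
    if (j > hi) goto end_label;        /* if J > expr2 goto end_label  */
    total += j;                        /* codes for S1                 */
    j = j + 1;                         /* J = J + 1                    */
    goto start_label;
end_label:
    return total;
}
```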
Code for short-circuit expressions • Short-circuit expressions are treated as control structures
Source:
    if B1 or else B2 then S1 ...      -- C: if (B1 || B2) { S1 ... }
Generated code:
        if B1 goto then_label
        if not B2 goto else_label
    then_label:
        codes for S1
        goto endif_label
    else_label:
• Inherit target labels from the enclosing control structure • Create additional labels for composite short-circuits
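A short-circuit `||` lowered by hand in C (hypothetical example). The point of the control-flow form is that B2 (`*p == 0`) is never evaluated when B1 (`p == NULL`) is true, so the null pointer is never dereferenced.

```c
#include <stddef.h>

/* Lowered form of: if (p == NULL || *p == 0) result = 1; else result = 0; */
int is_null_or_zero(const int *p)
{
    int result;
    if (p == NULL) goto then_label;    /* if B1 goto then_label        */
    if (!(*p == 0)) goto else_label;   /* if not B2 goto else_label    */
then_label:
    result = 1;                        /* codes for S1                 */
    goto endif_label;
else_label:
    result = 0;
endif_label:
    return result;
}
```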
Assembly code for case statements • If the range is small and most cases are defined, create a jump table as an array of code addresses, and generate an indirect jump.
Source:
    case x is
        when up:   y := 0;
        when down: y := 1;
    end case;
Generated code:
        table label1, label2 …
        jumpi x table
    label1:
        y = 0
        goto end_case
    label2:
        y = 1
        goto end_case
    end_case:
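Standard C has no computed goto, so this sketch approximates the jump table with an array of function pointers indexed by x; GCC's labels-as-values extension, or a dense switch statement (which compilers often turn into a jump table), are alternatives. The enum values, handler names and bounds check are assumptions for illustration.

```c
/* Case values used as table indices (assumed dense, starting at 0). */
typedef enum { UP = 0, DOWN = 1, DIRECTION_COUNT } Direction;

static int case_up(void)   { return 0; }   /* label1: y = 0 */
static int case_down(void) { return 1; }   /* label2: y = 1 */

/* The jump table: one code address per case, indexed by x. */
static int (*const jump_table[DIRECTION_COUNT])(void) = { case_up, case_down };

/* case x is ... end case;  returns the new value of y.
   Out-of-range values fall through to end_case and leave y unchanged. */
int dispatch(Direction x, int y)
{
    if ((unsigned)x >= (unsigned)DIRECTION_COUNT)
        return y;
    return jump_table[x]();               /* jumpi x table */
}
```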
Outline • Coprocessor and Thumb Instructions • Assembly Language • Runtime Environment
Runtime Environment • Goal: understand the environment in which your final program will run. • How a program is laid out in memory: • Code • Data • Stack • Heap • How function callers and callees pass information
Executable Layout in Memory (figure, not to scale) • From high memory to low memory: runtime stack, dynamic data (heap), global data, static data, code
Overall Program Layout • From low memory up: • Code (text segment, instructions) • Static (constant) data • Global data • Dynamic data (heap) • Runtime stack (procedure calls) • Review of what’s in each section:
Text Segment (Executable Code) (1) • Actual machine instructions • Arithmetic / logical • Comparison • Branch (short distances) • Jump (long distances) • Load / store • Data movement • Constant manipulation (immediate)
Text Segment (Executable Code) (2) • The code segment is write-protected, so running code can’t overwrite itself. • (A debugger can overwrite it.) • You’ll create the precursor for the code in this segment by emitting assembly code. • The assembler will build the final text.
Data Segment (1) • Data objects • whose size is known at compile time • whose lifetime is the full run of the program (not just a function invocation) • Static data includes things that won’t change (and can be write-protected): • Virtual-function dispatch tables • String literals used in instructions • Arithmetic literals could be stored here too, but are more likely incorporated into instructions as immediates.
Data Segment (2) • Global data (other than static) • Variables declared global • Local variables declared static (in C) • Declared local to a function. • Retain their values between invocations of that function (lifetime is the whole run). • Semantic analysis ensures that static locals are not referenced outside their function scope.
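A small C sketch of a static local (the function name is hypothetical): `counter` is stored with the global data (in .data or .bss), not on the stack, so it keeps its value between calls, while `local` is recreated on every invocation.

```c
/* Each call returns the next number in the sequence 1, 2, 3, ... */
int next_ticket(void)
{
    static int counter = 0;   /* static storage: initialized once, before main runs */
    int local = 1;            /* automatic: a fresh copy on every call              */
    counter += local;
    return counter;
}
```

Only `next_ticket` can name `counter`, even though the variable lives for the whole run.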
Dynamic Data (Heap) (1) • Data created by malloc or new. • Heap data lives until it is deallocated or until the program ends. (Sometimes longer than you want, if you lose track of it.) • Garbage collection and reference counting are ways of automatically deallocating dead storage in the heap.
Dynamic Data (Heap) (2) • (Figure: heap blocks *p1 to *p4, with address 0x1000000 marked.) • Heap allocation starts at the bottom of the heap (lower addresses) and allocates upward. • Alignment requirements and the specifics of the allocation algorithm may cause storage to be allocated out of (address) order. p1 = new Big(); p2 = new Medium(); p3 = new Big(); p4 = new Tiny(); • So (int)p2 > (int)p1 • But (int)p4 < (int)p3 • Compare pointers for equality, not with < or >.
Runtime Stack (1) • Data used for function invocation: • Variables declared local to functions (including main), aka “automatic” data • except for statics (in the data segment) • Variables declared in anonymous blocks inside a function • Arguments to the function (passed by the caller) • Temporaries used by generated code (not representing names in the source) • Possibly the value returned by the callee to the caller
Runtime Stack (2) • Types of data that can be allocated on the runtime stack: • In C, all kinds of data: simple types, structs, arrays. • C++: the stack can hold objects of class type as well as pointers. • Some languages don’t allow arrays on the stack.
Stack Terminology (1) • (Figure: a stack with its Top and Base labeled.) • A stack is an abstract data type. • Push a new value onto the Top; pop a value off the Top. • Higher elements are more recent; lower elements are older.
Stack Terminology (2) • A stack implementation can grow in either direction. • The MIPS stack grows downward (from higher memory addresses to lower), as does the conventional ARM stack (full descending, used with STMFD/LDMFD). • This can make terminology confusing. • Some people (and documents) talk about going “up” and “down” the stack. • Some use the abstraction, where “up” means “more recent”, toward the Top. • Some (including gdb) say “up” meaning “toward older entries”, toward the Base.
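ARM code conventionally uses a full descending stack: SP points at the most recently pushed word, and a push first decrements SP. The C model below (array-based, with hypothetical names and no overflow checks) mimics `STMFD sp!` and `LDMFD sp!` for a list of registers; as with the real instructions, the lowest-numbered register is stored at the lowest address.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical model of a full descending stack. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    unsigned sp;               /* index of the top word; STACK_WORDS when empty */
} Stack;

void stack_init(Stack *s) { s->sp = STACK_WORDS; }

/* Like STMFD sp!, {regs}: decrement first, then store upward,
   so regs[0] ends up at the lowest address. */
void push_multiple(Stack *s, const uint32_t *regs, unsigned count)
{
    s->sp -= count;
    for (unsigned i = 0; i < count; i++)
        s->mem[s->sp + i] = regs[i];
}

/* Like LDMFD sp!, {regs}: load upward, then release the space. */
void pop_multiple(Stack *s, uint32_t *regs, unsigned count)
{
    for (unsigned i = 0; i < count; i++)
        regs[i] = s->mem[s->sp + i];
    s->sp += count;
}
```

This ordering is why `stmfd sp!, {r0, lr}` followed by `ldmfd sp!, {r1, lr}` in the factorial example brings the saved n back in R1.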