Lecture 4: Development Tools
Embedded Processor Architecture and Programming • Chien-Min Wang (王建民) • Institute of Information Science, Academia Sinica • July 2008
Contents • Introduction • Computer Architecture • ARM Architecture • Development Tools • GNU Development Tools • ARM Instruction Set • ARM Assembly Language • ARM Assembly Programming • GNU ARM ToolChain • Interrupts and Monitor
Outline • Compilers • Assemblers • Linkers and Loaders • Runtime Environment
What is a compiler? • A program translator • Source language • E.g., C, C++, Java, Pascal • Target language • E.g., assembly language for x86, MIPS, ARM
Historical Background • Machine language first • 1957: First FORTRAN compiler • 18 programmer-years of effort • Extremely ad hoc • Today’s techniques were created in response to the difficulties of implementing early compilers
Phases of a Compiler • Analysis (“front end”) • Lexical Analysis • Syntax Analysis • Semantic Analysis • Synthesis (“back end”) • Intermediate Code Generation • Intermediate Code Optimization • Target Code Generation/Optimization • Front & back ends share symbol table
Lexical Analysis • Also known as “scanning”: transforms characters into tokens • Example: double f = sqrt(-1);
    TDOUBLE ("double")
    TIDENT ("f")
    TOP ("=")
    TIDENT ("sqrt")
    TLPAREN ("(")
    TOP ("-")
    TINTCONSTANT ("1")
    TRPAREN (")")
    TSEP (";")
Syntax Analysis • Aka “parsing” • Uses context-free grammars • Structural validation • Creates parse tree or derivation
Derivation of “sqrt(-1)”
    Expression
    -> FuncCall
    -> TIDENT TLPAREN Expression TRPAREN
    -> TIDENT TLPAREN UnaryExpression TRPAREN
    -> TIDENT TLPAREN TOP Expression TRPAREN
    -> TIDENT TLPAREN TOP TINTCONSTANT TRPAREN
Grammar rules:
    Expression -> UnaryExpression
    Expression -> FuncCall
    Expression -> TINTCONSTANT
    UnaryExpression -> TOP Expression
    FuncCall -> TIDENT TLPAREN Expression TRPAREN
Parse Tree of “sqrt(-1)”
    Expression
      FuncCall
        TIDENT
        TLPAREN
        Expression
          UnaryExpression
            TOP
            Expression
              TINTCONSTANT
        TRPAREN
Semantic Analysis • “Does it make sense”? • Checking semantic rules, such as • Is variable declared? • Are operand types compatible? • Do function arguments match function declarations? • Types
Intermediate Code Generation • A program for an abstract machine • Requirements • Easy to generate from parse tree • Easy to translate into target code • A variety of forms • Quadruple or three-address code • Register transfer language
Intermediate Code Example • Three-address code (TAC)
Source:
    j = 2 * i + 1;
    if (j >= n) j = 2 * i + 3;
    return a[j];
Three-address code:
        t1 = 2 * i
        t2 = t1 + 1
        j = t2
        t3 = j < n
        if t3 goto L0
        t4 = 2 * i
        t5 = t4 + 3
        j = t5
    L0: t6 = a[j]
        return t6
Intermediate Code Optimization • Suppressing code generation for unreachable code segments • Getting rid of unused variables • Eliminating multiplication by 1 and addition of 0 • Loop optimization • Common sub-expression elimination • etc.
Code Optimization Example
Before:
        t1 = 2 * i
        t2 = t1 + 1
        j = t2
        t3 = j < n
        if t3 goto L0
        t4 = 2 * i
        t5 = t4 + 3
        j = t5
    L0: t6 = a[j]
        return t6
After:
        t1 = 2 * i
        j = t1 + 1
        t3 = j < n
        if t3 goto L0
        j = t1 + 3
    L0: t6 = a[j]
        return t6
Target Code Generation • Example: a in %o0, i in %o1, n in %o2, j in %g2
Intermediate code:
        t1 = 2 * i
        j = t1 + 1
        t3 = j < n
        if t3 goto L0
        j = t1 + 3
    L0: t6 = a[j]
        return t6
Target code:
            sll %o1, 1, %o1
            add %o1, 1, %g2
            cmp %g2, %o2
            blt .LL3
            nop
            add %o1, 3, %g2
    .LL3:   sll %g2, 2, %g2
            retl
            ld [%o0+%g2], %o0
Slide annotation: delayed branch (the nop after blt and the ld after retl occupy branch delay slots)
Pascal Example: Source Code
    PROGRAM STATS
    VAR
      SUM,SUMSQ,I,VALUE,MEAN,VARIANCE : INTEGER
    BEGIN
      SUM := 0;
      SUMSQ := 0;
      FOR I := 1 TO 100 DO
        BEGIN
          READ(VALUE);
          SUM := SUM + VALUE;
          SUMSQ := SUMSQ + VALUE * VALUE
        END;
      MEAN := SUM DIV 100;
      VARIANCE := SUMSQ DIV 100 - MEAN * MEAN;
      WRITE(MEAN,VARIANCE)
    END.
Pascal Example: Token Coding
    Token     Code      Token   Code
    PROGRAM   1         ;       12
    VAR       2         :       13
    BEGIN     3         ,       14
    END       4         :=      15
    END.      5         +       16
    INTEGER   6         -       17
    FOR       7         *       18
    READ      8         DIV     19
    WRITE     9         (       20
    TO        10        )       21
    DO        11        id      22
                        int     23
Scanner Output: Token Stream I
    Line   Tokens (token type, followed by the token specifier where present)
    1      1   22 ^STATS
    2      2
    3      22 ^SUM   14   22 ^SUMSQ   14   22 ^I   14   22 ^VALUE   14   22 ^MEAN   14   22 ^VARIANCE   13   6
    4      3
    5      22 ^SUM   15   23 #0   12
    6      22 ^SUMSQ   15   23 #0   12
    7      7   22 ^I   15   23 #1   10   23 #100   11
    8      3
    9      8   20   22 ^VALUE   21   12
Scanner Output: Token Stream II
    Line   Tokens (token type, followed by the token specifier where present)
    10     22 ^SUM   15   22 ^SUM   16   22 ^VALUE   12
    11     22 ^SUMSQ   15   22 ^SUMSQ   16   22 ^VALUE   18   22 ^VALUE
    12     4   12
    13     22 ^MEAN   15   22 ^SUM   19   23 #100   12
    14     22 ^VARIANCE   15   22 ^SUMSQ   19   23 #100   17   22 ^MEAN   18   22 ^MEAN   12
    15     9   20   22 ^MEAN   14   22 ^VARIANCE   21
    16     5
Pascal Example: BNF Grammar
    1  <prog> ::= PROGRAM <prog-name> VAR <decl-list> BEGIN <stmt-list> END.
    2  <prog-name> ::= id
    3  <decl-list> ::= <dec> | <decl-list> ; <dec>
    4  <dec> ::= <id-list> : <type>
    5  <type> ::= INTEGER
    6  <id-list> ::= id | <id-list> , id
    7  <stmt-list> ::= <stmt> | <stmt-list> ; <stmt>
    8  <stmt> ::= <assign> | <read> | <write> | <for>
    9  <assign> ::= id := <exp>
    10 <exp> ::= <term> | <exp> + <term> | <exp> - <term>
    11 <term> ::= <factor> | <term> * <factor> | <term> DIV <factor>
    12 <factor> ::= id | int | ( <exp> )
    13 <read> ::= READ ( <id-list> )
    14 <write> ::= WRITE ( <id-list> )
    15 <for> ::= FOR <index-exp> DO <body>
    16 <index-exp> ::= id := <exp> TO <exp>
    17 <body> ::= <stmt> | BEGIN <stmt-list> END
Compiler Issues • Symbol Table Management • Scoping • Error Handling & Recovery • Passes • One-pass vs. multi-pass • Most compilers are one-pass up to code optimization phase • Several passes are usually required for code optimization
End-to-End Compilation
    source code
      -> Lexical analysis -> tokens
      -> Syntax analysis -> parse tree
      -> Semantic analysis -> AST
      -> IR gen. -> IR
      -> IR optimization -> IR
      -> Target code generation -> assembly code
      -> Assembler -> object code
      -> Linker & Loader -> executable code
      -> Execution
Outline • Compilers • Assemblers • Linkers and Loaders • Runtime Environment
C versus Assembly Language • C is called a “portable assembly language” • Allows low level operations on bits and bytes • Allows access to memory via use of pointers • Integrates well with assembly language functions • Advantages over assembly code • Easier to read and understand source code • Requires fewer lines of code for same function • Doesn’t require knowledge of the hardware
C versus Assembly Language • Good reasons for learning assembly language • In time-critical sections of code, it is possible to improve performance with assembly language • It is a good way to learn how a processor works • In writing a new operating system or in porting an existing system to a new machine, there are sections of code which must be written in assembly language
Best of Both Worlds • Integrating C and assembly code • Convenient to let C do most of the work and integrate with assembly code where needed • Make our gas routines callable from C • Use C compiler conventions for function calls • Preserve registers that C compiler expects saved
GNU vs. Intel Assembler (1) • In this course, we will be using the GNU assembler, referred to as “gas” • Available on UNIX machines as “i386-as” • The GNU assembler uses the AT&T syntax (instead of the official Intel/Microsoft syntax) • The textbook is written using Intel assembly language syntax, which is not the same as GNU syntax • Local references will provide gas notes for the textbook sections that use Intel assembly language
GNU vs. Intel Assembler (2) • Overall, the following are the key differences between the Intel and the gas syntax: • The GNU operation codes have a size indicator that is not present on the Intel operation codes • Intel: MOV; gas: movb, movw, or movl • The GNU operands are in the opposite order from the Intel operands • Intel: MOV dest, source; gas: movb source, dest
GNU vs. Intel Assembler (3) • The GNU register names are preceded by a % that is not present on the Intel register names • Intel: MOV AH, AL; gas: movb %al, %ah • GNU constants are represented differently from the Intel constant representations • Intel: MOV AL, 0AH; gas: movb $0xa, %al • Comments are indicated with # instead of ; • Intel: ; comment here; gas: # comment here
GNU vs. Intel Assembler (4) • You should familiarize yourself with both Intel and GNU assembly language syntax • Even for the GNU assembler, the syntax may differ between platforms • You may need to use Intel syntax in your professional work someday
The Four Field Format (1) • The Label Field • A label is a symbol followed by : • Can be referred to as a representation of the address • The ‘Opcode’ Field • Mnemonic to specify the instruction and size • Unnecessary to remember instruction code values • Directives to guide the work of the assembler • In GNU assembly language, a directive begins with .
The Four Field Format (2) • The Operand Field(s) • The data on which the instruction operates • Zero, one, or two operands depending on the instruction • The Comment Field • Comment contains documentation • It begins with a # anywhere on the line and runs to the end of the line
Symbolic Constants • Allow use of symbols for numeric values • Perform same function as #define in C • Format is: SYMBOL = value • Example:
    NCASES = 8
    movl $NCASES, %eax
Assembly Coding for a C Function • General form for a C function in assembly:
            .globl _mycode
            .text
    _mycode:
            . . .
            ret
            .data
    _mydata:
            .long 17
            .end
Assembler Directives (1) • Defining a label for external reference (call): .globl _mycode • Defining the code section of the program: .text • Defining the data section of the program: .data • End of the assembly language source: .end
Assembler Directives (2) • Defining / initializing storage locations:
    .long 0x12345678    # 32 bits
    .word 0x1234        # 16 bits
    .byte 0x12          # 8 bits
• Defining / initializing a string (.asciz appends the terminating zero byte; .ascii does not, so it is written explicitly):
    .ascii "Hello World\n\0"
    .asciz "Hello World\n"
C Function Coding Conventions • Same function name as used in the calling C program except with a leading _ • Use only %eax, %ecx, and %edx to avoid using registers that the C compiler expects to be preserved across a call and return • Save/restore other registers on stack as needed • Return value in %eax before “ret” instruction
Example #1: Sum Two Numbers • C “driver” to execute sum2.s is called sum2c.c
    #include <stdio.h>

    extern int sum2(void);

    int main(void)
    {
        printf("Sum2 returned %d\n", sum2());
        return 0;
    }
Example #1: Assembly Code • Assembly code for sum2.s
    # sum2.s -- Sum of two numbers
            .text
            .globl _sum2
    _sum2:
            movl $8, %eax
            addl $3, %eax
            ret             # number in eax
How to pass parameters? • How would you modify the source code so that sum2(int a, int b) returns the sum of two integer parameters a and b?
    #include <stdio.h>

    extern int sum2(int a, int b);

    int main(void)
    {
        printf("3 + 8 = %d\n", sum2(3, 8));
        return 0;
    }
Addressing Memory and I/O • With gas, we’ve already seen operands with: • % for registers (part of the processor itself) • $ for immediate data (part of the instruction itself) • Accessing memory versus I/O:
             Memory                I/O
    Read     movb address, %al     inb address, %al
    Write    movb %al, address     outb %al, address
Addressing Memory (1) • Direct addressing for memory • Intel uses [ ]; gas does not use any “operator” • Example:
            .text
            movl %eax, 0x1234
            movl 0x1234, %edx
            . . .