430 likes | 531 Views
Assemblers and Linkers. Professor Jennifer Rexford COS 217. Goals of This Lecture. Compilation process Compile, assemble, archive, link, execute Assembling Representing instructions Prefix, opcode, addressing modes, operands Translating labels into memory addresses
E N D
Assemblers and Linkers Professor Jennifer Rexford COS 217
Goals of This Lecture • Compilation process • Compile, assemble, archive, link, execute • Assembling • Representing instructions • Prefix, opcode, addressing modes, operands • Translating labels into memory addresses • Symbol table, and filling in local addresses • Connecting symbolic references with definitions • Relocation records • Specifying the regions of memory • Generating sections (data, BSS, text, etc.) • Linking • Concatenating object files • Patching references
Compilation Pipeline .c • Compiler (gcc): .c .s • High-level to assembly language • Assembler (as): .s .o • Assembly to machine language • Archiver (ar): .o .a • Object files into single library • Linker (ld): .o + .a a.out • Builds an executable file • Execution (execlp) • Loads executable and starts it Compiler .s Assembler .o Archiver .a Linker/Loader a.out Execution
Assembler • Assembly language • A symbolic representation of machine instructions • Machine language • Contains everything needed to link, load, and execute the program • Assembler • Translates assembly language into machine language • Translate instruction mnemonics into op-codes • Translate symbolic names for memory locations • Stores the result in an object file (.o)
General IA32 Instruction Format • Prefixes: we won’t worry about these for now • Opcode • ModR/M and SIB (scale-index-base): for memory operands • Displacement and immediate: depending on opcode, ModR/M and SIB • Note: byte order is little-endian (low-order byte of word at lower addresses) Instruction prefixes Opcode ModR/M SIB Displacement Immediate Up to 4prefixes of 1 byte each (optional) 1, 2, or 3 byteopcode 1 byte (if required) 1 byte (if required) 0, 1, 2, or 4 bytes 0, 1, 2, or 4 bytes 7 6 5 3 2 0 7 6 5 3 2 0 Mod Reg/Opcode R/M Scale Index Base
Example: Push on to Stack • Assembly language:pushl %edx • Machine code: • IA32 has a separate opcode for push for each register operand • 50: pushl %eax • 51: pushl %ecx • 52: pushl %edx • … • Results in a one-byte instruction • Observe: sometimes one assembly language instruction can map to a group of different opcodes 0101 0010
Example: Load Effective Address • Assembly language:leal (%eax,%eax,4), %eax • Machine code: • Byte 1: 8D (opcode for “load effective address”) • Byte 2: 04 (dest %eax, with scale-index-base) • Byte 3: 80 (scale=4, index=%eax, base=%eax) 1000 1101 0000 0100 1000 0000 Load the address %eax + 4 * %eax into register %eax
4 4 4 4 8 2 2 3 3 3 3 1000 1000 1000 1000 00001100 01 11 011 011 001 001 Mod=01 table [EAX]+disp8 0 [ECX]+disp8 1 [EDX]+disp8 2 [EBX]+disp8 3 [--][--]+disp8 4 [EBP]+disp8 5 [ESI]+disp8 6 [EDI]+disp8 7 movl 12(%ecx), %ebx 12 ebx (ecx) modedisp8(%_), %_ Example: Movl (Opcode 44) ModR/M SIB Mod=11 table EAX 0 ECX 1 EDX 2 EBX 3 ESP 4 EBP 5 ESI 6 EDI 7 Instruction prefixes Opcode Displacement Immediate M o d reg R/M B I S movl %ecx, %ebx ebx ecx mov r/m32,r32 mode%_, %_ Reference: IA-32 Intel Architecture Software Developer’s Manual, volume 2, page 2-1, page 2-6, and page 3-441
4 4 8 32 2 3 3 01100001 00000000000000000000001111100111 1100 0110 00 000 101 Example: Mov Immediate to Memory ModR/M SIB Instruction prefixes Opcode Displacement Immediate M o d reg R/M S I B mov r/m8,imm8 movb $97, 999 999 97 modedisp32 Mod=00 table [EAX] 0 [ECX] 1 [EDX] 2 [EBX] 3 [--][--] 4 disp32 5 [ESI] 6 [EDI] 7
4 4 8 32 2 3 3 01100001 00000000000000000000001111100111 1100 0110 00 000 101 Encoding as Byte String ModR/M SIB Instruction prefixes Opcode Displacement Immediate M o d reg R/M S I B movb $97, 999 C6 05 E7 03 00 0061 little-endian
4 4 8 32 2 3 3 00000000000000000000001111100111 01100001 1100 0110 00 000 101 Assembly Language movb $97, 999 C6 05 E7 03 00 00 61 char grade = 67; … grade = ‘a’; .globl grade .data grade: .byte 67 .text . . . movb $97, grade . . . located at address 999
.text ... movl count, %eax ... .data count: .word 0 ... .globl loop loop: cmpl %edx, %eax jge done pushl %edx call foo jmp loop done: Symbol Manipulation Create labels and remember their addresses Deal with the “forward reference problem”
Dealing with Forward References • Most assemblers have two passes • Pass 1: symbol definition • Pass 2: instruction assembly • Or, alternatively, • Pass 1: instruction assembly • Pass 2: patch the cross-reference
symbol table instruction data section instruction text section instruction bss section instruction Implementing an Assembler .s file .o file input assemble output disk in memory structure in memory structure disk
symbol table instruction data section instruction text section instruction bss section instruction Input Functions • Read assembly language and produce list of instructions .s file .o file input assemble output
Input Functions • Lexical analyzer • Group a stream of characters into tokens add %g1 , 10 , %g2 • Syntactic analyzer • Check the syntax of the program <MNEMONIC><REG><COMMA><REG><COMMA><REG> • Instruction list producer • Produce an in-memory list of instruction data structures instruction instruction
... loop: cmpl%edx, %eax jge done pushl%edx call foo jmp loop done: D 0 3 9 Instruction Assembly [0] 1 byte [2] disp? 7 D 4 bytes [4] 5 2 [5] disp? E 8 [10] disp? E 9 [15] How to compute the address displacements?
.globl loop loop: cmpl %edx, %eax jge done pushl %edx call foo jmp loop done: type def loop label 0 address D 0 3 9 Symbol Table loop done disp_s done 2 disp_l foo 5 disp_l loop 10 def done 15 [0] [2] disp_s 7 D [4] 5 2 [5] disp_l E 8 [10] disp_l E 9 [15]
.globl loop loop: cmpl %edx, %eax jge done pushl %edx call foo jmploop done: type def loop label 0 address D 0 3 9 Symbol Table loop done disp_s done 2 disp_l foo 5 disp_l loop 10 def done 15 [0] [2] disp_s 7 D [4] 5 2 [5] disp_l E 8 [10] disp_l E 9 [15]
.globl loop loop: cmpl %edx, %eax jge done pushl %edx call foo jmploop done: type def loop label 0 address D 0 3 9 Symbol Table loop done disp_s done 2 disp_l foo 5 disp_l loop 10 def done 15 [0] [2] disp_s 7 D +13 [4] 5 2 [5] disp_l E 8 [10] disp_l E 9 [15]
.globl loop loop: cmpl %edx, %eax jge done pushl %edx call foo jmploop done: type def loop label 0 address D 0 3 9 Symbol Table loop done disp_s done 2 disp_l foo 5 disp_l loop 10 def done 15 [0] [2] disp_s 7 D +13 [4] 5 2 [5] disp_l E 8 [10] disp_l E 9 [15]
.globl loop loop: cmpl %edx, %eax jge done pushl %edx call foo jmploop done: type label address D 0 3 9 Filling in Local Addresses loop def loop 0 done disp_s done 2 disp_l foo 5 disp_l loop 10 def done 15 [0] [2] +13 7 D [4] 5 2 [5] disp_l E 8 [10] -10 E 9 [15]
.globl loop loop: cmpl %edx, %eax jge done pushl %edx call foo jmploop done: type label address D 0 3 9 Filling in Local Addresses loop def loop 0 done disp_s done 2 disp_l foo 5 disp_l loop 10 def done 15 [0] [2] +13 7 D [4] 5 2 [5] disp_l E 8 [10] -10 E 9 [15]
.globl loop loop: cmpl %edx, %eax jge done pushl %edx call foo jmploop done: type label address D 0 3 9 Filling in Local Addresses loop def loop 0 done disp_s done 2 disp_l foo 5 disp_l loop 10 def done 15 [0] [2] +13 7 D [4] 5 2 [5] disp_l E 8 [10] -10 E 9 [15]
... .globl loop loop: cmpl %edx, %eax jge done pushl %edx call foo jmploop done: D 0 3 9 Relocation Records def loop 0 disp_l foo 5 [0] +13 7 D [2] 5 2 [4] disp_l E 8 [5] -10 E 9 [10] [15]
Assembler Directives • Delineate segments • .section • Allocate/initialize data and bss segments • .word .half .byte • .ascii .asciz • .align .skip • Make symbols in text externally visible • .global
symbol table instruction data section instruction text section instruction bss section instruction Assemble into Sections • Process instructions and directives to produce object file output structures .s file .o file input assemble output
symbol table instruction data section instruction text section instruction bss section instruction Output Functions • Machine language output • Write symbol table and sections into object file .s file .o file input assemble output
optional for .o files ELF Header Program Hdr Table Section 1 . . . optional for a.out files Section n Section Hdr Table ELF: Executable and Linking Format • Format of .o and a.out files • Output by the assembler • Input and output of linker
output (also in “.o” format, but no undefined symbols) library (contains more .o files) Invoking the Linker • ld bar.o main.o –l libc.a –o a.out compiled program modules Invoked automatically by gcc, but you can call it directly if you like.
D 0 3 9 Multiple Object Files main.o bar.o start main 0 def foo 8 def loop 0 disp_l loop 15 disp_l foo 5 [0] [0] [2] [2] +13 7 D [4] [4] 5 2 [7] [5] disp_l E 8 [8] [10] -10 E 9 [12] [15] [15] disp_l [20]
D 0 3 9 Step 1: Pick An Order main.o bar.o start main 15+0 def foo 15+8 def loop 0 disp_l loop 15+15 disp_l foo 5 [15] [0] [17] [2] +13 7 D [19] [4] 5 2 [22] [5] disp_l E 8 [23] [10] -10 E 9 [27] [15] [30] disp_l [35]
D 0 3 9 Step 1: Pick An Order main.o bar.o start main 15+0 def foo 15+8 def loop 0 disp_l loop 15+15 disp_l foo 5 [15] [0] [17] [2] +13 7 D [19] [4] 5 2 [22] [5] disp_l E 8 [23] [10] -10 E 9 [27] [15] [30] disp_l [35]
D 0 3 9 Step 2: Patch main.o bar.o start main 15+0 def foo 15+8 def loop 0 disp_l loop 15+15 disp_l foo 5 [15] [0] [17] [2] +13 7 D [19] [4] 5 2 [22] [5] 15+8-5=18 E 8 [23] [10] -10 E 9 [27] [15] [30] 0-(15+15)=-30 disp_l [35]
D 0 3 9 Step 2: Patch main.o bar.o start main 15+0 def foo 15+8 def loop 0 disp_l loop 15+15 disp_l foo 5 [15] [0] [17] [2] +13 7 D [19] [4] 5 2 [22] [5] 15+8-5=18 E 8 [23] [10] -10 E 9 [27] [15] [30] 0-(15+15)=-30 disp_l [35]
D 0 3 9 Step 2: Patch main.o bar.o start main 15+0 def foo 15+8 def loop 0 disp_l loop 15+15 disp_l foo 5 [15] [0] [17] [2] +13 7 D [19] [4] 5 2 [22] [5] 15+8-5=18 E 8 [23] [10] -10 E 9 [27] [15] [30] 0-(15+15)=-30 disp_l [35]
D 0 3 9 Step 2: Patch main.o bar.o start main 15+0 def foo 15+8 def loop 0 disp_l loop 15+15 disp_l foo 5 [15] [0] [17] [2] +13 7 D [19] [4] 5 2 [22] [5] 15+8-5=18 E 8 [23] [10] -10 E 9 [27] [15] [30] 0-(15+15)=-30 disp_l [35]
D 0 3 9 Step 2: Patch main.o bar.o start main 15+0 def foo 15+8 def loop 0 disp_l loop 15+15 disp_l foo 5 [15] [0] [17] [2] +13 7 D [19] [4] 5 2 [22] [5] 15+8-5=18 E 8 [23] [10] -10 E 9 [27] [15] [30] 0-(15+15)=-30 [35]
D 0 3 9 Step 2: Patch main.o bar.o start main 15+0 def foo 15+8 def loop 0 disp_l loop 15+15 disp_l foo 5 [15] [0] [17] [2] +13 7 D [19] [4] 5 2 [22] [5] 15+8-5=18 E 8 [23] [10] -10 E 9 [27] [15] [30] 0-(15+15)=-30 [35]
D 0 3 9 Step 2: Patch main.o bar.o start main 15+0 def foo 15+8 def loop 0 disp_l loop 15+15 disp_l foo 5 [15] [0] [17] [2] +13 7 D [19] [4] 5 2 [22] [5] 15+8-5=18 E 8 [23] [10] -10 E 9 [27] [15] [30] 0-(15+15)=-30 [35]
Step 3: Concatenate a.out start main 15+0 [0] D 0 3 9 [2] +13 7 D [4] 5 2 [5] +18 E 8 [10] -10 E 9 [15] [17] [19] [22] [23] [27] [30] -30 [35]
Summary • Assember • Read assembly language • Two-pass execution (resolve symbols) • Produce object file • Linker • Order object codes • Patch and resolve displacements • Produce executable