IKI10230 Introduction to Computer Organization (Pengantar Organisasi Komputer)
Lecture no. 09: Compiling-Assembling-Linking
Sources:
1. Paul Carter, PC Assembly Language
2. Hamacher, Computer Organization, 5th ed.
3. Lecture materials from CS61C/2000 & CS152/1997, UCB
21 April 2004
L. Yohanes Stefanus (yohanes@cs.ui.ac.id)
Bobby Nazief (nazief@cs.ui.ac.id)
Course materials: http://www.cs.ui.ac.id/kuliah/POK/
Steps to Starting a Program
C program: foo.c
→ Compiler → Assembly program: foo.s
→ Assembler → Object (machine language module): foo.o
→ Linker (also takes lib.o) → Executable (machine language program): foo.exe
→ Loader → Memory
Example: C → Asm → Obj → Exe → Run
#include <stdio.h>

int main (int argc, char *argv[]) {
    int i;
    int sum = 0;
    for (i = 0; i <= 100; i = i + 1)
        sum = sum + i * i;
    printf ("The sum from 0 .. 100 is %d\n", sum);
}
Compiler
• Input: high-level language code (e.g., C, Java)
• Output: assembly language code (e.g., Intel x86)
• Note: output may contain directives and pseudoinstructions
Example: C → Asm → Obj → Exe → Run
segment .text
LC0:    db "The sum from 0 .. 100 is %d",0xa,0
_main:  push ebp
        mov ebp,esp
        sub esp,24
        mov dword [ebp-8],0
        mov dword [ebp-4],0
L3:     cmp dword [ebp-4],100
        jle L6
        jmp L4
L6:     mov eax,[ebp-4]
        imul eax,[ebp-4]
        add [ebp-8],eax
L5:     inc dword [ebp-4]
        jmp L3
L4:     add esp,-8
        mov eax,[ebp-8]
        push eax
        push dword LC0
        call _printf
        add esp,16
L2:     mov esp,ebp
        pop ebp
        ret
Where Are We Now? (Assembler)
C program: foo.c
→ Compiler → Assembly program: foo.s
→ Assembler → Object (machine language module): foo.o
→ Linker (also takes lib.o) → Executable (machine language program): a.out
→ Loader → Memory
Assembler
• Reads and uses directives
• Replaces pseudoinstructions
• Produces machine language
• Creates the object file
Producing Machine Language
• Simple case
  • Arithmetic, logical, shifts, and so on.
  • All necessary info is already within the instruction.
• What about branches?
  • PC-relative
  • So once pseudoinstructions are replaced by real ones, we know how far to branch (a count of instructions on a fixed-length ISA, a count of bytes on x86).
• What about jumps?
  • Some require an absolute address.
• What about references to data?
  • These require the full 32-bit address of the data.
• Addresses can’t be determined yet, so we create two tables…
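The PC-relative rule can be checked against the object-code listing in this lecture. The sketch below is only an illustration: the function name branch_displacement is hypothetical, and the instruction addresses and lengths (0x3b, 0x3d, 0x4f; 2 and 5 bytes) are read off the example's listing and hex dump, not stated on this slide. It assumes the x86 convention that the displacement is measured from the address of the next instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement stored in a PC-relative branch: the distance from the
 * address of the NEXT instruction to the branch target.
 * (Hypothetical helper, not part of any assembler's API.) */
int32_t branch_displacement(uint32_t instr_addr, uint32_t instr_len,
                            uint32_t target_addr)
{
    assert(instr_len > 0);
    uint32_t next_instr = instr_addr + instr_len;
    /* Unsigned subtraction wraps modulo 2^32; on the usual
     * two's-complement targets the cast yields a signed offset,
     * negative for a backward branch. */
    return (int32_t)(target_addr - next_instr);
}
```

For the example's "jle" at offset 0x3b (2 bytes long, target 0x42) this gives 5, and for the backward "jmp" at 0x4f (5 bytes long, target 0x34) it gives -0x20, which is 0xffffffe0 as a 32-bit pattern.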
Symbol Table
• List of “items” in this file that may be used by other files.
• What are they?
  • Labels: function calling
  • Data: anything in the .data section; variables which may be accessed across files
• First pass: record label-address pairs
• Second pass: produce machine code
• Result: can jump to a later label without first declaring it
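The two passes can be sketched on a toy model of the example program. Everything here is an assumption made for illustration: the Item and Symbol types, the function names, and the grouping of instructions into items are hypothetical, and the byte sizes are chosen so that the labels land on the offsets shown in the lecture's object-code listing. A real assembler also encodes instructions, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One source item: an optional label defined here, its encoded size in
 * bytes, and an optional label it refers to (NULL when absent). */
typedef struct {
    const char *label;
    unsigned    size;
    const char *target;
} Item;

typedef struct {
    const char *name;
    unsigned    addr;
} Symbol;

/* Toy program; sizes are assumptions matching the lecture's offsets. */
const Item demo[] = {
    { "LC0",   0x1d, NULL },   /* string constant        */
    { "_main", 0x17, NULL },   /* prologue and two movs  */
    { "L3",    7,    NULL },   /* cmp                    */
    { NULL,    2,    "L6" },   /* jle L6 (forward ref)   */
    { NULL,    5,    "L4" },   /* jmp L4 (forward ref)   */
    { "L6",    10,   NULL },   /* loop body              */
    { "L5",    3,    NULL },   /* inc                    */
    { NULL,    5,    "L3" },   /* jmp L3 (backward ref)  */
    { "L4",    0x1a, NULL },   /* call sequence          */
    { "L2",    4,    NULL },   /* epilogue               */
};
const size_t demo_len = sizeof demo / sizeof demo[0];

/* Pass 1: walk the items, tracking the current address, and record a
 * (label, address) pair for every label. Returns the symbol count. */
size_t first_pass(const Item *prog, size_t n, Symbol *table)
{
    assert(prog != NULL && table != NULL);
    size_t   count = 0;
    unsigned addr  = 0;
    for (size_t i = 0; i < n; i++) {
        if (prog[i].label != NULL) {
            table[count].name = prog[i].label;
            table[count].addr = addr;
            count++;
        }
        addr += prog[i].size;
    }
    return count;
}

/* Returns the address of `name`, or -1 if it is not in the table. */
long lookup(const Symbol *table, size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(table[i].name, name) == 0)
            return (long)table[i].addr;
    return -1;
}

/* Pass 2: every label is now known, so each reference can be resolved.
 * resolved[i] receives the target address of item i, or -1 if the item
 * has no target or the label is not defined in this file. */
void second_pass(const Item *prog, size_t n,
                 const Symbol *table, size_t count, long *resolved)
{
    for (size_t i = 0; i < n; i++)
        resolved[i] = prog[i].target
                    ? lookup(table, count, prog[i].target)
                    : -1;
}
```

The forward references to L6 and L4 resolve in the second pass even though those labels appear later in the source. A label such as _printf stays unresolved (-1), which is the case the relocation table exists for.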
Relocation Table
• List of “items” for which this file needs the address.
• What are they?
  • Any label jumped to: jmp or call
    • internal
    • external (including lib files)
  • Any piece of data
Object File Format
• object file header: size and position of the other pieces of the object file
• text segment: the machine code
• data segment: binary representation of the data in the source file
• relocation information: identifies lines of code that need to be “handled”
• symbol table: list of this file’s labels and data that can be referenced
• debugging information
Example: C → Asm → Obj → Exe → Run
segment .text
0x0:    db "The sum from 0 .. 100 is %d",0xa,0
0x1d:   push ebp
        mov ebp,esp
        sub esp,24
        mov dword [ebp-8],0
        mov dword [ebp-4],0
0x34:   cmp dword [ebp-4],100
        jle 0x05 (0x42)
        jmp 0x00000012 (0x54)
0x42:   mov eax,[ebp-4]
        imul eax,[ebp-4]
        add [ebp-8],eax
0x4c:   inc dword [ebp-4]
        jmp 0xffffffe0 (0x34)
0x54:   add esp,-8
        mov eax,[ebp-8]
        push eax
        push 0x0
        call 0x0
        add esp,16
0x6e:   mov esp,ebp
        pop ebp
        ret
Symbol Table Entries
• Symbol Table
  Label    Address
  LC0:     0x00000000
  _main:   0x0000001d
  L3:      0x00000034
  L6:      0x00000042
  L5:      0x0000004c
  L4:      0x00000054
  L2:      0x0000006e
• Relocation Information
  Offset      Type    Value
  0x0000005f  dir32   .text (LC0: offset 0 of .text segment)
  0x00000064  DISP32  _printf
Where Are We Now? (Linker)
C program: foo.c
→ Compiler → Assembly program: foo.s
→ Assembler → Object (machine language module): foo.o
→ Linker (also takes lib.o) → Executable (machine language program): a.out
→ Loader → Memory
Link Editor/Linker
• Step 1: Take the text segment from each .o file and put them together.
• Step 2: Take the data segment from each .o file, put them together, and concatenate this onto the end of the text segments.
• Step 3: Resolve references
  • Go through the relocation table and handle each entry
  • That is, fill in all absolute addresses
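Steps 1 and 2 amount to assigning a base address to every segment. The following is a minimal sketch under stated assumptions: the ObjectFile type and the layout function are hypothetical names, segments are packed back to back, and alignment padding (which real linkers insert) is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t text_size;   /* bytes of machine code in this .o file */
    uint32_t data_size;   /* bytes of data in this .o file         */
    uint32_t text_base;   /* filled in by layout()                 */
    uint32_t data_base;   /* filled in by layout()                 */
} ObjectFile;

/* Place all text segments back to back starting at `start`, then all
 * data segments directly after the last text segment.
 * Returns the first address past the combined image. */
uint32_t layout(ObjectFile *objs, size_t n, uint32_t start)
{
    assert(objs != NULL);
    uint32_t addr = start;
    for (size_t i = 0; i < n; i++) {      /* step 1: text segments */
        objs[i].text_base = addr;
        addr += objs[i].text_size;
    }
    for (size_t i = 0; i < n; i++) {      /* step 2: data segments */
        objs[i].data_base = addr;
        addr += objs[i].data_size;
    }
    return addr;
}
```

Once every segment has a base, the address of any label is its segment's base plus the label's offset from the symbol table, which is what step 3 needs.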
Four Types of Addresses
• PC-relative addressing (beq, bne on MIPS; conditional jumps such as jle on x86): never relocate
• Absolute address (jmp, call): always relocate
• External reference (usually call): always relocate
• Data reference: always relocate
Resolving References
• Linker assumes the first word of the first text segment is at address 0x00000000.
• Linker knows:
  • length of each text and data segment
  • ordering of text and data segments
• Linker calculates:
  • absolute address of each label to be jumped to (internal or external) and each piece of data being referenced
• To resolve references:
  • search for the reference (data or label) in all symbol tables
  • if not found, search library files (for example, for printf)
  • once the absolute address is determined, fill in the machine code appropriately
• Output of linker: executable file containing text and data (plus header)
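The "fill in the machine code" step can be sketched for the two relocation types that appear in this lecture's example (dir32 and DISP32). This is an illustration under assumptions, not a real linker: the type and function names are hypothetical, the value already stored in the field is treated as an addend, and a DISP32 field is assumed to be the last four bytes of its instruction, so the displacement is taken relative to the field address plus 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { RELOC_DIR32, RELOC_DISP32 } RelocType;

typedef struct {
    uint32_t  offset;       /* position of the 32-bit field in the segment */
    RelocType type;
    uint32_t  symbol_addr;  /* final address the linker found for the symbol */
} Reloc;

/* x86 stores multi-byte values least significant byte first. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Patch every relocated field of one segment loaded at seg_base. */
void apply_relocs(uint8_t *seg, size_t seg_len, uint32_t seg_base,
                  const Reloc *relocs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const Reloc *r = &relocs[i];
        assert(r->offset <= seg_len && seg_len - r->offset >= 4);
        uint8_t *field = seg + r->offset;
        /* Absolute case: symbol address plus the stored addend. */
        uint32_t value = r->symbol_addr + read_le32(field);
        /* Relative case: subtract the address of the next instruction. */
        if (r->type == RELOC_DISP32)
            value -= seg_base + r->offset + 4;
        write_le32(field, value);
    }
}
```

With the example's numbers (segment placed at 0x15c0, dir32 at offset 0x5f resolving to 0x15c0, DISP32 at offset 0x64 resolving to _printf at 0x2da0), the patched fields become 0x000015c0 and 0x00001778, matching the linked listing.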
Example: C → Asm → Obj → Exe → Run
segment .text
0x15c0: db "The sum from 0 .. 100 is %d",0xa,0
0x15dd: push ebp
        mov ebp,esp
        sub esp,24
        mov dword [ebp-8],0
        mov dword [ebp-4],0
0x15f4: cmp dword [ebp-4],100
        jle 0x05 (0x1602)
        jmp 0x12 (0x1614)
0x1602: mov eax,[ebp-4]
        imul eax,[ebp-4]
        add [ebp-8],eax
0x160c: inc dword [ebp-4]
        jmp 0xffffffe0 (0x15f4)
0x1614: add esp,-8
        mov eax,[ebp-8]
        push eax
        push 0x000015c0
        call 0x00001778 (0x2da0)*
        add esp,16
0x162e: mov esp,ebp
        pop ebp
        ret
* 0x1628 + 0x1778 = 0x2da0 (address of the next instruction plus the displacement)
Memory Map of the .EXE
Address                  Contents
00000000 ...             other objects                        (.text)
000015C0 ... 00001631    Foo.o                                (.text)
...                      other objects (..., _printf, ...)    (.text)
0000B000 ... 0000BB04    .data
Where Are We Now? (Loader)
C program: foo.c
→ Compiler → Assembly program: foo.s
→ Assembler → Object (machine language module): foo.o
→ Linker (also takes lib.o) → Executable (machine language program): a.out
→ Loader → Memory
Loader (1/3)
• Executable files are stored on disk.
• When one is run, the loader’s job is to load it into memory and start it running.
• In practice, the loader is part of the operating system (OS)
  • loading is one of the OS tasks
Loader (2/3)
• So what does a loader do?
  • Reads the executable file’s header to determine the size of the text and data segments
  • Creates a new address space for the program, large enough to hold the text and data segments, along with a stack segment
  • Copies instructions and data from the executable file into the new address space (this may be anywhere in memory)
Loader (3/3)
• Copies arguments passed to the program onto the stack
• Initializes machine registers
  • Most registers are cleared, but the stack pointer is assigned the address of the first free stack location
• Jumps to a start-up routine that copies the program’s arguments from the stack to registers and sets the PC
• If the main routine returns, the start-up routine terminates the program with the exit system call
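The first loader steps (read the header, create an address space, copy text and data, set the PC and stack pointer) can be modeled in a few lines. This is a toy simulation under loud assumptions: ExeHeader, Process, and load are hypothetical names, the "address space" is just a heap buffer, and copying the program's arguments onto the stack is omitted. A real OS loader works with virtual memory and a real executable format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint32_t text_size;   /* size of the text segment in bytes        */
    uint32_t data_size;   /* size of the data segment in bytes        */
    uint32_t entry;       /* offset of the start-up routine in text   */
} ExeHeader;

typedef struct {
    uint8_t *mem;         /* the new address space (toy: a heap buffer) */
    size_t   size;        /* text + data + stack                        */
    size_t   pc;          /* where execution will begin                 */
    size_t   sp;          /* top of the stack region; stack grows down  */
} Process;

/* Returns 0 on success, -1 if memory could not be allocated. */
int load(const ExeHeader *h, const uint8_t *text, const uint8_t *data,
         size_t stack_size, Process *p)
{
    assert(h != NULL && p != NULL);
    /* Space for text and data segments plus a stack segment. */
    p->size = (size_t)h->text_size + h->data_size + stack_size;
    p->mem  = calloc(p->size, 1);          /* zero-filled */
    if (p->mem == NULL)
        return -1;
    /* Copy instructions, then data, into the new address space. */
    memcpy(p->mem, text, h->text_size);
    memcpy(p->mem + h->text_size, data, h->data_size);
    /* Initialize the "registers" this toy model tracks. */
    p->pc = h->entry;
    p->sp = p->size;
    return 0;
}
```

The caller owns the buffer and releases it with free(p.mem) when the simulated process ends.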
Example: C → Asm → Obj → Exe → Run
0x000015c0: 0x20656854 0x206d7573 0x6d6f7266 0x2e203020
0x000015d0: 0x3031202e 0x73692030 0x0a642520 0xe5895500
0x000015e0: 0x0018ec81 0x45c70000 0x000000f8 0xfc45c700
0x000015f0: 0x00000000 0x64fc7d81 0x7e000000 0x0012e905
0x00001600: 0x458b0000 0x45af0ffc 0xf84501fc 0xe9fc45ff
0x00001610: 0xffffffe0 0xfff8c481 0x458bffff 0xc06850f8
0x00001620: 0xe8000015 0x00001778 0x0010c481 0xec890000
0x00001630: 0x0000c35d

0x000015c0: 54 68 65 20 73 75 6d 20 66 72 6f 6d 20 30 20 2e
            T  h  e     s  u  m     f  r  o  m     0     .

000015dd: 55                push ebp
000015de: 89e5              mov ebp,esp
000015e0: 81ec18000000      sub esp,0x18
000015e6: c745f800000000    mov dword [ebp-8],0
000015ed: c745fc00000000    mov dword [ebp-4],0
000015f4: 817dfc64000000    cmp dword [ebp-4],0x64
000015fb: 7e05              jle 0x1602
000015fd: e912000000        jmp 0x1614
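The word dump and the byte dump above show the same memory. The words look reversed because x86 is little-endian, so the lowest-order byte of each word sits at the lowest address. A minimal sketch of the conversion follows; the function name word_to_bytes is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split a 32-bit word, as printed in the dump, into the four bytes that
 * occupy increasing memory addresses on a little-endian machine. */
void word_to_bytes(uint32_t word, uint8_t out[4])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(word >> (8 * i));   /* lowest byte first */
}
```

Applied to the first word, 0x20656854, this yields 54 68 65 20, the characters "The " in memory order. The word 0xe5895500 at 0x15dc yields 00 55 89 e5: the string's terminating zero, then "push ebp" (55) and "mov ebp,esp" (89 e5).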
.ASM, .O, & .EXE (COFF FORMAT)
Example: C → Asm → Obj → Exe → Run
.text
LC0:    .ascii "The sum from 0 .. 100 is %d\12\0"
main:   pushl %ebp
        movl %esp,%ebp
        subl $24,%esp
        movl $0,-8(%ebp)
        movl $0,-4(%ebp)
L3:     cmpl $100,-4(%ebp)
        jle L6
        jmp L4
L6:     movl -4(%ebp),%eax
        imull -4(%ebp),%eax
        addl %eax,-8(%ebp)
L5:     incl -4(%ebp)
        jmp L3
L4:     addl $-8,%esp
        movl -8(%ebp),%eax
        pushl %eax
        pushl $LC0
        call _printf
        addl $16,%esp
L2:     movl %ebp,%esp
        popl %ebp
        ret
Example: C → Asm → Obj → Exe → Run
.text
0x0:    .ascii "The sum from 0 .. 100 is %d\12\0"
0x20:   pushl %ebp
        movl %esp,%ebp
        subl $24,%esp
        movl $0,-8(%ebp)
        movl $0,-4(%ebp)
0x34:   cmpl $100,-4(%ebp)
        jle 6 (0x40)
        jmp 0x14 (0x50)
0x40:   movl -4(%ebp),%eax
        imull -4(%ebp),%eax
        addl %eax,-8(%ebp)
0x4a:   incl -4(%ebp)
        jmp -0x1b (0x34)
0x50:   addl $-8,%esp
        movl -8(%ebp),%eax
        pushl %eax
        pushl $0x0
        call 0x0 (undefined)
        addl $16,%esp
0x64:   movl %ebp,%esp
        popl %ebp
        ret
Symbol Table Entries
• Symbol Table
  Label    Address
  LC0:     0x00000000
  L2:      0x00000064
  L3:      0x00000034
  L4:      0x00000050
  L5:      0x0000004a
  L6:      0x00000040
  main:    0x00000020
• Relocation Information
  Address     Instr. Type  Dependency
  0x0000005c  call         printf
Example: C → Asm → Obj → Exe → Run
.text
0x15c0: .ascii "The sum from 0 .. 100 is %d\12\0"
0x15e0: pushl %ebp
        movl %esp,%ebp
        subl $24,%esp
        movl $0,-8(%ebp)
        movl $0,-4(%ebp)
0x15f4: cmpl $100,-4(%ebp)
        jle 6 (0x1600)
        jmp 0x14 (0x1610)
0x1600: movl -4(%ebp),%eax
        imull -4(%ebp),%eax
        addl %eax,-8(%ebp)
0x160a: incl -4(%ebp)
        jmp -0x1b (0x15f4)
0x1610: addl $-8,%esp
        movl -8(%ebp),%eax
        pushl %eax
        pushl $0x15c0
        call 0x2d90
        addl $16,%esp
0x1624: movl %ebp,%esp
        popl %ebp
        ret