240 likes | 384 Views
Computer Systems. Linking. GNU Compiler Collection. Compiler driver coordinates all steps in the translation and linking process. Invokes preprocessor ( cpp ), compiler ( cc1 ), assembler ( as ), and linker ( ld ). Passes command line arguments to appropriate phases.
E N D
Computer Systems Linking Computer Systems – linking
GNU Compiler Collection • Compiler driver coordinates all steps in the translation and linking process. • Invokes preprocessor (cpp), compiler (cc1), assembler (as), and linker (ld). • Passes command line arguments to appropriate phases bass> gcc -O2 -v -o p m.c a.c cpp [args] m.c /tmp/cca07630.i cc1 /tmp/cca07630.i m.c -O2 [args] -o /tmp/cca07630.s as [args] -o /tmp/cca076301.o /tmp/cca07630.s <similar process for a.c> ld -o p [system obj files] /tmp/cca076301.o /tmp/cca076302.o bass> Computer Systems – linking
Linking • Static linking • Dynamic linking • Application loading and linking Computer Systems – linking
What Does a Linker Do? • Merges object files • Merges multiple relocatable (.o) object files into a single executable object file • Resolves external references; reference to a symbol (function or global variable) defined in another object file. • Relocates symbols • Relocates symbols from their relative locations in the .o files to new absolute positions in the executable. • Updates all references to these symbols to reflect their new positions. • code: a(); /* reference to symbol a */ • data: int *xp=&x; /* reference to symbol x */ Computer Systems – linking
Why Linkers? • Modularity • Program can be written as a collection of smaller source files, rather than one monolithic mass. • Efficiency • Time: • Change one source file, compile, and then relink. • No need to recompile other source files. • Space: • Libraries of common functions can be aggregated • Executable files and running memory images contain only code for the functions they actually use. Computer Systems – linking
Executable and Linkable Format (ELF) • Standard binary format for object files • Derives from AT&T System V Unix • Later adopted by BSD Unix variants and Linux • One unified format for • Relocatable object files (.o), • Executable object files • Shared object files (.so) • Generic name: ELF binaries • Tools: file, readelf -h Computer Systems – linking
ELF header Program header table (required for executables) .text section .data section .bss section .symtab .rel.txt .rel.data .debug Section header table (required for relocatables) ELF Object File Format 0 • Elf header • Magic number, type (.o, exec, .so), machine, byte ordering, etc. • .text section • Code • .data section • Initialized data (!static global var) • .bss section • Uninitialized data (!static global var) • .rodata section • Read-only data (const var) Computer Systems – linking
Example C Program m.c a.c extern int e; int *ep=&e; int x=15; int y; int a() { return *ep+x+y; } int e=7; int main() { int r = a(); exit(0); } Computer Systems – linking
Merging Object Files into an Executable Relocatable Object Files Executable Object File system code 0 .text headers .data system data system code main() .text a() main() .text m.o more system code .data int e = 7 system data .data int e = 7 a() int *ep = &e .text int x = 15 .bss a.o .data int *ep = &e uninitialized data int x = 15 .symtab .debug .bss int y Computer Systems – linking
Ref to external symbol e Def of local symbol ep Defs of local symbols x and y Ref to external symbol a Def of local symbol a Refs of local symbols ep,x,y Definitions and External References • Symbols are lexical entities that name functions and variables. • Each symbol has a value (typically a memory address). • Code consists of symbol definitions and references. • References can be either localorexternal. m.c a.c extern int e; int *ep=&e; int x=15; int y; int a() { return *ep+x+y; } int e=7; int main() { int r = a(); exit(0); } Def of local symbol e Ref to external symbol exit (defined in libc.so) Computer Systems – linking
m.o Relocation Info m.c int e=7; int main() { int r = a(); exit(0); } Disassembly of main (source gdb): 0x80483e4 <main>: push %ebp 0x80483e5 <main+1>: mov %esp,%ebp 0x80483e7 <main+3>: sub $0x8,%esp 0x80483ea <main+6>: call 0x80483fc <a> 0x80483ef <main+11>: add $0xfffffff4,%esp 0x80483f2 <main+14>: push $0x0 0x80483f4 <main+16>: call 0x80482f0 <exit> Disassembly of section .rel.text: (source objdump –r) RELOCATION RECORDS FOR [.text]: OFFSET TYPE VALUE 00000007 R_386_PC32 a 00000011 R_386_PC32 exit Disassembly of section .text (source objdump –d m.o) 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 83 ec 08 sub $0x8,%esp 6: e8 fc ff ff ff call 7 <main+0x7> 7: R_386_PC32 a Computer Systems – linking
Executable m info Disassembly of section .text (source objdump –d m.o) 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp 3: 83 ec 08 sub $0x8,%esp 6: e8 fc ff ff ff call 7 <main+0x7> 7: R_386_PC32 a • ff ff ff fc (-4) translated into 0d (13) • 7 translated into 80483fc Disassembly of section .text (source objdump –d m) 80483e4: 55 push %ebp 80483e5: 89 e5 mov %esp,%ebp 80483e7: 83 ec 08 sub $0x8,%esp 80483ea: e8 0d 00 00 00 call 80483fc <a> Computer Systems – linking
Relocation algorithm typedef struct { int offset; /* offset of the reference to relocate */ int symbol:24, /* symbol the reference should point to */ type:8; /* relocation type */ } Elf32_Rel; foreach section s { foreach relocation entry r { if (r.type == R_386_PC32) { refaddr = ADDR(s) + r.offset; *refptr += ADDR(r.symbol) – refaddr; } } } 0xfffffffc →0xd 0x7 → 0x80483fc ADDR(S) = main: 0x80483e4 r.type = R_386_PC32 r.symbol = a: 0x80483fc r.offset 0x7 *refptr 0xfcfffff refaddr = 0x80483e4 + 0x7 = 0x80483eb // line after call *refptr = 0xfffffffc + 0x80483fc - 0x80483eb = 0xd PC <- PC + 0xd = 0x80483eb + 0xd = 0x80483fc // 13 lines more Computer Systems – linking
Loading Info 0xc0000000 LOAD off 0x00000000 vaddr 0x08048000 paddr 0x08048000 align 2**12 filesz 0x0000047c memsz 0x0000047c flags r-x LOAD off 0x0000047c vaddr 0x0804947c paddr 0x0804947c align 2**12 filesz 0x00000114 memsz 0x00000130 flags rw- Loaded from the executable file Executable object file 0x40000000 ELF header Program header table (required for executables) Run-time heap (created by malloc) .init 0x080495ac 0x08048300 .text section Read/write segment (.data, .bss) .fini 0x0804947c Read-only segment (.init, .text, .rodata) .data section 0x08048300 .bss section Unused 0 Computer Systems – linking
How to Package Commonly Used Functions? Awkward, given the linker framework so far: • Option 1: Put all functions in a single source file • Programmers link big object file into their programs • Space and time inefficient • Option 2: Put each function in a separate source file • Programmers explicitly link appropriate binaries into their programs • More efficient, but burdensome on the programmer • Solution: static libraries (.aarchive files) • Concatenate related relocatable object files into a single file with an index (called an archive). • Enhance linker so that it tries to resolve unresolved external references by looking for the symbols in one or more archives. Computer Systems – linking
Static Libraries • Static libraries have the following disadvantages: • Potential for duplicating lots of common code in the executable files on a filesystem. • e.g., every C program needs the standard C library • Potential for duplicating lots of code in the virtual memory space of many processes. • Minor bug fixes of system libraries require each application to explicitly relink Computer Systems – linking
Shared Libraries • Solution: • Shared libraries (dynamic link libraries, DLLs) whose members are dynamically loaded into memory and linked into an application at run-time. • Dynamic linking can occur when executable is first loaded and run. • Common case for Linux, handled automatically by ld-linux.so. • Dynamic linking can also occur after program has begun. • In Linux, this is done explicitly by user with dlopen(). • Basis for High-Performance Web Servers. • Shared library routines can be shared by multiple processes. Computer Systems – linking
Separate region for shared libraries memory invisible to user code kernel virtual memory stack %esp Memory mapped region for shared libraries Linux/x86 process memory image the “brk” ptr runtime heap (via malloc) uninitialized data (.bss) initialized data (.data) program text (.text) forbidden 0 Computer Systems – linking
The Complete Picture m.c a.c Translator Translator m.o a.o libwhatever.a Static Linker (ld) p libc.so libm.so Loader/Dynamic Linker (ld-linux.so) p’ Computer Systems – linking
Power Programmer • The order of linking is very important Computer Systems – linking
Strong and Weak Symbols • Program symbols are either strong or weak • strong: procedures and initialized globals • weak: uninitialized globals p1.c p2.c weak int foo=5; p1() { } int foo; p2() { } strong strong strong Computer Systems – linking
Linker’s Symbol Rules • Rule 1. A strong symbol can only appear once. • Rule 2. A weak symbol can be overridden by a strong symbol of the same name. • references to the weak symbol resolve to the strong symbol. • Rule 3. If there are multiple weak symbols, the linker can pick an arbitrary one. Computer Systems – linking
References to x will refer to the same initialized variable. Nightmare scenario: two identical weak structs, compiled by different compilers with different alignment rules. Linker Puzzles int x; p1() {} p1() {} Link time error: two strong symbols (p1) References to x will refer to the same uninitialized int. Is this what you really want? int x; p1() {} int x; p2() {} int x; int y; p1() {} double x; p2() {} Writes to x in p2 might overwrite y! Evil! int x=7; int y=5; p1() {} double x; p2() {} Writes to x in p2 will overwrite y! Nasty! int x=7; p1() {} int x; p2() {} Computer Systems – linking
Assignment • Problem 7.7: Modify bar5.c to print the correct values Computer Systems – linking