
Computer Systems


elke

Presentation Transcript


  1. Computer Systems Linking Computer Systems – linking

  2. GNU Compiler Collection
     • The compiler driver coordinates all steps in the translation and linking process.
     • It invokes the preprocessor (cpp), compiler (cc1), assembler (as), and linker (ld).
     • It passes command-line arguments to the appropriate phases.

         bass> gcc -O2 -v -o p m.c a.c
         cpp [args] m.c /tmp/cca07630.i
         cc1 /tmp/cca07630.i m.c -O2 [args] -o /tmp/cca07630.s
         as [args] -o /tmp/cca076301.o /tmp/cca07630.s
         <similar process for a.c>
         ld -o p [system obj files] /tmp/cca076301.o /tmp/cca076302.o
         bass>

  3. Linking
     • Static linking
     • Dynamic linking
     • Application loading and linking

  4. What Does a Linker Do?
     • Merges object files
        • Merges multiple relocatable (.o) object files into a single executable object file.
        • Resolves external references. An external reference is a reference to a symbol (function or global variable) defined in another object file.
     • Relocates symbols
        • Relocates symbols from their relative locations in the .o files to new absolute positions in the executable.
        • Updates all references to these symbols to reflect their new positions.
           • code: a(); /* reference to symbol a */
           • data: int *xp=&x; /* reference to symbol x */

  5. Why Linkers?
     • Modularity
        • A program can be written as a collection of smaller source files, rather than one monolithic mass.
     • Efficiency
        • Time:
           • Change one source file, compile it, and then relink.
           • No need to recompile the other source files.
        • Space:
           • Libraries of common functions can be aggregated.
           • Executable files and running memory images contain only the code for the functions they actually use.

  6. Executable and Linkable Format (ELF)
     • Standard binary format for object files
     • Derives from AT&T System V Unix
        • Later adopted by BSD Unix variants and Linux
     • One unified format for
        • Relocatable object files (.o)
        • Executable object files
        • Shared object files (.so)
     • Generic name: ELF binaries
     • Tools: file, readelf -h

  7. ELF Object File Format
     File layout, starting at offset 0:
        ELF header
        Program header table (required for executables)
        .text section
        .data section
        .bss section
        .symtab
        .rel.text
        .rel.data
        .debug
        Section header table (required for relocatables)
     • ELF header
        • Magic number, type (.o, exec, .so), machine, byte ordering, etc.
     • .text section
        • Code
     • .data section
        • Initialized data (e.g., initialized global variables)
     • .bss section
        • Uninitialized data (e.g., uninitialized global variables)
     • .rodata section
        • Read-only data (const variables)
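The header fields listed on the slide (magic number, type, byte ordering) sit at fixed offsets at the start of every ELF file, so they can be read with a few byte comparisons. The following is a minimal sketch, not a replacement for `readelf -h`; the names `ElfSummary` and `summarize_elf` are made up for illustration, and the offsets are written out by hand rather than taken from `<elf.h>` so the code compiles on any platform.

```c
#include <stddef.h>

/* Hypothetical summary of the ELF header fields named on the slide. */
typedef struct {
    int is_elf;        /* 1 if the magic number matches */
    int bits;          /* 32 or 64, 0 if unknown */
    int little_endian; /* 1 = little-endian, 0 = big-endian, -1 = unknown */
    int type;          /* e_type: 1 = relocatable, 2 = executable, 3 = shared */
} ElfSummary;

/* Inspect the first bytes of a file image held in buf. */
ElfSummary summarize_elf(const unsigned char *buf, size_t len)
{
    ElfSummary s = {0, 0, -1, 0};

    /* The type field ends at offset 18, so at least 18 bytes are needed. */
    if (len < 18)
        return s;

    /* Magic number: 0x7f followed by the letters E, L, F. */
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return s;
    s.is_elf = 1;

    /* Byte 4 is the file class, byte 5 the byte ordering. */
    s.bits = buf[4] == 1 ? 32 : buf[4] == 2 ? 64 : 0;
    s.little_endian = buf[5] == 1 ? 1 : buf[5] == 2 ? 0 : -1;

    /* The 16-bit type field at offset 16 uses the file's own byte order. */
    if (s.little_endian == 1)
        s.type = buf[16] | (buf[17] << 8);
    else if (s.little_endian == 0)
        s.type = (buf[16] << 8) | buf[17];

    return s;
}
```

Passing the first 18 bytes of m.o from the running example should report a 32-bit, little-endian, relocatable file; the same call on the linked executable should report type 2.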

  8. Example C Program
     m.c:
         int e=7;

         int main() {
             int r = a();
             exit(0);
         }
     a.c:
         extern int e;

         int *ep=&e;
         int x=15;
         int y;

         int a() {
             return *ep+x+y;
         }

  9. Merging Object Files into an Executable
     Relocatable object files:
        system code: .text, .data (system data)
        m.o: .text (main()), .data (int e = 7)
        a.o: .text (a()), .data (int *ep = &e, int x = 15), .bss (int y)
     Executable object file, starting at offset 0:
        headers
        .text: system code, main(), a(), more system code
        .data: system data, int e = 7, int *ep = &e, int x = 15
        .bss: uninitialized data
        .symtab
        .debug

  10. Definitions and External References
      • Symbols are lexical entities that name functions and variables.
      • Each symbol has a value (typically a memory address).
      • Code consists of symbol definitions and references.
      • References can be either local or external.

      m.c:
          int e=7;              /* def of local symbol e */

          int main() {
              int r = a();      /* ref to external symbol a */
              exit(0);          /* ref to external symbol exit (defined in libc.so) */
          }
      a.c:
          extern int e;

          int *ep=&e;           /* def of local symbol ep; ref to external symbol e */
          int x=15;             /* defs of local symbols x and y */
          int y;

          int a() {             /* def of local symbol a */
              return *ep+x+y;   /* refs of local symbols ep, x, y */
          }

  11. m.o Relocation Info
      m.c:
          int e=7;

          int main() {
              int r = a();
              exit(0);
          }
      Disassembly of main (source: gdb):
          0x80483e4 <main>:     push %ebp
          0x80483e5 <main+1>:   mov  %esp,%ebp
          0x80483e7 <main+3>:   sub  $0x8,%esp
          0x80483ea <main+6>:   call 0x80483fc <a>
          0x80483ef <main+11>:  add  $0xfffffff4,%esp
          0x80483f2 <main+14>:  push $0x0
          0x80483f4 <main+16>:  call 0x80482f0 <exit>
      Disassembly of section .rel.text (source: objdump -r):
          RELOCATION RECORDS FOR [.text]:
          OFFSET   TYPE        VALUE
          00000007 R_386_PC32  a
          00000011 R_386_PC32  exit
      Disassembly of section .text (source: objdump -d m.o):
          0: 55              push %ebp
          1: 89 e5           mov  %esp,%ebp
          3: 83 ec 08        sub  $0x8,%esp
          6: e8 fc ff ff ff  call 7 <main+0x7>
                          7: R_386_PC32 a

  12. Executable m info
      Disassembly of section .text before linking (source: objdump -d m.o):
          0: 55              push %ebp
          1: 89 e5           mov  %esp,%ebp
          3: 83 ec 08        sub  $0x8,%esp
          6: e8 fc ff ff ff  call 7 <main+0x7>
                          7: R_386_PC32 a
      • The call operand ff ff ff fc (-4) is translated into 0d (13).
      • The displayed call target 7 is translated into 80483fc.
      Disassembly of section .text after linking (source: objdump -d m):
          80483e4: 55              push %ebp
          80483e5: 89 e5           mov  %esp,%ebp
          80483e7: 83 ec 08        sub  $0x8,%esp
          80483ea: e8 0d 00 00 00  call 80483fc <a>

  13. Relocation algorithm

          typedef struct {
              int offset;     /* offset of the reference to relocate */
              int symbol:24,  /* symbol the reference should point to */
                  type:8;     /* relocation type */
          } Elf32_Rel;

          foreach section s {
              foreach relocation entry r {
                  if (r.type == R_386_PC32) {
                      refptr  = s + r.offset;        /* pointer to the reference to relocate */
                      refaddr = ADDR(s) + r.offset;  /* run-time address of the reference */
                      *refptr += ADDR(r.symbol) - refaddr;
                  }
              }
          }

      Worked example for the call to a in main (operand 0xfffffffc → 0xd, displayed target 0x7 → 0x80483fc):
          ADDR(s)   = ADDR(main) = 0x80483e4
          r.type    = R_386_PC32
          r.symbol  = a, with ADDR(a) = 0x80483fc
          r.offset  = 0x7
          *refptr   = 0xfffffffc (-4, stored as bytes fc ff ff ff)
          refaddr   = 0x80483e4 + 0x7 = 0x80483eb   // address of the call's 4-byte operand
          *refptr   = 0xfffffffc + 0x80483fc - 0x80483eb = 0xd
      At run time the PC already points to the instruction after the call (0x80483ef):
          PC <- PC + 0xd = 0x80483ef + 0xd = 0x80483fc   // 13 bytes further on
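The PC-relative update can be tried out on the bytes of main from m.o. This is a minimal sketch of the single R_386_PC32 step only; the function name `relocate_pc32` and its parameter names are invented for illustration, and the operand is read and written as explicit little-endian bytes so the result does not depend on the host machine.

```c
#include <stdint.h>

/* Read a 32-bit little-endian value, as stored in x86 machine code. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit value back in little-endian byte order. */
static void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Apply one R_386_PC32 relocation to a copy of a code section.
 *   text      : section bytes, patched in place
 *   sect_addr : run-time address chosen for the section, ADDR(s)
 *   offset    : r.offset, position of the 4-byte operand in the section
 *   sym_addr  : run-time address of the target symbol, ADDR(r.symbol) */
void relocate_pc32(unsigned char *text, uint32_t sect_addr,
                   uint32_t offset, uint32_t sym_addr)
{
    uint32_t refaddr = sect_addr + offset;          /* address of the operand */
    uint32_t value = load_le32(text + offset);      /* initial addend, here -4 */

    value += sym_addr - refaddr;                    /* wraps modulo 2^32 */
    store_le32(text + offset, value);
}
```

Called with the first 11 bytes of main (`55 89 e5 83 ec 08 e8 fc ff ff ff`), `sect_addr = 0x80483e4`, `offset = 0x7`, and `sym_addr = 0x80483fc`, the operand becomes `0d 00 00 00`, matching the disassembly of the linked executable.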

  14. Loading Info
      Program headers of the executable:
          LOAD off    0x00000000 vaddr 0x08048000 paddr 0x08048000 align 2**12
               filesz 0x0000047c memsz 0x0000047c flags r-x
          LOAD off    0x0000047c vaddr 0x0804947c paddr 0x0804947c align 2**12
               filesz 0x00000114 memsz 0x00000130 flags rw-
      [Figure: the executable object file (ELF header, program header table (required for executables), .init, .text, .fini, .data, and .bss sections) next to the process memory image. The image shows, from high to low addresses: the run-time heap (created by malloc), the read/write segment (.data, .bss), the read-only segment (.init, .text, .rodata), and an unused region starting at 0. The two segments are loaded from the executable file. Addresses marked in the figure: 0xc0000000, 0x40000000, 0x080495ac, 0x0804947c, 0x08048300, 0.]
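The numbers in the two LOAD entries can be checked with a little arithmetic. In a LOAD entry, `memsz` may exceed `filesz`; the extra bytes are not stored in the file and are zero-filled when the segment is loaded, which is how .bss gets its space. The sketch below uses a hypothetical `LoadSegment` struct holding only the three fields needed here.

```c
#include <stdint.h>

/* The three LOAD fields used in this calculation (hypothetical struct). */
typedef struct {
    uint32_t vaddr;   /* virtual address where the segment starts */
    uint32_t filesz;  /* bytes taken from the executable file */
    uint32_t memsz;   /* bytes occupied in memory */
} LoadSegment;

/* First address past the end of the segment in memory. */
uint32_t segment_end(LoadSegment s)
{
    return s.vaddr + s.memsz;
}

/* Bytes that exist only in memory and are zero-filled at load time. */
uint32_t zero_filled_bytes(LoadSegment s)
{
    return s.memsz - s.filesz;
}
```

For the read/write segment, `segment_end` gives 0x0804947c + 0x130 = 0x080495ac, one of the addresses marked in the figure, and `zero_filled_bytes` gives 0x130 - 0x114 = 0x1c (28 bytes). The read-only segment has equal `filesz` and `memsz`, so nothing is zero-filled there.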

  15. How to Package Commonly Used Functions?
      Awkward, given the linker framework so far:
      • Option 1: Put all functions in a single source file
         • Programmers link one big object file into their programs.
         • Inefficient in both space and time.
      • Option 2: Put each function in a separate source file
         • Programmers explicitly link the appropriate binaries into their programs.
         • More efficient, but burdensome for the programmer.
      • Solution: static libraries (.a archive files)
         • Concatenate related relocatable object files into a single file with an index (called an archive).
         • Enhance the linker so that it tries to resolve unresolved external references by looking for the symbols in one or more archives.

  16. Static Libraries
      • Static libraries have the following disadvantages:
         • Potential for duplicating lots of common code in the executable files on a filesystem.
            • e.g., every C program needs the standard C library.
         • Potential for duplicating lots of code in the virtual memory space of many processes.
         • Minor bug fixes of system libraries require each application to explicitly relink.

  17. Shared Libraries
      • Solution:
         • Shared libraries (dynamic link libraries, DLLs), whose members are dynamically loaded into memory and linked into an application at run time.
         • Dynamic linking can occur when the executable is first loaded and run.
            • Common case for Linux, handled automatically by ld-linux.so.
         • Dynamic linking can also occur after the program has begun.
            • In Linux, this is done explicitly by the user with dlopen().
            • Basis for high-performance web servers.
         • Shared library routines can be shared by multiple processes.
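Run-time linking with dlopen() looks roughly like the sketch below, which loads the math library after the program has started and calls cos through a function pointer. The helper name `cos_via_dlopen` is invented for illustration, and the library name "libm.so.6" is an assumption that holds on typical glibc-based Linux systems. Older glibc versions also need `-ldl` on the link line.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Load the math library at run time and compute cos(x) through it.
 * Returns 0 on success, -1 if the library or the symbol is not found.
 * "libm.so.6" is an assumed library name (usual on glibc-based Linux). */
int cos_via_dlopen(double x, double *result)
{
    double (*cosine)(double);
    void *handle = dlopen("libm.so.6", RTLD_LAZY);

    if (handle == NULL)
        return -1;

    dlerror(); /* clear any stale error before the lookup */
    cosine = (double (*)(double))dlsym(handle, "cos");
    if (dlerror() != NULL || cosine == NULL) {
        dlclose(handle);
        return -1;
    }

    *result = cosine(x);   /* call into the dynamically loaded library */
    dlclose(handle);
    return 0;
}
```

The executable itself contains no reference to cos; the symbol is looked up by name only when `dlsym` runs.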

  18. Separate region for shared libraries
      Linux/x86 process memory image, from high to low addresses:
         kernel virtual memory (memory invisible to user code)
         stack (%esp points to the top of the stack)
         memory-mapped region for shared libraries
         runtime heap (via malloc), ending at the "brk" ptr
         uninitialized data (.bss)
         initialized data (.data)
         program text (.text)
         forbidden (down to address 0)

  19. The Complete Picture
      m.c → Translator → m.o
      a.c → Translator → a.o
      m.o, a.o, libwhatever.a → Static Linker (ld) → p
      p, libc.so, libm.so → Loader/Dynamic Linker (ld-linux.so) → p'

  20. Power Programmer
      • The order in which object files and libraries are given to the linker is very important.
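Why order matters can be seen in a small simulation of how a traditional Unix static linker scans its inputs: left to right, keeping a set of still-undefined symbols, and pulling in an archive member only if it defines something currently in that set. This is a simplified sketch under that assumption, not the behavior of any particular linker version; all names (`Input`, `SymSet`, `unresolved_after_link`) are invented, and each archive is modelled as a single member.

```c
#include <string.h>

#define MAX_SYMS 16

/* One input on the (simulated) link command line. */
typedef struct {
    int is_archive_member;  /* 0 = object file, 1 = member of a static library */
    const char *defs[4];    /* symbols defined here, NULL-terminated */
    const char *refs[4];    /* symbols referenced here, NULL-terminated */
} Input;

typedef struct {
    const char *names[MAX_SYMS];
    int count;
} SymSet;

static int set_find(const SymSet *s, const char *name)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0)
            return i;
    return -1;
}

static void set_add(SymSet *s, const char *name)
{
    if (set_find(s, name) < 0 && s->count < MAX_SYMS)
        s->names[s->count++] = name;
}

static void set_remove(SymSet *s, const char *name)
{
    int i = set_find(s, name);
    if (i >= 0)
        s->names[i] = s->names[--s->count];
}

/* Scan the inputs left to right; return how many symbols stay undefined. */
int unresolved_after_link(const Input *inputs, int n)
{
    SymSet undefined = {{0}, 0};
    SymSet defined = {{0}, 0};

    for (int i = 0; i < n; i++) {
        const Input *in = &inputs[i];

        if (in->is_archive_member) {
            /* An archive member is used only if it is needed right now. */
            int needed = 0;
            for (int d = 0; d < 4 && in->defs[d] != NULL; d++)
                if (set_find(&undefined, in->defs[d]) >= 0)
                    needed = 1;
            if (!needed)
                continue;  /* skipped, and never revisited later */
        }
        for (int d = 0; d < 4 && in->defs[d] != NULL; d++) {
            set_add(&defined, in->defs[d]);
            set_remove(&undefined, in->defs[d]);
        }
        for (int r = 0; r < 4 && in->refs[r] != NULL; r++)
            if (set_find(&defined, in->refs[r]) < 0)
                set_add(&undefined, in->refs[r]);
    }
    return undefined.count;
}
```

With an object file that defines main and references a, followed by an archive member that defines a, the function returns 0. With the same two inputs in the opposite order it returns 1: the archive is scanned while nothing is undefined yet, so its member is skipped and the later reference to a is left unresolved.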

  21. Strong and Weak Symbols
      • Program symbols are either strong or weak
         • strong: procedures and initialized globals
         • weak: uninitialized globals

      p1.c:
          int foo=5;   /* strong */
          p1() { }     /* strong */
      p2.c:
          int foo;     /* weak */
          p2() { }     /* strong */

  22. Linker's Symbol Rules
      • Rule 1. A strong symbol can only appear once.
      • Rule 2. A weak symbol can be overridden by a strong symbol of the same name.
         • References to the weak symbol resolve to the strong symbol.
      • Rule 3. If there are multiple weak symbols, the linker can pick an arbitrary one.
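The three rules amount to a small decision procedure over all definitions of one symbol name. The sketch below encodes them directly; `Definition` and `choose_definition` are hypothetical names, and for Rule 3 the sketch simply picks the first weak definition, which is one arbitrary choice among those the rule allows.

```c
#include <stddef.h>

/* One definition of a given symbol name (hypothetical record). */
typedef struct {
    const char *file;  /* object file containing the definition */
    int strong;        /* 1 = strong, 0 = weak */
} Definition;

/* Pick the definition that all references resolve to.
 * Returns its index, or -1 for a link error (two strong definitions,
 * or no definition at all). */
int choose_definition(const Definition *defs, size_t n)
{
    int strong_index = -1;

    for (size_t i = 0; i < n; i++) {
        if (defs[i].strong) {
            if (strong_index >= 0)
                return -1;          /* Rule 1: a strong symbol appears twice */
            strong_index = (int)i;
        }
    }
    if (strong_index >= 0)
        return strong_index;        /* Rule 2: strong overrides weak */
    return n > 0 ? 0 : -1;          /* Rule 3: arbitrary pick, here the first */
}
```

For foo in the p1.c/p2.c example (strong in p1.c, weak in p2.c), the function selects the definition from p1.c regardless of the order in which the two are listed.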

  23. Linker Puzzles
      • File 1: int x; p1() {}
        File 2: p1() {}
        Link-time error: two strong symbols (p1).
      • File 1: int x; p1() {}
        File 2: int x; p2() {}
        References to x will refer to the same uninitialized int. Is this what you really want?
      • File 1: int x; int y; p1() {}
        File 2: double x; p2() {}
        Writes to x in p2 might overwrite y! Evil!
      • File 1: int x=7; int y=5; p1() {}
        File 2: double x; p2() {}
        Writes to x in p2 will overwrite y! Nasty!
      • File 1: int x=7; p1() {}
        File 2: int x; p2() {}
        References to x will refer to the same initialized variable.
      • Nightmare scenario: two identical weak structs, compiled by different compilers with different alignment rules.
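The "will overwrite y" puzzle can be imitated inside a single file without relying on the linker. In the sketch below a struct stands in for the data area where p1.c placed x and y next to each other, and a `memcpy` of a double stands in for the 8-byte store that p2 performs because it believes x is a double. The names `P1Globals` and `p2_writes_x` are invented; the sketch assumes 4-byte int, 8-byte double, and no padding, and refuses to run otherwise.

```c
#include <string.h>

/* Stand-in for the linked data area: x and y as p1.c laid them out. */
typedef struct {
    int x;
    int y;
} P1Globals;

/* Imitate "x = value" as compiled in p2.c, where x is declared double:
 * sizeof(double) bytes are stored starting at the address of x.
 * Returns 0 on success, -1 if the size assumptions do not hold. */
int p2_writes_x(P1Globals *g, double value)
{
    if (sizeof(double) != sizeof(P1Globals))
        return -1;                    /* sketch assumes 4-byte int, 8-byte double */

    memcpy(g, &value, sizeof value);  /* spills past x into y */
    return 0;
}
```

Starting from x = 7 and y = 5, a call such as `p2_writes_x(&g, 1.0)` changes both fields: the bit pattern of the double is spread over x and y, so y no longer holds 5 even though p1.c never assigned to it.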

  24. Assignment
      • Problem 7.7: Modify bar5.c to print the correct values
