Part 2: Advanced Static Analysis Chapter 4: A Crash Course in x86 Disassembly Chapter 5: IDA Pro Chapter 6: Recognizing C Code Constructs in Assembly
How software works
The gcc compiler driver preprocesses, compiles, assembles, and links source code to generate an executable:
    hello.c (program source) → preprocessor → hello.i (modified source) → compiler → hello.s (assembly code) → assembler → hello.o (object code) → linker → hello (executable code)
• The linker combines object code (e.g., game.o) and static libraries (e.g., libc.a) to form the final executable
• It also records references to dynamic libraries whose code is loaded at load time (e.g., libc.so.1)
• The executable may still load additional dynamic libraries at run time
Static libraries
Suppose you have utility code in x.c, y.c, and z.c that all of your programs use.
• Link the individual .o files together:
    gcc -o hello hello.o x.o y.o z.o
• Or create a library libmyutil.a using ar and ranlib, and link the library in statically:
    libmyutil.a : x.o y.o z.o
        ar rvu libmyutil.a x.o y.o z.o
        ranlib libmyutil.a
    gcc -o hello hello.c -L. -lmyutil
• Note: the library code (the archive members needed to resolve references) is copied directly into the binary
Dynamic libraries
Goal: avoid having multiple copies of common code on disk.
• Problem: libc
  • Statically linking libc (e.g., "gcc -static program.c") copies libc object code from libc.a into every a.out
  • Almost all programs use libc!
• Solution: compile binaries with a reference to a library of shared objects instead of a copy of the library
  • Libraries are loaded from the file system at load time or run time
  • "ldd <binary>" shows which dynamic libraries a program relies on
  • The gcc flag "-shared" and the linker option "-soname" (passed as "-Wl,-soname,<name>") are used to generate dynamic shared object files
The linking process (ld)
• Merges object files
  • Merges multiple relocatable (.o) object files into a single executable program
• Resolves external references
  • References to symbols defined in another object file
• Relocates symbols
  • Relocates symbols from their relative locations in the .o files to new absolute positions in the executable
  • Updates all references to these symbols to reflect their new positions
  • References occur in both code and data:
    code: a();          /* reference to symbol a */
    data: int *xp = &x; /* reference to symbol x */
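A minimal C sketch of the relocation step, assuming the two i386 relocation formulas from the System V i386 ABI supplement: R_386_32 computes S + A and R_386_PC32 computes S + A - P, where S is the symbol address, P is the run-time address of the field being patched, and the addend A is stored in that field. The names `reloc_kind`, `apply_reloc`, `read_le32`, and `write_le32` are hypothetical; this models what a linker does to each reference, not ld's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for two i386 ELF relocation types:
   RELOC_ABS32 models R_386_32   -> S + A
   RELOC_PC32  models R_386_PC32 -> S + A - P */
enum reloc_kind { RELOC_ABS32, RELOC_PC32 };

/* Read/write a 32-bit little-endian field, independent of host byte order. */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void write_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Patch the 4-byte field at `offset` in a section that will live at
   run-time address `base`, given the resolved symbol address `sym` (S).
   As in i386 REL-style relocations, the addend A is read from the field. */
void apply_reloc(uint8_t *section, uint32_t base, size_t offset,
                 uint32_t sym, enum reloc_kind kind) {
    uint8_t *field = section + offset;
    uint32_t addend = read_le32(field);           /* A */
    uint32_t place = base + (uint32_t)offset;     /* P */
    uint32_t value = (kind == RELOC_ABS32)
                         ? sym + addend           /* absolute address */
                         : sym + addend - place;  /* PC-relative displacement */
    write_le32(field, value);
}
```

For example, an unresolved `call` assembled as `e8 fc ff ff ff` (addend -4, because the displacement is measured from the end of the 4-byte field) and placed at 0x804854e becomes `e8 3d 06 00 00` once the target is resolved to 0x8048b90.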
Executables
Various file formats:
• Linux: Executable and Linkable Format (ELF)
• Windows: Portable Executable (PE)
ELF
Standard binary format for object files in Linux.
• One unified format for:
  • Relocatable object files (.o)
  • Shared object files (.so)
  • Executable object files
• Better support for shared libraries than the old a.out formats
• More complete information for debuggers
ELF Object File Format
File layout, starting at offset 0: ELF header | program header table (required for executables) | .text | .data | .bss | .symtab | .rel.text | .rel.data | .debug | section header table (required for relocatables)
• ELF header
  • Magic number, type (.o, exec, .so), machine, byte ordering, entry point, etc.
• Program header table
  • Page size, virtual addresses of memory segments (sections), segment sizes
• .text section
  • Code
• .data section
  • Initialized (static) data
• .bss section
  • Uninitialized (static) data
  • "Block Started by Symbol"
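A minimal C sketch of how a tool might recognize an ELF file from the header fields above, assuming the standard ELF identification layout: magic bytes 0x7f 'E' 'L' 'F', the class byte at offset 4 (1 = 32-bit, 2 = 64-bit), the byte-order byte at offset 5 (1 = little-endian, 2 = big-endian), and the 2-byte e_type field at offset 16. The function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* ELF identification: the first four bytes are 0x7f 'E' 'L' 'F'. */
int looks_like_elf(const unsigned char *buf, size_t len) {
    static const unsigned char magic[4] = {0x7f, 'E', 'L', 'F'};
    return len >= 4 && memcmp(buf, magic, 4) == 0;
}

/* Class byte at offset 4: returns 32 or 64, or 0 if unknown. */
int elf_class_bits(const unsigned char *buf, size_t len) {
    if (!looks_like_elf(buf, len) || len < 5)
        return 0;
    return buf[4] == 1 ? 32 : buf[4] == 2 ? 64 : 0;
}

/* e_type at offset 16 (1 = relocatable .o, 2 = executable, 3 = shared object),
   decoded using the byte order at offset 5. Returns -1 if unreadable. */
int elf_type(const unsigned char *buf, size_t len) {
    if (!looks_like_elf(buf, len) || len < 18)
        return -1;
    if (buf[5] == 1)
        return buf[16] | (buf[17] << 8);   /* little-endian */
    if (buf[5] == 2)
        return (buf[16] << 8) | buf[17];   /* big-endian */
    return -1;
}
```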
ELF Object File Format (cont.)
• .symtab section
  • Symbol table
  • Procedure and static variable names
  • Section names and locations
• .rel.text section
  • Relocation info for the .text section
  • Addresses of instructions that will need to be modified in the executable
  • Instructions for modifying them
• .rel.data section
  • Relocation info for the .data section
  • Addresses of pointer data that will need to be modified in the merged executable
• .debug section
  • Info for symbolic debugging (gcc -g)
PE (Portable Executable) file format
Windows file format for executables.
• Based on the COFF format
  • Magic numbers, headers, tables, directories, sections
• Disassemblers
  • Overlay data with C structures
  • Load the file as the OS loader would
  • Identify entry points (default and exported)
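Overlaying the PE headers amounts to reading fields at fixed offsets. A minimal C sketch, assuming the standard layout: the file starts with the DOS "MZ" header, and the 4-byte little-endian field at offset 0x3C (e_lfanew) gives the offset of the "PE\0\0" signature. `looks_like_pe` is a hypothetical name; a real parser would go on to read the COFF header and section table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

int looks_like_pe(const unsigned char *buf, size_t len) {
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;                                   /* no DOS "MZ" header */
    /* e_lfanew: little-endian offset of the PE signature */
    uint32_t pe_off = (uint32_t)buf[0x3c] | ((uint32_t)buf[0x3d] << 8) |
                      ((uint32_t)buf[0x3e] << 16) | ((uint32_t)buf[0x3f] << 24);
    if (pe_off > len || len - pe_off < 4)
        return 0;                                   /* signature would lie outside the buffer */
    return memcmp(buf + pe_off, "PE\0\0", 4) == 0;  /* "PE" followed by two zero bytes */
}
```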
Example C Program
m.c:
    #include <stdlib.h>
    int a(void);
    int e = 7;
    int main() { int r = a(); exit(0); }
a.c:
    extern int e;
    int *ep = &e;
    int x = 15;
    int y;
    int a() { return *ep + x + y; }
Merging Relocatable Object Files into an Executable Object File
• Relocatable object files:
  • m.o: .text holds main(); .data holds int e = 7
  • a.o: .text holds a(); .data holds int *ep = &e and int x = 15; .bss holds int y
  • plus system code and system data
• Executable object file, starting at offset 0:
  • headers
  • .text: system code, main(), a(), more system code
  • .data: system data, int e = 7, int *ep = &e, int x = 15
  • .bss: uninitialized data (int y)
  • .symtab
  • .debug
Program execution
The operating system provides:
• Protection and resource allocation
• An abstract view of resources (files, system calls)
• Virtual memory
  • A uniform memory-space abstraction for each process
  • Gives each process the illusion that it has the entire memory space
How does a program get loaded?
• The operating system creates a new process
  • Including, among other things, a virtual memory space
  • Important: any hardware-based debugger must know the OS state in the page tables to map accesses to virtual addresses
• The system loader reads the executable file from the file system into the memory space
  • The executable contains code and statically linked libraries
  • Done via DMA (direct memory access)
  • The executable in the file system remains and can be executed again
  • Loads dynamic shared objects/libraries into memory
  • Resolves addresses in the code given where code and data are loaded
• Then it starts the thread of execution running
Loading Executable Binaries
The executable object file for example program p (ELF header, program header table, .text, .data, .bss, .symtab, .rel.text, .rel.data, .debug, section header table) is loaded into the process image at these virtual addresses:
• 0x080483e0: init and shared lib segments
• 0x08048494: .text segment (r/o)
• 0x0804a010: .data segment (initialized r/w)
• 0x0804a3b0: .bss segment (uninitialized r/w)
More on relocation
Assembly code contains both relative and absolute addresses.
• With the VM abstraction, older linkers decided the layout and could supply definitive addresses
  • e.g., the Windows ".com" format
  • The linker can statically bind the program to virtual addresses
• Now, linkers only provide hints about where the program would like to be placed
  • Placement can instead be decided at load time (e.g., address space layout randomization)
  • e.g., the Windows ".exe" format: the loader rewrites addresses to the proper offsets
• Alternatively, the system can require position-independent code
  • The compiler makes all jumps and branches relative to the current location or to a base register set at run time
  • ELF uses a Global Offset Table (GOT): symbol addresses are read from the GOT before each access
  • The GOT can be targeted for hooks! How relocation is implemented determines how it can be exploited
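The GOT hooking idea can be modeled in C with a table of function pointers. This is a hypothetical sketch, not how the dynamic loader is implemented: `got_slot` stands in for one GOT entry, `library_double` for a shared-library function, and `install_hook` for a tool that overwrites the entry.

```c
/* A call that always goes through a table slot, the way ELF code
   reaches shared-library functions through the GOT. */
typedef int (*fn_ptr)(int);

/* Stands in for a function provided by a shared library. */
static int library_double(int x) { return 2 * x; }

/* One-slot "GOT"; a real dynamic loader fills such slots at load or run time. */
static fn_ptr got_slot = library_double;

int call_through_got(int x) {
    return got_slot(x);   /* indirect call: read the address, then jump */
}

/* The hook wraps the original function and changes its result. */
static int hooked_double(int x) { return library_double(x) + 1; }

/* Overwriting the slot redirects every later call. */
void install_hook(void) { got_slot = hooked_double; }
```

Because every call reads the slot before jumping, a single write to the table redirects all subsequent calls.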
Program execution
The CPU exchanges addresses, data, and instructions with memory, which holds object code, program data, OS data, and the stack.
Programmer-visible state:
• EIP: instruction pointer
  • a.k.a. program counter
  • Address of the next instruction
• Register file
  • Heavily used program data
• Condition codes
  • Store status information about the most recent arithmetic operation
  • Used for conditional branching
• Memory
  • Byte-addressable array
  • Code, user data, OS data
  • Includes the stack used to support procedures
Run-time data structures
Process address space, from high to low addresses:
• 0xffffffff down to 0xc0000000: kernel virtual memory (code, data, heap, stack); invisible to user code
• Below 0xc0000000: user stack (created at run time); %esp (stack pointer) marks its top
• 0x40000000: memory-mapped region for shared libraries
• Run-time heap (managed by malloc); brk marks its top
• Read/write segment (.data, .bss), loaded from the executable file
• 0x08048000: read-only segment (.init, .text, .rodata), loaded from the executable file
• Below 0x08048000 down to 0: unused
Registers
The processor operates on data in registers (usually):
• movl (%eax), %ecx
  • Fetch the data at the address contained in %eax
  • Store it in register %ecx
• movl $array, %ecx
  • Move the address of variable array into %ecx
• Typically, data is loaded into registers, manipulated or used, and then written back to memory
The IA32 architecture is "register poor":
• Few general-purpose registers
• The source or destination operand is often a memory location
• Makes context switching among processes easy (less register state to save)
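IA32 General Registers
Each 32-bit register (bits 31..0) contains a 16-bit register (bits 15..0); for the first four, the 16-bit register is further split into a high byte (bits 15..8) and a low byte (bits 7..0).
General-purpose registers (mostly):
• %eax: %ax = %ah:%al
• %ecx: %cx = %ch:%cl
• %edx: %dx = %dh:%dl
• %ebx: %bx = %bh:%bl
• %esi: %si
• %edi: %di
Special-purpose registers:
• %esp: %sp (stack pointer)
• %ebp: %bp (frame pointer)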
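The aliasing between %eax, %ax, %ah, and %al can be modeled with masks and shifts. A small C sketch; the function names are hypothetical.

```c
#include <stdint.h>

/* Views of %eax: %ax = bits 15..0, %ah = bits 15..8, %al = bits 7..0. */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xffffu); }
uint8_t  reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xffu); }
uint8_t  reg_al(uint32_t eax) { return (uint8_t)(eax & 0xffu); }

/* Writing %al (e.g., movb $0xaa,%al) changes only bits 7..0 of %eax. */
uint32_t write_al(uint32_t eax, uint8_t al) {
    return (eax & 0xffffff00u) | al;
}
```

For example, with %eax = 0x12345678, %ax is 0x5678, %ah is 0x56, and %al is 0x78.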
Operand types
A typical instruction acts on one or more operands:
• addl %ecx, %edx adds the contents of %ecx to %edx
Three general types of operands:
• Immediate
  • Like a C constant, but preceded by $
  • e.g., $0x1F, $-533
  • Encoded with 1, 2, or 4 bytes, depending on the instruction
• Register: the value in one of the 8 integer registers
• Memory: the value at a memory address
  • There are many modes for addressing memory
Operand examples using mov
movl, by source → destination, with the C analog:
• Imm → Reg: movl $0x4,%eax (C: temp = 0x4;)
• Imm → Mem: movl $-147,(%eax) (C: *p = -147;)
• Reg → Reg: movl %eax,%edx (C: temp2 = temp1;)
• Reg → Mem: movl %eax,(%edx) (C: *p = temp;)
• Mem → Reg: movl (%eax),%edx (C: temp = *p;)
Memory-to-memory transfers cannot be done with a single instruction.
Addressing Modes
Immediate and register operands have only one mode. Memory operands, on the other hand, have several:
• Absolute
  • Specify the address of the data directly
• Indirect
  • Use a register to calculate the address
• Base + displacement
  • Use a register plus a constant offset (displacement) to calculate the address
• Indexed
  • Add the contents of an index register
• Scaled index
  • Add the contents of an index register scaled by a constant (1, 2, 4, or 8)
Summary of IA32 Operand Forms
Type | Form | Operand value | Name
Immediate | $Imm | Imm | Immediate
Register | Ea | R[Ea] | Register
Memory | Imm | M[Imm] | Absolute
Memory | (Ea) | M[R[Ea]] | Indirect
Memory | Imm(Eb) | M[Imm + R[Eb]] | Base + displacement
Memory | (Eb,Ei) | M[R[Eb] + R[Ei]] | Indexed
Memory | Imm(Eb,Ei) | M[Imm + R[Eb] + R[Ei]] | Indexed
Memory | (,Ei,s) | M[R[Ei]*s] | Scaled indexed
Memory | Imm(,Ei,s) | M[Imm + R[Ei]*s] | Scaled indexed
Memory | (Eb,Ei,s) | M[R[Eb] + R[Ei]*s] | Scaled indexed
Memory | Imm(Eb,Ei,s) | M[Imm + R[Eb] + R[Ei]*s] | Scaled indexed
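Every memory form in the table is a special case of Imm + R[Eb] + R[Ei]*s, with missing parts treated as 0 (and a missing s as 1). A minimal C sketch of that computation, with hypothetical function names; it models only the address arithmetic, using 32-bit wraparound, and does not access memory.

```c
#include <stdint.h>

/* Imm(Eb,Ei,s) -> Imm + R[Eb] + R[Ei]*s, with 32-bit wraparound as on IA32. */
uint32_t effective_address(uint32_t imm, uint32_t base, uint32_t index, uint32_t scale) {
    return imm + base + index * scale;
}

/* Only these scale factors can be encoded. */
int valid_scale(uint32_t scale) {
    return scale == 1 || scale == 2 || scale == 4 || scale == 8;
}
```

For example, `movl (%edx,%ecx,4),%eax` reads element R[%ecx] of an int array whose base address is in %edx, because each int occupies 4 bytes.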
x86 instructions
Rules:
• The source operand can be memory, a register, or a constant
• The destination can be memory or a register
• Only one of the source and destination can be memory
• Source and destination must be the same size
Flags:
• Arithmetic and logical instructions set flags in EFLAGS (mov does not)
• Conditional branches are handled via EFLAGS
What's the "l" for on the end?
    addl 8(%ebp),%eax
• It stands for "long" (32 bits) and gives the size of the operand
• The naming is baggage from the days of 16-bit processors. For x86 and x86_64:
  • 8 bits is a byte
  • 16 bits is a word
  • 32 bits is a double word
  • 64 bits is a quad word
IA32 Standard Data Types
C declaration | Intel data type | GAS suffix | Size in bytes
char | Byte | b | 1
short | Word | w | 2
int | Double word | l | 4
unsigned | Double word | l | 4
long int | Double word | l | 4
unsigned long | Double word | l | 4
char * | Double word | l | 4
float | Single precision | s | 4
double | Double precision | l | 8
long double | Extended precision | t | 10/12
Global vs. local variables
• Global variables are stored in the .data (initialized) or .bss (uninitialized) section of the process
• Local variables are stored on the stack
Global vs local example
Local variables:
    #include <stdio.h>
    void a() { int x = 1; int y = 2; x = x+y; printf("Total = %d\n",x); }
    int main() { a(); }
Global variables:
    #include <stdio.h>
    int x = 1; int y = 2;
    void a() { x = x+y; printf("Total = %d\n",x); }
    int main() { a(); }
Global vs local example: disassembly of a()
Local variables: x and y live in a()'s stack frame and are addressed relative to %ebp.
    080483c4 <a>:
     80483c4: push %ebp
     80483c5: mov %esp,%ebp
     80483c7: sub $0x18,%esp
     80483ca: movl $0x1,-0x8(%ebp)     # x = 1
     80483d1: movl $0x2,-0x4(%ebp)     # y = 2
     80483d8: mov -0x4(%ebp),%eax
     80483db: add %eax,-0x8(%ebp)      # x = x+y
     80483de: mov -0x8(%ebp),%eax
     80483e1: mov %eax,0x4(%esp)       # printf argument x
     80483e5: movl $0x80484f0,(%esp)   # printf format string
     80483ec: call 80482dc <printf@plt>
     80483f1: leave
     80483f2: ret
Global variables: x and y live in .data and are addressed by absolute address (x at 0x804966c, y at 0x8049670).
    080483c4 <a>:
     80483c4: push %ebp
     80483c5: mov %esp,%ebp
     80483c7: sub $0x8,%esp
     80483ca: mov 0x804966c,%edx       # edx = x
     80483d0: mov 0x8049670,%eax       # eax = y
     80483d5: lea (%edx,%eax,1),%eax   # eax = x+y
     80483d8: mov %eax,0x804966c       # x = x+y
     80483dd: mov 0x804966c,%eax
     80483e2: mov %eax,0x4(%esp)       # printf argument x
     80483e6: movl $0x80484f0,(%esp)   # printf format string
     80483ed: call 80482dc <printf@plt>
     80483f2: leave
     80483f3: ret
Arithmetic operations
    void f() {
        int a = 0;
        int b = 1;
        a = a+11;
        a = a-b;
        a--;
        b++;
    }
    int main() { f(); }

    08048394 <f>:
     8048394: push %ebp
     8048395: mov %esp,%ebp
     8048397: sub $0x10,%esp
     804839a: movl $0x0,-0x8(%ebp)   # a = 0
     80483a1: movl $0x1,-0x4(%ebp)   # b = 1
     80483a8: addl $0xb,-0x8(%ebp)   # a = a+11
     80483ac: mov -0x4(%ebp),%eax
     80483af: sub %eax,-0x8(%ebp)    # a = a-b
     80483b2: subl $0x1,-0x8(%ebp)   # a--
     80483b6: addl $0x1,-0x4(%ebp)   # b++
     80483ba: leave
     80483bb: ret
Machine Instruction Example
C code:
    int sum(int x, int y) { int t = x+y; return t; }
• Adds two signed integers
Assembly:
    _sum:
        pushl %ebp
        movl %esp,%ebp
        movl 12(%ebp),%eax
        addl 8(%ebp),%eax
        movl %ebp,%esp
        popl %ebp
        ret
• Adds two 4-byte integers ("long" words in GCC parlance)
• The same instruction is used whether the integers are signed or unsigned
• Operands:
  • y: loaded into register %eax from memory M[%ebp+12]
  • x: memory M[%ebp+8]
  • t: register %eax
• The function value is returned in %eax
Object code for addl 8(%ebp),%eax:
    0x401046: 03 45 08
• A 3-byte instruction
• Stored at address 0x401046
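The object code 03 45 08 can be decoded by hand. Opcode 0x03 is ADD with a 32-bit register destination (the reg field) and a register-or-memory source (the r/m field). The ModR/M byte 0x45 splits into mod = 01, reg = 000 (%eax), and r/m = 101 (%ebp); mod = 01 means a one-byte displacement (0x08) follows, giving addl 8(%ebp),%eax. A C sketch of the ModR/M split, with hypothetical names:

```c
#include <stdint.h>

/* Fields of an x86 ModR/M byte: mod (bits 7..6), reg (bits 5..3), r/m (bits 2..0). */
typedef struct {
    unsigned mod;
    unsigned reg;
    unsigned rm;
} ModRM;

ModRM decode_modrm(uint8_t byte) {
    ModRM m;
    m.mod = (byte >> 6) & 0x3u;
    m.reg = (byte >> 3) & 0x7u;
    m.rm = byte & 0x7u;
    return m;
}

/* 32-bit register numbering used in the reg and r/m fields. */
const char *reg32_name(unsigned n) {
    static const char *const names[8] = {
        "%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi"
    };
    return names[n & 0x7u];
}
```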
Condition codes
The IA32 processor has a register called eflags (extended flags). Each bit is a flag, or condition code:
• CF: Carry Flag
• SF: Sign Flag
• ZF: Zero Flag
• OF: Overflow Flag
As programmers, we don't write to this register and seldom read it directly. The hardware sets or clears the flags depending on the result of an instruction.
Condition Codes (cont.)
Setting condition codes via the compare instruction:
    cmpl b,a
• Computes a-b without setting a destination
• CF set if there is a carry (borrow) out of the most significant bit, i.e., a < b as unsigned values
  • Used for unsigned comparisons
• ZF set if a == b
• SF set if (a-b) < 0
• OF set on two's-complement overflow:
    (a>0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-b)>0)
• Byte and word versions: cmpb, cmpw
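A C model of the four flags set by `cmpl b,a`, written from the definitions above using 32-bit unsigned arithmetic. The parameter order follows the AT&T operand order; `Flags` and `cmpl_flags` are hypothetical names, and the sketch models only the flag results, not the hardware.

```c
#include <stdint.h>

typedef struct {
    int cf, zf, sf, of;
} Flags;

/* Flags produced by `cmpl b,a`, which computes a - b and discards the result. */
Flags cmpl_flags(uint32_t b, uint32_t a) {
    uint32_t r = a - b;                       /* 32-bit wraparound subtraction */
    Flags f;
    f.cf = a < b;                             /* borrow out of bit 31: unsigned a < b */
    f.zf = r == 0;                            /* a == b */
    f.sf = (int)((r >> 31) & 1u);             /* result negative as a signed value */
    /* overflow: a and b have different signs and the result's sign differs from a */
    f.of = (int)((((a ^ b) & (a ^ r)) >> 31) & 1u);
    return f;
}
```

For example, comparing a = 1 with b = 2 sets CF (1 < 2 as unsigned) and SF (the result 0xffffffff is negative), while comparing a = INT_MIN (0x80000000) with b = 1 sets only OF, since the true difference does not fit in 32 bits.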
Condition Codes (cont.)
Setting condition codes via the test instruction:
    testl b,a
• Computes a&b without setting a destination
• Sets condition codes based on the result
• Useful when one of the operands is a mask
• Often used to test whether a value is zero or positive:
    testl %eax, %eax
• ZF set when a&b == 0
• SF set when a&b < 0
• Byte and word versions: testb, testw
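The same approach for `testl b,a`, keeping only the zero and sign information of the AND result (TEST also clears CF and OF). `TestFlags` and `testl_flags` are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    int zf, sf;
} TestFlags;

/* ZF and SF produced by `testl b,a`, which computes a & b and discards the result. */
TestFlags testl_flags(uint32_t b, uint32_t a) {
    uint32_t r = a & b;
    TestFlags f;
    f.zf = r == 0;                  /* no bits in common */
    f.sf = (int)((r >> 31) & 1u);   /* bit 31 of the result set */
    return f;
}
```

Testing a register against itself, as in `testl %eax,%eax`, sets ZF exactly when %eax is 0 and SF exactly when it is negative.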
if statements
    #include <stdio.h>
    void f() {
        int x = 1;
        int y = 2;
        if (x==y) {
            printf("x equals y.\n");
        } else {
            printf("x is not equal to y.\n");
        }
    }
    int main() { f(); }

    080483c4 <f>:
     80483c4: push %ebp
     80483c5: mov %esp,%ebp
     80483c7: sub $0x18,%esp
     80483ca: movl $0x1,-0x8(%ebp)     # x = 1
     80483d1: movl $0x2,-0x4(%ebp)     # y = 2
     80483d8: mov -0x8(%ebp),%eax
     80483db: cmp -0x4(%ebp),%eax      # compare x with y
     80483de: jne 80483ee <f+0x2a>     # x != y: go to the else branch
     80483e0: movl $0x80484f0,(%esp)
     80483e7: call 80482d8 <puts@plt>  # "x equals y."
     80483ec: jmp 80483fa <f+0x36>     # skip the else branch
     80483ee: movl $0x80484fc,(%esp)
     80483f5: call 80482d8 <puts@plt>  # "x is not equal to y."
     80483fa: leave
     80483fb: ret
Note: GCC replaced each printf of a constant string ending in a newline with a call to puts.
if statements
    int a = 1, b = 3, c;
    if (a > b)
        c = a;
    else
        c = b;

    00000018: C7 45 FC 01 00 00 00  mov dword ptr [ebp-4],1      ; store a = 1
    0000001F: C7 45 F8 03 00 00 00  mov dword ptr [ebp-8],3      ; store b = 3
    00000026: 8B 45 FC              mov eax,dword ptr [ebp-4]    ; move a into EAX
    00000029: 3B 45 F8              cmp eax,dword ptr [ebp-8]    ; compare a with b (subtraction)
    0000002C: 7E 08                 jle 00000036                 ; if (a <= b) jump to address 00000036
    0000002E: 8B 4D FC              mov ecx,dword ptr [ebp-4]    ; a > b: move a (1) into ECX
    00000031: 89 4D F4              mov dword ptr [ebp-0Ch],ecx  ; c = ECX (c is 12 bytes below ebp)
    00000034: EB 06                 jmp 0000003C                 ; unconditional jump to 0000003C
    00000036: 8B 55 F8              mov edx,dword ptr [ebp-8]    ; a <= b: move b (3) into EDX
    00000039: 89 55 F4              mov dword ptr [ebp-0Ch],edx  ; c = EDX
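The listing's control flow can be written back in C using goto, with one statement per branch or jump. A hypothetical sketch (`max_goto` is not from the source) that computes the same c as the if/else:

```c
/* Goto-style rewrite of the if/else, following the listing: one conditional
   jump to the else branch and one unconditional jump over it. */
int max_goto(int a, int b) {
    int c;
    if (a <= b)          /* cmp eax,[ebp-8] ; jle 00000036 */
        goto else_branch;
    c = a;               /* then branch: 0000002E..00000031 */
    goto done;           /* jmp 0000003C */
else_branch:
    c = b;               /* else branch: 00000036..00000039 */
done:
    return c;
}
```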
Loops
    int factorial_do(int x) {
        int result = 1;
        do {
            result *= x;
            x = x-1;
        } while (x > 1);
        return result;
    }

    factorial_do:
        pushl %ebp
        movl %esp, %ebp
        movl 8(%ebp), %edx    # edx = x
        movl $1, %eax         # eax = result = 1
    .L2:
        imull %edx, %eax      # result *= x
        decl %edx             # x = x-1
        cmpl $1, %edx
        jg .L2                # repeat while x > 1
        leave
        ret
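Writing the do-while loop with goto makes the correspondence to the assembly explicit: the body executes once, then a conditional backward jump repeats it. A hypothetical C sketch:

```c
/* Goto version of factorial_do, mirroring the assembly line by line. */
int factorial_goto(int x) {
    int result = 1;       /* movl $1,%eax */
loop:                     /* .L2: */
    result *= x;          /* imull %edx,%eax */
    x = x - 1;            /* decl %edx */
    if (x > 1)            /* cmpl $1,%edx */
        goto loop;        /* jg .L2 */
    return result;
}
```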
C switch statements
Implementation options:
• Series of conditionals
  • cmpl (or testl) followed by je
  • Good if there are few cases
  • Slow if there are many cases
• Jump table (example below)
  • Look up the branch target in a table
  • Possible with a small range of integer constants
• GCC picks the implementation based on the structure
Example:
    switch (x) {
    case 1:
    case 5:
        code at L0
    case 2:
    case 3:
        code at L1
    default:
        code at L2
    }
Jump table at .L3, one entry per x = 0..5: .L2, .L0, .L1, .L1, .L2, .L0
1. Initialize the jump table at .L3
2. If x is outside 0..5, jump to the default code at .L2; otherwise get the address at .L3+4*x
3. Jump to that address
Example
    int switch_eg(int x) {
        int result = x;
        switch (x) {
        case 100:
            result *= 13;
            break;
        case 102:
            result += 10;
            /* Fall through */
        case 103:
            result += 11;
            break;
        case 104:
        case 106:
            result *= result;
            break;
        default:
            result = 0;
        }
        return result;
    }
Assembly for switch_eg (x and result in %edx)
        leal -100(%edx),%eax     # eax = x-100
        cmpl $6,%eax
        ja .L9                   # unsigned: x-100 > 6 (also catches x < 100) -> default
        jmp *.L10(,%eax,4)       # jump through table entry x-100
        .p2align 4,,7
    .section .rodata
        .align 4
    .L10:
        .long .L4                # x = 100
        .long .L9                # x = 101 (default)
        .long .L5                # x = 102
        .long .L6                # x = 103
        .long .L8                # x = 104
        .long .L9                # x = 105 (default)
        .long .L8                # x = 106
    .text
        .p2align 4,,7
    .L4:
        leal (%edx,%edx,2),%eax  # eax = 3*x
        leal (%edx,%eax,4),%edx  # edx = x + 4*(3*x) = 13*x
        jmp .L3
        .p2align 4,,7
    .L5:
        addl $10,%edx
    .L6:                         # case 102 falls through into case 103
        addl $11,%edx
        jmp .L3
        .p2align 4,,7
    .L8:
        imull %edx,%edx
        jmp .L3
        .p2align 4,,7
    .L9:
        xorl %edx,%edx           # result = 0
    .L3:
        movl %edx,%eax           # return result
The key is the jump table at .L10: an array of pointers to jump locations.
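The jump table can be modeled in portable C with an array of function pointers indexed by x-100. This is a hypothetical sketch: each case body becomes a function, and the fall-through from case 102 into case 103 becomes a call. GCC's actual table holds code addresses inside one function, not function pointers.

```c
/* Each case body as a function of the current result. */
typedef int (*case_fn)(int result);

static int case_100(int r)     { return r * 13; }             /* .L4 */
static int case_103(int r)     { return r + 11; }             /* .L6 */
static int case_102(int r)     { return case_103(r + 10); }   /* .L5 falls through into .L6 */
static int case_104_106(int r) { return r * r; }              /* .L8 */
static int case_default(int r) { (void)r; return 0; }         /* .L9 */

/* Models .L10: entry i handles x = 100 + i. */
static const case_fn jump_table[7] = {
    case_100,      /* x = 100 */
    case_default,  /* x = 101 */
    case_102,      /* x = 102 */
    case_103,      /* x = 103 */
    case_104_106,  /* x = 104 */
    case_default,  /* x = 105 */
    case_104_106   /* x = 106 */
};

int switch_eg_table(int x) {
    unsigned idx = (unsigned)x - 100u;   /* leal -100(%edx),%eax */
    if (idx > 6u)                        /* cmpl $6,%eax ; ja .L9 (unsigned, so x < 100 also fails) */
        return case_default(x);
    return jump_table[idx](x);           /* jmp *.L10(,%eax,4) */
}
```

A single unsigned comparison covers both x < 100 and x > 106, because x-100 wraps around to a large unsigned value when x is below 100.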
x86-64 conditionals
Modern CPUs have deep pipelines:
• Instructions are fetched far in advance of execution
• This masks the latency of going to memory
• Problem: what if you hit a conditional branch?
  • The CPU must predict which branch to take!
  • Branch prediction in CPUs is well studied and fairly effective
  • But it is best to avoid conditional branching altogether
Conditional instruction execution offers a way to do this.
Conditional Move
Conditional move instruction:
    cmovXX src, dest
• Moves the value from src to dest if condition XX holds
• No branching
  • Handled as an operation within the execution unit
• Added with the P6 microarchitecture (PentiumPro onward)
Example:
    movl 8(%ebp),%edx    # Get x
    movl 12(%ebp),%eax   # rval=y
    cmpl %edx, %eax      # rval:x
    cmovll %edx,%eax     # If <, rval=x
• The GCC version used here won't use this instruction: it assumes it is compiling for a 386, which lacks cmov
• Performance
  • 14 cycles on all data
  • More efficient than conditional branching (simple control flow)
  • But there is overhead: both branches are evaluated
x86-64 conditional example
    int absdiff(int x, int y) {
        int result;
        if (x > y) {
            result = x-y;
        } else {
            result = y-x;
        }
        return result;
    }

    absdiff:                 # x in %edi, y in %esi
        movl %edi, %eax      # eax = x
        movl %esi, %edx      # edx = y
        subl %esi, %eax      # eax = x-y
        subl %edi, %edx      # edx = y-x
        cmpl %esi, %edi      # x:y
        cmovle %edx, %eax    # eax=edx if <=
        ret
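The cmovle version computes both candidate results and then selects one without a jump. The same idea in C, using a bit mask for the selection; `absdiff_select` is a hypothetical name, the sketch assumes x-y and y-x do not overflow int, and whether a particular compiler turns this C into cmov is up to the compiler.

```c
/* Branch-free absdiff: evaluate both branches, then select with a mask. */
int absdiff_select(int x, int y) {
    int eax = x - y;                       /* subl %esi,%eax */
    int edx = y - x;                       /* subl %edi,%edx */
    int mask = -(x <= y);                  /* all ones if x <= y, else 0 */
    return (edx & mask) | (eax & ~mask);   /* cmovle %edx,%eax */
}
```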
IA32 Stack
• Region of memory managed with stack discipline
• Grows toward lower addresses
• Register %esp holds the lowest stack address
  • the address of the top element
The stack "bottom" is at the highest address; the stack "top", pointed to by %esp, is at the lowest.
IA32 Stack: Pushing
    pushl Src
• Decrement %esp by 4
• Fetch the operand at Src
• Write the operand at the address given by %esp
• e.g., pushl %eax is equivalent to:
    subl $4, %esp
    movl %eax,(%esp)
The stack grows down: %esp moves 4 bytes toward lower addresses.
IA32 Stack: Popping
    popl Dest
• Read the operand at the address given by %esp
• Write it to Dest
• Increment %esp by 4
• e.g., popl %eax is equivalent to:
    movl (%esp),%eax
    addl $4,%esp
The stack shrinks: %esp moves 4 bytes toward higher addresses.
Stack Operation Examples
• Initially: address 0x108 holds 123 (top); %eax = 213, %edx = 555, %esp = 0x108
• After pushl %eax: address 0x104 holds 213 (top); %esp = 0x104
• After popl %edx: %edx = 213, %esp = 0x108, so 0x108 (holding 123) is the top again; 213 is still in memory at 0x104 but is no longer on the stack
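A toy C model of the push and pop equivalences, using a small little-endian byte array as memory. The `Machine` type and the function names are hypothetical; there is no bounds checking, so addresses must stay below MEM_SIZE - 3.

```c
#include <stdint.h>

/* Toy machine: 0x200 bytes of memory and a 32-bit %esp. */
#define MEM_SIZE 0x200u

typedef struct {
    uint8_t mem[MEM_SIZE];
    uint32_t esp;
} Machine;

/* IA32 is little-endian: least significant byte at the lowest address. */
void store32(Machine *m, uint32_t addr, uint32_t v) {
    for (int i = 0; i < 4; i++)
        m->mem[addr + i] = (uint8_t)(v >> (8 * i));
}

uint32_t load32(const Machine *m, uint32_t addr) {
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)m->mem[addr + i] << (8 * i);
    return v;
}

/* pushl src  ==  subl $4,%esp ; movl src,(%esp) */
void push32(Machine *m, uint32_t src) {
    m->esp -= 4;
    store32(m, m->esp, src);
}

/* popl dest  ==  movl (%esp),dest ; addl $4,%esp */
uint32_t pop32(Machine *m) {
    uint32_t v = load32(m, m->esp);
    m->esp += 4;
    return v;
}
```

Replaying the example: start with %esp = 0x108 and 123 at 0x108, push 213, then pop into %edx; %esp returns to 0x108, and 213 remains in memory at 0x104.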
Procedure Control Flow
• Procedure call: call label
  • Push the address of the next instruction (after the call) on the stack
  • Jump to label
• Procedure return: ret
  • Pop the address from the stack into the %eip register
Procedure Call Example
    804854e: e8 3d 06 00 00    call 8048b90 <main>
    8048553: 50                pushl %eax (next instruction)
• Before call 8048b90: address 0x108 holds 123 (top); %esp = 0x108; %eip = 0x804854e
• After: address 0x104 holds the return address 0x8048553; %esp = 0x104; %eip = 0x8048b90
• %eip is the program counter
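The call's target is encoded relative to the next instruction: 0x8048553 + 0x0000063d = 0x8048b90. A small C sketch that decodes a 5-byte `call rel32` (opcode 0xe8); `CallInfo` and `decode_call_rel32` are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    uint32_t return_addr;  /* address of the next instruction, pushed by call */
    uint32_t target;       /* next instruction + rel32 */
} CallInfo;

CallInfo decode_call_rel32(uint32_t call_addr, const uint8_t insn[5]) {
    /* bytes 1..4 hold the displacement, little-endian */
    uint32_t rel = (uint32_t)insn[1] | ((uint32_t)insn[2] << 8) |
                   ((uint32_t)insn[3] << 16) | ((uint32_t)insn[4] << 24);
    CallInfo c;
    c.return_addr = call_addr + 5;
    c.target = c.return_addr + rel;  /* unsigned wraparound handles negative displacements */
    return c;
}
```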