270 likes | 384 Views
Model Checking x86 Executables with CodeSurfer/x86 and WPDS++. G. Balakrishnan 1 , T. Reps 1,2 , N. Kidd 1 , A. Lal 1 , J. Lim 1 , D. Melski 2 , R. Gruian 2 , S. Yong 2 , C.-H. Chen 2 , and T. Teitelbaum 2,3 1 University of Wisconsin 2 GrammaTech, Inc. 3 Cornell University. Source code. IR
E N D
Model Checking x86 Executableswith CodeSurfer/x86 and WPDS++ G. Balakrishnan1, T. Reps1,2, N. Kidd1, A. Lal1, J. Lim1,D. Melski2, R. Gruian2, S. Yong2, C.-H. Chen2, and T. Teitelbaum2,3 1University of Wisconsin 2GrammaTech, Inc. 3Cornell University
Source code IR Construction Front end CFG + call graph + other info State machine IR Exploration Model checker Error report OK Static Bug-Detection Tools
Executable Memory-access analyzer IR Recovery CFG + call graph + memory-access info State machine IR Exploration Model checker Error report OK Static Bug-Detection Tools
Reveals platform-specific choices made by compiler What you see is what you get Some source-level issues go away Better platform for finding security vulnerabilities Source-code tools: Lack of fidelity can allow vulnerabilities to escape detection Why Executables?
Minimizing Data Lifetime? • Windows • Login process keeps a user’s password in the heap after a successful login • Should minimize data lifetime by • clearing memory • calling free() • But . . . • the compiler might optimize away the memory-clearing code (“useless-code” elimination) memset(buffer, ‘\0’, len); free(buffer); free(buffer);
Puzzle int callee(int a, int b) { int local; if (local == 5) return 1; else return 2; } int main() { int c = 5; int d = 7; int v = callee(c,d); // What is the value of v here? return 0; } Answer: 1 (for the Microsoft compiler)
Tutorial on x86 (Intel Syntax) p = q; p = *q; *p = q; p = &a[2];
Tutorial on x86 (Intel Syntax) mov ecx, edx mov ecx, [edx] mov [ecx], edx lea ecx, [esp+8] ecx = edx; ecx = *edx; *ecx = edx; ecx = &a[2];
Puzzle Standard prolog Prolog for 1 local push ebp push ebp mov ebp, esp mov ebp, esp sub esp, 4 push ecx int callee(int a, int b) { int local; if (local == 5) return 1; else return 2; } int main() { int c = 5; int d = 7; int v = callee(c,d); // What is the value of v here? return 0; } Answer: 1 (for the Microsoft compiler) mov [ebp+var_8], 5 mov [ebp+var_C], 7 mov eax, [ebp+var_C] push eax mov ecx, [ebp+var_8] push ecx call _callee . . .
The Vision • Code-inspection tools for security analysts • Analyses for identifying • security vulnerabilities and bugs • malicious behavior (code vs. memory snapshots) • commonalities and differences • Platform for • de-compilation • code obfuscation • installation of protection mechanisms • remediation of security vulnerabilities • de-obfuscation (w/ assistance from dyn. tools)
IR recovery control-flow graph (w/ indirect jumps resolved) call graph (w/ indirect calls resolved) identification of variables values of pointers used, killed, and possibly-killed variables for CFG nodes data dependences identification of types: base types, pointer types, structs, and classes GUI for code browsing and navigation Scripting language API for accessing the IR API for modifying the IR IR exploration API for traversal/searching/pattern matching API for defining static-analyzers/model-checkers GUI to investigate warnings Cooperation with dynamic tools What Should a Tool Provide? No use of symbol-table or debugging information!!!
IDA Pro ParseBinary CodeSurfer Build CFGs Build SDG Browse Initial estimate of • code vs. data • procedures • call sites • malloc sites • fleshed-out CFGs • fleshed-out call graph • used, killed, may-killed variables for CFG nodes • points-to sets • reports of violations CodeSurfer/x86 Architecture Binary Security Analyzers Connector Decompiler Value-setAnalysis Binary Rewriter UserScripts
IDA Pro ParseBinary CodeSurfer Build CFGs Build SDG Browse Initial estimate of • code vs. data • procedures • call sites • malloc sites • fleshed-out CFGs • fleshed-out call graph • used, killed, may-killed variables for CFG nodes • points-to sets • reports of violations CodeSurfer/x86 Architecture Binary Security Analyzers Connector Decompiler Value-setAnalysis Binary Rewriter UserScripts
IDA Pro ParseBinary CodeSurfer Build CFGs Build SDG Browse Initial estimate of • code vs. data • procedures • call sites • malloc sites • fleshed-out CFGs • fleshed-out call graph • used, killed, may-killed variables for CFG nodes • points-to sets • reports of violations CodeSurfer/x86 Architecture Binary Security Analyzers Connector Decompiler Value-setAnalysis Binary Rewriter UserScripts
IDA Pro ParseBinary CodeSurfer Build CFGs Build SDG Browse Initial estimate of • code vs. data • procedures • call sites • malloc sites • fleshed-out CFGs • fleshed-out call graph • used, killed, may-killed variables for CFG nodes • points-to sets • reports of violations CodeSurfer/x86 Architecture Binary Security Analyzers Connector Decompiler Value-setAnalysis Binary Rewriter UserScripts
IDA Pro ParseBinary CodeSurfer Build CFGs Build SDG Browse Initial estimate of • code vs. data • procedures • call sites • malloc sites • fleshed-out CFGs • fleshed-out call graph • used, killed, may-killed variables for CFG nodes • points-to sets • reports of violations CodeSurfer/x86 Architecture Binary Security Analyzers Connector Decompiler Value-setAnalysis Binary Rewriter UserScripts
IDA Pro ParseBinary CodeSurfer Build CFGs Build SDG Browse Initial estimate of • code vs. data • procedures • call sites • malloc sites • fleshed-out CFGs • fleshed-out call graph • used, killed, may-killed variables for CFG nodes • points-to sets • reports of violations CodeSurfer/x86 Architecture Binary Security Analyzers Connector Decompiler Value-setAnalysis Binary Rewriter UserScripts
IR Recovery: Scope of our Ambitions • Programs that conform to a “standard compilation model” • procedures • activation records • global data region • heap-allocated structs/objects (malloc/new) • virtual functions • dynamically linked libraries • Report violations • violations of stack protocol • return address modified within procedure Memory-safety violations!
Static Analysis of Executables:State of the Art Prior to CS/x86 • Relies on symbol-table/debugging info • Atom, EEL, Vulcan, Rival • Able to track only data movements via registers • EEL, Cifuentes, Debbabi, Debray • Poor treatment of memory operations • Overly conservative treatment many false positives • Non-conservative treatment many false negatives • Limited usefulness for security analysis
An Application of CodeSurfer/x86 • Project at MIT Lincoln Labs (originally classified) • Adopted CodeSurfer/x86 (replacing IDA Pro) • DARPA funding under “Dynamic quarantine of worms” • PI: Rob Cunningham; PM: Anup Ghosh • Given a worm . . . • What are its target-discovery, propagation, and activation mechanisms? • What is its payload? • Use of CodeSurfer/x86’s analysis mechanisms • Find system calls • Find their arguments • Follow dependences backwards to find where their values come from • . . .
Demo CodeSurfer/C CodeSurfer/x86
Reveals platform-specific choices made by compiler memory layout padding between fields of a struct which variables are adjacent? register usage execution order optimizations performed compiler bugs Some source-level issues go away analyze the actual library code, not hand-written stubs in-line assembly code use of multiple source languages Better platform for finding security vulnerabilities A source-code tool would have to duplicate all choices made by the compiler & optimizer Why Executables?
IR Exploration • API for traversal/searching/pattern matching • API for defining static-analyzers/model-checkers • Use a script to traverse IR • Create a model of the program as Weighted PDS • Invoke analyzer (WPDS++) • Path Explorer tool • Software-assurance plug-in to CodeSurfer/x86 • Performs security-related analyses on the IR • Uses the GUI to investigate warnings
Related Work Balakrishnan and Reps, “Analyzing memory accesses in x86 executables” [CC04] http://www.cs.wisc.edu/~reps/#cc04 Debray et al., “Alias analysis of executable code” [POPL 98] Cifuentes et al., “Assembly to high-level language translation” [ICSM 98] A. Mycroft, “Type-based decompilation” [ESOP 99] Linn et al., “Stack analysis of x86 executables” [Unpublished] Guo et al., Practical and accurate low-level pointer analysis” Amme et al., “Data dependence analysis of assembly code” [PACT 98]