Post-Attack Analysis of Unknown Vulnerabilities
Peng Ning, with Emre C. Sezer, Chongkyung Kil, and Jun Xu
Motivation
• Vulnerability analysis
  • Essential for
    • Patching
    • Vulnerability-based signature generation
  • Painstakingly slow
  • Depends on human effort
• Existing approaches
  • Static analysis (e.g., [Chen et al. 04], [Feng et al. 04], [Larochelle & Evans 01])
    • False positives
  • Dynamic analysis (e.g., Minos [Crandall et al. 04], TaintCheck [Newsome & Song 05], DIRA [Smirnov & Chiueh 05])
    • Used for detection; inadequate vulnerability information
  • Symbolic execution (e.g., EXE [Cadar et al. 06], DACODA [Crandall et al. 05])
    • Scalability issues
  • Recovery (e.g., STEM [Sidiroglou et al. 05], SEAD [Locasto et al. 07])
    • Change of application semantics
2007 GMU-CSA Workshop
MemSherlock
• MemSherlock is an automated debugger
  • Automated analysis of unknown memory corruption vulnerabilities
  • Appeared in ACM CCS ’07
• MemSherlock provides
  • Statement that causes the memory corruption
  • Dynamic program slice leading to the corruption
  • Program variables involved in the vulnerability
  • All presented at the programming-language level
• Implications
  • Generating vulnerability conditions
  • Improves signature or patch generation speed
General Framework: Web Application Example
[Diagram components: Traffic, Light-weight IDS, Trigger, Program, Logger, Replayer, Instrumented Program, MemSherlock]
MemSherlock Overview
• Goal is to provide vulnerability information
  • Intuitive, easy to understand for the programmer
  • Not only the corruption point
    • Slice of program involved in the vulnerability
    • Effects of user inputs
    • Program variables involved
    • Variable relationships (e.g., pointer aliasing)
    • Type of vulnerability (e.g., stack buffer overflow)
• MemSherlock performs two important tasks
  • Finding the corruption point
  • Tracking program state
MemSherlock: Finding Corruption Point
• Observation: A memory object is modified by a small set of statements (inspired by AccMon)
• For a memory object m, the write set of m, WS(m), is the set of statements that legitimately modify m
• Security condition: Memory object m should only be updated by statements in WS(m)
MemSherlock: Assembly Line
• Pre-Debugging Phase
  • Instruments the program for the debugging phase
  • Extracts program information via static analysis
  • Needs to be performed once
• Debugging Phase
  • Tracks program state
  • Monitors memory writes and checks for violations of the security condition
  • Tracks tainted data and its propagation
MemSherlock Architecture
Pre-debugging: Generating Write Sets
• MemSherlock analyses source code to determine write sets
• For a program variable v, WS(v) includes
  • Assignment statements (i.e., v=expr)
  • Library function calls where v is passed as an argument that can be modified (e.g., memcpy(&v, src, n))
• MemSherlock treats DLLs as black boxes
  • Assumption: A DLL is internally secure, but externally insecure
    • e.g., no stack overflows in the library functions
    • Sound for common, well-tested libraries (e.g., the C library)
  • Requires library specifications
    • For each DLL, a list of functions and the arguments they might modify
Dealing with Pointers
• For a pointer variable p, two write sets are kept
  • WS(p) – Statements that modify p
  • WS(ref(p)) – Statements that modify the referent (e.g., *p=5)
• ref(p) is resolved at run time (during debugging)
• Perform the same analysis for pointer-type function arguments at function calls
  • Removes the requirement for inter-procedural static analysis
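The two write sets can be illustrated with a small annotated sketch. The function name and the statement labels S1..S6 are hypothetical, and the comments reflect one reading of the slide (initializers are treated as assignments); they are not MemSherlock output.

```c
/* Hypothetical example: S1..S6 are illustrative statement labels. */
int pointer_write_sets_demo(void)
{
    int a = 1;    /* S1: a statement in WS(a) */
    int b = 2;    /* S2: a statement in WS(b) */
    int *p = &a;  /* S3: in WS(p); ref(p) resolves to a at run time */
    *p = 5;       /* S4: in WS(ref(p)); writes a while ref(p) is a */
    p = &b;       /* S5: in WS(p); ref(p) now resolves to b */
    *p = 7;       /* S6: in WS(ref(p)); writes b while ref(p) is b */
    return a * 10 + b;
}
```

Static analysis alone can only say that S4 and S6 write through p. Which variable they may legitimately modify depends on the value of p at that moment, which is why ref(p) is resolved during debugging.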
Chained Dereferences
• The technique above can only handle simple dereferences
• Source code rewriting is used to convert all chained dereferences to simple dereferences
• Any other dereference that is not simple is converted in the same manner
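A minimal sketch of the kind of rewriting described, assuming that a "simple" dereference means one level of indirection per statement. The struct, the function names, and the temporary tmp are hypothetical; the actual rewriting performed by the tool may differ.

```c
struct node {
    int val;
    struct node *next;
};

/* Original form: a single statement with a chained dereference. */
void set_next_val_chained(struct node *s, int v)
{
    s->next->val = v;
}

/* Rewritten form: a temporary splits the chain so that each
 * statement dereferences exactly one pointer. */
void set_next_val_simple(struct node *s, int v)
{
    struct node *tmp = s->next;  /* simple dereference of s */
    tmp->val = v;                /* simple dereference of tmp */
}
```

Both functions have the same effect, but in the second one the write goes through a named pointer (tmp) whose referent write set can be tracked like that of any other pointer variable.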
Output of Pre-debugging Phase
• Simplified program
  • Simplified pointer dereferences
  • Compiled with debugging options
• Input file for the debugger
  • Program variables and their write sets
  • Addresses of global symbols
  • Frame pointer offsets of local variables
  • Other flags that help the debugger
MemSherlock Architecture: Debugging
Debugging: Dynamic Monitoring
• Runtime monitoring
  • State maintenance
    • Incorporates taint analysis from TaintCheck
    • Produces a dynamic slice of the program leading to the vulnerability
  • Write checking
    • Monitors and validates memory writes
    • Write sets are file name and line number pairs <f,l>
      • Instruction pointer IP is translated into <f,l>
    • Write sets are associated with program variables
      • A destination address is translated into a program variable
Keeping Program State
[Diagram: virtual address space in two program states, each growing from the stack base. Program State 1: main, fnc A, fnc B. Program State 2: main, fnc A, fnc C. In both states a memory write targets 0xABABABAB.]
• A given memory region may correspond to different program variables depending on program state
• Dynamic monitor keeps track of memory mapping
Debugging: Key Data Structures
• Keeps two lists of memory regions
  • ActiveMemoryRegions
    • Memory corresponding to program variables or their referent memory regions
  • NonWritableRegions
    • Saved registers, return addresses, metadata encapsulating dynamically allocated memory regions
Debugging: State Maintenance
• Function calls/returns (memory)
  • Local variable addresses are calculated and added to ActiveMemoryRegions
  • Locations of the return address and saved registers are added to the NonWritableRegions list
• Heap memory (memory)
  • malloc/free calls are intercepted
  • Allocated memory is added to ActiveMemoryRegions
  • The metadata encapsulating the buffer is added to NonWritableRegions
• Pointer value updates (write sets)
  • Searches ActiveMemoryRegions to find the referent and updates its WS
Debugging: Write Checking
• When the instruction at IP modifies memory m
  • If m is in ActiveMemoryRegions
    • Determines the variable v it belongs to
    • Converts IP into <f,l>
    • Checks if <f,l> is in WS(v)
  • If the memory write check fails or m is in NonWritableRegions
    • Marks the operation as a memory corruption
    • Displays the vulnerability information
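The check above can be sketched in a few lines of C. This is an illustration under stated assumptions, not the tool's implementation: all type and function names are hypothetical, the instruction pointer is assumed to have already been translated into a <f,l> pair, and the WRITE_UNTRACKED result for addresses found in neither list is an assumption, since the slides do not say how such writes are handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types; names are illustrative only. */
typedef struct { const char *file; int line; } Stmt;   /* <f,l> pair */

typedef struct {
    uintptr_t start;      /* region covers [start, start + size) */
    size_t size;
    const char *var;      /* program variable owning the region */
    const Stmt *ws;       /* WS(var) */
    size_t ws_len;
} ActiveRegion;

typedef struct { uintptr_t start; size_t size; } NonWritableRegion;

typedef enum { WRITE_OK, WRITE_CORRUPTION, WRITE_UNTRACKED } Verdict;

static int contains(uintptr_t start, size_t size, uintptr_t addr)
{
    return addr >= start && addr - start < size;
}

/* Classify a write to address dest performed by statement at. */
Verdict check_write(const ActiveRegion *active, size_t n_active,
                    const NonWritableRegion *nw, size_t n_nw,
                    uintptr_t dest, Stmt at)
{
    /* Any write into a non-writable region is a corruption. */
    for (size_t i = 0; i < n_nw; i++)
        if (contains(nw[i].start, nw[i].size, dest))
            return WRITE_CORRUPTION;

    /* Otherwise find the owning variable and test WS membership. */
    for (size_t i = 0; i < n_active; i++) {
        if (!contains(active[i].start, active[i].size, dest))
            continue;
        for (size_t j = 0; j < active[i].ws_len; j++)
            if (active[i].ws[j].line == at.line &&
                strcmp(active[i].ws[j].file, at.file) == 0)
                return WRITE_OK;
        return WRITE_CORRUPTION;   /* statement not in WS(v) */
    }
    return WRITE_UNTRACKED;        /* assumption: not covered by the slides */
}
```

Linear scans keep the sketch short. A monitor that checks every memory write would more plausibly use an interval tree or a sorted array, but the decision logic stays the same.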
Generating Vulnerability Information
• The slice of program contributing to the vulnerability
  • Statements that have propagated tainted values
  • Statements that have modified related memory regions
• Dependency between memory objects involved in the vulnerability
  • Points-to analysis shows memory regions and how they were accessed
• Program state
  • Call stack information
  • Write set information
Example Test Case: Null HTTP
http.c
  91: void ReadPOSTData(int sid) {
  …
  100: conn[sid].PostData=calloc(conn[sid].dat->in_ContentLength+1024, sizeof(char));
  101: if (conn[sid].PostData==NULL) { ...
  107: do {
  108: rc=recv(conn[sid].socket, pPostData, 1024, 0);
  109: …
Error Report:
  --20361-- Error type: Heap Buffer Overflow
  --20361-- Dest Addr: 3AB3E360
  --20361-- IP: 0x804E5C7: ReadPOSTData (http.c:108)
  --20361-- Dest address resolved to:
  --20361-- Global variable "heap var"
            @ 3AB3E280 (size: 224)
  --20361--
  --20361-- Memory allocated by 0x804E531:
            ReadPOSTData (http.c:100)
  --20361-- TAINTED destination 3AB3E360
  --20361-- Fully tainted from:
  --20361-- 0x804E5C7: ReadPOSTData (http.c:108)
  --20361--
  --20361-- TAINTED size used during allocation
  --20361-- Tainted from:
  --20361-- 0x804E456: ReadPOSTData (http.c:100)
  --20361-- 0x804FBB5: read_header (http.c:153)
  --20361-- 0x805121B: sgets (server.c:211)
Vulnerability Analysis Example
http.c
  91: void ReadPOSTData(int sid) {
  92: char *pPostData;
  ...
  100: conn[sid].PostData=calloc( conn[sid].dat->in_ContentLength+1024, sizeof(char));
  ...
  107: do {
  108: rc=recv(conn[sid].socket, pPostData, 1024, 0);
  ...
[Diagram annotation: Create Heap Object]
Vulnerability Analysis Example
http.c
  119: int read_header(int sid) {
  121: char line[2048];
  ...
  127: do {
  128: memset(line, 0, sizeof(line));
  129: sgets(line, sizeof(line)-1, conn[sid].socket);
  ...
  153: conn[sid].dat->in_ContentLength=atoi((char *)&line+16);
  ...
  169: if (conn[sid].dat->in_ContentLength<MAX_POSTSIZE) {
  170: ReadPOSTData(sid);
http.c
  91: void ReadPOSTData(int sid) {
  92: char *pPostData;
  ...
  100: conn[sid].PostData=calloc( conn[sid].dat->in_ContentLength+1024, sizeof(char));
  ...
  107: do {
  108: rc=recv(conn[sid].socket, pPostData, 1024, 0);
  ...
[Diagram annotations: Object Taint, Object Use]
Vulnerability Analysis Example
http.c
  119: int read_header(int sid) {
  121: char line[2048];
  ...
  127: do {
  128: memset(line, 0, sizeof(line));
  129: sgets(line, sizeof(line)-1, conn[sid].socket);
  ...
  153: conn[sid].dat->in_ContentLength=atoi((char *)&line+16);
  ...
  169: if (conn[sid].dat->in_ContentLength<MAX_POSTSIZE) {
  170: ReadPOSTData(sid);
server.c
  202: int sgets(char *buffer, int max, int fd)
  203: {
  ...
  209: conn[sid].atime=time((time_t*)0);
  210: while (n<max) {
  211: if ((rc=recv(conn[sid].socket, buffer, 1, 0))<0) {
  ...
[Diagram annotations: Create, Taint Object, Taint Object]
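The numbers in the error report can be cross-checked with a little arithmetic. The sketch below rests on several assumptions that the slides do not state: a Content-Length of -800 (inferred from the reported buffer size of 224, since 224 - 1024 = -800), a signed in_ContentLength so that a negative value passes the comparison against MAX_POSTSIZE on line 169, and pPostData pointing at the start of the allocated buffer. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define RECV_CHUNK 1024  /* length passed to recv() at http.c:108 */

/* Number of bytes requested by calloc() at http.c:100. */
long alloc_size(long content_length)
{
    return content_length + 1024;
}

/* A single recv() of RECV_CHUNK bytes can overrun the buffer
 * whenever the allocation is smaller than the chunk. */
int first_recv_can_overflow(long content_length)
{
    return alloc_size(content_length) < RECV_CHUNK;
}

/* First address past a buffer of size bytes starting at base. */
uintptr_t end_of_buffer(uintptr_t base, size_t size)
{
    return base + size;
}
```

With these assumptions, alloc_size(-800) gives the 224 bytes shown in the report, and end_of_buffer(0x3AB3E280, 224) gives 0x3AB3E360, which matches the reported destination address: the first byte past the heap buffer.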
Implementation
• Source code is rewritten using CIL (C Intermediate Language)
• CodeSurfer was used to extract program variables and their write sets
  • A commercial static analysis tool
• objdump and dwarfdump were used to extract global symbol information
• Dynamic monitoring is implemented in Valgrind
  • An open-source emulator
Evaluation
• Tested 11 real-world applications with known memory corruption vulnerabilities
• Test cases included
  • Stack/heap buffer overflow, format string
  • Both control-flow and non-control-data attacks
• Testing methodology
  • Programs were run under MemSherlock
  • Exploit programs were used to attack the applications
  • Log and replay was not used
Evaluation Results
[Table: evaluation results]
Type abbreviations: (S)tack overflow, (H)eap overflow, and (F)ormat string
False Negatives
• Prozilla
  • memcpy uses a kernel function to manipulate page tables when copying entire pages
  • Valgrind cannot trace into the kernel
  • Can be prevented by function wrappers
• Other false negatives are theoretically possible
  • structs within unions or arrays
    • Current implementation does not support unions
    • Currently does not differentiate between elements of an array
  • Memory corruption errors inside DLLs
False Positives
• Embedded assembly
• Incomplete library specification
  • Library functions keeping internal state (e.g., strtok(NULL, delim))
  • Library functions that modify global variables as side effects (e.g., optarg, errno)
  • Pointers that point to hidden global structures (e.g., getdatetime() in time.h)
• struct pointers
  • void pointers that are type-cast to modify struct variables
  • Since the pointer is not of type struct, MemSherlock fails to update accordingly
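The strtok case can be made concrete with a short example. The helper name count_tokens is hypothetical. The behaviour shown is standard strtok: calls after the first pass NULL, yet the function still writes terminators into the original buffer through its saved internal pointer.

```c
#include <string.h>

/* Hypothetical helper: counts comma-separated tokens in buf.
 * strtok modifies buf in place. */
int count_tokens(char *buf)
{
    int n = 0;
    /* First call: buf is an explicit argument, so a library
     * specification can list it as possibly modified. */
    char *tok = strtok(buf, ",");
    while (tok != NULL) {
        n++;
        /* Later calls pass NULL, but strtok still overwrites
         * delimiters in buf with '\0'. */
        tok = strtok(NULL, ",");
    }
    return n;
}
```

A specification that only lists the arguments a function may modify has no way to attribute the writes made by strtok(NULL, ...) to buf, so those writes would look like violations of WS(buf).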
Conclusion
• Fully automated vulnerability analysis
  • The analysis output is intuitive and human readable
• Future challenges
  • Automated, long-term fix of vulnerabilities
    • Semantic consistency is a great challenge
  • Automated, temporary fix of vulnerabilities
  • Generating vulnerability condition
  • Improving signature generation