DIRA: Automatic Detection, Identification, and Repair of Control-Hijacking Attacks Alexey Smirnov and Tzi-cker Chiueh SUNY at Stony Brook {alexey, chiueh}@cs.sunysb.edu DEFCON 13
Outline of the Talk • Introduction • Related Work • DIRA Architecture • Attack Detection • Attack Identification • Attack Repair • Performance Evaluation • Conclusion
Introduction • Buffer overflow attacks are the most common type of attacks. • A comprehensive protection strategy should consist of the following components: • Attack detection – to prevent the attack from causing damage; • Attack identification – to feed the IDS with the attack signature; • Attack repair – to allow the compromised application to continue its normal execution. • We propose a compile-time solution that provides all three components.
What is a Buffer Overflow Attack • Control-hijacking attacks work by overwriting a control pointer such as the return address, function pointer, etc. • Buffer overflows are possible when the length of the target buffer is less than the length of the data that can be written into it. • Standard libc functions such as strcpy() or sprintf() are responsible for most buffer overflows.
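The overflow condition stated above (target buffer shorter than the data written into it) can be made explicit with a small check. This is only a sketch: copy_checked() is a hypothetical helper introduced here for illustration, not part of libc or DIRA.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits, terminator
 * included. Returns 0 on success, -1 if an unchecked strcpy() would
 * have written past the end of dst. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t needed = strlen(src) + 1;   /* bytes strcpy() would write */
    if (needed > dst_size)
        return -1;                     /* target buffer is too small */
    memcpy(dst, src, needed);
    return 0;
}
```

With a 16-byte buffer, a short string is copied, while the long file name used later in the FTP example is rejected instead of overwriting adjacent stack data.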
Attack Detection • Stackguard – place a canary word before the return address (RA) in the function prolog and check it in the function epilog. The assumption is that the attacker will have to overwrite the canary word in order to overwrite the RA. • RAD – save the original RA in a safe place in the function prolog and compare it to the value stored in the stack in the function epilog.
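The canary idea can be illustrated on a simulated stack frame. This is a sketch under stated assumptions: struct frame, the CANARY value, and the three helper functions are made up for illustration; a real compiler places the canary in the actual stack frame rather than in a C struct.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CANARY 0xDEADC0DEu   /* arbitrary value chosen for this sketch */

/* Simulated stack frame: the canary sits between the buffer and the RA. */
struct frame {
    char      buf[16];
    uint32_t  canary;
    uintptr_t ret_addr;
};

/* "Prolog": place the canary and record the return address. */
void frame_enter(struct frame *f, uintptr_t ra)
{
    memset(f->buf, 0, sizeof f->buf);
    f->canary = CANARY;
    f->ret_addr = ra;
}

/* Unchecked write of n bytes starting at buf, as an overflowing copy does. */
void frame_write(struct frame *f, const void *data, size_t n)
{
    assert(n <= sizeof *f);            /* stay inside the simulated frame */
    memcpy((unsigned char *)f, data, n);
}

/* "Epilog": a changed canary means the RA can no longer be trusted. */
int frame_exit_ok(const struct frame *f)
{
    return f->canary == CANARY;
}
```

A write that stays within buf leaves the canary intact; a write long enough to reach ret_addr necessarily passes over the canary first, which is the assumption the slide describes.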
Approaches to Attack Identification • Automatic ways to identify attacks (that is, to generate their signatures) are very important for containing worm epidemics. • Previous systems either provided a single attacking packet or required a large pool of malicious network data. • Toth and Kruegel – look at network packet payloads and perform abstract code execution. • TaintCheck – uses the value of the compromised control pointer as the attack signature. • Autograph – extracts the most common subsequences from suspicious flows and reports them as signatures. • Polygraph and Nemean – use machine learning algorithms to derive common patterns from a large set of malicious flows.
Approaches to Attack Repair • Program rollback and replay are used in software debugging. Two approaches: (1) keep an execution history (Spyder) or (2) do periodic state check-pointing. Check-pointing is easy under Linux because of the copy-on-write fork() system call (RECAP and Flashback). It can be more difficult under other operating systems. • Check-pointing relies on the OS rather than on the applications. • Shadow Honeypot runs two versions of the application (protected and non-protected) and dynamically switches between the two once an attack has been detected.
DIRA Approach • DIRA is an extension to GCC 3.4.1. It uses memory updates logging to solve the three problems at the same time. • The idea is to maintain a run-time log of all changes to the memory state of the program. • Assignments such as a=b; and libc function calls such as memcpy() change the memory state of the program. • For each memory update DIRA stores its source address, destination address, length, and the pre-image.
DIRA Approach • How to detect, identify, and repair an attack using the memory updates log? • To detect – compare the current RA with that saved in the log; • To identify – trace back the data that replaced the control pointer to the point where it was read from the network; • To repair – restore the memory state using the pre-images stored in the log. • At compile time, DIRA instruments the source code to perform logging and to check the correctness of control pointers. • At run time, the logging code generates the memory updates log.
Memory Updates Logging • The memory updates log is a circular buffer; each entry has four fields: read_addr, write_addr, len, data. • DIRA logs the effect of each operation of the form X=Y where X and Y are directly referenced variables, array references (a[i]), or de-referenced variables (*(a+1)). • read_addr is set to &Y, • write_addr is set to &X, • len is set to sizeof(Y), • data is set to the pre-image of X in DIR mode and is empty in other modes.
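The four-field entry and the circular buffer could be laid out roughly as follows. This is a sketch, not DIRA's actual data structure: the capacity, the fixed-size pre-image array, and the names struct update_log and log_update() are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOG_CAPACITY 4     /* tiny on purpose, to show wrap-around */
#define MAX_PREIMAGE 16    /* assumed upper bound on a saved pre-image */

struct log_entry {
    const void   *read_addr;            /* &Y: where the data came from */
    void         *write_addr;           /* &X: what is being overwritten */
    size_t        len;                  /* sizeof(Y) */
    unsigned char data[MAX_PREIMAGE];   /* pre-image of X (DIR mode) */
};

struct update_log {
    struct log_entry entries[LOG_CAPACITY];
    size_t next;     /* slot the next record goes into */
    size_t count;    /* number of valid records, at most LOG_CAPACITY */
};

/* Record the update X=Y before it happens, saving X's old bytes. */
void log_update(struct update_log *ul, void *write_addr,
                const void *read_addr, size_t len)
{
    assert(len <= MAX_PREIMAGE);
    struct log_entry *e = &ul->entries[ul->next];
    e->read_addr = read_addr;
    e->write_addr = write_addr;
    e->len = len;
    memcpy(e->data, write_addr, len);           /* pre-image of X */
    ul->next = (ul->next + 1) % LOG_CAPACITY;   /* oldest entry is overwritten */
    if (ul->count < LOG_CAPACITY)
        ul->count++;
}
```

Because the buffer is circular, only the most recent LOG_CAPACITY updates can be undone or traced; a real log would be sized so that it covers at least the window between receiving a packet and detecting the attack.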
Memory Updates Logging • If the right-hand side is a complex expression, then a log record is created for each variable in it. • To handle updates performed by libc functions, DIRA proxies several of them: string manipulation functions, format string functions, file and network I/O functions. • The log is also used to store tags, special records indicating a change of the program’s run-time state: • FUNCTION_ENTRY tag is inserted when a function is called; • FUNCTION_EXIT tag is inserted before a function returns. • Tags are used for signature generation and repair.
Memory Updates Logging Example • At compile time: • Source code: x=y+z; • Instrumented code: (log(&x, &y, sizeof(y), &x), (log(&x, &z, sizeof(z), &x), x=y+z)); • At run time: log() adds two records to the memory updates log: • read_addr: &y; write_addr: &x; len: sizeof(y); data: x; • read_addr: &z; write_addr: &x; len: sizeof(z); data: x;
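The comma-expression form of the instrumented code can be tried out in plain C. The sketch below assumes every logged variable is an int and uses a hypothetical log_rec() in place of the slide's log() (renamed only to avoid a clash with the math library's log()); the argument order follows the slide.

```c
#include <assert.h>
#include <stddef.h>

struct record {
    const void *read_addr;
    void       *write_addr;
    size_t      len;
    int         preimage;   /* sketch only: every logged variable is an int */
};

static struct record records[8];
static size_t n_records;

/* Hypothetical logging call: destination, source, length, and the address
 * of the value to save as pre-image. Returns 0 so that it can sit inside
 * a comma expression. */
int log_rec(void *write_addr, const void *read_addr, size_t len, const int *pre)
{
    assert(n_records < 8);
    records[n_records].read_addr = read_addr;
    records[n_records].write_addr = write_addr;
    records[n_records].len = len;
    records[n_records].preimage = *pre;
    n_records++;
    return 0;
}

/* What the instrumented form of "x = y + z;" could look like. */
int add_logged(int *x, const int *y, const int *z)
{
    return (log_rec(x, y, sizeof *y, x),
            (log_rec(x, z, sizeof *z, x),
             *x = *y + *z));
}
```

The comma operator evaluates its operands left to right, so both records are written, with the old value of x, before the assignment takes place, and the whole expression still yields the assigned value.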
Memory Updates Logging Example • At compile time: • Source code: strcpy(a,b); • Instrumented code: dira_strcpy(a,b); • At run time: • Proxy function dira_strcpy() adds a log record: read_addr=&b, write_addr=&a, len=strlen(b)+1, data=a
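A proxy in the spirit of dira_strcpy() might look like the sketch below. The name logged_strcpy(), the single-record "log", and the MAX_PREIMAGE cap are assumptions for illustration; DIRA's real proxy is not shown in the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PREIMAGE 64   /* assumed cap for this sketch */

struct copy_record {
    const char   *read_addr;
    char         *write_addr;
    size_t        len;
    unsigned char data[MAX_PREIMAGE];   /* bytes at dst before the copy */
};

static struct copy_record last_copy;    /* stands in for the real log */

/* Hypothetical proxy: log first, then copy. Like strcpy(), it performs no
 * bounds check on dst; the sketch is only exercised with copies that fit. */
char *logged_strcpy(char *dst, const char *src)
{
    size_t len = strlen(src) + 1;       /* terminator included */
    assert(len <= MAX_PREIMAGE);
    last_copy.read_addr = src;
    last_copy.write_addr = dst;
    last_copy.len = len;
    memcpy(last_copy.data, dst, len);   /* save the pre-image */
    return strcpy(dst, src);
}
```

The record carries the same four fields as on the slide: source, destination, strlen(b)+1, and the previous contents of the destination.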
Attack Detection (D-mode) • DIRA uses a RAD-like approach: the code to save the RA in a protected buffer is added to the function prolog. The actual RA stored in the stack is compared with this value in the function epilog. Using a special buffer to store RAs is an optimization over using the common memory updates log to store RAs. • DIRA can protect other control-sensitive data structures, such as the GOT and signal handler tables, in a similar fashion (not implemented yet).
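The save-and-compare scheme can be sketched with an explicit shadow stack. The names ra_save() and ra_check() and the fixed depth are assumptions; in DIRA the equivalent code is emitted by the compiler into each prolog and epilog, and the buffer is protected, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_DEPTH 64

static uintptr_t shadow[SHADOW_DEPTH];  /* stands in for the protected buffer */
static size_t    shadow_top;

/* Prolog: remember the return address. */
void ra_save(uintptr_t ra)
{
    assert(shadow_top < SHADOW_DEPTH);
    shadow[shadow_top++] = ra;
}

/* Epilog: compare the RA found on the stack with the saved copy.
 * Returns 1 if they match, 0 if the RA was tampered with. */
int ra_check(uintptr_t ra_on_stack)
{
    assert(shadow_top > 0);
    return shadow[--shadow_top] == ra_on_stack;
}
```

A mismatch in ra_check() is the point at which identification and repair would be triggered.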
Attack Identification • The desired properties of an attack signature: • Context-aware (to reduce false positives); • Semantics-aware (to reduce false positives); • Provides a degree of flexibility within each packet (to reduce false negatives). • DIRA’s signatures consist of multiple packets; each packet is represented as a regular expression. The length constraint limits the length of the attacking part of the last packet. • The memory updates log is used to build attack signatures.
Attack Identification • Two types of dependencies: data and control dependencies. • A data dependency is created when one variable is assigned to another. • A control dependency is created between variable X and variable Y if the value of Y depends on the value of X used in a conditional expression. Example: if (x>0) y=1; else y=2; • Why do we need control dependencies? Example: FTP server attack involving authentication.
Vulnerable FTP Server Example • A vulnerable FTP server pseudo-code:
char buf[16];
is_auth = is_user = 0; // user not authenticated initially
while (1) {
    recv_packet(p);
    if (!strncmp(p, "QUIT", 4)) break;
    if (!strncmp(p, "USER", 4)) { is_user = 1; continue; }
    if (!strncmp(p, "PASS", 4) && is_user) { is_auth = 1; continue; }
    if (!is_auth) continue; // authentication required
    if (!strncmp(p, "GET", 3)) {
        strcpy(buf, p+4); // copy filename; no bounds check
        send_file(buf);
    }
}
FTP Server Attack • FTP server GET attack (3 packets): • USER alexey • PASS my_pass • GET very_long_file_name_that_will_overwrite_the_return_address • Log records: • <DIRA_RECV, &p, 11, “USER alexey”> • <DIRA_STRNCMP, &p, 4, NULL> • <DIRA_COND, &p, 0, NULL> • <NULL, &is_user, 4, is_user> • <DIRA_RECV, &p, 12, “PASS my_pass”> • <DIRA_STRNCMP, &p, 4, NULL> • <DIRA_COND, &p, 0, NULL> • <DIRA_COND, &is_user, 0, NULL> • <NULL, &is_auth, 4, is_auth>
FTP Server Attack • FTP server GET attack (3 packets): • USER alexey • PASS my_pass • GET very_long_file_name_that_will_overwrite_the_return_address • Log records (third packet): • <DIRA_RECV, &p, 62, “GET …”> • <DIRA_COND, &is_auth, 0, NULL> • <DIRA_STRNCMP, &p, 3, NULL> • <DIRA_COND, &p, 0, NULL> • <&p+4, &buf, strlen(p)-4+1, *(p+4)>
FTP Server Attack • The return address (RA) is located after buf: RA=buf+17. • <DIRA_RECV, &p, 11, “USER alexey”> • <DIRA_STRNCMP, &p, 4, NULL> • <DIRA_COND, &p, 0, NULL> • <NULL, &is_user, 4, is_user> • <DIRA_RECV, &p, 12, “PASS my_pass”> • <DIRA_STRNCMP, &p, 4, NULL> • <DIRA_COND, &p, 0, NULL> • <DIRA_COND, &is_user, 0, NULL> • <NULL, &is_auth, 4, is_auth> • <DIRA_RECV, &p, 62, “GET …”> • <DIRA_COND, &is_auth, 0, NULL> • <DIRA_STRNCMP, &p, 3, NULL> • <DIRA_COND, &p, 0, NULL> • <&p+4, &buf, strlen(p)-4+1, *(p+4)>
Identifying Attack Using Data Dependencies • The return address (RA) is located after buf: RA=buf+17. • <DIRA_RECV, &p, 11, “USER alexey”> • <DIRA_STRNCMP, &p, 4, NULL> • <DIRA_COND, &p, 0, NULL> • <NULL, &is_user, 4, is_user> • <DIRA_RECV, &p, 12, “PASS my_pass”> • <DIRA_STRNCMP, &p, 4, NULL> • <DIRA_COND, &p, 0, NULL> • <DIRA_COND, &is_user, 0, NULL> • <NULL, &is_auth, 4, is_auth> • <DIRA_RECV, &p, 62, “GET …”> • <DIRA_COND, &is_auth, 0, NULL> • <DIRA_STRNCMP, &p, 3, NULL> • <DIRA_COND, &p, 0, NULL> • <&p+4, &buf, strlen(p)-4+1, *(p+4)>
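Tracing the overwritten return address back to the packet that supplied it amounts to a backward walk over the copy records. The sketch below is an assumption about how such a walk could be coded, not DIRA's implementation: addresses are plain integers, only receive and copy records are modeled, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum rec_kind { REC_RECV, REC_COPY };

struct rec {
    enum rec_kind kind;
    uintptr_t src;     /* unused for REC_RECV */
    uintptr_t dst;
    size_t    len;
    int       packet;  /* packet number, REC_RECV only */
};

/* Walk the records backward from index `last`, following the byte at
 * `addr` to the packet that supplied it. Returns the packet number and
 * stores the byte's offset within that packet, or returns -1 if no
 * origin is found. */
int trace_origin(const struct rec *recs, size_t last, uintptr_t addr,
                 size_t *offset)
{
    assert(recs != NULL && offset != NULL);
    for (size_t i = last + 1; i-- > 0; ) {
        const struct rec *r = &recs[i];
        if (addr < r->dst || addr - r->dst >= r->len)
            continue;                       /* record did not write addr */
        if (r->kind == REC_RECV) {
            *offset = (size_t)(addr - r->dst);
            return r->packet;
        }
        addr = r->src + (addr - r->dst);    /* follow the copy to its source */
    }
    return -1;
}
```

Applied to the FTP log, with p and buf at made-up addresses, the byte at buf+17 is reached by the final copy record and maps to offset 21 of the third packet. Data dependencies alone never lead to the USER and PASS packets, which is why control dependencies are needed as well.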
Identifying More Packets Using Control Dependencies • The return address (RA) is located after buf: RA=buf+17. • <DIRA_RECV, &p, 11, “USER alexey”> • <DIRA_STRNCMP, &p, 4, NULL> • <DIRA_COND, &p, 0, NULL> • <NULL, &is_user, 4, is_user> • <DIRA_RECV, &p, 12, “PASS my_pass”> • <DIRA_STRNCMP, &p, 4, NULL> • <DIRA_COND, &p, 0, NULL> • <DIRA_COND, &is_user, 0, NULL> • <NULL, &is_auth, 4, is_auth> • <DIRA_RECV, &p, 62, “GET …”> • <DIRA_COND, &is_auth, 0, NULL> • <DIRA_STRNCMP, &p, 3, NULL> • <DIRA_COND, &p, 0, NULL> • <&p+4, &buf, strlen(p)-4+1, *(p+4)>
Definition of Control Dependencies • Whenever variable X can prevent control flow from reaching variable Y, a control dependency is created between X and Y. • stmt1 and stmt2 are always dependent. • Control dependencies are also created for 'for' and 'while' loops. Tags START_SCOPE and END_SCOPE are used to store control dependencies in the memory updates log.
Representing Packets as Regular Expressions • For each byte of the attacking packet, DIRA determines whether or not the program looked at it. For example, strcmp() applied to the packet bytes converts them into looked-at bytes. If the bytes are blindly copied with strcpy(), they remain not-looked-at. Initially all bytes are not-looked-at. • DIRA traverses the log forward from where the packets were received and records all packet bytes that were looked at. • When it outputs the bytes, a looked-at byte is output as is and a not-looked-at byte is output as ‘?’.
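The looked-at rule translates directly into a per-byte mask. In this sketch the mask array and the two function names are assumptions; only the output rule (looked-at bytes as is, all others as '?') comes from the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Mark bytes [start, start+n) of a packet as looked-at, for example after
 * the program compared them with strncmp(). */
void mark_looked_at(unsigned char *mask, size_t packet_len,
                    size_t start, size_t n)
{
    assert(mask != NULL);
    for (size_t i = start; i < start + n && i < packet_len; i++)
        mask[i] = 1;
}

/* Emit looked-at bytes as is and every other byte as '?'.
 * out must have room for packet_len + 1 bytes. */
void build_pattern(const char *packet, const unsigned char *mask,
                   size_t packet_len, char *out)
{
    assert(packet != NULL && mask != NULL && out != NULL);
    for (size_t i = 0; i < packet_len; i++)
        out[i] = mask[i] ? packet[i] : '?';
    out[packet_len] = '\0';
}
```

For the 11-byte packet "USER alexey", where only the first four bytes were compared by strncmp(), the result is USER followed by seven '?' characters.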
Building Regular Expressions • <DIRA_RECV, &p, 11, “USER alexey”> • <DIRA_STRNCMP, &p, 4, NULL> • <DIRA_COND, &p, 0, NULL> • <NULL, &is_user, 4, is_user> • <DIRA_RECV, &p, 12, “PASS my_pass”> • <DIRA_STRNCMP, &p, 4, NULL> • <DIRA_COND, &p, 0, NULL> • <DIRA_COND, &is_user, 0, NULL> • <NULL, &is_auth, 4, is_auth> • <DIRA_RECV, &p, 62, “GET …”> • <DIRA_COND, &is_auth, 0, NULL> • <DIRA_STRNCMP, &p, 3, NULL> • <DIRA_COND, &p, 0, NULL> • <&p+4, &buf, strlen(p)-4+1, *(p+4)>
Length Constraint Generation • The length constraint limits the attacking part of the packet by specifying the terminating character and its maximum offset in any benign packet.
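One way to check such a constraint is sketched below. The parameter layout (start of the field, maximum offset of the terminator relative to that start, terminating character) is an assumption made for illustration; the slides do not spell out how DIRA encodes or evaluates the constraint.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 0 if the terminator occurs at an offset of at most max_offset
 * from `start`, and 1 otherwise, meaning the field is longer than in any
 * benign packet. packet_len should cover the terminator if there is one. */
int exceeds_length_constraint(const char *packet, size_t packet_len,
                              size_t start, size_t max_offset, char terminator)
{
    assert(packet != NULL);
    for (size_t i = start; i < packet_len && i - start <= max_offset; i++)
        if (packet[i] == terminator)
            return 0;          /* terminator found in time: looks benign */
    return 1;
}
```

With a start of 4 and a maximum offset of 17, a request for a short file name passes, while the long file name from the FTP example is flagged.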
DIRA’s Signature File Format • N – number of packets • L_i – length of i-th packet • Regular expression of the packet. Possible characters are shown on the right: • The length constraint is specified for the last attacking packet.
Complete Signature for FTP Attack • 3 # number of packets • 11 # 1st packet length • USER??????? • 12 # 2nd packet length • PASS???????? • 62 # 3rd packet length • GET???...??? • 4 17 \0 # length constraint
Attack Recovery (DIR-mode) • Main goal: bring the program to the state in which it was before the attack packet(s) were received. • How to restore the pre-attack state? • From which point to continue execution? • Program restart points can only be at the beginning of a function because only global updates are logged in DIR mode (for performance reasons). • The proper function is the least common dynamic ancestor of the function in which the attack was detected and the function in which the data was read in.
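Restoring the pre-attack state from pre-images comes down to undoing the logged updates newest first. The sketch below uses hypothetical names (struct undo_entry, undo_record(), rollback()) and a fixed-size pre-image; it shows the technique, not DIRA's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PREIMAGE 16   /* assumed cap for this sketch */

struct undo_entry {
    void         *write_addr;
    size_t        len;
    unsigned char data[MAX_PREIMAGE];   /* bytes at write_addr before the update */
};

/* Save the pre-image of an upcoming update in slot e. */
void undo_record(struct undo_entry *e, void *write_addr, size_t len)
{
    assert(len <= MAX_PREIMAGE);
    e->write_addr = write_addr;
    e->len = len;
    memcpy(e->data, write_addr, len);
}

/* Undo entries[from..count-1], newest first, so that a location updated
 * several times ends up with its oldest logged value. */
void rollback(const struct undo_entry *entries, size_t count, size_t from)
{
    for (size_t i = count; i-- > from; )
        memcpy(entries[i].write_addr, entries[i].data, entries[i].len);
}
```

Here `from` would be the index of the first record written after the attack packet was received, so updates made before that point are left alone.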
Choosing the Restart Point • depth is a loop invariant: it is the relative depth of the current function with respect to the greatest dynamic ancestor seen so far.
Choosing the Restart Point • When all updates are tracked it is possible to resume execution from the middle of a function. • No system support is required for restarting – longjmp() and setjmp() are used. A setjmp() call is inserted before the function that can be a potential restart point is called (to push the arguments again). • DIRA inserts the first local update tag when it encounters such an update after a function call.
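The setjmp()/longjmp() restart can be sketched as follows. handle_request(), run_with_restart(), and the attack_pending flag are hypothetical stand-ins; in DIRA the setjmp() call is inserted by the compiler and the memory state is rolled back from the log before execution resumes.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf restart_point;
static int attack_pending = 1;   /* simulate one attack, then none */
static int calls;

/* Stand-in for a request handler; the first call "detects" an attack. */
static int handle_request(int arg)
{
    calls++;
    if (attack_pending) {
        attack_pending = 0;              /* the offending packet is dropped */
        longjmp(restart_point, 1);       /* abandon the compromised call */
    }
    return arg * 2;
}

/* setjmp() sits just before the call, so after longjmp() the arguments
 * are pushed again and the function restarts from its beginning. */
int run_with_restart(int arg, int *restarts)
{
    assert(restarts != NULL);
    *restarts = 0;
    if (setjmp(restart_point) != 0)
        (*restarts)++;                   /* we got here through longjmp() */
    return handle_request(arg);
}
```

The first call to run_with_restart() enters handle_request() twice: once for the aborted attempt and once after the restart. Later calls run straight through.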
DIRA Evaluation • Programs tested: • ghttpd 1.4 – have exploit; • drcatd 0.5.0 – have exploit; • named 8.1 – have exploit; • qpopper 4.0.4; • proftpd 1.2.9; • Two goals: measure run-time overhead and quality of automatically generated signatures • Configuration: server machine (P-4M 1.7GHz, 512 MB RAM), two clients (Athlon 1.7GHz, 512 MB RAM). • Used exploit programs from securiteam.com and insecure.org.
Run-time Overhead • The following two graphs show run-time overhead for programs compiled in DIR-mode:
Signature Generation • Signatures were produced for all programs that we had exploits for. The ghttpd signature specifies the length constraint using a terminating character; the named signature specifies the maximum value of the length field. The drcatd signature has three packets in it: login, password, and the attacking packet.
Is Recovery Really Useful? • Recovery incurs significant overhead. Is it really better than just terminating the application? Yes, because: • Terminating a single-threaded program disconnects all clients. • The same tradeoff exists in the case of source-code checking tools: using them requires an investment of developer time, and we can always use Stackguard instead to protect the programs.
Conclusion • DIRA solves the problems of attack detection, identification, and repair in a unified way. • It produces accurate multi-packet signatures from a single attack instance. • Dynamic slicing of the memory updates log is the underlying technique. • The same technique can be used for automatic patch generation, which is our future work.