Enhancing Security of Real-World Systems with a Better Understanding of the Threats. Shuo Chen, Ph.D. Candidate in Computer Science, Center for Reliable and High Performance Computing, Coordinated Science Laboratories, University of Illinois at Urbana-Champaign.
My Dissertation • Security Threat Analysis and Mitigations in Real-World Systems • Investigate the impact of hardware memory errors on the security of Internet servers and firewalls. • Simulate random hardware memory errors. • Build a stochastic model to estimate the probability of security violations. • Analyze and model a wide spectrum of software security vulnerabilities reported by CERT and Bugtraq. • Decompose each vulnerability into many primitive operations. • Introduce formalism into the reasoning about and description of real vulnerabilities. • Interesting outcome: discovered a new security bug in an HTTP server, now published in Bugtraq. • Construct non-traditional methods to attack major Internet server programs without being detected by most current defense techniques. This represents a new challenge for defense research. • Develop techniques to provide better security protection for real-world systems • A theorem proving based code analysis • A processor architecture level runtime defense (Slide labels: "Earlier work"; "Focus of this talk")
PART I: Analyzing and Identifying Security Threats on Real-World Software
Significance of Memory Vulnerabilities • CERT Advisories: 66% of vulnerabilities are low-level memory errors in software. • Widely exploited by attackers, worms and viruses.
Widely Understood Threats of Memory Corruptions • Once a memory error is found, it is straightforward to take control of the victim system by control-hijacking attacks. • First, overwrite control data, such as return addresses, function pointers, GOT entries or DTOR entries. • Program control is hijacked to execute code with malicious purposes. • The malicious code can make system calls with the privilege of the victim process and do real damage to the system.
Current Techniques to Defeat Memory Corruption Attacks • Control hijacking is the most dominant form of memory corruption attack (CERT and Microsoft Security Bulletin). • Accordingly, many current defense techniques are designed to enforce program control flow integrity in order to provide software security. This research area has been active for many years. • A common justification: attacks that do not hijack program control flow (i.e., non-control-hijacking attacks) are rare against real-world software. • Important questions: • How confidently can we rely on this justification to build defenses? • Is it possible that people currently underestimate the real threats of memory corruption attacks? • Specifically, is the dominance of control-hijacking attacks due to attackers' incapability, or to a lack of incentive, to mount non-control-hijacking attacks?
Our Claim: General Applicability of Non-control-hijacking Attacks • Our previous papers suggest an initial doubt • Even random hardware memory errors can subvert the security of real-world systems with a non-negligible probability. None of the compromises is due to control hijacking. • Software vulnerabilities are more deterministic and more amenable to attacks. Why would attackers be incapable of mounting non-control-hijacking attacks against real-world systems? • We make a hypothetical claim: • Many real-world software applications are susceptible to non-control-hijacking attacks; • The severity of the attack consequences is equivalent to that of control-hijacking attacks. • If the claim is indeed true, it represents a new challenge to defense techniques.
Goal: Empirical Validation of the Claim • Investigate many "representative software applications". Try to break into them using non-control-hijacking attacks. • Choose representative software applications • We did a quick survey of the most recent four years of CERT advisories. Over 1/3 of the vulnerabilities are in FTP, SSH, Telnet and HTTP servers. • Construct non-control-hijacking attacks to compromise these servers. Each attack results in the root compromise of the victim server.
Non-control-hijacking attack on WU-FTP Server (via a format string bug)

int x;
FTP_service(...) {
    authenticate();
    x = user ID of the authenticated user;
    seteuid(x);
    while (1) {
        get_FTP_command(...);
        if (a data command?)
            getdatasock(...);
    }
}
getdatasock( ... ) {
    seteuid(0);
    setsockopt( ... );
    seteuid(x);
}

Attack trace:
• At startup: x uninitialized, run as EUID 0.
• After x is assigned: x=109, run as EUID 0.
• After seteuid(x): x=109, run as EUID 109. Lose the root privilege!
• Get a special SITE EXEC command. Exploit a format string vulnerability: x=0, still run as EUID 109.
• Get a data command (e.g., PUT). In getdatasock(), after seteuid(0): x=0, run as EUID 0. After seteuid(x): x=0, run as EUID 0.
• When return to service loop, still runs as EUID 0 (root). Allow me to upload /etc/passwd. I can grant myself the root privilege!
• Only corrupt an integer, not control hijacking.
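The privilege bookkeeping in this attack can be replayed without touching real privileges. The following is a minimal sketch, assuming a simulated EUID variable in place of the real seteuid() call; the names sim_euid, sim_seteuid, sim_getdatasock and run_session are hypothetical and do not come from WU-FTPD.

```c
#include <assert.h>

/* Simulated process state: no real privileges are changed. */
static int sim_euid = 0;  /* effective UID of the simulated server */
static int x;             /* cached user ID: the non-control data under attack */

static void sim_seteuid(int uid) { sim_euid = uid; }

/* Mirrors getdatasock() on the slide: raise to root, then drop back to x. */
static void sim_getdatasock(void) {
    sim_seteuid(0);
    /* setsockopt(...) would run here with root privilege */
    sim_seteuid(x);
}

/* Replays one session and returns the EUID in effect after a data command.
   corrupt_x != 0 models the format string bug overwriting x with 0. */
int run_session(int authenticated_uid, int corrupt_x) {
    assert(authenticated_uid > 0);  /* model an unprivileged user */
    sim_euid = 0;                   /* the server starts as root */
    x = authenticated_uid;
    sim_seteuid(x);                 /* drop privilege after login */
    if (corrupt_x)
        x = 0;                      /* only an integer is corrupted */
    sim_getdatasock();              /* triggered by a data command such as PUT */
    return sim_euid;
}
```

Calling run_session(109, 0) leaves the simulated server at EUID 109, while run_session(109, 1) leaves it at EUID 0, even though no return address or function pointer was modified.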
Non-control-hijacking attack on NULL-HTTP Server (via a heap overflow bug) • Attack the configuration string of the CGI-BIN path. • Mechanism of CGI • Suppose server name = www.foo.com and CGI-BIN = /usr/local/httpd/exe • Requested URL = http://www.foo.com/cgi-bin/bar • The server executes /usr/local/httpd/exe/bar • Our attack • Exploit a heap overflow vulnerability to overwrite CGI-BIN to /bin • Request URL http://www.foo.com/cgi-bin/sh • The server executes /bin/sh instead of a program under /usr/local/httpd/exe • The server gives me a root shell! Only overwrite four characters in the CGI-BIN string. Not control hijacking.
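The CGI mechanism above amounts to joining a configuration string with the tail of the URL. The sketch below is an illustration under that assumption only; resolve_cgi is a hypothetical helper, not the NULL-HTTP server's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds the path of the program to execute for a URL path of the form
   /cgi-bin/<name>. Returns 0 on success, -1 if the URL is not a CGI
   request or the result does not fit in out. */
int resolve_cgi(const char *cgi_bin, const char *url_path,
                char *out, size_t outsz) {
    static const char prefix[] = "/cgi-bin/";
    size_t plen = sizeof prefix - 1;
    int n;

    assert(cgi_bin != NULL && url_path != NULL && out != NULL);
    if (strncmp(url_path, prefix, plen) != 0)
        return -1;
    /* The executed program depends entirely on the CGI-BIN string. */
    n = snprintf(out, outsz, "%s/%s", cgi_bin, url_path + plen);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

With cgi_bin set to /usr/local/httpd/exe, the request /cgi-bin/bar resolves to /usr/local/httpd/exe/bar. If the same configuration string has been overwritten with /bin, the request /cgi-bin/sh resolves to /bin/sh, with no control data involved.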
Non-control-hijacking attack on SSH Communications SSH Server (via an integer overflow bug)

void do_authentication(char *user, ...) {
    int auth = 0;
    ...
    while (!auth) {
        /* Get a packet from the client */
        type = packet_read();
        switch (type) {
        ...
        case SSH_CMSG_AUTH_PASSWORD:
            if (auth_password(user, password))
                auth = 1;
        case ...
        }
        if (auth) break;
    }
    /* Perform session preparation. */
    do_authenticated(…);
}

Slide annotations (value of auth as the attack proceeds): auth = 0; auth = 0; auth = 1; auth = 1. Password incorrect, but auth = 1. Logged in without correct password.
More non-control-hijacking attacks • Against NetKit Telnet server (default Telnet server of Redhat Linux) • Exploit a heap overflow bug • Overwrite two strings: • Normal scenario: /bin/login -h foo.com -p • Attack scenario: /bin/sh -h -p -p • The server runs /bin/sh when it tries to authenticate the user. • Against GazTek HTTP server • Exploit a stack buffer overflow bug • Send a legitimate URL http://www.foo.com/cgi-bin/bar • The server checks that "/.." is not embedded in the URL • Exploit the bug to change the URL to http://www.foo.com/cgi-bin/../../../../bin/sh • The server executes /bin/sh
Implications of Non-Control-Hijacking Attacks • Control flow integrity is not a sufficiently accurate approximation to software security. • Given a memory bug in real software, attackers' behaviors can be very diverse. • Although non-control-hijacking attacks are specific to application semantics, there are many types of non-control data critical to software security • E.g., user identity data, configuration data, user input data and decision-making Booleans. • Once attackers have the incentive, they are likely to succeed in non-control-hijacking attacks.
Re-Examining Current Defense Techniques • They were mainly tested against control-hijacking attacks. Need to re-examine their effectiveness. • Many of them are based on control flow integrity • Monitor system call sequence • Protect control data • Non-executable stack and heap • Pointer encryption (PointGuard) • Need to encrypt pointers in libraries to be effective (challenging because of insufficient type information, frequent type casting, and performance cost). • Address space randomization • Good idea. In each run of the program, the memory layout is different. • Challenging to deploy on all program segments. • Even if every segment is randomized, a recent paper shows that deployment on a 32-bit address space doesn't provide enough entropy. • StackGuard, Libsafe and FormatGuard • They are specific to defeating stack smashing attacks and format string attacks. Not generic solutions. • Building a generic and secure defense technique to defeat memory corruption attacks is still an open problem. • Future defense research should consider non-control-hijacking attacks more seriously.
PART II: Pointer Taintedness Detection: Towards a Better Security Protection for Real-World Systems
Pointer Taintedness • Pointer Taintedness: a pointer value, including a return address, is derived from user input. • Most memory corruption attacks are due to pointer taintedness. • It allows attackers to specify the memory locations to read, write or transfer control to. Usually a pathological program behavior. • Pointer taintedness provides a unifying perspective for reasoning about a significant number of security vulnerabilities.
Most Memory Corruption Attacks are Due to Pointer Taintedness • Format string attack • Taint an argument pointer of functions such as printf, fprintf, sprintf and syslog. • Stack buffer overflow (stack smashing) • Taint a function frame pointer or a return address. • Heap corruption • Taint the free-chunk doubly-linked list of the heap. • Glibc globbing attack • User input resides in a location that is used as a pointer by the parent function of glob().
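Stack Buffer Overflow
Frame pointer or return address can be tainted.
Vulnerable code:
    char buf[100];
    strcpy(buf,user_input);
Stack layout (High at the top, Low at the bottom; the slide marks the direction of stack growth and shows user_input being copied into buf):
    High   Return addr
           Frame pointer
           buf[99]
           …
           buf[1]
    Low    buf[0]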
Format String Attack
Vulnerable code:
    recv(buf);
    printf(buf); /* should be printf("%s",buf) */
User input: \xdd \xcc \xbb \xaa %d %d %d %n
Stack contents (from High to Low; the slide marks the direction of stack growth): … %n %d %d %d 0xaabbccdd
fmt: format string pointer; ap: argument pointer
In vfprintf(), if (fmt points to "%n") then **ap = (character count)
*ap is a tainted value.
Heap Corruption Attack
Vulnerable code:
    buf = malloc(1000);
    recv(sock,buf,1024);
    free(buf);
Heap layout: Free chunk A; allocated buffer buf (filled with user input); Free chunk B (fd=A, bk=C); Free chunk C.
In free():
    B->fd->bk=B->bk;
    B->bk->fd=B->fd;
When B->fd and B->bk are tainted, the effect of free() is to write a user specified value to a user specified address.
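The two unlink statements can be exercised on ordinary structs to see why tainted fd and bk fields yield an attacker-chosen write. This is a minimal sketch, not any allocator's real code: struct chunk, unlink_chunk, checked_unlink and tainted_unlink_demo are hypothetical names, and the consistency check in checked_unlink is an assumption added for contrast (similar in spirit to the list-integrity checks some allocators perform), not something the slides describe.

```c
#include <assert.h>
#include <stddef.h>

struct chunk {
    struct chunk *fd;  /* forward link in the free list */
    struct chunk *bk;  /* backward link in the free list */
};

/* The two statements from free() on the slide. */
void unlink_chunk(struct chunk *b) {
    assert(b != NULL && b->fd != NULL && b->bk != NULL);
    b->fd->bk = b->bk;
    b->bk->fd = b->fd;
}

/* Assumed hardening: unlink only if the neighbours still point back at b. */
int checked_unlink(struct chunk *b) {
    assert(b != NULL);
    if (b->fd == NULL || b->bk == NULL)
        return -1;
    if (b->fd->bk != b || b->bk->fd != b)
        return -1;
    unlink_chunk(b);
    return 0;
}

/* Models tainted links: fake.bk plays the role of the user specified
   address, and &payload the user specified value. Returns 1 if the
   value was written to that location. */
int tainted_unlink_demo(int use_check) {
    struct chunk fake = { NULL, NULL };
    struct chunk payload = { NULL, NULL };
    struct chunk b = { &fake, &payload };

    if (use_check) {
        if (checked_unlink(&b) != 0)
            return 0;
    } else {
        unlink_chunk(&b);
    }
    return fake.bk == &payload;
}
```

Here tainted_unlink_demo(0) reports that the chosen location was overwritten, while tainted_unlink_demo(1) reports that the inconsistent links were rejected before any write took place.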
Building Defense Techniques based on Pointer Taintedness • Static code analysis: analyze the source code to extract the conditions under which the possibility of pointer taintedness exists. • To uncover potential vulnerabilities • Runtime detection: monitor at runtime whether a tainted value is dereferenced as a pointer. • To defeat memory corruption attacks (both control-hijacking and non-control-hijacking attacks)
Project A: Formal Reasoning about Pointer Taintedness: To Extract Security Specifications of Library Functions
Project Overview • Our analysis of CERT advisories shows • A significant portion of vulnerabilities (33.6%) is due to errors in library functions or incorrect invocations of library functions. • Need more rigorous reasoning about library function specifications. • Library function specifications are currently ad hoc. Many of them are specified after real attacks are discovered. • printf(fmt,…): fmt cannot be a user-specified string • strcpy(d,s): the length of string s should not exceed the size of buffer d, and d and s must not overlap. • d= savestr(s): do not free d if this is not the first invocation of savestr. • free(p): p must be a pointer obtained from a previous malloc; p must not have been freed already. • glob(p): p cannot be a string starting with ‘~’ and ending with ‘{’. • What is a unified reason why these specifications are required? • Answer: they are required to eliminate the possibility of pointer taintedness. • Extraction of the security specifications of a function is reduced to a theorem proving problem: under which conditions can a function eliminate the possibility of pointer taintedness? • I develop an equational logic based theorem proving approach to extract security specifications.
Extracting Function Specifications by Theorem Prover • C source code of a library function: automatically translated to a formal semantic representation. • Theorem generation: for each pointer dereference in an assignment, generate a theorem stating that the pointer is not tainted. • Theorem proving: yields a set of sufficient conditions that imply the validity of the theorems. They are the security specifications of the analyzed function.
Example: vfprintf()

int vfprintf (FILE *s, const char *format, va_list ap)
{
    const char *p;
    char *q;
    int done, data, n, state;
    char buf[10];

    p = format;
    done = 0;
    if (p == NULL) return 0;
    state = NO_PENDING;
    while (*p != 0) {
        if (state == NO_PENDING) {
            if (*p == '%') state = PENDING;
            else outchar(s, *p);
        } else {
            switch (*p) {
            case '%':
                outchar(s, '%');
                break;
            case 'd':
                data = va_arg(ap, int);
                if (data < 0) { outchar(s, '-'); data = -data; }
                n = 0;
                while (data > 0 && n < 10) {
                    buf[n] = data % 10 + '0';   /* Theorem 1 */
                    data /= 10;
                    n++;
                }
                while (n > 0) { n--; outchar(s, buf[n]); }
                break;
            case 's':
                q = va_arg(ap, char *);
                if (q == NULL) break;
                while (*q != 0) { outchar(s, *q); q++; }
                break;
            case 'n':
                q = va_arg(ap, void *);
                *(int *) q = done;              /* Theorem 2 */
                break;
            default:
                outchar(s, *p);
            }
            state = NO_PENDING;
        }
        p++;
    }
    return done;
}

Theorem1: buf+n should not be a tainted value
Theorem2: q should not be a tainted value
Extracting the Specifications of vfprintf() • Try to prove the two theorems • Initially, the theorem prover cannot complete the proof, because the theorems are only valid under certain preconditions. • Add these preconditions as axioms to the theorem prover. • Repeat the above step until the theorems are proved. • Finally, the following four preconditions are added, which are the specifications of vfprintf (FILE *s, const char *format, va_list ap) • ap never points to any location within the current function frame. • *ap never points to the location of variable ap, i.e., *ap != &ap • Suppose the memory segment that ap sweeps over is called ap_activity_range, then *ap never points to any location within ap_activity_range. • No locations within ap_activity_range are tainted before vfprintf() is called. • Slide annotation: the extracted specifications suggest the scenario of format string vulnerability.
Other Studied Examples • Function strcpy() • Four security specifications indicating buffer overflow, buffer overlapping and buffer underflow scenarios causing pointer taintedness. • Function free() of a heap management system • Seven security specifications are extracted, including several specifications indicating heap corruption vulnerabilities. • Socket read functions of Apache HTTPD and NULL HTTPD • The Apache function is proven to be free of pointer taintedness. • Two (known) vulnerabilities are exposed in the theorem proving process of NULL HTTPD function.
Project B: Runtime Pointer Taintedness Detection: To Defeat Memory Corruption Attacks
Project Overview • We propose a processor architectural level mechanism to detect pointer taintedness • Implemented on the SimpleScalar simulator • An extended memory system with a taintedness bit attached to every byte • Enhanced load, store and ALU instructions to track taintedness bits in memory • Security attacks are detected when tainted data are dereferenced. • Evaluation • It detects both control-hijacking and non-control-hijacking attacks against real-world software. • No known false positives: no alarm during normal executions of network servers and SPEC benchmarks. Fully compatible with existing applications. • Transparent to applications. We can run precompiled binaries on the architecture. • Some potential false negative scenarios. They are rare and are not defeated by current generic detection techniques either.
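The mechanism described above (a taintedness bit per byte, propagation through load, store and ALU instructions, and an alarm on dereference of tainted data) can be mimicked in a toy software model. The sketch below is only an illustration under strong simplifying assumptions: a 256-byte memory with 8-bit addresses, a single add instruction, and hypothetical names (TReg, imm, recv_bytes, alu_add, load, store). It is not the SimpleScalar implementation and omits whatever special cases the real design handles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 256              /* 8-bit addresses: every address is in range */

static uint8_t mem[MEM_SIZE];     /* memory contents */
static uint8_t taint[MEM_SIZE];   /* one taintedness bit per byte */

typedef struct {
    uint8_t val;
    uint8_t tainted;
} TReg;                           /* a register value with its taintedness bit */

void mem_reset(void) {
    memset(mem, 0, sizeof mem);
    memset(taint, 0, sizeof taint);
}

/* An immediate constant is never tainted. */
TReg imm(uint8_t v) { TReg r = { v, 0 }; return r; }

/* Models user input: every received byte is stored with its bit set. */
void recv_bytes(uint8_t addr, const uint8_t *src, size_t n) {
    size_t i;
    assert((size_t)addr + n <= MEM_SIZE);
    for (i = 0; i < n; i++) {
        mem[addr + i] = src[i];
        taint[addr + i] = 1;
    }
}

/* ALU: the result is tainted if either operand is tainted. */
TReg alu_add(TReg a, TReg b) {
    TReg r = { (uint8_t)(a.val + b.val), (uint8_t)(a.tainted | b.tainted) };
    return r;
}

/* Load: raise the alarm if the address itself is tainted. */
TReg load(TReg addr, int *alarm) {
    TReg r;
    if (addr.tainted) { *alarm = 1; return imm(0); }
    r.val = mem[addr.val];
    r.tainted = taint[addr.val];
    return r;
}

/* Store: raise the alarm if the address is tainted; else copy value and bit. */
void store(TReg addr, TReg v, int *alarm) {
    if (addr.tainted) { *alarm = 1; return; }
    mem[addr.val] = v.val;
    taint[addr.val] = v.tainted;
}
```

In this model, loading an input byte from a constant address is allowed and yields a tainted register; using that register (or anything computed from it) as the address of a later load or store raises the alarm before memory is touched.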
Conclusions • Our analysis shows that real-world software can be compromised by corrupting non-control data. Non-control-hijacking attacks represent a realistic threat. • It is insufficient to rely on control flow integrity for software security. • Pointer taintedness is a common characteristic of most memory corruption attacks, including control hijacking and non-control-hijacking attacks. • A theorem proving based code analysis approach is designed to reason about possibilities of pointer taintedness. • E.g., to formally extract security specifications of library functions. • A runtime pointer taintedness detection mechanism is designed. It can effectively detect most memory corruption attacks.
Summary of My Research Methodology • Analysis-centric approach • Analyzed the impact of hardware faults on security (fault injection + stochastic modeling) • Analyzed Bugtraq and CERT vulnerability databases • Analyzed application source code, attacks and current defense techniques • Analysis results motivate • Exposing new security threats • Proposing new defense techniques • I like doing analysis of real data and incidents • Tedious? Sometimes, but it is a crucial step toward a lot of fun. • Rewarding? Definitely. Analysis is especially important for systems research. • Goal: strongly motivate research topics that solve real-world problems.
Static and Dynamic Approaches • Static approaches (avoid producing memory vulnerabilities in programs) • Writing code in a type-safe language • Compiler techniques to uncover memory vulnerabilities • Compiler instruments source code according to program annotations. • Challenges: legacy code and low-level code, compatibility and performance. • Fact: Memory vulnerabilities are still constantly discovered and exploited. • Intrusion detection techniques (defeat attacks, given the existence of vulnerabilities) • Specialized techniques • Defeat stack buffer overflow and format string attacks. • Generic defense techniques • Most techniques are designed to defeat control-hijacking attacks: host intrusion detection systems and control flow integrity protection techniques. A very active research area. • Others have constraints and difficulties in their deployment (pointer encryption and address randomization).
One-Slide Intro to Equational Logic
• Use term rewriting to establish proofs of theorems.
• Natural number addition expressed in the Maude system.
    0 : Natural .
    s_ : Natural -> Natural .
    _+_ : Natural Natural -> Natural .
    vars N M : Natural
    Axiom: N + 0 = N .
    Axiom: N + s M = s (N + M) .
    (s s s 0) + (s s 0)
      = s ((s s s 0) + (s 0))
      = s (s ((s s s 0) + 0))
      = s (s (s s s 0))
      = s s s s s 0
Intuitively, this is a proof of "3 + 2 = 5" in natural number algebra.
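The rewriting proof above can be mirrored by a recursive function that applies only the two axioms and counts how many times it does so. This is a rough sketch in C rather than Maude, assuming a natural number is represented by the number of times s is applied to 0; Rewrite and peano_add are hypothetical names.

```c
#include <assert.h>

typedef struct {
    unsigned value;  /* normal form: s applied `value` times to 0 */
    unsigned steps;  /* number of axiom applications used */
} Rewrite;

/* Rewrites N + M to normal form using only the two axioms on the slide. */
Rewrite peano_add(unsigned n, unsigned m) {
    Rewrite r;
    if (m == 0) {                    /* Axiom: N + 0 = N */
        r.value = n;
        r.steps = 1;
        return r;
    }
    r = peano_add(n, m - 1);         /* Axiom: N + s M = s (N + M) */
    assert(r.value + 1 > r.value);   /* guard against unsigned wraparound */
    r.value += 1;
    r.steps += 1;
    return r;
}
```

For peano_add(3, 2) the second axiom is applied twice and the first axiom once, which corresponds to the three rewriting steps in the derivation; the last line of the derivation only drops parentheses.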
Axioms of Eval and ExpT operations
    Eval(S, I) = I    // I is an integer constant
    Eval(S, ^ E1) = Ftch(S, Eval(S, E1))
    Eval(S, E1 + E2) = Eval(S, E1) + Eval(S, E2)
    Eval(S, E1 - E2) = Eval(S, E1) - Eval(S, E2)
    … …
    ExpT(S, I) = false
    ExpT(S, ^ E1) = LocT(S, Eval(S, E1))
    ExpT(S, E1 + E2) = ExpT(S, E1) or ExpT(S, E2)
    ExpT(S, E1 - E2) = ExpT(S, E1) or ExpT(S, E2)
    … …
E.g., is the expression (^100) - 2 tainted in store S?
    ExpT(S, (^100) - 2)
      = ExpT(S, (^100)) or ExpT(S, 2)
      = LocT(S, 100) or false
      = LocT(S, 100)
Note: ^ is the dereference operator; ^100 gives the content of location 100.
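The listed axioms translate almost directly into two recursive functions over an expression tree. The following is a minimal sketch assuming a small array-backed store and only the four expression forms shown (the elided axioms are not guessed at); Store, Exp, ftch, loct, eval and expt are hypothetical C names standing in for the slide's Ftch, LocT, Eval and ExpT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STORE_SIZE 256

typedef struct {
    int  content[STORE_SIZE];  /* content of each location */
    bool tainted[STORE_SIZE];  /* taintedness of each location */
} Store;

typedef enum { E_CONST, E_DEREF, E_ADD, E_SUB } Kind;

typedef struct Exp {
    Kind kind;
    int value;                 /* used by E_CONST */
    const struct Exp *lhs;     /* operand of ^, or left operand */
    const struct Exp *rhs;     /* right operand of + and - */
} Exp;

int ftch(const Store *s, int addr) {
    assert(addr >= 0 && addr < STORE_SIZE);
    return s->content[addr];
}

bool loct(const Store *s, int addr) {
    assert(addr >= 0 && addr < STORE_SIZE);
    return s->tainted[addr];
}

int eval(const Store *s, const Exp *e) {
    switch (e->kind) {
    case E_CONST: return e->value;                             /* Eval(S, I) = I */
    case E_DEREF: return ftch(s, eval(s, e->lhs));             /* Eval(S, ^ E1) */
    case E_ADD:   return eval(s, e->lhs) + eval(s, e->rhs);
    case E_SUB:   return eval(s, e->lhs) - eval(s, e->rhs);
    }
    return 0;
}

bool expt(const Store *s, const Exp *e) {
    switch (e->kind) {
    case E_CONST: return false;                                /* ExpT(S, I) = false */
    case E_DEREF: return loct(s, eval(s, e->lhs));             /* ExpT(S, ^ E1) */
    case E_ADD:
    case E_SUB:   return expt(s, e->lhs) || expt(s, e->rhs);
    }
    return false;
}
```

Building the tree for (^100) - 2 and calling expt on it returns exactly the taintedness of location 100, matching the worked example.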
Taintedness-Aware Memory Model • A store represents a snapshot of the memory state at a point in the program execution. • For each memory location, we can evaluate two properties: content and taintedness (true/false). • Operations on memory locations: • The fetch operation Ftch(S,A) gives the content of the memory address A in store S. • The location-taintedness operation LocT(S,A) gives the taintedness of the location A in store S. • Operations on expressions: • The evaluation operation Eval(S,E) evaluates expression E in store S. • The expression-taintedness operation ExpT(S,E) computes the taintedness of expression E in store S.
Semantics of Language L • The following instructions are defined: • mov [Exp1] <- Exp2 • branch (Condition) Label • call FuncName(Exp1,Exp2,…) • Axioms defining mov instruction semantics • Specify the effects of applying mov instruction on a store • Allow taintedness to propagate from Exp2 to [Exp1]. • Axioms defining the semantics of recv (similarly, scanf, recvfrom: user input functions) • Specify the memory locations tainted by the recv call.
Example: strcpy()

C source:
    char * strcpy (char * dst, char * src) {
        char * res;
    0:  res = dst;
        while (*src != 0) {
    1:      *dst = *src;
            dst++;
            src++;
        }
    2:  *dst = 0;
        return res;
    }

Translate to formal semantics:
    0:  mov [res] <- ^ dst
        lbl(#while#6)
        branch (^ ^ src is 0) #ex#while#6
    1:  mov [^ dst] <- ^ ^ src
        mov [dst] <- (^ dst) + 1
        mov [src] <- (^ src) + 1
        branch true #while#6
        lbl(#ex#while#6)
    2:  mov [^ dst] <- 0
        mov [ret] <- ^ res

Theorem generation:
    a) Suppose S1 is the store before Line L1, then LocT(S1,dst) = false
    b) If S0 is the store before Line L0, and S2 is the store after Line L1, then
       I < Eval(S0, ^dst) or Eval(S0, ^dst + dstsize) <= I  =>  LocT(S2,I) = LocT(S0,I)
    c) Suppose S3 is the store before Line L2, then LocT(S3,dst) = false

Theorem proving
Specifications Extracted • Suppose that when function strcpy() is called, the size of the destination buffer (dst) is dstsize and the length of the user input string (src) is srclen • Specifications that are extracted by the theorem proving approach • srclen <= dstsize • The buffers src and dst do not overlap in such a way that the buffer dst covers the string terminator of the src string. • The buffers dst and src do not cover the function frame of strcpy. • Initially, dst is not tainted • Slide annotations: some of these specifications are marked "Documented in Linux man page" and the others "Not documented".
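Two of the extracted specifications can be enforced by a wrapper at the call site. The sketch below is illustrative only: checked_strcpy is a hypothetical helper, it counts the terminating NUL explicitly (so it requires strlen(src) + 1 <= dstsize, an assumption about how srclen is measured), and it rejects any overlap between the two buffers, which is stricter than the overlap condition on the slide. The conditions about the function frame and about the taintedness of dst cannot be checked in portable C and are not covered.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies src into dst only if the checked preconditions hold.
   Returns 0 on success and -1 if a precondition is violated. */
int checked_strcpy(char *dst, size_t dstsize, const char *src) {
    size_t need;
    uintptr_t d, s;

    assert(dst != NULL && src != NULL);
    need = strlen(src) + 1;            /* string length plus the terminator */
    if (need > dstsize)
        return -1;                     /* the source does not fit */

    d = (uintptr_t)dst;
    s = (uintptr_t)src;
    if (d < s + need && s < d + dstsize)
        return -1;                     /* the buffers overlap */

    memcpy(dst, src, need);
    return 0;
}
```

A caller would pass the real capacity of the destination, for example checked_strcpy(buf, sizeof buf, input), and treat a return value of -1 as a rejected copy instead of proceeding with a truncated or corrupted buffer.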