G. Necula et al. Taming C Pointers. George C. Necula, Jeremy Condit, Matthew Harren. CCured: Type-Safe Retrofitting of Legacy Software. ACM Transactions on Programming Languages and Systems (TOPLAS), to appear, 2004. CQUAL: A Tool for adding type qualifiers to C. Presentation Group A2, CS342, Blake Johnson, February 1, 2007
CQual: The Need for a Type Safe C • Standard C has no protection against invalid pointers, buffer overflows, or improper type casts. • Newer languages offer more protection, but are frequently slower in many applications. • Many large legacy programs are written in C, so it will be with us for a long, long time.
The Need for a Type Safe C: Earlier Attempts • The authors provide a summary of prior attempts to make C safer. • According to the authors, previous attempts to make C type/memory safe have been focused largely on dynamic bounds checks, which have the worst performance impact. • Applications such as Purify can reduce performance by as much as 90%
CCured - a Type Safe C • The driving insight behind CCured is that a majority of C pointers are already used in a provably type-safe way. • The authors set out to show that it is not necessary to perform every dynamic check on every pointer, thereby increasing performance without sacrificing safety.
CCured – a Type Safe C • CCured distinguishes itself by using static, compile-time checks, when possible, to maximize performance, and run-time checks only when necessary. • A developer can develop new code explicitly in the CCured language, or run the CCured inference engine on legacy C code, which will analyze pointer usage with its Inference Algorithm and convert the code to CCured.
Type Safe C – the CCured Approach • Static Analysis – Identifies pointers that are already used in a safe manner. • Additional Metadata – For potentially unsafe pointers, base/limit addresses and limited run-time type information are stored with each pointer/allocation. • Runtime Checks – Dynamic checks on pointer reads/writes ensure memory safety. CCured inserts only the checks that static analysis could not prove unnecessary.
Type Safe C – the CCured Approach: Static Analysis • Determines type/memory safety before compilation, where possible. • Groups pointers into three main kinds: SAFE (used in a type-safe manner), SEQ (used with pointer arithmetic), and WILD/DYN (cast to incompatible types). • All pointers get run-time null pointer checks. • CCured performs additional run-time checks for SEQ and WILD pointers. • The goal is to make as many pointers SAFE as possible, then as many of the remaining pointers SEQ as possible, and only the rest WILD.
Type Safe C – the CCured Approach: SAFE pointers • SAFE pointers may not use pointer arithmetic or undergo most type casts. At run-time, every read/write through a SAFE pointer is checked against NULL. • SAFE pointers may not be cast to/from WILD pointers, but may be cast to SEQ pointers.
Type Safe C – the CCured Approach: SAFE pointer optimization • The original CCured implementation resulted in too many WILD pointers, which have serious performance and compatibility issues. • The authors added an optimization allowing SAFE pointers to be type cast in certain common cases. • Safe casts can be a cast between identical types (based on physical structure), or an upcast to a compatible pointer type (see next slide).
Type Safe C – the CCured Approach: SAFE Pointer Casting

    struct type_1 { char a; int b; double c; float d; };
    struct type_2 { char x; int y; };
    struct type_3 { struct type_2 i; double j; };

• type_2 is a prefix of type_1 and of type_3, so a pointer to either may be upcast to a pointer to type_2. • type_3 is a prefix of type_1, so a pointer to type_1 may be upcast to a pointer to type_3.
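The prefix relationship above can be made concrete with offsetof. The following is a minimal sketch, not CCured's own check: the helper names get_y and y_of_type_3 are hypothetical, and the offset assertions assume a typical ABI. Only the type_3 to type_2 upcast is actually dereferenced here, because ISO C guarantees that a pointer to a struct can be converted to a pointer to its first member; the type_1 cases are checked by layout only.

```c
#include <assert.h>
#include <stddef.h>

struct type_1 { char a; int b; double c; float d; };
struct type_2 { char x; int y; };
struct type_3 { struct type_2 i; double j; };

/* Layout assumptions behind the "prefix" claims, made explicit.
   They hold on typical ABIs but are not guaranteed by ISO C. */
static_assert(offsetof(struct type_1, a) == offsetof(struct type_2, x), "a/x");
static_assert(offsetof(struct type_1, b) == offsetof(struct type_2, y), "b/y");
static_assert(offsetof(struct type_1, c) == offsetof(struct type_3, j), "c/j");
static_assert(offsetof(struct type_3, i) == 0, "type_2 leads type_3");

/* Hypothetical helper that only needs the type_2 view of an object. */
int get_y(const struct type_2 *p)
{
    return p->y;
}

/* Upcast: a type_3 object viewed through its leading type_2 member. */
int y_of_type_3(struct type_3 *p)
{
    struct type_2 *up = (struct type_2 *)p;
    return get_y(up);
}
```

Calling y_of_type_3 on a type_3 object returns the y field of its embedded type_2, which is the kind of access a SAFE upcast permits without any extra run-time metadata.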
Type Safe C – the CCured Approach: Example of SAFE Pointer Use

    #include <stdlib.h>

    struct linked_list_node {
        struct linked_list_node * next;
    };

    struct int_linked_list_node {
        struct linked_list_node * next;
        int value;
    };

    void function()
    {
        struct int_linked_list_node * a = malloc(sizeof(*a));
        struct linked_list_node * b;

        b = (struct linked_list_node *)a;      /* Upcast */
        a = (struct int_linked_list_node *)b;  /* Downcast (not normally allowed!) */
    }
Type Safe C – the CCured Approach: Example of SAFE Pointer Use • When run on the preceding program, CCured detects the downcast as a bad cast and makes all of the pointers into WILD pointers. • I wonder what would happen if we removed the downcast?
Type Safe C – the CCured Approach: Example of SAFE Pointer Use • After the downcast is removed, all of the pointers become SAFE pointers. • This is an indication that some manual tuning will likely result in better performance from CCured.
Type Safe C – the CCured Approach: Another SAFE Pointer Example • Usually only one dynamic check is performed on SAFE pointers, a check for NULL. There is a second check when you write through a pointer to a pointer: it prevents you from storing the address of a stack variable where it could outlive the stack frame. For example:

    void my_function(int **ppvar)
    {
        int var = 1, *pvar;

        pvar = &var;
        *pvar = 2;
        *ppvar = pvar;   /* stores the address of a local variable */
    }

    int main()
    {
        int *pvar;

        my_function(&pvar);
        *pvar = 3;
        return 0;
    }
Type Safe C – the CCured Approach: Another SAFE Pointer Example • The code compiles and the pointers are SAFE.
Type Safe C – the CCured Approach: Another SAFE Pointer Example • But look what happens when we run the program:

    # ./a.out
    Failure STORE_SP at test2.c:9: my_function(): Storing stack address
    Abort (core dumped)

• There are a lot of subtleties to C, and it is not simple to catch them all.
Type Safe C – the CCured Approach: SEQ pointers • SEQ pointers may use pointer arithmetic but still may not undergo any cast that is not guaranteed to be safe. • SEQ pointers carry additional metadata: pointers to the beginning and end of the allocated block. • At runtime, every read/write through a SEQ pointer is checked against these boundaries. • SEQ pointers may not be cast to WILD pointers.
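As a rough illustration of the SEQ idea described above, here is a minimal sketch of a bounds-carrying ("fat") pointer. The struct layout and the names seq_char, seq_from, seq_add, and seq_read are assumptions for illustration, not CCured's actual representation; the position is kept as an offset so that out-of-range arithmetic stays well defined in plain C, and a failed check returns an error code where CCured would abort.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical SEQ representation: the block bounds travel with the pointer. */
struct seq_char {
    char *base;      /* first byte of the allocation */
    size_t len;      /* size of the allocation in bytes */
    ptrdiff_t off;   /* current position relative to base */
};

/* Wrap an existing buffer of n bytes. */
struct seq_char seq_from(char *buf, size_t n)
{
    struct seq_char s = { buf, n, 0 };
    return s;
}

/* Pointer arithmetic is unchecked: only the position changes. */
struct seq_char seq_add(struct seq_char s, ptrdiff_t k)
{
    s.off += k;
    return s;
}

/* The check happens at the dereference: null, lower bound, upper bound.
   Returns 0 on success and -1 where CCured would report a failure. */
int seq_read(struct seq_char s, char *out)
{
    assert(out != NULL);
    if (s.base == NULL || s.off < 0 || (size_t)s.off >= s.len)
        return -1;
    *out = s.base[s.off];
    return 0;
}
```

Moving the pointer out of range is harmless in this sketch; only a read through an out-of-range position fails, which mirrors the "checked at every read/write" rule on the slide.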
Type Safe C – the CCured Approach: Example of SEQ Pointer Use

    void mystrcpy(char * dest, char * src)
    {
        while (*src) {
            *dest = *src;
            dest++;
            src++;
        }
        *dest = '\0';   /* terminate the copy */
    }

    /* This function is potentially unsafe in C if src does not contain a
       null terminator or if dest does not have sufficient space to store
       the string. When SEQ pointers are used with CCured, a dynamic bounds
       check is made on every pointer dereference, so either of those cases
       will result in an assertion failure. There is a significant
       performance overhead for functions like these, which is why the
       authors have included a special type of SEQ pointer, the STRING
       pointer (discussed later). */
Type Safe C – the CCured Approach: Example of SEQ Pointer Use • As expected, the pointers become SEQ pointers when CCured is run.
Type Safe C – the CCured Approach: WILD Pointers • WILD pointers may use pointer arithmetic and also can be cast to arbitrary (incompatible) types. • Like SEQ pointers, WILD pointers carry additional metadata. Attached to the WILD pointer itself is a pointer to the base of its memory region. • The allocations pointed to by WILD pointers are special. They have a length field at the front, and additional flag bits for each word at the end, which are used to check whether pointers in the allocation are valid. The only pointers these regions may contain are WILD pointers. • At runtime, every read/write/cast through a WILD pointer is checked against the memory boundaries and to ensure only valid pointers are dereferenced.
Type Safe C – the CCured Approach: WILD Pointers • int * WILD * SAFE is allowed, but int * SAFE * WILD (a WILD pointer to a SAFE pointer to integer) is illegal. This is because the memory pointed to by a WILD pointer can be altered arbitrarily and a SAFE pointer has no way of checking its own validity. • WILD pointers are allowed in the wild memory region, because the authors add a flag bit for every word in the allocated region which is used to identify valid WILD pointers. These bits can be set and cleared as the region is cast to one type or another. • The result is that one WILD pointer can start a chain reaction, requiring other pointers to become WILD because it references them, even though they are used safely.
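The length field and per-word flag bits described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not CCured's real memory layout: the area has a fixed hypothetical size (AREA_WORDS), the flags are stored as one byte per word rather than packed bits, and all names (wild_area, wild_write_int, wild_write_ptr, wild_read_ptr) are invented for illustration. Failures are returned as error codes where CCured would abort.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AREA_WORDS 4   /* hypothetical fixed size, for illustration only */

struct wild_area {
    size_t        len;                 /* number of words in the area */
    uintptr_t     word[AREA_WORDS];    /* the data itself */
    unsigned char is_ptr[AREA_WORDS];  /* one flag per word: holds a valid pointer? */
};

void wild_init(struct wild_area *a)
{
    assert(a != NULL);
    memset(a, 0, sizeof *a);
    a->len = AREA_WORDS;
}

/* Writing a non-pointer value clears the flag for that word. */
int wild_write_int(struct wild_area *a, size_t i, uintptr_t v)
{
    if (i >= a->len) return -1;        /* bounds failure */
    a->word[i] = v;
    a->is_ptr[i] = 0;
    return 0;
}

/* Writing a pointer sets the flag for that word. */
int wild_write_ptr(struct wild_area *a, size_t i, void *p)
{
    if (i >= a->len) return -1;        /* bounds failure */
    a->word[i] = (uintptr_t)p;
    a->is_ptr[i] = 1;
    return 0;
}

/* Reading a pointer succeeds only if the word is flagged as one. */
int wild_read_ptr(const struct wild_area *a, size_t i, void **out)
{
    if (i >= a->len) return -1;        /* bounds failure */
    if (!a->is_ptr[i]) return -2;      /* word does not hold a pointer */
    *out = (void *)a->word[i];
    return 0;
}
```

In this model, storing an integer into a word and then trying to load a pointer from it is rejected, which is the same class of error as a word whose flag bit was never set.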
Type Safe C – the CCured Approach: WILD Pointers • Here is an example of a WILD pointer and two WILD memory allocations: int * WILD * WILD a; where **a == 76. [Figure: memory diagram of the pointer and the two allocations]
Type Safe C – the CCured Approach: WILD Pointer Usage • The general rule is that you can do anything with WILD pointers that you can do with ordinary C pointers. The difference is that CCured should catch your memory errors at runtime so they don’t do any damage.

    struct str1 {
        int *i_ptr;
    };

    struct str2 {
        char string[50];
    };

    int main(int argc, char **argv)
    {
        int i = 5;
        struct str1 s1, *s1_ptr = &s1;
        struct str2 *s2_ptr;

        s1_ptr->i_ptr = &i;
        s2_ptr = (struct str2 *)s1_ptr;
        s2_ptr->string[0] = '1';
        *(s1_ptr->i_ptr) = 2;
        return 0;
    }
    /* (This program is likely to result in a run-time error) */
Type Safe C – the CCured Approach: WILD Pointer Usage • As expected, the code compiled fine but our pointers became WILD pointers. Now let’s try running our new executable.
Type Safe C – the CCured Approach: WILD Pointer Usage

    # ./a.out
    Failure LBOUND at wild1.c:15: main(): Lbound
    Abort (core dumped)
Type Safe C – the CCured Approach: WILD Pointers – Another Example • Let’s try another example of WILD pointer checking. In this example, I accidentally assign a pointer to a pointer to int to point directly to the int.

    int main()
    {
        int var = 1;
        int *ptr = &var;
        int **ptr_to_ptr;

        ptr_to_ptr = ptr;    /* the accidental, bogus assignment */
        **ptr_to_ptr = 2;
        return 0;
    }
Type Safe C – the CCured Approach: WILD Pointers – Another Example • The code compiles, and CCured detects the bogus cast and forces our pointers to become WILD pointers. • It’s interesting to note that int var, which is not a pointer, is also now flagged as WILD memory. In CCured, such variables are actually allocated on the heap, not on the stack, and var now has the length field and flag bits discussed earlier.
Type Safe C – the CCured Approach: WILD Pointers – Another Example • When we execute the code, the flag bits come into play. We attempt to dereference int **ptr_to_ptr, which we have accidentally set to point to the int itself (var), rather than to a pointer to int. • int var, which is now actually a WILD memory allocation on the heap, does not have its flag bits set to indicate it holds a valid pointer, so when we attempt to dereference it, the following error appears:

    # ./a.out
    Failure NONPTR at wild2.c:9: main(): Non-pointer
    Abort (core dumped)
Type Safe C – the CCured Approach: The Inference Engine • CCured analyzes declarations, type casts, and expressions to infer pointer types. • Each declaration or cast produces a set of constraints based on a list of rules. For example, an assignment of the form (type1 *)a = (type2 *)b will produce the following constraints: • WILD a <=> WILD b • SEQ b => SEQ a • SEQ a ^ SEQ b => a[n] ≈ b[n’] • …
Type Safe C – the CCured Approach: The Inference Engine • Another example: for pointer arithmetic such as (type1 *)a++ the sole constraint generated is • a != SAFE (because SAFE pointers cannot undergo pointer arithmetic) • After the constraints are constructed for the whole program, the engine solves them, making each pointer SAFE if the constraints allow it, otherwise SEQ, and WILD only as a last resort.
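To make the "SAFE, then SEQ, then WILD" solving step more tangible, here is a minimal sketch of a fixed-point solver over the constraints as they are listed on these slides. It is a simplification and not the authors' algorithm: pointers are just array indices, the rule names (ARITH, BAD_CAST, ASSIGN) and all function names are hypothetical, and the type-compatibility constraint (a[n] ≈ b[n’]) and the handling of pointers to pointers are left out.

```c
#include <assert.h>
#include <stddef.h>

enum kind { SAFE = 0, SEQ = 1, WILD = 2 };   /* ordered from cheapest to costliest */
enum rule { ARITH, BAD_CAST, ASSIGN };

struct constraint {
    enum rule rule;
    size_t a;   /* the pointer (destination for ASSIGN) */
    size_t b;   /* the second pointer; unused for ARITH */
};

/* Raise *k to at least the given kind; report whether it changed. */
static int raise_to(enum kind *k, enum kind at_least)
{
    if (*k >= at_least) return 0;
    *k = at_least;
    return 1;
}

/* Start with every pointer SAFE and raise kinds until nothing changes. */
void infer(enum kind *kinds, size_t nptr,
           const struct constraint *cs, size_t ncs)
{
    assert(kinds != NULL);
    for (size_t i = 0; i < nptr; i++) kinds[i] = SAFE;

    int changed = 1;
    while (changed) {
        changed = 0;
        for (size_t i = 0; i < ncs; i++) {
            const struct constraint *c = &cs[i];
            switch (c->rule) {
            case ARITH:      /* a != SAFE */
                changed |= raise_to(&kinds[c->a], SEQ);
                break;
            case BAD_CAST:   /* incompatible cast: both become WILD */
                changed |= raise_to(&kinds[c->a], WILD);
                changed |= raise_to(&kinds[c->b], WILD);
                break;
            case ASSIGN:     /* WILD a <=> WILD b, SEQ b => SEQ a */
                if (kinds[c->a] == WILD || kinds[c->b] == WILD) {
                    changed |= raise_to(&kinds[c->a], WILD);
                    changed |= raise_to(&kinds[c->b], WILD);
                } else if (kinds[c->b] == SEQ) {
                    changed |= raise_to(&kinds[c->a], SEQ);
                }
                break;
            }
        }
    }
}
```

Because kinds only ever move upward through three values, the loop terminates. The sketch also shows the "chain reaction" effect: a single BAD_CAST constraint makes every pointer connected to it by assignments WILD.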
Type Safe C – the CCured Approach: The Run-Time Engine • When a pointer is dereferenced, read from, or written to, CCured inserts additional checking code based on rules for each type of pointer reference. • The simplest rule, for the dereference of a SAFE pointer, is: *x => assert(x != NULL); *x; • For WILD pointers and pointers to pointers, the rules become significantly more complex and time consuming.
Type Safe C – the CCured Approach: Type Safe? • Clearly, SAFE and SEQ pointers are reasonably type safe. However, the type safety provided by WILD pointers is actually somewhat minimal. • The only real type checking is that WILD pointers embedded in WILD allocations are valid (via the flag bits). This is sufficient to provide memory safety, but not absolute type safety. • For example, in CCured it is still possible to write int values into doubles, or read data across variable boundaries, etc. • This seems an appropriate compromise, however, because the ability to reinterpret areas of memory arbitrarily is an important part of the C language.
Type Safe C – the CCured Approach: Compatibility • CCured has poor compatibility with external libraries when WILD pointers are involved. • The CCured WILD metadata is hard to reconcile with the bare pointers expected by the libraries. • The authors’ solution is to add wrapper functions around most external library calls to handle the conversion. The function calls are fixed up by CCured to use the wrappers, which cast the pointers properly and may enforce other restrictions (for example, checking for sufficient space in a memory region before calling strncpy). • CCured comes with wrappers for most operating system calls on the operating systems it supports, as well as for many standard C library calls.
Type Safe C – the CCured Approach: Compatibility • Wrappers work fairly well when passing simple pointers. Unfortunately, even this approach is inadequate for many complex data structures. For example, how do you strip the metadata out of a two-dimensional WILD ** array? Or a struct with a SEQ * in it? • The wrapper is forced to do a complicated deep copy of the entire data structure, both before and after the library call.
Type Safe C – the CCured Approach: Compatibility – SPLIT pointers • To improve compatibility, the authors introduce SPLIT and NOSPLIT data types. • SPLIT data types store their metadata away from the pointer itself. • For example, consider the structure:

    struct { int v; int SPLIT * SEQ p; };

• Without the SPLIT, the standard CCured representation of this type includes base and limit metadata as part of pointer p, but no external library would expect that. SPLIT keeps the additional metadata stored elsewhere in a separate table.
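One way to picture the SPLIT idea is a data struct that keeps exactly the layout a library expects, with the bounds held in a parallel struct. The sketch below is an assumption-laden illustration rather than CCured's actual representation: the names record, record_meta, split_record, and split_read are hypothetical, and a failed check returns -1 where CCured would abort.

```c
#include <assert.h>
#include <stddef.h>

/* Plain C layout, exactly what an external library would expect. */
struct record {
    int  v;
    int *p;
};

/* Metadata for record.p, stored away from the data itself. */
struct record_meta {
    int *p_base;   /* first element of the block p points into */
    int *p_end;    /* one past the last element of that block */
};

/* Hypothetical SPLIT view: the data and its metadata travel side by side. */
struct split_record {
    struct record      *data;
    struct record_meta *meta;
};

/* Bounds-checked read of data->p[i], using the separately stored bounds.
   Assumes data->p already lies within [p_base, p_end]. */
int split_read(const struct split_record *r, ptrdiff_t i, int *out)
{
    assert(r != NULL && out != NULL);
    ptrdiff_t pos = (r->data->p - r->meta->p_base) + i;
    ptrdiff_t len = r->meta->p_end - r->meta->p_base;
    if (pos < 0 || pos >= len)
        return -1;
    *out = r->meta->p_base[pos];
    return 0;
}
```

Because struct record carries no extra fields, r->data could be handed to an external library unchanged. The sketch also hints at the remaining problem noted on the slides: if the library changes p, nothing updates record_meta.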
Type Safe C – the CCured Approach: Compatibility – SPLIT pointers • The inference system is further updated so that the programmer need only “seed” the type system by declaring a subset of SPLIT types explicitly, and CCured will figure out what other variables need to be SPLIT. • This unfortunately adds another layer of complexity, as SPLIT pointers are not allowed to point to NOSPLIT types, but NOSPLIT pointers are allowed to point to SPLIT types. For this reason it is beneficial to use SPLIT types sparingly. • The authors do not focus on the performance aspects of SPLIT, but it seems likely they are negative. • Unfortunately, some compatibility issues still remain, because library calls may modify memory regions, but they cannot update the CCured metadata. • Also, the authors’ current implementation does not support SPLIT WILD pointers, so it is very difficult to interface your code to an external library if you are using WILD pointers.
Type Safe C – the CCured Approach: Performance: CCured Optimizations • The authors have implemented several optimizations to improve performance. • FSEQ – For SEQ pointers that CCured can determine are only increased, and never decreased, CCured stores only the upper memory bound of the allocation with the pointer, and performs a single bounds check on each pointer reference, rather than two.
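The saving FSEQ offers over SEQ can be sketched as a forward-only pointer that carries just an upper bound. As before, this is an illustrative model under assumptions, not CCured's representation: the names fseq_char, fseq_from, fseq_advance, and fseq_read are hypothetical, the position is tracked as the distance left to the end of the block so that overshooting stays well defined in plain C, and the null check is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical FSEQ representation: only the upper bound is stored. */
struct fseq_char {
    const char *end;   /* one past the last byte of the allocation */
    ptrdiff_t   left;  /* bytes between the current position and end */
};

struct fseq_char fseq_from(const char *buf, size_t n)
{
    struct fseq_char s = { buf + n, (ptrdiff_t)n };
    return s;
}

/* Forward-only arithmetic: k is unsigned, so the pointer never moves back. */
struct fseq_char fseq_advance(struct fseq_char s, size_t k)
{
    s.left -= (ptrdiff_t)k;
    return s;
}

/* A single comparison suffices, because only the upper bound can be crossed.
   Returns 0 on success and -1 where CCured would report a failure. */
int fseq_read(struct fseq_char s, char *out)
{
    assert(out != NULL);
    if (s.left <= 0)
        return -1;
    *out = *(s.end - s.left);
    return 0;
}
```

Compared with a full SEQ pointer, this drops one stored word and one comparison per access, which is the trade the slide describes.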
Type Safe C – the CCured Approach: Performance: CCured Optimizations • STRING – For SEQ pointers which only increase (like FSEQ pointers) and represent null-terminated strings, the authors provide the STRING pointer. It stores no additional metadata beyond the pointer itself, just like SAFE pointers. When the length of the allocation is needed, strlen() is called. • Instead of checking against the memory boundaries with each access, it simply checks for the null terminator, which can significantly speed up string processing functions. • Memory safety is still ensured because CCured inserts another “secret” null terminator just after the string, which the program cannot overwrite.
Type Safe C – the CCured Approach: Performance: CCured Optimizations • RTTI – This type of pointer is a variant of the SAFE pointer, used to allow safe, high performance downcasts in the same way SAFE pointers allow upcasts, without forcing the pointers to become WILD. • What is needed is the equivalent of dynamic_cast in C++. • CCured creates a global tree data structure which stores the subtype relationships for the entire program. • RTTI pointers have an additional data field which identifies the node for the type in the global tree.
Type Safe C – the CCured Approach: Performance: CCured Optimizations • With RTTI, when an upcast is performed on a SAFE pointer, it can be cast to an RTTI pointer, which then dynamically records the original subtype of the SAFE pointer. • This allows us to safely downcast from an RTTI pointer to a SAFE pointer of the original subtype (but dynamically detect an improper downcast). • This introduces some additional overhead, but is still far preferable to the alternative, which is to convert both pointers to WILD pointers.
Type Safe C – the CCured Approach: Performance: CCured Optimizations • Unfortunately, the Inference Engine cannot automatically determine where RTTI pointers are desirable. • The user must explicitly define “seed” pointers as RTTI, and then the Inference Engine will propagate RTTI to other pointers as needed. • One drawback: the RTTI pointer must have initially been upcast from a SAFE pointer to the subtype in the first place. Otherwise the CCured run-time system has no information about what the actual subtype is, and downcasts cannot be allowed.
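The RTTI mechanism described on the last three slides (a global subtype tree plus a per-pointer type tag) can be sketched as follows. This is a minimal model, not CCured's implementation: the tree is represented by parent links only, all names (type_node, rtti_ptr, rtti_upcast, rtti_downcast) are hypothetical, and an improper downcast returns NULL where CCured would signal a run-time failure.

```c
#include <assert.h>
#include <stddef.h>

/* One node per type in the (hypothetical) global subtype tree. */
struct type_node {
    const char             *name;
    const struct type_node *parent;   /* NULL for a root type */
};

/* RTTI pointer: the pointer plus the node of the object's actual type. */
struct rtti_ptr {
    void                   *p;
    const struct type_node *dyn;
};

/* True if t is target or one of its descendants in the tree. */
static int is_subtype(const struct type_node *t, const struct type_node *target)
{
    for (; t != NULL; t = t->parent)
        if (t == target)
            return 1;
    return 0;
}

/* Upcasting a SAFE pointer records its static type as the dynamic type. */
struct rtti_ptr rtti_upcast(void *p, const struct type_node *actual)
{
    struct rtti_ptr r = { p, actual };
    return r;
}

/* A downcast is allowed only if the recorded type is a subtype of the target. */
void *rtti_downcast(struct rtti_ptr r, const struct type_node *target)
{
    assert(target != NULL);
    return is_subtype(r.dyn, target) ? r.p : NULL;
}
```

The drawback noted above shows up directly in this model: if the pointer was created with the base type's node, no later downcast to a subtype can succeed, because the tag never recorded that subtype.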
Type Safe C – the CCured Approach: Performance • The authors have found that performance varies greatly depending on the application. • For some computationally intensive applications, performance can be reduced by as much as 50%. • For many server applications, however, where security is highly important as well as performance, they found slowdowns of 10% or less. • They conclude that CCured has significantly better performance than other, fully dynamic secure C alternatives.
CCured – the Verdict: Positives • For most applications, particularly servers, where memory protection is perhaps the most critical, the performance appears to be excellent. Implementing a fast, memory safe C is a significant achievement. • I appreciate the flexibility of being able to write new code explicitly in CCured, or allowing the program to convert my code to CCured for me. • This also allows developers to keep making use of millions of lines of C code. • CCured could help C remain a first-class language on secure platforms such as Microsoft .NET.
CCured – the Verdict: Negatives • CCured wraps calls to almost every library function that uses pointers (which seems to be most of them) to handle its pointer metadata properly. If you have a lot of custom libraries, you will likely have to go through the time-consuming process of writing your own wrappers. • Getting the promised performance and compatibility may not be trivial. In my experiments I found all of my pointers being turned into WILD pointers (the slowest) more frequently than I expected. I think some significant manual tuning might be needed on large applications, particularly for RTTI and SPLIT pointers. • Because CCured relies on wrapping the library calls of the host operating system, the portability of CCured itself is limited. It was quite a bit of work for me to make a basic port of CCured to FreeBSD.
CCured – the Verdict: Conclusion • If you are willing to put in the effort to get your software working properly with CCured, it does deliver on its promises. • CCured has gotten a lot more complex from the first paper to the second. • An interesting idea would be to compile your entire OS from the bottom up with CCured. The CCured system has trouble fitting into a normal C world (hence all of the wrappers, SPLIT, RTTI, etc.), but if its protections were built into your system libraries (and your compiler) from the start, it would probably be much simpler to use.
Cqual: Extensible Type Qualifiers for C • This is a fun, straightforward tool to let you add your own arbitrary type qualifiers to C. • Cqual’s checking is performed entirely at compile time, using an inference engine somewhat similar to that of CCured. • The authors propose several applications of the tool, particularly relating to uncovering bugs in software.
Cqual: What are type qualifiers? • Of course, the C language already has two type qualifiers: const and volatile. These have specific meanings for C. The compiler checks to make sure you don’t mix qualifiers improperly. For example:

    int var = 0;
    const int * c;
    int * i;
    c = &var;
    i = c;

• This code will produce an error (or at least a warning), because you assigned a pointer to a const integer to a pointer to a non-const integer. The type was essentially the same, but the qualifiers were different.
Cqual: What are Cqual type qualifiers? • Cqual lets you add your own custom qualifiers (prefixed with a $) that can mean anything. Then it uses its inference engine (which works in a similar manner to the inference engine of CCured) to verify that your qualified types are being used correctly. • The qualifiers are treated similarly to the built-in qualifiers (with a few exceptions to be discussed in a bit), except that they have meaning only to the programmer. • This allows you to distinguish between types you know to be different, even though they appear the same to C.
Cqual: What are Cqual type qualifiers? • Another significant difference between Cqual qualifiers and the built-in C qualifiers is that you typically need only specify the qualifiers for a small subset of the variables. The inference engine will automatically determine the qualifiers for the other variables (similar to the way CCured determines RTTI pointers from a set of “seeded” RTTI pointers). • The authors point to this as key to Cqual’s ease of use.
Cqual: Example (from the Authors)

    /* User-controlled strings can represent a security threat when used
       as an argument to *printf. This code introduces two qualifiers,
       $tainted and $untainted, to isolate such strings so they cannot be
       used as arguments to printf. */

    $tainted char *getenv(const char *name);
    int printf($untainted const char *fmt, ...);

    int main(void)
    {
        char *s, *t;
        s = getenv("LD_LIBRARY_PATH");
        t = s;
        printf(t);
    }

    /* Notice that it was not necessary to qualify char *s and *t. The
       inference engine will assign them the correct qualifiers
       automatically and generate a type mismatch error on the call to
       printf(). */
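The way the qualifiers in this example reach printf can be imitated with a small flow propagation in plain C. This is only a sketch of the idea, not Cqual's inference algorithm: variables are array indices, each assignment is a hypothetical flow edge from source to destination, and find_taint_error is an invented name. It marks everything reachable from a tainted source and reports the first position that must be untainted but is not.

```c
#include <assert.h>
#include <stddef.h>

/* One edge per assignment or argument passing: the value of `from` flows into `to`. */
struct flow {
    size_t from;
    size_t to;
};

/* tainted[v] is 1 for seeded $tainted sources and is updated in place.
   needs_untainted[v] is 1 for positions declared $untainted.
   Returns the index of the first violated position, or -1 if there is none. */
long find_taint_error(unsigned char *tainted,
                      const unsigned char *needs_untainted,
                      size_t nvars,
                      const struct flow *flows, size_t nflows)
{
    assert(tainted != NULL && needs_untainted != NULL);

    /* Propagate taint along the flow edges until nothing changes. */
    int changed = 1;
    while (changed) {
        changed = 0;
        for (size_t i = 0; i < nflows; i++) {
            if (tainted[flows[i].from] && !tainted[flows[i].to]) {
                tainted[flows[i].to] = 1;
                changed = 1;
            }
        }
    }

    /* A tainted value in an $untainted position is a qualifier mismatch. */
    for (size_t v = 0; v < nvars; v++)
        if (tainted[v] && needs_untainted[v])
            return (long)v;
    return -1;
}
```

For the program above, one could number the result of getenv as 0, s as 1, t as 2, and the fmt parameter of printf as 3, with flows 0 to 1, 1 to 2, and 2 to 3. Seeding only variable 0 as tainted is then enough for the mismatch at position 3 to be found, without annotating s or t.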