200 likes | 372 Views
Run-Time Type Checking for Pointers and Arrays in C. Wes Weimer, George Necula Scott McPeak, S.P. Rahul, Raymond To. What are we doing?. Add run-time checks to C programs Catch pointer and array errors Minimal user effort More effort yields more performance Make C “feel” as safe as Java.
E N D
Run-Time Type Checkingfor Pointers and Arrays in C Wes Weimer, George Necula Scott McPeak, S.P. Rahul, Raymond To OSQ Retreat
What are we doing? • Add run-time checks to C programs • Catch pointer and array errors • Minimal user effort • More effort yields more performance • Make C “feel” as safe as Java OSQ Retreat
Motivation • 50% of software errors are due to pointers • 50% of security errors due to buffer overruns • Such errors are often hard to reproduce • Difficult to locate true source of errors OSQ Retreat
Overview • Motivation and System Goals • Checkable Errors • Run-Time Representation • Static Analysis • Preliminary Results • Future Work OSQ Retreat
Goals • Support existing C code • Compatibility with external libraries • Handle GCC/MSVC source, Makefiles • Efficiency: 50% overhead rather than 1000% • Research: 5x, Purify 10x, BoundsChecker 150x • Default: many checks • Reduce by static analysis and/or user annotations OSQ Retreat
Checkable Errors • Array and pointer bounds checks • Well-understood • Dereferencing a non-pointer (or NULL) • Complicated by casts and unions • Pointer arithmetic outside of object bounds • Not always caught by Purify, etc. • Freeing non-pointers, using freed memory OSQ Retreat
Required Information • Checks require information about pointers • Length, base, capabilities, etc. • Can be stored in a global table • High table-lookup overhead: 500% • Can be stored with each pointer • struct { Foo *p; Foo *base; Foo *end; } SafeFoo • Library compatibility is tricky OSQ Retreat
Must keep track of which locations are valid pointers Use per-object tags (like in GC) int **X; int *Y; *Y = 55; // OK X = Y; printf(“%d”,**X); // CRASH! More is Needed: Tags OSQ Retreat
Run-Time Representation • Associate with each object in memory: • Base (lower bound), End (upper bound) • Tags (bitfield: 1 bit per word: is it a valid pointer?) • Checks bounds on every access, check tags on pointer reads, set tags on every write • Example: struct { int x; int *y; } *p; base p 0 1 end x y tags OSQ Retreat
Kinds of Pointers • Many pointers only move forward (no casts) • Notably C strings: for (; *p; p++) if *p==‘c’ … • Such “forward” pointers need only an end bound • Many pointers are not involved in evil casts • But may use pointer arithmetic: arrays • Such “index” pointers need not carry tags OSQ Retreat
Kinds of Pointers • Many pointers are completely “safe” • No evil casts, no arithmetic, etc. • e.g., FILE * fin = fopen(“input”, “r”); • These can be represented without any extra information (just a NULL check when used) • These cases yield better performance! OSQ Retreat
Physical Subtyping • Define a formal notion of representation equality and subtyping for casts • Keep pointers and scalars separate! • Intuition: struct {char a[4];} = struct {int x;} struct {char a[4];} ¹ struct {int *x;} struct {int a; int b;} £ struct {int a;} OSQ Retreat
Extended Type System • Simplified C types: • t ::= int | t ref q | t1t2 | t1+t2 -- Types • q ::= safe | string | seq | wild -- Qualifiers • safe = one word: standard C pointer • seq = three words: pointer, base, end • wild = two words: pointer, base, end, tags OSQ Retreat
Type System (continued) • t ref wild, t must contain only wild pointers • May cast between safe and seq • wild may only be cast to or from wild • Physical equivalence: • shortshort = int a(b+c)= (ab)+(ac) • Width subtyping: t1t2£t1 OSQ Retreat
Some Typing Rules address of field pointer arithmetic OSQ Retreat
Handling Casts cast between pointers When is t1 ref q1£t2 ref q2 ? OSQ Retreat
Static Analysis & Inference • For every pointer in the program • Try to infer the fastest safe representation • This is like eliminating classes of run-time checks we know will never fail • Can be formulated as constraint-solving • Apply subtyping rules to casts to get constraints • O(E) where E is number of casts/assignments (flow insensitive) OSQ Retreat
Preliminary Results OSQ Retreat
Future Work • Encode type information at run-time • More expressive casts with low overhead • More complete handling of function pointers • Handle C polymorphism • Uses void*, requires vast overhead • Efficient memory management • GC (or something else) takes free() as a hint OSQ Retreat
Conclusion • Can add efficient run-time checks to C • Check bounds, valid pointers, frees, etc. • Static analysis is fast and useful • Can support existing C code • Whole programs are considered • safe pointers and wrappers for libraries • Default to many checks, infer them away OSQ Retreat