CIL: Infrastructure for C Program Analysis and Transformation George C. Necula, Scott McPeak, S. P. Rahul, Westley Weimer http://www.cs.berkeley.edu/~necula/cil ETAPS – CC ’02 Friday, April 12
What is CIL? • Distills C language • into a few key forms • with precise semantics • Parser + IR + Program Merger for C • Maintains types, close ties to source • Highly structured, clean subset of C • Handles ANSI/GCC/MSVC
Why CIL? • Analyses and Transformations • Easy to use • impersonates compiler & linker • $ make project CC=cil • Easy to work with • converts away tricky syntax • leaves just the heart of the language • separates concepts
C Feature Separation • CIL separates language components • pure expressions • statements with side-effects • control-flow • embedded CFG • Keeps all programmer names • temps serialize side-effects • simplified scoping
Example: C Lvalues • An exp referring to a region of storage • Example: rec[1].fld[2] • May involve 1, 2, 3 memory accesses • 1 if rec and fld are both arrays • 2 if either one is a pointer • 3 if rec and fld are both pointers • Syntax (AST) is insufficient
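The access counts above can be checked by hand in plain C. The following is a minimal sketch, not CIL output: the struct names, the `pad` field, and the function names are all made up for illustration. Each function evaluates the same source expression `rec[1].fld[2]` under different declarations and spells out the loads needed to reach the element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical declarations: only the kind of `fld` differs. */
struct with_array   { int pad; int fld[3]; };
struct with_pointer { int pad; int *fld; };

/* rec is an array and fld is an array: the address is rec's own address
   plus a constant, so only the element itself is read (1 access). */
int one_access(void) {
    struct with_array rec[2] = {{0, {1, 2, 3}}, {0, {4, 5, 6}}};
    char *addr = (char *)rec + 1 * sizeof(struct with_array)
               + offsetof(struct with_array, fld) + 2 * sizeof(int);
    assert((int *)addr == &rec[1].fld[2]);
    return rec[1].fld[2];
}

/* rec is an array but fld is a pointer: fld must be loaded first. */
int two_accesses(void) {
    int d0[3] = {1, 2, 3}, d1[3] = {4, 5, 6};
    struct with_pointer rec[2] = {{0, d0}, {0, d1}};
    int *q = rec[1].fld;              /* access 1: load the pointer fld */
    assert(q + 2 == &rec[1].fld[2]);
    return q[2];                      /* access 2: load the element */
}

/* rec and fld are both pointers: two loads find the address, one reads it. */
int three_accesses(void) {
    int d0[3] = {1, 2, 3}, d1[3] = {4, 5, 6};
    struct with_pointer storage[2] = {{0, d0}, {0, d1}};
    struct with_pointer *rec = storage;
    struct with_pointer *p = rec + 1; /* access 1: load the pointer rec */
    int *q = p->fld;                  /* access 2: load the pointer fld */
    assert(q + 2 == &rec[1].fld[2]);
    return q[2];                      /* access 3: load the element */
}
```

All three functions return the same element value; what changes is how many loads sit behind the identical syntax, which is the information the AST alone does not expose.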
CIL Lvalues • An exp referring to a region of storage • lval ::= <base × offset> • base ::= Var(varinfo) | Mem(exp) • offset ::= None | Field(f × offset) | Index(exp × offset)
CIL Lvalues • Example: rec[1].fld[2] becomes either: <Var(rec), Index(1, Field(fld, Index(2, None)))> or: <Mem(2 + Lvalue(<Mem(1 + Lvalue(<Var(rec), None>)), Field(fld, None)>)), None> • Full static and operational semantics
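To make the two forms concrete, here is a rough C rendering of the lvalue grammar together with a function that counts memory accesses. This is only a sketch under stated assumptions: CIL itself is not written this way, every type and function name below is hypothetical, and the counting rule (one access for the lvalue itself, plus one for every lvalue read inside a Mem address or an Index expression) is a simplification chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Exp Exp;
typedef struct Offset Offset;
typedef struct Lval Lval;

struct Offset {                 /* offset ::= None | Field | Index */
    enum { OFF_NONE, OFF_FIELD, OFF_INDEX } kind;
    const char *field;          /* OFF_FIELD: field name */
    const Exp *index;           /* OFF_INDEX: index expression */
    const Offset *rest;         /* remaining offset */
};

struct Lval {                   /* lval ::= <base, offset> */
    enum { BASE_VAR, BASE_MEM } base;
    const char *var;            /* BASE_VAR: variable name */
    const Exp *addr;            /* BASE_MEM: address expression */
    const Offset *off;
};

struct Exp {                    /* just enough expressions for the example */
    enum { EXP_CONST, EXP_READ, EXP_ADD } kind;
    int value;                  /* EXP_CONST */
    const Lval *lval;           /* EXP_READ: read of an lvalue */
    const Exp *lhs, *rhs;       /* EXP_ADD */
};

int lval_accesses(const Lval *lv);

/* Number of memory reads needed to evaluate an expression. */
static int exp_loads(const Exp *e) {
    if (e->kind == EXP_READ) return lval_accesses(e->lval);
    if (e->kind == EXP_ADD) return exp_loads(e->lhs) + exp_loads(e->rhs);
    return 0;                   /* EXP_CONST */
}

/* One access for the region itself, plus the reads hidden in its address. */
int lval_accesses(const Lval *lv) {
    int n = 1;
    if (lv->base == BASE_MEM) n += exp_loads(lv->addr);
    for (const Offset *o = lv->off; o && o->kind != OFF_NONE; o = o->rest)
        if (o->kind == OFF_INDEX) n += exp_loads(o->index);
    return n;
}

static const Exp one = { .kind = EXP_CONST, .value = 1 };
static const Exp two = { .kind = EXP_CONST, .value = 2 };
static const Offset none = { .kind = OFF_NONE };

/* <Var(rec), Index(1, Field(fld, Index(2, None)))> */
int accesses_both_arrays(void) {
    static const Offset idx2 = { .kind = OFF_INDEX, .index = &two, .rest = &none };
    static const Offset fld = { .kind = OFF_FIELD, .field = "fld", .rest = &idx2 };
    static const Offset idx1 = { .kind = OFF_INDEX, .index = &one, .rest = &fld };
    static const Lval lv = { .base = BASE_VAR, .var = "rec", .off = &idx1 };
    return lval_accesses(&lv);
}

/* <Mem(2 + Lvalue(<Mem(1 + Lvalue(<Var(rec), None>)), Field(fld, None)>)), None> */
int accesses_both_pointers(void) {
    static const Lval rec = { .base = BASE_VAR, .var = "rec", .off = &none };
    static const Exp rec_read = { .kind = EXP_READ, .lval = &rec };
    static const Exp in_addr = { .kind = EXP_ADD, .lhs = &one, .rhs = &rec_read };
    static const Offset fld = { .kind = OFF_FIELD, .field = "fld", .rest = &none };
    static const Lval inner = { .base = BASE_MEM, .addr = &in_addr, .off = &fld };
    static const Exp inner_read = { .kind = EXP_READ, .lval = &inner };
    static const Exp out_addr = { .kind = EXP_ADD, .lhs = &two, .rhs = &inner_read };
    static const Lval outer = { .base = BASE_MEM, .addr = &out_addr, .off = &none };
    return lval_accesses(&outer);
}
```

Under this rule the Var form costs one access and the nested Mem form costs three, matching the "both arrays" and "both pointers" cases for rec[1].fld[2].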
Semantics • CIL gives syntax-directed semantics • [Figure: example judgment, with its environment, lvalue form, and meaning labeled]
CIL Source Fidelity • Source: typedef struct { int fld[3]; } * Myptr; Myptr rec; rec[2].fld[1] = 'h'; • CIL output: struct __anonstruct1 { int fld[3] ; }; typedef struct __anonstruct1 * Myptr; Myptr rec; (rec + 2)->fld[1] = (int)'h'; • SUIF 2.2.0-4 output: typedef int __ar_1[3]; struct type_1 { __ar_1 fld; }; struct type_1 * rec; (((((int *)(((char *)&((((struct type_1 *) (rec))))[2])+0U))))[1]) =(104);
Corner Cases • Your analysis will not have to handle: • return ({goto L; p;}) && ({L: 5;}); • return &(--x ? : z) - & (x++, x); • Full handling of • GNU-isms, MSVC-isms • attributes • initializers
Corner Cases • Your analysis will not have to handle: • return ({goto L; p;}) && ({L: 5;}); • which becomes: int tmp; goto L; if (p) { L: tmp = 1; } else { tmp = 0; } return tmp;
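The same idea can be tried on an expression whose behavior is fully defined in standard C. The sketch below is hand-written, not actual CIL output, and all function names are invented for illustration. It takes an expression that mixes side effects with short-circuit evaluation and rewrites it so that each side effect is its own statement and the `&&` becomes explicit control flow through a temporary.

```c
#include <assert.h>

/* Original shape: two side effects and a short-circuit in one expression. */
int original(int *x, int *z) {
    return (*x)++ && --(*z);
}

/* Hand-simplified shape: side effects are serialized into statements,
   and the result of && is carried in a temporary. */
int simplified(int *x, int *z) {
    int tmp;
    int old_x = *x;          /* value of (*x)++ is the old value */
    *x = old_x + 1;
    if (old_x != 0) {
        *z = *z - 1;         /* --(*z) runs only when the left side is true */
        if (*z != 0) {
            tmp = 1;
        } else {
            tmp = 0;
        }
    } else {
        tmp = 0;
    }
    return tmp;
}

/* Self-check: both shapes return the same value and leave the same state. */
void check_equivalent(void) {
    for (int x = -1; x <= 1; x++) {
        for (int z = 0; z <= 2; z++) {
            int x1 = x, z1 = z, x2 = x, z2 = z;
            int r1 = original(&x1, &z1);
            int r2 = simplified(&x2, &z2);
            assert(r1 == r2 && x1 == x2 && z1 == z2);
        }
    }
}
```

An analysis that only sees the second shape never has to reason about evaluation order inside an expression.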
StackGuard Transform • Cowan et al., USENIX ’98 • Buffer overrun defense • push return address on private stack • pop before returning • only change functions with local arrays • 40 lines of commented code with CIL • Quite easy: uses visitors for tree replacement, explicit returns, etc.
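As a rough picture of what the instrumented code might look like, here is a hand-written C sketch of one function with a local array. It is not the output of the CIL transform: the private-stack helpers and `copy_name` are hypothetical, `__builtin_return_address` is a GCC/Clang extension rather than standard C, and the sketch only compares the saved address on exit instead of restoring it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHADOW_DEPTH 64

/* Private stack of return addresses, kept away from the local arrays. */
static void *shadow_stack[SHADOW_DEPTH];
static size_t shadow_top = 0;

static void shadow_push(void *ret) {
    assert(shadow_top < SHADOW_DEPTH);
    shadow_stack[shadow_top++] = ret;
}

static void *shadow_pop(void) {
    assert(shadow_top > 0);
    return shadow_stack[--shadow_top];
}

/* Current depth of the private stack; 0 when every push was matched. */
size_t shadow_depth(void) {
    return shadow_top;
}

/* A function with a local array, instrumented by hand. */
size_t copy_name(const char *src) {
    shadow_push(__builtin_return_address(0));   /* inserted at entry */

    char buf[16];
    strncpy(buf, src, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    size_t n = strlen(buf);

    /* Inserted before the single explicit return: the address saved at
       entry must still match the one this frame will return to. */
    void *saved = shadow_pop();
    assert(saved == __builtin_return_address(0));
    return n;
}
```

Having one explicit return per function is what makes the exit-side insertion straightforward, since there is exactly one place to put the pop.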
Other Transforms • Instrument and log all calls: 150 lines • Eliminate break, continue, switch: 110 • 1 memory access per assignment: 100 • Make each function have a single return statement: 90 • Make all stack arrays heap-allocated: 75 • Log all value/addr memory writes: 45
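To illustrate one item from the list, the single-return rewrite can be pictured as follows. This is a hand-written before/after pair with made-up function names, not the output of the 90-line CIL transform: every `return e;` turns into an assignment to a result variable followed by a jump to one shared exit.

```c
#include <assert.h>

/* Before: three return statements. */
int sign_multi(int x) {
    if (x < 0) return -1;
    if (x == 0) return 0;
    return 1;
}

/* After: each return is an assignment plus a jump to the single exit. */
int sign_single(int x) {
    int result;
    if (x < 0) { result = -1; goto done; }
    if (x == 0) { result = 0; goto done; }
    result = 1;
done:
    return result;
}

/* Self-check: the two shapes agree on a small range of inputs. */
void check_sign(void) {
    for (int x = -3; x <= 3; x++) {
        assert(sign_multi(x) == sign_single(x));
    }
}
```

A later pass that needs to run code on function exit then has only the `done` label to instrument.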
Whole-Program Merger • C has incremental linking, compilation • coupled with a weak module system! • Example (vortex / gcc / c++2c): /* foo.c */ struct list { int head; struct list * tail; }; struct list * mylist; /* bar.c */ struct chain { int head; struct chain * tail; }; extern struct chain * mylist;
Merging a Project • Determine what files to merge • Merge the files • handle file-scoped identifiers • C uses name equivalence for types • but modules need structural equivalence • Key: Each global identifier has 1 type!
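The structural check that lets `struct list` in one file match `struct chain` in another can be sketched in C as follows. This is an assumption-laden illustration, not the merger's actual algorithm: the `Type` descriptor, the fixed field limit, and all function names are hypothetical. Struct tags are ignored, field names and field types are compared in order, and a pair of structs that is already being compared is assumed equal so that recursive types terminate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FIELDS 4
#define MAX_ASSUMED 32

typedef enum { T_INT, T_PTR, T_STRUCT } Kind;

typedef struct Type Type;
struct Type {
    Kind kind;
    const char *tag;                 /* struct tag; ignored when comparing */
    const Type *target;              /* T_PTR: pointee type */
    size_t nfields;                  /* T_STRUCT: number of fields */
    const char *fnames[MAX_FIELDS];  /* T_STRUCT: field names, in order */
    const Type *ftypes[MAX_FIELDS];  /* T_STRUCT: field types, in order */
};

typedef struct { const Type *a, *b; } Pair;

static int equal_rec(const Type *a, const Type *b, Pair *assumed, size_t *n) {
    if (a == b) return 1;
    if (a->kind != b->kind) return 0;
    if (a->kind == T_INT) return 1;
    if (a->kind == T_PTR) return equal_rec(a->target, b->target, assumed, n);

    /* T_STRUCT: a pair already under comparison is assumed equal. */
    for (size_t i = 0; i < *n; i++)
        if (assumed[i].a == a && assumed[i].b == b) return 1;
    if (a->nfields != b->nfields || *n == MAX_ASSUMED) return 0;
    assumed[*n].a = a;
    assumed[*n].b = b;
    (*n)++;
    for (size_t i = 0; i < a->nfields; i++) {
        if (strcmp(a->fnames[i], b->fnames[i]) != 0) return 0;
        if (!equal_rec(a->ftypes[i], b->ftypes[i], assumed, n)) return 0;
    }
    return 1;
}

int structurally_equal(const Type *a, const Type *b) {
    Pair assumed[MAX_ASSUMED];
    size_t n = 0;
    return equal_rec(a, b, assumed, &n);
}

/* Builds `struct <tag> { int head; struct <tag> *tail; }` in caller storage. */
static void make_list(Type *s, Type *ptr, const Type *int_t, const char *tag) {
    *ptr = (Type){ .kind = T_PTR, .target = s };
    *s = (Type){ .kind = T_STRUCT, .tag = tag, .nfields = 2,
                 .fnames = {"head", "tail"}, .ftypes = {int_t, ptr} };
}

/* struct list * (foo.c) against struct chain * (bar.c). */
int list_matches_chain(void) {
    Type int_t = { .kind = T_INT };
    Type list, list_ptr, chain, chain_ptr;
    make_list(&list, &list_ptr, &int_t, "list");
    make_list(&chain, &chain_ptr, &int_t, "chain");
    return structurally_equal(&list_ptr, &chain_ptr);
}

/* struct list against struct flat { int head; int tail; }. */
int list_matches_flat(void) {
    Type int_t = { .kind = T_INT };
    Type list, list_ptr;
    make_list(&list, &list_ptr, &int_t, "list");
    Type flat = { .kind = T_STRUCT, .tag = "flat", .nfields = 2,
                  .fnames = {"head", "tail"}, .ftypes = {&int_t, &int_t} };
    return structurally_equal(&list, &flat);
}
```

With a check of this kind, the two declarations of `mylist` can be given a single type in the merged file even though their struct tags differ.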
Other Merger Details • Remove duplicate declarations • every file includes <stdio.h> • Match struct pointer with no defined body in file A to defined body in file B • Be careful when picking representatives
How Does it Work? • Make project, pass all files through CIL • Run your transform and analysis • Emit simplified C • Compile simplified C with GCC/MSVC • … and it works!
Large Programs • Used in the CCured and BLAST projects
Merged Kernel Stats • Stock monolithic Linux 2.4.5 kernel • http://manju.cs.berkeley.edu/cil/vmlinux.c • Statistics (before | after): • 324 files | one 12.5 MB file • 11.3 M-words | 1.5 M-words • 7.3 M-LOC (post-process) | 470 K-LOC • $ make CC="cil --merge" HOSTCC="cil --merge" LD="cil --merge" AR="cil --mode=AR --merge"
Conclusion • CIL distills C to a precise, simple subset • easy to analyze • well-defined semantics • close to the original source • Well-suited to complex analyses and source-to-source transforms • Parses ANSI/GCC/MSVC C • Rapidly merges large programs
Questions? • Try CIL out: • http://www.cs.berkeley.edu/~necula/cil • Complete source, documentation and test cases freely available