740 likes | 947 Views
Assume/Guarantee Reasoning using Abstract Interpretation. Nurit Dor Tom Reps Greta Yorsh Mooly Sagiv. Limitations of Whole Program Analysis. Complexity of Chaotic Iterations Not all the source code is available Large libraries Software components No interaction with the client
E N D
Assume/Guarantee Reasoningusing Abstract Interpretation Nurit Dor Tom Reps Greta Yorsh Mooly Sagiv
Limitations of Whole Program Analysis • Complexity of Chaotic Iterations • Not all the source code is available • Large libraries • Software components • No interaction with the client • Program design
A Motivating Example List rev(List x) requires acyclic(x) ensures $$=reverse(x) List rev(List x) { if (x ==null) return null ; return append(rev(xnext), x); } Contract List append(List x, List y) { List e; if (x == null) return y; e = malloc(…); edata = xdata; enext = append(xnext, y); } List append(List x, List y) requires acyclic(x) acyclic(y) ensures $$= x || y Contract Can also used for runtime testing
Challenges in A/G Reasoning • Specifying procedure contracts • Performing abstract interpretation using contracts
Specifying Contracts • Executable specifications • assert • Can use loops • Expressive • Natural • But what about side-effects • Declarative specifications • Types • First order logic • Z • Hybrid • Larch • Java Modeling Language
Procedure Contracts and Modularity • The postcondition does not reveal the whole story List foo(List x) requires acyclic(x) acyclic(y) ensures true void foo(List x, List z) { List y, t ; y = rev(x); t = rev(z); } List rev(List x) requires acyclic(x) ensures $$=reverse(x)
Procedure Contracts and Modularity • Specify parts of the state which may be modified • But difficult to define potential side-effects • Can use abstract interpretation List foo(List x) requires acyclic(x) acyclic(y) ensures true void foo(List x, List z) { List y, t ; y = rev(x); t = rev(z) } List rev(List x) requires acyclic(x) ensures $$=reverse(x)
Issues in Specifying Contracts • Expressible • Conciseness • Natural • Reuse • Cost of dynamic check (model checking) • Decidability • Cost of abstract interpretation
Plan • CSSV: A tool for verifying absence of buffer overruns (N. Dor) • An algorithm for performing abstract interpretation in the most precise way using specification
CSSV: Towards a Realistic Tool for Statically Detecting All Buffer Overflows in C Nurit Dor, Michael Rodeh, Mooly Sagiv DAEDALUSproject
Null dereference Dereference to unallocated storage Out of bound pointer arithmetic Out of bound update Vulnerabilities of C programs /* from web2c [strpascal.c] */ void foo(char *s) { while ( *s != ‘ ‘ ) s++; *s = 0; }
Is it common? • General belief – yes! • FUZZ study • Test reliability by random input • Tens of applications on 9 different UNIX systems • 18% – 23% hang or crash • CERT advisory • Up to 50% of attacks are due to buffer overflow • COMMON AND DANGEROUS
CSSV’s Goals • Efficient conservative static checking algorithm • Verify the absence of buffer overflow • not just finding bugs • All C constructs • Pointer arithmetic, casting, dynamic memory, … • Real programs • Minimum false alarms
string(src) string(dst) (size > len(src)+len(dst)) alloc(dst+len(dst)) > len(src)) Verifying Absence of Buffer Overflow is non-trivial void safe_cat(char *dst, int size, char *src ) { if ( size > strlen(src) + strlen(dst) ) { dst = dst + strlen(dst); strcpy(dst, src); } } {string(src) string(dst)alloc(dst+len(dst)) > len(src)} {string(src) alloc(dst) > len(src)}
Complex linear relationships Pointer arithmetic Loops Procedures Use Polyhedra[CH78] Pointer analysis Widening Procedure contracts Can this be done for real programs? Very few false alarms!
V = { (1,2) (2,1) } R = { (1,0) (1,1) } 0 1 2 3 y 0 1 2 3 x Linear Relation Analysis • Cousot and Halbwachs, 78 • Statically analyze program variable relations: a1* var1 + a2* var2 + … + an* varn b • Polyhedron y 1 x + y 3-x + y 1
C String Static Verifier • Detects string violations • Buffer overflow (update beyond bounds) • Unsafe pointer arithmetic • References beyond null termination • Unsafe library calls • Handles full C • Multi-level pointers, pointer arithmetic, structures, casting, … • Applied to real programs • Public domain software • C code from Airbus
Plan • Semantics for C program • Contract language • Static analysis algorithm • Implementation
‘x’ 0x5050510 0x5050518 0 Standard C Semantics dst 0x480580 0x5058510 size 0x480584 125 void safe_cat( char *dst, int size, char *src ) { if ( size > strlen(src) + strlen(dst) ) { dst = dst + strlen(dst); strcpy(dst, src); } } src 0x480588 0x6000009 ‘y’ 0x6000009 0x6000A00 0
dst 0x480580 0x5058510 size 0x480584 125 src 0x480588 ‘x’ 0x5050510 0x5050518 0 ‘y’ 0x6000009 0x6000A00 0 Instrumented C Semantics base asize 4 4 0x6000009 4 130 245
‘x’ 0x5050510 0x5050518 0 Instrumented C Semantics base asize offset dst 0x480580 0x5058510 4 0 size 0x480584 4 125 src 0x480588 0x6000009 4 9 130 0x6000000 245 ‘y’ 0x6000009 0x6000A00 0
i offset(dst) asize(base(dst)) base(dst) dst Safety • The instrumented semantics checks validity of C expressions • ANSI C • Cleanness • dst = dst + i offset(dst) + i asize(base(dst))
Contracts • Defined in the instrumented semantics • Specify string behavior of procedures (C expressions) • Precondition • Postcondition • Use of values at procedure entry • Side-effects • Can be approximated from pointed information • No need to specify pointer information • Not aiming for modular pointer analysis
Contracts’ Advantages • Modular analysis • Use contracts on call statements • Not all the code is available • Enable more expensive analyses • User control of the verification • Detect errors at point of logical error • Improve the precision of the analysis • Check additional properties • Beyond ANSI-C
Example char* strcpy(char* dst, char* src) requires mod ensures ( string(src) alloc(dst) > len(src) ) dst ( len(dst) = = [len(src)]pre return = = [dst]pre )
safe_cat’s contract void safe_cat(char* dst, int size, char* src) requires mod ensures ( string(src) string(dst) alloc(dst) == size ) dst ( len(dst) <= [len(src)]pre + [len(dst)]pre len(dst) >= [len(dst)]pre )
Contracts and Soundness • All errors are detected • Violation of statement’s precondition • …a[i]… • Violation of procedure’s precondition • Call • Violation of procedure's postcondition • Return • Violation messages depend on the contracts • But may lead to more false alarms (e.g., trivial contracts)
CSSV Static Analysis • Inline contracts • Expose behavior of called procedures • Pointer analysis (global) • Find relationship between base addresses • Integer analysis • Compute offset information
Step 1: Inliner void safe_cat( char *dst, int size, char *src ) requires ( string(src) string(dst) alloc(dst) == size) mod dst ensures ( len(dst) = = [pre@len(src)]pre + [len(dst)]pre ) void safe_cat( char *dst, int size, char *src ) { … strcpy(dst, src); … } char* strcpy( char *dst, char *src ) requires ( string(src) alloc(dst) > len(src)) mod dst ensures ( len(dst) = = [len(src)]pre return = = [dst]pre )
Step 1: Inliner void safe_cat( char *dst, int size, char *src ) requires ( string(src) string(dst) alloc(dst) == size) mod dst ensures ( len(dst) = = [pre@len(src)]pre + [len(dst)]pre ) void safe_cat( char *dst, int size, char *src ) { … strcpy(dst, src); … } assume assert char* strcpy( char *dst, char *src ) requires ( string(src) alloc(dst) > len(src)) mod dst ensures ( len(dst) = = [len(src)]pre return = = [dst]pre )
Step 1: Inliner void safe_cat( char *dst, int size, char *src ) requires ( string(src) string(dst) alloc(dst) == size) mod dst ensures ( len(dst) = = [pre@len(src)]pre + [len(dst)]pre ) void safe_cat( char *dst, int size, char *src ) { … strcpy(dst, src); … } assert char* strcpy( char *dst, char *src ) requires ( string(src) alloc(dst) > len(src)) mod dst ensures ( len(dst) = = [len(src)]pre return = = [dst]pre ) assume
Step 2: Compute Pointer Information • Required for reasoning about pointers • Every base address is abstracted by an abstract location • Relationships between base addresses is computed (points-to) • Global analysis • Scalable • Imprecise • Flow insensitive • (Almost) Context insensitive
Global Points-To safe_cat( char *dst, int size, char *src ) { … strcpy(dst, src); … } dst src main() { char s[10], t[20],r; char *p1, *p2; … p1= r + i; safe_cat(s,10,p1); p2 = r + j; safe_cat(t,10,p2); … } s t r p2 p1
Procedural Points-to (PPT) • “Project” pointer information on visible variables of the procedure • Introduce abstract locations for formal parameters • Allow destructive updates through formal parameters (well behaved programs) • Can decrease precision in some procedures
PPT dst src safe_cat( char *dst, int size, char *src ) { … strcpy(dst, src); … } Param #1 Param # 2
Step 3: Static Analysis • Prove linear inequalities on string indices • Abstract string properties using constraint variables • Use abstract interpretation to conservatively interpret program statements • Verify safety preconditions
‘x’ 0x5050510 0x5050518 0 Back to Semantics base asize offset dst 0x480580 0x5058510 4 0 size 0x480584 4 125 src 0x480588 0x6000009 4 9 130 0x6000000 245 ‘y’ 0x6000009 0x6000A00 0
‘x’ 0x5050510 0x5050518 0 Abstract Representation dst 0x480580 dst 0x5058510 size 0x480584 125 src 0x480588 size 0x6000009 src n1 0x6000000 n2 ‘y’ 0x6000009 Base address relationship 0x6000A00 0
Constraint Variables • For every abstract location a.offset src src.offset = 9
Constraint Variables • For every integer abstract location a.val size size.val = 125
Constraint Variables • For every abstract location a.is_nullt a.len a.asize n1.len n1.asize n1 0
Abstract Representation dst.offset < n1.len dst size size.val+ dst.offset = n1.asize src n1.is_nullt = true n1 n2 n2.is_nullt = true
n1.is_nullt = true dst.offset < n1.len size.val + dst.offset = n1.asize dst.offset n1.len n1.asize 0 size.val What does it represent? dst ? size ? ?
dst.offset = n1.len size.val = n1.asize - dst.offset + n1.len Abstract Interpretation dst.offset < n1.len size.val = n1.asize - dst.offset dst = dst + strlen(dst);
offset(dst) + i asize(base(dst)) dst.offset + i.val n1.asize dst i offset(dst) asize(base(dst)) dst.offset i n1.asize n1 base(dst) dst Verify Safety Condition dst = dst + i concrete semantics abstract semantics
The Assume-Operation • Use two copies of constraint variables • Set modified values to ⊤ • Meet the post
Procedure’sPointer info PreModPost Pointer Analysis Inliner C2IP Cfiles Cfiles Cfiles C’files Integer Procedure Potential Error Messages Integer Analysis CSSV Implementation contracts Procedure name
Used Software • ASToolKit [Microsoft] • Core C [TAU - Greta Yorsh] • GOLF [Microsoft - Manuvir Das] • New Polka [Inria - Bertrand Jeannet]
Applications • Verified string library from Airbus with 6 false alarms • Could be avoided by analyzing correlated conditions • Found 8 real errors in another string intensive application with 2 false alarms • In one case safety depends on correctness • Could be avoided by defensive programming • 1 - 206 CPU seconds per procedure • No optimizations • Very few false alarms
Related Work Non-Conservative • Wagner et. al. [NDSS’00] • LCLint’s extension [USENIX’01] • Eau Claire [IEEE Oakland 02] Conservative • Polyspace verifier • Dor, Rodeh and Sagiv [SAS’01]