Symbolic Analysis for Buffer Overflow Surinder Kumar Jain School of IT Supervisor : Bernhard Scholz University of Sydney, Australia
Buffer overflow threats Surinder Kumar Jain - suri@it.usyd.edu.au
Vulnerability Note VU#180513
Buffer Overflow
• Update/Read beyond bounds of buffer
• Results in
  • Erratic program behaviour
  • Program crashes
  • Security breaches
• Caused by
  • Array access outside array limits
  • Pointer reference errors
  • Array index errors
Array Access Errors
• Variable array index
• Modified in a loop
    i=0
    array a[b-1]
    while c < b and i>m and i<n
        i=i+1
        j=j+b
        d=2*d
        c=c+1
        a[c]=0 /*buffer overflow*/
        if e > 0
            f=f+2
        else
            e=g*e
Buffer overflow during the last iteration of the loop.
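The overflow in this loop can be reproduced by running it on concrete values. The sketch below is a minimal Python transcription, assuming that `array a[b-1]` declares an array whose last valid index is b-1. The function name `run_loop` and the decision to record an out-of-bounds index instead of writing to it are illustrative assumptions. The updates to j, d, e and f are left out because they do not influence the index c.

```python
def run_loop(b, c, m, n, i=0):
    """Concretely execute the loop and collect out-of-bounds indices.

    Assumption: the array has valid indices 0 .. b-1.
    """
    a = [None] * b
    violations = []
    while c < b and i > m and i < n:
        i += 1
        c += 1
        if 0 <= c < len(a):
            a[c] = 0
        else:
            # a[c] = 0 would write outside the array: record it instead
            violations.append(c)
    return violations


# The bound on c is reached first: the last iteration writes a[4] in a 4-element array.
print(run_loop(b=4, c=0, m=-1, n=10))
# The bound on i is reached first: the loop stops before c reaches b.
print(run_loop(b=4, c=0, m=-1, n=2))
```

Concrete runs like these only cover the inputs that are tried, which is the motivation for analysing the loop symbolically instead.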
Static Program Analysis
• Techniques
  • Data flow analysis
  • Abstract Interpretation
  • Model checking
  • Symbolic Analysis
• Analyse program behaviour
  • Without running the program
Symbolic Analysis & Execution
• Symbolic execution of the program
  • Execute a program with symbolic values
  • Symbolic domains, predicates, semantics
• Relate symbolic results to concrete interpretation
Array bounds violation Analysis
• Enumerate program paths in a loop
• For each program path, do
  • Symbolic execution
  • Compare array indices with array bounds
Example
    i=0
    array a[b-1]
    while c < b and i>m and i<n
        i=i+1
        j=j+b
        d=2*d
        c=c+1
        a[c]=0 /*buffer overflow*/
        if e > 0
            f=f+2
        else
            e=g*e
• Number of loop iterations = min(b-c0, n-m-1)
• Value of c during the i'th iteration (closed form of c) at line a[c]=0 is ci = c0+i
• Value of c in the final iteration is c = c0+(b-c0) = b
• Hence statement a[c]=0 with c=b causes a buffer overflow in the program path where n-m-1 >= b-c0
• c0 is the value of c at the start of the program
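The closed form ci = c0+i can be checked against a step-by-step run. In the sketch below, the bound that the conditions on i place on the iteration count is passed in as a plain integer `k`. That parameter and all three function names are assumptions made for illustration, not part of the slides.

```python
def final_c_closed_form(b, c0, k):
    """Final value of c and iteration count, using c_i = c0 + i.

    k is the maximum number of iterations allowed by the conditions on i.
    """
    iterations = max(0, min(b - c0, k))
    return c0 + iterations, iterations


def final_c_simulated(b, c0, k):
    """Same quantities, obtained by running the loop one step at a time."""
    c, done = c0, 0
    while c < b and done < k:
        c += 1
        done += 1
    return c, done


def overflows(b, c0, k):
    """True when the last executed a[c]=0 uses c == b."""
    c_final, iterations = final_c_closed_form(b, c0, k)
    return iterations > 0 and c_final == b
```

With b=4 and c0=0, `overflows` is true for k=10 (the bound on c stops the loop) and false for k=2 (the bound on i stops it first).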
Problem in General
• Undecidable
• Enumerate program paths
  • State explosion problem with loops
  • How to do it for general programs with GoTos
• Symbolic execution of a loop
  • Unknown number of repetitions
  • Conditional assignments inside the loop
Enumerating Program Paths
• Gulwani et al.
  • Non-deterministic semantics - no GoTos
• Burgstaller et al.
  • Path expressions algebra - with GoTos
  • Loops as black boxes
• Extend non-deterministic semantics to control flow graphs
• Loop paths analysis
• Algorithm to
  • Enumerate disjoint acyclic program paths for any program
Symbolic Execution
• Algorithm to
  • Do symbolic execution
  • Compute path condition
  • Eliminate invalid paths
• For each loop in a program path, solve
  • Closed form of loop induction variables
  • Symbolic loop counter
  • Number of loop iterations
Solving Program Loops
    i=0
    array a[b-1]
    while c<b and i>m and i<n
        i=i+1
        j=j+b
        d=2*d
        c=c+1
        a[c]=0 /*buffer overflow*/
        if e>0
            f=f+2
        else
            e=g*e
Recurrence system (for j): j(i+1) = j(i)+b
Solution is: j(i) = j(0)+i*b
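Recurrences of this shape have standard closed forms. The sketch below covers the affine case x(k+1) = a*x(k) + b with integer coefficients, which includes j=j+b (a=1) and d=2*d (a=2, b=0). The function names are hypothetical, and a computer algebra system would derive the formula symbolically rather than evaluate it on numbers as done here.

```python
def closed_form(x0, a, b, i):
    """Value after i steps of x(k+1) = a*x(k) + b, starting from x(0) = x0."""
    if a == 1:
        return x0 + i * b
    # Geometric sum: 1 + a + ... + a^(i-1) = (a^i - 1) / (a - 1), exact for integers
    return a**i * x0 + b * (a**i - 1) // (a - 1)


def iterate(x0, a, b, i):
    """Reference result obtained by applying the update i times."""
    x = x0
    for _ in range(i):
        x = a * x + b
    return x


# j = j + b with j(0) = 5, b = 3: the closed form j(0) + i*b after 4 steps
print(closed_form(5, 1, 3, 4), iterate(5, 1, 3, 4))
# d = 2 * d with d(0) = 3: the closed form d(0) * 2^i after 5 steps
print(closed_form(3, 2, 0, 5), iterate(3, 2, 0, 5))
```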
Solving Program Loops ….
    i=0
    array a[b-1]
    while c<b and i>m and i<n
        i=i+1
        j=j+b
        d=2*d
        c=c+1
        a[c]=0
        if e>0
            f=f+2
        else
            e=g*e
• Loop continue condition:
  • i > m :: i is in range [m+1, infinity]
  • i < n :: i is in range [-infinity, n-1]
  • And'ing the two we get: i is in range [m+1, n-1]
• Loop non-entry condition is: m+1 > n-1
• Loop entry condition is: m+1 <= n-1
• Loop counter (number of integers in the range) is: (n-1)-(m+1)+1 = n-m-1
Values at end of loop:
    j = j0 + (n-m-1)*b
    d = d0 * 2^(n-m-1)
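For concrete integers, the values of i that satisfy both i > m and i < n can be counted directly. The sketch below intersects the two ranges and returns the bounds together with the number of integers between them. The function name `loop_range` is an assumption made for illustration.

```python
def loop_range(m, n):
    """Integers i with i > m and i < n: returns (lowest, highest, how many)."""
    lo = m + 1           # i > m  means  i >= m + 1
    hi = n - 1           # i < n  means  i <= n - 1
    count = max(0, hi - lo + 1)
    return lo, hi, count


lo, hi, count = loop_range(2, 7)
# Cross-check the count against a brute-force enumeration
brute = [i for i in range(-50, 50) if 2 < i < 7]
print(lo, hi, count, brute)
```

When m+1 > n-1 the range is empty and the count is zero, which corresponds to the loop body never being entered.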
Solving Program Loops ….
• Computer Algebra algorithms to
  • Solve recurrence systems
  • Solve loop exit condition
  • Solve loop counters as symbolic expressions
  • Solve number of loop iterations as symbolic expressions
• Use Skolemisation techniques for unsolvable cases
Path State Explosion
• Extend the notion of Burgstaller's path expressions
• Extend Gulwani's semantics
• Combine them together, defining new
  • Groups of paths as non-deterministic domains
Non-Determinism and Paths
    If c then s10 else s11
    s2
    If d then s3

    choose( { assume(c);s10;s2;assume(d);s3
              assume(-c);s11;s2;assume(d);s3
              assume(c);s10;s2;assume(-d)
              assume(-c);s11;s2;assume(-d) } )

Path Condition             Path statements
c & [[s2]][[s10]]d         s10;s2;s3
-c & [[s2]][[s11]]d        s11;s2;s3
c & [[s2]][[s10]](-d)      s10;s2
-c & [[s2]][[s11]](-d)     s11;s2
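The choose set can be generated mechanically from the program structure. The sketch below is one possible encoding, not the algorithm of the cited authors. A program is a list whose items are either a statement name or a tuple ("if", cond, then_branch, else_branch) with list-valued branches. Both the encoding and the function name `enumerate_paths` are assumptions made for illustration.

```python
from itertools import product


def enumerate_paths(program):
    """Return every acyclic path through a loop-free program as a list of steps."""
    alternatives = []
    for item in program:
        if isinstance(item, str):
            # A plain statement contributes exactly one alternative
            alternatives.append([[item]])
        else:
            _, cond, then_branch, else_branch = item
            then_paths = [[f"assume({cond})"] + p for p in enumerate_paths(then_branch)]
            else_paths = [[f"assume(-{cond})"] + p for p in enumerate_paths(else_branch)]
            alternatives.append(then_paths + else_paths)
    # One path per combination of alternatives, concatenated in program order
    return [sum(choice, []) for choice in product(*alternatives)]


program = [("if", "c", ["s10"], ["s11"]), "s2", ("if", "d", ["s3"], [])]
for path in enumerate_paths(program):
    print(";".join(path))
```

Each printed line corresponds to one member of the choose set. The number of paths doubles with every sequential conditional, which is the explosion that the non-deterministic grouping is meant to contain.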
Path Enumeration Loops…
    while c
        if a1 then A1
        else if a2 then A2
        …
        else if an then An
• Deterministic program paths: n^m
  • (m is the number of loop iterations)
• Non-deterministic program paths: n+1
  • (independent of loop iterations)
• Reduction from exponential to polynomial of degree nested loop depth
Initial Results
Conditional recurrences
    while p
        A
        if c then B else C endif
        D
    endwhile

is rewritten as

    while p
        while p & [[A]]c
            A
            B
            D
        endwhile
        while p & -[[A]]c
            A
            C
            D
        endwhile
    endwhile
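The two loop shapes can be compared on a concrete instance. In the sketch below, [[A]]c is read as "c evaluated in the state reached after executing A", which is computed on a copy of the state. The statements and predicates chosen for p, c, A, B, C and D are made-up examples, and all function names are assumptions.

```python
def original(state, p, c, A, B, C, D):
    """while p: A; if c then B else C; D"""
    while p(state):
        A(state)
        if c(state):
            B(state)
        else:
            C(state)
        D(state)
    return state


def split(state, p, c, A, B, C, D):
    """Same loop, split into two inner loops without a conditional in the body."""
    def c_after_A(s):
        probe = dict(s)   # evaluate c after A without disturbing the real state
        A(probe)
        return c(probe)

    while p(state):
        while p(state) and c_after_A(state):
            A(state); B(state); D(state)
        while p(state) and not c_after_A(state):
            A(state); C(state); D(state)
    return state


# Hypothetical example: count up to 10, sum multiples of 3 into y, count the rest in z
def p(s): return s["x"] < 10
def c(s): return s["x"] % 3 == 0
def A(s): s["x"] += 1
def B(s): s["y"] += s["x"]
def C(s): s["z"] += 1
def D(s): pass

start = {"x": 0, "y": 0, "z": 0}
print(original(dict(start), p, c, A, B, C, D))
print(split(dict(start), p, c, A, B, C, D))
```

Each inner loop has an unconditional body, so its variables follow plain recurrences that can be solved in closed form.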
Experimental Work
We develop a program to
• Enumerate program paths
  • Path conditions
• Interface with a Computer Algebra System to
  • Obtain closed form for loop induction variables
  • Solve loop exit condition
  • Solve number of loop iterations
• Perform symbolic execution
• Array bound comparison
• Reporting array bound violation (Static analysis)
Experimental Work ….
• Analyse a small program
  • for all possible paths
  • to crash it
Or
• Analyse an open source system
  • for Buffer Overflow reporting
  • Prioritised as
    - Definite
    - May be
  • With full set of counter examples
Summary
• Array access in loops causes buffer overflow error
• Enumerate disjoint program paths & conditions
• Non-deterministic semantics for path expressions
• Symbolic execution of program paths to identify buffer overflows
• Program loops cause path enumeration state explosion
• Non-deterministic domains to reduce state explosion
Time Table
References
[1] Johann Blieberger, Bernd Burgstaller, Bernhard Scholz. Symbolic Analysis An Algebra-based approach.
[2] G. Canfora, A. Cimitile, and A. De Lucia. Conditioned program slicing. Information and Software Technology, 40(11-12):595-607, 1998.
[3] T. Fahringer and B. Scholz. Advanced symbolic analysis for compilers. Springer-Verlag New York, Inc. Secaucus, NJ, USA, 2003.
[4] Michael P. Gerlek, Eric Stoltz, and Michael Wolfe. Beyond induction variables: detecting and classifying sequences using a demand-driven SSA form. ACM Trans. Program. Lang. Syst., 17(1):85-122, 1995.
[5] S. Gulwani, S. Jain, and E. Koskinen. Control-refinement and progress invariants for bound analysis. PLDI, 2009.
[6] M.R. Haghighat and C.D. Polychronopoulos. Symbolic analysis for parallelizing compilers. ACM Transactions on Programming Languages and Systems (TOPLAS), 18(4):477-518, 1996.
Conferences
• SCAM 2010 - Conference on Source Code Analysis and Manipulation, 12th-13th September 2010, Timişoara, Romania, deadline: 23rd April 2010
• FSE-18 2010 - ACM SIGSOFT / FSE ICSE - ideas, innovations, trends, and experiences in the field of software engineering, deadline: 5 March 2010
• POPL 2010 - ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation (PEPM'2010), Jan 2010 (closed)
• PLDI 2010 - ACM SIGPLAN 2010 Conference on Programming Language Design and Implementation, deadline: November 13th (closed)
• ESOP 2010 - devoted to fundamental issues in the specification, analysis, and implementation of programming languages and systems (closed)
?
Assume and Choose
• Assume is a path condition
• Choose permits choice of execution between program paths
• A general control flow graph can be represented as a choice of program paths*
E.g. Program = Choose (path1, path2, … , pathn)
Where each pathi = (ci, Si) is a program path.
*Burgstaller et al. & Gulwani et al.
Canonical form
Program = Choose (path1, path2, … , pathn)
Where each pathi = (ci, Si) is a program path.
Path conditions are in semi-canonical form if
    for all i ≠ j, ci & cj = False
i.e. only one of the paths can be taken, as only one of the path conditions can be true at any time.
Path conditions are in canonical form if
• They are in semi-canonical form and
• c1 or c2 or … or cn = True
• i.e. one of the paths will always be taken
Enumerating Canonical form
Program = Choose ((a,A),(b,B))
Its canonical form is:
    Choose((a&b,A;B),(a&-b,A),(-a&b,B),(-a&-b,SKIP))
A program with n program paths has at most 2^n program paths in its canonical form.
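The canonical form can be produced by enumerating every combination of path conditions holding or not holding. The sketch below works on condition and statement names as plain strings. The function name `canonical_form` and the string encoding are assumptions, and no simplification of unsatisfiable conjunctions is attempted.

```python
from itertools import product


def canonical_form(paths):
    """paths: list of (condition, statement) name pairs.

    Returns 2^n (condition, statements) pairs, one for every way of
    making each original condition true or false.
    """
    result = []
    for taken in product([True, False], repeat=len(paths)):
        condition = "&".join(c if t else f"-{c}" for (c, _), t in zip(paths, taken))
        statements = ";".join(s for (_, s), t in zip(paths, taken) if t) or "SKIP"
        result.append((condition, statements))
    return result


for cond, stmts in canonical_form([("a", "A"), ("b", "B")]):
    print(f"({cond},{stmts})")
```

For the two-path program this prints the four paths listed on the slide. The conditions are pairwise disjoint and jointly exhaustive by construction.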
Enumerating Union
P1 = Choose ((a1,A1), (a2,A2), … , (an,An))
P2 = Choose ((b1,B1), (b2,B2), … , (bm,Bm))
P1 union P2 = Choose ((a1,A1), (a2,A2), … , (an,An), (b1,B1), (b2,B2), … , (bm,Bm))
has m+n paths with a maximum of 2^(m+n) canonical paths.
If P1 and P2 were in canonical form then the maximum number of canonical paths of P1 union P2 is m*n.
Enumerating Sequence
P1 = Choose ((a1,A1), (a2,A2), … , (an,An))
P2 = Choose ((b1,B1), (b2,B2), … , (bm,Bm))
P1;P2 is a new program which executes P1 followed by P2.
If P1 and P2 were in canonical form then the maximum number of canonical paths of P1;P2 is m*n.
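The m*n bound for a sequence follows from pairing every path of P1 with every path of P2. The sketch below builds those pairs on name strings. Writing the combined condition as a & [[A]]b, meaning b evaluated after executing A, is an assumption borrowed from the notation used for path conditions elsewhere in these slides. The function name `sequence` is hypothetical.

```python
def sequence(p1, p2):
    """Paths of P1;P2 for canonical P1 and P2, each a list of (condition, statement)."""
    return [(f"{a}&[[{A}]]{b}", f"{A};{B}") for a, A in p1 for b, B in p2]


P1 = [("a1", "A1"), ("a2", "A2")]
P2 = [("b1", "B1"), ("b2", "B2"), ("b3", "B3")]
combined = sequence(P1, P2)
print(len(combined))
for cond, stmts in combined:
    print(f"({cond},{stmts})")
```

Some of the generated conditions may be unsatisfiable in a real program, which is why m*n is a maximum rather than an exact count.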
Enumerating Loops
Consider the following program
    While c Do S
This program can be split into two program paths:
    Choose((-c0,Skip),(c0,S^mu))
Where c0 is the initial value of c and mu is a function that returns an integer greater than zero, the number of times the loop is executed (mu can be infinite).
If the program uses symbolic expressions as program variable values then mu is a function over symbolic expressions. If the loop has a closed form then mu can be expressed as a symbolic expression too.
Enumerating 2-path Loop
Consider the following program
    While p Do Choose ((a,A),(-a,B))
Unrolling the loop gives the following program path sequences:
• (-p0,skip)
• (p0&a0,A)
• (p0&-a0,B)
• (p0&p1&a0&a1,A;A)
• (p0&p1&a0&-a1,A;B)
• ….
• (p0&p1&…&p(i-1)&x0&x1&…&x(i-1),X0;X1;…;X(i-1)) for a path of i iterations
• Where pk is the value of predicate p at the start of the k'th iteration of the loop (counting from 0)
• ak is the value of predicate a at the start of the k'th iteration of the loop
• xk refers to ak if ak evaluates to True, otherwise it refers to -ak
• Xk refers to A if ak evaluates to True, otherwise it refers to B
• All values are symbolic expressions
• Number of program paths is: 2^mu + 2^(mu-1) + 2^(mu-2) + … + 2^0
Enumerating multi-path Loop
Consider the following program
    While p Do Choose ((a1,A1),(a2,A2),…,(an,An))
The path condition of a path of i iterations is given by:
    p0&p1&…&p(i-1)&x0&x1&…&x(i-1)
• Where pk is the value of predicate p at the start of the k'th iteration of the loop (counting from 0)
• xk refers to aj at the k'th iteration, for the value of j, 1<=j<=n, such that aj evaluates to True during the k'th iteration (j is unique if Choose is in canonical form)
Path statement is: X0;X1;…;X(i-1)
• Where Xk refers to the statement Aj selected by xk; if no aj evaluates to True, Xk is the null statement
• All values are symbolic expressions
• Number of symbolic program paths is: n^mu + n^(mu-1) + n^(mu-2) + … + n^0
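The path count is a geometric sum and can be confirmed by brute-force enumeration for small n and mu. The sketch below lists every sequence of branch choices of length 0 to mu, where the empty sequence stands for the skip path. The function name `loop_paths` and the use of a concrete integer for mu are assumptions made for illustration.

```python
from itertools import product


def loop_paths(branches, mu):
    """All branch sequences for a loop that runs between 0 and mu times."""
    paths = []
    for iterations in range(mu + 1):
        # n^iterations sequences of this length; length 0 is the skip path
        paths.extend(product(branches, repeat=iterations))
    return paths


two_way = loop_paths(["A", "B"], 3)
three_way = loop_paths(["A1", "A2", "A3"], 2)
print(len(two_way), sum(2**i for i in range(4)))
print(len(three_way), sum(3**i for i in range(3)))
```

The count grows exponentially in mu, so explicit enumeration is only feasible for very small loops.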
Enumerating multi-path Loop
The program
    While p Do Choose ((a1,A1),(a2,A2),…,(an,An))
is the same as
    Choose(-p0,Skip);
    Choose ((p0&p1&…&p(i-1)&x0&x1&…&x(i-1),X0;X1;…;X(i-1)) where i varies from 1 to mu)
• Number of choices in the second choose statement is: n^mu + n^(mu-1) + n^(mu-2) + … + n^1
Enumerating multi-path Loop
    Choose ((p0&p1&…&p(i-1)&x0&x1&…&x(i-1),X0;X1;…;X(i-1)) where i varies from 1 to mu)
• Number of choices is: n^mu + n^(mu-1) + n^(mu-2) + … + n^1
• We will extend Burgstaller et al.'s symbolic domain (deterministic) to a non-deterministic domain, reducing this large number of paths to a single non-deterministic path.
• The non-deterministic domain will group a large number of deterministic symbolic values.
• We will extend the definitions of operations to non-deterministic symbolic expressions
• We will extend boolean operations to non-deterministic boolean predicates
• We will define program paths over non-deterministic symbolic expressions
Extending to non-deterministic Symbolic Domain
• Extend Burgstaller et al.'s symbolic domain (deterministic) to a non-deterministic domain, reducing this large number of paths to a single non-deterministic path.
• The non-deterministic domain will group a large number of deterministic symbolic values. (Boolean: All(True), None(False), Any)
• Extend the definitions of operations to non-deterministic symbolic expressions
• Extend boolean operations to non-deterministic boolean predicates
• Define program paths over non-deterministic symbolic expressions
• Prune and simplify non-deterministic paths and determine upper and lower bounds on these values
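One way to picture boolean operations over the three values All, None and Any is a three-valued logic. The sketch below is an assumption rather than the definition proposed in the slides: it reads Any as "not known to be uniformly true or uniformly false across the group" and uses Kleene-style rules. The names `NDBool`, `nd_not`, `nd_and` and `nd_or` are hypothetical.

```python
from enum import Enum


class NDBool(Enum):
    ALL = "All"     # predicate holds for every value in the group
    NONE = "None"   # predicate holds for no value in the group
    ANY = "Any"     # neither of the above is known


def nd_not(x):
    return {NDBool.ALL: NDBool.NONE, NDBool.NONE: NDBool.ALL, NDBool.ANY: NDBool.ANY}[x]


def nd_and(x, y):
    if NDBool.NONE in (x, y):
        return NDBool.NONE      # one side never holds, so the conjunction never holds
    if x is NDBool.ALL and y is NDBool.ALL:
        return NDBool.ALL
    return NDBool.ANY           # conservative answer for the remaining cases


def nd_or(x, y):
    # De Morgan: x or y == not (not x and not y)
    return nd_not(nd_and(nd_not(x), nd_not(y)))


print(nd_and(NDBool.ALL, NDBool.ANY))
print(nd_or(NDBool.ALL, NDBool.ANY))
print(nd_or(NDBool.NONE, NDBool.ANY))
```

A path whose condition evaluates to None can be pruned, while Any keeps the path alive as a possible overflow that may need a tighter analysis.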