Finding Security Vulnerabilities in Java Applications with Static Analysis USENIX Security 2005. Authors: V. Benjamin Livshits and Monica S. Lam. Presented in UIUC CS527 (Fall ’07) by Matt Stockton. Introduction / Motivation.
Finding Security Vulnerabilities in Java Applications with Static Analysis
USENIX Security 2005
Authors: V. Benjamin Livshits and Monica S. Lam
Presented in UIUC CS527 (Fall ’07) by Matt Stockton
Introduction / Motivation
• Many web applications create, delete, update, and display sensitive information that has financial value to hackers
• Many of these web applications are vulnerable to attacks (Imperva Application Defense Center Study)
• Attacks on web applications are expensive to deal with after the fact (litigation, lost proprietary information, lost customer information, etc.)
• The most common means of discovering web application vulnerabilities before application deployment is also expensive
How can we solve this dilemma?
Static Analysis Overview
• Definition: the analysis of computer software that is performed without actually executing programs built from that software (Wikipedia)
• Many static analysis tools exist for analyzing C/C++ code. These look for buffer overflows, format string vulnerabilities, etc.
• Java language safety features prevent direct memory access, so static analysis is not as necessary… or is it?
• Even with automatic memory management, Java applications are still exploitable, although the vector of attack is quite different
• The paper presents a technique for identifying these attack vectors by using static analysis
Unchecked User Input
• Unchecked user input is the source of most web application vulnerabilities
• There are many ways to send data to a web application. Web programmers make assumptions about what data can be sent and how this data will be formatted. When those assumptions are wrong, there is an opportunity for attack.
• Attackers must fulfill two goals to exploit unchecked input vulnerabilities:
  • Inject malicious data into a web application
  • Manipulate the web application using the malicious data
Injecting Malicious Data
• Parameter Tampering – Enter maliciously formed data into HTML forms
• URL Tampering – Directly edit the URL string (usually modifying an HTTP GET request after form submission)
• Hidden Field Manipulation – Web sites sometimes use hidden form fields for persistence. An attacker can manually change the values
• HTTP Header Manipulation – Free tools allow you to intercept browser requests and change HTTP headers
• Cookie Poisoning – Manually modify web site cookies stored on a computer
• Non-Web Input Sources – Modify command-line parameters sent to web application management scripts
Manipulating apps with unchecked data
• SQL Injection – Use input to generate SQL queries that will leak information from the database, or perform a malicious insert, update, or deletion. Example: Username: Matt ' OR 1 = 1; --
• Cross-Site Scripting – User-controlled content that the web application displays without any filtering (e.g. to load a JavaScript library or send cookie information to another domain)
• HTTP Response Splitting – User-controlled content that the web application uses in the HTTP response header. The web application could send multiple responses or corrupt a proxy cache
• Path Traversal – Crafted user input allows the user to read / write / update files that shouldn’t be accessible. Example: File To Delete: Matt/../../../etc/passwd
• Command Injection – Force the web application to execute a command it shouldn’t be executing
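To make the SQL injection and path traversal inputs on this slide concrete, here is a minimal sketch. The `users` table, the `/app/uploads` base directory, and both helper methods are hypothetical and not from the paper or the slides. The code only builds a string and a path; it never touches a database or the file system.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class UncheckedInputDemo {
    // Hypothetical query builder: concatenates user input straight into SQL.
    static String buildQuery(String userName) {
        return "SELECT * FROM users WHERE name = '" + userName + "'";
    }

    // Hypothetical path builder: resolves a user-supplied name under a base
    // directory and collapses any ".." segments.
    static Path resolveUpload(String fileName) {
        return Paths.get("/app/uploads").resolve(fileName).normalize();
    }

    public static void main(String[] args) {
        // The quote in the input closes the string literal early, so the
        // OR clause becomes part of the query itself.
        System.out.println(buildQuery("Matt' OR 1 = 1; --"));
        // The ".." segments walk out of the intended base directory.
        System.out.println(resolveUpload("Matt/../../../etc/passwd"));
    }
}
```

In both cases the application did what it was told; the problem is that the user's data ended up being interpreted as query syntax or as path navigation.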
‘Usual’ web app security analysis
• Several manual techniques are prevalent for web app security analysis
  • Source code audits by security professionals / white box analysis
  • Penetration testing / black box analysis
• Shortcomings of manual techniques include
  • Time / cost associated with the investigation
  • Coverage / precision of the investigation
  • If the investigation causes code changes, the changes may need to be re-audited
Need a less costly, more automated process for this type of auditing. This is the primary motivation for the paper.
Static Analysis in more detail
• Analyze the code without actually running the application
• Use different algorithms to analyze the code to find errors. The algorithms vary widely in complexity
• If source code is not available, byte code can be used to perform the static analysis (the technique used in this tool)
• Basic premise is to give a static analysis tool something to look for (some type of pattern). If the tool finds a match, it will note the match
• Simple Example: grep
• Complex Example: this tool
• General goals for static analysis – Soundness, Precision, Scalability
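The "grep" end of the spectrum can be sketched in a few lines. This is an illustration only, not part of the tool described in the paper: the class name and the regular expression are assumptions chosen for the example. It flags any line where an `executeQuery` call contains a string concatenation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class GrepStyleChecker {
    // Hypothetical pattern: an executeQuery call whose argument list contains "+".
    private static final Pattern SUSPICIOUS =
            Pattern.compile("executeQuery\\s*\\(.*\\+.*\\)");

    // Returns the 1-based numbers of the source lines that match the pattern.
    static List<Integer> scan(List<String> sourceLines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            if (SUSPICIOUS.matcher(sourceLines.get(i)).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }
}
```

A checker like this only sees one line at a time. If the query is concatenated on one line, stored in a variable, and passed to `executeQuery` on the next line, the pattern does not match, which is the kind of gap that tracking data flow is meant to close.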
Contributions of the paper
• Proposes a methodology and tool to detect a diverse set of common web application vulnerabilities
• Improves the precision of the tool by using fully context-sensitive pointer analysis (fewer false positives)
• Delivers an actual implementation of the idea, built as an Eclipse plug-in: http://suif.stanford.edu/~livshits/work/lapse/
• Validates the methodology against real web applications. Found real errors, with few false positives
Data as an attack vector – Tainted Objects
• How does data propagate from data the user controls (user input) to data the application uses (e.g. a SQL query)?
• We can model this using tainted object propagation, which is specified by three kinds of descriptors:
  • Source Descriptors – How user-provided data can enter a web application
  • Sink Descriptors – Potentially unsafe ways that data can be used in a program (unsafe if the data is tainted)
  • Derivation Descriptors – How tainted objects can be manipulated and still remain tainted in the application (e.g. which methods can be called on a tainted object, or can take a tainted object as an argument, to create another tainted object or keep the object in a tainted state)
Tainted Object Propagation ~ Modeling object flow through an application
Tainted Object Propagation Descriptions
Source Descriptor Example
<HttpServletRequest.getParameter(String), -1, ε>
Sink Descriptor Example
<Connection.executeQuery(String), 1, ε>
Derivation Descriptor Examples
<StringBuffer.append(String), 1, ε, -1, ε>
<StringBuffer.toString(), 0, ε, -1, ε>
Using the descriptors, you can theoretically find all sources and sinks in the code, and can tell when a sink uses an object from a tainted source that is still tainted after manipulation under the derivation descriptor rules.
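One way to read these tuples is as plain data. The sketch below is an illustration, not the tool's internal representation. It assumes the index convention that -1 names the return value, 0 names the receiver (`this`), and 1..n name the explicit arguments, and it uses the empty string for the empty access path ε. All class and field names are hypothetical.

```java
class TaintDescriptors {
    // Assumed index convention: -1 is the return value, 0 is the receiver.
    static final int RETURN_VALUE = -1;
    static final int RECEIVER = 0;

    // <method, parameter, access path>: the shape shared by sources and sinks.
    static final class Endpoint {
        final String method;
        final int param;
        final String path;

        Endpoint(String method, int param, String path) {
            this.method = method;
            this.param = param;
            this.path = path;
        }
    }

    // <method, from parameter, from path, to parameter, to path>.
    static final class Derivation {
        final String method;
        final int fromParam;
        final String fromPath;
        final int toParam;
        final String toPath;

        Derivation(String method, int fromParam, String fromPath,
                   int toParam, String toPath) {
            this.method = method;
            this.fromParam = fromParam;
            this.fromPath = fromPath;
            this.toParam = toParam;
            this.toPath = toPath;
        }
    }

    static final Endpoint SOURCE =
            new Endpoint("HttpServletRequest.getParameter(String)", RETURN_VALUE, "");
    static final Endpoint SINK =
            new Endpoint("Connection.executeQuery(String)", 1, "");
    static final Derivation APPEND =
            new Derivation("StringBuffer.append(String)", 1, "", RETURN_VALUE, "");
    static final Derivation TO_STRING =
            new Derivation("StringBuffer.toString()", RECEIVER, "", RETURN_VALUE, "");

    // Turns a parameter index into a readable position name.
    static String position(int param) {
        if (param == RETURN_VALUE) return "return value";
        if (param == RECEIVER) return "receiver";
        return "argument " + param;
    }

    // Summarizes the direction in which a derivation passes taint along.
    static String describe(Derivation d) {
        return d.method + ": " + position(d.fromParam) + " -> " + position(d.toParam);
    }
}
```

Under that reading, the `toString()` descriptor says that a tainted receiver yields a tainted return value, and the `append` descriptor says that a tainted first argument yields a tainted return value.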
Tainted Object Security Violation
• A security violation occurs when:
  • A source object is tainted (given the rules for source descriptors)
  • From this tainted source object, derivations are performed in the code that produce another tainted object, or keep the current object in the tainted state (based on derivation descriptors)
  • The tainted object is used in a sink (based on the sink descriptors)
• When the above steps occur, user-controlled input is being used by the application in a potentially vulnerable / exploitable way.
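The three conditions above can be sketched as a walk over a straight-line trace of calls. This is a toy model under strong assumptions, not the paper's algorithm: it tracks variable names rather than heap objects, ignores aliasing and control flow, and hard-codes one source, two derivations, and one sink. The `Call` type and the method names it matches on are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TaintTraceSketch {
    // One call in a straight-line trace: result = receiver.method(arg).
    // All four fields are variable names and must be non-null.
    static final class Call {
        final String result;
        final String receiver;
        final String method;
        final String arg;

        Call(String result, String receiver, String method, String arg) {
            this.result = result;
            this.receiver = receiver;
            this.method = method;
            this.arg = arg;
        }
    }

    // Returns true if a tainted value reaches the argument of the sink.
    static boolean violates(List<Call> trace) {
        Set<String> tainted = new HashSet<>();
        for (Call c : trace) {
            switch (c.method) {
                case "getParameter":      // source: the return value is tainted
                    tainted.add(c.result);
                    break;
                case "append":            // derivation: argument 1 -> return value
                    if (tainted.contains(c.arg)) {
                        tainted.add(c.result);
                        tainted.add(c.receiver); // append returns the receiver itself
                    }
                    break;
                case "toString":          // derivation: receiver -> return value
                    if (tainted.contains(c.receiver)) {
                        tainted.add(c.result);
                    }
                    break;
                case "executeQuery":      // sink: a tainted argument is a violation
                    if (tainted.contains(c.arg)) {
                        return true;
                    }
                    break;
                default:                  // any other call is ignored in this sketch
                    break;
            }
        }
        return false;
    }
}
```

A trace of `getParameter`, `append`, `toString`, `executeQuery` over the same variables reports a violation; replacing the query argument with an unrelated variable does not.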
Generating Sources, Sinks, and Derivations
• Generating the rules for sources, sinks, and derivations is a manual process. Without 100% coverage of all sources, sinks, and derivations, the model is incomplete and can miss vulnerabilities!
• For this tool, J2EE APIs were evaluated to generate sources and sinks, and Java String manipulation libraries were evaluated to generate derivation descriptors
• What if something is missing? Definitely a possibility. This tool used some additional static analysis to pinpoint tainted sources that were never passed to a method listed in the derivation descriptors. This found additional derivation descriptors
• Concern: What if the source is written to a file and used later by the application? This ‘derivation’ cannot be covered through String manipulation
What about object references?
• To have sound static analysis, your tool needs to track which object references (program variables) point to tainted objects (on the heap)
• In a naïve implementation, to maintain soundness, you could end up with a very large number of potentially tainted object references if you do not perform good points-to analysis
Example: Are buf1 and buf2 both tainted? Is this a violation?
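The code for the buf1 / buf2 question is not included in this transcript. The sketch below is a hypothetical stand-in, not the slide's original listing: `userInput()` plays the role of a request parameter, and the `alias` flag decides whether the two variables refer to the same `StringBuffer`.

```java
class AliasingDemo {
    // Stand-in for a value read from an HTTP request; no servlet API is needed here.
    static String userInput() {
        return "Matt' OR 1 = 1; --";
    }

    static String buildQuery(boolean alias) {
        StringBuffer buf1 = new StringBuffer("SELECT * FROM users WHERE name = '");
        // buf2 either refers to the same object as buf1 or to a separate one.
        StringBuffer buf2 = alias ? buf1 : new StringBuffer("SELECT 1");
        buf1.append(userInput());   // taints the object that buf1 refers to
        return buf2.toString();     // tainted only if buf2 refers to that same object
    }
}
```

Whether the returned query is a violation depends entirely on whether `buf1` and `buf2` can point to the same object, which is the question a points-to analysis has to answer statically.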
Points-to analysis
• Solve the tainted object scalability problem using approximation with static object names
• Do not want to miss potential pointers to a tainted object, but at the same time, if you do not do any bounding, you end up with a huge number of potentially tainted objects
• Tool uses a context-sensitive Java points-to analysis developed by Whaley and Lam
• Uses Binary Decision Diagrams (BDDs) to represent points-to results for multiple execution contexts in a program
Not many technical details on the BDD method, but this essentially allows the tool to perform context-sensitive static analysis to reduce the set of objects that could be tainted.
NOTE: Exact points-to analysis is an undecidable problem. Need a conservative estimate that is still sound (doesn’t miss any tainted objects)
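Why context sensitivity reduces the set of possibly tainted objects can be shown with a toy model. This is not the Whaley and Lam analysis and uses no BDDs. It assumes a hypothetical helper `String id(String s) { return s; }` that is called from two call sites, and compares what each call may return under the two approaches. All names are made up for the example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ContextSensitivitySketch {
    // Maps each call site of the hypothetical id() helper to the abstract
    // object that is passed in at that site.
    static Map<String, String> callSites() {
        Map<String, String> sites = new LinkedHashMap<>();
        sites.put("site1", "userInput");  // a tainted object
        sites.put("site2", "constant");   // a harmless object
        return sites;
    }

    // Context-insensitive: id() gets a single summary, so a call at any site
    // may return any object that was ever passed in (the site is ignored).
    static Set<String> insensitiveResult(Map<String, String> sites, String site) {
        return new TreeSet<>(sites.values());
    }

    // Context-sensitive: the result at a call site is only the object that
    // was passed in at that same site.
    static Set<String> sensitiveResult(Map<String, String> sites, String site) {
        return new TreeSet<>(Collections.singleton(sites.get(site)));
    }
}
```

In the insensitive version the result at `site2` includes `userInput`, so a harmless value would be reported as possibly tainted. Keeping the call sites apart removes that false positive while still reporting `userInput` at `site1`.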
Additional Claims Of Novelty
• Sound and precise context-sensitive points-to analysis, reducing the tainted object space
• Further reduction of the tainted object space through a clever way to handle Container references – the tool can identify / name the underlying structure of the Container
• Object naming for String manipulation methods – logic to name the Strings produced by String manipulation methods, further reducing the tainted object space
Programmatic representation of descriptions Source, Sink, and Derivation descriptions can be created using Program Query Language (PQL) PQL – Java-like language that can be used to describe a sequence of dynamic events that involves variables referring to object instances Two main PQL statements define the framework that is used to find security violations. User must then define source(), derived() and sink()
PQL Example – SQL Injection Fairly simple to understand the definitions for source, sink, and derived.
Evaluation / Experimental Results
• Tested against 8 large open-source web applications
• Created a set of source, sink, and derivation descriptors (derivation focused on the String, StringBuffer, and StringTokenizer classes)
• Four combinations of testing (with/without context sensitivity, with/without improved object naming)
• Recorded a total of 41 potential security violations. 29 turned out to be security errors, and 12 were false positives
• More precise with both context sensitivity and improved naming enabled (and actually faster execution time)
• Found two errors in common library code (J2EE and Hibernate)
• Almost all errors were confirmed by the application developers, resulting in code fixes
Errors Discovered
• Parameter manipulation to perform HTTP response splitting was the most prevalent attack vector
• Browser redirect attacks based on user-entered data (HTTP referrer field was modified)
• SQL injection vector in Hibernate library code
False Positives
• All were due to an object naming rule that was not defined correctly: StringWriter.toString()
• Once this was added to the naming rules, there were no false positives
Shortcomings
• Input validation / control flow is not handled – if the application does some parameter validation, this tool will not take that into account
• Source / Sink / Derivation descriptions need to be manually created and potentially updated – J2EE sources / sinks and String library descriptors cover a lot, but are there more?
• Need to manually tune the object naming rules so that you can minimize false positives
• Can you think of other paths not covered by the implementation? Example: user input gets stored to a file, then read in later and used in a sink
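The first shortcoming can be illustrated with a hypothetical method that validates its input before using it. The whitelist pattern, the `users` table, and the class name are assumptions for this example. A tool that tracks only object propagation from source to sink would presumably still report the concatenation below, even though the check rejects the attack string.

```java
import java.util.regex.Pattern;

class ValidatedInputDemo {
    // Hypothetical whitelist: letters, digits, and underscore, 1 to 32 characters.
    private static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z0-9_]{1,32}");

    static String buildQuery(String userName) {
        if (!SAFE_NAME.matcher(userName).matches()) {
            throw new IllegalArgumentException("rejected user name");
        }
        // The same String object still flows from the input into the query text,
        // so a source-to-sink flow exists even though the check above blocks quotes.
        return "SELECT * FROM users WHERE name = '" + userName + "'";
    }
}
```

Here the report would be a false positive caused by validation the analysis does not model. The file round-trip example on the slide is the opposite case, where a real flow would go unreported.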
Other Techniques used in Practice
• Penetration Testing – Black box and white box. Depending on the effort, may only catch a small sample of security risks. Will not identify parts of the system that remain untested
• Runtime Monitoring – Pattern matching of HTTP requests at runtime by a proxy. Whitelist of good inputs and/or blacklist of bad inputs. Protection against errors already manifested in the application
• Protection at levels other than the application (e.g. Oracle virtual private databases to minimize the amount of data available to the application)
Conclusions
• This paper proposes applying tainted object propagation techniques to Java web applications, and presents a tool implemented as an Eclipse plug-in
• The proposed technique maintains static analysis soundness, and increases scalability and precision with context-sensitive pointer analysis and object naming
• Improved object naming by modifying naming for Containers and Strings
• Seems like a good tool that requires minimal manual integration work to use as an additional mechanism for measuring your web application’s security
LAPSE Tool
<source id="javax.servlet.ServletRequest.getParameterMap()">
  <category>Parameter tampering</category>
</source>
<source id="javax.servlet.ServletRequest.getParameterNames()">
  <category>Parameter tampering</category>
</source>
Initial Student Feedback
• Shortcomings – Weak analysis(?), manually creating PQL descriptors
• Can we use this with other languages (.NET, ROR, *SPs)?
• Do people actually use PQL? (http://pql.sourceforge.net/)