TAJ: Effective Taint Analysis of Web Applications PLDI 2009 Omer Tripp, Marco Pistoia, Stephen J. Fink, Manu Sridharan, Omri Weisman
INDEX
• Authors
• Introduction
• Motivation
• Core Taint Analysis
• Techniques
• Experimental Results
• Illumination
Omer Tripp
• Advisory Software Engineer, Researcher
• Omer is a member of the static analysis group at IBM Rational's Security Products Department. He is engaged in research and development in the areas of Static Program Analysis for Security and Language-based Security, with emphasis on Web-application security.
• Publications:
  • Omer Tripp and Dror Feitelson. Zipf's Law Revisited. Technical Report Number 2007-115. School of Computer Science and Engineering, The Hebrew University of Jerusalem, August 2007.
  • Omer Tripp. Exploration in the Dark: Reasoning about Planning Strategies. M.Sc. Thesis. School of Computer Science and Engineering, The Hebrew University of Jerusalem, January 2009.
Omer Tripp’s Patents • Rob Calendino, Craig Conboy, Guy Podjarny, Ory Segal, Adi Sharabani, Omer Tripp, and Omri Weisman. Black-box Testing Optimization Using Information from White-box Testing. Filed in the United States Patent and Trademark Office, October 2009. • Omer Tripp. Detecting Security Vulnerabilities Relating to Cryptographically-sensitive Information Carriers when Testing Computer Software. Filed in the United States Patent and Trademark Office, September 2009. • Yinnon Haviv, Roee Hay, Marco Pistoia, Adi Sharabani, Takaaki Tateishi, Omer Tripp, and Omri Weisman. Identifying Security Vulnerabilities in Computer Software. Filed in the United States Patent and Trademark Office, June 2009. • Adi Sharabani, and Omer Tripp. Efficient Code Instrumentation. Filed in the United States Patent and Trademark Office, March 2009. • Stephen Fink, Yinnon A. Haviv, Marco Pistoia, Omer Tripp, and Omri Weisman. Importance-based Call Graph Construction. Filed in the United States Patent and Trademark Office, March 2009. • Marco Pistoia, Takaaki Tateishi, Omer Tripp, and Omri Weisman. A Client-Driven Refinement-Based Static Analysis Method for Identifying Chainable Accesses to a Logical Container. Filed as Docket IL8-2008-0188 in the United States Patent and Trademark Office, June 2008.
Marco Pistoia • Research Staff Member • Recent activities: • ACSAC 2009, Program Committee Member • PLDI 2009, Poster and Student Research Competition Chair • PLAS 2009, Program Committee Member • SSIRI 2009, Program Committee Member • NDSS 2009, Program Committee Member • Refereed Conference Papers and Journal Articles: • Avraham Shinnar, Marco Pistoia, and Anindya Banerjee. A Language for Information Flow: Dynamic Information Tracking in Multiple Interdependent Dimensions. Accepted for Publication in Proceedings of the 4th ACM SIGPLAN Workshop on Programming Languages and Analysis for Security (PLAS 2009), co-located with the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2009), Dublin, Ireland, June 2009. • Emmanuel Geay, Marco Pistoia, Takaaki Tateishi, Barbara Ryder, and Julian Dolby. Modular String-Sensitive Permission Analysis with Demand-Driven Precision Accepted for Publication in Proceedings of the 31st International Conference on Software Engineering (ICSE 2009), Vancouver, BC, Canada, May 2009.
Marco Pistoia
• Marco Pistoia and Úlfar Erlingsson. Programming Languages and Program Analysis for Security: A Three-year Retrospective. ACM SIGPLAN Notices, Volume 43, Number 12, New York, NY, USA, December 2008.
• Sharon Shoham, Eran Yahav, Stephen J. Fink, and Marco Pistoia. Static Specification Mining Using Automata-Based Abstractions. IEEE Transactions on Software Engineering (TSE) Journal, Volume 34, Number 5, Piscataway, NJ, USA, September 2008.
• Paolina Centonze, Robert J. Flynn, and Marco Pistoia. Combining Static and Dynamic Analysis for Automatic Identification of Precise Access-Control Policies. In Proceedings of the Annual Computer Security Applications Conference (ACSAC 2007), Miami Beach, FL, December 2007.
• Sharon Shoham, Eran Yahav, Stephen J. Fink, and Marco Pistoia. Static Specification Mining Using Automata-Based Abstractions. In Proceedings of the ACM SIGSOFT 2007 International Symposium on Software Testing and Analysis (ISSTA 2007), London, United Kingdom, July 2007. ACM Press. Winner of the following recognitions:
  • ACM SIGSOFT Distinguished Paper Award, London, United Kingdom, July 2007.
  • IBM Research Pat Goldberg Memorial Best Paper Award (3 papers selected out of 130 submissions), IBM Thomas J. Watson Research Center, Hawthorne, NY, USA, July 2008.
  • Invited for publication in the IEEE Transactions on Software Engineering (TSE) Journal, Volume 34, Issue 5, Piscataway, NJ, USA, September 2008.
Stephen J. Fink • Research Staff Member • The Complexity of Andersen's Analysis in Practice. Manu Sridharan and Stephen J. Fink. To appear in The 16th International Static Analysis Symposium (SAS 2009). • Snugglebug: A Powerful Approach To Weakest Preconditions. Satish Chandra, Stephen J. Fink, and Manu Sridharan. ACM SIGPLAN 2009 Conference on Programming Language Design and Implementation (PLDI 2009). • Effective Taint Analysis for Java. Omer Tripp, Marco Pistoia, Stephen J. Fink, Manu Sridharan, and Omri Weisman. Accepted for Publication in Proceedings of the ACM SIGPLAN 2009 Conference on Programming Language Design and Implementation (PLDI 2009), Dublin, Ireland, June 2009 • Static Specification Mining Using Automata-Based Abstractions Sharon Shoham, Eran Yahav, Stephen J. Fink, Marco Pistoia September 2008 IEEE Transactions on Software Engineering , Volume 34 Issue 5 • Verifying dereference safety via expanding-scope analysis Alexey Loginov, Eran Yahav, Satish Chandra, Stephen Fink, Noam Rinetzky, Mangala Nanda July 2008 ISSTA '08: Proceedings of the 2008 international symposium on Software testing and analysis
Stephen J. Fink • Effective typestate verification in the presence of aliasing Stephen J. Fink, Eran Yahav, Nurit Dor, G. Ramalingam, Emmanuel Geay April 2008 Transactions on Software Engineering and Methodology (TOSEM) , Volume 17 Issue 2 • Static Specification Mining Using Automata-Based Abstractions. Sharon Shoham, Eran Yahav, Stephen Fink, and Marco Pistoia, ISSTA 2007. • Thin Slicing . Manu Sridharan, Stephen Fink, and Ras Bodik, PLDI 2007. • Declarative Object Identity using Relation Types Mandana Vaziri, Frank Tip, Stephen Fink, and Julian Dolby, ECOOP 2007. • When Role Models Have Flaws: Static Validation of Enterprise Security Policies. Marco Pistoia, Stephen J. Fink, Robert J. Flynn, and Eran Yahav. Proceedings of the 29th International Conference on Software Engineering (ICSE 2007), Minneapolis, MN, May 2007. • Effective Typestate Verification in the Presence of Aliasing , Stephen Fink, Eran Yahav, Nurit Dor, Ramalingam, and Emmanuel Geay, ISSTA 06, July 2006. • Role-Based Access Control Consistency Validation , Paolina Centonze, Gleb Naumovich, Stephen Fink, and Marco Pistoia, ISSTA 06, July 2006. • Scalable and Flexible Error Detection , Emmanuel Geay, Eran Yahav, and Stephen Fink, PEPM 06 tools track, January 2006.
Introduction
• The paper presents Taint Analysis for Java (TAJ), a tool designed to be precise enough to produce a low false-positive rate, yet scalable enough to analyze large applications.
• TAJ incorporates a number of techniques to produce useful results on extremely large applications, even when constrained to a given time or memory budget.
• The authors designed and implemented TAJ to meet the requirements of industry-level applications.
Introduction
• Contributions:
  • Hybrid thin slicing: a novel thin-slicing algorithm that combines flow-insensitive data-flow propagation through the heap with flow- and context-sensitive data-flow propagation through local variables.
  • An effective model for static analysis of Web applications.
  • A set of bounded analysis techniques that make it possible to complete the analysis in a short time or to stay below a given memory consumption level.
  • Implementation and evaluation on industrial code.
Motivation

import java.io.IOException;
import java.io.PrintWriter;
import java.lang.reflect.Method;
import java.net.URLEncoder;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class Motivating {
    // Wrapper whose toString() returns the wrapped string, so taint is carried
    // inside the object.
    private static class Internal {
        private String s;
        public Internal(String s) {
            this.s = s;
        }
        public String toString() {
            return s;
        }
    }

    protected void doGet(HttpServletRequest req,
            HttpServletResponse resp) throws IOException {
        String t1 = req.getParameter("fName");   // source: untrusted input
        String t2 = req.getParameter("lName");   // source: untrusted input
        PrintWriter writer = resp.getWriter();
        Method idMethod = null;
        try {
            // Look up id(String) reflectively.
            Class k = Class.forName("Motivating");
            Method methods[] = k.getMethods();
            for (int i = 0; i < methods.length; i++) {
                Method method = methods[i];
                if (method.getName().equals("id")) {
                    idMethod = method;
                    break;
                }
            }
            // Tainted and untainted values share one map under different keys.
            Map m = new HashMap();
            m.put("fName", t1);
            m.put("lName", t2);
            m.put("date", new Date().toString());
            String s1 = (String) idMethod.invoke(this, new
                Object[] {m.get("fName")});
            String s2 = (String) idMethod.invoke(this, new
                Object[] {URLEncoder.encode((String) m.get("lName"))});
            String s3 = (String) idMethod.invoke(this, new
                Object[] {m.get("date")});
            Internal i1 = new Internal(s1);
            Internal i2 = new Internal(s2);
            Internal i3 = new Internal(s3);
            writer.println(i1); // BAD: fName reaches the sink unsanitized
            writer.println(i2); // OK: lName was sanitized by URLEncoder.encode
            writer.println(i3); // OK: the date never came from a source
        } catch(Exception e) {
            e.printStackTrace();
        }
    }

    public String id(String string) {
        return string;
    }
}
Core Taint Analysis
• TAJ takes a Web application and its supporting libraries and checks them against a set of “security rules”. Each security rule is a triple (S1, S2, S3), where S1 is a set of “sources”, S2 is a set of “sanitizers”, and S3 is a set of “sinks”.
• A source is a method whose return value is considered tainted, or untrusted. A sanitizer is a method that manipulates its input to produce taint-free output. A sink is a pair (m, P), where m is a method that performs security-sensitive computations and P contains those parameters of m that are vulnerable to attack via tainted data.
• TAJ statically checks that no value derived from a source is passed as an input to a sink unless it first undergoes appropriate sanitization.
• The analysis has two stages:
  1. Pointer analysis and call-graph construction. The current implementation relies on a context-sensitive variant of Andersen’s analysis with on-the-fly call-graph construction. The pointer analysis adds one level of call-string context to calls to library factory methods.
  2. Hybrid thin slicing.
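The rule triple described above can be made concrete with a small data structure. The following is a minimal sketch, not TAJ's actual rule format: the class names, the string-based method identifiers, and the `violatedBy` helper are all assumptions made for illustration. It checks a single flow that has already been computed. Finding such flows is the job of the two analysis stages.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: class, field, and method names are illustrative,
// not TAJ's actual rule format.
final class SecurityRuleSketch {

    /** A sink (m, P): method m plus the indices P of its vulnerable parameters. */
    static final class Sink {
        final String method;
        final Set<Integer> vulnerableParams;

        Sink(String method, Set<Integer> vulnerableParams) {
            this.method = method;
            this.vulnerableParams = vulnerableParams;
        }
    }

    /** A security rule (S1, S2, S3): sources, sanitizers, and sinks. */
    static final class SecurityRule {
        final Set<String> sources;
        final Set<String> sanitizers;
        final List<Sink> sinks;

        SecurityRule(Set<String> sources, Set<String> sanitizers, List<Sink> sinks) {
            this.sources = sources;
            this.sanitizers = sanitizers;
            this.sinks = sinks;
        }

        /**
         * Checks one already-computed flow: the methods a value passes through,
         * in order, before it arrives at parameter paramIndex of sinkMethod.
         * Returns true if the value is still tainted when it hits a vulnerable parameter.
         */
        boolean violatedBy(List<String> path, String sinkMethod, int paramIndex) {
            boolean tainted = false;
            for (String m : path) {
                if (sources.contains(m)) {
                    tainted = true;          // return value of a source is untrusted
                } else if (sanitizers.contains(m)) {
                    tainted = false;         // sanitizer output is taint-free
                }
            }
            if (!tainted) {
                return false;
            }
            for (Sink s : sinks) {
                if (s.method.equals(sinkMethod) && s.vulnerableParams.contains(paramIndex)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Example rule using the methods from the motivating servlet (illustrative values). */
    static SecurityRule xssRule() {
        return new SecurityRule(
            Set.of("HttpServletRequest.getParameter"),
            Set.of("URLEncoder.encode"),
            List.of(new Sink("PrintWriter.println", Set.of(0))));
    }
}
```

With this rule, a path consisting only of `HttpServletRequest.getParameter` that ends at parameter 0 of `PrintWriter.println` is a violation, while the same path followed by `URLEncoder.encode` is not.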
Hybrid Thin Slicing
• Using the preliminary pointer analysis and call graph, the second phase of TAJ tracks data flow from tainted sources using hybrid thin slicing, a novel thin-slicing algorithm.
• Hybrid thin slicing combines flow-insensitive reasoning about flow through the heap with flow- and context-sensitive tracking of flow through local variables.
• Q: How does thin slicing differ from ordinary slicing? What makes a slice “thin”, and what does the hybrid variant combine?
Hybrid Thin Slicing
• Program slicing systematically identifies parts of a program relevant to a seed statement. A thin slice consists only of producer statements for the seed, i.e., those statements that help compute and copy a value to the seed. Statements that explain why producers affect the seed are excluded. For example, for a seed that reads a value from a container object, a thin slice includes statements that store the value into the container, but excludes statements that manipulate pointers to the container itself.
• A thin slice typically captures the statements most relevant to a tainted flow. Hybrid thin slicing combines aspects of the previously proposed context-sensitive (CS) and context-insensitive (CI) thin-slicing algorithms, achieving a better tradeoff between scalability and precision for taint analysis.
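The difference between a thin slice and a traditional slice can be shown on a toy dependence graph. This is a minimal sketch under stated assumptions: the graph encoding, the `PRODUCER`/`EXPLAINER` labels, and the five-statement program in the comments are all hypothetical and hand-built. A real slicer derives the dependences from the program itself.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of thin vs. traditional backward slicing on a hand-built graph.
final class ThinSliceSketch {

    /** PRODUCER: value flows into the statement. EXPLAINER: base-pointer or control dependence. */
    enum Kind { PRODUCER, EXPLAINER }

    static final class Dep {
        final String on;
        final Kind kind;

        Dep(String on, Kind kind) {
            this.on = on;
            this.kind = kind;
        }
    }

    /** Backward reachability from the seed; a thin slice follows producer dependences only. */
    static Set<String> slice(Map<String, List<Dep>> deps, String seed, boolean thin) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.add(seed);
        while (!work.isEmpty()) {
            String stmt = work.remove();
            if (!seen.add(stmt)) {
                continue;
            }
            for (Dep d : deps.getOrDefault(stmt, List.of())) {
                if (thin && d.kind == Kind.EXPLAINER) {
                    continue; // skip statements that only explain why a producer matters
                }
                work.add(d.on);
            }
        }
        return seen;
    }

    /**
     * Dependences for the toy program:
     *   s1: x = source();
     *   s2: c = new Box();
     *   s3: d = c;
     *   s4: c.f = x;
     *   s5: y = d.f;   // seed
     */
    static Map<String, List<Dep>> example() {
        Map<String, List<Dep>> deps = new HashMap<>();
        deps.put("s5", List.of(new Dep("s4", Kind.PRODUCER), new Dep("s3", Kind.EXPLAINER)));
        deps.put("s4", List.of(new Dep("s1", Kind.PRODUCER), new Dep("s2", Kind.EXPLAINER)));
        deps.put("s3", List.of(new Dep("s2", Kind.PRODUCER)));
        return deps;
    }
}
```

For the seed `s5`, the thin slice is `s5`, `s4`, `s1`: the load, the store that wrote the value, and the statement that produced it. The traditional slice also pulls in `s2` and `s3`, which only set up the pointers to the container.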
Hybrid Thin Slicing
• Hybrid thin slicing performs a demand-driven traversal over a special System Dependence Graph (SDG) called the Hybrid SDG (HSDG). Nodes in an HSDG correspond to load and store statements in the program, as well as call statements representing source and sink methods.
• An HSDG has two types of edges representing data dependence: “direct edges” and “summary edges”. A direct edge connects a store to a load and represents a data dependence computed by a preliminary pointer analysis. A summary edge connects s to t if t is transitively data-dependent on s purely via flow through local variables; flow through the heap is excluded.
• Summary edges are obtained on demand by computing context-sensitive reachability over a no-heap SDG, that is, an SDG that elides all control- and data-dependence edges reflecting flow through heap locations.
• Figure 2 shows an example: the slice computed on the no-heap SDG that corresponds to a load-to-store summary edge in the HSDG.
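The two edge types and the demand-driven traversal can be sketched as follows. This is an illustration under assumptions, not TAJ's implementation: node names are plain strings, direct edges are given as a precomputed map standing in for the pointer-analysis result, and the `localFlow` function is a hypothetical stand-in for context-sensitive reachability over the no-heap SDG. Summary edges are computed only for nodes the traversal actually visits, and are cached.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of a forward traversal over a hybrid SDG.
final class HsdgSketch {
    /** Direct edges: store -> loads that may read the stored value (from pointer analysis). */
    private final Map<String, Set<String>> directEdges;
    /** Stand-in for no-heap SDG reachability: node -> nodes reached via locals only. */
    private final Function<String, Set<String>> localFlow;
    private final Map<String, Set<String>> summaryCache = new HashMap<>();
    /** Counts how many summary computations were actually demanded. */
    int summaryQueries = 0;

    HsdgSketch(Map<String, Set<String>> directEdges, Function<String, Set<String>> localFlow) {
        this.directEdges = directEdges;
        this.localFlow = localFlow;
    }

    /** Computes summary edges for a node on first use and caches them. */
    private Set<String> summaryEdges(String node) {
        Set<String> cached = summaryCache.get(node);
        if (cached == null) {
            summaryQueries++;
            cached = localFlow.apply(node);
            summaryCache.put(node, cached);
        }
        return cached;
    }

    /** Returns every HSDG node reachable from the given source call. */
    Set<String> reachableFrom(String source) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.add(source);
        while (!work.isEmpty()) {
            String node = work.remove();
            if (!seen.add(node)) {
                continue;
            }
            work.addAll(directEdges.getOrDefault(node, Set.of())); // flow through the heap
            work.addAll(summaryEdges(node));                       // flow through locals
        }
        return seen;
    }
}
```

A tainted flow is reported when a sink node appears in the result of `reachableFrom` for a source node. Because summaries are demanded lazily, stores and loads that are never reached from a source cost nothing.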
Techniques
• Code-modeling Techniques
  1. Security-specific Modeling
     1. Taint Carriers
     2. Handling Exceptions
  2. General Models
     1. Code-reduction Techniques
     2. Approximating the Behavior of Web Frameworks
     3. Reflection APIs and Native Methods
• Eliminating Redundant Reports
• Bounded Analysis Techniques
  1. Priority-driven Call-graph Construction
  2. Useful Bounds on Analysis Dimensions
     1. Slice Size
     2. Flow Length
     3. Nested-taint Depth
Code-reduction Techniques
• A simple, yet effective, code-reduction optimization is to exclude benign library classes, packages, and subpackages based on a hand-written whitelist.
• A second optimization simplifies data-flow propagation by substituting simpler models for library methods, where each model encodes only the method's behavior with respect to the flow of taint. For example, taint analysis does not need to analyze the complex manipulations in the implementation of URLEncoder.encode; it suffices to observe that this method returns some string that is sanitized according to the relevant rules.
• Using this insight, TAJ gives special treatment to String operations, which arise frequently in tainted flows and have relatively simple semantics, but are often difficult to analyze precisely.
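One way to picture such library models is a lookup table from method name to taint effect, consulted instead of analyzing the method body. The sketch below is hypothetical: the `Effect` enum, the table entries, and the `resultTainted` helper are assumptions for illustration and do not reflect how TAJ encodes its models.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: a hand-written table of taint-transfer models for library methods.
final class LibraryModelSketch {

    /** What a modeled method does to taint on its way from input to return value. */
    enum Effect { PROPAGATES, SANITIZES }

    /** Illustrative entries only; a real table would be far larger. */
    static final Map<String, Effect> MODELS = Map.of(
        "java.net.URLEncoder.encode", Effect.SANITIZES,
        "java.lang.String.concat", Effect.PROPAGATES,
        "java.lang.String.trim", Effect.PROPAGATES);

    /**
     * Returns whether the result is tainted, given whether any input is tainted.
     * An empty result means there is no model, so the method body must be analyzed.
     */
    static Optional<Boolean> resultTainted(String method, boolean inputTainted) {
        Effect effect = MODELS.get(method);
        if (effect == null) {
            return Optional.empty();
        }
        return Optional.of(effect == Effect.PROPAGATES && inputTainted);
    }
}
```

The analysis then never descends into `URLEncoder.encode`: a tainted argument simply yields an untainted result, which is all the taint rules need to know.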
Eliminating Redundant Reports
• Some tainted flows reported by the analysis may be redundant to a user.
• Treating the insertion of a sanitizer invocation into the path as a remediation action, the authors group flows according to the remediation actions they map to. TAJ reports one representative per group, rather than all the flows.
Eliminating Redundant Reports
• A library call point (LCP) is the last statement along a flow from a source to a sink where data flows from application code (i.e., the project’s source code) to library code (i.e., libraries referenced by the project).
• In the accompanying example, p1, p3, p4, and p5 are the ones reported.
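Grouping by remediation action can be sketched as a keyed deduplication. The grouping key used here, the LCP together with an issue type, is an assumption about what makes two flows share a remediation. The `Flow` class and its fields are hypothetical as well.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: report one representative flow per (LCP, issue type) group.
final class ReportGroupingSketch {

    static final class Flow {
        final String source;
        final String lcp;        // library call point of the flow
        final String issueType;  // e.g. the rule the flow violates
        final String sink;

        Flow(String source, String lcp, String issueType, String sink) {
            this.source = source;
            this.lcp = lcp;
            this.issueType = issueType;
            this.sink = sink;
        }
    }

    /** Keeps the first flow seen for each group, preserving discovery order. */
    static List<Flow> representatives(List<Flow> flows) {
        Map<String, Flow> byGroup = new LinkedHashMap<>();
        for (Flow f : flows) {
            // Assumption: one sanitizer call inserted at the LCP fixes every flow in the group.
            byGroup.putIfAbsent(f.lcp + "|" + f.issueType, f);
        }
        return new ArrayList<>(byGroup.values());
    }
}
```

Two flows that start at different sources but leave application code at the same statement collapse into a single report, since the same sanitizer call placed at that statement would remediate both.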
Bounded Analysis Techniques
• Q: How does TAJ maintain quality of service (QoS) even when constrained to a given time or memory budget?
• Priority-driven call-graph construction
• Useful bounds on analysis dimensions:
  #1 Slice size: constrain the size of a slice computed through hybrid thin slicing by limiting the number of heap transitions.
  #2 Flow length: the longer a flow is, the less likely it is to be a true positive.
  #3 Nested-taint depth
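As an illustration of the flow-length bound, the sketch below stops expanding a flow once it has used a fixed number of edges. How TAJ actually measures flow length is not stated on the slide. Counting graph edges, along with the class and method names, is an assumption made here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: breadth-first taint propagation with a cap on flow length.
final class BoundedFlowSketch {

    /** Returns the nodes reachable from source using at most maxFlowLength edges. */
    static Set<String> reachableWithin(Map<String, List<String>> edges,
                                       String source, int maxFlowLength) {
        Map<String, Integer> distance = new LinkedHashMap<>();
        Deque<String> work = new ArrayDeque<>();
        distance.put(source, 0);
        work.add(source);
        while (!work.isEmpty()) {
            String node = work.remove();
            int d = distance.get(node);
            if (d == maxFlowLength) {
                continue; // bound reached: do not extend this flow any further
            }
            for (String next : edges.getOrDefault(node, List.of())) {
                if (!distance.containsKey(next)) {
                    distance.put(next, d + 1);
                    work.add(next);
                }
            }
        }
        return new LinkedHashSet<>(distance.keySet());
    }
}
```

The bound trades recall for cost and precision: a sink that lies beyond the limit is never reported, on the premise that very long flows are rarely true positives.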
Priority-driven Call-graph Construction
• Under a fixed time and memory budget, TAJ may terminate pointer analysis and call-graph construction early.
• TAJ uses priority-driven call-graph construction to heuristically improve pointer-analysis quality within a fixed budget. The priority heuristic favors the analysis of methods that are more likely to generate and propagate taint.
• Priority-driven call-graph construction forces the pointer analysis to add constraints first from higher-priority methods, in this case those methods likely to be more relevant to taint analysis.
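The idea of processing higher-priority methods first under a budget can be sketched with a priority-queue worklist. Everything concrete here is an assumption: the integer priorities, the budget expressed as a number of methods, and the call-edge map are hypothetical stand-ins. TAJ's real heuristic and its constraint generation are not shown on the slide.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical sketch: expand the call graph highest-priority-first until the budget runs out.
final class PriorityCallGraphSketch {

    /**
     * Returns the methods processed, in order, before the budget was exhausted.
     * Higher priority numbers mean "more likely to generate or propagate taint".
     */
    static List<String> build(Map<String, List<String>> callees,
                              Map<String, Integer> priority,
                              String entry, int budget) {
        Comparator<String> highestFirst =
            Comparator.comparingInt((String m) -> priority.getOrDefault(m, 0)).reversed();
        PriorityQueue<String> work = new PriorityQueue<>(highestFirst);
        Set<String> queued = new HashSet<>();
        List<String> processed = new ArrayList<>();
        work.add(entry);
        queued.add(entry);
        while (!work.isEmpty() && processed.size() < budget) {
            String method = work.poll();
            processed.add(method); // a real analysis would add this method's constraints here
            for (String callee : callees.getOrDefault(method, List.of())) {
                if (queued.add(callee)) {
                    work.add(callee);
                }
            }
        }
        return processed;
    }
}
```

If the budget runs out early, the methods left unprocessed are the ones judged least relevant to taint, which is the intent of the heuristic.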
Experimental Results
• The unbounded hybrid algorithm offers a compelling tradeoff between performance and accuracy when compared to the CI and CS configurations.
• The prioritized hybrid algorithm offers better accuracy and performance tradeoffs than the CI and unbounded hybrid configurations.
• The fully optimized version of the hybrid algorithm is more accurate than the prioritized variant and more efficient than the CI algorithm.
Illumination • Hybrid Thin Slicing • Priority-driven • Experimental Results