Saving the World Wide Web from Vulnerable JavaScript International Symposium on Software Testing and Analysis (ISSTA 2011). Salvatore Guarnieri IBM Software Group sguarni@us.ibm.com. Marco Pistoia IBM T. J. Watson Research Center pistoia@us.ibm.com.
Saving the World Wide Web from Vulnerable JavaScript
International Symposium on Software Testing and Analysis (ISSTA 2011)
Salvatore Guarnieri, IBM Software Group, sguarni@us.ibm.com
Marco Pistoia, IBM T. J. Watson Research Center, pistoia@us.ibm.com
Omer Tripp, IBM Software Group, omert@il.ibm.com
Stephen Teilhet, IBM Software Group, steilhet@us.ibm.com
Julian Dolby, IBM T. J. Watson Research Center, dolby@us.ibm.com
Ryan Berg, IBM Software Group, ryan.berg@us.ibm.com
www.research.ibm.com/labasec
Consequences of Taint Violations
• Read and write access to saved data in cookies and local data stores
• Read and write access to data in the web page
• Key loggers
• Impersonation
• Phishing via page modifications or redirects
• Getting data from the DOM
• Sanitizing some, but not all, of the data
• Writing untrusted data into the web page
• Writing unchecked data to the web page

    var el1 = document.getElementById("d1");
    function foo() {
      var el2 = document.getElementById("d2");
      function bar() {
        var el3 = new Element();
        var s = encodeURIComponent(el2.innerText);
        document.write(s);
        el1.innerHTML = el2.innerText;
        document.location = el3.innerText;
      }
      bar();
    }
    foo();
    function baz(a, b) {
      a.f = document.URL;
      document.write(b.f);
    }
    var x = new Object();
    baz(x, x);
Outline
• Motivation
• Sources, Sinks, and Sanitizers
• Taint Analysis
• Results
Rules
• A rule is a triple <Sources, Sinks, Sanitizers>
• Not all sources are valid for all sinks, and not all sanitizers are valid for all sinks
• Sources
  - Seeds of untrusted data
  - Field gets or returns of function calls
  - Ex: document.URL
• Sinks
  - Security-critical operations
  - Field puts or parameters to function calls
  - Ex: element.innerHTML
• Sanitizers
  - Mark a flow as non-dangerous
  - Function calls
  - Ex: encodeURIComponent(str)
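The rule triple above can be sketched as a small data structure. This is only an illustration: the object layout, the name `xssRule`, the helper `violatesRule`, and the choice of entries in each set are assumptions made here, not the format Actarus actually uses.

```javascript
// Hypothetical rule representation: a <Sources, Sinks, Sanitizers> triple.
// The entries are taken from the slide examples; the layout is an assumption.
const xssRule = {
  sources: new Set(["document.URL", "Element.innerText"]),
  sinks: new Set(["Element.innerHTML", "document.write"]),
  sanitizers: new Set(["encodeURIComponent"]),
};

// A flow is an ordered list of operation names, from the point where the
// data originates to the point where it is consumed.
function violatesRule(rule, flow) {
  if (flow.length < 2) return false;
  const first = flow[0];
  const last = flow[flow.length - 1];
  // The rule only applies if the flow starts at one of its sources
  // and ends at one of its sinks.
  if (!rule.sources.has(first) || !rule.sinks.has(last)) return false;
  // A sanitizer of this rule between source and sink clears the flow.
  return !flow.slice(1, -1).some((step) => rule.sanitizers.has(step));
}
```

Keeping the three sets together in one rule reflects the point that a sanitizer is only trusted for the sinks of the rule it belongs to.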
Complexities of JavaScript
• Reflective property access
    var a = "foo" + "bar";
    var b = obj[a];
• Prototype chain property lookup
    function F() { this.bar = document.URL; }
    function G() { }
    G.prototype = new F();
    var a = new G();
    write(a.bar);
• Lexical scoping
    function foo() {
      var y = 42;
      var bar = function() { write(y); }
    }
• Function pointers
    var m = function() ...
    var k = function(f) { f(); }
    k(m);
• eval and its relatives
    eval("document.write('evil')");
Demand-Driven Taint Analysis
• The seeds are the assignments to sources or the return values from sources
• The analysis proceeds by tainting variables
• Each taint variable is a triple:
  - Static Single Assignment (SSA) variable ID
  - Method in which the SSA variable is defined
  - Access path
• Ex: (v7, m, <f, g>)
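A minimal sketch of how such a triple might be represented and queried. The function names `taintVariable` and `readIsTainted` are hypothetical, and the matching semantics (exact path match, with "*" standing for any remaining suffix of fields) are an assumption made for illustration.

```javascript
// Hypothetical encoding of a taint variable:
// (SSA variable ID, defining method, access path).
function taintVariable(ssaId, method, accessPath) {
  return { ssaId, method, accessPath };
}

// Checks whether a field read such as v7.f.g hits the tainted access path.
// Assumption: "*" matches any (possibly empty) remaining suffix of fields;
// without "*", the read must match the access path exactly.
function readIsTainted(taint, ssaId, method, fields) {
  if (taint.ssaId !== ssaId || taint.method !== method) return false;
  const path = taint.accessPath;
  for (let i = 0; i < path.length; i++) {
    if (path[i] === "*") return true;
    if (i >= fields.length || path[i] !== fields[i]) return false;
  }
  return fields.length === path.length;
}

// The slide's example: (v7, m, <f, g>)
const exampleTaint = taintVariable("v7", "m", ["f", "g"]);
```

With this encoding, `readIsTainted(exampleTaint, "v7", "m", ["f", "g"])` holds, while a read of `v7.f` alone, or of the same path in a different method, does not.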
Context-Sensitive Taint Analysis
• Start from taint sources
• Propagate taint intra-procedurally through def-use
• Propagate taint forward inter-procedurally
• Resolve aliasing using Andersen alias analysis
• Record constraints on call sites, recursively
• In the final constraint-propagation graph, detect paths between sources and sinks not intercepted by sanitizers
[Figure: methods m1(), m2(p1, p2, p3), and m3(q1, q2)]
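The last step, detecting source-to-sink paths that no sanitizer intercepts, can be sketched as plain graph reachability. This is a simplified illustration rather than the analysis itself: it ignores context sensitivity and alias resolution, and the function name `findViolations`, the node labels, and the edge map are all hypothetical.

```javascript
// Breadth-first search from each source over a flow graph.
// Sanitizer nodes are never entered, so any path that reaches a sink
// is a flow that no sanitizer intercepted.
function findViolations(edges, sources, sinks, sanitizers) {
  const violations = [];
  for (const source of sources) {
    const visited = new Set([source]);
    const queue = [[source]];
    while (queue.length > 0) {
      const path = queue.shift();
      const node = path[path.length - 1];
      if (sinks.has(node)) {
        violations.push(path);
        continue;
      }
      for (const next of edges.get(node) ?? []) {
        if (visited.has(next) || sanitizers.has(next)) continue;
        visited.add(next);
        queue.push([...path, next]);
      }
    }
  }
  return violations;
}

// Hand-built flow graph loosely modeled on the motivating example.
const flowEdges = new Map([
  ["el2.innerText", ["encodeURIComponent", "el1.innerHTML"]],
  ["encodeURIComponent", ["document.write(s)"]],
  ["document.URL", ["x.f"]],
  ["x.f", ["document.write(b.f)"]],
]);
const flowViolations = findViolations(
  flowEdges,
  new Set(["el2.innerText", "document.URL"]),
  new Set(["el1.innerHTML", "document.write(s)", "document.write(b.f)"]),
  new Set(["encodeURIComponent"])
);
```

Here `flowViolations` holds two paths: the unsanitized write to `el1.innerHTML` and the flow from `document.URL` through `x.f`. The write of `s` is not reported because its only incoming path goes through `encodeURIComponent`.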
Analysis Example

    function foo(p1, p2) {
      p1.f = p2.f;
    }
    var a = new Object();
    var b = new Object();
    b.f = window.location.toString();
    var c = new Object();
    var d = new Object();
    d.f = "safe";
    foo(a, b);
    foo(c, d);
    document.write(a.f); // This is a taint violation
    document.write(c.f); // This is NOT a taint violation

• Taint variable: (v2, foo, <f, *>)
• Install taint summary for foo: p2.f -> p1.f
• Since d.f is not tainted, c.f will not be tainted
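The effect of the summary at the two call sites can be sketched as follows. The summary encoding, the name `applySummary`, and the use of strings such as "b.f" for tainted locations are assumptions made for illustration only.

```javascript
// Hypothetical encoding of the summary for foo(p1, p2): p2.f -> p1.f.
// Parameters are referred to by zero-based position.
const fooSummary = [
  { from: { param: 1, field: "f" }, to: { param: 0, field: "f" } },
];

// Applies a summary at one call site: if the actual argument bound to the
// "from" parameter carries taint on that field, the "to" argument gets it too.
function applySummary(summary, args, tainted) {
  for (const { from, to } of summary) {
    const src = `${args[from.param]}.${from.field}`;
    if (tainted.has(src)) {
      tainted.add(`${args[to.param]}.${to.field}`);
    }
  }
}

const taintedPaths = new Set(["b.f"]); // b.f = window.location.toString()
applySummary(fooSummary, ["a", "b"], taintedPaths); // foo(a, b)
applySummary(fooSummary, ["c", "d"], taintedPaths); // foo(c, d)
```

After both calls, `taintedPaths` contains `a.f` but not `c.f`. The same summary is reused at each call site, and the outcome depends only on whether the actual arguments are tainted.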
Data Sets
• Developed a micro-benchmark suite of about 150 test scripts
• Downloaded Web pages and ran Actarus on them
Real-World Data Set
• Crawled portions of top Alexa Web sites and downloaded pages to disk
• Ran Actarus on a sample of the saved pages
• Ran on over 12,000 pages
• Successfully analyzed over 9,000 pages
• ~22% of pages failed due to a 4-minute timeout
Findings
• Several vulnerable Web sites were found
• Duplicates of vulnerabilities were found on many pages from the same site
• Some exploits were found in third-party code that was shared among several Web sites
• 40% true positive rate
• Vulnerabilities can be fixed with common sanitization routines
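As an illustration of the last finding, a common sanitization routine for data written into HTML is entity escaping. The slides do not say which routines were applied, so the function below, including its name `escapeHtml`, is an assumed example rather than the fix used in the study.

```javascript
// Escapes the characters that are significant in HTML text and
// attribute values. "&" is replaced first so that the entities produced
// by the later replacements are not escaped a second time.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

In the motivating example, a call such as `el1.innerHTML = escapeHtml(el2.innerText)` would make the browser render the text literally instead of parsing it as markup. Escaping of this kind suits HTML contexts only; URL or script contexts need sanitizers of their own, in line with the point that not every sanitizer is valid for every sink.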
User-Friendly Output
• Flows are highlighted and numbered in the source code
• JavaScript was pretty-printed to improve readability and usefulness of line numbers
Future Work
• Use string analysis to reduce false positives
• Make the analysis modular so that library code does not have to be reanalyzed
Thank You E-mail: sguarni@us.ibm.com