Learn how Apollo utilizes automated techniques to identify and troubleshoot crashes and HTML errors in dynamic web applications. Discover the innovative approach that streamlines the testing of dynamically generated web pages.
Finding Bugs in Dynamic Web Applications Shay Artzi, Adam Kiezun, Julian Dolby, Frank Tip, Danny Dig, Amit Paradkar, Michael D. Ernst Presented By: Christopher Hamilton
Introduction • Webscript crashes and malformed dynamically-generated Web pages impact usability of Web applications • Current tools for Web-page validation cannot handle the dynamically-generated pages on today’s Internet
The Problem • Bad scripts create syntactically malformed HTML • Less portable across browsers and new versions • Non-displayable HTML on separate executions • Browsers’ attempts to correct malformed HTML lead to crashes and security problems • Important information may be discarded • Search engines have trouble indexing the pages correctly
More Problems • Dynamic web page testing challenges • HTML validation tools only test static pages • Developer must perform both • Static Testing • Dynamic Testing
Previous Work • Dynamic test-generation tools (DART, CUTE, EXE) • Execute application on concrete inputs • Create additional inputs by solving symbolic constraints from control paths • Not practical with Web applications
The Authors’ Goals • Present an automated technique for finding faults manifested as Web application crashes or malformed HTML • Identify the minimal part of the input responsible for triggering each failure • Use an oracle to detect violations of the HTML specification in the application’s output
Apollo at a Glance • On each execution: • Combined concrete and symbolic execution and constraint solving • Program monitored to record path constraints capturing outcome of control-flow predicates • Oracle determines whether a fatal failure or malformed HTML occurs • Automatic/iterative creation of new inputs • Explore different execution paths
PHP Scripting Language • Widely used in Web development • Network interactions • Database • HTTP processing • Object oriented • Classes, interfaces, dynamically dispatched methods • Similar to Java • Scripting • Dynamic typing & eval
Example PHP program used in the following slides:
    <?php

    make_header(); // print HTML header

    // Make the $page variable easy to use //
    if(!isset($_GET['page'])) $page = 0;
    else $page = $_GET['page'];

    // Bring up the report cards and stop processing //
    if($_GET['page2']==1337) {
      require('printReportCards.php');
      die(); // terminate the PHP program
    }

    // Validate and log the user into the system //
    if($_GET["login"] == 1) validateLogin();

    switch ($page)
    {
      case 0: require('login.php'); break;
      case 1: require('TeacherMain.php'); break;
      case 2: require('StudentMain.php'); break;
      default: die("Incorrect page number. Please verify.");
    }

    make_footer(); // print HTML footer
    // ...
    function validateLogin() {
      if(!isset($_GET['username'])) {
        echo "<j2> username must be supplied.</h2>\n";
        return;
      }
      $username = $_GET['username'];
      $password = $_GET['password'];
      if($username=="john" && $password=="theTeacher")
        $page=1;
      else if($username=="john" && $password=="theStudent")
        $page=2;
      else echo "<h2>Login error. Please try again</h2>\n";
    }

    function make_header() { // print HTML header
      print("
    <!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\"
    \"http://www.w3.org/TR/html4/strict.dtd\">
    <HTML>
    <HEAD> <TITLE> Class Management </TITLE> </HEAD>
    <BODY>");
    }

    function make_footer() { // close HTML elements opened by header()
      print("
    </BODY>
    </HTML>");
    }
    ?>
Failures in PHP Scripts • Execution Failures • Missing an included file • Wrong MySQL query • Uncaught exceptions • Malformed HTML • Generated HTML page not syntactically correct according to HTML validation tool • In the example program: • ‘printReportCards.php’ is missing • make_footer() is not executed in certain situations, leaving an unclosed HTML tag • validateLogin() generates an illegal <j2> tag
Failure-Finding in PHP Applications • Concolic Testing – execute application on initial input, then on additional inputs obtained by solving constraints derived from exercised control flow paths • Extensions • Validate the correctness of the program’s output with an oracle • Handle isset, empty, require, etc., which require generation of constraints absent in other object-oriented languages • Use a pre-specified set of values for database authentication • Simulate each user input by transforming code
Transformation of Code • For each page (h) that contains N buttons • Add an additional input parameter p to the PHP program • Values range from 1 to N • Insert a switch statement that includes the appropriate PHP source file, depending on p • Required modifications are minimal and were performed by hand
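The transformation above can be pictured with a small generator that emits the inserted switch statement. This is only an illustrative sketch: the slides say the modification was done by hand, so the helper `build_dispatch_switch`, the use of `$_GET` to carry `p`, and the target file names are assumptions made for the example.

```python
def build_dispatch_switch(targets, param="p"):
    """Emit PHP source for a switch that loads the script behind button p.

    `targets` lists the PHP file reached by each of the N buttons, in order.
    Hypothetical helper: the slides describe this edit as manual.
    """
    lines = [f"switch ($_GET['{param}']) {{"]
    for number, target in enumerate(targets, start=1):
        # Button numbers run from 1 to N, matching the range of p.
        lines.append(f"  case {number}: require('{target}'); break;")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Two buttons on a login page, each leading to a different script.
    print(build_dispatch_switch(["mainmenu.php", "newuser.php"]))
```

With the extra parameter in place, the test generator can "press" a button simply by choosing a value for `p`, so user interaction becomes one more input to solve for.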
The Failure Detection Algorithm
    parameters: Program P, oracle O
    result: Bug reports B;
    B : setOf(⟨failure, setOf(pathConstraint), setOf(input)⟩)
    P′ ≔ simulateUserInput(P);
    B ≔ ∅;
    pcQueue ≔ emptyQueue();
    enqueue(pcQueue, emptyPathConstraint());
    while not empty(pcQueue) and not timeExpired() do
      pathConstraint ≔ dequeue(pcQueue);
      input ≔ solve(pathConstraint);
      if input ≠ ⊥ then
        output ≔ executeConcrete(P′, input);
        failures ≔ getFailures(O, output);
        foreach f in failures do
          merge ⟨f, pathConstraint, input⟩ into B;
        c1 ∧ . . . ∧ cn ≔ executeSymbolic(P′, input);
        foreach i = 1, . . . , n do
          newPC ≔ c1 ∧ . . . ∧ ci−1 ∧ ¬ci;
          enqueue(pcQueue, newPC);
    return B;
• A solution, if it exists, to such an alternative path constraint corresponds to an input that will execute the program along a prefix of the original execution path, and then take the opposite branch.
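The loop above can be sketched in Python on a toy model of the example script. Everything here is an assumption made for illustration: `run_program` is a hand-written stand-in that performs the concrete and symbolic steps in one pass, `solve` only understands set/notset/eq/ne conjuncts, `oracle` is a string check rather than an HTML validator, and a `seen` set plus `max_runs` replace `timeExpired()`.

```python
from collections import deque

NEGATE = {"set": "notset", "notset": "set", "eq": "ne", "ne": "eq"}

def run_program(inp):
    """Toy model of the example script: returns (output, recorded conjuncts)."""
    out, pc = "<html><body>", []

    def branch(taken, op, name, val=None):
        # Record the conjunct matching the branch actually taken.
        pc.append((op if taken else NEGATE[op], name, val))
        return taken

    page = 0 if branch("page" not in inp, "notset", "page") else inp["page"]
    if branch(inp.get("page2") == 1337, "eq", "page2", 1337):
        return out + "Fatal: printReportCards.php missing", pc
    if branch(inp.get("login") == 1, "eq", "login", 1):
        if branch("username" not in inp, "notset", "username"):
            out += "<j2>username must be supplied.</h2>"
    # Only page 0 is modelled; any other supplied page takes the die() path.
    if "page" in inp and branch(page != 0, "ne", "page", 0):
        return out + "Incorrect page number.", pc
    return out + "login form</body></html>", pc

def solve(pc):
    """Tiny solver: an input dict satisfying every conjunct, or None."""
    eq, ne, must_set, must_unset = {}, {}, set(), set()
    for op, name, val in pc:
        if op == "eq":
            if eq.setdefault(name, val) != val:
                return None
        elif op == "ne":
            ne.setdefault(name, set()).add(val)
        elif op == "set":
            must_set.add(name)
        else:
            must_unset.add(name)
    inp = {}
    for name in must_set | set(eq):
        banned = ne.get(name, set())
        if name in must_unset or eq.get(name) in banned:
            return None
        # Unconstrained but required: pick the smallest non-banned integer.
        inp[name] = eq[name] if name in eq else next(
            v for v in range(len(banned) + 1) if v not in banned)
    return inp

def oracle(output):
    """Stand-in for the HTML validator: returns a list of failure labels."""
    failures = ["execution failure"] if "Fatal" in output else []
    if "<j2>" in output:
        failures.append("illegal <j2> tag")
    if not output.endswith("</body></html>"):
        failures.append("unclosed HTML")
    return failures

def find_failures(program, oracle, max_runs=50):
    bugs = {}                      # failure -> [(path constraint, input), ...]
    queue, seen, runs = deque([()]), {()}, 0
    while queue and runs < max_runs:
        pc = queue.popleft()
        inp = solve(pc)
        if inp is None:
            continue
        runs += 1
        output, conjuncts = program(inp)
        for failure in oracle(output):
            bugs.setdefault(failure, []).append((pc, inp))
        for i, (op, name, val) in enumerate(conjuncts):
            # Keep the prefix, flip the i-th decision.
            new_pc = tuple(conjuncts[:i]) + ((NEGATE[op], name, val),)
            if new_pc not in seen:
                seen.add(new_pc)
                queue.append(new_pc)
    return bugs

if __name__ == "__main__":
    for failure, reports in find_failures(run_program, oracle).items():
        print(failure, "first exposed by input", reports[0][1])
```

The `seen` set is a convenience of this sketch so the toy terminates quickly; the pseudocode above relies on the time limit instead.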
Example: Execution 1 (Expose Third Fault) • The first execution starts from the empty path constraint • !isset($_GET['page']) is true – sets page = 0 • $_GET['page2'] == 1337 is false • $_GET["login"] == 1 is false • switch ($page) dispatches to case 0 (login.php) • HTML validation tool checks the generated output • Recorded path constraint: NotSet(page) ∧ page2 ≠ 1337 ∧ login ≠ 1 • New path constraints queued for later executions: • NotSet(page) ∧ page2 ≠ 1337 ∧ login = 1 • NotSet(page) ∧ page2 = 1337 • Set(page)
Example: Execution 2 (The Opposite Path) • For path constraint: NotSet(page) ∧ page2 ≠ 1337 ∧ login = 1 • Constraint solver may produce page2 ← 0; login ← 1 • $_GET["login"] == 1 is true, so validateLogin() runs • !isset($_GET['username']) is true, so the illegal <j2> tag is printed • HTML validation tool discovers the failure and generates a bug report, which is added to the output set of bug reports
Minimization on Path Constraints • Eliminates irrelevant constraints • Solution for a shorter path constraint is a smaller input • Does not guarantee that the returned path constraint is the shortest that exposes the failure • Simple, fast, and effective in practice • Differs from input minimization – operates on the path constraint that exposes the failure instead of on the input • Handles multiple path constraints that lead to the same failure
Minimization Example • HTML malformation from previous example could have been reached from different execution paths • Path constraint (a): NotSet(page) ∧ page2 ≠ 1337 ∧ login = 1 • Path constraint (b): Set(page) ∧ page = 0 ∧ page2 ≠ 1337 ∧ login = 1 • Conjuncts common to both: page2 ≠ 1337 ∧ login = 1 • After dropping the irrelevant conjunct page2 ≠ 1337: login = 1 (input: login ← 1)
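The minimization example can be replayed with a short greedy routine. This is a sketch under stated assumptions, not the tool's implementation: conjuncts are `(op, name, value)` tuples, `to_input` is a toy solver, `emits_j2` is a hand-written check for the `<j2>` failure, and the fall-back to the shortest original constraint is this sketch's reading of the procedure.

```python
def to_input(pc):
    """Toy solver: 'eq' fixes a value, 'set' picks 0, 'ne'/'notset' leave it unset."""
    inp = {}
    for op, name, val in pc:
        if op == "eq":
            inp[name] = val
        elif op == "set":
            inp.setdefault(name, 0)
    return inp

def emits_j2(pc):
    """Does an input solving `pc` reach the illegal <j2> tag in the example script?"""
    inp = to_input(pc)
    return (inp.get("page2") != 1337 and inp.get("login") == 1
            and "username" not in inp)

def minimize(failing_pcs, exposes_failure):
    """Greedy path-constraint minimization; the result is not guaranteed shortest."""
    # Step 1: keep only the conjuncts shared by every failing path constraint.
    common = [c for c in failing_pcs[0] if all(c in pc for pc in failing_pcs[1:])]
    # Step 2: try to drop each remaining conjunct, keeping the failure alive.
    current = list(common)
    for conjunct in common:
        candidate = [c for c in current if c != conjunct]
        if exposes_failure(candidate):
            current = candidate
    if exposes_failure(current):
        return current
    # Assumed fall-back: the shortest original constraint still exposes the failure.
    return min(failing_pcs, key=len)

if __name__ == "__main__":
    pc_a = [("notset", "page", None), ("ne", "page2", 1337), ("eq", "login", 1)]
    pc_b = [("set", "page", None), ("eq", "page", 0),
            ("ne", "page2", 1337), ("eq", "login", 1)]
    smallest = minimize([pc_a, pc_b], emits_j2)
    print(smallest, to_input(smallest))
```

Because the routine works on the path constraint rather than on the input, it can use every constraint known to reach the failure, which is how the `page` conjuncts disappear in the first step.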
Apollo • User Input Simulator • Executor • Bug Finder • Oracle • Bug Report Repository • Input Minimizer • Input Generator • Symbolic Driver • Constraint Solver • Value Generator
User Input Simulator • Performs a transformation of the program that models the user input.
Executor: Shadow Interpreter • Shadow Interpreter – PHP interpreter modified to record path constraints and positional information • Symbolic variable associated with each value • At branching points, extend the initially empty path constraint with a conjunct corresponding to the branch taken in execution • Records conditions for PHP-specific comparison operations (isset, empty, etc.) which can only be applied to one variable • Concrete values – influence flow control during execution • Symbolic values – record control flow decisions at branching points
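A small tracer illustrates the split between concrete and symbolic values described above: the concrete input decides which branch runs, while a conjunct describing that decision is appended to the path constraint. The `Tracer` class and its method names are hypothetical and only mimic the idea; they are not the shadow interpreter's interface.

```python
class Tracer:
    """Pairs concrete inputs with a growing, initially empty path constraint."""

    def __init__(self, inputs):
        self.inputs = inputs   # concrete values: drive control flow
        self.path = []         # symbolic record: one conjunct per branch taken

    def isset(self, name):
        taken = name in self.inputs
        self.path.append(f"Set({name})" if taken else f"NotSet({name})")
        return taken

    def equals(self, name, constant):
        taken = self.inputs.get(name) == constant
        self.path.append(f"{name} {'=' if taken else '≠'} {constant}")
        return taken


if __name__ == "__main__":
    t = Tracer({"login": 1})
    page = t.inputs["page"] if t.isset("page") else 0
    if t.equals("page2", 1337):
        pass                   # report-card branch, not taken for this input
    if t.equals("login", 1):
        pass                   # validateLogin() would run here
    print(" ∧ ".join(t.path))
```

Instrumenting the predicates rather than the values keeps the sketch short; a real interpreter would also have to propagate symbolic names through assignments.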
Executor: Database Manager • Database Manager • (Re)initializes the DB used by a PHP application; restores the DB before each execution • Supplies additional information about username/password pairs
Bug Finder • BugReport = Path constraint + Input inducing failure • Failure = Type of Failure + Corresponding Message + PHP statement generating bad HTML • Oracle – HTML validation tool (WDG and W3C) • InputMinimizer – uses the path constraint minimization algorithm • Executes program multiple times with multiple inputs that satisfy multiple constraints • Attempts to find the shortest path constraint resulting in the same failure characteristic
Input Generator • Symbolic Driver – implements combined concrete and symbolic failure detection algorithm • Selects next input (coverage heuristic) • Creates additional inputs from each execution • Constraint Solver – implements lightweight symbolic execution • Constraints = equality or inequality • Choco constraint solver • Unconstrained parameters = random generation and constant mining
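Constant mining for unconstrained parameters can be approximated by collecting the literals that appear in the program source and drawing candidate values from them. The regular expression and the `mine_constants` and `pick_value` helpers below are assumptions for illustration; the slides do not describe how the tool extracts constants.

```python
import random
import re

# Single-quoted strings, double-quoted strings, or bare integers.
LITERAL = re.compile(r"""'([^']*)'|"([^"]*)"|\b(\d+)\b""")

def mine_constants(php_source):
    """Collect string and integer literals found in PHP source text."""
    constants = set()
    for single, double, number in LITERAL.findall(php_source):
        if number:
            constants.add(int(number))
        else:
            constants.add(single or double)
    return constants

def pick_value(constants, rng):
    """Choose a value for an unconstrained parameter from the mined pool."""
    pool = sorted(constants, key=repr)   # stable order for reproducible runs
    return rng.choice(pool) if pool else rng.randint(0, 9)


if __name__ == "__main__":
    source = 'if($_GET["login"] == 1 && $username=="john") $page = 1337;'
    mined = mine_constants(source)
    print(mined, pick_value(mined, random.Random(0)))
```

Values such as `1337` or `"john"` are unlikely to be guessed at random, which is why mining them from the source can help reach guarded branches.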
Evaluation • How many faults can Apollo find, and of what varieties? • How effective is the fault detection technique compared to alternative approaches, in terms of number and severity of discovered faults and line coverage achieved? • How effective is minimization in reducing the size of input-parameter constraints and failure-inducing inputs?
Experimentation
    <?php echo "<h2>WebChess ".$Version." Login</h2>"; ?>
    <form method="post" action="mainmenu.php">
      <p>
        Nick: <input name="txtNick" type="text" size="15"/>
        <br />
        Password: <input name="pwdPassword" type="password" size="15"/>
      </p>
      <p>
        <input name="login" value="login" type="submit"/>
        <input name="newAccount" value="New Account" type="button"
               onClick="window.open('newuser.php', '_self')"/>
      </p>
    </form>
Generation Strategies • Compared to two other approaches • Halfond and Orso (Randomized) • Values chosen from constants appearing in program source and from default values • Difficult: parameters’ names and types not apparent • Infers names and types from dynamic traces • Minamide’s static analysis • Apollo’s test input generation, previously discussed
Methodology • 10-minute runs on each program • Generation of hundreds of inputs • Ran on both Apollo and Random test input generation strategies • WDG offline HTML validation tool • Coverage (number of executed lines / total lines with executable PHP code in application) • Total number of lines w/ PHP opcode
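The coverage ratio defined above is simple arithmetic, sketched here for concreteness. The function name and the idea of representing lines as integer sets are assumptions of this example.

```python
def line_coverage(executed_lines, executable_lines):
    """Fraction of executable lines that were executed at least once."""
    executable = set(executable_lines)
    if not executable:
        raise ValueError("application has no executable lines")
    # Ignore executed lines that are not counted as executable PHP code.
    covered = set(executed_lines) & executable
    return len(covered) / len(executable)


if __name__ == "__main__":
    # 3 of 10 executable lines were hit; line 99 is not executable code.
    print(f"{line_coverage({1, 2, 3, 99}, range(1, 11)):.1%}")
```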
Results Classification • Execution crash: PHP interpreter terminates with exception • Execution error: PHP interpreter emits warning visible in generated HTML • Execution warning: PHP interpreter emits warning invisible in the HTML output • HTML error: program generates HTML for which the validation tool produces an error report • HTML warning: program generates HTML for which the validation tool produces a warning report
Results Analysis • Apollo: average line coverage – 58.0%; faults found on subject apps – 214 • Randomized: average line coverage – 15.0%; faults found on subject apps – 59 • Annotations on the results table: resulted in malformed HTML; tries to load two missing files; database related; unset time zone
Results Analysis: Compared to Static Analysis • Minamide’s tool • Approximates string output of program with a context-free grammar • Able to discover unclosed tags • Intersects grammar with regular expression of matched pairs of delimiters • Covers phpwmis and timeclock (web-based) • Apollo is more effective and efficient • 2.7 times as many HTML validation faults • 83 additional execution faults • More scalable
Results Analysis: Effects of Constraint Minimization • Reduces the size of inputs by up to a factor of 0.18 for more than 50% of faults
Threats to Validity and Limitations
Threats to Validity • Construct • Malformed HTML = Defect? • Line coverage = quality? • Minimization of path constraints? • Internal • Real, unseeded, and unknown faults? • External • Generalized beyond subject programs? • Reproducible?
Limitations • Simulating inputs based on static information: false positives… • Limited tracking in native methods: C, input/output, … • Limited sources of input parameters: only inputs from global arrays • Running as a stand-alone application: Web server integration limited
Future Work • Handle simulated user input dynamically • Create external language to model dependencies between inputs and outputs • Increase line coverage when executing native methods • Webserver integration
Conclusion • Detection of run-time errors • HTML Validation tool as oracle • PHP specific issues • Simulation of interactive user input that occurs when HTML elements are activated • Automated analysis to minimize size of failure-inducing inputs • Apollo run on 4 open source programs • Over 50% line coverage • 214 faults over these applications • Minimized inputs 5.3 times smaller than nonminimized inputs