Finding Bugs in Dynamic Web Applications. Shay Artzi, Adam Kiezun, Julian Dolby, Frank Tip, Danny Dig, Amit Paradkar, Michael D. Ernst. Proceedings: ISSTA '08 (International Symposium on Software Testing and Analysis).
CSE 6329 Special Topics in Advanced Software Engineering
Presented by Md. Monjurul Hasan
Dynamic Web Application
• Generates pages (HTML content) on the fly
• Content varies with the user and with user-specified criteria
• Produced by server-side programming
• Nearly all large, well-known web applications are dynamic web applications
Source: Dynamic Web Application Development using PHP and MySQL, by Simon Stobart and David Parsons
Web Threats
• Web script crashes and malformed dynamically generated Web pages impair the usability of Web applications
• Current tools for Web-page validation cannot handle dynamically generated pages
Web Script Crash
• Missing included file
• Call to undefined method
• Incorrect database query
• Uncaught exceptions
Malformed HTML
• HTML that does not conform to the WDG (Web Design Group) or W3C (World Wide Web Consortium) standards
• Using tags not defined by the W3C (defined tags include <html>, <table>, <div>, etc.)
• Not maintaining the document structure (e.g. <html><head></head><body> .. </body></html>)
• Not pairing each opening tag with a matching closing tag
• etc.
• Web scripting languages can generate such HTML
The Problem
• Bad scripts create syntactically malformed HTML
• Partially displayable or non-displayable HTML
• Browsers' attempts to correct malformed HTML can cause crashes
• Slower HTML rendering
• Important information may be discarded
• Search engines have trouble indexing the correct pages
More Problems
• Dynamic web page testing challenges
• HTML validation tools only test static pages
• They cannot fully capture behavior, since not all of the code's functionality is visible in the HTML result
• No automatic validator exists for scripting languages that dynamically generate HTML pages
• HTML Kit validates every generated page, but requires manual generation of the inputs that lead to displaying those pages
What this paper presents…
• An automated technique for finding faults that manifest as Web script crashes or malformed HTML; it extends dynamic test generation to scripting languages
• Identifies the minimal part of the input responsible for triggering a failure
• Uses an oracle to determine whether the HTML is well formed
• A tool, Apollo, that implements all of this in the context of PHP
Why PHP?
• Widely used in Web development
• Network interactions
• Database
• HTTP processing
• Object oriented
• Scripting
• 21 million domains (75%) are powered by PHP, including large websites like Wikipedia, WordPress, Facebook, Digg, etc.
Source: Netcraft, April 2007
Example: program
• SchoolMate.php
• Allows school administrators to manage classes and users, teachers to manage assignments and grades, and students to access their information
• Typical URL: schoolmate.php?page=1&page2=100&login=1&username=user&password=password
<?php

make_header(); // print HTML header

// Make the $page variable easy to use //
if(!isset($_GET['page'])) $page = 0;
else $page = $_GET['page'];

// Bring up the report cards and stop processing //
if($_GET['page2']==1337) {
  require('printReportCards.php');
  die(); // terminate the PHP program
}

// Validate and log the user into the system //
if($_GET["login"] == 1) validateLogin();

switch ($page)
{
  case 0: require('login.php'); break;
  case 1: require('TeacherMain.php'); break;
  case 2: require('StudentMain.php'); break;
  default: die("Incorrect page number. Please verify.");
}

make_footer(); // print HTML footer
...
function validateLogin() {
  if(!isset($_GET['username'])) {
    echo "<j2> username must be supplied.</h2>\n";
    return;
  }
  $username = $_GET['username'];
  $password = $_GET['password'];
  if($username=="john" && $password=="theTeacher")
    $page=1;
  else if($username=="john" && $password=="theStudent")
    $page=2;
  else echo "<h2>Login error. Please try again</h2>\n";
}

function make_header() { // print HTML header
  // single-quoted so the double quotes inside the DOCTYPE do not end the string
  print('
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<HTML>
<HEAD> <TITLE> Class Management </TITLE> </HEAD>
<BODY>');
}

function make_footer() { // close HTML elements opened by make_header()
  print("
</BODY>
</HTML>");
}
?>
Faults in the example program
• require('printReportCards.php'): the file 'printReportCards.php' is missing
• make_footer() is not executed in certain situations (for example, after a call to die()), leaving HTML tags unclosed
• validateLogin() generates an illegal <j2> tag
Failures in PHP programs
• Targets two types of failures
• Execution failures: Web script crashes
• HTML failures: malformed HTML
Failure-Finding in PHP Applications
• Concolic testing: a dynamic test generation technique
• Execute the application initially on empty input
• Then execute it on additional inputs, obtained by solving constraints derived from control-flow paths
• Extensions
• Validate the correctness of program output using an oracle
• Handle isset, empty, require, etc., which require generating constraints that do not arise in other object-oriented programming languages
• Use a pre-specified set of values for database authentication
• Simulate each user input by transforming the source code
Transformation of Code
• Interactive HTML pages with buttons and menus
• For each page h that contains N buttons
• Add an additional input parameter p to the PHP program
• Its values range from 1 to N
• A switch statement is inserted that includes the appropriate PHP source file, depending on p
An example
<?php echo "<h2>Webchess " . $Version . " login</h2>"; ?>
<form method="post" action="mainmenu.php">
<p>
Nick: <input name="txtNick" type="text" size="15" /><br />
Password: <input name="pwdPassword" type="password" size="15" />
</p>
<p>
<input name="login" value="login" type="submit" />
<input name="newAccount" value="New Account" type="button" onClick="window.open('newuser.php', '_self')" />
</p>
</form>
<?php
/* Simulated User Input */
switch ($_GET["_btn"]) {
  case 1: require_once("mainmenu.php"); break;
  case 2: require_once("newuser.php"); break;
}
?>
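The transformation can be pictured as a small code generator. The following Python sketch is only an illustration under assumptions: the function name `simulate_user_input_snippet` and its `param` argument are hypothetical and not taken from Apollo, and the sketch only emits the switch statement for a page whose button targets are already known.

```python
def simulate_user_input_snippet(targets, param="_btn"):
    """Emit a PHP switch that includes one target file per button.

    targets: PHP files reachable from the page's N buttons, in button order.
    param:   name of the extra input parameter (hypothetical default).
    """
    lines = [
        "<?php",
        "/* Simulated User Input */",
        f'switch ($_GET["{param}"]) {{',
    ]
    # The added parameter ranges over 1..N, one case per button.
    for n, target in enumerate(targets, start=1):
        lines.append(f'  case {n}: require_once("{target}"); break;')
    lines += ["}", "?>"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(simulate_user_input_snippet(["mainmenu.php", "newuser.php"]))
```

Called with the two targets of the Webchess login page, the sketch produces a switch with two cases, matching the shape of the snippet appended in the example.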
The Failure Detection Algorithm
parameters: Program P, oracle O
result: Bug reports B;
  B : setOf(<failure, setOf(pathConstraint), setOf(input)>)
P′ ≔ simulateUserInput(P);
B ≔ empty;
pcQueue ≔ emptyQueue();
enqueue(pcQueue, emptyPathConstraint());
while not empty(pcQueue) and not timeExpired() do
  pathConstraint ≔ dequeue(pcQueue);
  input ≔ solve(pathConstraint);
  if input ≠ ⊥ then
    output ≔ executeConcrete(P′, input);
    failures ≔ getFailures(O, output);
    foreach f in failures do
      merge <f, pathConstraint, input> into B;
    c1 ∧ . . . ∧ cn ≔ executeSymbolic(P′, input);
    foreach i = 1, . . . , n do
      newPC ≔ c1 ∧ . . . ∧ ci−1 ∧ ¬ci;
      enqueue(pcQueue, newPC);
return B;
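The loop above can be tried out on a toy model. The Python sketch below is an illustration under stated assumptions, not Apollo's implementation: `toy_program` is a hand-written stand-in for the SchoolMate script that records its own branch conditions (Apollo instead uses a modified interpreter), `solve` is a tiny solver for equality and set/not-set constraints only, the oracle is a substring check, and an iteration cap and a `seen` set (both additions of this sketch) replace the time limit.

```python
from collections import deque

# A constraint is (kind, param, value), kind in {"set", "notset", "eq", "ne"}.
NEGATE = {"set": "notset", "notset": "set", "eq": "ne", "ne": "eq"}


def negate(c):
    return (NEGATE[c[0]], c[1], c[2])


def solve(pc):
    """Toy solver: an input dict satisfying the conjunction, or None."""
    eq, ne, must_set, must_unset = {}, {}, set(), set()
    for kind, param, value in pc:
        if kind == "eq":
            if eq.setdefault(param, value) != value:
                return None
        elif kind == "ne":
            ne.setdefault(param, set()).add(value)
        elif kind == "set":
            must_set.add(param)
        else:
            must_unset.add(param)
    if must_unset & (must_set | set(eq)):
        return None
    if any(v in ne.get(p, ()) for p, v in eq.items()):
        return None
    inp = dict(eq)
    for p in must_set - set(eq):
        banned = ne.get(p, set())
        inp[p] = next(v for v in range(len(banned) + 1) if v not in banned)
    return inp


def toy_program(inp, trace):
    """Simplified SchoolMate-like script that records the branches it takes."""
    def branch(kind, param, value=None):
        holds = (param not in inp) if kind == "notset" else (inp.get(param) == value)
        c = (kind, param, value)
        trace.append(c if holds else negate(c))
        return holds

    out = ["<html><body>"]
    branch("notset", "page")
    if branch("eq", "page2", 1337):
        raise FileNotFoundError("printReportCards.php")
    if branch("eq", "login", 1) and branch("notset", "username"):
        out.append("<j2>username must be supplied.</h2>")
    out.append("</body></html>")
    return "".join(out)


def get_failures(program, inp, trace):
    """Concrete run plus a toy oracle (crash, or the illegal <j2> tag)."""
    try:
        output = program(inp, trace)
    except FileNotFoundError as exc:
        return {f"crash: missing file {exc}"}
    return {"malformed HTML: <j2>"} if "<j2>" in output else set()


def find_failures(program, max_iterations=50):
    bugs = {}                 # failure -> list of (path constraint, input)
    queue = deque([()])       # start from the empty path constraint
    seen = set()              # sketch-only: skip path constraints already queued
    for _ in range(max_iterations):
        if not queue:
            break
        pc = queue.popleft()
        inp = solve(pc)
        if inp is None:
            continue
        trace = []
        for failure in get_failures(program, inp, trace):
            bugs.setdefault(failure, []).append((pc, inp))
        # Negate each conjunct in turn, keeping the prefix before it.
        for i, c in enumerate(trace):
            new_pc = tuple(trace[:i]) + (negate(c),)
            if new_pc not in seen:
                seen.add(new_pc)
                queue.append(new_pc)
    return bugs


if __name__ == "__main__":
    for failure, reports in find_failures(toy_program).items():
        print(failure, "first exposed by", reports[0][1])
```

Starting from the empty input, the sketch reaches both the missing-file crash and the `<j2>` malformation by negating one recorded branch condition at a time.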
Example: Execution 1 (Expose Third Fault)
• The program is first executed on the empty input
• !isset($_GET['page']) is true, which sets $page = 0
• $_GET['page2']==1337 is false, and $_GET["login"] == 1 is false
• Execution continues at case 0 of the switch statement (require('login.php'))
• HTML validation tool determines that the output is legal
• Path constraint of this execution: NotSet(page) ∧ page2 ≠ 1337 ∧ login ≠ 1
• New path constraints, obtained by negating one conjunct at a time:
• NotSet(page) ∧ page2 ≠ 1337 ∧ login = 1
• NotSet(page) ∧ page2 = 1337
• Set(page)
Example: Execution 2 (The Opposite Path)
• Path constraint: NotSet(page) ∧ page2 ≠ 1337 ∧ login = 1
• The constraint solver may produce the input page2 ← 0; login ← 1
• With this input, $_GET["login"] == 1 is true, so validateLogin() is called; username is not set, so the malformed <j2> line is printed
• HTML validation tool discovers the failure and generates a bug report, which is added to the output set of bug reports
Minimization on Path Constraints
• Finds a shorter path constraint for a given bug report
• Eliminates irrelevant constraints, which helps the programmer locate the fault
• A solution of a shorter path constraint is often a smaller input
• Does not guarantee that the returned path constraint is the shortest one that exposes the failure
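Minimization Example
• The HTML malformation from the previous example could have been reached along different execution paths:
• NotSet(page) ∧ page2 ≠ 1337 ∧ login = 1
• Set(page) ∧ page = 0 ∧ page2 ≠ 1337 ∧ login = 1
• Intersection of the two path constraints: page2 ≠ 1337 ∧ login = 1
• Dropping page2 ≠ 1337 still exposes the failure; dropping login = 1 does not
• Minimized path constraint: login = 1 (input: login ← 1)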
Path Constraint Minimization Algorithm
parameters: Program P, oracle O, bug report b
result: Short path constraint that exposes b.failure
c1 ∧ . . . ∧ cn ≔ intersect(b.pathConstraints);
pc ≔ true;
foreach i = 1, . . . , n do
  pci ≔ c1 ∧ . . . ∧ ci−1 ∧ ci+1 ∧ . . . ∧ cn;
  input ≔ solve(pci);
  if input ≠ ⊥ then
    output ≔ executeConcrete(P, input);
    failures ≔ getFailures(O, output);
    if b.failure ∉ failures then
      pc ≔ pc ∧ ci;
inputpc ≔ solve(pc);
if inputpc ≠ ⊥ then
  outputpc ≔ executeConcrete(P, inputpc);
  failurespc ≔ getFailures(O, outputpc);
  if b.failure ∈ failurespc then
    return pc;
return shortest(b.pathConstraints);
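As a rough illustration of the minimization steps, the Python sketch below replays the minimization example on a toy model. Everything here is an assumption made for the sketch: constraints are `(kind, param, value)` tuples, `solve` handles only equality and set/not-set constraints, and `get_failures` hard-codes the behavior of the example script instead of executing PHP.

```python
def solve(pc):
    """Toy solver: an input dict satisfying the conjunction, or None."""
    eq, ne, must_set, must_unset = {}, {}, set(), set()
    for kind, param, value in pc:
        if kind == "eq":
            if eq.setdefault(param, value) != value:
                return None
        elif kind == "ne":
            ne.setdefault(param, set()).add(value)
        elif kind == "set":
            must_set.add(param)
        else:  # "notset"
            must_unset.add(param)
    if must_unset & (must_set | set(eq)):
        return None
    if any(v in ne.get(p, ()) for p, v in eq.items()):
        return None
    inp = dict(eq)
    for p in must_set - set(eq):
        banned = ne.get(p, set())
        inp[p] = next(v for v in range(len(banned) + 1) if v not in banned)
    return inp


def get_failures(inp):
    """Stand-in for executeConcrete + oracle on the example script."""
    if inp.get("page2") == 1337:
        return {"crash: missing printReportCards.php"}
    if inp.get("login") == 1 and "username" not in inp:
        return {"malformed HTML: <j2>"}
    return set()


def minimize(failure, path_constraints):
    """Return a short path constraint that still exposes `failure`."""
    first, rest = path_constraints[0], path_constraints[1:]
    # intersect: keep the conjuncts shared by every exposing path constraint
    common = [c for c in first if all(c in pc for pc in rest)]
    kept = []
    for i, c in enumerate(common):
        without_c = common[:i] + common[i + 1:]
        inp = solve(without_c)
        # c is needed if dropping it makes the failure disappear
        if inp is not None and failure not in get_failures(inp):
            kept.append(c)
    inp = solve(kept)
    if inp is not None and failure in get_failures(inp):
        return kept
    return min(path_constraints, key=len)  # fall back to the shortest known one


if __name__ == "__main__":
    reports = [
        [("notset", "page", None), ("ne", "page2", 1337), ("eq", "login", 1)],
        [("set", "page", None), ("eq", "page", 0), ("ne", "page2", 1337),
         ("eq", "login", 1)],
    ]
    print(minimize("malformed HTML: <j2>", reports))
```

On the two path constraints from the example, the sketch keeps only the `login = 1` conjunct, which agrees with the minimized constraint on the slide.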
Apollo
• User Input Simulator
• Executor
• Bug Finder: Oracle, Bug Report Repository, Input Minimizer
• Input Generator: Symbolic Driver, Constraint Solver, Value Generator
Executor: Shadow Interpreter
• Modified Zend PHP interpreter 5.2.2 that records path constraints and information associated with the output
• Performs symbolic execution along with concrete execution
• Records conditions for PHP-specific comparison operations such as isset and empty
Executor: Database Manager
• (Re)initializes the database used by a PHP application and restores it before each execution
• Supplies additional information about username/password pairs
Bug Finder
• Bug Report = Failure + Path constraint + Input inducing the failure
• Failure = Type of failure + Corresponding message + PHP statement generating the bad HTML
• Oracle: HTML validation tool (WDG and W3C)
• Input Minimizer: uses the path constraint minimization algorithm
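To make the oracle's role concrete, here is a deliberately small Python sketch of a well-formedness check. It is not the WDG or W3C validator and does not validate against a DTD: `KNOWN_TAGS` and `VOID_TAGS` are short, hypothetical lists chosen for this example, and the check only looks for unknown tags and unbalanced nesting.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately short tag lists for this sketch only.
KNOWN_TAGS = {"html", "head", "title", "body", "h2", "p", "form",
              "input", "br", "table", "div"}
VOID_TAGS = {"input", "br"}  # elements that never take a closing tag


class ToyOracle(HTMLParser):
    """Collects unknown tags and nesting problems while parsing."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements
        self.failures = []   # human-readable failure messages

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN_TAGS:
            self.failures.append(f"unknown tag <{tag}>")
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag not in self.stack:
            self.failures.append(f"unexpected </{tag}>")
            return
        # Anything opened after `tag` and still open was never closed.
        while self.stack[-1] != tag:
            self.failures.append(f"unclosed <{self.stack.pop()}>")
        self.stack.pop()


def get_failures(html):
    """Return a list of failure messages; empty means the page passed."""
    oracle = ToyOracle()
    oracle.feed(html)
    oracle.close()
    return oracle.failures + [f"unclosed <{tag}>" for tag in oracle.stack]


if __name__ == "__main__":
    page = "<html><body><j2> username must be supplied.</h2>\n</body></html>"
    print(get_failures(page))
```

A page containing the `<j2>` line from the example is flagged for the unknown tag, and a page that ends before the footer is printed is flagged for its unclosed elements.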
Input Generator
• Symbolic Driver: generates new path constraints and selects the next path constraint to solve
• Constraint Solver: computes an assignment of values to input parameters that satisfies a given path constraint; Apollo uses the Choco constraint solver
• Value Generator: generates values for parameters by combining random value generation with constant values mined from the source code
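The value generator's mix of random and mined values might look roughly like the Python sketch below. The regular expressions, the `p_constant` probability, and the 0 to 1000 range are all assumptions made for illustration; the slides do not say how Apollo mines constants or how it weighs the two sources.

```python
import random
import re


def mine_constants(php_source):
    """Collect integer and double-quoted string literals from PHP source text."""
    numbers = {int(n) for n in re.findall(r"\b\d+\b", php_source)}
    strings = set(re.findall(r'"([^"\\\n]*)"', php_source))
    return sorted(numbers) + sorted(strings)


def generate_value(constants, rng, p_constant=0.5):
    """Pick a mined constant with probability p_constant, else a random integer."""
    if constants and rng.random() < p_constant:
        return rng.choice(constants)
    return rng.randint(0, 1000)


if __name__ == "__main__":
    source = 'if($_GET["page2"]==1337) { } if($username=="john") { }'
    constants = mine_constants(source)
    rng = random.Random(0)  # seeded so repeated runs produce the same values
    print(constants, [generate_value(constants, rng) for _ in range(5)])
```

Mined constants matter because a purely random generator is unlikely to guess a value such as 1337 or "john" that a branch condition compares against.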
Experimentation
• faqforge: tool for creating and managing documents
• webchess: online chess game
• schoolmate: PHP/MySQL solution for administering schools
• phpsysinfo: displays system information
Generation Strategies
• Apollo was compared with two other approaches
• Halfond and Orso (Randomized): assigns random values to the parameters; proposed for JavaScript
• Minamide's static analysis: approximates the string output of a program with a context-free grammar; discovers malformed HTML faults
• Apollo's test input generation, as previously discussed
Methodology
• 10-minute runs on each program
• Generation of hundreds of inputs
• Both the Apollo and the Randomized test input generation strategies were run
• WDG offline HTML validation tool used as the oracle
Results Classification
• Execution crash: PHP interpreter terminates with an exception
• Execution error: PHP interpreter emits a warning visible in the generated HTML
• Execution warning: PHP interpreter emits a warning not visible in the HTML output
• HTML error: program generates HTML for which the validation tool produces an error report
• HTML warning: program generates HTML for which the validation tool produces a warning report
Results Analysis
• Annotations on the results table: resulted in malformed HTML; tries to load two missing files; database related; unset time zone
• Apollo: average line coverage 58.0%; faults found on subject apps: 214
• Randomized: average line coverage 15.0%; faults found on subject apps: 59
• Line coverage = number of executed lines / total lines with executable PHP code in the application
Results Analysis
• Apollo vs. Randomized
• 58% line coverage vs. 15.2% line coverage
• 214 faults vs. 59 faults
• Apollo vs. Minamide's tool
• 2.7 times as many HTML validation faults (120 vs. 45)
• 83 additional execution faults
• 104 faults (10 minutes) vs. 14 faults (126 minutes)
• Apollo is more effective and more efficient than both
Results Analysis: Path Constraint Minimization
• Reduces the size of inputs by up to a factor of 0.18 for more than 50% of faults
• Success rate: percentage of faults whose exposing input was minimized
• Orig. size: average size of the original path constraints (# of conjuncts) and inputs (# of key-value pairs)
• Reduction columns: ratio of minimized to un-minimized size; the lower the ratio, the more successful the minimization
Limitations
• User inputs are simulated statically
• JavaScript code in the generated HTML is not tracked
• Limited line coverage for native C methods
• Limited sources of input parameters: only inputs from the global arrays (_POST, _GET and _REQUEST)