Automated Test Case Generation to Validate Non-functional Software Requirements Dissertation Proposal Pingyu Zhang Spring 2013
Software Requirements • The software life cycle is bounded by requirements • Functional – what a system must do • Non-functional – how well the functional requirements are satisfied
Example: A Fitness Tracking App myTracks from Google • Feature List • routing • tracking • sharing
Example Cont. Functional Requirements • Routing – e.g. calculate a round route that includes the given points of interest (POIs). • Functional Validation – run a test with 3 POIs and check whether the app generates a round route that includes all of them. Input: {Home, Avery Hall, Capitol}
Example Cont. Non-functional Requirements • Performance Requirement • Non-functional Validation – check whether the program can generate a round route in an acceptable time Input: {Home, Avery Hall, Capitol} • Requirement: response time < T • T = 5 seconds • Measured response time = 3 seconds
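The check on this slide can be sketched as a small timing harness. This is only an illustration, not the app's test code: `plan_round_route` is a hypothetical stand-in for the routing feature, and the threshold T = 5 seconds is the value quoted on the slide.

```python
import time

T_SECONDS = 5.0  # performance requirement from the slide: response time < T


def plan_round_route(pois):
    """Hypothetical stand-in for the app's routing feature."""
    # Close the loop by returning to the start; a real planner would optimize the order.
    return list(pois) + [pois[0]]


def meets_response_time(func, args, limit_seconds):
    """Run func once and return (result, elapsed seconds, within limit?)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed < limit_seconds


route, elapsed, ok = meets_response_time(
    plan_round_route, (["Home", "Avery Hall", "Capitol"],), T_SECONDS
)
```

A single run like this says little about load behavior; it only shows the shape of the pass/fail check that the load tests discussed next are meant to stress.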
Example Cont. Non-functional Requirements Load Testing Input: {Home, Avery Hall, Capitol} Input: {Lincoln Children’s Zoo, Antelope Park, 48th & Normal, Holmes Park, 56th & Pioneer, 48th & Hwy2, 27th & Pioneer…}
Example Cont. Load Testing: the Conventional Way • Goal: find performance faults • Process • Black box • Induce load through input rate, input size… • Check against performance requirement Load Testing Input: {Lincoln Children’s Zoo, Antelope Park, 48th & Normal, Holmes Park, 56th & Pioneer, 48th & Hwy2, 27th & Pioneer…} • response time < T
Conventional Approach Missed: Highly Dependent on Input Values • Response time to find a route for 20 POIs ranges from 11 seconds to 128 seconds (12X); for 100 POIs the difference is 55X • Depends on the location of the POIs – the particular input values can matter as much as the input size Test 1 Input: 20 geocache locations randomly chosen from Geocache.com in 68508 Test 2 Input: 20 locations generated with my load testing approach T = 20 sec Response Time: 128 sec Response Time: 11 sec
It Missed More Than That… • Highly dependent on input values • Response time for 20 POIs ranges from 11 sec to 128 sec • Increasing input size is too expensive • Increasing the jzlib response time by 30 sec means going from a 1MB to a 75MB input • Increasing input size is doing more of the same • Increasing table size does not reveal new behavior in a query application • Missing other resources • Memory & energy constraints on mobile platforms We want tests that induce high loads by selecting the right values, that exercise a diversity of paths, and that may target a variety of resources
Example: A Fitness Tracking App myTracks from Google • Feature List • routing • tracking • sharing
Example Cont. Functional Requirements • Tracking – e.g. work out with the app turned on and record the activity. • Functional Validation – check whether the app captured the route, speed, elevation, etc. Input: location data obtained by calling GPS-related APIs
Example Cont. Non-functional Requirements • Tracking – e.g. work out with the app turned on and record the activity. • Non-functional Validation – check whether the app can produce correct data under unusual conditions – tunnels, roofs, woods, etc. Input: location data obtained by calling GPS-related APIs or
Example Cont. Exception Handling for Unusual Conditions GPS API Exception Handling many more…
Example Cont. To Validate Exception Handling Code Mocking Device • A mocking device to inject exceptions while executing tests • Capable of simulating the noisy nature of external resources GPS API Exception Handling simulate
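A mocking device of this kind can be sketched as follows. Everything here is hypothetical and for illustration only: `FlakyGpsMock`, `GpsUnavailableError`, and `record_track` are invented names, not Android or myTracks APIs, and a seeded failure rate is just one simple way to imitate a noisy external resource.

```python
import random


class GpsUnavailableError(Exception):
    """Hypothetical exception standing in for a failed location request."""


class FlakyGpsMock:
    """Mock GPS source that replays fixes and fails at a configurable rate."""

    def __init__(self, fixes, failure_rate, seed=0):
        self._fixes = list(fixes)
        self._failure_rate = failure_rate
        self._rng = random.Random(seed)  # seeded so a failing run can be replayed
        self._next = 0

    def get_location(self):
        # Inject an exception on a fraction of calls to imitate signal loss.
        if self._rng.random() < self._failure_rate:
            raise GpsUnavailableError("simulated signal loss")
        fix = self._fixes[self._next % len(self._fixes)]
        self._next += 1
        return fix


def record_track(gps, samples):
    """Toy tracking loop under test: skip failed readings instead of crashing."""
    track, dropped = [], 0
    for _ in range(samples):
        try:
            track.append(gps.get_location())
        except GpsUnavailableError:
            dropped += 1
    return track, dropped


gps = FlakyGpsMock([(40.82, -96.70), (40.81, -96.70)], failure_rate=0.3, seed=1)
track, dropped = record_track(gps, samples=50)
```

Unlike a mock that throws on every invocation, a rate-based mock lets the same test exercise both the normal path and the exception handling path.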
Mocking Support in Android SDK • android.test.mock – throws exceptions on every invocation • [Figure: example screenshots labeled Official, DIY, Question, and Complaint]
Software Requirements • The software life cycle is bounded by requirements • Functional – what a system must do • Non-functional – how well the functional requirements are satisfied Why do non-functional requirements matter?
Importance of Non-functional Requirements • Google – users move to a competitor if your website loads 250 milliseconds slower than theirs [New York Times, Feb 2012]. • Netflix – the entire API was redesigned to improve performance [Netflix Report, Mar 2012]. • Oracle – fixing a performance problem at the end of the development cycle accounts for 25% of total cost [Oracle Report, Jan 2013].
Importance of Non-functional Requirements Cont. What if exception handling is not done correctly? • corrupted data • crashing app
Exception Handling Is Not a Trivial Problem • 27% – poor exception handling code • 17% – interactions with external resources
Non-functional Validation: State of Practice • Not Enough Testing Resources? • Functional Only! • Enough Testing Resources? • Functional First! • Why Is That? • No Cost-effective Ways!
We Propose to Improve Non-functional Validation • For load testing • automatically generate load tests by exhaustively traversing program paths • For exception handling • amplify existing tests to exhaustively explore new exceptional behaviors Exhaustive white-box testing techniques can cost-effectively validate non-functional requirements.
Research Progress So Far… Software Requirement Validation Functional Non-functional Exception Handling Load Testing for single programs ASE'11 for software pipelines ISSTA'12 best paper award ICSE'12 Extension of ICSE'12 in preparation
White-box Load Testing: Revisiting the Objective • Highly dependent on input values • Response time for 20 POIs ranges from 11 sec to 128 sec • Increasing input size is too expensive • Increasing the jzlib response time by 30 sec means going from a 1MB to a 75MB input • Increasing input size is doing more of the same • Increasing table size does not reveal new behavior in a query application • Missing other resources • Memory & energy constraints on mobile platforms We want tests that induce high loads by selecting the right values, that exercise a diversity of paths, and that may target a variety of resources
White-box Load Testing: Intuition Pick the longest path [Figure: computational space]
White-box Load Testing: Intuition Brute force approach: traverse all paths and return the longest one. But first, how do we systematically traverse program paths? [Figure: computational space]
Symbolic Execution (since 1976) • Goal: a test input for every program path • Use symbolic test generation to explore program paths • Widely used in automated software testing: DART, CUTE, EXE, JPF, …
foo(int x, int y) {
    z = 2*x;
    if (z == y)
        if (x > y+8)
            print("Hi");
}
Path conditions (PC) and generated inputs, one per path through the branches 2*x == y and x > y+8:
• PC: 2x ≠ y Input: x=0, y=1
• PC: 2x = y ∧ x ≤ y+8 Input: x=1, y=2
• PC: 2x = y ∧ x > y+8 Input: x=-10, y=-20
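The idea of one path condition and one generated input per path can be sketched in a few lines. Two assumptions are made here: the branch conditions are taken to be 2*x == y and x > y+8, which is the reading consistent with the slide's assignment z = 2*x and its three listed inputs, and a bounded brute-force search stands in for the constraint solver that a real symbolic executor would call.

```python
from itertools import product


def foo(x, y):
    """Concrete version of the slide's example; returns the branch outcomes taken."""
    z = 2 * x
    if z == y:
        if x > y + 8:
            return (True, True)  # the path that prints "Hi" in the original
        return (True, False)
    return (False,)


# One path condition (PC) per path, written as a predicate over the inputs.
PATH_CONDITIONS = {
    (False,): lambda x, y: 2 * x != y,
    (True, False): lambda x, y: 2 * x == y and x <= y + 8,
    (True, True): lambda x, y: 2 * x == y and x > y + 8,
}


def solve(pc, bound=25):
    """Stand-in for a constraint solver: bounded search for a satisfying input."""
    for x, y in product(range(-bound, bound + 1), repeat=2):
        if pc(x, y):
            return x, y
    return None


# One concrete test input per path; running foo on it follows that path.
tests = {path: solve(pc) for path, pc in PATH_CONDITIONS.items()}
```

The inputs found by the search need not equal the ones on the slide; any assignment that satisfies a path condition drives execution down the corresponding path.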
Finding Long Paths with Symbolic Execution • Brute force approach • Generate every path on N inputs • Return input for the longest path • Cannot scale • With 5 POIs, a full symbolic execution reveals 142,352 possible paths and takes 171 min • With 6 POIs, full SE fails to finish in 4 hours • [Figure for N=5: bytecode count, with labels <70ms and 0.43~0.5sec; annotations "Wasted Efforts" and "Longest paths are here"]
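The brute-force approach, and why it cannot scale, can be illustrated with a toy model. The model is an assumption made for this sketch: the program is reduced to a chain of independent two-way branches, each with a made-up cost for the taken and not-taken side, so the number of paths doubles with every branch.

```python
def enumerate_paths(branch_costs):
    """Yield (decisions, cost) for every path through a chain of two-way branches.

    branch_costs[i] = (cost if branch i is taken, cost if it is not taken).
    """
    def walk(i, decisions, cost):
        if i == len(branch_costs):
            yield tuple(decisions), cost
            return
        taken_cost, not_taken_cost = branch_costs[i]
        yield from walk(i + 1, decisions + [True], cost + taken_cost)
        yield from walk(i + 1, decisions + [False], cost + not_taken_cost)

    yield from walk(0, [], 0)


def longest_path(branch_costs):
    """Brute force: explore every path, return the costliest one and the path count."""
    paths = list(enumerate_paths(branch_costs))
    return max(paths, key=lambda p: p[1]), len(paths)


BRANCH_COSTS = [(5, 1), (2, 7), (4, 3)]  # hypothetical per-branch costs
(best_decisions, best_cost), explored = longest_path(BRANCH_COSTS)
```

With 3 branches the search visits 8 paths; with 30 it would visit over a billion, and almost all of that work is spent on paths that are nowhere near the longest.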
Revisiting the Objective • Brute force: Not Scalable We want tests that induce high loads by selecting the right values, that exercise a diversity of paths, and that may target a variety of resources.
Adapting Symbolic Execution towards Load Sensitive Paths • Incremental • Iterative-deepening (lookAhead) • Pruning on frontiers • Directed • Favor paths according to performance measure • Explore diverse paths • Frontier Diversity Check: Step 1: split the frontier (C1, C2) Step 2: check gap > TH
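One plausible reading of the two steps of the frontier check is sketched below. The details are assumptions, not taken from the slides: the frontier is represented only by the performance measure of each partial path, the split is placed at the largest gap between adjacent sorted values, and TH is a caller-supplied threshold. If the costly cluster is clearly separated, the cheap cluster is pruned; otherwise the whole frontier is kept and exploration looks ahead further.

```python
def split_frontier(costs):
    """Step 1: split sorted path costs into two clusters at the largest adjacent gap."""
    ordered = sorted(costs)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    return ordered[:cut + 1], ordered[cut + 1:], gaps[cut]


def prune_frontier(costs, threshold):
    """Step 2: keep only the costly cluster when the gap exceeds the threshold."""
    if len(costs) < 2:
        return list(costs)
    cheap, costly, gap = split_frontier(costs)
    return costly if gap > threshold else sorted(costs)


separated = prune_frontier([10, 12, 11, 40, 42], threshold=5)
not_separated = prune_frontier([10, 12, 11, 13], threshold=5)
```

In the first call the two costly paths stand apart and the other three are dropped; in the second no cluster stands out, so nothing is pruned.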
Implementation: Symbolic Load Generation (SLG) • Implemented as an extension to SPF (Symbolic PathFinder) • Record & replay of paths • Path Performance Measures • Response Time: weighted bytecode count (invoke: 10, others: 1) • Memory Usage: listens to object life cycle operations • Test Instantiation • Implemented a new Yices Java API to work with JPF • Yices appears to be better than others (choco, cvc3)
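The response-time measure can be illustrated with a short sketch. The weights (invoke: 10, others: 1) come from the slide; representing a path as a list of JVM opcode mnemonics, and treating every mnemonic that starts with "invoke" as an invoke instruction, are assumptions made for this example.

```python
INVOKE_WEIGHT = 10  # weight for invoke* bytecodes, as given on the slide
DEFAULT_WEIGHT = 1  # weight for every other bytecode


def weighted_bytecode_count(opcodes):
    """Estimate the response-time cost of a path from its executed bytecodes."""
    return sum(
        INVOKE_WEIGHT if op.startswith("invoke") else DEFAULT_WEIGHT
        for op in opcodes
    )


# Hypothetical trace of one short path.
trace = ["iload_1", "iconst_2", "imul", "invokevirtual", "ireturn"]
cost = weighted_bytecode_count(trace)
```

Here four ordinary bytecodes and one invoke give a cost of 14. Because the measure is computed from the path itself, paths can be ranked without timing them on real hardware.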
Dealing with Solver Limitations: Constraint Limited Load Generation (CLLG-k) • Challenge • Load tests traverse long paths, which means more constraints for the solver • Every SMT solver has a limit on the size of constraints it can handle efficiently • Constraint Limited Load Generation (CLLG-k) • Wrapper algorithm for SLG • Chains partial solutions together • Scalable but sacrifices test quality • Introduces a new parameter: maxSolverConstraints (k) • [Figure: partial inputs generated by SLG with bound k, chained by CLLG-k]
Evaluation of SLG • Summary • Parameters & Environment • lookAhead=50 across programs, testSuiteSize=10 • 2.4GHz Intel Core 2 Duo, JVM 1.6, 2GB MEM
RQ1: Jzlib for Response Time • Control treatment: Random • 3-hour cap enforced across runs • [Figure annotations: 50MB: 4.5X; 100MB: 3.4X]
RQ1: Jzlib for Response Time • [Figure labels: 25MB, 100MB]