610 likes | 734 Views
Generating Program Inputs for Database Application Testing. Kai Pan , Xintao Wu University of North Carolina at Charlotte. Tao Xie North Carolina State University. 26th IEEE/ACM International Conference on Automated Software Engineering Nov 11, 2011 Lawrence, Kansas. Background.
E N D
Generating Program Inputs for Database Application Testing Kai Pan, Xintao Wu University of North Carolina at Charlotte Tao Xie • North Carolina State University 26th IEEE/ACM International Conference on Automated Software Engineering Nov 11, 2011 Lawrence, Kansas
Background Functional Testing Test Generation Program Inputs
Background Functional Testing Test Generation Program Inputs Database States
An Example Program inputs Database
Benefits to use an existing database state • Represent real-world objects’ characteristics, helping detect • faults that could cause failures in real-world settings • Reduce cost of generating new database records
Dynamic Symbolic Execution (DSE) • Execute the program in both concrete and symbolic way (also called concolictesting) • Collect constraints along executed path as path condition • Negate part of the path condition and solve the new path condition to lead to new path • DSE tools for various program languages • Pex for .NET from Microsoft Research
Motivation Path Condition: C1: Query construction constraints
Motivation Path Condition: C1: Query construction constraints C2: Query/DB constraints
Motivation Path Condition: C1: Query construction constraints C2: Query/DB constraints C3: Result manipulation constraints
Motivation Path Condition: C1: Query construction constraints C2: Query/DB constraints C3: Result manipulation constraints C1 ^ C2 ^ C3
Motivation Path Condition: C1: Query construction constraints C2: Query/DB constraints C3: Result manipulation constraints C1 ^ C2 ^ C3 A hard part
Motivation How to derive high-covering program input values based on a given database state?
Outline • Background • Approach • Evaluation • Conclusion and future work
SQL query forms • Fundamental structure: SELECT, FROM, WHERE, GROUP BY, and HAVING clauses. SELECTselect-list FROMfrom-list WHERE qualification (GROUP BY grouping-list) (HAVING group-qualification)
SQL query forms (cont’d) • Nested query: a query with another query embedded within it • Nested query can be unnested into equivalent single level canonical queries SELECTS.snameSELECTS.sname FROM Sailors S FROM Sailors S, Reserves R WHERE EXISTS ( SELECT * WHERE R.sid=S.sid AND R.bid=103 FROM Reserves R WHERE R.bid=103 AND R.sid=S.sid) transoformation rules A nested query Its canonical form
SQL query forms of focus • WHERE clause consisting of a disjunction of conjunctions SELECT C1, C2, ..., Ch FROMfrom-list WHERE (A11 AND ... AND A1n) OR ... OR (Am1 AND ... AND Amn)
Outline • Background • Approach • Evaluation • Conclusion and future work
Apply DSE on the existing database Step1: DSE chooses “ type=0, zip=0 ” executed query: Q1: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHEREM.year=15 AND C.zipcode=1 AND C.SSN=M.SSN Execution of Q1 zero record, not covering loop body
Apply DSE on the existing database (cont’d) Step2: DSE flips “type == 0” to “type != 0” “type=1, zip=0” executed query: Q2: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHEREM.year=30 AND C.zipcode=1 AND C.SSN=M.SSN Execution of Q2 zero record not covering loop body
Apply DSE on the existing database (cont’d) However, An input like “type=0, zip=27694” executed query: Q3: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHEREM.year=15 AND C.zipcode=27695 AND C.SSN=M.SSN Execution of Q3 one record {C.SSN = 001, C.income = 50000, M.balance = 20000}. Covering Line14=true and Line18=false
Apply DSE on the existing database (cont’d) Furthermore, An input like “type=0, zip=28222”, executed query: Q4: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHEREM.year=15 AND C.zipcode=28223 AND C.SSN=M.SSN Execution of Q4 one record {C.SSN = 002, C.income = 150000, M.balance = 30000}. As a result, Line14=true and Line18=true
Assist DSE to generate program inputs How to derive high-covering program input values based on a given database state?
Our idea: construct auxiliary queries Auxiliary query : SELECTC.zipcode, FROM customer C, mortgage M WHEREM.year=15 AND C.SSN=M.SSN e.g., result set includes “fzip=27695”. From “fzip=zip+1”, we derive “zip=27694”!
Our idea: construct auxiliary queries (cont’d) Auxiliary query : SELECT C.zipcode, FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN e.g., result set includes “fzip=27695”. From “fzip=zip+1”, we derive “zip=27694”! Cover Line14=true and Line18=false! true false
Our idea: construct auxiliary queries (cont’d) Auxiliary query : SELECT C.zipcode, FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN e.g., result set includes “fzip=27695”. From “fzip=zip+1”, we derive “zip=27694”! Cover Line14=true and Line18=false! true false Act like “Constraint Solver” for Program Constraints +DB State Constraints
Approach • Collect query construction constraints • on program variables used in the executed queries from the program code
Approach (cont’d) • Collect query construction constraints • on program variables used in the executed queries from the program code • Collect result manipulation constraints • on comparing with record values in the query’s result set (such as “if (diff>100000)”)
Construct auxiliary queries For path “Line04=true, Line14=true”, construct the abstract query: true SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHEREM.year=15AND C.zipcode=‘fzip’ AND C.SSN=M.SSN true
Construct auxiliary queries For path “Line04=true, Line14=true”, construct the abstract query: true SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHEREM.year=15AND C.zipcode=‘fzip’ AND C.SSN=M.SSN Our target true
Construct auxiliary queries true SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHEREM.year=15AND C.zipcode=‘fzip’ AND C.SSN=M.SSN SELECTC.zipcode Construct auxiliary query true
Construct auxiliary queries true SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHEREM.year=15AND C.zipcode=‘fzip’ AND C.SSN=M.SSN SELECTC.zipcode FROM customer C, mortgage M Construct auxiliary query true
Construct auxiliary queries true SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHEREM.year=15AND C.zipcode=‘fzip’ AND C.SSN=M.SSN SELECTC.zipcode FROM customer C, mortgage M WHEREM.year=15 AND C.SSN=M.SSN Construct auxiliary query true
Generate program input values Run auxiliary query: SELECTC.zipcode, FROM customer C, mortgage M WHEREM.year=15AND C.SSN=M.SSN fzip:27695 or 28223
Generate program input values Run auxiliary query: SELECTC.zipcode, FROM customer C, mortgage M WHEREM.year=15AND C.SSN=M.SSN fzip: 27695 or 28223 zip: 27694 or 28222
Generate program input values Input combinations: type: 0 or !0 X zip: 27694 or 28222 true “type=0, zip=27694” covers Line04=true, Line14=true, but Line18=false true false
Approach (cont’d) Not enough! Program variables in branch condition after executing the query may be data-dependent on returned record values. How to cover Line18 true branch?
Approach (cont’d) • To cover path Line04=true, Line14=true, Line18=true We need to extend previous auxiliary query true true true
Construct auxiliary queries We extend the WHERE clause true SELECTC.zipcode, FROM customer C, mortgage M WHEREM.year=15 AND C.SSN=M.SSN (----how to extend?----) true true
Construct auxiliary queries We extend the WHERE clause true SELECTC.zipcode, FROM customer C, mortgage M WHEREM.year=15 AND C.SSN=M.SSN (----how to extend?----) true true
Construct auxiliary queries We extend the WHERE clause true SELECTC.zipcode, FROM customer C, mortgage M WHEREM.year=15 AND C.SSN=M.SSN AND C.income - 1.5 * M.balance > 100000 true true
Generate program input values Run auxiliary query: SELECTC.zipcode, FROM customer C, mortgage M WHEREM.year=15 AND C.SSN=M.SSN AND C.income - 1.5 * M.balance > 100000 fzip=28223
Generate program input values Run auxiliary query: SELECTC.zipcode, FROM customer C, mortgage M WHEREM.year=15 AND C.SSN=M.SSN AND C.income - 1.5 * M.balance > 100000 fzip=28223 zip=28222
Other issues (aggregate calculation) • Extend auxiliary query with GROUP BY and HAVING clauses. Involve multiple records
Other issues (aggregate calculation) SELECTC.zipcode, sum(M.balance) FROM customer C, mortgage M WHEREM.year=15 AND C.SSN=M.SSN AND C.income - 1.5 * M.balance > 100000 GROUP BY C.zipcode HAVING sum(M.balance) > 500000
Other issues (cardinality constraints) SELECTC.zipcode FROM customer C, mortgage M WHEREM.year=15 AND C.SSN=M.SSN AND C.income - 1.5 * M.balance > 100000 GROUP BY C.zipcode HAVING COUNT(*) >= 3 Use a special DSE technique for dealing with input-dependent loops P. Godefroid and D. Luchaup. Automatic partial loop summarization in dynamic test generation. In ISSTA 2011.
Outline • Background • Approach • Evaluation • Conclusion and future work
Research questions • RQ1 (Effectiveness): What is the percentage increase in code coverage by the program inputs generated by Pex with our approach’s assistance? • RQ2 (Cost): What is the cost of our approach’s assistance?
Evaluation subjects Two open source database applications • RiskIt • 4.3K LOC, database: 13 tables, 57 attributes, and >1.2 million records • 17 DB-interacting methods selected for testing • UnixUsage • 2.8K LOC, database: 8 tables, 31 attributes, and >0.25 million records • 28 DB-interacting methods selected for testing