1 / 28

Generating Test Data for Killing SQL Mutants: A Constraint Based Approach

Generating Test Data for Killing SQL Mutants: A Constraint Based Approach. Shetal Shah, S. Sudarshan, Suhas Kajbaje, Sandeep Patidar Bhanu Pratap Gupta, Devang Vira. CSE Department, IIT Bombay. Testing SQL Queries: A Challenge. Complex SQL queries hard to get right

camden
Download Presentation

Generating Test Data for Killing SQL Mutants: A Constraint Based Approach

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Generating Test Data for Killing SQL Mutants: A Constraint Based Approach Shetal Shah, S. Sudarshan, Suhas Kajbaje, Sandeep PatidarBhanu Pratap Gupta, Devang Vira CSE Department, IIT Bombay ICDE 2011

  2. Testing SQL Queries: A Challenge • Complex SQL queries hard to get right • Question: How to check if an SQL query is correct? • Formal verification is not applicable since we do not have a separate specification and an implementation • State of the art solution: manually generate test databases and check if the query gives the intended result • Often misses errors ICDE 2011

  3. deptnameGcount(instrID) deptnameGcount(instrID) σsal>100000 σsal>100000 instructor instructor dept dept Example • Find number of high paid instructors in each department Problem: does not output departments with no high paid instructors Test dataset should be able to distinguish these two alternatives ICDE 2011

  4. Generating Test Data: Prior Work • Automated Test Data generation • Initial work based on database constraints alone • Later work: DB constraints + SQL query • Agenda, Reverse Query Processing (RQP), QAGen • Our earlier work in ICDE 2010 (poster) on killing SQL mutants • More recent (independent) de la Riva et al [AST10] consider test data generation for killing mutants • But limited space of mutations, no guarantees of completeness • None of the above guarantee anything about detecting errors in SQL queries • Question: How do you model SQL errors? • Answer: Query Mutation ICDE 2011

  5. Query Mutations • Mutations: model common programming errors • Join used instead of outerjoin (or vice versa) • Join/selection condition errors • < vs. <=, missing or extra condition • Wrong aggregate (min vs. max) • Mutant: syntactic variation of the given query Q • Mutant may be the intended query • A dataset D kills a mutant M of query Q if output of Q and M differ on D • For a mutation space, a dataset suite is complete if all non-equivalent mutants are killed ICDE 2011

  6. Mutation Testing of Queries • Mutation testing can be used to check coverage of a test suite • Given a test suite, generate mutants in a pre-defined mutation space; evaluate on a given test suite • Our goal is different: • Generate datasets for the given query, based on possible mutations • Each dataset and query result on the dataset shown to a human, who verifies that the query result is what is expected on that dataset • We do not need to actually generate and execute mutants ICDE 2011

  7. Query and Mutation Space Query: Single Block SQL Query with unconstrained aggregation at the top Mutation Space: • Join Type Mutation : change of any of inner join, left outer join, right outer join, full outer join to another • Across all possible join orders (unlike earlier work) • Comparison Operator Mutations: An occurrence of one of (=, <, >, <=, >=, <>) in the WHERE clause replaced by another • Unconstrained Aggregation Mutations: • MIN ↔ MAX, SUM ↔ SUM(DISTINCT)AVG ↔ AVG(DISTINCT), COUNT ↔ COUNT(DISTINCT) ICDE 2011

  8. Killing Join Mutants: Basic Idea • Schema: r(A), s(B) • To kill this mutant: ensure that for some r tuple there is no matching s tuple • Generated test case • r(A) ={(1)}; s(B) ={} • Basic idea, version 1 [ICDE 2010] • run query on given database, • from result extract matching tuples for r and s • delete s tuple to ensure no matching tuple for r • Limitation: foreign keys, repeated relations ICDE 2011

  9. Need to Ensure Difference in Final Query Result • Schema: r(A,B), s(C,D), t(E); Extra join above mutated node • To kill this mutant we must ensure that for an r tuple there is no matching s tuple, but there is a matching t tuple • Generated test case: r(A,B)={(1,2)}; s(C,D)={}; t(E)={(2)} • Our approach: generate not-matching constraints of the form∃ r1 in r, t1 in t s.t. (r1.B=t1.E and ∃si in s s.t. (si.C=r1.A))and solve using a solver • Informally, we “nullify” s to kill the above mutation ICDE 2011

  10. Working around Foreign Key Constraints • teaches instructoris equivalent to teaches instructor if there is a foreign key from teaches.ID to instructor.ID • BUT: teachesσdept=CS(instructor)is not equivalent to teaches σdept=CS(instructor) • Key idea: have a teaches tuple with an instructor not from CS • Selections and joins can be used to kill mutations • We show that it comes for free when we “nullify” relations to kill other join mutations or generate data to kill selection mutations ICDE 2011

  11. Join Orders and Equivalent Mutations • Schema: r(A,B), s(C,D), t(E), Mutation: rs • Any result with a r.B being null will be removed by join with t • Similarly equivalence can result due to selections • But mutation of s to s would be non equivalent with join order (r t) s • SQL Query may not specify join orders; many join trees possible; the datasets should kill mutants across all join orders • Note: de la Riva [AST10] do not consider alternative join orders ICDE 2011

  12. Equivalent Trees and Equivalence Classes of Attributes • Whether query conditions written as • A.x = B.x AND B.x = C.x or as • A.x = B.x AND A.x = C.x • should not affect set of mutants generated • Solution: Equivalence classes of attributes ICDE 2011

  13. Data Generation Algorithm • Assumptions • Only PK/FK constraints, single block SQL queries, only simple arithmetic expressions, predicates are purely conjunctive (and a few more in paper) • Algorithm Overview • For each dataset generate constraints. • DB constraints : primary key, foreign key constraints • Query constraints: join conditions, selection conditions, other than conditions inconsistent with non-matching constraints (see below) • Constraints to kill mutations: not-matching constraints derived from join/selection conditions • One constraint per dataset • Generate Test Data • Using a constraint solver (currently CVC3) on above constraints ICDE 2011

  14. Main Procedure: generateDataSet(query q) • preprocess query tree • build equivalence classes • foreign key closure • generateDataSetForOriginalQuery() A constraint set with only DB and Query constraints such that the result set for q is non-empty • killEquivalenceClasses() • killOtherPredicates() • killComparisonOperators() • killAggregates() ICDE 2011

  15. Killing Join Mutants: Equijoin(killEquivalenceClasses) • Query : r s t where r.A = s.B and s.B =t.C • r.A, s.B, t.C are in one equivalence class • Generate one dataset for each element of each equivalence class • Three datasets generated, with not-matching mutation constraints: • D1: ∃r1εr ∃s1εs (r1.A = s1.B ∧ ∃tiεt (ti.C=r1.A)) • D2: ∃s1εs ∃t1εt (s1.B = t1.C ∧ ∃riεr (ri.A=s1.B)) • D3: ∃r1εr ∃t1εt (r1.A = t1.C ∧ ∃siεs (si.B=t1.C)) • But if t.C is an FK referencing s.B, instead generate • D3’: ∃r1εr (∃tiεt, siεs (si.B=r1.A ∧ ti.C=r1.A)) ICDE 2011

  16. Comparison Operation Mutations • Example of comparison operation mutations: • A < 5 vs. A <= 5 vs. A > 5 vs A >= 5 vs. A=5, vs A <> 5 • Idea: generate three datasets (leaving rest of query unchanged), one each for: • A < 5 • A = 5 • A > 5 • These datasets will kill all above mutations ICDE 2011

  17. Aggregation Operation Mutations • Aggregation operations • count(A) ↔ count(distinct A); sum(A) ↔ sum(distinct A) • avg(A) ↔ avg(distinct A); min(A) ↔ max(A) • Eg: selectmin (salary) frominstructorgroup bydept_name; • Dataset Generated: [CS, Srinivasan, 80000] [CS, Wu, 80000] [CS, Alex, 65000] will kill each of the above pairs of mutations • Issues: • Database/query constraints forcing A to be unique • Database/query constraints forcing G to be unique • Carefully crafted set of constraints, which are relaxed to handle such cases • Details in paper One group with two values equal (and <> 0), Third different ICDE 2011

  18. On Completeness and Complexity • With respect to query size: • Number of datasets generated: linear • Number of mutants: exponential • Theorem: The problem of generating data to kill a mutant is NP-hard • reduction from query containment; polynomial in special cases. • Theorem: For the class of queries, and the space of join-type and selection mutations considered, the suite of datasets generated by our algorithm is complete, i.e., the datasets kill all non-equivalent mutations of a given query • Also complete for a restricted classes of aggregation mutations with aggregation as top operation of tree, under some restrictions on joins in input ICDE 2011

  19. Performance Results • University database schema from Database System Concepts 6th Ed consisting of 7 relations • Further Optimizations : Unfolding of bounded first order quantified constraints • E.g. “for all i in 1 to 4, P(i)” unfolded as P(1) ∧ P(2) ∧ P(3) ∧ P(4) • Also tested with addition of input database constraints • Forcing output tuples to come from input database ICDE 2011

  20. Results for inner join queries • # Datasets generated small; # mutants (killed) grows exponentially • Unfolding of constraints gives substantial benefits • Increasing # foreign keys reduces the total time taken • Results similar for queries with left/right outer join ICDE 2011

  21. Results for queries with selections, aggregations • Unfolding reduces times taken, epsecially when the number of tuples are large (aggregations with joins) • (Not shown above) using input database increases time taken but not by much ICDE 2011

  22. Conclusions and Future Work • Presented a principled approach to data generation for testing queries • And showed it works efficiently for a large class of queries • We’ve only made first steps; more work to be done: • Constrained aggregations and sub-queries • Other mutations: e.g., missing/extra join and selection conditions • Multiple queries • Query and form parameters • Acknowledgements • Bhupesh Chawda for discussions/implementation help • Microsoft Research India for their generous travel support ICDE 2011

  23. Thank You Questions ICDE 2011

  24. Related Work • Tuya and Suarez-Cabal [IST07], Chan et al. [QSIC05] defined a class of SQL query mutations • Shortcoming: do not address test data generation • More recently (and independent of our work) de la Riva et al [AST10] address data generation using constraints, with the Alloy solver • Do not consider alternative join orders, No completeness results, Limitations on constraints ICDE 2011

  25. Dataset for Original Query • Array of tuples of constraint variables, per relation • One or more constraint tuple from array, for each occurrence of a relation • plus occurrences to ensure f.k. constraints • Equality conditions between variables based on equijoins • Other selection and join conditions become constraints • Constraints for primary and foreign keys • f.k. from R.A to S.B • FORALL i EXISTS j (O_R[i].A = O_S[j].B); • p.k. on R.A • FORALL i FORALL j (O_R[i].A = O_R[j].A]) => “all other attrs equal” • since range is over finite domain, p.k. and f.k. constraints can be unfolded • See sample set of constraints in file cvc3_0.cvc ICDE 2011

  26. Killing Join Mutants: Equijoin • killEquivalenceClasses() • for each equivalence class ec do • Let allRelations := Set of all <rel, attr> pairs in ec • for each element e in allRelations do • conds := empty set • Let e := R.a • S := (set of elements in ec which are foreign keys referencing R.a directly or indirectly) UNION R:a • P := ec - S • if P:isEmpty() then • continue • else … main code for generating constraints (see next slide) ICDE 2011

  27. Killing Join Mutants: EquiJoins • conds.add(generateEqConds(P)) • conds:add( • “NOT EXISTS i: R[i].a = ” + cvcMap(P[0])) • for all other equivalence classes oe do • conds.add(generateEqConds(oe)) • for each other predicate p do • conds:add(cvcMap(p)) • conds.add(genDBConstraints()) /*P.K. and F.K*/ • callSolver(conds) • if solution exists then • create a dataset from solver output ICDE 2011

  28. Killing Other Predicates • Create separate dataset for each attribute in predicate • e.g. For Join condition B.x = C.x + 10 • Dataset 1 (nullifying B:x): • ASSERT NOT EXISTS (i : B_INT) : (B[i].x = C[1].x + 10); • Dataset 2 (nullifying C:x): • ASSERT NOT EXISTS (i : C_INT) : (B[1].x = C[i].x + 10); ICDE 2011

More Related