330 likes | 342 Views
This paper presents an extension to the XData system for automatically generating test data to kill SQL query mutants. It explores techniques for handling commonly used SQL features and evaluates the effectiveness of these techniques in grading real SQL assignments.
E N D
Extending XData to Kill SQL Query Mutants in the Wild Bikash Chandra, Bhupesh Chawda, Shetal Shah, S. Sudarshan, Ankit Shah CSE Department, IIT Bombay http://www.cse.iitb.ac.in/infolab/xdata DBTest 2013
Testing SQL Queries SQL Queries • Queries validated by running on test data and manually checking results • Test data typically generated manually and/or in query independent way • May not catch all errors • The XData system (Shah et al. ICDE 2011) provides a principled approach to automatically generating test data • Based on “killing” query “mutations” deptnamegcount(instrID) deptnamegcount(instrID) Goal: Find # of high paid instructors in each department mutation σsal>100000 σsal>100000 instructor dept instructor dept Query Written by Programmer (omits departments with no high paid instructors) Correct (Intended) Query DBTest 2013
Mutations • Mutation: single change to a given query • Mutations model common programming errors, e.g. • Join vs. left/right/full outerjoin • Selection/join condition: e.g. • Wrong aggregate: e.g. • select vs. select distinct • And many more .. • Note: Mutant may be the intended query σsal>100000 σsal>=100000 deptnamegcount(distinct instrID) deptnamegcount(instrID) DBTest 2013
Mutation Testing of SQL Queries • Traditional use of mutation testing is to check coverage of dataset • Generate mutants of the original query Q by modifying Q in a controlled manner • A dataset kills a mutant Q’, if Q and Q’ give different results on the dataset • A suite of datasets is considered complete if each non-equivalent mutant of the given query is killed by at least one of the datasets • E.g. To kill below mutation, need a department without any high-salary instructors deptnamegcount(instrID) deptnamegcount(instrID) σsal>100000 σsal>100000 instructor dept instructor dept mutation DBTest 2013
Mutation Testing of SQL Queries • Goal of XData system: generate datasets for testing SQL queries • Testing queries: Each test dataset + query result is shown to human tester for verification. • Requirement: number of datasets should be small, and each dataset should also be small in size. • Testing SQL assignments: Run correct query and student query on each of the datasets, flag if results differ • Note that we do not actually generate and execute mutants • XData [ICDE 2011] addresses data generation • Considers a subclass of SQL queries • Number of datasets is linear in size of the query • Completeness results for a subclass of mutations • Mutations between joins and outerjoin across all join orders • Mutations between {=,<>,<,<=,>,>=} • Mutations between {min, max}, and distinct/non-distinct versions of sum, count, avg (with no constraints on aggregate values) DBTest 2013
Contributions of this paper • Extends XData [ICDE 2011] to handle many commonly used SQL features • NULL values • Constrained aggregation • String constraints, • Subqueries • Set operations • Date and time • Views • Insert/delete/update queries • Studies effectiveness of techniques for grading real SQL assignments DBTest 2013
Other Related Work • Tuya and Suarez-Cabal [IST07], Chan et al. [QSIC05] • Describe a number of SQL query mutations • But do not address test data generation • Olston et al (SIGMOD09], Qex [LPAR10] • Generate data for the original query • But do not generate data to kill mutations • de la Riva et al [AST10] • Address data generation using constraints, with the Alloy solver • Consider only join/selection queries • Do not consider alternative join orders DBTest 2013
Data Generation Steps in XData • Create template datasets with constraints • Each attribute of each tuple is a variable • Add constraints on variables to model • Primary/foreign key constraints • E.g. ASSERT FORALL i EXISTS j (O_CRSE[i].dept_name= O_DEPT[j].dept_name); • Query constraints • Constraints to kill a specific mutation • This part is different for each template dataset generated • E.g. ASSERT NOT EXISTS (i: INT) (O_DEPT[1].dept_name=O_CRSE[i].dept_name) ; • Generate data for each template dataset • Using constraint solver, currently CVC3 DBTest 2013
Extensions to handle several SQL constructs/features Contributions of this paper DBTest 2013
Extension 1: Handling NULLs • CVC3 does not understand NULL values • Define values which act as placeholder NULL values, for e.g., • DATATYPE COURSE_ID = BIO101 | BIO301 | BIO399 | CS101 | CS190 | NULL_COURSE_ID_1 | NULL_COURSE_ID_2 • Add CVC3 constraints which distinguish between NULL and non-NULL values • CVC3 constraint to enforce foreign keys modified to allow null values • Generate constraints assigning null values to some variables, to kill mutations • Different placeholder NULL values assigned to different variables • Can now handle • Mutation of count(attr) to count (*), and • Mutations of IS NULL to NOT IS NULL DBTest 2013
Extension 2: Handling String Constraints • We consider following string constraints • s1 relopconst/s2, where relop in {=, <, <=, >, >=, <>} ands op const where op in {~=, LIKE, ILIKE, NOT LIKE, NOT ILIKE} • strlen(s) relopconst • Constraints on upper(s) and lower(s) • CVC3 does not support string data type • We built our own string constraint solver • Other string solvers (Hampi, Kaluza, Rex) didn’t offer features we needed • For a query containing string and non-string constraints, we • First solve string constraints independently • Then solve non-string constraints using CVC3 • We do not allow constraints between string and non-string variables DBTest 2013
Heuristic to solve string constraints • Collect all string constraints; reduce the number of constraints by propagating constants • Construct a graph • With string variables as vertices • With edges derived from constraints of the form Vi < Vj • Traverse vertices Vi in increasing order of < constraint • For each such vertex Vi, find the lexicographically smallest string that satisfies all constraints for Vi • Done by • translating each constraint (LIKE const, < const, ..) on Vi into an automaton • intersecting automatons for the different constraints • and solving to get lexicographically smallest solution DBTest 2013
Killing String Constraint Mutations • For killing mutations between string comparisons {=, <>, <, <=, >, >=} • Datasets generated for S1 = S2, S1 > S2, S1 < S2 • For killing mutations between {LIKE, ILIKE, NOT LIKE, NOT ILIKE} • Dataset 1 satisfying S1 LIKE pattern • Dataset 2 satisfying S1 ILIKE pattern but not S1 LIKE pattern • Dataset 3 not satisfying both LIKE and ILIKE conditions DBTest 2013
Extension 3: Constrained Aggregation • Example: • selectr.A, sum(s.C) from r, swherer.B = s.Bands.C < 5 groupbyr.Ahavingsum(s.C) > 20 • Major Issue: CVC3 does not support relation types. • To specify aggregation constraints like SUM(s.C) > 20 in CVC3, we have to pre-decide the number of tuples in the input to the aggregation operation. • Before generating CVC3 constraints we need to estimate • Number of tuples required to satisfy the aggregation • Translate this number to appropriate number of tuples for each base relation • E.g. for above query • At least 5 tuples in join result, for a given r.A value • If r.A is a primary key, need 5 s tuples matching a given r tuple • Else need 5 r tuples, each with a matching s tuple DBTest 2013
Constrained Aggregation (cont.) • For each aggregated attribute A, we collect • All aggregation constraints, domain constraints and non-aggregate constraints on A like A < 15 • Invariant constraints on aggregation operators like • AVG * COUNT = SUM, MIN <= MAX, .. • Solve above constraints using CVC3 to find value for COUNT • Gives number of input tuples n required to satisfy aggregation constraint • If input to the aggregation is a join, we decide number of tuples of each base relation • Heuristic based on group by attributes, the joining attributes, and unique/p.k. constraints on joining attributes • Again using CVC3 to generate values • Details in paper • Create final set of constraints for each dataset based on above solution. DBTest 2013
Other Extensions • Nested subqueries in WHERE clause • Some details in paper • Set Operations: • Generate dataset to kill mutations between UNION (ALL), INTERSECT (ALL), EXCEPT (ALL) • Handling Parameterized Queries • assign a variable to each parameter; solution given by CVC3 contains the value for the parameter • Date and Time • converted to integers • Handling Insert/Delete/Update Queries: • convert into SELECT queries and data is generated to catch mutations of the resultant SELECT queries DBTest 2013
Experimental Setup • Used XData to grade SQL query assignments in an undergrad database course at IITB • University schema from textbook (Database System Concepts, 6th ed, Silberschatz et al) • 15 assignment queries chosen, each with about 70 submissions • Ubuntu system with Intel i5, 3.30 GHz CPU and 8 GB memory • Time to generate datasets ranged from 6.7 to 49 seconds with on • Most datasets had <= 2 tuples per relation, those with aggregates had at most 5. • Average of approx. 6 datasets per query • Compared with two sample University datasets provided with textbook, and with result of manual correction by TAs DBTest 2013
Experimental Results XData does worse slightly-worse equal slightly-better better DBTest 2013
Query 8: SELECT DISTINCT course_id, title FROM course NATURAL JOIN section WHERE section.semester =‘Spring’ AND section.year =‘2010’ AND course_id NOT IN (SELECT course_id FROM prereq) # Queries : 79 Experimental Results • Here, XData performs better than all the other techniques • One of the reasons is it catches DISTINCT mutations DBTest 2013
Query 14: SELECT DISTINCT * FROM takes T WHERE T.grade IS NOT NULL AND (T.grade <> ‘F” OR NOT EXISTS (SELECT id, course_id FROM takes S WHERE S.grade <> ‘F’ AND T.id = S.id AND T.course_id = S.course_id)) # Queries : 67 Experimental Results Here XData performs worse as it does not handle disjunction and handles only limited cases of nested subqueries DBTest 2013
Conclusions and Future Work • Extended XData to handle a number of widely used constructs • These were important for grading student SQL assignments • Performance study shows benefits over using fixed datasets, and manual correction • Currently working on further extensions • disjunctions, extra/missing group by attribute, unintended equating of attributes in natural join • handling multiple queries in an application • Also working on a tool for grading SQL assignments DBTest 2013
Questions? DBTest 2013
Killing Join Mutants: Example 3 • Example 3: Equivalent mutation due to join • Schema: r(A,B), s(C,D), t(E) • Note: right outer join this time • Any result with a r.B being null will be removed by join with t • Similarly equivalence can result due to selections 25 DBTest 2013
Killing Join Mutants: Example 4 • teaches instructoris equivalent to teaches instructor if there is a foreign key from teaches.ID to instructor.ID • BUT: teaches σdept=CS(instructor)is not equivalent to teaches σdept=CS(instructor) • Key idea: have a teaches tuple with an instructor not from CS • Selections and joins can be used to kill mutations 26 DBTest 2013
Killing Join Mutants: Equivalent Trees • Space of join-type mutants: includes mutations of join operator of a single node for all trees equivalent to given query tree • Datasets should kill mutants across all such trees Query Tree 1 Query Tree 2 Query Tree 3 27 DBTest 2013
Equivalent Trees and Equivalence Classes of Attributes • Whether query conditions written as • A.x = B.x AND B.x = C.x or as • A.x = B.x AND A.x = C.x • should not affect set of mutants generated • Solution: Equivalence classes of attributes DBTest 2013
Background: XData (ICDE 2011) • For a class of mutations: • Join to outerjoin • selection condition, unconstrained aggregation • Algorithm to generate test data to kill all non-equivalent mutants in above class, for a (fairly large) subset of SQL. • Under some simplifying assumptions • Only primary and foreign key constraints • Single block SQL queries; no nested subqueries • Expr/functions: Only arithmetic exprs • Join/selection predictates : conjunctions of {expr relop expr} DBTest 2013
Killing Join Mutants: Example • Schema: r(A,B), s(C,D), t(E) • To kill this mutant we must ensure that for an r tuple there is no matching s tuple, but there is a matching t tuple • Generated test case: r(A,B)={(1,2)}; s(C,D)={}; t(E)={(2)} Presence of foreign keys, repeated relations make the problem more challenging DBTest 2013
Some Asssignment Queries DBTest 2013
Query 5: SELECT DISTINCT course.dept_name FROM course NATURAL JOIN section WHERE section.semester =‘Spring’ AND section.year =‘2010’ # Queries : 72 Experimental Results XData performs better than manually generated datasets and TA correction DBTest 2013
Extension 4: Nested Subqueries • Nested subqueries in the WHERE clause connected by IN, NOT IN, EXISTS, NOT EXISTS, ALL or ANY clause • We make some assumptions • Correlation conditions involve a single relation in the outer query • IN connectives involve attributes from only one outer relation • Nesting depth is at most 1 • We first generate constraints to create tuples for the outer query result ignoring the subquery and then generate constraints for the subquery DBTest 2013
Killing Subquery Mutations • Dataset generated for original query will kill mutations between IN and NOT IN, EXISTS and NOT EXISTS • For conditions of the form r.A relop scalar subquery, we generate data to kill mutations between different relops • To kill mutation of ALL versus ANY, we generate two tuples for the inner query such that one satisfies the comparison operator and other does not • Subquery Mutations : generate constraints to kill mutations of the subquery and then add constraints of the outer query DBTest 2013