310 likes | 503 Views
Unit-testing Query Transformation Rules. Mostafa Elhemali Leo Giakoumakis. Outline. Problem definition QRel system overview Case Study Conclusion. Problem definition – QP background. Query processing in SQL Server:. ?. Profit. π T.a,R.z. π T.a,R.z. σ T.c <5. Merge ⋈ T.b = R.y.
E N D
Unit-testing Query Transformation Rules Mostafa Elhemali Leo Giakoumakis
Outline • Problem definition • QRel system overview • Case Study • Conclusion
Problem definition– QP background • Query processing in SQL Server: ? Profit πT.a,R.z πT.a,R.z σT.c<5 Merge ⋈T.b=R.y Parse Optimize SELECT T.a,R.z FROM T JOIN R on T.b = R.y WHERE T.c < 5 ⋈T.b=R.y σT.c<5 Sort on y T R Scan Index T.Ib Scan Table R
Problem definition – QP background Transformation rules πT.a,R.z πT.a,R.z σT.c<5 σT.c<5 Merge ⋈T.b=R.y ⋈ ⋈ Merge ⋈ ⋈ ⋈T.b=R.y σT.c<5 σT.c<5 Sort on y ⋈ Hash ⋈ T R Scan Index T.Ib Scan Table R
Problem definition– Input mismatch What we want to test System Output System Input πT.a,R.z σT.c<5 Merge ⋈T.b=R.y SELECT T.a,R.z FROM T JOIN R on T.b = R.y WHERE T.c < 5 ⋈ ⋈ σT.c<5 σT.c<5 Sort on y Scan Index T.Ib Scan Table R
Problem definition - Example • Rule to test: Push Aggregates Below Join ⋈ G G ⋈
Problem definition - Example • Manual Approach – Construct equivalent SQL • Manually construct equivalent SQL constructs • Think of which tables to use • Think of exact join predicate, grouping columns, aggregates, etc. ⋈ G G ⋈ SELECT T.a,SUM(T.c) FROM T JOIN R on T.a= R.y GROUP BY T.a
Problem definition - Example • Manual Approach – Variation on basic case • Project between Aggregation and Join SELECT M.Tb,SUM(M.p) FROM ( SELECT T.b Tb, (T.a + T.c) p FROM T JOIN R on T.b = R.y) M GROUP BY M.Tb ⋈ G π π G ⋈
Problem definition– Challenges • Manual construction of SQL test cases is hard • Manual variation of these cases is harder • SQL test cases are over-specified • Harder to maintain Is the goal to test Aggregation over asum? SELECT M.Tb,SUM(M.p) FROM ( SELECT T.b Tb, (T.a + T.c) p FROM T JOIN R on T.b = R.y) M GROUP BY M.Tb Is the choice of T & R significant?
Goal Allow testers to write test cases in abstract relational trees: σT.c<5 • Yet present SQL Server with concrete SQL queries: ⋈ SELECT T.a,R.z FROM T JOIN R on T.b = R.y WHERE T.c < 5
QRel overview RelOps OpFilter Expression OpJoin Predicate … SqlGen RelGen σ T.x=2 σ T.x=2 SELECT … FROM T JOIN T2 ON T.a = T2.b WHERE T.x = 2 σ T.x=2 ⋈T.a=T2.b ⋈T.a=T2.b T T T2 T2 T
RelOps– Goal • Allow testers to write their relational tree test cases OpFilter σT.c<5 PredCompare(LT) OpInnerJoin Represented as ⋈ ExpColumn(T.c) ExpScalar(5)
RelOps • .NET classes for all relational and scalar operators expressible in SQL • Relational: Join, Selection, Sort, … • Scalar: Predicates, Expressions • Does not represent any operator not expressible in SQL, e.g. semi-join • Metadata extraction • Basic property derivation • Output columns for relational operators • Data type for scalar expressions
RelGen– Goal • Allow testers to write only the essential parts of the relational tree, and automatically fill out the rest πT.a,R.z σT.c<5 σT.c<5 ⋈ ⋈ T.b=R.y T R
RelGen– Overall algorithm • Top-down generation of relational tree skeleton, • Followed by bottom-up filling out phase
RelGen– Example Predicate Generators Possible operators CompareTwoColumns T.a,R.z → π π CompareColumnAndScalar InSubquery σ σ σ ⋈ ⋈ T.c<5 G … T.b=R.y … a,b,c x,y Available relations R R T T
RelGen– Targeted generation • Tree generation is highly customizable for targeted testing • Can start from a partially filled out tree • Customizable probability distributions for any random parts, e.g. choice of relational operators , choice of predicates • Use of constraints to influence the tree generation
RelGen– Considerations • Trees should be free of logical errors • E.g. No Aggregation over XML columns • Trees (and subtrees) should not be trivially optimized away • Avoid contradicting predicates • Operators should yield (many) rows if executed • Reasoning: makes them more expensive, lures optimizer into deeper optimizations. Also we use QRel to test all of QP at times • Use real data in predicates, e.g. Country = ‘England’ • Use PK-FK columns in joins
SqlGen– Goal • Presents the server with proper SQL queries from the relational tree test cases πT.a,R.z SELECT T.a,R.z FROM T JOIN R on T.b = R.y WHERE T.c < 5 σT.c<5 ⋈T.b=R.y T R
SqlGen– Challenges • Subquery and derived table formation • SQL clause formation • Table and column aliasing (inverse of binding)
SqlGen– Example π SUM(T.a),R.z ⋈ T.b=R.y SELECT SUM(T1_1.C2) AS C1, T1_2.z AS C2 FROM ( σ R T.b<5 SELECT T2_1.b AS C1, SUM(T2_1.a) AS C2 FROM T AS T2_1 GROUP BY T2_1.b G T.b,SUM(T.a) HAVING T2_1.b < 5 ) AS T1_1 JOIN R AS T1_2 ON T1_1.C1 = T1_2.y T
Case Study • Rule to test: Cannot be represented U σ ApplySJ Preliminary rule Target rule ApplySJ ApplySJ R UA OR R σ1 σ2 R R σ1 σ2 EXISTS EXISTS σ1 σ2 ApplySJ = Apply-Semi-Join UA = Union All U = Union
Case Study– Basic test case • RelOps code: PredExists a = new PredExists(new OpFilter(new OpGet(), null)); PredExists b = new PredExists(new OpFilter(new OpGet(), null)); PredBinaryOp p = new PredBinaryOp(a, b, PredBinaryOp.LogicOp.Or); OpFiltermainFilter = new OpFilter(new OpGet(), p); • … And that’s the basic test case
Case Study – Basic test case • Goes into RelGen (using TPC-H database): Predicate Generators σ O_SHIPPRIORITY, O_COMMENT, … CorrelationPredicate OR EXISTS EXISTS Available relations σ σ S_NATIONKEY >= O_SHIPPRIORITY S_ADDRESS < O_COMMENT ORDERS ORDERS S_NATIONKEY, S_ADDRESS, … SUPPLIER SUPPLIER SUPPLIER …
Case Study – Basic test case • Goes into SqlGen: σ SELECT * FROM [ORDERS] AS T1_1 WHERE (EXISTS ( SELECT 1 AS C1 FROM [SUPPLIER] AS T2_1 WHERE T2_1.S_NATIONKEY >= T1_1.O_SHIPPRIORITY)) OR (EXISTS ( SELECT 1 AS C1 FROM [SUPPLIER] AS T2_1 WHERE T2_1.S_ADDRESS < T1_1.O_COMMENT)) ORDERS OR EXISTS EXISTS σ σ S_NATIONKEY >= O_SHIPPRIORITY S_ADDRESS < O_COMMENT SUPPLIER SUPPLIER
Case Study– Beyond the basic case • More random exploration • Embed the basic tree pattern into completely random trees • Systematic exploration of various dimensions • Number of subquery disjunctives • Add scalar disjunctives alongside the relational ones • Add more operators in the relational disjunctives • …
Verification • How do we verify the correct behavior for semi-random queries? • Be creative!
Verification – Example approaches • Rule modeling • Do the same transformation in the input query (usually very easy in RelOps) • Present the two queries to the optimizer • Should get the same plan • Only works when the rule output is expressible in SQL • Rules on & off • Turn the rule under test off • Should get a different plan, but same results • Only works for non-essential exploration rules
Conclusion • Testing the Query Optimizer transformation rules using abstract relational trees • Utilizing QRel to go from abstract relational trees to concrete SQL queries • Future directions • Libraries of abstract relational trees • More advanced customizations of tree generation • Combinatorial techniques for systematic exploration of various trees