240 likes | 364 Views
Testing Challenges for Extending SQL Server's Query Processor: A Case Study. Torsten Grabs, Steve Herbert, Xin (Shin) Zhang {torsteng; stevhe; xinzh}@microsoft.com . Agenda. Motivation Background Relational Data Warehousing (DW) SQL Server 2008 Starjoin improvement Testing Challenge
E N D
Testing Challenges for Extending SQL Server's Query Processor:A Case Study Torsten Grabs, Steve Herbert, Xin (Shin) Zhang {torsteng; stevhe; xinzh}@microsoft.com DBTest2008
Agenda • Motivation • Background • Relational Data Warehousing (DW) • SQL Server 2008 Starjoin improvement • Testing Challenge • Extending Enterprise-class Commercial DBMS • Solution • Iterative development process • Multi-dimensional testing • Case Study Results • Conclusions DBTest2008
Motivation • Data warehouses are huge • Billions of rows in fact tables • Multi-terabyte database • Query response time requirements are strict • Interactive response times desired: <5 sec • Ideally: speed-of-thought response time • Plan choice is CRUCIAL for good performance • User requirements are challenging • Large input space • Zero administration overhead • Do not break existing customer base DBTest2008
Background: Relational DW Business Question: Give me total sales of SQL Server 2005 in second quarter of year 2006. Example Star Query: SELECT SUM(Dollars) FROMSales S JOINProduct P ONP.Product_Key = S.Product_Key JOINPeriodPe ONPe.Date_Key = S.Date_Key WHEREProduct_Name = 'SQL Server 2005' ANDQuarter_Number = 2 AND Year = 2006 Period Date_Key Quarter_Number Year Sales Date_Key Product_Key Qty_Sold Dollars Product Product_Key Product_ID Product_Name Category Fact Table Dimension Tables DBTest2008
Background: New Feature High selectivity queries Medium selectivity queries Low selectivity queries • Fact selectivity matters for plan choice • SQL 2008 improve medium selectivity queries Seek-based plans with nested loop joins Scan-based plans with bitmap hash joins Scan-based plans with regular hash joins 0% of fact rows qualify 100% of fact rows qualify fact table selectivity DBTest2008
DW-specific Extensions of the SQL Query Processor SQL Server Query Optimizer Standard (join) query optimizations Cost-based plan choice Final query plan Alternative query plans Standard optimization Star query plans Schema detection Selectivity analysis Optimization extension for DW Star query detection
Bitmap-based semi-join reduction Surrogate key values of rows qualifying the filter over the store dimension Hash Join Surrogate key values of rows qualifying the filter over the product dimension Join Reduction Info 1 Hash Join Join Reduction Info 2 Filter Rowset afterjoin reduction Fact Table Scan Join Reduction Processing Filter Product dimension table Join Reduction Info 2 Join Reduction Info 1 Store dimension table Rowset before join reduction
Testing Challenge • Large input space of queries • Full range of selectivity • Mixed ad-hoc and parameterized queries • Complex schema and workloads • Automatic feature • Correct cost based plan choice • Smart plan pattern detection • Accurate join selectivity estimation • No knobs – no application changes required • Happy existing customers • Significant improvements • Negligible regressions DBTest2008
Agenda • Motivation • Background • Relational Data Warehousing (DW) • SQL Server 2008 Starjoin improvement • Testing Challenge • Extending Enterprise-class Commercial Server • Solution • Iterative development process • Multi-dimensional testing • Case Study Results • Conclusions DBTest2008
Iterative Development Process • In-cycle validation of assumptions • Mitigates risk of major end-of-cycle issues • Especially performance problems • Maximal paralleling of testing and developing efforts DBTest2008
Multi-Dimensional Testing • Functional testing • Target testing to ensure core functionality • Model-based testing to ensure coverage • Performance testing • Component • Benchmark • Customer Workloads DBTest2008
Functional: Target Testing DBTest2008
Functional: Model-Based Testing • Large number of test dimensions • 10+ test dimensions … • If assume 3 variations each … • will generate 60K combinations! • Two abstract models covering key requirements • Schema model • Database schema and data • Query model • Star-join queries built on top of the schema model DBTest2008
Functional: Schema Model DBTest2008
Functional: Query Model DBTest2008
Model-based Test Example • Test scenario • Testing selectivity estimation of single fact star schema • Schema model • Number and classification of tables: fact 1, dimension 5 • Relationships between tables: star schema • Cardinality: fact 100K rows, dimension 10 rows each • Data distribution: uniform • Query model • Number of facts: 1 • Number of dimensions: 10 • Dimension selectivity: 0.4~0.8 (5 choices) • Fact aggregation: 1 aggregation (12 possible types) • Nested subqueries: none • Fact selectivity: 0.1~1.0 • Single test covers 55*12 (37,836) tests cases DBTest2008
Performance Testing DBTest2008
Case Study Results • ~10 different workloads • 3 representative results • Decision support workload results • Microsoft sales data warehouse results • Retail workload results DBTest2008
Results: Decision Support Workload • 100GB data • 70+ queries • Typical DSS scenario • Schema • Queries • Limited performance benefit for initial design • Lots of regressions initially • Good convergence over several iterations DBTest2008
Results: Microsoft Sales DW • 750GB data • 50 queries • Complex queries • > 20 joins • Started with good design for performance • But: too many regressions with initial design • Converge to good result over several iterations DBTest2008
Results: Retail Workload • 100GB data • 30 queries • Complex physical design • Indexes • Partitioning • Several iterations to establish the “winning” design • Significant improvements after several iterations • Regressions limited to “2 wrongs make 1 right” (see Giakoumakis/Galindo-LegariaTKDE 2008) DBTest2008
Conclusions • Extension of the SQL Server in relational DW • New feature with zero administration overhead • Widely deployed system • Identified testing challenges • Balance performance improvement and regression risk • Solution • Iterative development and testing cycles • Multi-dimensional testing (functional, performance) • Iterative development and testing insights • Supports learning and adjustment during development • Delivers well-understood results • Leads to high-quality features DBTest2008
Q&A http://www.microsoft.com/sql/2008Torsten torsteng@microsoft.comSteve stevhe@microsoft.comShin shin.zhang@microsoft.com DBTest2008