Testing Challenges for Extending SQL Server's Query Processor: A Case Study

Testing Challenges for Extending SQL Server's Query Processor:A Case Study Torsten Grabs, Steve Herbert, Xin (Shin) Zhang {torsteng; stevhe; xinzh}@microsoft.com DBTest2008

Agenda • Motivation • Background • Relational Data Warehousing (DW) • SQL Server 2008 Starjoin improvement • Testing Challenge • Extending Enterprise-class Commercial DBMS • Solution • Iterative development process • Multi-dimensional testing • Case Study Results • Conclusions DBTest2008

Motivation • Data warehouses are huge • Billions of rows in fact tables • Multi-terabyte database • Query response time requirements are strict • Interactive response times desired: <5 sec • Ideally: speed-of-thought response time • Plan choice is CRUCIAL for good performance • User requirements are challenging • Large input space • Zero administration overhead • Do not break existing customer base DBTest2008

Background: Relational DW Business Question: Give me total sales of SQL Server 2005 in second quarter of year 2006. Example Star Query: SELECT SUM(Dollars) FROMSales S JOINProduct P ONP.Product_Key = S.Product_Key JOINPeriodPe ONPe.Date_Key = S.Date_Key WHEREProduct_Name = 'SQL Server 2005' ANDQuarter_Number = 2 AND Year = 2006 Period Date_Key Quarter_Number Year Sales Date_Key Product_Key Qty_Sold Dollars Product Product_Key Product_ID Product_Name Category Fact Table Dimension Tables DBTest2008

Background: New Feature High selectivity queries Medium selectivity queries Low selectivity queries • Fact selectivity matters for plan choice • SQL 2008 improve medium selectivity queries Seek-based plans with nested loop joins Scan-based plans with bitmap hash joins Scan-based plans with regular hash joins 0% of fact rows qualify 100% of fact rows qualify fact table selectivity DBTest2008

DW-specific Extensions of the SQL Query Processor SQL Server Query Optimizer Standard (join) query optimizations Cost-based plan choice Final query plan Alternative query plans Standard optimization Star query plans Schema detection Selectivity analysis Optimization extension for DW Star query detection

Bitmap-based semi-join reduction Surrogate key values of rows qualifying the filter over the store dimension Hash Join Surrogate key values of rows qualifying the filter over the product dimension Join Reduction Info 1 Hash Join Join Reduction Info 2 Filter Rowset afterjoin reduction Fact Table Scan Join Reduction Processing Filter Product dimension table Join Reduction Info 2 Join Reduction Info 1 Store dimension table Rowset before join reduction

Testing Challenge • Large input space of queries • Full range of selectivity • Mixed ad-hoc and parameterized queries • Complex schema and workloads • Automatic feature • Correct cost based plan choice • Smart plan pattern detection • Accurate join selectivity estimation • No knobs – no application changes required • Happy existing customers • Significant improvements • Negligible regressions DBTest2008

Agenda • Motivation • Background • Relational Data Warehousing (DW) • SQL Server 2008 Starjoin improvement • Testing Challenge • Extending Enterprise-class Commercial Server • Solution • Iterative development process • Multi-dimensional testing • Case Study Results • Conclusions DBTest2008

Iterative Development Process • In-cycle validation of assumptions • Mitigates risk of major end-of-cycle issues • Especially performance problems • Maximal paralleling of testing and developing efforts DBTest2008

Multi-Dimensional Testing • Functional testing • Target testing to ensure core functionality • Model-based testing to ensure coverage • Performance testing • Component • Benchmark • Customer Workloads DBTest2008

Functional: Target Testing DBTest2008

Functional: Model-Based Testing • Large number of test dimensions • 10+ test dimensions … • If assume 3 variations each … • will generate 60K combinations! • Two abstract models covering key requirements • Schema model • Database schema and data • Query model • Star-join queries built on top of the schema model DBTest2008

Functional: Schema Model DBTest2008

Functional: Query Model DBTest2008

Model-based Test Example • Test scenario • Testing selectivity estimation of single fact star schema • Schema model • Number and classification of tables: fact 1, dimension 5 • Relationships between tables: star schema • Cardinality: fact 100K rows, dimension 10 rows each • Data distribution: uniform • Query model • Number of facts: 1 • Number of dimensions: 10 • Dimension selectivity: 0.4~0.8 (5 choices) • Fact aggregation: 1 aggregation (12 possible types) • Nested subqueries: none • Fact selectivity: 0.1~1.0 • Single test covers 55*12 (37,836) tests cases DBTest2008

Performance Testing DBTest2008

Case Study Results • ~10 different workloads • 3 representative results • Decision support workload results • Microsoft sales data warehouse results • Retail workload results DBTest2008

Results: Decision Support Workload • 100GB data • 70+ queries • Typical DSS scenario • Schema • Queries • Limited performance benefit for initial design • Lots of regressions initially • Good convergence over several iterations DBTest2008

Results: Microsoft Sales DW • 750GB data • 50 queries • Complex queries • > 20 joins • Started with good design for performance • But: too many regressions with initial design • Converge to good result over several iterations DBTest2008

Results: Retail Workload • 100GB data • 30 queries • Complex physical design • Indexes • Partitioning • Several iterations to establish the “winning” design • Significant improvements after several iterations • Regressions limited to “2 wrongs make 1 right” (see Giakoumakis/Galindo-LegariaTKDE 2008) DBTest2008

Conclusions • Extension of the SQL Server in relational DW • New feature with zero administration overhead • Widely deployed system • Identified testing challenges • Balance performance improvement and regression risk • Solution • Iterative development and testing cycles • Multi-dimensional testing (functional, performance) • Iterative development and testing insights • Supports learning and adjustment during development • Delivers well-understood results • Leads to high-quality features DBTest2008

Q&A http://www.microsoft.com/sql/2008Torsten torsteng@microsoft.comSteve stevhe@microsoft.comShin shin.zhang@microsoft.com DBTest2008

Testing Challenges for Extending SQL Server's Query Processor: A Case Study

Testing Challenges for Extending SQL Server's Query Processor: A Case Study

Presentation Transcript

Chapter 5 Case Study: MVC Architecture for Web Applications

6-Performance Analysis of Embedded System Designs: Digital Camera Case Study

Mutualism and Commensalism

Statistical Analysis in Case-Control studies

HUD RHIIP Training – PH/HCV

NIC Exposure Level Training

Concurrency Testing Challenges, Algorithms, and Tools

QUERY OPTIMIZATION AND QUERY PROCESSING

Wii3D – Extending the Wiimote into 3D

Schizophrenia

6 . Distributed Query Optimization

Adaptive Query Processing with Eddies

Black-Box Testing Techniques III

ARM Processor Architecture

Chapter 5 Case Study: MVC Architecture for Web Applications

Chapter 20

INTRODUCTION TO PEOPLESOFT QUERY

Black-Box Testing Techniques II

Randomized Controlled Trial

Testing Software Systems

David DiBattista, Ph.D. Brock University Psychology Department