310 likes | 487 Views
DBI310. Efficient T-SQL Querying with Itzik Ben- Gan. Itzik Ben-Gan Mentor and Co-Founder SolidQ. Unveiling…. Efficient T-SQL Querying with SQL Server Denali’s Window Functions. Agenda. Set Calculations Challenges with Traditional Querying Constructs Standard’s Answer to Challenges
E N D
DBI310 Efficient T-SQL Querying with Itzik Ben-Gan Itzik Ben-Gan Mentor and Co-Founder SolidQ
Unveiling… Efficient T-SQL Querying with SQL Server Denali’s Window Functions
Agenda • Set Calculations • Challenges with Traditional Querying Constructs • Standard’s Answer to Challenges • SQL Server’s Answer to Challenges
Set Calculations • Calculations applied to sets of rows • Categories: • Ranking • Aggregates • Offset • Distribution “Return the percent of the current order value out of the grand total, as well as the difference from the grand average” “Compute percentile ranks and percentiles to analyze the distribution of sales” “Calculate how many days since the previous customer order and until the next” “Calculate expected product stock levels/inventory” “Rank orders by order value descending” “Return consecutive periods of activity” “Calculate bank account balances”
Challenges with Traditional Querying Constructs • Constructs: • Cursors • Grouped queries • Subqueries • Joins • Challenges • Set-based thinking • Mixing detail and set calculations • Complexity • Performance
Set-Based Thinking • Cursor solutions are slow and nonrelational • Why insist on set-based? • Why is set-based thinking hard? Set definition: By a "set" we mean any collection M into a whole of definite, distinct objects m (which are called the "elements" of M) of our perception or of our thought. -- Georg Cantor
Mixing Detail and Set Calculations -- percent of total and diff from average WITH CustAggregates AS ( SELECT custid, SUM(val) AS sumval, AVG(val) AS avgval FROM Sales.OrderValues GROUP BY custid ), GrandAggregates AS ( SELECT SUM(val) AS sumval, AVG(val) AS avgval FROM Sales.OrderValues ) SELECT O.orderid, O.custid, O.val, O.val / CA.sumval AS pctcust, O.val - CA.avgval AS diffcust, O.val / GA.sumval AS pctall, O.val - GA.avgval AS diffall FROM Sales.OrderValues AS O JOIN CustAggregates AS CA ON O.custid = CA.custid CROSS JOIN GrandAggregates AS GA; • Grouped queries impose restrictions
Mixing Detail and Set Calculations -- percent of total and diff from average with filterSELECT orderid, custid, val, val / (SELECT SUM(O2.val) FROM Sales.OrderValues AS O2 WHERE O2.custid = O1.custid AND orderdate >= '20070101' AND orderdate < '20080101') AS pctcust, val - (SELECT AVG(O2.val) FROM Sales.OrderValues AS O2 WHERE O2.custid = O1.custid AND orderdate >= '20070101' AND orderdate < '20080101') AS diffcust, val / (SELECT SUM(O2.val) FROM Sales.OrderValues AS O2 WHERE orderdate >= '20070101' AND orderdate < '20080101') AS pctall, val - (SELECT AVG(O2.val) FROM Sales.OrderValues AS O2 WHERE orderdate >= '20070101' AND orderdate < '20080101') AS diffall FROM Sales.OrderValues AS O1 WHERE orderdate >= '20070101' AND orderdate < '20080101'; • Subqueries: • Verbose • Start from scratch • Each requires separate scan
Complexity • Traditional constructs often lead to more complex queries -- islands (e.g., consecutive periods of activity)SELECT MIN(col1) AS start_range, MAX(col1) AS end_range FROM (SELECT col1, (SELECT MIN(B.col1) FROM dbo.T1 AS B WHERE B.col1 >= A.col1 AND NOT EXISTS (SELECT * FROM dbo.T1 AS C WHERE C.col1 = B.col1 + 1)) AS grp FROM dbo.T1 AS A) AS D GROUP BY grp;
Performance • Some traditional query forms don’t get optimized well -- account balances SELECT S1.actid, S1.tranid, S1.val, SUM(S2.val) AS balance FROM dbo.Accounts AS S1 JOIN dbo.Accounts AS S2 ON S2.actid = S1.actid AND S2.tranid <= S1.tranid GROUP BY S1.actid, S1.tranid, S1.val;
Standard’s Answer to Challenges • Window Functions • Aggregates: partitioning, ordering, framing • Ranking: ROW_NUMBER, NTILE, RANK, DENSE_RANK • Offset: LAG, LEAD, FIRST_VALUE, LAST_VALUE, NTH_VALUE • Distribution: PERCENT_RANK, CUME_DIST, PERCENTILE_CONT, PERCENTILE_DISC • Other: • WINDOW clause, window frame exclusion, INTERVAL type • Ordered Set Functions
Set-Based • Ingenious design; not violating relational concepts • Ordering is part of the specification of the calculation • Input is a set, output is a set Set -- account balances SELECT actid, tranid, val, SUM(val) OVER(PARTITION BY actid ORDER BY tranid ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS balance FROM dbo.Accounts; Partitioning Function Ordering Framing Set
Set-Based Thinking • Allows transition from nonrelational, row-at-a-time, ordered, thinking, to relational thinking • Two modes: • Basic and incorrect (conceptually, from model’s perspective) • Deep and correct
Basic and Incorrect Thinking • Think of one image of rows and a process assigning the values a row at a time in certain order—incorrect thinking (conceptually) but correct result :) SELECT orderid, orderdate, val, RANK() OVER(ORDER BY val DESC) AS rnk FROM Sales.OrderValues; orderidorderdatevalrnk ----------- ----------------------- --------- ---- 10865 2008-02-02 00:00:00.000 16387.50 1 10981 2008-03-27 00:00:00.000 15810.00 2 11030 2008-04-17 00:00:00.000 12615.05 3 10889 2008-02-16 00:00:00.000 11380.00 4 10417 2007-01-16 00:00:00.000 11188.40 5 ...
Deep and Correct Thinking • Deep: conceptually a window of rows exists simultaneously in respect to each row—correct thinking but more advanced, when mind is ready for it… SELECT orderid, orderdate, val, RANK() OVER(ORDER BY val DESC) AS rnk FROM Sales.OrderValues; orderidorderdatevalrnk ----------- ----------------------- --------- ---- 10865 2008-02-02 00:00:00.000 16387.50 1 10981 2008-03-27 00:00:00.000 15810.00 2 11030 2008-04-17 00:00:00.000 12615.05 3 10889 2008-02-16 00:00:00.000 11380.00 4 10417 2007-01-16 00:00:00.000 11188.40 5 ...
Mixing Detail and Set Calculations • Window specific to calculation • Start point is query result -- percent of total and diff from average SELECT orderid, custid, val, val / SUM(val) OVER(PARTITION BY custid) AS pctcust, val - AVG(val) OVER(PARTITION BY custid) AS diffcust, val / SUM(val) OVER() AS pctall, val - AVG(val) OVER() AS diffall FROM Sales.OrderValues WHERE orderdate >= '20070101' AND orderdate < '20080101';
Complexity • Queries usually less complex and more concise -- traditionalSELECT orderid, custid, val, val / (SELECT SUM(O2.val) FROM Sales.OrderValues AS O2 WHERE O2.custid = O1.custid AND orderdate >= '20070101' AND orderdate < '20080101') AS pctcust, val - (SELECT AVG(O2.val) FROM Sales.OrderValues AS O2 WHERE O2.custid = O1.custid AND orderdate >= '20070101' AND orderdate < '20080101') AS diffcust, val / (SELECT SUM(O2.val) FROM Sales.OrderValues AS O2 WHERE orderdate >= '20070101' AND orderdate < '20080101') AS pctall, val - (SELECT AVG(O2.val) FROM Sales.OrderValues AS O2 WHERE orderdate >= '20070101' AND orderdate < '20080101') AS diffall FROM Sales.OrderValues AS O1 WHERE orderdate >= '20070101' AND orderdate < '20080101'; -- window functionsSELECT orderid, custid, val, val / SUM(val) OVER(PARTITION BY custid) AS pctcust, val - AVG(val) OVER(PARTITION BY custid) AS diffcust, val / SUM(val) OVER() AS pctall, val - AVG(val) OVER() AS diffall FROM Sales.OrderValues WHERE orderdate >= '20070101' AND orderdate < '20080101';
Performance • Standard doesn’t deal with performance • Still, design of functions lends itself to good optimization • Partitioning • Ordering • Etc. -- account balances SELECT actid, tranid, val, SUM(val) OVER(PARTITION BY actid ORDER BY tranid ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS balance FROM dbo.Accounts;
SQL Server’s Answer to Challenges • Two major milestones: • SQL Server 2005: • Ranking (ROW_NUMBER, NTILE, RANK, DENSE_RANK) • Aggregates, but only with partitioning • SQL Server Denali: • Aggregates, adds ordering and framing • Offset (LAG, LEAD, FIRST_VALUE, LAST_VALUE) • Distribution (PERCENT_RANK, CUME_DIST, PERCENTILE_CONT, PERCENTILE_DISC) • Some standard functionality still missing but getting close! • Optimization in SQL Server • A few examples
Optimization in SQL Server • Same scan for same window specification • Excellent optimization for “fast track” cases – single ordered scan • Optimized Window Spool operator • In memory when frame not too large • Smart treatment of moving frames for applicable calculations
-- traditional SELECT S1.actid, S1.tranid, S1.val, SUM(S2.val) AS balance FROM dbo.Accounts AS S1 JOIN dbo.Accounts AS S2 ON S2.actid = S1.actid AND S2.tranid <= S1.tranid GROUP BY S1.actid, S1.tranid, S1.val; ExamplesRunning Totals -- window functions SELECT actid, tranid, val, SUM(val) OVER(PARTITION BY actidORDER BY tranid) AS balance FROM dbo.Accounts;
ExamplesRecency -- traditional SELECT custid, orderdate, orderid, DATEDIFF(day, (SELECT TOP (1) I.orderdate FROM Sales.Orders AS I WHERE I.custid = O.custid AND I.orderdate <= O.orderdate AND (I.orderdate < O.orderdate OR I.orderid < O.orderid) ORDER BY orderdate DESC, orderid DESC), orderdate) AS diff FROM Sales.Orders AS O; -- window functions SELECT custid, orderdate, orderid, DATEDIFF(day, LAG(orderdate) OVER(PARTITION BY custid ORDER BY orderdate, orderid), orderdate) AS diff FROM Sales.Orders;
ExamplesIslands -- traditionalSELECT MIN(col1) AS start_range, MAX(col1) AS end_range FROM (SELECT col1, (SELECT MIN(B.col1) FROM dbo.T1 AS B WHERE B.col1 >= A.col1 AND NOT EXISTS (SELECT * FROM dbo.T1 AS C WHERE C.col1 = B.col1 + 1)) AS grp FROM dbo.T1 AS A) AS D GROUP BY grp; -- window functions SELECT MIN(col1) AS start_range, MAX(col1) AS end_range FROM (SELECT col1, col1 - ROW_NUMBER() OVER(ORDER BY col1) AS grp FROM dbo.T1) AS D GROUP BY grp;
More Info • SolidQ.com • InsideTSQL.com • SQLMag.com • Book on topic shortly after Denali RTMs Questions? Thank You!
Required Slide Track PMs will supply the content for this slide, which will be inserted during the final scrub. Database Platform (DAT) Resources • Visit the updated website for SQL Server® Code Name “Denali” on www.microsoft.com/sqlserverand sign to be notified when the next CTP is available • Follow the @SQLServer Twitter account to watch for updates Try the new SQL Server Mission Critical BareMetal Hand’s on-Labs • Visit the SQL Server Product Demo Stations in the DBI Track section of the Expo/TLC Hall. Bring your questions, ideas and conversations!
Resources • Connect. Share. Discuss. http://northamerica.msteched.com Learning • Sessions On-Demand & Community • Microsoft Certification & Training Resources www.microsoft.com/teched www.microsoft.com/learning • Resources for IT Professionals • Resources for Developers • http://microsoft.com/technet • http://microsoft.com/msdn