370 likes | 392 Views
The "Numbers" or "Tally" Table:. What it is and how it replaces the WHILE loop - Jeff Moden - 12 February 2009. Introduction. Your speaker, Jeff Moden Very frequent poster on SQLServerCentral.com More than 14,000 posts Some are actually useful ;-) 9 Major articles
E N D
The "Numbers" or "Tally" Table: What it is and how it replaces the WHILE loop - Jeff Moden - 12 February 2009
Introduction • Your speaker, Jeff Moden • Very frequent poster on SQLServerCentral.com • More than 14,000 posts • Some are actually useful ;-) • 9 Major articles • SQL Server MVP - July 2008 • 13 years with SQL Server • Batch Job Performance "freak" Presented by Jeff Moden
Introduction • Examples from Previous Company • Was hired to fix performance problems with 3rd party software that used SQL Server. • Average of 640 Deadlocks per day with spikes to 4,000 per day. (Not a misprint!!!) • "Night Jobs" took all night and part of the next day (that is, if they didn't fail). • Job that ran 4 times a day, for 30 minutes each run, would lock the entire database for 10 minutes each. • Dupe check job took 40 minutes per day across 2 databases and 10 to 24 HOURs across 62. Frequent failures cause missed SLA. Long run was just 2 months back. Presented by Jeff Moden
Introduction • Fixes for those problems • Used "quirky" UPDATE to fix the deadlocks on the Sequence Table. Deadlocks dropped to 12 per day overnight. (Phil Factor/Robyn Page term) • Wrote and enforced company wide standards. Nightly jobs dropped from 12-14 hours to 30 minutes (or less) as repairs were made. • Rewrote 30 minute job with 10 minute lockouts... Dropped to 3.91 seconds with no lockouts. • Rewrote dupe check… Daily run dropped to 7-14 seconds. Monthly run dropped to 11 minutes. And, it hasn't failed in over 2 years that it's been in service. AND, it does 3 months back (93 databases with 4-8 million rows in each database). Presented by Jeff Moden
Introduction • What the problems all had in common… RBAR! • Pronounced "ree-bar" • Is a "Modenism" that stands for: "Row By Agonizing Row" Presented by Jeff Moden
Introduction • Forms of RBAR • Most Cursors • Most While Loops • ALL forms of Recursion • Recursive CTE's • Recursive UDF's • Some UDF's that reference tables • Most CLR's • Triangular Joins (running total) Presented by Jeff Moden
Why RBAR is Bad • SQL Server usually operates best in a "Set Based" fashion. • All forms of RBAR mean that you're telling SQL what is best instead of letting the optimizer do it's job. • Think about it… Indexes are named by column, not by row. • Results are not "sets" Presented by Jeff Moden
"Set Based" SQL • Set Based code usually doesn't use explicit loops of any form. • Set Based code doesn't use recursion. • Set Based code usually touches each row only once. • No Triangular Joins • Excludes many UDF's Presented by Jeff Moden
The Paradigm Shift to Set Based Code • "Stop thinking about how what you want to do to a Row" • "Start thinking about what you want to do to a COLUMN" • "Think about what you want to do to a column in a set of data. Presented by Jeff Moden
Introduction • Tools to Prevent RBAR • Your brain (best tool there is) • T-SQL is an "imagination limited" language. If you have a limited imagination, there will be limits on what you can do with it. • Knowledge of T-SQL • Do you know what PARSENAME does? • Did you know that SET @var = Col = exp is Legal in an Update? Presented by Jeff Moden
Introduction • Tools to Prevent RBAR (cont.) • Techniques like the "quirky" Update and "pseudo cursors". • An understanding of how many internal rows are formed. • Touching rows only once, if possible. • "Jedi Tools" like the Tally Table. • As each "Jedi" must make his own light saber, so must you make your own Tally Table. Presented by Jeff Moden
Any questions, so far? Presented by Jeff Moden
Learning How to Program • What's the first thing they teach you in just about any programming class? PRINT"Hello World" • After a bit about variables and some other things, what's the next significant thing they teach you? Presented by Jeff Moden
Looping and Counting • One of the most basic techniques in batch programming • Basic sequence of loops • Start a counter at 1 and set a limit to quit • Read something to process • Process whatever you just read • Increment the counter • Loop back to read something else until done Presented by Jeff Moden
Counting Using Loops • Simple loop with no display to measure "true" speed. --===== Count from 1 to 11000 using a WHILE loop -- with output to a variable to measure speed and duration timers DECLARE @BitBucket INT DECLARE @StartTime DATETIME DECLARE @Counter INT SET @StartTime = GETDATE() SET @Counter = 1 WHILE @Counter <= 11000 BEGIN SET @BitBucket = @Counter SET @Counter = @Counter + 1 END SELECT DATEDIFF(ms,@StartTime,GETDATE()) AS DurationMs --186 ms Presented by Jeff Moden
Counting a Different Way • Same thing using a Tally table… MUCH faster. --===== Count from 1 to 11000 using a "counter" table -- with SELECT to screen for verification that it worked DECLARE @BitBucket INT DECLARE @StartTime DATETIME SET @StartTime = GETDATE() SELECT @BitBucket = N FROM dbo.Tally WHERE N <= 11000 SELECT DATEDIFF(ms,@StartTime,GETDATE()) AS DurationMs --30ms SELECT @BitBucket--Just to show the final value of reached Presented by Jeff Moden
What's Another Name for Looping??? RBAR Presented by Jeff Moden
Upping the Stakes • Let's measure how long it takes a WHILE loop to count to a MILLION… --===== Count from 1 to 1,000,000 using a WHILE loop -- with output to a variable to measure speed and duration timers DECLARE @BitBucket INT DECLARE @StartTime DATETIME DECLARE @Counter INT SET @StartTime = GETDATE() SET @Counter = 1 WHILE @Counter <= 1000000 BEGIN SET @BitBucket = @Counter SET @Counter = @Counter + 1 END SELECT DATEDIFF(ms,@StartTime,GETDATE()) AS DurationMs --3576 ms Presented by Jeff Moden
The Tally Table Wins • If we count from 1 to a MILLION with a Tally table, it wins, hands down… --===== Count from 1 to 1,000,000 using a "counter" table -- with output to a variable to measure speed and duration timers DECLARE @BitBucket INT DECLARE @StartTime DATETIME SET @StartTime = GETDATE() SELECT @BitBucket = N FROM dbo.Tally WHERE N <= 1000000 SELECT DATEDIFF(ms,@StartTime,GETDATE()) AS DurationMs --703 MS Presented by Jeff Moden
So, what IS a Tally Table? • A Tally table is nothing more than a very well indexed table of Integers (whole numbers) starting at 1 or 0 and ending at some predefined number. • It is used to replace many explicit looping methods with "pseudo-cursors" or "set based loops". • Compared to the RBAR of explicit loops, they can make code blindingly fast. • It's all done using CROSS JOINS. Presented by Jeff Moden
Build the Tally Table Using aWhile Loop --===== Create a Tally table using a loop (890-1000 ms) SET NOCOUNT ON CREATE TABLE dbo.Tally (N INT NOT NULL) DECLARE @Counter INT SELECT @Counter = 1 WHILE @Counter <= 11000 BEGIN INSERT INTO dbo.Tally (N) --(11000 INSERTS!!!!) SELECT @Counter AS N SET @Counter = @Counter + 1 END --===== Add a Primary Key to maximize performance ALTER TABLE dbo.Tally ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100 --===== Allow the general public to use it GRANT SELECT ON dbo.Tally TO PUBLIC Presented by Jeff Moden
Build the Tally Table Using a Cross Join --===== Create and populate the Tally table on the fly -- (96-213 MS) SELECT TOP 11000--equates to more than 30 years of dates IDENTITY(INT,1,1) AS N INTO dbo.Tally FROM Master.dbo.SysColumnssc1, --Works in 2k and 2k5 Master.dbo.SysColumnssc2 --Good to 16M in 2k --Good to 121M in 2k5 --===== Add a Primary Key to maximize performance ALTER TABLE dbo.Tally ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100 --===== Allow the general public to use it GRANT SELECT ON dbo.Tally TO PUBLIC Presented by Jeff Moden
A "Tally" CTE That's Easy to Remember • Here's another way for 2k5 and up… --===== Count from 1 to 11,000 the set based way (2k5) ;WITH cteTally AS ( SELECT TOP (11000) ROW_NUMBER() OVER (ORDER BY t1.ID) AS N FROM Master.sys.SysColumns t1 CROSS JOIN Master.sys.SysColumns t2 ) SELECT N FROM cteTally --719 MS (CPU) on million row test -- <1 to 31 MS (CPU) on 11000 rows Presented by Jeff Moden
A Tally CTE for HUGE Rowcounts • …and another… --===== Itzik's method - VERY good for HUGE row counts above 121 M -- A bit slower than Tally for smaller row count ;WITH L0 AS(SELECT 1 AS C UNION ALLSELECT 1),--2 rows L1 AS(SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows L2 AS(SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows L3 AS(SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows L4 AS(SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows L5 AS(SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 Nums AS(SELECTROW_NUMBER()OVER(ORDERBY C)AS n FROM L5) SELECT N FROM Nums WHERE N <= 11000 --1123 on 1000000 Reference: Inside Microsoft SQL Server 2005: T-SQL Querying (Page 255) Itzik Ben-Gan, Lubor Kollar, Dejan Sarka Microsoft Press - ISBN 978-0-7356-2313-2 Presented by Jeff Moden
Simple Example: Splitting Characters in a String • Since a Tally table can be used to "count" like a loop, we can split all the characters out of a string… --===== Splitting characters from a string DECLARE @SomeString VARCHAR(8000) SET @SomeString = 'String 1,String 2,String 3' --===== CROSS JOIN of variable with Tally table at -- the character level SELECT SUBSTRING(@SomeString, t.N, 1) AS MyChar FROM dbo.Tally t WITH (NOLOCK) WHERE t.N <= LEN(@SomeString) Presented by Jeff Moden
Tally Table Replacesthe While Loop • Here's how it works Presented by Jeff Moden
Yeah? So What?Another Classic Example • We can split the text on delimiters to separate parameters by finding the delimiters… --===== Declare a variable to hold the delimiter DECLARE @Delim CHAR(1) SET @Delim = ',' --===== Create the test string DECLARE @SomeString VARCHAR(8000) SET @SomeString = 'String 1,String 2,String 3' --===== Do the split, again, using a Cross Join w/ Tally table SELECT SUBSTRING(@Delim+@SomeString, t.N+1, CHARINDEX(@Delim, @SomeString+@Delim, t.N)-t.N) FROM dbo.Tally t WITH (NOLOCK) WHERE SUBSTRING(@Delim+@SomeString, t.N, 1) = @Delim AND t.N <= LEN(@Delim+@SomeString) Presented by Jeff Moden
Results of 8k Character Split Presented by Jeff Moden
There's more... • We could create a function to do the split, but functions have a bit of overhead associated with them which makes them slower. • We need a way to split a whole column without a function. • To test it, we need a LOT of test data… ready? Presented by Jeff Moden
Build 1M Rows of Test Data --===== Create and populate a 1,000,000 row test table. (~36 seconds) -- Column "RowNum" has a range of 1 to 1,000,000 unique numbers -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings -- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' -- for all rows. -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F) -- Jeff Moden SELECT TOP 1000000 RowNum = IDENTITY(INT,1,1), SomeInt = ABS(CHECKSUM(NEWID()))%50000+1, SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65), SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)), SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY), SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME), SomeHex12 = RIGHT(NEWID(),12) INTO dbo.JBMTest FROM dbo.Tally t1 CROSS JOIN dbo.Tally t2 --===== A table is not properly formed unless a Primary Key has been assigned ALTER TABLE dbo.JBMTest ADD PRIMARY KEY CLUSTERED (RowNum) SELECT TOP 100 * FROM dbo.JBMTest Help destroy the myth about SELECT INTO locking SysObjects. Read the following article: http://support.microsoft.com/kb/153441/EN-US/ Presented by Jeff Moden
- Split a Whole Table -Try THIS with a Loop! • Now, using the Tally table as a loop generator and our friend the Cross Join, split a whole table column of CSV's… --===== Split the whole table's worth of CSV's (00:02:07) --===== Declare a variable to hold the delimiter DECLARE @Delim CHAR(1) SET @Delim = ',' --===== Do the whole table split with a count (generates 10 MILLION rows) SELECT test.RowNum, SUBSTRING(@Delim + test.SomeCSV, t.N+1, CHARINDEX(@Delim, test.SomeCSV+@Delim, t.N)-t.N) AS Value INTO #MyHead FROM dbo.Tally t WITH (NOLOCK) CROSS JOIN dbo.JBMTest test WHERE SUBSTRING(@Delim + test.SomeCSV, t.N, 1) = @Delim AND t.N <= LEN(@Delim + test.SomeCSV) Help destroy the myth about SELECT INTO locking SysObjects. Read the following article: http://support.microsoft.com/kb/153441/EN-US/ Presented by Jeff Moden
Pass "Arrays" • Admittedly, there's no such thing as true arrays in SQL Server… but you can come close. • You can easily pass 1, 2, or 3 dimensional data from a GUI to SQL Server using a Tally table. • See code for Demo. Presented by Jeff Moden
Other uses...LUHN 10 Checksum • Example (see code for demo) 11 10 9 8 7 6 5 4 3 2 1 Digit # O E O E O E O E O E O "E" = Even 4 9 9 2 7 3 9 8 7 1 6 Original x1 x2 x1 x2 x1 x2 x1 x2 x1 x2 x1 Multiplier 4 18 9 4 7 6 9 16 7 2 6 Digit Result 4+(1+8)+9+(0+4)+7+(0+6)+9+(1+6)+7+(0+2)+6 Sum Splits = 70 %10 = 0 = Valid Mod 10 References: http://www.beachnet.com/~hstiles/cardtype.html http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=76195 Presented by Jeff Moden
Other uses…Generate Date Ranges • Sometimes, you have missing dates in data that you need to mark as "0" • The Tally table can be used to generate those dates as a "Derived Table" to do an outer join with. • See Code for Demo Presented by Jeff Moden
Do Things That Most Would Create a CLR For • Example: Histogram of Letters • http://www.pluralsight.com/community/blogs/dan/archive/2006/08/26/36317.aspx Presented by Jeff Moden
Suggested Reading • The "Numbers" or "Tally" Table: What it is and how it replaces a loop.Jeff Moden, 07 May 2008 • http://www.sqlservercentral.com/articles/TSQL/62867/ • The Helper Table WorkbenchRobyn Page and Phil Factor, 16 Mar 2007 • http://www.simple-talk.com/sql/t-sql-programming/the-helper-table-workbench/#second • Passing Parameters as (almost) 1, 2, and 3 Dimensional ArraysJeff Moden, 22 May 2008 • http://www.sqlservercentral.com/articles/T-SQL/63003/ • Luhn AlgorithmPeter Larson and Jeff Moden, Dec 2006 • http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=76195 • Why should I consider using an auxiliary numbers table?ASP FAQ - Author unknown - Date unknown • http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-numbers-table.html Presented by Jeff Moden
Quick Review • Set Based - Think "columns" • Pseudo Cursor - All done in a single query • Tally Table - Counter for Pseudo Cursors • Tally CTE's - Can be used in place of a Tally Table in 2k5 • Questions? Presented by Jeff Moden