470 likes | 697 Views
Parallel Execution Plans. Joe Chang jchang6@yahoo.com www.sql-server-performance.com/joe_chang.asp. Parallel Execution Plans. Allows single query to use multiple processors Query should run faster but may consume more resources Example 1 thread: 10 sec run time, 10 CPU-sec
E N D
Parallel Execution Plans Joe Chang jchang6@yahoo.com www.sql-server-performance.com/joe_chang.asp
Parallel Execution Plans • Allows single query to use multiple processors • Query should run faster but may consume more resources • Example • 1 thread: 10 sec run time, 10 CPU-sec • 2 threads: 6 sec run time, 12 CPU-sec
Parallel Execution Configuration • Cost Threshold For Parallelism • Minimum query plan threshold for considering queries for parallel execution • Default 5: Considering increasing to 20-50 for new systems • Max Degree of Parallelism • Default 0: Can use all available processors • SQL Server determines level based on available memory and recent CPU usage
Parallel Plan Operators The Distribute Streams operator consumes a single input stream of records and produces multiple output streams. The record contents and format are not changed. Each record from the input stream appears in one of the output streams. This operator automatically preserves the relative order of the input records in the output streams. Usually, hashing is used to decide to which output stream a particular input record belongs. The Repartition Streams operator consumes multiple streams and produces multiple streams of records. The record contents and format are not changed. Each record from an input stream is placed into one output stream. If this operator is order-preserving, then all input streams must be ordered and merged into several ordered output streams. The Gather Streams operator consumes several input streams and produces a single output stream of records by combining the input streams. The record contents and format are not changed. If this operator is order-preserving, then all input streams must be ordered.
Execution Plan Cost Formulas Index Seek – Plan Formula I/O Cost = 0.006328500 + 0.000740741 per additional page (≤1GB) = 0.003203425 + 0.000740741 per additional page (>1GB) CPU Cost = 0.000079600 + 0.000001100 per additional row Bookmark Lookup – May have changed ? I/O Cost = multiple of 0.006250000 (≤1GB) = multiple of 0.003124925 (>1GB) CPU Cost = 0.0000011 per row Table Scan or Index Scan I/O: 0.0375785 + 0.0007407 per page CPU: 0.0000785 + 0.0000011 per row Table Scan or Index Scan IUD I/O Cost ~ 0.01002 – 0.01010 (>100 rows) IUD CPU Cost = 0.000001 per row
Cost Interpretation • Time in seconds? CPU time? Too fast for 7200RPM disk random I/Os. 0.0062500sec -> 160/sec 0.000740741 ->1350/sec (8KB) ->169/sec(64K)-> 10.8MB/sec About right for 1997 sequential disk transfer rate? S2K BOL: Administering SQL Server, Managing Servers, Setting Configuration Options: cost threshold for parallelism Opt Query cost refers to the estimated elapsed time, in seconds, required to execute a query on a specific hardware configuration.
Test Table CREATE TABLE M3A_20 ( GroupID int NOT NULL, ID int NOT NULL, ID2 int NOT NULL, ID3 int NOT NULL, ID4 int NOT NULL, sID smallint NOT NULL, bID1 bigint NOT NULL, bID2 bigint NOT NULL, bID3 bigint NOT NULL, rMoney money NOT NULL, rDate datetime NOT NULL, rReal real NOT NULL, rDecimal decimal (9,4) NOT NULL, CONSTRAINT [PK_M3A_20] PRIMARY KEY CLUSTERED ( [GroupID], [ID] ) WITH FILLFACTOR = 100 ) GO
Data Population Script 1 SET NOCOUNT ON DECLARE @BatchTotal int, @BatchSize int, @TotalRows int, @BatchStart int, @BatchEnd int, @BatchRow int, @I int, @RowsPerPage bigint , @Card int , @DistinctValues int SELECT @BatchStart=1, @BatchEnd=1000, @BatchTotal=1000, @BatchSize=100000, @RowsPerPage=100, @Card=100000 SELECT @TotalRows=@BatchTotal*@BatchSize SELECT @I=(@BatchStart-1)*@BatchSize+1, @DistinctValues=@TotalRows/@Card WHILE @BatchStart <= @BatchEnd BEGIN BEGIN TRANSACTION SELECT @BatchRow = @BatchStart*@BatchSize WHILE @I <= @BatchRow BEGIN INSERT M3A_20 (GroupID, ID, ID2, ID3, ID4, sID, bID1, bID2, bID3, rMoney, rDate, rReal, rDecimal) VALUES ( 1, @I, @TotalRows-@I+1, (@I-1)/@Card+1, (@TotalRows-@I)%@Card+1, @I%32768, @I, (@I-1)%@Card+1, 1+(@I-1)*@RowsPerPage/@TotalRows+((@I-1)*@RowsPerPage)%@TotalRows, 10000*rand(), DATEADD(hour,@I%3000000,'1900-01-01'), 10000*rand(), 10000*rand() ) IF @@ERROR > 0 BEGIN GOTO B END SET @I = @I+1 END COMMIT TRANSACTION CHECKPOINT PRINT CONVERT(varchar,GETDATE(),121) + ', row ' + CONVERT(varchar,@BatchRow) SET @BatchStart = @BatchStart+1 END B: IF @@TRANCOUNT > 0 COMMIT TRANSACTION PRINT '01 Complete ' + CONVERT(varchar,GETDATE(),121) + ', row ' + CONVERT(varchar,@BatchRow) + ', Trancount ' + CONVERT(varchar(10),@@TRANCOUNT)
Data Population Script 1 Notes • Double While Loop • Each Insert/Update/Delete statement is an implicit transaction • Gets separate transaction log entry • Explicit transaction – generates a single transaction log write (max 64KB per IO) • Single TRAN for entire loop requires excessively large log file • Inserts are grouped into intermediate size batches
Data Population Scripts 2 Primary table populated using single row inserts in a WHILE loop, Additional tables populated with INSERT / SELECT statement Single row inserts ~20-30K rows/sec INSERT / SELECT statement ~100K+ rows/sec DECLARE @L int SELECT @L = 1 WHILE @L <= 3 BEGIN INSERT M3A_11 (GroupID,ID,ID2,ID3,ID4,sID,bID1,bID2,bID3,rMoney,rDate,rReal, rDecimal) SELECT TOP 500000 GroupID, ID, 1500001-ID, ID3, ID4, sID, bID1, bID2, bID3, rMoney, rDate, rReal, rDecimal FROM M3A_20 WHERE GroupID = 1 AND ID BETWEEN (@L-1)*500000+1 AND @L*500000 SELECT @L = @L + 1 CHECKPOINT PRINT '11 Step ' + CONVERT(varchar,@L) + ', ' + CONVERT(varchar,GETDATE(),121) END UPDATE STATISTICS M3A_01 (PK_M3A_01) WITH FULLSCAN CREATE STATISTICS ST_01 ON M3A_01 (ID) WITH FULLSCAN, NORECOMPUTE
Index Seek Plans Many rows returned, Non-parallel plan Cost: 9.34 Parallel Execution disabled Cost: 9.82 Cost: 4.94 Parallel Plan
Index Seek Details Non-parallel plan Parallel plan
Index Seek – Non-parallel Cost assigned to SELECT Index Seek, 1M rows in 11,115 pages (81 bytes/row, 90% Fill) I/O cost is: 8.2365 CPU Cost is 1.1000785 Cost & sub-tree Cost is correct, I/O & CPU is ½ of correct value
Index Seek – Parallel Plan No cost assigned to SELECT Index Seek I/O and CPU cost ½ of non-parallel plan
Index Seek with Aggregate 4 3 2 1
Table Scan Cost: 9.01 Cost: 8.26
Table Scan Details Non-parallel plan Parallel plan I/O cost same CPU cost ½ of non parallel plan
Table Scan Details Non-parallel plan Parallel plan No cost No cost on Select I/O cost same CPU cost ½ of non parallel plan
Parallel Plan Cost Formulas Patterns • CPU costs are ½ of non-parallel plan • Index Seek I/O cost are also ½ • Scan I/O cost is same as non-parallel plan • Parallel plan costs are based on 2 processors • Actual number of processors determined at runtime • Overhead operations • Distribute, Repartition & Gather Streams
Hash Join 200,000 rows 15 byte OS row size Cost: 6.50 Cost: 4.79
Hash Join Details Non-parallel plan Parallel plan
Hash Join Details Non-parallel plan Parallel plan
Hash Join – Parallel Plan 4 3 2 1 2 4 3 1
Hash Join with I/O Cost 900,000 rows MAXDOP 1 Cost 85.1 Cost 74.1
Hash Join – Join I/O Cost 730,000 rows 740,000 rows
Hash Join Cost Formula Index Seek – Plan Formula I/O Cost = 0.006328500 + 0.000740741 per additional page (≤1GB) = 0.003203425 + 0.000740741 per additional page (>1GB) CPU Cost = 0.000079600 + 0.000001100 per additional row Hash Join CPU Cost = 0.017750000 base + 0.0000001749 (2-30 rows) + 0.0000000720 (100 rows) 0.000015091 per row 0.000015857 (parallel) + 0.000001880 per row per 4 bytes in OS + 0.000005320 per additional row in IS I/O Cost = 0.000042100 per row over 64MB (Row Size+8) 0.0000036609 per 4 byte over 15B
Parallel Cost Formula Base Cost 0.028500 Repartition Stream Cost per row = 0.0000024705 Base (15 Bytes) + 0.000000759 per additional 4 Bytes Gather Stream Cost per row = 0.0000018735 Base(15) + 0.000000759 per additional 4 Bytes Dispatch
Loop Join Details Non-parallel plan Outer Source Parallel plan Outer Source
Loop Join Details Inner Source cost identical for both non-parallel and parallel plans
Loop Join Details Non-parallel plan Parallel plan
Merge Join Details Non-parallel plan Parallel plan
Merge Join Details Non-parallel plan Parallel plan
Merge Join Details Non-parallel plan Parallel plan
Index Seek + Aggregate Test Xeon 2.4GHz/512K Opteron2.2GHz 1M
Index Seek + Aggregate Test, Itanium 2 Itanium 2 1.5GHz/6M
Index Seek + Aggregate Test, SUM(INT) Itanium 2 1.5GHz/6M
Index Seek + Aggregate Test, NULL Itanium 2 1.5GHz/6M
Loop Join, COUNT(*) Itanium 2 1.5GHz/6M
Hash Join, COUNT(*) Itanium 2 1.5GHz/6M
Merge Join, COUNT(*) Itanium 2 1.5GHz/6M
General Recommendations • Useful in DW, ETL, and maintenance activities • Use judgment on transactions processing • Is throughput more important • Or faster expensive queries • Increase Cost Threshold from 5 to 20-50 • Limit MAXDOP to 4 • Verify or limit parallelism on Xeon systems with Hyper-Threading enabled
Additional Information www.sql-server-performance.com/joe_chang.asp SQL Server Quantitative Performance Analysis Server System Architecture Processor Performance Direct Connect Gigabit Networking Parallel Execution Plans Large Data Operations Transferring Statistics SQL Server Backup Performance with Imceda LiteSpeed jchang6@yahoo.com