1 / 47

Parallel Execution Plans

Parallel Execution Plans. Joe Chang jchang6@yahoo.com www.sql-server-performance.com/joe_chang.asp. Parallel Execution Plans. Allows single query to use multiple processors Query should run faster but may consume more resources Example 1 thread: 10 sec run time, 10 CPU-sec

zoey
Download Presentation

Parallel Execution Plans

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Parallel Execution Plans Joe Chang jchang6@yahoo.com www.sql-server-performance.com/joe_chang.asp

  2. Parallel Execution Plans • Allows single query to use multiple processors • Query should run faster but may consume more resources • Example • 1 thread: 10 sec run time, 10 CPU-sec • 2 threads: 6 sec run time, 12 CPU-sec

  3. Parallel Execution Configuration • Cost Threshold For Parallelism • Minimum query plan threshold for considering queries for parallel execution • Default 5: Considering increasing to 20-50 for new systems • Max Degree of Parallelism • Default 0: Can use all available processors • SQL Server determines level based on available memory and recent CPU usage

  4. Parallel Plan Operators The Distribute Streams operator consumes a single input stream of records and produces multiple output streams. The record contents and format are not changed. Each record from the input stream appears in one of the output streams. This operator automatically preserves the relative order of the input records in the output streams. Usually, hashing is used to decide to which output stream a particular input record belongs. The Repartition Streams operator consumes multiple streams and produces multiple streams of records. The record contents and format are not changed. Each record from an input stream is placed into one output stream. If this operator is order-preserving, then all input streams must be ordered and merged into several ordered output streams. The Gather Streams operator consumes several input streams and produces a single output stream of records by combining the input streams. The record contents and format are not changed. If this operator is order-preserving, then all input streams must be ordered.

  5. Execution Plan Cost Formulas Index Seek – Plan Formula I/O Cost = 0.006328500 + 0.000740741 per additional page (≤1GB) = 0.003203425 + 0.000740741 per additional page (>1GB) CPU Cost = 0.000079600 + 0.000001100 per additional row Bookmark Lookup – May have changed ? I/O Cost = multiple of 0.006250000 (≤1GB) = multiple of 0.003124925 (>1GB) CPU Cost = 0.0000011 per row Table Scan or Index Scan I/O: 0.0375785 + 0.0007407 per page CPU: 0.0000785 + 0.0000011 per row Table Scan or Index Scan IUD I/O Cost ~ 0.01002 – 0.01010 (>100 rows) IUD CPU Cost = 0.000001 per row

  6. Cost Interpretation • Time in seconds? CPU time? Too fast for 7200RPM disk random I/Os. 0.0062500sec -> 160/sec 0.000740741 ->1350/sec (8KB) ->169/sec(64K)-> 10.8MB/sec About right for 1997 sequential disk transfer rate? S2K BOL: Administering SQL Server, Managing Servers, Setting Configuration Options: cost threshold for parallelism Opt Query cost refers to the estimated elapsed time, in seconds, required to execute a query on a specific hardware configuration.

  7. Test Table CREATE TABLE M3A_20 ( GroupID int NOT NULL, ID int NOT NULL, ID2 int NOT NULL, ID3 int NOT NULL, ID4 int NOT NULL, sID smallint NOT NULL, bID1 bigint NOT NULL, bID2 bigint NOT NULL, bID3 bigint NOT NULL, rMoney money NOT NULL, rDate datetime NOT NULL, rReal real NOT NULL, rDecimal decimal (9,4) NOT NULL, CONSTRAINT [PK_M3A_20] PRIMARY KEY CLUSTERED ( [GroupID], [ID] ) WITH FILLFACTOR = 100 ) GO

  8. Data Population Script 1 SET NOCOUNT ON DECLARE @BatchTotal int, @BatchSize int, @TotalRows int, @BatchStart int, @BatchEnd int, @BatchRow int, @I int, @RowsPerPage bigint , @Card int , @DistinctValues int SELECT @BatchStart=1, @BatchEnd=1000, @BatchTotal=1000, @BatchSize=100000, @RowsPerPage=100, @Card=100000 SELECT @TotalRows=@BatchTotal*@BatchSize SELECT @I=(@BatchStart-1)*@BatchSize+1, @DistinctValues=@TotalRows/@Card WHILE @BatchStart <= @BatchEnd BEGIN BEGIN TRANSACTION SELECT @BatchRow = @BatchStart*@BatchSize WHILE @I <= @BatchRow BEGIN INSERT M3A_20 (GroupID, ID, ID2, ID3, ID4, sID, bID1, bID2, bID3, rMoney, rDate, rReal, rDecimal) VALUES ( 1, @I, @TotalRows-@I+1, (@I-1)/@Card+1, (@TotalRows-@I)%@Card+1, @I%32768, @I, (@I-1)%@Card+1, 1+(@I-1)*@RowsPerPage/@TotalRows+((@I-1)*@RowsPerPage)%@TotalRows, 10000*rand(), DATEADD(hour,@I%3000000,'1900-01-01'), 10000*rand(), 10000*rand() ) IF @@ERROR > 0 BEGIN GOTO B END SET @I = @I+1 END COMMIT TRANSACTION CHECKPOINT PRINT CONVERT(varchar,GETDATE(),121) + ', row ' + CONVERT(varchar,@BatchRow) SET @BatchStart = @BatchStart+1 END B: IF @@TRANCOUNT > 0 COMMIT TRANSACTION PRINT '01 Complete ' + CONVERT(varchar,GETDATE(),121) + ', row ' + CONVERT(varchar,@BatchRow) + ', Trancount ' + CONVERT(varchar(10),@@TRANCOUNT)

  9. Data Population Script 1 Notes • Double While Loop • Each Insert/Update/Delete statement is an implicit transaction • Gets separate transaction log entry • Explicit transaction – generates a single transaction log write (max 64KB per IO) • Single TRAN for entire loop requires excessively large log file • Inserts are grouped into intermediate size batches

  10. Data Population Scripts 2 Primary table populated using single row inserts in a WHILE loop, Additional tables populated with INSERT / SELECT statement Single row inserts ~20-30K rows/sec INSERT / SELECT statement ~100K+ rows/sec DECLARE @L int SELECT @L = 1 WHILE @L <= 3 BEGIN INSERT M3A_11 (GroupID,ID,ID2,ID3,ID4,sID,bID1,bID2,bID3,rMoney,rDate,rReal, rDecimal) SELECT TOP 500000 GroupID, ID, 1500001-ID, ID3, ID4, sID, bID1, bID2, bID3, rMoney, rDate, rReal, rDecimal FROM M3A_20 WHERE GroupID = 1 AND ID BETWEEN (@L-1)*500000+1 AND @L*500000 SELECT @L = @L + 1 CHECKPOINT PRINT '11 Step ' + CONVERT(varchar,@L) + ', ' + CONVERT(varchar,GETDATE(),121) END UPDATE STATISTICS M3A_01 (PK_M3A_01) WITH FULLSCAN CREATE STATISTICS ST_01 ON M3A_01 (ID) WITH FULLSCAN, NORECOMPUTE

  11. Index Seek Plans Many rows returned, Non-parallel plan Cost: 9.34 Parallel Execution disabled Cost: 9.82 Cost: 4.94 Parallel Plan

  12. Index Seek Details Non-parallel plan Parallel plan

  13. Index Seek – Non-parallel Cost assigned to SELECT Index Seek, 1M rows in 11,115 pages (81 bytes/row, 90% Fill) I/O cost is: 8.2365 CPU Cost is 1.1000785 Cost & sub-tree Cost is correct, I/O & CPU is ½ of correct value

  14. Index Seek – Parallel Plan No cost assigned to SELECT Index Seek I/O and CPU cost ½ of non-parallel plan

  15. Index Seek with Aggregate 4 3 2 1

  16. Index Seek Aggregate Parallel Plan Details 4 2 3 1

  17. Table Scan Cost: 9.01 Cost: 8.26

  18. Table Scan Details Non-parallel plan Parallel plan I/O cost same CPU cost ½ of non parallel plan

  19. Table Scan Details Non-parallel plan Parallel plan No cost No cost on Select I/O cost same CPU cost ½ of non parallel plan

  20. Parallel Plan Cost Formulas Patterns • CPU costs are ½ of non-parallel plan • Index Seek I/O cost are also ½ • Scan I/O cost is same as non-parallel plan • Parallel plan costs are based on 2 processors • Actual number of processors determined at runtime • Overhead operations • Distribute, Repartition & Gather Streams

  21. Hash Join 200,000 rows 15 byte OS row size Cost: 6.50 Cost: 4.79

  22. Hash Join Details Non-parallel plan Parallel plan

  23. Hash Join Details Non-parallel plan Parallel plan

  24. Hash Join – Non-parallel plan

  25. Hash Join – Parallel Plan 4 3 2 1 2 4 3 1

  26. Hash Join with I/O Cost 900,000 rows MAXDOP 1 Cost 85.1 Cost 74.1

  27. Hash Join – Join I/O Cost 730,000 rows 740,000 rows

  28. Hash Join - Bitmap

  29. Hash Join Cost Formula Index Seek – Plan Formula I/O Cost = 0.006328500 + 0.000740741 per additional page (≤1GB) = 0.003203425 + 0.000740741 per additional page (>1GB) CPU Cost = 0.000079600 + 0.000001100 per additional row Hash Join CPU Cost = 0.017750000 base + 0.0000001749 (2-30 rows) + 0.0000000720 (100 rows) 0.000015091 per row 0.000015857 (parallel) + 0.000001880 per row per 4 bytes in OS + 0.000005320 per additional row in IS I/O Cost = 0.000042100 per row over 64MB (Row Size+8) 0.0000036609 per 4 byte over 15B

  30. Parallel Cost Formula Base Cost 0.028500 Repartition Stream Cost per row = 0.0000024705 Base (15 Bytes) + 0.000000759 per additional 4 Bytes Gather Stream Cost per row = 0.0000018735 Base(15) + 0.000000759 per additional 4 Bytes Dispatch

  31. Loop Join

  32. Loop Join Details Non-parallel plan Outer Source Parallel plan Outer Source

  33. Loop Join Details Inner Source cost identical for both non-parallel and parallel plans

  34. Loop Join Details Non-parallel plan Parallel plan

  35. Merge Join

  36. Merge Join Details Non-parallel plan Parallel plan

  37. Merge Join Details Non-parallel plan Parallel plan

  38. Merge Join Details Non-parallel plan Parallel plan

  39. Index Seek + Aggregate Test Xeon 2.4GHz/512K Opteron2.2GHz 1M

  40. Index Seek + Aggregate Test, Itanium 2 Itanium 2 1.5GHz/6M

  41. Index Seek + Aggregate Test, SUM(INT) Itanium 2 1.5GHz/6M

  42. Index Seek + Aggregate Test, NULL Itanium 2 1.5GHz/6M

  43. Loop Join, COUNT(*) Itanium 2 1.5GHz/6M

  44. Hash Join, COUNT(*) Itanium 2 1.5GHz/6M

  45. Merge Join, COUNT(*) Itanium 2 1.5GHz/6M

  46. General Recommendations • Useful in DW, ETL, and maintenance activities • Use judgment on transactions processing • Is throughput more important • Or faster expensive queries • Increase Cost Threshold from 5 to 20-50 • Limit MAXDOP to 4 • Verify or limit parallelism on Xeon systems with Hyper-Threading enabled

  47. Additional Information www.sql-server-performance.com/joe_chang.asp SQL Server Quantitative Performance Analysis Server System Architecture Processor Performance Direct Connect Gigabit Networking Parallel Execution Plans Large Data Operations Transferring Statistics SQL Server Backup Performance with Imceda LiteSpeed jchang6@yahoo.com

More Related