570 likes | 772 Views
TPC-H. SQL Server Scaling on Big Iron (NUMA) Systems. Joe Chang jchang6@yahoo.com www.qdpma.com. About Joe Chang. SQL Server Execution Plan Cost Model True cost structure by system architecture Decoding statblob (distribution statistics) SQL Clone – statistics-only database Tools
E N D
TPC-H SQL Server Scaling on Big Iron (NUMA) Systems Joe Chang jchang6@yahoo.com www.qdpma.com
About Joe Chang • SQL Server Execution Plan Cost Model • True cost structure by system architecture • Decoding statblob (distribution statistics) • SQL Clone – statistics-only database • Tools • ExecStats – cross-reference index use by SQL-execution plan • Performance Monitoring, • Profiler/Trace aggregation
TPC-H • DSS – 22 queries, geometric mean • 60X range plan cost, comparable actual range • Power – single stream • Tests ability to scale parallel execution plans • Throughput – multiple streams • Scale Factor 1 – Line item data is 1GB • 875MB with DATE instead of DATETIME • Only single column indexes allowed, Ad-hoc
Observed Scaling Behaviors • Good scaling, leveling off at high DOP • Perfect Scaling ??? • Super Scaling • Negative Scaling • especially at high DOP • Execution Plan change • Completely different behavior
TPC-H SF 100GB 2-way Xeon 5355, 5570, 5680, Opt 6176 Between 2-way Xeon 5570, all are close, HDD has best throughput, SATA SSD has best composite, and Fusion-IO has be power. Westmere and Magny-Cours, both 192GB memory, are very close
TPC-H SF 300GB 8x QC/6C & 4x12C Opt, 6C Istanbul improved over 4C Shanghai by 45% Power, 73% Through-put, 59% overall. 4x12C 2.3GHz improved17% over 8x6C 2.8GHz
TPC-H SF 1000 Oracle RAC, 64-nodes, 128 Xeon 5450 quad-core 3.0GHz processors Power 782,608, 5.6X higher than Superdome 2 with 64-cores
TPC-H SF 3TB X7460 & X7560 Nehalem-EX 64 cores better than 96 Core 2.
TPC-H SF 100GB, 300GB & 3TB SF100 2-way Westmere and Magny-Cours are very close Between 2-way Xeon 5570, all are close, HDD has best through-put, SATA SSD has best composite, and Fusion-IO has be power SF300 8x QC/6C & 4x12C 6C Istanbul improved over 4C Shanghai by 45% Power, 73% Through-put, 59% overall. 4x12C 2.3GHz improved17% over 8x6C 2.8GHz SF 3TB X7460 & X7560 Nehalem-EX 64 cores better than 96 Core 2.
TPC-H Published Results • SQL Server excels in Power • Limited by Geometric mean, anomalies • Trails in Throughput • Other DBMS get better throughput than power • SQL Server throughput below Power • by wide margin • Speculation – SQL Server does not throttle back parallelism with load?
Processors GHz Total Cores Mem GB SQL SF Power Through put QphH 2 Xeon 5355 2.66 8 64 5sp2 100 23,378.0 13,381.0 17,686.7 5570 Fusion 2x5570 SSD 2x5570 HDD 2.93 2.93 2.93 8 8 8 144 144 144 8sp1 8sp1 8sp1 100 100 100 72,110.5 70,048.5 67,712.9 38,019.1 36,190.8 37,749.1 51,085.6 51,422.4 50,738.4 2 Xeon 5680 3.33 12 192 8r2 100 99,426.3 55,038.2 73,974.6 2 Opt 6176 2.3 24 192 8r2 100 94,761.5 53,855.6 71,438.3 TPC-H SF100
Processors GHz Total Cores Mem GB SQL SF Power Through put QphH 4 Opt 8220 2.8 8 128 5rtm 300 25,206.4 13,283.8 18,298.5 8 Opt 8360 2.5 32 256 8rtm 300 67,287.4 41,526.4 52,860.2 8 Opt 8384 2.7 32 256 8rtm 300 75,161.2 44,271.9 57,684.7 8 Opt 8439 2.8 48 256 8sp1 300 109,067.1 76,869.0 91,558.2 4 Xeon 7560 4 Opt 6176 2.26 2.3 48 32 512 640 8r2 8r2 300 300 129,198.3 152,453.1 96,585.4 89,547.7 107,561.2 121,345.6 TPC-H SF300 All of the above are HP results?, Sun result Opt 8384, sp1, Pwr 67,095.6, Thr 45,343.5, QphH 55,157.5
Processors GHz Total Cores Mem GB SQL SF Power Through put QphH Xeon 5450 8 Opt 8439 Itanium 9350 Itanium 9140 8 Opt 8439 3.0 1.73 2.8 1.6 2.8 48 512 48 64 64 384 384 512 2048 512 ASE O11R2 8R2? O RAC O11g 1000 1000 1000 1000 1000 111,557.0 108,436.8 95,789.1 139,181.0 782,608.7 96,652.7 128,259.1 1,740,122 69,367.6 141,188.1 140,181.1 1,166,977 81,367.6 102,375.3 123,323.1 TPC-H 1TB
Processors GHz Total Cores Mem GB SQL SF Power Through put QphH POWER6 16 Xeon 7460 8 Xeon 7560 SPARC 5.0 2.26 2.66 2.88 64 128 96 64 512 512 1024 512 8r2 8r2 Sybase O11R2 3000 3000 3000 3000 120,254.8 182,350.7 185,297.7 142,790.7 171,607.4 142,685.6 216,967.7 87,841.4 162,601.7 198,907.5 102,254.8 156,537.3 TPC-H 3TB
TPC-H Published Results Processors GHz Total Cores Mem GB SQL SF Power Through put QphH 2 Xeon 5355 2.66 8 64 5sp2 100 23,378 13,381 17,686.7 2 Xeon 5570 2.93 8 144 8sp1 100 72,110.5 36,190.8 51,085.6 2 Xeon 5680 3.33 12 192 8r2 100 99,426.3 55,038.2 73,974.6 2 Opt 6176 2.3 24 192 8r2 100 94,761.5 53,855.6 71,438.3 4 Opt 8220 2.8 8 128 5rtm 300 25,206.4 13,283.8 18,298.5 8 Opt 8360 2.5 32 256 8rtm 300 67,287.4 41,526.4 52,860.2 8 Opt 8384 2.7 32 256 8rtm 300 75,161.2 44,271.9 57,684.7 8 Opt 8439 2.8 48 256 8sp1 300 109,067.1 76,869.0 91,558.2 4 Opt 6176 2.3 48 512 8r2 300 129,198.3 89,547.7 107,561.2 8 Xeon 7560 2.26 64 512 8r2 3000 185,297.7 142,685.6 162,601.7
SF100 Big Queries (sec) Query time in sec Xeon 5570 with SATA SSD poor on Q9, reason unknown Both Xeon 5680 and Opteron 6176 big improvement over Xeon 5570
SF100 Middle Q Query time in sec Xeon 5570-HDD and 5680-SSD poor on Q12, reason unknown Opteron 6176 poor on Q11
SF100 Small Queries Query time in sec Xeon 5680 and Opteron poor on Q20 Note limited scaling on Q2, & 17
SF300 Big Queries Query time in sec Opteron 6176 poor relative to 8439 on Q9 & 13, same number of total cores
SF300 Middle Q Query time in sec Opteron 6176 much better than 8439 on Q11 & 19 Worse on Q12
SF300 Small Q Query time in sec Opteron 6176 much better on Q2, even with 8439 on others
SF1000 Sybase vs. SQL Server Query time, Sybase relative SQL Server, both on DL785 48-core
SF1000 Itanium - Superdome Query time, Superdome 2 versus Superdome, 16-way quad-core and 32-way dual-core
512-core C2 RAC vs. 64-core It2 Query time, Superdome 2 versus RAC, 16-way quad-core (64 cores) and 64-node 2-way quad-core (512 cores) Oracle RAC 5.6X higher Power
SF 3TB – 8×7560 versus 16×7460 5.6X Broadly 50% faster overall, 5X+ on one, slower on 2, comparable on 3
64 cores, PWR6 vs. Xeon 7560 Query time, POWER6 relative to X7560 Overall, Xeon 7560 is 30% faster on power, but wide variations on individual queries, some with Pwr6 faster
TPC-H Summary • Scaling is impressive on some SQL • Limited ability (value) is scaling small Q • Anomalies, negative scaling
Query 2 Minimum Cost Supplier Wordy, but only touches the small tables, second lowest plan cost (Q15)
Q11 Important Stock Identification Parallel Non-Parallel
Q13 Why does Q13 have perfect scaling?