390 likes | 421 Views
Winning with Column Store in SQL Server 2016 – A Customer Story. Rick Heiges, Data Platform MVP (2007-2017) Principal Solutions Architect, DB Best Technologies, LLC. Thank You to. Our Venue SQLPASS Chapter Volunteers for SQL Saturday Our Speakers. THANK YOU!!!.
E N D
Winning with Column Store in SQL Server 2016 – A Customer Story Rick Heiges, Data Platform MVP (2007-2017) Principal Solutions Architect, DB Best Technologies, LLC
Thank You to Our Venue SQLPASS Chapter Volunteers for SQL Saturday Our Speakers
THANK YOU!!! • For Giving Up Your Saturday • For Not Giving Up on that One Problem • For Wanting to Continue to Learn • For Not Wanting to Stagnate in Your Career • For Going that Extra Mile to SQL Saturday • For NOT Going all those extra miles to • Denmark • Minnesota • Kansas City
BIO Rick Heiges • Data Platform MVP • 2007-2017 • Principal Solutions Architect – DB Best Technologies, LLC • Rick@DBBest.com • PASS Involvement • Member since 2001 • Board of Directors 2003-2011 • Initiated Summit Ambassador Program • Initiated 24 Hours of PASS • Frequent Speaker at Summit and SQL Saturday Events • BLOG: http://sqlblog.com/blogs/rick_heiges @heigesr2 linkedin.com/in/rickheiges
www.dbbest.com Follow us on Twitter @dbbest_tech Like us on Facebook facebook.com/DBBestTech
SQL Server Migration Assistant (SSMA) • Tool Developed by DB Best • Microsoft acquired SSMA in 2005 • DB Best Maintains / Updates SSMA • SSMA for Oracle Extension Pack • Support Data Migration • Oracle Providers to enable Server to Server connectivity • Provides user-defined functions that Emulate Oracle system functions which helps to decrease manual effort for conversion which may be substantial
Business Scenario • Manufacturing Plants Around the World • Hundreds of Measurements taken at various points on Assembly line for each unit • Data Collected and Stored in a Centralized Data Warehouse on a constant basis • Plant and Regional Managers Generate Reports on DW on hourly, daily, weekly, bi-weekly, monthly, quarterly, and adhoc basis • Often references data just collected
Business Scenario Graphic Plant Plant DW Plant ETL Plant Reports Plant Plant Manager
Initial Situation • Customer was running into Performance Issues in timely reporting and analysis on the DW • Customer started testing Oracle’s In-Memory Feature (Partition pinned in Memory) • Early testing was not impressive. Enabled, but not being used at this time • Customer decided to look at SQL Server 2016 for solution and started to look at Migrating DW
Customer First Steps • Contacted Microsoft – Got CAT invovled • Created a small VM (4 vCPUs and 8GB RAM) for initial setup / testing • Customer Initially used SSMA to convert Schema • Created Data Loads for SSIS to get some functional testing done • Identified Queries that exhibited Poor Performance • Used “normal” indexes initially • 4 week Pilot to prove power of Column Store in SQL 2016
System Environment Notes Initial Schema Conversion and Reality Check • Initial VM was 2 HT Cores (4 vCPU) with 8GB RAM • Windows Server 2012 R2 • SQL Server 2016 CTP 2.3 • Upgraded VM to 4HT Cores (8 vCPU) with 32GB RAM for Discovery / Initial Testing • Upgraded VM to 16 vCPU with 256GB RAM for Final Testing • Dev / Test Environment is on 2 nodes of a 4 node Oracle RAC with 256GB RAM per node and 79 cores allocated
Migration Challenges • Customer used various Partitioning strategies • Referential • Range-List • Does not always map well to Range Partitioning in SQL Server • Data Type Changes – Parameter Data Types • SSMA Converted Code has goal of functionality (not performance)
Dealing with Partitioning Differences • Keep ahead of the end of the window – Split Empty partitions in advance • Schedule Jobs to monitor to automate • Partitioning Key must be part of any Unique Index key • RANGE Partitioning only • Usually, Partitioning on Dates is best practice • With Column Store indexes in mind, Don’t partition to small
Best Practices forMigrating to SQL Server • Audit Report • Indexes • Features
Data Loading Audit Report • Row Count – simple to implement, but does not interrogate validity of data from Source to Target other than Row Count • Consider running a similar query against Source and Target systems on Business Critical tables that may have dollar amounts associated with them. • Sum of a column would be bare minimum. Consider also Min, Max, and Average for higher confidence
CI: Clustered Index (Rowstore) • Not Required, but…. • Is a default and best practice • Is generally more performant than a Heap • Is generally a unique, primary key value • Is Required for Partitioning • Think of it as a Page Number in a book • …. Unless There is a Reason Not to Have One • Fill Factor • Page Splits
NCI: Non-Clustered Index (Rowstore) • Often used to aid retrieval of data without knowing the PK • Often used for FK and common columns used in searches • Use Included Columns for Covering Queries with an NCI • Similar to an Index in the back of a book. Look in the index to find the page to turn to.
Columnstore Refresher how is it different? Data stored as columns Data stored as rows … Benefits: • Improved compression: Data from same domain compress better • Reduced I/O: Fetch only columns needed • Improved Performance: More data fits in memory C1 C4 C5 C3 C2
Key Terms / Concepts • Columnstore / Rowstore – Physical Organization • Row Group – Rows Compressed into Columnstore format usually in sets of the Max (1,048,576) • Column Segment – Column of Data within Row Group • Deltastore – Clustered Index that stores data in rows until a threshold(Min 102,400) is met to move to a columnstore • Tuple Mover – Background Process converts rows in the deltastore to the columnstore • Batch Execution – Query Processing Method that operates on multiple rows together (aka vector-based execution)
Deltastore Basics Row Group • Bulk Load Data goes directly into Columnstore based on large volumes of data • Single and smaller loads of data first go to Deltastore • Tuple Mover runs in the background and moves rows from rowstore to columnstore
NCCI: Non-Clustered Columnstore Index • First Introduced in SQL 2012 • Made Table Read-Only in 2012/2014 • Only 1 NCCI allowed in 2012/2014 • Use Partition “Switch” to add data • Now updateable in SQL 2016 • Multiple NCCI allowed • Complements “Row Store”
CCI: Clustered Columstore Index • Introduced in SQL 2014 • Converts Rowstore into Columnstore • No other indexes allowed in 2014 • Incredible Compression • Often up to 90% • Domain of Data is similar repeatable data • Can have NCI on CCI in 2016 • Allows for better SEEK performance • Certain Restrictions apply • Unique, FK, Data Types
Index Compatibility Index Matrix Summary Table is physically organized as a Heap, Cluster Index (Row) or CCI (Column) NCI and NCCI are secondary indexes that can dramatically affect performance
Initial Changes to SQL 2016 Database / Server • Trace Flags • 1117/1118/2371 no longer needed • TempDB • Number of Files / Growth Rate / Location • Customer Database • Files / Filegroups / Growth • SQL Memory Settings • Examine Waitstats • Quiet Baseline
Changes on SQL 2016 Side • Parameter Types changed in .NET/ App to match SQL data type changes • Clustered Indexes and PKs reviewed • Selected Tables were chosen for CCI • Selected Tables were chosen to have NCCI added • Some Additional NCI added to Selected Tables • Selected Tables were also Partitioned • Date is preferred • Arbitrary Boundary Point similar to a time period
Traditional DW Design Dim Table A Dim Table C Dim Table B Fact Table Dim Table E Dim Table D • Almost all queries hit Fact Table • Approx 450 columns (400+ of Measurement Data • 100 GB Fact Table uncompressed • 100s of millions of rows
Fact Table • Initial Load of 2015 Data (on 4vCPU/8GB VM) • Created Partitions – Date Based (2 weeks) • 100+ GB in size • Converted to CCI • Rebuild CCI – Reorganize – Compressed to 8GB • Early Investigative Querying from DBA Team indicated that we were optimistic about Testing • Loaded Additional Years of Data (2014, 2013) • 2014 and 2013 data volumes were smaller due to not as many measurements being captured
Related Tables • Smaller Dim Tables • Clustered Index • Additional Non-Clustered Indexes • Large Dim Tables • Clustered Index / Partitioned • Several with Non-Clustered Index • Several with Non-Clustered Column Store Index • Based on query usage
Testing • Used Consistent Dates - Returned same number of rows across both systems • Queries chosen were based on problematic queries currently used in Production • Queries were very complex with criteria focusing on specific date ranges, plants, and measurement types (some with multiple CASE clauses) • Simple Queries were not officially tested, but were anecdotally tested by the DBA team without issues
The Contestants • Oracle 12c • 2 Nodes of a 4 Node RAC • 40cores of 79 allocated • 256GB per node (512GB) • Dedicated Attached Storage • No ETL • No clearing of cache • No Perf Data • Key Tables are the same volume except for Dim Table A: 3 Billion • SQL Server 2016 CTP 2.3 • VM: 16vCPU/256GB • 240GB allocated to SQL • Storage on shared SAN • ETL Job running loading data • Clear Caches before every set of queries • Basic Perfmon Data Collection • Key Tables are the same volume except for Dim Table A: 600 Million (Due to allocated space)
SQL 2016 Test Results
Perfmon Initial Notes – We could be even faster! • CPU at Approx. 20% • Serious IO Latency on both Read and Write • F: .010 / .041 • G: .010 / .037 • H: .010 / .038 • L: Read Spike .068 @ start of testing • D: write .020 • No Memory Pressure
Future Considerations • Some Data Architecture Changes coming to the Fact Table • Further Index Tuning and Decisions • Further Code and Query Tuning
SQL 2016 Architecture Option: Use AlwaysOn Availability Groups Application Listener Primary Sync Secondary Data Data Async Secondary Data • AlwaysOn AG Primary Replica • Sync to Replica in same DC to provide HA • Async to Replica in DR DC for DR • May also use Replicas for Read Only Queries • May also have additional Replicas closer to plants to reduce network latency
Summary • SQL on a smaller VM beat Oracle on Big Iron • SQL Server 2016 with Columnstore Indexes performs better than the Oracle System • Efforts were made to simulate consistent ETL flow as in current real world conditions • Further Query Tuning and more Robust IO subsystem will solidify SQL Server Dominant Performance
Thank You! • Rick Heiges • Data Platform MVP (2007-2017) • Follow Me on Twitter: @heigesr2 • Rick@DBBest.com