420 likes | 753 Views
DBI409. SQL Server Columnstore Performance Tuning. Eric N Hanson Principal Program Manager Microsoft Corporation. Session Objectives and Takeaways. Session Objective(s ): Learn to make query performance go from good to outstanding Batch mode is crucial to speedups
E N D
DBI409 SQL Server Columnstore Performance Tuning Eric N Hanson Principal Program Manager Microsoft Corporation
Session Objectives and Takeaways • Session Objective(s): • Learn to make query performance go from good to outstanding • Batch mode is crucial to speedups • Good application design can work around limits
Why pay attention? Your customer with well-tuned xVelocitycolumnstoreapplication Your customer with poorly-tuned xVelocitycolumnstoreapplication
demo Outer Join Performance Limit
Fundamentals of xVelocityColumnstore Performance • Each column stored separately, compressed • Batch mode query execution • Segment elimination
Columnstore physical structure segment 1 • Segment (a.k.a. row group) = 1 million rows • Table partition has 1 or more segments • Each column segment is a BLOB • Dictionaries are BLOBs • Catalog has one row per column segment with min/max values catalog … dictionaries segment N
Batch mode processing • Process ~1000 rows at a time • Batch stored in vector form • Vector operators implemented • Filter, hash join, hash aggregation • Greatly reduced CPU time (7 to 40X) Batch object Column vectors bitmap of qualifying rows
#1 Takeaway! Make sure most of the work of the query happens in batch mode.
Segment Elimination • Segment(rowgroup)= 1 million row chunk • Min, Max kept for each column in a segment • Scans can skip segments based on this select Date, count(*) from dbo.Purchasewhere Date >= 20120201 group by Date skipped
DOs • Use star schema • Put columnstores on large tables only • Include every column of table in columnstore index • Use integer surrogate keys for joins • Structure your queries as star joins with grouping and aggregation as much as possible
DON’T • Join on string columns • Filter on string columns of columnstore table • Join pairs of very large tables if you don’t have to • Use NOT IN <subquery> on columnstore table • Use UNION ALL to combine columnstore tables with other tables
Outer Join Workaround • Outer join prevents batch processing • Rewrite query to do most work in batch mode DEMO
IN and EXISTs Issue & Workaround • Using IN and EXISTS with subqueries can prevent batch mode execution • IN ( <list of constants> ) typically works fine • DEMO
Union All Issue & Workaround • UNION ALL often prevents batch mode • Workaround • Push GROUP BY and aggregation over UNION ALL • Do final GROUP BY and aggregation of results • Called “local-global aggregation” • DEMO
Scalar Aggregates Issue & Workaround • Aggregate without group by doesn’t get batch processing • select count(*) from dbo.Purchase1.2 seconds • Workaround • with CountByDate (Date, c) 0.093 secondsas ( select Date, count(*) from dbo.Purchasegroup by Date ) select sum(c) from CountByDate;
Multiple DISTINCT aggregates issue & workaround • Table Spool if query has 2 or more DISTINCT aggregates • Spool takes time to write • Spool write is single threaded • Reads of spool in plan are single threaded, in row mode • Workaround: • form each DISTINCT agg in separate subquery • Join results on grouping keys • DEMO SQL Server 2012 runs queries with 1 DISTINCT agg and 1 or more non-distinct aggs in batch mode without any spool!
Adding Data Using Partition Switching • Columnstores must be partition-aligned • Partition switching fully supported • To add data daily • Partition by day • Every day • Split last partition • Create staging table and columnstore index it • Switch it in • Avoids costly drop/rebuild
Trickle Loading with Columnstore Indexes • Master table (columnstore) • Delta table (rowstore) • Query using UNION ALL local-global aggregation workaround • Add Delta to Master nightly • DEMO
Avoid Nonclustered B-trees • Covering B-trees no longer needed • Extra B-trees can cause optimizer to choose poor plan • Save space • Reduce ETL time
Eliminating Unsupported Data Types • Omit column, or • Modify column type to supported type • Reduce precision of numerics to 18 digits or less • Convert guid’s to ints • Reduce precision of datetimeoffset to 2 or less • Convert hierarchyid to int or string
Achieving Fast Columnstore Index Builds • One thread per segment • Low memory throttles parallelism • Consider • high min server memory setting • Set REQUEST_MAX_MEMORY_GRANT_PERCENT to 50 or more • Add memory • Omit columns • Normalize fact columns to dimensions • Vertically partition • sys.dm_exec_query_memory_grants shows DOP
Not enough memory to build a columnstore index? • Same ideas apply as for achieving fast builds • Run out of memory during execution, not startup? • Errors 701 or 802 indicate this • Workaround: reduce DOP explicitly, e.g. • create columnstore index <name> on <table>(<columns>) with(maxdop= 1);
Verifying Segment Elimination • You may want to check if segments are eliminated • Ways to get segment elimination confirmation: • Xevent sqlserver.column_store_segment_eliminate • Text output messages • Trace flag 646 outputs segment elimination messages • Trace flag 3605 directs messages to error log file • DEMO
Ensuring segment elimination by date • Use clustered B-tree on date • Columnstore inherits order • Or, partition by date • Ordering by load date, ship date, order date etc. can all work • Dates are naturally correlated
Partitioning and Segment Elimination -- Example Select … From Fact Where Date = 20120301
Multi-dimensional segment elimination • Pre-sort or partition on a keydrawn from >1 dimension • E.g. YYYYMMRR • Can be persisted computed columnnot used in columnstore select f.region_id, avg(f.duration)from fact_CDR f where f.region_id = 1 and f.date_id between 20120101 and 20120131 group by f.region_id
String performance issues • String filters don’t get pushed to storage engine • more batches to process • defeats segment elimination • Joining on string columns is slow • Factor strings out to dimensions
Forcing use or non-use of Columnstores • Query hint • OPTION(IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX) • Index hint • … FROM F WITH(index=MyColumnStore) … • … FROM F WITH(index=MyClusteredBtree) …
Summary • Key’s to fast processing • Batch mode • Segment elimination • Make sure most work is done in batch mode! • Can work around read-only property by • Partitioning • Using Delta-file for trickle load • Future work will reduce need for tuning
Related Content • Columnstore Tuning Guide (your reading homework ) • http://social.technet.microsoft.com/wiki/contents/articles/sql-server-columnstore-performance-tuning.aspx • Columnstore FAQ • http://social.technet.microsoft.com/wiki/contents/articles/sql-server-columnstore-index-faq.aspx • Introductory Columnstores Talk Video • TechEd 2011 session DBI312 • Customer Case Studies • See list in FAQ
Track Resources Hands-On Labs @sqlserver @ms_teched SQL Server 2012 Eval Copy Get Certified! mva Microsoft Virtual Academy
Resources Learning TechNet • Connect. Share. Discuss. • Microsoft Certification & Training Resources http://northamerica.msteched.com www.microsoft.com/learning • Resources for IT Professionals • Resources for Developers • http://microsoft.com/technet http://microsoft.com/msdn
Required Slide Complete an evaluation on CommNet and enter to win!
Please Complete an Evaluation Your feedback is important! Multipleways to Evaluate Sessions Be eligible to win great daily prizes and the grand prize of a $5,000 Travel Voucher! Scan the Tag to evaluate this session now on myTechEdMobile
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.