190 likes | 420 Views
Columnstore Index in SQL Server 2012. Kevin S. Goff. Kevin S. Goff: 30 seconds of Shameless Promotion. Developer/architect since 1987 / Microsoft SQL Server MVP Columnist for CoDe Magazine since 2004 Wrote a book, collaborated on a 2 nd book
E N D
Columnstore Index in SQL Server 2012 Kevin S. Goff
Kevin S. Goff: 30 seconds of Shameless Promotion • Developer/architect since 1987 / Microsoft SQL Server MVP • Columnist for CoDe Magazine since 2004 • Wrote a book, collaborated on a 2nd book • Writing a 3rd book on Business Intelligence with SQL Server • Frequent speaker for SQL Server/SharePoint community events • kgoff@kevinsgoff.net • www.KevinSGoff.Net • Launching a new SQL/BI weekly webcast/radio show in January 2013 Columnstore index in SQL 2012
My daughter Katy (3 years old) Columnstore index in SQL 2012
New columnstore index – Main Topics • Overview–what is the columnstore index, who benefits? • Characteristics of the columnstore index • Applications that will benefit • How it differs from traditional row-based indexes • Demos • New Batch Processing Mode, optimized for performance • General usage and syntax • Restrictions and issues with the columnstore index • Performance charts • What’s coming in SQL v.next? Columnstore index in SQL 2012
1. Overview - new columnstoreindex • New relational, xVelocity memory-optimized database index in SQL Server 2012, “baked in” to the database engine • More and more functionality in DB engine (xVelocity, CDC) • Potentially Significant performance enhancements for data warehousing and data mart scenarios – a real game changer • (not really for OLTP databases, we’ll see why later) • Best for queries that scan/aggregate large sets of data • My opinion? One of the coolest things ever in SQL Server • In a regular index, all indexed data from each row kept together on single page - data in each column spread across all pages of index • In a columnstore index, data from each column is kept together so each data page contains data only from a single column (compressed, more fits in memory, more efficient IO) Columnstore index in SQL 2012
2. Characteristics of the new columnstore index • Highly compressed - Exploits similarity of data within column • IO Statistics - dramatically reduces # of logical reads!!! • Not stored in standard buffer pools, but rather in a new optimized cache • Smart IO and caching using read-ahead reads • Part of Microsoft’s xVelocity technology – compression is factor of 8 (and twice as efficient as page compression) • Once posted, only READONLY • Best for data warehouse/mart queries that scan/aggregate large amounts of data–might lower need for OLAP aggregation • Some queries might run at least 10x faster (or more) Columnstore index in SQL 2012
3. Areas that might benefit from columnstore index • Queries and reports against Data Warehouses/Data Marts (works best with Fact/Dimension tables modeled in a star schema) • Load from Data Warehouses/Marts into OLAP Cubes • SSAS OLAP Databases that use the ROLAP methodology or pass-through mode • New Analysis Services Tabular Model uses xVelocity engine • Some companies took the release candidate and put into production, simply for this feature (some case studies show queries that went from 17 minutes to 3 seconds!) Columnstore index in SQL 2012
4. Columnstorevs row store index • Columnstore index stores each column in separate set of pages (vs. storing multiple data rows per page) • Only columns needed are fetched • Easier to compress redundant column data • Uses xVelocity found in PowerPivot • Improved IO scan/buffer hit rates • Segment elimination: each partition is broken into million row segments with metadata for min/max values – segment is not read if query scope does not include min/max values • Query will only fetch necessary columns • In reality, not “really” an index – more like a compressed “cube” Columnstore index in SQL 2012
5. Columnstore index - Samples Demo uses a table called BigPurchaseOrderHeader 10 million rows 3 indexes – Clustered index Columnstore index Covering index 3 queries to aggregate Orders by Vendor, using the 3 different index definitions Columnstore index in SQL 2012
5. Columnstore index - Samples • Columnstore index was 5% of the batch • Clustered index was 65% of the batch • Covering index (which would have been the best approach prior to SQL Server 2012) was 35% of the batch • Time Statistics, 12x faster than covering index, 20x faster than clustered index Columnstore index in SQL 2012
5. Columnstore index - Samples • Certain execution plan operators (Hash Match in particular) use new Batch execution mode • Reads rows in blocks of 1,000 • Big performance benefit over row-based execution (later slides will talk about this) Uses batch mode, blocks of 1000 rows Columnstore index in SQL 2012
5. Columnstore index - Samples • Issue w/OUTER JOIN: can’t use directly against table • Will “work”, but will use slower row execution mode • Must pre-aggregate separately and then do OUTER JOIN (will use batch mode) Columnstore index in SQL 2012
6. Batch Processing Mode • New vector-based query execution method • Some of the more expensive operators (Hash Match for joins and aggregations) utilize a new execution mode called Batch Mode (if hash table can fit in memory) • Data moves as a batch through query plan operators • Batch mode leverages advanced hardware architectures, processor cache, RAM, parallelism • Packets of about 1,000 rows are passed between operators, with column data represented in a vector form. • Reduces CPU usage over row processing by factor of 10 (from TechNet, sometimes up to factor of 40!) • More optimized operators-bitmap filter, filter, compute scalar • Remember….a query against a columnstore index “might” use batch mode, but a query against a regular index will not use batch mode Columnstore index in SQL 2012
7. Columnstore index: Usage • Syntax is simple: use new COLUMNSTORE keyword • 1 Columnstore index per table: cannot be clustered • Order of columns does not matter • Include all columns from table • No INCLUDE statement, No ASC/DESC • General MS recommendation: if queries will frequently use a certainly column on the predicate, create a clustered index on that column and then create the columnstore index. • Even though column store index isn’t “ordered” itself, you’ll get better segment elimination CREATE NONCLUSTERED COLUMNSTOREINDEX [IX_BPO_ColumnStore] ON [BigPurchaseOrderHeader] (VendorID, OrderDate, TotalDue) Columnstore index in SQL 2012
8. Columnstore index: Restrictions • Cannot be clustered, cannot be created against a view • Cannot act as a PK or FK, cannot include sparse columns • Can’t work on tables with Change Data Capture/Change Tracking or FileStream, can’t participate in replication, nor when page/row compression exists • Cannot be used with certain data types, such as binary, text/image, rowversion/timestamp, CLR data types (hierarchyID/spatial), nor with data types created with MAX keyword…e.g. varchar(max • Cannot be modified with ALTER – must be dropped and recreated • It’s a read-only index - cannot insert rows and expect columnstore index to be maintained. Columnstore index in SQL 2012
8. Columnstore index: Restrictions • Note: range partitioning is supported….(use partitioning to load a table, index it with a columnstore index, and switch it in as newest partition. ) • Partition by day, split the last partition • Load data into staging table and then create columnstore index • Switch it in • SQL Server 2012 permits 15,000 partitions per table) • Not optimized for certain statements (OUTER JOIN, UNION, NOT IN <subquery>) • Not optimized for certain scenarios (high selectivity, queries lacking any joining, aggregating) • Not optimized for a JOIN statement on a composite set of columns • Best practice – always use integer keys for FKs Columnstore index in SQL 2012
9. Columnstore index: Performance Columnstore index in SQL 2012
10. Next version of SQL Server • Columnstore indexes will be updatable!!! • No longer a readonly index • Will be able to create a columnstore index as a clustered index • Not related directly to columnstore index, but next major version of SQL Server will have a new in-memory row store for OLTP databases (codename “Hekaton”) Columnstore index in SQL 2012
Recommended reading • I’ve written some articles in CoDe Magazine on SQL 2012 • 2 part series on Columnstore index, T-SQL Features, and SSIS Features • http://code-magazine.com/articleprint.aspx?quickid=1203051&printmode=true • http://code-magazine.com/articleprint.aspx?quickid=1206021&printmode=true • Look for more articles on SQL 2012 later in the year • I ran a 13 part webinar series on new features in SQL 2012 • Link to the recordings • Some outstanding new books on SQL 2012: Columnstore index in SQL 2012 Return to TOC