1 / 19

Columnstore Index in SQL Server 2012

Columnstore Index in SQL Server 2012. Kevin S. Goff. Kevin S. Goff: 30 seconds of Shameless Promotion. Developer/architect since 1987 / Microsoft SQL Server MVP Columnist for CoDe Magazine since 2004 Wrote a book, collaborated on a 2 nd book

petula
Download Presentation

Columnstore Index in SQL Server 2012

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Columnstore Index in SQL Server 2012 Kevin S. Goff

  2. Kevin S. Goff: 30 seconds of Shameless Promotion • Developer/architect since 1987 / Microsoft SQL Server MVP • Columnist for CoDe Magazine since 2004 • Wrote a book, collaborated on a 2nd book • Writing a 3rd book on Business Intelligence with SQL Server • Frequent speaker for SQL Server/SharePoint community events • kgoff@kevinsgoff.net • www.KevinSGoff.Net • Launching a new SQL/BI weekly webcast/radio show in January 2013 Columnstore index in SQL 2012

  3. My daughter Katy (3 years old) Columnstore index in SQL 2012

  4. New columnstore index – Main Topics • Overview–what is the columnstore index, who benefits? • Characteristics of the columnstore index • Applications that will benefit • How it differs from traditional row-based indexes • Demos • New Batch Processing Mode, optimized for performance • General usage and syntax • Restrictions and issues with the columnstore index • Performance charts • What’s coming in SQL v.next? Columnstore index in SQL 2012

  5. 1. Overview - new columnstoreindex • New relational, xVelocity memory-optimized database index in SQL Server 2012, “baked in” to the database engine • More and more functionality in DB engine (xVelocity, CDC) • Potentially Significant performance enhancements for data warehousing and data mart scenarios – a real game changer • (not really for OLTP databases, we’ll see why later) • Best for queries that scan/aggregate large sets of data • My opinion? One of the coolest things ever in SQL Server • In a regular index, all indexed data from each row kept together on single page - data in each column spread across all pages of index • In a columnstore index, data from each column is kept together so each data page contains data only from a single column (compressed, more fits in memory, more efficient IO) Columnstore index in SQL 2012

  6. 2. Characteristics of the new columnstore index • Highly compressed - Exploits similarity of data within column • IO Statistics - dramatically reduces # of logical reads!!! • Not stored in standard buffer pools, but rather in a new optimized cache • Smart IO and caching using read-ahead reads • Part of Microsoft’s xVelocity technology – compression is factor of 8 (and twice as efficient as page compression) • Once posted, only READONLY • Best for data warehouse/mart queries that scan/aggregate large amounts of data–might lower need for OLAP aggregation • Some queries might run at least 10x faster (or more) Columnstore index in SQL 2012

  7. 3. Areas that might benefit from columnstore index • Queries and reports against Data Warehouses/Data Marts (works best with Fact/Dimension tables modeled in a star schema) • Load from Data Warehouses/Marts into OLAP Cubes • SSAS OLAP Databases that use the ROLAP methodology or pass-through mode • New Analysis Services Tabular Model uses xVelocity engine • Some companies took the release candidate and put into production, simply for this feature (some case studies show queries that went from 17 minutes to 3 seconds!) Columnstore index in SQL 2012

  8. 4. Columnstorevs row store index • Columnstore index stores each column in separate set of pages (vs. storing multiple data rows per page) • Only columns needed are fetched • Easier to compress redundant column data • Uses xVelocity found in PowerPivot • Improved IO scan/buffer hit rates • Segment elimination: each partition is broken into million row segments with metadata for min/max values – segment is not read if query scope does not include min/max values • Query will only fetch necessary columns • In reality, not “really” an index – more like a compressed “cube” Columnstore index in SQL 2012

  9. 5. Columnstore index - Samples Demo uses a table called BigPurchaseOrderHeader 10 million rows 3 indexes – Clustered index Columnstore index Covering index 3 queries to aggregate Orders by Vendor, using the 3 different index definitions Columnstore index in SQL 2012

  10. 5. Columnstore index - Samples • Columnstore index was 5% of the batch • Clustered index was 65% of the batch • Covering index (which would have been the best approach prior to SQL Server 2012) was 35% of the batch • Time Statistics, 12x faster than covering index, 20x faster than clustered index Columnstore index in SQL 2012

  11. 5. Columnstore index - Samples • Certain execution plan operators (Hash Match in particular) use new Batch execution mode • Reads rows in blocks of 1,000 • Big performance benefit over row-based execution (later slides will talk about this) Uses batch mode, blocks of 1000 rows Columnstore index in SQL 2012

  12. 5. Columnstore index - Samples • Issue w/OUTER JOIN: can’t use directly against table • Will “work”, but will use slower row execution mode • Must pre-aggregate separately and then do OUTER JOIN (will use batch mode) Columnstore index in SQL 2012

  13. 6. Batch Processing Mode • New vector-based query execution method • Some of the more expensive operators (Hash Match for joins and aggregations) utilize a new execution mode called Batch Mode (if hash table can fit in memory) • Data moves as a batch through query plan operators • Batch mode leverages advanced hardware architectures, processor cache, RAM, parallelism • Packets of about 1,000 rows are passed between operators, with column data represented in a vector form. • Reduces CPU usage over row processing by factor of 10 (from TechNet, sometimes up to factor of 40!) • More optimized operators-bitmap filter, filter, compute scalar • Remember….a query against a columnstore index “might” use batch mode, but a query against a regular index will not use batch mode Columnstore index in SQL 2012

  14. 7. Columnstore index: Usage • Syntax is simple: use new COLUMNSTORE keyword • 1 Columnstore index per table: cannot be clustered • Order of columns does not matter • Include all columns from table • No INCLUDE statement, No ASC/DESC • General MS recommendation: if queries will frequently use a certainly column on the predicate, create a clustered index on that column and then create the columnstore index. • Even though column store index isn’t “ordered” itself, you’ll get better segment elimination CREATE NONCLUSTERED COLUMNSTOREINDEX [IX_BPO_ColumnStore] ON [BigPurchaseOrderHeader] (VendorID, OrderDate, TotalDue) Columnstore index in SQL 2012

  15. 8. Columnstore index: Restrictions • Cannot be clustered, cannot be created against a view • Cannot act as a PK or FK, cannot include sparse columns • Can’t work on tables with Change Data Capture/Change Tracking or FileStream, can’t participate in replication, nor when page/row compression exists • Cannot be used with certain data types, such as binary, text/image, rowversion/timestamp, CLR data types (hierarchyID/spatial), nor with data types created with MAX keyword…e.g. varchar(max • Cannot be modified with ALTER – must be dropped and recreated • It’s a read-only index - cannot insert rows and expect columnstore index to be maintained. Columnstore index in SQL 2012

  16. 8. Columnstore index: Restrictions • Note: range partitioning is supported….(use partitioning to load a table, index it with a columnstore index, and switch it in as newest partition. ) • Partition by day, split the last partition • Load data into staging table and then create columnstore index • Switch it in • SQL Server 2012 permits 15,000 partitions per table) • Not optimized for certain statements (OUTER JOIN, UNION, NOT IN <subquery>) • Not optimized for certain scenarios (high selectivity, queries lacking any joining, aggregating) • Not optimized for a JOIN statement on a composite set of columns • Best practice – always use integer keys for FKs Columnstore index in SQL 2012

  17. 9. Columnstore index: Performance Columnstore index in SQL 2012

  18. 10. Next version of SQL Server • Columnstore indexes will be updatable!!! • No longer a readonly index • Will be able to create a columnstore index as a clustered index • Not related directly to columnstore index, but next major version of SQL Server will have a new in-memory row store for OLTP databases (codename “Hekaton”) Columnstore index in SQL 2012

  19. Recommended reading • I’ve written some articles in CoDe Magazine on SQL 2012 • 2 part series on Columnstore index, T-SQL Features, and SSIS Features • http://code-magazine.com/articleprint.aspx?quickid=1203051&printmode=true • http://code-magazine.com/articleprint.aspx?quickid=1206021&printmode=true • Look for more articles on SQL 2012 later in the year • I ran a 13 part webinar series on new features in SQL 2012 • Link to the recordings • Some outstanding new books on SQL 2012: Columnstore index in SQL 2012 Return to TOC

More Related