380 likes | 539 Views
Data Management and Index Options for SQL Server Data Warehouses. Atlanta MDF. Lance England. http://lance-england.com. Goal. To present options to consider for your data warehouse. The options can be used separately or combined. Agenda. Data Warehouse Overview Partitioned Views
E N D
Data Management and Index Options for SQL Server Data Warehouses Atlanta MDF
Lance England http://lance-england.com
Goal To present options to consider for your data warehouse. The options can be used separately or combined.
Agenda Data Warehouse Overview Partitioned Views Partitioned Tables Compression Indexed views Filtered indexes Columnstore indexes
Agenda Data Warehouse Overview Partitioned Views Partitioned Tables Compression Indexed views Filtered indexes Columnstore indexes
Dimension Tables • Many columns, denormalized for speed and simplicity • Have both surrogate keys and business keys • Often capture changes over time (slowly changing dimension)
Fact Tables • A few columns that capture the measures of interest • Many key columns to dimension tables • Can grow very large To improve the scalability and manageability of large tables, we can partition the data. We can do this with partitioned views and/or partitioned tables.
Agenda Data Warehouse Overview Partitioned Views Partitioned Tables Compression Indexed views Filtered indexes Columnstore indexes
Partitioned Views (SQL 7) • Multiple tables, but presented as a single logical entity with a view • CHECK constraint on each table enables partition elimination in queries • Use UNION ALL in view definition to prevent engine from attempting de-duplication
Partitioned Views (SQL 7) Pros • Adding/dropping tables is easy • Able to update statistics for a single partition (table) • Can partition on multiple columns Cons • Changing schema/indexes is multiplied by number of tables
Agenda Data Warehouse Overview Partitioned Views Partitioned Tables Compression Indexed views Filtered indexes Columnstore indexes
Partitioned Tables (SQL 2005) • The TABLE is the single logical entity, with partitioning abstracted away by the storage engine • Implemented with partition function and partition scheme • Can participate in partition elimination
Partitioned Tables (SQL 2005) Pros • ALTER only one table • Uniform indexes help query analyzer • Can load/drop partitions with partition switching Cons • Can’t update statistics at partition level • Partition switching more complicated than adding/dropping tables • Single partitioning column only
Partition Switching • Quickly add/remove partitions with data • Create a separate schema-identical table (with a CHECK constraint if switching IN) • Minimally logged meta-data operation
SPLIT/MERGE Warning Avoid split or merge of populated partitions. Splitting or merging populated partitions can be extremely inefficient, as this may cause as much as 4 times more log generation, and also cause severelocking. http://sqlcat.com/sqlcat/b/top10lists/archive/2008/02/06/top-10-best-practices-for-building-a-large-scale-relational-data-warehouse.aspx
Agenda Data Warehouse Overview Partitioned Views Partitioned Tables Compression Indexed views Filtered indexes Columnstore indexes
Compression (SQL 2008) • Trading CPU usage for less I/O (more rows per page on disk and cache) • Object-level, not a database setting • Two types: Row or Page compression • Most opinions are to opt for page compression first, and scale back to row compression (or none) if CPU usage is unacceptable
Row Compression • Removes unused space at data-type level • Variable-length storage for numbers and dates • Variable-length storage for CHAR, NCHAR • Optimizes storage of NULL and zero
Page Compression • Row compression • Prefix compression – replaces repeated sequences in each column with a reference to the CI structure • Dictionary compression – replaces repeated sequences on the page with a reference to the CI structure
Agenda Data Warehouse Overview Partitioned Views Partitioned Tables Compression Indexed views Filtered indexes Columnstore indexes
Indexed Views (SQL 2005) • Great for aggregating data • Longlist of restrictions (see link): • When aggregating • Clustering key can only reference columns in GROUP BY • View definition must contain COUNT_BIG(*) • Can participate in partition switching (with extra steps) http://msdn.microsoft.com/en-us/library/ms191432.aspx
Agenda Data Warehouse Overview Partitioned Views Partitioned Tables Compression Indexed views Filtered indexes Columnstore indexes
Filtered Indexes (SQL 2008) • Index with a WHERE clause • Filtering column does not have to be included in index • Best for • Sparse columns (many NULLs) • Heterogeneous columns with categories of data (think FK columns) • Columns with ranges of values such as dollar amounts, time, and dates.
Agenda Data Warehouse Overview Partitioned Views Partitioned Tables Compression Indexed views Filtered indexes Columnstore indexes
Columnstore Indexes (SQL 2012) Pros • Stores data by column; more efficient page retrieval • Optimized compression • Segment elimination (~1 million rows/per) • Batch-mode evaluation • Query operators pass batch of 1000 instead of row-by-row (not all operators supported) • Affected by #of cores and MAXDOP setting
ColumnstoreIndexes Cons • Table can’t be updated once created • Workaround choices: • Disable index/load/rebuild • Partition switching • Partitioned view • Not good for single-record SEEK queries
Clustered Columnstore (SQL 2014) • Less storage: Not a copy of data • Updatable • Improved batch mode support
Recap Data Warehouse Overview Partitioned Views Partitioned Tables Compression Indexed views Filtered indexes Columnstore indexes
Linkapalooza DATA WAREHOUSE Microsoft EDW Architecture, Guidance and Deployment Best Practices http://msdn.microsoft.com/en-us/library/hh147624.aspx The Data Loading Performance Guide http://msdn.microsoft.com/en-us/library/dd425070(v=sql.100).aspx Top 10 Best Practices for Building a Large Scale Relational Data Warehouse http://sqlcat.com/sqlcat/b/top10lists/archive/2008/02/06/top-10-best-practices-for-building-a-large-scale-relational-data-warehouse.aspx
Linkapalooza PARTITIONED TABLES Partitioned Table and Index Strategies Using SQL Server 2008 http://technet.microsoft.com/en-us/library/dd578580%28v=sql.100%29.aspx Strategies for Partitioning Relational Data Warehouses in Microsoft SQL Server http://technet.microsoft.com/en-us/library/cc966457.aspx Table Partitioning Resources http://www.brentozar.com/sql/table-partitioning-resources/ Query Processing Enhancements on Partitioned Tables and Indexes http://msdn.microsoft.com/en-us/library/ms345599.aspx
Linkapalooza DATA COMPRESSION Data Compression: Strategy, Capacity Planning and Best Practices http://msdn.microsoft.com/en-us/library/dd894051%28v=sql.100%29.aspx
Linkapalooza INDEXED VIEWS Creating Indexed Views (long list of restrictions) http://msdn.microsoft.com/en-us/library/ms191432.aspx Partition Switching with Indexed Views http://technet.microsoft.com/en-us/library/bb964715(v=sql.105).aspx FILTERED INDEXES Filtered Index Design Guidelines http://msdn.microsoft.com/en-us/library/cc280372(v=sql.100).aspx
Linkapalooza COLUMNSTORE INDEXES Books Online http://msdn.microsoft.com/en-us/library/gg492088.aspx Columnstore Internals http://rusanu.com/2012/05/29/inside-the-sql-server-2012-columnstore-indexes/ SQL Server Columnstore Performance Tuning http://social.technet.microsoft.com/wiki/contents/articles/4995.sql-server-columnstore-performance-tuning.aspx#Ensuring_use_of_the_Fast_Batch_Mode_of_Query_Execution
Linkapalooza CLUSTERED COLUMNSTORE INDEXES (SQL 2014) http://rusanu.com/2013/06/11/sql-server-clustered-columnstore-indexes-at-teched-2013/ TechEd Video http://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/DBI-B322#fbid=rAFPjiEmlNt
Thanks! http://lance-england.com