190 likes | 343 Views
Project Management Database and SQL Server Katmai New Features. Qingsong Yao qyao@microsoft.com. Disclaimer . The content in this slides is demonstration only. Please do your own research before trying to apply either Sparse or Compression technology to your product server. Topics.
E N D
Project Management Database and SQL Server Katmai New Features Qingsong Yao qyao@microsoft.com
Disclaimer • The content in this slides is demonstration only. Please do your own research before trying to apply either Sparse or Compression technology to your product server.
Topics • Katmai Related New Features • A Project Management System • Experimental result • Reference
SparseColumns • Sparse is purely a storage attribute • DDL support for specifying a column as “Sparse” • CREATE TABLE t (id int, sparseProp1 int SPARSE); • ALTER TABLE t ALTER COLUMNS sparseProp1 DROP SPARSE; • No Query/DML behavior changes for a sparse column • Column metadata will have a bit to indicate the sparse attribute • Storage Optimization: • Sparse vector design: 0 bytes stored for a NULL value • Have overhead for not null values (4 byte per not null column + 4 byte header)
Sparse Columns Restrictions • Sparse columns cannot be part of a key in clustered index or a PK index or part of a partition key. • Sparse column cannot be the key column for an unique index. • Unique constraints are also not allowed on sparse columns • Sparse columns cannot be defined as "non-null" and cannot have any "default" values. • Rules" are not supported on sparse columns.
Filtered Indices • “Filtered index” is such a mechanism that allows the table designer to define a regular index that optionally includes a simple filter (predicate) on that table to specify the qualifying rows that need to be indexed for that index. • Examples: • Create Index filtered_index on WSS.List(Author) where ListId = 5 • Examples of valid filter expressions include • Listid = 10 and folderid > 20 • Listid = 10 and folderid > 20 and folderid < 50 • Listid in (10, 20, 30) • Listid in (10, 20) and folderid in (15, 25)
Filtered Indices • Support Online Operation, Alter Index, Partition Table, Index hints, and DTA • Has side impact on query parameterization (because we have to do predicate matching). • Sparse Column and Not-null Filtered Indices like: • Create index on t1(c1) where c1 is not null • are very helpful on querying/storing “sparse” columns (no impact on query parameterization).
Data Compression • Different Compression Types • Vardecimal Compression (SQL Server 2005 SP2) • Row compression • Row compression contains vardecimal compression • Page compression • Page compression contains row compression • Main Focus was data warehouse scenarios • But very useful for certain OLTP scenarios as well • Main goal: Enabling compression does not require application changes • Compression only supported in Enterprise Edition Microsoft Confidential
Row compression • Light-weight compression • Useful for certain OLTP scenarios • All columns stored as variable data in new record format • Reduce overhead per column (4-bits vs 2 bytes) • Store minimal number of bytes per value: • Leading zero bytes removed for int, smallint, tinyint, bigint, datetime, smalldatetime, money, smallmoney, real, float • Trailing spaces removed for char, nchar, binary • Decimal / Numeric vardecimal compressed (same compression as vardecimal compression) • NULL / 0 value take no space (besides overhead) • No compression for varchar, nvarchar, varbinary, text, ntext, image Microsoft Confidential
Page Compression • Compress all data on a single data page • Compress ‘similar’ column values by only storing the value once on the page instead of multiple times • Two page compression algorithms • Column Prefix • Dictionary • User cannot choose algorithm, both algorithms are always applied • Page Compression includes row compression Microsoft Confidential
Topics • Katmai Related New Features • A Project Management System • Experimental result • Reference
Data Sources • Three tables store main workitem information • Table has 17 regular columns (have meaningful name, and always have not null values), and all other columns has predefined random name, and random data type • Views are defined on these columns to assign meaningful names to the columns
Column Distribution Information • Column data type distribution: • Null value distribution
Topics • Katmai Related New Features • SQL Project Management database overview • Experimental result • Reference
Summary • Using WorkItemsWere as source table , try following cases: • Compress table using page compression ALTER TABLE WorkItemsWere REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE) • Compress table using row compression ALTER TABLE WorkItemsWere REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = ROW) • Find all columns have more than 67% null, and change to sparse • insert into temp table select * into WorkItemsWere_Temp from WorkItemsWere • truncate data truncate table WorkItemsWere • change sparse script alter table WorkItemsWere alter column [Fld10004] add sparse • insert data back insert into WorkItemsWere select * from WorkItemsWere_temp • rebuild index ALTER INDEX [PK_WorkItemsWere] ON [dbo].[WorkItemsWere] REBUILD
Result • Space Saving: Page Compress > Sparse > Row Compress > normal • Procedure sp_estimate_data_compression_savings can estimate space saving without doing the actual compression (but it is not very accurate)
CPU Overhead • The database server have 16G memory, while table WorkItemsWere (the largest table) is 4G which mean all data can be in the cache, and physical I/Os are likely be 0. • Sparse and Compression can save Logical I/O since they require reading less pages. • Sparse and Compression can increase CPU time since the data need to be uncompressed.
Table Scan CPU Time Result Select 17 regular columns + 27 predefined columns (with at least 20% not null value) + 10 random predefined columns Result shows that sparse case has less CPU Overhead. Next slide shows the reason.
Table Scan CPU Time Result (2) • Sparse does not have negative impact on regular columns. • Extracting Null values from sparse columns has higher CPU overhead than page compression and row compression case. • Page Compression and Row Compression are in table level, the CPU overhead of uncompressing Not Null values are higher.