800 likes | 876 Views
Understanding SQL Server Data Compression. To compress or decompress – that is this session! Melissa Connors. Melissa Connors. Senior Technical Writer / Special Projects Lead Dog person / Befriender of chipmunks / Reader of books / Baker of cakes.
E N D
Understanding SQL Server Data Compression To compress or decompress – that is this session! Melissa Connors
Melissa Connors Senior Technical Writer / Special Projects Lead Dog person / Befriender of chipmunks / Reader of books / Baker of cakes
S1 Provides End-to-End Visibility Across the Data Platform Monitor, diagnose, and optimize SQL Server performance. Find and fix SQL Server Analysis Services (SSAS) performance problems Monitor resource utilization (memory, CPU, network, storage) on VMware environments. Boost Windows Server and Hyper-V virtualized environment performance. Keep Azure SQL Databases running at peak efficiency with key performance metrics and top queries. Accelerate Azure SQL Data Warehouse performance with visibility into workload impacts. See performance information for Microsoft APS, including data movement and details about distributed queries.
Agenda We’re going to save the storage! • SQL Server Editions • Compression Basics • Compression Examples • Case Study with SentryOne Database • Compression Checklist • Some Helpful T-SQL
SQL Server Editions • SQL Server 2008, 2012, 2014 Enterprise-Only Feature • SQL Server 2016 (SP 1+) • https://sqlperformance.com/2016/11/sql-server-2016/big-deal-sp1 This is a BIG DEAL!
SQL Server Compression • Often avoided because of performance concerns • Done carefully, performance overhead is not an issue • Data Compression: Strategy, Capacity Planning, and Best Practices • https://msdn.microsoft.com/en-us/library/dd894051.aspx
Compression Methods • None (the “default” or “storage inconsiderate”) • Row (the “kind of compressed” or “storage considerate”) • Page (the “actually compressed” or “storage activist”)
Estimating Compression What’s going on over here? Calculations w/ fragmentation, fill factor variables ROW PAGE https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-estimate-data-compression-savings-transact-sql
Row Compression • I think of it as more “smart-storage” than “data compression” • Essentially stores data types using only the bytes necessary for the value. • An int uses 4 Bytes of storage, whether it is 2 or 2,147,483,647. With row compression, 2 should use 1 Byte of storage space. • Trailing zeros, padded, and blank characters are not stored, where applicable. • Row Compression Implementation • https://msdn.microsoft.com/en-us/library/cc280576(v=sql.130).aspx • ^Table showing “Data type” | “Is storage affected?” | “Description”
Row Compression Example • AdventureWorks2014 database
Page Compression • Actual Compression (like, for real this time) • Row Compression PLUS the following • Prefix Compression • Prefix value selected for efficient storage • Stored values modified based on prefix values • Dictionary Compression • Eliminates redundant storage of repeated values
Page Compression • Page Compression Implementation • https://docs.microsoft.com/en-us/sql/relational-databases/data-compression/page-compression-implementation?view=sql-server-2017
Page Compression Example • Compression = NONE • Compression = ROW • Compression = PAGE ?
DBCC IND | PAGE • You don’t need to know anything about DBCC IND or DBCC PAGE to use Data Compression. • The following slides are informational-only. (i.e. extra credit) • They’re here for your future reference. Allows you to see how individual pieces of data are affected by compression. For the DBCC curious http://erinstellato.com/resources/dbcc/ https://www.pluralsight.com/courses/sql-server-understanding-using-dbcc-commands
DBCC IND (Database|Table|IndexID) DBCC PAGE Info: http://www.sqlskills.com/blogs/paul/inside-the-storage-engine-anatomy-of-a-page/
DBCC PAGE TRACEON so you see the results. (DatabaseName|DatabaseID|FileNumber|PageNumber|PrintOption) DBCC PAGE Info: http://www.sqlskills.com/blogs/paul/inside-the-storage-engine-anatomy-of-a-page/
DBCC PAGE | sys.fn_PhysLocCracker NONE PAGE ROW sys.fn_PhysLocCracker Info: http://www.sqlservercurry.com/2015/02/sql-server-find-physical-location-of.html
DBCC PAGE Results (Summary) https://blogs.sentryone.com/melissaconnors/visualizing-data-compression/
What About Person.Person? • I’ll look for the most Melissa page in the table. Page 3089 has 6 Melissas! CompressionInfo Raw Bytes 0000000000000000: 02000048 05210d40 00080002 0a18494e 104d656c ...H.!.@......IN.Mel 0000000000000014: 69737361 8180a23b 00000000 01010023 05dfff02 issa..¢;.......#.ßÿ. Slot 0 Column 1 Offset 0x9 Length 4 Length (physical) 2 BusinessEntityID = 7342 Slot 0 Column 2 Offset 0x0 Length 4 Length (physical) 0 PersonType = IN Slot 0 Column 3 Offset (see CD array entry) Length 1 NameStyle = 0 Slot 0 Column 4 Offset 0x0 Length 0 Length (physical) 0 Title = [NULL] Slot 0 Column 5 Offset 0x0 Length 14 Length (physical) 0 FirstName = Melissa Table Compression = PAGE Tip: Check the PowerPoint notes to see none (14) vs. row (7) vs. page (0) info for this data
Will My Data Compress Well? • You’ve seen how compression works with repeating values and prefixes • Knowing how compression works and what your data looks like will assist in deciding what to compress • Become a Data Compression Whisperer
Data Compression: A Case Study • Examples explore data from a SentryOne database • What to compress? (decisions, decisions, decision tables) • Which compression method should you use? Why not all? • If you have three indexes on a table, you could row compress one, page compress another, and leave one of them not compressed. Mix ‘n’ Match Compression
Quick Look at the DBCC PAGE… No Compression Slot 0 Column 3 Offset 0x14 Length 16 Length (physical) 16 EventSourceID = 0c260691-16c3-4bc9-8457-220bc4457749 Slot 0 Column 6 Offset 0x24 Length 8 Length (physical) 8 RemoteSequenceID = 1 Slot 0 Column 7 Offset 0x9b Length 72 Length (physical) 72 RemoteObjectID = 731dfa50-9458-48b3-9640-a3aec5b0da11 Page Compression Slot 0 Column 3 Offset 0x0 Length 16 Length (physical) 0 EventSourceID = 0c260691-16c3-4bc9-8457-220bc4457749 Slot 0 Column 6 Offset 0x12 Length 8 Length (physical) 1 RemoteSequenceID = 1 Slot 0 Column 7 Offset 0x0 Length 72 Length (physical) 0 RemoteObjectID = 731dfa50-9458-48b3-9640-a3aec5b0da11 Storage size of 96 vs. 1 with page compression!
Quick Look at the DBCC PAGE… No Compression Slot 1 Offset 0x626 Length 1343 (the entire row) PlanHash = [Binary data] Slot 1 Column 2 Offset 0x8 Length 16 Length (physical) 16 0000003922AFA62E: a37e9b29 b662bd8c cd0bf936 e3261b60 £~)¶b½Í.ù6ã&.` Page Compression Slot 1 Offset 0x629 Length 1314 (the entire row) PlanHash = [Binary data] Slot 1 Column 2 Offset 0x1c Length 16 Length (physical) 16 000000350F460078: a37e9b29 b662bd8c cd0bf936 e3261b60 £~)¶b½Í.ù6ã&.` Metadata + page compression on index with only that column uses more space than no compression.
How Much is the CPU Overhead? • You will need to test this! • You can somewhat predict this based on how the data is used: • Data is mostly read? • Lower CPU impact • Data is mostly updated? • Higher CPU impact • You will *STILL* need to test this! & if using partitioning, maybe that workload pattern varies by partition.
Scans vs. Updates • Two important queries within Data Compression: Strategy, Capacity Planning, and Best Practices • U: Percent of Update Operations on the Object • S: Percent of Scan Operations on the Object • https://msdn.microsoft.com/en-us/library/dd894051.aspx