Table Compression in Oracle9i R2 INSIDE OUT Plamen Zyumbyulev „Let someone know”
Agenda • Overview Table Compression • How does it work? • Test Environment • Space Savings • Query Performance • Conclusion
Table Compression Facts • Table compression is useful • Everyone benefits from space savings • It not only saves space but can also increase performance • It cannot be applied everywhere
Why Table Compression? • Table Compression increases: • I/O-subsystem capacity • I/O throughput • query scan performance (mainly FTS) • buffer cache capacity • Table Compression: • reduces cost of ownership • is easy to use • requires minimal table definition changes • is transparent to applications
Overview: Table Compression
• The compression algorithm is based on removing data redundancy
• Tables and materialized views can be compressed
• Compression can also be specified at the partition level and at the tablespace level
• Indexes and index-organized tables are not compressed with this method (there are other methods for index and IOT compression)
• The achievable compression depends on the actual data
• DDL/DML commands are supported on compressed tables
• Columns can be neither added to nor dropped from a compressed table
Which Applications benefit from Table Compression? • Table Compression targets read-intensive applications such as Decision Support and OLAP • All schema designs benefit from compression
Agenda • Overview Table Compression • How does it work? • Test Environment • Space Savings • Query Performance • Conclusion
How does Table Compression work?
• Data is compressed by eliminating duplicate values within a database block
• If values from the same column or from different columns are equal, they share a single symbol table entry; only entire column values are compressed
• A sequence of column values that occurs in many rows is compressed as one entity
• Example (First Name / Last Name): Scott Smith, Henry Smith, Henry Scott, Henry-Scott McGryen
• The dictionary (symbol table) is built per block, so the information needed to uncompress the data is available in each block
Block Level Compression (figure)
• Non-compressed block: every row is stored in full, e.g. the rows (Invoice, CustName, CustAddr, Sales_amt): (1233033, Meyer, 11 Homestead Rd, 13.99), (1212300, Meyer, 11 Homestead Rd, 1.99), (1243012, Meyer, 11 Homestead Rd, 1.99), (9923032, McGryen, 3 Main Street, 1.99), (9833023, McGryen, 3 Main Street, 1.99), (2133056, McGryen, 3 Main Street, 1.99)
• Compressed block: the recurring value sequences "Meyer 11 Homestead Rd 1.99" and "McGryen 3 Main Street" are stored once in the block's symbol table and referenced from each row
• The space saved is returned to the block's free space
How Table Compression works
• All columns are considered for compression
• Compression is performed only where it is worthwhile
• A symbol table is created within each database block, depending on the block's content
• The self-tuning symbol table is created automatically by the system; no explicit declaration of symbol table entries is needed
• The compression algorithm automatically adapts to changes in data distribution
Which data is compressed
• Compression occurs only when data is inserted with a bulk (direct-path) insert operation:
• direct-path SQL*Loader
• insert /*+ append */ …
• create table … as select …
• alter table … move
• A table can transparently consist of both compressed and uncompressed blocks
• Any DML operation can be applied to a table storing compressed blocks; however, rows written by conventional DML are stored uncompressed
SQL Commands
• For a new table: create it with the compress attribute in the table definition
create table … compress
• For an existing table:
• Alter the table to add the compress attribute; only newly loaded rows are compressed
alter table foo compress;
• Rebuild the table so that both old and new rows are compressed
alter table foo move compress;
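Putting these commands together with the direct-path rules from the previous slide, a minimal end-to-end sketch (the table names SALES and SALES_HIST are hypothetical, not from the presentation):

  -- Create an empty compressed table; PCTFREE 0 suits compressed data (see the next slide)
  create table sales_hist pctfree 0 compress
  as select * from sales where 1 = 0;

  -- Direct-path insert: these rows are written into compressed blocks
  insert /*+ append */ into sales_hist select * from sales;
  commit;

  -- Conventional insert (no APPEND hint): these rows are stored uncompressed
  insert into sales_hist select * from sales where rownum <= 10;
  commit;

  -- Rebuild the table so that all rows, old and new, are compressed
  alter table sales_hist move compress;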
Deletes, Inserts and Updates
• Deletes, inserts and updates are possible, but modifying compressed data can cause fragmentation and waste disk space
• A large PCTFREE leads to low compression ratios; setting PCTFREE to 0 (the default for compressed tables) is recommended for all tables storing compressed data
Updates
• When a column is updated, the algorithm checks whether a symbol table entry for the new value already exists
• If it exists, the reference of the updated column is changed to point to that symbol table entry and its reference count is increased by one; at the same time, the reference count of the old value is decreased by one
• If no symbol table entry exists for the new column value, that value is inserted uncompressed into the row
Some update operations can take advantage of compression
UPDATE item SET i_color = 'green' WHERE i_color = 'blue'
• If the old column value ('blue') was also compressed and its reference count drops to zero after the update, the old symbol table entry is simply replaced with a new entry ('green') without touching all rows of the block
Deletes
• During delete operations, the reference counters of all deleted rows are decreased by one; once a reference counter reaches zero, the corresponding symbol table entry is purged
• A symbol table is never deleted from a block, even if no references into it remain, because the overhead of an empty symbol table is only 4 bytes
Agenda • Overview Table Compression • How does it work? • Test Environment • Space Savings • Query Performance • Conclusion
Test Environment:
• One very big table – 2.3 TB
• The table is partitioned by day; one partition is around 3.2 GB
• Once the data is loaded and processed, it becomes read-only
• Most of the table access is via full table scans (FTS)
Agenda • Overview Table Compression • How does it work? • Test Environment • Space Savings • Query Performance • Conclusion
Space Savings
• Table Compression significantly reduces disk and buffer cache requirements
• Compression results depend mostly on the data content at the block level
• Definitions:
CF (Compression Factor) = non-compressed blocks / compressed blocks
SS (Space Savings) = ((non-compressed blocks - compressed blocks) / non-compressed blocks) x 100
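A hypothetical worked example: a segment that occupies 1,000 blocks non-compressed and 400 blocks compressed has CF = 1000 / 400 = 2.5 and SS = ((1000 - 400) / 1000) x 100 = 60%.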
What affects Compression?
Table Characteristic         CF high   CF low
Column length                long      short
Number of distinct values    low       high
Block size                   large     small
Sorted data                  yes       no
Column sequence              yes       no
Modified data                no        yes
Estimating CF by using data samples

create function compression_ratio (tabname varchar2)
return number is
  -- sample percentage
  pct     number := 0.000099;
  -- original block count (should be less than 10k)
  blkcnt  number := 0;
  -- compressed block count
  blkcntc number;
begin
  execute immediate 'create table TEMP_UNCOMPRESSED pctfree 0 as ' ||
                    'select * from ' || tabname || ' where rownum < 1';
  while ((pct < 100) and (blkcnt < 1000)) loop
    execute immediate 'truncate table TEMP_UNCOMPRESSED';
    execute immediate 'insert into TEMP_UNCOMPRESSED select * from ' ||
                      tabname || ' sample block (' || pct || ',10)';
    execute immediate 'select count(distinct(dbms_rowid.rowid_block_number(rowid))) ' ||
                      'from TEMP_UNCOMPRESSED' into blkcnt;
    pct := pct * 10;
  end loop;
  execute immediate 'create table TEMP_COMPRESSED compress as ' ||
                    'select * from TEMP_UNCOMPRESSED';
  execute immediate 'select count(distinct(dbms_rowid.rowid_block_number(rowid))) ' ||
                    'from TEMP_COMPRESSED' into blkcntc;
  execute immediate 'drop table TEMP_COMPRESSED';
  execute immediate 'drop table TEMP_UNCOMPRESSED';
  return (blkcnt / blkcntc);
end;
/
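Because the function issues DDL through execute immediate, it cannot be called from a plain SQL query; a possible invocation from PL/SQL (the table name SALES is hypothetical):

  set serveroutput on
  declare
    cf number;
  begin
    cf := compression_ratio('SALES');
    dbms_output.put_line('Estimated compression factor: ' || round(cf, 2));
  end;
  /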
Ordered vs. Not ordered • The biggest CF increase comes from ordering the data
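Since blocks are compressed at load time, the ordering has to be applied while loading. A minimal sketch, assuming a source table SALES sorted by hypothetical columns CUST_NAME and CUST_ADDR:

  -- CTAS is a direct-path operation, so the sorted rows land in compressed blocks
  create table sales_sorted pctfree 0 compress
  as
    select *
    from   sales
    order  by cust_name, cust_addr;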
How Data volume affects CF 3.6 3.4 3.2 Compression Factor 3.0 2.8 2.6 2.4 1 3 5 7 9 Days in one partition
Ordered Data (figure)
Input: 20 values of a single column – the values 1 2 3 4 5, each occurring four times, sorted
• With 5 rows per non-compressed block: all 20 values fit into one compressed block (Block1); its symbol table contains 1 2 3 4 5, and every value is compressed within one block → CF = 4
• With 4 rows per non-compressed block: not all values fit into the first compressed block; Block1's symbol table contains 1 2 3 4 and the block holds 16 values, while Block2's symbol table contains only 5 and the block holds 4 values → CF = 2.5
• Sorting can also improve the clustering factor of your indexes
Not Ordered Data (figure)
Input: the same 20 values in unsorted order – the sequence 2 3 4 5 1 repeated
• With 5 rows per non-compressed block: all 20 values still fit into one compressed block (Block1) with symbol table 1 2 3 4 5 → CF = 4
• With 4 rows per non-compressed block: each block receives four distinct values, so no value repeats within a block and nothing can be compressed; five blocks (Block1–Block5) are needed → CF = 1
Choosing the columns to order by
• Sorting on columns with very low cardinality does not necessarily yield better compression
• The optimal columns to sort on seem to be those with a table/partition-wide cardinality equal to the number of rows per block
• Column correlation should be considered
• The process is iterative
Know your data
• Without a detailed understanding of the data distribution it is very difficult to predict the optimal sort order
• Table/partition statistics are useful: dba_tables, dba_tab_partitions
• Looking into a particular data block is very helpful: substr(rowid, 1, 15) identifies the block a row is stored in (see the sketch below)
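A sketch of both ideas, assuming a table SALES with a column CUST_NAME (both hypothetical); num_distinct comes from optimizer statistics, and the first 15 characters of an extended rowid identify the block a row is stored in:

  -- Per-column cardinality, to compare against the rows-per-block target
  select column_name, num_distinct
  from   dba_tab_columns
  where  table_name = 'SALES'
  order  by num_distinct;

  -- Rows and distinct values per data block
  select substr(rowid, 1, 15)      as block_id,
         count(*)                  as rows_in_block,
         count(distinct cust_name) as distinct_names
  from   sales
  group  by substr(rowid, 1, 15);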
Improving ordering speed
• Set SORT_AREA_SIZE for the session as big as possible
• Use a dedicated temporary tablespace with a big extent size (a multiple of SORT_AREA_SIZE + 1 block)
• If the sort needs more space:
• The data is split into smaller sort runs; each piece is sorted individually
• The server process writes the pieces to temporary segments on disk; these segments hold intermediate sort run data while the server works on another sort run
• The sorted pieces are merged to produce the final result
• If SORT_AREA_SIZE is not large enough to merge all the runs at once, subsets of the runs are merged in a number of merge passes
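A possible session setup before running the ordered, compressed CTAS from the earlier sketch (the 100 MB value is illustrative only, not a recommendation; SORT_AREA_SIZE is specified in bytes):

  alter session set sort_area_size = 104857600;           -- 100 MB in-memory sort area
  alter session set sort_area_retained_size = 104857600;  -- keep it after the sort completes
  -- then run: create table sales_sorted pctfree 0 compress as
  --           select * from sales order by cust_name, cust_addr;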
Agenda • Overview Table Compression • How does it work? • Test Environment • Space Savings • Query Performance • Conclusion
How CF affects FTS performance
• Queries were executed against a compressed and a non-compressed schema
• Overall query speedup: 65%
Query Elapsed Time Speedup
• The larger the compression factor, the larger the elapsed time speedup
• The speedup results from the reduction in required I/O operations
• The weaker the I/O subsystem, the larger the speedup
• The speedup also depends on how sparse the blocks accessed by the query are
Performance impact on loads and DML
• On a system with unlimited I/O bandwidth, a data load may take twice as long (even more if the data needs to be ordered); however, bulk loads are I/O-bound on many systems
• Deleting compressed data is about 10% faster
• Inserting new data is as fast as inserting into a non-compressed table
• UPDATE operations are on average 10-20% slower for compressed tables, mainly because some complex optimizations implemented for uncompressed tables have not yet been implemented for compressed tables
Other Performance Tests (chart: parallel load performance, CPU utilization)
Delete/Update Performance (charts: CPU utilization during delete operations and during update operations)
FTS Performance (charts: CPU utilization and I/O performance during parallel full table scans)
Agenda • Overview Table Compression • How does it work? • Test Environment • Space Savings • Query Performance • Conclusion
Best Practices
• Use compression in read-intensive applications
• Load with bulk operations (SQL*Loader direct path, parallel insert) so rows are compressed
• Compress older data in large Data Warehouses
• Integrate Table Compression into the 'rolling window' paradigm: compress all but the most recent partition (see the sketch below)
• Compress materialized views
• Only compress infrequently updated tables
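A minimal rolling-window sketch, assuming a range-partitioned table SALES with one partition per month and a local index SALES_CUST_IDX (all names hypothetical):

  -- Add the new, uncompressed partition that will receive the current loads
  alter table sales add partition p_2002_09
    values less than (to_date('2002-10-01', 'yyyy-mm-dd'));

  -- Once the previous partition becomes read-only, rebuild it compressed
  alter table sales move partition p_2002_08 compress;

  -- Moving a partition marks its local index partitions unusable; rebuild them
  alter index sales_cust_idx rebuild partition p_2002_08;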
Data normalization and Table Compression
• “Normalize till it hurts, denormalize till it works”
• High normalization may result in a high number of table joins (bad for performance)
• Both data normalization and table compression reduce redundancy
Conclusion • Table Compression: • reduces costs by shrinking the database footprint on disk • is transparent to applications • often improves query performance due to reduced disk I/O • increases buffer cache efficiency
QUESTIONS & ANSWERS
zyumbyulev@mobiltel.bg