1 / 76

Practical Space Management in Data Warehouse Environments

Practical Space Management in Data Warehouse Environments. Hamid Minoui Database Specialists, Inc. www.dbspecialists.com hminoui@dbspecialists.com. Objectives. To point out data warehouse space management issues Suggest resolutions Recommend space management methodologies

max
Download Presentation

Practical Space Management in Data Warehouse Environments

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Practical Space Management in Data Warehouse Environments Hamid Minoui Database Specialists, Inc. www.dbspecialists.com hminoui@dbspecialists.com

  2. Objectives To point out data warehouse space management issues Suggest resolutions Recommend space management methodologies Provide proactive prevention strategies Cover both Oracle 9i and Oracle 10g space management features

  3. Characteristics of a Data Warehouse • The data: • Large amount of data loads and ETL operations • Very large size (Terabytes) • Change in structure of source data • Contains lots of historical data • Data massaging and aggregations • Multiple sources of data • Dynamic nature of data

  4. Characteristics of a Data Warehouse (continued) • Maintenance activities: • Space management • Table re-organizations • Index rebuilds • Partition maintenance • Refresh maintenance on materialized views • Job and scheduling management

  5. Characteristics of a Data Warehouse (continued) • Typical issues: • Data integrity issues • Data security issues • Space issues • Query performance issues • Duplicate rows

  6. Characteristics of a Data Warehouse (continued) • Database features frequently used: • Materialized views (MV) • Bitmap indexes and bitmap-join indexes • Index organized tables (IOT) • Parallel execution • Table and index partitioning • Table and index compression • Load utilities and facilities

  7. Other Characteristics • Star schemas, snow flakes or 3rd Normal Form • Have dimensions and hierarchy • Frequent need to collect statistics • Use of bulk and parallel loads • Variety in the generated queries • Dynamic nature of queries • Divided into areas (staging, ODS, and target area) • Often associated with smaller data marts

  8. Performance Tuning and Resolutions • Frequent query tuning • Star transformation • De-normalization • Pre-aggregations via materialized views • B*Tree, IOT, function based, bitmap, bitmap-join indexes • Use of database resource management

  9. Why is Space a Coveted Resource in a Data Warehouse? • Lots of disk space is consumed • Stores all enterprise data • Segments are mostly large • Many indexes • Years of historical data kept online • Many versions of the same data • Duplicated and de-normalized data • Various levels and dimensions of data (monthly, weekly, daily)

  10. Why is Space a Coveted Resource in a Data Warehouse? • Enough reserve space needed: • For daily/weekly/monthly growth • Recall offline old data when needed • Data correction • Materialized views and their growth • Emergency needs • Data files and tablespace growth • Temporary tablespaces

  11. Reacting to Space Issues • Down sides: • Often, not enough time to react • Delay in the load • Wasted resources to reload • Up sides: • Loads are usually scheduled • Once data is loaded, most of it won’t change

  12. Issues with Database Backups in a Data Warehouse • Too many files to backup every night • Backup takes a long time to complete • System resources busy during backup • Possible licensing issues with third-party backup software • Restoring and recovery after a failure can take a long time

  13. A Typical Backup Strategy • Make non-current table spaces READONLY every month • Perform a special backup of READONLY tablespaces • Exclude the READONLY table spaces from regular hot backups • Never backup temporary tablespaces Caveat: You must wait until all transactions are committed

  14. Avoiding Unnecessary Redo Log Generation • Create some tables and all indexes with NOLOGGING for any segment that can be re-generated without doing a database recovery: • SQL*Loader with direct path load • CREATE TABLE AS SELECT from external or transient tables • INSERT using +append hint • Use global temporary tables insert /* +append */ into transiant_table select * from source_table ; create table transient_table as select * from source_table ;

  15. Speeding Up Bulk Load Operations • Before the load: • Make all non-unique indexes unusable • Disable the primary and unique constraints if the source data is trusted • Disable all triggers on the table • Set the session to skip unusable indexes

  16. Speeding Up Bulk Load Operations • Implement the load: • Use append and parallel hints with insert • Commit the transaction • After the load: • Rebuild indexes • Enable triggers and constraints

  17. Space Issues in Data Warehouses • Permanent tablespaces (data, indexes) • Temporary tablespaces (temp segments) • UNDO segments and tablespace

  18. Space Issues with Permanent Tablespaces • Caused by: • Poor extent sizing • Setting maxextents • PCT_INCREASE > 0 • Small data files (tablespaces) • User quota on tablespace

  19. Space Issues with Temporary Tablespace • Caused by: • Not enough space for the sort segments • Other temp segments such as global temporary tables • Multiple users sharing the same temporary space • Multiple queries with sort requirements running at any time

  20. Space Issues with Temporary Tablespace • Partially resolved by: • Oracle 9i - Dynamic PGA memory allocation • PGA_AGGREGATE_TARGET=<integer value> • WORKAREA_SIZE_POLICY=AUTO • Oracle 10g - Tablespace Group assignment

  21. Space Issues Associated with Undo Segments • Long running queries causing ORA-1555 (snapshot too old) • Small UNDO tablespace • Small rollback segments

  22. Database Block Size (DB_BLOCK_SIZE) • Should seriously be considered • An important decision with new data warehouse projects • Inappropriate value can be disastrous and detrimental • Small value can: • Impact I/O efficiency for majority of queries • Negatively influence overall database performance

  23. Appropriate DB_BLOCK_SIZE Value • Multiple of the OS block size • As large as your I/O subsystem can handle in a single read • As large as supported by Oracle • Best benefit from larger block size if: • Database is configured on raw devices, or • Direct I/O is available to you.

  24. Benefits of Larger DB_BLOCK_SIZE Value • Efficiency with index scan • A larger block size reduces the number of reads required to probe an index and scan a range of values from its leaf blocks • Less memory requirement for buffer cache • Fewer buffers needed for index branch blocks • Better compression ratio for tables, indexes • Improvement in block density • Amount of space used by fixed portion of bock header is reduced

  25. Benefits of Larger DB_BLOCK_SIZE Value • Blocks can accommodate longer rows; less chance for row chaining • Less occurrence of ORA-1555 • Increase in size of the transaction table in undo segments header blocks • Fewer writes required for data loads • Because of the reduced block level overhead, less redo logs are generated when blocks are modified sequentially

  26. Disks, I/O and Database Files Configuration • A poorly configured I/O subsystem can badly impact I/O performance • Poor I/O performance can impair a data warehouse • Configure disk and distribute data for read and write efficiency • Use raw I/O if possible, otherwise use direct I/O • Make use of asynchronous I/O, parallel read and parallel writes

  27. Disks, I/O and Database Files Configuration • Stripe and Mirror or Mirror and Stripe the disks • RAID-1+0 or RAID-0+1 • Evenly spread your data and Stripe And Mirror Everything (SAME) on many disks • Reserve room on file systems for auto extendable files

  28. Managing the UNDO Segments • Manual undo (rollback segments) management • Pre Oracle 9i practices • Too many manual interventions by DBA

  29. Managing UNDO (continued) • Automatic Undo Management (AUM) • Much better – Highly recommended • Allows controlling retention of committed transactions undo information (UNDO_RETENTION) • Better monitoring statistics • Infrequent occurrence of ORA-1555 • SMON periodically manage space and shrinks undo segments

  30. UNDO_RETENTION Parameter Setting • Set to a value equal to the time used by the longest running query • Undo is ‘expired’ when retention time is reached • Expired undo will be de-allocated if needed by new transactions • Unexpired undo are re-used if space is needed (undo reuse) • Default value is 300 seconds

  31. Undo Reuse and Undo Stealing • Undo Reuse:Unexpired undo of the same segment will be reused • Undo Stealing:Unexpired undo of another segment is used • Undo reuse is more common. Occurs when • UNDO tablespace is too small, or • UNDO_RETENTION value is too large

  32. Monitoring the UNDO Segments Statistics • Statistics are gathered in V$UNDOSTAT every 10 minutes • Helps sizing UNDO tablespaces and tune UNDO_RETENTION • Statistics are retained for 7 days

  33. V$UNDOSTAT

  34. V$UNDOSTAT (continued)

  35. Tuning UNDO_RETENTION • Oracle 9i: • Manually adjust to the time taken by the longest query SELECT MAX (MAXQUERYLEN) FROM V$UNDOSTAT; • Oracle 10g: • Automatically tracked and tuned by RDBMS

  36. The UNDO Tablespace • Created at DB creation or with CREATE UNDO TABLESPACE • Use V$UNDOSTAT for sizing and monitoring • Space issues if UNDO_RETENTION is too large • Use AUTOEXTEND • RETENTION_GUARANTEE clause • Sizing formula:Undo Segment Space Required (MB) = (undo_retention * undo_blcks/secs * DB_BLOCK_Size)/1024

  37. Database Fragmentation Issues • Best to reduce or eliminate fragmentation to avoid wastage and improve performance • Tablespace level (or file level) fragmentation • Segment level fragmentation • Block level fragmentation

  38. Tablespace Level Fragmentation • Bubble Fragmentation • Free block of space not large enough for another extent • Honeycomb Fragmentation • Free un-coalesced space next to each other but considered separate

  39. Segment Level Fragmentation • Space allocated to segment is not fully utilized (wasted) • Space above the high water mark (unused blocks) • Free segment blocks below the high water mark

  40. Block Level Fragmentation • Blocks are not empty but there is space within a block that is not used • Caused by: • Setting of PCTFREE and PCTUSED • Deletions • Row migrations

  41. Tablespace Planning • Use locally managed tablespaces (LMTs) with UNIFORM size extents • 64K bitmaps on file header are used to manage extents • Improves performance and significantly reduces overhead associated with updating dictionary tables (recursive SQL) • No need to use ST enqueue • No more tablespace fragmentation

  42. Tablespace Planning • Use Automatic Segment Space Management (ASSM) • Set at the tablespace level • Tablespace must be locally managed • Uses bitmap instead of freelist to manage space within segments

  43. Benefits of ASSM • No more need for FREELISTS, FREELIST GROUPS and PCTUSED • Reduces segment level and block level fragmentations • Reduces the number of buffer free waits • Adds efficiency to space usage • Provides better use of space within the blocks

  44. LMT Considerations • The bitmap is 64K • Make the size of each file a multiple of UNIFORM extent+64K • Storage parameters • Avoid setting them • If already defined on segments reorganize, or rebuild with storage parameters matching tablespace

  45. Multiple Tablespace Size Models • SAFE (methodology) • Group segments according to size (3 groups) • Use 3 tablespace model having different UNIFORM extents • Assign each group to one of the size model • Develop a naming convention

  46. Tablespaces for Different Types of Segments • Separate indexes and tables • Better manageability • Different type of usage • Reduces wastage (indexes are rebuilt often in data warehouses)

  47. Adjust Settings of PCTFREE and PCTUSED Parameters • Avoid using default values • Set according to usage • Most of the times PCTFREE=0 and PCTFREE=99 should be enough • If ASSM, no need for PCTUSED • More compact data in blocks reduces waste and improves I/O

  48. Use Index Organized Tables (IOTs) • When most of the columns are indexed • When associated tables are used • Columns are pre-sorted • Makes better use of space and improve performance • Good for certain data warehouse tables

  49. Table Compression • Introduced in Oracle 9i R2 • Improves read only operations and factors out repetitive values within a block • Replaces duplicate values in a block with a reference to a symbol table in the block • Very low CPU overhead to reconstruct the block • Significantly fewer blocks, leading to better I/O • Very flexible (not all blocks are compressed) • Associated with bulk load operations

  50. Table Compression • To compress a table use:ALTER TABLE t1 MOVE compress; • To compress a table partition use:ALTER TABLE T1 MOVE PARTTION P1 compress; • Alternative way CTAS compress:CREATE TABLE T1 compressAS SELECT * FROM T1_UNCOMPRESSED; • Table or partition not available (locked) during move • Use DBMS_REDEFINITION for online move

More Related