420 likes | 545 Views
RMOUG Training Days 2005. Scaling to Infinity Partitioning in Oracle DW 09-February 2004 Tim Gorman SageLogix, Inc. Who am I?. “C” programmer since 1983 Databases since 1984, including BTrieve, C-ISAM, Unify, DEC VMS/RMS Oracle application developer since 1990
E N D
RMOUG Training Days 2005 Scaling to Infinity Partitioning in Oracle DW 09-February 2004 Tim Gorman SageLogix, Inc. www.SageLogix.com
Who am I? • “C” programmer since 1983 • Databases since 1984, including BTrieve, C-ISAM, Unify, DEC VMS/RMS • Oracle application developer since 1990 • RMOUG “Training Days” presenter since 1993 • Oracle DBA since 1994 • Oracle data warehouses since 1994 • RMOUG board member since 1995 With Gary Dodge (2000 and 1998) With Oak Table (2004) www.SageLogix.com
Agenda • DW “utopia” on Oracle • Riding the “virtuous cycle” • Sliding down the “death spiral” Partitioning, and the use of EXCHANGE PARTITION for ETL, is the differentiator www.SageLogix.com
Four DW characteristics • Non-volatile, time-variant, subject-oriented, integrated • Bill Inmon “Building the Data Warehouse” 3rd Ed 2002 (Wiley) • Think about what these mean? • Consider the converse of these characteristics? • Volatile? Static-image? Process-oriented? Application-specific? • Time-variant dimensional data model implies: • Insert, index, and analyze each row of data only once From an implementation perspective, this is vital to remember! • Consider an extreme situation? • Analytical database for quantum research in physics • 50 Tbytes of data to load every day www.SageLogix.com
The Virtuous Cycle Insert-only processing enables… Direct-path loads of data Partitioned tables/indexes stored in time-variant tablespaces Direct-path (a.k.a. append) loads enable… Larger volume loads with less overall impact Table compression NOLOGGING and PARALLEL operations Partitioned tables/indexes stored in time-variant tablespaces enable… EXCHANGE PARTITION during ETL READ ONLY tablespaces as data ages Performance scalability from partition pruning www.SageLogix.com
EXCHANGE PARTITION during ETL enables… Bitmap indexes and bitmap-join indices Elimination of ETL “load window” and 24x7 availability for queries Bitmap indices enable… Star transformations on “star” (dimensional) schemas READ ONLY tablespaces enable… Near-line storage (i.e. NAS, SAMFS/HFS, etc) “Right-sizing” of storage to the need, classified by IOPS Backup efficiencies READ WRITE tablespaces scheduled for backup every week READ ONLY tablespaces scheduled for backup every year The Virtuous Cycle www.SageLogix.com
The Death Spiral Volatile data presented in a static-image according to process-oriented concepts leads to… ETL using “conventional-path” INSERT, UPDATE, and DELETE operations (including MERGE and multi-table INSERT) Conventional-path operations are trouble with: Bitmap indexes and bitmap-join indexes Forcing frequent complete rebuilds until they get too big Contention in Shared Pool, Buffer Cache, global structures Mixing of queries and loads simultaneously on table and indexes Periodic rebuilds/reorgs of tables if deletions occur Full redo logging and undo transaction tracking ETL will dominate the workload in the database Queries will consist mainly of “dumps” or extracts to downstream systems Query performance will be abysmal and worsening… www.SageLogix.com
UpSert/Merge logic during large-scale ETL represents death for scalability in a large DW The Death Spiral www.SageLogix.com
Virtue is easier on everyone • Use dimensional data models for presentation of data to users • Anything goes during ETL staging, but users want simplicity and speed • Oracle mechanisms optimize dimensional data models • Star transformations using bitmap and bitmap-join indexes • Partition pruning during queries • Non-intrusive ETL processing (24x7 query operations) • Direct-path “bulk” loading without interrupting queries • Newly-loaded data “published” simultaneously to users • Conserving resources • Any UPDATE or DELETE logic can be converted to INSERT • Table compression • READ ONLY tablespaces www.SageLogix.com
Direct-path loads • Bulk loading feature first introduced in Oracle v6 in FASTLOAD utility on MVS to compete with DB2 • Incorporated into SQL*Loader DIRECT=TRUE in v7.0 • Extended to CREATE TABLE AS SELECT in v7.2 • Extended to INSERT /*+ APPEND */ in v8.0 • Enhanced in v8.1 to leave behind a direct-path log for use by MV “fast” refresh • Loads data outside of “managed space” • Load above the “high-water mark” in target table in SQL*Loader DIRECT=TRUE PARALLEL=FALSE • After successful completion, high-water mark is raised to include newly-loaded rows in the table • Load into TEMPORARY segments in all other load mechanisms • After successful completion, TEMPORARY segments are merged into the table segment www.SageLogix.com
Direct-path loads • Fast bulk load mechanism bypasses: • Buffer Cache • Though which “conventional-path” INSERTs, and all UPDATE and DELETE operations pass • Log Buffer and entire redo log generation process • If keyword NOLOGGING is utilized • Processes format blocks with newly-inserted rows in private process memory • Writes them directly to datafiles www.SageLogix.com
Direct-path loads • NOLOGGING options exist for all direct-path or APPEND operations • Available only for INSERT operations, never UPDATE or DELETE • A potential performance enhancement • If the redo logging stream is truly causing performance problems • Don’t assume that this is so, please verify! • Flip side: NOLOGGING means no recoverability • RMAN incremental backup capability can help here… www.SageLogix.com
Exchange Partition • EXCHANGE PARTITION is crucial to non-intrusive ETL • Data is transformed, cleansed, loaded, indexed, analyzed offline from “live” tables and indexes • Direct-path load operations are especially tough on “live” indexes • “In-flight” queries continue to process during and after EXCHANGE PARTITION operations • Oracle’s read-consistency mechanisms cause existing operations to use data that was exchanged away from the table, and new operations to use data exchanged into the table • Local-partitioned indexes and statistics are exchanged as well • Global-partitioned indexes are maintained during exchange operation www.SageLogix.com
Exchange Partition • EXCHANGE PARTITION can be transparent to in-flight queries • DML locks prevent exchange on objects where INSERT, UPDATE, DELETE, and SELECT … FOR UPDATE in progress • What happens to in-flight queries if standalone table “TT” is truncated or dropped immediately after the exchange completes? Queries started after EXCHANGE utilize the segment that is partition P18 after the exchange P11 P12 P13 P14 P15 P15 P17 P18 Queries that were in-flight before EXCHANGE continue to utilize the segment that was partition P18 before the exchange TT www.SageLogix.com
Exchange Partition • The basic technique of bulk-loading new data into a temporary “load table”, which is then indexed, analyzed, and then “published” all at once to end-users using the EXCHANGE PARTITION operation, should be the default load technique for all large tables in a data warehouse • fact tables • slowly-changing or quickly-changing dimensions • Assumptions for this example: • Composite partitioned fact table named TXN • Range partitioned on DATE column TXN_DATE • Hash partitioned on NUMBER column ACCT_KEY • Data to be loaded into partition P20040225 on TXN www.SageLogix.com
Exchange Partition • Create temporary table TXN_TEMP as a hash-partitioned table • Perform parallel, direct-path load of new data into TXN_TEMP • Gather CBO statistics on table TXN_TEMP • Create indexes on the temporary hash-partitioned table TXN_TEMP corresponding to the local indexes on TXN • using PARALLEL, NOLOGGING, and COMPUTE STATISTICS options • alter table TXN exchange partition P20040225 with table TXN_TEMP including indexes without validation update global indexes; • Table TXN_TEMP is left ready for next load cycle www.SageLogix.com
Exchange Partition Composite-partitioned table TXN 5. EXCHANGE PARTITION 2. Bulk Loads 3. Analyze Hash-partitioned table TXN_TEMP 4. Index Creates 22-Feb 2004 23-Feb 2004 24-Feb 2004 (empty) 25-Feb 2004 www.SageLogix.com
It is a good idea to encapsulate this logic inside PL/SQL packaged- or stored-procedures: SQL> execute exchpart.prepare(‘TXN_FACT’,’TMP_’, - 2 ’25-FEB-2004’,’27-FEB-2004’); SQL> alter session enable parallel dml; SQL> insert /*+ append nologging parallel(n,4) */ 2 into tmp_txn_fact n 3 select /*+ full(x) parallel(x,4) */ * 4 from stage_txn_fact x 5 where load_date >= ‘25-FEB-2004’ 6 and load_date < ‘28-FEB-2004’; SQL> commit; SQL> execute exchpart.finish(‘TXN_FACT’,’TMP_’); DDL for “exchpart.sql” posted at http://www.EvDBT.com/tools.htm Exchange Partition www.SageLogix.com
Exchange Partition It is wise to encapsulate this partition-exchange functionality in a PL/SQL package- or stored-procedure Along with the related functionality to: Gather CBO statistics on the table Build indexes (in the proper related tablespaces with the proper parameters) Also, the use of stored procedures to encapsulate this logic is crucial for security You do NOT want to grant anybody the ability to ALTER TABLE or CREATE TABLE You do NOT want anybody connecting as the table owner schema! Stored procedures, once created, can be granted www.SageLogix.com
Publishing Loaded Data Coordinating the final EXCHANGE PARTITION operation permits all of the newly-loaded data to appear to the end-users simultaneously Publishing data If newly-loaded data is becoming visible to users gradually Then a “load window” when new queries cannot be started becomes necessary Exchange Partition load techniques make “load windows” of restricted activity unnecessary www.SageLogix.com
Star Transformations • Why are “star” join-transformations desirable? • Typical Oracle “nested loops”, “sort/merge”, or “hash” join methods tend to start the query from a dimension table and join into the fact table using only one index • Further “filtering” is performed by joining to other dimensions • Very “sequential” and not optimal • Star transformations do the following: • Using database statistics, identifies the pattern of one large table at the center of a query involving two or more smaller tables • Resolves a result set from each of the dimension tables • Merges all of the results sets from all of the dimensions • Uses powerful BITMAP MERGE operation on fact table www.SageLogix.com
Star Transformations • Drive query from one of the dimensions • Join to the fact from that dimension • Filter on the fact by joining to other dimensions www.SageLogix.com
Star Transformations • Find result set in each dimension • Merge results from all dimensions • Join to the fact from merged result set, using BITMAP MERGE index scan www.SageLogix.com
Star Transformations • Enabling “star” join transformations in Oracle • Parameter settings: • COMPATIBLE = 8.1.0 or higher • OPTIMIZER_FEATURES_ENABLE = 8.1.0 or higher • STAR_TRANSFORMATION_ENABLED = TEMP_DISABLE • The optimizer will consider performing a cost-based query transformation on the star query but will not use temporary tables in the star transformation. • Lots of bugs associated with setting to TRUE • BITMAP_MERGE_AREA_SIZE = <huge!> • Default is 1M. Set to 16M? 32M? 128M? 512M? • HASH_JOIN_ENABLED = TRUE • Bitmap indexes • All fact table foreign-key columns must have bitmap indices • (Optional) all dimension table non-key attribute columns should have bitmap indices • ALTER TABLE … MINIMIZE RECORDS_PER_BLOCK www.SageLogix.com
Star Transformations • STAR_TRANSFORMATION_ENABLE = TEMP_DISABLE • Setting to TRUE causes star transformation to create then drop a global temporary table to store intermediate results • Buggy, nasty plan… • ALTER TABLE … MINIMIZE RECORDS_PER_BLOCK • Metalink note #103490.1 provides explanation • ALTER TABLE command scans rows in the table and calculates a value for “maximum number of rows in a block”, which is used by CREATE BITMAP INDEX operations • If MINIMIZE RECORDS_PER_BLOCK not performed, a default max number of rows value is used instead • MINIMIZE allows smaller bitmap indexes to be created www.SageLogix.com
Star Transformations • A “star transformation” is not named as such in the EXPLAIN PLAN display • Instead, it is indicated by the following operations: TABLE ACCESS BY ROWID OF (<fact-table-name>) BITMAP AND • BITMAP MERGE • (data retrieved from a dimension) • BITMAP MERGE • (data retrieved from a dimension) • BITMAP MERGE • (data retrieved from a dimension) www.SageLogix.com
Star Transformations • Bitmap-join index (BJI) is an optimization of the initial phase of a star transformation • Index itself is comprised of saved data from the initial merge structure • BJI is an index on a table using data from one or more table joins • Almost like a materialized view of one of the steps of a BITMAP MERGE operation during a star query SQL> CREATE BITMAP INDEX c_s_p_bjix1 2 ON SALES (c.region, p.category) 3 FROM SALES s, CUSTOMERS c, PRODUCTS p 4 WHERE c.cust_id = s.cust_id 5 AND p.prod_id = s.prod_id; Index created. www.SageLogix.com
Partition Pruning • Oracle offers a total of five ways to partition tables and indexes • RANGE of data values • LIST of specified data values • HASH (pseudo-random distribution of data values) • Composite RANGE-HASH • Composite RANGE-LIST • Oracle cost-based optimizer can prune partitions that will not be utilized from the query • Explicit pruning (partition/subpartition name specified in query) • Implicit pruning (partition-key columns are referenced in query) Don’t scan what you don’t need www.SageLogix.com
Partition Pruning • The decision of what columns to partition upon must be carefully considered • Question: “Which queries do we want to optimize?” • Try to choose outer RANGE partition-key column based on a frequently-queried DATE column • Bear in mind that RANGE partitioning is very important to ETL processing also… • EXCHANGE PARTITION permits a great deal of flexibility, so there is no need to choose to benefit ETL exclusively • Choose the inner HASH or LIST subpartition-key column based on any other frequently-queried column • Use HASH subpartitioning for open-ended data values • Use LIST subpartitioning for bounded static data values www.SageLogix.com
Partition Pruning Prune by RANGE partition key on TXN_DT Prune by HASH subpartition key on ACCT_ID Or, prune by both! www.SageLogix.com
Partition Pruning • Explicit partition pruning SELECT … FROM sales_fact PARTITION (sales_200403) WHERE … SELECT … FROM sales_fact SUBPARTITION (sales_200403_sp12) WHERE … www.SageLogix.com
Partition Pruning • Implicit partition pruning select … from txn_fact where posting_date between ‘03-May-2004’ and ‘05-May-2004’ and … select … from txn_fact where state = ‘NJ’ and … www.SageLogix.com
Partition Pruning • Implicit partition pruning can be disabled if there is an expression involving the partition-key column • Partition pruning is disabled in this situation: select … from sales_fact where trunc(posting_date) = ‘15-MAY-2004’ and … • Partitiong pruning is enabled in this situation: select … from sales_fact where posting_date >= ‘15-MAY-2004’ and posting_date < ‘16-MAY-2004’ and … www.SageLogix.com
Partition Pruning • EXPLAIN PLAN displays are a little confusing: PARTITION RANGE (ALL) PARTITION LIST (ALL) PARTITION HASH (ALL) • No partition pruning occurring PARTITION RANGE (ITERATOR) PARTITION LIST (ITERATOR) PARTITION HASH (ITERATOR) • Pruning involving two or more partitions No mention of partitions at all in the EXPLAIN PLAN indicates that pruning has occurred to include one and only one partition www.SageLogix.com
Table Compression • Available in Oracle9i Release 2 (v9.2.0) • Physical storage attribute for tables and materialized views [ CREATE | ALTER ] TABLE … • [ COMPRESS | NOCOMPRESS ] … • Restrictions: • Can be used for RANGE or LIST partitions • But cannot be used with HASH partitions • But cannot be used for HASH or LIST sub-partitions • Can be specified for NESTED tables • But cannot be used with any LOB construct • Such as CLOB, BLOB, BFILE, and VARRAY • Not valid for index-organized or external tables www.SageLogix.com
Table Compression • Storing repeated data values once in each block • A symbol table of distinct data values created in each block • The symbol table is stored as another table in the block • Each column in a row in a block references back to an entry in the symbol table in the block Header & Tailer ITL Table & Column Map Free Symbol table Row data www.SageLogix.com
Table Compression • Only bulk-loading INSERT operations perform compression • CREATE TABLE … AS SELECT … • INSERT /*+ APPEND */ (single-threaded and parallel) • ALTER TABLE … MOVE … • ALTER TABLE … MOVE PARTITION … • ALTER TABLE … MERGE PARTITION … • ALTER TABLE … SPLIT PARTITION … • SQL*Loader DIRECT=TRUE • Conventional INSERT operations unaffected • SELECT, UPDATE, and DELETE behavior also unaffected www.SageLogix.com
Table Compression • SELECT • Impressive performance improvements!!! • Less I/O due to fewer blocks • Compression ratio is linear with performance improvements • Better impact on FULL table scans • Indexed scans still exhibit less-impressive improvements • Conventional and direct-path INSERT • Noticeable performance slowdown (2-3x) • UPDATE • Very negative performance impact observed (4-8x) • DELETE • Some performance improvements observed www.SageLogix.com
Table Compression • Columns cannot be added, renamed, modified, or dropped on compressed tables or partitioned tables with compressed partitions • Might be fixed in 10g? • Local partitioned indexes are marked UNUSABLE during compression • Includes indexes on non-partitioned tables • Must be rebuilt • Global partitioned indexes can be maintained using UPDATE GLOBAL INDEXES • Includes non-partitioned indexes on partitioned tables • A rare situation when GLOBAL indexes can be more highly available than LOCAL indexes! www.SageLogix.com
READ ONLY tablespaces • Partitioning by a datetime value allows the time-variant nature of data to be exploited • Within the same table, different partitions can exist in different tablespaces • Different tablespaces can reside on different types of storage media • Most-expensive (i.e. SSD) • Very expensive (i.e. SAN) • Less expensive (i.e. JBOD/HDD, NAS, SAMFS, etc) • Set tablespaces to READ ONLY as soon as possible • Verify read-only nature using V$SEGMENT_STATISTICS and/or V$FILESTAT • Over time, the majority of data in any DW can be READ ONLY www.SageLogix.com
READ ONLY tablespaces • READ ONLY tablespaces can be: • Backed up less frequently (i.e. quarterly, annually, etc) than “active” READ WRITE tablespaces, with no compromise on recovery strategy • Moved from faster, more-expensive storage to cheaper, less-expensive storage • Without interrupting operations • OS-level copy of datafiles can proceed without interruption • Finalizing ALTER DATABASE RENAME FILE would actually interrupt in-flight queries • Only way to scale to infinity from a storage perspective! www.SageLogix.com
Q&A Questions? Email: tim@sagelogix.com Personal website: http://www.EvDBT.com/ Corporate website: http://www.SageLogix.com/ www.SageLogix.com