1 / 58

A Multi-Source Time-Variant Datawarehouse Case Study

This case study explores data warehouse techniques in Oracle 10G for a high-volume claims data warehouse with irregular data sources. Learn how to address challenges such as irregular data, continuous additions, and varied archival requirements efficiently using Oracle tools. Discover design strategies like partitioning, local indexing, and materialized views to streamline data refreshes, additions, and archiving processes. Gain insights into ETL setup, dimension variations, and index storage management for a robust and scalable Oracle data warehouse solution.

yolandaa
Download Presentation

A Multi-Source Time-Variant Datawarehouse Case Study

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. AMulti-Source Time-Variant DatawarehouseCase Study Session# 36605byArup NandaProligence, Inc. Norwalk, CT

  2. Objectives • Exploring DW Techniques in Oracle • Case Study • Oracle 10G Additions

  3. A Real Life Case • Claims Datawarehouse • Several Customers/Sources • Several Quarters • Data Volume Was High • Irregular Frequency • Data Comes Often Late • Near Real Time Requirements

  4. DB1 DB2 Cust1 Cust11 ? DB6 Cust10 DB3 Cust2 Cust9 Cust3 Cust8 Datawarehouse Cust4 Cust7 DB4 DB5 Cust5 Cust6

  5. Problem of Irregular Data Detail Table Summary Table Detail Table CUST2

  6. Problems • Incoming Data Irregular • Summary Tables Need Refreshing • Quarters Added Continuously • Archival Requirements Vary Across Customers • Quick Retrieval of Archival Needed

  7. Problems contd. • Summary on Summary Tables as Materialized Views • Need Refresh Whenever New Data Arrives • Or When Data is Purged/Reinstated • Customers Added and Deleted Frequently

  8. Objective • To Minimize Downtime for Refreshes • Incrementally Refresh • Partitioning Techniques • To Add Customers Easily • To Add Quarters Easily • To Archive Off and Purge Easily and Atomically • To Restore Archives Quickly

  9. Objective contd. • To have an ETL Setup for Easy Addition of Objects Such As Tables, Indexes, Mat Views. • Use Only Available Oracle and Unix Tools • PL/SQL • Unix Shell Scripts • SQL*Plus

  10. Design • Varying Dimensions – • Customer • Quarter • Composite Partitioning • Range (for Quarters) • List (for Customers) • Local Indexes

  11. Partitioning • Partitioned on CLAIM_DATE • RANGE • Partitioned named YyyQq • Storage Clauses Not Defined • Supartitioned on CUST_NAME • LIST • Named YyyQq_CustName, e.g. Y03Q3_CUST1

  12. Indexing • All Indexes Local CREATE INDEX IN_CLAIM_SUM_01 LOCAL ON SUMTAB1 (COL1, COL2)… • No Indexes UNIQUE and GLOBAL

  13. Storage Each Subpartition – of Index or Table is kept in separate tablespaces named in the format Y<Year>Q<Qtr>_<CustName>_DATA e.g.Y02Q2_CUST1_DATA Y02Q2_CUST2_DATA Y03Q3_CUST1_DATA

  14. Index Cust3 Y03 Q3 In Tablespace Y03Q3_CUST3_INDX Table Cust3 Y03 Q3 Customers Quarter In Tablespace Y03Q3_CUST3_DATA

  15. Tablespace create tablespace y03q3_cust1_data datafile ‘/oradata/y03q3_cust1_data_01.dbf’ size 500m autoextend on next 500m extent management local segment space management auto

  16. Table DDL CREATE TABLE TAB1 ( … ) PARTITION BY RANGE (CLAIM_DATE) SUBPARTITION BY LIST (CUST_NAME) ( PARTITION Y03Q1 VALUES LESS THAN (TO_DATE(‘2003/04/01’,’YYYY/MM/DD’)), ( SUBPARTITION Y03Q1_CUST1 VALUES (‘CUST1’) TABLESPACE Y03Q1_CUST1_DATA, SUBPARTITION Y03Q1_CUST2 VALUES (‘CUST2’) TABLESPACE Y03Q1_CUST2_DATA, … and so on for all subpartitions … SUBPARTITION Y03Q1_DEF VALUES (DEFAULT) TABLESPACE USER_DATA ), PARTITION Y03Q2 VALUES LESS THAN (TO_DATE(‘2003/07/01’,’YYYY/MM/DD’)), ( SUBPARTITION Y03Q2_CUST1 VALUES (‘CUST1’) TABLESPACE Y03Q2_CUST1_DATA, SUBPARTITION Y03Q2_CUST2 VALUES (‘CUST2’) TABLESPACE Y03Q2_CUST2_DATA, … and so on for all subpartitions … SUBPARTITION Y03Q2_DEF VALUES (DEFAULT) TABLESPACE USER_DATA ), … and so on for all the partitions … PARTITION DEF VALUES LESS THAN (MAXVALUE), ( SUBPARTITION DEF_CUST1 VALUES (‘CUST1’) TABLESPACE USER_DATA, SUBPARTITION DEF_CUST2 VALUES (‘CUST2’) TABLESPACE USER_DATA, … and so on for all subpartitions … SUBPARTITION DEF_DEF VALUES (DEFAULT) TABLESPACE USER_DATA ) )

  17. Index DDL CREATE INDEX IN_TAB1_01 ON TAB1 (COL1) LOCAL NOLOGGING ( PARTITION Y03Q1 ( SUBPARTITION Y03Q1_CUST1 TABLESPACE Y03Q1_CUST1_INDX, SUBPARTITION Y03Q1_CUST2 TABLESPACE Y03Q1_CUST2_INDX, … and so on for all subpartitions … SUBPARTITION Y03Q1_DEF TABLESPACE USER_DATA ), PARTITION Y03Q2 ( SUBPARTITION Y03Q2_CUST1 TABLESPACE Y03Q2_CUST1_INDX, SUBPARTITION Y03Q2_CUST2 TABLESPACE Y03Q2_CUST2_INDX, … and so on for all subpartitions … SUBPARTITION Y03Q2_DEF TABLESPACE USER_DATA ), … and so on for all the partitions … PARTITION DEF ( SUBPARTITION DEF_CUST1 TABLESPACE USER_DATA, SUBPARTITION DEF_CUST2 TABLESPACE USER_DATA, … and so on for all subpartitions … SUBPARTITION DEF_DEF TABLESPACE USER_DATA ) )

  18. Creating DDLs Static Part create table tab1 (………) DDL to Create Table partition y03q1 ( subpartition y03q1_cust1 tablespace …) Variable Part

  19. Constraints Constraints defined as DISABLE NOVALIDATE RELY ALTER TABLE … ADD CONSTRAINT … RELY DISABLE NOVALIDATE;

  20. Constraint • VALIDATE/NOVALIDATE • Table TAB1 (Column: STATUS) • Current Values A, I, F • Check Constraint: STATUS IN (‘A’,’I’) • ENABLE/DISABLE • New Value ‘F’ • RELY

  21. RELY Reasons • To Include Relation Information to the Metadata • To Enable Query Rewrite

  22. Summary Tab and View On Source On DW

  23. Casting SELECT CAST (CUST_NAME AS VARCHAR2(20)) AS CUST_NAME FROM <viewname> CAST (column_name AS datatype (precision))

  24. DW Index of Temporary Table cust1 Owned by Cust Schema View INDEX Filter: Where CLAIM_DATE is in that quarter Temporary Table TABLE Summary Table Massaging For Customer Cust1 and Quarter Q1 Analyzing

  25. cust Old Sub Partition View INDEX Old Sub Partition TABLE ALTER TABLE … EXCHANGE SUBPARTITION subpartname WITH TEMPTABLE INCLUDING INDEXES DW

  26. Temp Table CREATE TABLE T1_Y03Q1_CUST1 TABLESPACE Y03Q1_CUST1_DATA PARALLEL 8 NOLOGGING AS SELECT … FROM CUST1.VIEW1@DB1 WHERE CLAIM_DATE >= add_months(trunc(to_date(‘03','RR'),'YYYY'), 3*(to_number(‘1')-1)) and batch_date < last_day(add_months(trunc( to_date(‘03','RR'),'YYYY'), 3*(to_number(‘1')) - 1 )) + 1

  27. Script CREATE TABLE T1_Y&&YY.Q&&Q._&&CUST TABLESPACE Y&&YY.Q&&Q._&&CUST._DATA PARALLEL 8 NOLOGGING AS SELECT … FROM &&CUST..VIEW1@&&DBLINK WHERE CLAIM_DATE >= ADD_MONTHS(TRUNC(TO_DATE('&&YY','RR'),'YYYY'), 3*(TO_NUMBER('&&Q')-1)) AND BATCH_DATE < LAST_DAY(ADD_MONTHS(TRUNC( TO_DATE('&&YY','RR'),'YYYY'), 3*(TO_NUMBER('&&Q')) -1 )) + 1

  28. External Table Reason Source is a non-Oracle DB, e.g. DB2 Source is External, no DB Link Allowed Fixed Format –vs- Delimited Fixed Format Faster Easier More Space Delimited Less Space Slower Slightly More Complex

  29. Massaging • Removing NOT NULL Constraints • Making Datatypes Consistent • The CAST operation converts NUMBER(m,n) to NUMBER • cast(col1 as number(10,2)) as col1_m • COL1 NUMBER(5,2) • COL1_M NUMBER

  30. Analyzing • Using DBMS_STATS.GATHER_TABLE_STATS • PARALLEL Degree dbms_stats.gather_table_stats ( ownname => ‘DWOWNER', tabname => '&&TABNAME', estimate_percent => dbms_stats.auto_sample_size, method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO', degree => dbms_stats.default_degree, cascade => TRUE );

  31. Mat Views MVs Created as Tables CREATE TABLE MV_SUMMTAB1 Storage clauses just like the underlying table CREATE MATERIALIZED VIEW MV_SUMMTAB1 ON PREBUILT TABLE AS SELECT …… http://www.proligence.com/painless_alter.pdf

  32. Query Rewrite Table SUM_CLAIMS PROVIDER_ID, STATE, TYPE, TOT_AMT Table MV_SUM_CLAIMS PROVIDER_ID, STATE, SUM(TOT_AMT) TOT_AMT GROUP BY PROVIDER_ID, STATE SELECT SUM(TOT_AMT) FROM SUM_CLAIMS SELECT SUM(TOT_AMT) FROM MV_SUM_CLAIMS

  33. Query Rewrite Init.ora Parameters query_rewrite_enabled='TRUE' query_rewrite_integrity='STALE_TOLERATED‘ ENFORCED – Rewrite only if guaranteed TRUSTED – Uses only if RELY STALE_TOLERATED – Even if not RELY

  34. Checking QR dbms_mview.explain_rewrite ( ‘select cust_name, count(*) from summtab1 group by cust_name’ ); select message from rewrite_table; QSM-01033: query rewritten with materialized view, MV_SUMMTAB1 QSM-01101: rollup(s) took place on mv, MV_SUMMTAB1

  35. Design … MV_* subpartitions are on the same tablespace as the parents. Subparts of MV_SUMMTAB1_0? are in the same TS as SUMMTAB1 Subparts of MV_SUMMTAB2_0? in SUMMTAB2

  36. Quarter Customer name TableSpace1 TableSpace2 MV2 MV1 PARENT

  37. MV and Parents • Partition Pruning • Partition-wise Joins • Partition Independence

  38. Adding Quarters/Customers • Partition • Default Partition – VALUES LESS THAN (MAXVALUE) • Subpartition • Default Subpartition – VALUES (DEFAULT)

  39. Qtr1 Qtr2 Qtr3 DEF Cust1 Cust2 Cust3 DEF

  40. Qtr1 Qtr2 Qtr3 DEF Cust1 Cust2 Cust3 DEF

  41. Qtr1 Qtr2 Qtr3 DEF Cust1 Cust2 Cust3 Cust4 DEF alter table … split subpartition

  42. Qtr1 Qtr2 Qtr3 DEF Cust1 Cust2 Cust3 DEF

  43. Qtr1 Qtr2 Qtr3 Qtr4 DEF Cust1 Cust2 Cust3 DEF alter table … split partition

  44. Backup/Restore • Backup • ALTER TABLESPACE <TSName> READ ONLY • Copy the files to tape/CD. • Restore • Copy the file back into the directory • ALTER TABLESPACE <TSName> RECOVER

  45. SP1 SP2 SP3 SP4 Table SP1 SP2 SP3 Table SP1 SP2 SP3 Table Archival/Purge Table4 SP4 Table4 Table4

  46. Archival/Purge CREATE TABLE S1_Y<yy>Q<q>_<CustName> TABLESPACE Y<yy>Q<q>_<CustName>_<TSType> AS SELECT * FROM SUMMTAB1 WHERE 1=2 / CREATE INDEXES, CONSTRAINTS, etc. / ALTER TABLE SUMMTAB1 EXCHANGE SUBPARTITION Y<yy>Q<q>_<CustName> WITH TABLE Y<yy>Q<q>_<CustName> INCLUDING INDEXES /

  47. Check TTS ALTER TABLESPACE Y<yy>Q<q>_<CustName>_<TSType> READ ONLY; DBMS_TTS.TRANSPORT_SET_CHECK ( <DataTS>,<IndexTS>) ; SELECT * FROM TRANSPORT_SET_VIOLATIONS;

  48. Transport TS Export Parameter File TRANSPORT_TABLESPACE=y TTS_FULLCHECK=Y FILE=‘<FileLocation>/exp<TS>.dmp’ TABLESPACES=(<DataTS>, <IndexTS>) Copy the exp.dmp and Datafiles to tape/CD.

  49. Purge Drop Subpartition Drop the Tablespace DROP TABLESPACE <TSName> INCLUDING CONTENTS AND DATAFILES;

  50. Restore • ALTER TABLE SPLIT SUBPARTITION <DefaultSP> • Copy Datafiles & Export Dump Files from CD/Tape • Import Parameter File TRANSPORT_TABLESPACES=Y TABLESPACES=(<DataTS>,<IndexTS>) DATAFILES=(…)

More Related