590 likes | 732 Views
“You Can Do It” Datawarehouse: Beginner to Advanced In Two Hours. by Arup Nanda Manager – Database Systems Starwood Hotels & Resorts International White Plains, NY. Objectives. Exploring DW Techniques in Oracle Case Study Oracle 10 G Additions. DB1. DB2. Cust1. Cust11. ?. DB6.
E N D
“You Can Do It”Datawarehouse:Beginner to AdvancedIn Two Hours byArup NandaManager – Database Systems Starwood Hotels & Resorts International White Plains, NY
Objectives • Exploring DW Techniques in Oracle • Case Study • Oracle 10G Additions
DB1 DB2 Cust1 Cust11 ? DB6 Cust10 DB3 Cust2 Cust9 Cust3 Cust8 Datawarehouse Cust4 Cust7 DB4 DB5 Cust5 Cust6
A Real Life Case • Claims Datawarehouse • Several Customers/Sources • Several Quarters • Data Volume Was High • Irregular Frequency • Data Comes Often Late • Near Real Time Requirements
Problem of Irregular Data Detail Table Summary Table Detail Table DBMS_MVIEW.REFRESH (…) CUST2
Problems • Incoming Data Irregular • Summary Tables Need Refreshing • Quarters Added Continuously • Archival Requirements Vary Across Customers • Quick Retrieval of Archival Needed
Problems contd. • Summary on Summary Tables as Materialized Views • Need Refresh Whenever New Data Arrives • Or When Data is Purged/Reinstated • Customers Added and Deleted Frequently
Objective • To Minimize Downtime for Refreshes • Incrementally Refresh • Partitioning Techniques • To Add Customers Easily • To Add Quarters Easily • To Archive Off and Purge Easily and Atomically • To Restore Archives Quickly
Objective contd. • To have an ETL Setup for Easy Addition of Objects Such As Tables, Indexes, Mat Views. • Use Only Available Oracle and Unix Tools • PL/SQL • Unix Shell Scripts • SQL*Plus
Design • Varying Dimensions – • Customer • Quarter • Composite Partitioning • Range (for Quarters) • List (for Customers) • Local Indexes
Partitioning • Partitioned on CLAIM_DATE • RANGE • Partitioned named YyyQq • Storage Clauses Not Defined • Supartitioned on CUST_NAME • LIST • Named YyyQq_CustName, e.g. Y03Q3_CUST1
Indexing • All Indexes Local CREATE INDEX IN_CLAIM_SUM_01 LOCAL ON SUMTAB1 (COL1, COL2)… • No Indexes UNIQUE and GLOBAL
Storage Each Subpartition – of Index or Table is kept in separate tablespaces named in the format Y<Year>Q<Qtr>_<CustName>_DATA e.g.Y02Q2_CUST1_DATA Y02Q2_CUST2_DATA Y03Q3_CUST1_DATA
Index Cust3 Y03 Q3 In Tablespace Y03Q3_CUST3_INDX Table Cust3 Y03 Q3 Customers Quarter In Tablespace Y03Q3_CUST3_DATA
Tablespace create tablespace y03q3_cust1_data datafile ‘/oradata/y03q3_cust1_data_01.dbf’ size 500m autoextend on next 500m extent management local segment space management auto
Table DDL CREATE TABLE TAB1 ( … ) PARTITION BY RANGE (CLAIM_DATE) SUBPARTITION BY LIST (CUST_NAME) ( PARTITION Y03Q1 VALUES LESS THAN (TO_DATE(‘2003/04/01’,’YYYY/MM/DD’)), ( SUBPARTITION Y03Q1_CUST1 VALUES (‘CUST1’) TABLESPACE Y03Q1_CUST1_DATA, SUBPARTITION Y03Q1_CUST2 VALUES (‘CUST2’) TABLESPACE Y03Q1_CUST2_DATA, … and so on for all subpartitions … SUBPARTITION Y03Q1_DEF VALUES (DEFAULT) TABLESPACE USER_DATA ), PARTITION Y03Q2 VALUES LESS THAN (TO_DATE(‘2003/07/01’,’YYYY/MM/DD’)), ( SUBPARTITION Y03Q2_CUST1 VALUES (‘CUST1’) TABLESPACE Y03Q2_CUST1_DATA, SUBPARTITION Y03Q2_CUST2 VALUES (‘CUST2’) TABLESPACE Y03Q2_CUST2_DATA, … and so on for all subpartitions … SUBPARTITION Y03Q2_DEF VALUES (DEFAULT) TABLESPACE USER_DATA ), … and so on for all the partitions … PARTITION DEF VALUES LESS THAN (MAXVALUE), ( SUBPARTITION DEF_CUST1 VALUES (‘CUST1’) TABLESPACE USER_DATA, SUBPARTITION DEF_CUST2 VALUES (‘CUST2’) TABLESPACE USER_DATA, … and so on for all subpartitions … SUBPARTITION DEF_DEF VALUES (DEFAULT) TABLESPACE USER_DATA ) )
Index DDL CREATE INDEX IN_TAB1_01 ON TAB1 (COL1) LOCAL NOLOGGING ( PARTITION Y03Q1 ( SUBPARTITION Y03Q1_CUST1 TABLESPACE Y03Q1_CUST1_INDX, SUBPARTITION Y03Q1_CUST2 TABLESPACE Y03Q1_CUST2_INDX, … and so on for all subpartitions … SUBPARTITION Y03Q1_DEF TABLESPACE USER_DATA ), PARTITION Y03Q2 ( SUBPARTITION Y03Q2_CUST1 TABLESPACE Y03Q2_CUST1_INDX, SUBPARTITION Y03Q2_CUST2 TABLESPACE Y03Q2_CUST2_INDX, … and so on for all subpartitions … SUBPARTITION Y03Q2_DEF TABLESPACE USER_DATA ), … and so on for all the partitions … PARTITION DEF ( SUBPARTITION DEF_CUST1 TABLESPACE USER_DATA, SUBPARTITION DEF_CUST2 TABLESPACE USER_DATA, … and so on for all subpartitions … SUBPARTITION DEF_DEF TABLESPACE USER_DATA ) )
Creating DDLs Static Part create table tab1 (………) DDL to Create Table partition y03q1 ( subpartition y03q1_cust1 tablespace …) Variable Part
Constraints Constraints defined as DISABLE NOVALIDATE RELY ALTER TABLE … ADD CONSTRAINT … RELY DISABLE NOVALIDATE;
Constraint • VALIDATE/NOVALIDATE • Table TAB1 (Column: STATUS) • Current Values A, I, F • Check Constraint: STATUS IN (‘A’,’I’) • ENABLE/DISABLE • New Value ‘F’ • RELY
RELY Reasons • To Include Relation Information to the Metadata • To Enable Query Rewrite
Summary Tab and View On Source On DW
Casting SELECT CAST (CUST_NAME AS VARCHAR2(20)) AS CUST_NAME FROM <viewname> CAST (column_name AS datatype (precision))
DW Index of Temporary Table cust1 Owned by Cust Schema View INDEX Filter: Where CLAIM_DATE is in that quarter Temporary Table TABLE Summary Table Massaging For Customer Cust1 and Quarter Q1 Analyzing
cust Old Sub Partition View INDEX Old Sub Partition TABLE ALTER TABLE … EXCHANGE SUBPARTITION subpartname WITH TEMPTABLE INCLUDING INDEXES DW
Technique • Not Using DBMS_MVIEW.REFRESH • MV is always STALE
Temp Table CREATE TABLE T1_Y03Q1_CUST1 TABLESPACE Y03Q1_CUST1_DATA PARALLEL 8 NOLOGGING AS SELECT … FROM CUST1.VIEW1@DB1 WHERE CLAIM_DATE >= add_months(trunc(to_date(‘03','RR'),'YYYY'), 3*(to_number(‘1')-1)) and batch_date < last_day(add_months(trunc( to_date(‘03','RR'),'YYYY'), 3*(to_number(‘1')) - 1 )) + 1
Script CREATE TABLE T1_Y&&YY.Q&&Q._&&CUST TABLESPACE Y&&YY.Q&&Q._&&CUST._DATA PARALLEL 8 NOLOGGING AS SELECT … FROM &&CUST..VIEW1@&&DBLINK WHERE CLAIM_DATE >= ADD_MONTHS(TRUNC(TO_DATE('&&YY','RR'),'YYYY'), 3*(TO_NUMBER('&&Q')-1)) AND BATCH_DATE < LAST_DAY(ADD_MONTHS(TRUNC( TO_DATE('&&YY','RR'),'YYYY'), 3*(TO_NUMBER('&&Q')) -1 )) + 1
External Table Reason Source is a non-Oracle DB, e.g. DB2 Source is External, no DB Link Allowed Fixed Format –vs- Delimited Fixed Format Faster, Easier More Space Delimited Less Space Slower, Slightly More Complex
Massaging • Removing NOT NULL Constraints • Making Datatypes Consistent • The CAST operation converts NUMBER(m,n) to NUMBER • cast(col1 as number(10,2)) as col1_m • COL1 NUMBER(5,2) • COL1_M NUMBER
Analyzing • Using DBMS_STATS.GATHER_TABLE_STATS • PARALLEL Degree dbms_stats.gather_table_stats ( ownname => ‘DWOWNER', tabname => '&&TABNAME', estimate_percent => dbms_stats.auto_sample_size, method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO', degree => dbms_stats.default_degree, cascade => TRUE );
Mat Views MVs Created as Tables CREATE TABLE MV_SUMMTAB1 Storage clauses just like the underlying table CREATE MATERIALIZED VIEW MV_SUMMTAB1 ON PREBUILT TABLE AS SELECT …… http://www.proligence.com/painless_alter.pdf
Query Rewrite Table SUM_CLAIMS PROVIDER_ID, STATE, TYPE, TOT_AMT Table MV_SUM_CLAIMS PROVIDER_ID, STATE, SUM(TOT_AMT) TOT_AMT GROUP BY PROVIDER_ID, STATE SELECT SUM(TOT_AMT) FROM SUM_CLAIMS SELECT SUM(TOT_AMT) FROM MV_SUM_CLAIMS
Query Rewrite Init.ora Parameters query_rewrite_enabled='TRUE' query_rewrite_integrity='STALE_TOLERATED‘ ENFORCED – Rewrite only if guaranteed TRUSTED – Uses only if RELY STALE_TOLERATED – Even if not RELY
Checking QR dbms_mview.explain_rewrite ( ‘select cust_name, count(*) from summtab1 group by cust_name’ ); select message from rewrite_table; QSM-01033: query rewritten with materialized view, MV_SUMMTAB1 QSM-01101: rollup(s) took place on mv, MV_SUMMTAB1
Design … MV_* subpartitions are on the same tablespace as the parents. Subparts of MV_SUMMTAB1_0? are in the same TS as SUMMTAB1 Subparts of MV_SUMMTAB2_0? in SUMMTAB2
Quarter Customer name TableSpace1 TableSpace2 MV2 MV1 PARENT
MV and Parents • Partition Pruning • Partition-wise Joins • Partition Independence
Adding Quarters/Customers • Partition • Default Partition – VALUES LESS THAN (MAXVALUE) • Subpartition • Default Subpartition – VALUES (DEFAULT)
Qtr1 Qtr2 Qtr3 DEF Cust1 Cust2 Cust3 DEF
Qtr1 Qtr2 Qtr3 DEF Cust1 Cust2 Cust3 DEF
Qtr1 Qtr2 Qtr3 DEF Cust1 Cust2 Cust3 Cust4 DEF alter table … split subpartition
Qtr1 Qtr2 Qtr3 DEF Cust1 Cust2 Cust3 DEF
Qtr1 Qtr2 Qtr3 Qtr4 DEF Cust1 Cust2 Cust3 DEF alter table … split partition
Backup/Restore • Backup • ALTER TABLESPACE <TSName> READ ONLY • Copy the files to tape/CD. • Restore • Copy the file back into the directory • ALTER TABLESPACE <TSName> RECOVER
SP1 SP2 SP3 SP4 Table SP1 SP2 SP3 Table SP1 SP2 SP3 Table Archival/Purge Table4 SP4 Table4 Table4
Archival/Purge CREATE TABLE S1_Y<yy>Q<q>_<CustName> TABLESPACE Y<yy>Q<q>_<CustName>_<TSType> AS SELECT * FROM SUMMTAB1 WHERE 1=2 / CREATE INDEXES, CONSTRAINTS, etc. / ALTER TABLE SUMMTAB1 EXCHANGE SUBPARTITION Y<yy>Q<q>_<CustName> WITH TABLE Y<yy>Q<q>_<CustName> INCLUDING INDEXES /
Check TTS ALTER TABLESPACE Y<yy>Q<q>_<CustName>_<TSType> READ ONLY; DBMS_TTS.TRANSPORT_SET_CHECK ( <DataTS>,<IndexTS>) ; SELECT * FROM TRANSPORT_SET_VIOLATIONS;
Transport TS Export Parameter File TRANSPORT_TABLESPACE=y TTS_FULLCHECK=Y FILE=‘<FileLocation>/exp<TS>.dmp’ TABLESPACES=(<DataTS>, <IndexTS>) Copy the exp.dmp and Datafiles to tape/CD.
Purge Drop Subpartition Drop the Tablespace DROP TABLESPACE <TSName> INCLUDING CONTENTS AND DATAFILES;