650 likes | 805 Views
More Relational Dimensional Modeling. Objectives. After completing this lesson, you should be able to: Explain how OWB handles incremental data refreshes Detect and process errors when loading dimensional data into hierarchies (orphan management) Use cube-organized materialized views
E N D
Objectives • After completing this lesson, you should be able to: • Explain how OWB handles incremental data refreshes • Detect and process errors when loading dimensional data into hierarchies (orphan management) • Use cube-organized materialized views • Explain how OWB handles slowly changing dimensions • Define a Type 2 slowly changing dimension
Lesson Agenda • Managing incremental data refreshes • Detecting and processing errors when loading dimensional data into hierarchies (orphan management) • Using cube-organized materialized views • Managing slowly changing dimensions
Initial Versus Incremental Loads • Initial loads: • Trial sample • Usability sample • Historical data load • Incremental loads: • Facts • Dimensions
Two Categories of DW Updating: Data and Metadata Updating metadata structures Refreshing the fact data PRODUCTSdimension CHANNELSdimension Add new data Purge old data SALESfact Our focus in this lesson CUSTOMERSdimension TIMESdimension PROMOTIONSdimension
Preserving History WhileUpdating Fact Data Fact table Removing oldestmonth’s data Maintainingthree years ofhistoric data Appending newmonth’s data toend of table
Two Ways to Refresh the Fact Table • Loading through a direct connection to the operational OLTP system—this method: • Has availability issues • Involves a time-stamp comparison operation to differentiate new data • Loading offline from a flat file containingpre-extracted new data only
Capturing Changed Data for Refresh • Capture new fact data. • Capture changed dimension data. • Determine the method for the capture of each. • Methods: • Wholesale data replacement • Comparison of database instances • Time stamping • Database triggers • Database log • Hybrid techniques
Operational databases T1 T2 T3 Wholesale Data Replacement • Expensive, if a large reload • Limited historical data, if most old data is dropped • Time period replacement
Comparison of Database Instances • Simple to perform, but expensive in timeand processing • Delta file: • Changes to operational data since last refresh • Used by various techniques Yesterday’s operational database Database comparison Today’s operational database Delta file holds changed data.
Time and Date Stamping • Fast scanning for records that has been changed since last extraction • Date Updated field • No detection of deleted data Operational data Delta file holds changed data.
Database Triggers • Changed data intersected at the server level • Extra I/O required • Maintenance overhead Operationalserver(DBMS) Operational data Delta file holds changed data. Trigger Trigger Trigger Triggers on server
Using a Database Log • Contains before and after images • Requires system checkpoint • Common technique Operationalserver(DBMS) Operational data Log analysisanddata extraction Log Delta file holds changed data.
Verdict • Consider each method on merit. • Consider a hybrid approach if one approach is not suitable. • Consider current technical, existing operational, and current application issues.
Applying the Changes to Data • You have a choice of techniques: • Overwrite a record • Add a record • Add a field • Maintain history • Add version numbers
Overwriting a Record • As taught earlier, this is referred to as a type 1 slowly changing dimension. • Implementation is easy. • History is lost. • This technique is not recommended. Customer ID John Doe Single ..................................................................., ...............................................................,.... Customer ID John Doe Married ...................................................................... ......................................................................
Adding a New Record • As taught earlier, this is an example of a type 2 slowly changing dimension. • History is preserved; dimensions grow. • Time constraints are required. • A generalized key is created. • Metadata tracks the use of keys. Eff_from Eff_to Before 1 Customer ID John Doe Single 1-Feb-41 After Eff_from Eff_to 1 Customer ID John Doe Single 1-Feb-41 31-Dec-95 42 Customer ID John Doe Married 1-Jan-96
Adding a Current Field • As taught earlier, this is an example of a type 3 slowly changing dimension. • Some history is maintained. • Intermediate values are lost. • This method is enhanced by adding an Effective Date field. Customer ID John Doe Single Customer ID John Doe Single Married 01-JAN-96
Maintaining History • History tables: • Can maintain a one-to-many relationship between the tables • Always retain the current record • Enable reference to record history consistently HIST_CUST CHANNELS CUSTOMERS SALES PRODUCTS TIMES PROMOTIONS
History Preserved • History enables realistic analysis. • History retains context of data. • History provides for realistic historical analysis. • Model must be able to: • Reflect business changes • Maintain context between fact and dimension data • Retain sufficient data to relate old to new
Dimensions and Cubes Automatically Handle Update Via MERGE • The good news: Dimension and cube operators handle data update operations automatically! • They are set up for doing an update/insert (MERGE) operation by default. • The other news: Incremental update of target relational tables (not associated with dimensions and cubes) requires manipulation of their loading type. • Now we will examine a situation for incremental update of a relational table. • More good news: The useful advice in the following slides on incremental update of relational tables is not well documented elsewhere!
The source and the target have the same primary keys: The source and the target have different primary keys: The target uses a sequence-generated synthetic key: Three Refresh Scenarios for Refreshing Target Tables VENDORPK: Acct_Num SUPPLIERPK: Acct_Num RETIRED EMPLOYEESPK: Emp_ID Emp_Num Emp_Name EMPLOYEESPK: Emp_Num STG_CUSTOMERSPK: Customer_Src_ID PEOPLEPK: Person_WH_IDUK: Person_SRC_ID Effective_From_Date
Target Uses a Sequence-Generated Key • How can you match when the target uses a sequence-generated key? • Match on the unique natural key (Person_SRC_ID, Effective_To_Date). Sequence values unknown STG_CUSTOMERSPK: Customer_SRC_ID PEOPLEPK: Person_WH_IDUK: Person_SRC_ID Effective_From_Date
Setting Attribute Propertiesfor Synthetic Keys The synthetic key with sequence generator has an unknown value. Therefore, specify the natural key for update matching, rather than this synthetic primary key.
Setting Loading Properties • The following attribute settings are useful whenever a surrogate/ synthetic key (for instance, a sequence operator) is used on a target in a map: • Load column when inserting row • Load column when updating row • Match column when updating row • Update: Operation (eight available target conditions are shown on the next slide) • Match column when deleting row
If source value = 5 and target value = 10, then each condition results in the following target values: • Condition • Meaning • Target Value • = • Target = source • Target = 5 • += • Target = source + target • Target = 15 • -= • Target = target - source • Target = 5 • =- • Target = source – target • Target = negative 5 • =|| • Target = target || source • Target = 105 • ||= • Target = source || target • Target = 510 • *= • Target = target * source • Target = 50 • /= • Target = target / source • Target = 2 Update Operation Conditions
Choosing the DML Load Type • INSERT (the default) • UPDATE • INSERT/UPDATE • UPDATE/INSERT • DELETE
Choosing the DML Load Type • NONE • TRUNCATE/INSERT • DELETE/INSERT • CHECK/INSERT
CDC Template Mappings: Another Method for Updating Changed Data 1. Choose a Change Data Capture (CDC) mechanism. • Trigger-based (Oracle, IBM, Microsoft) • Log-based (Oracle and IBM) 2. Choose the table upon which to perform CDC. 3. Start the capture process. 4. Define the subscribers to receive the changed data. 5. Define mappings to consume the changes.
Quiz • Which of the following statements are false? • There are two categories of data warehouse and data mart update tasks: changes to the data inside the dimension and changes to the metadata of the dimension. • The synthetic key with sequence generator has an unknown value; therefore, use this synthetic primary key for update matching, rather than the natural key. • Type 1 SCD is easy to implement, just overwriting a record with changes; however, history is lost.
Lesson Agenda • Managing incremental data refreshes • Detecting and processing errors when loading dimensional data into hierarchies (orphan management) • Using cube-organized materialized views • Managing slowly changing dimensions
The Challenge of Managing Orphans Three items with null parent key values Three items with invalid parent key values
Specify rules for orphan values within a dimension and between the cube and the dimensions. Specify rules to apply during the loading and removal of dimensional data. Specify different actions for records with null parent and invalid parent values. How OWB Manages Orphans Now play the viewleton managing orphans!
Lesson Agenda • How OWB handles incremental data refreshes • Detecting and processing errors when loading dimensional data into hierarchies (orphan management) • Using cube-organized materialized views • Managing slowly changing dimensions
ROLAP Implementation of Dimensional Objects • ROLAP implementation of dimensional objects can be classified as follows: • ROLAP implementation • The dimensional object and its data are stored in a relational form in the database, and the CWM2 metadata for the dimensional object is stored in the OLAP catalog. This enables you to query the dimensional object from OLAP tools. • ROLAP with Cube MVs implementation • The dimensional object and its data are stored in a relational form in the database. Additionally, cube-organized materialized views are created in an analytic workspace.
ROLAP Schematable Synchronize to repository object,data stored inrelational table Implementingtable Dimension Bind ROLAP w/MVs AW • Cube-organizedmaterialized view • Relational fact table • Summaries stored in AW Support for Cube-Organized Materialized Views • OWB 11g Release 2 supports OLAP cube storage in cube-organized materialized views. • It provides out-of-the-box summary management capabilities for facts stored in a relational data warehouse, and summarized in cube-organized materialized views.
Configuring the Cube • We see that: • Enable MV Refresh is true • Query Rewrite is enabled • Constraints are trusted • Refresh Mode is force • Refresh On is Demand Demand means that after the mapping has been executed to load the fact table, a manual refresh must be performed.
ROLAP with cube MVs can be used if: PL/SQL Generation Mode is default and Location is Oracle 11g R1 or higher. Or, PL/SQL Generation Mode is Oracle 11g R1 or higher A relational fact table with cube-organized materialized views for summary management Storage Type: ROLAP with CUBE MVs
Using Compressed Cube Technology with Sparse Dimensions The dimensions have been tagged as sparse. Cube compressionhas been specified for the sparsedimensions. • The partition cube check box applies to MOLAP and ROLAP with MV cubes. • This is Analytic Workspace–related partitioning. • This cube is partitionedby calendar year.
Use SQL Developer to Test Queries with and Without Query Rewrite
Execution Plan Without Query Rewrite In this query without query rewrite, notice that the cost is 4174, and that a full table scan took place.
Execution Plan with Query Rewrite In this query with query rewrite, notice that the cost is reduced from 4174 to 121, and that the summaries are rewritten to the cube MVs.
Lesson Agenda • How OWB handles incremental data refreshes • Detecting and processing errors when loading dimensional data into hierarchies (orphan management) • Using cube-organized materialized views • Managing slowly changing dimensions
What Is a Slowly Changing Dimension? A slowly changing dimension (SCD) is a dimension that stores and manages both current and historical data over time in a data warehouse.
Types of Slowly Changing Dimensions • There are three types of slowly changing dimensions. • Type 1 overwrites old values. • Type 2 creates another dimension record. • Type 3 creates a current value field.
Type 1 SCD: Does Not Store History Type 1 overwrites old values. Old record ID Customer ID Customer Name Marital Status 3 Steve 1125 Single New record ID Customer ID Customer Name Marital Status 3 Steve Married 1125