150 likes | 334 Views
DW Lab # 4 Overview of Extraction, Transformation, and Loading. By Lecturer/ Aisha Dawood. LAB EXERCISE #4 Oracle Data Warehousing. Transformation Flow
E N D
DW Lab # 4Overview of Extraction, Transformation, and Loading By Lecturer/ Aisha Dawood
LAB EXERCISE #4 Oracle Data Warehousing Transformation Flow From an architectural perspective, you can transform your data in two ways: ■ Multistage Data Transformation ■ Pipelined Data Transformation
LAB EXERCISE #4 Oracle Data Warehousing Multistage Data Transformation The data transformation logic for most data warehouses consists of multiple steps. For example, in transforming new records to be inserted into a sales table, there may be separate logical transformation steps to validate each dimension key.
LAB EXERCISE #4 Oracle Data Warehousing Pipelined Data Transformation
LAB EXERCISE #4 Oracle Data Warehousing Loading Mechanisms You can use the following mechanisms for loading a data warehouse: ■ Loading a Data Warehouse with SQL*Loader ■ Loading a Data Warehouse with External Tables ■ Loading a Data Warehouse with OCI and Direct-Path APIs ■ Loading a Data Warehouse with Export/Import
LAB EXERCISE #4 Oracle Data Warehousing ■ Loading a Data Warehouse with SQL*Loader
LAB EXERCISE #4 Oracle Data Warehousing Transformation Mechanisms You have the following choices for transforming data inside the database: ■ Transforming Data Using SQL ■ Transforming Data Using PL/SQL ■ Transforming Data Using Table Functions Transforming Data Using SQL Once data is loaded into the database, data transformations can be executed using SQL operations. There are four basic techniques for implementing SQL data transformations: ■ CREATE TABLE ... AS SELECT And INSERT /*+APPEND*/ AS SELECT (Data substitution) ■ Transforming Data Using UPDATE (Data substitution) ■ Transforming Data Using MERGE ■ Transforming Data Using Multitable INSERT
LAB EXERCISE #4 Oracle Data Warehousing CREATE TABLE ... AS SELECT And INSERT /*+APPEND*/ AS SELECT The CREATE TABLE ... AS SELECT statement (CTAS) is a powerful tool for efficiently executing a SQL query and storing the results of that query in a new database table. The INSERT /*+APPEND*/ ... AS SELECT statement offers the same capabilities with existing database tables. The following SQL statement inserts data from sales_activity_direct into the sales table of the sample schema, using a SQL function to truncate the sales date values to the midnight time and assigning a fixed channel ID of 3. INSERT /*+ APPEND NOLOGGING PARALLEL */ INTO sales SELECT product_id, customer_id, TRUNC(sales_date), 3, promotion_id, quantity, amount FROM sales_activity_direct; Note: receiving data from multiple source systems for your data warehouse.
LAB EXERCISE #4 Oracle Data Warehousing Transforming Data Using UPDATE Another technique for implementing a data substitution is to use an UPDATE statement to modify the sales.channel_idcolumn. An UPDATE will provide the correct result.
LAB EXERCISE #4 Oracle Data Warehousing Transforming Data Using MERGE Oracle Database's merge functionality extends SQL, by introducing the SQL keyword MERGE, in order to provide the ability to update or insert a row conditionally into a table or out of line single table views. Example: assume that new data for the dimension table products is propagated to the data warehouse and has to be either inserted or updated. The table products_delta has the same structure as products. Merge Operation Using SQL
LAB EXERCISE #4 Oracle Data Warehousing Transforming Data Using Multitable INSERT Many times, external data sources have to be segregated based on logical attributes for insertion into different target objects. It offers the benefits of the INSERT ... SELECT statement when multiple tables are involved as targets.
LAB EXERCISE #4 Oracle Data Warehousing Example (Unconditional Insert) The following statement aggregates the transactional sales information, stored in sales_activity_direct, on a daily basis and inserts into both the sales and the costs fact table for the current day. INSERT ALL INTO sales VALUES (product_id, customer_id, today, 3, promotion_id, quantity_per_day, amount_per_day) INTO costs VALUES (product_id, today, promotion_id, 3, product_cost, product_price) SELECT TRUNC (s.sales_date) AS today, s.product_id, s.customer_id, s.promotion_id, SUM(s.amount) AS amount_per_day, SUM(s.quantity) quantity_per_day, p.prod_min_price*0.8 AS product_cost, p.prod_list_price AS product_price FROM sales_activity_direct s, products p WHERE s.product_id = p.prod_id AND TRUNC(sales_date) = TRUNC(SYSDATE) GROUP BY TRUNC(sales_date), s.product_id, s.customer_id, s.promotion_id, p.prod_min_price*0.8, p.prod_list_price;
LAB EXERCISE #4 Oracle Data Warehousing Example (Conditional ALL Insert) The following statement inserts a row into the sales and costs tables for all sales transactions with a valid promotion and stores the information about multiple identical orders of a customer in a separate table cum_sales_activity. It is possible two rows will be inserted for some sales transactions, and none for others. INSERT ALL WHEN promotion_id IN (SELECT promo_id FROM promotions) THEN INTO sales VALUES (product_id, customer_id, today, 3, promotion_id, quantity_per_day, amount_per_day) INTO costs VALUES (product_id, today, promotion_id, 3, product_cost, product_price) WHEN num_of_orders > 1 THEN INTO cum_sales_activity VALUES (today, product_id, customer_id, promotion_id, quantity_per_day, amount_per_day, num_of_orders) SELECT TRUNC(s.sales_date) AS today, s.product_id, s.customer_id, s.promotion_id, SUM(s.amount) AS amount_per_day, SUM(s.quantity) quantity_per_day, COUNT(*) num_of_orders, p.prod_min_price*0.8 AS product_cost, p.prod_list_price AS product_price FROM sales_activity_direct s, products p WHERE s.product_id = p.prod_id AND TRUNC(sales_date) = TRUNC(SYSDATE) GROUP BY TRUNC(sales_date), s.product_id, s.customer_id, s.promotion_id, p.prod_min_price*0.8, p.prod_list_price;
LAB EXERCISE #4 Oracle Data Warehousing Conditional FIRST Insert The following statement inserts into an appropriate shipping manifest according to the total quantity and the weight of a product order. An exception is made for high value orders, which are also sent by express, unless their weight classification is not too high. All incorrect orders, in this simple example represented as orders without a quantity, are stored in a separate table. It assumes the existence of appropriate tables large_freight_shipping, express_shipping, default_shipping, and incorrect_sales_order. INSERT FIRST WHEN (sum_quantity_sold > 10 AND prod_weight_class < 5) AND sum_quantity_sold >=1) OR (sum_quantity_sold > 5 AND prod_weight_class > 5) THEN INTO large_freight_shipping VALUES (time_id, cust_id, prod_id, prod_weight_class, sum_quantity_sold) WHEN sum_amount_sold > 1000 AND sum_quantity_sold >=1 THEN INTO express_shipping VALUES (time_id, cust_id, prod_id, prod_weight_class, sum_amount_sold, sum_quantity_sold) WHEN (sum_quantity_sold >=1) THEN INTO default_shipping VALUES (time_id, cust_id, prod_id, sum_quantity_sold) ELSE INTO incorrect_sales_order VALUES (time_id, cust_id, prod_id) SELECT s.time_id, s.cust_id, s.prod_id, p.prod_weight_class, SUM(amount_sold) AS sum_amount_sold, SUM(quantity_sold) AS sum_quantity_sold FROM sales s, products p WHERE s.prod_id = p.prod_id AND s.time_id = TRUNC(SYSDATE) GROUP BY s.time_id, s.cust_id, s.prod_id, p.prod_weight_class;
LAB EXERCISE #4 Oracle Data Warehousing Example (Mixed Conditional and Unconditional Insert) The following example inserts new customers into the customers table and stores all new customers with cust_credit_limit higher then 4500 in an additional, separate table for further promotions. INSERT FIRST WHEN cust_credit_limit >= 4500 THEN INTO customers INTO customers_special VALUES (cust_id, cust_credit_limit) ELSE INTO customers SELECT * FROM customers_new;