260 likes | 273 Views
Understand data transformation methods in ETL processes using SQL, PL/SQL, and table functions for efficient data manipulation within Oracle database. Learn about CTAS, UPDATE, MERGE, and multistage data transformations.
E N D
Transformation Extraction, Transformation,and Loading (ETL)
Objectives • After completing this lesson, you should be able to do the following: • Explain transformation using SQL • CREATE TABLE AS SELECT (CTAS) • UPDATE • MERGE • Multitable INSERT • Describe transformation using table functions and PL/SQL • Implement DML Error Logging
Data Transformation • Data transformations are often the most complex and costly part of the ETL process. • Transformations can range from simple data conversion to complex scrubbing techniques. • Many, if not all, transformations can occur within the Oracle database. • External transformations outside the database are supported (flat files, for instance). • Data can be transformed in two ways: • Multistage data transformation • Pipelined data transformation
NEW_SALES_STEP1 NEW_SALES_STEP2 SALES NEW_SALES_STEP3 Multistage Data Transformation Load into staging table. Flat files Validate customer keys (look up in customer dimension table). Convert source product keys to warehouse product keys. Insert into SALES warehouse table.
External table SALES Pipelined Data Transformation Validate customer keys (look up in customer dimension table). Flat files Convert source product keys to warehouse product keys. Insert into SALES warehouse table.
Transformation Mechanisms • You have the following choices for transforming data inside the database: • SQL • PL/SQL • Table functions
Transformation Using SQL • After data is loaded into the database, transformations can be executed using SQL operations: • CREATE TABLE ... AS SELECT • INSERT /*+APPEND*/ AS SELECT • UPDATE • MERGE • Multitable INSERT
CREATE TABLE … AS SELECTand UPDATE DESC sales_activity_direct Name Null? Type ------------ ----- --------- SALES_DATE DATE PRODUCT_ID NUMBER CUSTOMER_ID NUMBER PROMOTION_ID NUMBER AMOUNT NUMBER QUANTITY NUMBER INSERT /*+ APPEND NOLOGGING PARALLEL(sales) */ INTO sales as SELECT product_id, customer_id, TRUNC(sales_date), 3, promotion_id, quantity, amount FROM sales_activity_direct;
MERGE Statement: Overview • MERGE statements provide the ability to conditionally UPDATE/INSERT into the database. • The DELETE clause for UPDATE branch is provided for implicit data maintenance. • Conditions are specified in the ON clause. • MERGE statements do an UPDATE if the row exists and an INSERT if it is a new row. • Using MERGE statements avoids multiple updates. • MERGE statements use a single SQL statement to complete an UPDATE or INSERT or both.
Data Warehousing MERGE: Example MERGE INTO customer C USING cust_src S ON (c.customer_id = s.src_customer_id) WHEN MATCHED THEN UPDATE SET c.cust_address = s.cust_address WHEN NOT MATCHED THEN INSERT ( Customer_id, cust_first_name,…) VALUES (src_customer_id,src_first_name,…);
Data Maintenance with MERGE/DELETE MERGE USING product_changes S INTO products D ON (d.prod_id = s.prod_id) WHEN MATCHED THEN UPDATE SET d.prod_list_price = s.prod_new_price, d.prod_status = s.prod_newstatus DELETE WHERE (d.prod_status = "OBSOLETE") WHEN NOT MATCHED THEN INSERT (prod_id, prod_list_price, prod_status) VALUES (s.prod_id, s.prod_new_price, s.prod_new_status);
Overview of Multitable INSERT Statements • Allows the INSERT … AS SELECT statement to insert rows into multiple tables as part of a single DML statement • Can be used in data warehousing systems to transfer data from one or more operational sources to a set of target tables • Types of multitable INSERT statements: • Unconditional INSERT • Pivoting INSERT • Conditional ALL INSERT • Conditional FIRST INSERT
Example of Unconditional INSERT INSERT ALL INTO sales VALUES (product_id, customer_id, today, 3, promotion_id, quantity_per_day, amount_per_day) INTO costs VALUES (product_id, today, promotion_id, 3, product_cost, product_price) SELECT TRUNC(s.sales_date) AS today, s.product_id, s.customer_id, s.promotion_id, SUM(s.amount) AS amount_per_day, SUM(s.quantity) quantity_per_day, p.prod_min_price*0.8 AS product_cost, p.prod_list_price AS product_price FROM sales_activity_direct s, products p WHERE s.product_id = p.prod_id AND TRUNC(sales_date) = TRUNC(SYSDATE) GROUP BY TRUNC(sales_date), s.product_id, s.customer_id, s.promotion_id, p.prod_min_price*0.8, p.prod_list_price;
Example of Conditional ALL INSERT INSERT ALL WHEN product_id IN (SELECT product_id FROMpromotional_items)THEN INTO promotional_sales VALUES(product_id,list_price) WHEN order_mode = 'online' THEN INTO web_orders VALUES(product_id, order_total) SELECT product_id, list_price,order_total,order_mode FROM orders;
Example of Pivoting INSERT INSERT ALL INTO sales (product_id, customer_id, week, amount) VALUES (product_id, customer_id, weekly_start_date, sales_sun) INTO sales (product_id, customer_id, week, amount) VALUES (product_id, customer_id, weekly_start_date+1, sales_mon) INTO sales (product_id, customer_id, week, amount) VALUES (product_id, customer_id, weekly_start_date+2, sales_tue) INTO sales (product_id, customer_id, week, amount) VALUES (product_id, customer_id, weekly_start_date+3, sales_wed) ... SELECT product_id, customer_id, weekly_start_date, sales_sun, sales_mon, sales_tue, sales_wed, sales_thu, sales_fri, sales_sat FROM sales_summary;
Example of Conditional FIRST INSERT INSERT FIRST WHEN order_total> 10000 THEN INTO priority_handling VALUES (id) WHEN order_total > 5000 THEN INTO special_handling VALUES (id) WHEN total > 3000 THEN INTO privilege_handling VALUES (id) ELSE INTO regular_handling VALUES (id) SELECT order_total , order_id id FROM orders ;
Overview of Table Functions • A table function is a function that can produce a set of rows as output. • Input can be a set of rows. • Table functions support pipelined and parallel execution of transformations using: • PL/SQL • C programming language • Java • Table functions are used in the FROM clause of a SELECT statement.
Creating Table Functions CREATE OR REPLACE FUNCTION transform(p ref_cur_type) RETURNtable_order_items_type PIPELINED PARALLEL_ENABLE( PARTITION p BY ANY) IS BEGIN FOR rec IN p LOOP … -- Transform this record PIPE ROW (rec); END LOOP; RETURN; END;
Using Table Functions SELECT * FROM TABLE(transform(cursor(SELECT * FROM order_items_ext))); INSERT /*+ APPEND, PARALLEL(order_items) */ INTO order_items SELECT * FROM TABLE(transform(cursor(SELECT * FROM order_items_ext)));
DML Error Logging: Overview • Aborting long-running bulk DML operations is wasteful of time and system resources. • DML Error Logging allows bulk DML operations to continue processing after a DML error occurs. • Errors are logged to an error-logging table. • DML Error Logging combines the power and speed of bulk processing with the functionality of row-level error handling.
DML Error Logging Concepts • An error-logging table is created and associated with an existing table by using the DBMS_ERRLOG package. • The LOG ERRORS INTO...clause is added to the bulk DML load statement providing: • Error-logging table name • Statement tag • Reject limit EXECUTE DBMS_ERRLOG.CREATE_ERROR_LOG('bonuses', 'errlog'); INSERT INTO bonuses SELECT employee_id, salary*1.1 FROM employees WHERE commission_pct > .2 LOG ERRORS INTO errlog ('my_bad') REJECT LIMIT 10;
Error-Logging Table • The error-logging table logically consists of two parts: • Fixed: Columns that describe the error • Variable: Columns that contain data values corresponding to the row in error • Column names for the variable part determine the DML table columns logged when an error occurs. • The columns of the DML table to be logged is the intersection of the column names of the DML table and the error-logging table.
Column name • Data type • Information • ORA_ERR_NUMBER$ • varchar2(10) • Oracle error number • ORA_ERR_MESG$ • varchar2(4000) • Oracle error message text • ORA_ERR_ROWID$ • row ID • Row ID of row in error (UPDATE/DELETE) • ORA_ERR_OPTYP$ • varchar2(2) • Type of operation: (I)nsert, (U)pdate, (D)elete • ORA_ERR_TAG$ • varchar2(4000) • User-supplied tag value via DML statement Error-Logging Table Format • Fixed Portion: Error information columns
Inserting into a Table with Error Logging • Example: CREATE TABLE bonuses (emp_id NUMBER, sal NUMBER CHECK(sal > 8000)); EXECUTE DBMS_ERRLOG.CREATE_ERROR_LOG('bonuses','errlog'); INSERT INTO bonuses SELECT employee_id, salary*1.1 FROM employees WHERE commission_pct > .2 LOG ERRORS INTO errlog (’my_bad’) REJECT LIMIT 10; 1 2 3
Summary • In this lesson, you should have learned how to: • Explain transformation using SQL • CREATE TABLE AS SELECT (CTAS) • UPDATE • MERGE • Multitable INSERT • Describe transformation using table functions • Implement DML Error Logging
Practice 6: Overview • This practice covers the following topics: • Loading data using multitable inserts • Loading data using multitable conditional insert • Loading data using MERGE • Logging DML load errors