450 likes | 611 Views
An In-depth Look at Active Data Features - Constraints and Triggers – in DB2 LUW. Petrus Chan, IBM Toronto Lab, petrus@ca.ibm.com. Agenda. Active Data Features Overview of SQL Processing in DB2 Constraints Triggers Semantic Query Optimization in DB2. Active Data Features.
E N D
An In-depth Look at Active Data Features - Constraints and Triggers – in DB2 LUW Petrus Chan, IBM Toronto Lab, petrus@ca.ibm.com
Agenda • Active Data Features • Overview of SQL Processing in DB2 • Constraints • Triggers • Semantic Query Optimization in DB2
Active Data Features • a mechanism whereby an SQL statement can invoke an action that is not explicitly specified by the SQL statement • Can be used to enable business rules enforcement • Event – Condition – Action Database event • Update, delete, insert Search condition on • Transition values Database or External Action • e.g. alert, reject, fix up, log Event • when • if • then Condition Action
Categories of Active Data Features • Constraints • Declarative rules that ensure the validity of data values • Triggers • Automatic procedural actions that are triggered by update events on a given table • Materialized Query Table (MQT) maintenance • Maintenance of materialized query results stored in a table, which can be used by DB2 Optimizer to significantly improve performance of complex queries.
Access Plan: ----------- Total Cost: 63.9241 Query Degree: 1 Rows RETURN ( 1) Cost I/O | 25 GRPBY ( 2) 63.4613 2 | 829.44 MSJOIN ( 3) 55.1569 2 /---+----\ 144 5.76 TBSCAN FILTER ( 4) ( 7) 23.8489 23.6013 1 1 | | 144 144 SORT TBSCAN ( 5) ( 8) 23.7975 23.6013 1 1 | | 144 144 TBSCAN SORT ( 6) ( 9) 19.4513 23.5498 1 1 | | 144 144 TABLE: PETRUS3 TBSCAN T2 ( 10) Q1 19.4513 1 | 144 TABLE: PETRUS3 T1 Q2 Overview of SQL processing in DB2 Query SELECT t1.c1, sum (t2.c2) FROM T1, T2 WHERE t1.c1 = t2.c1 GROUP BY t1.c1;
Agenda • Active Data Features • Overview of SQL Processing in DB2 • Constraints • Triggers • Semantic Query Optimization in DB2
Constraints in DB2 • Domain constraints – data type, nullability • Key constraints – primary, unique • Check constraints – domain restriction, primitive business rules • Referential integrity constraints – foreign key • Generated columns – derived attributes • Functional dependencies – determined by • Symmetric views – with check option
Primary/Unique Key Constraints • Purpose • Enforcing the primary or candidate key semantics of an attribute • Specification CREATE TABLE PERSON(SSN INT NOT NULL UNIQUE, …); CREATE TABLE PERSON(SSN INT NOT NULL, …, PRIMARY KEY(SSN)); ALTER TABLE PERSON ADD CONSTRAINT ssn_uk UNIQUE(SSN); • Implementation • Enforced by DB2 through the creation of a unique index on the columns of the key. • Semantically enforced as an end of statement constraint, through a mechanism called deferred unique checking. UPDATE T SET C1 = C1+1 can be executed without raising an error.
Check Constraints • Purpose • Can be used to enforce/restrict the domain of an attribute • Can implement simple business rules based on values of other columns of the same data row, e.g., CHECK(ship_date >= order_date) • Specification CREATE TABLE SALES_2004(ITEM_ID CHAR(16), SALE_DATE DATE CHECK(SALE_DATE BETWEEN 01/01/2004 AND 12/31/2004), …); ALTER TABLE EMPLOYEE ADD CONSTRAINT check_sex CHECK( SEX IN (‘M’, ‘F’)); • Implementation • Additional predicate corresponding to the check constraint is compiled into the query graph of any INSERT, UPDATE or DELETE statements modifying a given table • Evaluated using the “after” image for each row, error is raised when check condition is not satisfied
Access Plan: ----------- Total Cost: 13.3724 Query Degree: 1 Rows RETURN ( 1) Cost I/O | 0.04 FILTER ( 2) 13.3724 1 | 1 INSERT ( 3) 13.2904 1 /---+--\ 1 77 TBSCAN TABLE: ADMIN ( 4) EMPLOYEE 0.0048 0 | 1 TABFNC: SYSIBM GENROW Check constraints - example db2 => ALTER TABLE EMPLOYEE ADD CONSTRAINT chk_sex CHECK(sex IN ('M','F')) db2 => INSERT INTO EMPLOYEE VALUES('000070', 'EVA', 'D', 'PULASKI', …) 2) FILTER: (Filter) ... Predicate Text: -------------- (1 = CASE WHEN NOT(Q4.$C0 IN ('M', 'F')) THEN RAISE_ERROR(-545, ‘PETRUS.EMPLOYEE.CHK_SEX.') ELSE 0 END)
Referential Integrity • Purpose • Defines an inclusion relationship between a foreign key (set of columns in the referencing/child table) and a primary key (set of columns in the referenced/parent table) • Simply put, any non-NULL values of the foreign key must be present as a value of the corresponding primary/unique key. • Specification (can be provided during CREATE or ALTER table) • The definition of the referencing table (child table) contains a clause of the form: FOREIGN KEY fk_name(fk_col1, fk_col2, …) REFERENCES parent_table(pk_col1, pk_col2, …) <cascading options> where <cascading options> can be one or more of: • ON DELETE [ NO ACTION | RESTRICT | CASCADE | SET NULL ] • ON UPDATE [ NO ACTION | RESTRICT ]
Referential Integrity (cont’d) • Implementation • UPDATE/INSERT of child table’s foreign key columns: • Additional processing compiled into query graph to: • Look up primary key values in the parent table. • Raise error if not found. • UPDATE/DELETE of parent. • Additional processing compiled into query graph to: • Depends on ON UPDATE and ON DELETE cascading options: • RESTRICT – raise error if old value of the parent columns still has corresponding children • SET NULL – set foreign key columns in child table to NULL • CASCADE – delete the children rows • NO ACTION (default) – similar to RESTRICT, but enforced after other constraints
Access Plan: ----------- Total Cost: 77.8767 Query Degree: 1 Rows RETURN ( 1) Cost I/O | 0.1232 FILTER ( 2) 77.8767 5.08 /--------+--------\ 3.08 1 UPDATE IXSCAN ( 3) ( 5) 75.557 1.97248 5.08 0 /---+--\ | 3.08 77 47 TBSCAN TABLE: ADMIN INDEX: ADMIN ( 4) EMPLOYEE DEPT_PK 34.747 2 | 77 TABLE: ADMIN EMPLOYEE Referential integrity - example db2 => ALTER TABLE EMPLOYEE ADD CONSTRAINT dept_fk FOREIGN KEY (workdept) REFERENCES DEPARTMENT(deptno) ON DELETE SET NULL ON UPDATE RESTRICT db2 => UPDATE EMPLOYEE SET workdept=‘871’ WHERE empno=‘000010’ 2) FILTER: (Filter) ... Predicate Text: -------------- (1 = CASE WHEN (Q4.$C0 IS NOT NULL AND NOT EXISTS(SELECT $RID$ FROM ADMIN.DEPARTMENT AS Q5 WHERE ('871' = Q5.DEPTNO))) THEN RAISE_ERROR(-530, 'ADMIN.EMPLOYEE.DEPT_FK.') ELSE 2 END)
Rows RETURN ( 1) Cost I/O | 3.08 UPDATE ( 2) 90.8877 6.08 /---+--\ 3.08 77 NLJOIN TABLE: ADMIN ( 3) EMPLOYEE 50.005 3 /---------+--------\ 1 3.08 DELETE TBSCAN ( 4) ( 6) 15.2581 34.747 1 2 /---+---\ | 1 47 77 IXSCAN TABLE: ADMIN TABLE: ADMIN ( 5) DEPARTMENT EMPLOYEE 1.97248 0 | 47 INDEX: ADMIN DEPT_PK Referential integrity – 2nd example Cascaded Update db2 => DELETE FROM DEPARTMENT WHERE deptno='E11' 3) NLJOIN: (Nested Loop Join) … Predicate Text: -------------- (Q5.$C0 = Q6.WORKDEPT) Table Scan! Old deptno value • Note the table scan when executing the look up of the secondary table. • Create index on foreign key columns to avoid this
Generated columns • Purpose • Deriving values of a column based on other columns on the same row • Specification (CREATE or ALTER TABLE) CREATE TABLE Persons(first_name VARCHAR(64), last_name VARCHAR(64), last_name_upper GENERATED ALWAYS AS UPPER(last_name)); • Implementation • The SQL compiler will substitute the column value or the DEFAULT keyword with the generated column expression whenever necessary
Generated columns - Example • Can be used to implement “index on expression”: • create index idx1 on table_name (expr(col1)) NOT ALLOWED • Instead: • create generated column col2 as expr(col1) • create index on generated column c2 • A typical MDC example: CREATE TABLE ORDERS( O_ORDKEY INT NOT NULL, O_CUSTKEY INT NOT NULL, O_PRICE FLOAT NOT NULL, O_ORDERDATE DATE NOT NULL, OG_ORDERMONTH GENERATED ALWAYS AS (INTEGER(O_ORDERDATE)/100), PRIMARY KEY o_key (O_ORDERKEY)) ORGANIZE BY ( OG_ORDERMONTH ); • The default value for the generated column will be replaced by the generated column expression. INSERT INTO ORDERS VALUES(151, 32, 16.97, CURRENT DATE, DEFAULT); ---> INSERT INTO ORDERS VALUES(151, 32, 16.97, CURRENT DATE, INTEGER(CURRENT DATE)/100);
Functional dependencies • Purpose • To capture the functional dependency between columns of the same table, other than the implicit dependency on the key • Can occur due to denormalization • Typical example: attribute_code attribute_description • Specification CREATE TABLE PRODUCT(prod_key CHAR(15) NOT NULL, smu_code VARCHAR(6) NOT NULL, smu_desc VARCHAR(35), ..., prod_desc VARCHAR(30), CONSTRAINT FD1 CHECK (smu_desc DETERMINED BY smu_code) NOT ENFORCED ENABLE QUERY OPTIMIZATION; ALTER TABLE product ADD CHECK ( prod_desc DETERMINED BY prod_key) NOT ENFORCED; • Implementation • Purely informational, currently cannot be created as enforced • Currently exploited during MQT matching, possible other uses in the future
Symmetric views – Views with CHECK OPTION • Purpose • Often, views are created to control user’s access to base table. • Regular views do not restrict modifications that do not conform to view definition. Example: • Create view view1 as (SELECT * FROM EMPLOYEE WHERE WORKDEPT=‘E21’) • User can insert into view1 with workdept = E22 • But subsequent select * from view1 cannot see the row. • View with CHECK OPTION (or Symmetric View) can be used to restrict the modification. • Specification CREATE VIEW DEPT_E21_EMPLOYEES AS (SELECT * FROM EMPLOYEE WHERE WORKDEPT=‘E21’) WITH CHECK OPTION; • Implementation • Similar to a check constraint, any predicate in the where clause of the view are applied against the new values of the rows during an update through the view.
Symmetric views – Views with CHECK OPTION (2) • LOCAL vs. CASCADED CHECK OPTION • View over another view • LOCAL – must satisfy definitions of current view and all underlying views that also have a check option • CASCADED (default) – must satisfy definitions of current view and all underlying views, regardless of whether they have a check option or not. create view VIEW1 as Select empno, empname, deptno from EMP where dept10 = 10; Create view VIEW2 as Select empno, empname, deptno from VIEW1 where empno > 20 WITH Y CHECK OPTION;
Costs vs. Benefits of Constraints Benefit of Preventing application errors Enabling optimizations Cost of data integrity enforcement / maintenance vs. • Informational Constraints: if the integrity is enforced outside of the database (e.g. ETL processes), informational constraints can provide the benefit without the additional cost db2 => ALTER TABLE DEPARTMENT ALTER FOREIGN KEY dept_fk NOT ENFORCED
Enforced vs. Informational Constraints • By default: • the constraints are enforced by the database engine, frequently there is cost associated with enforcement. • all constraints are exploited during query optimization • Check and Referential Integrity constraints can be ENFORCED or NOT • Functional dependencies purely informational (cannot be enforced) • If an informational constraint is enabled for query optimization, then the user must ensure that informational constraints are satisfied at all times, otherwise the query results could become incorrect
Agenda • Active Data Features • Overview of SQL Processing in DB2 • Constraints • Triggers • Semantic Query Optimization in DB2
Introduction to Triggers • Automatic procedural actions that are triggered by update events on a given table • Trigger types: BEFORE, AFTER and … INSTEAD OF • Granularity: trigger execution can be either per row, or per statement • Procedural SQL extensions can be used to implement the trigger logic
For each row triggers - BEFORE • Usually used to validate or fix up data before modification • Typical application: look at the new value of one or more columns of the current row, compare it to other rows in the table (possibly using aggregation) or reference against another table, if value doesn’t conform, fix it up or raise an error. • No database modifications are allowed inside before triggers • Simple before triggers are “inlined” into the main data flow, more complex ones will result in a join below the U/D/I operation
Rows RETURN ( 1) Cost I/O | 77 UPDATE ( 2) 1015.6 79 /---+---\ 77 77 FILTER TABLE: ADMIN ( 3) EMPLOYEE 25.8637 2 | 77 FILTER ( 4) 25.8115 2 | 77 TBSCAN ( 5) 25.7725 2 | 77 TABLE: ADMIN EMPLOYEE BEFORE trigger - example • Sample business rule: Disallow salary increases of over 25% CREATE TRIGGER VERIFY_RAISE NO CASCADE BEFORE UPDATE OF SALARY ON EMPLOYEE REFERENCING NEW AS N OLD AS O FOR EACH ROW MODE DB2SQL WHEN(N.SALARY > O.SALARY*1.25) SIGNAL SQLSTATE ‘75000’ (‘Salary increase over limit’); This trigger is “inlined” here. Typically you would expect the trigger body joined to the main flow of data.
For each row triggers - AFTER • Can be used for: • database assertion validation based on the after state • Example: archiving, versioning and replication of data • Executed once per each row modification • Can access the row transition variables NEW and OLD, corresponding to the before and after image of the row • Body can contain arbitrarily complex procedural-like logic inside a BEGIN ATOMIC … END block • Starting in V82 DB2 offers the ability to issue a CALL statement within a trigger body to invoke a stored procedure. • Packaged trigger (in development)
AFTER row level trigger - example • Implementing versioning with an after for each row trigger: CREATE TRIGGER AFTTR1 AFTER UPDATE OF WORKDEPT ON EMPLOYEE REFERENCING OLD AS O FOR EACH ROW MODE DB2SQL BEGIN ATOMIC DECLARE LAST_NEWEST DATE; SET LAST_NEWEST = (SELECT END_DATE FROM EMP_DEPT_HISTORY WHERE EMPNO=O.EMPNO ORDER BY 1 DESC FETCH FIRST 1 ROW ONLY); INSERT INTO EMP_DEPT_HISTORY VALUES(O.EMPNO, O.WORKDEPT, LAST_NEWEST, CURRENT DATE); END% UPDATE EMPLOYEE SET WORDEPT=‘D21’ WHERE EMPNO=‘000230’%
For each statement triggers - AFTER • Executed once per triggering statement, even if no rows affected by the update/delete/insert • Can reference the set of old and new row values via the transition tables
AFTER statement level trigger - example • Auditing of table data changes CREATE TRIGGER EMP_AUDIT AFTER UPDATE ON EMPLOYEE REFERENCING OLD_TABLE AS OLDT NEW_TABLE AS NEWT FOR EACH STATEMENT MODE DB2SQL INSERT INTO EMP_AUDIT SELECT O.EMPNO, O.LASTNAME, N.LASTNAME, O.WORKDEPT, N.WORKDEPT, O.PHONENO, N.PHONENO, O.JOB, N.JOB, O.EDLEVEL, N.EDLEVEL, O.SALARY, N.SALARY, O.BONUS, N.BONUS, O.COMM, N.COMM FROM OLDT O, NEWT N WHERE O.EMPNO=N.EMPNO; UPDATE EMPLOYEE SET WORKDEPT=‘D21’ WHERE EMPNO=‘000230’;
Some views are considered not updatable by DB2. Example: view with joins Instead Of Trigger on a view allows you to “intercept” the update/delete/insert issued against the view and replace it with operations on the underlying tables. Currently only row level INSTEAD OF triggers supported in DB2 Execute at the same time as after triggers Triggers on Views – INSTEAD OF CREATE VIEW EMPLOYEE_WITH_DEPT_INFO AS (SELECT E.*, D.DEPTNAME, D.MGRNO, D.LOCATION FROM EMPLOYEE E, DEPARTMENT D WHERE E.WORKDEPT = D.DEPTNO); CREATE TRIGGER INSTTR1 INSTEAD OF UPDATE ON EMPLOYEE_WITH_DEPT_INFO REFERENCING NEW AS N FOR EACH ROW MODE DB2SQL BEGIN ATOMIC UPDATE EMPLOYEE E SET (FIRSTNME, MIDINIT, LASTNAME, WORKDEPT, PHONENO, HIREDATE, JOB, EDLEVEL, SEX, BIRTHDATE, SALARY, BONUS, COMM) = (N.FIRSTNME, N.MIDINIT, N.LASTNAME, N.WORKDEPT, N.PHONENO, N.HIREDATE, N.JOB, N.EDLEVEL, N.SEX, N.BIRTHDATE, N.SALARY, N.BONUS, N.COMM) WHERE N.EMPNO = E.EMPNO;-- UPDATE DEPARTMENT D SET (DEPTNAME, MGRNO, LOCATION) = (N.DEPTNAME, N.MGRNO, N.LOCATION) WHERE N.WORKDEPT = D.DEPTNO;-- END;
Database utilities and constraints and triggers • Implications of triggers and constraints for utilities: • Bulk LOAD utility: • Data type, nullability and key constraints enforced during load. • Referential integrity and check constraints not verified during load • table placed in check pending state after load • constraints must be verified in a separate step using the SET INTEGRITY statement. • If load was incremental (append) the constraint checking will be incremental as well • Generated columns can be generate during load, or by SET INTEGRITY • Triggers not fired!
Agenda • Active Data Features • Overview of SQL Processing in DB2 • Constraints • Triggers • Semantic Query Optimization in DB2
Semantic Query Optimization in DB2 • Semantic query optimization (SQO) = the use of integrity constraints to optimize the evaluation of a query • Mostly during the query rewrite and query optimization phases • elimination of unnecessary processing based on additional semantic information provided by the integrity constraints
Exploiting Check Constraints • Check constraint and generated column serve as derived predicate. • The predicate will be exploited by DB2 Query Rewrite Theorem Prover. • One common usage is to prove a FALSE predicate for the elimination of a subgraph or a branch • Example: Check Constraint: alter table t1 add constraint chk1 check (c1 > 5) Query: select * from t1 where c1 < 5 • Query Rewrite will prove that no row in T1 will satisfy the 2 predicates – will replace it with a FALSE predicate (1=0). • The subgraph that evaluates the scan of T1 will be changed to: Select * from values(1) where 1=0
Exploiting Check Constraints: Another Example Query Rewrite will: (1) Compile in the check constraint predicate (2) push down the predicate Year=2007 through union all operations (3) derives a FALSE predicate for branch 2006 and 2008 (4) eliminates the two branches altogether • create view SALES (SALES_2006 UNION ALL SALES_2007 UNION ALL SALES_2008) • Check constraints: SALES_2006 - check (Year = 2006) SALES_2007 - check (Year = 2007) SALES_2008 - check (Year = 2008) • Query: select * from SALES where Year = 2007 SELECT … WHERE YEAR=2007 PUSH DOWN (YEAR=2007) THRU UNION ALL PREDICATE DERIVED FROM THE CHECK CONSTRAINT UNION ALL SELECT (YEAR = 2006) SELECT (YEAR = 2007) SELECT (YEAR = 2008) SALES_2006 SALES_2008 SALES_2007
Exploiting Referential Integrity • Redundant join elimination CREATE VIEW EMPLOYEE_WITH_DEPT_INFO AS (SELECT E.*, D.DEPTNAME, D.MGRNO, D.LOCATION FROM EMPLOYEE E, DEPARTMENT D WHERE E.WORKDEPT = D.DEPTNO); select empno, lastname, workdept from employee_with_dept_info; • Join with the DEPARTMENT table eliminated entirely from the query • Need to compensate with workdept is not null SELECT SELECT JOIN EMPLOYEE DEPARTMENT EMPLOYEE
Group by pushdown through join Exploiting Referential Integrity (2) select deptno, deptname, sum(salary) from employee, department where workdept=deptno group by deptno, deptname GROUP BY pushed down below the join (along the child table) => reduces number of rows participating in the join GROUPBY JOIN JOIN DEPARTMENT GROUPBY DEPARTMENT EMPLOYEE EMPLOYEE
Exploiting Generated Columns CREATE TABLE ORDERS( O_ORDKEY INT NOT NULL, O_CUSTKEY INT NOT NULL, O_PRICE FLOAT NOT NULL, O_ORDERDATE DATE NOT NULL, ... OG_ORDERMONTH GENERATED ALWAYS AS (INTEGER(O_ORDERDATE)/100), PRIMARY KEY o_key (O_ORDERKEY)) ORGANIZE BY ( OG_ORDERMONTH ); SELECT * FROM ORDERS WHERE INTEGER(O_ORDERDATE)/100 = 200901; ---> SELECT FROM ORDERS WHERE OG_ORDERMONTH = 200901; • The generated expression in the query will be replaced by the generated column. • The predicate on generated column is used to exploit an index
Exploiting NOT NULL constraint, Uniqueness and Functional Dependency • NOT NULL • Example: MQT matching (more on this later) • Whenever possible declare columns as NOT NULL, do not leave nullable by default. • Uniqueness • can be exploited to remove DISTINCT processing, explicit and implicit, to avoid unnecessary SORT operators: • Example 1: create table emp (empno int not null unique, empname char(30)); select distinct * from emp ; • Example 2: MERGE statement duplicate checking • Functional Dependency • Example: MQT matching (more on this later)
Summary • Data integrity constraints implement business rules and application logic within the database server • If data integrity is maintained outside of the database, informational constraints can be defined • Constraints provide valuable information to Query Rewrite for query optimization • Triggers offer more flexibility by allowing procedural logic
Selected Further Reading • Roberta Cochrane, Hamid Pirahesh, Nelson Mendonça Mattos: Integrating Triggers and Declarative Constraints in SQL Database Sytems. VLDB 1996: 567-578 • Hamid Pirahesh, T. Y. Cliff Leung, Waqar Hasan: A Rule Engine for Query Transformation in Starburst and IBM DB2 C/S DBMS. ICDE 1997: 391-400 • Qi Cheng, Jarek Gryz, Fred Koo, T. Y. Cliff Leung, Linqi Liu, Xiaoyan Qian, K. Bernhard Schiefer: Implementation of Two Semantic Query Optimization Techniques in DB2 Universal Database. VLDB 1999: 687-698 • Serge Rielau: INSTEAD OF Triggers - All Views are Updatable!, http://www.ibm.com/developerworks/db2/library/techarticle/0210rielau/0210rielau.html, 2002