Schema Refinement and Normal Forms

Conceptual design Schemas ICs Schema Refinement and Normal Forms • Given a design, how do we know it is good or not? • What is the best design? • Can a bad design be transformed into a good one?

Normalization A relation is said to be in a particular normal form if it satisfies a certain set of constraints. • If a relation is in a certain normal form (BCNF, 3NF etc.), we know what problems it has and what problems it does not have • Each normal eliminates or minimizes certain kinds of problems • Given a relation, the process of making it to be in certain normal form is called normalization • Typically this is done by breaking up the relation into a set of smaller relations that possess desirable properties.

key … nonkey attr_1 nonkey attr_2 nonkey attr_n Boyce-Codd Normal Form (BCNF) • A relation R is in BCNF if whenever a FD XA holds in R, one of the following statements is true. • XA is a trivial FD. • X is a superkey. A trivial FD XY where Y X • Example 1: • Scheme: Hourly_Emps (ssn, name, lot, rating, hrly_wages, hrs_worked) • Constraints: ssnis the primary key and Rating hrly_wages • Example 2: • Schema: R(A, B, C, D) • Constraints: A is the primary key, B is a candidate key, is R a BCNF?

BCNF is the most desirable form • A BCNF relation does not allow redundancy • Every field of every tuple records a piece of information that cannot be inferred from the values in all other non-key fields

Normalization If a relation is not in BCNF, can we make it BCNF? • Example • Relation R(SNLRWH) has FDs SSNLRWH and RW • Second FD causes violation of BCNF • consequence: W values repeatedly associated with R values • We decompose SNLRWH into SNLRH and RW

Normalization through Decomposition • A decomposition of a relation schema R • The replacement of the schema R by two or more relation schemas, each contains a subset of R and together include all attributes of R. • A decomposition must ensure two properties: • Lossless join • Dependency preservation

Lossless Join Decompositions • Decomposition of R into X and Y is a lossless-join decomposition w.r.t. a set of FDs F if, for every instance r that satisfies F: • (r) (r) = r • It is always true that r (r) (r) • In general, the other direction does not hold! If it does, the decomposition is lossless-join. • It is essential that all decompositions used to deal with redundancy be lossless!

A decomposition of D={R1,R2,…,Rm} of Relation R has the lossless join property with respect to the set of FDs F on R if for every relation instance r(R) that satisfies F, the following holds: NATURAL_JOIN( ) =r

Lossless Join Decomposition: Property 1 Property 1: A decomposition D={R1,R2} of R has the lossless join property with respect to a set of FDs F of R if and only if either is in F+ or is in F+. The common attribute must be a super key for either R1 or R2. A B B C Case 1: Case 2: R1 R2 A foreign key A B B C R1 R2 A foreign key

Lossless Join Decomposition: Property 2 If (i) a decomposition D={R1,…,Rm} of R has the lossless join property with respect to a set of FDs F on R, and (ii) a decomposition Di={Q1,…,Q2} of Ri has the lossless join property with respect to the projection of F on Ri. then the decomposition D’={R1,R2,…, Ri-1,Q1,…,Qn,Ri+1,…,Rm} of R has the lossless join property with respect to F. Decomposition D Decomposition D3 Decomposition D’ R R3 R Q1 Q2 Q3 … Qn R1 R2 Q1 Q2 … Qn R4 … Rm R1 R2 R3 … Rm

Lossless Join Decomposition into BCNF relations • Algorithm: • 1 Set D{R} • 2While there is a relation schema Q in D that is not in BCNF do • begin • Choose a relation schema Q in D that is not in BCNF; • Find a functional dependency XY in Q that violates BCNF; • Replace Q in D by two schemas (Q-Y) and (XUY) • end; We have Since XY is in F, D={(Q-Y),(X Y)} has the lossless join property.

Exercise Determine whether D={R1,R2, R3} of R(S,E,P,N,L,H) is a lossless-join decomposition. R1={S,E} R2={P,N,L} R3={S,P,H} F={SE, SPH, PNL}

DEPENDENCY PRESERVATION R with a set of FDs F D={R1,…,Rn} is a decomposition of R. projection R1 R2 Rn FR1 FR2 FRn The projection of F on Ri (FRi) is defined as: D={R1,…,Rn} of R is dependency preserving with respect to F if . , meaning that is equivalent to F.

DEPENDENCY PRESERVATION • We want to preserve the dependencies because each FD in F represents a constraint on the database. • We want each original FD to be represented by some individual relation Ri so we can check the constraint without joining two or more relations. • Otherwise, each update would require to do join operations Contracts(contractid, supplierid, projectid, deptid, partid, qty, value) Example: Contracts (C S J D P Q V) CCSJDPQV JPC SDP JS Is it BCNF?

DEPENDENCY PRESERVATION CCSJDPQV JPC SDP JS Example: Contracts (C S J D P Q V) SDP SDP CSDJQV JS JS CJDQV Loss-less join decomposition?

DEPENDENCY PRESERVATION CCSJDPQV JPC SDP JS Example: Contracts (C S J D P Q V) SDP SDP CSDJQV JS JS CJDQV • Where JPC is in the result of the decomposition? • To enforce JPC, we need to join the three relations for each update

An alternative decomposition CCSJDPQV JPC SDP JS Example: Contracts (C S J D P Q V) J->S JS CJDPQV Is this decomposition lossless join and dependency preserving?

Question Can any relation be decomposed into BCNF while ensuring lossless join and dependency preservation?

Is CSZ in ? • In general, there may not be a dependency preserving decomposition into BCNF. • e.g., CSZ, CS  Z, Z  C • what NF? • Let’s consider a decomposition D={ZC,SZ}. • what NF? Is lossless-join decomposition? • Is CSZ preserved? • CS+={C,S,Z}; SZ+= {S,Z,C}; ZC+={Z,C} • R1(ZC); FR1={ZC} • R2(SZ); FR2={SZZ, SZS} • = {ZC,SZZ,SZS}+

Schema Refinement and Normal Forms