Lossless Decomposition& 4NF

CS157B Lecture 16 Lossless Decomposition& 4NF Prof. Sin-Min Lee Department of Computer Science

1. Closure • Given a set of functional dependencies, F, its closure, F+ , is all FDs that are implied by FDs in F. • e.g. If A  B, and B  C, • then clearly A  C

Armstrong’s Axioms • We can find F+ by applying Armstrong’s Axioms: • if   , then   (reflexivity) • if  , then    (augmentation) • if  , and   , then   (transitivity) • These rules are • sound (generate only functional dependencies that actually hold) and • complete (generate all functional dependencies that hold).

Additional rules • If   and , then  (union) • If   , then   and (decomposition) • If   and  , then   (pseudotransitivity) The above rules can be inferred from Armstrong’s axioms.

Example • R = (A, B, C, G, H, I)F = { A BA CCG HCG IB H} • Some members of F+ • A H • by transitivity from A B and B H • AG I • by augmenting A C with G, to get AG CG and then transitivity with CG I • CG HI • by augmenting CG I to infer CG  CGI, and augmenting of CG H to inferCGI HI, and then transitivity

2. Closure of an attribute set • Given a set of attributes A and a set of FDs F, closure of A under F is the set of all attributes implied by A • In other words, the largest B such that: • A  B • Redefining super keys: • The closure of a super key is the entire relation schema • Redefining candidate keys: • 1. It is a super key • 2. No subset of it is a super key

Computing the closure for A • Simple algorithm • 1. Start with B = A. • 2. Go over all functional dependencies,  , in F+ • 3. If  B, then • Add  to B • 4. Repeat till B changes

Example • R = (A, B, C, G, H, I)F = { A BA CCG HCG IB H} • (AG) + ? • 1. result = AG 2. result = ABCG (A C and A  B) 3. result = ABCGH (CG H and CG  AGBC) 4. result = ABCGHI (CG I and CG  AGBCH Is (AG) a candidate key ? 1. It is a super key. 2. (A+) = BC, (G+) = G. YES.

Uses of attribute set closures • Determining superkeys and candidate keys • Determining if A  B is a valid FD • Check if A+ contains B • Can be used to compute F+

Database Normalization Functional dependency (FD) means that if there is only one possible value of Y for every value of X, then Y is Functionally dependent on X. Is the following FDs hold?

Database Normalization • Functional Dependencyis “good”. With functional dependency the primary key (Attribute A) determines the value of all the other non-key attributes (Attributes B,C,D,etc.) • Transitive dependencyis “bad”. Transitive dependency exists if the primary/candidate key (Attribute A) determines non-key Attribute B, and Attribute B determines non-key Attribute C. • If a relation schema has more than one key, each is called a candidate key • An attribute in a relation schema R is called prim if it is a member of some candidate key of R

Third Normal Form (3NF) • Remove transitive dependencies. • Transitive dependency • A non-prime attribute is dependent on another, non-prime attribute or attributes • Attribute is the result of a calculation • Examples: • Area code attribute based on City attribute of a customer • Total price attribute of order entry based on quantity attribute and unit price attribute (calculated value) • Solution: • Any transitive dependencies are moved into a smaller table.

Transitive Dependence Give a relation R, Assume the following FD hold: Note : Both Ename and Address attributes are non-key attributes in R, and since Address depends on a non-Prime attribute Name, which depends on the primary key(EmpNo), a transitive dependency exists R2 R1 Note : If address is a prime attribute Then R is in 3NF

Modification Anomalies • What happens when you want to • add a new book? • change the address of a patron? • delete a patron record?

Modification Anomalies • Deletion anomaly • deleting one fact about an entity deletes a fact about another entity • Insertion anomaly • cannot insert one fact about an entity unless a fact about another entity is also added • Update anomaly • changing one fact about an entity requires multiple changes to a table

Referential Integrity Constraint • When we split a relation, we must pay attention to the references across the newly formed relations • E.g., a book must exist before it can be checked out: • CHECKOUT [BookID] ÍBOOK [BookID] • The DBMS or the applications will have to check/enforce constraints

Boyce-Codd Normal Form • Every determinant is a candidate key • ADVISER(SID,Major,Fname) • STU-ADV(SID,Fname)ADV-SUBJ(Fname,Subject)

Database Normalization • Boyce-Codd Normal Form (BCNF) • A relation is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key. (A determinant is any attribute whose value determines other values with a row.) • If a table contains only one candidate key, the 3NF and the BCNF are equivalent. • BCNF is a special case of 3NF.

A Table That Is In 3NF But Not In BCNF Figure 5.7

The Decomposition of a Table Structure to Meet BCNF Requirements Figure 5.8

Lossless-join Decomposition • For the case of R = (R1, R2), we require that for all possible relations r on schema R r = R1(r ) |X| R2(r ) • A decomposition of R into R1 and R2 is lossless join if and only if at least one of the following dependencies is in F+: • R1R2R1 • R1R2R2

R = (A, B, C) F = {A B, B C) • Can be decomposed in two different ways • R1 = (A, B), R2 = (B, C) • Lossless-join decomposition: R1 R2 = {B}and B BC • Dependency preserving • R1 = (A, B), R2 = (A, C) • Lossless-join decomposition: R1 R2 = {A}and A  AB • Not dependency preserving (cannot check B C without computing R1 |X|R2)

Dependency Preservation • Let Fibe the set of dependencies F + that include only attributes in Ri. • A decomposition is dependency preserving, if (F1 F2  … Fn )+ = F + • If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive.

Dependency Preservation • To check if a dependency  is preserved in a decomposition of R into R1, R2, …, Rn we apply the following test (with attribute closure done with respect to F) • result = while (changes to result) dofor eachRiin the decompositiont = (result Ri)+ Riresult = result t • If result contains all attributes in , then the functional dependency  is preserved.

Dependency Preservation • We apply the test on all dependencies in F to check if a decomposition is dependency preserving • This procedure takes polynomial time, instead of the exponential time required to compute F+and(F1 F2 …  Fn)+

FD Example • R = (A, B, C )F = {AB, B  C}Key = {A} • R is not in BCNF • Decomposition R1 = (A, B), R2 = (B, C) • R1and R2 now in BCNF • Lossless-join decomposition • Dependency preserving

A Lossy Decomposition

Aim of Normalization • Goal for a relational database design is: • BCNF. • Lossless join. • Dependency preservation. • If we cannot achieve this, we accept one of • Lack of dependency preservation • Redundancy due to use of 3NF

Sample Data for a BCNF Conversion Table 5.2

Decomposition into BCNF

Perform lossless-join decompositions of each of the following scheme into BCNF schemes: R(A, B, C, D, E) with dependency set {AB  CDE, C  D, D  E} A B C D A B C D C D A B C E A B C D D E A B C A B C C D D E

Given the FDs {B  D, AB  C, D  B} and the relation {A, B, C, D}, give a two distinct lossless join decomposition to BNCF indicating the keys of each of the resulting relations. A B C D A B C D B D A B C B D A C D

Normalization • Good Decomposition • dependency preserving decomposition - it is undesirable to lose functional dependencies during decomposition • lossless join decomposition - join of decomposed relations should be able to create the original relation (no spurious tuples)

Decomposition and 4NF • If X ->->Y is a 4NF violation for relation R, we can decompose R using the same technique as for BCNF. • XY is one of the decomposed relations. • All but Y – X is the other.

Example • Drinkers(name, addr, phones, beersLiked) • FD: name -> addr • MVD’s: name ->-> phones • name ->-> beersLiked • Key is {name, phones, beersLiked}. • All dependencies violate 4NF.

Example, Continued • Decompose using name -> addr: • Drinkers1(name, addr) • In 4NF, only dependency is name -> addr. • Drinkers2(name, phones, beersLiked) • Not in 4NF. MVD’s name ->-> phones and name ->-> beersLiked apply. No FD’s, so all three attributes form the key.

Example: Decompose Drinkers2 • Either MVD name ->-> phones or name ->-> beersLiked tells us to decompose to: • Drinkers3(name, phones) • Drinkers4(name, beersLiked)

BCNF • Given a relation schema R, and a set of functional dependencies F, if every FD, A  B, is either: • 1. Trivial • 2. A is a superkey of R • Then, R is in BCNF (Boyce-Codd Normal Form) • Why is BCNF good ?

BCNF • What if the schema is not in BCNF ? • Decompose (split) the schema into two pieces. • Careful: you want the decomposition to be lossless

Achieving BCNF Schemas • For all dependencies A  B in F+, check if A is a superkey • By using attribute closure • If not, then • Choose a dependency in F+ that breaks the BCNF rules, say A  B • Create R1 = A B • Create R2 = A (R – B – A) • Note that: R1 ∩ R2 = A and A  AB (= R1), so this is lossless decomposition • Repeat for R1, and R2 • By defining F1+ to be all dependencies in F that contain only attributes in R1 • Similarly F2+

B  C Example 1 • R = (A, B, C) • F = {A  B, B  C} • Candidate keys = {A} • BCNF = No. B  C violates. • R1 = (B, C) • F1 = {B  C} • Candidate keys = {B} • BCNF = true • R2 = (A, B) • F2 = {A  B} • Candidate keys = {A} • BCNF = true

From A  B and BC  D by pseudo-transitivity AC  D A  B • R = (A, B, C, D, E) • F = {A  B, BC  D} • Candidate keys = {ACE} • BCNF = Violated by {A  B, BC  D} etc… Example 2-1 • R1 = (A, B) • F1 = {A  B} • Candidate keys = {A} • BCNF = true • R2 = (A, C, D, E) • F2 = {AC  D} • Candidate keys = {ACE} • BCNF = false (AC  D) • Dependency preservation ??? • We can check: • A  B (R1), AC  D (R3), • but we lost BC  D • So this is not a dependency • -preserving decomposition • R4 = (A, C, E) • F4 = {} [[ only trivial ]] • Candidate keys = {ACE} • BCNF = true • R3 = (A, C, D) • F3 = {AC  D} • Candidate keys = {AC} • BCNF = true

BC  D A  B • R = (A, B, C, D, E) • F = {A  B, BC  D} • Candidate keys = {ACE} • BCNF = Violated by {A  B, BC  D} etc… Example 2-2 • R1 = (B, C, D) • F1 = {BC  D} • Candidate keys = {BC} • BCNF = true • R2 = (B, C, A, E) • F2 = {A  B} • Candidate keys = {ACE} • BCNF = false (A  B) • Dependency preservation ??? • We can check: • BC  D (R1), A  B (R3), • Dependency-preserving • decomposition • R3 = (A, B) • F3 = {A  B} • Candidate keys = {A} • BCNF = true • R4 = (A, C, E) • F4 = {} [[ only trivial ]] • Candidate keys = {ACE} • BCNF = true

E  HA A  BC Example 3 • R = (A, B, C, D, E, H) • F = {A  BC, E  HA} • Candidate keys = {DE} • BCNF = Violated by {A  BC} etc… • R1 = (A, B, C) • F1 = {A  BC} • Candidate keys = {A} • BCNF = true • R2 = (A, D, E, H) • F2 = {E  HA} • Candidate keys = {DE} • BCNF = false (E  HA) • Dependency preservation ??? • We can check: • A  BC (R1), E  HA (R3), • Dependency-preserving • decomposition • R4 = (ED) • F4 = {} [[ only trivial ]] • Candidate keys = {DE} • BCNF = true • R3 = (E, H, A) • F3 = {E  HA} • Candidate keys = {E} • BCNF = true

Multi-valued Dependency • Two or more functionally independent multi-valued attributes are dependent on another attribute • EMPLOYEE(Name,Dependent,Project) • Data redundancy and modification anomalies • 4NF: BCNF & no multi-valued dependencies • EMPLOYEE(Name,Dependent) • EMPLOYEE(Name, Project)

Lossless Decomposition& 4NF