700 likes | 1.01k Views
CS157B Lecture 16. Lossless Decomposition& 4NF. Prof. Sin-Min Lee Department of Computer Science. 1. Closure. Given a set of functional dependencies, F, its closure, F + , is all FDs that are implied by FDs in F. e.g. If A B, and B C, then clearly A C.
E N D
CS157B Lecture 16 Lossless Decomposition& 4NF Prof. Sin-Min Lee Department of Computer Science
1. Closure • Given a set of functional dependencies, F, its closure, F+ , is all FDs that are implied by FDs in F. • e.g. If A B, and B C, • then clearly A C
Armstrong’s Axioms • We can find F+ by applying Armstrong’s Axioms: • if , then (reflexivity) • if , then (augmentation) • if , and , then (transitivity) • These rules are • sound (generate only functional dependencies that actually hold) and • complete (generate all functional dependencies that hold).
Additional rules • If and , then (union) • If , then and (decomposition) • If and , then (pseudotransitivity) The above rules can be inferred from Armstrong’s axioms.
Example • R = (A, B, C, G, H, I)F = { A BA CCG HCG IB H} • Some members of F+ • A H • by transitivity from A B and B H • AG I • by augmenting A C with G, to get AG CG and then transitivity with CG I • CG HI • by augmenting CG I to infer CG CGI, and augmenting of CG H to inferCGI HI, and then transitivity
2. Closure of an attribute set • Given a set of attributes A and a set of FDs F, closure of A under F is the set of all attributes implied by A • In other words, the largest B such that: • A B • Redefining super keys: • The closure of a super key is the entire relation schema • Redefining candidate keys: • 1. It is a super key • 2. No subset of it is a super key
Computing the closure for A • Simple algorithm • 1. Start with B = A. • 2. Go over all functional dependencies, , in F+ • 3. If B, then • Add to B • 4. Repeat till B changes
Example • R = (A, B, C, G, H, I)F = { A BA CCG HCG IB H} • (AG) + ? • 1. result = AG 2. result = ABCG (A C and A B) 3. result = ABCGH (CG H and CG AGBC) 4. result = ABCGHI (CG I and CG AGBCH Is (AG) a candidate key ? 1. It is a super key. 2. (A+) = BC, (G+) = G. YES.
Uses of attribute set closures • Determining superkeys and candidate keys • Determining if A B is a valid FD • Check if A+ contains B • Can be used to compute F+
Database Normalization Functional dependency (FD) means that if there is only one possible value of Y for every value of X, then Y is Functionally dependent on X. Is the following FDs hold?
Database Normalization • Functional Dependencyis “good”. With functional dependency the primary key (Attribute A) determines the value of all the other non-key attributes (Attributes B,C,D,etc.) • Transitive dependencyis “bad”. Transitive dependency exists if the primary/candidate key (Attribute A) determines non-key Attribute B, and Attribute B determines non-key Attribute C. • If a relation schema has more than one key, each is called a candidate key • An attribute in a relation schema R is called prim if it is a member of some candidate key of R
Third Normal Form (3NF) • Remove transitive dependencies. • Transitive dependency • A non-prime attribute is dependent on another, non-prime attribute or attributes • Attribute is the result of a calculation • Examples: • Area code attribute based on City attribute of a customer • Total price attribute of order entry based on quantity attribute and unit price attribute (calculated value) • Solution: • Any transitive dependencies are moved into a smaller table.
Transitive Dependence Give a relation R, Assume the following FD hold: Note : Both Ename and Address attributes are non-key attributes in R, and since Address depends on a non-Prime attribute Name, which depends on the primary key(EmpNo), a transitive dependency exists R2 R1 Note : If address is a prime attribute Then R is in 3NF
Modification Anomalies • What happens when you want to • add a new book? • change the address of a patron? • delete a patron record?
Modification Anomalies • Deletion anomaly • deleting one fact about an entity deletes a fact about another entity • Insertion anomaly • cannot insert one fact about an entity unless a fact about another entity is also added • Update anomaly • changing one fact about an entity requires multiple changes to a table
Referential Integrity Constraint • When we split a relation, we must pay attention to the references across the newly formed relations • E.g., a book must exist before it can be checked out: • CHECKOUT [BookID] ÍBOOK [BookID] • The DBMS or the applications will have to check/enforce constraints
Boyce-Codd Normal Form • Every determinant is a candidate key • ADVISER(SID,Major,Fname) • STU-ADV(SID,Fname)ADV-SUBJ(Fname,Subject)
Database Normalization • Boyce-Codd Normal Form (BCNF) • A relation is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key. (A determinant is any attribute whose value determines other values with a row.) • If a table contains only one candidate key, the 3NF and the BCNF are equivalent. • BCNF is a special case of 3NF.
A Table That Is In 3NF But Not In BCNF Figure 5.7
The Decomposition of a Table Structure to Meet BCNF Requirements Figure 5.8
Lossless-join Decomposition • For the case of R = (R1, R2), we require that for all possible relations r on schema R r = R1(r ) |X| R2(r ) • A decomposition of R into R1 and R2 is lossless join if and only if at least one of the following dependencies is in F+: • R1R2R1 • R1R2R2
R = (A, B, C) F = {A B, B C) • Can be decomposed in two different ways • R1 = (A, B), R2 = (B, C) • Lossless-join decomposition: R1 R2 = {B}and B BC • Dependency preserving • R1 = (A, B), R2 = (A, C) • Lossless-join decomposition: R1 R2 = {A}and A AB • Not dependency preserving (cannot check B C without computing R1 |X|R2)
Dependency Preservation • Let Fibe the set of dependencies F + that include only attributes in Ri. • A decomposition is dependency preserving, if (F1 F2 … Fn )+ = F + • If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive.
Dependency Preservation • To check if a dependency is preserved in a decomposition of R into R1, R2, …, Rn we apply the following test (with attribute closure done with respect to F) • result = while (changes to result) dofor eachRiin the decompositiont = (result Ri)+ Riresult = result t • If result contains all attributes in , then the functional dependency is preserved.
Dependency Preservation • We apply the test on all dependencies in F to check if a decomposition is dependency preserving • This procedure takes polynomial time, instead of the exponential time required to compute F+and(F1 F2 … Fn)+
FD Example • R = (A, B, C )F = {AB, B C}Key = {A} • R is not in BCNF • Decomposition R1 = (A, B), R2 = (B, C) • R1and R2 now in BCNF • Lossless-join decomposition • Dependency preserving
Aim of Normalization • Goal for a relational database design is: • BCNF. • Lossless join. • Dependency preservation. • If we cannot achieve this, we accept one of • Lack of dependency preservation • Redundancy due to use of 3NF
Sample Data for a BCNF Conversion Table 5.2
Perform lossless-join decompositions of each of the following scheme into BCNF schemes: R(A, B, C, D, E) with dependency set {AB CDE, C D, D E} A B C D A B C D C D A B C E A B C D D E A B C A B C C D D E
Given the FDs {B D, AB C, D B} and the relation {A, B, C, D}, give a two distinct lossless join decomposition to BNCF indicating the keys of each of the resulting relations. A B C D A B C D B D A B C B D A C D
Normalization • Good Decomposition • dependency preserving decomposition - it is undesirable to lose functional dependencies during decomposition • lossless join decomposition - join of decomposed relations should be able to create the original relation (no spurious tuples)
Decomposition and 4NF • If X ->->Y is a 4NF violation for relation R, we can decompose R using the same technique as for BCNF. • XY is one of the decomposed relations. • All but Y – X is the other.
Example • Drinkers(name, addr, phones, beersLiked) • FD: name -> addr • MVD’s: name ->-> phones • name ->-> beersLiked • Key is {name, phones, beersLiked}. • All dependencies violate 4NF.
Example, Continued • Decompose using name -> addr: • Drinkers1(name, addr) • In 4NF, only dependency is name -> addr. • Drinkers2(name, phones, beersLiked) • Not in 4NF. MVD’s name ->-> phones and name ->-> beersLiked apply. No FD’s, so all three attributes form the key.
Example: Decompose Drinkers2 • Either MVD name ->-> phones or name ->-> beersLiked tells us to decompose to: • Drinkers3(name, phones) • Drinkers4(name, beersLiked)
BCNF • Given a relation schema R, and a set of functional dependencies F, if every FD, A B, is either: • 1. Trivial • 2. A is a superkey of R • Then, R is in BCNF (Boyce-Codd Normal Form) • Why is BCNF good ?
BCNF • What if the schema is not in BCNF ? • Decompose (split) the schema into two pieces. • Careful: you want the decomposition to be lossless
Achieving BCNF Schemas • For all dependencies A B in F+, check if A is a superkey • By using attribute closure • If not, then • Choose a dependency in F+ that breaks the BCNF rules, say A B • Create R1 = A B • Create R2 = A (R – B – A) • Note that: R1 ∩ R2 = A and A AB (= R1), so this is lossless decomposition • Repeat for R1, and R2 • By defining F1+ to be all dependencies in F that contain only attributes in R1 • Similarly F2+
B C Example 1 • R = (A, B, C) • F = {A B, B C} • Candidate keys = {A} • BCNF = No. B C violates. • R1 = (B, C) • F1 = {B C} • Candidate keys = {B} • BCNF = true • R2 = (A, B) • F2 = {A B} • Candidate keys = {A} • BCNF = true
From A B and BC D by pseudo-transitivity AC D A B • R = (A, B, C, D, E) • F = {A B, BC D} • Candidate keys = {ACE} • BCNF = Violated by {A B, BC D} etc… Example 2-1 • R1 = (A, B) • F1 = {A B} • Candidate keys = {A} • BCNF = true • R2 = (A, C, D, E) • F2 = {AC D} • Candidate keys = {ACE} • BCNF = false (AC D) • Dependency preservation ??? • We can check: • A B (R1), AC D (R3), • but we lost BC D • So this is not a dependency • -preserving decomposition • R4 = (A, C, E) • F4 = {} [[ only trivial ]] • Candidate keys = {ACE} • BCNF = true • R3 = (A, C, D) • F3 = {AC D} • Candidate keys = {AC} • BCNF = true
BC D A B • R = (A, B, C, D, E) • F = {A B, BC D} • Candidate keys = {ACE} • BCNF = Violated by {A B, BC D} etc… Example 2-2 • R1 = (B, C, D) • F1 = {BC D} • Candidate keys = {BC} • BCNF = true • R2 = (B, C, A, E) • F2 = {A B} • Candidate keys = {ACE} • BCNF = false (A B) • Dependency preservation ??? • We can check: • BC D (R1), A B (R3), • Dependency-preserving • decomposition • R3 = (A, B) • F3 = {A B} • Candidate keys = {A} • BCNF = true • R4 = (A, C, E) • F4 = {} [[ only trivial ]] • Candidate keys = {ACE} • BCNF = true
E HA A BC Example 3 • R = (A, B, C, D, E, H) • F = {A BC, E HA} • Candidate keys = {DE} • BCNF = Violated by {A BC} etc… • R1 = (A, B, C) • F1 = {A BC} • Candidate keys = {A} • BCNF = true • R2 = (A, D, E, H) • F2 = {E HA} • Candidate keys = {DE} • BCNF = false (E HA) • Dependency preservation ??? • We can check: • A BC (R1), E HA (R3), • Dependency-preserving • decomposition • R4 = (ED) • F4 = {} [[ only trivial ]] • Candidate keys = {DE} • BCNF = true • R3 = (E, H, A) • F3 = {E HA} • Candidate keys = {E} • BCNF = true
Multi-valued Dependency • Two or more functionally independent multi-valued attributes are dependent on another attribute • EMPLOYEE(Name,Dependent,Project) • Data redundancy and modification anomalies • 4NF: BCNF & no multi-valued dependencies • EMPLOYEE(Name,Dependent) • EMPLOYEE(Name, Project)