460 likes | 641 Views
Normalization. Lossless Decomposition. Theorem A decomposition of R into R 1 and R 2 is lossless join wrt FDs F, if and only if at least one of the following dependencies is in F + : R 1 R 2 R 1 R 1 R 2 R 2
E N D
Lossless Decomposition Theorem A decomposition of R into R1 and R2 is lossless join wrt FDs F, if and only if at least one of the following dependencies is in F+: • R1 R2 R1 • R1 R2 R2 In other words, R1 R2 forms a superkey of either R1 or R2
Lossless Decomposition • Observe that S satisfies the FDs: • S# Status & S# City • It can not be a coincidence that S is equal to the join of its projections on {S#, Status} & {S#, City} • Heaths’ Theorem: Let R{A,B,C} be a relation, where A, B, & C are sets of attributes. If R satisfies AB & AC, then R is equal to the join of its projections on {A,B} & {A,C} • Observe that in the second decomposition of S the FD, S# City is lost
Lossless Decomposition • The decomposition of R into R1, R2, …Rn is lossless if for any instance r of R r = R1(r ) R2(r ) …… Rn(r ) • We can replace R by R1 & R2, knowing that the instance of R can be recovered from the instances of R1 & R2 • We can use FDs to show that decompositions are lossless
Decomposition Goal #2: Dependency preservation Goal: efficient integrity checks of FD’s An example w/ no DP: R = ( bname, bcity, assets, cname, lno, amt) bname bcity assets lno amt bname Decomposition: R = R1 U R2 R1 = (bname, assets, cname, lno) R2 = (lno, bcity, amt) Lossless but not DP. Why? Ans: bname bcity assets crosses 2 tables
Decomposition Goal #2: Dependency preservation To ensure best possible efficiency of FD checks ensure that only a SINGLE table is needed in order to check each FD i.e. ensure that: A1 A2 ... An B1 B2 ... Bm Can be checked by examining Ri = ( ..., A1, A2, ..., An, ..., B1, ..., Bm, ...) To test if the decomposition R = R1 U R2 U ... U Rn is DP (1) see which FD’s of R are covered by R1, R2, ..., Rn (2) compare the closure of (1) with the closure of FD’s of R
Decomposition Goal #2: Dependency preservation Example: Given F = { AB, AB D, C D} consider R = R1 U R2 s.t. R1 = (A, B, D) , R2 = (C, D) (1) F+ = { ABD, CD}+ (2) G = {ABD, CD, ...} + (3) F+ = G+ note: G+ cannot introduce new FDs not in F+ Decomposition is DP
Dependency Preservation • Let Fibe the set of dependencies F + that include only attributes in Ri. • A decomposition is dependency preserving, if (F1 F2 … Fn )+ = F + • If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive.
In Layman’s Term • Each Functional Dependency specified in F either appears directly in one of the relations in the decomposition.
Continue… • It is not necessary that all dependencies from the relation R appear in some relation Ri. It is sufficient that the union of the dependencies on all the relations Ri be equivalent to the dependencies on R.
Property of Dependency-Preservation • If a decomposition is not dependency-preserving, therefore, that dependency is lost in the decomposition. FD4 FD3 FD1 FD2
Example of Dependency Preservation • R(A B C D) • FD1: A B • FD2: B C • FD3: C D • Decomposition:R1(A B C) R2(C D)
FD1: A B • FD2: B C • FD3: C D R1( A B C ) FD1 FD2
FD1: A B • FD2: B C • FD3: C D R2( C D ) FD3
FD1: A B • FD2: B C • FD3: C D R1( A B C ) R2( C D ) FD1 FD3 FD2 Has all 3 functional dependencies! Therefore, it’s preserving the dependencies
Example of Non-Dependency Preservation • R(A B C D) • FD1: A B • FD2: B C • FD3: C D • Decomposition:R1(A C D) R2(B C)
FD1: A B • FD2: B C • FD3: C D R1( A C D ) FD3
FD1: A B • FD2: B C • FD3: C D R2( B C ) FD2
FD1: A B • FD2: B C • FD3: C D R1( A C D ) R2( B C ) FD2 FD3 Does not support FD1: A => B Therefore, it does not preserve the dependencies
More Example • R(A B C D E) • FD1: A B • FD2: BC D • Decomposition:R1(A C E) R2(B C D) R3(A B)
FD1: A B • FD2: BC D R1( A C E ) No Dependencies
FD1: A B • FD2: BC D R2( B C D ) FD2
FD1: A B • FD2: BC D R3( A B ) FD1
FD1: A B • FD2: BC D R1( A C E ) R2( B C D ) FD2 R3( A B ) FD1 Has all 2 functional dependencies! Therefore, it’s preserving the dependencies
R(A, B, C, D, E, F ) • FD1: DA, B • FD2: C E, F • Decomposition:R1( A, C, D )R2( A, D, B ) R3( D, E, F )R4( C, E, F )
FD1: DA, B FD2: C E, F R1( A C D )R2( A D B )R3( D E F ) R4( C E F ) FD1 FD2
Answer • Yes! This is a dependency-preservation
To check if a dependency is preserved in a decomposition of R into R1, R2, …, Rn we apply the following test (with attribute closure done with respect to F) result = while (changes to result) dofor eachRiin the decompositiont = (result Ri)+ Riresult = result t If result contains all attributes in , then the functional dependency is preserved. We apply the test on all dependencies in F to check if a decomposition is dependency preserving This procedure takes polynomial time, instead of the exponential time required to compute F+and(F1 F2 … Fn)+ Testing for Dependency Preservation
Example • R = (A, B, C)F = {A B, B C) • Can be decomposed in two different ways • R1 = (A, B), R2 = (B, C) • Lossless-join decomposition: R1 R2 = {B}and B BC • Dependency preserving • R1 = (A, B), R2 = (A, C) • Lossless-join decomposition: R1 R2 = {A}and A AB • Not dependency preserving (cannot check B C without computing R1 R2)
Decomposition Goal #3: Redudancy Avoidance Example: Redundancy for B=x , y and z (1) An FD that exists in the above relation is: B C (2) A superkey in the above relation is A, (or any set containing A) When do you have redundancy? Ans: when there is some FD, XY covered by a relation and X is not a superkey
Problems with Decompositions There are three potential problems to consider: • Some queries become more expensive • e.g., What is the price of prop# 1? • Given instances of the decomposed relations, we may not be able to reconstruct the corresponding instance of the original relation! • Fortunately, not in the PLOTS example • Checking some dependencies may require joining the instances of the decomposed relations. • Fortunately, not in the PLOTS example Tradeoff:Must consider these issues vs. redundancy
Example • R = (A, B, C )F = {AB B C}Key = {A} • R is not in BCNF (B C but B is not superkey) • Decomposition R1 = (A, B), R2 = (B, C) • R1and R2 in BCNF • Lossless-join decomposition • Dependency preserving
Testing for BCNF • To check if a non-trivial dependency causes a violation of BCNF 1. compute + (the attribute closure of ), and 2. verify that it includes all attributes of R, that is, it is a superkey of R. • Simplified test: To check if a relation schema R is in BCNF, it suffices to check only the dependencies in the given set F for violation of BCNF, rather than checking all dependencies in F+. • If none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either. • However, simplified test using only F isincorrectwhen testing a relation in a decomposition of R • Consider R = (A, B, C, D, E), with F = { A B, BC D} • Decompose R into R1 =(A,B) and R2 =(A,C,D, E) • Neither of the dependencies in F contain only attributes from (A,C,D,E) so we might be mislead into thinking R2 satisfies BCNF. • In fact, dependency ACD in F+ shows R2 is not in BCNF.
BCNF and Dependency Preservation It is not always possible to get a BCNF decomposition that is dependency preserving • R = (J, K, L )F = {JK L L K }Two candidate keys = JK and JL • R is not in BCNF • Any decomposition of R will fail to preserve JK L This implies that testing for JK L requires a join
Third Normal Form: Motivation • There are some situations where • BCNF is not dependency preserving, and • efficient checking for FD violation on updates is important • Solution: define a weaker normal form, called Third Normal Form (3NF) • Allows some redundancy (with resultant problems; we will see examples later) • But functional dependencies can be checked on individual relations without computing a join. • There is always a lossless-join, dependency-preserving decomposition into 3NF.
Redundancy in 3NF • There is some redundancy in this schema • Example of problems due to redundancy in 3NF • R = (J, K, L)F = {JK L, L K } J L K j1 j2 j3 null l1 l1 l1 l2 k1 k1 k1 k2 • repetition of information (e.g., the relationship l1, k1) • (i_ID, dept_name) • need to use null values (e.g., to represent the relationshipl2, k2 where there is no corresponding value for J). • (i_ID, dept_nameI) if there is no separate relation mapping instructors to departments
Testing for 3NF • Optimization: Need to check only FDs in F, need not check all FDs in F+. • Use attribute closure to check for each dependency , if is a superkey. • If is not a superkey, we have to verify if each attribute in is contained in a candidate key of R • this test is rather more expensive, since it involve finding candidate keys • testing for 3NF has been shown to be NP-hard • Interestingly, decomposition into third normal form (described shortly) can be done in polynomial time
3NF Decomposition Algorithm Let Fcbe a canonical cover for F;i := 0;for each functional dependency in Fcdo if none of the schemas Rj, 1 j i contains then begini := i + 1;Ri := endif none of the schemas Rj, 1 j i contains a candidate key for Rthen begini := i + 1;Ri := any candidate key for R;end /* Optionally, remove redundant relations */ repeatif any schema Rj is contained in another schema Rkthen /* delete Rj */Rj = R;; i=i-1;return (R1, R2, ..., Ri)
Testing Decomposition for BCNF • To check if a relation Ri in a decomposition of R is in BCNF, • Either test Ri for BCNF with respect to the restriction of F to Ri (that is, all FDs in F+ that contain only attributes from Ri) • or use the original set of dependencies F that hold on R, but with the following test: • for every set of attributes Ri, check that + (the attribute closure of ) either includes no attribute of Ri- , or includes all attributes of Ri. • If the condition is violated by some in F, the dependency (+ - ) Rican be shown to hold on Ri, and Ri violates BCNF. • We use above dependency to decompose Ri
BCNF Decomposition Algorithm result := {R };done := false;compute F +;while (not done) do if (there is a schema Riin result that is not in BCNF)then beginlet be a nontrivial functional dependency that holds on Risuch that Riis not in F +, and = ;result := (result – Ri ) (Ri – ) (, );end else done := true; Note: each Riis in BCNF, and decomposition is lossless-join.
Example of BCNF Decomposition • class (course_id, title, dept_name, credits, sec_id, semester, year, building, room_number, capacity, time_slot_id) • Functional dependencies: • course_id→ title, dept_name, credits • building, room_number→capacity • course_id, sec_id, semester, year→building, room_number, time_slot_id • A candidate key {course_id, sec_id, semester, year}. • BCNF Decomposition: • course_id→ title, dept_name, credits holds • but course_id is not a superkey. • We replace class by: • course(course_id, title, dept_name, credits) • class-1 (course_id, sec_id, semester, year, building, room_number, capacity, time_slot_id)
BCNF Decomposition (Cont.) • course is in BCNF • How do we know this? • building, room_number→capacity holds on class-1 • but {building, room_number} is not a superkey for class-1. • We replace class-1 by: • classroom (building, room_number, capacity) • section (course_id, sec_id, semester, year, building, room_number, time_slot_id) • classroom and section are in BCNF.