Normalization

Normalization

Lossless Decomposition Theorem A decomposition of R into R1 and R2 is lossless join wrt FDs F, if and only if at least one of the following dependencies is in F+: • R1  R2 R1 • R1  R2 R2 In other words, R1  R2 forms a superkey of either R1 or R2

Lossy Decomposition

Lossless Decomposition • Observe that S satisfies the FDs: • S#  Status & S#  City • It can not be a coincidence that S is equal to the join of its projections on {S#, Status} & {S#, City} • Heaths’ Theorem: Let R{A,B,C} be a relation, where A, B, & C are sets of attributes. If R satisfies AB & AC, then R is equal to the join of its projections on {A,B} & {A,C} • Observe that in the second decomposition of S the FD, S#  City is lost

Lossless Decomposition • The decomposition of R into R1, R2, …Rn is lossless if for any instance r of R r = R1(r ) R2(r ) …… Rn(r ) • We can replace R by R1 & R2, knowing that the instance of R can be recovered from the instances of R1 & R2 • We can use FDs to show that decompositions are lossless

Decomposition Goal #2: Dependency preservation Goal: efficient integrity checks of FD’s An example w/ no DP: R = ( bname, bcity, assets, cname, lno, amt) bname  bcity assets lno  amt bname Decomposition: R = R1 U R2 R1 = (bname, assets, cname, lno) R2 = (lno, bcity, amt) Lossless but not DP. Why? Ans: bname bcity assets crosses 2 tables

Decomposition Goal #2: Dependency preservation To ensure best possible efficiency of FD checks ensure that only a SINGLE table is needed in order to check each FD i.e. ensure that: A1 A2 ... An  B1 B2 ... Bm Can be checked by examining Ri = ( ..., A1, A2, ..., An, ..., B1, ..., Bm, ...) To test if the decomposition R = R1 U R2 U ... U Rn is DP (1) see which FD’s of R are covered by R1, R2, ..., Rn (2) compare the closure of (1) with the closure of FD’s of R

Decomposition Goal #2: Dependency preservation Example: Given F = { AB, AB D, C D} consider R = R1 U R2 s.t. R1 = (A, B, D) , R2 = (C, D) (1) F+ = { ABD, CD}+ (2) G = {ABD, CD, ...} + (3) F+ = G+ note: G+ cannot introduce new FDs not in F+ Decomposition is DP

Dependency Preservation • Let Fibe the set of dependencies F + that include only attributes in Ri. • A decomposition is dependency preserving, if (F1 F2  …  Fn )+ = F + • If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive.

In Layman’s Term • Each Functional Dependency specified in F either appears directly in one of the relations in the decomposition.

Continue… • It is not necessary that all dependencies from the relation R appear in some relation Ri. It is sufficient that the union of the dependencies on all the relations Ri be equivalent to the dependencies on R.

Property of Dependency-Preservation • If a decomposition is not dependency-preserving, therefore, that dependency is lost in the decomposition. FD4 FD3 FD1 FD2

Example of Dependency Preservation • R(A B C D) • FD1: A  B • FD2: B  C • FD3: C  D • Decomposition:R1(A B C) R2(C D)

FD1: A  B • FD2: B  C • FD3: C  D R1( A B C ) FD1 FD2

FD1: A  B • FD2: B  C • FD3: C  D R2( C D ) FD3

FD1: A  B • FD2: B  C • FD3: C  D R1( A B C ) R2( C D ) FD1 FD3 FD2 Has all 3 functional dependencies! Therefore, it’s preserving the dependencies

Example of Non-Dependency Preservation • R(A B C D) • FD1: A  B • FD2: B  C • FD3: C  D • Decomposition:R1(A C D) R2(B C)

FD1: A  B • FD2: B  C • FD3: C  D R1( A C D ) FD3

FD1: A  B • FD2: B  C • FD3: C  D R2( B C ) FD2

FD1: A  B • FD2: B  C • FD3: C  D R1( A C D ) R2( B C ) FD2 FD3 Does not support FD1: A => B Therefore, it does not preserve the dependencies

More Example • R(A B C D E) • FD1: A  B • FD2: BC  D • Decomposition:R1(A C E) R2(B C D) R3(A B)

FD1: A  B • FD2: BC  D R1( A C E ) No Dependencies

FD1: A  B • FD2: BC  D R2( B C D ) FD2

FD1: A  B • FD2: BC  D R3( A B ) FD1

FD1: A  B • FD2: BC  D R1( A C E ) R2( B C D ) FD2 R3( A B ) FD1 Has all 2 functional dependencies! Therefore, it’s preserving the dependencies

R(A, B, C, D, E, F ) • FD1: DA, B • FD2: C  E, F • Decomposition:R1( A, C, D )R2( A, D, B ) R3( D, E, F )R4( C, E, F )

FD1: DA, B FD2: C  E, F R1( A C D )R2( A D B )R3( D E F ) R4( C E F ) FD1 FD2

Answer • Yes! This is a dependency-preservation

To check if a dependency    is preserved in a decomposition of R into R1, R2, …, Rn we apply the following test (with attribute closure done with respect to F) result = while (changes to result) dofor eachRiin the decompositiont = (result  Ri)+  Riresult = result  t If result contains all attributes in , then the functional dependency    is preserved. We apply the test on all dependencies in F to check if a decomposition is dependency preserving This procedure takes polynomial time, instead of the exponential time required to compute F+and(F1 F2  …  Fn)+ Testing for Dependency Preservation

Example • R = (A, B, C)F = {A B, B C) • Can be decomposed in two different ways • R1 = (A, B), R2 = (B, C) • Lossless-join decomposition: R1  R2 = {B}and B BC • Dependency preserving • R1 = (A, B), R2 = (A, C) • Lossless-join decomposition: R1  R2 = {A}and A  AB • Not dependency preserving (cannot check B C without computing R1 R2)

Decomposition Goal #3: Redudancy Avoidance Example: Redundancy for B=x , y and z (1) An FD that exists in the above relation is: B  C (2) A superkey in the above relation is A, (or any set containing A) When do you have redundancy? Ans: when there is some FD, XY covered by a relation and X is not a superkey

Problems with Decompositions There are three potential problems to consider: • Some queries become more expensive • e.g., What is the price of prop# 1? • Given instances of the decomposed relations, we may not be able to reconstruct the corresponding instance of the original relation! • Fortunately, not in the PLOTS example • Checking some dependencies may require joining the instances of the decomposed relations. • Fortunately, not in the PLOTS example Tradeoff:Must consider these issues vs. redundancy

Example • R = (A, B, C )F = {AB B  C}Key = {A} • R is not in BCNF (B  C but B is not superkey) • Decomposition R1 = (A, B), R2 = (B, C) • R1and R2 in BCNF • Lossless-join decomposition • Dependency preserving

Testing for BCNF • To check if a non-trivial dependency causes a violation of BCNF 1. compute + (the attribute closure of ), and 2. verify that it includes all attributes of R, that is, it is a superkey of R. • Simplified test: To check if a relation schema R is in BCNF, it suffices to check only the dependencies in the given set F for violation of BCNF, rather than checking all dependencies in F+. • If none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either. • However, simplified test using only F isincorrectwhen testing a relation in a decomposition of R • Consider R = (A, B, C, D, E), with F = { A  B, BC  D} • Decompose R into R1 =(A,B) and R2 =(A,C,D, E) • Neither of the dependencies in F contain only attributes from (A,C,D,E) so we might be mislead into thinking R2 satisfies BCNF. • In fact, dependency ACD in F+ shows R2 is not in BCNF.

BCNF and Dependency Preservation It is not always possible to get a BCNF decomposition that is dependency preserving • R = (J, K, L )F = {JK L L K }Two candidate keys = JK and JL • R is not in BCNF • Any decomposition of R will fail to preserve JK L This implies that testing for JK L requires a join

Third Normal Form: Motivation • There are some situations where • BCNF is not dependency preserving, and • efficient checking for FD violation on updates is important • Solution: define a weaker normal form, called Third Normal Form (3NF) • Allows some redundancy (with resultant problems; we will see examples later) • But functional dependencies can be checked on individual relations without computing a join. • There is always a lossless-join, dependency-preserving decomposition into 3NF.

Redundancy in 3NF • There is some redundancy in this schema • Example of problems due to redundancy in 3NF • R = (J, K, L)F = {JK L, L K } J L K j1 j2 j3 null l1 l1 l1 l2 k1 k1 k1 k2 • repetition of information (e.g., the relationship l1, k1) • (i_ID, dept_name) • need to use null values (e.g., to represent the relationshipl2, k2 where there is no corresponding value for J). • (i_ID, dept_nameI) if there is no separate relation mapping instructors to departments

Testing for 3NF • Optimization: Need to check only FDs in F, need not check all FDs in F+. • Use attribute closure to check for each dependency   , if  is a superkey. • If  is not a superkey, we have to verify if each attribute in  is contained in a candidate key of R • this test is rather more expensive, since it involve finding candidate keys • testing for 3NF has been shown to be NP-hard • Interestingly, decomposition into third normal form (described shortly) can be done in polynomial time

3NF Decomposition Algorithm Let Fcbe a canonical cover for F;i := 0;for each functional dependency in Fcdo if none of the schemas Rj, 1  j  i contains then begini := i + 1;Ri := endif none of the schemas Rj, 1  j  i contains a candidate key for Rthen begini := i + 1;Ri := any candidate key for R;end /* Optionally, remove redundant relations */ repeatif any schema Rj is contained in another schema Rkthen /* delete Rj */Rj = R;; i=i-1;return (R1, R2, ..., Ri)

Testing Decomposition for BCNF • To check if a relation Ri in a decomposition of R is in BCNF, • Either test Ri for BCNF with respect to the restriction of F to Ri (that is, all FDs in F+ that contain only attributes from Ri) • or use the original set of dependencies F that hold on R, but with the following test: • for every set of attributes   Ri, check that + (the attribute closure of ) either includes no attribute of Ri- , or includes all attributes of Ri. • If the condition is violated by some   in F, the dependency (+ - )  Rican be shown to hold on Ri, and Ri violates BCNF. • We use above dependency to decompose Ri

BCNF Decomposition Algorithm result := {R };done := false;compute F +;while (not done) do if (there is a schema Riin result that is not in BCNF)then beginlet   be a nontrivial functional dependency that holds on Risuch that  Riis not in F +, and   = ;result := (result – Ri )  (Ri – )  (,  );end else done := true; Note: each Riis in BCNF, and decomposition is lossless-join.

Example of BCNF Decomposition • class (course_id, title, dept_name, credits, sec_id, semester, year, building, room_number, capacity, time_slot_id) • Functional dependencies: • course_id→ title, dept_name, credits • building, room_number→capacity • course_id, sec_id, semester, year→building, room_number, time_slot_id • A candidate key {course_id, sec_id, semester, year}. • BCNF Decomposition: • course_id→ title, dept_name, credits holds • but course_id is not a superkey. • We replace class by: • course(course_id, title, dept_name, credits) • class-1 (course_id, sec_id, semester, year, building, room_number, capacity, time_slot_id)

BCNF Decomposition (Cont.) • course is in BCNF • How do we know this? • building, room_number→capacity holds on class-1 • but {building, room_number} is not a superkey for class-1. • We replace class-1 by: • classroom (building, room_number, capacity) • section (course_id, sec_id, semester, year, building, room_number, time_slot_id) • classroom and section are in BCNF.

Normalization

Normalization

Presentation Transcript

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

NORMALIZATION

Normalization

Normalization

Normalization

Normalization