280 likes | 389 Views
Logical Database Design (3 of 3). John Ortiz. Normalization. If a relation is not in BCNF or 3NF, we refine it by decomposing it into two or more smaller relation schemas that are in the normal form. Decomposition has to be used carefully, since there are potential problems.
E N D
Logical Database Design (3 of 3) John Ortiz
Normalization • If a relation is not in BCNF or 3NF, we refine it by decomposing it into two or more smaller relation schemas that are in the normal form. • Decomposition has to be used carefully, since there are potential problems. • What are desirable properties of a decomposition, and how to test them? • How to obtain a decomposition with some desirable properties? Logical Database Design (2)
Decomposition of a Relation • Let R be a relation schema. A decomposition of R, demoted by D = {R1, R2, ..., Rn}, is a set of relation schemas such that R = R1 ... Rn. • If {R1, R2, ..., Rn} is a decomposition of R and r is an instance of R, then r R1(r) R2(r) . . . Rn(r) • Information may be lost (i.e. wrong tuples may be added by the natural join) due to a decomposition. Logical Database Design (2)
SC SR SG SR SG An Example of Information Loss • Before • After Logical Database Design (2)
Lossless Join Decomposition • Let R be a relation schema, and D = {R1, R2, ..., Rn} be a decomposition of R. D is a lossless (non-additive) join decomposition of R if for every legal instance r of R, we have r = R1(r) R2(r) . . . Rn(r) • Theorem: Let F be a set of FDs over R, and D = {R1, R2} be a decomposition of R. D is a lossless-join decomposition if and only if • R1 R2 R1 - R2 is in F+; or • R1 R2 R2 - R1 is in F+. Logical Database Design (2)
Lossless Join: An Example Consider F = {B AH, L CAt} over Bank-Loans(Bank, Assets, Headquarter, Loan#, Customer, Amount). Let D = {Banks(B,A,H), Loans(B,L,C,At)}. Since Banks Loans = B AH = Banks - Loans is in F+ (since it is already in F), D is a lossless-join decomposition. • What if the decomposition contains more than two relations. Logical Database Design (2)
Test for Lossless Join * AlgorithmTestLJ (Chase) Input: A relation schema R(A1, …, Am), a set of FDs F, and a decomposition D = {R1, …, Rn}. Output: Yes, if D is a Lossless join; no, otherwise. Method: • Create an n m table T (labeled by Ai and Rj). • If Ri contains Aj, place aj at Ti,j. Otherwise, place bij at Ti,j. Logical Database Design (2)
TestLJ (cont.) * • Repeat for each FD X Y in F do For all rows with identical symbols on X do make the symbols on Y identical. (choose aj over bij whenever possible) Until no more change can be made. • Return yes if there is a row of aj’s. Otherwise, return no. Logical Database Design (2)
B A H L C At BAH a1 a2 a3 b14 b15 b16 BLCAt a1 b22 b23 a4 a5 a6 B A H L C At BAH a1 a2 a3 b14 b15 b16 BLCAt a1a2 a3 a4 a5 a6 TestLJ: An Example Continue with the previous example. • Set up the table T. • Enforce B AH. • Need to repeat until no more changes. Logical Database Design (2)
Dependency-Preserving Decomposition • Let F be a set of FDs over R, and D = {R1, R2, ..., Rn} be a decomposition of R. D is a dependency-preserving decomposition if F+ = (R1(F) R2(F) . . . Rn(F))+ where for i = 1, …, n Ri(F) = { X Y | X Y F and XY Ri}. • Restrict FDs to local relations. If all “global” FDs can be derived from “local” FDs, all dependencies are preserved. Logical Database Design (2)
Dependency Preservation: An Example Consider F = {CS Z, Z C} over R(City, Street, Zipcode), and D ={R1(S, Z), R2(C, Z)}. Then R1(F) = {} and R2(F) = {Z C} (consider non-trivial FDs only) Since CS Z F+ but CS Z (R1(F) R2(F))+, D is not dependency-preserving. Logical Database Design (2)
Test for Dependency Preservation Algorithm TestDP Input: A relation schema R, A set of FDs F over R, a decomposition D= {R1, R2, ..., Rn} of R. Output: Yes, if D is dependency-preserving; no, otherwise. Method: for every X Y F if Ri such that XY Ri then X Y is preserved; Logical Database Design (2)
TestDP (cont.) else W := X; repeat for i from 1 to n do W := W ((W Ri)+ Ri); until there is no change to W; if Y W then X Y is preserved; if every X Y is preserved then return yes; else return no. • Derive global FDs using only local FDs. Logical Database Design (2)
TestDP: An example Consider F = {A B, B C, C D, D A } over R(A, B, C, D), & D = {R1(A,B), R2(B,C), R3(C,D)}. Is D a dependency-preserving decomposition? Since AB R1, A B is preserved. Since BC R2, B C is preserved. Since CD R3, C D is preserved. Since DA is not in any one of the three relations, we need to compute W. Logical Database Design (2)
TestDP: An example (cont.) * Initialization: W = D; first iteration: W = D ((D AB)+ AB) = D; W = D ((D BC)+ BC) = D; W = D ((D CD)+ CD) = D (D+ CD) = D (ABCD CD) = CD; • W changed from D to CD. Logical Database Design (2)
TestDP: An example (cont.) * second iteration: W = CD ((CD AB)+ AB) = CD; W = CD ((CD BC)+ BC) = CD (C+ BC) = BCD; W = BCD ((BCD CD)+ CD) = BCD; • W changed from CD to BCD. Logical Database Design (2)
Normalization • It is good to have BCNF relation schemas. • If a relation schema is not in BCNF, then decompose it into a set of relation schemas: • every new schema is in BCNF; • it is lossless-join (can guarantee); • it is dependency-preserving (no guarantee). • If not possible to have all nice properties, be happy with a lossless join, dependency preserving 3NF decomposition (can guarantee) Logical Database Design (2)
Normalization to BCNF Algorithm LLJD-BCNF Input: R: A relation schema F: A set of FDs satisfied by R. Output: A lossless-join decomposition D = {R1, …, Rn}, such that each Ri is in BCNF. Logical Database Design (2)
Normalization to BCNF (cont.) Method: D := {R}; while RiDthat is not in BCNF do begin Find an FD X Y such that (1) Ri is not BCNF because of X Y, and (2) XY Ri; D:= D - Ri {Ri - Y, XY} end; Logical Database Design (2)
Normalization to BCNF (cont.) * Theorem: Algorithm LLJD-BCNF is correct. Proof (sketch): • Every schema in D is in BCNF because the algorithm will not stop otherwise. • D is a lossless-join decomposition because in each iteration, Ri is decomposed into 2 smaller schemas (Ri - Y) and XY and they satisfy the condition: (Ri - Y) XY = X Y = (XY - (Ri - Y)). Logical Database Design (2)
Normalization to BCNF: An Example Consider F = {B AH, L CAt} over Bank-Loans(Bank, Assets, Headquarter, Loan#, Customer, Amount), and a set of FDs, Candidate keys: LB Initialization: D = {BAHLCAt } Logical Database Design (2)
Normalize to BCNF: An Example * 1st iteration: • Ri = BAHLCAt is not in BCNF because B AH is not a trivial FD and B is not a superkey. • Replace BAHLCAt by BAH and BLCAt. Hence: D = {BAH, BLCAt}. BAH is in BCNF because in BAH, B is a candidate key. Logical Database Design (2)
Normalize to BCNF: An Example * 2nd iteration: • Ri = BLCAt is not in BCNF because L CAt is not a trivial FD and L is not a superkey in BLCAt. • Replace BLCAt by CLAt and BL. Hence, D= {BAH, CLAt, BL}. CLAt is in BCNF because in CLAt, L is a candidate key. BL is in BCNF (see theorem on next page). Final result: D = {BAH, CLAt, BL}. • Dhappens to be dependency-preserving. • Any relation schema with exactly two attributes is in BCNF. Logical Database Design (2)
Normalize to BCNF: Another Ex. * Consider R(City, Street, Zipcode), and F = {CS Z, Z C }. Candidate keys: CS, ZS. Initialization: D = {CSZ}; 1st iteration: • R = CSZ is not in BCNF because Z C is not a trivial FD and Z is not a superkey. • D = {ZC, ZS}. D is not dependency-preserving because CS Z is not preserved. Logical Database Design (2)
Equivalence of FD Sets Let F and G be two sets of FDs satisfied by R. F and G are equivalent, denoted by F G, if F+ = G+. Example: F = {B CD, AD E, B A} and G = {B CDE, B ABC, AD E} are equivalent. • Check to see that every FD in F is also in G+ and that every FD in G is also in F+ Logical Database Design (2)
Extraneous Attributes Let F be a set of FDs. F contains an extraneous attribute A if there is an FD X Y in F, such that • either A X, and [F - {X Y} {X - {A} Y}] F; • or A Y, and [F - {X Y} {X Y - {A}}] F. • This is a “useless” attribute either at the left side or at the right side of an FD. Logical Database Design (2)
Summary • A good schema should have three properties: • BCNF (or 3NF if BCNF can not be obtained) • Lossless join • Dependency preserving • Lossless join BCNF decomposition is guaranteed, need to check for dependency preservation • Lossless join, dependency preserving 3NF decomposition is guaranteed (need to find the minimal cover) Logical Database Design (2)
Look Ahead • Next topic: SQL Overview & DDL • Read textbook: • Chapter 8, 10.1-10.6 Logical Database Design (2)