CS 430 Database Theory Winter 2005 Lecture 9: Fourth and Fifth Normal Forms
Decompositions • Given a relation schema $R = \{A_1, \dots, A_n\}$ (all of the $A_i$ are distinct), a set of relation schemas $D = \{R_1, \dots, R_m\}$ is a decomposition of $R$ if $R$ is the union of the $R_i$, or $R = \bigcup_{i=1}^{m} R_i$ • That is, all the attributes of $R$ appear in the $R_i$
Goodness of Decomposition • When is a decomposition “good”? • Two standards: • Dependency Preservation • Lossless (Nonadditive) Join
Dependency Preservation • Suppose we have a set of FDs $F$ on $R$ and a decomposition $D = \{R_1, \dots, R_m\}$. The projection of $F$ on $R_i$ is the set $\pi_{R_i}(F) = \{X \to Y \in F^+ \mid X \cup Y \subseteq R_i\}$ • That is, $\pi_{R_i}(F)$ consists of all the FDs in the closure of $F$ which are FDs on $R_i$
Dependency Preservation • D is Dependency Preserving with respect to F if the closure of the union of the projections of F onto the $R_i$ is the closure of F • Or, $(\pi_{R_1}(F) \cup \dots \cup \pi_{R_m}(F))^+ = F^+$ • Or, if we project F onto the individual $R_i$, union the projections together, and compute the closure, we get the original closure of F • Or, no information contained in F is lost by projecting F onto the individual $R_i$
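The test above can be sketched in Python for small schemas. This is an illustrative sketch, not the textbook's algorithm: the helper names (`closure`, `project_fds`, `preserves_dependencies`, `fd`) and the sample schema R = {A, B, C} with F = {A → B, B → C} are assumptions made for the example. The projection is computed as a cover (one FD per subset X of R_i), which is exponential in the size of R_i. Because every projected FD is already in F+, it is enough to check that each FD of F follows from the union of the projections.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ of `attrs` under FDs given as (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def project_fds(fds, ri):
    """A cover of the projection of F onto Ri: X -> (X+ & Ri) - X for each nonempty X in Ri."""
    projected = []
    for size in range(1, len(ri) + 1):
        for xs in combinations(sorted(ri), size):
            x = frozenset(xs)
            rhs = (closure(x, fds) & ri) - x
            if rhs:
                projected.append((x, rhs))
    return projected

def preserves_dependencies(fds, decomposition):
    """True if every FD in F follows from the union of the projections onto the Ri."""
    union = [f for ri in decomposition for f in project_fds(fds, ri)]
    return all(rhs <= closure(lhs, union) for lhs, rhs in fds)

def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attribute names (hypothetical helper)."""
    return frozenset(lhs), frozenset(rhs)

# Hypothetical example: R = {A, B, C}, F = {A -> B, B -> C}
F = [fd("A", "B"), fd("B", "C")]
good = [frozenset("AB"), frozenset("BC")]   # keeps both FDs
bad = [frozenset("AB"), frozenset("AC")]    # B -> C can no longer be checked locally

print(preserves_dependencies(F, good))
print(preserves_dependencies(F, bad))
```

In the second decomposition, B and C never appear together in one relation, so B → C can only be verified by joining the two relations.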
Dependency Preservation Notes • Claim: It is possible to find a 3NF decomposition of R (each of the Ri is in 3NF) which is dependency preserving • See Algorithm 11.2, page 340. (No proof.) • Why do we want this? • When we update the database, we want to be able to verify FDs by verifying them on the individual relations • The alternative is having to do joins to verify that our update is good, which slows the system.
Lossless (Nonadditive) Join Property • D has the Lossless (Nonadditive) Join property with respect to a set of FDs F if for every relation state r of R that satisfies F: $\pi_{R_1}(r) \bowtie \dots \bowtie \pi_{R_m}(r) = r$ ($\bowtie$ is the natural join) • Lossless means no loss of information • Nonadditive means that the natural join doesn't add any information
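To make "nonadditive" concrete, here is a small sketch that projects a relation state and joins the projections back. The relation, its values, and the helper names (`row`, `project`, `natural_join`) are made up for illustration; tuples are represented as frozensets of (attribute, value) pairs so they can live in Python sets.

```python
def row(**values):
    """One tuple, stored as a hashable set of (attribute, value) pairs."""
    return frozenset(values.items())

def project(rows, attrs):
    """Projection of a relation state onto the given attribute names."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rows}

def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

# Hypothetical state of R(A, B, C); note that B does not determine A or C here
r = {
    row(A="a1", B="b1", C="c1"),
    row(A="a2", B="b1", C="c2"),
}

# Decomposition {AB, BC}: the join adds spurious tuples
join_ab_bc = natural_join(project(r, {"A", "B"}), project(r, {"B", "C"}))
# Decomposition {AB, AC}: the join gives back exactly r for this state
join_ab_ac = natural_join(project(r, {"A", "B"}), project(r, {"A", "C"}))

print(len(r), len(join_ab_bc), len(join_ab_ac))
print(join_ab_bc == r, join_ab_ac == r)
```

The join over {AB, BC} contains every original tuple plus two extra ones, so the "loss" is a loss of information about which combinations were really present.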
Lossless (Nonadditive) Join Notes • Algorithm 11.1, page 337, provides a way to test for this property • If D is a binary decomposition, $D = \{R_1, R_2\}$, D is nonadditive if and only if: $(R_1 \cap R_2) \to (R_1 - R_2)$ is in $F^+$, or $(R_1 \cap R_2) \to (R_2 - R_1)$ is in $F^+$ • That is, $R_1 \cap R_2$ is a superkey of (at least) one of $R_1$ or $R_2$
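The binary test reduces to one attribute-closure computation. The following is a minimal sketch under assumed names (`closure`, `is_lossless_binary`) and an assumed example schema R = {A, B, C} with F = {B → C}; it is not Algorithm 11.1, which handles decompositions into more than two relations.

```python
def closure(attrs, fds):
    """Attribute closure X+ of `attrs` under FDs given as (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def is_lossless_binary(r1, r2, fds):
    """(R1 & R2) -> (R1 - R2) or (R1 & R2) -> (R2 - R1) must be in F+."""
    common_closure = closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure

# Hypothetical example: R = {A, B, C}, F = {B -> C}
F = [(frozenset("B"), frozenset("C"))]

print(is_lossless_binary(frozenset("AB"), frozenset("BC"), F))  # B is a key of BC
print(is_lossless_binary(frozenset("AB"), frozenset("AC"), F))  # A determines nothing
```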
Aside: Problems with Nulls • See Figures 11.2, 11.3, Text Book • Bottom line: If nulls are present, especially nulls in foreign keys, then • May have to use outer joins instead of ordinary (inner) joins • Have to be careful if using aggregation (e.g. sum or average)
Multi-Valued Dependencies • If X and Y are sets of attributes of R, there is a Multi-Valued Dependency (MVD) $X \twoheadrightarrow Y$ (we let $Z = R - (X \cup Y)$) if for all states r of R and all tuples $t_1, t_2$ of r such that $t_1[X] = t_2[X]$, there exist tuples $t_3, t_4$ of r such that: $t_3[X] = t_4[X] = t_1[X] = t_2[X]$; $t_3[Y] = t_1[Y]$, $t_4[Y] = t_2[Y]$; $t_4[Z] = t_1[Z]$, $t_3[Z] = t_2[Z]$ • An MVD $X \twoheadrightarrow Y$ is trivial if $Y \subseteq X$, or $X \cup Y = R$
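The definition can be checked directly on a single relation state. The sketch below uses made-up sample data (an employee with projects and dependents) and hypothetical helper names (`restrict`, `satisfies_mvd`). Since the loop visits every ordered pair (t1, t2), checking for t3 alone is enough: t4 is the t3 of the swapped pair. Passing on one state does not prove the MVD holds for the schema; it only shows the state does not violate it.

```python
def restrict(t, attrs):
    """The values of tuple `t` (a dict) on `attrs`, as a hashable, order-independent key."""
    return tuple(sorted((a, t[a]) for a in attrs))

def satisfies_mvd(rows, x, y):
    """Check X ->> Y on one non-empty relation state given as a list of dicts."""
    attrs = set(rows[0])
    z = attrs - set(x) - set(y)
    present = {restrict(t, attrs) for t in rows}
    for t1 in rows:
        for t2 in rows:
            if restrict(t1, x) != restrict(t2, x):
                continue
            # t3 agrees with t1 on X and Y, and with t2 on Z
            t3 = dict(t1)
            t3.update({a: t2[a] for a in z})
            if restrict(t3, attrs) not in present:
                return False
    return True

# Hypothetical sample data: projects and dependents are independent facts about Smith
emp = [
    {"Ename": "Smith", "Pname": "X", "Dname": "John"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "Anna"},
    {"Ename": "Smith", "Pname": "X", "Dname": "Anna"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "John"},
]

print(satisfies_mvd(emp, {"Ename"}, {"Pname"}))
print(satisfies_mvd(emp[:-1], {"Ename"}, {"Pname"}))  # one combination missing
```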
Fourth Normal Form • R is in 4NF with respect to a set of FDs and MVDs F if for every non-trivial MVD $X \twoheadrightarrow Y$ in $F^+$, X is a superkey of R. • See Figure 11.4(a, b) in Text Book.
Fourth Normal Form Notes • If a relation is not in 4NF then there are update anomalies: • If you add a tuple you must also add the corresponding tuples • $D = \{R_1, R_2\}$ is a lossless (nonadditive) decomposition of R with respect to a set of FDs and MVDs F if and only if $(R_1 \cap R_2) \twoheadrightarrow (R_1 - R_2)$ is in $F^+$, which is the same as $(R_1 \cap R_2) \twoheadrightarrow (R_2 - R_1)$ being in $F^+$
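Both notes can be seen on a small example. The data and the function name `add_project` below are invented for illustration: in the non-4NF relation, adding one project for an employee means adding one tuple per dependent, while in the decomposed form it is a single tuple, and the join of the two projections still reproduces the original relation.

```python
# Hypothetical non-4NF relation EMP(Ename, Pname, Dname)
emp = {
    ("Smith", "X", "John"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
    ("Smith", "Y", "Anna"),
}

def add_project(rows, ename, pname):
    """Adding one project to the non-4NF relation needs one tuple per dependent."""
    dependents = {d for e, _, d in rows if e == ename}
    return rows | {(ename, pname, d) for d in dependents}

# Decomposition on the MVD Ename ->> Pname
emp_projects = {(e, p) for e, p, _ in emp}
emp_dependents = {(e, d) for e, _, d in emp}

# Natural join on Ename reproduces the original relation (nonadditive)
rejoined = {(e, p, d) for e, p in emp_projects for e2, d in emp_dependents if e == e2}

print(len(add_project(emp, "Smith", "Z")) - len(emp))   # tuples added before decomposing
print(len(emp_projects | {("Smith", "Z")}) - len(emp_projects))  # tuples added after
print(rejoined == emp)
```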
Fifth Normal Form • $JD(R_1, \dots, R_m)$ is a Join Dependency (JD) for a decomposition $\{R_1, \dots, R_m\}$ of R if for every legal state r of R: $\pi_{R_1}(r) \bowtie \dots \bowtie \pi_{R_m}(r) = r$ • A JD is trivial if some $R_i = R$ • A relation R is in Fifth Normal Form (5NF) if for every non-trivial JD of R, every $R_i$ is a superkey of R
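A join dependency with three components can hold even when no binary decomposition is lossless. The sketch below checks this on one made-up relation state over attributes S, P, J; the data and the helper names (`row`, `project`, `natural_join`, `satisfies_jd`) are assumptions for the example, and passing on one state only shows that the state does not violate the JD.

```python
from functools import reduce

def row(s, p, j):
    """One tuple of R(S, P, J) as a hashable set of (attribute, value) pairs."""
    return frozenset({"S": s, "P": p, "J": j}.items())

def project(rows, attrs):
    """Projection of a relation state onto the given attribute names."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rows}

def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

def satisfies_jd(rows, components):
    """True if joining the projections onto `components` gives back exactly `rows`."""
    return reduce(natural_join, [project(rows, c) for c in components]) == set(rows)

# Hypothetical relation state
r = {
    row("s1", "p1", "j2"),
    row("s1", "p2", "j1"),
    row("s2", "p1", "j1"),
    row("s1", "p1", "j1"),
}
SP, PJ, SJ = {"S", "P"}, {"P", "J"}, {"S", "J"}

print(satisfies_jd(r, [SP, PJ, SJ]))   # three-way join is nonadditive
print(satisfies_jd(r, [SP, PJ]))       # each two-way join adds a spurious tuple
print(satisfies_jd(r, [SP, SJ]))
print(satisfies_jd(r, [PJ, SJ]))
```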
Notes on Fifth Normal Form • An MVD is a JD with m = 2 • Finding all the JDs of a database of any size is probably not feasible • Example: See Figure 11.4 (c, d) of Text Book
Products, Salesmen, Territories: A Data Design Problem • Salesman: sells specific products; has specific territories; has a quota (how much he is supposed to sell) • Product: sold by salesmen; has a price • Territory: worked by salesmen
ER Model Version 1 • [ER diagram; labels: Salesman, Quota, Sells Product, Works Territory, Product, Territory, Price] • A Salesman can sell any Product he sells in any Territory he works. • A Product has one Price for all Salesmen and all Territories. • A Salesman has one Quota for all his sales. • Note: Each Entity and Relationship becomes a relation in our database.
ER Model Version 2 • [ER diagram; labels: Salesman, Sells Product, Product, Works Territory, Territory, Quota, Price] • A Salesman has a Quota for each Product he sells.
ER Model Version 3 • [ER diagram; labels: Salesman, Sells Product, Product, Works Territory, Territory, Quota, Sold In, Price] • Products are only sold in specific Territories. • A Product has a Price set for each Territory where it is sold. • A Salesman can sell any Product he sells in any Territory he works where that Product is sold. • Note the JD between “Sells Product”, “Sold In”, and “Works Territory”.
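The join dependency in Version 3 means the set of allowed (Salesman, Product, Territory) combinations never needs to be stored: it is the three-way join of the relationships. A rough sketch with invented names and data (the salesmen, products, and territories below are purely hypothetical):

```python
# Hypothetical contents of the three relationship relations
sells_product = {("Ann", "Widget"), ("Ann", "Gadget"), ("Bob", "Widget")}
works_territory = {("Ann", "East"), ("Bob", "East"), ("Bob", "West")}
sold_in = {("Widget", "East"), ("Widget", "West"), ("Gadget", "West")}

# Join of the three relations: a salesman may sell a product in a territory
# if he sells the product, works the territory, and the product is sold there.
allowed = {
    (s, p, t)
    for s, p in sells_product
    for s2, t in works_territory
    if s == s2 and (p, t) in sold_in
}

for combo in sorted(allowed):
    print(combo)
```

Ann sells Gadgets and works East, but Gadgets are not sold in East, so that combination is correctly absent.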
ER Model Version 4 • [ER diagram; labels: Salesman, Quota, Sells Product in Territory, Sells Product, Product, Territory, Sold In, Price] • A Salesman is assigned to sell specific Products in specific Territories. • A Salesman has a Quota for each Product he sells in each Territory. • Possible Integrity Constraint: Keys of “Sells Product” and “Sold In” are projections of “Sells Product in Territory”.
ER Model Version 4A • [ER diagram; labels: Salesman, Sells Product in Territory, Quota, Product, Territory, Sold In, Price] • Possible Integrity Constraint: Key of “Sold In” is a projection of “Sells Product in Territory”. (But I might want to assign a Price even though no Salesmen have yet been assigned that Product in that Territory.)
Sample Fields • Employee: Employee ID Number, Employee Name, Work Location • Manager: Manager ID Number, Manager Name • Territory: Territory Number, Territory Name, Territory Bonus • Product: Product Number, Product Name, Price, Actual_Sales, Target_Sales • Other: Quota, Commission Rate, Commission, Manager Commission
Possible Functional Dependencies • {Employee ID Number} $\to$ {Employee Name, Work Location, Manager ID Number, Manager Commission(?)} • {Manager ID Number} $\to$ {Manager Name, Manager Commission(?)} • {Territory Number} $\to$ {Territory Name, Territory Bonus(?)} • {Product Number} $\to$ {Product Name, Price(?), Actual Sales(?), Target Sales(?)}
More Possible FDs • {Employee ID Number, Territory Number} $\to$ {Territory Bonus(?), Quota(?), Commission Rate(?)} • {Employee ID Number, Product Number} $\to$ {Quota(?), Commission Rate(?)} • {Territory Number, Product Number} $\to$ {Price(?), Actual Sales(?), Target Sales(?), Territory Bonus(?), Commission Rate(?), Commission(?), Manager Commission(?)}
More Possible FDs • {Employee ID Number, Product Number, Territory Number} $\to$ {Quota(?), Actual Sales(?), Target Sales(?), Commission Rate(?), Commission(?), Manager Commission(?)} • {Actual Sales, Commission Rate} $\to$ {Commission} • {Actual Sales, Manager Commission Rate} $\to$ {Manager Commission}
Proposed Solution • Employee(Employee ID Number, Employee Name, Work Location, Manager ID Number) • Manager(Manager ID Number, Manager Name, Manager Commission) • Territory(Territory Number, Territory Name) • Product(Product Number, Product Name)
More Proposed Solution • Product_Territory(Product Number, Territory Number, Price) • Employee_Territory(Employee ID Number, Territory Number, Territory Bonus) • Employee_Product(Employee ID Number, Product Number, Commission Rate) • Employee_Product_Territory(Employee ID Number, Product Number, Territory Number, Quota, Actual Sales, Target Sales, Commission, Manager Commission)