1 / 61

CS411 Database Systems

CS411 Database Systems. 04: Relational Schema Design. Reminder: redundancy causes trouble. Inconsistency. What trouble will this repetition cause? update anomaly = update one copy of Fred’s SSN but not the other deletion anomaly = delete all Fred’s phones,

hoshi
Download Presentation

CS411 Database Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS411Database Systems 04: Relational Schema Design

  2. Reminder: redundancy causes trouble Inconsistency. What trouble will this repetition cause? update anomaly = update one copy of Fred’s SSN but not the other deletion anomaly = delete all Fred’s phones, lose his SSN as a side effect

  3. Non-solution: multiple values in one field First Normal Form: Only one value in each field.

  4. Your common sense will tell you how to fix this schema No more update or delete anomalies.

  5. What if you don’t have common sense? There is a theory to tell you what to do! R*must: Preserve the information of R Have minimal redundancy “Preserve dependencies”:easy to check R’s constraints Give good query performance(but the theory won’t help you there) normalize R This theory formalizes the concept of redundancy

  6. Functional Dependencies

  7. Functional dependencies generalize the idea of a key If two tuples agree on the attributes A1, …, An, then they must also agree on attributes B1, …, Bn. equiv A1, …, An functionally determine B1, …, Bn. equiv A1, …, An  B1, …, Bn

  8. EmpID Name, Phone, Office Office  Phone Phone Office Name EmpID isn’t likely to hold in all instances of this schema, though it holds in this instance More generally, an instance can tell you many FDs that don’t hold, but not all those that do.

  9. Use your common sense to find the FDs in the world around you Product: name price, manufacturer Person:ssn name, age Company: name  stock price, president School: student, course, semester  grade

  10. We can define keys in terms of FDs Key of a relation R is a set of attributes that • functionally determines all attributes of R • none of its proper subsets have this property. Superkey= set of attributes that contains a key.

  11. Person name ssn address Given a relation constructed from an ER diagram, what is its key? 1. If the relation comes from an entity set, the key of the relation is thekey of the entity set. Person(address, name, ssn)

  12. Given a relation constructed from an ER diagram, what is its key? 2. If the relation comes from a many-many relationship, the key of the relation is the set of all key attributes in the relations corresponding to the entity sets name buys Person Product price name ssn date buys(name, ssn, date)

  13. Product Purchase Store Payment Method Person Given a relation constructed from an ER diagram, what is its key? But: if there is an arrow from the relationship to E, then we don’t need the key of E as part of the relation key. sname name card-no ssn Purchase(name, sname, ssn, card-no)

  14. The textbook has more rules on finding keys: • Many-one, one-many, one-one relationships • Multi-way relationships • Weak entity sets (but there is no substitute for common sense)

  15. Reasoning with FDs1) Closure of a set of FDs2) Closure of a set of attributes

  16. The closure S+ of a set S of FDs is the set of all FDs logically implied by S. R = {A, B, C, G, H, I} S = {A  B, A  C, CG  H, CG  I, B  H} Does A  H hold? You can prove whether it does!

  17. Compute the closure S+ of S using Armstrong’s Axioms 1. Reflexivity A1 ... An every subset of A1 ... An 2. Augmentation If A1 ... An B1 ... Bm, then A1 ... An C1 ... Ck B1 ... Bm C1 ... Ck 3. Transitivity If A1 ... An B1 ... Bmand B1 ... BmC1 ... Ck, then A1 ... AnC1 ... Ck

  18. How to compute S+ using Armstrong's Axioms S+ = S; loop { For each f in S, apply the reflexivity and augmentation rules and add the new FDs to S+. For each pair of FDs in S, apply the transitivity rule and add the new FDs to S+ } until S+ does not change any more.

  19. You can infer additional rules from Armstrong’s Axioms Union If X  Y and X  Z, then X  YZ (X, Y, Z are sets of attributes) Decomposition X  YZ, then X  Y and X  Z Pseudo-transitivity X  Y and YZ  U, then XZ  U

  20. The closure of a set of attributes contains everything they functionally determine Given a set S of dependencies, the closure of a set of attributes {A1 ... An}, written{A1 ... An}+, is { B such that any relation that satisfies S also satisfies A1 ... AnB } +

  21. It is easy to compute the closure of a set of attributes Start with X = {A1 ... An}. repeat until X doesn’t change do: if B1 ... BmC is in S, and B1 ... Bm are all in X, andC is not in X then add C to X.

  22. A B C A D E B  D A F  B {A, B}+= {A, B, C, D, E} {A, F}+= {A, F, B, D, C, E}

  23. What is the attribute closure good for? • Test if X is a superkey • compute X+, and check if X+ contains all attrs of R • Check if X  Y holds • by checking if Y is contained in X+ • Another (not so clever) way to compute closure S+ of FDs • for each subset of attributes X in relation R, compute X+ with respect to S • for each subset of attributes Y in X+, output the FDX  Y

  24. Announcements MySQL accounts available (see newsgroup) Sign up for Class Project 3.5 time slot (see newsgroup) More than 4 homeworks, but no extra work Keep the same pace

  25. From “The top 10 signs you hired the wrong SQL DBA”, at Sue’s Edream #3. They are convinced that the error generated by trying to insert a duplicate primary key value is a bug in SQL.

  26. Desirable Properties of Schema Refinement1) minimize redundancy2) avoid info loss3) easy to check dependencies4) ensure good query performance

  27. Normal Forms First Normal Form = all attributes are atomic Second Normal Form (2NF) = obsolete Boyce Codd Normal Form (BCNF) Third Normal Form (3NF) Fourth Normal Form (4NF) Others...

  28. Boyce-Codd Normal Form A relation R is in BCNFif whenever there is a nontrivial FD A1 ... AnB for R, {A1 ... An}is a superkeyfor R. An FD is trivial if all the attributes on its right-hand side are also on its left-hand side.

  29. What are the nontrivial functional dependencies? SSN Name (plus the FDs that can be derived from that) What are the keys? The only key is {SSN, Phone Number}. How do I know? Augmentation + minimality. Is it in BCNF? No. SSN is not a key.

  30. What if we are in a situation where Phone Number  SSN? What are the nontrivial FDs? Phone Number  SSN SSN  Name (plus FDs derived from these) What are the keys? Only {Phone Number}. How do I know? Augmentation, transitivity, minimality. Is it in BCNF? No. A relation R is in BCNFif whenever there is a nontrivial FD A1 ... AnB for R, {A1 ... An}is a superkeyfor R.

  31. What about that alternative schema we recommended earlier---are they in BCNF? For each relation: What are its important FDs? What are its keys? Is it in BCNF? A relation R is in BCNFif whenever there is a nontrivial FD A1 ... AnB for R, {A1 ... An}is a superkeyfor R.

  32. What about that alternative schema we recommended earlier---are they in BCNF? Important FDS: SSN  Name Keys: {SSN}. Is it in BCNF? Yes. If Phone Number  SSN holds Important FDS: Phone Number  SSN. Keys: {Phone Number} Is it in BCNF? Yes. If Phone Number  SSN doesn’t hold Important FDS: none. Keys: {SSN, Phone Number} Is it in BCNF? Yes.

  33. Name  Price, CategoryWhat are the keys for this one?Is it in BCNF? True or False: Any 2-attribute relation is in BCNF. A relation R is in BCNFif whenever there is a nontrivial FD A1 ... AnB for R, {A1 ... An}is a superkeyfor R.

  34. Name  Price, CategoryWhat are the keys for this one?Is it in BCNF? True or False: Any 2-attribute relation is in BCNF. Answers: Key = {Name}, it’s in BCNF, true.

  35. If relation R is not in BCNF, you can pull out the violating part(s) until it is. 1. Find a dependency that violates BCNF: AB R’s Other Attributes A = {A1, ..., An} B = {B1, …, Bm} R

  36. 2. Break R into R1 and R2 as follows. R’s Other Attributes A B R becomes A R’s Other Attributes A B R1 R2

  37. 3. Repeat until all relations are in BCNF. Heuristic to speed things up and reduce the final number of relations: Make B as large as possible! won’t give as good query performance as

  38. Can you turn this one into BCNF? Person Remember that I promised that the use of these algorithms would NOT guarantee a good schema. Functional dependencies: NetID Name, Birthdate, EyeColor, CanVote Birthdate CanVote The key is {NetID, Parent} so this FD violates BCNF Personinfo Voting But this FD is still violated, so we are not in BCNF yet

  39. One more split needed to reach BCNF Person Functional dependencies: NetID Name, Birthdate, EyeColor, CanVote Birthdate CanVote We split the old PersonInfo into two relations. Now everything is in BCNF. Personinfo2 Voting Parentinfo

  40. An Official BCNF Decomposition Algorithm Input: relation R, set S of FDs over R. Output: a set of relations in BCNF. 1. Compute S+. 2. Compute keys for R (from ER or from S+). 3. Use S+ and keys to check if R is in BCNF. If not: a. Pick a violation FD A  B. b. Expand B as much as possible, by computing A+. c. Create R1 = A  B, and R2 = R  B. d. Find the FDs over R1, using S+.Repeat for R2. e. Recurse on R1 & its set of FDs. Repeat for R2. 4. Else R is already in BCNF; add R to the output.

  41. Any good schema decomposition should be lossless. Natural join Project the instance R decompose R1 … Rn Lossless iff a trip around the outer circle gives you back exactly the original instance of R.

  42. A lossy decomposition gives you too many tuples! Natural join Project the instance ED(emp, dept) decompose E(emp) D(dept) A trip around this outer circle gives you the cartesian product of employees and departments.

  43. BCNF decompositions are always lossless. Natural join Project the instance R(A, B, C) decompose A  C R1(A, B) R2(A, C)

  44. Why don’t we get garbage? Natural join Project the instance R(A, B, C) decompose A  C R1(A, B) R2(A, C)

  45. Why don’t we get garbage? Natural join Project the instance R(A, B, C) decompose A  C But this violates A  C! R1(A, B) R2(A, C)

  46. BCNF doesn’t always have a dependency-preserving decomposition. Third normal form may be preferable to having to take a join to check dependencies after an update.

  47. Normal Forms First Normal Form = all attributes are atomic Second Normal Form (2NF) = old and obsolete Boyce Codd Normal Form (BCNF) Third Normal Form (3NF) Fourth Normal Form (4NF) Others...

  48. A schema doesn’t preserve dependencies if you have to do a join to check an FD Client, Office  Account Account  Office Key is {Client, Office} violates BCNF decompose into BCNF Can’t check this FD now without doing a join No nontrivial FDs Account  Office

  49. A schema doesn’t preserve dependencies if you have to do a join to check an FD A B B C Key = {A} violates BCNF What about AC? Do we have to do a join to check it? decompose into BCNF No. So this BCNF decomposition does preserve dependencies. A  B B  C

  50. If a BCNF decomposition doesn’t preserve dependencies, use 3rd Normal Forminstead. R is in 3NF if for every nontrivial FD A1, ..., An B, either {A1, ..., An} is a superkey, or B is part of a key. Weakens BCNF. Informally: everything depends on the key or is part of a key. Reminder: no proper subset of a key is a key!

More Related