610 likes | 756 Views
CS411 Database Systems. 04: Relational Schema Design. Reminder: redundancy causes trouble. Inconsistency. What trouble will this repetition cause? update anomaly = update one copy of Fred’s SSN but not the other deletion anomaly = delete all Fred’s phones,
E N D
CS411Database Systems 04: Relational Schema Design
Reminder: redundancy causes trouble Inconsistency. What trouble will this repetition cause? update anomaly = update one copy of Fred’s SSN but not the other deletion anomaly = delete all Fred’s phones, lose his SSN as a side effect
Non-solution: multiple values in one field First Normal Form: Only one value in each field.
Your common sense will tell you how to fix this schema No more update or delete anomalies.
What if you don’t have common sense? There is a theory to tell you what to do! R*must: Preserve the information of R Have minimal redundancy “Preserve dependencies”:easy to check R’s constraints Give good query performance(but the theory won’t help you there) normalize R This theory formalizes the concept of redundancy
Functional dependencies generalize the idea of a key If two tuples agree on the attributes A1, …, An, then they must also agree on attributes B1, …, Bn. equiv A1, …, An functionally determine B1, …, Bn. equiv A1, …, An B1, …, Bn
EmpID Name, Phone, Office Office Phone Phone Office Name EmpID isn’t likely to hold in all instances of this schema, though it holds in this instance More generally, an instance can tell you many FDs that don’t hold, but not all those that do.
Use your common sense to find the FDs in the world around you Product: name price, manufacturer Person:ssn name, age Company: name stock price, president School: student, course, semester grade
We can define keys in terms of FDs Key of a relation R is a set of attributes that • functionally determines all attributes of R • none of its proper subsets have this property. Superkey= set of attributes that contains a key.
Person name ssn address Given a relation constructed from an ER diagram, what is its key? 1. If the relation comes from an entity set, the key of the relation is thekey of the entity set. Person(address, name, ssn)
Given a relation constructed from an ER diagram, what is its key? 2. If the relation comes from a many-many relationship, the key of the relation is the set of all key attributes in the relations corresponding to the entity sets name buys Person Product price name ssn date buys(name, ssn, date)
Product Purchase Store Payment Method Person Given a relation constructed from an ER diagram, what is its key? But: if there is an arrow from the relationship to E, then we don’t need the key of E as part of the relation key. sname name card-no ssn Purchase(name, sname, ssn, card-no)
The textbook has more rules on finding keys: • Many-one, one-many, one-one relationships • Multi-way relationships • Weak entity sets (but there is no substitute for common sense)
Reasoning with FDs1) Closure of a set of FDs2) Closure of a set of attributes
The closure S+ of a set S of FDs is the set of all FDs logically implied by S. R = {A, B, C, G, H, I} S = {A B, A C, CG H, CG I, B H} Does A H hold? You can prove whether it does!
Compute the closure S+ of S using Armstrong’s Axioms 1. Reflexivity A1 ... An every subset of A1 ... An 2. Augmentation If A1 ... An B1 ... Bm, then A1 ... An C1 ... Ck B1 ... Bm C1 ... Ck 3. Transitivity If A1 ... An B1 ... Bmand B1 ... BmC1 ... Ck, then A1 ... AnC1 ... Ck
How to compute S+ using Armstrong's Axioms S+ = S; loop { For each f in S, apply the reflexivity and augmentation rules and add the new FDs to S+. For each pair of FDs in S, apply the transitivity rule and add the new FDs to S+ } until S+ does not change any more.
You can infer additional rules from Armstrong’s Axioms Union If X Y and X Z, then X YZ (X, Y, Z are sets of attributes) Decomposition X YZ, then X Y and X Z Pseudo-transitivity X Y and YZ U, then XZ U
The closure of a set of attributes contains everything they functionally determine Given a set S of dependencies, the closure of a set of attributes {A1 ... An}, written{A1 ... An}+, is { B such that any relation that satisfies S also satisfies A1 ... AnB } +
It is easy to compute the closure of a set of attributes Start with X = {A1 ... An}. repeat until X doesn’t change do: if B1 ... BmC is in S, and B1 ... Bm are all in X, andC is not in X then add C to X.
A B C A D E B D A F B {A, B}+= {A, B, C, D, E} {A, F}+= {A, F, B, D, C, E}
What is the attribute closure good for? • Test if X is a superkey • compute X+, and check if X+ contains all attrs of R • Check if X Y holds • by checking if Y is contained in X+ • Another (not so clever) way to compute closure S+ of FDs • for each subset of attributes X in relation R, compute X+ with respect to S • for each subset of attributes Y in X+, output the FDX Y
Announcements MySQL accounts available (see newsgroup) Sign up for Class Project 3.5 time slot (see newsgroup) More than 4 homeworks, but no extra work Keep the same pace
From “The top 10 signs you hired the wrong SQL DBA”, at Sue’s Edream #3. They are convinced that the error generated by trying to insert a duplicate primary key value is a bug in SQL.
Desirable Properties of Schema Refinement1) minimize redundancy2) avoid info loss3) easy to check dependencies4) ensure good query performance
Normal Forms First Normal Form = all attributes are atomic Second Normal Form (2NF) = obsolete Boyce Codd Normal Form (BCNF) Third Normal Form (3NF) Fourth Normal Form (4NF) Others...
Boyce-Codd Normal Form A relation R is in BCNFif whenever there is a nontrivial FD A1 ... AnB for R, {A1 ... An}is a superkeyfor R. An FD is trivial if all the attributes on its right-hand side are also on its left-hand side.
What are the nontrivial functional dependencies? SSN Name (plus the FDs that can be derived from that) What are the keys? The only key is {SSN, Phone Number}. How do I know? Augmentation + minimality. Is it in BCNF? No. SSN is not a key.
What if we are in a situation where Phone Number SSN? What are the nontrivial FDs? Phone Number SSN SSN Name (plus FDs derived from these) What are the keys? Only {Phone Number}. How do I know? Augmentation, transitivity, minimality. Is it in BCNF? No. A relation R is in BCNFif whenever there is a nontrivial FD A1 ... AnB for R, {A1 ... An}is a superkeyfor R.
What about that alternative schema we recommended earlier---are they in BCNF? For each relation: What are its important FDs? What are its keys? Is it in BCNF? A relation R is in BCNFif whenever there is a nontrivial FD A1 ... AnB for R, {A1 ... An}is a superkeyfor R.
What about that alternative schema we recommended earlier---are they in BCNF? Important FDS: SSN Name Keys: {SSN}. Is it in BCNF? Yes. If Phone Number SSN holds Important FDS: Phone Number SSN. Keys: {Phone Number} Is it in BCNF? Yes. If Phone Number SSN doesn’t hold Important FDS: none. Keys: {SSN, Phone Number} Is it in BCNF? Yes.
Name Price, CategoryWhat are the keys for this one?Is it in BCNF? True or False: Any 2-attribute relation is in BCNF. A relation R is in BCNFif whenever there is a nontrivial FD A1 ... AnB for R, {A1 ... An}is a superkeyfor R.
Name Price, CategoryWhat are the keys for this one?Is it in BCNF? True or False: Any 2-attribute relation is in BCNF. Answers: Key = {Name}, it’s in BCNF, true.
If relation R is not in BCNF, you can pull out the violating part(s) until it is. 1. Find a dependency that violates BCNF: AB R’s Other Attributes A = {A1, ..., An} B = {B1, …, Bm} R
2. Break R into R1 and R2 as follows. R’s Other Attributes A B R becomes A R’s Other Attributes A B R1 R2
3. Repeat until all relations are in BCNF. Heuristic to speed things up and reduce the final number of relations: Make B as large as possible! won’t give as good query performance as
Can you turn this one into BCNF? Person Remember that I promised that the use of these algorithms would NOT guarantee a good schema. Functional dependencies: NetID Name, Birthdate, EyeColor, CanVote Birthdate CanVote The key is {NetID, Parent} so this FD violates BCNF Personinfo Voting But this FD is still violated, so we are not in BCNF yet
One more split needed to reach BCNF Person Functional dependencies: NetID Name, Birthdate, EyeColor, CanVote Birthdate CanVote We split the old PersonInfo into two relations. Now everything is in BCNF. Personinfo2 Voting Parentinfo
An Official BCNF Decomposition Algorithm Input: relation R, set S of FDs over R. Output: a set of relations in BCNF. 1. Compute S+. 2. Compute keys for R (from ER or from S+). 3. Use S+ and keys to check if R is in BCNF. If not: a. Pick a violation FD A B. b. Expand B as much as possible, by computing A+. c. Create R1 = A B, and R2 = R B. d. Find the FDs over R1, using S+.Repeat for R2. e. Recurse on R1 & its set of FDs. Repeat for R2. 4. Else R is already in BCNF; add R to the output.
Any good schema decomposition should be lossless. Natural join Project the instance R decompose R1 … Rn Lossless iff a trip around the outer circle gives you back exactly the original instance of R.
A lossy decomposition gives you too many tuples! Natural join Project the instance ED(emp, dept) decompose E(emp) D(dept) A trip around this outer circle gives you the cartesian product of employees and departments.
BCNF decompositions are always lossless. Natural join Project the instance R(A, B, C) decompose A C R1(A, B) R2(A, C)
Why don’t we get garbage? Natural join Project the instance R(A, B, C) decompose A C R1(A, B) R2(A, C)
Why don’t we get garbage? Natural join Project the instance R(A, B, C) decompose A C But this violates A C! R1(A, B) R2(A, C)
BCNF doesn’t always have a dependency-preserving decomposition. Third normal form may be preferable to having to take a join to check dependencies after an update.
Normal Forms First Normal Form = all attributes are atomic Second Normal Form (2NF) = old and obsolete Boyce Codd Normal Form (BCNF) Third Normal Form (3NF) Fourth Normal Form (4NF) Others...
A schema doesn’t preserve dependencies if you have to do a join to check an FD Client, Office Account Account Office Key is {Client, Office} violates BCNF decompose into BCNF Can’t check this FD now without doing a join No nontrivial FDs Account Office
A schema doesn’t preserve dependencies if you have to do a join to check an FD A B B C Key = {A} violates BCNF What about AC? Do we have to do a join to check it? decompose into BCNF No. So this BCNF decomposition does preserve dependencies. A B B C
If a BCNF decomposition doesn’t preserve dependencies, use 3rd Normal Forminstead. R is in 3NF if for every nontrivial FD A1, ..., An B, either {A1, ..., An} is a superkey, or B is part of a key. Weakens BCNF. Informally: everything depends on the key or is part of a key. Reminder: no proper subset of a key is a key!