380 likes | 394 Views
This course is an introduction to database systems, covering topics such as functional dependencies, normalization, and dependency-preserving decomposition.
E N D
EECS 647: Introduction to Database Systems Instructor: Luke Huan Spring 2007 Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus
Administrative • Sample midterm questions will be posted at the class websites later today. • You may pick up an answering key for homework 2 • PL/SQL functionality has been enabled in the EECS postgresql server. Luke Huan Univ. of Kansas
Review • Functional dependencies • X!Y: X “determines” Y • If two rows agree on X, they must agree on Y • A generalization of the key concepts y1 Luke Huan Univ. of Kansas
Review • A FD X!Y is anon-key FDif X is not a superkey • Non-key functional dependencies represent a source of redundancy y1 … Luke Huan Univ. of Kansas
Review • How to deal with non-key FD? • Decomposition • A relation R is “broken down” to a set of relations R1, R2, …, Rn • Such that: • attr(R1) attr(R2) … attr(Rn) = attr(R) • And? • Lossless join • And? • Dependency preservation join (we are going to discuss this today) Luke Huan Univ. of Kansas
Review • What is a lossless join? • R = R1*R2 *… * Rn • By nature join, we can restore the data! Luke Huan Univ. of Kansas
Review • Is this a lossless join? • R = (Sid, Sname, Cid, Cname) • F = {Sid Sname, Cid Cname) • R1 = (Sid, Sname), R2 = (Cid, Cname) Luke Huan Univ. of Kansas
Review • How to achieve a lossless join • “foreign key” • R = R1 R2 • R1 R2 R1 • A combination of attributes in R2 that is a key of R1 • R1 R2 R2 • A combination of attributes in R1 that is a key of R2 Luke Huan Univ. of Kansas
Review • Why the foreign key mechanism guarantees a lossless join? • Well, thinking about the following relation X Y Luke Huan Univ. of Kansas
Review • Do you have to use the “foreign key” mechanism to achieve lossless join in binary decomposition? • Yes. See EN 11.1.4 Luke Huan Univ. of Kansas
Review • How to remove the redundancy that is introduced by non-key FDs in a relation R? • Pick up a non-key FD X Y, decompose R into R1 and R2, where • R1 has attributes X[Y • R2 has attributes X[Z, where Z = attr(R) – X – Y • Need to do this recursively? • When to stop? Luke Huan Univ. of Kansas
StudentID (email, EID) StudentGrade’ (email, Ename, PID, hours) StudentName (email, Ename) Grade (email, PID, hours) F = {EID!Ename, email; email!EID; EID, PID!hours} Exercise WorkOn (EID, Ename, email, PID, hours) Pick up email!EID pickup email!Ename Luke Huan Univ. of Kansas
Review • A relation R is in Normal Form if • For every non-trivial FD X!Y in R, X is a super key • That is, all FDs follow from “key ! other attributes” • BCNF applies binary decomposition recursively. • At each step, the lossless join property is maintained • Guarantee to remove ALL redundancies caused by FD Boyce-Codd Luke Huan Univ. of Kansas
key A X key A X key X A Three Types of non-key DF X A Partial dependency X A Transitive dependency I X A Transitive dependency II Luke Huan Univ. of Kansas
Summary • Philosophy behind BCNF (and 4NF):Data should depend on the key, the whole key, and nothing but the key! • Philosophy behind 3NF: … But not at the expense of more expensive constraint enforcement! Luke Huan Univ. of Kansas
Next • Dependency-preserving decomposition • 3NF (BCNF is too much) • Multivalued dependencies: another source of redundancy • 4NF (BCNF is not enough) Luke Huan Univ. of Kansas
Motivation for 3NF • Address (street_address, city, state, zip) • street_address, city, state!zip • zip!city, state • Keys • {street_address, city, state} • {street_address, zip} • BCNF? • Violation: zip!city, state Luke Huan Univ. of Kansas
To decompose or not to decompose Address1 (zip, city, state) Address2 (street_address, zip) • FD’s in Address1 • zip!city, state • FD’s in Address2 • None! • Hey, where is street_address, city, state!zip? • Cannot check without joining Address1 and Address2 back together • Problem: Some lossless join decomposition is not dependency-preserving • Dilemma: Should we get rid of redundancy at the expense of making constraints harder to enforce? Luke Huan Univ. of Kansas
Preserving Dependency • Projection of set of FDs F : If R is decomposed into X and Y the projection of F on X (denoted FX ) is the set of FDs U V in F+(closure of F , not just F)such that all of the attributes U, V are in X. (same holds for Y of course) • Decomposition of R into X and Y is dependencypreserving if (FX FY ) + = F + • i.e., if we consider only dependencies in the closure F + that can be checked in X without considering Y, and in Y without considering X, these imply all dependencies in F +. Luke Huan Univ. of Kansas
An Example • Important to consider F + in this definition: • R = ABC, F = {A B, B C, C A} • R is decomposed into AB and BC. • Is this dependency preserving? • Is C A preserved????? • FAB contains A B and B A; • FBC contains B C and C B • So, (FAB FBC)+ contains C A Luke Huan Univ. of Kansas
Why Preserving FDs? • Another way to view lossless join: we lost one (or more) key FD. EID, PID hours Luke Huan Univ. of Kansas
Dependency-preserving Decomposition • General idea: • Keep all functional dependency and at least one key during natural join! • Input: a relation R and a set of FDs F • Output: a set of join-lossless, dep.-pres. relations Step 1: for each FD X Y, create a relation R’ = XY Step 2: if there is no key included in any of the created relations, add a relation with a key Step 3: remove all relations R’ which is a subset of some other relations • GuaranteeLossless-Join, Dep. Pres. Decomp! Luke Huan Univ. of Kansas
An Example • R = ABCDEFGH • FDs: A B, ACD E, EF GH • R is decomposed into • R1 = AB, R2 = ACDE, R3 = EFGH • No key yet! • Add R4 = ACDF Luke Huan Univ. of Kansas
Another Example • Address (street_address, city, state, zip) • FDs F • street_address, city, state!zip • zip!city, state • Keys • {street_address, city, state} • {street_address, zip} • Decomposeto Address1(street_address, city, state, zip), and Address2(zip, city, state) • Address2 is part of Address1, remove Addresss2 Luke Huan Univ. of Kansas
A X A X A X 3NF • R is in Third Normal Form (3NF) if for every non-trivial FD X!A (where A is single attribute), either • X is a superkey of R, or • A is a member of at least one key of R • Intuitively, BCNF decomposition on X!A would “break” the key containing A Partial dependency Transitive dependency I Transitive dependency II 2NF 3NF BCNF 2NF 3NF BCNF Luke Huan Univ. of Kansas
Notes about 3NF • Address (street_address, city, state, zip) • street_address, city, state!zip • zip!city, state • Is Address in 3NF? • means a set of attributes participate at least one key • Yes • Tradeoff: • Can enforce all original FD’s on individual decomposed relations • Might have some redundancy due to FD’s Luke Huan Univ. of Kansas
Minimal Cover for a Set of FDs • Minimal coverG for a set of FDs F: • Closure of F = closure of G. • Right hand side of each FD in G is a single attribute. • If we modify G by deleting an FD or by deleting attributes from an FD in G, the closure changes. • Intuitively, every FD in G is needed, and ``as small as possible’’ in order to get the same closure as F. • e.g., A B, ABCD E, EF GH, ACDF EG has the following minimal cover: • A B, ACD E, EF G and EF H
Dependency-preserving 3NF • Input: a relation R and a set of FDs F • Output: a set of join-lossless, dep.-pres. relations in 3NF Step 1: compute a minimal cover F’ of F Step 2: combine all FD X {A1}, X {A2} … X {An} to a single FD X {A1, A2 ,…, An} Step 3: for each FD X Y, create a relation R’ = XY Step 4: if there is no key included in any of the created relations add a key Step 5: remove all relations R’ which is a subset of some other relations • Minimal Cover impliesLossless-Join, Dep. Pres. Decomp in 3NF!!! • (in EN 11.2.3, p. 389) Luke Huan Univ. of Kansas
BCNF = no redundancy? • Student (SID, CID, club) • Suppose your classes have nothing to do with the clubs you join • FD’s? • None • BCNF? • Yes • Redundancies? • Tons! Luke Huan Univ. of Kansas
Multivalued dependencies • A multivalued dependency (MVD) has the formX)Y, where X and Y are sets of attributes in a relation R • X)Y means that whenever two rows in R agree on all the attributes of X, then we can swap their Y components and get two new rows that are also in R Must in the relation also Luke Huan Univ. of Kansas
MVD examples Student (SID, CID, club) • SID)CID • SID)club • Intuition: given SID, CID and club are “independent” • SID, CID)club • Trivial: LHS [ RHS = all attributes of R • SID, CID)SID • Trivial: LHS ¶ RHS • In general when two independent 1:N relations AB, AC are mixed together in R=ABC, a MVD may occur Luke Huan Univ. of Kansas
Complete MVD + FD rules • FD reflexivity, augmentation, and transitivity • MVD complementation:If X)Y, then X)attrs(R) – X – Y • MVD augmentation:If X)Y and VµW, then XW)YV • MVD transitivity:If X)Y and Y)Z, then X )Z – Y • Replication (FD is MVD):If X!Y, then X)Y • Coalescence:If X)Y and ZµY and there is some W disjoint from Y such that W!Z, then X!Z Try proving things using these! Luke Huan Univ. of Kansas
4NF • A relation R is in Fourth Normal Form (4NF) if • For every non-trivial MVD X)Y in R, X is a superkey • That is, all FD’s and MVD’s follow from “key ! other attributes” (i.e., no MVD’s, and no FD’s besides key functional dependencies) • 4NF is stronger than BCNF • Because every FD is also a MVD Luke Huan Univ. of Kansas
4NF decomposition algorithm • Find a 4NF violation • A non-trivial MVD X)Y in R where X is not a superkey • Decompose R into R1 and R2, where • R1 has attributes X[Y • R2 has attributes X[Z (Z contains attributes not in X or Y) • Repeat until all relations are in 4NF • Almost identical to BCNF decomposition algorithm • Any decomposition on a 4NF violation is lossless Luke Huan Univ. of Kansas
4NF violation: SID)CID Enroll (SID, CID) Join (SID, club) 4NF 4NF 4NF decomposition example Student (SID, CID, club) Luke Huan Univ. of Kansas
3NF, BCNF, 4NF, and beyond • Of historical interests • 1NF: All column values must be atomic • 2NF: Slightly more relaxed than 3NF Luke Huan Univ. of Kansas
Summary • Philosophy behind BCNF, 4NF:Data should depend on the key, the whole key, and nothing but the key! • Philosophy behind 3NF: … But not at the expense of more expensive constraint enforcement! Luke Huan Univ. of Kansas
3NF • How to create join lossless decomposition? • Binary decomposition with foreign key • N-nary decomposition with FD preserving and a key • How to achieve FD preserving decompositions? • For each FD, create a relation • Minimal cover of FD guarantees 3NF Luke Huan Univ. of Kansas