1 / 38

EECS 647: Introduction to Database Systems

This course is an introduction to database systems, covering topics such as functional dependencies, normalization, and dependency-preserving decomposition.

vunderwood
Download Presentation

EECS 647: Introduction to Database Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. EECS 647: Introduction to Database Systems Instructor: Luke Huan Spring 2007 Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus

  2. Administrative • Sample midterm questions will be posted at the class websites later today. • You may pick up an answering key for homework 2 • PL/SQL functionality has been enabled in the EECS postgresql server. Luke Huan Univ. of Kansas

  3. Review • Functional dependencies • X!Y: X “determines” Y • If two rows agree on X, they must agree on Y • A generalization of the key concepts y1 Luke Huan Univ. of Kansas

  4. Review • A FD X!Y is anon-key FDif X is not a superkey • Non-key functional dependencies represent a source of redundancy y1 … Luke Huan Univ. of Kansas

  5. Review • How to deal with non-key FD? • Decomposition • A relation R is “broken down” to a set of relations R1, R2, …, Rn • Such that: • attr(R1)  attr(R2)  …  attr(Rn) = attr(R) • And? • Lossless join • And? • Dependency preservation join (we are going to discuss this today) Luke Huan Univ. of Kansas

  6. Review • What is a lossless join? • R = R1*R2 *… * Rn • By nature join, we can restore the data! Luke Huan Univ. of Kansas

  7. Review • Is this a lossless join? • R = (Sid, Sname, Cid, Cname) • F = {Sid  Sname, Cid  Cname) • R1 = (Sid, Sname), R2 = (Cid, Cname) Luke Huan Univ. of Kansas

  8. Review • How to achieve a lossless join • “foreign key” • R = R1  R2 • R1 R2  R1 • A combination of attributes in R2 that is a key of R1 • R1 R2  R2 • A combination of attributes in R1 that is a key of R2 Luke Huan Univ. of Kansas

  9. Review • Why the foreign key mechanism guarantees a lossless join? • Well, thinking about the following relation X  Y Luke Huan Univ. of Kansas

  10. Review • Do you have to use the “foreign key” mechanism to achieve lossless join in binary decomposition? • Yes. See EN 11.1.4 Luke Huan Univ. of Kansas

  11. Review • How to remove the redundancy that is introduced by non-key FDs in a relation R? • Pick up a non-key FD X  Y, decompose R into R1 and R2, where • R1 has attributes X[Y • R2 has attributes X[Z, where Z = attr(R) – X – Y • Need to do this recursively? • When to stop? Luke Huan Univ. of Kansas

  12. StudentID (email, EID) StudentGrade’ (email, Ename, PID, hours) StudentName (email, Ename) Grade (email, PID, hours) F = {EID!Ename, email; email!EID; EID, PID!hours} Exercise WorkOn (EID, Ename, email, PID, hours) Pick up email!EID pickup email!Ename Luke Huan Univ. of Kansas

  13. Review • A relation R is in Normal Form if • For every non-trivial FD X!Y in R, X is a super key • That is, all FDs follow from “key ! other attributes” • BCNF applies binary decomposition recursively. • At each step, the lossless join property is maintained • Guarantee to remove ALL redundancies caused by FD Boyce-Codd Luke Huan Univ. of Kansas

  14. key A X key A X key X A Three Types of non-key DF X A Partial dependency X A Transitive dependency I X A Transitive dependency II Luke Huan Univ. of Kansas

  15. Summary • Philosophy behind BCNF (and 4NF):Data should depend on the key, the whole key, and nothing but the key! • Philosophy behind 3NF: … But not at the expense of more expensive constraint enforcement! Luke Huan Univ. of Kansas

  16. Next • Dependency-preserving decomposition • 3NF (BCNF is too much) • Multivalued dependencies: another source of redundancy • 4NF (BCNF is not enough) Luke Huan Univ. of Kansas

  17. Motivation for 3NF • Address (street_address, city, state, zip) • street_address, city, state!zip • zip!city, state • Keys • {street_address, city, state} • {street_address, zip} • BCNF? • Violation: zip!city, state Luke Huan Univ. of Kansas

  18. To decompose or not to decompose Address1 (zip, city, state) Address2 (street_address, zip) • FD’s in Address1 • zip!city, state • FD’s in Address2 • None! • Hey, where is street_address, city, state!zip? • Cannot check without joining Address1 and Address2 back together • Problem: Some lossless join decomposition is not dependency-preserving • Dilemma: Should we get rid of redundancy at the expense of making constraints harder to enforce? Luke Huan Univ. of Kansas

  19. Preserving Dependency • Projection of set of FDs F : If R is decomposed into X and Y the projection of F on X (denoted FX ) is the set of FDs U  V in F+(closure of F , not just F)such that all of the attributes U, V are in X. (same holds for Y of course) • Decomposition of R into X and Y is dependencypreserving if (FX FY ) + = F + • i.e., if we consider only dependencies in the closure F + that can be checked in X without considering Y, and in Y without considering X, these imply all dependencies in F +. Luke Huan Univ. of Kansas

  20. An Example • Important to consider F + in this definition: • R = ABC, F = {A  B, B  C, C  A} • R is decomposed into AB and BC. • Is this dependency preserving? • Is C  A preserved????? • FAB contains A B and B  A; • FBC contains B  C and C  B • So, (FAB FBC)+ contains C  A Luke Huan Univ. of Kansas

  21. Why Preserving FDs? • Another way to view lossless join: we lost one (or more) key FD. EID, PID  hours Luke Huan Univ. of Kansas

  22. Dependency-preserving Decomposition • General idea: • Keep all functional dependency and at least one key during natural join! • Input: a relation R and a set of FDs F • Output: a set of join-lossless, dep.-pres. relations Step 1: for each FD X  Y, create a relation R’ = XY Step 2: if there is no key included in any of the created relations, add a relation with a key Step 3: remove all relations R’ which is a subset of some other relations • GuaranteeLossless-Join, Dep. Pres. Decomp! Luke Huan Univ. of Kansas

  23. An Example • R = ABCDEFGH • FDs: A  B, ACD  E, EF  GH • R is decomposed into • R1 = AB, R2 = ACDE, R3 = EFGH • No key yet! • Add R4 = ACDF Luke Huan Univ. of Kansas

  24. Another Example • Address (street_address, city, state, zip) • FDs F • street_address, city, state!zip • zip!city, state • Keys • {street_address, city, state} • {street_address, zip} • Decomposeto Address1(street_address, city, state, zip), and Address2(zip, city, state) • Address2 is part of Address1, remove Addresss2 Luke Huan Univ. of Kansas

  25. A X A X A X 3NF • R is in Third Normal Form (3NF) if for every non-trivial FD X!A (where A is single attribute), either • X is a superkey of R, or • A is a member of at least one key of R • Intuitively, BCNF decomposition on X!A would “break” the key containing A Partial dependency Transitive dependency I Transitive dependency II 2NF 3NF BCNF 2NF  3NF  BCNF Luke Huan Univ. of Kansas

  26. Notes about 3NF • Address (street_address, city, state, zip) • street_address, city, state!zip • zip!city, state • Is Address in 3NF? • means a set of attributes participate at least one key • Yes • Tradeoff: • Can enforce all original FD’s on individual decomposed relations • Might have some redundancy due to FD’s Luke Huan Univ. of Kansas

  27. Minimal Cover for a Set of FDs • Minimal coverG for a set of FDs F: • Closure of F = closure of G. • Right hand side of each FD in G is a single attribute. • If we modify G by deleting an FD or by deleting attributes from an FD in G, the closure changes. • Intuitively, every FD in G is needed, and ``as small as possible’’ in order to get the same closure as F. • e.g., A  B, ABCD  E, EF  GH, ACDF  EG has the following minimal cover: • A  B, ACD  E, EF  G and EF  H

  28. Dependency-preserving 3NF • Input: a relation R and a set of FDs F • Output: a set of join-lossless, dep.-pres. relations in 3NF Step 1: compute a minimal cover F’ of F Step 2: combine all FD X  {A1}, X  {A2} … X  {An} to a single FD X  {A1, A2 ,…, An} Step 3: for each FD X  Y, create a relation R’ = XY Step 4: if there is no key included in any of the created relations add a key Step 5: remove all relations R’ which is a subset of some other relations • Minimal Cover impliesLossless-Join, Dep. Pres. Decomp in 3NF!!! • (in EN 11.2.3, p. 389) Luke Huan Univ. of Kansas

  29. BCNF = no redundancy? • Student (SID, CID, club) • Suppose your classes have nothing to do with the clubs you join • FD’s? • None • BCNF? • Yes • Redundancies? • Tons! Luke Huan Univ. of Kansas

  30. Multivalued dependencies • A multivalued dependency (MVD) has the formX)Y, where X and Y are sets of attributes in a relation R • X)Y means that whenever two rows in R agree on all the attributes of X, then we can swap their Y components and get two new rows that are also in R Must in the relation also Luke Huan Univ. of Kansas

  31. MVD examples Student (SID, CID, club) • SID)CID • SID)club • Intuition: given SID, CID and club are “independent” • SID, CID)club • Trivial: LHS [ RHS = all attributes of R • SID, CID)SID • Trivial: LHS ¶ RHS • In general when two independent 1:N relations AB, AC are mixed together in R=ABC, a MVD may occur Luke Huan Univ. of Kansas

  32. Complete MVD + FD rules • FD reflexivity, augmentation, and transitivity • MVD complementation:If X)Y, then X)attrs(R) – X – Y • MVD augmentation:If X)Y and VµW, then XW)YV • MVD transitivity:If X)Y and Y)Z, then X )Z – Y • Replication (FD is MVD):If X!Y, then X)Y • Coalescence:If X)Y and ZµY and there is some W disjoint from Y such that W!Z, then X!Z Try proving things using these! Luke Huan Univ. of Kansas

  33. 4NF • A relation R is in Fourth Normal Form (4NF) if • For every non-trivial MVD X)Y in R, X is a superkey • That is, all FD’s and MVD’s follow from “key ! other attributes” (i.e., no MVD’s, and no FD’s besides key functional dependencies) • 4NF is stronger than BCNF • Because every FD is also a MVD Luke Huan Univ. of Kansas

  34. 4NF decomposition algorithm • Find a 4NF violation • A non-trivial MVD X)Y in R where X is not a superkey • Decompose R into R1 and R2, where • R1 has attributes X[Y • R2 has attributes X[Z (Z contains attributes not in X or Y) • Repeat until all relations are in 4NF • Almost identical to BCNF decomposition algorithm • Any decomposition on a 4NF violation is lossless Luke Huan Univ. of Kansas

  35. 4NF violation: SID)CID Enroll (SID, CID) Join (SID, club) 4NF 4NF 4NF decomposition example Student (SID, CID, club) Luke Huan Univ. of Kansas

  36. 3NF, BCNF, 4NF, and beyond • Of historical interests • 1NF: All column values must be atomic • 2NF: Slightly more relaxed than 3NF Luke Huan Univ. of Kansas

  37. Summary • Philosophy behind BCNF, 4NF:Data should depend on the key, the whole key, and nothing but the key! • Philosophy behind 3NF: … But not at the expense of more expensive constraint enforcement! Luke Huan Univ. of Kansas

  38. 3NF • How to create join lossless decomposition? • Binary decomposition with foreign key • N-nary decomposition with FD preserving and a key • How to achieve FD preserving decompositions? • For each FD, create a relation • Minimal cover of FD guarantees 3NF Luke Huan Univ. of Kansas

More Related