1 / 52

Functional Dependencies and Normalization

Functional Dependencies and Normalization. Instructor: Mohamed Eltabakh meltabakh@cs.wpi.edu. What to Cover. Functional Dependencies (FDs) Closure of Functional Dependencies Lossy & Lossless Decomposition Normalization. sNumber. sName. pNumber. pName. s1. Dave. p1. MM. s2. Greg.

hnick
Download Presentation

Functional Dependencies and Normalization

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Functional Dependencies and Normalization Instructor: Mohamed Eltabakh meltabakh@cs.wpi.edu

  2. What to Cover • Functional Dependencies (FDs) • Closure of Functional Dependencies • Lossy & Lossless Decomposition • Normalization

  3. sNumber sName pNumber pName s1 Dave p1 MM s2 Greg p2 MM Student Professor sNumber sName pNumber pNumber pName s1 Dave p1 p1 MM s2 Greg p2 p2 MM Student Professor sNumber sName pName pNumber pName S1 Dave MM p1 MM S2 Greg MM p2 MM Decomposing Relations StudentProf FDs: pNumber  pName Lossless Lossy

  4. Lossless vs. Lossy Decomposition • Assume R is divided into R1 and R2 • Lossless Decomposition • R1 natural join R2 should create exactly R • LossyDecomposition • R1 natural join R2 adds more records (or deletes records) from R

  5. sNumber sName pNumber pName s1 Dave p1 MM s2 Greg p2 MM Student Professor sNumber sName pNumber pNumber pName s1 Dave p1 p1 MM s2 Greg p2 p2 MM Lossless Decomposition StudentProf FDs: pNumber  pName Lossless Student & Professor are lossless decomposition of StudentProf (Student ⋈ Professor = StudentProf)

  6. sNumber sName pNumber pName s1 Dave p1 MM s2 Greg p2 MM Student Professor sNumber sName pName pNumber pName S1 Dave MM p1 MM S2 Greg MM p2 MM Lossy Decomposition StudentProf FDs: pNumber  pName Lossy Student & Professor are lossy decomposition of StudentProf (Student ⋈ Professor != StudentProf)

  7. Goal: Ensure Lossless Decomposition • How to ensure lossless decomposition? • Answer: • The common columns must be candidate key in one of the two relations

  8. sNumber sName pNumber pName s1 Dave p1 MM s2 Greg p2 MM Student Professor sNumber sName pNumber pNumber pName s1 Dave p1 p1 MM s2 Greg p2 p2 MM Student Professor sNumber sName pName pNumber pName S1 Dave MM p1 MM S2 Greg MM p2 MM Back to our example StudentProf pNumber is candidate key FDs: pNumber pName Lossless pName is not candidate key Lossy

  9. What to Cover • Functional Dependencies (FDs) • Closure of Functional Dependencies • Lossy & Lossless Decomposition • Normalization

  10. Normalization

  11. Normalization • First Normal Form (1NF) • Boyce-Codd Normal Form (BCNF) • Third Normal Form (3NF) • Canonical Cover of FDs

  12. Normalization • Set of rules to avoid “bad” schema design • Decide whether a particular relation R is in “good” form • If not, decompose R to be in a “good” form • Several levels of normalization • First Normal Form (1NF) • BCNF • Third Normal Form (3NF) • Fourth Normal Form (4NF) • If a relation is in a certain normal form, then it is known that certain kinds of problems are avoided or minimized

  13. First Normal Form (1NF) • Attribute domain isatomicif its elements are considered to be indivisible units (primitive attributes) • Examples of non-atomic domains are multi-valued and composite attributes • A relational schema R is in first normal form (1NF) if the domains of all attributes of R are atomic We assume all relations are in 1NF

  14. First Normal Form (1NF): Example Since all attributes are primitive  It is in 1NF

  15. Boyce-Codd Normal Form (BCNF): Definition A relation schema R is in BCNF with respect to a set F of functional dependencies if for all functional dependencies in F+of the form α→β where α ⊆ R and β ⊆ R, then at least one of the following holds: • α → β is trivial (i.e.,β⊆α) • α is a superkey for R Remember: Candidate keys are also superkeys

  16. BCNF: Example Student Is relation Student in BCNF given pNumber pName • It is not trivial FD • pNumber is not a key in Student relation How to fix it and make it in BCNF??? Student Info Professor Info NO

  17. Decomposing a Schema into BCNF • If R is not in BCNF because of non-trivial dependency α → β, then decompose R • R is decomposed into two relations • R1 = (α U β )-- α is super key in R1 • R2 = (R-(β-α))-- R2.α is foreign keys to R1.α

  18. Example of BCNF Decomposition StudentProf FDs: pNumber pName Student Professor FOREIGN KEY: Student (PNum) references Professor (PNum)

  19. What is Nice about this Decomposing ??? • R is decomposed into two relations • R1 = (α U β )-- α is super key in R1 • R2 = (R-(β-α))-- R2.α is foreign keys to R1.α This decomposition is lossless (Because R1 and R2 can be joined based on α, and αis unique in R1) • When you join R1 and R2 on α, you get R back without lose of information

  20. StudentProf = Student ⋈ Professor StudentProf FDs: pNumber pName Student Professor BCNF decomposition rule create lossless decomposition

  21. Multi-Step Decomposition • Relation R and functional dependency F • R = (customer_name, loan_number, branch_name, branch_city, assets, amount ) • F = {branch_nameassets branch_city, loan_numberamount branch_name} • Is R in BCNF ?? • Based on branch_nameassets branch_city • R1 = (branch_name, assets, branch_city) • R2 = (customer_name, loan_number, branch_name, amount) • Are R1 and R2 in BCNF ? • Divide R2 based on loan_number amount branch_name • R3 = (loan_number, amount, branch_name) • R4 = (customer_name, loan_number) NO R2 is not Final Schema has R1, R3, R4

  22. What is NOT Nice about BCNF Before decomposition, we had set of functional dependencies FDs (Say F) After decomposition, do we still have the same set of FDs or we lost something ??

  23. What is NOT Nice about BCNF • Dependency Preservation • After the decomposition, all FDs in F+ should be preserved • BCNF does not guarantee dependency preservation • Can we always find a decomposition that is both BCNF and preserving dependencies? • No…This decomposition may not exist • That is why we study a weaker normal form called (third normal form –3NF)

  24. Dependency Preserving Assume R is decomposed to R1 and R2 Dependencies of R1 and R2 include: • Local dependencies α → β • All columns of α and β must be in a single relation • Global Dependencies • Use transitivity property to form more FDs across R1 and R2 relations Yes  Dependency preserving Does these dependencies match the ones in R ? No  Not dependency preserving

  25. Example of Lost FD • Assume relation R(C, S, J, D, T, Q, V) • C is key, JT  C and SD  T • C  CSJDTQV (C is key) -- Good for BCNF • JT  CSJDTQV (JT is key) -- Good for BCNF • SD  T (SD is not a key) –Bad for BCNF • Decomposition: • R1(C, S, J, D, Q, V) and R2(S, D, T) • Does C CSJDTQV still exist? • Yes: C CSJDQV (local), SDT (local), C CSJDQVT (global) Lossless & in BCNF

  26. Example of Lost FD (Cont’d) • Assume relation R(C, S, J, D, T, Q, V) • C is key, JT  C and SD  T • C  CSJDTQV (C is key) -- Good for BCNF • JT  CSJDTQV (JT is key) -- Good for BCNF • SD  T (SD is not a key) –Bad for BCNF • Decomposition: • R1(C, S, J, D, Q, V) and R2(S, D, T) • Does SD T still exist? • Yes: SDT (local) Lossless & in BCNF

  27. Example of Lost FD (Cont’d) • Assume relation R(C, S, J, D, T, Q, V) • C is key, JT  C and SD  T • C  CSJDTQV (C is key) -- Good for BCNF • JT  CSJDTQV (JT is key) -- Good for BCNF • SD  T (SD is not a key) –Bad for BCNF • Decomposition: • R1(C, S, J, D, Q, V) and R2(S, D, T) • Does JT CSJDTQV still exist? • No this one is lost (no way from the local FDs to get this one) Lossless & in BCNF

  28. Dependency Preservation Test • Assume R is decomposed into R1 and R2 • The closure of FDs in R is F+ • The FDs in R1 and R2 are FR1 and FR2, respectively • Then dependencies are preserved if: • F+ = (FR1 union FR2)+ local dependencies in R1 local dependencies in R2

  29. Back to Our Example • Assume relation R(C, S, J, D, T, Q, V) • C is key, JT  C and SD  T • C  CSJDTQV (C is key) -- Good for BCNF • JT  CSJDTQV (JT is key) -- Good for BCNF • SD  T (SD is not a key) –Bad for BCNF • Decomposition: • R1(C, S, J, D, Q, V) and R2(S, D, T) • F+ = {C  CSJDTQV, JT CSJDTQV, SD T} • FR1 = {C  CSJDQV}  local for R1 • FR2 = {SD  T}  local for R2 • FR1 U FR2 = {C  CSJDQV, SD  T} • (FR1 U FR2)+ = {C  CSJDQV, SD  T, C T} JT  C is still missing

  30. Dependency Preservation BCNFdoes not necessarily preserve FDs. But 3NFis guaranteed to be able to preserve FDs.

  31. Normalization • First Normal Form (1NF) • Boyce-Codd Normal Form (BCNF) • Third Normal Form (3NF) • Canonical Cover of FDs

  32. Third Normal Form: Motivation • There are some situations where • BCNF is not dependency preserving • Solution: Define a weaker normal form, called Third Normal Form (3NF) • Allows some redundancy (we will see examples later) • But all FDs are preserved • There is always a lossless, dependency-preserving decomposition in 3NF

  33. Normal Form : 3NF Relation R is in 3NF if, for every FD in F+ α  β, where α ⊆ R and β ⊆ R, at least one of the following holds: • α → β is trivial (i.e.,β⊆α) • α is a superkey for R • Each attribute in β-α is part of a candidate key (prime attribute) L.H.S is superkey OR R.H.S consists of prime attributes

  34. Testing for 3NF • Use attribute closure to check for each dependency α → β, if α is a superkey • If α is not a superkey, we have to verify if each attribute in (β- α) is contained in a candidate key of R

  35. 3NF: Example Lot (ID, county, lotNum, area, price, taxRate) Primary key: ID Candidate key: <county, lotNum> FDs: county  taxRate area  price • Is relation Lot in 3NF ? NO Decomposition based on county  taxRate Lot (ID, county, lotNum, area, price) County (county, taxRate) • Are relations Lot and County in 3NF ? Lot is not

  36. 3NF: Example (Cont’d) Lot (ID, county, lotNum, area, price) County (county, taxRate) Candidate key for Lot: <county, lotNum> FDs: county taxRate area  price Decompose Lot based on area  price Lot (ID, county, lotNum, area) County (county, taxRate) Area (area, price) • Is every relation in 3NF ? YES

  37. Comparison between 3NF & BCNF ? • If R is in BCNF, obviously R is in 3NF • If R is in 3NF, R may not be in BCNF • 3NF allows some redundancy and is weaker than BCNF • 3NF is a compromise to use when BCNF with good constraint enforcement is not achievable • Important: Lossless, dependency-preserving decomposition of R into a collection of 3NF relations always possible !

  38. Normalization • First Normal Form (1NF) • Boyce-Codd Normal Form (BCNF) • Third Normal Form (3NF) • Canonical Cover of FDs

  39. Canonical Cover of FDs

  40. Canonical Cover of FDs • Canonical Cover (Minimal Cover) = G • Is the smallest set of FDs that produce the same F+ • There are no extra attributes in the L.H.S or R.H.S of and dependency in G • Given set of FDs (F) with functional closure F+ • Canonical cover of F is the minimal subset of FDs (G), where G+ = F+ Every FD in the canonical cover is needed, otherwise some dependencies are lost

  41. Example : Canonical Cover • Given F: • A  B, ABCD  E, EF  GH, ACDF  EG • Then the canonical cover G: • A  B, ACD  E, EF  GH The smallest set (minimal) of FDs that can generate F+

  42. Computing the Canonical Cover • Given a set of functional dependencies F, how to compute the canonical cover G

  43. Example : Canonical Cover(Lets Check L.H.S) • Given F= {A  B, ABCD  E, EF  G, EF H, ACDF  EG} • Union Step: {A  B, ABCD  E, EF GH, ACDF  EG} • Test ABCD  E • Check A: • {BCD}+ = {BCD}  A cannot be deleted • Check B: • {ACD}+ = {A B C D E}  Then B can be deleted • Now the set is: {A  B, ACD  E, EF  GH, ACDF  EG} • Test ACD  E • Check C: • {AD}+ = {ABD}  C cannot be deleted • Check D: • {AC}+ = {ABC}  D cannot be deleted

  44. Example: Canonical Cover(Lets Check L.H.S-Cont’d) • Now the set is: {A  B, ACD  E, EF  GH, ACDF  EG} • Test EF  GH • Check E: • {F}+ = {F}  E cannot be deleted • Check F: • {E}+ = {E}  F cannot be deleted • Test ACDF  EG • None of the H.L.S can be deleted

  45. Example: Canonical Cover(Lets Check R.H.S) • Now the set is: {A  B, ACD  E, EF  GH, ACDF  EG} • Test EF  GH • Check G: • {EF}+ = {E F H}  G cannot be deleted • Check H: • {EF}+ = {E F G}  H cannot be deleted • Test ACDF  EG • Check E: • {ACDF}+ = {A B C D F E G}  E can be deleted • Now the set is: {A  B, ACD  E, EF  GH, ACDF G}

  46. Example: Canonical Cover(Lets Check R.H.S-Cont’d) • Now the set is: {A  B, ACD  E, EF  GH, ACDF G} • Test ACDF  G • Check G: • {ACDF}+ = {A B C D F E G}  G can be deleted Now the set is: {A  B, ACD  E, EF GH} The canonical cover is: {A  B, ACD  E, EF  GH}

  47. Canonical Cover • Used to find the smallest (minimal) set of FDs that have the same closure as the original set. • Used in the decomposition of relations to be in 3NF • The resulting decomposition is lossless and dependency preserving

  48. Done with Normalization • First Normal Form (1NF) • Boyce-Codd Normal Form (BCNF) • Third Normal Form (3NF) • Canonical Cover of FDs

  49. Questions ?

  50. What You Learned • Data Models • Entity-Relationship Model & ERD • Relational Model • Conversion between the data models • Relational Algebra & Operators • Structured Query Language SQL • DML: Data Manipulation Language • DDL: Data Definition Language

More Related