
Temple University – CIS Dept. CIS616 – Principles of Data Management

V. Megalooikonomou: Database Design and Normalization (based on notes by Silberschatz, Korth, and Sudarshan and notes by C. Faloutsos at CMU)


Presentation Transcript


  1. Temple University – CIS Dept. CIS616 – Principles of Data Management. V. Megalooikonomou: Database Design and Normalization (based on notes by Silberschatz, Korth, and Sudarshan and notes by C. Faloutsos at CMU)

  2. Overview • Relational model • formal query languages • commercial query languages (SQL) • Integrity constraints • domain I.C., foreign keys • functional dependencies • Functional Dependencies • DB design and normalization

  3. Overview - detailed • DB design and normalization • pitfalls of bad design • decomposition • normal forms

  4. Goal • Design ‘good’ tables • sub-goal#1: define what ‘good’ means • sub-goal#2: fix ‘bad’ tables • in short: “we want tables where the attributes depend on the primary key, on the whole key, and nothing but the key” • Let’s see why, and how:

  5. Pitfalls: takes1 (ssn, c-id, grade, name, address)
      ssn | c-id  | grade | name  | address
      123 | cs331 | A     | smith | Main

  6. Pitfalls: ‘Bad’ - why? Because ssn -> address, name:
      ssn | c-id  | grade | name  | address
      123 | cs331 | A     | smith | Main
      123 | cs351 | B     | smith | Main
      123 | cs211 | A     | smith | Main
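A functional dependency constrains every legal instance, so one table instance can refute an FD but never prove it. The sketch below is minimal and written in Python. The helper name `fd_holds` and the dict-per-row layout are illustrative assumptions. It confirms that the slide 6 rows are consistent with ssn -> name, address and refutes ssn -> grade. It also counts how often smith's name and address are repeated.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on the lhs attributes but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(key, val) != val:
            return False
    return True


# The takes1 instance from slide 6
takes1 = [
    {"ssn": 123, "c-id": "cs331", "grade": "A", "name": "smith", "address": "Main"},
    {"ssn": 123, "c-id": "cs351", "grade": "B", "name": "smith", "address": "Main"},
    {"ssn": 123, "c-id": "cs211", "grade": "A", "name": "smith", "address": "Main"},
]

print(fd_holds(takes1, ["ssn"], ["name", "address"]))  # True
print(fd_holds(takes1, ["ssn"], ["grade"]))            # False: A vs. B
# Redundancy: the same (name, address) pair is stored once per course taken
print(sum(1 for r in takes1 if r["ssn"] == 123))       # 3
```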

  7. Pitfalls • Redundancy • space • (inconsistencies) • insertion/deletion anomalies:

  8. Pitfalls • insertion anomaly: • “jones” registers, but takes no class - no place to store his address!
      ssn | c-id  | grade | name  | address
      123 | cs331 | A     | smith | Main
      …   | …     | …     | …     | …
      234 | null  | null  | jones | Forbes

  9. Pitfalls • deletion anomaly: • delete the last record of ‘smith’ (we lose his address!)
      ssn | c-id  | grade | name  | address
      123 | cs331 | A     | smith | Main
      123 | cs351 | B     | smith | Main
      123 | cs211 | A     | smith | Main

  10. Solution: decomposition
      ssn | c-id  | grade | name  | address
      123 | cs331 | A     | smith | Main
      123 | cs351 | B     | smith | Main
      123 | cs211 | A     | smith | Main
      • split offending table in two (or more), e.g.: ? ?

  11. Overview - detailed • DB design and normalization • pitfalls of bad design • decomposition • lossless join • dependency preserving • normal forms

  12. Decompositions • there are ‘bad’ decompositions • we want: • lossless and • dependency preserving

  13. Decompositions - lossy: R1(ssn, grade, name, address), R2(c-id, grade)
      ssn | c-id  | grade | name  | address
      123 | cs331 | A     | smith | Main
      123 | cs351 | B     | smith | Main
      234 | cs211 | A     | jones | Forbes
      R2:
      c-id  | grade
      cs331 | A
      cs351 | B
      cs211 | A
      FDs: ssn -> name, address; ssn, c-id -> grade

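  14. Decompositions - lossy: we cannot recover the original table with a join! Joining R1 and R2 on grade also produces spurious tuples such as (123, cs211, A, smith, Main).
      ssn | c-id  | grade | name  | address
      123 | cs331 | A     | smith | Main
      123 | cs351 | B     | smith | Main
      234 | cs211 | A     | jones | Forbes
      R2:
      c-id  | grade
      cs331 | A
      cs351 | B
      cs211 | A
      FDs: ssn -> name, address; ssn, c-id -> grade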
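The sketch below shows, in Python, why the decomposition on slides 13-14 is lossy. Relations are modeled as sets of frozensets of (attribute, value) pairs. `make_rel`, `project`, and `natural_join` are hypothetical helpers. The code projects the three-row table onto R1(ssn, grade, name, address) and R2(c-id, grade), then joins the two projections back on grade. Such a join always returns a superset of the original table. Here it adds two spurious tuples, so the information about which student took which course is lost.

```python
def make_rel(header, rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(header, row)) for row in rows}


def project(rel, attrs):
    """Projection onto attrs; building a set removes duplicate tuples."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r1, r2):
    """Natural join: merge every pair of tuples that agree on all shared attributes."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


header = ("ssn", "c-id", "grade", "name", "address")
takes1 = make_rel(header, [
    (123, "cs331", "A", "smith", "Main"),
    (123, "cs351", "B", "smith", "Main"),
    (234, "cs211", "A", "jones", "Forbes"),
])

r1 = project(takes1, {"ssn", "grade", "name", "address"})
r2 = project(takes1, {"c-id", "grade"})
joined = natural_join(r1, r2)
spurious = joined - takes1

print(len(takes1), len(joined))  # 3 5
for t in sorted(tuple(dict(s)[a] for a in header) for s in spurious):
    print(t)
# (123, 'cs211', 'A', 'smith', 'Main')
# (234, 'cs331', 'A', 'jones', 'Forbes')
```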

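  15. Decompositions – lossy: Another example • Decomposition of R = (A, B) into R1 = (A), R2 = (B):
      $r = \{(\alpha, 1), (\alpha, 2), (\beta, 1)\}$
      $\Pi_A(r) = \{\alpha, \beta\}$, $\Pi_B(r) = \{1, 2\}$
      $\Pi_A(r) \bowtie \Pi_B(r) = \{(\alpha, 1), (\alpha, 2), (\beta, 1), (\beta, 2)\} \neq r$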

  16. Decompositions: example of a non-dependency-preserving decomposition • original FDs: S# -> address, status and address -> status • decomposed into one table with S# -> address and one with S# -> status

  17. Decompositions: is it lossless? • original FDs: S# -> address, status and address -> status • decomposed into one table with S# -> address and one with S# -> status

  18. Decompositions - lossless Definition: consider schema R with FD set ‘F’. R1, R2 is a lossless join decomposition of R if, for every legal instance r of R, we always have $r = \Pi_{R_1}(r) \bowtie \Pi_{R_2}(r)$. An easier criterion?

  19. Decomposition - lossless Theorem: lossless join decomposition if the joining attribute is a superkey in at least one of the new tables. Formally: $R_1 \cap R_2 \to R_1$ or $R_1 \cap R_2 \to R_2$ (in $F^+$)
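Slide 19's criterion needs only attribute closure. Compute the closure of R1 ∩ R2 under F and check whether it covers R1 or R2. The Python sketch below is minimal and assumes that schemas and FD sides are plain sets of attribute names. The function names `closure` and `is_lossless` are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the lhs is already determined, everything in rhs is too
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Slide 19's criterion: R1 ∩ R2 -> R1 or R1 ∩ R2 -> R2."""
    common_plus = closure(r1 & r2, fds)
    return r1 <= common_plus or r2 <= common_plus


takes1_fds = [({"ssn"}, {"name", "address"}), ({"ssn", "c-id"}, {"grade"})]
# Slide 13: the joining attribute is grade, which determines nothing
print(is_lossless({"ssn", "grade", "name", "address"}, {"c-id", "grade"}, takes1_fds))  # False
# Slide 20: the joining attribute is ssn, a key of (ssn, name, address)
print(is_lossless({"ssn", "c-id", "grade"}, {"ssn", "name", "address"}, takes1_fds))  # True

s_fds = [({"S#"}, {"address", "status"}), ({"address"}, {"status"})]
print(is_lossless({"S#", "address"}, {"S#", "status"}, s_fds))  # True
```

The last call answers slide 17 for the S# example. There, S# is the joining attribute and a key of both tables.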

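  20. Decomposition - lossless example:
      ssn | c-id  | grade | name  | address
      123 | cs331 | A     | smith | Main
      123 | cs351 | B     | smith | Main
      234 | cs211 | A     | jones | Forbes
      FDs: ssn -> name, address; ssn, c-id -> grade
      decomposed into (ssn, c-id, grade), holding ssn, c-id -> grade:
      ssn | c-id  | grade
      123 | cs331 | A
      123 | cs351 | B
      234 | cs211 | A
      and (ssn, name, address), holding ssn -> name, address; the joining attribute ssn is a key of the latter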

  21. Overview - detailed • DB design and normalization • pitfalls of bad design • decomposition • lossless join decomp. • dependency preserving • normal forms

  22. Decomposition - depend. pres. informally: we don’t want the original FDs to span two tables • counter-example: original FDs S# -> address, status and address -> status; decomposed into one table with S# -> address and one with S# -> status

  23. Decomposition - depend. pres. dependency preserving decomposition: one table with S# -> address and one with address -> status (original FDs: S# -> address, status and address -> status) (but: S# -> status?)

  24. Decomposition - depend. pres. informally: we don’t want the original FDs to span two tables; more specifically: … the FDs of the canonical cover. Let $F_i$ be the set of dependencies in $F^+$ that include only attributes in $R_i$. • Preferably the decomposition should be dependency preserving, that is, $(F_1 \cup F_2 \cup \dots \cup F_n)^+ = F^+$ • Otherwise, checking updates for violation of functional dependencies may require computing joins, which is expensive

  25. Decomposition - depend. pres. why is dependency preservation good? • decomposition with S# -> address and S# -> status (address -> status: ‘lost’) • versus decomposition with S# -> address and address -> status

  26. Decomposition - depend. pres. A: e.g., to record that ‘Philly’ has status ‘A’, we need a table for address -> status • decomposition with S# -> address and S# -> status (address -> status: ‘lost’) • decomposition with S# -> address and address -> status

  27. Decomposition - depend. pres. • To check if a dependency $\alpha \to \beta$ is preserved in a decomposition of R into $R_1, R_2, \dots, R_n$, we apply the following test (with attribute closure done w.r.t. F):
      result = α
      while (changes to result) do
          for each R_i in the decomposition
              t = (result ∩ R_i)+ ∩ R_i
              result = result ∪ t
      • If result contains all attributes in $\beta$, then the functional dependency $\alpha \to \beta$ is preserved • We apply the test on all dependencies in F to check if a decomposition is dependency preserving • The test takes polynomial time • Computing $F^+$ and $(F_1 \cup F_2 \cup \dots \cup F_n)^+$ needs exponential time
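Slide 27's test translates almost line for line into Python. The sketch assumes that schemas are sets and that F is a list of (lhs, rhs) set pairs. The function names are illustrative. `preserves` runs the loop for one dependency, and `is_dependency_preserving` applies it to every dependency in F. The examples come from slides 22-23 and 30.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(lhs, rhs, schemas, fds):
    """Slide 27's test: is lhs -> rhs preserved by the decomposition `schemas`?"""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in schemas:
            # t = (result ∩ Ri)+ ∩ Ri, with the closure taken w.r.t. the full F
            t = closure(result & ri, fds) & ri
            if not t <= result:
                result |= t
                changed = True
    return rhs <= result


def is_dependency_preserving(schemas, fds):
    """Apply the test to every dependency in F."""
    return all(preserves(lhs, rhs, schemas, fds) for lhs, rhs in fds)


# Slide 30: R = (A, B, C), F = {A -> B, B -> C}
f = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(is_dependency_preserving([{"A", "B"}, {"B", "C"}], f))  # True
print(is_dependency_preserving([{"A", "B"}, {"A", "C"}], f))  # False: B -> C is lost

# Slides 22-23: S# -> address, status and address -> status
s = [({"S#"}, {"address", "status"}), ({"address"}, {"status"})]
print(is_dependency_preserving([{"S#", "address"}, {"S#", "status"}], s))       # False
print(is_dependency_preserving([{"S#", "address"}, {"address", "status"}], s))  # True
```

In the second S# decomposition, S# -> status is not checked inside any single table. The test still accepts it, because it follows from S# -> address and address -> status. This answers the "(but: S# -> status?)" on slide 23.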

  28. Decomposition - conclusions • decompositions should always be lossless • joining attribute -> superkey • whenever possible, we want them to be dependency preserving (occasionally, impossible - see ‘STJ’ example later…)

  29. Normalization using FD • When decomposing a relation schema R with a set of functional dependencies F into $R_1, R_2, \dots, R_n$ we want: • Lossless-join decomposition: otherwise … information loss • No redundancy: relations $R_i$ should preferably be in either Boyce-Codd Normal Form or Third Normal Form • Dependency preservation: let $F_i$ be the set of dependencies in $F^+$ that include only attributes in $R_i$. • Preferably the decomposition should be dependency preserving, i.e., $(F_1 \cup F_2 \cup \dots \cup F_n)^+ = F^+$ • Otherwise, checking updates for violation of functional dependencies may require computing joins, which is expensive

  30. Normalization using FD - Example • $R = (A, B, C)$, $F = \{A \to B, B \to C\}$ • $R_1 = (A, B)$, $R_2 = (B, C)$ • Lossless-join decomposition: $R_1 \cap R_2 = \{B\}$ and $B \to BC$ • Dependency preserving • $R_1 = (A, B)$, $R_2 = (A, C)$ • Lossless-join decomposition: $R_1 \cap R_2 = \{A\}$ and $A \to AB$ • Not dependency preserving (cannot check $B \to C$ without computing $R_1 \bowtie R_2$)

  31. Overview - detailed • DB design and normalization • pitfalls of bad design • decomposition (how to fix the problem) • normal forms (how to detect the problem) • BCNF, • 3NF, • (1NF, 2NF)

  32. Normal forms - BCNF We saw how to fix ‘bad’ schemas - but what is a ‘good’ schema? Answer: ‘good’, if it obeys a ‘normal form’, i.e., a set of rules. Typically: Boyce-Codd Normal Form (BCNF)

  33. Normal forms - BCNF Defn.: Rel. R is in BCNF w.r.t. F, if • informally: everything depends on the full key, and nothing but the key • semi-formally: every determinant (of the cover) is a candidate key

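  34. Normal forms - BCNF Example and counter-example:
      ssn | c-id  | grade | name  | address
      123 | cs331 | A     | smith | Main
      123 | cs351 | B     | smith | Main
      234 | cs211 | A     | jones | Forbes
      • example: a table with just ssn -> name, address (in BCNF)
      • counter-example: the table above, with ssn -> name, address and ssn, c-id -> grade (ssn is a determinant but not a candidate key)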

  35. Normal forms - BCNF Formally: for every FD a->b in F+ • a->b is trivial (a is a superset of b) or • a is a superkey • (or both)
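The formal definition can be checked with attribute closure. An FD a->b violates BCNF when it is nontrivial and the closure of a does not contain the whole schema. The Python sketch below is minimal, and its names are illustrative. It checks a whole schema R against its given F, which the standard textbook result says is sufficient. It is not a valid test for the smaller tables of a decomposition, because there the FDs of F+ that fall inside each table matter.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """FDs in F that break BCNF for `schema`: nontrivial, and lhs is not a superkey.

    Assumes F is the given FD set of the whole schema; for that case it is
    enough to check F itself rather than all of F+.
    """
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD: lhs is a superset of rhs
        if not schema <= closure(lhs, fds):
            bad.append((lhs, rhs))  # lhs is not a superkey
    return bad


takes1 = {"ssn", "c-id", "grade", "name", "address"}
takes1_fds = [({"ssn"}, {"name", "address"}), ({"ssn", "c-id"}, {"grade"})]
print([sorted(lhs) for lhs, _ in bcnf_violations(takes1, takes1_fds)])  # [['ssn']]

stj = {"S", "T", "J"}
stj_fds = [({"T"}, {"J"}), ({"S", "J"}, {"T"})]
print([sorted(lhs) for lhs, _ in bcnf_violations(stj, stj_fds)])  # [['T']]
```

The STJ call anticipates slide 45: T -> J is nontrivial, and T alone does not determine S.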

  36. Normal forms - BCNF Theorem: given a schema R and a set of FD ‘F’, we can always decompose it to schemas R1, … Rn, so that • R1, … Rn are in BCNF and • the decomposition is lossless (…but, some decomp. might lose dependencies)

  37. BCNF Decomposition How? … essentially, break off FDs of the cover, e.g., TAKES1(ssn, c-id, grade, name, address) with ssn -> name, address; ssn, c-id -> grade

  38. Normal forms - BCNF e.g., TAKES1(ssn, c-id, grade, name, address) with ssn -> name, address; ssn, c-id -> grade [figure: FD diagram over ssn, c-id, grade, name, address]

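  39. Normal forms - BCNF
      ssn | c-id  | grade | name  | address
      123 | cs331 | A     | smith | Main
      123 | cs351 | B     | smith | Main
      234 | cs211 | A     | jones | Forbes
      FDs: ssn -> name, address; ssn, c-id -> grade
      decomposed into (ssn, c-id, grade), holding ssn, c-id -> grade:
      ssn | c-id  | grade
      123 | cs331 | A
      123 | cs351 | B
      234 | cs211 | A
      and (ssn, name, address), holding ssn -> name, address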

  40. Normal forms - BCNF pictorially: we want a ‘star’ shape [figure: FD diagram of TAKES1 over ssn, c-id, grade, name, address: not in BCNF]

  41. Normal forms - BCNF pictorially: we want a ‘star’ shape [figure: star-shaped FD diagrams over attributes A to H]

  42. Normal forms - BCNF or a star-like shape (e.g., 2 candidate keys): STUDENT(ssn, st#, name, address) [figure: two equivalent FD diagrams over ssn, st#, name, address]

  43. Normal forms - BCNF but not: [figure: FD diagrams over attributes A to H that are not star-shaped]

  44. BCNF Decomposition
      result := {R};
      done := false;
      compute F+;
      while (not done) do
          if (there is a schema R_i in result that is not in BCNF)
          then begin
              let α → β be a nontrivial functional dependency that holds on R_i
                  such that α → R_i is not in F+, and α ∩ β = ∅;
              result := (result − R_i) ∪ (R_i − β) ∪ (α, β);
          end
          else done := true;
      Note: each R_i is in BCNF, and the decomposition is lossless-join
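The pseudocode needs F+, which takes exponential time to compute. The Python sketch below swaps in an equivalent per-table test from the standard textbook treatment. For every proper subset α of R_i, compute the closure α+. If α+ adds attributes of R_i but does not cover all of R_i, then α → (α+ ∩ R_i) − α is a violating FD. The sketch is still exponential in the size of R_i, so it only suits small examples. All names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(ri, fds):
    """Return (alpha, beta) with alpha -> beta violating BCNF on ri, or None.

    Tries every proper non-empty subset alpha of ri (exponential in |ri|).
    """
    attrs = sorted(ri)  # sorted so that the result is deterministic
    for k in range(1, len(attrs)):
        for combo in combinations(attrs, k):
            alpha = set(combo)
            beta = (closure(alpha, fds) & ri) - alpha
            # nontrivial, and alpha is not a superkey of ri
            if beta and not ri <= alpha | beta:
                return alpha, beta
    return None


def bcnf_decompose(schema, fds):
    result = [set(schema)]
    while True:
        for i, ri in enumerate(result):
            violation = find_violation(ri, fds)
            if violation:
                alpha, beta = violation
                # replace R_i by (R_i - beta) and (alpha, beta)
                result[i:i + 1] = [ri - beta, alpha | beta]
                break
        else:
            return result  # no violation left: every R_i is in BCNF


takes1_fds = [({"ssn"}, {"name", "address"}), ({"ssn", "c-id"}, {"grade"})]
parts = bcnf_decompose({"ssn", "c-id", "grade", "name", "address"}, takes1_fds)
print(sorted(sorted(p) for p in parts))
# [['address', 'name', 'ssn'], ['c-id', 'grade', 'ssn']]

stj_fds = [({"T"}, {"J"}), ({"S", "J"}, {"T"})]
print(sorted(sorted(p) for p in bcnf_decompose({"S", "T", "J"}, stj_fds)))
# [['J', 'T'], ['S', 'T']]
```

On STJ, the sketch ends with (T, J) and (S, T). That is option 2 from slides 47-48, which is lossless but not dependency preserving.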

  45. Normal forms - 3NF consider the ‘classic’ case: STJ(Student, Teacher, subJect) with T -> J; S,J -> T [figure: FD diagram over S, T, J] Is it in BCNF?

  46. Normal forms - 3NF STJ(Student, Teacher, subJect) with T -> J; S,J -> T [figure: FD diagram over S, T, J] How to decompose it to BCNF?

  47. Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T 1) R1(T,J) R2(S,J) (BCNF? - lossless? - dep. pres.? ) 2) R1(T,J) R2(S,T) (BCNF? - lossless? - dep. pres.? )

  48. Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T 1) R1(T,J) R2(S,J) (BCNF? Y+Y - lossless? N - dep. pres.? N ) 2) R1(T,J) R2(S,T) (BCNF? Y+Y - lossless? Y - dep. pres.? N )
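The answers on slide 48 can be checked mechanically. The Python sketch below bundles three tests into illustrative helpers. The first is a BCNF test based on closures of subsets, an assumption carried over from the standard textbook test. The second is the superkey criterion from slide 19, and the third is the preservation test from slide 27. For each decomposition, the sketch prints one line in the order BCNF(R1), BCNF(R2), lossless, dependency preserving.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_bcnf(ri, fds):
    """No proper subset of ri may determine part, but not all, of ri."""
    for k in range(1, len(ri)):
        for combo in combinations(sorted(ri), k):
            alpha = set(combo)
            extra = (closure(alpha, fds) & ri) - alpha
            if extra and not ri <= alpha | extra:
                return False
    return True


def is_lossless(r1, r2, fds):
    """Slide 19: the joining attributes must be a superkey of R1 or of R2."""
    common_plus = closure(r1 & r2, fds)
    return r1 <= common_plus or r2 <= common_plus


def preserves_all(schemas, fds):
    """Slide 27's test applied to every dependency in F."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in schemas:
                t = closure(result & ri, fds) & ri
                if not t <= result:
                    result |= t
                    changed = True
        if not rhs <= result:
            return False
    return True


stj_fds = [({"T"}, {"J"}), ({"S", "J"}, {"T"})]
for r1, r2 in [({"T", "J"}, {"S", "J"}), ({"T", "J"}, {"S", "T"})]:
    print(is_bcnf(r1, stj_fds), is_bcnf(r2, stj_fds),
          is_lossless(r1, r2, stj_fds), preserves_all([r1, r2], stj_fds))
# True True False False
# True True True False
```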

  49. Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T in this case: impossible to have both • BCNF and • dependency preservation Welcome 3NF (…a weaker normal form)!

  50. Normal forms - 3NF STJ(Student, Teacher, subJect) with T -> J; S,J -> T [figure: FD diagram over S, T, J with one arrow drawn in red] informally, 3NF ‘forgives’ the red arrow in the canonical cover
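This excerpt does not give a formal 3NF definition. The sketch below therefore assumes the standard textbook condition: for every nontrivial FD α → β in F, either α is a superkey, or every attribute of β − α belongs to some candidate key (is "prime"). Under that assumption, the check forgives T -> J in STJ, because J is part of the candidate key (S, J). TAKES1 still fails, because name and address are not prime. The Python helpers are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal sets of attributes whose closure is the whole schema."""
    keys = []
    attrs = sorted(schema)
    for k in range(1, len(attrs) + 1):
        for combo in combinations(attrs, k):
            cand = set(combo)
            if any(key <= cand for key in keys):
                continue  # contains a smaller key, so not minimal
            if schema <= closure(cand, fds):
                keys.append(cand)
    return keys


def third_nf_violations(schema, fds):
    """FDs in F that break the (assumed) 3NF condition."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        extra = rhs - lhs
        if not extra or schema <= closure(lhs, fds):
            continue  # trivial, or lhs is a superkey (as in BCNF)
        if extra <= prime:
            continue  # 'forgiven': every attribute in rhs - lhs is prime
        bad.append((lhs, rhs))
    return bad


stj = {"S", "T", "J"}
stj_fds = [({"T"}, {"J"}), ({"S", "J"}, {"T"})]
print(sorted(sorted(k) for k in candidate_keys(stj, stj_fds)))  # [['J', 'S'], ['S', 'T']]
print(third_nf_violations(stj, stj_fds))  # []

takes1 = {"ssn", "c-id", "grade", "name", "address"}
takes1_fds = [({"ssn"}, {"name", "address"}), ({"ssn", "c-id"}, {"grade"})]
print([sorted(lhs) for lhs, _ in third_nf_violations(takes1, takes1_fds)])  # [['ssn']]
```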
