1 / 97

Functional Dependencies and Normalization for Relational Databases

Functional Dependencies and Normalization for Relational Databases. Relational Database Designs. There are many different relational database "designs" (schemas) that can be used to store the relevant mini-world information. Can one schema be much better than another?.

tala
Download Presentation

Functional Dependencies and Normalization for Relational Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Functional Dependencies and Normalization for Relational Databases

  2. Relational Database Designs • There are many different relational database "designs" (schemas) that can be used to store the relevant mini-world information. • Can one schema be much better than another?

  3. Example: Shipment of Parts • Consider a database containing shipments of parts from suppliers. • A supplier belongs to a particular city and a status is associated with each city. • A part has a name, color, quantity and weight and is shiped on particular date • Proposed relation to capture this information: Ship(S#, sname, status, city, P#, pname, colour, weight, qty, date)

  4. Example: Shipment of Parts Ship(S#, sname, status, city, P#, pname, colour, weight, qty, date) Problems ?? • Redundant Data • Inability to represent certain information • Loss of information

  5. Example: Shipment of Parts Ship(S#, sname, status, city, P#, pname, colour, weight, qty, date) • INSERT • We can not enter a new supplier until the supplier ships a part. • We can not enter a new part until it is shipped by a supplier. • DELETE • If we delete the only tuple of a particular supplier, we destroy not only the shipment information but also the information that the supplier is from a particular city. (example S5 and P1) • UPDATE • The city of a supplier may appear many times in Ship. • If we change the value of the city in one tuple but not all the other, it creates inconsistency in the database.

  6. Design Anomalies Ship(S#, sname, status, city, P#, pname, colour, weight, qty, date) • This Apply relation exhibits three types of anomalies: • Redundancy • Update Anomaly • Deletion Anomaly • Good designs: • No anomalies • Can reconstruct all original information

  7. Example: Shipment of Parts

  8. Example: Shipment of Parts Supplier(S#, sname, status, city) Parts(P#, pname, color, weight) Shipment(S#, P#, qty, date) • INSERT • We can not enter the fact that a particular city has a particular status until we have a supplier from that city • DELETE • If we delete the only tuple with a particular city, we destroy not only the supplier information but also the information that that city has that particular status. • UPDATE • The city of a supplier may appear many times in Supplier (redundant information). • If we change the status for Paris, we have to change it for all tuples his Paris as the city or we face inconsistency. Problems ??

  9. Example: Shipment of Parts

  10. Informal Design Guidelines for Relational Databases • We first discuss informal guidelines for good relational design • Then we discuss formal concepts of functional dependencies and normal forms

  11. Semantics of the Relation Attributes • Each tuple in a relation should represent one entity or relationship instance. EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours) • Attributes of different entities should not be mixed in the same relation • Only foreign keys should be used to refer to other entities • Entity and relationship attributes should be kept apart as much as possible.

  12. Redundant Information in Tuples and Update Anomalies EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours) • Information is stored redundantly • Wastes storage • Causes problems with update anomalies • Insertion anomalies • Deletion anomalies • Modification anomalies

  13. EXAMPLE : DESIGN ANOMALIES EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours) • Update Anomaly: • Changing the name of project number P1 from “Billing” to “Customer-Accounting” may cause this update to be made for all 100 employees working on project P1. • Insert Anomaly: • Cannot insert a project unless an employee is assigned to it. • Cannot insert an employee unless a he is assigned to a project.

  14. EXAMPLE OF AN DELETE ANOMALY EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours) • Delete Anomaly: • When a project is deleted, it will result in deleting all the employees who work on that project. • Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.

  15. Two relation schemas suffering from update anomalies

  16. Base RelationsEMP_DEPT and EMP_PROJ formed after a Natural Join : with redundant information

  17. Redundant Information in Tuples and Update Anomalies • Design a schema that does not suffer from • insertion, • deletion and • update anomalies. • If there are any anomalies present, then note them so that applications can be made to take them into account. • Use anomaly free relations and define Views to join attributed frequently referenced in queries

  18. Null Values in Tuples • Relations should be designed such that their tuples will have as few NULL values as possible • Attributes that are NULL frequently could be placed in separate relations (with the primary key) •  Reasons for nulls: • Attribute not applicable or invalid • Attribute value unknown (may exist) • Value known to exist, but unavailable

  19. Relation Decomposition • In order to solve the previous problems we decomposed the relations into smaller relations. • Relation decomposition can be dangerous we can loose information. • Loss-less join decomposition • If we decompose a relation R into smaller relations, the join of those relations should return R, not more, not less.

  20. Example: Relation Decomposition • Redundant Information • Eliminates redundancy • To get back to the original relation use join

  21. Bad decomposition • Association between CID and grade is lost • Join returns more rows than the original relation

  22. Unnecessary decomposition • Join returns the original relation • Unnecessary: no redundancy is removed, and now SID is stored twice!

  23. Spurious Tuples • Bad designs for a relational database may result in erroneous results for certain JOIN operations • The relations should be designed to satisfy the lossless join condition. • No spurious tuples should be generated by doing a natural-join of any relations.

  24. Functional Dependencies

  25. Motivation • How do we tell if a design is bad? • This design has redundancy and update anomalies • How about a systematic approach to detecting and removing redundancy in designs? • Dependencies, decompositions, and normal forms

  26. Functional Dependencies • Functional dependencies play an important role in database design. • They describe relationships between attributes. • FDs are derived from the real-world constraints on the attributes

  27. Functional Dependencies • A functional dependency (FD) has the form X → Y, where X and Y are sets of attributes in a relation R • X → Y means that whenever two tuples in R agree on all the attributes in X, they must also agree on all attributes in Y • We say X functionally determines Y • X -> Y in R specifies a constraint on all relation instances r(R) • For any two tuples t1 and t2 • in any relation instance r(R): • If t1[X]=t2[X], then t1[Y]=t2[Y]

  28. Example: Student Grade Info • StudentGrade(SID, name, email, CID, grade) • SID → name, email • email → SID • SID, CID → grade • Not a good design

  29. EXAMPLE: ADDRESS Address (street_address, city, state, zip) • street_address, city, state → zip • zip → city, state • zip, state → zip? • This is a trivial FD • Trivial FD: LHS ⊇ RHS • zip → state, zip? • This is non-trivial, but not completely non-trivial • Completely non-trivial FD: LHS ∩ RHS = ∅

  30. Examples of FD constraints • If K is a key of R, then K functionally determines all attributes in R • SSN -> ENAME • PNUMBER -> {PNAME, PLOCATION} • {SSN, PNUMBER} -> HOURS

  31. FD is a property of relational schema R. Must be define by someone who knows the semantic of attributes of R. FD’s are a property of the meaning of data and hold at all times: certain FD’s can be ruled out based on a given state of the database

  32. Functional Dependencies • Given a relation R and a set of FD’s F • Does another FD follow from F? • Are some of the FD’s in F redundant (i.e., they follow from the others)? • Consider schema Emp_Dept(ename, ssn, bdate, address, dnumber, dname, dmgrssn) • F ={ssn -> {ename, bdate, address, dnumber), dnumber ->{dname, dmgrssn) } • Additional FD that can be infer from F • ssn-> dname, dmgrssn • ssn-> ssn • dnumber -> dname

  33. Inference Rules for FDs • Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold • Armstrong's inference rules: • IR1. (Reflexive) If Y  X, then X -> Y • IR2. (Augmentation) If X -> Y, then XZ -> YZ (XZ stands for X U Z) • IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z • IR1, IR2, IR3 form a sound and complete set of inference rules

  34. Inference Rules for FDs • Some additional inference rules that are useful: • Decomposition: If X -> YZ, then X -> Y and X -> Z • Union: If X -> Y and X -> Z, then X -> YZ • Pseudotransitivity: If X -> Y and WY -> Z, then WX -> Z • The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property)

  35. If X -> YZ, then X -> Y and X -> Z • YZ -> Y (IR1) • X -> YZ (given) • X -> Y (transitivity IR3) • If X -> Y and X -> Z, then X -> YZ • X ->XY (augmenting X in X-> Y) • XY -> YZ (augmenting Y in X-> Z) • X -> YZ • If X -> Y and WY -> Z, then WX -> Z • WX->WY (augmenting) • WY -> Z (given) • WX -> Z (transitivity)

  36. Inference Rules for FDs • Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F • Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X • X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F

  37. Algorithm for computing Attribute Closure • The closure of X (denoted X +) with respect to F is the set of all attributes functionally determined by X • Algorithm for computing the closure X+ = X repeat OldX+ =X+ for each functional dependency Y → Z in F do if Y X+ then X+ := X+ Z until (X + = oldX+ )

  38. Example 1: Closure Algorithm Consider following schema EMP_PROJ(Ssn, Ename, Pno, Pname, Hours) And FDs • F = {Ssn -> Ename, Pno -> {Pname, Plocation}, {Ssn, Pno} -> Hours } • Find the following Closure Sets • {Ssn}+ = • {Pno} + = • {Ssn, Pno} + =

  39. Example 2: Closure Algorithm StudentGrade (SID, name, email, CID, grade) F includes: • SID → name, email • email → SID • SID, CID → grade • { CID, email }+ = ? • X+ = {CID, email} Consider email → SID • X+ = {SID, CID, email} Consider SID → name, email • X+ = {SID, CID, email, name} Consider SID, CID → grade • X+ = {SID, CID, email, name, grade}

  40. Using rules of FD’s • Given a relation R and set of FD’s F • Does another FD X → Y follow from F? • Use the rules to come up with a proof • Example: • F includes: • SID → name, email; email → SID; SID, CID → grade • CID, email → grade? • email → SID (given in F) • CID, email → CID, SID (augmentation) • SID, CID → grade (given in F) • CID, email → grade (transitivity)

  41. Equivalence of Sets of FDs • Two sets of FDs F and G are equivalent if: • Every FD in F can be inferred from G, and • Every FD in G can be inferred from F • Hence, F and G are equivalent if F+ =G+ • Covers: • F covers G if every FD in G can be inferred from F • (i.e., if G+ F+) • F and G are equivalent if F covers G and G covers F • There is an algorithm for checking equivalence of sets of FDs

  42. Minimal Sets of FDs • A set of FDs is minimal if it satisfies the following conditions: • Every dependency in F has a single attribute for its RHS. • We cannot remove any dependency from F and have a set of dependencies that is equivalent to F. • We cannot replace any dependency X -> A in F with a dependency Y -> A, Y  X and still have a set of dependencies that is equivalent to F. • Every set of FDs has an equivalent minimal set • There can be several equivalent minimal sets

  43. Algorithm for finding Minimal Cover F for a set of FDs E • Set F:= E • Replace each FD X-> {A1, A2, …, An} in F by n FDs X-> A1, X->A2, … , X-> An • For each FD X-> A in F For each attribute B of X if {{ F- { X -> A }} U { (X- B) -> A}} = F then replace X->A with (X- B) -> A in F • For each remaining FD X->A in F if {F – {X->A} } = F then remove X->A from F

  44. Computing the Minimal Sets of FDs Consider set of FDs E : {B → A, D → A, AB → D}. We have to find the minimum cover of E. • Step1: Ensure above dependencies are in canonical form • Step 2: Determine if AB → D has any redundant attribute, that is, can it be replaced by B → D or A → D? Consider B → A BB → AB (augmenting with B -- IR2) B → AB AB → D (given) We get B → D ( transitivity – IR3) Hence, AB → D may be replaced by B → D. • We now have a set equivalent to E, say E′ : {B → A, D → A, B → D}. • Step 3: Look for a redundant FD in E′ Using the transitive rule on B → D and D → A, we get B → A. Hence B → A is redundant in E’ and can be eliminated. • The minimum cover of E is {B → D, D → A}.

  45. Algorithm for finding Minimal Cover F for a set of FDs E • Set F:= E • Replace each FD X-> {A1, A2, …, An} in F by n FDs X-> A1, X->A2, … , X-> An • For each FD X-> A in F For each attribute B of X if {{ F- { X -> A }} U { (X- B) -> A}} = F then replace X->A with (X- B) -> A in F • For each remaining FD X->A in F if {F – {X->A} } = F then remove X->A from F

  46. Normalization

  47. Normalization of Relations • Normalization: • The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations • Normal form: • Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form

  48. Normalization of Relations • 2NF, 3NF, BCNF • based on keys and FDs of a relation schema • 4NF • based on keys, multi-valued dependencies : MVDs; 5NF based on keys, join dependencies : JDs (Chapter 11) • Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation; Chapter 11)

  49. Practical Use of Normal Forms • Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties • The database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF) • Denormalization: The process of storing the join of higher normal form relations as a base relation—which is in a lower normal form

  50. SuperKeys and Keys • Superkey • Key • Candidate key • Primary key • Secondary keys • Prime attribute: it must be a member of some candidate key • Nonprime attribute: it is not a prime attribute—that is, it is not a member of any candidate key.

More Related