1 / 45

Chapter 10: Normalization of Relational DataBase

Chapter 10: Normalization of Relational DataBase. Csci455 Fall 2003. Objectives. Design Methodologies Goodness of design Define functional dependencies among columns of a table The normalization process Usefulness and limitations of normalization. Design Methodology.

natalie
Download Presentation

Chapter 10: Normalization of Relational DataBase

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 10: Normalization of Relational DataBase Csci455 Fall 2003 Csci455 Fall 2003

  2. Objectives • Design Methodologies • Goodness of design • Define functional dependencies among columns of a table • The normalization process • Usefulness and limitations of normalization Csci455 Fall 2003

  3. Design Methodology • Database system can be designed via • Bottom-up (design by synthesis) • Top-Down (design by analysis) Csci455 Fall 2003

  4. Bottom-up design • starts with the basic relationships among individual attributes • Uses these information to construct the relations • The approach is not scalable Csci455 Fall 2003

  5. Top-down design • Start with a number of groupings of attributes into relations • These attributes are obtained from conceptual design and ER mapping • Analyze each for possible refinement and/or decomposition Csci455 Fall 2003

  6. Informal Design Guidelines for Relational Schemas • Informal design metrics • Semantics of the related attributes • Reducing the redundant values in tuples • Minimizing the NULL values • Disallowing spurious tuples Csci455 Fall 2003

  7. Semantics of the Relation Attributes • Based on the semantics of attributes or how the attributes values in a tuple relate to one another • a schema should capture facts about one entity or one relationship type • See fig 10.1 Csci455 Fall 2003

  8. Csci455 Fall 2003

  9. Redundant Information in Tuples and Update Anomalies • The important objective of schema design • to minimize the storage space needed for base tables • to minimize unexpected side effects resulted from updates • Example • Compare relations in Fig10.2 with those in Fig.10.4 Csci455 Fall 2003

  10. Fig10-2 Csci455 Fall 2003

  11. Fig10-3 Csci455 Fall 2003

  12. Fig10-4 Csci455 Fall 2003

  13. Update problems • Insertion Anomalies • e.g., EMP_DEP fig 10.4 • Inconsistency • e.g., insert Dnumber=6 for research • Null values • e.g., employee without dept. Csci455 Fall 2003

  14. Deletion Anomalies • Deletion Anomalies • Lost of information • E.g., delete from EMP_DEPT the last employee working for particular department Csci455 Fall 2003

  15. Modification Anomalies • Modification Anomalies • Change for one, change for all • E.g., change dept. Mgr or dept. number • Recommendation 2 • Design anomaly free database SCHEMAS Csci455 Fall 2003

  16. Null Values in Tuples • Results from a set of attributes that do not apply to all tuples • E.g., Student Phone number • Hm-Ph, Wrk-Ph, Cll-Ph • Not every student has a cell phone or work phone • Recommendation: • As far as possible, avoid placing attributes in a base relation whose values may frequently be null Csci455 Fall 2003

  17. Generation of Spurious (or invalid) Tuples • Refers to the situation in which we decompose a relation say A into two undesirable relations, say A1 and A2; • Joining of A1 and A2 will result a relation that is different from original relation A • E.g., EMP_LOC and EMP_Proj1 in Figures10.5 and 10.6 Csci455 Fall 2003

  18. Fig10-5 Csci455 Fall 2003

  19. Fig10-6 Csci455 Fall 2003

  20. Recommendation • Design relation schema so that they can be JOINED with equality conditions on attributes that are either PKs or FKs in a way that guarantees no spurious tuples are generated Csci455 Fall 2003

  21. Functional Dependencies • Important tools when analyzing a table for excessive redundancies • Comes from math where a function gives one value • E.g. • SSN {Ename, Address, Bdate} • DNO DMGRSSN Csci455 Fall 2003

  22. Functional Dependencies (FD): Formal definition • X  Y • X (functionally) determines Y • Or Y functionally depends on X • X: left-hand-side (LHS) or determinant • For each X value, there is at most one Y value • Similar to candidate keys Csci455 Fall 2003

  23. More on FD • The notion of dependency has to do with a schema-based dependency • FD is a semantic notation • FD is part of the process of understanding what the data means • e.g., SSNEname (i.e., each employee has exactly one name) Csci455 Fall 2003

  24. More on FD • Legal extensions (or legal relation): • Refers to the extensions r(R) that satisfy the functional dependency • A FD is a property of the relation schema not the relation extension • Fig. 10.7 Csci455 Fall 2003

  25. Fig10-7 Csci455 Fall 2003

  26. Inference Rules for FDs • IR1. Reflexive : If XY, then X Y • a formal statement of trivial dependencies; useful for derivations • IR2. Augmentation:{X  Y }|=XZ  YZ • if a dependency holds, then we can freely expand its left hand side • IR3. Transitivity:{X  Y,Y Z }|=X Z • the "most powerful" inference rule; useful in multi-step derivation • IR4. Decomposition, or Projection: {X YZ} |= X Y • we can remove attributes from the RHS of a dependency; applying this rule repeatedly can decompose FD into a set of dependencies Csci455 Fall 2003

  27. More on IR • IR5. Union, or additive: • {X Y, X Z}|=X YZ • allows us to do composition (the reverse of IR4) • IR6. Pseudo-transitive : • {X Y, WY Z}|= WX Z • Similar to IR3 • Armstrong's rules are complete and sound; general rules of inference on fds. Csci455 Fall 2003

  28. Example • The following set of FD can be inferred from EMP_PROJ: • F={ SSN  ENAME, PNUMBER  { PNAME, PLOCATION}, (SSN, PNUMBER) HOURS} • see Figure 10.3.b Csci455 Fall 2003

  29. Introduction to Normalization • Normalization theory: • built around the concept of normal forms • useful aid in the design process, but is not a panacea • a relation is in a particularly normal form if it satisfies a specified set of requirements or conditions. • E.g., • a relation R is said to be in first normal form (1NF) iff all underlying domains contain atomic values Csci455 Fall 2003

  30. Normal Form • Type of Normal Forms • 1NF • 2NF • 3NF • BCNF • 4NF • 5NF (PJ/NF) • DKNF (absolute normal form) Csci455 Fall 2003

  31. Relationships of Normal Forms Csci455 Fall 2003

  32. First Normal Form (1NF) • 1NF prevents • multi-valued attributes, • composite attributes • Or their combinations • See fig 10.8 • See fig 10.9 (nested relation or multivalued composite attributes ) Csci455 Fall 2003

  33. Fig10-8 Csci455 Fall 2003

  34. Fig10-9 Csci455 Fall 2003

  35. Second Normal Form (2NF) • Based on the concepts of full functional dependency • Analogy to the traditional justice oath • Every non-key depends on a key, the whole key, and nothing but the key • A Relation R is in 2NF iff • it is in 1NF • and every nonkey attribute is fully depend on the PK • See fig 10.10.a Csci455 Fall 2003

  36. Fig10-10 Csci455 Fall 2003

  37. Third Normal Form • Based on the concepts of transitive dependency • A Relation R is in 3NF iff the nonkey attributes of R (if any) are • mutually independent, and • fully depend on the primary key of R • Or R is in 3NF iff it is in 2NF and every nonkey attribute is nontransitively dependent on the PK • See fig 10.10.b Csci455 Fall 2003

  38. Interpretation of 3NF • Formal Definition • R is in 3NF if, whenever a [nontrivial] functional dependency XY • X is super key or • Y is prime attribute of R • Trivial means Y X • e.g., • LOTS2 in fig.10.11.b is 3NF • LOTS1 (FD4) is NOT 3NF Csci455 Fall 2003

  39. Fig10-11 Csci455 Fall 2003

  40. Boyce/Codd NF • 3NF could not handle satisfactorily with the case of a relation that • had multiple CKs, where • those CKs were composite, and • the CKs overlapped (i.e., had at least one attribute in common • Fig.10.12.a • Fig.10.12.b Csci455 Fall 2003

  41. Fig10-12 Csci455 Fall 2003

  42. Boyce/Codd NF (Cont’) • A relation R is in Boyce/Codd normal form (BCNF) iff every determinant is a CK • Each attribute MUST describe the key, the whole key, and nothing but the key • BCNF ensures that there is no redundancy that can be detected using FD information alone • Considered the most desirable NF Csci455 Fall 2003

  43. More on BCNF: Example • Consider a relation TEACH with • FD1: {Student, Course} Instructor • FD2: Instructor  Course • The relation is 3NF • Is it in BCNF? • No Csci455 Fall 2003

  44. Fig10-13 Csci455 Fall 2003

  45. More on Example • Possible decompositions are • {Student, Instructor} and {Student, Course} • {Student, Instructor} and {Course, Student} • {Instructor, Course} and {Instructor, Student} • Which of the decomposition is better? Justify it. Csci455 Fall 2003

More Related