450 likes | 709 Views
Chapter 10: Normalization of Relational DataBase. Csci455 Fall 2003. Objectives. Design Methodologies Goodness of design Define functional dependencies among columns of a table The normalization process Usefulness and limitations of normalization. Design Methodology.
E N D
Chapter 10: Normalization of Relational DataBase Csci455 Fall 2003 Csci455 Fall 2003
Objectives • Design Methodologies • Goodness of design • Define functional dependencies among columns of a table • The normalization process • Usefulness and limitations of normalization Csci455 Fall 2003
Design Methodology • Database system can be designed via • Bottom-up (design by synthesis) • Top-Down (design by analysis) Csci455 Fall 2003
Bottom-up design • starts with the basic relationships among individual attributes • Uses these information to construct the relations • The approach is not scalable Csci455 Fall 2003
Top-down design • Start with a number of groupings of attributes into relations • These attributes are obtained from conceptual design and ER mapping • Analyze each for possible refinement and/or decomposition Csci455 Fall 2003
Informal Design Guidelines for Relational Schemas • Informal design metrics • Semantics of the related attributes • Reducing the redundant values in tuples • Minimizing the NULL values • Disallowing spurious tuples Csci455 Fall 2003
Semantics of the Relation Attributes • Based on the semantics of attributes or how the attributes values in a tuple relate to one another • a schema should capture facts about one entity or one relationship type • See fig 10.1 Csci455 Fall 2003
Redundant Information in Tuples and Update Anomalies • The important objective of schema design • to minimize the storage space needed for base tables • to minimize unexpected side effects resulted from updates • Example • Compare relations in Fig10.2 with those in Fig.10.4 Csci455 Fall 2003
Fig10-2 Csci455 Fall 2003
Fig10-3 Csci455 Fall 2003
Fig10-4 Csci455 Fall 2003
Update problems • Insertion Anomalies • e.g., EMP_DEP fig 10.4 • Inconsistency • e.g., insert Dnumber=6 for research • Null values • e.g., employee without dept. Csci455 Fall 2003
Deletion Anomalies • Deletion Anomalies • Lost of information • E.g., delete from EMP_DEPT the last employee working for particular department Csci455 Fall 2003
Modification Anomalies • Modification Anomalies • Change for one, change for all • E.g., change dept. Mgr or dept. number • Recommendation 2 • Design anomaly free database SCHEMAS Csci455 Fall 2003
Null Values in Tuples • Results from a set of attributes that do not apply to all tuples • E.g., Student Phone number • Hm-Ph, Wrk-Ph, Cll-Ph • Not every student has a cell phone or work phone • Recommendation: • As far as possible, avoid placing attributes in a base relation whose values may frequently be null Csci455 Fall 2003
Generation of Spurious (or invalid) Tuples • Refers to the situation in which we decompose a relation say A into two undesirable relations, say A1 and A2; • Joining of A1 and A2 will result a relation that is different from original relation A • E.g., EMP_LOC and EMP_Proj1 in Figures10.5 and 10.6 Csci455 Fall 2003
Fig10-5 Csci455 Fall 2003
Fig10-6 Csci455 Fall 2003
Recommendation • Design relation schema so that they can be JOINED with equality conditions on attributes that are either PKs or FKs in a way that guarantees no spurious tuples are generated Csci455 Fall 2003
Functional Dependencies • Important tools when analyzing a table for excessive redundancies • Comes from math where a function gives one value • E.g. • SSN {Ename, Address, Bdate} • DNO DMGRSSN Csci455 Fall 2003
Functional Dependencies (FD): Formal definition • X Y • X (functionally) determines Y • Or Y functionally depends on X • X: left-hand-side (LHS) or determinant • For each X value, there is at most one Y value • Similar to candidate keys Csci455 Fall 2003
More on FD • The notion of dependency has to do with a schema-based dependency • FD is a semantic notation • FD is part of the process of understanding what the data means • e.g., SSNEname (i.e., each employee has exactly one name) Csci455 Fall 2003
More on FD • Legal extensions (or legal relation): • Refers to the extensions r(R) that satisfy the functional dependency • A FD is a property of the relation schema not the relation extension • Fig. 10.7 Csci455 Fall 2003
Fig10-7 Csci455 Fall 2003
Inference Rules for FDs • IR1. Reflexive : If XY, then X Y • a formal statement of trivial dependencies; useful for derivations • IR2. Augmentation:{X Y }|=XZ YZ • if a dependency holds, then we can freely expand its left hand side • IR3. Transitivity:{X Y,Y Z }|=X Z • the "most powerful" inference rule; useful in multi-step derivation • IR4. Decomposition, or Projection: {X YZ} |= X Y • we can remove attributes from the RHS of a dependency; applying this rule repeatedly can decompose FD into a set of dependencies Csci455 Fall 2003
More on IR • IR5. Union, or additive: • {X Y, X Z}|=X YZ • allows us to do composition (the reverse of IR4) • IR6. Pseudo-transitive : • {X Y, WY Z}|= WX Z • Similar to IR3 • Armstrong's rules are complete and sound; general rules of inference on fds. Csci455 Fall 2003
Example • The following set of FD can be inferred from EMP_PROJ: • F={ SSN ENAME, PNUMBER { PNAME, PLOCATION}, (SSN, PNUMBER) HOURS} • see Figure 10.3.b Csci455 Fall 2003
Introduction to Normalization • Normalization theory: • built around the concept of normal forms • useful aid in the design process, but is not a panacea • a relation is in a particularly normal form if it satisfies a specified set of requirements or conditions. • E.g., • a relation R is said to be in first normal form (1NF) iff all underlying domains contain atomic values Csci455 Fall 2003
Normal Form • Type of Normal Forms • 1NF • 2NF • 3NF • BCNF • 4NF • 5NF (PJ/NF) • DKNF (absolute normal form) Csci455 Fall 2003
Relationships of Normal Forms Csci455 Fall 2003
First Normal Form (1NF) • 1NF prevents • multi-valued attributes, • composite attributes • Or their combinations • See fig 10.8 • See fig 10.9 (nested relation or multivalued composite attributes ) Csci455 Fall 2003
Fig10-8 Csci455 Fall 2003
Fig10-9 Csci455 Fall 2003
Second Normal Form (2NF) • Based on the concepts of full functional dependency • Analogy to the traditional justice oath • Every non-key depends on a key, the whole key, and nothing but the key • A Relation R is in 2NF iff • it is in 1NF • and every nonkey attribute is fully depend on the PK • See fig 10.10.a Csci455 Fall 2003
Fig10-10 Csci455 Fall 2003
Third Normal Form • Based on the concepts of transitive dependency • A Relation R is in 3NF iff the nonkey attributes of R (if any) are • mutually independent, and • fully depend on the primary key of R • Or R is in 3NF iff it is in 2NF and every nonkey attribute is nontransitively dependent on the PK • See fig 10.10.b Csci455 Fall 2003
Interpretation of 3NF • Formal Definition • R is in 3NF if, whenever a [nontrivial] functional dependency XY • X is super key or • Y is prime attribute of R • Trivial means Y X • e.g., • LOTS2 in fig.10.11.b is 3NF • LOTS1 (FD4) is NOT 3NF Csci455 Fall 2003
Fig10-11 Csci455 Fall 2003
Boyce/Codd NF • 3NF could not handle satisfactorily with the case of a relation that • had multiple CKs, where • those CKs were composite, and • the CKs overlapped (i.e., had at least one attribute in common • Fig.10.12.a • Fig.10.12.b Csci455 Fall 2003
Fig10-12 Csci455 Fall 2003
Boyce/Codd NF (Cont’) • A relation R is in Boyce/Codd normal form (BCNF) iff every determinant is a CK • Each attribute MUST describe the key, the whole key, and nothing but the key • BCNF ensures that there is no redundancy that can be detected using FD information alone • Considered the most desirable NF Csci455 Fall 2003
More on BCNF: Example • Consider a relation TEACH with • FD1: {Student, Course} Instructor • FD2: Instructor Course • The relation is 3NF • Is it in BCNF? • No Csci455 Fall 2003
Fig10-13 Csci455 Fall 2003
More on Example • Possible decompositions are • {Student, Instructor} and {Student, Course} • {Student, Instructor} and {Course, Student} • {Instructor, Course} and {Instructor, Student} • Which of the decomposition is better? Justify it. Csci455 Fall 2003