360 likes | 591 Views
Normalization. Database Management Systems, 3 rd ed., Ramakrishnan and Gehrke, Chapter 19. Motivation. Update anomalies – unexpected side effects as result of insertion, deletion, or modification of data.
E N D
Normalization Database Management Systems, 3rd ed., Ramakrishnan and Gehrke, Chapter 19
Motivation Update anomalies – unexpected side effects as result of insertion, deletion, or modification of data • Insertion anomaly – not possible to store certain information unless some other, unrelated, information is stored • Deletion anomaly - not possible to delete certain information w/o losing some other, unrelated, information • Modification anomaly – inconsistency due to not all copies of data item being updated
Example Advising (FacSSN, FacName, Office, FacDept, StuSSN, StuName, Address, GPA) What insertion anomalies may occur? What deletion anomalies may occur? What modification anomalies may occur?
Normalization Process that decomposes (poorly designed) relations into simpler (well designed) relations in a way that: • minimizes redundancy • ensures that a relation’s attributes are properly dependent on (or determined by) the PK
Functional Dependence For given relation, attribute X functionally determines attribute Y (or attribute Y is functionally dependent on attribute X) if each value of X has associated with it exactly one value of Y at any time. X →Y I.e., for every pair of tuples t1 and t2 in the relation, if t1.X = t2.X then t1.Y = t2.Y.
Candidate Key Let K represent an attribute or set of attributes of relation R; let T represent a tuple in R. K is a candidate key of R if • R.K → R.T, and • no attribute in K can be removed without violating the first condition. A superkey is an attribute or set of attributes in R that satisfies only the first condition.
Closure of Set of FDs Armstrong’s Axioms: Reflexivity: If X ⊇ Y, then X → Y Augmentation: If X → Y, then XZ → YZ for any Z Transitivity: If X → Y and Y → Z, then X → Z These axioms are both sound and complete.
Closure of Set of FDs (continued) Additional rules that are convenient: Union: If X → Y and X → Z, then X → YZ Decomposition: If X → YZ, then X → Y and X → Z
Example Given: Contracts (C, S, J, D, P, Q, V) F = {C → (C, S, J, D, P, Q, V), (J, P) → C, (S, D) → P} Show: (S, D, J) → (C, S, J, D, P, Q, V) What does this mean must be true of (S, D, J)?
Example (continued) • C →(C, S, J, D, P, Q,V)given • (J, P) → C given • (S, D) → P given • (J, P) → (C, S, J, D, P, Q, V) transitivity: 1 & 2 • (S, D, J) → (J, P) augmentation on 3 • 6. (S, D, J) → (C, S, J, D, P, Q, V) transitivity: 5 & 4
Trivial Functional Dependencies A trivial FD is an FD in which the right side contains only attributes that also appear on the left side. X → Y where Y ⊆ X Such FDs always hold due to reflexivity. Examples from the Contracts relation: S → S (S, J, D) → (S, D) (P, Q, V) → P (P, Q, V) → (Q, V)
Attribute Closure To determine if a given FD such as X → Y is in the closure set F+, determine attribute closure X+ with respect to (wrt) F. X+ is set of attributes A such that X → A can be inferred using Armstrong’s Axioms.
Attribute Closure Algorithm To compute X+ wrt F for some attribute set X: closure = X; repeat until there is no change: { if there is an FD U → V in F such that U ⊆ closure, then set closure = closure ∪ V } Figure 19.4
First Normal Form A relation is in 1NF if all of its attribute values are atomic. 1NF disallows nested relations. Example of “relation” not in 1NF: EmpProj {SSN, EName, {PNumber, Hours}}
Second Normal Form Let X and Y be sets of attributes of relation R where X consists of more than one attribute. Y is fully functionally dependent on X if (a) Y is functionally dependent on X, and (b) Y is not functionally dependent on some proper subset of X. A relation is in 2NF if it is in 1NF and all of its nonkey attributes are fully functionally dependent on every candidate key.
Third Normal Form Let X, Y, and Z be sets of attributes of relation R. Z is transitively dependent on X if X → Y, Y → Z, and Y -/-> X A relation is in 3NF if it is in 2NF and there are no transitive dependencies.
Motivation for BCNF Relation representing land for sale in various counties: Lots (PropertyID, CountyName, LotNum, Area, Price, TaxRate) CKs: PropertyID and (CountyName, LotNum) Suppose: tax rate is fixed for given county; price of lot (for tax purposes) is determined by area regardless of county. What are the FDs? Fundamentals of Database Systems by Elmasri and Navathe
Motivation (continued) Lots is not in 2NF because TaxRate is partially dependent on (CountyName, LotNum). Decomposition of Lots: Lots1 (PropertyID, CountyName, LotNum, Area, Price) Lots2 (CountyName, TaxRate) Lots1 and Lots2 are in 2NF. But Lots1 is not in 3NF. (Why not?)
Motivation (continued) Decomposition of Lots1: Lots1A (PropertyID, CountyName, LotNum, Area) Lots1B (Area, Price) Lots1A, Lots1B, and Lots2 are in 3NF.
Motivation (continued) Now suppose we will store information about only 2 counties, and that the lot sizes will only be as follows: Marion: 0.5, 0.6, …, 1.0 acres Liberty: 1.1, 1.2, …, 2.0 acres What additional FD is now added to F? Although Lots1A is in 3NF, there is a lot of unncessary redundancy. A stronger normal form is needed to suggest to us the need to decompose Lots1A.
Boyce-Codd Normal Form Let R be a relation schema, F be the set of FDs given to hold over R, X be a subset of the attributes of R, and A be an attribute of R. R is in BCNF if, for every FD X → A in F, one of the following statements is true: • A X, or • X is a superkey.
Decomposition into BCNF • Suppose R is not in BCNF. Let X R, A be a single attribute in R, and X → A be an FD that causes a violation of BCNF. Decompose R into R – A and XA. • If either R – A or XA is not in BCNF, decompose them further by a recursive application of this algorithm. Database Management Systems, 3rd ed., Ramakrishnan and Gehrke, p. 623
Example of Multivalued Dependencies Relation in BCNF: CourseTeacherBook (Course, Teacher, Book) Sample tuples: <Physics101, Green, Mechanics> <Math301, Green, Mechanics> <Physics101, Green, Optics> <Math301, Green, Vectors> <Physics101, Brown, Mechanics> <Math301, Green, Geometry> <Physics101, Brown, Optics> Decomposition needed to eliminate redundancy: CourseTeacher (Course, Teacher) CourseBook (Course, Book) Adapted from Figure 19.13
Multivalued Dependencies Let R be a relation schema and let X, Y, and Z be subsets of attributes of R. X multidetermines Y (X →→ Y) iff the set of Y values that correspond to a given (X, Z)-pair value depends only on the value of X. Examples from CourseTeacherBook: Course →→ Teacher Course →→ Book
Fourth Normal Form Let R be a relation schema, X and Y be nonempty subsets of the attributes of R, and F be a set of dependencies that includes both FDs and MVDs. R is in 4NF if, for every MVD X →→ Y that holds over R, one of the following statements is true: • Y ⊆ X or XY = R, or • X is a superkey. I.e., for each MVD X →→ Y, all attributes of R are functionally dependent on X.
Quick Introduction to Relational Algebra Operations A projection extracts only those columns of interest. A natural join of two relations R1 and R2 constructs a new relation such that each tuple in the new relation is the concatenation of two tuples, one from R1 and one from R2, where the two tuples have the same value for two attributes defined over a common domain.
Lossless Decomposition The decomposition process used in normalization is a lossless (or nonloss) decomposition. I.e., the join of the projects is equal to the original relation. O/w spurious data is introduced.
Fifth Normal Form Let X, Y, …, Z represent subsets of the set of attributes for some relation R. R satisfies the join dependency iff it equals the join of its projections on X, Y, …, Z. A relation is in 5NF iff each of its join dependencies is a consequence of its candidate keys.
Example Relation in 4NF: Employee (ID, Skill, Job) Sample tuples: <100, Sk-1, Job-4> <100, Sk-1, Job-2> <100, Sk-14, Job-4> <200, Sk-14, Job-4> <100, Sk-7, Job-2> <200, Sk-14, Job-5>
Example (continued) Projections of Employee: R1 (ID, Skill) R2 (Skill, Job) R3 (ID, Job) Consider the tuples belonging to each projection. Is the join of the projects equal to Employee?
Example (continued) Join R1 and R2 over Skill. Spurious data introduced: <100, Sk-14, Job-5> Join resulting relation and R3 over (ID, Job). Result? original Employee tuples
Example (continued) Odd restriction on updating Employee. E.g., suppose Employee contains: <100, Sk-14, Job-4> <100, Sk-7, Job-2> If we wish to add: <200, Sk-14, Job-2> we must also add: <100, Sk-14, Job-2>