320 likes | 589 Views
Database. Normalization. Designing Good Schemas. We know how to create schemas, but ... how do we create good schemas? what does good mean? Schema quality measurements: semantics of the attributes minimal redundancy minimal frequency of null values. Functional Dependences.
E N D
Database Normalization
Designing Good Schemas • We know how to create schemas, but ... • how do we create good schemas? • what does good mean? • Schema quality measurements: • semantics of the attributes • minimal redundancy • minimal frequency of null values
Functional Dependences • A column Y of relational table R is functionally dependent up on column X of relational table R if and only if: Each value of X in R associated with each value of Y at any given time
Functional dependences • Y is functional dependent up on X same as values of X identify values of Y • If X Y then XZYZ • IF XY and Y Z then XZ • X Y means that Y depend on X or X identify Y
Examples • S# Ename • {S#, P#} Hours • If for each value of S#, there are exactly one corresponding value for sname, state, city then: S# Sname Sate City
Example • If {S#, p#} Qty S# P# QTY
Redundancy Example • Where’s the redundancy?
Example FDs Proper FDs Transitive FDs Partial Key FD Partial Key FDs
Normal Forms • Each normal form is a set of conditions on a schema that guarantees certain properties (relating to redundancy and update anomalies) • The two commonly used normal forms are third normal form (3NF) and Boyce-Codd normal form (BCNF)
remove multi-valued attributes remove partial dependencies remove transitive dependencies 1NF 2NF 3NF remove remaining FD anomal dependencies remove multivalue dependencies remove remaining anomalies BCNF 4NF 5NF Normalization 0NF
1 NF • First normal form is • NO multi-valued attributes • No composite attribute • No nested relation We create new table or new field (telephone, visiting)
1NF Normalization Proper translation from ER multi-value attributes will achieve 1NF. Still not a good solution,since we have redundancy in Dnumber and Dmgr_ssn.(This will be handled by 2NF.)
2 NF form • Second normal form that if primary key is multiple attribute and non-key attribute depend on part of primary key Hours Cname Loc S# P# pname
2NF Normalization Move the partial key and dependent attributes to a new relation.
Transitive Dependencies • X → Y is a transitive dependency (PD) if there exists Z ⊈ any key such that X → Z → Y • TDs can cause redundancy if there are multiple values of X that determine the same value of Z • the value of Y for that value of Z is stored multiple times • 3NF normalization: move (Z,Y) to new relation in which Z is the primary key
3 NF • The relation in 3NF if it is 2 NF and every non-key attribute is non-transitively dependent on primary key
3NF Normalization • Create new relation to hold the attributes in the transitive FD. • LHS of transitive FD becomes PK of new relation.
DEPT COURSE SECTION ROOM INSTR I_OFFICE DEPT COURSE SECTION ROOM INSTR I_OFFICE COMP 51 1 WPC122 DOHERTY CSB109 COMP 51 2 WPC219 CLIBURN CSB107 COMP 163 1 WPC122 DOHERTY CSB109 COMP 53 1 WPC130 CSB108 BOWRING COMP 53 2 WPC130 CSB104 CARMAN Transitive Dependency Example I_OFFICE (instructor's office) is determined by the non-PK attribute INSTR
DEPT COURSE SECTION ROOM INSTR I_OFFICE DEPT COURSE SECTION ROOM INSTR INSTR I_OFFICE NF Decomposition: Foreign Keys Decomposition:
Normalization Goal = BCNF = Boyce-Codd Normal Form = all FD’s follow from the fact “key everything.” • Formally, R is in BCNF if for every nontrivial FD for R, say XA, then X is a superkey. • “Nontrivial” = right-side attribute not in left side. Why? 1. Guarantees no redundancy due to FD’s. 2. Guarantees no update anomalies = one occurrence of a fact is updated, not all. 3. Guarantees no deletion anomalies = valid fact is lost when tuple is deleted. Arthur Keller – CS 180
Boyce-Codd Normal Form • Sample data for Course Section table • Because Prefix Department, we know that (Prefix, Num, SecNum) could also be a primary key for this table.
Example Students(name, addr, phones, CarLiked) • A student’s phones are independent of the cars they like. • Thus, each of a student’s phones appears with each of the cars they like in all combinations. • This repetition is unlike redundancy due to FD’s, of which name->addr is the only one.
Example • Students(name, addr, CarLiked, manf, favCar) • FD’s: name->addr favCar, carsLiked->manf • Only key is {name, CarsLiked}. • In each FD, the left side is not a superkey. • Any one of these FD’s shows Students is not in BCNF
Boyce-Codd Normal Form • We say a relation R is in BCNF if whenever X ->A is a nontrivial FD that holds in R, X is a superkey. • Remember: nontrivial means A is not a member of set X. • Remember, a superkey is any superset of a key (not necessarily a proper superset).
Example • Students(name, addr, CarsLiked, manf, favCar) • F = name->addr, name -> favCar, CarsLiked->manf • Pick BCNF violation name->addr. • Close the left side: {name}+ = {name, addr, favCar}. • Decomposed relations: • Students1(name, addr, favCar) • Students2(name, CarsLiked, manf)
3NF and BCNF • 3rd Normal Form (3NF) modifies the BCNF condition so we do not have to decompose in this problem situation. • X ->A violates 3NF if and only if X is not a superkey, and also A is not prime.
SHIPMENT SID FROM_CITY TO_CITY DISTANCE WEIGHT Exercises • The following relation schema is not in third normal form (3NF). Is this an example of a transitive dependencyor a partial key dependency?Give an equivalent schema that is in 3NF.
Exercises • This relation has been proposed to track Pacific alumni: Alumni( SID, LastName, FirstName, Degree, YearAwarded, Phone).Pacific allows students to receive multiple degrees,possibly in different years. Identify all FDs.Give a new schema that is in third normal form.
Exercises • Consider the following relation schema:Movie(title, genre, length, actor, sag_id, studio, studio_addr) • Every movie has a unique title. • A movie may have multiple actors. • Each actor has a unique sag_id. • An actor may appear in multiple movies. • A movie has exactly one studio, but a studio may produce more than one movie. • Each studio has exactly one address. • Identify all functional dependencies. • Normalize the schema to 3NF.