480 likes | 781 Views
Chapter 3.6-7. Normalization of Database Tables. Normalization. Normalization is process for assigning attributes to entities Reduces data redundancies Helps eliminate data anomalies Produces controlled redundancies to link tables Normalization stages 1NF - First normal form
E N D
Chapter 3.6-7 Normalization of Database Tables
Normalization • Normalization is process for assigning attributes to entities • Reduces data redundancies • Helps eliminate data anomalies • Produces controlled redundancies to link tables • Normalization stages • 1NF - First normal form • 2NF - Second normal form • 3NF - Third normal form • 4NF - Fourth normal form
title length year address Name Name address Example: Starts-in Stars Movies Owns Studios
Problem Example Movies • Update anomalies: If Harrison Ford’s phone # changes, must change it in each of his tuples. If Length value of Star Wars needs to be changed, must change all occurrences • Deletion anomalies: If we delete Wayne’s World entries from database, we also loose all info about Dana Carvey & Mike Meyers
Conversion to 1NF • Repeating groups must be eliminated • Proper primary key developed • Uniquely identifies each tuple • Dependencies can be identified • undesirable dependencies allowed • Partial • based on part of composite primary key • Transitive • one nonprime attribute depends on another nonprime attribute
Conversion to 1NF Cont. • An attribute that is at least part of a key is known as a prime attribute or key attribute or primary key.
Example • Projects assigned to employees • Each project has a number and a name • Each employee has a number, a name, a job class • Each employee working on a project, need to keep number of hours spent on project, and hourly rate. • Project Assignments Table : ( PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS) • What’s the Key for this relation?
Dependency Diagram (1NF) PROJ_ NUM, EMP_NUM --> PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS PROJ_NUM --> PROJ_NAME EMP_NUM-->EMP_NAME, JOB_CLASS, CHG_HOURS JOB_CLASS --> CHG_HOUR
1NF Summarized • All key attributes defined • Primary Key identified • No repeating groups in table • All attributes dependent on primary key
2NF Summarized • In 1NF, but • Includes no partial dependencies • Partial dependency: • An attribute is functionally dependent on a portion of the primary key. • Example: • PROJ_NUM PROJ_NAME • EMP_NUM-->EMP_NAME, JOB_CLASS, CHG_HOURS
Conversion to 2NF • Start with 1NF format: • Write each key component on a separate line • Write dependent attributes after each key component • Write original key on last line • Write any remaining attributes after original key • Each component is new table PROJECT (PROJ_NUM, PROJ_NAME) EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR) ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
2NF Summarized • In 1NF • Includes no partial dependencies • Still possible to exhibit transitive dependency • Attributes may be functionally dependent on non-key attributes
Conversion to 3NF • decompose table(s) to eliminate transitive functional dependencies PROJECT (PROJ_NUM, PROJ_NAME) ASSIGN (PROJ_NUM, EMP_NUM, HOURS) EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS) JOB (JOB_CLASS, CHG_HOUR)
3NF Summarized • In 2NF • Contains no transitive dependencies
Boyce-Codd Normal Form (BCNF) • Formally, R is in BCNF if for every nontrivial FD for R, say XA, X is a superkey. • “Nontrivial” = right-side attribute not in left side. • Trivial FDs examples • AA • ABA • Informally: the only arrows in the FD diagram are arrows out of superkeys • Note: BCNF violation when relation has more than one superkey that overlap
3NF Table Not in BCNF What normal form?
Decomposition to Reach BCNF Setting: relation R with FDs F. Suppose relation R has BCNF violation XB and X not a superkey.
1. Compute X+. • Cannot be all attributes – why? 2. Decompose R into X+ and (R–X+) X. 3. Find the FD’s for the decomposed relations. • Project the FD’s from F = calculate all consequents of F that involve only attributes from X+ or only from (RX+) X. R X+ X
Decomposition to Reach BCNF • Identify the violating FD • E.g. : X1X2…Xn B1B2…Bm • Add to the right-hand side of FD as many attributes as are functionally determined by (X1X2…Xn) • Decompose relation R into two relations: • One relation has all attributes Xs & Bs • Second relation has the Xs plus any other remaining attributes from R other than Bs
BCNF--Example Assume R(S, J, T) • S: Student • J: subject • T: Teacher • Student S is taught subject J by teacher T. • Constraints: • For each subject, each student of that subject is taught by only one teacher • Each teacher teaches only one subject (but each subject is taught by several teachers)
BCNF--Example • Functional Dependencies: • S , J T • T J
BCNF--Example • Candidate keys: {S, J} and {S, T} • 3NF but not in BCNF • Update anomaly: if we delete the info that Jane is studying Physics we also loose the info that Prof. Brown teaches Physics • Solution: two relations R1{S, T} R2{T, J} S T J
Decomposition Based on BCNF is Necessarily Correct Attributes A, B, C. FD: B C Relations R1[A,B] R2[B, C] Tuples in R: (a, b, c) Tuples in R1: (a, b) Tuples in R2: (b, c) Natural join of R1 and R2 = (a, b, c) original relation R can be reconstructed by forming the natural join of R1 and R2.
Decomposition Based on BCNF is Necessarily Correct Attributes A, B, C. FD: B C Relations R1[A,B] R2[B, C] Tuples in R: (a, b, c) , (d, b, e) Tuples in R1: (a, b), (d, b) Tuples in R2: (b, c), (b, e) Tuples in the natural join of R1 and R2: (a,b,c), (a,b, e) (d, b, c), (d, b, e) Can (a,b,e), (d, b, c) be a bogus tuples?
Decomposition Based on BCNF is Necessarily Correct • Answer: No • Because: B C i.e. if 2 tuples have same B attribute then they must have the same C attribute. (b,c) = (b,e) (a, b,e) = (a, b,c) and (d, b, c) = (d, b, e)
Theorem • Any two-attribute relation is in BCNF.
Decomposition Theorem • Suppose we decompose a relation R(X, Y, Z) into R1(X, Y) and R2(X,Z) and project the R onto R1 and R2. • Then join(R1, R2) is guaranteed to reconstruct R if and only if XY or XZ • Notice that whenever we decompose because of a BNCF violation, one of the above FDs holds.
3NF One FD structure causes problems in BCNF: • If you decompose, you can’t recover all of the original FD’s. • If you don’t decompose, you violate BCNF. Abstractly: ABC and CB. • Example : street city zip, and zip city. BCNF violation: CB has a left side that is not a superkey. • Based on previous algorithm, decompose into BC and AC. • But the FD ABC does not hold in new tables.
Example A = street, B = city, C = zip. zip city BCNF violation Decompose: It is a bad idea to decompose relation because you loose the ability to check the dependency: street city zip
Example zip city Decompose: It is a bad idea to decompose relation because you loose the ability to check the dependency: street city zip
“Elegant” Workaround Define the problem away. • A relation R is in 3NF iff (if and only if)for every nontrivial FD XA, either: 1. X is a superkey, or 2. A is prime = member of at least one key. • Thus, if we just normalize to the 3NF, the problem goes away.
What 3NF and BCNF Give You • There are two important properties of a decomposition: • Recovery: it should be possible to project the original relations onto the decomposed schema, and then reconstruct the original. • Dependency Preservation : it should be possible to check in the projected relations whether all the given FD’s are satisfied.
3NF and BCNF, Continued • We can get (1) with a BCNF decomposition. • We can get both (1) and (2) with a 3NF decomposition. • But we can’t always get (1) and (2) with a BCNF decomposition. • street-city-zip is an example.
Mutli-valued Dependencies Fourth Normal Form
Definition of MVD • A multivalued dependency is an assertion that two attributes (sets of attributes) are independent of one another. • Formally: A multivalued dependency (MVD) on R, X ->->Y, says that if two tuples of R agree on all the attributes of X, then their components in Y may be swapped, and the result will be two tuples that are also in the relation.
Example Actors(name,addr,phones,cars) with MVD Namephones. name addr phones cars sue ap1b1 sue ap2b2 it must also have the same tuples with phonescomponents swapped: name addr phones cars sue ap2b1 sue ap1b2 Note: we must check this condition for all pairs of tuples that agree on name, not just one pair.
Example 2 • An actor may have more than one address • Key? What normal form? • Note the redundancies • MVD: name street city • read: name determines 1 or more street & city independent of all other attributes
MVD Rules 1. Every FD is an MVD. • Because if X Y, then swapping Y’s between tuples that agree on X doesn’t create new tuples. • Example, in Actors: nameaddr. • Note: the opposite is not true i.e. not every MVD is a FD 2. Complementation: if XY, then XZ, where Z is all attributes not in X or Y. • Example: since namephones holds in Actors, the nameaddr cars.
Splitting Doesn’t Hold • namestreet cityholds, but • namestreet does not hold • Name does not determine 1 or more street independent of city. • namecitydoes not hold
Example 2 • An actor may have more than one address • MVD: name street city • read: name determines 1 or more street & city independent of all other attributes • Also (complement MVD): name title year
Fourth Normal Form • The redundancy that comes from MVD’s is not removable by putting the database schema in BCNF. • There is a stronger normal form, called 4NF, that (intuitively) treats MVD’s as FD’s when it comes to decomposition, but not when determining keys of the relation.
4NF Eliminate redundancy due to multiplicative effect of MVD’s. • Roughly: treat MVD’s as FD's for decomposition, but not for finding keys. • Formally: R is in Fourth Normal Form if whenever MVDXY is nontrivial (Y is not a subset of X, and X Y is not all attributes), then X is a superkey. • Remember, X Y implies XY, so 4NF is more stringentthan BCNF. • Decompose R, using4NF violation XY,into XY and X (R—Y). R Y X
Example Drinkers(name,addr,phones,cars) • FD: nameaddr • Nontrivial MVD’s: namephones namecars. • Only key: {name, phones, cars} • All three dependencies above violate 4NF. Why? • Successive decomposition yields 4NF relations: D1(name, addr) D2(name, phones) D3(name, cars)
Example 2 • namestreet city • Decompose into: R1(name, street, city) R2(name, title, year)