680 likes | 884 Views
CS157B Lecture 13. BCNF & Lossless Decomposition. Prof. Sin-Min Lee Department of Computer Science. Normalization. Review on Keys superkey: a set of attributes which will uniquely identify each tuple in a relation candidate key: a minimal superkey primary key: a chosen candidate key
E N D
CS157B Lecture 13 BCNF & Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science
Normalization • Review on Keys • superkey: a set of attributes which will uniquely identify each tuple in a relation • candidate key: a minimal superkey • primary key: a chosen candidate key • secondary key: all the rest of candiate keys • prime attribute: an attribute that is a part of a candidate key (key column) • nonprime attribute: a nonkey column
Normalization • Functional Dependency Type by Keys • ‘whole (candidate) key nonprime attribute’: full FD (no violation) • ‘partial key nonprime attribute’: partial FD (violation of 2NF) • ‘nonprime attribute nonprime attribute’: transitive FD (violation of 3NF) • ‘not a whole key prime attribute’: violation of BCNF
Functional Dependencies • Let R be a relation schema R and R • The functional dependency holds onR iff for any legal relations r(R), whenever two tuples t1and t2 of r have same values for , they have same values for . t1[] = t2 [] t1[ ] = t2 [ ] • On this instance, AB does NOT hold, but BA does hold. A B • 4 • 1 5 • 3 7
1. Closure • Given a set of functional dependencies, F, its closure, F+ , is all FDs that are implied by FDs in F. • e.g. If A B, and B C, • then clearly A C
Armstrong’s Axioms • We can find F+ by applying Armstrong’s Axioms: • if , then (reflexivity) • if , then (augmentation) • if , and , then (transitivity) • These rules are • sound (generate only functional dependencies that actually hold) and • complete (generate all functional dependencies that hold).
Additional rules • If and , then (union) • If , then and (decomposition) • If and , then (pseudotransitivity) The above rules can be inferred from Armstrong’s axioms.
Example • R = (A, B, C, G, H, I)F = { A BA CCG HCG IB H} • Some members of F+ • A H • by transitivity from A B and B H • AG I • by augmenting A C with G, to get AG CG and then transitivity with CG I • CG HI • by augmenting CG I to infer CG CGI, and augmenting of CG H to inferCGI HI, and then transitivity
2. Closure of an attribute set • Given a set of attributes A and a set of FDs F, closure of A under F is the set of all attributes implied by A • In other words, the largest B such that: • A B • Redefining super keys: • The closure of a super key is the entire relation schema • Redefining candidate keys: • 1. It is a super key • 2. No subset of it is a super key
Computing the closure for A • Simple algorithm • 1. Start with B = A. • 2. Go over all functional dependencies, , in F+ • 3. If B, then • Add to B • 4. Repeat till B changes
Example • R = (A, B, C, G, H, I)F = { A BA CCG HCG IB H} • (AG) + ? • 1. result = AG 2. result = ABCG (A C and A B) 3. result = ABCGH (CG H and CG AGBC) 4. result = ABCGHI (CG I and CG AGBCH Is (AG) a candidate key ? 1. It is a super key. 2. (A+) = BC, (G+) = G. YES.
Uses of attribute set closures • Determining superkeys and candidate keys • Determining if A B is a valid FD • Check if A+ contains B • Can be used to compute F+
Database Normalization Functional dependency (FD) means that if there is only one possible value of Y for every value of X, then Y is Functionally dependent on X. Is the following FDs hold?
Database Normalization • Functional Dependencyis “good”. With functional dependency the primary key (Attribute A) determines the value of all the other non-key attributes (Attributes B,C,D,etc.) • Transitive dependencyis “bad”. Transitive dependency exists if the primary/candidate key (Attribute A) determines non-key Attribute B, and Attribute B determines non-key Attribute C. • If a relation schema has more than one key, each is called a candidate key • An attribute in a relation schema R is called prim if it is a member of some candidate key of R
First Normal Form (1NF) • Each attribute must be atomic (single value) • No repeating columns within a row (composite attributes) • No multi-valued columns. • 1NF simplifies attributes • Queries become easier.
Second Normal Form (2NF) • Each attribute must be functionally dependent on the primary key. • If the primary key is a single attribute, then the relation is in 2NF • The test for 2NF involves testing for FDs whose left-hand-side • attribute are part of the primary key • Disallow partial dependency, where non-keys attributes depend on • part of a composite primary key • In short, remove partial dependencies • 2NF improves data integrity. • Prevents update, insert, and delete anomalies.
2NF Given the following FDs: Assuming all attributes are atomic, is the above relation in the 1NF, 2NF ? Relation X1 Relation X3 Relation X2
Third Normal Form (3NF) • Remove transitive dependencies. • Transitive dependency • A non-prime attribute is dependent on another, non-prime attribute or attributes • Attribute is the result of a calculation • Examples: • Area code attribute based on City attribute of a customer • Total price attribute of order entry based on quantity attribute and unit price attribute (calculated value) • Solution: • Any transitive dependencies are moved into a smaller table.
Transitive Dependence Give a relation R, Assume the following FD hold: Note : Both Ename and Address attributes are non-key attributes in R, and since Address depends on a non-Prime attribute Name, which depends on the primary key(EmpNo), a transitive dependency exists R2 R1 Note : If address is a prime attribute Then R is in 3NF
Modification Anomalies • What happens when you want to • add a new book? • change the address of a patron? • delete a patron record?
Modification Anomalies • Deletion anomaly • deleting one fact about an entity deletes a fact about another entity • Insertion anomaly • cannot insert one fact about an entity unless a fact about another entity is also added • Update anomaly • changing one fact about an entity requires multiple changes to a table
Referential Integrity Constraint • When we split a relation, we must pay attention to the references across the newly formed relations • E.g., a book must exist before it can be checked out: • CHECKOUT [BookID] ÍBOOK [BookID] • The DBMS or the applications will have to check/enforce constraints
Boyce-Codd Normal Form • Every determinant is a candidate key • ADVISER(SID,Major,Fname) • STU-ADV(SID,Fname)ADV-SUBJ(Fname,Subject)
Multi-valued Dependency • Two or more functionally independent multi-valued attributes are dependent on another attribute • EMPLOYEE(Name,Dependent,Project) • Data redundancy and modification anomalies • 4NF: BCNF & no multi-valued dependencies • EMPLOYEE(Name,Dependent) • EMPLOYEE(Name, Project)
Database Normalization • Boyce-Codd Normal Form (BCNF) • A relation is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key. (A determinant is any attribute whose value determines other values with a row.) • If a table contains only one candidate key, the 3NF and the BCNF are equivalent. • BCNF is a special case of 3NF.
A Table That Is In 3NF But Not In BCNF Figure 5.7
The Decomposition of a Table Structure to Meet BCNF Requirements Figure 5.8
Lossless-join Decomposition • For the case of R = (R1, R2), we require that for all possible relations r on schema R r = R1(r ) |X| R2(r ) • A decomposition of R into R1 and R2 is lossless join if and only if at least one of the following dependencies is in F+: • R1R2R1 • R1R2R2
R = (A, B, C) F = {A B, B C) • Can be decomposed in two different ways • R1 = (A, B), R2 = (B, C) • Lossless-join decomposition: R1 R2 = {B}and B BC • Dependency preserving • R1 = (A, B), R2 = (A, C) • Lossless-join decomposition: R1 R2 = {A}and A AB • Not dependency preserving (cannot check B C without computing R1 |X|R2)
Dependency Preservation • Let Fibe the set of dependencies F + that include only attributes in Ri. • A decomposition is dependency preserving, if (F1 F2 … Fn )+ = F + • If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive.
Dependency Preservation • To check if a dependency is preserved in a decomposition of R into R1, R2, …, Rn we apply the following test (with attribute closure done with respect to F) • result = while (changes to result) dofor eachRiin the decompositiont = (result Ri)+ Riresult = result t • If result contains all attributes in , then the functional dependency is preserved.
Dependency Preservation • We apply the test on all dependencies in F to check if a decomposition is dependency preserving • This procedure takes polynomial time, instead of the exponential time required to compute F+and(F1 F2 … Fn)+
FD Example • R = (A, B, C )F = {AB, B C}Key = {A} • R is not in BCNF • Decomposition R1 = (A, B), R2 = (B, C) • R1and R2 now in BCNF • Lossless-join decomposition • Dependency preserving
Aim of Normalization • Goal for a relational database design is: • BCNF. • Lossless join. • Dependency preservation. • If we cannot achieve this, we accept one of • Lack of dependency preservation • Redundancy due to use of 3NF
Sample Data for a BCNF Conversion Table 5.2
Perform lossless-join decompositions of each of the following scheme into BCNF schemes: R(A, B, C, D, E) with dependency set {AB CDE, C D, D E} A B C D A B C D C D A B C E A B C D D E A B C A B C C D D E
Given the FDs {B D, AB C, D B} and the relation {A, B, C, D}, give a two distinct lossless join decomposition to BNCF indicating the keys of each of the resulting relations. A B C D A B C D B D A B C B D A C D
Definition of MVD • A multivalued dependency (MVD) X ->->Y is an assertion that if two tuples of a relation agree on all the attributes of X, then their components in the set of attributes Y may be swapped, and the result will be two tuples that are also in the relation.
Example • The name-addr-phones-beersLiked example illustrated the MVD • name->->phones • and the MVD • name ->-> beersLiked.
Picture of MVD X ->->Y XY others equal exchange
MVD Rules • Every FD is an MVD. • If X ->Y, then swapping Y ’s between two tuples that agree on X doesn’t change the tuples. • Therefore, the “new” tuples are surely in the relation, and we know X ->->Y. • Complementation : If X ->->Y, and Z is all the other attributes, then X ->->Z.
Fourth Normal Form • The redundancy that comes from MVD’s is not removable by putting the database schema in BCNF. • There is a stronger normal form, called 4NF, that (intuitively) treats MVD’s as FD’s when it comes to decomposition, but not when determining keys of the relation.