BCNF & Lossless Decomposition

CS157B Lecture 13 BCNF & Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science

Normalization • Review on Keys • superkey: a set of attributes which will uniquely identify each tuple in a relation • candidate key: a minimal superkey • primary key: a chosen candidate key • secondary key: all the rest of candiate keys • prime attribute: an attribute that is a part of a candidate key (key column) • nonprime attribute: a nonkey column

Normalization • Functional Dependency Type by Keys • ‘whole (candidate) key  nonprime attribute’: full FD (no violation) • ‘partial key nonprime attribute’: partial FD (violation of 2NF) • ‘nonprime attribute nonprime attribute’: transitive FD (violation of 3NF) • ‘not a whole key prime attribute’: violation of BCNF

Functional Dependencies • Let R be a relation schema   R and   R • The functional dependency  holds onR iff for any legal relations r(R), whenever two tuples t1and t2 of r have same values for , they have same values for . t1[] = t2 []  t1[ ] = t2 [ ] • On this instance, AB does NOT hold, but BA does hold. A B • 4 • 1 5 • 3 7

1. Closure • Given a set of functional dependencies, F, its closure, F+ , is all FDs that are implied by FDs in F. • e.g. If A  B, and B  C, • then clearly A  C

Armstrong’s Axioms • We can find F+ by applying Armstrong’s Axioms: • if   , then   (reflexivity) • if  , then    (augmentation) • if  , and   , then   (transitivity) • These rules are • sound (generate only functional dependencies that actually hold) and • complete (generate all functional dependencies that hold).

Additional rules • If   and , then  (union) • If   , then   and (decomposition) • If   and  , then   (pseudotransitivity) The above rules can be inferred from Armstrong’s axioms.

Example • R = (A, B, C, G, H, I)F = { A BA CCG HCG IB H} • Some members of F+ • A H • by transitivity from A B and B H • AG I • by augmenting A C with G, to get AG CG and then transitivity with CG I • CG HI • by augmenting CG I to infer CG  CGI, and augmenting of CG H to inferCGI HI, and then transitivity

2. Closure of an attribute set • Given a set of attributes A and a set of FDs F, closure of A under F is the set of all attributes implied by A • In other words, the largest B such that: • A  B • Redefining super keys: • The closure of a super key is the entire relation schema • Redefining candidate keys: • 1. It is a super key • 2. No subset of it is a super key

Computing the closure for A • Simple algorithm • 1. Start with B = A. • 2. Go over all functional dependencies,  , in F+ • 3. If  B, then • Add  to B • 4. Repeat till B changes

Example • R = (A, B, C, G, H, I)F = { A BA CCG HCG IB H} • (AG) + ? • 1. result = AG 2. result = ABCG (A C and A  B) 3. result = ABCGH (CG H and CG  AGBC) 4. result = ABCGHI (CG I and CG  AGBCH Is (AG) a candidate key ? 1. It is a super key. 2. (A+) = BC, (G+) = G. YES.

Uses of attribute set closures • Determining superkeys and candidate keys • Determining if A  B is a valid FD • Check if A+ contains B • Can be used to compute F+

Database Normalization Functional dependency (FD) means that if there is only one possible value of Y for every value of X, then Y is Functionally dependent on X. Is the following FDs hold?

Database Normalization • Functional Dependencyis “good”. With functional dependency the primary key (Attribute A) determines the value of all the other non-key attributes (Attributes B,C,D,etc.) • Transitive dependencyis “bad”. Transitive dependency exists if the primary/candidate key (Attribute A) determines non-key Attribute B, and Attribute B determines non-key Attribute C. • If a relation schema has more than one key, each is called a candidate key • An attribute in a relation schema R is called prim if it is a member of some candidate key of R

First Normal Form (1NF) • Each attribute must be atomic (single value) • No repeating columns within a row (composite attributes) • No multi-valued columns. • 1NF simplifies attributes • Queries become easier.

1NF

Second Normal Form (2NF) • Each attribute must be functionally dependent on the primary key. • If the primary key is a single attribute, then the relation is in 2NF • The test for 2NF involves testing for FDs whose left-hand-side • attribute are part of the primary key • Disallow partial dependency, where non-keys attributes depend on • part of a composite primary key • In short, remove partial dependencies • 2NF improves data integrity. • Prevents update, insert, and delete anomalies.

2NF Given the following FDs: Assuming all attributes are atomic, is the above relation in the 1NF, 2NF ? Relation X1 Relation X3 Relation X2

Third Normal Form (3NF) • Remove transitive dependencies. • Transitive dependency • A non-prime attribute is dependent on another, non-prime attribute or attributes • Attribute is the result of a calculation • Examples: • Area code attribute based on City attribute of a customer • Total price attribute of order entry based on quantity attribute and unit price attribute (calculated value) • Solution: • Any transitive dependencies are moved into a smaller table.

Transitive Dependence Give a relation R, Assume the following FD hold: Note : Both Ename and Address attributes are non-key attributes in R, and since Address depends on a non-Prime attribute Name, which depends on the primary key(EmpNo), a transitive dependency exists R2 R1 Note : If address is a prime attribute Then R is in 3NF

Modification Anomalies • What happens when you want to • add a new book? • change the address of a patron? • delete a patron record?

Modification Anomalies • Deletion anomaly • deleting one fact about an entity deletes a fact about another entity • Insertion anomaly • cannot insert one fact about an entity unless a fact about another entity is also added • Update anomaly • changing one fact about an entity requires multiple changes to a table

Referential Integrity Constraint • When we split a relation, we must pay attention to the references across the newly formed relations • E.g., a book must exist before it can be checked out: • CHECKOUT [BookID] ÍBOOK [BookID] • The DBMS or the applications will have to check/enforce constraints

Boyce-Codd Normal Form • Every determinant is a candidate key • ADVISER(SID,Major,Fname) • STU-ADV(SID,Fname)ADV-SUBJ(Fname,Subject)

Multi-valued Dependency • Two or more functionally independent multi-valued attributes are dependent on another attribute • EMPLOYEE(Name,Dependent,Project) • Data redundancy and modification anomalies • 4NF: BCNF & no multi-valued dependencies • EMPLOYEE(Name,Dependent) • EMPLOYEE(Name, Project)

Database Normalization • Boyce-Codd Normal Form (BCNF) • A relation is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key. (A determinant is any attribute whose value determines other values with a row.) • If a table contains only one candidate key, the 3NF and the BCNF are equivalent. • BCNF is a special case of 3NF.

A Table That Is In 3NF But Not In BCNF Figure 5.7

The Decomposition of a Table Structure to Meet BCNF Requirements Figure 5.8

Lossless-join Decomposition • For the case of R = (R1, R2), we require that for all possible relations r on schema R r = R1(r ) |X| R2(r ) • A decomposition of R into R1 and R2 is lossless join if and only if at least one of the following dependencies is in F+: • R1R2R1 • R1R2R2

R = (A, B, C) F = {A B, B C) • Can be decomposed in two different ways • R1 = (A, B), R2 = (B, C) • Lossless-join decomposition: R1 R2 = {B}and B BC • Dependency preserving • R1 = (A, B), R2 = (A, C) • Lossless-join decomposition: R1 R2 = {A}and A  AB • Not dependency preserving (cannot check B C without computing R1 |X|R2)

Dependency Preservation • Let Fibe the set of dependencies F + that include only attributes in Ri. • A decomposition is dependency preserving, if (F1 F2  … Fn )+ = F + • If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive.

Dependency Preservation • To check if a dependency  is preserved in a decomposition of R into R1, R2, …, Rn we apply the following test (with attribute closure done with respect to F) • result = while (changes to result) dofor eachRiin the decompositiont = (result Ri)+ Riresult = result t • If result contains all attributes in , then the functional dependency  is preserved.

Dependency Preservation • We apply the test on all dependencies in F to check if a decomposition is dependency preserving • This procedure takes polynomial time, instead of the exponential time required to compute F+and(F1 F2 …  Fn)+

FD Example • R = (A, B, C )F = {AB, B  C}Key = {A} • R is not in BCNF • Decomposition R1 = (A, B), R2 = (B, C) • R1and R2 now in BCNF • Lossless-join decomposition • Dependency preserving

A Lossy Decomposition

Aim of Normalization • Goal for a relational database design is: • BCNF. • Lossless join. • Dependency preservation. • If we cannot achieve this, we accept one of • Lack of dependency preservation • Redundancy due to use of 3NF

Sample Data for a BCNF Conversion Table 5.2

Decomposition into BCNF

Perform lossless-join decompositions of each of the following scheme into BCNF schemes: R(A, B, C, D, E) with dependency set {AB  CDE, C  D, D  E} A B C D A B C D C D A B C E A B C D D E A B C A B C C D D E

Given the FDs {B  D, AB  C, D  B} and the relation {A, B, C, D}, give a two distinct lossless join decomposition to BNCF indicating the keys of each of the resulting relations. A B C D A B C D B D A B C B D A C D

Definition of MVD • A multivalued dependency (MVD) X ->->Y is an assertion that if two tuples of a relation agree on all the attributes of X, then their components in the set of attributes Y may be swapped, and the result will be two tuples that are also in the relation.

Example • The name-addr-phones-beersLiked example illustrated the MVD • name->->phones • and the MVD • name ->-> beersLiked.

Picture of MVD X ->->Y XY others equal exchange

MVD Rules • Every FD is an MVD. • If X ->Y, then swapping Y ’s between two tuples that agree on X doesn’t change the tuples. • Therefore, the “new” tuples are surely in the relation, and we know X ->->Y. • Complementation : If X ->->Y, and Z is all the other attributes, then X ->->Z.

Fourth Normal Form • The redundancy that comes from MVD’s is not removable by putting the database schema in BCNF. • There is a stronger normal form, called 4NF, that (intuitively) treats MVD’s as FD’s when it comes to decomposition, but not when determining keys of the relation.

BCNF & Lossless Decomposition