Fundamentals/ICY: Databases 2013/14, Week 10 – Monday – Normalization, contd.

John Barnden, Professor of Artificial Intelligence, School of Computer Science, University of Birmingham, UK

Reminder

Second Normal Form
• An entity type is in second normal form (2NF) if:
  • It is in 1NF, and
  • It includes no partial dependencies.

Conversion to 2NF
• For each determinant D involved in a partial dependency in the original entity type T:
  • use D also as the PK of a new entity type NT(D), and
  • move the attributes X determined by D out of T and into NT(D).
• D itself stays in T as well as being copied into NT(D).
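The conversion rule above can be sketched in Python. This is only a minimal illustration under stated assumptions: the function name `to_2nf` and the attribute names (ProjNum, EmpNum, and so on) are hypothetical, and a dependency is assumed to be partial when its determinant is a proper subset of the PK and its dependent attributes lie outside the PK.

```python
def to_2nf(attrs, pk, fds):
    """Split partial dependencies out of an entity type T.

    attrs: all attributes of T; pk: the primary key of T;
    fds: list of (determinant, dependents) pairs.
    Assumption: a dependency is partial when its determinant D is a
    proper subset of the PK and the dependents are outside the PK.
    """
    pk = frozenset(pk)
    remaining = set(attrs)
    new_types = {}  # maps each determinant D to the attributes of NT(D)
    for det, dependents in fds:
        det = frozenset(det)
        moved = set(dependents) - pk
        if det < pk and moved:
            # D becomes the PK of NT(D); D itself also stays in T.
            new_types.setdefault(det, set(det)).update(moved)
            remaining -= moved
    return remaining, new_types


# Hypothetical entity type whose PK is (ProjNum, EmpNum).
attrs = {"ProjNum", "EmpNum", "ProjName", "EmpName", "Hours"}
pk = {"ProjNum", "EmpNum"}
fds = [
    ({"ProjNum"}, {"ProjName"}),          # partial dependency
    ({"EmpNum"}, {"EmpName"}),            # partial dependency
    ({"ProjNum", "EmpNum"}, {"Hours"}),   # depends on the whole PK
]

t, nts = to_2nf(attrs, pk, fds)
print("T:", sorted(t))
for d, a in nts.items():
    print("NT", sorted(d), ":", sorted(a))
```

In this sketch T keeps ProjNum, EmpNum and Hours, while ProjName and EmpName move into new entity types keyed by ProjNum and EmpNum respectively.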

Reminder: Partial and Transitive Dependencies

Second Normal Form (2NF): conversion results for the example on the previous slide

New

Third Normal Form
• An entity type is in third normal form (3NF) if:
  • It is in 2NF, and
  • It contains no transitive dependencies.

Entity type in 2NF but not in 3NF because of a “transitive” dependency

Transitive Dependencies
• A prime attribute is one that is within some candidate key (not necessarily the primary key).
• So a non-prime attribute is, in particular, not within the PK.
• A transitive dependency is one where the determinant D is at least partially outside the PK and is not a superkey, and the determined attribute X is non-prime (the reason for this restriction is on a later slide).
• E.g.: the previous figure, for the simple case of a simple (= one-attribute) determinant.
• The above definition is partly based on Garcia-Molina, Ullman & Widom 2009. It is more general than the account in our textbook.
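The definition above can be checked mechanically. The following Python sketch is an illustration only: the helper names (`closure`, `candidate_keys`, `transitive_dependencies`) and the example attributes (EmpNum, JobClass, ChgHour, EmpName) are hypothetical, and candidate keys are found by brute force, which is fine for slide-sized examples but not for large schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for det, dep in fds:
            if set(det) <= result and not set(dep) <= result:
                result |= set(dep)
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure is the whole entity type."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            s = set(combo)
            # Smaller keys are found first, so any superset is not minimal.
            if closure(s, fds) == set(all_attrs) and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def transitive_dependencies(all_attrs, pk, fds):
    """Dependencies D -> X with D partly outside the PK, D not a
    superkey, and X non-prime (the definition used on this slide)."""
    prime = set().union(*candidate_keys(all_attrs, fds))
    found = []
    for det, dep in fds:
        det = set(det)
        outside_pk = not det <= set(pk)
        is_superkey = closure(det, fds) == set(all_attrs)
        for x in sorted(set(dep) - det):
            if outside_pk and not is_superkey and x not in prime:
                found.append((frozenset(det), x))
    return found


# Hypothetical entity type with a simple (one-attribute) determinant.
attrs = {"EmpNum", "EmpName", "JobClass", "ChgHour"}
pk = {"EmpNum"}
fds = [
    ({"EmpNum"}, {"EmpName", "JobClass"}),
    ({"JobClass"}, {"ChgHour"}),   # JobClass is outside the PK
]
print(transitive_dependencies(attrs, pk, fds))
```

Here only JobClass → ChgHour is reported: JobClass lies outside the PK, is not a superkey, and ChgHour is in no candidate key.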

Conversion to 3NF
• For each determinant D involved in a transitive dependency in the original entity type T:
  • use D also as the PK of a new entity type NT(D), and
  • move the attributes X transitively determined by D out of T and into NT(D).
• NB: the determinants themselves stay in T as well.
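To see what this conversion does to stored data, here is a small Python sketch. The rows, the attribute names and the `project` helper are all hypothetical, and JobClass → ChgHour is assumed to be the transitive dependency being removed.

```python
def project(rows, attrs):
    """Project rows onto attrs and drop duplicate rows."""
    seen = []
    for row in rows:
        slim = {a: row[a] for a in attrs}
        if slim not in seen:
            seen.append(slim)
    return seen


# Hypothetical 2NF table: EmpNum is the PK, JobClass determines ChgHour.
rows = [
    {"EmpNum": 101, "EmpName": "Ann", "JobClass": "Analyst", "ChgHour": 90},
    {"EmpNum": 102, "EmpName": "Bob", "JobClass": "Analyst", "ChgHour": 90},
    {"EmpNum": 103, "EmpName": "Cy", "JobClass": "Clerk", "ChgHour": 30},
]

# T keeps the determinant JobClass; NT(JobClass) receives ChgHour.
employee = project(rows, ["EmpNum", "EmpName", "JobClass"])
job = project(rows, ["JobClass", "ChgHour"])

# Joining on the shared determinant recovers the original rows.
rejoined = [
    {**e, **j} for e in employee for j in job if e["JobClass"] == j["JobClass"]
]

print(len(rows), len(employee), len(job))  # 3 3 2
print(rejoined == rows)                    # True
```

The charge rate for "Analyst" is now stored once instead of once per employee, and because the determinant stays in T, the original information can still be reconstructed by a join.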

Third Normal Form (3NF): conversion results for the previous example

The Boyce-Codd Normal Form (BCNF)
• Determinants of partial and transitive functional dependencies are not superkeys.
• So the corresponding normalization gets rid of some non-superkey determinants used in functional dependencies.
• Normalization into BCNF gets rid of all such determinants.
• An entity type is in BCNF if it’s in 1NF and every determinant in a functional dependency is a superkey,
  • i.e., every attribute-set that determines any other attribute determines all the attributes, so there’s no redundancy problem.
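The BCNF condition can be tested with an attribute-closure computation. The Python sketch below is illustrative only: `closure` and `bcnf_violations` are hypothetical helper names, and the entity type (A, B, C, D) with dependencies AB → CD and C → B is an assumed example, not taken from a figure in these slides.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for det, dep in fds:
            if set(det) <= result and not set(dep) <= result:
                result |= set(dep)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Determinants that determine some other attribute but are not superkeys."""
    bad = []
    for det, dep in fds:
        det = frozenset(det)
        determines_other = not set(dep) <= det
        is_superkey = closure(det, fds) == set(all_attrs)
        if determines_other and not is_superkey and det not in bad:
            bad.append(det)
    return bad


# Assumed example: AB -> CD and C -> B.
attrs = {"A", "B", "C", "D"}
fds = [({"A", "B"}, {"C", "D"}), ({"C"}, {"B"})]
print(bcnf_violations(attrs, fds))   # reports the determinant {C}

# A type whose only determinant is its key has no violations.
print(bcnf_violations({"A", "C", "D"}, [({"A", "C"}, {"D"})]))   # []
```

In the first case C determines B but not A or D, so C is a non-superkey determinant and the entity type is not in BCNF.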

An Entity Type in 3NF but not in BCNF
• The dependency is NOT TRANSITIVE since B is prime.

Decomposition to BCNF
• The middle diagram shows that changing the PK so as to include C instead of B changes the dependency into a partial one, which can then be removed in the usual way.
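One way to sketch such a decomposition in Python is the common "split on a violating determinant" step, shown below. It is not necessarily the exact route taken in the diagrams: the helper names are hypothetical, and the dependencies AB → CD and C → B are assumed here so that B and C play the roles described above.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for det, dep in fds:
            if set(det) <= result and not set(dep) <= result:
                result |= set(dep)
                changed = True
    return result


def bcnf_split(all_attrs, det, fds):
    """Split an entity type on a non-superkey determinant det.

    NT(det) holds det plus everything it determines; T keeps the
    remaining attributes plus det, which links the two types.
    """
    det = set(det)
    determined = closure(det, fds) & set(all_attrs)
    nt = determined
    t = (set(all_attrs) - determined) | det
    return t, nt


# Assumed example: PK (A, B), with AB -> CD and C -> B.
attrs = {"A", "B", "C", "D"}
fds = [({"A", "B"}, {"C", "D"}), ({"C"}, {"B"})]

t, nt = bcnf_split(attrs, {"C"}, fds)
print(sorted(t), sorted(nt))   # ['A', 'C', 'D'] ['B', 'C']
```

The result matches the outcome of re-keying on (A, C) and then removing the partial dependency C → B: T becomes (A, C, D) and the new entity type is (C, B) with C as its PK.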

((ASIDE: A Simple Form of BCNF))
• Any simple (= one-attribute) superkey is a candidate key.
• So BCNF requires, in particular, all simple determinants to be candidate keys.
• Some books (incl. our textbook) define BCNF merely to mean in effect that all simple determinants are candidate keys.
• This is a simpler, less general form of BCNF.
• A table could be in simple-BCNF but not be in full BCNF.
• My definition of (full) BCNF is from Garcia-Molina, Ullman & Widom, Database Systems: The Complete Book, 2nd ed., Pearson, 2009.
• This book also gives a process for conversion to full BCNF.
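The gap between the two forms can be shown with a small Python sketch. Everything here is an assumption made for illustration: the helper names, and the entity type (A, B, C, D) with dependencies ABC → D and CD → B, whose candidate keys are ABC and ACD, so that every attribute is prime. The checks only inspect the dependencies listed.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for det, dep in fds:
            if set(det) <= result and not set(dep) <= result:
                result |= set(dep)
                changed = True
    return result


def is_full_bcnf(all_attrs, fds):
    """Every determinant of another attribute is a superkey."""
    return all(
        set(dep) <= set(det) or closure(det, fds) == set(all_attrs)
        for det, dep in fds
    )


def is_simple_bcnf(all_attrs, fds):
    """Only one-attribute determinants are required to be superkeys."""
    return all(
        len(det) != 1
        or set(dep) <= set(det)
        or closure(det, fds) == set(all_attrs)
        for det, dep in fds
    )


# Assumed example: ABC -> D and CD -> B; no simple determinants at all.
attrs = {"A", "B", "C", "D"}
fds = [({"A", "B", "C"}, {"D"}), ({"C", "D"}, {"B"})]

print(is_simple_bcnf(attrs, fds), is_full_bcnf(attrs, fds))   # True False
```

The composite determinant CD fixes B but not A, so it is not a superkey; since there are no one-attribute determinants, the simple form is satisfied trivially while the full form is not.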

BCNF versus 3NF
• BCNF implies that there are no partial or transitive dependencies, so a table that is in BCNF is also in 3NF.
• ((If a table is in 3NF but not BCNF then each of the non-superkey determinants D is partly outside the PK and determines only prime attributes.
• If also the PK is the only candidate key, then:
  • the attributes determined by each D must all be in the PK;
  • but they cannot cover all of the PK (otherwise D would be a superkey). So the PK must be composite.))

((A Reason for Prime-X Exclusion in Transitive Dependencies))
• Earlier we said that in a transitive dependency the determined attribute X is non-prime (i.e. not within a candidate key). The reason is:
• In removing a transitive dependency, we delete the dependent attribute X from the original entity type. If X were within the primary key (a special case of candidate key), that key would therefore be disrupted, and this would affect other entity types referencing the table.
• But non-primary candidate keys are also sometimes used for such referencing, and are then called secondary keys. So if X were in such a key, the conversion to 3NF would disrupt the referencing.
• So, to keep things simple for the purposes of 3NF, all prime Xs are banned from being transitively dependent.

((Inter-Table Reference Disruption contd.))
• NB: Conversion to 2NF can, and conversion from 3NF to BCNF does, remove dependent prime attributes, so both are potentially disruptive of references between entity types.
• But I assume that in practice it’s rarely a problem in conversion to 2NF, because, in partial dependencies, the dependent attributes are rarely prime. In particular, they cannot be in the PK.
• By contrast, if a 3NF table is not in BCNF then the troublesome dependencies necessarily involve prime Xs (see a previous slide).

((3NF and Reference Disruption contd.))
• Some textbooks (e.g., Connolly and Begg, Database Systems, Pearson, 2010) only require the dependent attribute in a transitive dependency to be outside the primary key, rather than to be non-prime (outside every candidate key). In that case, conversion to 3NF can disrupt references using a secondary key. But at least the cases of 2NF and 3NF are now more similar to each other.
• I haven’t seen a version of 2NF that is only concerned with non-prime Xs. But don’t be too surprised if you come across that!

Material on 4NF: in Week 11 if there’s time (or in Revision Week)

Normal Forms Overall
• Let “<” mean “provides less protection than”. Then:
  • 1NF < 2NF < 3NF < BCNF ((and 3NF < 4NF))
  • ((Also BCNF < 4NF under the second definition of 4NF.))
• BCNF and 4NF guard against relatively unusual situations. BCNF is more disruptive to achieve than 2NF or 3NF.
• Merely requiring 2NF is now unusual.
• So 3NF is a reasonable target.

Non-Normalization/Denormalization
• Normalization leads to more tables.
• Joining a larger number of tables takes additional disk input/output (I/O) operations, adds manipulation complexity, and may introduce substantial communication delays.
• Conflicts among design principles, information requirements, and processing speed are often resolved through compromises that may include ending up with some non-normalized tables.

Non-/Denormalization (continued)
• Unnormalized tables in a production database tend to have these defects:
  • Data updates are less efficient to the extent that programs that read and update tables must deal with larger tables.
  • ((Indexing is much more cumbersome.))
  • ((Unnormalized tables yield no simple strategies for creating virtual tables known as views.))

Summary: Normalization and Database Design
• Normalization helps eliminate data redundancies and some other aspects of poor structure.
• Normalization focusses on problems in individual entity types.
• It is difficult to separate normalization from the overall ER modelling process.
• Normalization cannot, by itself, guarantee good designs.
• 3NF is often enough, but BCNF, 4NF etc. may also need to be considered.
• Non-normalized entity types may be desirable in some cases, to increase processing speed and/or reduce conceptual complexity of operations.
