210 likes | 520 Views
Chapter 8 Normal Forms Based on Functional Dependencies. Deborah Costa Oct 18, 2007. 8.1 Normalization. Data redundancy and the consequent modification (insertion, deletion, and update) anomalies can be traced to “undesirable” functional dependencies in a relation schema
E N D
Chapter 8Normal Forms Based on Functional Dependencies Deborah Costa Oct 18, 2007
8.1 Normalization Data redundancy and the consequent modification (insertion, deletion, and update) anomalies can be traced to “undesirable” functional dependencies in a relation schema Desirable FD: is any FD in a relation schema, R where the determinant is a candidate key of R; this will not cause data redundancy. Undesirable FD: is where the determinant of an FD in R is not a candidate key of R and this will cause data redundancy.
A little bit of the History • Database Normalization was first proposed by Edgar F. Codd. • Codd defined the first three Normal Forms, which we’ll look into, of the 7 known Normal Forms. • In order to do normalization we must know what the requirements are for each of the three Normal Forms that we’ll go over. • One of the key requirements to remember is that Normal Forms are progressive. That is, in order to have 3rd NF we must have 2nd NF and in order to have 2nd NF we must have 1st NF.
Normalization: Update Anomaly The same information can be expressed on multiple records; therefore updates to the table may result in logical inconsistencies. Example: each record in an "Employees' Skills" table might contain an Employee ID, Employee Address, and Skill; thus a change of address for a particular employee will potentially need to be applied to multiple records (one for each of his skills). If the update is not carried through successfully—if, that is, the employee's address is updated on some records but not others—then the table is left in an inconsistent state. Specifically, the table provides conflicting answers to the question of what this particular employee's address is. This phenomenon is known as an update anomaly. An update anomaly. Employee 519 is shown as having different addresses on different records.
Normalization: Insertion Anomaly There are circumstances in which certain facts cannot be recorded at all. For example, each record in a "Faculty and Their Courses" table might contain a Faculty ID, Faculty Name, Faculty Hire Date, and Course Code—thus we can record the details of any faculty member who teaches at least one course, but we cannot record the details of a newly-hired faculty member who has not yet been assigned to teach any courses. This phenomenon is known as an insertion anomaly. An insertion anomaly.Until the new faculty member is assigned to teach at least one course, his details cannot be recorded.
Normalization: Deletion Anomaly There are circumstances in which the deletion of data representing certain facts necessitates the deletion of data representing completely different facts. The "Faculty and Their Courses" table described in the previous example suffers from this type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses, we must delete the last of the records on which that faculty member appears. This phenomenon is known as a deletion anomaly. A deletion anomaly. All information about Dr. Giddens is lost when he temporarily ceases to be assigned to any courses.
Normalization (cont) In order to eliminate this problem with undesirable FD is to somehow render the undesirable FDs desirable and the process of doing this is called normalization. Normal Forms (NFs) provides a stepwise progression towards the goal of a fully normalized relation schema that is guaranteed to be free of data redundancies that cause modification anomalies from a functional dependency perspective.
Normalization (cont) • A relation schema is said to be in a particular normal form if it satisfies certain prescribed criteria; otherwise the relation is said to violate the normal form. The violation of each of these normal forms signals the presence of a specific type of “undesirable” FD. • It is important to note that the normalization process is anchored to the candidate key of a relation schema, R. • We will use the primary key as the basis for evaluating and normalizing a relation schema.
First Normal Form (1NF) First Normal form imposes conditions sot that a base relation which is physically stored as a file does not contain records with a variable number of fields. This is accomplished by prohibiting multi-valued attributes, composite attributes, and combinations thereof in a relation schema. As a consequence the value of an attribute in a tuple of a relation can be neither a set of values, nor another tuple. Such constraint in effect prevents relations from containing other relations.
1NF Violation and Resolution Figure 8.1 pg 348 As you can see this is schema violates the 1NF because there are multiple Artirst_nm associated with an Album_no or the domain of Artist_nm does not have atomic values. In fact by definition, ALBUM is not even a relation.
1NF Violation and Resolution Figure 8.1 pg 348 In order to fix ALBUM we must expand the relation so that there is a tuple for each (atomic) Artist_nm for a given Album_no. The primary key for this is {Album_no, Artist_nm} as we all should hopefully know by now.
Second Normal Form (2NF) • The requirements to satisfy the 2nd NF: • All requirements for 1st NF must be met. • Redundant data across multiple rows of a table must be moved to a separate table. The resulting tables must be related to each other by use of foreign key.
2nd NF Example • Only Candidate key is (Employee, Skill) • Not in 2NF • Current Work Location is dependent on Employee • Can Cause an Anomaly Updating Jones Work location for Typing and Shorthand but not Whittling. Then asking “What is Jones current work location”, can cause a contradictory answer, because there are 2 different locations.
2nd NF Example • Both tables are in 2NF • Meets 1NF requirements • No non-primary key attribute is dependent on part of a key
Third Normal Form (3NF) The requirements to satisfy the 3rd NF: • All requirements for 2nd NF must be met. • Eliminate fields that do not depend on the primary key; That is, any field that is dependent not only on the primary key but also on another field must be moved to another table
Third Normal FormExample Eliminate Columns Not Dependent On Key i.e. if a column is in a relation, then it must be dependent on the key.
Third Normal FormExample Move non-key-dependent attributes to a new table.
Boyce-Codd Normal Form(BCNF) After Codd proposed the first three normal forms in 1972, it was discover that the 3NF did not satisfactorily handle a more general case of undesirable functional dependencies. In other words data redundancies and the consequent modification anomalies due to functional dependencies can persist even after a relation schema is normalized to 3NF. Use this mnemonic by Brian Moran to help you remember: • 3NF, but… • All functional dependencies imply the only whole key. • "The key, the whole key, and nothing but the key, so help me Codd.“
8.2 The Motivating Exemplar Revisited Normalization concepts have been presented by analyzing 1NF, 2NF 3NF and BCNF in isolation. However in practice normal form violations rarely occur in isolation. We can see from figure 8.8a that STOCK follows 1NF because there are not composite or multi-valued attributes in it.
Motivating Exemplar Revisited (cont) Using Armstrong’s axioms we get {Store, Product} and {Manager, Location, Product} for candidate keys, however we choose {Store, Product} as a primary key. Now that we have the primary key for STOCK we can see that: fd1, fd2, fd3 and fd4 violates 2NF in STOCK fd6 violates 3NF in STOCK. fd7 violates BCNF in STOCK To fix all of the violations above we must decompose the relational schema D:{R1 R2 R3 R4 R5}
Motivating Exemplar Revisited (cont) • This section is very confusing in my opinion. So for better understanding please read it more then once. • After reading a couple of times we should be able to know how to decompose the base relation schema under investigation and know if our decomposition is complete and correct without looking at the same data. • A decomposition is “complete” when it is a dependency-preserving lossless-join decomposition. Preservation of FDs is a verification process and is accomplished by inspecting the decomposition to see if the union of the FDs hold on individual relation schema of D is a cover for F. This is demonstrated in Section 8.1.5.1.You should also test for the lossless-join property, the method for testing is presented in Section 8.1.5.2.