240 likes | 431 Views
Database Management Fall 2003 Normalization Chapter 8. Why normalization?. o An “old” approach. (Some history… ) o There are people “out there” who still talk and think in these terms… and so your knowledge of normalization may be taken as one measure
E N D
Database Management Fall 2003 Normalization Chapter 8
Why normalization? o An “old” approach. (Some history… ) o There are people “out there” who still talk and think in these terms… and so your knowledge of normalization may be taken as one measure of your preparedness in the database area. o Normalization is part of the theory behind relational databases. Understanding it (and other aspects of the theory) will give you a firmer grasp of the entire topic. o Normalization remains a useful form of cross-checking, when reviewing data models for completeness and consistency. However, …
“If the principles of data modeling are followed faithfully, then the outcome should be a high-fidelity model and a normalized database. In other words, if you model data correctly, you create a normalized design.” (Watson, p. 206)
Core Concept A relationship between attributes in an entity One or more attributes determine the value of another attribute An identifier functionally determines all the attributes of an entity stock code firm name, stock price, stock quantity, stock dividend If we know stock code we know the value of firm name, etc. Multivalued dependency Formulae (stock dividend, stock price) yield Functional dependency
Yield is fully functionally dependent on stock dividend and stock price because both of these attributes are required to determine the value of yield (stock dividend, stock price) yield Determinant An attribute that fully functionally determines another attribute e.g., stock code determines stock PE Think identifier (primary key) Full functional dependency
In short, “… it is by understanding the relationships between data elements [i.e., attributes] that we determine functional dependency. (Watson, p. 207) Accordingly, normalization is something we do at the attribute level, in order to help us confirm that our entities, their lists of attributes, and their inter-relationships are properly drawn.
The normal forms – an overview Normalizing a relation (entity, table) means getting it into “normal form”… which entails: o first normal form (1NF) o second normal form (2NF) o third normal form (3NF) o Boyce-Codd normal form (BCNF) o fourth normal form (4NF) o fifth normal form (5NF) o domain/key normal form (DK/NF) The normal forms are ‘nested’. That is, if it’s in 2NF, it’s also in 1NF. If it’s in 3NF, then it’s in both 2NF and 1NF. And so on…
Normal Forms are “nested” 1st NF 2nd NF 3rd NF BC NF 4TH NF 5TH NF
First normal form (1NF) • A relation is in first normal form if and only if all columns are single-valued. In other words, no multi-valued attributes! • All rows must have the same number of columns
Second normal form (2NF) A relation is in second normal form if and only if it is in first normal form and all non-key columns are dependent on the entire key. In other words, no attribute should be determined by just part of the primary key.
Violated when a nonkey column is a fact about part of the primary key A column is not fully functionally dependent on the primary key customer-credit in this case Second normal form (2NF) ITEM itemno … CUSTOMER customerid customer-credit … ORDER quantity custid (FK) itemno (FK) …
Third normal form (3NF) A relation is in third normal form if and only if it is in second normal form and has no transitive dependencies. This means that no non-key attribute should be determined by another non-key attribute.
Violated when a non-key column is a fact about another non-key column A column is not fully functionally dependent on the primary key EXCHANGE RATE in this case Third normal form (3NF) STOCK NATION stock code nation code firm name nation name stock price exchange rate stock quantity stock dividend stock PE natcode (FK)
Every attribute must depend on the key the whole key and nothing but the key… so help me Codd! Summary of Normal Forms 1-3
Arises when a table has multiple candidate keys the candidate keys are composite the candidate keys overlap Boyce-Codd normal form (BCNF)
A row should not contain two or more multivalued independent facts Fourth normal form (4NF) STUDENT- STUDENT ENROLMENT SPORT studentid sectionno proficiency … … … SPORT SUBJECT sportcode subjectid … …
A relation is in Boyce-Codd normal form if and only if every determinant is a candidate key. This means that no key attribute should be determined by a non-key attribute or another key attribute. Boyce-Codd normal form (BCNF) Adapted from McFadden et al., 1999
BCNF: The problem resolved… (although it appears to create a new problem)
A table can be reconstructed from other tables There exists some rule that enables a relation to be inferred LEASE AUTOMOBILE AGENT autoid (FK) agentid autoid agentid (FK) . . . custid (FK) . . . . . . Customer custid . . . Fifth normal form (5NF)
AUTOMOBILE- AGENT AGENT AUTOMOBILE autoid agentid autoid (FK) automake agentname agentid (FK) automodel agentloc AUTOMOBILE- CUSTOMER CUSTOMER AGENT- CUSTOMER custid autoid (FK) agentid (FK) custname custid (FK) custaddress custid (FK) • Agent is assigned to customer
As noted, there’s also a domain/key normal form (DK/NF). However . . . “Relations in third normal form (3NF) are sufficient for most practical database applications.” Moreover: “[Violations of 5NF] occur very rarely and are difficult to detect in practice.” And… “[The] practical utility [of DK/NF] is quite limited.” (McFadden et al., 1999)
So, the overall point of normalization: . . . to eliminate problems in the design of database tables that can lead to modification anomalies (insertion, deletion, update). Which is great, but remember: “…if you model data correctly, you create a normalized design.” (Watson, p. 206) (Note that resolving violations of normal form generally involves adding entities… which makes a strong case for a ‘top-down’, business-driven approach to data modeling.)