380 likes | 729 Views
Data Normalization. N ormal is not something to aspire to, it's something to get away from. ~ Jodie Foster ~. Activity: Sample User Report. Can you build the underlying data model from this ? How many tables? What are the relationships?. Understanding Functional Dependence.
E N D
Data Normalization Normal is not something to aspire to, it's something to get away from. ~ Jodie Foster ~
Activity: Sample User Report Can you build the underlying data model from this? How many tables? What are the relationships?
Understanding Functional Dependence • For attributes A and B, B is functionally dependent on A means each value in column A determines one and only one value in column B. • Written: A B • A determines B • B is the determinant • Ex:SSN Name(Name is functionallydependent on SSN)
Normalization Lingo • Prime attribute = Any attribute which is a primary key, or in the case of a composite key is part of a PK • Non-Prime Attribute = Any attribute which is not part of the PK. • Key Attribute = Prime Attribute • Non-Key Attribute = Non-Prime Attribute
Normalization and FD • Technically, normalization is just the analysis of Functional Dependencies of all columns with respect to the primary key. • There are three “levels” of analysis: • Functional Dependence – any non-prime attributes which as FD on the PK. • Partial Functional Dependence – any non-key attributes which are FD on part of the PK. • Transitive Functional Dependence – any non-key attributes which are FD on some other non-key attribute(s).
Activity: IYCDTYCN! Identify the: • Primary Key? • Prime Attributes? • Non-Prime Attributes Identify the: • Functional Dependencies (WRT the PK) • Partial Functional Dependencies (WRT part of the PK) • Transitive Functional Dependencies (WRT some non-prime attribute)
The Dependency Diagram • The Dependency Diagram is a Very Useful Tool. It depicts the dependencies which exist among the attributes.
Normal Forms • A Normal Form represents the current “state” of the data model. • There are 4 basic normal forms: • Zero Normal Form (0NF) • Non-key attributes exist which are not FD on PK. • First Normal Form (1NF) • All non-key attributes FD on entire PK. • Second Normal Form (2NF) • In 1NF and • No partial functional dependencies exist. • Third Normal Form (3NF) • In 2NF and • No transitive functional dependencies exist.
First Normal Form (1NF) • Definition: • All non-key attributes must be FD on the entire PK. (There must be PKFD for all attributes.) • Rule: • Move each non-key FD column into its own new table. • How to Apply the Rule: For each non-key FD column: • Place non-FD column into a new table • Copy the PK (or part of it) from the original table into the new table. This will be a FK in the new table. • Assign a PK to the new table (typically a composite key of the original Non-FD column and the FK.)
1NF: Example 1/2 • What’s wrong with this data model? • What should be PK be? Why? • Is there an attribute not FD on the PK? • Is it in 1NF already? • What if Erin takes up bass fishing? • I’m planning a ski trip, whom should I contact? • (How do I know Hobby3, skiing and not Hobby1)?
1NF: Example 2/2 • What was done: • Hobbies table created. Contains the originally non FD column, “hobby” • The PK (FID) was copied into the hobbies table. • The PK of the Hobbies table is the combination of FID and Hobby. • Questions: • Is this in 1NF? • Can you reproduce the previous data model from this one? • Who likes skiing? Basketball?
Second Normal Form (2NF) • Definition: • The data model must be in 1NF AND • No partial functional dependencies can exist. • Rule: • Move each partially FD non-key column into its own new table. • How to Apply the Rule: For each partial dependency: • Move all partially FD columns into a new table • Copy the determinant into the new table. • Make the determinant of the partial dependency: • The PK for the new table, FK to the existing table.
2NF: Example 1/2 • What’s wrong with this data model? • What should be PK be? Why? • Do any partial dependencies exist? • Where? • What is the determinant for each, if any? • Is it in 1NF already? 2NF? • I made a mistake, 81HLV3 is a Power edge 5500, not a 4400?
2NF: Example 2/2 • What was done: • Serial Num + SWID is the primary key. • Servers, Software tables created from partial dependencies, where Serial Num,SWID are the determinants. • Serial Num, is the PK for Servers, SWID is the PK for Software, each are also FK’s for the SWInstallation table • Questions: • Is this in 2NF? • Can you reproduce the previous data model from this one?
Third Normal Form (3NF) • Definition: • The data model must be in 2NF AND • No transitive functional dependencies can exist. • Rule: • Move each transitive FD non-key column into its own new table. • How to Apply the Rule: For each transitive dependency: • Move all transitive FD columns into a new table. • Copy the determinant column into the new table. • Make the determinant of the transitive dependency: • The the PK for the new table. • The FK for the original table.
3NF: Example 1/2 • What’s wrong with this data model? • What should be PK be? Why? • Do any transitive dependencies exist? • Where? • What is the determinant for each, if any? • Is it in 1NF already? 2NF? 3NF? • I made a mistake, Koors phone number is 4905? • What’s wrong?
3NF: Example 2/2 • What was done: • Beer ID is the PK. • All transitive dependencies moved into a new table, Distributors. • Distrib ID is the determinant. PK of Distributors table, FK in original Beer table. • Questions: • Is this in 3NF? • Can you reproduce the previous data model from this one?
Higher Normal Forms Yes, there IS more… … and it will blow your mind.
Boyce-Codd Normal Form (BCNF) • Rule: Eliminate key-transitive dependencies • A table in BCNF Means: • The table is in 3NF • It includes no Non-Key attribute which determines a key attribute, or part of a key attribute.
Fourth Normal Form (4NF) • RULE: Eliminate multiple sets of multi-valued dependencies. • A table in 4NF Means: • The table is in 3NF • It includes no sets of attributes which contain multi-valued dependencies.
4NF: An Example Figure 4.15 Set of Tables in 4NF Figure 4.14 Multivalued Dependencies
How “far” should one Normalize? • For relational databases: • 1NF is required, at minimum for practical RDBMS implementations. • The majority of the time data models are normalized to 3NF. • Sometimes certain tables are left in 1NF or 2NF, for performance or practical reasons. • Higher normal forms BCNF, 4NF are rare. • In General, the Higher the NF of your DM: • The more complicated the internal DM • The more “programming” required to reproduce the external DM. • But, the lesser the chance for data anomalies!! • It’s a total trade-off: • Database complexity vs. data anomalies.
Mike’s “Road To 3NF” To normalize correctly, follow this process for each table in the data model: Designate acandidate key Any partialdependencies? Party Hard ! n 2NF PKFD for allattributes? Any transitivedependencies? y n 1NF 3NF y Apply2NF Rule n y Apply1NF Rule Apply3NF Rule
Normalization Summary Cheat Sheet • 0NF 1NF (Resolve non FD) • 1NF 2NF (Resolve Partial FD) • 2NF 3NF (Resolve Transitive FD) O O N O N1 O N2 O O N
Data Normalization Questions?