730 likes | 744 Views
Orange Coast College Business Division CS/CIS Department Fall 2004 CIS 182 Introduction to Database Concepts. Instructor Dr. Martha Malaty Text & Original Presentations Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel, 2004. Chapter 5.
E N D
Orange Coast CollegeBusiness DivisionCS/CIS DepartmentFall 2004 CIS 182Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original Presentations Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel, 2004
Chapter 5 Normalization of Database Tables Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel
In this chapter, you will learn: • What normalization is and what role it plays in the database design process • About the normal forms 1NF, 2NF, 3NF, BCNF, and 4NF • How normal forms can be transformed from lower normal forms to higher normal forms • That normalization and ER modeling are used concurrently to produce a good database design • That some situations require denormalization to generate information efficiently Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Database Tables & Normalization • Good table structure is essential for good DB design • Normalization is the way to reach good table structure • Normalization is the process for assigning attributes to entities through evaluating and correcting table structures • Reduces data redundancies • Helps eliminate data anomalies • Produces controlled redundancies to link tables • Works through sequence of stages (normal forms) Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Database Tables & Normalization • Normalizationworks through a series of stages called normal forms: • First normal form (1NF) • Second normal form (2NF) • Third normal form (3NF) • Fourth normal form (4NF) • . . . • 2NF is better than 1NF; 3NF is better than 2NF; … • For most business database design purposes, 3NF is highest we need to go in the normalization process • Highest level of normalization is not always most desirable Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Tutorials Websites On Normalization • What do you need to know about • Webopedia • Rules of data normalization • 3NF Tutorial • ServerWatch • VBMySQL.com Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Unnormalized vs. Denormalized • Unnormalized data: • No ER model is developed when creating the database • Denormalized data • Starting with normalized data, adding redundancy for better performance • We cannot denormalize unnormalized data Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
The Need for Normalization • Example: A company manages building projects • Charges its clients by billing hours spent on each contract • Hourly billing rate is dependent on employee’s position • Periodically, a report is generated that contains information displayed in Table 5.1 • See RPT_FORMAT table in ConstructCo.mdb Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
A Sample Report Layout Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
A Table in the Report Format Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Example: Construction company • Three major problems: • PROJ_NUM contains repeating groups • Table entries invite data inconsistencies • Table displays data anomalies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Construction Company Problems • PROJ_NUM contains “repeating groups” • Several data entries with NULL entries indicating that all entries belong to the same PROJ_NUM • PRO_NUM is intended to be primary key • Primary key can’t be NULL • Relational table can’t contain repeating groups Repeating groups Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Construction Company Problems Possible inconsistency • Table entries invite data inconsistencies • Elect. Engineer could mistakenly be entered in another abbreviated form (e.g. Elect.Eng, EE, …) Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Construction Company Problems • Table displays data anomalies • Update anomalies • Modifying JOB_CLASS needs several alterations • Insertion anomalies • New employee must be assigned project • Deletion anomalies • If employee is deleted, other vital data might be lost Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Construction Company Problems • Structure of data set in Fig. 5.1 (ConstructCo.mdb) does not handle data very well • The table structure appears to work; report is generated with ease • Unfortunately, the report may yield different results, depending on what data anomaly has occurred Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Levels of Normalizations Un-normalized) 1NF 2NF 3NF BCNF 4NF 5NF Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Conversion to First Normal Form • Repeating group • Derives its name from the fact that a group of multiple (related) entries can exist for any single key attribute occurrence • Relational table must not contain repeating groups • Normalizing the table structure will reduce these data redundancies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Conversion to First Normal Form • Normalization is three-step procedure • Eliminate Repeating Groups • Identify the Primary Key • Identify all Dependencies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Step 1: Eliminate Repeating Groups • Present data in a tabular format, where each cell has a single value and there are no repeating groups • Eliminate repeating groups by eliminating nulls, making sure that each repeating group attribute contains an appropriate data value Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Data Organization: 1NF • See DATA_ORG_INF Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Step 2: Identify the Primary Key • Primary key must uniquely identify attribute values • New key must be composed • What would be the primary key for the DATA_ORG_INF table? Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Step 3: Identify all Dependencies • Dependencies can be depicted with the help of a “Dependency Diagram” • Depicts all dependencies found within a given table structure • Helpful in getting bird’s-eye view of all relationships among a table’s attributes • Makes it much less likely that an important dependency will be overlooked Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Dependency Diagram: 1NF Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
First Normal Form • Tabular format in which: • All key attributes are defined • There are no repeating groups in the table • All attributes are dependent on primary key • All relational tables satisfy 1NF requirements • Some tables contain “Partial dependencies” • Dependencies based on only part of the primary key • Sometimes used for performance reasons, but should be used with caution • Still subject to data redundancies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Conversion to Second Normal Form • Relational database design can be improved by converting the database into second normal form (2NF) • Two steps • Identify All key components • Identify the dependent attributes Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Step 1: Identify All Key Components • Write each key component on separate line, and then write the original (composite) key on the last line • Each component will become the key in a new table • Example: • Proj_Num • Emp_Num • Proj_Num, Emp_Num Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Step 2: Identify Dependent Attributes • Determine which attributes are dependent on which other attributes • At this point, most anomalies have been eliminated • Example: • Proj_Num Proj_name • Emp_Num Emp_Name, Job_Class, Chg_Hour • Proj_Num, Emp_Num Hours Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Second Normal Form (2NF) Conversion Results Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Second Normal Form • Table is in second normal form (2NF) if: • It is in 1NF and • It includes no partial dependencies: • No attribute is dependent on only a portion of the primary key Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Conversion to Third Normal Form • Three steps • Identify each new determinant • Identify the dependent attributes • Remove the dependent attributes from transitive dependencies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Step 1: Identify New Determinants • Determinant • Any attribute whose value determines other values within a row • For every transitive dependency, write its determinant as a PK for a new table • Example • Job_Class Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Step 2: Identify Dependent Attributes • Identify the attributes dependent on each determinant identified in Step 1 and identify the dependency • Name the table to reflect its contents and function • Example • Job_Class Chg_Hour Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Step 3: Remove Dependent Attributes from Transitive Dependencies • Eliminate all dependent attributes in transitive relationship(s) from each table that has such a transitive relationship • Draw a new dependency diagram to show all tables defined in Steps 1–3 • Check new tables and modified tables from Step 3 to make sure that each has a determinant and does not contain inappropriate dependencies • Example: • Proj_Num Proj_name • Emp_Num Emp_Name, Job_Class • Proj_Num, Emp_Num Hours • Job_Class Chg_Hour Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Third Normal Form (3NF) Conversion Results Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Third Normal Form • A table is in third normal form (3NF)if: • It is in 2NF and • It contains no transitive dependencies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Improving the Design • Table structures are cleaned up to eliminate the troublesome initial partial and transitive dependencies • Normalization cannot, by itself, be relied on to make good designs • It is valuable because its use helps eliminate data redundancies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Improving the Design (continued) • The following changes were (or should be) made: • PK assignment • Naming conventions • Attribute atomicity • Adding attributes • Adding relationships • Refining PKs • Maintaining historical accuracy • Using derived attributes Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
PK assignment • Addition of JOB_CODE attribute greatly decreases the likelihood of referential integrity violation • Example: • One would write a job class as “DB Designer” and another as “Database Designer” although both refer to the same class • Problem: The new key produces transitive dependency • JOB_CODE -> JOB_CLASS, CHG_HOUR • JOB_CLASS -> CHG_HOUR • The effect of transitive dependency is less problematic than integrity violation Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Naming conventions • Adhere to naming convention by having the table name as the first prefix of the attribute name • Example: • CHG_HOUR is changed to JOB_CHG_HOUR • JOB_CLASS should be changed to JOB_DESCRIPTION to better describe the entity • HOURS is changed to ASSIGNED_HOURS to associate it with the ASSIGN table Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Attribute atomicity • Atomic attribute: • Attribute that cannot be further divided • Gains querying flexibility • Example: EMP_NAME is not atomic • Use EMP_LNAME, EMP_FNAME, EMP_INITIAL instead Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Adding Attributes • Adding attributes that are needed in the real-world environment • Example: EMP_HIREDATE can be used to track employee’s “job longevity” to award bonuses to long-term employees & other morale enhancing measures Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Adding Relationships • To avoid unnecessary data duplication • Example: • To be able to supply detailed information about each project’s manager, use EMP_NUM as a FK in PROJECT table Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Refining PKs • If an employee makes two or more entries for the same project, replace EMP_NUM & PROJ_NUM as a combined key, by a system-assigned, surrogate key, ASIGN_NUM, in the ASSIGN table • Surrogate key: • At the implementation level, a system-defined attribute that is created and managed by the DBMS. • It is automatically incremented for each row. • In Oracle for example we use the "sequence" object for surrogate keys. • EMP_NUM & PROJ_NUM can still be used as FK’s Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Maintaining historical accuracy • Writing the job charge per hour in ASSIGN table is crucial to maintain historical accuracy, in case that the job charge per hour changes over time Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Using derived attributes • Derived attributes can be calculated when needed • Storing a derived attribute makes it easier to write application SW to produce the desired results and saves reporting time • ASSIGN_CHARGE is the result of multiplying ASSIGN_HOURS by ASSIGN_CHG_HOUR Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
The Completed Database • See Ch05_ConstructCo.mdb Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
The Completed Database (continued) • See Ch05_ConstructCo.mdb Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
The Completed Database (continued) • See Ch05_ConstructCo.mdb Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Limitations on System-Assigned Keys • System-assigned (surrogate) primary key may not prevent confusing entries • Example: • JOB_CODE attribute was designed to be a PK of JOB table • This does not prevent duplicating existing records with different PK values • Data entries in Table 5.2 are inappropriate because they duplicate existing records • Yet there has been no violation of either entity integrity or referential integrity Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
The Boyce-Codd Normal Form (BCNF) • Google search • About.com • A relation is in Boyce-Codd Normal Form (BCNF) if every determinant is a candidate key. • Candidate key • Has same characteristics as primary key, but for some reason, not chosen to be primary key • If a table contains only one candidate key, the 3NF and the BCNF are equivalent • BCNF can be violated only if the table contains more than one candidate key • Most designers consider the BCNF as a special case of 3NF Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel