1 / 59

Chapter 5 Normalization of Database Tables

Chapter 5 Normalization of Database Tables. Database Systems: Design, Implementation, and Management Peter Rob & Carlos Coronel. In this chapter, you will learn:. What normalization is and what role it plays in database design About the normal forms 1NF , 2NF , 3NF , BCNF , and 4NF

rbean
Download Presentation

Chapter 5 Normalization of Database Tables

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 5Normalization of Database Tables Database Systems: Design, Implementation, and Management Peter Rob & Carlos Coronel

  2. In this chapter, you will learn: • What normalization is and what role it plays in database design • About the normal forms 1NF, 2NF, 3NF, BCNF, and 4NF • How normal forms can be transformed from lower normal forms to higher normal forms • That normalization and E-R modeling are used concurrently to produce a good database design • That some situations require denormalization to generate information efficiently

  3. Database Tables and Normalization • Normalization • Process for evaluating and correcting table structures to minimize data redundancies • process for assigning attributes to tables. It • reduces data redundancies • helps eliminate data anomalies. • produces controlled redundancies to link tables • Normalization works through a series of stages called normal forms: • First normal form (1NF) • Second normal form (2NF) • Third normal form (3NF) • Fourth normal form (4NF)

  4. Database Tables and Normalization • Normalization • 2NF is better than 1NF; • 3NF is better than 2NF • For most business database design purposes, 3NF is as high as we need to go in normalization process • The highest level of normalization is not always most desirable.

  5. Database Tables and Normalization • The Need for Normalization • Case of a Construction Company • Building project -- Project number, Name, Employees assigned to the project. • Employee -- Employee number, Name, Job classification • The company charges its clients by billing the hours spent on each project. The hourly billing rate is dependent on the employee’s position. • Periodically, a report is generated. Table 5.1 • The easiest way to generate the required report might seem to be a table whose contents correspond to the reporting requirements. Figure 5.1

  6. Database Tables and Normalization • Need for Normalization:Problems with the Figure 5.1 • The project number is intended to be a primary key, but it contains nulls. • The table displays data redundancies. • The table entries invitedata inconsistencies. • The data redundancies yield the following anomalies: • Update anomalies. (modify JOB_CLASS for Employee 105) • Insertion anomalies. (a new Employee not yet assigned) • Deletion anomalies. ( Employee 103 quits)

  7. The Normalization Process

  8. Database Tables and Normalization • Conversion to 1NF • Repeating groups – a group of multiple entries can exist for any single key attribute occurrence. • Repeating groups must be eliminatedAny project number (PROJ_NUM) can have a group of several data entries. • A relational table must not contain repeating groups.

  9. Step 1: Eliminate the Repeating Groups – Repeating groups can be eliminated by adding the appropriate entry in at least the primary key column(s). • Step 2: Identify the Primary Key • Uniquely identifies attribute values (rows) • Combination of PROJ_NUM and EMP_NUM

  10. Database Tables and Normalization • Step 3: Identify all Dependencies • Dependency Diagram • The primary key components are bold, underlined, and shaded in a different color. • The arrows above entities indicate all desirable dependencies ( dependencies based on PK ) • The arrows below the dependency diagram indicate less desirable dependencies – • partial dependencies ( dependencies based on only a part of PK ) • transitive dependencies ( nonprime attribute → nonprime attribute ) • Prime attribute = Key attribute • Nonprime attribute = Nonkey attribute

  11. Database Tables and Normalization • EMP_NUM → EMP_NAME, JOB_CLASS_, CHG_HOUR • PROJ_NUM → PROJ_NAME • JOB_CLASS → CHG_HOUR

  12. Database Tables and Normalization • 1NF Definition • The term first normal form (1NF) describes the tabular format in which: • All the key attributes are defined. • There are no repeating groups in the table. Each row/col intersection can contain one and only one value, not set of values. • All attributes are dependent on the primary key. • All relational tables satisfy the 1NF requirements. • 1NF Drawback • Partial dependencies(EMP_NUM → EMP_NAME, JOB_CLASS_, CHG_HOUR)→→ data redundancies→→ data anomalies

  13. Database Tables and Normalization • Conversion to 2NF • Step 1: Identify All Key Components • Writing each key component on a separate line, and then • writing the original key on the last line and PROJ_NUM EMP_NUM PROJ_NUM, EMP_NUM • Step 2: Identify the Dependent Attributes • Writing the dependent attributes after each new key. PROJECT ( PROJ_NUM,PROJ_NAME) EMPLOYEE ( EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR) ASSIGN ( PROJ_NUM, EMP_NUM, HOURS)

  14. Database Tables and Normalization • 2NF Definition • A table is in 2NF if: • It is in 1NF and • It includes nopartial dependencies; that is, no attribute is dependent on only portion of primary key. • A table whose primary key is notcompositemust automatically be in 2NF.BECAUSE a partial dependency can exist only if a table has a composite primary key • It is still possible for a table in 2NF to exhibit transitive dependency; that is, one or more attributes may be functionally dependent on nonkey attributes. • 2NF Drawback • Transitive dependencies(JOB_CLASS → CHG_HOUR)→→ data redundancies→→ data anomalies

  15. Database Tables and Normalization • Conversion to 3NF • Create a separate table with attributes in a transitive functional dependence relationship. • Step 1: Identify Each New Determinant JOB_CLASS • Step 2: Identify the Dependent Attributes JOB_CLASS → CHG_HOUR • Step 3: Remove the Dependent Attributes from Transitive Dependencies EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS) JOB (JOB_CLASS, CHG_HOUR) PROJECT (PROJ_NUM, PROJ_NAME) ASSIGN (PROJ_NUM, EMP_NUM, HOURS)

  16. Database Tables and Normalization • 3NF Definition • A table is in 3NF if: • It is in 2NF and • It contains notransitive dependencies.

  17. Improving the Design • Table structures are cleaned up to eliminate the troublesome initial partial and transitive dependencies • Normalizationcannot, by itself, be relied on to make good designs • It is valuable because its use helps eliminate data redundancies

  18. Improving the Design • Issues to address in order to produce a good normalized set of tables: • Evaluate PK Assignments • Evaluate Naming Conventions • Refine Attribute Atomicity • Identify New Attributes • Identify New Relationships • Refine Primary Keys as Required for Data Granularity • Maintain Historical Accuracy • Evaluate Using Derived Attributes

  19. Improving the Design • Adding relationships • Project’s manager • EMP_NUM as a FK in PROJECT. Project manager 3NF

  20. Improving the Design • PK assignment • JOB_CODE • Naming conventions • JOB_CHG_HOUR • JOB_CLASS >> JOB_DESCRIPTION 2NF

  21. Improving the Design 3NF • Attribute atomicity • EMP_NAME >> EMP_LNAME,EMP_FNAME,EMP_INITIAL • Adding attributes • EMP_HIREDATE JOB_CLASS

  22. Improving the Design 3NF • Refining PKs • (EMP_NUM+PROJ_NUM) >> ASSIGN_NUM • Maintaining historical accuracy • ASSIGN_CHG_HOUR <<>> JOB_CHG_HOUR • Using derived attributes • ASSIGN_CHARGE = ASSIGN_HOURS × ASSIGN_CHG_HOUR

  23. Surrogate Key Considerations • When primary key is considered to be unsuitable, designers use system-defined surrogate keys • The DBMS can be used to have the system assign the PK values (JOB_CODE)>> to ensure entity integrity

  24. Limitations on system-defined surrogate keys • Data entries in Table 5.3 are inappropriatebecause they duplicate existing records • However, it does not prevent us from making the entries shown in Table 5.3. >> Multiple duplicate records problem • We still must ensure the uniqueness in JOB_DESCRIPTION through the use of a unique index.

  25. Database Tables and Normalization • Boyce-Codd Normal Form (BCNF) • A table is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidatekey. (A determinant is any attribute whose value determines other values with a row.) • If a table contains only one candidate key, the 3NF and the BCNF are equivalent. • BCNF can be violatedonly if the table contains more than one candidate key • BCNF is a special case of 3NF. • Figure 5.7 illustrates a table that is in 3NF but not in BCNF. • Figure 5.8 shows how the table can be decomposed to conform to the BCNF form.

  26. A Table That Is In 3NF But Not In BCNF • A + B → C, D • C → B : Nottransitive dependencies (A nonkey attribute is the determinant of a key attribute) → → 3NF • C : Notcandidatekey → → Not In BCNF

  27. The Decomposition of a Table Structure to meet BCNF Requirements • A + B → C, D • C → B Change the PK to A+C • A + C → B, D • C → B • A + C → D • C → B

  28. The Boyce-Codd Normal Form (BCNF) • STU_ID + STAFF_ID → CLASS_CODE, ENROLL_GRADE • CLASS_CODE → STAFF_ID

  29. Decomposition into BCNF Figure 5.9

  30. Normalization and Database Design • Normalization should be part of the design process • E-R Diagram provides macro view • Normalization provides micro view of entities • Focuses on characteristics of specific entities • A micro view of the entities within the ER diagram • Difficult to separate normalization from E-R diagramming • Two techniques should be used concurrently

  31. Normalization and Database Design • Database Design and Normalization Example:(Construction Company) • Summary of Operations: • The company manages many projects. • Each project requires the services of many employees. • An employee may be assigned to several different projects. • Some employees are not assigned to a project and perform duties not specifically related to a project. • Some employees are part of a labor pool, to be shared by allproject teams. • Each employee has a (single) primary job classification. This job classification determines the hourly billing rate. • Many employees can have the same job classification.

  32. Normalization and Database Design • Two Initial Entities: • PROJECT (PROJ_NUM, PROJ_NAME) [3NF ] • EMPLOYEE ( EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, JOB_DESCRIPTION, JOB_CHG_HOUR) No partial dep. Transitive dep. JOB_DESCRIPTION → JOB_CHG_HOUR [2NF ]

  33. Normalization and Database Design

  34. Normalization and Database Design • Three Entities After Transitive DependencyRemoved PROJECT (PROJ_NUM, PROJ_NAME) EMPLOYEE ( EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, JOB_CODE) JOB ( JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR) EMPLOYEE

  35. The Modified ERD For A Contracting Company Because the normalization process yields an additional entity (JOB), we modify the initial ERD. M Is held by

  36. Normalization and Database Design

  37. Normalization and Database Design

  38. Normalization and Database Design • Creation of the Composite EntityASSIGNMENT ASSIGNMENT Is held by The Final ( Implementable) ERD for the Contracting Company

  39. Normalization and Database Design

  40. Normalization and Database Design • Attribute ASSIGN_HOUR is assigned to the composite entity ASSIGN. • “Manages” relationship is created between EMPLOYEE and PROJECT. PROJECT (PROJ_NUM, PROJ_NAME, EMP_NUM) ASSIGNMENT(ASSIGN_NUM, ASSIGN_DATE , ASSIGN_HOURS, ASSIGN_CHG_HOUR , ASSIGN_CHARGE,EMP_NUM, PROJ_NUM) EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, EMP_HIREDATE, JOB_CODE) JOB (JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR) Manages

  41. Normalization and Database Design

  42. Higher-Level Normal Forms • In some databases, multiple multivalued attributes exist • 4NF Definition • A table is in 4NFif it is in 3NF and has no multiple sets of multivalued dependencies.

  43. Higher-Level Normal Forms • An employee can have multiple assignments and can also be involved in multiple service organization. • EMP_SERVICE (volunteer work) and EMP_ASSIGN (assigned project)each may have many different values. • The table contain two sets of multivalued dependencies. Independent 1

  44. A Set of Tables in 4NF • The solution is to eliminate the problems caused by independentmultivalued dependencies.

More Related