270 likes | 407 Views
Chapter 5. Normalization of Database Tables Database Systems: Design, Implementation, and Management, Rob and Coronel Special adaptation for INFS-3200. Database Tables and Normalization. Table is basic building block in database design
E N D
Chapter 5 Normalization of Database Tables Database Systems: Design, Implementation, and Management, Rob and Coronel Special adaptation for INFS-3200
Database Tables and Normalization • Table is basic building block in database design • Normalization is process for assigning attributes to entities. Why: • Reduces data redundancies • Helps eliminate data anomalies • Produces controlled redundancies to link tables and therefore, establish relationships. • GENERAL GUIDELINES: • Define business rules • Define level of detail (granularity) – details & aggregates • Each table must represent one and only one subject only • All attributes in the table must be fully dependent on the PK, the entire PK and nothing but the PK. Database Systems: Design, Implementation, & Management, Rob & Coronel
Database Tables and Normalization • Normalization stages • 1NF - First normal form: • Put data in table format. • Eliminate repeating groups. • Select a suitable primary key. • 2NF - Second normal form • Eliminate partial dependencies. • 3NF - Third normal form • Eliminate transitive dependencies • BCNF - Boyce-Codd normal form • Every determinant in the table is a candidate key. • 4NF - Fourth normal form • Eliminate independent multi-valued set of facts. Database Systems: Design, Implementation, & Management, Rob & Coronel
Database Tables and Normalization Original Report Data Database Systems: Design, Implementation, & Management, Rob & Coronel
Sample Data for Project Report Report Data in Table Format(incomplete) • Observations • PROJ_NUM intended to be primary key • Table entries invite data inconsistencies • Table tend to create data anomalies: • Update • Modifying JOB_CLASS • Insertion • New employee must be assigned project • Deletion • If employee deleted, other vital data lost Database Systems: Design, Implementation, & Management, Rob & Coronel
Conversion to 1NF • Repeating groups must be eliminated (duplicated column types/multi-valued columns) • Tabular format • Each cell has single value - No repeating groups • In our case: PROJ_NUM, EMP_NUM, etc. • Proper primary key developed • Uniquely determines (identifies) attribute values (in each row) • In our case: combination of PROJ_NUM and EMP_NUM • Identify Dependencies • Desirable dependencies based on primary key • Less desirable dependencies • Partial • based on part of composite primary key • Transitive • one nonprime attribute depends on another nonprime attribute Database Systems: Design, Implementation, & Management, Rob & Coronel
Data Organization: 1NF Database Systems: Design, Implementation, & Management, Rob & Coronel
Dependency Diagram (1NF) 1NF (PROJ_NUM, EMP_NUM, PROJ_NUM,EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS) Partial Dependencies: (PROJ_NUM -> PROJ_NAME) (EMP_NUM -> EMP_NAME, JOB_CLASS, CHG_HOUR) Transitive Dependencies: (JOB_CLASS -> CHG_HOUR) Database Systems: Design, Implementation, & Management, Rob & Coronel
1NF Summarized • Data in a tabular format • No repeating groups in table (repeated attributes or multi-value attributes ) • Primary key attribute(s) identified • All dependent attributes depend on the primary key • Identify all partial and transitive dependencies • Partial • Attributes that depend on part of PK • Can only exist if the table has a composite PK • Transitive • Non-PK attribute(s) determines other attributes • JOB_CLASS determines CHG_HOUR • JOB_CLASS is the determinant attribute • CHG_HOUR is the dependent attribute Database Systems: Design, Implementation, & Management, Rob & Coronel
Conversion to 2NF • Start with 1NF format: • Write each key component on separate line • Write original key on last line • Each component is new table • Write dependent attributes after each key PROJECT (PROJ_NUM, PROJ_NAME) EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR) ASSIGN (PROJ_NUM, EMP_NUM, ASSIGN_HOURS) Transitive Dependencies: (JOB_CLASS -> CHG_HOUR) Database Systems: Design, Implementation, & Management, Rob & Coronel
2NF Conversion Results Table name: PROJECT PROJ_NUM PROJ_NAME Table name: EMPLOYEE EMP_NUM EMP_NAME JOB_CLASS CHG_HOUR Transitive dependency Table name: ASSIGN EMP_NUM PROJ_NUM ASSIGN_HOURS Database Systems: Design, Implementation, & Management, Rob & Coronel
2NF Summarized • In 1NF • Includes no partial dependencies • No attribute dependent on a portion of primary key • Still possible to exhibit transitive dependency • Attributes may be functionally dependent on non-key attributes Database Systems: Design, Implementation, & Management, Rob & Coronel
Conversion to 3NF • Create separate table(s) to eliminate transitive functional dependencies • For every transitive dependency: • Write determinant as PK of new table • Write all dependent attributes for each determinant • Delete the dependent attribute(s) from the original table(s) • Leave the determinant attribute(s) in the original table(s) PROJECT (PROJ_NUM, PROJ_NAME) ASSIGN (PROJ_NUM, EMP_NUM, ASSIGN_HOURS) EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS) JOB (JOB_CLASS, CHG_HOUR) Database Systems: Design, Implementation, & Management, Rob & Coronel
Conversion to 3NF Database Systems: Design, Implementation, & Management, Rob & Coronel
3NF Summarized • In 2NF • Contains no transitive dependencies Database Systems: Design, Implementation, & Management, Rob & Coronel
Improving the Design • Evaluate PK Assignment (meet PK guidelines) • Evaluate Naming Conventions • Refine Attribute Atomicity (simple, single-valued) • Identify New Attributes • Identify New Relationships (decompose M:M) • Refine Primary Keys (as required for data granularity) • Maintain Historical Transactional Accuracy • Identify Use of Derived Attributes Database Systems: Design, Implementation, & Management, Rob & Coronel
Evaluate PK Assignment Introduce a better suited PK free of semantic content (non-intelligent PK) Add JOB_CODE as surrogate PK. To reduce data entry errors Repeat for other tables (see # 6 – Assign)* Evaluate Naming Conventions JOB_CLASS is actually a description of the job, change to JOB_DESCRIPTION. CHG_HOURS should be JOB_CHG_HOUR Improving the Design(Job Table) Database Systems: Design, Implementation, & Management, Rob & Coronel
Refine Attribute Atomicity Decompose composite attributes into simple attributes EMP_NAME should be EMP_LNAME EMP_FNAME EMP_INITIAL Identify New Attributes Add new attributes that describe real world entity characteristics EMP_HIREDATE Improving the Design(Employee Table) Database Systems: Design, Implementation, & Management, Rob & Coronel
Improving the Design(Project Table) • Identify New Relationships • Add new relationships as required by business rules. • A project is managed by an employee, an employee can be the manager of only one project. • Add EMP_NUM as FK in PROJECT Database Systems: Design, Implementation, & Management, Rob & Coronel
Improving the Design(Assign Table) • Identify new attributes • Add ASSIGN_DATE • Refine Primary Keys • Consider the “granulity” of the data being represented in order to determine the PK. • Can an employee have multiple hours worked entries for a given day in a given project? • If yes, add ASSIGN_NUM as surrogate PK. • Maintain Historical Transaction Accuracy • Add ASSIGN_CHG_HOUR • Identify Use of Derived Attributes • Add ASSIGN_CHARGE Database Systems: Design, Implementation, & Management, Rob & Coronel
Limitations of System Assigned PK • Surrogate PK ensures that each row has an unique ID, not that the row’s dependent values are unique. • JOB_CODE system assigned PK • We still could have duplicate values: • 511 Programmer 35.75 • 512 Programmer 35.75 • Clearly, entries are duplicated! • To ensure unique values we must have create an unique index on all candidate keys. • Unique index on JOB_DESCRIPTION • This still will still not avoid data entry errors! • 513 Progranmer 35.75 Database Systems: Design, Implementation, & Management, Rob & Coronel
Normalization and Database Design • First, business rules must be determined • Determine the granularity of the data in each entity. • Normalization should be part of the design process • E-R Diagram provides macro view (conceptual) • Normalization provides micro view of entities (logical) • Focuses on characteristics of specific entities • May yield additional entities/relationships • Difficult to separate normalization from E-R diagramming – complementary Database Systems: Design, Implementation, & Management, Rob & Coronel
Initial ERD for Contracting Company • A company has many projects. • Each project requires the services of many employees. An employee may be assigned to several different projects. • Some employees are not assigned to a project. • Each employee has a single primary job classification. Many employees can have the same job classification. • The job classification determines the hourly billing rate. Database Systems: Design, Implementation, & Management, Rob & Coronel
Modified ERD for Contracting Company Figure 4.11 Database Systems: Design, Implementation, & Management, Rob & Coronel
Final ERD for Contracting Company Figure 4.12 Database Systems: Design, Implementation, & Management, Rob & Coronel
Denormalization • Normalization is one of many database design goals • Normalization creates many small tables with PK/FK • Reporting requirements over normalized tables requiresmultiple table joins to get complete data: • Additional processing (join operations) • Additional I/Os operations • Design must find right balance among: • Design Integrity requirements • Information requirements • Performance requirements Database Systems: Design, Implementation, & Management, Rob & Coronel
Unnormalized Table Defects • Data updates less efficient • Normalization ensures that data is updated (insert/update/delete)only once in one place. • Indexing more cumbersome • No simple strategies for creating views Database Systems: Design, Implementation, & Management, Rob & Coronel