330 likes | 487 Views
IT 20303. The Relational DBMS Section 05. Relational Database Theory. Normalization for Logical Database Design. Relational Database Theory. Normalization Process of analyzing a grouping of data items Based on inherent characteristics Often applied to existing files or databases.
E N D
IT 20303 • The Relational DBMS • Section 05
Relational Database Theory • Normalization for Logical Database Design
Relational Database Theory • Normalization • Process of analyzing a grouping of data items • Based on inherent characteristics • Often applied to existing files or databases
Relational Database Theory • Normalization • Principles • Data items belong together in a logical group • Group of items can be identified by own unique identifier
Relational Database Theory • Normalization • Data in the group describes one, and only one, thing • A Bottom-Up approach
Relational Database Theory • Why Normalize • Avoid update anomalies • Nasty side effects • Minimize storage of redundant data • Support simpler logic for manipulating data
Relational Database Theory • Why Not Normalize • Data is never (very rarely) updated • Data warehouse system is seldom normalized
Relational Database Theory • Sample Data Not Normalized
Relational Database Theory • How to Normalize Data using Functional Dependencies • Definition of Functional Dependency • Given a relation R, attribute Y of R is functionally dependent on attribute X of R, if and only if each X value in R has associated with it precisely one Y-value in R (at any one time)
Relational Database Theory • Y of R is Dependent on X of R • X (-->)functionally determines Y X Y
Relational Database Theory • Functional Dependency Diagram of Hospital Ward Example Patient Name Patient No Date of Birth Ward Type Ward Name No of Beds Senior Nurse
Relational Database Theory • Table structure based on FD Diagram WARD PATIENT
Relational Database Theory • Normalization using Codd’s Rules • Origin • Early enthusiasts wanted to use relational theory • Sought rules for structuring data in relational model
Relational Database Theory • Normalization using Codd’s Rules • Codd and contemporaries developed rules for “Normal Forms” • 1NF • 2NF • 3NF • Normal levels to do in database design • Boyce/Codd NF – 3.5NF • 4NF • 5NF
Relational Database Theory • Customer-Order-Line Item Example • Assume an existing order-entry program and data file:
Relational Database Theory • 1NF – Break out repeating groups ORDER ORDER LINEITEM
Relational Database Theory • 2NF- Break out attributes dependent on part of the primary key LINEITEM LINEITEM PRODUCT ORDER
Relational Database Theory • 3NF- Break out attributes wholly dependent on another key ORDER CUSTOMER ORDER LINEITEM PRODUCT
Relational Database Theory • Rules for 1NF, 2NF, & 3NF • 1NF • Break out repeating groups into a separate entity • 2NF • Break out attributes that are dependent on part of the primary key into a separate entity • Called Partial Dependency • 3NF • Break out attributes that are wholly dependent on another key (not PK) into a separate entity • Called Transitive Dependency
Relational Database Theory • Normalization • A relation R is in 3rd Normal Form (3NF) if and only if the non-key attributes of R (if any) are: • Mutually independent, and • Fully dependent on the primary key of R
Relational Database Theory • Normalization Cont’d • A relation is in 3NF if all the attributes are functionally dependent • On the Key • On the Whole Key, and • On Nothing but the Key • (So Help Me Codd)
Relational Database Theory • Reconcile differences between the Data Model and Normalized Data Structures • Data model and normalized data structures must be reconciled • Discard data items from old files that are no longer needed • Calculation fields • Redundant fields • Resolve discrepancies in data item names • Ensure that new fields are really necessary • Use standard naming conventions
Relational Database Theory • Example 01: • What happens when a part has more than four suppliers? • What happens when a supplier is dropped? • How do you query the parts with two or more suppliers? • Normalized Table:
Relational Database Theory • Example 02: Normalize this table
Relational Database Theory • Multiply ways to Normalize Data • Normalization can be accomplished in different ways • Well-formed E-R model is normalized • Functional dependencies • Codd’s Rules for 1NF, 2NF, & 3NF • Discrepancies indicate something is missing or changed • One approach validates or checks another approach
Relational Database Theory • Impact of Normalization • Improve the integrity of data • Purpose is to eliminate update anomalies • Minimize storage of redundant data • Reduce the complexity of programming logic • Emphasis now is on maintainability, simplicity of program • Normalized data can minimize complexity of code that manipulates the data • Enhance the stability, “goodness” of database design • Normalized data tends to be easier to understand • Normalized data can be used by many different applications more easily
Relational Database Theory • Impact of Normalization on Performance • Concern that a large number of tables-and table joins-will result in poor performance • Join can be a very expensive operation • Test to determine frequency of joins, number of tables joined • After database is created and available
Relational Database Theory • Impact of Normalization on Performance Cont’d • Requirements for application performance, response time dictate corrective actions • Performance addressed in section on physical database design • There are alternatives to de-normalizing data to improve performance
Relational Database Theory • Recommendations for Data that is Updated • First Normalize • Don’t be dismayed by too many tables • Normalization increases number of tables but improves logic • Normalization is a helpful logical database design technique…for any DBMS
Relational Database Theory • Objective of the design process is a “Good” design • The logical database design process • Is well understood • Uses complementary techniques • Can be automated with CASE tools
Relational Database Theory • Objective of the design process is a “Good” design cont’d • A “Good” database design • Contains all the important entities and data items • Has stable primary keys • Identifies clearly all relationships • Has table structures in 3NF • Is understood by designers and users • Accurately models the real world, as described in the requirements
Relational Database Theory • Questions?