1 / 29

Normalization of Tables

Normalization of Tables. “Between two evils, choose neither; between two goods, choose both.” Tryon Edwards. Steps to E-R Transformation. 1. Identify entities 2. Identify relationships 3. Determine relationship type 4. Determine level of participation

billy
Download Presentation

Normalization of Tables

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Normalization of Tables “Between two evils, choose neither; between two goods, choose both.” Tryon Edwards

  2. Steps to E-R Transformation 1. Identify entities 2. Identify relationships 3. Determine relationship type 4. Determine level of participation 5. Assign an identifier for each entity 6. Draw completed E-R diagram 7. Deduce a set of preliminary skeleton tables along with a proposed primary key for each table (using rules provided) 8. Develop a list of all attributes of interest (not already listed and systematically assign each to a table in such a way to achieve a 3NF design (i.e., no repeating groups, no partial dependencies, and no transitive dependencies)

  3. Tables • Database design is the process of separating information into multiple tables that are related to each other • Single table designs work only for the simplest of situations in which data integrity problems are easy to correct • Anomalies (abnormalities) often arise in single table designs as a result of inserting, deleting, or updating records • Some tables are better structured than others (i.e., result in fewer anomalies)

  4. Redundancy Unnecessary repetition or duplication of data • increases likelihood of errors due to keying inconsistencies

  5. Multi-valued Problems Solution 1? Include all author’s names in a single field • Difficult to search for a single author’s name or create an alphabetical list of authors

  6. Multi-valued Problems    Solution 2? Add multiple columns, one for each value • empty fields waste storage space • awkward to search across fields (e.g., Any books by Snoopy? Must search Author1, Author2, etc.) • necessitates the creation of a new column every time a book has an additional author

  7. Multi-valued Problems    Solution 3? Add multiple rows, one for each value • Data about a book must be repeated for as many times as there are authors of a book (also creates redundancy which lead to keying errors and unnecessarily wasting storage space with large files) • count of total # of books or # from each publisher would be wrong

  8. Update Anomalies sometimes previous errors propagate errors further • Update Anomalies • To update an agent’s telephone number, each instance must be changed • if we miss an item or enter it incorrectly we create an unreliable table • An update anomaly occurs when multiple record changes for a single attribute are necessary.

  9. Deletion Anomalies • Deletion anomalies • What happens if a customer record is deleted? • What happens if an agent record is deleted? • A deletion anomaly occurs when the removal of a record results in the unintended loss of important information.

  10. Insertion Anomalies • Insertion anomalies • What happens if we want to enter information regarding an agent for whom we do not have a customer? • Do we add null values (blanks) for the other fields? • An insertion anomaly occurs when there is not a reasonable place to assign attributes and attribute values to records.

  11. The Problem with Nulls 1. Nulls used in mathematical expressions - unknown quantity leads to unknown total value - misleading value of all inventory 2. Nulls used in aggregate functions - blanks exist under category - cannot be counted because they don’t exist!

  12. Database Design Problems • Use of the relational database model removes some database anomalies • Further removal of database anomalies relies on a structured technique called normalization • Presence of some of these anomalies is sometimes justified in order to enhance performance Thus, database design consists of balancing the art of design with the science of design

  13. Normalization • Goal in database design to create well-structured tables • Transform E-R models to tables following the rules provided • Assuring tables are well-structured with minimal problems (redundancy, multi-valued attributes, update anomalies, insertion anomalies, deletion anomalies) is achieved using structured technique called normalization • Normalization is the structured decomposition of one table into two or more tables using a procedure designed to determine the most appropriate split • Normalization our method of making sure the E-R design was correct in the first place • Normalization refers to a series of forms: we will cover 1NF to 3NF, which is usually sufficient. Note that there are also: 4NF, Boyce-Codd Normal Form (BCNF), Fifth Normal Form (5NF) and Domain-Key Normal Form (DKNF)

  14. First Normal Form • A table is in first normal form if it meets the following criteria: The data are stored in a two-dimensional table with no two rows identical and there are no repeating groups. • The following table in NOT in first normal form because it contains a multi-valued attribute (an attribute with more than one value in each row).

  15. Handling multi-valued attributes: Incorrect Solutions

  16. Handling multi-valued attributes: Correct Solution • Create another entity (table) to handle multiple instances of the repeating group. This second table is then linked to the original table with an identifier (i.e., foreign key). This solution has the following advantages: • no limit to the number of hobbies per member • no waste of disk space • searching becomes much easier within a column (e.g., who likes hiking?)

  17. Handling Repeating Groups • An attribute can have a group of several data entries. Repeating groups can be removed by creating another table which holds those attributes that repeat. This second table (validation table) is then linked to the original table with an identifier (i.e., foreign key) • Advantages: fewer characters tables; reduces miskeying, update anomalies

  18. Second Normal Form • A table is in second normal form if it meets the following criteria: The relation is in first normal form, and, all nonkey attributes are functionally dependent on the entire primary key. • Applies only to tables that have a composite primary key. • In the following table, both the EmpID and Training (composite primary key) determine Date, whereas, only EmpID (part of the primary key) determines Dept.

  19. Removing Partial Dependencies • Remove partial dependencies by separating the relation into two relations. Reduces the problems of • update anomalies • delete anomalies • insert anomalies • redundancies

  20. Third Normal Form • A table is in third normal form if it meets the following criteria: The relation is in second normal form, and, a nonkey field is not functionally dependent on another nonkey field. • The following table is in second normal form but NOT in third normal form because Member_Id (the primary key) does not determine every attribute (does not determine RegistrationFee). RegistrationFee is determined by Sport. Member ID  FName, LName, Lesson; Lesson  Cost

  21. Removing non-key Transitive Dependencies • Remove transitive dependencies by placing attributes involved in a new relational table. Reduces the problems of: • update anomalies • delete anomalies • insert anomalies • redundancies

  22. Normalization Example: Video Store A video rental shop tracks all of their information in one table. There are now 20,000 records in it. Is it possible to achieve a more efficient design? (They charge $10/movie/day.) VIDEO (Cust_name, Cust_address, Cust_phone, Rental_date, Video_1, Video_2, Video_3, VideoType_1, VideoType_2, VideoType3, Return_date, Total_Price, Paid?)

  23. Is the Video store in 1NF? No attributes should form repeating groups - remove them by creating another table. There are repeating groups for videos and customers. CUSTOMER (Cust_Num, Cust_Name, Cust_address_Cust_phone VIDEO (VideoNum, VideoName, VideoType RENTAL (Cust_num, VideoNum, Rental_date, Return_date, TotalPrice, Paid?)

  24. Video Store: 1NF (cont’d) Have not yet removed all repeating groups - video is a multi-valued attribute - move to another table. RENTALDETAILS (RentalNum, VideoNum) RENTAL (RentalNum, Cust_Num, Rental_date, Return_Date, TotalPrice, Paid?)

  25. The Video Store is now in 1NF CUSTOMER (Cust_Num, Cust_Name, Cust_address, Cust_phone VIDEO (VideoNum, VideoName, VideoType RENTALDETAILS (RentalNum, VideoNum) RENTAL (RentalNum, Cust_Num, Rental_date, Return_Date, TotalPrice, Paid?)

  26. Is the Video Store in 2NF? The only table that has a composite primary key has no other fields, therefore, yes. CUSTOMER (Cust_Num, Cust_Name, Cust_address, Cust_phone VIDEO (VideoNum, VideoName, VideoType RENTALDETAILS (RentalNum, VideoNum) RENTAL (RentalNum, Cust_Num, Rental_date, Return_Date, TotalPrice, Paid?)

  27. Is the Video Store in 3NF? Does each attribute in each table depend upon the primary key?

  28. The Video Store is now in 3NF Yes, because in each table, every attribute depends on the primary key and not on any other key. CUSTOMER (Cust_Num, Cust_Name, Cust_address, Cust_phone VIDEO (VideoNum, VideoName, VideoType RENTAL (RentalNum, Cust_Num, Rental_date) RENTALDETAILS (RentalNum, VideoNum, ReturnDate, Amt_Paid)

  29. Conflicting Goals of Design • Database design must reconcile the following requirements: • Design elegance requires that the design must adhere to design rules concerning nulls, derived attributes, redundancies, relationship types, etc. • Information requirements are dictated by the end users • Operational (transaction) speed requirements are also dictated by the end users • Clearly, an elegant database design that fails to address end user information requirements or one that forms the basis for an implementation whose use progresses at a snail's pace has little practical use.

More Related