1 / 39

Normalizing Your Database and Why you WANT to do it!

INFYS540. Normalizing Your Database and Why you WANT to do it!. Lesson 7 Chapter 5 Appendix. Data Redundancy Problems. Redundancy breeds errors Same data defined in multiple places is BAD Spelling/typographical error prone Lack of data integrity Inability to perform simple queries

yael
Download Presentation

Normalizing Your Database and Why you WANT to do it!

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. INFYS540 Normalizing Your Database and Why you WANT to do it! Lesson 7 Chapter 5 Appendix

  2. Data Redundancy Problems • Redundancy breeds errors • Same data defined in multiple places is BAD • Spelling/typographical error prone • Lack of data integrity • Inability to perform simple queries • Inflexibility and inscalability • Impossible to MAINTAIN!

  3. Relational Database PROJECTCHIEF Project Project Chief Computing 333-22-1111 Intranet987-65-4321 Contracting 123-45-6789 CAT 333-22-1111 EMPLOYEES LNameFName SSNDept Jones Mike 123-45-6789 M&B Smith Tony 987-65-4321 C2G Lee Bruce 567-89-1234 MLD Doodle Yankee 333-22-1111 M&B 1 DEPARTMENTS Dept Dept. DirectorRoom MLD 181-94-5676 B115 C2G 987-65-4321 123 M&B 123-45-6789 147 1 What is a candidate key? What is a primary key? What is a foreign key?

  4. Purpose of Normalization • Take advantage of the powerful tools available in a DBMS • There are five levels of Normalization • The higher the Normal Form the “better” and more efficient the database • But, increasing the levels of Normal Form takes time and effort • For most applications, 3rd Normal Form will solve most potential problems with a DB

  5. Normalizing Database • Process of creating well-structured tables. • Improve performance, integrity of data • 5-step process (w/ 2 rules) to achieve Third Normal Form (3NF) • First two steps put DB into a form so you can normalize it

  6. Rule #1 in Databases Never design redundant data into a Database duplicate data is not consistent duplicate data wastes space

  7. Step 1. Primary Keys • A primary key is one or more data fields (columns) that uniquely identify each record in the table • What would the primary key be below? • “table of employees, assigned to a department.” EMPLOYEES LNameFNameSSNDept Jones Mike 123-45-6789 Math Smith Tony 987-65-4321 M&B Lee Bruce 567-89-1234 Science

  8. Step 1. Primary Keys • Answer: The SSN • It is the only “guaranteed” unique column in the table. Names are easily repeated. EMPLOYEES LNameFName SSNDept Jones Mike 123-45-6789 Math Smith Tony 987-65-4321 M&B Lee Bruce 567-89-1234 Science

  9. Step 1. Primary Keys • Now try the following example: • “A table of projects assigned to employees, listing the project name and the employee’s function on the project.” A Counter --The MS Access Default Key EmpProj Counter SSNProject Function 1 123-45-6789 Dining Designer 2 123-45-6789 Computing Designer 3 987-65-4321 Contracting Designer 4 444-55-6666 Intranet Webmaster 5 222-99-7777 Dining Overwatch

  10. Step 1. Primary Keys • It is the combination of the SSN and the Project fields. Why? EMPLOYEES’ PROJECTS Counter SSNProject Function 1 123-45-6789 Dining Designer 2 123-45-6789 Computing Designer 3 987-65-4321 Contracting Designer 4 444-55-6666 IntranetWebmaster 5222-99-7777 Dining Overwatch

  11. Step 1. Primary Keys • Because, you can have the following: EMPLOYEES’ PROJECTS Counter SSNProject Function 1 123-45-6789 Dining Designer 2 123-45-6789 Dining Designer 3 987-65-4321 Intranet Designer 4 444-55-6666 Intranet Webmaster 5 222-99-7777 Dining Overwatch • Redundant records! (Redundancy = BAD)

  12. Rule #2 about Databases NEVER Use a Counter as a Primary Key

  13. Step 2: Eliminate Many-to-Many Relationships • What is wrong with the following table? • “a table of personnel authorized access to a project” PROJECTS QUERY ACCESS Project Access_1 Access_2 Access_3 Dining 222-99-7777 181-94-5676 Computing 222-99-7777 181-94-5676 Intranet 987-65-4321 818-49-6765 123-45-6789

  14. Step 2: Eliminate Many-to-Many Relationships • Here’s essentially what this table looks like within the Access relationships diagram: Employees: SSN Last Name First Name .... Projects: Project Project Chief Department Access_1 Access_2 Access_3 has access to info about

  15. Step 2: Eliminate Many-to-Many Relationships • Here’s how you model it in a database: • Break it up into two one-to-many relationships Projects: Project Project Chief Department .... Employees: SSN Last Name First Name .... 1 1 Access to Project Info: Project SSN

  16. Step 2: Eliminate Many-to-Many Relationships • How to do it: • The primary key of the new table is the composite of the primary keys of the existing tables. • Primary key of Projects = Project Name • Primary key of Employees = SSN • New table primary key of Project Name and SSN

  17. Step 2: Eliminate Many-to-Many Relationships • No artificial restrictions on number of people with access • You can add attributes about the types of access granted • You can easily query who has access to information about each project PROJ QUERY ACCESS Project SSN Dining 222-99-7777 Dining 181-94-5676 Computing 222-99-7777 Computing 181-94-5676 Intranet 987-65-4321 Intranet 818-49-6765 Intranet 123-45-6789 PROJECT ProjectProjectChief Dept Computing 333-22-1111 MATH Intranet987-65-4321 M&B Contracting 123-45-6789 M&B CAT 333-22-1111 Admin EMPLOYEE LNameFName SSN Jones Mike 123-45-6789 Smith Tony 987-65-4321 Lee Bruce 567-89-1234 Doodle Yankee 333-22-1111

  18. What is wrong with the following? “A table of PCs, which are loaded with many different applications, and assigned to a user.” PCSerial# LoadedSoftware Assigned 10291 Word, Powerpoint, ccMail Jones 10301 Word, Powerpoint, Lotus Notes Smith 10311 Word, LotusNotes, Borland C++ Hacker

  19. Step 3: Achieving 1NF:All Data must be Atomic • “Atomic” - the data occupying a field cannot be further broken down. • i.e., no multi-data entries • i.e., “No attributes can have more than one value for a single instance of an entity” PCSerial# LoadedSoftware Assigned 10291 Word, Powerpoint, ccMail Jones • If not atomic, updating is complex and error prone • If not atomic, can not easily query the database

  20. Step 3 Answer PCSerial# LoadedSoftware Assigned 10291 Word Jones 10291 Powerpoint Jones 10291 ccMail Jones 10301 Word Smith 10301 Powerpoint Smith 10301 LotusNotes Smith 10311 Word Hacker 10311 LotusNotes Hacker 10311 Borland C++ Hacker

  21. Step 3. Achieving 1NF:All Data must be Atomic Another source of redundancy: calculated fields TotalYTD Age DaysRemaining Solution: Use a Query! Remove all calculated fields from table and create a query ...then use the query whenever you need up-to-date data

  22. Step 4. Achieving 2NF:Eliminate Partial Dependencies • What is a partial dependency? • Look at the table. What’s redundant? • “A table of functions an employee is assigned to for a project, and the project chief.” EMPLOYEES’ PROJECTS SSNProject Function Project Chief 123-45-6789 Dining Designer 222-99-7777 123-45-6789 Computing Designer 333-88-5656 123-45-6789 Intranet Member 987-65-4321 987-65-4321 Intranet Designer 987-65-4321 444-55-6666 Intranet Webmaster 987-65-4321 222-99-7777 Dining Overwatch 222-99-7777

  23. EMPLOYEES’ PROJECTS SSNProject Function ProjectChief 123-45-6789 Dining Designer 222-99-7777 123-45-6789 Computing Designer 333-88-5656 123-45-6789 Intranet Member 987-65-4321 987-65-4321 Intranet Designer 987-65-4321 444-55-6666 Intranet Webmaster 987-65-4321 222-99-7777 Dining Overwatch 222-99-7777 Step 4. Achieving 2NF:Eliminate Partial Dependencies • Function depends on the entire primary key: SSN and Project. • ProjectChief is dependent on just a portion of the primary key

  24. Step 4. Achieving 2NF:Eliminate Partial Dependencies • Why is this bad? • Well, what’s wrong with the following? EMPLOYEES’ PROJECTS SSNProject Function Project Chief 123-45-6789 Dining Designer 222-99-7777 123-45-6789 Computing Designer 333-88-5656 123-45-6789 Intranet Member 987-65-4321 987-65-4321 Intranet Designer 987-65-4321 444-55-6666 Intranet Webmaster 222-99-7777 222-99-7777 Dining Overwatch 222-99-7777

  25. Step 4. Achieving 2NF:Eliminate Partial Dependencies • A partial dependency (PD) occurs when a non-key field depends on only a part of the primary key, and not the whole primary key. • PDs are a relation. So, we need a new table..... EMPLOYEES’ PROJECTS SSNProject Function Project Chief 123-45-6789 Dining Designer 222-99-7777 123-45-6789 Computing Designer 333-88-5656 123-45-6789 Intranet Member 987-65-4321 987-65-4321 Intranet Designer 987-65-4321 444-55-6666 Intranet Webmaster 987-65-4321 222-99-7777 Dining Overwatch 222-99-7777

  26. Step 4. Achieving 2NF:Eliminate Partial Dependencies • Here’s how it should look...... EMPLOYEES’ PROJECTS SSNProject Function 123-45-6789 Dining Designer 123-45-6789 Computing Designer 123-45-6789 Intranet Member 987-65-4321 Intranet Designer 444-55-6666 Intranet Webmaster 222-99-7777 Dining Overwatch PROJECTS Project Project Chief Dining 222-99-7777 Computing 333-88-5656 Intranet 987-65-4321

  27. Step 5: Achieving 3NF:Eliminate Transitive Dependencies • What is wrong with the following table? PROJECTS Project Project Chief Dept. Dept. Director Room Dining 222-99-7777 Admin 181-94-5676 B115 Computing 333-88-5656 Admin 181-94-5676 B115 Intranet987-65-4321 M&B 818-49-6765 123 Contracting 187-87-8787 M&B 818-49-6765 123 CAT 333-22-1111 Grounds 123-45-6789 147

  28. Step 5: Achieving 3NF:Eliminate Transitive Dependencies • We have fields dependent on a non-key field: • The Director and Room fields clearly relate to the Dept., and have nothing to do with the project. (Dept is a “determinant” that is not a candidate key) PROJECTS Project Project Chief Dept. Dept. Director Room Dining 222-99-7777 Admin 181-94-5676 B115 Computing 333-88-5656 Admin 181-94-5676 B115 Intranet987-65-4321 M&B 818-49-6765 123 Contracting 187-87-8787 M&B 818-49-6765 123 CAT 333-22-1111 GRND 123-45-6789 147

  29. Step 5: Achieving 3NF:Eliminate Transitive Dependencies • A transitive dependency occurs when a non-key field depends on another non-key field. • Why is this bad?. • A typo appeared in the Contracting line. A database without the transitive dependency would not have allowed this to happen. PROJECTS Project Project Chief Dept. Dept. Director Room Dining 222-99-7777 Admin 181-94-5676 B115 Computing 333-88-5656 Admin 181-94-5676 B115 Intranet987-65-4321 M&B 818-49-6765 123 Contracting 187-87-8787 M&B 818-49-6765 124 CAT 333-22-1111 GRND 123-45-6789 147

  30. Step 5: Achieving 3NF:Eliminate Transitive Dependencies • How to do it: a. Which fields are dependent on a non-key field in the table? (Director, Room) b. Which fields are these dependent on? (Dept) c. Create a new table with (b) as the primary key. d. Put (a) in the new table. e. Remove (a) from the old table.

  31. Step 5: Achieving 3NF:Eliminate Transitive Dependencies • Here are the new tables. PROJECTS Project Project ChiefDept. Dining 222-99-7777 Admin Computing 333-88-5656 Admin Intranet987-65-4321 M&B Contracting 187-87-8787 M&B CAT 333-22-1111 GRND DEPARTMENTS Dept. NameDept. DirectorRoom Admin 181-94-5676 B115 M&B 818-49-6765 123 GRND 123-45-6789 147

  32. Data Analysis: Normalization • An entity is in first normal form (1NF) if there are no attributes that can have more than one value for a single instance of the entity. • An entity is in second normal form (2NF) if it is already in 1NF, and if the values of all non-primary key attributes are dependent on the full primary key – not just part of it. • An entity is in third normal form (3NF) if it is already in 2NF, and if the values of its non-primary key attributes are not dependent on any other non-primary key attributes.

  33. Conclusion • Rule1: Never design redundant data into a database • Rule2: Never use a counter as Primary Key • Identify proper primary keys (1NF) • Break up many-to-many relationships (1NF) • 1NF: Break all data into atomic components • 2NF: Identify/eliminate partial dependencies • 3NF: Eliminate transitive dependencies • Common sense test

  34. What is a Good Data Model? • A good data model is simple. • As a general rule, the data attributes that describe an entity should describe only that entity. • A good data model is essentially non-redundant. • This means that each data attribute, other than foreign keys, describes at most one entity. • A good data model should be flexible and adaptable to future needs. • We should make the data models as application-independent as possible to encourage database structures that can be extended or modified without impact to current programs.

  35. Data and Referential Integrity • There are at least three types of data integrity that must be designed into any database - key integrity, domain integrity and referential integrity. • Key Integrity: • Every table should have a primary key (which may be concatenated). • The primary key must be controlled such that no two records in the table have the same primary key value. • The primary key for a record must never be allowed to have a NULL value.

  36. Data and Referential Integrity • Domain Integrity: • Appropriate controls must be designed to ensure that no field takes on a value that is outside of the range of legal values. • Referential Integrity: • A referential integrity error exists when a foreign key value in one table has no matching primary key value in the related table.

  37. Referential Integrity: • Referential integrity is specified in the form of deletion rules as follows: • No restriction. • Any record in the table may be deleted without regard to any records in any other tables. • Delete:Cascade. • A deletion of a record in the table must be automatically followed by the deletion of matching records in a related table. • Delete:Restrict. • A deletion of a record in the table must be disallowed until any matching records are deleted from a related table. • Delete:Set Null. • A deletion of a record in the table must be automatically followed by setting any matching keys in a related table to the value NULL.

  38. Database Capacity Planning • A database is stored on disk. • The database administrator will want an estimate of disk capacity for the new database to ensure that sufficient disk space is available. • Database capacity planning can be calculated with simple arithmetic as follows. • For each table, sum the field sizes. • This is the record size for the table. • For each table, multiply the record size times the number of entity instances to be included in the table. • This is the table size.

  39. Database Capacity Planning • Database capacity planning can be calculated with simple arithmetic as follows. (continued) • Sum the table sizes. • This is the database size. • Optionally, add a slack capacity buffer (e.g., 10%) to account for unanticipated factors or inaccurate estimates above. • This is the anticipated database capacity.

More Related