1 / 31

Normalization

Normalization. Asif Sohail University of the Punjab Punjab University College of Information Technology (PUCIT). Introduction. It is a process of converting a complex, large and unstable relation into a set of simple, small and stable relations.

billy
Download Presentation

Normalization

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Normalization Asif SohailUniversity of the Punjab Punjab University College of Information Technology (PUCIT)

  2. Introduction • It is a process of converting a complex, large and unstable relation into a set of simple, small and stable relations. • It is a process of efficiently organizing data in a database. • Normalization results in a well structure relation – a relation that contains min. redundancy and allows insert, update and delete without errors/inconsistencies. • Errors or inconsistencies caused by redundant data are also called anomalies. • There are three types of anomalies: • Insertion Anomaly. • Deletion Anomaly. • Updation Anomaly.

  3. Introduction a) Insertion Anomaly. • It occurs when extra data beyond the desired data must be added to the database. b) Deletion Anomaly. • It occurs when it is necessary to change multiple rows to modify only a single fact. c) Updation Anomaly. • It occurs when deleting a row causes some unwanted deletions.

  4. Introduction • Normalization is based on the analysis of Functional Dependency (FD). • A FD is a relationship between two attributes A & B of a relation R, such that attribute B is said to be functionally dependent on attribute A, if A uniquely determines the values of B.(A->B) • The attribute on the left-hand side of the arrow in a functional dependency is called Determinant and on the right are called Dependents. • An attribute may be functionally dependent on more than one attributes. • Functional Dependencies are helpful in identifying the keys for a given relation.

  5. Salary Dept_Name Salary Date_Completed EMP_ID EMP_ID Name Course_Title Dept_Name Name Continued… EMPLOYEE1 EMPLOYEE2 Date a Course is completed is completely determined by the EMP_ID and Course_Title

  6. Inference Rules • Rules of Inference for functional dependencies, called inference axioms or Armstrong axioms, after their developer, can be used to find all the FDs logically implied by a set of FDs. • There are the following inference rules: • Reflexivity. • Augmentation. • Transitivity. • Additivity or union. • Projectivity or Decomposition • Pseudo transitivity:

  7. Inference Rules a) Reflexivity. • {Y 1,...,Y n} ⊆ {X1,...,Xm} implies {X1,...,Xm} → {Y 1,...,Y n} • OR If B is a subset of A, then A → B. • For Example: StName,stAdr → stName b) Augmentation. • If we have A → B then AC → BC. For Example • If stId → stName then • StId,stAdr → stName,stadr c) Transitivity. • If A → B and B → C, then A → C • If stId → prName and prName → credits then • stId → credits

  8. Inference Rules d) Additivity or Union: • If A → B and A → C, then A → BC • If empId → eName and empId → qual Then we can write it as empId → eName, qual e) Projectivity or Decomposition • If A → BC then A → B and A → C • If empId → eName,qual Then we can write it as • empId → eName and empID → qual f) Pseudo transitivity: • If A → B and CB → D, then AC → D • If stID → stName and stName,fName → stAdr Then we can write it as StId,fName → stAdr

  9. Normal Forms • Normalization process is built around the concept of Normal Forms. • A Normal Form is a state of a relation that can be determined by applying simple rules regarding functional Dependencies. • First Normal Form • Second Normal Form • Third Normal Form • Boyce Codd Normal Form • Fourth Normal Form • Fifth Normal Form

  10. Table with Multivalued attribute Remove multivalued attributes First Normal Form Remove partial dependencies Second Normal Form Remove transitive dependencies Third Normal Form Removing remaining anomalies Boyce-Codd Normal Form Remove multivalued dependencies Fourth Normal Form Remove remaining anomalies Fifth Normal Form

  11. First Normal Form (1NF) • A relation is said to be in 1NF, if it contains no repeating group. • The value at the intersection of a row and column must be atomic(having one value) • If you developed a logical design by transforming ER diagram into relations, there should not be any multivalued attributes remaining • Consider the following relation: Student (RegNo,Name,Program, C-Code, C-Title, C-Grade) • This relation has a repeating group and therefore it has the insert, delete and update anomalies. • Multiple values create problems in performing operations like select or join.

  12. First Normal Form (1NF) • The relation Student can be converted into 1NF using either of the following methods: a) Change the PK of the relation and define a composite key RegNo & C_Code. b) Split the relation into 2 relations: Student (RegNo, Name, Program) Course (RegNo, C-Code, C-Title, C_Grade) Example 2: STD(stId, stName, stAdr, prName, bkId)

  13. Second Normal Form (2NF) • A relation is in 2NF if: • It is in 1NF • Every nonkey attribute is fully functionally dependent on the primary key • A situation of Partial Functional Dependency arises when PK of a relation is composite and a non key attribute is functionally dependent on part (but not all) of the PK. • Referring to the Course relation: Course (RegNo, C-Code,C-Title, C_Grade) • The functional dependencies are: C-Code -> C_Title (Partial FD) RegNo,C_Code -> C_Grade (Full FD)

  14. Second Normal Form (2NF) • Since all the non key attributes are not fully functionally dependent on the PK or there is partial functional dependency in the relation, therefore it is not in 2NF. • The Anomalies associated with the course relation are: a) Insert Anomaly: • A course instance cant be inserted without a student (RegNo) b) Delete Anomaly. • Deleting a student will unnecessarily delete course data. c) Update Anomaly. • A course cant be updated independently.

  15. Second Normal Form (2NF) • The process for transforming a 1NF table to 2NF is: • Identify any determinants other than the composite key, and the columns they determine. • Create and name a new table for each determinant and the unique columns it determines. • Move the determined columns from the original table to the new table. The determinate becomes the primary key of the new table. • Delete the columns you just moved from the original table except for the determinate which will serve as a foreign key. • The original table may be renamed to maintain semantic meaning.

  16. Second Normal Form (2NF) • The relation Course can be converted into 2NF by decomposing it into the following relations: Course (C-Code,C-Title) Result (RegNo, C-Code, C_Grade) • A relation in 1NF will be in 2NF if: • The PK consists of only one attribute OR • No nonkey attributes exist in the relation OR • Every nonkey attribute is functionally dependent on the full set of primary key attributes

  17. Third Normal Form (3NF) • A relation is said to be in 3NF, if it is in 2NF and there is no Transitive Dependency. • A Transitive Dependency is a functional dependency between two or more non key attributes of a relation. • Consider the following relation: Emp (EmpNo, EName, Job, Sal, Proj-No,Proj-Details) • In the above relation, there is a following transitive dependency: Proj-No -> Proj-Details • Due to this, project information cant be maintained independent of a employee record and hence there are anomalies in the relation.

  18. Third Normal Form (3NF) • You can remove transitive dependency from a relation in the following way: • Create a new relation against transitively dependent attributes and leave the PK of new relation in the old relation to serve as a FK. Emp (EmpNo, EName, Job, Sal, Proj-No) Project (Proj-No, Proj-Details) More Examples: Ex1: PNo,PName, PBudget, EmpNo, EName, Job, ChgHour, Hours Ex2: STD (stId, stName, stAdr, prName, prCrdts) • stId -> stName, stAdr, prName, prCrdts • prName -> prCrdts

  19. Third Normal Form (3NF) • The process of transforming a table into 3NF is: • Identify any determinants, other the primary key, and the columns they determine. • Create and name a new table for each determinant and the unique columns it determines. • Move the determined columns from the original table to the new table. The determinate becomes the primary key of the new table. • Delete the columns you just moved from the original table except for the determinate which will serve as a foreign key. • The original table may be renamed to maintain semantic meaning.

  20. Normalization - Exercise • Order Relation: OrderNo, OrderDetails, OrderDate, CustNo, CustName, ProductNo, ProdName, Price, QtyOrdered • Student Relation: RegNo, Name, Address, Program, C-Code, C-Title, C-Grade, T-Code, T-Name • Patient Relation: VisitNo, VisitDate, PatNo, PatName, PatAge, DNo, DName, DSpeciality, Diagnosis

  21. Boyce-Codd Normal Form (BCNF) • A relation is said to be in BCNF, if it is in 3NF and every determinant is a candidate key or there is no overlapping of candidate keys. • If a table contains atomic candidate keys, the 3NF and BCNF are equivalent. The BCNF differs from the 3NF only when there are more than one candidate keys and the keys are composite and overlapping. • A 3NF relation is not in BCNF and this happens only if • the candidate keys in the relation are composite keys (that is, they are not single attributes), • there is more than one candidate key in the relation, and • the keys are not disjoint, that is, some attributes in the keys are common. • Consider a relation R(A, B, C, D) such that A,B -> C, D and C->B. • The relation R has no partial dependency nor it contains transitive dependency. Thus the relation R is in 3NF.

  22. Boyce-Codd Normal Form (BCNF) • Consider the following relation: PROJECT (RegNo, PTool, Supervisor) Constraints: • For each Project Tool (PTool), a student has only one supervisor. • A project may be in more than one tools. • Each supervisor can supervise only one tool. • In the above relation, no single attribute is a PK. • Possible candidate keys are RegNo, PTool and RegNo, Supervisor and Ptool, Supervisor. • The candidate keys overlap as they share RegNo. • The Anomalies associated with the course relation are:

  23. Boyce-Codd Normal Form (BCNF) • Insert Anomaly: • Supervisor and PTool cant be defined unless a student takes a project. b) Delete Anomaly. • Deleting a student will unnecessarily delete project data. c) Update Anomaly. • Updating a PTool may result in unwanted changes. • The relation Project can be converted into BCNF by decomposing it into the following relations: PROJECT1 (RegNo, PTool) PROJECT1 (Supervisor, PTool)

  24. Fourth Normal Form (4NF) • A relation is said to be in 4NF, if it is in BCNF and there is no multivalued dependency. • A Multivalued Dependency is a type of dependency that exists in a relation R having at least 3 attributes R(A, B, C) such that, for each value of A, there is a well defined set of values of B and a well defined set of values of C, but the set of values of B and the set of values of C is independent or have no impact on each other.

  25. Fourth Normal Form (4NF) • Consider a relation COURSE (C_Code, C_Instructor, Book_Titles) • The constraints on the above relation are: • Each course may have several instructors. • Each course uses several book titles. • Set of Book titles used for a course is independent of the set of instructors. • Q: What will be the PK of the relation? • The Anomalies associated with the course relation are: • Insert Anomaly: • Adding a new book title will require adding as many rows as many instructors are there.

  26. Fourth Normal Form (4NF) b) Delete Anomaly. • Deleting an instructor will result in unwanted deletions. c) Update Anomaly. • Updating a book title requires updating several rows. • The relation Course can be converted into 4NF by decomposing it into the following relations: • COURSE (C_Code, C_Instructor) • BOOKS (C_Code, Book_Titles)

  27. Fifth Normal Form (5NF) • A relation is said to be in 5NF, if it is in 4NF and there is no Join Dependency. • A Join Dependency means that a relation cant be decomposed into two or more relations such that the resulting relations can be recombined to form original data. Thus if R(A, B, C) is decomposed into R1(A,B) and R2(B,C) – a join dependency exists if we can get back to R by taking natural join of R1 and R2. • In every Normal Form, we decompose a given relation. • The decomposition can be lossless or lossy decomposition. Join Dependency ensures that the decomposition is lossless.

  28. Fifth Normal Form (5NF) • Consider the following relation: STUDENT (RegNo, Name, Program, C_Code, C_Grade) • We decompose the relation into the following relations in order to convert it into 1NF. STUDENT (RegNo, Name, Program) RESULT(RegNo, C_Code, C_Grade) • Some tuples of the above relations are…. • Natural Join of the above two relations will give us the tuples of the original relation STUDENT. Thus the decomposition is lossless. • Now we decompose RESULT relation as:

  29. Fifth Normal Form (5NF) RESULT1(RegNo, C_Code) RESULT2(C_Code, C_Grade) • Some tuples of the above relations are…. • Natural Join of the above two relations will give us some extra tuples which were not in the original relation. • Even though we are getting more tuples, but still the decomposition is lossy, because without the original relation, we have no way to identify which tuples are extra or spurious. Thus we actually loose information. • We can guarantee that the decomposition if lossless by making sure that for each pair of relations that will be joined, the set of common attributes is a determinant in one of the relations.

  30. Fifth Normal Form (5NF) • 5NF is not easily verified as lower normal forms. • In fact, no systematic method exists for obtaining 5NF or for ensuring that a set of relations is indeed in 5NF. • It is tempting to suggest that such relations are pathological cases and are likely to be rare in practice. • 5NF is also called Project-Join Normal Form.

  31. Thank you for your attention. • Asif Sohail • Assistant Professor • University of the Punjab • Punjab University College of Information Technology (PUCIT) • Allama Iqbal (Old) Campus, Anarkali • Lahore, Pakistan • Tel: +92-(0)42-111-923-923 Ext. 154 • E-mail: asif@pucit.edu.pk

More Related