420 likes | 438 Views
Learn about database normalization, a formal method to minimize data redundancy and prevent update anomalies in relational schema. Understand functional dependencies and the process of normalization.
E N D
Normalization Normalization is a formal method involved with a series of test to help database designer to be able to identify the optimal grouping of attributes for each relation in the relational schema. Normalization can be applied to individual relation so that database can be normalized to a specific form to prevent the possible occurrence of update anomaly.
Data Redundancy and Update Anomalies The main purpose of database design is to identify the optimal grouping of attributes in order to minimize data redundancy which affect on saving space for data storage. Data redundancy always causes UPDATE ANOMALIES which are classified into 3 types: Insertion anomalies Deletion Anomalies Modification Anomalies
Insertion Anomalies Deletion Anomalies Modification Anomalies
Insertion Anomalies Class_Info To insert the details of new students into the Class_Info relation, we must include the details of the lecturer and subject in order to avoid null value. Deletion Anomalies If we delete a lecturer from the Class_Info relation, the details of students and subjects are also lost from the database. Modification Anomalies If we want to change the value of one of the attributes of a particular student in the Class_Info relation, we must update all rows which associate to the student. If this modification is not carried out on all the appropriate rows of the Class_Info relation, the database will become inconsistent.
Insertion Anomaly Class_Info Insert new records may cause data redundancy in other fields.
Deletion Anomaly Class_Info Deletion Anomaly may cause loss other necessary data.
Modification Anomaly Class_Info If we want to change the value of one of the attributes of a particular entity in the relation, we must update all rows that relate to this entity. If this modification is not carried out on all the appropriate rows ,the data base will become inconsistent.
To solve update anomalies, a relation must be normalized by using normalization process to remove existing data redundancy.
Functional Dependency One of the main concepts associated with normalization is functional dependency, which describes the relationship between attributes. Functional Dependency describes the relationship between attributes in a relation. For example, if A and B are attributes (or set of attributes) of relation R, B is functionally dependent on A (denoted AB), if each value of A is associated with exactly one value of B. The symbol of Functional Dependency (AB can be described as followings: B is functionally dependent on A or A determines B or B depends on A
Functional Dependencies One of the main concepts associated with normalization is functional dependency, which describes the relationship between attributes. (Definition of Functional Dependency) Suppose that B is an attribute and A is another one, we said that Bis functionally dependent on A (denoted A B), if each value of A is associated with exactly one value of B. ( A and B may each consists of one or more attributes.) The symbol of functional dependence (A B) means B is functionally dependent on A or A functionally definesB or B depends on A
If the functional dependency holds on schema R, in any legal relation r, for all pairs of tuples t1 and t2 in r such that t1[] = t2[], it is also the case that t1[] = t2[]. Given a relation r, attribute y of r is dependent on attribute x if and only if whenever two tuples of R agree on their x-value, they must necessarily agree on their y-value. For every tuple in the relation r, if the value of attribute in tuples are the same, DBMS guarantees that the value of the attribute in those tuples must be the same. That is If holds on R and if t1[] = t2[] DBMS must guarantee that t1[] = t2[]
B is functionally dependent on A A B When a functional dependency exists, the attribute or group Of attributes on the left-hand side of the arrow is called the determinant. Position is functionally dependent on Staff_No Staff_No Position SL21 System Engineer Staff_No is not functionally dependent on Position Position Staff_No SL21 SG5 System Engineer
( LID, Subject,SID ) Lname, Salary, Dept, Credit, Sname, GPA LID Lname, Salary, Dept Subject Credit SID Sname, GP
Utilization of FD to decompose a relation Lecturer Student Subject
Normalization is a formal method involved with a series of test to help database designer to be able to identify the optimal grouping of attributes for each relation in the relational schema. Unnormalized Form 1st Normal Form 2nd Normal Form 3rd Normal Form Boyce-Codd Normal Form Normalization can be applied to individual relation so that database can be normalized to a specific form to prevent the possible occurrence of update anomaly. The process of normalization is a formal method that identifies relations based on primary key (or candidate keys in the case of BCNF the functional dependencies among their attributes).
Relationships of Normal Forms Higher Normal forms
Case Study The DreamHome company manages property on behalf of the owners, and as part of this service, the company takes care of the property’s rental. To simplify this example, we assume that a customer rents a given property only once, and cannot rent more than one property at any one time. Unnormalized form (UNF) : A table that contains one or more repeating groups. Customer_Rental Relation A repeating group is an attribute or group of attributes within a table that occurs with multiple values for a single occurrence of the key attribute (s) for that table. The term key refers to the attribute (s) that uniquely identify each row within the unnormalized table.
Case Study The DreamHome company manages property on behalf of the owners, and as part of this service, the company takes care of the property’s rental. To simplify this example, we assume that a customer rents a given property only once, and cannot rent more than one property at any one time. Adjust Unnormalized form to 1st NF by removing of repeating groups in order to form relational data model (data are conceptually structured in the form of table) . Customer_Rental Relation CR76 John Kay Aline Stewart CR56 Aline Stewart CR56
First normal form (1NF) : A relation in which the intersection of each row and column contains one and only one value. Customer_Rental Relation For the relational data model, it is important to recognize that it is only first normal form(1NF) that is critical in creating appropriate relations. All the subsequent normal forms are optional. However, to avoid the update anomalies, it is recommended that we proceed to at least 3NF.
Set of the Functional Dependency of Customer_Rental relation fd1 Customer_No, Property_No RentStart, RentFinish(Primary key) fd2 Customer_No CName(Partial dependency) fd3 Property_No PAddress, Rent, Owner_No, OName(Partial dependency) fd4 Owner_No OName (Transitive dependency) fd5 Customer_No, RentStart Property_No, PAddress, RentFinish, Rent, Owner, OName(Candidate key) fd6 Property_No, RentStart Customer_No, CName, RentFinish(Candidate key)
Customer_No Property_No CName PAddress RentStart RentFinish Rent Owner_No OName fd1 (Primary key) fd2 (Partial dependency) (Partial dependency) fd3 fd4 (Transitive dependency) fd5 (Candidate key) fd6 (Candidate key)
Second Normal Form (2NF) : A relation that is in the first normal form and every non-primary key attribute is fully functionally dependenton the primary key. Full functional : Indicates that if A and B are attributes of a relation, B is fully functionally dependent dependency on A if B is functionally dependent on A, but not on any proper subset of A. ถ้า B เป็น Non-Key attribute ซึ่งมีฟังก์ชั่นการขึ้นต่อกันอยู่กับส่วนใดส่วนหนึ่งของคีย์หลัก เราจะเรียกว่า B partial dependence on A. Partial dependency ต้องถูกขจัดออกโดยการแยก ออกไปตั้งเป็นตารางใหม่ เพื่อให้ Non-Key attribute ตัวนี้ fully dependent on คีย์หลัก OName Customer_No Property_No CName PAddress RentStart RentFinish Rent Owner_No fd1 (Primary key) fd2 (Partial dependency) fd3 (Partial dependency)
Customer (Customer_No, CName) Rental (Customer_No, Property_No, RentStart, RentFinish) Property_Owner (Property_No, PAddress, Rent, Owner_No, OName) Rental Relation Customer Relation Property-Owner Relation 2NF applies to relations with composite keys, that is, relations with a primary key that composed of two or more attributes. A relation with a single attribute primary key is automatically in at least 2NF.
Transitive dependency Customer (Customer_No, CName) Rental (Customer_No, Property_No, RentStart, RentFinish) Property_Owner (Property_No, PAddress, Rent, Owner_No, Oname, address) Transitive dependency Property-Owner Relation Rental Relation Customer Relation
Transitive dependency : A condition where A, B, and C are attributes of a relation such that if A B and B C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C). Definition of Third Normal Form: A relation that is in first and second normal form, and in which no non-primary key attribute is transitively dependent on the primary key. Customer (Customer_No, CName) Rental (Customer_No, Property_No, RentStart, RentFinish) Property_Owner (Property_No, PAddress, Rent, Owner_No, OName) Property-for-Rent Relation Owner Relation
Customer_Rental Relation Customer (Customer_No, CName) Rental (Customer_No, Property_No, RentStart, RentFinish) Property (Property_No, PAddress, Rent, Owner_No) Owner (Owner_No, Oname, address)
Customer_Rental 1NF Property_Owner 2NF Customer Rental Property_for_Rent Owner 3NF Rental Customer Property-Owner Owner
From 3NF to Boyce-Codd Normal Form (BCNF) BCNF is based on functional dependencies that take into account all candidate keys in a relation. For a relation with only one candidate key, 3NF and BCNF are equivalent. The difference between 3NF and BCNF is that for a functional dependency AB, 3NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key. Whereas, BCNF insists that for this dependency to remain in a relation, A must be a candidate key. Therefore, BCNF is a stronger form of 3NF, such every relation in BCNF is also in 3NF. Boyce-Codd :A relation is in BCNF if and only if every determinant is normal form (BCNF)a candidate key. • Violation of BCNF is quite rare, since it may only happen under specific conditions. The potential to violate BCNF may occur in relation that • contains two (or more) composite candidate keys and • which overlap, that is share at least one attribute in common
Case Study In this example, Client_Interview relation is presented. It contains details of the arrangements for interviews of clients by members of staff of the DreamHome company. The members of staff involved in interviewing clients are allocated to a specific room on the day of interview. However, a room may be allocated to several members of staff as required throughout a working day. A client is only interviewed once on a given date, but may be requested to attend further interviews at later dates. This relation has three candidate keys: (Client_No, Interview_Date), (Staff_No, Interview_Date, Interview_Time), and (Room_No, Interview_Date, Interview_Time). Therefore the Client_Interview relation has three composite candidate keys, which overlap by sharing the common attribute Interview_Date. We select Client_No, Interview_Date) to act as the primary key for this relation.
Client_Interview (Client_No, Inverview_Date, Interview_Time, Staff_No, Room_No) The Client_Interview relation has the following functional dependencies : Fd1 Client_No, Interview_Date Interview_Time, Staff_No, Room_No (Primary key) Fd2 Staff_No, Interview_Date, Interview_Time Client_No(Candidate key) Fd3 Room_No, Interview_Date, Interview_Time Staff_No,Client_No(Candidate key) Fd4 Staff_No, Interview_Date Room_No Client_Interview Relation
Interview (Client_No, Interview-Date, Interview_Time, Staff_No) Staff_Room (Staff_No, Interview-Date, Room_No) Interview Relation Staff_Room Relation
Review of Normalization (1NF to BCNF) The DreamHome company manages property on behalf of the owners, and as part of this service the company undertakes regular inspections of the property by members of staff. When staff are required to undertake these inspections, they are allocated a company car for use on the day of the inspections. However, a car may be allocated to several members of staff, as required throughout the working day. A member of staff may inspect several properties on a given date, but a property is only inspected once on a given date. Property_Inspection Relation Property_Inspection (Property_No, PAddress, IDate, ITime, Comments, Staff_No, SName, OName)
1NF : Property_Inspection Relation Property_Inspection (Property_No, IDate, ITime, PAddress,Comments, Staff_No, SName, OName) FD1 (Primary key) (Partial dependency) FD2 (Transitive dependency) FD3 FD4 (Candidate key) FD5 FD6 (Candidate key)
FD1 (Primary key) (Partial dependency) FD2 Remove Partial dependency (decompose the relation) to obtain 2NF Property Relation Property_Inspection Relation
Property Relation (Property_No, PAddress) Property_Inspection Relation FD1 (Primary key) FD3 (Transitive dependency) FD4 (Candidate key) FD5 (Candidate key) FD6
Property Relation Remove Transitive dependency (decompose the relation) to obtain 3NF Staff Relation Property_Inspection Relation
Staff Relation Property Relation Remove remaining anomalies from functional dependencies to obtain BCNF Property_Inspection Relation Staff_Car (Staff_No, IDate, Car_Reg) Inspection (Property_No, IDate, ITime, Comments, Staff_No)
From BCNF to Fourth Normal Form (4NF) Although BCNF removes any anomalies due to functional dependencies, further research led to the identification of another type of dependency called multi-valued dependency (MVD), which can cause similar design problems for relations in terms of data redundancy. ตารางต่อไปนี้เป็น BCNF แต่ยังเกิดปัญหา update anomalies Lect_Sub_Research Relation
Multi-valued : Represents a dependency between attributes (for example, A, dependency B, and C) in a relation, such that for each value of A there is a (MVD) set of values for B, and a set of values for C. However, the set of values for B and C are independent of each other. Lecturer > Subject Lecturer > Research A > B A > C Lec_Sub_Research Relation Lec_Sub Relation Lec_Research Relation
Unnormalized form (UNF) Remove repeating groups First normal form (1NF) Remove partial dependencies Second normal form (2NF) Remove transitive dependencies Third normal form (3NF) Remove remaining anomalies From functional dependencies Boyce-Codd form (BCNF) Remove multi-valued dependencies Fourth normal form (4NF)