400 likes | 834 Views
Normalization. Normalization. Normalization : the process of converting complex data structures into simple, stable data structures. The main idea is to avoid duplication of large data. Why normalization?
E N D
Normalization • Normalization: the process of converting complex data structures into simple, stable data structures. • The main idea is to avoid duplication of large data. • Why normalization? • The relation derived from the user view or data store will most likely be unnormalized. • The problem usually happens when an existing system uses unstructured file, e.g. in MS Excel.
The Three Steps of Normalization • The standard normalization has more than three steps: • First Normal Form (1NF) • Second Normal Form (2NF) • Third Normal Form (3NF) • Boyce-Codd Normal Form (BCNF) • Fourth Normal Form (4NF) • Fifth Normal Form (5NF) • Domain/Key Normal Form (DKNF) • However, only three steps (1NF, 2NF, 3NF) are sufficient for normalization.
I. First Normal Form (1NF) The official qualifications for 1NF are: • Each attribute must have a unique name. • Each attribute must have a single value. • Row cannot be duplicated. • There is no repeating groups. Additional: • Choose a primary key. The primary key can be an attribute or combined attributes.
Each attribute has unique name -> Good The Payment has multi data type (currency & string) -> Bad All rows are not duplicated -> Good The Course and Payment have repeating groups -> Bad
All correct? Not yet. Choose a primary key. Name? No. Name has duplicated values. Or DOB, or Course or Payment? No. Each one has duplicated values. Name and DOB? No. They still have duplicated values. Name and DOB and Course? No. Still duplicated. Combine all attribute? Still no. The last two rows are duplicated. So what else we can do? Of course, there is a way. Add a new attribute to be aprimary key. So let’s call it ID.
Now it is completely in 1NF. Next, check it if it is not in 2NF.
II. Second Normal Form (2NF) The official qualifications for 2NF are: 1. A table is already in 1NF. 2. All nonkey attributes are fully dependent on the primary key. All partial dependencies are removed and placed in another table.
Assume you have a table below contain a primary (CourseID + Semester): Primary Key The Course Name depends on only CourseID, a part of the primary keynot the whole primary (CourseID+ Semester).It’s called partial dependency. Solution: Remove CourseID and Course Name together to create a new table.
Done? Oh no, it is still not in 1NFyet. You have to remove the repeating groups too. • Semester
III. Third Normal Form (3NF) The official qualifications for 3NF are: 1. A table is already in 2NF. 2. Nonprimary key attributes do not depend on other nonprimary key attributes (i.e. no transitive dependencies) All transitive dependencies are removed and placed in another table.
Assume you have a table below contain a primary (CourseID): Primary Key The Teacher Tel is a nonkey attribute, andthe Teacher Name is also a nonkeyatttribute. But Teacher Tel depends on Teacher Name. It is called transitive dependency. Solution: Remove Teacher Name and Teacher Tel together to create a new table.
Done? Oh no, it is still not in 1NFyet. So you have to remove the repeating groups, and add a primary key. • Note about primary key: • In theory, you can choose Teacher Name to be a primary key. • But in practice, you should add Teacher ID as the primary key.
What about this table? In case of the above table, there is no 2NF because the primary keyis only one attribute, not the combined attributes. Therefore, you can skip 2NF and move to 3NF. In 3NF, you must remove transitive dependency. Both Name and DOB depend on ID. So they are not transitive dependency. Keep! Both Course and Payment does not depend on ID. So remove them.
Student Course Payment End of 3NF
For the Payment table, it is not done yet. It is a relationship between Student and Course. Payment Student Course M N PaymentID Payment Payment
Student Final Design Course Payment
Stop at 3NF • The most commonly used normal forms: • First Normal Form (1NF) • Second Normal Form (2NF) • Third Normal Form (3NF) • Highest normalization is not always desirable • More JOINS are required • Affect data retrieval performance/high response time • For most business database design purposes, 3NF is as high as we need to go in normalization process
Normalization in Real-World • When you newly create a table in a database tool, e.g. MS Access, SQL Server, MySQL, or Oracle, you won’t need all the steps. • The mentioned tools help you to overcome the 1NF already. • The 2NF happens when the primary key iscombine attributes, e.g. StudentName + DOB. But to do so is unpractical. • Mostly, you only use 3NF. Because it can remove all transitive dependency.
Functional Dependency A Bit More About Theory
Functional Dependencies An important concept associated with normalization is functional dependencywhich describes the relationship between attributes.
Functional Dependencies • Functional dependency can be divided into two types: • Full functional dependency/Partial dependency(PD) • Will be used to transform 1NF 2NF • Transitive dependency (TD) • Will be used to transform 2NF 3NF
Functional Dependencies 1st row 2nd row Relational Schema STUDENT(Stud_ID, Name, (Course_ID, Units)) Multivalued Attributes (or repeating groups): non-key attributes or groups of non-key attributes the values of which are not uniquely identified by (directly or indirectly) (not functionally dependent on) the value of the Primary Key (or its part).
Functional Dependencies Cust_ID → Name Partial Dependency – when an non-key attribute is determined by a part, but not the whole, of a COMPOSITE primary key (The Primary Key must be a Composite Key).
Functional Dependencies Dept_ID → Dept_Name Transitive Dependency – when a non-key attribute determines another non-key attribute.
B is functionally B A depends on A Functional Dependencies R(A,B) A B Consider a relation with attributes A and B, where attribute B is functionally depends on attribute A. Let say an A is a PK of R. To describe the relationship between attributes A and B is to say that “A functionally determines B”.
A functionally determines B B A Functional Dependencies Determinant: Refers to the attributes, or a group of attributes, on the left handed side of the arrow of a functional dependency. When a functional dependency exist, the attribute or group of attributes on the left-handed side of the arrow is called determinant.
Functional Dependencies staff branch Determinant
position is functionally position staffNO depends on staffNO Functional Dependencies Staff number (S21) Position (manager) Consider the attributes staffNOand position of the staff relation. For a specific staffNO(S21), we can determine the position of that member of staff as Manager. staffNOfunctionally determinesposition.
staffNO does not functionally staffNO position depends on position Functional Dependencies staff number (S21) Position(manager) staff number (S5) However the next figure illustrate that the opposite is not true, as position does not functionally determinesstaffNO. A member of staff holds one position; however, they maybe several members of staff with the same position.
Functional Dependencies Partial Dependencies: • Full functional dependencyindicates that if A and B are attributes of a relation, B is fully functionally dependent on A, if B is functionally dependent on A, but not on any proper subset of A. • staff(staffNO,sName,position,salary,branchNO) • staffNO, staffName branchNO • True!!! each value of (staffNO, sName) is associated with a single value of branchNO. • however, branchNO is also functionally dependent on staffNO.
Functional Dependencies Transitive Dependencies: • staff(staffNO,sName,position,salary,*branchNO) • branch(branchNO,bAddress) • staffNO sName,position,salary,branchNO,bAddress • branchNO bAddress • True for transitive dependency!!! branchNO → bAddress • exists on staffNO via branchNO
Normalization Process Data Redundancies Formal technique for analyzing relations based on their Primary Key(or candidate keys) and functional dependencies. The techniqueexecuted as a series of steps (stage). Each step corresponds to a specific normal form, that have specific characteristic. As normalization proceeds, the relations become progressively more restricted (stronger) in format and also less vulnerable to anomalies.
Normalization Process 1)Repeat Group 2)PK is not defined UNF 1)Remove Repeat Group 2)Defined PK composite PK consist of attributes 1NF Normalization Process Relation/Table Format • Have repeating group-PK not defined (1 Table) (1 or 2 Tables) • No repeating group-PK defined-Test partial dependency Test for partial dependency If (exist) (2 or 3 Tables) (more then 1 table) (ab …. TD) 1 (a ……. TD) 2 (b ….… TD) 3 2NF • No repeating group-PK defined-No partial dependency-Test transitive dependency Test for transitive dependency If (exist) (a, b x, y) (a c, d) (b z) (c d) • No repeating group-PK defined-No partial dependency-No transitive dependency 3NF (3 or 4 Tables)