580 likes | 604 Views
Learn about the process of normalization in database design, including the concept, objectives, and how to identify and address anomalies. Understand the different normal forms, such as First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF), and the problems associated with relations that break these rules.
E N D
Normalization Part 1: The Concept
Objectives • How to undertake the normalization process. • How normalization uses functional dependencies to group attributes into relations that are in a known normal form. • How to identify the most commonly used normal forms: • First Normal Form (1NF) • Second Normal Form (2NF) • Third Normal Form (3NF) • The problems associated with relations that break the rules of 1NF, 2NF, or 3NF. • How to represent attributes shown on a form as 3NF relations using normalization.
Introduction • 2 strategies in creating data model: • We have adopted a top-down approach to database design that begins by identifying the entities and relationship. • Normalization is a bottom-up approach to database design that begins by examining the relationships between attributes.
Database Design Strategies • Two classical approaches/strategies to dB design: Top-down Bottom-up { Normalization }
Table & Relational Schema staff branch branch (branchNO, bAddress) staff (staffNO, sName, position, salary, *branchNO)
Normalization • When we design a database, the main objective is to create an accurate representation of data, relationship between the data, and constraints on the data that is relevant. • To achieve this objective, we have to identify suitable set of relations (table) by creating good table structure/process of assigning attributes to entities. The process is known as Normalization. • Process for evaluating and correcting table structures to minimize/control data redundancies {reduces data anomalies}. • Works through a series of stages called normal forms.
Normalization • The most commonly used normal forms: • First Normal Form (1NF) • Second Normal Form (2NF) • Third Normal Form (3NF) • 1NF < 2NF < 3NF • Highest normalization is not always desirable • More JOINS are required • Affect data retrieval performance/high response time • For most business database design purposes, 3NF is as high as we need to go in normalization process
Normalization • Every normal form is based on functional dependencies between attributes in a relationship. • Each relationship can be normalized into a specific form to avoid anomalies. • Anomalies? • Anomaly = abnormality • Ideally a field value change, should be made only in a single place. • Data redundancy, promotes an abnormal condition by forcing field value changes in many different locations. • Insertion anomalies • Deletion anomalies • Modification/Update anomalies
Functional Dependencies • An important concept associated with normalization is functional dependencywhich describes the relationship between attributes. • In this section, you will learn about functional dependency and then focus on the particular characteristics of functional dependency that are useful for normalization.
Functional Dependencies • Functional dependency can be divided into two types: • Full functional dependency/Partial dependency(PD) • Will be used to transform 1NF 2NF • Transitive dependency (TD) • Will be used to transform 2NF 3NF
Functional Dependencies • Multivalued Attributes (or repeating groups): non-key attributes or groups of non-key attributes the values of which are not uniquely identified by (directly or indirectly) (not functionally dependent on) the value of the Primary Key (or its part). 1st row 2nd row Relational Schema STUDENT(Stud_ID, Name, (Course_ID, Units))
Functional Dependencies • Partial Dependency – when an non-key attribute is determined by a part, but not the whole, of a COMPOSITE primary key (The Primary Key must be a Composite Key). Cust_ID → Name
Functional Dependencies • Transitive Dependency – when a non-key attribute determines another non-key attribute. Dept_ID → Dept_Name
B is functionally B A depends on A Functional Dependencies • Consider a relation with attributes A and B, where attribute B is functionally depends on attribute A. Let say an A is a PK of R. • To describe the relationship between attributes A and B is to say that “A functionally determines B”. R(A,B) A B
A functionally determines B B A Functional Dependencies • When a functional dependency exist, the attribute or group of attributes on the left-handed side of the arrow is called determinant. Determinant: Refers to the attributes, or a group of attributes, on the left handed side of the arrow of a functional dependency.
Functional Dependencies staff branch Determinant
position is functionally position staffNO depends on staffNO Functional Dependencies • Consider the attributes staffNOand position of the staff relation. • For a specific staffNO(S21), we can determine the position of that member of staff as Manager. • staffNOfunctionally determinesposition. Staff number (S21) Position (manager)
staffNO does not functionally staffNO position depends on position Functional Dependencies • However the next figure illustrate that the opposite is not true, as position does not functionally determinesstaffNO. • A member of staff holds one position; however, they maybe several members of staff with the same position. staff number (S21) Position(manager) staff number (S5)
Functional Dependencies Partial Dependencies: • Full functional dependencyindicates that if A and B are attributes of a relation, B is fully functionally dependent on A, if B is functionally dependent on A, but not on any proper subset of A. • staff(staffNO,sName,position,salary,branchNO) • staffNO, staffName branchNO • True!!! each value of (staffNO, sName) is associated with a single value of branchNO. • however, branchNO is also functionally dependent on staffNO.
Functional Dependencies Transitive Dependencies: • staff(staffNO,sName,position,salary,*branchNO) • branch(branchNO,bAddress) • staffNO sName,position,salary,branchNO,bAddress • branchNO bAddress • True for transitive dependency!!! branchNO → bAddress • exists on staffNO via branchNO
Normalization Process • Formal technique for analyzing relations based on their Primary Key(or candidate keys) and functional dependencies. • The techniqueexecuted as a series of steps (stage). Each step corresponds to a specific normal form, that have specific characteristic. • As normalization proceeds, the relations become progressively more restricted (stronger) in format and also less vulnerable to anomalies. Data Redundancies
Normalization Process • Relationship between Normalize Form Denormalization Figure 1: Diagrammatic illustration of the relationship between the normal forms Normalization
Normalization Process Data Sources Users Users’ requirements specification Forms/reports that are used or generated by the enterprise Sources describing the enterprise such as data dictionary and corporate data model Transfer attributes into table format Unnormalized Form (UNF) Remove repeating group First Normal Form (1NF) Remove partial dependencies Second Normal Form (2NF) Remove transitive dependencies Third Normal Form (3NF)
1)Repeat Group 2)PK is not defined UNF 1)Remove Repeat Group 2)Defined PK composite PK consist of attributes 1NF Normalization Process Normalization Process Relation/Table Format • Have repeating group-PK not defined (1 Table) (1 or 2 Tables) • No repeating group-PK defined-Test partial dependency Test for partial dependency If (exist) (2 or 3 Tables) (more then 1 table) (ab …. TD) 1 (a ……. TD) 2 (b ….… TD) 3 2NF • No repeating group-PK defined-No partial dependency-Test transitive dependency Test for transitive dependency If (exist) (a, b x, y) (a c, d) (b z) (c d) • No repeating group-PK defined-No partial dependency-No transitive dependency 3NF (3 or 4 Tables)
Normalization Process • Remember this!!! • Unnormalized(UNF): There are multivalued attributes or repeating groups, not CK (or PKs) • 1NF: NO multivalued attributes or repeating groups, has CK (or PKs) • 2NF: 1NF with NO Partial Dependencies (PD) • 3NF: 2NF with NO Transitive Dependencies (TD)
Unnormalized Form (UNF) • To create an unnormalized table • Transform the data from the information source (e.g. form) into table format with columns and rows. UnnormalizedForm = A table that contains one or more repeating UNF/0NF groups
First Normal Form (1NF) First Normal Form = A relation in which the intersection of each row 1NF and column contains one and only one value. • A relation is in 1NF if every attribute for every tuple have a value and domain for each attribute can not be simplified anymore.
Transforming UNF to 1NF UNF 1 2 To-do list: Remove repeating groups Identify Composite Key 1 1NF 2 3
UNF to 1NF • Nominate an attribute or group of attributes to act as the key for the unnormalized table. • Identify the repeating group(s) in the unnormalized table which repeats for the key attribute(s). • Step 1: Eliminate the Repeating Groups • Eliminate nulls: each repeating group attribute contains an appropriate data value • Step 2: Identify the Primary Key • Must uniquely identify attribute value • New key must be composed • Step 3: Identify All Dependencies • Dependencies are depicted with a diagram
UNF to 1NF Relational Schema STUDENT(Stud_ID, Name, (Course_ID, Units))
UNF to 1NF Key Attribute: Stud_ID, Course_ID Repeating Group: (Course_ID, Units)
UNF to 1NF • Ensure a single value at the intersection of each row and column • Enter appropriate student data in each row Relational Schema STUDENT(Stud_ID, Name, Course_ID, Units)
UNF to 1NF Key Attribute: Stud_ID, Course_ID Repeating Group: (Course_ID, Units)
UNF to 1NF • Dependency diagram: • Depicts all dependencies found within given table structure • Helpful in getting bird’s-eye view of all relationships among table’s attributes • Makes it less likely that you will overlook an important dependency
UNF to 1NF • Remove the repeating group by • Entering appropriate data into the empty columns of rows containing the repeating data • Fill the blanks by duplicating the non repeating data, where required. • This approach is commonly referred to as ”flatteningtable”. • This approach will produce redundancy in a relationship, but it can be eliminated in higher normalization process.
UNF to 1NF • Example: DreamHome Case Study • A collection of DreamHomeleases (rent) form is shown in Figure 2. The lease on top is for a client called Rannia who is leasing a property in Skudai, Johor, which is owned by Dollah. For this worked example, we assume that a client rents a given property only once and cannot rent more than one property at any one time. • Sample data is taken from two leases for two different clients called Ranniaand Ahmad and is transformed into table format with rows and columns, as shown in Figure 3. This is an example of unnormalized table.
House Address Rent Start Rent Finish Monthly Rent Alamat Rumah House Address Mula Sewa Rent Start Rent Finish Tamat Sewa Harga Sewa Monthly Rent No Pemilik Owner No Nama Pemilik Owner Name Figure 2 : Collection of Dream Home leases (rent) form Page 2 DREAMHOUSE LEASEDate: 28/02/2007 Client Rental Information Client Name : Ahmad Client Number : CR56 PG04 PG16 PG04 PG16 House No House No Subang Jaya, Selangor. Pasir Gudang, Johor. Skudai, Johor Ampang, Selangor 1/7/93 1/9/00 1/7/93 1/9/00 31/8/00 1/9/01 31/8/00 1/9/01 750 850 750 850 C040 C093 C040 C093 Dolah Abdullah Karim Fendi Kasim Selamat Page 1 DREAMHOUSE LEASEDate: 28/02/2007 Client Rental Information Client Name :RanniaClient Number : CR76 UNF to 1NF
clientNo cName houseAdd rentStart rentFinish rent ownerNo oName Figure 3 : ClientRentalUNF CR76 Rannia Skudai, Johor. Ampang, Selangor 1/7/93 1/9/00 31/8/00 1/9/01 750 850 C040 C093 Dolah Abdullah houseNo PG04 PG16 CR56 Ahmad Skudai, Johor. Kuantan, Pahang. Ampang, Selangor. 20/3/90 21/6/93 25/1/00 19/6/93 3/1/00 30/8/00 750 1000 850 C040 C093 C093 Dolah Abdullah Abdullah PG04 PG 36 PG 16 UNF to 1NF Key attribute: clientNo Repeating group in the unnormalized table as the property rented details, which repeat for each client: Repeating Group:( houseNo, houseAdd, rentStart, rentFinish, rent, ownerNo, oName)