350 likes | 475 Views
Lecture 5 Term 2. Normalization . Overview . One of the principal objectives of RDBMS is to ensure that each item of data is only held once in the database. Minimise the amount of space required to hold the database Simplify the maintenance of data. Branch. Staff. Primary Key.
E N D
Lecture 5 Term 2 Normalization
Overview • One of the principal objectives of RDBMS is to ensure that each item of data is only held once in the database. • Minimise the amount of space required to hold the database • Simplify the maintenance of data
Branch Staff
Primary Key Foreign Key Related Columns
Null Values • Represents a value for an attribute that is currently unknown or is not applicable for this record. • Nulls can be taken to mean unknown values. • It may also mean that a value is not applicable to that particular record or the value has yet to be supplied. • Nulls are a way to deal with incomplete or exceptional data.
Integrity Rules • Integrity Rules are restrictions on the set of attributes allowed for the attributes of the entity/table. • There are two important integrity rules for the relational model are known as • entity integrity • referential integrity
Entity Integrity • Applies to the primary keys of the base tables • If a null value was inserted as a value for the primary key it is implying that not all the columns are needed to differentiate between the records
Entity Integrity • Example. • The branchNo is the primary key of the Branch table, we should not be able to insert a record into the Branch table with a null value for the branchNocolumn
Referential Integrity • This integrity rule applies to foreign keys.
Referential Integrity • Example. • branchNo in the Staff table is a foreign key targeting the branchNo column in the home table, Branch. • It should not be possible to create a staff record with branch number B300, for example, unless there is already a record for branch number B300 in the Branch table. • However, a new staff record with a null branch number to allow for the situation where a new member of staff has joined the company but has not yet been assigned to a particular branch.
What is a Good Data Model? • A good data model is simple • Attributes that describe any given entity should describe only that entity • Each attribute of an entity instance can have only one value
What is a Good Data Model? 2. A good data model is essentially nonredundant. • This means: • Each data attribute, other than foreign keys, describes at most one entity. • Look for the same attribute recorded more than once under different names. 3. A good data model should be flexible and adaptable to future needs.
Data Analysis & Normalization Data analysis – a technique used to improve a data model for implementation as a database. Goal is a simple, to eliminate redundancy, in order to create a flexible and adaptable database. Normalization – a data analysis technique that organizes data into groups to form nonredundant, stable, flexible, and adaptive entities.
Normalization • Convert complex tables into simpler tables • Why? • Ensures tables conform to rules • Ensures tables contains facts about one “theme” • Only a simple check IF good data model exists • Test the attributes using criteria defined over a couple of stages
Normalization • Anomalies – error prone situations that arise when tables are processes i.e. update, insert, delete • Insertion Anomalies • are experienced when we attempt to store a value for one field but cannot because the value of another field is unknown • Deletion Anomalies • are experienced when a value for one field we wish to keep is unexpectedly removed when a value for another field is deleted • Update Anomalies • are experienced when changes to multiple records of a table are needed to effect an update to a single value of a field
Example *This example assumes that a student can be on more than one course)
Anomalies • Insert • We cannot record the existence of a new course without a student being assigned to it • If we do we have an entity integrity issue Student No - Null • Update • If Jones moves to Aberdeen we must be sure to update every instance of Jones’ address • Delete • We cannot delete a student without loosing course details also
Normalization • Every attribute is dependent on: • the key (1NF) • the WHOLE key (2NF) • and nothing but the key (3NF)
First Normal Form (1NF) • An entity is in first normal form (1NF) • primary key must be assigned to table • if there are no attributes that repeat for a single occurrence of the entity • every attribute value is atomic • What are the issues here?
Normalisation • First Normal Form (1NF) says that all column values must be atomic. • 1NF dictates that, for every row-by-column position in a given table, there exists only one value, not an array or list of values. • Benefits: • If lists of values are stored in a single column, there is no simple way to manipulate those values. • Retrieval of data becomes much more laborious and difficult to generalize.
Dependencies • Functional dependency • when the value of one attribute can be determined based on the value of another attribute • e.g. attributes dependent in the primary key • Partial functional dependency • when a non-key attribute is functionally dependent on a part of the PK (i.e. attributes are dependent on only one piece of a concatenated or composite key)
Second Normal Form • Defined: • 1NF (removed all repeating groups of attributes) • Every non-key attribute is fully functionally dependent on the primary key • Note: • Tables with simple keys are automatically in 2NF
2NF • To correct: • decompose into 2 or more relations (tables) (if not already done) • one with original (concatenated) key + attributes • one (or more) with the “depended on” partial key as PK + attributes
2NF • Most tables have a single attribute primary key e.g. • Concert table below • 2 NF – every attribute should be dependent on the entire key, in this case Venue and Artist
Style attribute is only dependent on Artist pk Ensure that no redundant data is being stored Two points are worth noting here. • Don’t throw away information. In fact, this form of decomposition is termed non-loss decomposition because no information is sacrificed to the normalization process. • Decompose the tables in such a way as to allow them to be put back together again using queries.
Third Normal Form • Defined: • 2NF • No transitive dependencies • Transitive Dependency • A functional dependency between non-key attributes • i.e. an attribute is dependent on the existence of another attribute • Basically you have extra or misplaced attributes in your table • Can cause all types of update anomalies
What is Denormalization? • Normalization causes an explosion of tables • Sometimes normalization just isn’t worth it • Overly complex schema • Excessive resource overhead (from overly complex schema) • Excessively complex database operations • Extra reads • Extra Joins • Denormalize by combining tables for performance reasons