210 likes | 386 Views
Bad DB Design. Duplicate of data Updating Deleting. Redundant. Deleting. Update. Normalization.
E N D
Bad DB Design • Duplicate of data • Updating • Deleting
Normalization • Normalization is a design technique that is widely used as a guide in designing relational databases. Normalization is essentially a two step process that puts data into tabular form by removing repeating groups and then removes duplicated data from the relational tables. • Normalization theory is based on the concepts of normal forms. A relational table is said to be a particular normal form if it satisfied a certain set of constraints. There are currently five normal forms that have been defined. In this course, we will cover the first three normal forms
cont • The goal of normalization is to create a set of relational tables that are free of redundant data and that can be consistently and correctly modified. This means that all tables in a relational database should be in the third normal form (3NF).
Normalization • A relational table is in 3NF if and only if all non-key columns are: • mutually independent and • fully dependent upon the primary key • Mutual independence means that no non-key column is dependent upon any combination of the other columns • The first two normal forms are intermediate steps to achieve the goal of having all tables in 3NF • In order to better understand the 2NF and higher forms, it is necessary to understand the concepts of functional dependencies
Functional Dependencies • The concept of functional dependencies is the basis for the first three normal forms. A column, Y, of the relational table R is said to be functionally dependent upon column X of R if and only if each value of X in R is associated with precisely one value of Y at any given time. X and Y may be composite. Saying that column Y is functionally dependent upon X is the same as saying the values of column X identify the values of column Y. If column X is a primary key, then all columns in the relational table R must be functionally dependent upon X. • A short-hand notation for describing a functional dependency is: R.x —> R.y • which can be read as in the relational table named R, column x functionally determines (identifies) column y.
Functional Dependencies, example • Motivation: “normalization,” the process where we break a relation schema into two or more schemas. • Example: ABCD with FD’s AB ->C, C ->D, and D ->A. • Decompose into ABC, AD. What FD’s hold in ABC ? • Not only AB ->C, but also C ->A ! • AB ->C and C ->B. • Example: A = street address, B = city, C = zip code. • There are two keys, {A,B } and {A,C }.
Example FD • Drinkers(name, addr, drinkLiked, manf, favdrink). • Reasonable FD’s to assert: • name -> addr • name -> favdrink • drinkLiked -> manf
Because name -> favBeer Because name -> addr Because beersLiked -> manf Example DF name addr drinkLiked manf favDrink Janeway Voyager Bud A.B. WickedAle Janeway Voyager WickedAle Pete’s WickedAle Spock Enterprise Bud A.B. Bud
FD’s With Multiple Attributes • No need for FD’s with > 1 attribute on right. • But sometimes convenient to combine FD’s as a shorthand. • Example: name -> addr and name -> favDrink become name -> addr favDrink • > 1 attribute on left may be essential. • Example: Resturnt Drink -> price
Example, Cont • Consider relation Drinkers(name, addr, drinkLiked, manf, favdrink). • {name, beersLiked} is a superkey because together these attributes determine all the other attributes. • name -> addr favBeer • beersLiked -> manf • {name, drinksLiked} is a key because neither {name} nor {drinkLiked} is a superkey. • name doesn’t -> manf; drinkLiked doesn’t -> addr.
Basic Idea • To know what FD’s hold in a projection, we start with given FD’s and find all FD’s that follow from given ones. • Then, restrict to those FD’s that involve only attributes of the projected schema.
normalization • What is normalization? Basically, it's the process of efficiently organizing data in a database. • There are two goals of the normalization process: • Eliminate redundant data (for example, storing the same data in more than one table) and • Ensure data dependencies make sense (only storing related data in a table). • Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.
First Normalization Form 1FN • Eliminate duplicative columns from the same table. BY the values in each column of a table are atomic. By atomic we mean that there are no sets of values within a column. • Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
example Book (ISBN, Title, Pages) Author(Author_ID, First_Name, Last_name) Subject(Subject_ID, Name) Publisher (Publisher_ID, Name, Address, City, State, Zip) The relationship between the Book table and the Author table is a many-to-many relationship: Book_Author (ISBN, Author_ID) Book_Subject (ISBN Subject_ID) One-to-many relationship exists between the Book table and the Publisher table: Book (ISBN, Title, Pages, Publisher_ID)
Second normal form 2NF • Where the First Normal Form deals with atomicity of data, the Second Normal Form (or 2NF) deals with relationships between composite key columns and non-key columns: • Meet all the requirements of the first normal form. • Any non-key columns must depend on the entire primary key. In the case of a composite primary key, this means that a non-key column cannot depend on only part of the composite key. • Create relationships between these new tables and their predecessors through the use of foreign keys. • A relation R is in 2nf if every non-primary attribute A in R is fully Functionally dependent on the primary key.
Example 2NF Registration Student
Third normal form 3NF • Remove columns that are not dependent upon the primary key. • Third Normal Form (3NF) requires that all columns depend directly on the primary key. • Example: • Publisher (Publisher_ID, Name, Address, City, State, Zip) • Zip (Zip, City, State)
Example 3NF • In the last example, Adv-Room (the advisor's office number) is functionally dependent on the Advisor attribute. The solution is to move that attribute from the Students table to the Faculty table, as shown below: Faculty Student