Normalization

Normalization

Normalization • Normalization is the process of efficiently organizing data in a database with two goals in mind • First goal: eliminate redundant data • for example, storing the same data in more than one table • Second Goal: ensure data dependencies make sense • for example, only storing related data in a table

Benefits of Normalization Bad database designs results in: redundancy: inefficient storage. anomalies: data inconsistency, difficulties in maintenance • Less storage space • Quicker updates • Less data inconsistency • Clearer data relationships • Easier to add data • Flexible Structure

First Normal Form (1NF) • A database schema is in First Normal Form if all tables are flat Student Student Takes Course May needto add keys

Functional Dependencies • A form of constraint • hence, part of the schema • Finding them is part of the database design • Also used in normalizing the relations • Warning: this is the most abstract, and “hardest” part of the database design.

Functional dependency between A and B Functional Dependencies Definition: If two tuples agree on the attributes A1, A2, …, An then they must also agree on the attributes B1, B2, …, Bm Formally: A1, A2, …, An B1, B2, …, Bm

Examples • EmpID  Name, Phone, Position • Position  Phone • but Phone  Position EmpID Name Phone Position E0045 Smith 1234 Clerk E1847 John 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer

In General • To check A  B, erase all other columns

Example EmpID Name Phone Position E0045 Smith 1234 Clerk E1847 John 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer Position  Phone

Functional Dependencies • Important concept in differentiating good database designs from bad ones • FD is a generalization of the notion of keys • An FD is a set of attributes whose values uniquely determine the values of the remaining attributes. Emp(eno, ename, sal) key FDs:eno => ename Dept(dno, dname, floor) eno => sal Works-in(eno,dno, hours) (eno,dno) => hours dno => dname dno => floor

Functional Dependencies • Trivial dependencies:  =>   =>  if    • Closure • we need to consider all FDs • some are implied by others; e.g., FDs are transitive; if A=>B and B=>C, then A=>C • Given F = set of FDs, we want to find F’ (the closure of all FDs logically implied by F)

Inference Rules for FD’s A1, A2, …, An B1, B2, …, Bm Splitting rule and Combining rule Is equivalent to A1, A2, …, An B1 A1, A2, …, An B2 . . . . . A1, A2, …, An Bm

Inference Rules for FD’s(continued) Trivial Rule A1, A2, …, An Ai where i = 1, 2, ..., n Why ?

Inference Rules for FD’s(continued) Transitive Closure Rule A1, A2, …, An B1, B2, …, Bm If and B1, B2, …, Bm  C1, C2, …, Cp A1, A2, …, An C1, C2, …, Cp then Why ?

Closure of a set of FDs • It is not suff. to consider just the given set of FDs • We need to consider all FDs that hold • Given F, more FDs can be inferred • Such FDs are said to be logically implied by F • F+ is the set of all FDs logically implied by F • We can compute F+using formal defn. of FD • If F were large, this process would be lengthy & cumbersome • Axioms or Rules of Inference provide simpler technique • Armstrong;s Axioms

Inference Rules for FDs Armstrong's inference rules: IR1. (Reflexive) If Y  X, then X Y IR2. (Augmentation) If X Y, then XZ YZ (Notation: XZ stands for X U Z) IR3. (Transitive) If X Y and Y Z, then X Z IR1, IR2, IR3 form a sound & complete set of inference rules Never generates any wrong FD Generate all FDs that hold

Inference Rules for FDs Some additional inference rules that are useful: Decomposition: If XYZ, then XY & XZ Union: If XY & XZ, then XYZ Psuedotransitivity: If XY & WYZ,then WXZ • The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property)

Example • R = (A, B, C, G, H, I)F = { A BA CCG HCG IB H} • some members of F+ • A H • by transitivity from A B and B H • AG I • by augmenting A C with G, to get AG CG and then transitivity with CG I • CG HI • By union rule

2NF – Example - 1 • Inventory (Item, Supplier, Cost, Supplier Address) • We first check if Cost is fully functionally dependent upon the ENTIRE Primary-Key • If I know just Item, can I find out Cost? • No. We can have > 1 supplier for the same product. • If I know just Supplier, and I find out Cost? • No. We need to know what the Item is as well. • So, Cost is fully functionally dependent upon the ENTIRE Primary-Key

2NF – Example - 2 • Inventory (Item, Supplier, Cost, Supplier Address) • We then check if Supplier Address is fully functionally dependent upon the ENTIRE Primary-Key • If I know just Item, can I find out Supplier Address? • No. We can have > 1 supplier for the same product. • If I know just Supplier, and I find out Supplier Address? • Yes. The supplier’s address does not depend on the Item. • So, Supplier Address is NOT fully functionally dependent upon the ENTIRE Primary-Key  NOT 2NF

So putting things together The above relation is now in 2NF since the relation has no non-key attributes.

Transitive Dependence Give a relation R, Assume the following FD hold: Note : Both Ename and Address attributes are non-key attributes in R, and since Address depends on a non-Prime attribute Name, which depends on the primary key(EmpNo), a transitive dependency exists R2 R1

Database Normalization • Boyce-Codd Normal Form (BCNF) • A relation is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key. (A determinant is any attribute whose value determines other values with a row.) • If a table contains only one candidate key, the 3NF and the BCNF are equivalent. • BCNF is a special case of 3NF.

A Table That Is In 3NF But Not In BCNF

The Decomposition of a Table Structure to Meet BCNF Requirements

Sample Data for a BCNF Conversion

Decomposition into BCNF

BCNF • Based on FDs that take into account all candidate keys of a relation • For a relation with only 1 CK, 3NF & BCNF are equivalent • A relation is said to be in BCNF if every determinant is a CK • Is PLOTS in BCNF? • NO

BCNF vs 3NF • BCNF: For every functional dependency X->Y in a set F of functional dependencies over relation R, either: • Y is a subset of X or, • X is a superkey of R • 3NF: For every functional dependency X->Y in a set F of functional dependencies over relation R, either: • Y is a subset of X or, • X is a superkey of R, or • Y is a subset of K for some key K of R • N.b., no subset of a key is a key

3NF Schema For every functional dependency X->Y in a set F of functional dependencies over relation R, either: • Y is a subset of X or, • X is a superkey of R, or • Y is a subset of K for some key K of R Client, Office -> Client, Office, Account Account -> Office

Lossless decomposition Account -> Office No non-trivial FDs BCNF vs 3NF 3NF has some redundancy BCNF does not Unfortunately, BCNF is not dependency preserving, but 3NF is For every functional dependency X->Y in a set F of functional dependencies over relation R, either: • Y is a subset of X or, • X is a superkey of R • Y is a subset of K for some key K of R Client, Office -> Client, Office, Account Account -> Office

Closure of a set of FDs • Given a set of FDs F on a relation R, it may be possible that several other FDs must also hold for R • For Example, R=(A,B,C) & FDs, AB & BC hold in R, then FD AC also holds on R • For a given value of A, there can be only one corresponding value of B, & for that value of B, there can be only one corresponding value for C • The closure of F is the set of all FDs that can be inferred from F, & is denoted by F+

Problem 1 • Consider the relation R(A,B,C) with functional dependencies ABC and CB. • Is R in 2NF? • Is R in 3NF? • Is R in BCNF?

Normalization

Normalization

Presentation Transcript

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

Normalization

NORMALIZATION

Normalization

Normalization

Normalization

Normalization