380 likes | 658 Views
Normalization. Normalization. Normalization is the process of efficiently organizing data in a database with two goals in mind First goal: eliminate redundant data for example, storing the same data in more than one table Second Goal: ensure data dependencies make sense
E N D
Normalization • Normalization is the process of efficiently organizing data in a database with two goals in mind • First goal: eliminate redundant data • for example, storing the same data in more than one table • Second Goal: ensure data dependencies make sense • for example, only storing related data in a table
Benefits of Normalization Bad database designs results in: redundancy: inefficient storage. anomalies: data inconsistency, difficulties in maintenance • Less storage space • Quicker updates • Less data inconsistency • Clearer data relationships • Easier to add data • Flexible Structure
First Normal Form (1NF) • A database schema is in First Normal Form if all tables are flat Student Student Takes Course May needto add keys
Functional Dependencies • A form of constraint • hence, part of the schema • Finding them is part of the database design • Also used in normalizing the relations • Warning: this is the most abstract, and “hardest” part of the database design.
Functional dependency between A and B Functional Dependencies Definition: If two tuples agree on the attributes A1, A2, …, An then they must also agree on the attributes B1, B2, …, Bm Formally: A1, A2, …, An B1, B2, …, Bm
Examples • EmpID Name, Phone, Position • Position Phone • but Phone Position EmpID Name Phone Position E0045 Smith 1234 Clerk E1847 John 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer
In General • To check A B, erase all other columns
Example EmpID Name Phone Position E0045 Smith 1234 Clerk E1847 John 9876 Salesrep E1111 Smith 9876 Salesrep E9999 Mary 1234 Lawyer Position Phone
Functional Dependencies • Important concept in differentiating good database designs from bad ones • FD is a generalization of the notion of keys • An FD is a set of attributes whose values uniquely determine the values of the remaining attributes. Emp(eno, ename, sal) key FDs:eno => ename Dept(dno, dname, floor) eno => sal Works-in(eno,dno, hours) (eno,dno) => hours dno => dname dno => floor
Functional Dependencies • Trivial dependencies: => => if • Closure • we need to consider all FDs • some are implied by others; e.g., FDs are transitive; if A=>B and B=>C, then A=>C • Given F = set of FDs, we want to find F’ (the closure of all FDs logically implied by F)
Inference Rules for FD’s A1, A2, …, An B1, B2, …, Bm Splitting rule and Combining rule Is equivalent to A1, A2, …, An B1 A1, A2, …, An B2 . . . . . A1, A2, …, An Bm
Inference Rules for FD’s(continued) Trivial Rule A1, A2, …, An Ai where i = 1, 2, ..., n Why ?
Inference Rules for FD’s(continued) Transitive Closure Rule A1, A2, …, An B1, B2, …, Bm If and B1, B2, …, Bm C1, C2, …, Cp A1, A2, …, An C1, C2, …, Cp then Why ?
Closure of a set of FDs • It is not suff. to consider just the given set of FDs • We need to consider all FDs that hold • Given F, more FDs can be inferred • Such FDs are said to be logically implied by F • F+ is the set of all FDs logically implied by F • We can compute F+using formal defn. of FD • If F were large, this process would be lengthy & cumbersome • Axioms or Rules of Inference provide simpler technique • Armstrong;s Axioms
Inference Rules for FDs Armstrong's inference rules: IR1. (Reflexive) If Y X, then X Y IR2. (Augmentation) If X Y, then XZ YZ (Notation: XZ stands for X U Z) IR3. (Transitive) If X Y and Y Z, then X Z IR1, IR2, IR3 form a sound & complete set of inference rules Never generates any wrong FD Generate all FDs that hold
Inference Rules for FDs Some additional inference rules that are useful: Decomposition: If XYZ, then XY & XZ Union: If XY & XZ, then XYZ Psuedotransitivity: If XY & WYZ,then WXZ • The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property)
Example • R = (A, B, C, G, H, I)F = { A BA CCG HCG IB H} • some members of F+ • A H • by transitivity from A B and B H • AG I • by augmenting A C with G, to get AG CG and then transitivity with CG I • CG HI • By union rule
2NF – Example - 1 • Inventory (Item, Supplier, Cost, Supplier Address) • We first check if Cost is fully functionally dependent upon the ENTIRE Primary-Key • If I know just Item, can I find out Cost? • No. We can have > 1 supplier for the same product. • If I know just Supplier, and I find out Cost? • No. We need to know what the Item is as well. • So, Cost is fully functionally dependent upon the ENTIRE Primary-Key
2NF – Example - 2 • Inventory (Item, Supplier, Cost, Supplier Address) • We then check if Supplier Address is fully functionally dependent upon the ENTIRE Primary-Key • If I know just Item, can I find out Supplier Address? • No. We can have > 1 supplier for the same product. • If I know just Supplier, and I find out Supplier Address? • Yes. The supplier’s address does not depend on the Item. • So, Supplier Address is NOT fully functionally dependent upon the ENTIRE Primary-Key NOT 2NF
So putting things together The above relation is now in 2NF since the relation has no non-key attributes.
Transitive Dependence Give a relation R, Assume the following FD hold: Note : Both Ename and Address attributes are non-key attributes in R, and since Address depends on a non-Prime attribute Name, which depends on the primary key(EmpNo), a transitive dependency exists R2 R1
Database Normalization • Boyce-Codd Normal Form (BCNF) • A relation is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key. (A determinant is any attribute whose value determines other values with a row.) • If a table contains only one candidate key, the 3NF and the BCNF are equivalent. • BCNF is a special case of 3NF.
The Decomposition of a Table Structure to Meet BCNF Requirements
BCNF • Based on FDs that take into account all candidate keys of a relation • For a relation with only 1 CK, 3NF & BCNF are equivalent • A relation is said to be in BCNF if every determinant is a CK • Is PLOTS in BCNF? • NO
BCNF vs 3NF • BCNF: For every functional dependency X->Y in a set F of functional dependencies over relation R, either: • Y is a subset of X or, • X is a superkey of R • 3NF: For every functional dependency X->Y in a set F of functional dependencies over relation R, either: • Y is a subset of X or, • X is a superkey of R, or • Y is a subset of K for some key K of R • N.b., no subset of a key is a key
3NF Schema For every functional dependency X->Y in a set F of functional dependencies over relation R, either: • Y is a subset of X or, • X is a superkey of R, or • Y is a subset of K for some key K of R Client, Office -> Client, Office, Account Account -> Office
3NF Schema For every functional dependency X->Y in a set F of functional dependencies over relation R, either: • Y is a subset of X or, • X is a superkey of R, or • Y is a subset of K for some key K of R Client, Office -> Client, Office, Account Account -> Office
Lossless decomposition Account -> Office No non-trivial FDs BCNF vs 3NF 3NF has some redundancy BCNF does not Unfortunately, BCNF is not dependency preserving, but 3NF is For every functional dependency X->Y in a set F of functional dependencies over relation R, either: • Y is a subset of X or, • X is a superkey of R • Y is a subset of K for some key K of R Client, Office -> Client, Office, Account Account -> Office
Closure of a set of FDs • Given a set of FDs F on a relation R, it may be possible that several other FDs must also hold for R • For Example, R=(A,B,C) & FDs, AB & BC hold in R, then FD AC also holds on R • For a given value of A, there can be only one corresponding value of B, & for that value of B, there can be only one corresponding value for C • The closure of F is the set of all FDs that can be inferred from F, & is denoted by F+
Problem 1 • Consider the relation R(A,B,C) with functional dependencies ABC and CB. • Is R in 2NF? • Is R in 3NF? • Is R in BCNF?