430 likes | 751 Views
Chapter 14 Database Design Theory: Introduction to Normalization Using Functional Dependencies. Functional Dependencies. R : Relation schema. r(R) : Relation instance (state) A1, A2, …, An : Attributes of R. X, Y : Sets of attributes. X { A1, A2, …, An}. Functional Dependencies.
E N D
Chapter 14Database Design Theory:Introduction to Normalization Using Functional Dependencies
Functional Dependencies • R: Relation schema. • r(R): Relation instance (state) • A1, A2, …, An: Attributes of R. • X, Y: Sets of attributes. • X {A1, A2, …, An}
Functional Dependencies • A functional dependency (X Y), between X and Y • specifies a constraint on the possible tuples that can form a relation instance r of R. • The constraint states that for any two tuples t1 and t2 in r(R) • such that t1[X] = t2[X], then t1[Y] = t2[Y]. • This means that the values of Ydepends on (or determined by) the values of the X. • the values of the Xuniquely or functionally determines the values of Y.
Functional Dependencies • Note that: • if X Y, this does not say whether Y X is correct or not. • Student(SSN, STNO, Name, Major) • SSN Name • t1(91910, 980090012, Ahmed, …..) • t2(91910, 980090012, Ahmed, …..) • STNO Major • t10((980090012,…, Math) • t12(980090012, …, Math) • SSN {SSN, Name, STNO, Major}
Functional Dependencies 1 Manage 1 Dept Manager • Manager(SSN, Name, …. Dept) • SSN {Name, Dept} (correct) • SSN uniquely determines the Name and Dept • Dept SSN (correct) • Dept uniquely determines the SSN • Name SSN (incorrect)
Inference Rules for Functional Dependencies • Let • F: set of functional dependencies defined on R • F+ (Closure of F): is the set of all functional dependencies that can be defined on R • The closure of F is denoted by F+ • F+ = { X Y | F ╞X Y} • We usually write F ╞ X Y to denote that • the FD X Y is inferred from F • X, Y are subsets of attributes
Inference Rules for Functional Dependencies • Emp-Dept(SSN, Ename, Bdate, Address, DNo, Dname, MGR-SSN) F = { SSN {Ename,Bdate,Address,DNo}, DNo {Dname, MGR-SSN} } • We can infer the following FDs: • SSN SSN (Reflexive) • SSN Ename (Decomposition) • SSN {Dname, MGR-SSN} (Transitive) • A BIG F+ may be derived from a small F • For R(A, B, C) and F = {A B, B C} • F+ = {A B, B C, A C, A A, B B, C C, AB AB, AB A, AB B, ... }
Functional Dependency • Question: if all X-values are different in all possible r(R), does X Y in R? • Theorem 1: • If X is a superkey of R and Y is any subset of R, then X Y in R • If X is a candidate key then X Y is correct for any subset of attributes of R. • Note that X Y in R is a property that must be true for all possible legal r(R), not just for the present r(R) • Note that we cannot infer (نشتق) FDs from r(R)
Functional Dependency Example: which is true? A B C D A B a1 b1 c1 d1 A C a1 b2 c1 d2 C A a2 b2 c2 d2 A D a2 b3 c2 d3 B D a3 b3 c2 d4 AB D AB C AB CD
Identify Functional Dependency • FD created by assertions. • Employees(SSN, Name, YearOfEmp, Salary, Bonus) • Assertion: • Employees hired the same year have the same salary • This assertion implies: • YearsOfEmp Salary
Inference Rules • IR1: Reflexive Rule ( منعكس) • Y X, then X Y • e.g. • SSN SSN • {P#, S#, Qty} Qty • IR2: Augmentation Rule زيادة)) • { X Y} ╞ XZ YZ • e.g.: F = {SSN Address} • F ╞ {SSN, Name} {Address, Name}
Inference Rules • IR3: Transitive rule: (متعدّي) • {X Y, Y Z} ╞ X Z • e.g. • F = {SSN DNo, DNo Dname} • F ╞ SSN Dname • IR4: Decomposition Rule: (التّحلّل) • X {Y, Z} ╞ X Y and ╞ X Z • e.g. • F = {SSN {Ename, BDate, Address, DNo}} • F ╞ SSN Ename • ╞ SSN Bdate • ╞ SSN Address • ╞ SSN DNo
Inference Rules • IR5: Union (Additive) Rule: • {X Y, X Z} ╞ X {Y, Z} • e.g. • F = {SSN Ename, SSN Bdate} • F ╞ SSN {Ename, Bdate} • IR6: Pseudo-transitive Rule: • {X Y, WY Z} ╞ WX Z • e.g. • F = { SSN STNO, {Major, STNO} Name} • F ╞ {Major, SSN} Name
Closure of Attributes • How to determine if F ╞ X Y is true? • Compute X+ : the closure of X under F • X+ denotes the set of attributes that are functionally determined by X under F. • X+ = { Y | X Y F+ } • Theorem: • X Y F+ if and only if Y X+
Algorithm for Computing X+ • Input: • a set of FDs F, a set of attributes X in R • Output: • X+ • Begin • X+ = X; • Repeat • oldX+ = X+ • for each FD Y Z in F do • if Y X+ then X+ = X+Z; • until (X+ = oldX+ ) • end
Algorithm for Computing X+ • Example: • R(A, B, C, G, H, I) = ABCGHI • X = AG • F = {A B, CG HI, B H, A C } • Compute X+ = (AG)+ • Initialization: • X+ = AG;
Algorithm for Computing X+ • 1st iteration: X+ = AG • consider A B: • since A is a subset of X+, X+ = X+ {B} = ABG; • consider CG HI: • since CG is not a subset of X+, X+ = ABG; • consider B H: • since B is a subset of X+, X+=X+ {H} = ABGH; • consider A C: • since A is a subset of X+, X+=X+{C}=ABCGH; • X+ is changed from AG to ABCGH
Algorithm for Computing X+ • 2nd iteration: X+ = ABCGH • consider CG HI: • since CG is a subset of X+, X+=X+{HI}=ABCGHI; • X+ is changed from ABCGH to ABCGHI
Algorithm for Computing X+ • 3rd iteration: • consider each FD in F again, but there is no change to X+, exit • Result: • (AG)+ = ABCGHI.
Algorithm for Computing X+ • Theorem: • Given R(A1, ..., An) and a set of FDs F in R, K R is a • superkey if K+ = {A1, ..., An}; • candidate key if K is a superkey and for any proper subset X of K, X+{A1, ..., An}.
Algorithm for Computing X+ • Continue the above example: • AG is a superkey of R since • (AG)+ = ABCGHI. • Since A+ = ABCH, G+ = G, neither A nor G is a superkey. • Hence, AG is a candidate key
Normal Forms Based on Primary Keys • Normalization of data: • is a process during which unsatisfactory relation schemas are decomposed by breaking up their attributes into smaller relation schemas that posses desirable properties • We normalize data for several reasons one of them is to avoid update anomalies • We have • First Normal Form (1NF) • Second Normal Form (2NF) • Third Normal Form (3NF) • Boyce-Codd Normal Form (BCNF) (a stronger definition of 3NF) • All the above normal forms are based on functional dependencies.
Basic Definitions • R = {A1, …, An) • Superkey: is set of attributes S R and t1[S] t2[S] i =1, ..., n. S may contain redundant attributes. • Key (K): is a superkey with no redundant attributes, i.e. removal of any attribute from K will no longer make it a superkey. • Candidate Key: if a relation has more than one key, each is called a candidate key. One of these keys is arbitrarily chosen as a primary key. • Prime Attribute: is an attribute which is a member of any key (primary or candidate); other attributes are called nonprime attributes.
Basic Definitions • Student(SSN, STNO, Name, Address, Salary) • Superkeys • SSN • {SSN, Name} • {SSN, STNO, Name, Address, Salary} • Candidate keys • SSN • STNO • Primary Key • SSN • Prime Attribute: • SSN • STNO • Nonprime Attributes: • Name • Address • Salary
1NF (First Normal Form) • A relation schema R is in 1NF if every attribute of R takes only single and simple values. • In other words, multivalued and composite attributes are disallowed.
1NF (First Normal Form) • Student(STNO, StName, {Course(CNO, Ctitle)} • The set braces {} identify the attribute Course as multivalued • The set braces () identify the attribute Course as a composite attribute • Because of the last attribute (Course), the Student relation schema is not in 1NF.
1NF (First Normal Form) • Student(STNO, StName, {Course(CNO, Ctitle)} • To normalize it to 1NF: • Student(STNO, StName, CNO, Ctitle)
Second Normal Form (2NF) • Prime attribute: an attribute in any candidate key. • Y is fully functionally dependent on X if • X Y and • no proper subset of X functionally determines Y • FD X Y is a fully functional dependency if removal of any attribute A from X means that the dependency does not hold any more. • In other words (X - {A}) does not determine Y • A FD X Y is a partial FD if exist some attribute A which belongs to X and (X - {A}) Y still holds
Second Normal Form (2NF) • General Definition of 2NF • A relation schema R is in 2NF: • if everynonprime attribute A in R is not partially dependent on any key of R • Or • if every nonprime attribute is fully functionally dependent on every key of R
Second Normal Form (2NF) • This relation Emp-Proj is not in 2NF • Emp-Proj(SSN, Pnumber, Hours, Ename, Pname, Plocation) • FD1: {SSN, Pnumber} Hours (FD) • FD2: SSN Ename (PD) • FD3: Pnumber {Pname, Plocation} (PD) • Because of FD2 and FD3 • Convert to 2NF Normalization • Emp(SSN, Ename) • Proj(Pnumber, Pname, Plocation) • Work(SSN, Pnumber, Hours)
Second Normal Form (2NF) • Consider • Bank-Loans (Bank_name, Assets, Headquarter, Loan_no, Customer_name, Amount), • FD1: Bank_name {Assets, Headquarter} • FD2: {Bank_name, Loan_no} {Customer_name, Amount} • Because of FD1, Bank-Loans is not in 2NF. • 2NF Normalization • Banks(Bank_name, Assets, Headquarter) • Loans(Bank_name, Loan_no, Customer_name, Amount)
Second Normal Form (2NF) • 2NF is not good enough: • A relation schema in 2NF can still have serious redundancy problem as well as insertion and deletion anomalies. • Consider Parts(Part_no, Name, Location, Unit_price, Manu_id, Manu_name, Manu_Address) • It is obvious that Parts is in 2NF • Redundancy and various anomalies are introduced by • Manu_id {Manu_name, Manu_Address}
Second Normal Form (2NF) • Consider • EMP_DEPT(SSN, EName, BDate, Address, DNo, DName, DMGRSSN) • It is obvious that EMP_DEPT is in 2NF • Redundancy and various anomalies are introduced by • DNo {DName, DMGRSNN}
Third Normal Form (3NF) • A relation schema R is in 3NF if • for every FD X A, where A is a single attribute, at least one of the following is true: • (a) A X; • (b) A is a prime; • (c) X is a superkey • R is not in 3NF if a non-prime non-trivially depends on a non-superkey. • If R in 3NF, it should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes)
Third Normal Form (3NF) • Emp-Dept(SSN, Ename, Bdate, Address, Dnumber, Dname, DMGR-SSN) • FD1: SSN {Ename, Bdate, Address, Dnumber, Dname, DMGR-SSN} • FD2: Dnumber {Dname, DMGR-SSN} • Emp-Dept is in 1NF, 2NF, but because of • SSN Dnumber and • Dnumber Dname and • Dname is a nonprime attribute, and • Dnumber is not a superkey, • Emp-Dept is not in 3NF. • To transform Emp-Dept into 3NF: • Emp(Enam, SSN, Bdate, Address, Dnumber) • Dept(Dnumber, Dname, DMG-SSN)
Third Normal Form (3NF) • Employees (SSN, Name, Age, Salary, Dept_name, Dept_manager_SSN). • Employees is in 2NF • since SSN is the only candidate key and every attribute is fully dependent on it. • Employees is not in 3NF because • Dept_name Dept_manager_SSN
Third Normal Form (3NF) • LOTS(Property-ID#, County-Name, Lot#, Area, Price, Tax-Rate) • FD1: Property-ID# {County-Name,Lot#,Area, Price, Tax-Rate} • FD2: {County-Name,Lot#}{Property-ID#,Area,Price,Tax-Rate} • FD3: County-Name Tax-Rate • FD4: Area Price • 2 candidate keys: {Property-ID#},{County-Name, Lot#} • LOTS is not in 2NF, because of County-Name Tax-Rate • Tax-Rate is partially dependent on the candidate key {County-Name, Lot#}. • 2NF: • LOTS1(Property-ID#, County-Name, LOT#, Area, Price) • LOTS2(County-Name, Tax-Rate)
Third Normal Form (3NF) • The relation LOTS1 is not in 3NF, because of Area Price • Area is not a superkey and Price is not prime attribute • 3NF: • LOTS1A(Property-ID#, County-Name, Lot#, Area) • LOTS1B(Area, Price) • LOT2(County-Name, Tax-Rate) • The above relation schemas are in 3NF
Boyce-Codd Normal Form (BCNF) • Assume that we have thousands of lots, but two-counties: • Marion county and Liberty county. • Lot areas in Marion county are • .5, .6, .7, .8, .9 and 1 acres • Lot areas in Liberty county are • 1.1,1.2, …, 1.9, 2.0 acres • In this case we have : AREA County-Name • LOTS1A(Property-ID#, County-Name, Lot#, Area) • FD5: Area County-Name • Still in 3NF , since County-Name is a prime attribute
Boyce-Codd Normal Form (BCNF) • A relation schema R is in BCNF if whenever a FD X A holds in R, then X is a superkey of R. • R is in BCNF if for every non-trivial FD, the left side is a superkey. • LOTS1A-X(Property-ID#, Area, Lot#) • LOTS1A-Y(Area, County-Name) • To describe a relation schema R as “good” it should be at least in 3NF (general) or BCNF • If R is in BCNF, then R is also in 3NF • However, R in 3NF R in BCNF