Chapter 14 Database Design Theory: Introduction to Normalization Using Functional Dependencies

Chapter 14Database Design Theory:Introduction to Normalization Using Functional Dependencies

Functional Dependencies • R: Relation schema. • r(R): Relation instance (state) • A1, A2, …, An: Attributes of R. • X, Y: Sets of attributes. • X  {A1, A2, …, An}

Functional Dependencies • A functional dependency (X  Y), between X and Y • specifies a constraint on the possible tuples that can form a relation instance r of R. • The constraint states that for any two tuples t1 and t2 in r(R) • such that t1[X] = t2[X], then t1[Y] = t2[Y]. • This means that the values of Ydepends on (or determined by) the values of the X. • the values of the Xuniquely or functionally determines the values of Y.

Functional Dependencies • Note that: • if X  Y, this does not say whether Y X is correct or not. • Student(SSN, STNO, Name, Major) • SSN  Name • t1(91910, 980090012, Ahmed, …..) • t2(91910, 980090012, Ahmed, …..) • STNO  Major • t10((980090012,…, Math) • t12(980090012, …, Math) • SSN  {SSN, Name, STNO, Major}

Functional Dependencies 1 Manage 1 Dept Manager • Manager(SSN, Name, …. Dept) • SSN  {Name, Dept} (correct) • SSN uniquely determines the Name and Dept • Dept  SSN (correct) • Dept uniquely determines the SSN • Name  SSN (incorrect)

Inference Rules for Functional Dependencies • Let • F: set of functional dependencies defined on R • F+ (Closure of F): is the set of all functional dependencies that can be defined on R • The closure of F is denoted by F+ • F+ = { X  Y | F ╞X  Y} • We usually write F ╞ X  Y to denote that • the FD X Y is inferred from F • X, Y are subsets of attributes

Inference Rules for Functional Dependencies • Emp-Dept(SSN, Ename, Bdate, Address, DNo, Dname, MGR-SSN) F = { SSN  {Ename,Bdate,Address,DNo}, DNo  {Dname, MGR-SSN} } • We can infer the following FDs: • SSN  SSN (Reflexive) • SSN  Ename (Decomposition) • SSN {Dname, MGR-SSN} (Transitive) • A BIG F+ may be derived from a small F • For R(A, B, C) and F = {A  B, B  C} • F+ = {A  B, B  C, A  C, A  A, B  B, C  C, AB  AB, AB  A, AB  B, ... }

Functional Dependency • Question: if all X-values are different in all possible r(R), does X  Y in R? • Theorem 1: • If X is a superkey of R and Y is any subset of R, then X  Y in R • If X is a candidate key then X  Y is correct for any subset of attributes of R. • Note that X  Y in R is a property that must be true for all possible legal r(R), not just for the present r(R) • Note that we cannot infer (نشتق) FDs from r(R)

Functional Dependency Example: which is true? A B C D A  B a1 b1 c1 d1 A  C a1 b2 c1 d2 C  A a2 b2 c2 d2 A  D a2 b3 c2 d3 B  D a3 b3 c2 d4 AB  D AB  C AB  CD

Identify Functional Dependency • FD created by assertions. • Employees(SSN, Name, YearOfEmp, Salary, Bonus) • Assertion: • Employees hired the same year have the same salary • This assertion implies: • YearsOfEmp  Salary

Inference Rules • IR1: Reflexive Rule ( منعكس) • Y  X, then X Y • e.g. • SSN  SSN • {P#, S#, Qty}  Qty • IR2: Augmentation Rule زيادة)) • { X  Y} ╞ XZ  YZ • e.g.: F = {SSN  Address} • F ╞ {SSN, Name}  {Address, Name}

Inference Rules • IR3: Transitive rule: (متعدّي) • {X  Y, Y  Z} ╞ X  Z • e.g. • F = {SSN  DNo, DNo  Dname} • F ╞ SSN  Dname • IR4: Decomposition Rule: (التّحلّل) • X  {Y, Z} ╞ X  Y and ╞ X  Z • e.g. • F = {SSN  {Ename, BDate, Address, DNo}} • F ╞ SSN  Ename • ╞ SSN  Bdate • ╞ SSN  Address • ╞ SSN  DNo

Inference Rules • IR5: Union (Additive) Rule: • {X  Y, X  Z} ╞ X  {Y, Z} • e.g. • F = {SSN  Ename, SSN  Bdate} • F ╞ SSN  {Ename, Bdate} • IR6: Pseudo-transitive Rule: • {X  Y, WY  Z} ╞ WX  Z • e.g. • F = { SSN  STNO, {Major, STNO}  Name} • F ╞ {Major, SSN}  Name

Closure of Attributes • How to determine if F ╞ X  Y is true? • Compute X+ : the closure of X under F • X+ denotes the set of attributes that are functionally determined by X under F. • X+ = { Y | X  Y  F+ } • Theorem: • X Y  F+ if and only if Y  X+

Algorithm for Computing X+ • Input: • a set of FDs F, a set of attributes X in R • Output: • X+ • Begin • X+ = X; • Repeat • oldX+ = X+ • for each FD Y Z in F do • if Y  X+ then X+ = X+Z; • until (X+ = oldX+ ) • end

Algorithm for Computing X+ • Example: • R(A, B, C, G, H, I) = ABCGHI • X = AG • F = {A  B, CG  HI, B  H, A  C } • Compute X+ = (AG)+ • Initialization: • X+ = AG;

Algorithm for Computing X+ • 1st iteration: X+ = AG • consider A  B: • since A is a subset of X+, X+ = X+ {B} = ABG; • consider CG  HI: • since CG is not a subset of X+, X+ = ABG; • consider B  H: • since B is a subset of X+, X+=X+ {H} = ABGH; • consider A C: • since A is a subset of X+, X+=X+{C}=ABCGH; • X+ is changed from AG to ABCGH

Algorithm for Computing X+ • 2nd iteration: X+ = ABCGH • consider CG  HI: • since CG is a subset of X+, X+=X+{HI}=ABCGHI; • X+ is changed from ABCGH to ABCGHI

Algorithm for Computing X+ • 3rd iteration: • consider each FD in F again, but there is no change to X+, exit • Result: • (AG)+ = ABCGHI.

Algorithm for Computing X+ • Theorem: • Given R(A1, ..., An) and a set of FDs F in R, K  R is a • superkey if K+ = {A1, ..., An}; • candidate key if K is a superkey and for any proper subset X of K, X+{A1, ..., An}.

Algorithm for Computing X+ • Continue the above example: • AG is a superkey of R since • (AG)+ = ABCGHI. • Since A+ = ABCH, G+ = G, neither A nor G is a superkey. • Hence, AG is a candidate key

Normal Forms Based on Primary Keys • Normalization of data: • is a process during which unsatisfactory relation schemas are decomposed by breaking up their attributes into smaller relation schemas that posses desirable properties • We normalize data for several reasons one of them is to avoid update anomalies • We have • First Normal Form (1NF) • Second Normal Form (2NF) • Third Normal Form (3NF) • Boyce-Codd Normal Form (BCNF) (a stronger definition of 3NF) • All the above normal forms are based on functional dependencies.

Basic Definitions • R = {A1, …, An) • Superkey: is set of attributes S  R and t1[S]  t2[S] i =1, ..., n. S may contain redundant attributes. • Key (K): is a superkey with no redundant attributes, i.e. removal of any attribute from K will no longer make it a superkey. • Candidate Key: if a relation has more than one key, each is called a candidate key. One of these keys is arbitrarily chosen as a primary key. • Prime Attribute: is an attribute which is a member of any key (primary or candidate); other attributes are called nonprime attributes.

Basic Definitions • Student(SSN, STNO, Name, Address, Salary) • Superkeys • SSN • {SSN, Name} • {SSN, STNO, Name, Address, Salary} • Candidate keys • SSN • STNO • Primary Key • SSN • Prime Attribute: • SSN • STNO • Nonprime Attributes: • Name • Address • Salary

1NF (First Normal Form) • A relation schema R is in 1NF if every attribute of R takes only single and simple values. • In other words, multivalued and composite attributes are disallowed.

1NF (First Normal Form) • Student(STNO, StName, {Course(CNO, Ctitle)} • The set braces {} identify the attribute Course as multivalued • The set braces () identify the attribute Course as a composite attribute • Because of the last attribute (Course), the Student relation schema is not in 1NF.

1NF (First Normal Form) • Student(STNO, StName, {Course(CNO, Ctitle)} • To normalize it to 1NF: • Student(STNO, StName, CNO, Ctitle)

Second Normal Form (2NF) • Prime attribute: an attribute in any candidate key. • Y is fully functionally dependent on X if • X  Y and • no proper subset of X functionally determines Y • FD X  Y is a fully functional dependency if removal of any attribute A from X means that the dependency does not hold any more. • In other words (X - {A}) does not determine Y • A FD X  Y is a partial FD if exist some attribute A which belongs to X and (X - {A})  Y still holds

Second Normal Form (2NF) • General Definition of 2NF • A relation schema R is in 2NF: • if everynonprime attribute A in R is not partially dependent on any key of R • Or • if every nonprime attribute is fully functionally dependent on every key of R

Second Normal Form (2NF) • This relation Emp-Proj is not in 2NF • Emp-Proj(SSN, Pnumber, Hours, Ename, Pname, Plocation) • FD1: {SSN, Pnumber}  Hours (FD) • FD2: SSN Ename (PD) • FD3: Pnumber {Pname, Plocation} (PD) • Because of FD2 and FD3 • Convert to 2NF Normalization • Emp(SSN, Ename) • Proj(Pnumber, Pname, Plocation) • Work(SSN, Pnumber, Hours)

Second Normal Form (2NF) • Consider • Bank-Loans (Bank_name, Assets, Headquarter, Loan_no, Customer_name, Amount), • FD1: Bank_name {Assets, Headquarter} • FD2: {Bank_name, Loan_no}  {Customer_name, Amount} • Because of FD1, Bank-Loans is not in 2NF. • 2NF Normalization • Banks(Bank_name, Assets, Headquarter) • Loans(Bank_name, Loan_no, Customer_name, Amount)

Second Normal Form (2NF) • 2NF is not good enough: • A relation schema in 2NF can still have serious redundancy problem as well as insertion and deletion anomalies. • Consider Parts(Part_no, Name, Location, Unit_price, Manu_id, Manu_name, Manu_Address) • It is obvious that Parts is in 2NF • Redundancy and various anomalies are introduced by • Manu_id  {Manu_name, Manu_Address}

Second Normal Form (2NF) • Consider • EMP_DEPT(SSN, EName, BDate, Address, DNo, DName, DMGRSSN) • It is obvious that EMP_DEPT is in 2NF • Redundancy and various anomalies are introduced by • DNo  {DName, DMGRSNN}

Third Normal Form (3NF) • A relation schema R is in 3NF if • for every FD X  A, where A is a single attribute, at least one of the following is true: • (a) A  X; • (b) A is a prime; • (c) X is a superkey • R is not in 3NF if a non-prime non-trivially depends on a non-superkey. • If R in 3NF, it should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes)

Third Normal Form (3NF) • Emp-Dept(SSN, Ename, Bdate, Address, Dnumber, Dname, DMGR-SSN) • FD1: SSN  {Ename, Bdate, Address, Dnumber, Dname, DMGR-SSN} • FD2: Dnumber  {Dname, DMGR-SSN} • Emp-Dept is in 1NF, 2NF, but because of • SSN  Dnumber and • Dnumber  Dname and • Dname is a nonprime attribute, and • Dnumber is not a superkey, • Emp-Dept is not in 3NF. • To transform Emp-Dept into 3NF: • Emp(Enam, SSN, Bdate, Address, Dnumber) • Dept(Dnumber, Dname, DMG-SSN)

Third Normal Form (3NF) • Employees (SSN, Name, Age, Salary, Dept_name, Dept_manager_SSN). • Employees is in 2NF • since SSN is the only candidate key and every attribute is fully dependent on it. • Employees is not in 3NF because • Dept_name  Dept_manager_SSN

Third Normal Form (3NF) • LOTS(Property-ID#, County-Name, Lot#, Area, Price, Tax-Rate) • FD1: Property-ID#  {County-Name,Lot#,Area, Price, Tax-Rate} • FD2: {County-Name,Lot#}{Property-ID#,Area,Price,Tax-Rate} • FD3: County-Name  Tax-Rate • FD4: Area  Price • 2 candidate keys: {Property-ID#},{County-Name, Lot#} • LOTS is not in 2NF, because of County-Name  Tax-Rate • Tax-Rate is partially dependent on the candidate key {County-Name, Lot#}. • 2NF: • LOTS1(Property-ID#, County-Name, LOT#, Area, Price) • LOTS2(County-Name, Tax-Rate)

Third Normal Form (3NF) • The relation LOTS1 is not in 3NF, because of Area  Price • Area is not a superkey and Price is not prime attribute • 3NF: • LOTS1A(Property-ID#, County-Name, Lot#, Area) • LOTS1B(Area, Price) • LOT2(County-Name, Tax-Rate) • The above relation schemas are in 3NF

Boyce-Codd Normal Form (BCNF) • Assume that we have thousands of lots, but two-counties: • Marion county and Liberty county. • Lot areas in Marion county are • .5, .6, .7, .8, .9 and 1 acres • Lot areas in Liberty county are • 1.1,1.2, …, 1.9, 2.0 acres • In this case we have : AREA  County-Name • LOTS1A(Property-ID#, County-Name, Lot#, Area) • FD5: Area  County-Name • Still in 3NF , since County-Name is a prime attribute

Boyce-Codd Normal Form (BCNF) • A relation schema R is in BCNF if whenever a FD X  A holds in R, then X is a superkey of R. • R is in BCNF if for every non-trivial FD, the left side is a superkey. • LOTS1A-X(Property-ID#, Area, Lot#) • LOTS1A-Y(Area, County-Name) • To describe a relation schema R as “good” it should be at least in 3NF (general) or BCNF • If R is in BCNF, then R is also in 3NF • However, R in 3NF R in BCNF

Chapter 14 Database Design Theory: Introduction to Normalization Using Functional Dependencies

Chapter 14 Database Design Theory: Introduction to Normalization Using Functional Dependencies

Presentation Transcript

Functional Dependencies and Relational Schema Design

CHAPTER 6

Accounting Information Systems 9 th Edition

Chapter 7: Relational Database Design

Chapter 1

Chapter 5

An Introduction to Functional Data Analysis

Hardware Functional Verification Class

Database Management Systems

Chapter 3 Managing the firm

Introduction Background Distributed DBMS Architecture Distributed Database Design

Chapter 23: XML

Functional Dependencies and Normalization for Relational Databases

Chapter 10: XML

Database Systems

Concepts of Database Management Seventh Edition

Graph Theory Chapter 1 An Introduction to Graphs

Introduction to Database Design Methodology

Introduction to Database

Database Design Methodology

Time-Dependent Density-Functional Theory