Functional Dependencies and Normalization for Relational Databases

CHAPTER 10 Functional Dependencies andNormalization for Relational Databases

Informal Design Guidelines for Relational Databases • Semantics of the Relation Attributes • Reducing the redundant values in tuples • Reducing the null values in tuples • Disallowing the possibility of generating spurious tuples

Semantics of the Relation Attributes • GUIDELINE 1: • Each tuple in a relation should represent one entity or relationship instance. • Attributes of different entities should not be mixed in the same relation. • Only foreign keys should be used to refer to other entities • Entity and relationship attributes should be kept apart as much as possible.

Semantics of the Relation Attributes • Example:

Redundant Information in Tuples andUpdate Anomalies • Mixing attributes of multiple entities may cause problems • Information is stored redundantly wasting storage • Problems with update anomalies • Insertion anomalies • Deletion anomalies • Modification anomalies

Redundant Information in Tuples andUpdate Anomalies

Redundant Information in Tuples andUpdate Anomalies • Consider the relation: EMP_PROJ ( Emp#, Proj#,Ename, Pname, No_hours) • Update Anomaly: Changing the name of project number P1 from “Billing” to “Customer-Accounting” may cause this update to be made for all 100 employees working on project P1. • Insert Anomaly: Cannot insert a project unless an employee is assigned to . • Inversely - Cannot insert an employee unless an he/she is assigned to a project

Redundant Information in Tuples andUpdate Anomalies • Delete Anomaly: When a project is deleted, it will result in deleting all the employees who work on that project. Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project

Redundant Information in Tuples andUpdate Anomalies • GUIDELINE 2: Guideline to Redundant Information in Tuples and Update Anomalies • Design the base relation schemas so that no insertion, deletion, or modification anomalies are present in the relations. • If any anomalies are present, note them clearly and make sure that the programs that update the database will operate correctly.

Null Values in Tuples GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible • Attributes that are NULL frequently could be placed in separate relations • Reasons for nulls: • Attribute not applicable or invalid • Attribute value unknown (may exist) • Value known to exist, but unavailable

Spurious Tuples • Bad designs for a relational database may result in erroneous results for certain JOIN operations • The "lossless join" property is used to guarantee meaningful results for join operations GUIDELINE 4: The relations should be designed to satisfy the lossless join condition. No spurious tuples should be generated by doing a natural-join of any relations.

Functional dependencies • Consider the following relation schema • {Plane}→ StartTime • {Pilot, StartDate, StartTime}→ Plane • {Plane, StartDate}→ Pilot

Functional dependencies • A functional dependency is a constraint between two sets of attributes from the database. Suppose that relational database schema has n attributes R{A1, A2, … An}, X and Y are subset of R. • Definition. • A functional dependency between two sets of attributes X and Y: XY, specifies a constraint on the possible tuples that can form a relation state r of R. • The constraint is that, for any two tuples t1 and t2 in r that have t1[X]=t2[X] t1[Y] = t2[Y] .

Functional dependencies • Example:

Functional dependencies • We say that there is a functional dependency from X to Y, or that Y is functionally dependent on X. • The abbreviation for functional dependency is FD or f.d. • The set of attributes X is called the left-hand side of the FD, and Y is called the right-hand side. X → Y ⇔ (t1.X = t2.X ⇒ t1.Y = t2.Y)

Functional dependencies • Example: • Social security number determines employee name: SSN -> ENAME • Project number determines project name and location PNUMBER -> {PNAME, PLOCATION} • Employee SSN and project number determines the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS

Functional dependencies • If a constraint on R states that there cannot be more than one tuple with a given X value in any relation instance r(R), that is, X is a candidate key of R, this implies that XY for any subset of attributes Y of R • If XY in R, this does not say whether or not Y X in R • If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K]=t2[K])

Functional dependencies(FDs) • A functional dependency is a property of the semantics or meaning of the attributes. • Relation extensions r(R) that satisfy the functional dependency constraints are called legal relation states of R. • A functional dependency is a property of the relation schema R, not of a particular legal relation state r of R. Hence, an FD cannot be inferred automatically from a given relation extension r but must be defined explicitly by someone who knows the semantics of the attributes of R

Inference Rules for FDs • Given a set of FDs F on relation schema R. Any other functional dependencies hold in all legal relation instances that satisfy the dependencies in F can be said to be inferred or deduced from the FDs in F. • The following six inference rules for functional dependencies (Armstrong's inference rules) • IR1 (reflexive : phảnxạ): If X Y, then XY. • IR2. (Augmentation) If X Y, then XZ YZ. • IR3. (Transitive) If X Y and Y Z, then X Z

Inference Rules for FDs • IR4(Decomposition): If X YZ, then X Y and X Z • IR5(Union): If X Y and X Z, then X YZ • IR6(pseudo-transitive) If X Y and WY Z, then WX Z • The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property)

Inference Rules for FDs • Closure: the set of all dependencies that can be inferred from F is called the closure of F; it is denoted by F+. • Closureof a set of attributes X with respect to F is the set X +of all attributes that are functionally determined by X. • X +can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F

Inference Rules for FDs • Algorithm: Determining X+, the Closure of : the set of attribute X under F X+= X; repeat oldX+= X+; for each functional dependency YZ in F do If X+ Y then X += X+ ∪Z; until (X+ = oldX+);

Inference Rules for FDs • Example 1: Let Q be a relation with attributes (A,B,C,D,E,G,H) and let the following functional dependencies hold F={f1: BA; f2: DACE; f3: DH; f4: GH C; f5: ACD} • Find the closure X+of X = {AC}. • Answer: X+=ACDEH

Inference Rules for FDs • Example 2: Let Q be a relation with attributes (A,B,C,D,E,G) and let the following functional dependencies hold F = { f1: A → C; f2: A → EG; f3: B → D; f4: G → E} • Find the closure X+ and Y+ of X = {A,B}; Y = {C,G,D} • Answer: X+ = {ABCDEG} , Y+= {CGDE}

Inference Rules for FDs • Example 3: F = {SSN ENAME, PNUMBER {PNAME, PLOCATION}, {SSN, PNUMBER}HOURS} Calculate the following closure sets with respect to F; {SSN }+ = {SSN, ENAME} {PNUMBER }+ = {PNUMBER, PNAME, PLOCATION} {SSN, PNUMBER}+ = {SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS}

Inference Rules for FDs • Algorithm: Determining F+ • Find all subsets of relation Q+ • Find all possible functional dependencies of Q • Find all subset closure of Q • Based on the closure of all subsets sought to determine which the functional dependencies belong F+

Inference Rules for FDs • Example: Q(A,B,C) F = {AB  C,C  B} F+ ? • All subset of attributes • Closure of all subsets • A+ = A • B+ = B • C+ = BC • AC+= ABC • AB+ = ABC • BC+ = BC

Inference Rules for FDs • All functional dependencies:

Inference Rules for FDs • Result: F+ = { ABC, ABAC, ABBC, ABABC, CB, CBC, ACB, ACAB, ACBC, ACABC}

Equivalence of Sets of FDs • Two sets of FDs F and G are equivalent if: • Every FD in F can be inferred from G, and • Every FD in G can be inferred from F Hence, F and G are equivalent if F + =G + • Definition: F covers G if every FD in G can be inferred from F (i.e., if G + subset-of F +) • F and G are equivalent if F covers G and G covers F • There is an algorithm for checking equivalence of sets of FDs

Minimal Sets of FDs • A set of FDs is minimal if it satisfies the following conditions: • Every dependency in F has a single attribute for its right-hand side. Example: F = {A → B, A → C , B → C, AB → D} • Cannot replace any dependency XA in F with a dependency YA, where Y is a subset of X, and still have a set of dependencies that is equivalent to F. Example: F= {AB→C; B→C}

Minimal Sets of FDs • Cannot remove any dependency from F and still have a set of dependencies that is equivalent to F. Example: F = {A→BC, B→D, AB→D} redundant because: F ≡ F’= {A→BC, B→D}

Minimal Sets of FDs • Algorithm 10.2: Finding a Minimal Cover (phủtốithiểu)for a set of Functional Dependencies F • Remove all FDs have left hand side redundant • Converse all FDs with the right hand side of more than one attribute to FDs having the right hand side of one attribute • Remove all redundant FDs

Minimal Sets of FDs • Example: Let Q be a relation with attributes Q(A,B,C,D) and FD F={AB → CD, B → C, C → D} • Finding a Minimal Cover of F • AB→CD is FD with the left hand side redundant A because B+=BCD ⇒ B → CD ∈ F+ =>F≡{B → CD;B → C;C → D} • F≡{B → D; B → C;C → D}=F1mc , • In F1tt , B → D is redundant. • F≡{B → C;C → D}=Fmc

Minimal Sets of FDs • Example: Q(MSCD,MSSV,CD,HG) F = {MSCD → CD; CD → MSCD; CD,MSSV → HG; MSCD,HG → MSSV; CD,HG → MSSV; MSCD,MSSV → HG} • Find minimal cover for FD F Result: Fmc= { MSCD → CD; CD → MSCD; CD,HG → MSSV; MSCD,MSSV → HG}

Normalization of Relations • Normalization: The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations • Normal form: Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form

Normalization of Relations • 2NF, 3NF, BCNF based on keys and FDs of a relation schema • 4NF based on keys, multi-valued dependencies : MVDs; 5NF based on keys, join dependencies : JDs (Chapter 11) • Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation; Chapter 11)

Practical Use of Normal Forms • Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties • The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect • The database designers need not normalize to the highest possible normal form. (usually up to 3NF, BCNF or 4NF) • Denormalization: the process of storing the join of higher normal form relations as a base relation—which is in a lower normal form

Definitions of Keys and Attributes Participating in Keys • A super keyof a relation schema R= {A1, A2, ...., An} is a set of attributes Ssubset-ofR with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S] • A key Kis a super key with the additional property that removal of any attribute from K will cause K not to be a super key any more.

Definitions of Keys and Attributes Participating in Keys • Algorithm 10.3: Finding all key for a relation Q and set of Functional Dependencies F • Determine all subset of Q+: X1, X2, …, Xn • Find closure of the Xi. • Super key is Xi that they are closure =Q+, suppose that set supper key are S= {S1, S2, …Sn} • Create the set contain all key of Q from S by consider subset Si, Sj of S • S (i ≠j), if Si Sjthen remove Sj(i,j=1..n).

Definitions of Keys and Attributes Participating in Keys • Example:

Definitions of Keys and Attributes Participating in Keys • Algorithm 10.4: Finding one key K for a relation Q and set of Functional Dependencies F: • Step 1: Assign K = Q+ • Step 2: A is an attribute of K, K’=K-A. If K’+=Q+ then K=K’.

Definitions of Keys and Attributes Participating in Keys • Example: U={A,B,C,D,E} , F={AB->C, AC->B, BC->DE} Find K Step1:K=U >K=ABCDE Step2:(K\A)+=>(BCDE)+=BCDE ≠ U+  K=ABCDE Step3:(K\B)+=>(ACDE)+= ABCDE = U+ K=ACDE Step4:(K\C)+ =>(ADE)+ = ADE ≠U+ K=ACDE Step5:(K\D)+ =>(ACE)+ = ACEBD=U+K=ACE Step6: (K\E)+ =>(AC)+ = ACBDE =U+K=AC

Definitions of Keys and Attributes Participating in Keys • If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys. • A Prime attribute (thuộctínhkhóa) must bea member of some candidate key • A Nonprime attribute is not a prime attribute—that is, it is not a member of any candidate key.

First Normal Form • First normal form (INF): • Disallow multivalued attributes, composite attributes, and their combinations. • It states that the domain of an attribute must include only atomic values and that the value of any attribute in a tuple must be a single value from the domain of that attribute.

First Normal Form

Second Normal Form • Second normal form (2NF) is based on the concept of full functional dependency. • A functional dependency XY is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more. Examples: • {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS hold • {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency ) since SSN -> ENAME also holds

Second Normal Form • Definition. A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R. • A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key • R can be decomposed into 2NF relations via the process of 2NF normalization

Second Normal Form • Algorithm 10.4 :The test for 2NF Give a relational Q, a set FDF Determine Q belong 2FN ? • Step1: Find all key of Q • Step2: With each keyK, Find closure of all the subsets S of K • Step3: If there are S+ contain Nonprime attribute then Q is not 2NF else Q is 2NF

Functional Dependencies and Normalization for Relational Databases