860 likes | 1.42k Views
CHAPTER 10. Functional Dependencies and Normalization for Relational Databases. Informal Design Guidelines for Relational Databases. Semantics of the Relation Attributes Reducing the redundant values in tuples Reducing the null values in tuples
E N D
CHAPTER 10 Functional Dependencies andNormalization for Relational Databases
Informal Design Guidelines for Relational Databases • Semantics of the Relation Attributes • Reducing the redundant values in tuples • Reducing the null values in tuples • Disallowing the possibility of generating spurious tuples
Semantics of the Relation Attributes • GUIDELINE 1: • Each tuple in a relation should represent one entity or relationship instance. • Attributes of different entities should not be mixed in the same relation. • Only foreign keys should be used to refer to other entities • Entity and relationship attributes should be kept apart as much as possible.
Semantics of the Relation Attributes • Example:
Redundant Information in Tuples andUpdate Anomalies • Mixing attributes of multiple entities may cause problems • Information is stored redundantly wasting storage • Problems with update anomalies • Insertion anomalies • Deletion anomalies • Modification anomalies
Redundant Information in Tuples andUpdate Anomalies • Consider the relation: EMP_PROJ ( Emp#, Proj#,Ename, Pname, No_hours) • Update Anomaly: Changing the name of project number P1 from “Billing” to “Customer-Accounting” may cause this update to be made for all 100 employees working on project P1. • Insert Anomaly: Cannot insert a project unless an employee is assigned to . • Inversely - Cannot insert an employee unless an he/she is assigned to a project
Redundant Information in Tuples andUpdate Anomalies • Delete Anomaly: When a project is deleted, it will result in deleting all the employees who work on that project. Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project
Redundant Information in Tuples andUpdate Anomalies • GUIDELINE 2: Guideline to Redundant Information in Tuples and Update Anomalies • Design the base relation schemas so that no insertion, deletion, or modification anomalies are present in the relations. • If any anomalies are present, note them clearly and make sure that the programs that update the database will operate correctly.
Null Values in Tuples GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible • Attributes that are NULL frequently could be placed in separate relations • Reasons for nulls: • Attribute not applicable or invalid • Attribute value unknown (may exist) • Value known to exist, but unavailable
Spurious Tuples • Bad designs for a relational database may result in erroneous results for certain JOIN operations • The "lossless join" property is used to guarantee meaningful results for join operations GUIDELINE 4: The relations should be designed to satisfy the lossless join condition. No spurious tuples should be generated by doing a natural-join of any relations.
Functional dependencies • Consider the following relation schema • {Plane}→ StartTime • {Pilot, StartDate, StartTime}→ Plane • {Plane, StartDate}→ Pilot
Functional dependencies • A functional dependency is a constraint between two sets of attributes from the database. Suppose that relational database schema has n attributes R{A1, A2, … An}, X and Y are subset of R. • Definition. • A functional dependency between two sets of attributes X and Y: XY, specifies a constraint on the possible tuples that can form a relation state r of R. • The constraint is that, for any two tuples t1 and t2 in r that have t1[X]=t2[X] t1[Y] = t2[Y] .
Functional dependencies • Example:
Functional dependencies • We say that there is a functional dependency from X to Y, or that Y is functionally dependent on X. • The abbreviation for functional dependency is FD or f.d. • The set of attributes X is called the left-hand side of the FD, and Y is called the right-hand side. X → Y ⇔ (t1.X = t2.X ⇒ t1.Y = t2.Y)
Functional dependencies • Example: • Social security number determines employee name: SSN -> ENAME • Project number determines project name and location PNUMBER -> {PNAME, PLOCATION} • Employee SSN and project number determines the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS
Functional dependencies • If a constraint on R states that there cannot be more than one tuple with a given X value in any relation instance r(R), that is, X is a candidate key of R, this implies that XY for any subset of attributes Y of R • If XY in R, this does not say whether or not Y X in R • If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K]=t2[K])
Functional dependencies(FDs) • A functional dependency is a property of the semantics or meaning of the attributes. • Relation extensions r(R) that satisfy the functional dependency constraints are called legal relation states of R. • A functional dependency is a property of the relation schema R, not of a particular legal relation state r of R. Hence, an FD cannot be inferred automatically from a given relation extension r but must be defined explicitly by someone who knows the semantics of the attributes of R
Inference Rules for FDs • Given a set of FDs F on relation schema R. Any other functional dependencies hold in all legal relation instances that satisfy the dependencies in F can be said to be inferred or deduced from the FDs in F. • The following six inference rules for functional dependencies (Armstrong's inference rules) • IR1 (reflexive : phảnxạ): If X Y, then XY. • IR2. (Augmentation) If X Y, then XZ YZ. • IR3. (Transitive) If X Y and Y Z, then X Z
Inference Rules for FDs • IR4(Decomposition): If X YZ, then X Y and X Z • IR5(Union): If X Y and X Z, then X YZ • IR6(pseudo-transitive) If X Y and WY Z, then WX Z • The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property)
Inference Rules for FDs • Closure: the set of all dependencies that can be inferred from F is called the closure of F; it is denoted by F+. • Closureof a set of attributes X with respect to F is the set X +of all attributes that are functionally determined by X. • X +can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F
Inference Rules for FDs • Algorithm: Determining X+, the Closure of : the set of attribute X under F X+= X; repeat oldX+= X+; for each functional dependency YZ in F do If X+ Y then X += X+ ∪Z; until (X+ = oldX+);
Inference Rules for FDs • Example 1: Let Q be a relation with attributes (A,B,C,D,E,G,H) and let the following functional dependencies hold F={f1: BA; f2: DACE; f3: DH; f4: GH C; f5: ACD} • Find the closure X+of X = {AC}. • Answer: X+=ACDEH
Inference Rules for FDs • Example 2: Let Q be a relation with attributes (A,B,C,D,E,G) and let the following functional dependencies hold F = { f1: A → C; f2: A → EG; f3: B → D; f4: G → E} • Find the closure X+ and Y+ of X = {A,B}; Y = {C,G,D} • Answer: X+ = {ABCDEG} , Y+= {CGDE}
Inference Rules for FDs • Example 3: F = {SSN ENAME, PNUMBER {PNAME, PLOCATION}, {SSN, PNUMBER}HOURS} Calculate the following closure sets with respect to F; {SSN }+ = {SSN, ENAME} {PNUMBER }+ = {PNUMBER, PNAME, PLOCATION} {SSN, PNUMBER}+ = {SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS}
Inference Rules for FDs • Algorithm: Determining F+ • Find all subsets of relation Q+ • Find all possible functional dependencies of Q • Find all subset closure of Q • Based on the closure of all subsets sought to determine which the functional dependencies belong F+
Inference Rules for FDs • Example: Q(A,B,C) F = {AB C,C B} F+ ? • All subset of attributes • Closure of all subsets • A+ = A • B+ = B • C+ = BC • AC+= ABC • AB+ = ABC • BC+ = BC
Inference Rules for FDs • All functional dependencies:
Inference Rules for FDs • Result: F+ = { ABC, ABAC, ABBC, ABABC, CB, CBC, ACB, ACAB, ACBC, ACABC}
Equivalence of Sets of FDs • Two sets of FDs F and G are equivalent if: • Every FD in F can be inferred from G, and • Every FD in G can be inferred from F Hence, F and G are equivalent if F + =G + • Definition: F covers G if every FD in G can be inferred from F (i.e., if G + subset-of F +) • F and G are equivalent if F covers G and G covers F • There is an algorithm for checking equivalence of sets of FDs
Minimal Sets of FDs • A set of FDs is minimal if it satisfies the following conditions: • Every dependency in F has a single attribute for its right-hand side. Example: F = {A → B, A → C , B → C, AB → D} • Cannot replace any dependency XA in F with a dependency YA, where Y is a subset of X, and still have a set of dependencies that is equivalent to F. Example: F= {AB→C; B→C}
Minimal Sets of FDs • Cannot remove any dependency from F and still have a set of dependencies that is equivalent to F. Example: F = {A→BC, B→D, AB→D} redundant because: F ≡ F’= {A→BC, B→D}
Minimal Sets of FDs • Algorithm 10.2: Finding a Minimal Cover (phủtốithiểu)for a set of Functional Dependencies F • Remove all FDs have left hand side redundant • Converse all FDs with the right hand side of more than one attribute to FDs having the right hand side of one attribute • Remove all redundant FDs
Minimal Sets of FDs • Example: Let Q be a relation with attributes Q(A,B,C,D) and FD F={AB → CD, B → C, C → D} • Finding a Minimal Cover of F • AB→CD is FD with the left hand side redundant A because B+=BCD ⇒ B → CD ∈ F+ =>F≡{B → CD;B → C;C → D} • F≡{B → D; B → C;C → D}=F1mc , • In F1tt , B → D is redundant. • F≡{B → C;C → D}=Fmc
Minimal Sets of FDs • Example: Q(MSCD,MSSV,CD,HG) F = {MSCD → CD; CD → MSCD; CD,MSSV → HG; MSCD,HG → MSSV; CD,HG → MSSV; MSCD,MSSV → HG} • Find minimal cover for FD F Result: Fmc= { MSCD → CD; CD → MSCD; CD,HG → MSSV; MSCD,MSSV → HG}
Normalization of Relations • Normalization: The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations • Normal form: Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form
Normalization of Relations • 2NF, 3NF, BCNF based on keys and FDs of a relation schema • 4NF based on keys, multi-valued dependencies : MVDs; 5NF based on keys, join dependencies : JDs (Chapter 11) • Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation; Chapter 11)
Practical Use of Normal Forms • Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties • The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect • The database designers need not normalize to the highest possible normal form. (usually up to 3NF, BCNF or 4NF) • Denormalization: the process of storing the join of higher normal form relations as a base relation—which is in a lower normal form
Definitions of Keys and Attributes Participating in Keys • A super keyof a relation schema R= {A1, A2, ...., An} is a set of attributes Ssubset-ofR with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S] • A key Kis a super key with the additional property that removal of any attribute from K will cause K not to be a super key any more.
Definitions of Keys and Attributes Participating in Keys • Algorithm 10.3: Finding all key for a relation Q and set of Functional Dependencies F • Determine all subset of Q+: X1, X2, …, Xn • Find closure of the Xi. • Super key is Xi that they are closure =Q+, suppose that set supper key are S= {S1, S2, …Sn} • Create the set contain all key of Q from S by consider subset Si, Sj of S • S (i ≠j), if Si Sjthen remove Sj(i,j=1..n).
Definitions of Keys and Attributes Participating in Keys • Example:
Definitions of Keys and Attributes Participating in Keys • Example:
Definitions of Keys and Attributes Participating in Keys • Algorithm 10.4: Finding one key K for a relation Q and set of Functional Dependencies F: • Step 1: Assign K = Q+ • Step 2: A is an attribute of K, K’=K-A. If K’+=Q+ then K=K’.
Definitions of Keys and Attributes Participating in Keys • Example: U={A,B,C,D,E} , F={AB->C, AC->B, BC->DE} Find K Step1:K=U >K=ABCDE Step2:(K\A)+=>(BCDE)+=BCDE ≠ U+ K=ABCDE Step3:(K\B)+=>(ACDE)+= ABCDE = U+ K=ACDE Step4:(K\C)+ =>(ADE)+ = ADE ≠U+ K=ACDE Step5:(K\D)+ =>(ACE)+ = ACEBD=U+K=ACE Step6: (K\E)+ =>(AC)+ = ACBDE =U+K=AC
Definitions of Keys and Attributes Participating in Keys • If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys. • A Prime attribute (thuộctínhkhóa) must bea member of some candidate key • A Nonprime attribute is not a prime attribute—that is, it is not a member of any candidate key.
First Normal Form • First normal form (INF): • Disallow multivalued attributes, composite attributes, and their combinations. • It states that the domain of an attribute must include only atomic values and that the value of any attribute in a tuple must be a single value from the domain of that attribute.
Second Normal Form • Second normal form (2NF) is based on the concept of full functional dependency. • A functional dependency XY is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more. Examples: • {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS hold • {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency ) since SSN -> ENAME also holds
Second Normal Form • Definition. A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R. • A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key • R can be decomposed into 2NF relations via the process of 2NF normalization
Second Normal Form • Algorithm 10.4 :The test for 2NF Give a relational Q, a set FDF Determine Q belong 2FN ? • Step1: Find all key of Q • Step2: With each keyK, Find closure of all the subsets S of K • Step3: If there are S+ contain Nonprime attribute then Q is not 2NF else Q is 2NF