310 likes | 570 Views
Closure. The closure of { B 1 …B k } under the set of FDs S , denoted by { B 1 …B k } + , is defined as follows: { B 1 …B k } + = {B | any relation satisfies S will also satisfies B 1 …B k B}. Computing the closure. Given: the set S and { A 1 ,…,A n }
E N D
Closure • The closure of {B1…Bk} under the set of FDs S, denoted by {B1…Bk}+, is defined as follows: {B1…Bk}+ = {B | any relation satisfies S will also satisfies B1…BkB}
Computing the closure • Given: the set S and {A1,…,An} • Compute: {A1,…,An}+- denote this set by X • Step 1: X = {A1,…,An} • Step 2: find a FD B1…BkB in S such that {B1,…,Bk} X and B X, then X=X{B} • Step 3: repeat step 2 until nothing more can be added to X, then go to step 4 • Step 4: return X
Example • S = {ABC, BCAD, DE, CFB} • Compute {A,B}+ • Step 1: X = {A,B} • Step 2: X = X {C}={A,B,C} because ABC • Step 3 back to step 2: X = X {D} because BCAD • Step 3 back to step 2: X = X {E} because DE • Step 3 back to step 2: nothing more • Step 3 go to step 4: return {A,B,C,D,E}
Correctness of closure algorithm • It computes true functional dependencies • proof: show that if B belongs to {A1,…,An}+ then A1…AnB holds. By induction over the number of steps (n) used in adding an attribute B into the set X • n=0 then B belongs to {A1,…,An} and so A1…AnB is a trivial functional dependency • n n+1: if B is added to X in the step n+1, then A1…AnBj for all j by inductive hypothesis; this, together with B1…BkB, implies that A1…AnB • It computes all functional dependencies • proof: show that if B does not belong to {A1,…,An}+ then A1…AnB does not hold. By constructing an instance I of the relation R such that the FD does not hold. A in the closure Others 111 … 111 000 … 000 111 … 111 111 … 111
Simple questions • What is {A1,…,An}+ if {A1,…,An} is a key of the relation? • Can {A1,…,An}+ ={A1,…,An}? • Does {B1,…,Bm} {A1,…,An} imply {B1,…,Bm}+ {A1,…,An}+?
Transitive Rules • Given A1…An B1…Bm B1…Bm C1…Ck • then A1…An C1…Ck
Closing sets of FDs • Given a set of FDs we can derive some other FDs using the rules about FDs (e.g. combining, splitting, and transitive) • For a relation R, a set of FD is called a basis for R if all other FDs of R can be derived form it. • A basis is minimal if none of its proper subsets is a basis.
Given: R with a set of FDs F S (a new relation) is obtained by removing the attributes {B1,…,Bm} from R Questions: What are the FDs of S? Answer: if A1…AnC1…Ck is a FD of R and none of the Bs appears on the left or right side ({B1,…,Bm}{A1,…,An,C1,…,Ck}=) is a FD of S Projecting FDs B’s R S
Projecting - Example • Given R(A,B,C,D) with the FDs AB, BC, and CD. • Remove the attribute B from R, we obtain a new relation S(A,C,D). • What are the FDs of S? • AC? • AD? • CD? • We can compute this by: • Compute all the closure of every subset of {A,C,D} • by using the FDs of R that do not contain B.
Homework 3.5.1 Consider a relation with schema R(A,B,C,D) and FD’s ABC, CD, and DA. • What are all the nontrivial FD’s that follow from the given FD’s? List only the FDs with one attribute on the right? (5pt) • What are the keys of R? (5pt) • What are the superkeys but not keys? (5pt) 3.5.3 Show that the following rule holds: (5pt) if A1…AnB1…Bm and C1…CkD1…Dt hold then A1…AnC1…Ck B1…BmD1…Dt also holds.
For those whole like fun: 3.5.4 Does the following hold: • if AB then BA • if ABC and AC then BC 3.5.8 A set of attributes is closed if X+=X. What are the FDs of a relation R(A,B,C,D) if • all sets of four attributes are closed • the only closed sets are {} and {A,B,C,D} • the closed sets are {}, {A,B}, {A,B,C,D} (note: the cases are considered separate) Stars: try the exercises with stars.
Design of Relational Database Schema • Some observations: • value of studioName is the same in several tuples • value of filmType is also repeated CAN WE AVOID THESE ANORMALIES? • What wrong with it? • redundancy store the same value unnecessary several time • update anormalies an update might require several changes • deletion anormalies losing information if delete a value
Possible ways to avoid anormalies (Intuition) • The bad way: start again (Oh, no!) • The natural way: try to decompose the given relation into two or more relations that • contain the same information • avoid the anormalies
Example MovieStudioStar(title, year, length, studioName, starName, filmType) is decomposed into 2 relations MovieStudio(title, year, length, studioName, filmType) and StarsIn(title, year, starName)
Decomposition Given a relation R with schema {A1,…,An}. A decomposition of R into two relations S and T with schemas {B1,…,Bm} and {C1,…,Ck}, respectively, such that • {A1,…,An} = {B1,…,Bm} {C1,…,Ck} • The tuples in S are the projections onto {B1,…,Bm} of all the tuples in R. • The tuples in T are the projections onto {C1,…,Ck} of all the tuples in R.
Example – Projections How do we come up with this decomposition? MovieStudioStar(title, year, length, studioName, starName, filmType) is decomposed into 2 relations MovieStudio(title, year, length, studioName, filmType) and StarsIn(title, year, starName)
Boyce-Codd Normal Form (BCNF) • BCNF: a relation R is in BCNF iff: whenever there is a nontrivial FD A1…AnB for R, it is the case that {A1,…,An} is a superkey for R. • Why this definition? Answer: if a relation is in BCNF then there is no anormaly. • Example: MovieStudioStar(title, year, length, studioName, starName, filmType): not in BCNF MovieStudio(title, year, length, studioName, filmType): in BCNF StarsIn(title, year, starName): in BCNF
Decomposition into BCNF • Suppose that we decompose a relation R into two relations S and T which are in BCNF. The requirements for S and T: • S and T is a decomposition of R • it is possible to reconstruct R from S and T • Will every decomposition of R satisfy these two conditions? • What are the FDs of the new relations?
Algorithm • Given a relation R with the attributes {A1,…,An}. • Step 1: For every nontrivial FD B1…BmB if {B1,…,Bm} is a superkey then returns R (no decomposition is needed) • Step 2: Takes a nontrivial FD B1…BmB such that {B1,…,Bm} is not a superkey, then decomposes R into two relations S and T with the following schema: • S’s schema: {B1,…,Bm}+ • T’s schema: {B1,…,Bm} ({A1,…,An}\{B1,…,Bm}+) • Repeat Step 1&2 for S and T until no decomposition is needed for every new relation; return the set of new relations as the result
Example The ‘new’ movie relation with the following attributes: {title,year,studioName,president,presAddress} (we call this set ALL) with the FDs: {title yearstudioName, studioNamepresident, presidentpresAddress} Only one key: {title,year} studioNamepresident violated BCNF Step 2: takes studioNamepresident, decomposes into • S with the schema {studioName}+={studioName,president,presAddress} • T with the schema {studioName,title,year}={studioName} (ALL\ {studioName}+) Check: {studioName,title,year} is in BCNF (the first two FDs) {studioName,president,presAddress} is not in BCNF Continue with the decomposition of S using presidentpresAddress and we get the following two relation schemas: {president,presAddress} and {president,studioName} both are in BCNF. The final result: {studioName,title,year}, {president,presAddress},{president,studioName}
Recovering information from a decomposition • Suppose that R with the schema {A1,…,An} is decomposed into two relations S and T according to the algorithm whose attributes are {B1,…,Bm}+ and {B1,…,Bm} ({A1,…,An}\{B1,…,Bm}+) • The tuples of R can be obtained by joining all possible pairs of S and T where {B1,…,Bm} have the same values.
Recovering … the B’s others the rest of the closure t (R) t’ (S) Projection Join t’’ (T) {B1,…,Bm} {B1,…,Bm}+\ {B1,…,Bm} {A1,…,An}\{B1,…,Bm}+
Example – Decomposition and Recovering MovieStudioStar(title, year, length, studioName, starName, filmType) is not in BCNF is decomposed into 2 relations that are in BCNF: MovieStudio(title, year, length, studioName, filmType) and StarsIn(title, year, starName)
Some remarks • The algorithm will stop and output a set of BCNF relations. • Not every decomposition according to the algorithm is good • The FD’s for the new relations are determined by ‘projecting’. • If a decomposition is based on FDs (according to the algorithm) then the recovering process will give us exactly the original relation. • If a decomposition is not based on FDs then we might not be able to recover the original relation from the new ones: • Example: R(A,B,C) with AB and we decompose it into S(A,B) and T(B,C):
Third Normal Form (3NF) • So far: if a relation is not in BCNF then anormalies arise. • Given a relation Bookings with the attributes: • title: name of the movie • theater: name of the theater where the movie is being shown • city: the city where the theater is located (a tuple (m,t,c): represents the fact that movie m is shown at theater t in city c)
Bookings(title,theater,city) • The FDs of the relations: • theater city • title city theater • theatercity violates the BCNF condition, why? • decomposition yields: {theater,city} and {theater,title} • Consider the relations: Possible relations according to the FDs of each schema recovering Violate the FD title citytheater
3NF • A relaxation of the BCNF condition: a relation R is in 3NF if: whenever there is a nontrivial FD A1…AnB, either {A1,…,An} is a superkey or B is a member of some key. • Bookings(title,theater,city) is in 3NF
Checking BCNF and 3NF • Given R(A,B,C,D) with FDs ABC, CD, DA. • Question: Indicate the BCNF violations and 3NF violations. • Steps in answering the question: • Step 1: compute all nontrivial FDs (right side: one att) • Step 2: find all keys • Step 3: find all the violations • Step 1: ABC, CD, DA, ABD, CA, DBC, ACD • Step 2: Keys – {A,B}, {C,B}, and {D,B} • Step 3: • BCNF violation: CD, DA, CA, ACD and their trivial extensions (e.g. CDD, DAA,…) • 3NF violation: none