Chapter 7: Relational Database Design

Chapter 7: Relational Database Design

Chapter 7: Relational Database Design • First Normal Form • Pitfalls in Relational Database Design • Functional Dependencies • Decomposition • Boyce-Codd Normal Form • Third Normal Form • Multivalued Dependencies and Fourth Normal Form • Overall Database Design Process

First Normal Form • Domain is atomic if its elements are considered to be indivisible units • Examples of non-atomic domains: • Set of names, composite attributes • Identification numbers like CS101 that can be broken up into parts (leads to encoding of information in application program rather than in the database) • A relational schema R is in first normal form if the domains of all attributes of R are atomic • Non-atomic values complicate storage and encourage redundant (repeated) storage of data • E.g. Set of accounts stored with each customer, and set of owners stored with each account (instead of relation depositor) • We assume all relations are in first normal form

Pitfalls in Relational Database Design • Relational database design requires that we find a “good” collection of relation schemas. A bad design may lead to • Repetition of Information. • Inability to represent certain information. • Design Goals: • Avoid redundant data • Ensure that relationships among attributes are represented • Facilitate the checking of updates for violation of database integrity constraints.

Example • Consider the relation schema:Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) • Redundancy: • Data for branch-name, branch-city, assets are repeated for each loan that a branch makes (functional dependency: branch-name branch-city assets) • Wastes space • Complicates updating, introducing possibility of inconsistency of assets value • Null values • Cannot store information about a branch if no loans exist • Can use null values, but they are difficult to handle.

Decomposition • Decompose the relation schema Lending-schema into: Branch-schema = (branch-name, branch-city,assets) Loan-info-schema = (customer-name, loan-number, branch-name, amount) • All attributes of an original schema (R) must appear in the decomposition (R1, R2): R = R1  R2 • Lossless-join decomposition.For all possible relations r on schema R r = R1 (r) R2 (r)

Example of Non Lossless-Join Decomposition • Decomposition of R = (A, B) R1 = (A) R2 = (B) A B A B    1 2 1   1 2 B(r) A(r) r A B A (r) B (r)     1 2 1 2

Goal — Devise a Theory for the Following • Decide whether a particular relation R is in “good” form. • In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that • each relation is in good form • the decomposition is a lossless-join decomposition • Our theory is based on: • functional dependencies • multivalued dependencies

Functional Dependencies • Constraints on the set of legal relations. • Require that the value for a certain set of attributes determines uniquely the value for another set of attributes. • A functional dependency is a generalization of the notion of a key.

Functional Dependencies (Cont.) • Let R be a relation schema   R and   R • The functional dependency  holds onR if in any legal relations r(R), for all pairs of tuples t1and t2 in r, such that t1[] = t2 []  t1[ ] = t2 [ ] • Example: Consider r(A,B) with the following instance of r. • On this instance, AB does NOT hold, but BA does hold. • 4 • 1 5 • 3 7

Functional Dependencies (Cont.) • K is a superkey for relation schema R if and only if K R • K is a candidate key for R if and only if • K R, and • for no   K,  R • Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema: Loan-info-schema = (customer-name, loan-number, branch-name, amount). We expect this set of functional dependencies to hold: loan-numberamount loan-number  branch-name but would not expect the following to hold: loan-number customer-name

Use of Functional Dependencies • We use functional dependencies to: • test relations to see if they are legal under a given set of functional dependencies. • If a relation r is legal under a set F of functional dependencies, we say that rsatisfies F. • specify constraints on the set of legal relations • We say that Fholds onR if all legal relations on R satisfy the set of functional dependencies F. • Note: A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances. For example, a specific instance of Loan-schema may, by chance, satisfy loan-number customer-name.

Functional Dependencies A C C A (satisfy)

Functional Dependencies (Cont.) • A functional dependency is trivial if it is satisfied by all instances of a relation • E.g. • customer-name, loan-number customer-name • customer-name customer-name • In general,   is trivial if    • When we design a relational database, we first list those functional dependencies that must always hold • The list of functional dependencies in banking database (next page)

Functional dependencies in banking database • On Branch-schemabranch-name®branch-city branch-name®assets • On Customer-schemacustomer-name®branch-city customer-name®customer-street • On Loan-schemaloan-number®amount loan-number® branch-name • On Account-schemaaccount-number®branch-name account-number ®balance

Closure of a Set of Functional Dependencies • Given a set F of functional dependencies, there are certain other functional dependencies that are logically implied by F. • E.g. If A  B and B  C, then we can infer that A  C • The set of all functional dependencies logically implied by F is the closure of F. • We denote the closure of F by F+. • We can find all ofF+by applying Armstrong’s Axioms: • if   , then   (reflexivity) • if  , then    (augmentation) • if  , and   , then   (transitivity) • These rules are • sound (generate only functional dependencies that actually hold) and • complete (generate all functional dependencies that hold).

Example • R = (A, B, C, G, H, I)F = { A BA CCG HCG IB H} • some members of F+ • A H • by transitivity from A B and B H • AG I • by augmenting A C with G, to get AG CG and then transitivity with CG I • CG HI • from CG H and CG I : “union rule” can be inferred from • definition of functional dependencies, or • Augmentation of CG I to infer CG  CGI, augmentation ofCG H to inferCGI HI, and then transitivity

To compute the closure of a set of functional dependencies F: F+ = Frepeatfor each functional dependency f in F+ apply reflexivity and augmentation rules on fadd the resulting functional dependencies to F+for each pair of functional dependencies f1and f2 in F+iff1 and f2 can be combined using transitivitythen add the resulting functional dependency to F+until F+ does not change any further NOTE: We will see an alternative procedure for this task later Procedure for Computing F+

Closure of Functional Dependencies (Cont.) • We can further simplify manual computation of F+ by using the following additional rules. • If   holds and  holds, then    holds (union) • If    holds, then   holds and  holds (decomposition) • If   holds and   holds, then   holds (pseudotransitivity) The above rules can be inferred from Armstrong’s axioms.

Closure of Attribute Sets • Given a set of attributes a, define the closureof aunderF (denoted by a+) as the set of attributes that are functionally determined by a under F: ais in F+   a+ • Algorithm to compute a+, the closure of a under F result := a;while (changes to result) do for each in F do begin if  result then result := result end

Example of Attribute Set Closure • R = (A, B, C, G, H, I) • F = {A BA CCG HCG IB H} • (AG)+ 1. result = AG 2. result = ABCG (A C and A  B) 3. result = ABCGH (CG H and CG  AGBC) 4. result = ABCGHI (CG I and CG  AGBCH) • Is AG a candidate key? • Is AG a super key? • Does AG R? • Is any subset of AG a superkey? • Does A+R? • Does G+R?

There are several uses of the attribute closure algorithm: Testing for superkey: To test if  is a superkey, we compute +, and check if + contains all attributes of R. Testing functional dependencies To check if a functional dependency    holds (or, in other words, is in F+), just check if   +. That is, we compute + by using attribute closure, and then check if it contains . Is a simple and cheap test, and very useful Computing closure of F For each   R, we find the closure +, and for each S  +, we output a functional dependency   S. Uses of Attribute Closure

Canonical Cover • Sets of functional dependencies may have redundant dependencies that can be inferred from the others • Eg: A  C is redundant in: {A  B, B  C, A  C} • Parts of a functional dependency may be redundant • E.g. on RHS: {A  B, B  C, A  CD} can be simplified to {A  B, B  C, A  D} • E.g. on LHS: {A  B, B  C, AC  D} can be simplified to {A  B, B  C, A  D} • Intuitively, a canonical cover of F is a “minimal” set of functional dependencies equivalent to F, with no redundant dependencies or having redundant parts of dependencies

Extraneous Attributes • Consider a set F of functional dependencies and the functional dependency   in F. • Attribute A is extraneous in  if A   and F logically implies (F – {})  {( – A) }. • Attribute A is extraneous in  if A  and the set of functional dependencies (F – {})  {(– A)} logically implies F. • Example: Given F = {AC, ABC } • B is extraneous in AB C because AC logically impliesABC. • Example: Given F = {AC, ABCD} • C is extraneous in ABCD since AC can be inferred even after deleting C

Testing if an Attribute is Extraneous • Consider a set F of functional dependencies and the functional dependency   in F. • To test if attribute A   is extraneousin(check if ( – {A})  ) • compute ( – {A})+ using the dependencies in F • check that ( – {A})+ contains ; if it does, A is extraneous • To test if attribute A  is extraneous in  • compute + using only the dependencies in F’ = (F – {})  {(– A)}, (check if   A can be inferred from F’) • check that + contains A; if it does, A is extraneous • Example: • R = (A, B, C, D, E)F = { AB CD A E E C }To check if C is extraneous in AB CD

Canonical Cover • A canonical coverfor F is a set of dependencies Fc such that • F logically implies all dependencies in Fc, and • Fclogically implies all dependencies in F, and • No functional dependency in Fccontains an extraneous attribute, and • Each left side of functional dependency in Fcis unique. • To compute a canonical cover for F:Fc = FrepeatUse the union rule to replace any dependencies in Fcof the form11 and 12 with 112 Find a functional dependency  in Fc with an extraneous attribute either in  or in  If an extraneous attribute is found, delete it from until Fc does not change • Note: Union rule may become applicable after some extraneous attributes have been deleted, so it has to be re-applied

Example of Computing a Canonical Cover • R = (A, B, C)F = {A BC B C A BABC} • Combine A BC and A B into A BC • Set is now {A BC, B C, ABC} • A is extraneous in ABC because BC logically implies AB C. • Set is now {A BC, B C} • C is extraneous in ABC since A BC is logically implied by A B and B  C. • The canonical cover is: A B B C

Goals of Normalization • Decide whether a particular relation R is in “good” form. • In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that • each relation is in good form • the decomposition is a lossless-join decomposition • Our theory is based on: • functional dependencies • multivalued dependencies

Decomposition • Figure 7.1:(a bad database design)Lending -schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) • Repetition of information (add a new loan) • Inability to represent certain information (a branch no loan) • Observation:branch-name®branch-citybranch-name®assets but not branch-name®loan-number • From above example, we should decompose a relation schema into several schema with fewer attributes

Decomposition • Lending-schema is decomposed into two schemas:Branch-customer-schema = (branch-name, branch-city, assets, customer-name)Customer-loan-schema = (customer-name, loan-number, amount)branch-customer = Pbranch-name, branch-city, assets, customer-name(Lending)customer-loan = Pcustomer-name, loan-number, amount(Lending) • Figure7.9 and Figure 7.10 • We wish to find all branches that have loans with amount less than $1000:Branch-customer Customer-loan (Figure 7.11) • We are no longer able to represent in the database information about which customers are borrowers from which branch

Lending

Branch-customer Customer-loan

Branch-customer Customer-loan

Decomposition • Because of this loss of information, we call the decomposition of Lending-schema into Branch-customer-schema and Customer-loan-schema a lossy decomposition, or a lossy-join decomposition • A decomposition that is not a lossy-join decomposition is a lossless-join decomposition • Consider another alternative design, in which Lending-schema is decomposed into two schemas: • Branch-schema = (branch-name, branch-city, assets) • Loan-info-schema = (branch-name, customer-name, loan-number, amount) • This decomposition is a lossless-join decomposition, becausebranch-name®branch-city assets

Decomposition • All attributes of an original schema (R) must appear in the decomposition (R1, R2): R = R1  R2 • Lossless-join decomposition.For all possible relations r on schema R r = R1 (r) R2 (r) • A decomposition of R into R1 and R2 is lossless join if and only if at least one of the following dependencies is in F+: • R1 R2R1 • R1 R2R2

Decomposition • Example: Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount). The set F of functional dependencies that we require to hold on Lending-schema arebranch-name®branch-city assets loan-number®amount branch-name • We decompose Lending-schema into two schemasBranch-schema = (branch-name, branch-city, assets)Loan-info-schema = (branch-name, customer-name, loan-number, amount) • Since branch-name®branch-namebranch-city assetsThis decomposition is a lossless-join decomposition • Next, we decompose Loan-info-schema into Loan-schema = (branch-name, loan-number, amount)borrower-schema = (customer-name, loan-number) • Since loan-number®amount branch-nameThis decomposition is a lossless-join decomposition

Dependency Preservation • When an update is made to the database, the system should be able to check that the update will satisfy all the given functional dependencies • To check updates efficiently, we should design relational database schemas that allow update validation without the computation of joins • F : a set of functional dependencies on a schema RR1 , R2, …, Rn : a decomposition of RFi : a subset of all functional dependencies in F+ that include only attributes of Ri • The set of restrictions F1, F2, …, Fn is the set of dependencies that can be checked efficiently • Let F’ = F1È F2È … È Fn • Dependency preserving decomposition: A decomposition has the property F’+ = F+

Normalization Using Functional Dependencies • When we decompose a relation schema R with a set of functional dependencies F into R1, R2,.., Rn we want • Lossless-join decomposition: Otherwise decomposition would result in information loss. • No redundancy: The relations Ripreferably should be in either Boyce-Codd Normal Form or Third Normal Form. • Dependency preservation: Let Fibe the set of dependencies F+ that include only attributes in Ri. • Preferably the decomposition should be dependency preserving, that is, (F1 F2  …  Fn)+ = F+ • Otherwise, checking updates for violation of functional dependencies may require computing joins, which is expensive.

Example • R = (A, B, C)F = {A B, B C) • R1 = (A, B), R2 = (B, C) • Lossless-join decomposition: R1  R2 = {B} and B BC • Dependency preserving • R1 = (A, B), R2 = (A, C) • Lossless-join decomposition: R1  R2 = {A} and A  AB • Not dependency preserving (cannot check B C without computing R1 R2)

Testing for Dependency Preservation • Algorithm for testing if a decomposition D is dependency preservation:compute F+for each schema Ri in D dobegin Fi := the restriction of F+ to Ri;endF’ := ffor each restriction FidobeginF’ = F’  Fiendcompute F’+;if (F’+ = F+) then return (true)else return (false);

To check if a dependency  is preserved in a decomposition of R into R1, R2, …, Rn we apply the following simplified test (with attribute closure done w.r.t. F) result = while (changes to result) dofor eachRiin the decompositiont = (result  Ri)+  Riresult = result  t If result contains all attributes in , then the functional dependency    is preserved. We apply the test on all dependencies in F to check if a decomposition is dependency preserving This procedure takes polynomial time, instead of the exponential time required to compute F+and(F1 F2  …  Fn)+ Testing for Dependency Preservation (A, B, C, D) A  B AB  CD * B  C C  D (A, B, C) (A, D)

Boyce-Codd Normal Form •  is trivial (i.e.,  ) •  is a superkey for R A relation schema R is in BCNF with respect to a set F of functional dependencies if for all functional dependencies in F+ of the form , where  R and  R,at least one of the following holds: • A database design is in BCNF if each relation schema in the database is in BCNF

Example • Customer-schema = (customer-name, customer-street, customer-city)customer-name ® customer-street customer-city Customer-schema is in BCNF • Branch-schema = (branch-name, assets, branch-city)branch-name ® assets branch-city Branch-schema is in BCNF • Loan -info-schema = (branch-name, customer-name, loan-number, amount)loan-number ® branch-name amountLoan-info-schema is not in BCNFLoan-info-schema suffers from the problem of information repetition:(Downtown, Mr. Bell, L-44, 1000)(Downtown, Ms. Bell, L-44, 1000) • Decompose Loan-info-schema into two schemas:Loan-schema = (branch-name, loan-number, amount)Borrower-schema = (customer-name, loan-number) • This decomposition is a lossless-join decomposition

Testing for BCNF • To check if a non-trivial dependency causes a violation of BCNF 1. compute + (the attribute closure of ), and 2. verify that it includes all attributes of R, that is, it is a superkey of R. • Simplified test: To check if a relation schema R with a given set of functional dependencies F is in BCNF, it suffices to check only the dependencies in the given set F for violation of BCNF, rather than checking all dependencies in F+. • We can show that if none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either. • However, using only F is incorrect when testing a relation in a decomposition of R • E.g. Consider R (A, B, C, D), with F = { A B, B C} • Decompose R into R1(A,B) and R2(A,C,D) • Neither of the dependencies in F contain only attributes from (A,C,D) so we might be mislead into thinking R2 satisfies BCNF. • In fact, dependency AC in F+ shows R2 is not in BCNF.

Testing for BCNF • An alternative BCNF test is sometimes easier than computing every dependency in F+ • To check if a relation Riin a decomposition of R is in BCNF, we apply this test • For every subset  of attributes in Ri, check that + (the attribute closure of  under F) either includes no attribute of Ri - , or includes all attributes of Ri • If the condition is violated by some set of attributes  in Ri, the following functional dependency must be in F+ •  (+ - )  Ri

BCNF Decomposition Algorithm result := {R};done := false;compute F+;while (not done) do if (there is a schema Riin result that is not in BCNF)then beginlet   be a nontrivial functional dependency that holds on Risuch that  Riis not in F+, and   = ;result := (result – Ri)  (Ri – )  (,  );end else done := true; Note: each Riis in BCNF, and decomposition is lossless-join. Since the dependency a® b holds on Ri, schema Ri is decomposed into (Ri - b) and (a , b ), and (Ri - b) Ç (a , b ) = a

Example of BCNF Decomposition • Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)F = {branch-name assets branch-city loan-number amount branch-name}Key = {loan-number, customer-name} • For the dependencybranch-name ® assets branch-city, Lending-schema is not in BCNF:Branch = (branch-name, branch-city, assets)Loan-info = (branch-name, customer-name, loan-number, amount) • For the dependency loan-number ® branch-name amount, Loan-info is not in BCNF:Loan = (branch-name, loan-number, amount) is in BCNFBorrower = (customer-name, loan-number) is in BCNF • Not every BCNF decomposition is dependency preserving

Example of BCNF Decomposition • Banker-schema = (branch-name, customer-name, banker-name)banker-name®branch-namebranch-name customer-name®banker-name • Since banker-name is not a superkey,Banker-schema is not in BCNFThe BCNF decomposition:Banker-branch = (banker-name, branch-name)Customer-banker = (customer-name, banker-name) • This decomposition is not dependency preserving:F1 = {banker-name®branch-name}F2 = fF’ = {banker-name®branch-name}F’+¹ F+Hence, this decomposition is not dependency preserving

Third Normal Form: Motivation • There are some situations where • BCNF is not dependency preserving, and • efficient checking for FD violation on updates is important • Solution: define a weaker normal form, called Third Normal Form. • Allows some redundancy (with resultant problems; we will see examples later) • But FDs can be checked on individual relations without computing a join. • There is always a lossless-join, dependency-preserving decomposition into 3NF.

Third Normal Form • A relation schema R is in third normal form (3NF) if for all:  in F+at least one of the following holds: • is trivial (i.e.,  ) •  is a superkey for R • Each attribute A in  –  is contained in a candidate key for R. (NOTE: each attribute may be in a different candidate key) • If a relation is in BCNF it is in 3NF (since in BCNF one of the first two conditions above must hold). • Third condition is a minimal relaxation of BCNF to ensure dependency preservation.

Chapter 7: Relational Database Design