410 likes | 475 Views
Chapter 7: Relational Database Design. Chapter 7: Relational Database Design. First Normal Form Pitfalls in Relational Database Design Functional Dependencies Decomposition Boyce-Codd Normal Form Third Normal Form Multivalued Dependencies and Fourth Normal Form
E N D
Chapter 7: Relational Database Design • First Normal Form • Pitfalls in Relational Database Design • Functional Dependencies • Decomposition • Boyce-Codd Normal Form • Third Normal Form • Multivalued Dependencies and Fourth Normal Form • Overall Database Design Process
First Normal Form • Domain is atomic if its elements are considered to be indivisible units • Examples of non-atomic domains: • Set of names, composite attributes • Identification numbers like CS101 that can be broken up into parts • A relational schema R is in first normal form if the domains of all attributes of R are atomic • Non-atomic values complicate storage and encourage redundant (repeated) storage of data • E.g. Set of accounts stored with each customer, and set of owners stored with each account • We assume all relations are in first normal form
First Normal Form (Contd.) • Atomicity is actually a property of how the elements of the domain are used. • E.g. Strings would normally be considered indivisible • Suppose that students are given roll numbers which are strings of the form CS0012 or EE1127 • If the first two characters are extracted to find the department, the domain of roll numbers is not atomic. • Doing so is a bad idea: leads to encoding of information in application program rather than in the database.
Pitfalls in Relational Database Design • Relational database design requires that we find a “good” collection of relation schemas. A bad design may lead to • Repetition of Information. • Inability to represent certain information. • Design Goals: • Avoid redundant data • Ensure that relationships among attributes are represented • Facilitate the checking of updates for violation of database integrity constraints.
Example • Consider the relation schema:Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) • Redundancy: • Data for branch-name, branch-city, assets are repeated for each loan that a branch makes • Wastes space • Complicates updating, introducing possibility of inconsistency of assets value • Null values • Cannot store information about a branch if no loans exist • Can use null values, but they are difficult to handle.
Decomposition • Decompose the relation schema Lending-schema into: Branch-schema = (branch-name, branch-city,assets) Loan-info-schema = (customer-name, loan-number, branch-name, amount) • All attributes of an original schema (R) must appear in the decomposition (R1, R2): R = R1 R2 • Lossless-join decomposition.For all possible relations r on schema R r = R1 (r) R2 (r)
Example of Non Lossless-Join Decomposition • Decomposition of R = (A, B) R2 = (A) R2 = (B) A B A B 1 2 1 1 2 B(r) A(r) r A B A (r) B (r) 1 2 1 2
Goal — Devise a Theory for the Following • Decide whether a particular relation R is in “good” form. • In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that • each relation is in good form • the decomposition is a lossless-join decomposition • Our theory is based on: • functional dependencies • multivalued dependencies
Functional Dependencies • Constraints on the set of legal relations. • Require that the value for a certain set of attributes determines uniquely the value for another set of attributes. • A functional dependency is a generalization of the notion of a key.
Functional Dependencies (Cont.) • Let R be a relation schema R and R • The functional dependency holds onR if and only if for any legal relations r(R), whenever any two tuples t1and t2 of r agree on the attributes , they also agree on the attributes . That is, t1[] = t2 [] t1[ ] = t2 [ ]
Functional Dependencies (Cont.) • K is a superkey for relation schema R if and only if K R • K is a candidate key for R if and only if • K R, and • for no K, R • Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema: Loan-info-schema = (customer-name, loan-number, branch-name, amount). We expect this set of functional dependencies to hold: loan-numberamount loan-number branch-name but would not expect the following to hold: loan-number customer-name
Use of Functional Dependencies • We use functional dependencies to: • test relations to see if they are legal under a given set of functional dependencies. • If a relation r is legal under a set F of functional dependencies, we say that rsatisfies F. • specify constraints on the set of legal relations • We say that Fholds onR if all legal relations on R satisfy the set of functional dependencies F. • Note: A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances. • For example, a specific instance of Loan-schema may, by chance, satisfy loan-number customer-name.
Functional Dependencies (Cont.) • A functional dependency is trivial if it is satisfied by all instances of a relation • E.g. • customer-name, loan-number customer-name • customer-name customer-name • In general, is trivial if
Closure of a Set of Functional Dependencies • Given a set F set of functional dependencies, there are certain other functional dependencies that are logically implied by F. • E.g. If A B and B C, then we can infer that A C • The set of all functional dependencies logically implied by F is the closure of F. • We denote the closure of F by F+. • We can find all ofF+by applying Armstrong’s Axioms: • if , then (reflexivity) • if , then (augmentation) • if , and , then (transitivity) • These rules are • sound (generate only functional dependencies that actually hold) and • complete (generate all functional dependencies that hold).
Example • R = (A, B, C, G, H, I)F = { A BA CCG HCG IB H} • some members of F+ • A H • AG I • CG HI
To compute the closure of a set of functional dependencies F: F+ = Frepeatfor each functional dependency f in F+ apply reflexivity and augmentation rules on fadd the resulting functional dependencies to F+for each pair of functional dependencies f1and f2 in F+iff1 and f2 can be combined using transitivitythen add the resulting functional dependency to F+until F+ does not change any further NOTE: We will see an alternative procedure for this task later Procedure for Computing F+
Closure of Functional Dependencies (Cont.) • We can further simplify manual computation of F+ by using the following additional rules. • If holds and holds, then holds (union) • If holds, then holds and holds (decomposition) • If holds and holds, then holds (pseudotransitivity) The above rules can be inferred from Armstrong’s axioms.
Closure of Attribute Sets • Given a set of attributes a, define the closureof aunderF (denoted by a+) as the set of attributes that are functionally determined by a under F:ais in F+if a+ • Algorithm to compute a+, the closure of a under F result := a;while (changes to result) do for each in F do begin if result then result := result end
Example of Attribute Set Closure • R = (A, B, C, G, H, I) • F = {A BA C CG HCG IB H} • (AG)+ 1. result = AG 2. result = ABCG (A C and A B) 3. result = ABCGHI (CG H, CG I and CG ABCG) • Is AG a candidate key? • Is AG a super key? • Does AG R? == Is (AG)+ R • Is any subset of AG a superkey? • Does AR? == Is (A)+ R • Does GR? == Is (G)+ R
There are several uses of the attribute closure algorithm: Testing for superkey: To test if is a superkey, we compute +, and check if +contains all attributes of R. Testing functional dependencies To check if a functional dependency holds (or, in other words, is in F+), just check if +. That is, we compute +by using attribute closure, and then check if it contains . Is a simple and cheap test, and very useful Computing closure of F For each R, we find the closure +, and for each S +, we output a functional dependency S. Uses of Attribute Closure
Canonical Cover • Sets of functional dependencies may have redundant dependencies that can be inferred from the others • Eg: A C is redundant in: {A B, B C, A C} • Parts of a functional dependency may be redundant • E.g. on RHS: {A B, B C, A CD} can be simplified to {A B, B C, A D} • E.g. on LHS: {A B, B C, AC D} can be simplified to {A B, B C, A D} • Intuitively, a canonical cover of F is a “minimal” set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies
Extraneous Attributes • Consider a set F of functional dependencies and the functional dependency in F. • Attribute A is extraneous in if A and F logically implies (F – {}) {( – A) }. • Attribute A is extraneous in if A and the set of functional dependencies (F – {}) {(– A)} logically implies F. • Note: implication in the opposite direction is trivial in each of the cases above, since a “stronger” functional dependency always implies a weaker one • Example: Given F = {AC, ABC } • B is extraneous in AB C because {AC, AB C} logically implies AC (I.e. the result of dropping B from AB C). • Example: Given F = {AC, ABCD} • C is extraneous in ABCD since AB C can be inferred even after deleting C
Testing if an Attribute is Extraneous • Consider a set F of functional dependencies and the functional dependency in F. • To test if attribute A is extraneousin • compute ({} – A)+ using the dependencies in F • check that ({} – A)+ contains A; if it does, A is extraneous • To test if attribute A is extraneous in • compute + using only the dependencies in F’ = (F – {}) {(– A)}, • check that + contains A; if it does, A is extraneous
Canonical Cover • A canonical coverfor F is a set of dependencies Fc such that • F logically implies all dependencies in Fc, and • Fclogically implies all dependencies in F, and • No functional dependency in Fccontains an extraneous attribute, and • Each left side of functional dependency in Fcis unique. • To compute a canonical cover for F:repeatUse the union rule to replace any dependencies in F11 and 11 with 112 Find a functional dependency with an extraneous attribute either in or in If an extraneous attribute is found, delete it from until F does not change • Note: Union rule may become applicable after some extraneous attributes have been deleted, so it has to be re-applied
Example of Computing a Canonical Cover • R = (A, B, C)F = {A BC, B C, A B, ABC} • Combine A BC and A B into A BC • Set is now {A BC, B C, ABC} • A is extraneous in ABC • Check if the result of deleting A from ABC is implied by the other dependencies • Yes: in fact, BC is already present! • Set is now {A BC, B C} • C is extraneous in ABC • Check if A C is logically implied by A B and the other dependencies • Yes: using transitivity on A B and B C. • Can use attribute closure of A in more complex cases • The canonical cover is: {A B, B C}
Goals of Normalization • Decide whether a particular relation R is in “good” form. • In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that • each relation is in good form • the decomposition is a lossless-join decomposition • Our theory is based on: • functional dependencies • multivalued dependencies
Decomposition • Decompose the relation schema Lending-schema into: Branch-schema = (branch-name, branch-city,assets) Loan-info-schema = (customer-name, loan-number, branch-name, amount) • All attributes of an original schema (R) must appear in the decomposition (R1, R2): R = R1 R2 • Lossless-join decomposition.For all possible relations r on schema R r = R1 (r) R2 (r) • A decomposition of R into R1 and R2 is lossless join if at least one of the following dependencies is in F+: • R1 R2R1 • R1 R2R2
Example of Lossy-Join Decomposition • Lossy-join decompositions result in information loss. • Example: Decomposition of R = (A, B) R2 = (A) R2 = (B) A B A B 1 2 1 1 2 B(r) A(r) r A B A (r) B (r) 1 2 1 2
Normalization Using Functional Dependencies When we decompose a relation schema R with a set of functional dependencies F into R1, R2,.., Rn we want • Lossless-join decomposition: Otherwise decomposition would result in information loss. • Dependency preservation: Let Fibe the set of dependencies F+ that include only attributes in Ri. • Preferably the decomposition should be dependency preserving, that is, (F1 F2 … Fn)+ = F + • Otherwise, checking updates for violation of functional dependencies may require computing joins, which is expensive. • No redundancy: For obvious reasons. • Formulated in terms of normal forms
Example • R = (A, B, C)F = {A B, B C) • Can be decomposed in two different ways • R1 = (A, B), R2 = (B, C) • Lossless-join decomposition: R1 R2 = {B} and B BC • Dependency preserving • R1 = (A, B), R2 = (A, C) • Lossless-join decomposition: R1 R2 = {A} and A AB • Not dependency preserving (cannot check B C without computing R1 R2)
To check if a dependency is preserved in a decomposition of R into R1, R2, …, Rn we apply the following simplified test (with attribute closure done w.r.t. F) result = while (changes to result) dofor eachRiin the decompositiont = (result Ri)+ Riresult = result t If result contains all attributes in , then the functional dependency is preserved. We apply the test on all dependencies in F to check if a decomposition is dependency preserving This procedure takes polynomial time, instead of the exponential time required to compute F+and(F1 F2 … Fn)+ Testing for Dependency Preservation