Chapter 7: Relational Database Design

Chapter 7: Relational Database Design • First Normal Form • Pitfalls in Relational Database Design • Functional Dependencies • Decomposition • Boyce-Codd Normal Form • Third Normal Form • (Multivalued Dependencies and Fourth Normal Form) • (Overall Database Design Process)

First Normal Form • Domain is atomic if its elements are considered to be indivisible units • Examples of non-atomic domains: • Set of names, composite attributes • Identification numbers like CS101 that can be broken up into parts • A relational schema R is in first normal form if the domains of all attributes of R are atomic • Non-atomic values complicate storage and encourage redundant (repeated) storage of data • E.g. Instead of a separate relation depositor set of accounts stored with each customer, and set of owners stored with each account • In this chapter, we assume all relations are in first normal form

First Normal Form (Cont.) • Atomicity is actually a property of how the elements of the domain are used. • Strings would normally be considered indivisible • But  Suppose that students are given roll numbers which are strings of the form CS0012 or EE1127 • If the first two characters are extracted to find the department, the domain of roll numbers is not atomic. • Doing so is a bad idea: leads to encoding of information in application program rather than in the database.

Pitfalls in Relational Database Design • Relational database design requires that we find a “good” collection of relation schemas. • A bad design may lead to • Repetition of Information. • Inability to represent certain information.

Example • Consider the relation schema:Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) • Redundancy • Data for branch-name, branch-city, assets are repeated for each loan that a branch makes • Wastes space • Complicates updating, introducing possibility of inconsistency of assets value • Null values • Cannot store information about a branch if no loans exist • Can use null values  but, difficult to handle.

Goal — Devise a Theory for the Following • Decide whether a particular relation R is in “good” form. • In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that • each relation is in good form • the decomposition is a lossless-join decomposition • Our theory is based on: • functional dependencies • multivalued dependencies

Functional Dependencies • A key role in differentiating good database designs from bad database designs. • A functional dependency is a generalization of the notion of a key. • FDs are constraints on the set of legal relations. • Require that the value for a certain set of attributes determines uniquely the value for another set of attributes.

Functional Dependencies (Cont.) • Let R be a relation schema   R and   R • The functional dependency  holds onR if and only if for any legal relations r(R), whenever any two tuples t1and t2 of r agree on the attributes , they also agree on the attributes . That is, t1[] = t2 []  t1[ ] = t2 [ ] • Example: Consider r(A,B) with the following instance of r. • On this instance, AB does NOT hold, but BA does hold. • 4 • 1 5 • 3 7

Functional Dependencies (Cont.) • K is a superkey for relation schema R if and only if K R • K is a candidate key for R if and only if • K R, and • for no   K,  R • FDs allow us to express constraints that cannot be expressed using superkeys. • Consider the schema: Loan-info-schema = (customer-name, loan-number, branch-name, amount). The set of functional dependencies that we expect to hold: loan-numberamount loan-number  branch-name but would not expect the following to hold: loan-number customer-name (e.g. a loan - husband-wife pair)

Use of Functional Dependencies • We shall use FDs in two ways • To test relations to see whether they are legal under a given set of functional dependencies. • If a relation r is legal under a set F of functional dependencies, we say that rsatisfies F. • To specify constraints on the set of legal relations • We say that Fholds onR if all legal relations on R satisfy the set of F. • Note • A specific instance of a relation schema may satisfy a FD even if the FD does not hold on all legal instances. • E.g. a specific instance of Loan-schema may, by chance, satisfy loan-number customer-name.

Functional Dependencies (Cont.) • Some functional dependency are said to be trivial if they are satisfied by all relations. • E.g. • customer-name, loan-number customer-name • customer-name customer-name • In general, a FD   is trivial if   

Closure of a Set of Functional Dependencies • Given a set F of FDs, we can prove that certain other FDs hold. We say that such FDs are “logically implied” by F. • E.g. If A  B and B  C, then we can infer that A  C • The set of all FDs logically implied by F is the closure of F. • We denote the closure of F by F+. • We can find all ofF+by applying Armstrong’s axioms: • If   , then   (reflexivity) • If  , then    (augmentation) • If  , and   , then   (transitivity) • sound and complete • Rules from Armstrong’s axioms: • If   and  , then    (union rule) • If   , then   and   (decomposition rule) • If   and    δ, then    δ (pseudotransitivity rule)

Example • R = (A, B, C, G, H, I)F = { A BA CCG HCG IB H } • Some members of F+ • A H • By transitivity from A B and B H • AG I • Pseudotransitivity rule from A C and CG I • By augmenting A C with G, to get AG CG and then transitivity with CG I • CG HI • Union rule from CG H and CG I • Augmentation of CG I to infer CG CGI, augmentation ofCG H to inferCGI HI, and then transitivity

To compute the closure of a set of F: F+ = Frepeatfor each FD f in F+ apply reflexivity and augmentation rules on fadd the resulting FDs to F+for each pair of FDs f1and f2 in F+iff1 and f2 can be combined using transitivitythen add the resulting FD to F+until F+ does not change any further Procedure for Computing F+

Closure of Attribute Sets • Given a set of attributes a, define the closureof aunderF (denoted by a+) as the set of attributes that are functionally determined by a under F: ais in F   a+ • Algorithm to compute a+ result := a;while (changes to result) do for each in F do begin if  result then result := result end

Example of Attribute Set Closure • R = (A, B, C, G, H, I) • F = { A BA CCG HCG IB H } • (AG)+ 1. result = AG 2. result = ABCG (A C and A  B) 3. result = ABCGH (CG H and CG  ABCG) 4. result = ABCGHI (CG I and CG  ABCGH)

Testing for superkey To test if  is a superkey, we compute +, and check if + contains all attributes of R. Testing functional dependencies To check if a FD    holds (in other words, is in F+), just check if   +. That is, we compute + by using attribute closure, and then check if it contains . Computing closure of F For each   R, we find the closure +, and for each S  +, we output a functional dependency   S. Uses of Attribute Closure

Canonical Cover • Sets of FDs may have redundant dependencies that can be inferred from the others • E.g. A  C is redundant in {A  B, B  C, A  C} • Parts of a functional dependency may be redundant • E.g. {A  B, B  C, A  CD} can be simplified to {A  B, B  C, A  D} • Intuitively, a canonical cover of F is a “minimal” set of functional dependencies equivalent to F, with no redundant dependencies or having redundant parts of dependencies

Extraneous Attributes • Consider a set F and the FD   in F • If A  , to check attribute A is extraneous let = -{A}, and check if    can be inferred from F. (compute + under F, check if + includes all attributes in ) • If A  , to check attribute A is extraneous consider F’ = (F – {})  {(– A)} and check if   A can be inferred from F’. (compute + under F’, check if + includes A) • Example: Given F = {AC, ABC } • B is extraneous in AB C because AC logically implies ABC. • Example: Given F = {AC, ABCD} • C is extraneous in ABCD since ABC can be inferred even after deleting C

Canonical Cover • A canonical coverfor F is a set of dependencies Fc such that • F logically implies all dependencies in Fc, and • Fclogically implies all dependencies in F, and • No functional dependency in Fccontains an extraneous attribute, and • Each left side of functional dependency in Fcis unique. • To compute a canonical cover for F:Fc=F repeatUse the union rule to replace any dependencies in Fc11 and 12 with 112 Find a functional dependency  with an extraneous attribute either in  or in  If an extraneous attribute is found, delete it from until Fc does not change

Example • R = (A, B, C)F = { A BC B C A BABC } • Combine A BC and A B into A BC • Set is now {A BC, B C, ABC} • A is extraneous in ABC because BC logically implies AB C. • Set is now {A BC, B C} • C is extraneous in ABC since A BC is logically implied by A B and B  C. • The canonical cover is: A B B C • Note: canonical cover might not be unique.

Decomposition • Decompose the relation schema Lending-schema into: Branch-schema = (branch-name, branch-city,assets) Loan-info-schema = (customer-name, loan-number, branch-name, amount) • All attributes of an original schema (R) must appear in the decomposition (R1, R2): R = R1  R2 • Lossless-join decomposition.For all possible relations r on schema R r = R1 (r) R2 (r) • A decomposition of R into R1 and R2 is lossless join if and only if at least one of the following dependencies is in F+: • R1 R2R1 • R1 R2R2

Normalization Using Functional Dependencies • When we decompose a relation schema R with a set of F into R1, R2,.., Rn we want • Lossless-join decomposition: Otherwise decomposition would result in information loss. • No redundancy: The relations Ripreferably should be in either Boyce-Codd Normal Form or Third Normal Form. • Dependency preservation: Let Fibe the set of dependencies F+ that include only attributes in Ri. • Preferably the decomposition should be dependency preserving, that is, (F1 F2  …  Fn)+ = F+ • Otherwise, checking updates for violation of functional dependencies may require computing joins, which is expensive.

Example • R = (A, B, C)F = {A B, B C) • R1 = (A, B), R2 = (B, C) • Lossless-join decomposition: R1  R2 = {B} and B BC • Dependency preserving • R1 = (A, B), R2 = (A, C) • Lossless-join decomposition: R1  R2 = {A} and A  AB • Not dependency preserving (cannot check B C without computing R1 R2)

Boyce-Codd Normal Form •  is trivial (i.e.,  ) •  is a superkey for R A relation schema R is in BCNF with respect to a set F of FDs if for all functional dependencies in F+ of the form , where  R and  R,at least one of the following holds:

Example • R = (A, B, C)F = {AB B  C}Key = {A} • R is not in BCNF • Decomposition R1 = (A, B), R2 = (B, C) • R1and R2 in BCNF • Lossless-join decomposition • Dependency preserving

Third Normal Form • There are some situations where • BCNF is not dependency preserving, and • Efficient checking for FD violation on updates is important • Solution: Define a weaker normal form, called Third Normal Form. • Allows some redundancy • But, FDs can be checked on individual relations without computing a join. • There is always a lossless-join, dependency-preserving decomposition into 3NF.

3NF (Cont.) • A relation schema R is in third normal form (3NF) if for all:  in F+at least one of the following holds: • is trivial (i.e.,  ) •  is a superkey for R • Each attribute A in  –  is contained in a candidate key for R. (NOTE: each attribute may be in a different candidate key) • If a relation is in BCNF it is in 3NF. • Third condition is a minimal relaxation of BCNF to ensure dependency preservation

Example • R = (J, K, L)F = {JK L, L K} • Two candidate keys: JK and JL • R is in 3NF JK L JK is a superkey L K K is contained in a candidate key

Design Goals • Goal for a relational database design is: • BCNF • Lossless join • Dependency preservation • If we cannot achieve this, we accept one of • Lack of dependency preservation • Redundancy due to use of 3NF

Chapter 7: Relational Database Design

Chapter 7: Relational Database Design

Presentation Transcript

DATABASE DESIGN USING THE REA DATA MODEL

Fundamentals of Relational Database Design and Database Planning

Chapter 3: Relational Model

Business Analysis ITEC-630 Fall 2010

Concepts of Database Management Seventh Edition

Designing Databases Using Access

Unit 7 Logical Database Design With Normalization

Chapter 25: Advanced Transaction Processing

Chapter 6 Relational Database Design

CMSC424: Database Design

Chapter 5 Relational Database Management Systems and SQL

Chapter 3: Relational Model

Chapter 7

Chapter 2

Lecture 11 Introduction to Relational Database

Database Systems The Relational Data Model

Chapter 23

Explain the four primary traits that determine the value of information.

Chapter 22: Distributed Databases

Temple University – CIS Dept. CIS616– Principles of Database Systems

Chapter 7: Entity-Relationship Model