Design of Relational Database Schemas

Design of Relational Database Schemas • The principal problem that we encounter is redundancy, where a fact is repeated in more than one tuple. • Most common cause: attempts to group into one relation both single valued and multi-valued properties of an object. • Now we will tackle the problem of designing good relational schemas.

title year length filmTyp studioNam starName Star Wars 1977 124 color Fox Carrie Fisher Star Wars 1977 124 color Fox Mark Hamill Star Wars 1977 124 color Fox Harrison Ford Mighty Ducks 1991 104 color Disney Emilio Estevez Wayne’s World 1992 95 color Paramount Dana Carvey Wayne’s World 1992 95 color Paramount Mike Meyers Anomalies • Redundancy. • Information may be repeated unnecessarily in several tuples. • E.g. length and filmType. • Update anomalies. • We may change information in one tuple but leave it unchanged in other tuples. • E.g. we could change the length of Star Wars to 125, in the first tuple, and forget to do the same in the second and third tuple. • Deletion anomalies. • If a set of values becomes empty, we may lose other information as a side effect. • E.g. if we delete Emilio Estevez we will lose all the information about Mighty Ducks.

Decomposing Relations - Example title year starName title year length filmTyp studioNam Star Wars 1977 Carrie Fisher Star Wars 1977 124 color Fox Star Wars 1977 Mark Hamill Mighty Ducks 1991 104 color Disney Star Wars 1977 Harrison Ford Wayne’s World 1992 95 color Paramount Mighty Ducks 1991 Emilio Estevez Wayne’s World 1992 Dana Carvey Wayne’s World 1992 Mike Meyers Movie1relation • No true redundancy! • The update anomaly disappeared. If we change the length of a movie, it is done only once. • The deletion anomaly disappeared. If we delete all the stars from Movie2 we still will have the other info for a movie. Movie2relation

Boyce-Codd Normal Form • The goal of decomposition is to replace a relation by several that do not exhibit anomalies. • There is a simple condition under which the anomalies can be guaranteed not to exist. • This condition is calledBoyce-Codd Normal Form, or BCNF. • A relation is in BCNF if: • Whenever there is a nontrivial dependency A1A2…AnB1B2…Bm for R, it must be the case that {A1 ,A2 ,… , An} is a superkey for R.

Boyce-Codd Normal Form - Example Violating BCNF • Relation Movie in the previous figure is not in BCNF. • Consider the FD: title yearlength filmType studioName • Unfortunately, the left side of the above dependency is not a superkey. • In particular we know that the title and the year does not functionally determine starName. • On the other hand, Movie1 is in BCNF. • The only key is {title, year} and • title year  length filmType studioName holds in the relation

Decomposition into BCNF • The decomposition strategy is: • Find a non-trivial FD A1A2…AnB1B2…Bm that violates BCNF, i.e. A1A2…An is not a superkey. • Decompose the relation schema into two overlapping relation schemas: • One is all the attributes involved in the violating dependency and • the other is the left side and all the other attributes not involved in the dependency. • By repeatedly, choosing suitable decompositions, we can break any relation schema into a collection of smaller schemas in BCNF. • The data in the original relation is represented faithfully by the data in the relations that are the result of the decomposition. • i.e. we can reconstruct the original relation exactly from the decomposed relations.

Boyce-Codd Normal Form - Example Consider relation schema: Movies(title, year, studioName, president, presAddr) and functional dependencies: title year  studioName studioName  president president  presAddr Last two violate BCNF. Why? Compute {title, year}+, {studioName}+, {president}+ and see if you get all the attributes of the relation. If not, you got a BCNF violation, and need to break relation.

Boyce-Codd Normal Form – Example • Let’s decompose • starting with: • studioName  president • Let’s add to the right-hand side any other attributes in the closure of studioName (optional “rule of thumb”). • X={studioName} studioNamepresident • X={studioName, president} presidentpresAddr • X={studioName}+={studioName, president, presAddr}

Boyce-Codd Normal Form – Example From the closure we get: studioNamepresident presAddr We decompose the relation schema into the following two schemas: Movies1(studioName, president, presAddr) Movies2(title, year, studioName) The second schema is in BCNF. What about the first schema? The following dependency violates BCNF. presidentpresAddr Why it’s bad to leave Movies1 table as is? If many studios share the same president than we would have redundancy when repeating the presAddr in all those studios.

Boyce-Codd Normal Form – Example We must decompose Movies1, using the FD: presidentpresAddr The resulting relationschemas, both in BCNF, are: Movies11(title, year, studioName) Movies12(studioName, president) In general, we must keep applying the decomposition rule as many times as needed, until all our relations are in BCNF. So, finally we got Movies11, Movies12, and Movies2.

Finding FDs for the decomposed relations • When we decompose a relation, we need to check that the resulting schemas are in BCNF. • We can’t tell a relation is in BCNF, unless we can determine the FDs that hold for that relation.

Finding FDs for the decomposed relations • Suppose S is one of the resulting relations in a decomposition of R. • For this: • Consider each subset X of attributes of S. • Compute X+using the FD on R. • At the end throw out the attributes of R, which aren’t in S. • Then, for each attribute B such that: • B is an attribute of S, • B is in X+ • we have that the functionaldependency XB holds in S.

Example: Consider R(A, B, C, D, E) decomposed intoS(A, B, C)and another relation. Let FDs of R be: AD, BE, DEC First, consider {A}+={A,D}. Since D is not in the schema of S, we get no dependency here. Similarly, {B}+={B,E} and {C}+={C}, yielding no FDs for S. Now consider pairs. {A,B}+={A, B, C, D, E}. Thus, we deduce ABC for S. Neither of the other pairs give us any FD for S. Of course the set of all three attributes of S, {A, B, C}, cannot yield any nontrivial dependencies for S. Thus, the only dependency we need assert for S is ABC.

A Few Tricks • Never need to compute the closure of the empty set or of the set of all attributes. • If we find X + = all attributes, don’t bother computing the closure of any supersets of X.

Another Example • R(A,B,C) • with FD’s A B and B C. • Project onto S(A,C). • A +=ABC ; yields AB, AC. • We do not need to compute AB + or AC +. • B +=BC ; yields BC. • C +=C ; yields nothing. • BC +=BC ; yields nothing. • Resulting FD’s: AB, AC, and BC. • Projection onto AC: AC. • This is the only FD that involves a subset of {A,C }.

Design of Relational Database Schemas

Design of Relational Database Schemas

Presentation Transcript

Logical Relational Database Design

Relational Database Design

Relational Database Design

Relational Database Design

Relational Database Design

Summary Graphs for Relational Database Schemas

Relational Database Design

Relational Database Design Algorithms

Relational Database Design

Relational Database Design Theory

Relational Database Design

Relational database design

Translating Relational Schemas to XML Schemas

Relational database design

Relational Database Design

Relational database design

Relational Database Design

Relational Database Design

Relational Database Design

Relational Database Design

Relational Database Design

Relational Database Design