720 likes | 858 Views
Topic 6. Relational Database Design CPS510 Database Systems Abdolreza Abhari School of Computer Science Ryerson University. Topics in this Section. Problems with bad database design Concept of decomposition Functional dependencies Normalization theory Normal forms (BCNF and 3NF)
E N D
Topic 6 Relational Database Design CPS510 Database Systems Abdolreza Abhari School of Computer Science Ryerson University
Topics in this Section • Problems with bad database design • Concept of decomposition • Functional dependencies • Normalization theory • Normal forms (BCNF and 3NF) • Deriving normal forms • Multivalued dependnecies • 4NF
Introduction • When designing a relational database schema • Several schemas are possible • Some are better and convenient • Some are bad and not desirable • Database design process involves selecting an appropriate schema • Uses the concept of data dependency • Simple functional dependency StudentNo StudentName • More complex multivalued dependency • Data dependencies obtain semantic information from the application
Problems with Bad Database Design (cont’d) • Consider the following schema student(StudentNo, StudentName, Degree, GPA, CourseNo, CourseName, Credits) • This is a combined version of student and course relations • What is wrong with this schema? • Redundancy • Since a student can take several courses, • student information (StudentName, Degree, and GPA) is repeated • course information (CourseName and Credits) is repeated as well
Problems with Bad Database Design (cont’d) • Update anomalies • Direct consequence of redundancy • We could update the GPA in one tuple while leaving as is in other tuples for the same student • Insertion anomalies • We cannot insert a new course unless at least one student is taking it • Could use NULL values but introduce problems of their own • In general, use NULL values only when unavoidable • Deletion anomalies • Inverse of the last problem • If we delete all students taking a course, we also lose unintentionally information on the course
Decomposition • What is the solution? • Split the relation into two or more relations • This process is called decomposition • We can use join to get back the original information • The whole theory of relational database design deals with how to do this decomposition so that the resulting schema satisfies desirable properties • Decomposition can also lead to problems if not done properly student(StudentNo, StudentName, Degree, GPA) course(CourseNo, CourseName, Credits, Degree) • This leads to loss of information (spurious tuples) when joined
Decomposition (cont’d) Example Original relation
Decomposition (cont’d) Decomposed relations student course
Decomposition (cont’d) Join of student and course It is wrong see the next slide
Decomposition (cont’d) • The join of student and course generates 3 spurious tuples • A good decomposition should be lossless or non-additive join property • Ensures that no spurious tuples are generated when a natural join is applied to the relations in the decomposition
Functional Dependencies • Functional dependency expresses semantic information on two set of attributes • Notation: A1A2 … An B1B2 … Bm • Read: A1A2 … Anare functionally determines B1B2 … Bm • Read: B1B2 … Bm are functionally dependent on A1A2 … An • Examples: CourseNoCredits StudentNoDegree CourseNoCourseName If CourseName is unique for the application, then CourseNameCourseNo
Functional Dependencies (cont’d) • Functional dependency • X Y holds on R if in any legal relation r R, for all pairs of • tuples t1 and t2 in r such that t1.X = t2.X, it is also the case • that t1.Y = t2.Y
Functional Dependencies (cont’d) • The following FDs appear to hold: • W Y • WX Z • We cannot look at an instance of a relation and deduce what FDs hold • Example: If the relation is empty, all FDs appear to hold • We can look at an instance of a relation and say what FDs do not hold • FDs are assertions about the real world/application • They cannot be proved • We can enforce them in the database by proper design
Functional Dependencies (cont’d) • FDs can capture relationship types one-to-many relationship • One FD captures this Example • A department can have several employees. Each employee may work in only one department • FDs that express this relationship: • emp# dept#
Functional Dependencies (cont’d) • one-to-one relationship • Two FDs capture this Example • A manager can manage only one department • Each department can have only one manager • FDs that express this in manages relationship: • emp# dept# • dept# emp#
Functional Dependencies (cont’d) • many-to-many relationship • No FDs to capture this Example • A supplier can supply parts to several projects • A project can receive parts from several suppliers • It is clear that no functional dependencies hold for this • relationship
Normalization Theory • Normalization theory is built around the concept of normal forms • A relation is said to be in a normal if it satisfies a certain set of specified constraints • Several normal forms have been defined • 1NF, 2NF, 3NF, BCNF • Based on functional dependency concept • 4NF • Based on multivalued dependencies • 5NF, … • In practical terms, focus is on BCNF/3NF designs
Normalization Theory (cont’d) This relation is in1NF A relation is said to be in 1NF if it satisfies the constraint that it contains only atomic (simple, indivisible) values This relation is not in 1NF
Normalization Theory (cont’d) Second Normal Form (2NF) • Some definitions • Full Functional Dependency • A functional dependency X Y is a full dependency if removal of any attribute from X means that the dependency does not hold • Partial dependency • A functional dependency X Y is a partial dependency if there is some attribute A X that can be removed from X and dependency will still hold
Normalization Theory (cont’d) • Examples • For the enrolled relation, {CourseNo, ProfName, StudentNo} Status is a full dependency as removing any of the three attributes will lead to the dependency not holding • For the student relation {StudentNo, Studentname} Degree is a partial dependency because {StudentNo} Degree holds
Normalization Theory (cont’d) • 2NF definition • A relation schema R is in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key of R • The relation schema student(StudentNo,StudentName,Degree,GPA) is in 2NF • The relation schema enrolled(StudentNo,ProfName,CourseNo, Degree, Status) is not in 2NF because of {StudentNo} Degree
Normal Forms • Two normal forms • BCNF (Boyce-Codd Normal Form) (will be discussed later) • 3NF (Third Normal Form) • BCNF is stronger than 3NF • Our goal is to design a relational schema that is • In BCNF • Lossless-join type • Dependency preserving (discussed later) • If not possible, we will settle for 3NF (instead of BCNF)
3NF • A relation schema R with FDs F is in 3NF if it is in 2nd normal form and every nonkey attributes is nontransitively dependent on the primary key. • Note that this definition assuming only one candidate key which further become the primary key. Later we will see the situations with we have more than one candidate key. • For now we assume the FDs and primary and candidate keys are all identified based on appropriate information about the meaning of attributes and their relationships. Later on we will use formulas to identify the keys a set of FDs.
3NF • For example: PropertyOwner(Propertyno,Paddress,Rent, Ownerno,Oname) With following FDs {Propertyno} Paddress,Rent,Ownerno,Oname {Ownerno} Oname is not 3NF because all non-primary-key attributes are functionally dependent on primary key, with the exception of Oname, which is transitively dependent on Ownerno (see the second FD)
3NF • Note that if Oname was part of primary key, Propertyowner was 3NF although it had transitive FD. But here since Oname is a nonkey attribute Propertyowner is not 3NF. • To transform Propertyowner relation into 3NF we remove the transitive dependency by creating two new relations as the follows: Propertyforrent(Propertyno,Paddress,Rent,Ownerno) Owner(Ownerno, Oname)
General Definition of 2NF and 3NF The more general definitions for 2NF and 3NF that consider more than one candidate key for a relation are defined as follows: • 2NF: A relation that is in 1NF and every non-candidate key attribute is fully functionally dependent on any candidate key. • 3NF: A relation that is in 1NF and 2NF and in which no non-candidate-key attribute is transitively dependent on any candidate key
BCNF • Boyce/Codd normal form (BCNF) considers the general situations in which relation has candidate keys. • BCNF: A relation is in BCNF if and only if every nontrivial, left irreducible FD has a candidate key as its determinant. • Note that determinant refer to the left side of a FD and in trivial FD, left side is a superset of the right side.
BCNF • Note that the difference between BCNF and 3NF is a functional dependency A B is allowed in3NF if B is primary key attribute and A is not a candidate key, whereas BCNF insists that A must be candidate key. • Therefore, BCNF is stronger than 3NF because every relation in BCNF is also in 3NF. However a relation in 3NF is not necessarily in BCNF.
BCNF • Example of BCNF relations: Client(Client#, cname, address, cphone#) Client# cname, address, cphone# Cphone# cname, address, client# Or Storelocations(Storeid, Storelocations) Storeid Storelocations
More on Functional Dependencies • Functional dependencies • Not sufficient to consider the given set • We need to consider all functional dependencies • Example • X Y and Y Z logically implies X Z • Closure • Let F be the set of functional dependencies • F+ = closure of F • F+ is the set of all FDs • original + logically implied by F
More on Functional Dependencies (cont’d) • Two methods to compute the closure • Method 1 • Uses inference rules • Three basic inference rules (1) Reflexivity rule If X is a set of attributes and Y X, then X Y holds (2) Augmentation rule If X Y holds and W is a set of attributes, then WX WY holds (3) Transitivity rule X Y, Y Z X Z
More on Functional Dependencies (cont’d) • The three rules are sound and complete • Sound • They generate only the correct FDs • Complete • They generate all FDs • These three rules are known as Armstrong’s axioms • More appropriately as Armstrong’s inference rules • However, it is more convenient to consider other rules to simplify computing F+ • Define three more rules
More on Functional Dependencies (cont’d) • Three additional rules (4) Union rule If X Y and X Z hold then X YZ holds (5) Decomposition rule If X YZ holds then X Y and X Z hold (6) Pseudo-Transitivity rule If X Y and WY Z hold then WX Z holds • These three rules can be proved by using Armstrong’s inference rules
More on Functional Dependencies (cont’d) • Method 2: Closure X under F Algorithm to compute X+ X+ := X; repeat old X+ := X+; for each FD Y Z in F do if Y X+ then X+ := X+ Z; until (old X+ = X+);
More on Functional Dependencies (cont’d) Example Given: F = {name street, city, province name, date_donated amount_donated} Find: All attributes that are functionally dependent on name, date_donated Answer: (name, date_donated)+ = {name, date_donated, amount_donated, street, city, province}
Desirable Properties of Decomposition • Two important properties • Lossless-join decomposition • Required for semantic correctness • When we join the decomposed relations, we should get back exact original relation contents • No spurious tuples • Dependency preservation • Required for efficiency • If checking a dependency requires joining two or more relations, it is very inefficient to enforce this FD . • Requires join when inserting a tuple
Finding Keys By FDs • We can define keys in terms of FDs • If R is a relation schema with • Attributes A1, A2, …, An • Functional dependencies F • X is a subset of A1, A2, …, An • X is a key if • X A1, A2, …, An is in F+, and • For no proper subset Y X Y A1, A2, …, An is in F+ • For a given relation schema, there may be several keys • There are called candidate keys • Primary key: Candidate key selected by the designer as the key
Finding Keys By FDs (cont’d) Example • Consider the relation schema R(C, S, P) with FDs {CS P, P C} • This schema has two candidate keys: {CS} and {SP} • {CS} is a candidate key because {CS}+ {C, S, P} • {SP} is a candidate key because {SP}+ {S, P, C}
Deriving 3NF: Bernstein’s Algorithm • Derives 3NF schema that is lossless and dependency preserving • Outline • There are 4 steps Step 1: • Find out facts about the real world • Difficult step but must be done in the design of a database • Probably takes more time than all the other steps put together • Result is a list of attributes and FDs
Bernstein’s Algorithm (cont’d) Step 2: • Reduce the list of functional dependencies • There is a straightforward polynomial algorithm (discussed later) • This step can be done manually for a small list • Algorithm can be programmed for a large list • Result is a minimal list of FDs Step 3: • Find the keys • Difficult step (details given later) • Result is a list of candidate keys
Bernstein’s Algorithm (cont’d) Step 4: • Derive the final schema • Combine FDs with the same left hand side • Make a new table for each FD • Add a key relation if no relation contains a key • Eliminate relations that contained in other relations • Result is 3NF schema that is • Lossless • Dependency preserving • Adding a key relation in this step is necessary to guarantee lossless-join type decomposition
Bernstein’s Algorithm (cont’d) Details of Step 2 Objective: Minimizing the list of FDs • Consists of three sub-steps Sub-step 1 • Rewrite FDs so that right hand side each FD is exactly one attribute • Left hand side may have a set of attributes FD X Y, Z is written as X Y X Z • Essentially applying the decomposition rule
Bernstein’s Algorithm (cont’d) Sub-step 2 • Get rid of redundant FDs • Procedure • For each FD do the following • Consider FD X Y • Take this FD from the list of all FDs • Find X+ in the reduced list • If X+ contains Y, then X Y is redundant • Note: • Here Sub-step 1 is necessary to eliminate some subset of right hand attributes
Bernstein’s Algorithm (cont’d) Example for sub-step 2 pharmacy_account# patient_id patient_id doctor_id pharmacy_account#, drug doctor_id • Suppose we want to see if the last FD is redundant • Reduced list of FDs pharmacy_account# patient_id patient_id doctor_id • Compute {pharmacy_account#, drug}+ = {pharmacy_account#, drug, patient_id, doctor_id} • Since this closure includes doctor_id, the last FD pharmacy_account#, drug doctor_id is redundant
Bernstein’s Algorithm (cont’d) Sub-step 3 • Minimize left hand side • This is a tedious but straightforward process • For each FD, apply the following procedure: • Eliminate an attribute A on the LHS of one of the FDs • Look at the remainder Q of attributes on the LHS for that FD • Find Q+ in the original set of FDs • If Q+ contains the RHS of the FD in question then the attribute A is redundant
Bernstein’s Algorithm (cont’d) Example for sub-step 3 last_name,SIN first_name SIN last_name • Suppose we want to see if last_name in the first FD is redundant A = last_name Q = SIN SIN+ = {SIN,last_name,first_name} • Since this closure contains the RHS (i.e., first_name), last_name is redundant • Minimal dependencies are: SIN first_name SIN last_name
Bernstein’s Algorithm (cont’d) Details of Step 3 Objective: Finding keys • To determine whether or not a given set of attributes X is a key • Find X+ and see that it contains all attributes of the relation • All attributes of the relation are dependent on X • If X has more than one attribute, make sure that no proper subset of X has this property • This can be done by eliminating one attribute at a time • Tedious process • For k attributes, we need to check 2k-1 possibilities
Bernstein’s Algorithm (cont’d) • Two observations to simplify the amount of work • If an attribute is never on the left hand side of a dependency, it is not in any key, unless it is also never on the right hand side • If an attribute is never on the right hand side of any FD, it is in every key Example for Step 3 SIN first_name SIN last_name SIN date_of_birth last_name, first_name SIN • Attributes on RHS but not on LHS date_of_birth (it will not in any key)
Bernstein’s Algorithm (cont’d) • Attributes not on RHS • None for this example (in every key) • Check the subsets of attributes: SIN, last_name, first_name • Only four possibilities SIN is a key last_name not a key first_name not a key first_name, last_name is a key • We have only two candidate keys: {SIN } and {first_name, last_name}