270 likes | 273 Views
Explore the problems faced in designing relational databases, such as repetition of information and inability to represent certain information. Find solutions including decomposition, lossless join, and normalization.
E N D
Designing Relational Databases B.Ramamurthy B.ramamurthy
Problems with Simple Relations • Repetition of information • Inability to represent certain information • Loss of information from decomposing relations B.ramamurthy
Example • Consider a lending schema lending-schema (bname, bcity, assets, cname,loanNo, amt) when you want to insert a new loan do you have insert assets too? Ex: insert (Perry, Horseneck, 170000, Adams, l-13, 1500) B.ramamurthy
Example (contd.) How will you represent branch details if there was no loan in the branch. We will need null values. Null values are difficult to handle. B.ramamurthy
Solution : alternative design • Decomposition • Decompose lending-schema into branch-customer-schema(bcs) and customer-loan-schema(cls) • bcs(bname, bcity, assets, cname) • cls(cname, loanNo, amt) • Is this decomposition correct? Lossless? B.ramamurthy
Decomposition • bcs natural-join cls should result in our original lending but does it? • For example, we cannot answer the question “Find all branches that have made loan < 100” on “lending-schema” will result in Mianus and Round Hill. • But the same query on bcs joined with cls results in Mianus, Round Hill and Downtown. • Some information was lost in decomposition that leads us to the incorrect answer. B.ramamurthy
How can we make decomposition lossless? • Impose set of constraints (including functional dependencies). • For example • bname --> {bcity, assets} would have taken care of the problem above. B.ramamurthy
Formally Stating.. • Let C be a set of constraints. • A decomposition {R1, R2, .. Rn} of a relation scheme R is a lossless join for if all relations r on schema R that are legal under C, r = R1 ( r) join R2 ( r ) … join Rn ( r ) B.ramamurthy
Normalization using Functional Dependencies • Using functional dependencies we can define several “normal forms” that represent good database deign. • There are several normal forms : 1NF, 2NF, etc. • We will cover 3NF and Boyce-Codd Normal Form (BCNF) B.ramamurthy
Desirable Properties • Lossless-join decomposition • Dependency preservation • Lack of redundancy B.ramamurthy
BCNF • A relation schema R is in BCNF with respect to a set of functional dependencies if for all functional dependencies in F+ of the form a-->b at least one of the following holds, • a-->b is a trivial functional dependency (that is b is subset of a) • a is superkey for schema R B.ramamurthy
Example customer-scheme(cname,cstreet,ccity) cname-->cstreet,ccity branch-scheme(banme, assets, bcity) bname-->asset,bcity loan-info-schema(bname, cname, loanNo, amt) loanNo-->amt,bname Is not BCNF B.ramamurthy
Solution • Decompose loan-info-scheme into • loan-schema (bname, loanNo, amt) • Borrower-schema (cnmae, loanNo) B.ramamurthy
BCNF Decomposition result = {R} done = false Compute F+ while not done if there is a schema Ri that is not BCNF let a --> b be the non-trivial fn.dep in Ri s.t. a --> Ri is not in F+ and ab =0 result = (result -Ri) (Ri-b) (a,b) else done=true; B.ramamurthy
Lossless Decomposition • If R is split into R1 and R2, for the decomposition to be loss less the at least one of the two should hold: R1 R2 --> R1 R1 R2 --> R2 B.ramamurthy
Dependency Preservation • When a relational schema D defined by functional dependency F is decomposed into Ri {i = 1,2,3 ..n}, each functional dependency should be testable by at least one of Ri. • Formally, let F+ be the closure F and let F’+ be the closure of dependencies covered by Ri. F+ == F+’ for dependency preservation. B.ramamurthy
3NF : Third Normal Form • BCNF is quite stringent that sometimes it may not be possible to cover all the functional dependencies after decomposition. • 3NF solves this problem by relaxing the BCNF requirements as follows: B.ramamurthy
3NF : Definition • A relation schema R is in BCNF with respect to a set of functional dependencies if for all functional dependencies in F+ of the form a-->b at least one of the following holds, • a-->b is a trivial functional dependency (that is b is subset of a) • a is superkey for schema R • Each attribute A in b-->a is contained in a candidate key for R. B.ramamurthy
3NF : Algorithm Let Fc be the canonical cover of F. j = 0; for each dependency a-->b if none of schemes in Ri contains ab then j = j+1; Rj = ab; if none of the schemas contains a candidate key for R then j = j+ 1; Rj = any candidate key for R; return (Rj) B.ramamurthy
Examples : 7.2, 7.3, 7.8 B.ramamurthy
Functional Dependencies (review) • A functional dependency is a relationship between attributes • Examples: • Student ID determines Major: StuID => Major • StuID => (Name, Major) • (StuID, Course) => Grade • Attribute(s) to the left of arrow called determinant(s) B.ramamurthy
Anomalies in a Relation (review) • An anomaly is a weakness in the way a relation is set up • Assume a single table representing student-courses (!) • Consider the following table: • IdNameMajorCourse-IdCourse-Desc 45 Meyer CS CSE421 Operating Systems 56 Beaver CE CSE462 Database Concepts 23 Teller CE CSE506 Architecture • This table has all three anomalies: insert, update, and delete B.ramamurthy
Insert, Update and Delete Anomalies (review) • Insertanomaly is caused when a new course needs to be inserted that is not registered by a student • Updateanomaly caused when Major is updated in one row only for Id 45. • Deleteanomaly caused when Id 56 removed from the table; we lose a CSE course description • Anomalies are avoided by splitting the offending relation into multiple relations (decomposition) B.ramamurthy
Fourth Normal Form (4NF) • A multi-valued dependency exists when • there are at least three attributes A, B, and C in a relation and • for each value of A there is a well-defined set of values for B, and a well-defined set of values for C, • but the set of values of B is independent of set C • A relation is in 4NF if it is already in 3NF and has no multi-valued dependencies • Every possible combination of the two multi-valued attributes have to be stored in the database thus leading to redundancy and consequent anomalies B.ramamurthy
Fourth Normal Form (4NF) - Example • Course-IdInstructor TextbookMGS404 Clay HansenMGS404 Clay KroenkeMGS404 Drake HansenMGS404 Drake Kroenke • By placing the multi-valued attributes in tables by themselves, we can convert the above to 4NF • Change to:COURSE-INST (Course-Id, Instructor)COURSE-TEXT (Course-Id, Textbook) B.ramamurthy
Normal Forms 5NF 4NF BCNF 3NF 2NF 1NF B.ramamurthy
Homework #3 :Due date 4/11 in class : Hardcopy 1. Consider a scheme R (A,B,C,D,E) and the two set of functional dependencies. Are the two sets F1 and F2 equivalent? F1 : A-->B, AB--> C, D--> AC, D --> E F2 : A--> BC , D--> AE 2. Consider the schema R (A,B,C,D,E,F). AB--> C, C --> A, BC -->D, AC D--> B BE-->C, CE-->FA, CF-->BD, D-->EF a. Find the closure. b. Find the set candidate keys. Show all the steps. B.ramamurthy