1.39k likes | 1.51k Views
Lecture 03: Normal Forms, Constraints, Views. Wednesday, October 12, 2011. Announcements. HW1: was due Monday HW2: due next Monday. Outline and Reading Material. Schema Normalization: 19.1 – 19.7 Read only about BCNF, skip 3NF Note: you need BCNF for HW2
E N D
Lecture 03:Normal Forms, Constraints, Views Wednesday, October 12, 2011 Dan Suciu -- CSEP544 Fall 2011
Announcements • HW1: was due Monday • HW2: due next Monday Dan Suciu -- CSEP544 Fall 2011
Outline and Reading Material • Schema Normalization: 19.1 – 19.7 • Read only about BCNF, skip 3NF • Note: you need BCNF for HW2 • Constraints and triggers: 3.2, 3.3, 5.8 • Views: 3.6 • Answering queries using views: Sec. 1,2,3 Some material is NOT covered in the book.Read the slides carefully: we will not cover all in class! Dan Suciu -- CSEP544 Fall 2011
Schema Normalization Dan Suciu -- CSEP544 Fall 2011
Normal Forms • 1st Normal Form = all tables are flat • 2nd Normal Form = obsolete • Boyce Codd Normal Form = will study • 3rd Normal Form = see book • 4th Normal Form = golden standard Dan Suciu -- CSEP544 Fall 2011
First Normal Form (1NF) • A database schema is in First Normal Form if all tables are flat Student Student Takes Course May needto add keys
name buys Person Product price name ssn Relational Schema Design Conceptual Model: Relational Model: plus FD’s Normalization: Eliminates anomalies Dan Suciu -- CSEP544 Fall 2011
Data Anomalies When a database is poorly designed we get anomalies: Redundancy: data is repeated Updated anomalies: need to change in several places Delete anomalies: may lose data when we don’t want Dan Suciu -- CSEP544 Fall 2011
Relational Schema Design Recall set attributes (persons with several phones): One person may have multiple phones, but lives in only one city Anomalies: • Redundancy = repeat data • Update anomalies = Fred moves to “Bellevue” • Deletion anomalies = Joe deletes his phone number: what is his city ?
Relation Decomposition Break the relation into two: Anomalies have gone: • No more repeated data • Easy to move Fred to “Bellevue” (how ?) • Easy to delete all Joe’s phone number (how ?)
Relational Schema Design(or Logical Design) Main idea: • Start with some relational schema • Find out its functional dependencies • Use them to design a better relational schema Dan Suciu -- CSEP544 Fall 2011
Functional Dependencies What is a functional dependency? • It is a kind of constraint • In theory one should find the FDs during requirement analysis, then apply rigorously the normalization steps that we discuss next • In practice one rarely follows this strict prescription Dan Suciu -- CSEP544 Fall 2011
Functional Dependencies Notation: A1, A2, …, An B1, B2, …, Bm Meaning: If two tuples agree on the attributes A1, A2, …, An then they must also agree on the attributes B1, B2, …, Bm Dan Suciu -- CSEP544 Fall 2011
When Does an FD Hold Definition: A1, ..., Am B1, ..., Bn holds in R if: t, t’ R, (t.A1=t’.A1 ... t.Am=t’.Am t.B1=t’.B1 ... t.Bn=t’.Bn ) R t if t, t’ agree here then t, t’ agree here t’
Examples An FD holds, or does not hold on an instance: EmpID Name, Phone, Position Position Phone but not Phone Position Dan Suciu -- CSEP544 Fall 2011
Example Position Phone Dan Suciu -- CSEP544 Fall 2011
Example but not Phone Position Dan Suciu -- CSEP544 Fall 2011
Example FD’s are constraints: On some instances they hold On others they don’t namecolor categorydepartment color, categoryprice Does this instance satisfy all the FDs ? Dan Suciu -- CSEP544 Fall 2011
Example namecolor categorydepartment color, categoryprice What about this one ?
An Interesting Observation namecolor categorydepartment color, categoryprice If all these FDs are true: name, categoryprice Then this FD also holds: Why ??
Goal: Find ALL Functional Dependencies • Anomalies occur when certain “bad” FDs hold • We know some of the FDs • Need to find all FDs, then look for the bad ones Dan Suciu -- CSEP544 Fall 2011
Armstrong’s Rules (1/3) A1, A2, …, An B1, B2, …, Bm Splitting rule and Combing rule Is equivalent to A1, A2, …, An B1 A1, A2, …, An B2 . . . . . A1, A2, …, An Bm Dan Suciu -- CSEP544 Fall 2011
Armstrong’s Rules (2/3) Trivial Rule A1, A2, …, An Ai where i = 1, 2, ..., n Why ? Dan Suciu -- CSEP544 Fall 2011
Armstrong’s Rules (3/3) Transitive Closure Rule A1, A2, …, An B1, B2, …, Bm If and B1, B2, …, Bm C1, C2, …, Cp A1, A2, …, An C1, C2, …, Cp then Why ? Dan Suciu -- CSEP544 Fall 2011
Example (continued) 1. namecolor 2. categorydepartment 3. color, categoryprice Start with these: Infer these:
Example (continued) 1. namecolor 2. categorydepartment 3. color, categoryprice Answers: THIS IS TOO HARD ! There is an easier way.
Closure of a set of Attributes Given a set of attributes A1, …, An The closure, {A1, …, An}+ = the set of attributes Bs.t. A1, …, An B namecolor categorydepartment color, categoryprice Example: name+ = {name, color}{name, category}+ = {name, category, color, department, price}color+ = {color}
Closure Algorithm X={A1, …, An}. repeat untilX doesn’t change do: if B1, …, Bn C is a FD and B1, …, Bn are all in X then add C to X. Example: namecolor categorydepartment color, categoryprice {name, category}+ = ? [ …in class…]
Practice at Home… A, B C A, D E B D A, F B R(A,B,C,D,E,F) Compute {A,B}+ X = {A, B, } Compute {A, F}+ X = {A, F, } Dan Suciu -- CSEP544 Fall 2011
Why Do We Need Closure • With closure we can find all FD’s easily • To check if X ® A • Compute X+ • Check if A Î X+ Dan Suciu -- CSEP544 Fall 2011
Practice at Home Find all FD’s implied by: A, B CA, D B B D
Practice at Home Find all FD’s implied by: A, B CA, D B B D Step 1: Compute X+, for every X: A+ = A, B+ = BD, C+ = C, D+ = D AB+ =ABCD, AC+=AC, AD+=ABCD, BC+=BCD, BD+=BD, CD+=CD ABC+ = ABD+ = ACD+ = ABCD (no need to compute– why ?) BCD+ = BCD, ABCD+ = ABCD
Practice at Home Find all FD’s implied by: A, B CA, D B B D Step 1: Compute X+, for every X: A+ = A, B+ = BD, C+ = C, D+ = D AB+ =ABCD, AC+=AC, AD+=ABCD, BC+=BCD, BD+=BD, CD+=CD ABC+ = ABD+ = ACD+ = ABCD (no need to compute– why ?) BCD+ = BCD, ABCD+ = ABCD Step 2: Enumerate all FD’s X Y, s.t. Y X+ and XY = : AB CD, ADBC, ABC D, ABD C, ACD B
Keys A superkey is a set of attributes X = A1, ..., Ans.t. for any other attribute B, we have X B Note: X is a superkey if X+ = [all attributes] A key is a minimal superkey X How do we compute all keys X ?
Example Product(name, price, category, color) name, category price category color What are the keys ? Dan Suciu -- CSEP544 Fall 2011
Example Product(name, price, category, color) name, category price category color What are the keys ? (name, category) + = name, category, price, color Hence X = (name, category) is a key
Practice at Home Enrollment(student, address, course, room, time) student address room, time course student, course room, time Find all keys Dan Suciu -- CSEP544 Fall 2011
Read at Home We can have more than one key. Example: relation R(A,B,C) satisfies ABCBCA ABCBAC or Find all keys
Boyce-Codd Normal Form There are no“bad” FDs: A relation R is in BCNF if: Whenever X B is a non-trivial dependency,then X is a superkey. Equivalently: A relation R is in BCNF if: " X, either X+ = X or X+ = [all attributes] Dan Suciu -- CSEP544 Fall 2011
Example SSN Name, City The only key is:{SSN, PhoneNumber}Hence SSN Name, City is a “bad” dependency Alternatively: SSN+ = Name, City and is neither SSN nor all attributes
BCNF Decomposition Algorithm Normalize(R)find X s.t.: X ≠ X+ ≠ [all attributes]if (not found) then“R is in BCNF”let Y = X+ - X; Z = [all attributes] - X+ decompose R into R1(X Y) and R2(X Z)Normalize(R1); Normalize(R2); Y X Z X+
Find X s.t.: X ≠X+ ≠ [all attributes] Example BCNF Decomposition Person(name, SSN, age, hairColor, phoneNumber) SSN name, age age hairColor Dan Suciu -- CSEP544 Fall 2011
Find X s.t.: X ≠X+ ≠ [all attributes] Example BCNF Decomposition Person(name, SSN, age, hairColor, phoneNumber) SSN name, age age hairColor Iteration 1: Person: SSN+ = SSN, name, age, hairColor Decompose into: P(SSN, name, age, hairColor) Phone(SSN, phoneNumber) name,age,hairColor phoneNumber SSN Dan Suciu -- CSEP544 Fall 2011
Find X s.t.: X ≠X+ ≠ [all attributes] Example BCNF Decomposition Person(name, SSN, age, hairColor, phoneNumber) SSN name, age age hairColor What arethe keys ? Iteration 1: Person: SSN+ = SSN, name, age, hairColor Decompose into: P(SSN, name, age, hairColor)Phone(SSN, phoneNumber) Iteration 2: P: age+ = age, hairColor Decompose: People(SSN, name, age)Hair(age, hairColor)Phone(SSN, phoneNumber) Dan Suciu -- CSEP544 Fall 2011
R(A,B,C,D) A BB C Practice at Home R(A,B,C,D) A+ = ABC ≠ ABCD Dan Suciu -- CSEP544 Fall 2011
R1(A,B,C) B+ = BC ≠ ABC R2(A,D) R12(A,B) R11(B,C) R(A,B,C,D) A BB C Practice at Home R(A,B,C,D) A+ = ABC ≠ ABCD What arethe keys ? What happens if in R we first pick B+ ? Or AB+ ? Dan Suciu -- CSEP544 Fall 2011
Decompositions in General R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp) R1 = projection of R on A1, ..., An, B1, ..., Bm R2 = projection of R on A1, ..., An, C1, ..., Cp Dan Suciu -- CSEP544 Fall 2011
Theory of Decomposition Sometimes it is correct: Lossless decomposition
Incorrect Decomposition Sometimes it is not: What’sincorrect ?? Lossy decomposition