450 likes | 616 Views
C20.0046: Database Management Systems Lecture #8. Matthew P. Johnson Stern School of Business, NYU Spring, 2004. Agenda. Last time: Normalization This time: 4NF Relational Algebra Pep talk OHs today, drop-ins (80809). Normalization Review. Q: What’s required for BCNF?
E N D
C20.0046: Database Management SystemsLecture #8 Matthew P. Johnson Stern School of Business, NYU Spring, 2004 M.P. Johnson, DBMS, Stern/NYU, Sp2004
Agenda • Last time: Normalization • This time: • 4NF • Relational Algebra • Pep talk • OHs today, drop-ins (80809) M.P. Johnson, DBMS, Stern/NYU, Sp2004
Normalization Review • Q: What’s required for BCNF? • Q: What are the two types of violations? • Q: What’s the loophole for 3NF? • Q: How do we fix a non-BCNF relation? M.P. Johnson, DBMS, Stern/NYU, Sp2004
Normalization Review • Q: If AsBs violates BCNF, what do we do? • Q: In this case, could the decomposition be lossy? • Q: How do we combine two relations? • Q: Can BCNF decomp. lose FDs? • Q: Can 3NF decomp. lose FDs? M.P. Johnson, DBMS, Stern/NYU, Sp2004
Name SSN Jobs Streets Citys Michael 123 Mayor 111 East 60th Street New York Michael 123 Mayor 222 Brompton Road London Michael 123 CEO 111 East 60th Street New York Michael 123 CEO 222 Brompton Road London Hilary 456 Senator 333 Some Street Chappaqua Hilary 456 Senator 444 Embassy Row Washington Hilary 456 First Lady 333 Some Street Chappaqua Hilary 456 First Lady 444 Embassy Row Washington Hilary 789 Lawyer 333 Some Street Chappaqua Hilary 789 Lawyer 444 Embassy Row Washington New topic: MVDs (3.7) • Consider this relation • People ~ their jobs ~ their residences • Person-address/city: many-many • Person-job: many-many • Address/city-job: independent M.P. Johnson, DBMS, Stern/NYU, Sp2004
Redundancy in BCNF • Lots of redundancy! • Key? All fields • None determined by others! • Non-trivial FDs? None! • In BCNF? Yes! • Now what? • New concept, leading to another normal form: • Multivalued dependencies M.P. Johnson, DBMS, Stern/NYU, Sp2004
MVD definition • As Bs if, when As are held fixed values in Bs are independent of values in rest • More precisely: if t1 and t3 agree on As, we then can find t2 such that t2, t2, t3 agree on As t2, t1 agree of Bs t2, t3 agree on Cs t1 | | | | t2 | | | | t3 M.P. Johnson, DBMS, Stern/NYU, Sp2004
Name Streets Citys Jobs t1 Hilary 333 Some Street Chappaqua Senator t3 Hilary 444 Embassy Row Washington Lawyer t2 Hilary 333 Some Street Chappaqua Lawyer MVD example • Claim: name streets,cities • If true: can pick arbitrary t1, t3 and find a t2 • We pick: first and last of Hilary’s tuples: • Now: if true, can find another Hilary row with street/address of t1 and job of t3 M.P. Johnson, DBMS, Stern/NYU, Sp2004
MVD example • Now: if true, can find another Hilary row with street/address of t1 and job of t3 • Sure enough: t2 t2 M.P. Johnson, DBMS, Stern/NYU, Sp2004
MVD rules • No splitting rule: • In the example, name streets,cities • Do we have name streets? • No: 444 Embassy Row doesn’t go with Chappaqua • NB: City doesn’t determine street – could have >1 house • But city, street aren’t independent t1 t3 M.P. Johnson, DBMS, Stern/NYU, Sp2004
MVD rules • Trivial dependencies: • As Bs iff As BsAi • Transitive rule: • As Bs, Bs Cs As Cs • Complementation rule: • As Bs As rest • Intuition: if each value in Bs is assoc’ed w/each value in rest, then each value of rest is assoc’ed w/each value in Bs M.P. Johnson, DBMS, Stern/NYU, Sp2004
MVDs and FDs • MVD is a generalization of FD • Every FD is an MVD • Pf: Suppose As Bs Pick t1, t3 that agree on As. Must find a t2. Let t2 be t3. Then 1) t2 agrees on As with both 2) t2 agrees on Bs with t1 (why?) 3) t2 agrees on rest with t3 (why?) QED M.P. Johnson, DBMS, Stern/NYU, Sp2004
Fourth Normal Form • 4NF: like BCNF, but with MVDs not FDs • An MVD As Bs is nontrivial if • No Bs are As • Some attributes left over (why?) • 4NF: for every nontrivial MVD As Bs, As is a superkey • In example name streets,cities, but name isn’t a superkey M.P. Johnson, DBMS, Stern/NYU, Sp2004
Decomposition to 4NF • Again, analogous to BCNF • If we can find As Bs for R where As isn’t a superkey, replace R with R1(As,Bs) and R2(As,rest) • Running example: name streets,cities • People(name,streets,cities,jobs) becomes Residences(name,street,city) and Employment(name,job) M.P. Johnson, DBMS, Stern/NYU, Sp2004
4NF: another construal • In nontrivial As Bs, As must be superkey • After df of 4NF, text says: “That is, … every nontrivial MVD is really a FD with a superkey on the left” (p123). • We know: FDs are* MVDs but not vice versa • So: Why does this follow? Is it true? • Yes. As is a superkey As everything • As Bs the MVD is an FD • Two kinds of MVDs: FDs and “true” MVDs • 4NF eliminates exactly the true ones * The typo swapping these was fixed. M.P. Johnson, DBMS, Stern/NYU, Sp2004
Summary of normal forms M.P. Johnson, DBMS, Stern/NYU, Sp2004
Combined isa/weak example • Exercise 3.3.1 • Convert from E/R to R, by E/R, OO and nulls chair name number room givenBy Depts courses isa Lab-courses Computer-allocation M.P. Johnson, DBMS, Stern/NYU, Sp2004
Next topic: relational algebra (5.1-2) • Set operations: union, intersection, difference • Projection, selection • Cartesian Product • Joins: natural joins, theta joins • Combining operations to form queries • Dependent and independent operations M.P. Johnson, DBMS, Stern/NYU, Sp2004
What is relational algebra? • An algebra for relations • “High-school” algebra is an algebra for numbers • Formalism for constructing expressions • Operations • Operands: Variables, Constants, expressions • Expressions: • Vars & constants • Operators applied to expressions M.P. Johnson, DBMS, Stern/NYU, Sp2004
Why do we care about relational algebra? • Why construct expressions on relations? • The exprs are the form questions about the take • The relations these exprs cash out to are the answers to our questions • First proof of RDBMS/RA concept: System R (1979) • Modern implementation of RA: SQL M.P. Johnson, DBMS, Stern/NYU, Sp2004
Relation operators • Five basic operators: • Union: • Intersection: • Difference: - • Selection: s • Projection: P • Cartesian Product: • Derived/auxiliary operators: • Intersection, complement • Joins (natural, equijoin, theta join, semijoin) • Renaming: r M.P. Johnson, DBMS, Stern/NYU, Sp2004
Operators • Relations are sets have set-theoretic ops • Venn diagrams • Union: R1 R2 • Example: • ActiveEmployees RetiredEmployees • Difference: R1 – R2 • Example: • AllEmployees – RetiredEmployees = ActiveEmployees M.P. Johnson, DBMS, Stern/NYU, Sp2004
Set operations - example R: S: R S: M.P. Johnson, DBMS, Stern/NYU, Sp2004
Set operations - example R: S: R - S: M.P. Johnson, DBMS, Stern/NYU, Sp2004
Operators • Intersection: R1 R2 • Example: • UnionizedEmployees RetiredEmployees • Intersection can be derived from and – • R1 R2 = R1 – (R1 – R2) • R1 R2 = -(-R1 -R2) (allowed?) M.P. Johnson, DBMS, Stern/NYU, Sp2004
Set operations - example R: S: R S: M.P. Johnson, DBMS, Stern/NYU, Sp2004
Operators • Selection • Selects all tuples satisfying a condition • Notation: sc(R) • Examples • ssalary > 100000(Employee) • sname = “Smith”(Employee) • The condition c can have • comparison ops:=, <, , >,, <> • boolean ops: and, or M.P. Johnson, DBMS, Stern/NYU, Sp2004
Theater N’hood Title Angelica Village City of God Angelica Village Fog of War Theater N’hood Title Film Forum Village City of God Angelica Village City of God Angelica Village Fog of War Selection example • Select the movies at Angelica: • sTheater=“Angelica”(Showings) M.P. Johnson, DBMS, Stern/NYU, Sp2004
Operators • Projection: op we used for decomposition • Eliminates columns, then removes duplicates • Notation: PA1,…,An(R) M.P. Johnson, DBMS, Stern/NYU, Sp2004
Operators • Cartesian Product • Cross product • Each tuple in R1 combines w/each tuple in R2 • Notation: R1 R2 • If R1, R2 fields overlap, include both and disambiguate: R1.A, R2.A • Fairly rare in practice • used to express joins • Q: Where does the name come from? • Q: If R1 has n1 rows and R2 has n2, how large is R1 x R2? M.P. Johnson, DBMS, Stern/NYU, Sp2004
Cartesian product example Hillary-addresses Hillary-jobs Hillary-addresses x Hillary-jobs M.P. Johnson, DBMS, Stern/NYU, Sp2004
Operators • Natural join: our join up to now • But always merging shared attributes • Notation: R1 ⋈ R2 • Meaning: R1⋈ R2 = Pevery att once(sshared atts =(R1 R2)) • I.e., first compute the cross product R1 x R2 • Next, select the rows in which shared fields agree • Finally, project onto the union of R1 and R2’s fields (remove duplicates) M.P. Johnson, DBMS, Stern/NYU, Sp2004
Natural join example Addresses Jobs Addresses ⋈ Jobs M.P. Johnson, DBMS, Stern/NYU, Sp2004
Natural Join • R S • R ⋈ S= ? • Unpaired tuples called dangling M.P. Johnson, DBMS, Stern/NYU, Sp2004
Natural Join • Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S ? • Given R(A, B, C), S(D, E), what is R ⋈ S? • Given R(A, B), S(A, B), what is R ⋈ S? M.P. Johnson, DBMS, Stern/NYU, Sp2004
Theta Join • Like natural join, but • includes only rows that satisfy arbitrary condition • Does not project away shared attributes • R1⋈q R2 = sq(R1 R2) • Here q can be any condition • If condition is always satisfies, then theta join becomes natural join M.P. Johnson, DBMS, Stern/NYU, Sp2004
Theta-join example U V U V A<D M.P. Johnson, DBMS, Stern/NYU, Sp2004
Equijoin • A theta join where q is an equality • R1 ⋈A=B R2 = sA=B(R1 R2) • s = lower-case sigma • Example: • Employee ⋈SSN=SSN Dependents • Most useful join in practice M.P. Johnson, DBMS, Stern/NYU, Sp2004
Semijoin • R ⋉ S = P{atts of R}(R ⋈ S) • Q: What does this mean? • Natural join of R and S; • Then project onto R’s atts • A: The rows of R for which >1 row in S agree on shared atts M.P. Johnson, DBMS, Stern/NYU, Sp2004
Semijoin example Dependents Employee network Employee ⋉ Dependents = {employees who have dependents} M.P. Johnson, DBMS, Stern/NYU, Sp2004
Renaming • Changes the schema, not the instance • Notation: rB1,…,Bn(R) • r is spelled “rho”, pronounced “row” • Example: • Employee(ssn,name) • rE2(social, name)(Employee) • Or just: rE(Employee) M.P. Johnson, DBMS, Stern/NYU, Sp2004
Complex RA Expressions • Q: How long was Star Wars (1977)? • Strategy: find the row with Star Wars; then project the length field M.P. Johnson, DBMS, Stern/NYU, Sp2004
Combining operations • Schema: Movies (Title, year, length, filmType, studioName) • Query: select titles and years of movies by Fox that are at least 100 minutes long. M.P. Johnson, DBMS, Stern/NYU, Sp2004
Complex RA Expressions • Reps(ssn, name, etc.) • Clients(ssn, name, rssn) • Q: Find George’s client names • PClients.name(sReps.name=George(sReps.ssn=rssn( Reps x Clients))) • Or: PClients.name(sReps.name=George and Reps.ssn=rssn(Reps x Clients)) • Or: PClients.name(sReps.name=George(Reps x Clients) sReps.ssn=rssn(Reps x Clients)) M.P. Johnson, DBMS, Stern/NYU, Sp2004
For next time • Finish chapter 5 • Come to office hours! M.P. Johnson, DBMS, Stern/NYU, Sp2004