550 likes | 629 Views
Lecture #3 Functional Dependencies Normalization Relational Algebra. Thursday, October 12, 2000. Administration. Homework #1 due today. Project descriptions & groups due today. Homework #2 available today. Exam date is looking like December 7 th Complaints?
E N D
Lecture #3Functional DependenciesNormalizationRelational Algebra Thursday, October 12, 2000
Administration • Homework #1 due today. • Project descriptions & groups due today. • Homework #2 available today. • Exam date is looking like December 7th • Complaints? • Projects: tell us if you need to use the lab.
Functional Dependencies Definition: If two tuples agree on the attributes A , A , … A 1 2 n then they must also agree on the attributes B , B , … B 1 2 m Formally: A , A , … A B , B , … B 1 2 m 1 2 n Motivating example for the study of functional dependencies: Name Social Security Number Phone Number
Examples • EmpID Name, Phone, Position • Position Phone • but Phone Position
In General • To check A B, erase all other columns • check if the remaining relation is many-one (called functional in mathematics)
More Examples Product: name price, manufacturer Person: ssn name, age Company: name stock price, president Key of a relation is a set of attributes that: - functionally determines all the attributes of the relation - none of its subsets determines all the attributes. Superkey: a set of attributes that contains a key.
Finding the Keys of a Relation Given a relation constructed from an E/R diagram, what is its key? Rules: 1. If the relation comes from an entity set, the key of the relation is the set of attributes which is the key of the entity set. Person name ssn address
Rules for Binary Relationships name buys Person Product price name ssn Several cases are possible for a binary relationship E1 - E2: 1. Many-many: the key includes the key of E1 together with the key of E2. What happens for: 2. Many-one: 3. One-one:
Keys in Multiway Relationships If there is an arrow from the relationship to E, then we don’t need the key of E as part of the relation key. Product Purchase Store Payment Method Person
Rules in FD’s Splitting/Combining Rule: A , A , … A B , B , … B Is equivalent to 1 2 m 1 2 n B A , A , … A 1 1 2 n Splitting rule and Combing rule B A , A , … A 2 1 2 n … B A , A , … A m 1 2 n
Rules in FD’s (continued) Trivial Dependency A , A , … A A Always holds. 1 2 n i Why ?
C , C …, C C , C …, C B , B …, B 1 1 1 2 2 2 p p m Rules in FD’s (continued) Transitive Closure Rule: If A , A , … A 1 2 n and B , B , … B 1 2 m A , A , … A then 1 2 n Why ?
Closure of a set of Attributes Given a set of attributes {A1, …, An} and a set of dependencies S. Problem: find all attributes B such that: any relation which satisfies S also satisfies: A1, …, An B + The closure of {A1, …, An}, denoted {A1, …, An} , is the set of all such attributes B
Closure Algorithm Start with X={A1, …, An}. Repeat until X doesn’t change do: if is in S, and C is not in X then add C to X. C B , B , … B 1 2 n B , B , … B are all in X, and n 1 2
Example A B C A D E B D A F B Closure of {A,B}: X = {A, B, } Closure of {A, F}: X = {A, F, }
Why Is the Algorithm Correct ? • Show the following by induction: • For every B in X: • A1, …, An B • Initially X = {A1, …, An} -- holds • Induction step: B1, …, Bm in X • Implies A1, …, An B1, …, Bm • We also have B1, …, Bm C • By transitivity we have A1, …, An C • This shows that the algorithm is sound; need to show it is complete
Relational Schema Design Main idea: • Start with some relational schema • Find out its FD’s • Use them to design a better relational schema
Relational Schema Design Recall set attributes (persons with several phones): Name SSN Phone Number Fred 123-321-99 (201) 555-1234 Fred 123-321-99 (206) 572-4312 Joe 909-438-44 (908) 464-0028 Joe 909-438-44 (212) 555-4000 Problems: - redundancy - update anomalies - deletion anomalies Note: SSN is NOT a key here
Relation Decomposition Break the relation into two: SSN Name 123-321-99 Fred 909-438-44 Joe SSN Phone Number 123-321-99 (201) 555-1234 123-321-99 (206) 572-4312 909-438-44 (908) 464-0028 909-438-44 (212) 555-4000
Decompositions in General Let R be a relation with attributes A , A , … A 1 2 n Create two relations R1 and R2 with attributes B , B , … B C , C , … C 1 2 m 1 2 l Such that: = B , B , … B C , C , … C A , A , … A 1 2 m 1 2 l 1 2 n And -- R1 is the projection of R on -- R2 is the projection of R on B , B , … B 1 2 m C , C , … C 1 2 l
Incorrect Decomposition • Sometimes it is incorrect: Decompose on : Name, Category and Price, Category
Incorrect Decomposition When we put it back: Cannot recover information
Boyce-Codd Normal Form A simple condition for removing anomalies from relations: A relation R is in BCNF if and only if: Whenever there is a nontrivial dependency for R , it is the case that { } a super-key for R. A , A , … A B 1 2 n A , A , … A 1 2 n In English (though a bit vague): Whenever a set of attributes of R is determining another attribute, should determine all the attributes of R.
BCNF Decomposition Find a dependency that violates the BCNF condition: A , A , … A B , B , … B 1 2 m 1 2 n Heuristic: choose B , B , … B “as large as possible” 1 2 m Decompose: Continue until there are no BCNF violations left. Others A’s B’s Find a 2-attribute relation that is not in BCNF. R1 R2
Example Decomposition Person: Name SSN Age EyeColor PhoneNumber Functional dependencies: SSN Name, Age, Eye Color BNCF: Person1(SSN, Name, Age, EyeColor), Person2(SSN, PhoneNumber) What if we also had an attribute Draft-worthy, and the FD: Age Draft-worthy
Other Example • R(A,B,C,D) A B, B C • Key: A, D • Violations of BCNF: A B, A C, A BC • Pick A BC: split into R1(A,BC) R2(A,D) • What happens if we pick A B first ?
Correct Decompositions A decomposition is lossless if we can recover: R(A,B,C) R1(A,B) R2(A,C) R’(A,B,C) = R(A,B,C) R’ is in general larger than R. Must ensure R’ = R
Decomposition Based on BCNF is Necessarily Lossless Attributes A, B, C. FD: A C Relations R1(A,B) R2(A,C) Tuple in R: (a,b,c) Tuples in R1: (a,b), (a,b’) Tuples in R2: (a,c), (a,c’) Tuples in the join of R1 and R2: (a,b,c), (a,b,c’), (a,b’,c), (a,b’,c’) Can (a,b,c’) be a bogus tuple? What about (a,b’,c’) ?
Example Name SSN Phone Number Fred 123-321-99 (201) 555-1234 Fred 123-321-99 (206) 572-4312 Joe 909-438-44 (908) 464-0028 Joe 909-438-44 (212) 555-4000 What are the dependencies? What are the keys? Is it in BCNF?
And Now? SSN Name 123-321-99 Fred 909-438-44 Joe SSN Phone Number 123-321-99 (201) 555-1234 123-321-99 (206) 572-4312 909-438-44 (908) 464-0028 909-438-44 (212) 555-4000
3NF: A Problem with BCNF Unit Company Product FD’s: Unit -> Company; Company, Product -> Unit So, there is a BCNF violation, and we decompose. Unit Company Unit Company Unit Product No FDs
So What’s the Problem? Unit Company Unit Product Galaga99 UW Galaga99 databases Bingo UW Bingo databases No problem so far. All local FD’s are satisfied. Let’s put all the data back into a single table again: Unit Company Product Galaga99 UW databases Bingo UW databases Violates the dependency: company, product -> unit!
Solution: 3rd Normal Form (3NF) A simple condition for removing anomalies from relations: A relation R is in 3rd normal form if and only if: Whenever there is a nontrivial dependency for R , it is the case that { } a super-key for R, or B is part of a key. A , A , … A B 1 2 n A , A , … A 1 2 n What happened to first and second normal forms? Will we have more normal forms?
Multi-valued Dependencies SSN Phone Number Course 123-321-99 (206) 572-4312 CSE-444 123-321-99 (206) 572-4312 CSE-341 123-321-99 (206) 432-8954 CSE-444 123-321-99 (206) 432-8954 CSE-341 The multi-valued dependencies are: SSN Phone Number SSN Course
Definition of Multi-valued Dependecy Given R(A1,…,An,B1,…,Bm,C1,…,Cp) the MVD A1,…,An B1,…,Bm holds if: for any values of A1,…,An the “set of values” of B1,…,Bm is “independent” of those of C1,…Cp
Definition of MVDs Continued Equivalently: the decomposition into R1(A1,…,An,B1,…,Bm), R2(A1,…,An,C1,…,Cp) is lossless Note: an MVD A1,…,An B1,…,Bm Implicitly talks about “the other” attributes C1,…Cp
Rules for MVDs If A1,…An B1,…,Bm then A1,…,An B1,…,Bm Other rules in the book
4th Normal Form (4NF) R is in 4NF if whenever: A1,…,An B1,…,Bm is a nontrivial MVD, then A1,…,An is a superkey Same as BCNF with FDs replaced by MVDs
Confused by Normal Forms ? 3NF BCNF 4NF In practice: (1) 3NF is enough, (2) don’t overdo it !
Querying the Database • How do we specify what we want from our database? Find all the employees who earn more than $50,000 and pay taxes in New Jersey. • We design high-level query languages: • SQL (used everywhere) • Datalog (used by database theoreticians, their students, friends and family) • Relational algebra: a basic set of operations on relations that provide the basic principles.
Relational Algebra at a Glance • Operators: relations as input, new relation as output • Five basic RA operators: • Basic Set Operators • union, difference (no intersection, no complement) • Selection: s • Projection: p • Cartesian Product: X • Derived operators: • Intersection, complement • Joins (natural,equi-join, theta join, semi-join) • When our relations have attribute names: • Renaming: r
Set Operations • Binary operations • Union: all tuples in R1 or R2 • R1 U R2 • Example: • ActiveEmployees U RetiredEmployees • Difference: all tuples in R1 and not in R2 • R1 – R2 • Example • AllEmployees - RetiredEmployees
Selection • Unary operation: returns a subset of the tuples which satisfy some condition • Notation: s (R) • c is a condition: • =, <, >, and, or, not • Find all employees with salary more than $40,000: • s (Employee) c Salary > 40000
Projection • Unary operation: returns certain columns • Eliminates duplicate tuples ! • Notation: P (R) • Example: project social-security number and names: • P (Employee) A1,…,An SSN, Name
Cartesian Product • Binary Operation • Result is tuples combining any element of R1 with any element of R2, for R1 X R2 • Schema is union of Schema(R1) & Schema(R2) • Notation: R1 x R2 • Example: Employee x Dependents • Very rare in practice; but joins are very common.
Join (Natural) • Most important, expensive and exciting. • Combines two relations, selecting only related tuples • Equivalent to a cross product followed by selection • Resulting schema has all attributes of the two relations, but one copy of join condition attributes