1 / 75

Schema Refinement (end) Relational Algebra

Schema Refinement (end) Relational Algebra. Lecture #8. Normal Forms. First Normal Form = all attributes are atomic Second Normal Form (2NF) = old and obsolete Boyce Codd Normal Form (BCNF) Third Normal Form (3NF) Fourth Normal Form (4NF) Others. Multi-valued Dependencies.

lucky
Download Presentation

Schema Refinement (end) Relational Algebra

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Schema Refinement (end)Relational Algebra Lecture #8

  2. Normal Forms First Normal Form = all attributes are atomic Second Normal Form (2NF) = old and obsolete Boyce Codd Normal Form (BCNF) Third Normal Form (3NF) Fourth Normal Form (4NF) Others...

  3. Multi-valued Dependencies SSN Phone Number Course 123-321-99 (206) 572-4312 CSE-444 123-321-99 (206) 572-4312 CSE-341 123-321-99 (206) 432-8954 CSE-444 123-321-99 (206) 432-8954 CSE-341 The multi-valued dependencies are: SSN Phone Number SSN Course

  4. Definition of Multi-valued Dependency Given R(A1,…,An,B1,…,Bm,C1,…,Cp) the MVD A1,…,An B1,…,Bm holds if: for any values of A1,…,Anthe “set of values” of B1,…,Bmis “independent” of those of C1,…Cp

  5. Definition of MVDs Continued Equivalently: the decomposition into R1(A1,…,An,B1,…,Bm), R2(A1,…,An,C1,…,Cp) is lossless Note: an MVD A1,…,An B1,…,Bm Implicitly talks about “the other” attributes C1,…Cp

  6. Rules for MVDs If A1,…An B1,…,Bm then A1,…,An B1,…,Bm Other rules in the book

  7. 4th Normal Form (4NF) R is in 4NF if whenever: A1,…,An B1,…,Bm is a nontrivial MVD, then A1,…,An is a superkey Same as BCNF with FDs replaced by MVDs

  8. Multivalued Dependencies (MVDs) • XY means that given X, there is a unique set of possible Y values (which do not depend on other attributes of the relation) • PARENTNAMECHILDNAME • An FD is also a MVD • MVD problems arise if there are two independent 1:N relationships in a relation.

  9. Confused by Normal Forms ? 3NF BCNF 4NF In practice: (1) 3NF is enough, (2) don’t overdo it !

  10. Normal Forms First Normal Form = all attributes are atomic Second Normal Form (2NF) = old and obsolete Boyce Codd Normal Form (BCNF) Third Normal Form (3NF) Fourth Normal Form (4NF) Others...

  11. Fifth Normal Form • Sometimes a relation cannot be losslessly decomposed into two relations, but can be into three or more. • 5NF captures the idea that a relation scheme must have some particular lossless decomposition ("join dependency"). • Finding actual 5NF cases is difficult.

  12. Normalization Summary • 1NF: usually part of the woodwork • 2NF: usually skipped • 3NF: a biggie • always aim for this • BCNF and 4NF: tradeoffs start here • in re: d-preserving and losslessness • 5NF: You can say you've heard of it...

  13. Caveat • Normalization is not the be-all and end-all of DB design • Example: suppose attributes A and B are always used together, but normalization theory says they should be in different tables. • decomposition might produce unacceptable performance loss (extra disk reads) • Plus -- there are constraints other than FDs and MVDs

  14. Current Trends • Object DBs and Object-Relational DB’s • may permit complex attributes • 1st normal form unnecessary • Data Warehouses • huge historical databases, seldom or never updated after creation • joins expensive or impractical • argues against normalization • Everyday relational DBs • aim for BCNF, settle for 3NF

  15. Relational Algebra

  16. Querying the Database • Goal: specify what we want from our database Find all the employees who earn more than $50,000 and pay taxes in Champaign-Urbana. • Could write in C++/Java, but bad idea • Instead use high-level query languages: • Theoretical: Relational Algebra, Datalog • Practical: SQL • Relational algebra: a basic set of operations on relations that provide the basic principles.

  17. Motivation: The Stack • To use the "stack" data structure in my program, I need to know • what a stack looks like • what (useful) operations I can perform on a stack • PUSH and POP • Next, I look for an implementation of stack • browse the Web • find many of them • choose one, say LEDA

  18. Motivation: The Stack (cont.) • LEDA already implement PUSH and POP • It also gives me a simple language L, in which to define a stack and call PUSH and POP • S = init_stack(int); • S.push(3); S.push(5); • int x = S.pop(); • Can also define an expression of operations on stacks • T = init_stack(int); • T.push(S.pop());

  19. Motivation: The Stack (cont.) • To summarize, I know • definition of stack • its operations (PUSH, POP): that is, a stack algebra • an implementation called LEDA, which tells me how to call PUSH and POP in a language L • I can use these implementations to manipulate stacks • LEDA hides the implementation details • LEDA optimizes implementation of PUSH and POP

  20. Now Contrast It with Rel. Databases def of relations relational algebra • To summarize, I know • definition of stack • its operations (PUSH, POP): that is, a stack algebra • an implementation called LEDA, which tells me how to call PUSH and POP in a language L • I can use these implementations to manipulate stacks • LEDA hides the implementation details • LEDA optimizes implementation of PUSH and POP SQL language operation and query optimization

  21. What is an “Algebra” • Mathematical system consisting of: • Operands --- variables or values from which new values can be constructed. • Operators --- symbols denoting procedures that construct new values from given values.

  22. What is Relational Algebra? • An algebra whose operands are relations or variables that represent relations. • Operators are designed to do the most common things that we need to do with relations in a database. • The result is an algebra that can be used as a query language for relations.

  23. Relational Algebra at a Glance • Operators: relations as input, new relation as output • Five basic RA operations: • Basic Set Operations • union, difference (no intersection, no complement) • Selection: s • Projection: p • Cartesian Product: X • When our relations have attribute names: • Renaming: r • Derived operations: • Intersection, complement • Joins (natural,equi-join, theta join, semi-join)

  24. Five Basic RA Operations

  25. Set Operations • Union, difference • Binary operations

  26. Set Operations: Union • Union: all tuples in R1 or R2 • Notation: R1 U R2 • R1, R2 must have the same schema • R1 U R2 has the same schema as R1, R2 • Example: • ActiveEmployees U RetiredEmployees

  27. Set Operations: Difference • Difference: all tuples in R1 and not in R2 • Notation: R1 – R2 • R1, R2 must have the same schema • R1 - R2 has the same schema as R1, R2 • Example • AllEmployees - RetiredEmployees

  28. Selection • Returns all tuples which satisfy a condition • Notation: sc(R) • c is a condition: =, <, >, and, or, not • Output schema: same as input schema • Find all employees with salary more than $40,000: • sSalary > 40000(Employee)

  29. Find all employees with salary more than $40,000. s Salary > 40000(Employee)

  30. Ullman: Selection • R1 := SELECTC (R2) • C is a condition (as in “if” statements) that refers to attributes of R2. • R1 is all those tuples of R2 that satisfy C.

  31. JoeMenu := SELECTbar=“Joe’s”(Sells): bar beer price Joe’s Bud 2.50 Joe’s Miller 2.75 Example Relation Sells: bar beer price Joe’s Bud 2.50 Joe’s Miller 2.75 Sue’s Bud 2.50 Sue’s Miller 3.00

  32. Projection • Unary operation: returns certain columns • Eliminates duplicate tuples ! • Notation: P A1,…,An(R) • Input schema R(B1,…,Bm) • Condition: {A1, …, An} {B1, …, Bm} • Output schema S(A1,…,An) • Example: project social-security number and names: • PSSN, Name (Employee)

  33. PSSN, Name (Employee)

  34. Projection • R1 := PROJL (R2) • L is a list of attributes from the schema of R2. • R1 is constructed by looking at each tuple of R2, extracting the attributes on list L, in the order specified, and creating from those components a tuple for R1. • Eliminate duplicate tuples, if any.

  35. Prices := PROJbeer,price(Sells): beer price Bud 2.50 Miller 2.75 Miller 3.00 Example Relation Sells: bar beer price Joe’s Bud 2.50 Joe’s Miller 2.75 Sue’s Bud 2.50 Sue’s Miller 3.00

  36. Cartesian Product • Each tuple in R1 with each tuple in R2 • Notation: R1 x R2 • Input schemas R1(A1,…,An), R2(B1,…,Bm) • Condition: {A1,…,An} {B1,…Bm} = F • Output schema is S(A1, …, An, B1, …, Bm) • Notation: R1 x R2 • Example: Employee x Dependents • Very rare in practice; but joins are very common

  37. Product • R3 := R1 * R2 • Pair each tuple t1 of R1 with each tuple t2 of R2. • Concatenation t1t2 is a tuple of R3. • Schema of R3 is the attributes of R1 and R2, in order. • But beware attribute A of the same name in R1 and R2: use R1.A and R2.A.

  38. R3( A, R1.B, R2.B, C ) 1 2 5 6 1 2 7 8 1 2 9 10 3 4 5 6 3 4 7 8 3 4 9 10 Example: R3 := R1 * R2 R1( A, B ) 1 2 3 4 R2( B, C ) 5 6 7 8 9 10

  39. Renaming • Does not change the relational instance • Changes the relational schema only • Notation: rB1,…,Bn (R) • Input schema: R(A1, …, An) • Output schema: S(B1, …, Bn) • Example: • LastName, SocSocNo (Employee)

  40. Renaming Example Employee Name SSN John 999999999 Tony 777777777 • LastName, SocSocNo (Employee) LastName SocSocNo John 999999999 Tony 777777777

  41. Renaming • The RENAME operator gives a new schema to a relation. • R1 := RENAMER1(A1,…,An)(R2) makes R1 be a relation with attributes A1,…,An and the same tuples as R2. • Simplified notation: R1(A1,…,An) := R2.

  42. R( bar, addr ) Joe’s Maple St. Sue’s River Rd. Example Bars( name, addr ) Joe’s Maple St. Sue’s River Rd. R(bar, addr) := Bars

  43. Derived RA Operations1) Intersection2) Most importantly: Join

  44. Set Operations: Intersection • Difference: all tuples both in R1 and in R2 • Notation: R1 R2 • R1, R2 must have the same schema • R1 R2 has the same schema as R1, R2 • Example • UnionizedEmployees RetiredEmployees • Intersection is derived: • R1 R2 = R1 – (R1 – R2) why ?

  45. Joins • Theta join • Natural join • Equi-join • Semi-join • Inner join • Outer join • etc.

  46. Theta Join • A join that involves a predicate • Notation: R1 q R2 where q is a condition • Input schemas: R1(A1,…,An), R2(B1,…,Bm) • {A1,…An} {B1,…,Bm} = f • Output schema: S(A1,…,An,B1,…,Bm) • Derived operator: R1 q R2 = sq (R1 x R2)

  47. Theta-Join • R3 := R1 JOINC R2 • Take the product R1 * R2. • Then apply SELECTC to the result. • As for SELECT, C can be any boolean-valued condition. • Historic versions of this operator allowed only A theta B, where theta was =, <, etc.; hence the name “theta-join.”

  48. BarInfo( bar, beer, price, name, addr ) Joe’s Bud 2.50 Joe’s Maple St. Joe’s Miller 2.75 Joe’s Maple St. Sue’s Bud 2.50 Sue’s River Rd. Sue’s Coors 3.00 Sue’s River Rd. Example Sells( bar, beer, price ) Bars( name, addr ) Joe’s Bud 2.50 Joe’s Maple St. Joe’s Miller 2.75 Sue’s River Rd. Sue’s Bud 2.50 Sue’s Coors 3.00 BarInfo := Sells JOIN Sells.bar = Bars.name Bars

  49. Natural Join • Notation: R1 R2 • Input Schema: R1(A1, …, An), R2(B1, …, Bm) • Output Schema: S(C1,…,Cp) • Where {C1, …, Cp} = {A1, …, An} U {B1, …, Bm} • Meaning: combine all pairs of tuples in R1 and R2 that agree on the attributes: • {A1,…,An} {B1,…, Bm} (called the join attributes) • Equivalent to a cross product followed by selection • Example Employee Dependents

More Related