320 likes | 560 Views
Discussion #23 Relational Algebra. Topics. Algebras Relational Algebra use of standard notation set operators , , renaming selection projection cross product join | | Queries (from English) Query optimization SQL. Relational Algebra. What is an algebra?
E N D
Topics • Algebras • Relational Algebra • use of standard notation • set operators , , • renaming • selection • projection • cross product • join || • Queries (from English) • Query optimization • SQL
Relational Algebra • What is an algebra? • a pair: (set of values, set of operations) • ADT type Class Object e.g. stack: (set of all stacks, {pop, push, top, …}) integer: (set of all integers, {+, -, *, }) • What is relational algebra? • (set of relations, set of relational operators) • {, , , , , , , ||}
Relational Algebra is Closed • Closed: all operations produce values in the value set • (reals, {+, *, }) closed • (reals, {+, *, , }) not closed (divide by 0) • (reals, {+, *, >}) not closed (T/F not in value set) • (computer reals, {+, *, }) not closed (overflow, roundoff) • (relations, relational operators) closed • Implication: we can always nest relational operators; can’t for algebras that are not closed. • e.g. after overflow, can do nothing • e.g. can’t always nest: (2 < 3) + 5 = ?
Set Operations: , , and • Relations are sets; thus set operations should work. • Examples: S = A B 2 2 2 3 4 2 5 5 R = A B 1 2 2 2 2 3 RS = A B 2 2 2 3 RS = A B 1 2 2 2 2 3 4 2 5 5 SR = A B 4 2 5 5 RS = A B 1 2
Set Operations (continued …) • Definition: schema(R) = {A, B} = AB, i.e. the set of attributes • We sometimes write R(AB) to mean the relation R with schema AB. • Definition: union compatible • schema(R) = schema(S) • required precondition for , , • Definitions: • R S = { t | t R t S} • R S = { t | t R t S} • R S = { t | t R t S}
Tuple Restriction: [X] • Restriction is a tuple operator (not a relational operator). • t[X] restricts tuple t to the attributes in X. A B C t = 1 2 3 t[A] = (1) t[AC] = (1,3) t = (1,2,3) t[A] = (1,2,3)[A] = {(A,1), (B, 2), (C,3)}[A] = {(A,1)} = (1)
Renaming: • ABR renames attribute A to be B. • A must be in schema(R) • B must not be in schema(R) • Example: let R =A B 1 2 2 2 2 3 Q =A C 2 2 3 2 RQ = ? Not union compatible • But with : RCBQ= A B 1 2 2 2 2 3 3 2 CBQ =A B 2 2 3 2
Renaming (continued…) • Q = ABR renames attribute A to B; the result is Q. • Precondition: • A schema(R) • B schema(R) • Postcondition: • schema(Q) = (schema(R) {A}) {B} • Q = {t' | t (tR t' = (t – {(A, t[A])}) {(B, t[A])})} Q = ABR = {{(B,1), (C,2)} {(B,2), (C,2)}} R = {{(A,1), (C,2)} {(A,2), (C,2)}}
Selection: • The selection operation selects the tuples that satisfy a condition. R =A B 1 2 2 2 2 3 B=2R =A B 1 2 2 2 A=1R =A B 1 2 A=2B2R =A B 2 2 2 3 PR = { t | t R P(t) } A=3R =A B Note: empty, but still retain the schema Meaning: apply predicate P to tuple t by substituting into P appropriate t values. • Precondition: each attribute mentioned in P must be in schema(R). • Postcondition: PR = { t | t R P(t) } • schema(PR) = schema(R)
Projection: The projection operation restricts tuples in a relation to those designated in the operation. R =A B 1 2 2 2 2 3 AR =A 1 2 BR =B 2 3 ABR =R=A,BR ={A,B}R Q =A B C 1 1 1 2 1 1 3 4 5 BCQ =B C 1 1 4 5 Precondition: X schema(R) Postcondition: XR = { t' | t (t R t' = t[X]) } schema(XR) = X
Cross Product: Standard cartesian product adapted for relational algebra R S =A B C D 1 2 1 1 1 2 2 2 1 2 3 3 2 2 1 1 2 2 2 2 2 2 3 3 S =C D 1 1 2 2 3 3 R =A B 1 2 2 2
Cross Product (continued…) Precondition: schema(R) schema(S) = Postcondition: R S = { t | t' t''(t' R t'' S t = t' t'')} schema(R S) = schema(R) schema(S) R =A B 1 2 = t' 2 2 t' = { (A,1), (B,2) } S =C D 1 1 2 2 3 3 = t'' t'' = { (C,3), (D,3) } t' t'' = { (A,1), (B,2), (C,3), (D,3) }
Cross Product (continued…) What if R and S have the same attribute, e.g. A? S =C A 1 1 = t'' = { (C,1), (A,1) } 2 2 3 3 = t''' = { (C,3), (A,3) } R =A B 1 2 = t' = { (A,1), (B,2) } 2 2 Can’t do cross product Solution: Rename AAS t' t'' = { (A,1), (B,2), (C,1), (A,1) } R AAS =A B C A 1 2 1 1 1 2 2 2 1 2 3 3 2 2 1 1 2 2 2 2 2 2 3 3
1 1 2 2 1 2 A B 1 2 1 2 1 2 2 2 1 2 2 2 2 1 2 2 3 2 Cross Product B' C 1 2 2 1 3 2 (R ) B=B' BB'S R || S = ABC Selection Projection Renaming Natural Join: || R || S =A B C 1 2 1 2 2 1 R =A B 1 2 2 2 S =B C 1 2 2 1 3 2
Join (continued …) • In general, we can equate 0, 1, 2, or more attributes using || . • A join is defined as: schema (R || S) = schema(R) schema(S) R || S = {t | t[schema(R)] R t[schema(S)] S} • There are no preconditions join always works.
Join (continued…) 0 attributes in common (full cross product) R || S =A B C D 1 1 1 1 1 1 1 5 2 3 1 1 2 3 1 5 4 1 1 1 4 1 1 5 R =A B 1 1 2 3 4 1 S =C D 1 1 1 5 1 attribute in common R || S =A B C 1 2 2 2 2 2 2 3 3 R =A B 1 2 2 2 2 3 S =B C 1 1 2 2 3 3 2 attributes in common R || S =A B C D 2 2 4 2 2 2 4 1 R =A B C 1 2 3 2 2 4 2 3 5 S =A B D 1 1 1 2 2 2 2 2 1
Join (continued…) • We can use renaming to control the || R || CAS =A B 1 2 R =A B 1 2 2 2 S =B C 1 2 2 1 3 2 S' = B A 1 2 2 1 3 2 = A B 2 1 1 2 2 3 R || S' =A B 1 2 • BTW, observe equivalence with intersection
Relational Algebra Expressions • Relational operators are closed. Thus we can nest expressions: R =A B 1 2 3 4 S =B C D 2 5 1 2 7 2 3 2 3 4 5 4 DC=5(R || S) =A B C D 1 2 5 1 1 2 7 2 3 4 5 4 = D 1 4 • Unary operators have precedence over binary operators; binary operators are left associative. • We can now do something very useful: ask and answer with relational algebra (almost) any query we can dream up.
Relational Algebra Queries PrerequisiteCourse='EE200'cp =Prerequisite EE005 CS100 • List the prerequisites for EE200. • When does CS101 meet? Day,HourCourse='CS101'cdh =Day Hour M 9AM W 9AM F 9AM • When and where does EE200 meet? Day,Hour,RoomCourse='EE200'(cdh || cr) =Day Hour Room Tu 10AM 25 Ohm Hall W 1PM 25 Ohm Hall Th 10AM 25 Ohm Hall • Our answers are in (cdh || cr). • We select Course to be EE200. • Then, project on Day, Hour, Room.
Queries (continued…) • Where can I find Snoopy at 9 am on Monday? • Can we rewrite the query more optimally? • What rules should we use? • Associativity and commutivity of join • Distributive laws for select and project • What strategy should we use? • Eliminate unnecessary operations • Make joins as small as possible before execution StudentID Name'Snoopy' Address Phone CourseStudentID Grade Course Room* Course Day'M' Hour'9AM' RoomName='Snoopy' Day='M' Hour='9AM' (snap || csg || cr || cdh) =Room Turing Aud.
Query Optimization RoomName='Snoopy' Day='M' Hour='9AM' (snap || csg || cr || cdh) • “Intuitively” we can write as Room(Name='Snoopy'snap || csg || cr || Day='M' Hour='9AM'cdh) • Why does this execute faster? • What laws hold that will let us do this? • R || S = S || R • P1P2E = P1P2E • P(R |×| S) = R || PS (if all the attributes of P are in S) • How do we know they hold?
Proofs for Laws • To prove P1P2E = P1P2E,we need to prove that two sets are equal. We prove A = B by showing AB BA. We show that AB by showing that xA xB. • Thus, we can do two proofs to prove P1P2E = P1P2Eas follows: • t P1P2E premise • t E (P1P2)(t) def.: PR = {t | tR P(t)} • t E P1(t) P2(t) identical substitutions & operations • t E P2(t) P1(t) commutative • t P2E P1(t) def. of • t P1P2E def. of • t P1P2E premise • … just go backwards from 6 to 1 in the proof above
Alternate Proof (Derive the right-hand side from the left-hand side.) Thus, we can prove P1P2E = P1P2Eas follows: P1P2E = {t | t E (P1P2)(t)} def.: PR = {t | tR P(t)} = {t | t E P1(t) P2(t)} identical substitutions & operations = {t | t E P2(t) P1(t)} commutative = {t | t P2E P1(t)} def. of = {t | t P1P2E} def. of = P1P2E def. of a relation
Proofs for Laws (continued …) • To prove P(R || S) = R || PS, where all attributes of P are in S, we again need to prove that two sets are equal. • As before, we can convert the lhs to the rhs. P(R || S) = {t | t P(R || S)} def. of a relation = {t | t R || S P(t)} def.: PR={t | tRP(t)} = {t | t[schema(R)] R t[schema(S)] S P(t)} def.: R||S={t | t[schema(R)]Rt[schema(S)]S} = {t | t[schema(R)] R t[schema(S)] S P(t[schema(S)])} all attributes of P are in S = {t | t[schema(R)] R t[schema(S)] PS} def. of = {t | t R || PS} def. of || = R || PS def. of a relation
SQL Correspondence with Relational Algebra Assume we have relations R(AB) and S(BC). AB = 1 R select A from R where B = 1 B R B S select B from R except select B from S A, R.B, CR.B = S.B (R S) = R || S select A, R.B, C from R, S where R.B = S.B
SQL Correspondence with Relational Algebra Assume we have relations R(AB) and S(BC). AB = 1 R select A from R where B = 1 B R B S select R.B from R where R.B not in (select S.B from S) R || S select * from R natural join S
SQL Queries select PrerequisitePrerequisite from cp EE005 where Course='EE200' CS100 • List the prerequisites for EE200. • When does CS101 meet? select Day, HourDay Hour from cdh M 9AM where Course= 'CS101' W 9AM F 9AM • When and where does EE200 meet? select cdh.Course, Day, Hour, Room Course Day Hour Room from cdh, cr EE200 Tu 10AM 25 Ohm Hall where cdh.Course= 'EE200' EE200 W 1PM 25 Ohm Hall and cdh.Course=cr.Course EE200 Th 10AM 25 Ohm Hall
SQL Queries select PrerequisitePrerequisite from cp EE005 where Course='EE200' CS100 • List the prerequisites for EE200. • When does CS101 meet? select Day, HourDay Hour from cdh M 9AM where Course= 'CS101' W 9AM F 9AM • When and where does EE200 meet? select Course, Day, Hour, RoomCourse Day Hour Room from cdh natural join cr EE200 Tu 10AM 25 Ohm Hall where cdh.Course= 'EE200' EE200 W 1PM 25 Ohm Hall EE200 Th 10AM 25 Ohm Hall
SQL Queries select PrerequisitePrerequisite from cp CS100 EE005 CS100 CS101 CS120 CS101 CS121 CS205 select distinct Prerequisite Prerequisite from cp CS100 CS101 CS120 CS121 CS205 EE005 • List all prerequisite courses.
SQL Queries • Where can I find Snoopy at 9 am on Monday? select RoomRoom from snap, csg, cr, cdh Turing Aud. where Name='Snoopy' and Day='M' and Hour='9AM' and snap.StudentID=csg.StudentID and csg.Course=cr.Course and cr.Course=cdh.Course • List all prereqs of CS750 (including prereqs of prereqs.) • Not possible with standard SQL (unless nesting depth is known) • Is possible with Datalog • Rules: prereqOf(x, y) :- cp(y, x). • prereqOf(x, y) :- prereqOf(x, z), cp(y, z). • Query: prereqOf(x, 'CS750')? • To gain more power and flexibility, we typically embed SQL in a high-level language.
SQL Queries • List all prereqs of CS750 (including prereqs of prereqs.) select cp.Prerequisite from cp where cp.Course = 'CS750' union select cp1.Prerequisite from cp cp1, cp cp2 where cp1.Course = cp2.Prerequisite and cp2.Course = 'CS750' union …