310 likes | 677 Views
Lecture 07: Relational Algebra. Outline. Relational Algebra (Section 6.1). Relational Algebra. Formalism for creating new relations from existing ones Its place in the big picture:. Declarative query language. Algebra. Implementation. Relational algebra. SQL, relational calculus.
E N D
Outline • Relational Algebra (Section 6.1)
Relational Algebra • Formalism for creating new relations from existing ones • Its place in the big picture: Declarativequerylanguage Algebra Implementation Relational algebra SQL,relational calculus
Relational Algebra • Five operators: • Union: • Difference: - • Selection: s • Projection: P • Cartesian Product: • Derived or auxiliary operators: • Intersection, complement • Joins (natural,equi-join, theta join, semi-join) • Renaming: r
1. Union and 2. Difference • R1 R2Example: • ActiveEmployees RetiredEmployees • R1 – R2Example: • AllEmployees − RetiredEmployees
What about Intersection ? • It is a derived operatorR1 R2 = R1 – (R1 – R2) • Also expressed as a join (will see later)Example • UnionizedEmployees RetiredEmployees
3. Selection • Returns all tuples which satisfy a condition • Notation: sc(R) • Examples • sSalary > 40000(Employee) • sname = “Smith”(Employee) • The condition c can be =, <, , >,, <> [in SQL: SELECT * FROM Employee WHERE Salary > 40000]
Find all employees with salary more than $40,000. s Salary > 40000(Employee)
4. Projection • Eliminates columns, then removes duplicates • Notation: P A1,…,An(R) • Example: project to social-security number and names: • PSSN, Name (Employee) • Output schema: Answer(SSN, Name) [In SQL: SELECT DISTINCT SSN, Name FROM Employee]
5. Cartesian Product • Combine each tuple in R1 with each tuple in R2 • Notation: R1 R2 • Example: • Employee Dependents • Very rare in practice; mainly used to express joins [In SQL: SELECT * FROM R1, R2]
Relational Algebra • Five operators: • Union: • Difference: - • Selection: s • Projection: P • Cartesian Product: • Derived or auxiliary operators: • Intersection, complement • Joins (natural,equi-join, theta join, semi-join) • Renaming: r
Renaming • Changes the schema, not the instance • Schema: R(A1, …, An ) • Notation: rB1,…,Bn (R) • Example: • rLastName, SocSocNo (Employee) • Output schema: Answer(LastName, SocSocNo) [in SQL: SELECT Name AS LastName, SSN AS SocSocNo FROM Employee]
Renaming Example Employee Name SSN John 999999999 Tony 777777777 • LastName, SocSocNo (Employee) LastName SocSocNo John 999999999 Tony 777777777
Natural Join • Notation: R1 ⋈ R2 • Meaning: R1 ⋈ R2 = PA(sC(R1 R2)) • Where: • The selection sC checks equality of all common attributes • The projection eliminates the duplicate common attributes [in SQL: SELECT DISTINCT R1.A, R1. B, R2.C FROM R1, R2 WHERE R1.B = R2.B Schema: R1(A,B), R2(B,C)]
Employee Dependents = PName, SSN, Dname(sSSN=SSN2(Employee x rSSN2, Dname(Dependents)) Natural Join Example Employee Name SSN John 999999999 Tony 777777777 Dependents SSN Dname 999999999 Emily 777777777 Joe Name SSN Dname John 999999999 Emily Tony 777777777 Joe
Natural Join • R= S= • R ⋈ S=
Natural Join • Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S ? • Given R(A, B, C), S(D, E), what is R ⋈ S ? • Given R(A, B), S(A, B), what is R ⋈ S ?
Theta Join • A join that involves a predicate • R1 ⋈q R2 = sq (R1 R2) • Here q can be any condition
Eq-join • A theta join where q is an equality R1 ⋈A=B R2 = s A=B (R1 R2) • Example: • Employee ⋈SSN=SSN Dependents • Most useful join in practice(difference to natural join?)
Semijoin • R ⋉ S = PA1,…,An (R ⋈ S) • Where A1, …, An are the attributes in R • Example: • Employee ⋉ Dependents
Semijoins in Distributed Databases • Semijoins are used in distributed databases Dependents Employee network Employee ⋈ssn=ssn (s age>71 (Dependents)) T = PSSNs age>71 (Dependents) R = Employee ⋉ T Answer = R ⋈ Dependents
seller-ssn=ssn pid=pid buyer-ssn=ssn Complex RA Expressions Pname Person Purchase Person Product Pssn Ppid sname=fred sname=gizmo
sname sid=sid (Scan; (Scan; write to write to rating > 5 bid=100 temp T2) temp T1) Reserves Sailors sname rating > 5 bid=100 sid=sid Sailors Reserves Application: Query Rewriting for Optimization The earlier we process selections, less tuples we need to manipulate higher up in the tree (predicate pushdown) Disadvantages?
Algebraic Laws (Examples) • Commutative and Associative Laws • R ∩ S = S ∩ R, R ∩ (S ∩ T) = (R ∩ S) ∩ T • R S = S R, R (S T) = (R S) T • Laws involving selection • sC AND C’(R) = sC(sC’(R)) = sC(R) ∩ sC’(R) • sC (R S) = sC (R) S • When C involves only attributes of R • Laws involving projections • PM(PN(R)) = PM,N(R)
Operations on Bags A bag = a set with repeated elements All operations need to be defined carefully on bags • {a,b,b,c}{a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f} • {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b,d} • sC(R): preserve the number of occurrences • PA(R): no duplicate elimination • Cartesian product, join: no duplicate elimination Important ! Relational Engines work on bags, not sets !
Finally: RA has Limitations ! • Cannot compute “transitive closure” • Find all direct and indirect relatives of Fred • Cannot express in RA !!! Need to write C program
Formulating queries in RA • Consider a database for student enrollment for courses, and books used in the courses • STUDENT (SSN, Name, Major, Bdate) • COURSE (Course#, Cname, Dept) • ENROLL (SSN, Course#, Quarter, Grade) • BOOK_ADOPTION (Course#, Quarter, Book_ISBN) • TEXT (Book_ISBN, Book_Title, Publisher, Author)
Formulating queries in RA • Specify the following queries in relational algebra • List the number of courses (Course#) taken by all students named ‘John Smith’ in Winter 1999 (i.e., Quarter = W99) • List any department which has all its adopted books published by ‘BC Publishing’
Formulating Queries in RA • PCourse# (s Quarter=W99 ((s Name= ‘John Smith’ (STUDENT) ⋈ ENROLL)) • OtherDept = PDept ((s Publisher <> ‘PS Publishers’ (BOOK_ADOPTION ⋈ TEXT)) ⋈ COURSE)AllDept = PDept (BOOK_ADOPTION ⋈ COURSE)Answer = AllDept - OtherDept • And how will you express it in SQL? WHY?