340 likes | 580 Views
Relational Algebra - Chapter 6.1-6.5. Relational Algebra. Theoretical basis for SQL (E.F. Codd ) Relational algebra (algebraic notation) and relational calculus (logical notation) C reated to demonstrate the potential for a query language of the relational model
E N D
Relational Algebra • Theoretical basis for SQL (E.F. Codd) • Relational algebra (algebraic notation) and relational calculus (logical notation) • Created to demonstrate the potential for a query language of the relational model • Algebra and calculus are equivalent in expressive power • Can represent complex queries compactly, but too mathematical for the average person
What does it do? • Provide DML and DDL • In relational algebra, a series of operations are combined to form a relational algebra expression (query)
Set theoretic operations • Union, Intersection, Difference - Binary • Applied to 2 sets (relations) - no duplicates in result - mathematical set • Must be same type of tuples - Compatibility • same degree n • dom(Ai) = dom(Bi) • Fig. 6.4 • Resulting relation - same attribute names as first relation • Which operations are: • Commutative ? • R U S = S U R • Associative ? • R U (S U T) = (R U S) U T
Cartesian Product X - Binary • Also binary, but does not require union compatibility • R(A1, A2, … An) X S(B1, B2, …, Bm) • Creates a tuple with the combined attributes of 2 tables • Q(A1, A2, …, An, B1, B2, …, Bm) Fig. 6.5 • Degree of resulting relation? • n+m
Select Operation s • s - unary operation (Where in SQL) • A subset of tuples satisfying a selection condition • Selects rows • Equivalent to select condition in WHERE clause s<selection condition> (<Relation name>) sdno=4(Employee) s salary>30000(Employee)
Select Operation • Select condition is a Boolean expression <attr_name> <comparison op> <constant value> <attr_name> <comparison op> <attr_name> comparison op - =, <, <=, etc. • You can use boolean conditions to connect clauses • Can combine cascade of selects into single select with ANDs s(dno=4 and salary>25000) or (dno=5 and salary>30000)(Employee) Fig. 6.1
Select Operation • Degree of resulting relation? • Selectivity - the fraction of tuples selected • number of tuples total tuples • Is Select is commutative?
Project Operation p or Õ • p - unary operation • Equivalent to the SELECT clause in SQL(Select) • Keeps only certain attributes (columns) from a relation • Selects columns • Form of operation: p<attr_list> (<relation name>) pfname, lname, salary Employee Fig 6.1 • Resulting relation has only those attributes specified • Degree of relation ? • attributes in attr_list
Project Operation • The project operation eliminates duplicate tuples in the resulting relation so that it remains a mathematical set psex, salary Employee Fig 6.1 • If several male employees have salary $30,000 only single tuple <M, 30000> is kept in the resulting relation. • Is the project operation commutative?
Operations • DML Operations: • set theory operations • relational DB operations • Relational DB operations: • Select • Project • Join
Sequences of operations • Several relation algebra operations can be combined to form a relational algebra expression (query). • Retrieve the names and salaries of employees who work in department 5. Q ←pfname, lname, salary (sdno=5 Employee) • Alternately, explicit intermediate relations can be specified for each step: Dept5 ← sdno=5 Employee R ← pfname,lname,salary Dept5
Write the following in Relational Algebra • Select * from Employee • Select bdate from Employee • Select * from Employee where sex='F'
Write the following in Relational Algebra Select ssn from Employee where not exists (select * from Dependent where ssn=essn) (Select ssn from employee) minus (Select essn from dependent)
Renaming • Attributes can optionally be renamed in the resulting relation: Dept5 ¬sdno=5 Employee T(firstname,lastname,salary) ¬pfname,lname,salaryDept5 Fig. 6.2 • Alternative notation in textbook: rR(firstname, lastname, salary) pfname,lname,salaryDept5
The Join operation |X| • Similar to a Cartesian Product followed by a select • Form of operation: R |X|<join condition> S • Result is: Q (A1, A2, …, An, B1, B2, …, Bm) A1, A2 … are the attributes of R B1, B2, .. are the attributes of S • For all tuples that satisfy the join condition • join condition: <cond> and <cond> and … Fig. 6.6 • Resulting number of tuples? • Different types of joins - theta join, natural join, equijoin
Theta Join • R |X|Ai q Bi S • where the join condition is of the form: Ai q Bi q is =, < , £ , etc. • Example: Scholarship(SNameGPA_ReqDesc) Student (Name CWID GPA Major) Select Name, SName From Student, Scholarship Where GPA >= GPA_Req
Natural join • We will use the * notation (some others use |X| without subscript) • Like an equijoin, except attributes for the equijoin in the second relation are deleted from result • (Why have 2 columns with the same value?) Q ¬ R * (<list1>),(<list2>) S Fig 6.7 • Equivalent to equijoin but keep only list1 • If attributes have the same name in both relations, list1 and list2 are not needed. • In the original definition of natural join, the join attributes required to have the same names in both relations.
Equijoin • R |X|Ai=Bi S • requires identical values in every tuple for each pair of join attributes • (one or more equality comparisons) • Join conditions are all of the form Ai = Bi and Aj = Bj … • Retrieve each department’s name and manager’s name. T¬ Department |X|mgrssn=ssnEmployee Result ¬pdname,fname,lname (T)
Order of precedence • Unary: Select and project (highest precedence) • Binary: • Joins, Cartesian product • Intersection • Union, minus • Use lots of parenthesis!
Write the following in Relational Algebra • Select * from Employee, Department where dno=dnumber • List employee SSNs who are female and work for the research department • Select *From Employee, dept_locations Where dno = dnumber and dlocation = 'Houston'
Renaming attributes • A relation can have a set of join attributes with itself • List all employee names and their supervisor names S(soc, first, last)¬pssn,fname,lname Employee Temp ¬ Employee |X|superssn=socS Result ¬pfname,lname,first,last(Temp) • Usually, don't see qualification of attributes in relational algebra
Complete Set of Relational Algebra Operations { s , p , È , - , ×} • All other relational algebra operations can be expressed as a sequence of operations from this set. • Other operations are for convenience. R |X| S = s<cond> (R X S) R Ç S = (RÈ S) - ((R - S) È (S - R))
Do we need anymore relational algebra operations to satisfy queries?
How about? Select COUNT(*) From Project Select pname, COUNT(ssn) From Project, Works_on Where pnumber=pno
Additional relation algebra Operations • Aggregate function - SUM, COUNT, AVG, MIN, MAX [<grouping attribute>] Á <function list> (<relation name>) R ¬Ácount ssn, avg salary(Employee) The following uses the optional grouping attribute R ¬dnoÁcount ssn, avg salary(Employee) Fig. 6.9 • The attributes returned from an aggregate function are the attributes in the function listand any grouping attributes listed
Outer Join • Extension of join and union • In a regular equijoin or natural join, tuples in R1 or R2 that do not have matching tuples in the other relation do not appear in the result. • Some queries require all tuples in R1 (or R2 or both) to appear in the result • When no matching tuples are found, nulls are placed for the missing attributes.
Outer Join • Left outer join: R1 ]X| R2 keeps every tuple in R1 in result. • List all employees and if they are a manager, list dname Temp <- (Employee ]X| ssn=mgrssn Department) R <- pfname, minit, lname, dname (Temp) Fig. 6.11 • Right outer join: R1 |X[ R2 keeps every tuple in R2 in result. • Full outer join: R1 ]X[ R2 keeps every tuple in R1 and R2 in result. • Think about how this is different from R1 X R2.
Division operation • Part of original relational algebra • T(Y) = R(Z) ¸ S(X) • tuple t is in result if t is in R for every tuple in S • More generally, result is a relation T(Y) that includes t if t appears in R with the value of X for every tuple in S. Fig. 6.8 • The attributes Y in table T = attributes of R in Z - attributes S in X, where Y is the set of attributes in R not in S. Result <- R ¸ S
Division operation • For example, Retrieve names of all employees who work on all projects John smith works on. smith <- sfname='John' and lname='Smith'(Employee)smith_pnos <- Works_on |X| ssn=essn smithssn_pnos <- pessn,pnoWorks_onssns <- ssn_pnos¸ smith_pnos results <- pfname, lname (ssns * employee)
Write the following in Relational Algebra • Compute the average number of dependents over employees with dependents • Select * From Employee Where salary > all (Select salary From Employee Where sex = 'F')
Write in Relational Algebra • For each project on which more than two employees work, retrieve the project number, project name, and the number of employees who work on that project. Select pnumber, pname, COUNT(*) From Project, Works_on Where pnumber =pno Group By pnumber, pname Having COUNT(*) > 2
DDL - Also provided • Declare Schema for database • Declare Relation for Schema • Insert <values> into Relation • Delete Relation tuple with specified condition • Modify col. of Relation tuple with specified condition