640 likes | 815 Views
Database Systems Chapter 6 The Relational Algebra. Relation Schema. A relation schema is used to describe a relation Denoted by S ( A 1 , A 2 , A 3 , …, A n ), where S : Relation schema name A 1 , …, A n : attributes of S
E N D
Relation Schema • A relation schema is used to describe a relation • Denoted by S(A1, A2, A3, …, An), where • S: Relation schema name • A1, …, An: attributes of S • The degree of a relation is the number of attributes in a relation schema
Relation Instance (1) • A relation state (or relation instance) r of the relation schema S(A1, A2, …, An), denoted by r(S), is a set of n-tuples • r(S) = {t1, t2, …, tm) • each n-tuple ti is an ordered list of n values, • ti = <v1, v2, …, vn> where • each value vi, 1 in, is an element of dom(Ai) or a special null value
Basic Relational Algebra Operations • Is a collection of operators that are used to manipulate entire relations. The result of each operation is a new relation which can be further manipulated.
SELECT Operation • --- Select (sigma) • Format: selection-condition(R) • Semantics: • returns all tuples of relation R that satisfy the selection-condition • Select operation is unary. It applies to a single relation • is used to select a subset of the tuples in a relation that satisfy a selection condition.
Formats of Selection Conditions • (a) A op v: A is an attribute, op is an operator (=, , <, , >, ), and v is a constant. • Age 20, Name = `Bill' • (b) A op B: A and B are two attributes in R. • Persons(SSN, Name, Birthplace, Residence) • Birthplace = Residence • (c) Combinations of (a) and (b) connected by and, or or not. • Age 20 and Birthplace = Residence
An Example of SELECT (1) • Example: Find all students who are 20 years old or younger, and whose birthplace is the same as his/her residence. • Age 20 and Birthplace = Residence(Students)
An Example of Select (2) If the current Students is: SSN Name Age GPA Birthplace Residence 123456789 John 20 3.2 Vestal Vestal 234567891 Mary 18 2.9 Binghamton Vestal 345678912 Bill 19 2.7 Endwell Endwell 456789123 Nancy 24 3.6 Binghamton NYC then the result is a new relation: SSN Name Age GPA Birthplace Residence 123456789 John 20 3.2 Vestal Vestal 345678912 Bill 19 2.7 Endwell Endwell
SELECT Operation • Commutativity of select: • condition-1(condition-2(R)) = condition-2(condition-1(R)) = condition-1 and condition-2(R) • city = “Irbid” AND GPA > 65 (STUDENT) or • city = “Irbid” (GPA > 65 (STUDENT)
PROJECT Operation • --- project (pi) • Format: • attribute-list(R), • where attribute-list is a subset of all attributes in R • Semantics: • Returns all tuples of relation R but for each tuple, only values under attribute-list are returned • Project removes duplicate tuples automatically
Project (2) • Selects certain columns from the table and discards other columns • <attribute-list> (<relation-name>) • degree of resulting relation is equal to the number of attributes in the <attribute-list> • The number of tuples of the result of project is less than or equal to the number of tuples in the original relation. (It removes the duplicates)
Project (3) Example: Find the name and GPA of all students. Name,GPA(Students) Students SSNNameAgeGPAName GPA 123456789 John 20 3.2 John 3.2 234567891 Mary 18 2.9 Mary 2.9 345678912 John 19 3.2 Input Relation Output Relation
Project (4) • If attribute-list-1attribute-list-2, then • attribute-list-1(attribute-list-2(R)) = attribute-list-1(R) • The Project operation is notcommutative • Retrieve all student numbers and names who live in Amman. • STNO, ST-Name(City = “Amman”(STUDENT))
Project (5) • In complex queries, it becomes necessary to store intermediate results, therefore we should know how to give names to relations and attributes • Amman-students =city = “Amman” (STUDENT) • Result = STNO, ST-Name (Amman-Students) • or • Result(Number, Name) = STNO, ST-Name (Amman-Students) Renaming of attributes
Select and Project • Example: • Find the name and GPA of all students who are 20 years oldor younger and whose birthplace and residence are the same • Name, GPA(Age20 and Birthplace=Residence(Students))
RENAME Operation • --- rename (rho) • Format: S(R) • Semantics: • Make a copy of relation R and name the copy as S • S(R): • Rename R only • S(B1, B2, …, Bn)(R): • Rename R and its attributes • (B1, B2, …, Bn)(R): • Rename attributes only
Set Theoretic Operations • UNION, INTERSECTION, DIFFERENCE. • binary (applied to two relations at a time) • To apply any of these operators to relations, relations should be union-compatible. • Two relations R(A1, A2, …, An) and S(B1, B2, …, Bm) are said to be union-compatible if: • they have the same degree (n = m) and • dom(Ai) = dom(Bi) for 1 i n. • Both R and S have the same number of attributes and the corresponding attributes have the same domain
Union (1) • --- union • Format: • R1 R2 • Semantics: • Returns all tuples that belong to either R1 or R2. • Formally: • R1 R2 = { t | t R1 or t R2 } • Condition of union: • R1 and R2 must be union compatible. • The union operator removes duplicate tuples automatically.
Union (2) Example: R1 R2 R1 R2 A B C A B C A B C a1 b1 c1 a0 b0 c0 a1 b1 c1 a2 b2 c2a1 b1 c1 a2 b2 c2 a3 b3 c3a2 b2 c2 a3 b3 c3 a4 b4 c4 a0 b0 c0 a4 b4 c4
Set Difference (1) • - set difference • Format: • R1 - R2 • Semantics: • Returns all tuples that belong to R1 but not R2. • Formally: • R1 - R2 = { t | t R1 and t R2 } • Set difference also requires union compatibility between R1 and R2
Set Difference (2) Example: R1 R2 R1 - R2 A B C A B C A B C a1 b1 c1 a0 b0 c0 a3 b3 c3 a2 b2 c2a1 b1 c1 a3 b3 c3a2 b2 c2 a4 b4 c4
INTERSECTION • --- set intersection • Format: • R1 R2 • Semantics: • Returns all tuples that belong to both R1 and R2. • Formally: • R1 R2 = { t | t R1 and t R2 } • Derivation from existing operators: • R1 R2 = R1 - (R1 - R2) = R2 - (R2 - R1)
INTERSECTION • Union and Intersection are commutative operations • R S = S R and • R S = S R • Union and Intersection can be applied to any number of relations and both are associative: • R (S Q) = (S R) Q • R (S Q) = (S R) Q • Difference operator is not commutative: • R - S S - R in general.
Cartesian Product (1) • --- Cartesian product • Format: • R1 R2 • Semantics: • Returns every tuple that can be formed by concatenating a tuple in R1 with a tuple in R2 • Binary operation, but the relations on which it is applied do not have to be union compatible
Cartesian Product (2) Example: R1 A B C R2 B D E a1 b1 c1 b1 d1 e1 a2 b2 c2 b2 d2 e2 a3 b3 c3 R1 R2 AR1.BC R2.BD E a1 b1 c1 b1 d1 e1 a1 b1 c1 b2 d2 e2 a2 b2 c2 b1 d1 e1 a2 b2 c2 b2 d2 e2 a3 b3 c3 b1 d1 e1 a3 b3 c3 b2 d2 e2
Cartesian Product (3) • If R1 and R2 have common attributes, then the full names of these attributes must be used • Example: • Use R.A instead of A • To prevent identical attribute names from occurring in the same relation schema, R R is not allowed. However, R S(R) is allowed • Commutativity: R1 R2 = R2 R1
Cartesian Product (4) • Given R(A1, A2, …, An) and S(B1, B2, …, Bm) • RS = Q(A1, A2, …, An, B1, B2, …, Bm) • degree of Q = n + m • If R1 has N tuples and R2 has M tuples, then • R1 R2 has N*M tuples • Cartesian product is extremely expensive • If R1 and R2 are both large, then each relation may need to be scanned many times to perform the Cartesian product. • Writing out the result can be very expensive due to the large size of the result
Example Retrieve for each female employee a list of names of her dependents FEMALE-EMPS= Sex = “F” (EMPLOYEE) EMP-NAMES = Fname,Lname,SSN(FEMALE-EMPS) EMP-DEPENDENTS = EMP-NAMES DEPENDENT ACTUAL-DEP= SSN = ESSN (EMP-DEPENDENTS) RESULT = Fname,Lname,Dependent-name(ACTUAL-DEP)
Relational Algebra Example (1) Example: Find the names of each employee and his/her manager Employees: SSNNameAgeDept-Name 123456789 John 34 Sales 234567891 Mary 42 Service 345678912 Bill 39 null Departments: Name Location Manager Sales XYZ Bill Inventory YZX Charles Service ZXY Maria
Relational Algebra Example (2) • A relational algebra expression is: Employees.Name, Departments.Manager (Employees.Dept_Name = Departments.Name (Employees Departments)) • A simplified version (don't use full name when you don't have to): Employees.Name, Manager (Dept_Name =Departments.Name(Employees Departments))
Relational Algebra Example (3) • Use assignment operator ( = ) to save the intermediate result into a temporary relation • Example: The following expression Employees.Name, Manager(Dept_Name = Departments.Name (Employees Departments) ) • is equivalent to the following series of expressions: TEMP1 = Employees Departments TEMP2 = Dept_Name=Departments.Name(TEMP1) RESULT = Employees.Name, Manager (TEMP2)
Relational Algebra Example (4) Example: Find the names of all students who have the highest GPA STUDENTS SSN Name GPA 123456789 John 3.8 234567891 Maria 3.2 345678912 Mike 3.0
Relational Algebra Example (5) • Step 1: Find the GPAs that are not the highest • TEMP1 = Students.GPA(Students.GPA <S2.GPA (Students S2(Students))) Students S2(Students) SSN Name GPA S2.SSN S2.Name S2.GPA 123456789 John 3.8 123456789 John 3.8 123456789 John 3.8 234567891 Maria 3.2 123456789 John 3.8 345678912 Mike 3.0 234567891 Maria 3.2 123456789 John 3.8 234567891 Maria 3.2 234567891 Maria 3.2 234567891 Maria 3.2 345678912 Mike 3.0 345678912 Mike 3.0 123456789 John 3.8 345678912 Mike 3.0 234567891 Maria 3.2 345678912 Mike 3.0 345678912 Mike 3.0
Relational Algebra Example (6) • Step 2: Find the highest GPA • TEMP2 = GPA(Students) - TEMP1 • Step 3: Find the names of students who have the highest GPA • RESULT = Name(Students.GPA = EMP2.GPA(StudentsTEMP2))
Join (1) • --- join • Format: • R1 join-condition R2 • Semantics: • Returns all tuples in R1 R2 which satisfy the join condition • Derivation from existing operators: • R1 join-condition R2 = join-condition(R1 R2) • Format of join condition: • R1.A op R2.B • R1.A1 op R2.B1and R1.A2 op R2.B2 . . . • Tuples whose join attributes are NULL do not appear in the result
Join (2) Example: Find the names of all employees and theirdepartment locations Employees: SSNNameAgeDept-Name 123456789 John 34 Sales 234567891 Mary 42 Service 345678912 Bill 39 null Departments: NameLocationManager Sales Binghamton Bill Inventory Endicott Charles Service Vestal Maria
Join (3) Employees.Name, Location(Employees Dept-Name = Departments.Name Departments) Result Name Location John Binghamton Mary Vestal
Join (4) Example: Find the names of all employees who earn more than his/her manager Employees: SSNNameSalary Manager-SSN 123456789 John 34k 234567891 234567891 Bill 40k null 345678912 Mary 38k null 456789123 Mike 41k 345678912 Employees.Name (Employees Employees.Manager-SSN = EMP.SSN and Employees.Salary > EMP.Salary EMP(Employees))
Equijoin • Definition: A join is called an equijoin if only equality operator is used in all join conditions. R1 R2 R1 R1.B = R2.BR2 A B B C A R1.B R2.B C a b b c a b b c d b c d d b b c b c a d b c c d • Most joins in practice are equijoins.
Natural Join (1) • Definition: • A join between R1 and R2 is a natural join if • There is an equality comparison between every pair of identically named attributes from the two relations • Among each pair of identically named attributes from the two relations, only one remains in the result • Natural join is denoted by with no join conditions explicitly specified
Natural Join (2) • Example: • R1(A, B, C) R2(A, C, D) • has attributes (A, B, C, D) in the result • Questions: • How to express natural join in terms of equijoin and other relational operator? • R1 R2 = R1.A, B, R2.C, D (R1R1.A=R2.Aand R1.C= R2.C R2)
A Complete Set of Relational Algebra Operations • The relational algebra is a set of expressions as defined below: • A relation is an expression. • If E1 and E2 are expressions, so are P(E1), A(E1), S(E1), E1E2, E1-E2, E1E2 • That is, any expression that can be formed from base relations and the six relational operators is a relational algebra expression
Aggregate Functions & Grouping • Well know Aggregate Functions: • SUM, AVERAGE, MAXIMUM, MINIMUM, COUNT • Format: (Script F) • <grouping attributes> <function list> (R) • <grouping attributes> is a list of attributes in R • <function list> is a list of (<function> <attribute>) pairs • <function> is one of aggregate functions • <attribute> is an attribute in R
Examples • Retrieve each department number, the number of employees in the department, and their average salary: • DNOCOUNT( SSN), AVERAGE(SALARY) (EMPLOYEE) DNO COUNT_SSN AVERAGE_SALARY 1 1 55000 4 3 31000 5 4 33250
Examples • Retrieve for each department number, the number of employees in the department, and their average salary: R(DNO, NO_OF_EMP, AVERAGE_SAL)(DNO COUNT( SSN), AVERAGE(SALARY) (EMPLOYEE)) DNO NO_OF_EMP AVERAGE_SAL 1 1 55000 4 3 31000 5 4 33250
Examples • If no grouping attributes are specified, the functions are applied to attribute values of all the tuples in the relation • Retrieve the number of employees, and their average salary: • COUNT( SSN), AVERAGE(SALARY) (EMPLOYEE)) COUNT_SSN AVERAGE_SALARY 8 35125
Outerjoin (1) R1 R2 R1 R2 A B C C D E A B C D E a1 b1 c1c1 d1 e1 a1 b1 c1 d1 e1 a4 b3 c2 c6 d3 e2 • The second tuples of R1 and R2 are not present in the result (called dangling tuples) • Applications exist that require to retain dangling tuples
Outerjoin (2) O --- outer join Format: R1 O R2 Semantics: like join except • it retains dangling tuples from both R1 and R2 • it uses null to fill out missing entries R1 O R2 A B C D E a1 b1 c1 d1 e1 a4 b3 c2 null null null null c6 d3 e2
Left Outerjoin and Right Outerjoin • LO --- left outer join • Format: R1 LO R2 • Semantics: like outerjoin but retains only dangling tuples of the relation on the left • RO --- right outer join • Format: R1 RO R2 • Semantics: like outerjoin but retains only dangling tuples of the relation on the right
Left Outerjoin and Right Outerjoin (2) R1 R2 R1 R2 A B C C D E A B C D E a1 b1 c1c1 d1 e1 a1 b1 c1 d1 e1 a4 b3 c2 c6 d3 e2