460 likes | 609 Views
LECTURE 4. RELATIONAL ALGEBRA. Introduction. Relational algebra defines the theoretical way of manipulating table contents through a number of relational operators These relational operators include; SELECT (or RESTRICT) PROJECT JOIN PRODUCT INTERSECT UNION DIFFERENCE DIVIDE.
E N D
LECTURE 4 RELATIONALALGEBRA
Introduction • Relational algebra defines the theoretical way of manipulating table contents through a number of relational operators • These relational operators include; • SELECT (or RESTRICT) • PROJECT • JOIN • PRODUCT • INTERSECT • UNION • DIFFERENCE • DIVIDE
Introduction (cont’d) • The relational operators have the property of closure, i.e., relational algebra operators are used on existing tables to produce new tables • The relational operators are classed as being unary or binary • Unary operators such as SELECT and PROJECT, can be applied to one relation • Binary operators such as JOIN are applied on two relations
SELECTion • SELECT or RESTRICT, can be used to list all of the row values, or return only the row values that match a specified criterion. • The SELECT operator is denoted by and is formally defined as: • (R) or (RELATION) where(R) is the set of specified tuples of the relation R andis the predicate (or criterion) to extract the required tuples. • It is possible to create more complex criteria by using the logical operators AND, OR, and NOT
(OrderLine) OrderLine SELECT (cont’d) (OrderLine)
PROJECTion • This operator returns all values for selected attributes. • The PROJECT operator is denoted by and formally defined as: • or where the projection of the relation R, denoted by is the set of specified attributes a1…..an of the relation R
OrderNum(OrderLine) OrderLine PROJECT (cont’d)
UNION • The UNION set operator combines all tuples from two relations, excluding duplicate tuples. • The relations must have the same attribute characteristics (the columns and domains must be identical) to be used in the UNION. • When two or more tables share the same number of columns, i.e., have the same degree, and when the share the same (or compatible) domains, they are said to be union- compatible
UNION(cont’d) • The UNION operator is denoted by and formally defined as: • The union of relations and denoted by with degree n, is the relation where for each i (i = 1, 2, …n), aiand bi must have compatible domains. • The degree of R3is the same as that of R1and R2. However, the cardinality of R3isa+b, only if a and b are the cardinalities of R1and R2respectively
OrderLine1 OrderLine2 UNION (cont’d) OrderLine1OrderLine2
INTERSECT • The INTERSECT operator denoted as , returns only the tuples that appear in both relations • The tables must be union-compatible to give valid results. • The INTERSECToperator is formally defined as: • The intersect of relations and denoted by with degree n, is the relation that includes only those tuples of R1 that also appear in R2 where for each i (i = 1, 2, …n), aiand bi must have compatible domains.
INTERSECT(cont’d) Part1 Part2 Part1 Part2
DIFFERENCE • The DIFFERENCE operator returns all tuples in one relation that are not found in the other relation. • The DIFFERENCE operator requires that the two relations be union-compatible. • The DIFFERENCEoperator is formally defined as: • The difference of relations and denoted by with degree m, is the relation that includes all the tuples that are in R1 but not in R2 where for each i (i = 1, 2, …m), aiand bi must have compatible domains.
DIFFERENCE (cont’d) Part1 Part2 Part1 Part2
CARTESIAN PRODUCT • The CARTESIAN PRODUCT is usually written as with the new resulting relation containing all the attributes which are present in and along with all possible combinations of tuples from both and • It can be formally defined as: • The CARTESIAN PRODUCTof two relations with cardinality i and with cardinality j is a relation with degree , cardinality and attributes . This can be denoted as • The CARTESIAN PRODUCT is not a very useful operation by itself, as it combines many tuples that have no association with each other. However, when used in conjunction with the RESTRICT (SELECT) operator, it becomes a very important operator known as a JOIN
Part Orders CARTESIAN PRODUCT (cont’d) • Product of Orders and Part
DIVISION • The division operation produces a new relation by selecting the tuples in one relation, , that match every row in another relation, . It is the inverse of the CARTESIAN PRODUCT. • DIVISION, denoted by , can be formally defined as: • The DIVISION of two relations with cardinality i and with cardinality j is a relation with degree and cardinality .
OrderLine Part Division (cont’d) OrderLine Part
JOIN • The JOIN operation is one of the essential operations of relational algebra. • It is said to be the real power behind the relational database, allowing the use of independent tables linked by common attributes • The JOIN of two relations R1and R2 is a restriction on their Cartesian product R1XR2to meet a specified criterion. • The join itself is defined on an attribute a of R1 and b of R2 where the attributes a and b share the same domain.
JOIN(cont’d) • The JOIN operator is formally defined as: • The join of two relations R1(a1, a2,…., an) and R2(b1, b2,…, bm) is a relation R3 with degree and attributes (a1, a2,…, an, b1, b2,…, bm) • Types of join operations • THETA JOIN • EQUIJOIN • NATURAL JOIN • LEFT OUTER JOIN • RIGHT OUTER JOIN
THETA JOIN AND EQUIJOIN • EQUIJOIN is on of the most commonly used joins which links tables on the basis of an equality condition that compares specified columns of each table • The outcome of the equijoin does not eliminate duplicate columns , and the condition or criterion used must be explicitly defined. • The equijoin takes its name from the equality comparison operator (=) used in the condition
THETA JOIN AND EQUIJOIN(cont’d) • If any other comparison operator is used the join is called a THETA JOIN, denoted by θ (θ-join) • Therefore, theta represents a predicate which consists of the following comparison operators (<, <=, >=, <>) • EQUIJOIN is a special type of THETA JOIN • Let R1(a1, a2,…., an) and R2(b1, b2,…, bm) be relations which may have different schemas. • Then, θ-join of R1 and R2is denoted as R1θR2 • The equijoin is denoted as R1R1.a = R2.bR2
THETA JOIN AND EQUIJOIN(cont’d) • It is possible to express the θ-join and the equijoin in terms of restriction and Cartesian product operations • Eg; equijoin R1R1.a = R2.bR2 may also be written as R1.a = R2.b(R1 XR2 ) • Looking at the θ-join and the equijoin in this way allows for some rules to be created which helps in the computation of such joins on two relations: • Compute R1 XR2 . This first performs a Cartesian product to form all possible combinations of the rows of R1 and R2 • Restrict the Cartesian product to only those rows where the values in certain columns match
Equijoin Example STUDENT DEPARTMENT
Cartesian product (STUDENT X DEPARTMENT) Equijoin Example (cont’d)
STUDENT_IN_DEPT = STUDENT.DEPT_CODE = DEPARTMENT.DEPT_CODE(STUDENTXDEPARTMENT) Equijoin Example (cont’d)
NATURAL JOIN • The natural join operation is the most common variant of the joins and requires that the two operant relations must have at least one common attribute, i.e., attributes that share the same domain. The common column(s) is (are) referred to as the join column(s) • The natural join is in fact an equijoin, however, in addition, the duplicate attributes are dropped with the resulting relation containing one less column than that of the equijoin
NATURAL JOIN(cont’d) • Let R1 be a relation having attributes (a1, a2,…, an, y), R2 be another relation having attributes (b1, b2,…, bmy) where y is a set of common attributes (join column(s)) which share the same domain • The natural join operator is defined as: • The natural join of R1 and R2, denoted R1 X R2, consists of combining the tuples of R1 and R2 to build a new relation R3, such that if ,,, then
NATURAL JOIN(cont’d) • The common set of attributes y appears only once in R3; the notation correspond to the attribute value of a tuple of R1 • The steps required to perform the natural join of two relations are: • Compute R1 X R2. • Select those tuples where . Only the rows are selected where the attribute values in the join column(s) are equal • Perform a PROJECT operation on either to the result in step 2, and call it y in the final relation. This is to ensure that the final relation results in a single copy of each attribute in the joining column, thereby eliminating duplicates
Natural Join Example CUSTOMER AGENT
Cartesian product (CUSTOMERX AGENT) Natural Join Example (cont’d)
CUSTOMERXAGENT Natural Join Example (cont’d)
OUTER JOIN • When using the theta and natural join, it is possible that some tuples in the joined relations do not have identical values for the common attributes. As a result these tuples will be ‘lost’. • If it is required that all tuples from the original tables be shown in the resulting relation, then it is necessary to have a join which keeps all the tuples in R1 which have no corresponding values in R2. The tuples in the R2 will have null values. This type of join is known as the outer join
OUTER JOIN (cont’d) • There are 3 common types of the outer join • Left outer join – keeps data from the left-hand relation • Right outer join – keeps data from the right-hand relation • Full outer join – keeps data from both relations • The steps for determining an outer join are very similar to those for computing a natural join, except that data from the left or right side of the relation, depending on whether one is performing a left or right outer join is included.
OUTER JOIN (cont’d) • The stages in performing a left outer join are: • Compute R1 X R2. • Select those tuples where . Only the rows are selected where the attribute values in the join column(s) are equal • Select those tuples in R1that do not have matching values in R2 , so • Perform a PROJECT operation on either to the result in step 2, and call it y in the final relation. This is to ensure that the final relation results in a single copy of each attribute in the joining column, thereby eliminating duplicates. Finally, project the rest of attributes in R1and R2 , except y, and drop the R1 and R2in the final relation
Left Outer Join Example (cont’d) • A left outer join of CUSTOMER and AGENT, will return all the tuples in the CUSTOMER relation, including those that do not have a matching value in the AGENT relation.
Right Outer Join Example (cont’d) • A right outer join of CUSTOMER and AGENT, will return all the tuples in the AGENT relation, including those that do not have a matching value in the CUSTOMER relation.
CONSTRUCTING QUERIES USING RELATIONAL ALGEBRAIC EXPRESSIONS • The main purpose of relational algebra is to provide a way to create and manipulate relations (tables) in a database. • During the lifetime of a database, users will ask many different kinds of queries. The task of building a query involves breaking the query down into a number of smaller steps, where each step generates a set of intermediate results which are then used in the steps of the query.
CONSTRUCTING QUERIES USING RELATIONAL ALGEBRAIC EXPRESSIONS • The following steps should be followed when building a query using relational algebraic expressions • List all the attributes needed to give the answer • Select all the relations needed based on the list of attributes. • Specify the relational operators and the intermediate results that are needed
Building Queries Example - Car Maintenance Database • In a small database that stores information about the maintenance of cars, each car is required to undergo an inspection each year to test to see if it is roadworthy. After each inspection a maintenance record is created and any repairs that are needed are recorded. A repair can require new parts to be purchased and fitted. If a car needs a repair then the EVALUATION is set to FAIL until all the repairs are completed and then it is set to PASS. The Entity Relationship Diagram (ERD) and tables are shown below.
CAR Car Maintenance Example (cont’d) PART
MAINTENANCE_RECORD Car Maintenance Example (cont’d) REPAIR
Car Maintenance Example (cont’d) • Consider the following query asked by a user: • ‘List all information about cars where the model year is after 2006’ • To answer this query, one must first interpret that ‘List all information about cars’ means list all attributes in the relation CAR. The user only wants to see information on cars where the attribute MODEL_YEAR>2006. using the relational operator SELECT we can write the query as a relational algebraic expression as:
Car Maintenance Assignment • Display all the part names and their prices where the cost of the part is greater than GhȻ20.00 • List the car registration and model details and part numbers for all cars where the model year is 2007, where an inspection was carried out after 01/03/2008, which resulted in a part being required for the repair.