Relational Algebra Basic operators Relational algebra expression tree
What is an “Algebra” • A mathematical system consisting of: • Operands --- variables or values from which new values can be constructed. • Operators --- symbols denoting procedures that construct new values from given values.
What is Relational Algebra? • Simply speaking, the relational algebra is a collection of operators that • take relations as their operands • and return a relation as their result. (This closure property allows nested expressions!)
Closure • In mathematics, a set is said to be closed under some operation if the operation on members of the set produces a member of the set. For example, the real numbers are closed under subtraction. • Similarly, a set is said to be closed under a collection of operations if it is closed under each of the operations individually. • In relational algebra, the result of any operator is again a relation, so the set of relations is closed under the operators. This closure is the foundation for sub-queries.
What is Relational Algebra? • Its operands are relations or variables that represent relations. • It comprises a basic set of relational model operations (represented by operators) designed to do the most common things that we need to do with relations in a database. • Sequences of relational algebra operations form relational algebra expressions whose result is also a relation. • The result is an algebra that can be used as (but is not limited to) a query language, at an abstract level, for relations.
Basic idea • There is a core relational algebra that has traditionally been thought of as the relational algebra. • But there are several other operators we shall add to the core in order to support or model better the language SQL --- the principal language used in relational database systems.
Core Relational Algebra • Two groups of operations • Set theoretic operations: • Union, intersection, and difference. • Usual set operations, but require both operands to have the same relation schema. • Relational model specific operations: • Selection: picking certain rows. σ • Projection: picking certain columns. π • Products: Cartesian product. × (also a kind of set operation, but more related to joins.) • Joins: compositions of relations. ⋈ (pronounced “bow tie”) • Division: /
Union, Intersection, Set difference operators • These operations are binary: each is applied to two relations. • In order to apply these operators to relations, the relations have to be union compatible. • Two relations R(A1, …, An) and S(B1, …, Bn) are union compatible if they have the same degree n and domain(Ai) = domain(Bi) for 1 <= i <= n.
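The union-compatibility test can be sketched in a few lines of Python. This is only an illustration: modelling a schema as a list of (attribute name, domain) pairs and using Python types as domains are assumptions of the sketch, and the example schemas are hypothetical.

```python
# A schema is modelled as a list of (attribute name, domain) pairs.
# Using Python types as domains is an assumption made for this sketch.
Schema = list[tuple[str, type]]


def union_compatible(r: Schema, s: Schema) -> bool:
    """Same degree, and the i-th domains agree for every position i."""
    if len(r) != len(s):
        return False
    return all(dom_r == dom_s for (_, dom_r), (_, dom_s) in zip(r, s))


# Hypothetical schemas, not taken from the slides.
student = [("name", str), ("age", int)]
instructor = [("iname", str), ("years", int)]  # names differ, domains match
course = [("title", str)]                      # different degree

print(union_compatible(student, instructor))  # True
print(union_compatible(student, course))      # False
```

Note that attribute names play no role in the test; only the degree and the position-wise domains matter.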
Union, Intersection, Set difference operators • In this part, we assume R and S are union compatible. • Notation: • Union: R ∪ S • Intersection: R ∩ S • Set difference: R - S
Union: R ∪ S • The result of R ∪ S is a relation that includes all the tuples that are either in R or S or in both R and S. • Duplicates are eliminated.
Union: R ∪ S • Example (figure): student ∪ instructor
Intersection: R ∩ S • The result of R ∩ S is a relation that includes all the tuples that are in both R and S.
Intersection: R ∩ S • Example (figure): student ∩ instructor
Set difference: R - S • The result of R - S is a relation that includes all the tuples that are in R but not in S.
Set difference: R - S • Examples (figure): student - instructor and instructor - student
Remarks • Intersection and Union are commutative operations. • R ∪ S = S ∪ R • R ∩ S = S ∩ R • Intersection and Union are associative operations. • R ∪ (S ∪ T) = (R ∪ S) ∪ T • R ∩ (S ∩ T) = (R ∩ S) ∩ T • Set difference is not a commutative operation. • In general, R - S != S - R
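As a small illustration of the three set operators and the remarks above, relations can be modelled as Python sets of tuples. The representation and the sample names are assumptions of this sketch, not data from the slides.

```python
# Relations modelled as Python sets of tuples (an assumption for illustration).
# Both relations have the single-attribute schema (name,), so they are
# union compatible. The names are hypothetical sample data.
student = {("Ann",), ("Bob",), ("Cid",)}
instructor = {("Bob",), ("Dee",)}

union = student | instructor           # tuples in either relation; duplicates vanish
intersection = student & instructor    # tuples in both relations
student_only = student - instructor    # tuples in student but not in instructor
instructor_only = instructor - student

print(sorted(union))
print(sorted(intersection))
print(sorted(student_only), sorted(instructor_only))

# Union and intersection commute; set difference in general does not.
print(student | instructor == instructor | student)
print(student_only == instructor_only)
```

Because a Python set cannot hold the same tuple twice, duplicate elimination in the union comes for free with this representation.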
Select operation (selection) • The SELECT operation is used to select a subset of tuples from a relation that satisfy a selection condition. • Form of the operation: σC (R) • Where R is a relational algebra expression. The simplest expression is the name of a database relation. • The condition C is an arbitrary Boolean expression on the attributes of R.
The Boolean expression is formed using clauses (atomic comparisons) of the form A op B or A op c, where A, B are attribute names, c is a constant, and op is one of the comparison operators {=, <, <=, >, >=, !=} • The resulting relation has the same attributes as R. • The resulting relation includes only tuples in R whose attribute values satisfy the condition C.
select • R1 <-σC (R2) • C is a condition (as in “if” statements) that refers to attributes of R2. • R1 is all those tuples of R2 that satisfy C.
Example • Relation tinyemployee:
fname  lname  salary
Joe    Reale  250
Joe    Bush   275
Sue    Gates  250
Bob    Bush   300
AllBush <- σlname=“Bush”(tinyemployee):
fname  lname  salary
Joe    Bush   275
Bob    Bush   300
Example • σC(employee), where C is: (DNO=4 and SALARY>2500) or (DNO=5 and SALARY>30000)
Remarks • The degree (i.e. the number of attributes) of the resulting relation is the same as that of the input relation. • The resulting relation is empty if no tuple satisfies the selection condition. • The number of tuples of the resulting relation is less than or equal to that of the input relation. • The select operation is commutative: • σC2(σC1(R)) = σC1(σC2(R)) • A sequence of select operations can be combined into a single select operation: • σC1(σC2(R)) = σ(C1 and C2)(R)
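The selection operator and the remarks above can be tried out with a short Python sketch on the tinyemployee relation. The `Employee` named tuple, the `select` helper, and the second condition (salary > 275) are assumptions introduced for illustration.

```python
from typing import Callable, NamedTuple


class Employee(NamedTuple):
    fname: str
    lname: str
    salary: int


# The tinyemployee relation from the example, as a set of tuples.
tinyemployee = {
    Employee("Joe", "Reale", 250),
    Employee("Joe", "Bush", 275),
    Employee("Sue", "Gates", 250),
    Employee("Bob", "Bush", 300),
}


def select(condition: Callable[[Employee], bool], relation: set) -> set:
    """Keep only the tuples of `relation` that satisfy `condition`."""
    return {t for t in relation if condition(t)}


def c1(t: Employee) -> bool:
    return t.lname == "Bush"


def c2(t: Employee) -> bool:   # hypothetical second condition
    return t.salary > 275


all_bush = select(c1, tinyemployee)
print(sorted(all_bush))

# Commutativity, and combining a cascade into one selection.
cascade_a = select(c2, select(c1, tinyemployee))
cascade_b = select(c1, select(c2, tinyemployee))
combined = select(lambda t: c1(t) and c2(t), tinyemployee)
print(cascade_a == cascade_b == combined)
```

The result of `select` has the same attributes as its input and never more tuples, matching the first three remarks.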
Project operation (projection) • The project operation selects some of the columns of a table while discarding the other columns. • Form of operation: πL(R) • R is a relational algebra expression, • L is a list of attributes from the attributes of relation R. • The resulting relation has only those attributes of R specified in L and in the same order as they appear in the list. • Duplicate tuples, if any, are eliminated so that the result remains a mathematical set (a set does not allow duplicate elements). We will discuss this point further when we come to SQL’s query mechanism.
Example • Relation tinyemployee:
fname  lname  salary
Joe    Reale  250
Joe    Bush   275
Sue    Gates  250
Bob    Bush   300
Lastnameandsalary <- πlname, salary(tinyemployee):
lname  salary
Reale  250
Bush   275
Gates  250
Bush   300
Be aware that if several employees share the same last name and salary, only one tuple is kept for them in the resulting relation.
Remarks • The degree of the resulting relation is the number of attributes in the attribute list. • The number of tuples of the resulting relation is less than or equal to that of the input relation: • If the attribute list L is a superkey of R, the number of tuples in the resulting relation equals the number of tuples in R. • Commutativity does not hold for projection: πL1(πL2(R)) is defined only when L1 is a subset of L2, and then it equals πL1(R).
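Projection with duplicate elimination can be sketched as follows. The `project` helper and the positional schema tuple are assumptions of this illustration; the data is the tinyemployee relation from the example.

```python
# Schema and instance of tinyemployee; rows are plain tuples.
schema = ("fname", "lname", "salary")
tinyemployee = {
    ("Joe", "Reale", 250),
    ("Joe", "Bush", 275),
    ("Sue", "Gates", 250),
    ("Bob", "Bush", 300),
}


def project(attrs, schema, relation):
    """Keep the listed attributes, in list order; the set drops duplicates."""
    positions = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in positions) for row in relation}


lastname_and_salary = project(("lname", "salary"), schema, tinyemployee)
salaries = project(("salary",), schema, tinyemployee)
full_names = project(("fname", "lname"), schema, tinyemployee)

print(sorted(lastname_and_salary))  # four tuples: no two rows agree on both columns
print(sorted(salaries))             # three tuples: the two 250 salaries collapse into one
print(len(full_names))              # four: (fname, lname) identifies each row here
```

In this instance (fname, lname) happens to identify every tuple, so projecting on it keeps all four rows, which is the superkey remark in miniature.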
The Cartesian product operation • The Cartesian product (cross product, cross join) is a binary operation, and the input relations do not have to be union compatible. • Notation: R x S • Given R(A1, …, An) and S(B1, …, Bm), the schema of the relation Q resulting from the operation R x S is Q(A1, …, An, B1, …, Bm). It has n+m attributes. • The instance of Q has one tuple for each combination of tuples, one from R and one from S.
Cartesian Product • R3 <- R1 x R2 • Pair each tuple t1 of R1 with each tuple t2 of R2. • Concatenation t1t2 is a tuple of R3. • Schema of R3 is the attributes of R1 and R2, in order. • But beware of an attribute A with the same name in R1 and R2: use R1.A and R2.A, since the relational model assumes that no two attributes in a relation schema share a name.
Example: R3 <- R1 X R2
R1( A, B )
1  2
3  4
R2( B, C )
5  6
7  8
9  10
R3( A, R1.B, R2.B, C )
1  2  5  6
1  2  7  8
1  2  9  10
3  4  5  6
3  4  7  8
3  4  9  10
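The product example can be reproduced with a minimal Python sketch. The `cartesian` helper and its way of qualifying shared attribute names are assumptions made for illustration.

```python
from itertools import product

r1_schema, r1 = ("A", "B"), {(1, 2), (3, 4)}
r2_schema, r2 = ("B", "C"), {(5, 6), (7, 8), (9, 10)}


def cartesian(s1, rel1, name1, s2, rel2, name2):
    """Pair every tuple of rel1 with every tuple of rel2.

    Attributes that occur in both schemas are qualified with the
    relation name (R1.B, R2.B) so that the result schema has no clash.
    """
    shared = set(s1) & set(s2)
    schema = tuple(f"{name1}.{a}" if a in shared else a for a in s1) + tuple(
        f"{name2}.{a}" if a in shared else a for a in s2
    )
    rows = {t1 + t2 for t1, t2 in product(rel1, rel2)}
    return schema, rows


r3_schema, r3 = cartesian(r1_schema, r1, "R1", r2_schema, r2, "R2")
print(r3_schema)
for row in sorted(r3):
    print(row)
```

The result has 2 + 2 = 4 attributes and 2 x 3 = 6 tuples, as in the table above.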
From Cartesian to join • The Cartesian product is rarely meaningful on its own, because it pairs every tuple of one relation with every tuple of the other. It becomes useful when followed by an appropriate select operation that keeps only related tuples (in which case the combination is called a join operation). • The join operation is used to select and combine related tuples from two relations into single tuples.
Theta-Join • The most general form of the join operation is the theta-join, which is equivalent to a Cartesian product followed by a selection. • R3 <- R1 JOINC R2 • Take the product R1 X R2. • Then apply SELECTC to the result. • As for SELECT, C can be any Boolean-valued condition. • Historic versions of this operator allowed only A theta B, where theta was =, <, >, etc.; hence the name “theta-join.” • Theta stands for an unspecified comparison operator (not necessarily equality), as opposed to the natural join, which always tests equality.
Example
Sells( bar, beer, price )
Joe’s  Bud     2.50
Joe’s  Miller  2.75
Sue’s  Bud     2.50
Sue’s  Coors   3.00
Bars( name, addr )
Joe’s  Maple St.
Sue’s  River Rd.
BarInfo <- Sells JOINC Bars, with C: Sells.bar = Bars.name
BarInfo( bar, beer, price, name, addr )
Joe’s  Bud     2.50  Joe’s  Maple St.
Joe’s  Miller  2.75  Joe’s  Maple St.
Sue’s  Bud     2.50  Sue’s  River Rd.
Sue’s  Coors   3.00  Sue’s  River Rd.
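A theta-join can be sketched in Python as a filtered pairing of tuples, and compared against the two-step product-then-select formulation. The `theta_join` helper and the positional access to attributes are assumptions of this sketch.

```python
# Sells(bar, beer, price) and Bars(name, addr) from the example.
sells = {
    ("Joe's", "Bud", 2.50),
    ("Joe's", "Miller", 2.75),
    ("Sue's", "Bud", 2.50),
    ("Sue's", "Coors", 3.00),
}
bars = {("Joe's", "Maple St."), ("Sue's", "River Rd.")}


def theta_join(r, s, condition):
    """Concatenate every pair (t1, t2) for which condition(t1, t2) holds."""
    return {t1 + t2 for t1 in r for t2 in s if condition(t1, t2)}


# C: Sells.bar = Bars.name (position 0 in both relations).
bar_info = theta_join(sells, bars, lambda s, b: s[0] == b[0])

# The same result as a product followed by a selection on the wide tuples,
# where position 0 is Sells.bar and position 3 is Bars.name.
product_rows = {t1 + t2 for t1 in sells for t2 in bars}
selected = {t for t in product_rows if t[0] == t[3]}

for row in sorted(bar_info):
    print(row)
print(bar_info == selected)
```

The product holds 4 x 2 = 8 tuples, of which the condition keeps 4; note that both bar and name survive in the result, unlike in a natural join.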
Natural Join • A frequent type of join connects two relations by: • Equating attributes of the same name, and • Projecting out one copy of each pair of equated attributes. • Called natural join. • Denoted R3 <- R1 JOIN R2 (the bow-tie symbol ⋈ with no condition).
Example
Sells( bar, beer, price )
Joe’s  Bud     2.50
Joe’s  Miller  2.75
Sue’s  Bud     2.50
Sue’s  Coors   3.00
Bars( bar, addr )
Joe’s  Maple St.
Sue’s  River Rd.
BarInfo <- Sells JOIN Bars
BarInfo( bar, beer, price, addr )
Joe’s  Bud     2.50  Maple St.
Joe’s  Miller  2.75  Maple St.
Sue’s  Bud     2.50  River Rd.
Sue’s  Coors   3.00  River Rd.
Note: Bars.name has become Bars.bar to make the natural join “work.”
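The two steps of the natural join (equate same-named attributes, then keep one copy of each) can be sketched in Python. The `natural_join` helper and the explicit schema tuples are assumptions made for this illustration.

```python
sells_schema = ("bar", "beer", "price")
sells = {
    ("Joe's", "Bud", 2.50),
    ("Joe's", "Miller", 2.75),
    ("Sue's", "Bud", 2.50),
    ("Sue's", "Coors", 3.00),
}
bars_schema = ("bar", "addr")   # Bars.name renamed to bar, as in the example
bars = {("Joe's", "Maple St."), ("Sue's", "River Rd.")}


def natural_join(s1, r1, s2, r2):
    """Join on all attributes common to both schemas, keeping one copy."""
    common = [a for a in s1 if a in s2]
    extra = [a for a in s2 if a not in s1]
    rows = set()
    for t1 in r1:
        d1 = dict(zip(s1, t1))
        for t2 in r2:
            d2 = dict(zip(s2, t2))
            if all(d1[a] == d2[a] for a in common):
                # Append only the attributes of r2 that r1 does not have.
                rows.add(t1 + tuple(d2[a] for a in extra))
    return tuple(s1) + tuple(extra), rows


bar_info_schema, bar_info = natural_join(sells_schema, sells, bars_schema, bars)
print(bar_info_schema)
for row in sorted(bar_info):
    print(row)
```

If the two schemas share no attribute name, the condition is vacuously true and this helper degenerates to the Cartesian product.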
A join can be specified between a relation and itself (self-join). • One can think of this as joining two distinct copies of the relation, although only one relation actually exists.
Inner joins or outer joins • The join operations described so far are called inner joins, where only matching tuples (through a join condition or selection) are kept in the result. • Sometimes we need query results in which tuples from two tables are combined by matching corresponding rows, but without losing any tuples for lack of matching values. To meet this need, a set of operations called outer joins is introduced.
Outerjoin • Suppose we join R JOINC S. • A tuple of R that has no tuple of S with which it joins is said to be dangling. • Similarly for a tuple of S. • Outerjoin preserves dangling tuples by padding them with a special NULL symbol in the result.
Example: Outerjoin
R =
A  B
1  2
4  5
S =
B  C
2  3
6  7
(1,2) joins with (2,3), but the other two tuples are dangling.
R OUTERJOIN S =
A     B  C
1     2  3
4     5  NULL
NULL  6  7
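Three different kinds of outerjoin • Full outerjoin: preserves the dangling tuples of both R and S. • Left outerjoin: preserves only the dangling tuples of R. • Right outerjoin: preserves only the dangling tuples of S.
For R(A, B) and S(B, C) from the previous example:
Left outerjoin:
A     B  C
1     2  3
4     5  NULL
Right outerjoin:
A     B  C
1     2  3
NULL  6  7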
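The three outerjoin variants can be sketched in Python on the relations R(A, B) and S(B, C) of the outerjoin example. Representing NULL by Python's `None` and hard-coding the join on the shared attribute B are assumptions of this illustration.

```python
r = {(1, 2), (4, 5)}   # R(A, B)
s = {(2, 3), (6, 7)}   # S(B, C)

# Inner (natural) join on the shared attribute B.
inner = {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}

# Dangling tuples are padded with None, standing in for NULL.
left_dangling = {(a, b, None) for (a, b) in r if all(b != b2 for (b2, _) in s)}
right_dangling = {(None, b, c) for (b, c) in s if all(b != b1 for (_, b1) in r)}

left_outer = inner | left_dangling
right_outer = inner | right_dangling
full_outer = inner | left_dangling | right_dangling

print(inner)
print(left_outer)
print(right_outer)
print(full_outer)
```

The full outerjoin is the union of the left and right outerjoins, which is why the inner tuple (1, 2, 3) appears in all three results.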
Expressions in a Single Assignment • There are multiple ways to express the same meaning in relational algebra. • Example: the theta-join R3 <- R1 JOINC R2 uses a single theta-join operator. It is equivalent to R3 <- SELECTC (R1 X R2), which involves two operators: product and select.
Precedence of relational operators • Precedence of relational operators: • Unary operators --- select, project, rename --- have highest precedence, bind first. • Then come products and joins. • Then intersection. • Finally, union and set difference bind last. • You can always insert parentheses to force the order you desire.
Building Complex Expressions • Algebras allow us to express sequences of operations in a natural way. • Example: in arithmetic --- (x + 4)*(y - 3). • Relational algebra allows the same. • Three notations, just as in arithmetic: • Sequences of assignment statements. • Expressions with several operators. • Expression trees.
Sequences of operations • We can break complex relational algebra expressions into smaller ones by applying one or more operators at a time and naming the intermediate results to improve readability.
Expression Trees • Leaves are operands --- either variables standing for relations or particular, constant relations. • Interior nodes are operators, applied to their child or children.
Example • Using the relations Bars(name, addr) and Sells(bar, beer, price), find the names of all the bars that are either on Maple St. or sell Bud for less than $3.
As a Tree:
UNION
    PROJECTname
        SELECTaddr = “Maple St.”
            Bars
    RENAMER(name)
        PROJECTbar
            SELECTprice<3 AND beer=“Bud”
                Sells
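One way to see how such a tree is evaluated is a small recursive evaluator in Python. Everything here is an assumption of the sketch: the nested-tuple encoding of the tree, the positional (unnamed) attributes, and the sample Bars and Sells instances reused from the join examples. Because attributes are positional, the RENAME node has no effect and is omitted.

```python
bars = {("Joe's", "Maple St."), ("Sue's", "River Rd.")}   # Bars(name, addr)
sells = {                                                  # Sells(bar, beer, price)
    ("Joe's", "Bud", 2.50),
    ("Joe's", "Miller", 2.75),
    ("Sue's", "Bud", 2.50),
    ("Sue's", "Coors", 3.00),
}
BASE = {"Bars": bars, "Sells": sells}

# Leaves are relation names; interior nodes are (operator, argument, children...).
tree = (
    "UNION",
    ("PROJECT", (0,), ("SELECT", lambda t: t[1] == "Maple St.", "Bars")),
    ("PROJECT", (0,), ("SELECT", lambda t: t[2] < 3 and t[1] == "Bud", "Sells")),
)


def evaluate(node):
    """Evaluate an expression tree bottom-up; every node yields a relation."""
    if isinstance(node, str):              # leaf: a stored relation
        return BASE[node]
    op = node[0]
    if op == "SELECT":
        _, condition, child = node
        return {t for t in evaluate(child) if condition(t)}
    if op == "PROJECT":
        _, positions, child = node
        return {tuple(t[i] for i in positions) for t in evaluate(child)}
    if op == "UNION":
        _, left, right = node
        return evaluate(left) | evaluate(right)
    raise ValueError(f"unknown operator: {op}")


result = evaluate(tree)
print(sorted(result))
```

With this sample data both bars qualify: Joe's is on Maple St. and also sells Bud for 2.50, and Sue's sells Bud for 2.50.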
Example • Using Sells(bar, beer, price), find the bars that sell two different beers at the same price. • Strategy: by renaming, define a copy of Sells, called S(bar, beer1, price). The natural join of Sells and S consists of quadruples (bar, beer, beer1, price) such that the bar sells both beers at this price.
The Tree:
PROJECTbar
    SELECTbeer != beer1
        JOIN
            Sells
            RENAMES(bar, beer1, price)
                Sells
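The self-join strategy can be followed step by step in Python. The Sells instance below is hypothetical: the price of Miller at Joe's is changed to 2.50 so that one bar actually sells two different beers at the same price. Positional tuples stand in for the renamed copy S(bar, beer1, price).

```python
# Hypothetical Sells(bar, beer, price) instance for illustration.
sells = {
    ("Joe's", "Bud", 2.50),
    ("Joe's", "Miller", 2.50),
    ("Sue's", "Bud", 2.50),
    ("Sue's", "Coors", 3.00),
}

# RENAME: S(bar, beer1, price) is a copy of Sells with beer renamed to beer1.
s_copy = set(sells)

# Natural join of Sells and S on the shared attributes bar and price,
# giving quadruples (bar, beer, beer1, price).
joined = {
    (bar, beer, beer1, price)
    for (bar, beer, price) in sells
    for (bar_s, beer1, price_s) in s_copy
    if bar == bar_s and price == price_s
}

# SELECT beer != beer1: drop the pairs where a tuple met its own copy.
selected = {t for t in joined if t[1] != t[2]}

# PROJECT bar.
result = {(t[0],) for t in selected}
print(sorted(result))
```

Each qualifying pair of beers shows up twice in `selected` (once in each order), and the final projection collapses those duplicates into a single bar.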
Relation Schemas ( i.e. relation structures) for Interior Nodes in expression tree • An expression tree defines a schema for the relation associated with each interior node. • Similarly, a sequence of assignments defines a schema for each relation on the left of the <- sign.
Schema-Defining Rules 1 • For union, intersection, and difference, the schemas of the two operands must be the same, so use that schema for the result. • Selection: schema of the result is the same as the schema of the operand. • Projection: list of attributes tells us the schema.