540 likes | 768 Views
Relational Algebra. Lecture 4. Relational Algebra. Relational Algebra Extended Relational-Algebra-Operations Modification of the Database Views Operations on Bags. Relational Schema and Relations. R is a set of attributes: R(A,B,C, …, D) – is a relational schema
E N D
Relational Algebra Lecture 4
Relational Algebra • Relational Algebra • Extended Relational-Algebra-Operations • Modification of the Database • Views • Operations on Bags
Relational Schema and Relations • R is a set of attributes: R(A,B,C, …, D) – is a relational schema • <val(A1), val(A2), val(C), …val(D)> - tuple • r – is a set of tuples defined on attributes of R is a relation with schema R. Denoted by r(R).
What is “algebra” • Mathematical model consisting of: • Operands --- Variables or values; • Operators --- Symbols denoting procedures that construct new values from a given values • Relational Algebra is algebra whose operands are relations and operators are designed to do the most commons things that we need to do with relations
Relational Algebra • Six basic operators • Select – s (r) • Project – P (r) • Union – r U s • set difference – r - s • Cartesian product – r X s • Rename – rename (r) • The operators take two or more relations as inputs and give a new relation as a result.
Select Operation – Example A B C D • Relation r 1 5 12 23 7 7 3 10 • A=B ^ D > 5(r) A B C D 1 23 7 10
Select Operation • Notation: p(r) • p is called the selection predicate • Defined as: p(r) = {t | t rand p(t)} Where p is a formula in propositional calculus consisting of terms connected by : (and), (or), (not)Each term is one of: <attribute> op <attribute> or <constant> where op is one of: =, , >, . <. • Example of selection:branch-name=“Perryridge”(account)
Project Operation – Example A B C • Relation r: 10 20 30 40 1 1 1 2 A C A C • A,C (r) That is, the projection of a relation on a set of attributes is a set of tuples 1 1 1 2 1 1 2 =
Project Operation • Notation:A1, A2, …, Ak (r) where A1, A2 are attribute names and r is a relation. • The result is defined as the relation of k columns obtained by erasing the columns that are not listed • Duplicate rows removed from result, since relations are sets • E.g. to eliminate the branch-name attribute of accountaccount-number, balance (account)
Union Operation – Example A B A B 1 2 1 2 3 • Relations r, s: s r A B 1 2 1 3 r s:
Union Operation • Notation: r s • Defined as: r s = {t | t r or t s} • For r s to be valid. 1. r,s must have the same number of attributes 2. The attribute domains must be compatible (e.g., 2nd column of r deals with the same type of values as does the 2nd column of s) to find all customers with either an account or a loancustomer-name (depositor) customer-name (borrower)
Set Difference Operation – Example A B A B 1 2 1 2 3 • Relations r, s: s r A B 1 1 r – s:
Set Difference Operation • Notation r – s • Defined as: r – s = {t | t rand t s} • Set differences must be taken between compatible relations. • r and s must have the same number of attributes • attribute domains of r and s must be compatible
Cartesian-Product Operation-Example A B C D E Relations r, s: 1 2 10 10 20 10 a a b b r s r xs: A B C D E 1 1 1 1 2 2 2 2 10 10 20 10 10 10 20 10 a a b b a a b b
Cartesian-Product Operation • Notation r x s • Defined as: r x s = {t q | t r and q s} • Assume that attributes of r(R) and s(S) are disjoint. (That is, R S = ). • If attributes of r(R) and s(S) are not disjoint, then renaming must be used.
Composition of Operations A B C D E • Can build expressions using multiple operations • Example: A=C(r x s) • r x s • A=C(r x s) 1 1 1 1 2 2 2 2 10 10 20 10 10 10 20 10 a a b b a a b b A B C D E 10 20 20 a a b 1 2 2
Rename Operation • Allows us to name, and therefore to refer to, the results of relational-algebra expressions. • Allows us to refer to a relation by more than one name. Example: X (E) returns the expression E under the name X If a relational-algebra expression E has arity n, then (A1, A2, …, An) (E) returns the result of expression E under the name X, and with the attributes renamed to A1, A2, …., An. x x
Banking Example branch (branch-name, branch-city, assets) customer (customer-name, customer-street, customer-only) account (account-number, branch-name, balance) loan (loan-number, branch-name, amount) depositor (customer-name, account-number) borrower (customer-name, loan-number)
Example Queries • Find all loans of over $1200 amount> 1200 (loan) • Find the loan number for each loan of an amount greater than $1200 loan-number (amount> 1200 (loan))
Example Queries • Find the names of all customers who have a loan, an account, or both, from the bank customer-name (borrower) customer-name (depositor) • Find the names of all customers who have a loan and an account at bank. customer-name (borrower) customer-name (depositor)
Example Queries • Find the names of all customers who have a loan at the Perryridge branch. customer-name (branch-name=“Perryridge” (borrower.loan-number = loan.loan-number(borrower x loan))) • Find the names of all customers who have a loan at the Perryridge branch but do not have an account at any branch of the bank. customer-name (branch-name = “Perryridge” (borrower.loan-number = loan.loan-number(borrower x loan))) – customer-name(depositor)
Example Queries • Find the names of all customers who have a loan at the Perryridge branch. • Query 1customer-name(branch-name = “Perryridge”( borrower.loan-number = loan.loan-number(borrower x loan))) Query 2 customer-name(loan.loan-number = borrower.loan-number( (branch-name = “Perryridge”(loan)) x borrower))
Example Queries Find the largest account balance • Rename account relation as d • The query is: balance(account) - account.balance (account.balance < d.balance(account x rd (account)))
Formal Definition • A basic expression in the relational algebra consists of either one of the following: • A relation in the database • A constant relation • Let E1 and E2 be relational-algebra expressions; the following are all relational-algebra expressions: • E1 E2 • E1 - E2 • E1 x E2 • p (E1), P is a predicate on attributes in E1 • s(E1), S is a list consisting of some of the attributes in E1 • x(E1), x is the new name for the result of E1
Additional Operations We define additional operations that do not add any power to the relational algebra, but that simplify common queries. • Set intersection • Natural join • Division • Assignment
Set-Intersection Operation • Notation: r s • Defined as: • rs ={ t | trandts } • Assume: • r, s have the same arity • attributes of r and s are compatible • Note: rs = r - (r - s)
Set-Intersection Operation - Example A B • Relation r, s: • r s A B 1 2 1 2 3 r s A B 2
Natural-Join Operation • Notation: r s • Let r and s be relations on schemas R and S respectively. Then, r s is a relation on schema R S obtained as follows: • Consider each pair of tuples tr from r and ts from s. • If tr and ts have the same value on each of the attributes in RS, add a tuple t to the result, where • t has the same value as tr on r • t has the same value as ts on s • Example: R = (A, B, C, D) S = (E, B, D) • Result schema = (A, B, C, D, E) • rs is defined as:r.A, r.B, r.C, r.D, s.E (r.B = s.B r.D = s.D (r x s))
r s Natural Join Operation – Example B D E A B C D • Relations r, s: 1 3 1 2 3 a a a b b 1 2 4 1 2 a a b a b r s A B C D E 1 1 1 1 2 a a a a b
Division Operation Notation: r s • Suited to queries that include the phrase “for all”. • Let r and s be relations on schemas R and S respectively where • R = (A1, …, Am, B1, …, Bn) • S = (B1, …, Bn) The result of r s is a relation on schema R – S = (A1, …, Am) r s = { t | t R-S(r) u s ( tu r ) }
Division Operation – Example A B Relations r, s: B 1 2 3 1 1 1 3 4 6 1 2 1 2 s r s: A r
Another Division Example Relations r, s: A B C D E D E a a a a a a a a a a b a b a b b 1 1 1 1 3 1 1 1 a b 1 1 s r A B C r s: a a
Division Operation • Definition in terms of the basic algebra operationLet r(R) and s(S) be relations, and let S R r s = R-S (r) –R-S ( (R-S(r) x s) – R-S,S(r)) To see why • R-S,S(r) simply reorders attributes of r • R-S(R-S(r) x s) – R-S,S(r)) gives those tuples t in R-S(r) such that for some tuple u s, tu r.
Assignment Operation • The assignment operation () provides a convenient way to express complex queries. • Write query as a sequential program consisting of • a series of assignments • followed by an expression whose value is displayed as a result of the query. • Assignment must always be made to a temporary relation variable. • Example: Write r s as temp1 R-S (r)temp2 R-S ((temp1 x s) – R-S,S (r))result = temp1 – temp2
Query 1 CN(BN=“Downtown”(depositoraccount)) CN(BN=“Uptown”(depositoraccount)) where CN denotes customer-name and BN denotes branch-name. Query 2 customer-name, branch-name(depositoraccount) temp(branch-name) ({(“Downtown”), (“Uptown”)}) Example Queries • Find all customers who have an account from at least the “Downtown” and the Uptown” branches.
customer-name, branch-name(depositoraccount) branch-name (branch-city = “Brooklyn” (branch)) Example Queries • Find all customers who have an account at all branches located in Brooklyn city.
Extended Relational-Algebra-Operations • Generalized Projection • Outer Join • Aggregate Functions
Generalized Projection • Extends the projection operation by allowing arithmetic functions to be used in the projection list.F1, F2, …, Fn(E) • E is any relational-algebra expression • Each of F1, F2, …, Fn are are arithmetic expressions involving constants and attributes in the schema of E. • Given relation credit-info(customer-name, limit, credit-balance), find how much more each person can spend: customer-name, limit – credit-balance (credit-info)
Aggregate Functions and Operations • Aggregation function takes a collection of values and returns a single value as a result. avg: average valuemin: minimum valuemax: maximum valuesum: sum of valuescount: number of values • Aggregate operation in relational algebra G1, G2, …, GngF1( A1), F2( A2),…, Fn( An) (E) • E is any relational-algebra expression • G1, G2 …, Gn is a list of attributes on which to group (can be empty) • Each Fiis an aggregate function • Each Aiis an attribute name
Aggregate Operation – Example A B C • Relation r: 7 7 3 10 sum-C gsum(c)(r) 27
Aggregate Operation – Example • Relation account grouped by branch-name: branch-name account-number balance Perryridge Perryridge Brighton Brighton Redwood A-102 A-201 A-217 A-215 A-222 400 900 750 750 700 branch-nameg sum(balance) (account) branch-name balance Perryridge Brighton Redwood 1300 1500 700
Aggregate Functions • Result of aggregation does not have a name • Can use rename operation to give it a name • For convenience, we permit renaming as part of aggregate operation branch-nameg sum(balance) as sum-balance (account)
branch-name loan-number amount Downtown Redwood Perryridge L-170 L-230 L-260 3000 4000 1700 customer-name loan-number Jones Smith Hayes L-170 L-230 L-155 Outer Join – Example • Relation loan • Relation borrower
loan-number branch-name amount customer-name L-170 L-230 Downtown Redwood 3000 4000 Jones Smith • Left Outer Join loan Borrower loan-number branch-name amount customer-name L-170 L-230 L-260 Downtown Redwood Perryridge 3000 4000 1700 Jones Smith null Left Outer Join • Joinloan Borrower
loan-number branch-name amount customer-name L-170 L-230 L-155 Downtown Redwood null 3000 4000 null Jones Smith Hayes loan-number branch-name amount customer-name L-170 L-230 L-260 L-155 Downtown Redwood Perryridge null 3000 4000 1700 null Jones Smith null Hayes Right Outer Join, Full Outer Join • Right Outer Join loanborrower Outer Join loan borrower
Null Values • It is possible for tuples to have a null value, denoted by null, for some of their attributes • null signifies an unknown value or that a value does not exist. • The result of any arithmetic expression involving null is null. • Aggregate functions simply ignore null values • For duplicate elimination and grouping, null is treated like any other value, and two nulls are assumed to be the same
Null Values • Comparisons with null values return the special truth value unknown • If false was used instead of unknown, then not (A < 5) would not be equivalent to A >= 5 • Three-valued logic using the truth value unknown: • OR: (unknownortrue) = true, (unknownorfalse) = unknown (unknown or unknown) = unknown • AND: (true and unknown) = unknown, (false and unknown) = false, (unknown and unknown) = unknown • NOT: (not unknown) = unknown • Result of select predicate is treated as false if it evaluates to unknown
customer-name, branch-name Expression Trees Leaves are operands --- either variables standing for relations or particular relations Interior nodes are operators applied to their descendents depositor account
Relational Algebra on Bags • A bag is like a set but it allows elements to be repeated in a set. • Example: {1, 2, 1, 3, 2, 5, 2} is a bag. • Difference between a bag and a list is that order is not important in a bag. • Example: {1, 2, 1, 3, 2, 5, 2} and {1,1,2,3,2,2,5} is the same bag
Need for Bags • SQL allows relations with repeated tuples. Thus SQL is not a relational algebra but rather “bag” algebra • In SQL one need to specifically ask to remove duplicates, otherwise replicated tuples will not be eliminated • Operation projection is more efficient on bags than on sets