280 likes | 294 Views
Relational Algebra. Relational Query Languages. Query languages: allow manipulation and retrieval of data from a database Relational model supports simple, powerful QLs: Strong formal foundation Allows for much optimization Query languages are NOT programming languages
E N D
Relational Query Languages • Query languages: allow manipulation and retrieval of data from a database • Relational model supports simple, powerful QLs: • Strong formal foundation • Allows for much optimization • Query languages are NOT programming languages • QLs not expected to be “turing complete” • QLs not intended to be used for complex calculations • QLs support easy, efficient and sophisticated access to large data sets
Formal Relational Query Languages • Mathematical query languages form the basis for “real” languages (e.g. SQL) and for implementation: • Relational Algebra: • Operational - a query is a sequence of operations on data • Very useful for representing execution plans, i.e., to describe how a SQL query is executed internally • Relational Calculus: • descriptive - a query describes how the data to be retrieved looks like • Understanding Relational Algebra is key to understanding SQL.
Preliminaries • A query is applied to a relation instance, and the result of a query is also a relation instance (i.e., input = relation instances, output = relation instance) • Schemas of input relations for a query are fixed (but query will run regardless of instance) • The schema for the result of a given query is also fixed! Determined by the definition of query language constructs
Example Instances • We assume that names of fields in query results are ‘inherited’ from names of fields in query input relations Sailors S1 age sid sname rating 28 yuppy 9 35 31 debby 8 55 22 conny 5 35 58 lilly 10 35 R Reserves Boat B bid sid bname color bid day 101 Interlake blue 103 Snapper red104 Bingo green 101 10/24/07 103 09/30/07 58 101 09/01/07 104 10/23/07 S2 sid age sname rating 44 robby 7 45 31 debby 8 55 58 lilly 10 35
Relational Algebra • Basic Operations • Selection : Selects a subset of rows from a relation • Projection ∏: projects to a subset of columns from a relation • Cross Product 5: Combines two relations • Set Difference: Tuples that are in the first but not the second relation • Union : tuples in any of the relations to be unified • Additional operations • Intersection (), join ( ), division, renaming • Not essential but very useful • Relational algebra is closed: since each operation has input relation(s), and returns a relation, operations can be composed
Projection • Projection: • Notation: ∏L(Rin) • Returns a subset of the columns of the input relation Rin, i.e, it ignores the attributes that are not in the projection list L • Schema of result relation contains exactly the attributes of the projection list, with the same attribute names as in Rin • Projection operator has to eliminate duplicates! • Note: real systems typically do NOT eliminate duplicates unless the user explicitly asks for it; eliminating duplicates is a very costly operation! ∏ (S2) ∏ (S2) S2 sname,rating age sname rating age sid sname rating age yuppy 9 debby 8 conny 5 lilly 10 28 yuppy 9 35 31 debby 8 55 22 conny 5 35 58 lilly 10 35 35 55
Selection and Operator Composition (S2) rating > 8 ∏ ( (S2)) sname,rating rating > 8 S2 • Selection: C(Rin) • Returns the subset of the rows of the input relation Rin that fulfill the condition C • Condition C involves the attributes of Rin • Schema of result relation identical to schema of Rin • No duplicates (obviously) • Operation Composition: result relation of one operation can be input for another relational algebra operation age sid sname rating 28 yuppy 9 35 31 debby 8 55 22 conny 5 35 58 lilly 10 35 age sid sname rating sname rating yuppy 9 lilly 10 28 yuppy 9 35 58 lilly 10 35
Union, Intersection,Set-Difference • Notation: • Rin1 Rin2 (Union), • Rin1 Rin2 (Intersection), • Rin1- Rin2 (Difference), • Usual operations on sets • Rin1 and Rin2 must be union-compatible, i.e., they must have the same number of attributes and corresponding attributes must have the same type (note that attribute names need not be the same) • The result schema is the same as the schema of the input relations (with possible renamed attributes)
Union, Intersection,Set-Difference: Examples S1 S2 S1 age sid sname rating sid age sname rating 44 robby 7 45 31 debby 8 55 58 lilly 10 35 28 yuppy 9 35 22 conny 5 35 44 robby 7 45 31 debby 8 55 58 lilly 10 35 S2 S1 S2 age sid sname rating age sid sname rating 28 yuppy 9 35 31 debby 8 55 22 conny 5 35 58 lilly 10 35 31 debby 8 55 58 lilly 10 35 S1 - S2 sid age sname rating 44 robby 7 45
Cross-Product S2 sid bid day • Cross-Product: Rin1XRin2 • Each row of Rin1 is paired with each row of Rin2 • Result schema has one attribute per attribute of Rin1 and one attribute per attribute Rin2 with field names inherited if possible • Conflict if both relations have an attribute with the same name: e.g., attribute names of result relation concatenate input relation name with input attribute name sid age sname rating 101 10/24/07 103 09/30/07 58 101 09/01/07 104 10/23/07 44 robby 7 45 31 debby 8 55 58 lilly 10 35 S1 X R1 R1. sid S1. sid age sname rating bid day 22 101 07/24/00 58 103 07/30/00 44 robby 7 45 44 robby 7 45 31 debby 8 55 31 debby 8 55 58 lilly 10 35 58 lilly 10 35 22 101 07/24/00 58 103 07/30/00 22 101 07/24/00 58 103 07/30/00
Joins • Condition Join (Theta-Join): Rout = Rin1C Rin2 = C(Rin1XRin2) • Result schema the same as for cross-product • Fewer tuples than cross-product sid bid day S2 101 10/24/07 103 09/30/07 58 101 09/01/07 104 10/23/07 sid age sname rating 44 robby 7 45 31 debby 8 55 58 lilly 10 35 S2 R S2.sid > R.sid S2. sid R. sid age sname rating bid day 44 robby 7 45 31 debby 8 55 22 101 10/24/07 22 101 10/24/07 58 lilly 10 35 22 101 10/24/07
Self-Joins S2 sid age sname rating 44 robby 7 45 31 debby 8 55 58 lilly 10 35 give S2 a second name S2’ S2 S2’ S2.age > S2’.sid S2’. rating S2’. age S2. age S2. sname S2. rating S2’. sname S2. sid S2’. sid robby 7 45 58 lilly 10 35 31 debby 8 55 58 lilly 10 35
Equi-Join • Equi-Join: Rout = Rin1Rin1.a1 = Rin2.b1, … Rin1.an = Rin2.bn Rin2 = • A special case of condition join where the condition C contains only equalities. • Result schema similar to cross-product, but only one copy of attributes for which equality is specified • Natural Join: Equijoin on all common attributes, i.e., on all attributes with the same name sid bid day S2 101 10/24/07 103 09/30/07 58 101 09/01/07 104 10/23/07 sid age sname rating 44 robby 7 45 31 debby 8 55 58 lilly 10 35 S2 R S2.sid = R.sid S2. sid age sname bid rating day lilly 10 35 58 lilly 10 35 103 09/30/07 101 09/01/07 104 10/23/07 58 lilly 10 35
Division • Not supported as a primitive operation, but useful for expressing queries like • Find sailors who have reserved allboats • Let A have 2 fields, x and y; B have only field y: • A/B = {<x> | <y> B <x,y> A} • I.e., A/B contains all x tuples (sailors) such that for every y tuple (boat) in B, there is an xy tuple in A • Or: if the set of y values (boats) associated with an x value (sailor) in A contains all y values in B, the x value is in A/B • In general, x and y can be any lists of fields; y is the list of fields in B, x y is the list of fields of A. R1 Reserves sid bid 101 103 58 101 58 104 B1 Boat bid 101 103 104
Examples of Division A/b x y y y y x1 y1 x1 y2 x1 y3 x1 y4 x2 y1 x2 y2 x3 y2 x4 y2 x4 y4 y2 y1 y2 y4 y2 y4 B1 B2 B3 x x1 x2 x3 x4 x x x1 x4 x1 A/B3 A/B1 A/B2 A
Expressing A/B using Basic Operators • Division is not an essential operation; just a useful shorthand • (Also true of joins, but joins are so common that systems implement joins specially) • Idea: For A/B, compute all x values that are not ‘disqualified’ by some y value in B. • x value is disqualified if by attaching y value from B, we obtain an xy tuple that is not in A • Disqualified x values: ∏X((∏X (A) x B)-A) • A/B: ∏X (A) - all disqualified tuples
Renaming • Renaming : (Rout(B1,…Bn), Rin(A1, …An)) • Produces a relation identical to Rin but named Rout and with attributes A1, … An of Rin renamed to B1, … Bn (Temp, S1), (Temp1(sid1,rating1), S1(sid,rating))
Examples (discussed in class) • Relations • Sailors(sid,sname,rating,age) • Reserves(sid,bid,day) • Boats(bid,bname,color) • Queries • Find names of sailors who have reserved boat #103 (three solutions) • Find names of sailors who have reserved a red boat (2 solutions) • Find names of sailors who have reserved a red or a green boat (2 solutions) • Find names of sailors who have reserved a red and a green boat (1 solution) • Find names of sailors who have reserved all boats (solution with division)
Summary • The relational model has rigorously defined query languages that are simple and powerful • Relational algebra is operational; useful as internal representation for query evaluation plans • Projection, selection, cross-product, difference and union are the minimal set of operators with which all operations of the relational algebra can be expressed • Several ways of expressing a given query; a query optimizer should choose the most efficient version. • Relational Completeness of a query language: A query language (e.g., SQL) can express every query that is expressible in relational algebra
Relational Calculus • Relational Calculus is non-operational but descriptive: • Users define queries in terms of what they want, not in terms of how to compute it. (Declarativeness). • Comes in two flavors: Tuple relational calculus (TRC) and Domain relational calculus (DRC) • Both TRC and DRC are simple subsets of first-order logic • Calculus has variables, constants, comparison operations, logical connectives and quantifiers. • TRC: variables range over tuples • DRC: variables range over domain elements (= field values) • Find the names of all sailors with a rating > 7 • TRC:{T | S Sailors(S.rating > 7 S.name = T.name)} • DRC:{<N> | S,R,A(<S,N,R,A> Sailors R7}
Domain Relational Calculus • Query has the form: {<x1,x2,…,xn>} | p(<x1,x2,…xn>)} • TheAnswer includes all tuples <x1,x2,…,xn> that make the formula p(<x1,x2,…xn>} be true. (‘|’ should be read as “such that”). • Formulas are recursively defined, starting with simple atomic formulas (getting tuples from relations or making comparisons of values), and building bigger and better formulas using the logical connectives.
DRC Formulas • VariablesX,Y, x1,… range over domain elements (= field values) • Atomic formula: Let op be one of ,,,,, • <x1,x2,…xn> Relation-name, or • X op Y, or • X op constant • Formula • An atomic formula, or • p, pq, pq, where p and q are formulas, or • X(p(X)), where variable X is free in p(X), or • X(p(X)), where variable X is free in p(X) • The use of quantifiers X and X is said to bind X. A variable that is not bound is free.
Free and Bound Variables • The use of quantifiers X and X is said to bind X. A variable that is not bound is free. • Let us revisit the definition of a query: {<x1,x2,…,xn>} | p(<x1,x2,…xn>)} • There is an important restriction: the variables x1,…,xn that appear to the left of ‘|’ must be the only free variables in the formula p(…).
Example • Find all sailors with a rating above 7: {<I,N,T,A> | <I,N,T,A> Sailors T7} • The condition <I,N,T,A>Sailors ensures that the domain variables I,N,T, and A are bound to fields of the same Sailors tuple. • The condition T7 ensures that all tuples in the answers have a rating above 7 • The term <I,N,T,A> to the left of ‘|’ says that every tuple of the Sailors relation with a rating above 7 is in the answer
Further Examples (discussed in class) • Relations • Sailors(sid,sname,rating,age) • Reserves(sid,bid,day) • Boats(bid,bname,color) • Queries • Find sailors who are older than 18 or have a rating under 9, and are called ‘debby’’ • Find sailors with a rating above 7 that have reserved boat #103 (use of ) • Find sailors rated > 7 who have reserved a red boat (use of ) • Find sailors who have reserved all boats (use of ) • Transfer to “find all sailors I such that for each 3-tuple (B,BN,C> either it is not a tuple in Boats or there is a tuple in Reserves showing that sailor I has reserved the boat.
Unsafe Queries and Expressive Power • Unsafe queries: It is possible to write syntactically correct calculus queries that have an infinite number of answers! Such queries are called unsafe. • E.g., {<I,N,T,A> | (<I,N,T,A> Sailors)} • Expressive Power: It is known that every query that can be expressed in relational algebra can be expressed as a safe query in DRC/TRC; the converse is also true. • Relational Completeness of a query language: A query language (e.g., SQL) can express every query that is expressible in relational algebra/calculus
Some rules and definitions • Equivalence: Let R, S, T be relations; C, C1, C2 conditions; L projection lists of the relations R and S • Commutativity: • ∏L(C(R)) = C(∏L(R)) if C only considers attributes of L • R1 R2 = R2 R1 • Associativity: • R1 (R2 R3) = (R1 R2) R3 • Idempotence: • ∏L2 (∏L1(R)) = ∏L2 (R) if L2 L1 • C2(C1(R)) = C1C2(R))