1 / 40

Database Systems 236363

Database Systems 236363. Relational Algebra. Query Languages. A query An expression that enables extracting data from a database A query language A language for expressing queries. Relational Algebra.

dick
Download Presentation

Database Systems 236363

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Database Systems236363 Relational Algebra

  2. Query Languages • A query • An expression that enables extracting data from a database • A query language • A language for expressing queries

  3. Relational Algebra • Relational algebra is a language that enables expressing queries over relational databases • The syntax is similar to algebra, but the expressions operate on tables • The result of such an algebraic expression (query) is a table (relation) Unless stated otherwise, we assume that all relations are sets and all algebraic operators operate on sets

  4. Algebraic Operators • There are 5 basic operators plus 2 technical ones • Unary operators: Projection, Selection • Binary operators: Cartesian product, Subtraction, Union • Renaming attributes: a technical operation to enable composing operators • Assignment: a technical operation to enable execution in stages • Complex operators that can be obtained by composing basic operators: • Intersection, Join (including its variants), Division, etc. • For performance reasons, some complex operators might have direct implementations

  5. Projection • Projection eliminates some of the attributes from the records of a given relation • In terms of tables manipulation, this means removing columns that do not appear in the projection index and then eliminating multiple lines formed due to the column elimination T= πHeb,RomanT=

  6. Selection • Let T=T(A1,…,Am) be a table with schema consisting of attributes A1,…,Am. For an expression θ, σθT consists of all records in T satisfying the expression θ • θ may consist of the following: • Comparisons (through the operators =,<,>,≥,≤,≠) between an attribute and a constant or between two attributes. When the attribute is a set, it is also possible to use set operators like  etc. • Boolean operators, e.g., (A1=“Cat”)  (A2≤8) T= σNo≤2T=

  7. Union, Subtraction and Intersection • These operations are applied only to relations of the same schema and are identical to their counterparts from set theory • Union ST: obtains all records that appear in either S or T • Subtraction S\T: obtains all records that appear in S but not in T • Intersection: ST: obtains all records that appear in both S and T • Intersection is not a basic operator – it can be expressed as S\(S\T) • Examples: S= T= ST= S\T= ST=

  8. Cartesian Product • A Cartesian product yields all combinations of records from the first relation with records from the second relation • In terms of tables, we take all concatenations of rows from the first table with rows from the second table • Whenever S and T have attributes with the same name, we distinguish between them either by adding the table name as a prefix of the attribute name, e.g., “S.Name” and “T.name”, or by adding a sequence number, e.g., “Name1” and “Name2” S= T= S x T= What happens if one of the relations is empty?

  9. Renaming • This is not an algebraic operation, but rather a technical helper operation that is used to compose complex operations (examples appear shortly) • For a given table T=T(A1,…,Am) with a schema consisting of attributes A1,…,Am, the operator ρA1→B1,…,Am→Bm(T) returns an identical table in which the attribute names were changed to B1,…,Bm T= ρAnimal→חיה,Color →צבע(T)=

  10. Basic Operators • The five operators - Projection, Selection, CartesianProduct, Subtraction, and Union – are basic, i.e., • None of these operators can be obtained from the other four • More complex operators (e.g., Join, Division, Intersection) can be obtained by composing some of the basic ones • How can we show that an operator is basic? • We need to find a property that is satisfied by this operator and cannot be obtained by any composition of the others

  11. Proving that an Operator is Basic • Claim: Projection cannot be expressed by composing Select, Cartesian Product, Subtraction, and Union • Proof sketch: Let R be a relation with n attributes A1,…,An. Then πA1R results in a relation with fewer than n attributes. However, the result of applying any of the operators Select, Cartesian Product, Subtraction, and Union on R (and possibly other relations) yields at least n attributes. • This can be shown by induction on the number of operators in the expression • How can this proof be adjusted to show that Cartesian Product is a basic operator?

  12. θ-Join • Given two relations S(A1,…,An) and T(B1,…,Bm) and an expression θ on the attributes A1,…,An,B1,…,Bm, denote by S⋈θT the result of the algebraic expression σθ(SxT) • Example S= T= S⋈B>CT=

  13. Natural Join • A very common operation on databases • For the relations S(A1,…,An,B1,…,Bm) and T(B1,…,Bm,C1,…,Ck),denote by S⋈T the relation that includes all possible combinations of a record from S with a record from T whose common attributes are the same, in which only a single attribute (column) is kept for each pair of common attributes. • More precisely, S⋈T= A1,...,An,S.B1,...,S.Bm,C1,...,Ck(S⋈(S.B1=T.B1) ... (S.Bm=T.Bm)T) S= T= S⋈T=

  14. Semi-Join • For the relations S(A1,…,An,B1,…,Bm) and T(B1,…,Bm,C1,…,Ck),denote by S⋉T the relation that includes all records in S for which there exists a record in T whose common attributes are the same • More precisely, S⋉T= A1,...,An,B1,...,Bm(S⋈T) • For performance reasons, this is usually implemented directly S= T= S⋉T=

  15. Division • For the relations S(A1,…,An,B1,…,Bm) and T(B1,…,Bm) (i.e., the attributes of T are a subset of the attributes of S), denote by ST the relation R(A1,…,An) consisting of all records for which there exists a record in S corresponding to all records in T • More precisely, ST is the maximal relation R such that RxTS • I.e., ST=A1,...,AnS\A1,...,An(((A1,...,AnS)T)\S) ST= S= T=

  16. Division - Example • We’d like to obtain from table A the supplier number of all suppliers that sell all parts in table B A= B= AB=

  17. Division - Example • We’d like to obtain from table A the supplier number of all suppliers that sell all parts in table B A= B= AB=

  18. Division - Example • We’d like to obtain from table A the supplier number of all suppliers that sell all parts in table B A= B= AB=

  19. Relational Algebra – Summarizing Example • Recall the train operation example A_Time D_Time S_Name T_Num Height S_Type Days Arrives Station Train Platform Km Gives Serves Line Service Height Direction T_Category Food L_Num Class

  20. Relational Algebra – Summarizing Example • What Tables do we Extract? • What columns should exist for the relationship set “Serves”? • The key S_Name (of the “Station” entity set) • The key’s attributes L_Num and Direction (of “Line”) • These triplet would serve as the key for “Serves” • In addition, a column for the relation attribute Km • What columns should exist for the relationship set “Arrives”? • The key T_Num of the entity set “Train” • The key’s attributes for the aggregated relationship set “Serves”, i.e., S_Name, L_num, and Direction • The three attributes of the relationship set “Arrives” itself • Platform, A_time, D_time

  21. The Schemas • From the previous slide (underlined attribute names represent keys) • Serves(S_Name, L_Num, Direction, Km) • Arrives(T_Num, S_Name, L_Num, Direction, Platform, • D_Time, A_Time) • We will represent the multiple value attribute as a separate relation • Station(S_Name, Height) • Station_Type(S_Name, S_Type)

  22. Sample Queries • Which stations are served by the line 1-South? • Here, all information is in the table/relation “Serves” S_Name((L_Num=1)(Direction=“south”)(Serves)) • Which lines serves stations below sea level? • Here, we need to join “Serves” and “Station” L_Num,Direction(Height<0(Station⋈Serves)) • We can also use semi-join L_Num,Direction(Serves⋉S_Name(Height<0(Station)))

  23. Sample Queries • What stations are served by more than one line? • Here, we need to examine two lines from “Serves” at a time S_Name((S_Name=S)((L_NumL)(DirectionD))( S_Name→S, L_Num→L, Direction→D, Km→K (Serves)  Serves)) • Suppose we do not care about different directions of the same line? S_Name((S_Name=S)(L_NumL)(S_Name→S, L_Num→L, Direction→D, Km→K (Serves)  Serves)) • Which stations serve exactly one line (in any direction)? S_Name(Serves) \S_Name((S_Name=S)(L_NumL)( S_Name→S, L_Num→L, Direction→D, Km→K (Serves)  Serves))

  24. Sample Queries • What is the name of the highest station? • It is in fact easier to find all other stations R = S_Name((Height<H)(Station   S_Name→N, Height→HStation)) • Now, we can complete the query S_Name(Station) \ R

  25. Example of Using Division • Which trains (by number) serve all stations? T_Num,S_Name(Arrives)S_Name(Station) • What if there are stations that do not appear in any line? How can we avoid referring to them? T_Num,S_Name(Arrives)S_Name(Serves) • What if there are stations that serve a line but no train stops there? How can we ignore them? T_Num,S_Name(Arrives)S_Name(Arrives)

  26. More Complex Examples • Which lines share “change” stations? • First, we will find all <Line,Serves> couples for which the station is a “change” station R = (Line ⋈ Serves)⋉S_Name(S_Type =“change” (Station_Type)) • Now, the Cartesian product RxR would include all combinations of <Line,Serves> couples for which the station is a “change” station. Suppose we distinguish between attributes of each using indices, e.g., S_Name1,S_Name2, etc. • We would like to select from the product only couples for which the station is the same S =  (S_Name1=S_Name2)  (L_Num1L_Num2) (RR) • Finally, we project to obtain only the attributes we are intereted in T = L_Type1,L_Num1,Direction1,L_Type2,L_Num2,Direction2(S)

  27. Which Queries Cannot be Expressed in RA? • Aggregate functions: • How many lines travel “North”? • What is the average distance between stations in Line 1-South? • How many trains stop in each station on Monday? • These functions operate on an unknown number of parameters (counting, sum, average, …) from the set of values obtained from the relation • There are several extensions to RA that enable such queries as well as in SQL

  28. Inexpressible Queries - Continued • Transitive Closure • The following query cannot be expressed in RA: • What are all the stations that can be reached from station S in a finite number of train changes? • Notice, for any given constant k, we can express the following query • What are all the stations that can be reached from station S in at most k train changes? • The key here is whether k is bounded and given or not • These type of queries cannot be expressed in standard ANSI SQL either without the help of the hosting language • In contrast, Datalog supports recursive queries, and in particular this type of queries

  29. Attribute-less Relations • In order to facilitate handling yes/no questions, and to simplify some other queries, often RA is extended to allow relations with no attributes (tables with no rows) • What can such a relation contain? • It can be empty • It can include a single empty line • This represents the value “true” vs. “false” in an empty relation, or “exists” vs. “non-exist” in an empty relation

  30. Generalizing Algebraic Operators • Projection • For a relation R, the empty set is considered a subset of the set of its attributes and the corresponding projection is denoted πλR • The result is the empty relation if R is an empty relation and the single empty record if R is not empty • Cartesian Product • When S has no attributes, RxS results in an empty relation if S is empty and in R if S includes the empty record • Division • When R and S have the same attributes, RS includes the empty record if SR and is empty otherwise

  31. Queries Equivalence • Two expressions are equivalent if they return the same result when applied to any possible database content • Why is this important? • For performance reasons, we may be able to translate one query into an equivalent one that is easier/faster to evaluate • Example: • For the relation R(A,B,C), which of the following equivalent expressions can be evaluated more efficiently? A (R) A (A,B(R)) A (A,C(R))

  32. Example • Given the relations R(A,B), S(A,B), we will show that ARAS is equivalent to A(RS) • We will show that ARAS  A(RS) • As well as that ARAS  A(RS)

  33. Example - Proof (1/2) • We show that ARAS  A(RS) For any record t, if t(AR  AS) then by the definition of union, either tAR or tAS (or both) Hence, either there exists a uR such that u[A]=t[A] or there exists a uSsuch that u[A]=t[A] (or both) In other words, tA(RS)

  34. Example - Proof (2/2) • We show that ARAS  A(RS) If there exists a tA(RS), then by definition there exists a uRSsuch that u[A]=t[A] Hence, either there is a uRsuch that u[A]=t[A] or there is a uSsuch that u[A]=t[A] (or both) In other words, t(ARAS)

  35. On the Importance of Equivalence • A typical SQL query • One needs to ensure that the optimization does not alter the semantics SELECT DISTINCT A FROM R,S WHERE R.B=S.B AND C=5; Direct translation A ((R.B=S.B)(C=5)(R  S)) Optimization A (R⋉B ( C=5S)))

  36. Outerjoin • Natural Join suffers from information loss – that is, the result does not include records that do not have a corresponding record in the second table • To that end, the missing attributes are padded with null • More precisely, for the relations S(A1,…,An,B1,…,Bm) and T(B1,…,Bm,C1,…,Ck),denote by S⋈outerTthe natural join of S and T such that we append null attributes to each record in S that does not have a corresponding record in T and vice versa • That is, S⋈outerT=(S⋈T)((S\(S⋉T))C1,...,CkTnull)(A1,...,AnSnull(T\(T⋉S))) where Tnull is a relation with the same schema as T that includes one record with null values and similarly Snull is a record with the same schema as S that includes one record with null values

  37. Example S= T= What happens when one of the relations is empty? There are also left-outer join, in which only the records of the left relation are padded and right-outer join in which only the records of the right relation are padded S⋈outerT=

  38. Null Values • Once we allow null values, the definition of the RA operators need to be generalized to encompass them • This definition is complicated and outside the scope of this course

  39. Bag/Multi-set Semantics • Until now, we assumed that relations are sets • In practice, and in particular in SQL, we may wish to allow relations that include multiple entries of the same record • To that end, the operators should be generalized accordingly

  40. RA Operators in Bag Semantics • Projection • we eliminate attributes without eliminating duplicates • Selection • we return the corresponding records without eliminating duplicates • Cartesian product RxS • if a record u appears n times in R and a record v appears m times in S, then their concatenation appears nm times in RxS • Union ST • if a record u appears n times in R and m times in S, it will appear n+mtimes in ST • SubtractionS\T • if a record u appears n times in R and m times in S, it will appear max{0,n-m} times in S\T • Intersection ST • if a record u appears n times in R and m times in S, it will appear min{n,m} times in ST • Operator δ • explicit duplicate elimination • Unless otherwise stated, we assume a set semantics in this course

More Related