460 likes | 563 Views
CS4432: Database Systems II. Query Processing. Query in SQL Query Plan in Algebra. Example. Data: relation R (A, B, C) relation S (C, D, E) Query: SELECT B, D FROM R, S WHERE R.A = “c” and S.E = 2 and R.C=S.C.
E N D
CS4432: Database Systems II query processing
Query Processing Query in SQL Query Plan in Algebra query processing
Example Data: relation R (A, B, C) relation S (C, D, E) Query: SELECT B, D FROM R, S WHERE R.A = “c” and S.E = 2 and R.C=S.C query processing
SELECT B, D FROM R, S WHERE R.A = “c” and S.E = 2 and R.C=S.C Answer B D 2 x R A B C S C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 query processing
How do we execute query? - Form Cartesian product of all tables in FROM-clause - Select tuples that match WHERE-clause - Project columns that occur in SELECT-clause One idea query processing
Bingo! Got one... R X S R.A R.B R.C S.C S.D S.E a 1 10 10 x 2 a 1 10 20 y 2 . . C 2 10 10 x 2 . . query processing
But ? • Performance would be unacceptable! • We need a better approach to reasoning about queries, their execution orders and their respective costs query processing
Formal Relational Query Languages • Two mathematical Query Languages form basis for “real” languages (e.g. SQL), and for implementation: • Relational Calculus: Lets users describe what they want, rather than how to compute it. (Non-operational, declarative.) • Relational Algebra: More operational, very useful for representing execution plans. query processing
Relational Algebra • Tuple : ordered set of data values • Relation: a set of tuples • Algebra: formal mathematical system consisting of a set of objects and operations on those objects • Relational algebra: Algebra whose objects are relations and operators transform relations into other relations query processing
Relational Algebra ? Query: SELECT B, D FROM R, S WHERE R.A = “c” and S.E = 2 and R.C=S.C query processing
Relational Algebra - can be used to describe plans... Ex: Plan I B,D sR.A=“c” S.E=2 R.C=S.C X R S OR: B,D [sR.A=“c” S.E=2 R.C = S.C (RXS)] query processing
R1 Example Instances “Sailors” and “Reserves” relations S1 S2 query processing
Relational Algebra • Basic operations: • Selection ( ) Selects a subset of rows from relation. • Projection ( ) Deletes unwanted columns from relation. • Cross-product( ) Allows us to combine two relations. • Set-difference ( ) Tuples in reln. 1, but not in reln. 2. • Union( ) Tuples in reln. 1 and in reln. 2. • Additional operations: • Intersection, join, division, renaming: Not essential ! • Algebra is “closed”: Since each operation returns a relation, operationscan be composed ! query processing
Projection query processing
Selection query processing
Union, Intersection, Set Difference • Operate on two union-compatible relations: • Same number of fields. • `Corresponding’ fields have same type. query processing
Cross-Product • Each row of S1 is paired with each row of R1. • Conflict: Both S1 and R1 have a field called sid. • Renaming operator: query processing
Joins • Condition Join : • Result schema same as that of cross-product. query processing
Joins • Equi-Join: condition contains only equalities. • Result schema only one copy of fields for which equality is specified. • Natural Join: Equijoin on all common fields. query processing
Division • Not primitive operator, but useful: Find sailors who have reserved allboats. • A has 2 fields x and y; B has only field y: • A/B = • i.e., A/B contains all x tuples (sailors) such that for everyy tuple (boat) in B, there is an xy tuple in A. query processing
B1 B2 B3 A Examples of Division A/B A/B1 A/B2 A/B3 query processing
Disqualified x values: A/B: all disqualified tuples Expressing A/B Using Basic Operators • Division is useful shorthand. • Idea: For A/B, compute all x values that are not `disqualified’ by some y value in B. query processing
Solution 1: • Solution 2: • Solution 3: Find names of sailors who’ve reserved boat #103 query processing
A more efficient solution: Find names of sailors who’ve reserved a red boat • Information about boat color only available in Boats; so need an extra join: A query optimizer can find this, given the first solution! query processing
What happens if is replaced by in this query? Find sailors who’ve reserved a red or a green boat • Can identify all red or green boats, then find sailors who’ve reserved one of these boats: • Can also define Tempboats using union! (How?) query processing
Find sailors who’ve reserved a red and a green boat • Must identify sailors who’ve reserved red boats, sailors who’ve reserved green boats, then find the intersection (sid is a key for Sailors): query processing
Find names of sailors who’ve reserved all boats • Uses division; schemas of input relations to / must be carefully chosen: • To find sailors who’ve reserved all ‘Interlake’ boats: ..... query processing
Relational Algebra representation used to describe plans... query processing
Example Data: relation R (A, B, C) relation S (C, D, E) Query: SELECT B, D FROM R, S WHERE R.A = “c” and S.E = 2 and R.C=S.C query processing
Relational Algebra - to describe plan Ex: Plan I B,D sR.A=“c” S.E=2 R.C=S.C X R S OR: B,D [sR.A=“c” S.E=2 R.C = S.C (RXS)] query processing
Another idea: Plan II B,D sR.A = “c”sS.E = 2 R S natural join query processing
R S A B C s (R) s(S) C D E a 1 10 A B C C D E 10 x 2 b 1 20 c 2 10 10 x 2 20 y 2 c 2 10 20 y 2 30 z 2 d 2 35 30 z 2 40 x 1 e 3 45 50 y 3 SELECT B,D FROM R,S WHERE R.A = “c” and S.E = 2 and R.C=S.C query processing
Yet another idea: Plan III B,D sS.E = 2 sR.A = “c” R S natural join query processing
Plan III Use R.A and S.C Indexes (1) Use R.A index to select R tuples with R.A = “c” (2) For each R.C value found, use S.C index to find matching tuples (3) Eliminate S tuples S.E 2 (4) Join matching R,S tuples, project B,D attributes and place in result query processing
=“c” <c,2,10> <10,x,2> check=2? output: <2,x> next tuple: <c,7,15> R S A B C C D E a 1 10 10 x 2 b 1 20 20 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50 y 3 A C I1 I2 query processing
Overview of Query Optimization query processing
SQL query parse parse tree convert answer logical query plan execute apply laws statistics Pi “improved” l.q.p pick best estimate result sizes {(P1,C1),(P2,C2)...} l.q.p. +sizes estimate costs consider physical plans {P1,P2,…..} query processing
Example: SQL query Query : Find the movies with stars born in 1960 SELECT title FROM StarsIn WHERE starName IN ( SELECT name FROM MovieStar WHERE birthdate LIKE ‘%1960’ ); query processing
<Query> <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> <Tuple> IN <Query> title StarsIn <Attribute> ( <Query> ) starName <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> <Attribute> LIKE <Pattern> name MovieStar birthDate ‘%1960’ Example: Parse Tree query processing
Example: Generating Relational Algebra title StarsIn <condition> <tuple> IN name <attribute> birthdate LIKE ‘%1960’ starName MovieStar Fig. 16.14: An expression using a two-argument , midway between a parse tree and relational algebra query processing
Example: Logical Query Plan title starName=name StarsIn name birthdate LIKE ‘%1960’ MovieStar Fig. 16.16: Applying the rule for IN conditions query processing
Example: Improved Logical Query Plan title Question: Push project to StarsIn? starName=name StarsIn name birthdate LIKE ‘%1960’ MovieStar Fig. 16.16: An improvement on prev fig query processing
Example: Estimate Result Sizes Need expected size StarsIn MovieStar P s query processing
Example: One Physical Plan Parameters: join order, memory size, project attributes,... Hash join SEQ scan index scan Parameters: Select Condition,... StarsIn MovieStar query processing
Example: Estimate costs L.Q.P P1 P2 …. Pn C1 C2 …. Cn Pick best! query processing
Query Optimization • Relational algebra level … • Detailed query plan level … • Estimate costs • Generate and compare plans query processing