410 likes | 569 Views
CSCE 608 – 600 Database Systems. Chapter 15: Query Execution. Index-Based Algorithms. The existence of an index is especially helpful for selection, and helps others Clustered relation : tuples are packed into the minimum number of blocks
E N D
CSCE 608 – 600 Database Systems Chapter 15: Query Execution
Index-Based Algorithms • The existence of an index is especially helpful for selection, and helps others • Clustered relation: tuples are packed into the minimum number of blocks • Clustering index: all tuples with the same value for the index's search key are packed into the minimum number of blocks
Index-Based Selection • Without an index, selection takes B(R), or even T(R), disk I/O's. • To select all tuples with attribute a equal to value v, when there is an index on a: • search the index for value v and get pointers to exactly the blocks containing the desired tuples • If index is clustering, then number of disk I/O's is about B(R)/V(R,a)
Examples • Suppose B(R) = 1000, T(R) = 20,000, there is an index on a and we want to select all tuples with a = 0. • If R is clustered and don't use index: 1000 disk I/O's • If R is not clustered and don't use index: 20,000 disk I/O's • If V(R,a) = 100, index is clustering, and use index: 1000/100 = 10 disk I/O's (on average) • If V(R,a) = 10, R is not clustered, index is non-clustering, and use index: 20,000/10 = 2000 disk I/O's (on average) • If V(R,a) = 20,000 (a is a key) and use index: 1 disk I/O
Using Indexes in Other Operations • If the index is a B-tree, can efficiently select tuples with indexed attribute in a range • If selection is on a complex condition such as "a = v AND …", first do the index-based algorithm to get tuples satisfying "a = v". • Such splitting is part of the job of the query optimizer
Index-Based Join Algorithm • Consider natural join of R(X,Y) and S(Y,Z). • Suppose S has an index on Y. for each block of R for each tuple t in the current block use index on S to find tuples of S that match t in the attribute(s) Y output the join of these tuples
Analysis of Index-Based Join • To get all the blocks of R, either B(R) or T(R) disk I/O's are needed • For each tuple of R, there are on average T(S)/V(S,Y) matching tuples of S • T(R)*T(S)/V(S,Y) disk I/O's if index is not clustering • T(R)*B(S)/V(S,Y) disk I/O's if index is clustering • This method is efficient if R is much smaller than S and V(S,Y) is large (i.e., not many tuples of S match)
Join Using a Sorted Index • Suppose we want to join R(X,Y)and S(Y,Z). • Suppose we have a sorted index (e.g., B-tree) on Y for R and S: • do sort-join but • no need to sort the indexed relations first
Buffer Management • The availability of blocks (buffers) of main memory is controlled by buffer manager. • When a new buffer is needed, a replacement policy is used to decide which existing buffer should be returned to disk. • If the number of buffers available for an operation cannot be predicted in advance, then the algorithm chosen must degrade gracefully as the number of buffers shrinks. • If the number of buffers available is not large enough for a two-pass algorithm, then there are generalizations to algorithms that use three or more passes.
CSCE 608 - 600 Database Systems Chapter 16: Query Compiler
Query Compiler Parsing Logical Query Plan
SQL query parse parse tree convert answer logical query plan execute apply laws statistics Pi “improved” l.q.p pick best estimate result sizes {(P1,C1),(P2,C2)...} l.q.p. +sizes estimate costs consider physical plans {P1,P2,…..}
Outline • Convert SQL query to a parse tree • Semantic checking: attributes, relation names, types • Convert to a logical query plan (relational algebra expression) • deal with subqueries • Improve the logical query plan • use algebraic transformations • group together certain operators • evaluate logical plan based on estimated size of relations • Convert to a physical query plan • search the space of physical plans • choose order of operations • complete the physical query plan
Parsing • Goal is to convert a text string containing a query into a parse tree data structure: • leaves form the text string (broken into lexical elements) • internal nodes are syntactic categories • Uses standard algorithmic techniques from compilers • given a grammar for the language (e.g., SQL), process the string and build the tree
Example: SQL query SELECT title FROM StarsIn WHERE starName IN ( SELECT name FROM MovieStar WHERE birthdate LIKE ‘%1960’ ); (Find the movies with stars born in 1960) Assume we have a simplified grammar for SQL.
SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> <Attribute> LIKE <Pattern> nameMovieStar birthDate‘%1960’ Example: Parse Tree <Query> <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> <Tuple> IN <Query> titleStarsIn <Attribute> ( <Query> ) starName <SFW>
The Preprocessor • replaces each reference to a view with a parse (sub)-tree that describes the view (i.e., a query) • does semantic checking: • are relations and views mentioned in the schema? • are attributes mentioned in the current scope? • are attribute types correct?
Outline • Convert SQL query to a parse tree • Semantic checking: attributes, relation names, types • Convert to a logical query plan (relational algebra expression) • deal with subqueries • Improve the logical query plan • use algebraic transformations • group together certain operators • evaluate logical plan based on estimated size of relations • Convert to a physical query plan • search the space of physical plans • choose order of operations • complete the physical query plan
Convert Parse Tree to Relational Algebra • Complete algorithm depends on specific grammar, which determines forms of the parse trees • Here give a flavor of the approach
Conversion • Suppose there are no subqueries. • SELECT att-list FROM rel-list WHERE cond is converted into PROJatt-list(SELECTcond(PRODUCT(rel-list))), or att-list(cond( X (rel-list)))
<Query> SELECT movieTitle FROM StarsIn, MovieStar WHERE starName = name AND birthdate LIKE '%1960'; <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> , <FromList> AND <Condition> movieTitleStarsIn <RelName> <Attribute> LIKE <Pattern> MovieStarbirthdate'%1960' <Condition> <Attribute> = <Attribute> starName name
Equivalent Algebraic Expression Tree movieTitle starname = name AND birthdate LIKE '%1960' X StarsIn MovieStar
Handling Subqueries • Recall the (equivalent) query: SELECT title FROM StarsIn WHERE starName IN ( SELECT name FROM MovieStar WHERE birthdate LIKE ‘%1960’ ); • Use an intermediate format called two-argument selection
Example: Two-Argument Selection title StarsIn <condition> <tuple> IN name <attribute> birthdate LIKE ‘%1960’ starName MovieStar
Converting Two-Argument Selection • To continue the conversion, we need rules for replacing two-argument selection with a relational algebra expression • Different rules depending on the nature of the subquery • Here show example for IN operator and uncorrelated query (subquery computes a relation independent of the tuple being tested)
Rules for IN C R <Condition> X R t IN S S C is the condition that equates attributes in t with corresponding attributes in S
Example: Logical Query Plan title starName=name StarsIn name birthdate LIKE ‘%1960’ MovieStar
What if Subquery is Correlated? • Example is when subquery refers to the current tuple of the outer scope that is being tested • More complicated to deal with, since subquery cannot be translated in isolation • Need to incorporate external attributes in the translation • Some details are in textbook
Outline • Convert SQL query to a parse tree • Semantic checking: attributes, relation names, types • Convert to a logical query plan (relational algebra expression) • deal with subqueries • Improve the logical query plan • use algebraic transformations • group together certain operators • evaluate logical plan based on estimated size of relations • Convert to a physical query plan • search the space of physical plans • choose order of operations • complete the physical query plan
Improving the Logical Query Plan • There are numerous algebraic laws concerning relational algebra operations • By applying them to a logical query plan judiciously, we can get an equivalent query plan that can be executed more efficiently • Next we'll survey some of these laws
Associative and Commutative Operations • product • natural join • set and bag union • set and bag intersection • associative: (A op B) op C = A op (B op C) • commutative: A op B = B op A
Laws Involving Selection • Selections usually reduce the size of the relation • Usually good to do selections early, i.e., "push them down the tree" • Also can be helpful to break up a complex selection into parts
Selection Splitting • C1 AND C2 (R) = C1 ( C2 (R)) • C1 OR C2 (R) = (C1 (R)) Uset (C2 (R)) if R is a set • C1 ( C2 (R)) = C2 ( C1 (R))
Selection and Binary Operators • Must push selection to both arguments: • C (R U S) = C (R) U C (S) • Must push to first arg, optional for 2nd: • C (R - S) = C (R) - S • C (R - S) = C (R) - C (S) • Push to at least one arg with all attributes mentioned in C: • product, natural join, theta join, intersection • e.g., C (R X S) = C (R) X S, if R has all the atts in C
Pushing Selection Up the Tree • Suppose we have relations • StarsIn(title,year,starName) • Movie(title,year,len,inColor,studioName) • and a view • CREATE VIEW MoviesOf1996 AS SELECT * FROM Movie WHERE year = 1996; • and the query • SELECT starName, studioName FROM MoviesOf1996 NATURAL JOIN StarsIn;
Remember the rule C(R S) = C(R) S ? The Straightforward Tree starName,studioName year=1996 StarsIn Movie
starName,studioName starName,studioName starName,studioName year=1996 year=1996 year=1996 year=1996 StarsIn StarsIn Movie StarsIn Movie push selection up tree push selection down tree Movie The Improved Logical Query Plan
Laws Involving Projections • Consider adding in additional projections • Adding a projection lower in the tree can improve performance, since often tuple size is reduced • Usually not as helpful as pushing selections down • If a projection is inserted in the tree, then none of the eliminated attributes can appear above this point in the tree • Ex: L(R X S) = L(M(R) X N(S)), where M (resp. N) is all attributes of R (resp. S) that are used in L • Another example: • L(R Ubag S) = L(R) UbagL(S) But watch out for set union!
Push Projection Below Selection? • Rule: L(C(R)) = L(C(M(R))) where M is all attributes used by L or C • But is it a good idea? SELECT starName FROM StarsIn WHERE movieYear = 1996; starName starName movieYear=1996 movieYear=1996 starName,movieYear StarsIn StarsIn
Joins and Products • Recall from the definitions of relational algebra: • R C S = C(R X S) (theta join) • R S = L(C(R X S)) (natural join) where C equates same-name attributes in R and S, and L includes all attributes of R and S dropping duplicates • To improve a logical query plan, replace a product followed by a selection with a join • Join algorithms are usually faster than doing product followed by selection
Duplicate Elimination • Moving down the tree is potentially beneficial as it can reduce the size of intermediate relations • Can be eliminated if argument has no duplicates • a relation with a primary key • a relation resulting from a grouping operator • Legal to push through product, join, selection, and bag intersection • Ex: (R X S) = (R) X (S) • Cannot push through bag union, bag difference or projection