1.57k likes | 1.73k Views
15.1 Slide 5 =1 change Slide 7 =2 change Slide 8 =3 change Slide 9 = 3 changes Slide 10 = 1 change Slide 11 = 3 changes Slide 13 = 3 changes Slide 14 = 1 change Slide 15 = 1 change Slide 16 = 1 change Slide 18 = 3 changes Slide 19 = 2 changes Slide 20 = 1 change. Sweta Shah
E N D
15.1 • Slide 5 =1 change • Slide 7 =2 change • Slide 8 =3 change • Slide 9 = 3 changes • Slide 10 = 1 change • Slide 11 = 3 changes • Slide 13 = 3 changes • Slide 14 = 1 change • Slide 15 = 1 change • Slide 16 = 1 change • Slide 18 = 3 changes • Slide 19 = 2 changes • Slide 20 = 1 change
Sweta Shah CS257: Database SystemsID: 118 Query ExecutionSection 15.1
Agenda • Query Processor • Query compilation • Physical Query Plan Operators • Scanning Tables • Table Scan • Index scan • Sorting while scanning tables • Model of computation for physical operators • Parameters for measuring cost • Iterators
Query Processor • The Query Processor is a group of components of a DBMS that turns user queries and data-modification commands into a sequence of database operations and executes those operations • Query processor is responsible for supplying details regarding how the query is to be executed
Query compilation • Query compilation itself is a multi-step process consisting of : • Parsing: in which a parse tree representing query and its structure is constructed • Query rewrite: in which the parse tree is converted to aninitial query plan • Physical plan generation: where the logical query plan is turned into a physical query plan by selecting algorithms. • Query rewrite and Physical plan generation are called query optimizer
Physical Query Plan Operators • Physical query plans are built from operators • Each of the operators implement one step of the plan. • They are particular implementations for one of the operators of relational algebra. • They can also be non relational algebra operators like “scan” which scans tables and bring into main memory each tuple of some relation • The relation is typically an operand of some other operation.
Scanning Tables • The most basic thing we can do in a physical query plan is to read the entire contents of a relation R. • A variation of this operator involves a simple predicate, where we read only those tuples of the relation R that satisfy the predicate • Necessary - to perform join or union.
Two basic approaches to locating the tuples of a relation R • Table-scan • Relation R is stored in secondary memory with its tuples arranged in blocks • The blocks containing the tuples of R are known to the system • it is possible to get the blocks one by one • This operation is called Table Scan
Two basic approaches to locating the tuples of a relation R • Index-scan • If there is an index on any attribute of R, we may be able to use this index to get all the tuples of R. • a sparse index on R can be used to lead us to all the blocks holding R, even if we don’t know otherwise which blocks these are. • This operation is called index-scan.
Sorting While Scanning Tables • Why do we need sorting while scanning? • the query could include an ORDER BY clause requiring that a relation be sorted • Various algorithms for relational-algebra operations require one or both of their arguments to be sorted relation • Sort-scantakes a relation R and a specification of the attributes on which the sort is to be made, and produces R in that sorted order
Several ways to implement sort-scan • If relation R must be sorted by attribute a, and there is a B-tree index on a,then a scan of the index allows us to produce R in the desired order. • If R is small enough to fit in main memory, then we can retrieve its tuples using a table scan or index scan, and then use a main-memory sorting algorithm. • If R is too large to fit in main memory, then we can use a multiway merge-sort
Model of Computation for Physical Operators • Choosing physical plan operators wisely is an essential for a good query processor. • Cost for an operation is measured in number of disk i/o operations. • If an operator requires the final answer to a query to be written back to the disk, the total cost will depend on the length of the answer and will include the final write back cost to the total cost of the query. • When comparing algorithms for the same operations, assumption -We assume that the arguments of any operator are found on disk, but the result of the operator is left in main memory
Improvements in cost • Major improvements in cost of the physical operators can be achieved by avoiding or reducing the number of disk i/o operations • This can be achieved by passing the answer of one operator to the other in the main memory itself without writing it to the disk. • parameters /statistics- used to express the cost of an operator .Estimates of cost are essential if the optimizer is to determine which of the many query plans is likely to execute fastest
Parameters for Measuring Costs • Parameters that affect the performance of a query • Need a parameter to represent the portion of main memory that the operator uses, and other parameters to measure the size of its argument • Buffer space availability in the main memory at the time of execution of the query • Size of input and the size of the output generated • The size of memory block on the disk and the size in the main memory also affects the performance
Iterators for Implementation of Physical Operators • It is a group of three functions that allows a consumer of the result of the physical operator to get the result one tuple at a time
Iterator • Many physical operators can be implemented as an iterator • The three functions forming the iterator are: Open: • This function starts the process of getting tuples. • It initializes any data structures needed to perform the operation and calls 0pen() for any arguments of the operation.
Iterator GetNext • This function returns the next tuple in the result • Adjusts data structures as necessary to allow subsequent tuples to be obtained • In getting the next tuple of its result, it typically calls GetNextO one or more times on its argument(s). • If there are no more tuples to return, GetNext returns a special value NotFound which we assume cannot be mistaken for a tuple.
Iterator Close • This function ends the iteration after all tuples or all tuples that the consumer wanted, have been obtained. • it calls Close on any arguments of the operator
Query Execution One-pass algorithm for database operations Chetan Sharma 008565661
Changes Made 15.2 Slide 4 – 2 changes Slide 5 – 2 changes Slide 6 – 4 changes Slide 7 – 1 change Slide 9 – 1 change Slide 10 – 2changes Slide 11 – 1 change Slide 12 – 1 change Slide 14 – 1 change Slide 16 – 1change
Overview • One-Pass Algorithm • One-Pass Algorithm Methods: Tuple-at-a-time, unary operations. Full-relation, unary operations. Full-relation, binary operations.
we can divide algorithms for operators into three “degrees” of difficulty and cost: 1)Some methods involve reading the data only once from disk. These are the one-pass algorithms, and they are the topic of this section. Usually, they require at least one of the arguments to fit in main memory, although there are exceptions, especially for selection and projection 2)Some methods work for data that is too large to fit in available main memory but not for the largest imaginable data sets. These two-pass algorithms are characterized by reading data a first time
from disk, processing it in some way, writing all, or almost all, of it to disk, and then reading it a second time for further processing during the second pass. 3) Some methods work without a limit on the size of the data. These methods use three or more passes to do their jobs, and are natural, recursive generalizations of the two-pass algorithms
One-Pass Algorithm • Reading the data only once from disk. • Usually, they require at least one of the arguments to fit in main memory • the one-pass methods -classify operators into three broad groups • -Tuple-at-a-time, unary operations. • -Full-relation, unary operations. • -Full-relation, binary operations
Tuple-at-a-Time • These operations do not require an entire relation, or even a large part of it, in memory at once. Thus, we can read a block at a time, use one main memory buffer, and produce our output. • we require only that M > 1 for the input buffer, regardless of B. • Ex- selection and projection
Tuple-at-a-Time A selection or projection being performed on a relation R
Full-relation, unary operations • Now, let us consider the unary operations that apply to relations as a whole , rather than to one tuple at a time: a)Duplicate elimination. To eliminate duplicates, we can read each block of R one at a time, but for each tuple we need to make a decision as to whether:
-It is the first time we have seen this tuple, in which case we copy it to the output, or -We have seen the tuple before, in which case we must not output this tuple. b)Grouping .
b) Grouping • MIN (a),MAX (a) aggregate, record the minimum or maximum value, respectively, of attribute a seen for any tuple in the group so far. • COUNT aggregation, add one for each tuple of the group that is seen. • SUM (a), add the value of attribute a to the accumulated sum for its group,provided a is not NULL. • AVG (a) is the hard case. We must maintain two accumulations: the count of the number of tuples in the group and the sum of the a-values of these tuples.
b) Grouping When all tuples of R have been read into the input buffer and contributed to the aggregation(s) for their group, we can produce the output by writing the tuple for each group. Note-: that until the last tuple is seen, we cannot begin to create output for aoperation. Thus, this algorithm does not fit the iterator framework very well; The entire grouping has to be done by the Open method before the first tuple can be retrieved
One-Pass Algorithms for Binary Operations • All other operations are in this class: set and bag versions of union, intersection, difference, joins, and products. -Bag union can be computed by a very simple one-pass algorithm • binary operations require reading the smaller of the operands R and S into main memory and building a suitable data structure so tuples can be both inserted quickly and found quickly. • to be performed in one pass is: min(B(R),B(S)) <= M
Some examples In each case, we assume R is the larger of the relations, and we house S in main memory. • Set Union: -We read S into M - 1 buffers of main memory and build a search structure where the search key is the entire tuple. -All these tuples are also copied to the output. -Read each block of R into the Mth buffer, one at a time. -For each tuple t of R, see if t is in S, and if not, we copy t to the output. If t is also in S, we skip t. • Set Intersection : -Read S into M - 1 buffers and build a search structure with full tuples as the search key. -Read each block of R, and for each tuple t of R, see if t is also in S. If so, copy t to the output, and if not, ignore t.
examples continued.. • Product • Read S into M — 1 buffers of main memory; no special data structure is needed. Then read each block of R, and for each tuple t of R concatenate t with each tuple of S in main memory. Output each concatenated tuple as it is formed. • The number of disk I/O ’s is B(R) + B(S), as it must be for a one-pass algorithm on operands R and S,
Summary • One-Pass Algorithm • One-Pass Algorithm Methods: • Tuple-at-a-time, unary operations. • Full-relation, unary operations. • Full-relation, binary operations.
Nested Loops Joins Book Section of chapter 15.3 Guide: Prof. Dr. T.Y. LIN Name: Sanya Valsan Roll: 120
15.3 • Slide 7 = 2changes • Slide 8 = 1 change • Slide 10 = 2 changes • Slide 12 = 2 changes • Slide 13 = 1 changes
Topic to be covered • Tuple-Based Nested-Loop Join • An Iterator for Tuple-Based Nested-Loop Join • A Block-Based Nested-Loop Join Algorithm • Analysis of Nested-Loop Join
Introduction • Nested – loop joins are, in a sense, one and a half passes since in each variation one of the two arguments has its tuples read only once, while the other argument has to be read repeatedly. • Nested-loop joins can be used for relations of any size; it is not necessary that one relation fit in main memory.
15.3.1 Tuple-Based Nested-Loop Join • The simplest variation of nested-loop join has loops that range over individual tuples of the relations involved. In this algorithm, which we call tuple-based nested-loop join, we compute the join as follows R S
Continued • For each tuple s in S DO For each tuple r in R Do if r and s join to make a tuple t THEN output t; • If we are careless about how to buffer the blocks of relations R and S, then this algorithm could require as many as T(R)T(S) disk I/O’s. There are many situations where this algorithm can be modified to have much lower cost. Note: T(R) – number of tuples in relation R.
Continued • One case is when we can use an index on the join attribute or attributes of R to find the tuples of R that match a given tuple of S, without having to read the entire relation R. • Another improvement looks much more carefully at the way tuples of R and S are divided among blocks, and uses as much of the memory as it can to reduce the number of disk I/O's as we go through the inner loop. We shall consider this block-based version of nested-loop join.
15.3.2 An Iterator for Tuple-Based Nested-Loop Join • One advantage of a nested-loop join is that it fits well into an iterator framework, • Open() { • R.Open(); • S.open(); • A:=S.getnext(); } GetNext() { Repeat { r:= R.Getnext(); IF(r= Not found) {/* R is exhausted for the current s*/ R.close(); s:=S.Getnext();
IF( s= Not found) RETURN Not Found; /* both R & S are exhausted*/ R.Close(); r:= R.Getnext(); } } until ( r and s join) RETURN the join of r and s; } Close() { R.close (); S.close (); }
15.3.3 A Block-Based Nested-Loop Join Algorithm We can Improve Nested loop Join by compute R join S. • Organizing access to both argument relations by blocks. This makes sure that when we run through the tuples of R in the inner loop, we use as few disk I/O ’s as possible to read R • Using as much main memory as we can to store tuples belonging to the relation S, the relation of the outer loop.This enables us to join each tuple of R that we read with not just one tuple of S, but with as many tuples of 5 as will fit in memory.
The nested-loop join algorithm FOR each chunk of M-1 blocks of S DO BEGIN read these blocks into main-memory buffers; organize their tuples into a search structure whose search key is the common attributes of R and S; FOR each block b of R DO BEGIN read b into main memory; FOR each tuple t of b DO BEGIN find the tuples of S in main memory that join with t ; output the join of t with each of these tuples; END ; END ; END ;