950 likes | 1.19k Views
Query Optimization. CMPT 454. Query Optimization. Introduction Query Execution Unary Operations External Sorting Binary Operations. Query Compilation Steps. Parsing Construct a parse tree for a query Generate equivalent logical query plans
E N D
Query Optimization CMPT 454
Query Optimization • Introduction • Query Execution • Unary Operations • External Sorting • Binary Operations
Query Compilation Steps • Parsing • Construct a parse tree for a query • Generate equivalent logical query plans • Convert the parse tree to a query plan in relational algebra • Transform the plan into more efficient equivalents • Generate a physical plan • Select algorithms for each of the operators in the query • Including details about how tables are to be accessed or sorted
Query Optimization • Generating equivalent logical plans and their physical plans is known as query optimization • Selecting a good query plan entails deciding • Which equivalent plan leads to the most efficient query • Which algorithm should be used to implement an operation • How data from one operation should be passed to the next • These choices depend on database metadata • Size of relations • Number and frequency of attributes • Indexing and data file organization
Query Processing query Query Compilation generating, improving and estimating query plans Query Execution algorithms to implement physical plan operators result
Introduction • A physical query plan is made up of operators • Each implementing one step of the plan and • Corresponding to a relational algebra operation • Some relational algebra operations may require lower level physical operations • Such as scanning a table
Sorting and Scanning • It is sometimes necessary or useful to sort data as it is scanned • To satisfy a query with an ORDER BY clause • Or because an algorithm requires sorted input • There are a number of ways in which a sort scan can be performed • Main memory sorting • B tree index • Multi-way mergesort
Statistics • There are M main memory frames available for an operation • M may or may not be the entire main memory • Information relating to a relation (table) R • B(R) or B – number of blocks of R • T(R) or T – number of tuples (i.e. records) of R • V(R, a) – number of distinct values for attribute a in R • Assume that data is accessed one block at a time from a disk • This assumption will be discussed later
Computation Model • Cost metric • The number of disk I/Os required for an operation • Assume that • Input to an operation is to be read from disk • The result is left in main memory (does not include write cost) • Note that the size of a result should not influence the choice of algorithm for a particular operation • Although it may influence the order of operations
I/O Cost for Scans • The cost to scan R depends on its organization • If R is organized into a single file the cost of a scan is B • And if B < M the cost of a sort scan is B • If R is distributed with records from other tables the cost of a scan may be higher • The use of an index does not reduce the cost to scan R • But may reduce the cost to retrieve a part of R
Introduction • In this section the costs of individual relational algebra operations are considered • There are often a number of algorithms that can be used to implement the same operation • Each operation is considered in isolation • How the individual operations interact is discussed in a later section • Note that the cost of a later operation may be affected by the result of an earlier operation, in terms of • The order of the records in the result • The algorithm used to achieve that result
Selection SELECT * FROM Customer WHERE age > 50 • Only records that match the selection are included in the result • The size of the result depends on the number of records that match the selection • Complex selections that contain more than a single term will be considered later age > 50(Customer)
Selections with No Index • If the file is not sorted on the selection attribute • Perform a file scan, cost is B reads • If the selection attribute is a superkey the scan can be terminated once a match as been found (cost is B/2) • The file is sorted on the selection attribute • Perform a binary search to find the first record that matches the selection criteria • Then read all records that match the selection criteria • Which may entail reading additional pages • log2(B) + the number of pages of matching records - 1
Selections with B+ Tree Index • Find the leaf page of the first index entry that points to a matching record • The number of disk reads is equal to the height of the tree • The cost to find a leaf page in a B+ tree index is usually in the order of 2 or 3 disk reads • Then retrieve the matching records • This cost is determined by the number of leaf pages containing matching entries, and • The number of records to be retrieved, and • Whether the index is primary or secondary
Primary B+ Tree Index • In a primary index the file is sorted on the search key • If the selection is small it may fit on one page • An additional disk read to access this page is required • If the selection is large, additional pages may have to be read • The selection may cover a large range of records, or • May be an equality selection on an attribute with few distinct values • The exact number of pages depends on the number of records that match the selection
Secondary B+ Tree Index • In a secondary index the sort order of the file (if any) is different from the search key of the index • The leaves of the tree form a dense index on the file • Adjacent matching data entries are unlikely to point to records on the same page • A disk read is required for each matching record • For large selections, a secondary index may be less efficient than scanning the entire table! • In practice it is important to recognize this, by estimating the number of records returned by a selection
Secondary B+ Tree Range Selection SELECT * FROM Patient WHERE lName = 'Zoobat' lName = "Zoobat"(Patient) … … … … 4 patients called Zoobat file
Selections with Hash Index • A hash index can be used to efficiently retrieve equality selections • The cost to retrieve the bucket containing the data entries is typically 1 or 2 disk reads • The cost to retrieve the corresponding records depends on whether the index is primary or secondary • If the index is primary, matching records are likely to reside on the same, single, file page whereas • If the index is secondary, matching records are likely to reside on different pages of the file • Hash indexes cannot be used for range selections
Complex Selections • Selections may be made up of a number of terms connected by and () and or () • These terms may reference different attributes • It is useful to know whether or not an index can be used to evaluate a selection • To determine this the selection is first expressed in conjunctive normal form (CNF) • A collection of conjuncts (terms joined by and) • Each conjunct consists either of a single term, or multiple terms joined by or (i.e. disjunctions) • e.g. (A B) C D (A C D) (B C D)
Selections with no Disjunctions • Hash indexes can be used if there is an equality condition for every attribute in the search key • e.g. a single hash index on {city, street, number} • city="London"street="Baker"number=221(Detective)matches • city="Los Angeles"street="Cahuenga"(Detective)does not • Tree indexes can be used if there is a selection on each of the first n attributes of the search key • e.g. B+ index on {city, street, number} • Bothcity="London"street="Baker"number=221(Detective) and • city="Los Angeles"street="Cahuenga"(Detective) match
Selections with no Disjunctions… • If an index matches a subset of the conjuncts • Use the index to return a result that contains some unwanted records • Scan this result for matches to the other conjuncts • city="London"street="Baker"number=221fName = "Sherlock" (Detective) • If more than one index matches a conjunct • Either use the most selective index, then scan the result, discarding records that fail to match to the other criteria • Or use all indexes and retrieve therids • Take the intersection of the rids and retrieve those records
Selectionswith no Disjunctions… • Consider the relation and selection shown below • Detective = {id, fName, lName, age, city, street, number, author} • city="New York"author="Spillane"lName="Hammer"(Detective) • With indexes • Secondary hash index, {city, street, number} • Secondary B+ tree index, {lName, fName} • Secondary hash index, {author} • There are two strategies: • Use the most selective of the two matching indexes, and search the results for the remaining criteria • Use both indexes, take the intersection of the rid can’t be used can be used can be used What if the B+ tree index is primary?
Selections with Disjunctions • If necessary, restate the selection in CNF • If a conjunct includes a disjunction with no index on its attribute, no index can be used for that conjunct • If all the conjuncts contain such a disjunction a file scan is necessary to satisfy the selection • Otherwise, use an index that matches one of the conjuncts and retrieve those records • The remaining selection criteria can then be applied to the results from using that index
Selections with Disjunctions … • Consider a selection of this form • (a b c) (d e f)(R) • Where each of a to f is an equality selection on an attribute • If each of the terms in either of the conjuncts has a matching index • Use the indexes to find therids • Take the union of the rids and retrieve those records • If there are indexes on the attributes of a, b, c, and e • Use the a, b, and c indexes and take the union of the rids • Retrieve the resulting records and apply the other criteria
Selections with Disjunctions … • Consider the selections shown below • (author="King" age>35)(lName="Tam" id=11)(Detective) • (author="King") (lName="Tam" id=11)(Detective) • Indexes on the relation • Secondary B+ tree index, {lName, fName} • Secondary hash index, {author} • Compare the two selections • In the first selection each conjunct contains a disjunction without an index (age, id) so a file scan is required • In the second selection the index on author can be used, and records that don't meet the other criteria removed
Projections SELECT fName, lName FROM Customer • Only columns in the column list are retained • Reducing the size of the result relation • There are two steps to implementing projections • Remove the unwanted columns • Eliminate any duplicate records, but only if DISTINCT has been specified • Duplicates can be removed either by sorting or hashing fName,lName(Customer)
External Sorting A Digression
External Sorting • Two-way external merge sort • External merge sort • Replacement sort • External merge sort notes
External Sorting Introduction • It is often necessary to sort data in a database • To sort the results of a query • To make some other operations more efficient • Bulk loading a tree index • Eliminating duplicate records • Joining relations • The focus on a DB sorting algorithm is to reduce the number of disk reads and writes
Internal vs. External Sorting • Sorting a collection of records that fit within main memory can be performed efficiently • There are a number of sorting algorithms that can be performed in n(log2n) time • That is, with n(log2n) comparisons, e.g., Mergesort, Quicksort, • Many DB tables are too large to fit into main memory at one time • So cannot be read into main memory and sorted • The focus in external sorting is to reduce the number of disk I/Os
Merge Sort – a Brief Reminder • Consider the Mergesort algorithm • Input sub-arrays are repeatedly halved • Until they contain only one element • Sub-arrays are then merged into sorted sub-arrays by repeated merge operations • merging two sorted sub-arrays can be performed in O(n) mergesort(arr, start, end) if(start < end) mid = start + end / 2 mergesort(arr, start, mid) mergesort(arr, mid+1, end) merge(arr, start, mid, mid+1, end)
Naïve External Merge Sort • Read the first two pages (of B pages) of the file • Sort their contents (independently of each other) • Merge the two pages • Write them out to a sorted run of size 2 • Repeat for the rest of the file • Producing B/2 sorted runs of size 2 • Merge the first two sorted runs of size 2 • Read in the first page of the first two runs into input pages • Merge to a single output page, and write it out when full • When all records in an input page have been merged read in the second page of that run
Naïve External Merge Sort … • After the first merge pass the file consists of B/2 sorted runs each of two pages • Read in the first page of each of the first two sorted runs • Leaving a third page free as an output buffer disk pages contain three records sorted runs of size 2 main memory
Memory Usage • Records from the input pages are merged into the output buffer • Once the output buffer is full it's contents are written out to disk, to form the first page of the first sorted run of length 4 disk main memory
Memory Usage • At this point all of the records from one of the input pages have been dealt with • The next page of that sorted run is read into the input page • And the process continues disk main memory
Cost of Naïve Merge Sort • Assume that B = 2k, after the first merge pass there are 2k-1 sorted runs, each two pages long • After the second pass there are 2k-2 sorted runs, of length 4 • After the kth pass there is one sorted run of length B • The number of passes is therefore log2B+1 • log2B is the number of merge passes required • The +1 is for the initial pass to sort each page • Each pass requires that each page be read and written back for a total cost of 2B(log2 B +1) • Note that only 3 frames of main memory are used!
First Stage Improvement • In the first stage of the process each page is read into main memory, sorted and written out • The pages are then ready to be merged • Instead of sorting each page independently, read M pages into main memory, and sort them together • M is the main memory frames available for the sort • After the first pass there will be B/M sorted runs, each of length M • This will reduce the number of subsequent merge passes
Merge Pass Improvement • In the merge passes perform an M-1 way merge rather than a 2 way merge, using • M-1 input pages, one for each of M-1 sorted runs and • 1 page for an output buffer • The first item in each of the M-1 input partitions is compared to each other to determine the smallest • Resulting in less merge passes, and less disk I/O • Each merge pass merges M-1 runs • After the first pass the runs are size (M-1)*M • Since M is the size of the runs from the first stage
Cost of External Merge Sort • The initial pass produces B/M sorted runs of size M • Each merge pass reduces the number of runs by a factor of M-1 • The number of merge passes is logM-1B/M • Each pass requires that the entire file is read and written • Total cost is therefore 2B (logM-1B/M + 1) • As M is typically relatively large this reduction is considerable
Number of Passes • B = 1,000,000 Even a large file can usually be sorted in two passes (a cost of 4B I/Os to sort and write out)
Generating Longer Initial Runs • In the first stage B/M sorted runs of size M are produced • The size of these preliminary runs determines how many merge passes are required • The size of the initial runs can be increased by using replacement sort • On average the size of the initial runs can be increased to 2*M • Using replacement sort increases complexity
Replacement Sort • M-2 pages are used to sort the file • Called the current set • One page is used for input, and • One page is used for output • The current set is filled • … and sorted disk current set input output main memory
Replacement Sort • The next page of the file is read in to the input buffer • The smallest record from the current set (and the input buffer) is placed in the output buffer • The first location of the current set is now available for the first record from the input buffer disk current set input output main memory
Replacement Sort • The process continues until the output buffer is full and • The contents of the input buffer are in the current set • At this point another input page can be read into the input buffer • And the output buffer can be written to disk as the first page of the first run disk current set input output main memory
Replacement Sort • The current set can be periodically re-sorted • At some point a page is read that contains records with values less than the values in records in the output buffer • The rest of the current set must now be written out to complete the first run • The process then begins again disk current set input output main memory
Revisiting I/O Costs • The cost metric used is simplistic • In practice it my be advantageous to make the input and output buffers larger than one page • This reduces the number of runs that can be merged at one time, so may increase the number of passes required • But, it allows a sequence of pages to be read or written to the buffers, decreasing the actual access time per page • We have also ignored CPU costs • If double buffering is used, the CPU can process one part of a run while the next is being loaded into main memory
Note: B+ Trees and Sorting • Primary B+ tree index • The index can be used to find the first page, but • Note that the file is already sorted! • Secondary B+ tree index • Leaves point to data records that are not in sort order • In the worst case, each data entry could point to a different page from its adjacent entries • Retrieving the records in order requires reading all of the index leaf pages, plus one disk read for each record! • In practice external sort is likely to be much more efficient than using a secondary index