150 likes | 246 Views
CS4432: Database Systems II. Query Optimizer – Cost Based Optimization. SQL query. parse. parse tree. convert. answer. logical query plan. execute. apply laws. statistics. Pi. “improved” l.q.p. pick best. estimate result sizes. {(P1,C1),(P2,C2)...}. l.q.p. +sizes.
E N D
CS4432: Database Systems II Query Optimizer – Cost Based Optimization
SQL query parse parse tree convert answer logical query plan execute apply laws statistics Pi “improved” l.q.p pick best estimate result sizes {(P1,C1),(P2,C2)...} l.q.p. +sizes estimate costs consider physical plans {P1,P2,…..}
sname rating > 5 bid=100 sid=sid Sailors Reserves (On-the-fly) sname (On-the-fly) rating > 5 bid=100 (Nested Loop Join) sid=sid Sailors Reserves A Query (Evaluation) Plan • An extended relational algebra tree • Annotations at each node indicate: • access methods to use for each table. • implementation methods used for each relational operator.
How to cost a physical plan? • We need estimated size of intermediate results – Chapter 16.4 • Cost of each operator/algorithm – Chapter 15 • Buffer available for the query
Result of cost-based optimization • Good physical plan • Consider different join orderings • Consider different access methods for accessing the relations
How to generate that ‘good’ Physical Plan? Many alternate search algorithms are possible: • Exhaustive listing of all possible plans • Dynamic programming • Branch and bound • Greedy bottom-up plan construction NOTE: often only left-deep trees are being considered to keep the search space small.
D D C C D B A C B A B A Why left-deep trees? • Fundamental decision in System R (IBM): • Only left-deep join treesare considered. • Left-deep trees can generate all fully pipelinedplans. • Intermediate results not written to temporary files. • Not all left-deep trees are fully pipelined (e.g., SM join).
Enumeration of Left-Deep Trees • Left-deep trees differ in : • the order of relations, • the access method for each relation, and • the join method for each join. • Number of left deep plans still exponential – n relations implies n! left-deep tree orderings
Enumeration of Left-Deep Trees • Enumerated using N passes (if N relations joined): • Pass 1: Find best 1-relation plan for each relation. • Pass 2: Find best way to join result of each 1-relation plan (as outer) to another relation. (All 2-relation plans.) • Pass N: Find best way to join result of a (N-1)-relation plan (as outer) to the N’th relation. (All N-relation plans.) • For each subset of relations, retain: • Cheapest plan overall, plus • Cheapest plan for each interesting orderof the tuples. Pass 1 A B C D Pass 2 Pass 3
Enumeration Example • Example 16.3.3, Also read Chapter 16.6.5 • If too many relations (Chapter 16.6.6): • Dynamic Programming expensive if too many relations (say more than 6 relations). • Use greedy (faster algorithm, but may yield plans not as good as Dynamic Programming)
Operator Types • Stateful versus stateless operators • Select is stateless • Join is stateful • Blocking versus non-blocking operators • Select is non-blocking • Agg functions are blocking • Pipelined versus non-pipelined operators • Select is pipelinable • What about Join ? (see next slide)
Join? • Join : Revelation is that it depends on the implementation strategy chosen for an operator • Iteration-join : pipelinable • Merge-sort join : blocking • Index join : pipelinable • Hash join : blocking
Costing of a complete plan • We went over an example query plan • Important: first we classify operators as pipelined or not-pipelined • If pipelined, then for stateless operators the IO cost is zero (for example, for Select or Project)
Costing of a Complete Query Plan • What about a Select? How is it implemented? • If in middle of plan, pipeline it (one tuple at a time iteration) • If at leaf of plan, identify any potential index to use index-lookup to implement the Select • If index available, cost of implemention the select operator is equal to cost of an index lookup
Costing of a complete plan • Main idea: • Determine # of distinct values – V(R,a) • Determine physical impl. Strategies per operator • Then, compute IO costs for each operator • Then, sum up all costs. • Done.