Determining the Cost of Algebraic Operators

Determining the Cost of Algebraic Operators • Background: Memory, Files, Indexes • Evaluating Individual Operators: Selection, External Sort, Join, Other • Finding Efficient Query Plans: Optimizer Architecture, Cost of Combining Operators, Enumerating Plans

Relational Query Optimization Evaluating Queries with Several Operators

Topic • Until now, we have seen how to optimize queries with a single operator • Most queries have several operators • We now discuss how queries with several operators can be optimized

Simplest Way to Implement Complex Queries • Evaluate operators, one at a time • After each evaluation, write temporary results to disk • Read temporary results from disk, as input to the next operator

Simplest Way to Implement Complex Queries • Example: SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid = R.sid AND R.bid = 100 AND S.rating > 5 • In relational algebra: π_sname(σ_{bid=100 ∧ rating>5}(Reserves ⋈_{sid=sid} Sailors)) • Compute the join and write the result T1 to disk • Read T1, compute the selection, and write the result T2 to disk • Read T2, compute the projection, and return the result to the user

Simplest Way to Implement Complex Queries • Compute the join and write the result T1 to disk • Read T1, compute the selection, and write the result T2 to disk • Read T2, compute the projection, and return the result to the user • A query plan for π_sname(σ_{bid=100 ∧ rating>5}(Reserves ⋈_{sid=sid} Sailors)), from the root down: π_sname; σ_{rating>5 ∧ bid=100} (file scan, write to T2); ⋈_{sid=sid} (block nested loops, write to T1); leaves Reserves and Sailors

Simplest Way to Implement Complex Queries • Think about it: • 1) Does this seem efficient? No. The join and the selection can be computed together, and sname can be output as soon as a tuple is selected. • 2) Can we use an index for the selection? No. The selection runs over a temporary table, which has no index. • 3) How would we implement the projection? As described in the previous part (previous slide). • The plan in question: compute the join (block nested loops) and write the result T1 to disk; read T1 with a file scan, compute the selection, and write the result T2 to disk; read T2, compute the projection, and return the result to the user

Query Optimization: Basic Architecture • Query Parser • Query Optimizer, made up of a Plan Generator and a Plan Cost Estimator, which consult the Catalog Manager • Query Plan Evaluator

Simplifications • SQL queries may be composed of several blocks • Each block is optimized separately • In our discussion, we assume a single block • Example with a nested block: SELECT S.sid, MIN(R.day) FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red' AND S.rating = (SELECT MAX(rating) FROM Sailors) GROUP BY S.sid HAVING COUNT(*) > 2

Simplifications • Queries may contain group-by and aggregation • These are applied as a final step in evaluation (how? by hashing or sorting) • We only consider queries without aggregation • Example with grouping and aggregation: SELECT S.sid, MIN(R.day) FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red' AND S.rating = (SELECT MAX(rating) FROM Sailors) GROUP BY S.sid HAVING COUNT(*) > 2

Simplifications • With the simplifications discussed (a single block, no group-by or aggregation), we obtain queries that can be translated into relational algebra. We show how to evaluate such queries.

Query Plans

Translating to Algebra Trees • SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid = R.sid AND R.bid = 100 AND S.rating > 5 • Algebra expression: π_sname(σ_{bid=100 ∧ rating>5}(Reserves ⋈_{sid=sid} Sailors)) • Algebra tree, from the root down: π_sname; σ_{rating>5 ∧ bid=100}; ⋈_{sid=sid}; leaves Reserves and Sailors

Query Plans • A query plan is a relational algebra tree, annotated with access and evaluation methods • Convention: in trees that use a block/index nested loops join, the left child is the outer relation • Example, from the root down: π_sname (on-the-fly); σ_{rating>5 ∧ bid=100} (on-the-fly); ⋈_{sid=sid} (block nested loops); leaves Reserves and Sailors

Alternative Plans • There may be many different query plans for the same query. The query optimizer must: • enumerate such plans • choose the best plan • Plan 1, from the root down: π_sname (on-the-fly); σ_{rating>5 ∧ bid=100} (on-the-fly); ⋈_{sid=sid} (block nested loops); leaves Reserves and Sailors • Plan 2, from the root down: π_sname (on-the-fly); ⋈_{sid=sid} (sort-merge join); below the join, σ_{bid=100} over Reserves (scan; write to temp T1) and σ_{rating>5} over Sailors (scan; write to temp T2)

Pipelined Evaluation • Evaluation “on the fly”: an intermediate result is not written to disk but is passed directly to the next operator

Pipelining • When evaluating a query, the results of one operator may have to be fed into another operator. This can be done in two ways: • Materializing: the first operator creates a temporary relation, which is read by the second operator • Pipelining (on the fly computation): the results of the first operator are fed directly into the second operator • Pipelining can save on the I/O costs! • We save the cost of writing a temporary result to disk! • We save the cost of reading for the following operation!
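The difference between materializing and pipelining can be sketched with Python generators. This is only an illustration under stated assumptions: the row layout (sid, bid, day), the sample data, and the function names are hypothetical, and the counter tracks tuples placed in a temporary list rather than real disk pages.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[int, int, str]  # hypothetical (sid, bid, day) rows


def select(rows: Iterable[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Yield only the rows that satisfy the predicate, one at a time."""
    for row in rows:
        if pred(row):
            yield row


def project_sid(rows: Iterable[Row]) -> Iterator[int]:
    """Yield the sid column of each row, one at a time."""
    for row in rows:
        yield row[0]


def run_materialized(rows: Iterable[Row], pred: Callable[[Row], bool]):
    """First operator builds a temporary relation; second operator reads it."""
    temp: List[Row] = list(select(rows, pred))  # stands in for writing T1
    tuples_written = len(temp)
    result = list(project_sid(temp))            # stands in for reading T1 back
    return result, tuples_written


def run_pipelined(rows: Iterable[Row], pred: Callable[[Row], bool]):
    """Selection output is fed directly into the projection; no temp relation."""
    return list(project_sid(select(rows, pred))), 0


# Hypothetical sample data.
reserves = [(1, 100, "05/05/05"), (2, 101, "06/05/05"), (3, 100, "07/05/05")]


def is_boat_100(row: Row) -> bool:
    return row[1] == 100
```

Both functions return the same answer; only the materialized version stores (and later rereads) the intermediate tuples, which is the I/O that pipelining avoids.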

Pushing Selections/Projections and Pipelining • Why? • Block nested loops, outer relation: R is read once. While reading, check the selection condition; if a tuple satisfies it, apply the projection and feed it to the join, without writing to disk. • Index nested loops, outer relation: the same. • When both are possible, it is always better to push and pipeline

When is Pipelining Impossible • Impossible if: • The input must be read several times (block nested loops join: inner relation) • The input must be read via an index (index nested loops join: inner relation) • The input must be read in a specific order, and the previous operator does not output in that order (sort-merge join)

Example 1: Pipelining in Unary Operators • Suppose we want to apply two selections on the table Reserves: date < 05/05/05 and bid = 100. We have a B-tree index on date: • Materializing: find the tuples matching the date condition using the B-tree. Store them in a temporary relation. Scan that relation to find the tuples matching the bid condition • Pipelining: find the tuples matching the date condition using the B-tree. As tuples are retrieved, apply the second condition • Pipelining into unary operators is called on-the-fly evaluation

Example 2: Pipelining in Joins • Suppose we want to compute (A ⋈ B) ⋈ C • Materializing: compute A ⋈ B, store the result, and then read it and join it with C • Pipelining: the joins can be pipelined if some type of nested loops join is used. • Read blocks of tuples from A • Find matching tuples from B (using a scan or index) • Find matching tuples from C
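A pipelined nested loops evaluation of a two-join query can be sketched as follows. The relation layouts A(a, b), B(b, c), C(c, d), the sample data, and the function name are assumptions made for illustration; the inner relations are plain lists so that they can be rescanned, while the outer input is consumed one tuple at a time.

```python
from typing import Iterable, Iterator, List, Tuple


def nested_loops_join(outer: Iterable[Tuple], inner: List[Tuple],
                      outer_key: int, inner_key: int) -> Iterator[Tuple]:
    """Simple tuple-at-a-time nested loops join.

    The outer input is consumed once (so it can be a pipelined stream),
    while the inner relation is rescanned for every outer tuple.
    """
    for o in outer:
        for i in inner:
            if o[outer_key] == i[inner_key]:
                yield o + i


# Hypothetical sample relations A(a, b), B(b, c), C(c, d).
A = [(1, 10), (2, 20)]
B = [(10, 100), (20, 200), (30, 300)]
C = [(100, "x"), (200, "y")]

# A join B yields tuples (a, b, b, c); the c value sits at position 3.
a_join_b = nested_loops_join(iter(A), B, outer_key=1, inner_key=0)

# The first join's output is fed directly into the second join;
# no temporary relation is built in between.
result = list(nested_loops_join(a_join_b, C, outer_key=3, inner_key=0))
```

Because the intermediate result only ever appears as the outer input of the next join, it is never stored.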

Pipelining Joins: Questions • Can the execution of the join be pipelined if the join with C is performed using sort-merge join? • No. • Example: R(A, B), S(B, C), T(C, D) • The output of sort-merge(R, S) is sorted by B • The sort-merge join with T needs its input sorted by C • Therefore the result of R ⋈ S must be re-sorted on C, which means it must be written to disk and read back

Fully Pipelined Execution • The execution is fully pipelined if no temporary relations are written to disk throughout the entire query processing

Pipelining Joins: Questions • Which of the following trees can allow for fully pipelined execution (e.g., using a nested loops join)? [Figure: three join trees over the relations A, B, C, D] • Answer: the middle one.

Left Deep Plans • A plan is left-deep if the inner relation of each join (i.e., the right child of each join operator) is a base relation. Which of the following are left-deep? [Figure: three join trees over the relations A, B, C, D] • Left-deep plans allow for fully pipelined execution
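The left-deep condition is easy to check mechanically. In the sketch below, a join tree is encoded as nested pairs (left, right) and a base relation as a string; this encoding and the three example trees are assumptions for illustration, not the trees from the figure.

```python
from typing import Tuple, Union

# A plan is either a base relation name or a join of two sub-plans.
Plan = Union[str, Tuple["Plan", "Plan"]]


def is_left_deep(plan: Plan) -> bool:
    """True if the right child of every join is a base relation."""
    if isinstance(plan, str):
        return True
    left, right = plan
    return isinstance(right, str) and is_left_deep(left)


# Hypothetical example trees over A, B, C, D.
left_deep = ((("A", "B"), "C"), "D")   # ((A join B) join C) join D
bushy = (("A", "B"), ("C", "D"))       # (A join B) join (C join D)
right_deep = ("A", ("B", ("C", "D")))  # A join (B join (C join D))
```

Only the first tree passes the check: in the other two, some join has a join result as its inner (right) input.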

Alternative Plans Motivating Example

Motivating Example • We will consider the cost of various plans for the following query. • NOTE: No distinct in SELECT clause! SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid = R.sid and R.bid = 100 and S.rating>5

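What is the cost of this plan, if the buffer is of size 5? • B_R + B_S · ⌈B_R/(B-2)⌉ = 1000 + 500 · ⌈1000/3⌉ = 1000 + 500 · 334 = 168,000 (B_R = 1000 pages of Reserves, B_S = 500 pages of Sailors, B = 5 buffer pages) • What would you suggest to change in this plan? • Swap the join and the selection, so that the selection is applied first • Swap S and R • Do the projection early • Plan, from the root down: π_sname (on-the-fly); σ_{rating>5 ∧ bid=100} (on-the-fly); ⋈_{sid=sid} (block nested loops); leaves Reserves and Sailors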

Alternative Plan: Pushing Selections • Selections generally reduce the sizes of the relations • It is often a good strategy to push the selections, i.e., apply them early, so that the join is over smaller relations • Which selections could be pushed in this query? • Current plan, from the root down: π_sname (on-the-fly); σ_{rating>5 ∧ bid=100} (on-the-fly); ⋈_{sid=sid} (block nested loops); leaves Reserves and Sailors

• This plan has fully pushed selections: all selections are performed as early as possible • Plan, from the root down: π_sname (on-the-fly); ⋈_{sid=sid} (block NL join); below the join, σ_{bid=100} over Reserves (scan; write to temp T1) and σ_{rating>5} over Sailors (scan; write to temp T2)

• Find the cost of the plan with fully pushed selections (σ_{bid=100} over Reserves written to T1, σ_{rating>5} over Sailors written to T2, block NL join on sid, π_sname on-the-fly), assuming: • 5 buffer pages • 100 different boats, uniform distribution on bid • rating between 1 and 10, uniformly distributed

• Finding the cost of the plan with fully pushed selections (T1 = σ_{bid=100}(Reserves), T2 = σ_{rating>5}(Sailors), block NL join of T1 and T2): • Reading Reserves: 1000 • Writing T1: the size of T1 is about 1000/100, so this costs about 10 • Reading Sailors: 500 • Writing T2: 500/2 = 250, so writing T2 costs about 250 • Block nested loops: 10 + 250 · ⌈10/3⌉ = 1010 • Total cost: 1000 + 10 + 500 + 250 + 1010 = 2770

Pushing Selections/Projections and Pipelining • If we pushed a selection/projection onto a relation, its result would be a temporary relation, and we would no longer have an index on it

Alternative Plan: Pushing Projections • Projections also reduce the sizes of the relations • It is often a good strategy to push the projections, i.e., apply them early, so that the join is over smaller relations • Which projections could be pushed in this query? • Note: in general, duplicate elimination is performed only in the outermost projection • Current plan, from the root down: π_sname (on-the-fly); σ_{rating>5 ∧ bid=100} (on-the-fly); ⋈_{sid=sid} (block nested loops); leaves Reserves and Sailors

• This plan has fully pushed selections and projections • Plan, from the root down: π_sname (on-the-fly); ⋈_{sid=sid} (block NL join); below the join, π_sid(σ_{bid=100}(Reserves)) (scan; projection on-the-fly; write to T1) and π_{sname,sid}(σ_{rating>5}(Sailors)) (scan; projection on-the-fly; write to T2)

• Assume that sid takes 10 bytes and that (sid, sname) takes 25 bytes • Cost calculation for the plan with fully pushed selections and projections (T1 = π_sid(σ_{bid=100}(Reserves)), T2 = π_{sname,sid}(σ_{rating>5}(Sailors))): • Reading Reserves: 1000 • Writing T1: 1000/(100 · 4) = 2.5, rounded up to 3 • Reading Sailors: 500 • Writing T2: 500/(2 · 2) = 125 • Block nested loops: 3 + 125 · ⌈3/3⌉ = 128 • Total cost: 1000 + 3 + 500 + 125 + 128 = 1756 • The extra factors 4 and 2 are the reductions in tuple size due to the projections
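The two cost calculations (pushed selections only, then pushed selections and projections) follow the same pattern, so they can be reproduced with one small function. This is a sketch under the slides' assumptions (1000 pages of Reserves, 500 pages of Sailors, 5 buffer pages); the function and parameter names are hypothetical, and the projection effect is modeled simply as an integer shrink factor on the temporary relation.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b, avoiding floating point."""
    return -(-a // b)


def bnl_cost(outer_pages: int, inner_pages: int, buffer_pages: int) -> int:
    """Block nested loops: read outer once, scan inner once per outer chunk."""
    return outer_pages + inner_pages * ceil_div(outer_pages, buffer_pages - 2)


def pushed_plan_cost(reserves_pages: int = 1000, sailors_pages: int = 500,
                     buffer_pages: int = 5,
                     bid_divisor: int = 100,      # 100 boats, uniform on bid
                     rating_divisor: int = 2,     # rating > 5 keeps about half
                     reserves_shrink: int = 1,    # size reduction from projection
                     sailors_shrink: int = 1) -> int:
    """Scan both relations, write T1 and T2, then join T1 (outer) with T2."""
    t1 = ceil_div(reserves_pages, bid_divisor * reserves_shrink)
    t2 = ceil_div(sailors_pages, rating_divisor * sailors_shrink)
    return (reserves_pages + t1 + sailors_pages + t2
            + bnl_cost(t1, t2, buffer_pages))


selections_only = pushed_plan_cost()
with_projections = pushed_plan_cost(reserves_shrink=4, sailors_shrink=2)
```

With the default arguments the function reproduces the 2770 figure, and with the shrink factors 4 and 2 it reproduces 1756.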

Alternative Plan: Using Indexes • If there are indexes, then it may be preferable to access one (or more) of the relations via the indexes, instead of by using a sequential scan • Suppose we have a clustered hash index on bid of Reserves and a hash index on sid of Sailors

Alternative Plan: Using Indexes • Find the cost, assuming that access to hash table indices costs 1.2 I/Os • Is the selection fully pushed? No. • Is this good? No (index). • Plan, from the root down: π_sname (on-the-fly); σ_{rating>5} (on-the-fly); ⋈_{sid=sid} (index nested loops, with pipelining); outer input σ_{bid=100}(Reserves) (use hash index; do not write result to temp); inner relation Sailors

Alternative Plan: Using Indexes • Cost calculation: • INL cost: B_R + T_R · (time to find matching rows in S) • The cost of reading R is paid during the selection, so R need not be read again in the INL step • Reading Reserves: 1.2 + 1000 · 100/(100 · 100) = 11.2 • One division by 100 is because a tuple has bid = 100 with probability 0.01 • The second division by 100 is because the index is clustered, so the matching tuples are packed 100 to a page • INL cost: 1000 · (1.2 + 1) = 2200 • Total cost: 11.2 + 2200 = 2211.2

Calculating the Costs: Block NL • N is number of blocks in the outer relation • M is number of blocks in the inner relation • B is number of buffer blocks • Cost, selections/projections are not pipelined: N + M · ⌈N/(B-2)⌉ • Cost, selections/projections are pipelined on the outer relation: M · ⌈N/(B-2)⌉ • Notes: If selections/projections are pushed through the join, then N and M are the sizes of the relations AFTER the selections/projections
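The block nested loops formula can be written as a short function. This is a sketch: the function name and the `outer_pipelined` flag are hypothetical, and N/(B-2) is rounded up, which is what the worked examples in these slides do (for instance ⌈1000/3⌉ = 334).

```python
def bnl_cost(n: int, m: int, b: int, outer_pipelined: bool = False) -> int:
    """I/O cost of a block nested loops join.

    n: blocks in the outer relation
    m: blocks in the inner relation
    b: buffer blocks (b - 2 of them hold a chunk of the outer relation)
    outer_pipelined: if True, the outer tuples arrive from a pipelined
        selection/projection, so reading them is not charged here.
    """
    if b < 3:
        raise ValueError("block nested loops needs at least 3 buffer blocks")
    outer_chunks = -(-n // (b - 2))        # ceiling of n / (b - 2)
    read_outer = 0 if outer_pipelined else n
    return read_outer + m * outer_chunks


# Motivating example: Reserves (1000 blocks) outer, Sailors (500) inner, B = 5.
reserves_outer = bnl_cost(1000, 500, 5)
# Swapping the roles makes the smaller relation the outer one.
sailors_outer = bnl_cost(500, 1000, 5)
```

Comparing the two calls shows why the choice of outer relation matters, although here the saving from swapping alone is small next to what pushing selections achieves.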

Calculating the Costs: Index NL • N is number of blocks in the outer relation • T is number of tuples in the outer relation • S is the cost of traversing the index • Tm is number of tuples in the inner relation that match each tuple in the outer relation • P is number of tuples in the inner relation that fit into a page • Notes: If selections/projections are pushed through the join, then N and T are the sizes AFTER the selections/projections
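One common textbook way to combine these variables is N + T · (S + ⌈Tm/P⌉) for a clustered index and N + T · (S + Tm) for an unclustered one. The sketch below assumes that form; the formula, the function name, and the flags are assumptions rather than something stated on the slide, although the clustered case agrees with the worked example 1000 · (1.2 + 1) from the index plan.

```python
import math


def inl_cost(n: int, t: int, s: float, tm: int, p: int,
             clustered: bool = True, outer_pipelined: bool = False) -> float:
    """Assumed I/O cost of an index nested loops join.

    n:  blocks in the outer relation
    t:  tuples in the outer relation
    s:  cost of traversing the index once
    tm: matching inner tuples per outer tuple
    p:  inner tuples per page
    """
    # Clustered: matching tuples are adjacent, so they span ceil(tm / p) pages.
    # Unclustered: assume one page fetch per matching tuple.
    fetch = math.ceil(tm / p) if clustered else tm
    read_outer = 0 if outer_pipelined else n
    return read_outer + t * (s + fetch)


# Worked example: 1000 Reserves tuples with bid = 100 are pipelined into the
# join; each probes the hash index on Sailors.sid (1.2 I/Os) and fetches one
# matching Sailors tuple. p = 80 is a hypothetical placeholder; any p >= 1
# gives the same result when tm = 1.
selection_cost = 1.2 + 10
join_cost = inl_cost(n=10, t=1000, s=1.2, tm=1, p=80, outer_pipelined=True)
total_cost = selection_cost + join_cost
```

Setting `outer_pipelined=True` mirrors the remark that Reserves is read during the selection and need not be read again in the join step.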

Calculating the Costs: Sort-Merge Join • N is number of blocks in the outer relation • M is number of blocks in the inner relation • B is number of buffer blocks • Cost: 2N(1 + log_{B-1}(N/B)) + 2M(1 + log_{B-1}(M/B)) + N + M • If the outer relation is already sorted, only the inner relation is sorted: 2M(1 + log_{B-1}(M/B)) + N + M • If the inner relation is already sorted, only the outer relation is sorted: 2N(1 + log_{B-1}(N/B)) + N + M • If both relations are already sorted: N + M • Notes: If selections are pushed through the join, then N and M are the sizes of the relations AFTER the selections
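The sort-merge cost can be sketched in code as an external sort of each unsorted input followed by one merging pass over both. Two assumptions are made here: the number of merge passes is counted with an integer loop, which corresponds to rounding the logarithm up, and a relation that is already sorted contributes no sorting term. The function names are hypothetical.

```python
def external_sort_cost(pages: int, b: int) -> int:
    """2 * pages * (number of passes), with one run-creation pass plus
    as many (b - 1)-way merge passes as are needed to reach a single run."""
    if b < 3:
        raise ValueError("need at least 3 buffer blocks")
    runs = -(-pages // b)                  # initial sorted runs: ceil(pages / b)
    merge_passes = 0
    while runs > 1:
        runs = -(-runs // (b - 1))         # each pass merges b - 1 runs at a time
        merge_passes += 1
    return 2 * pages * (1 + merge_passes)


def sort_merge_join_cost(n: int, m: int, b: int,
                         outer_sorted: bool = False,
                         inner_sorted: bool = False) -> int:
    """Sort whichever inputs are unsorted, then read both once to merge."""
    cost = n + m
    if not outer_sorted:
        cost += external_sort_cost(n, b)
    if not inner_sorted:
        cost += external_sort_cost(m, b)
    return cost


# Example with the sizes used throughout: N = 1000, M = 500, B = 5.
full_cost = sort_merge_join_cost(1000, 500, 5)
```

With B = 5, 1000 pages start as 200 runs and need four merge passes (200, 50, 13, 4, 1), so sorting dominates the final N + M merge.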
