210 likes | 216 Views
This article explores various physical query plan alternatives selected from chapters 12, 14, and 15, focusing on query optimization and evaluating costs.
E N D
Examples of Physical Query Plan Alternatives Selected Material from Chapters 12, 14 and 15
Query Optimization • NOTE: SQL provides many ways to express a query. • HENCE: System has many options for evaluating a query. • Optimizer is important for query performance: • Generates alternative plans • Chooses plan with least estimated cost. • Ideally, find best plan. • Realistically, consistently find a quite good one.
A Query Evaluation Plan • An extended relational algebra tree • Annotations at each node indicate: • access methods to use for each table. • implementation methods used for each relational operator.
sname rating > 5 bid=100 sid=sid Sailors Reserves (On-the-fly) sname (On-the-fly) rating > 5 bid=100 (Simple Nested Loops) sid=sid Sailors Reserves A Query Evaluation Plan
Query Optimization • Multi-operator Queries: Pipelined Evaluation • On-the-fly: The result of one operator is pipelined to another operator without creating a temporary table to hold intermediate result, called on-the-fly. • Materialized : Otherwise, intermediate results must be materialized before the next operator can access it. C B A
Alternative Plans: Schema Examples Reserves (sid: integer, bid: integer, day: dates, rname: string) Sailors (sid: integer, sname: string, rating: integer, age: real) • Reserves: • Each tuple is 40 bytes long, • 100 tuples per page, • 1000 pages. • Sailors: • Each tuple is 50 bytes long, • 80 tuples per page, • 500 pages.
sname rating > 5 bid=100 sid=sid Sailors Reserves Alternative Plans: Motivating Example SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5 RA Tree:
sname rating > 5 bid=100 sid=sid Sailors Reserves (On-the-fly) sname (On-the-fly) rating > 5 bid=100 (Simple Nested Loops) sid=sid Sailors Reserves RA Tree: SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5 Costs : 1. Scan Sailors : • For each page of Sailors, scan Reserves • 500+500*1000 I/Os • Or, 2. Scan Reserves • For each page of Reserves, scan Sailors • 1000+1000 * 500 I/Os Plan:
(On-the-fly) sname (On-the-fly) rating > 5 bid=100 (Simple Nested Loops) sid=sid Sailors Reserves Alternative Plans: Motivating Example SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5 • Cost: 500+500*1000 I/Os • Typically, bad plan ! • Reasons : • selections could be `pushed’ earlier, • no use made of indexes • Goal of optimization: Find more efficient plan Plan:
sname bid=100 rating > 5 sid=sid Sailors Reserves Alternative Plans - 2 -- No Indexes • Main idea : push selects down.
(On-the-fly) sname (Sort-Merge Join) sid=sid (Scan; (Scan; write to write to rating > 5 bid=100 temp T2) temp T1) Reserves Sailors Alternative Plans - 2 -- No Indexes • Main idea : push selects down.
(On-the-fly) sname (Sort-Merge Join) sid=sid (Scan; (Scan; write to write to rating > 5 bid=100 temp T2) temp T1) Reserves Sailors Alternative Plan - 2 • With 5 buffer pages, cost of plan: • Scan Reserves (1000) + write temp T1 (if we have 100 boats, uniform distribution then it is : 10 pages,). • Scan Sailors (500) + write temp T2 ( if we have 10 ratings then it is : 250 pages). • Sort T1 (2*2*10), sort T2 (2*4*250), merge (10+250) • Total: 4060 page I/Os.
(On-the-fly) sname (Sort-Merge Join) sid=sid (Scan; (Scan; write to rating > 5 write to bid=100 temp T2) temp T1) Reserves Sailors Alternative Plans - 2 temp T1 temp T2 • With 5 buffer pages, • Scanning and filtering: • 1010 + 750 IOs • Optimization1: block nested loops join: • join cost = 10+4*250, total cost = 2770. • Optimization2: `push’ projections: • T1 only sid, 10/4=[2.5]=3;T1 fits in 3 pages, • T2 only sid and sname, 250/2=125 pages • cost of BNL drops to 125 IOs, • Total cost < 2000 IOs
sname sid=sid rating > 5 bid=100 Reserves Sailors Alternative Plan : Using Indices? Push Selections Down ? What Indices help here? • Index on Reserves.bid? • Index on Sailors.rating? • Index on Sailors.sid? • Index on Reserves.sid?
(On-the-fly) sname (On-the-fly) rating > 5 sid=sid (Use hash Sailors bid=100 index; do not write result to temp) Reserves Example Plan : With Index • With index on Reserves.bid : Assume 100 bid values. Assume 100,000 tuples. Assume 100 tuples/disk • We get 100,000/100 = 1000 tuples • On 1000/100 = 10 disk pages. • If index clustered, Cost = 10 I/Os.
(On-the-fly) sname (On-the-fly) rating > 5 sid=sid (Use hash Sailors bid=100 index; do not write result to temp) Reserves Example Plan : With Index • With index on Reserves.bid : Assume 100 bid values. Assume 100,000 tuples. Assume 100 tuples/disk • We get 100,000/100 = 1000 tuples • On 1000/100 = 10 disk pages. • If index clustered, Cost = 10 I/Os.
(On-the-fly) sname (On-the-fly) rating > 5 (Index Nested Loop Join, with pipelining ) sid=sid (Use hash Sailors bid=100 index; do not write result to temp) Reserves Example Plan : Use Another Index • Index on Sailors? Which? • Selection on Sailors may reduce number of tuples considered in join. • But then requires us to materialize the Sailor tuples again ?
(On-the-fly) sname (On-the-fly) rating > 5 (Index Nested Loops, with pipelining ) sid=sid (Use hash Sailors bid=100 index; do not write result to temp) Reserves Index Nested Loop with Pipelining: • Outer is not materialized • Projecting out unnecessary fields from outer doesn’t help
(On-the-fly) sname (On-the-fly) rating > 5 (Index Nested Loops, with pipelining ) sid=sid Sailors bid=100 Reserves Example Plan Continued • Index on Sailors.sid : • sid is key for Sailors. • At most one matching tuple, • unclustered on sid is OK. • Cost? • For each Reserves tuples (1000): • get matching Sailors tuple (1.2 I/O). • So total 1200 + 10 IOs.
(On-the-fly) sname (On-the-fly) rating > 5 (Index Nested Loops, with pipelining ) sid=sid (Use hash Sailors bid=100 index; do not write result to temp) Reserves Alternative Plan : With Second Index • Selection Push down? • Push (rating>5) before join ? • Answer: • No, because of availability of sid index on Sailors. • Reason : • No index on selection result. • Then lookup requires scan Sailors.
Summary • A query is evaluated by converting it to a tree of operators and evaluating the operators in the tree. • There are alternative evaluation algorithms for each relational operator. • Query evaluation must compare alternative plans based on their estimated costs • Must understand query optimization to understand performance impact of a given database design on a query workload