360 likes | 521 Views
Query Processing: Joins. Computing Joins. One of the most expensive operations Lot of methods devised to join efficiently Block nested loops join Index nested loops join Sort-merge join Hash join. Schema for Examples. Buyers( id : integer , name : string, rating : integer, age : real)
E N D
Computing Joins • One of the most expensive operations • Lot of methods devised to join efficiently • Block nested loops join • Index nested loops join • Sort-merge join • Hash join
Schema for Examples Buyers(id: integer, name: string, rating: integer, age: real) Bids (bid: integer, pid: integer, day: dates, product: string) • Bids: • Each tuple is 40 bytes long, 100 tuples per page, 1000 pages (i.e., 100,000 tuples, 4MB total) • Buyers: • Each tuple is 50 bytes long, 80 tuples per page, 500 pages (i.e., 40,000 tuples, 2MB total)
Computing Joins:Equality Joins With One Join Column SELECT * FROM Bids R, Buyers S WHERE R.bid=S.id • In algebra: R S. Common! Must be carefully optimized. R S is large; so, R S followed by a selection is inefficient. • Assume: M pages in R, pR tuples per page, N pages in S, pS tuples per page. • We will consider more complex join conditions later. • Cost metric: # of page I/Os. We will ignore output costs.
Simple Nested Loops Join foreach tuple tr in Bids R do foreach tuple ts in Buyers S do if tr.bid= ts.id then output (tr, ts) • For each tuple in the outer relation R, we scan the entire inner relation S. • Cost: M + (pR * M) * N = 1000 + 100*1000*500 I/Os. = 50,001,000 pages!
Page-Oriented Simple Nested Loops Join foreach page r in R do foreach page s in S do foreach tuple tr in r do foreach tuple ts in s do if tr.bid= ts.id then output (tr, ts) • Page-oriented Nested Loops join: For each page of R, get each page of S, and write out matching pairs of tuples <tr, ts> • Cost: M + M*N = 1000 + 1000*500 = 501000 • If smaller relation (S) is outer, cost = 500 + 500*1000 = 500500 • Better to put smaller relation as outer loop
Block-Nested Loops Join • Cost can be reduced to M + (M/B-2) Nby using B buffers B-2
. . . Block Nested Loops Join • Use one page as an input buffer for scanning the inner S, one page as the output buffer, and use all remaining pages to hold ``block’’ of outer R. • Use in-memory hash-table for efficient matching. R & S Join Result Hash table for block of R (k < B-1 pages) . . . . . . Output buffer Input buffer for S
Examples of Block Nested Loops • Cost: Scan of outer + #outer blocks * scan of inner • #outer blocks = • With Bids (R) as outer, and say B-2=100 (blocksize): • Cost of scanning R is 1000 I/Os; a total of 10 blocks. • Per block of R, we scan Buyers (S); 10*500 I/Os. • Total Cost: 1000 + 10*500 = 6000 pages • With 100-page blocksize of Buyers S as outer: • Cost of scanning S is 500 I/Os; a total of 5 blocks. • Per block of S, we scan Reserves; 5*1000 I/Os. • Total Cost: 500 + 5*1000 = 5500 pages! • So, cheaper to keep smaller relation as outer
Index Nested Loops Join foreach tuple tr in R do foreach tuple ts in S if bid == id(use index) add <tr, ts> to result • If there is an index on the join column of one relation (say S), can make it the inner and exploit the index. • Cost: M + ( (M*pR) * cost of finding matching S tuples) • For each R tuple, cost of probing S index is about 1.2 for hash index, 2-4 for B+ tree. Cost of then finding S tuples depends on clustering. • Clustered index: 1 I/O (typical), unclustered: upto 1 I/O per matching S tuple.
Examples of Index Nested Loops • Hash-index on id of Buyers (as inner): • Scan Bids R: 1000 page I/Os, 100*1000 tuples. • For each R tuple: 1.2 I/Os to get data entry in index, plus 1 I/O to get (the exactly one) matching S tuple. • Total: 100000*2.2 + 1000 = 221,000 I/Os • Hash-index on bid of Bids (as inner): • Scan Buyers S: 500 page I/Os, 80*500 tuples. • For each S tuple: 1.2 I/Os to find index page with data entries, plus cost of retrieving matching R tuples. Assuming uniform distribution, 2.5 bids per buyer (100,000 / 40,000). Cost of retrieving them is 1 or 2.5 I/Os depending on whether the index is clustered. • Total: 500+40000*2.2 = 88,500 (clustered) or 500+40000*3.5 = 140,500 (unclustered)
Sort-Merge Join (R S) • Sort R and S on the join column, then scan them to do a “merge’’ (on join col.), and output result tuples. • Advance scan of R until current R-tuple >= current S tuple, then advance scan of S until current S-tuple >= current R tuple; do this until current R tuple = current S tuple. • At this point, all R tuples with same value in Ri (current R group) and all S tuples with same value in Sj (current S group) match; output <tr, ts> for all pairs of such tuples. • Then resume scanning R and S. • R is scanned once; each S group is scanned once per matching R tuple. (Multiple scans of an S group are likely to find needed pages in buffer.)
Sort-Merge Join sort Bids R on bid sort Buyers S on id while !eof(R) and !eof(S) do scan r and s concurrently until tR.bid=tS.id if (tR.bid=tS.id=c) output bid=cid=c) tR.bid = c R S tS.id = c
Sort-Merge Join • Cost of sorts assuming B buffers = 2 Mlog B-1 M + 2 N log B-1 N • Cost of merge depends on whether matching “groups” can be fit in buffers • If yes, merge step takes M+ N • Cost: 2M logB-1M + 2N logB-1N + (M+N) • Let blocksize(B-1) = 100, then cost is:2*1000*1.5 + 2*500*1.35 + 1000+500 = 5850 pages!
Hash-Join • Step 1: Hash R on bid and S on id into the same set of buckets • Step 2: Since matching tuples must be in same bucket, read each bucket in turn and output the result of the join • Cost (assuming each bucket fits in memory) = 3 (M+ N ) (not including final output cost)
Original Relation Partitions OUTPUT 1 1 2 INPUT 2 hash function h . . . B-1 B-1 B main memory buffers Disk Disk Partitions of R & S Join Result Hash table for partition Ri (k < B-1 pages) hash fn h2 h2 Output buffer Input buffer for Si B main memory buffers Disk Disk Hash-Join • Partition both relations using hash function h: R tuples in partition i will only match S tuples in partition i. • Read in a partition of R, hash it using h2 (<> h!). Scan matching partition of S, search for matches.
Observations on Hash-Join • If we build an in-memory hash table to speed up the matching of tuples, a little more memory is needed. • If the hash function does not partition uniformly, one or more R partitions may not fit in memory. Can apply hash-join technique recursively to do the join of this R-partition with corresponding S-partition.
Cost of Hash-Join • In partitioning phase, read+write both relations: 2(M+N). In matching phase, read both: M+N • Cost: 3(M+N) = 3(1000+500) = 4500 I/Os. • Sort-Merge Join vs. Hash Join: • Given a minimum amount of memory both have a cost of about 3(M+N) I/Os (assuming that logB-1M=1). • Hash Join superior if relation sizes differ greatly. Also, Hash Join shown to be highly parallelizable. • Sort-Merge less sensitive to data skew; result is sorted!
General Join Conditions • Equalities over several attributes (e.g., R.A=S.A ANDR.B=S.B): • For Index NL, build index on <A, B> (if S is inner); or use existing indexes on A or B. • For Sort-Merge and Hash Join, sort/partition on combination of the two join columns. • Inequality conditions (e.g., R.B < S.B): • For Index NL, need (clustered!) B+ tree index. • Range probes on inner; # matches likely to be much higher than for equality joins. • Hash Join not applicable. • Sort Merge Join can be used, but more expensive • Block NL quite likely to be the best join method here.
Comparison • Hash Join seems best, but is sensitive to data skew • Sort-merge join had good asymptotic behavior (i.e., as relation size increases) • Sometime nested-block loops join can be the best (depends on relation parameters) • Choice may be affected by indices (e.g., Clustered B-Tree on some relation in sort-merge join)
Set Operations • Intersection and cross-product special cases of join. • Union (Distinct) and Except similar; we’ll do union. • Sorting based approach to union: • Sort both relations (on combination of all attributes). • Scan sorted relations and merge them. • Alternative: Merge runs from Pass 0 for both relations. • Hash based approach to union: • Partition R and S using hash function h. • For each S-partition, build in-memory hash table (using h2), scan corresponding R-partition and add tuples to table while discarding duplicates.
Aggregate Operations (AVG, MIN, etc.) • Without grouping: • In general, requires scanning the relation. • Given index whose search key includes all attributes in the SELECT or WHERE clauses, can do index-only scan. • With grouping: • Sort on group-by attributes, then scan relation and compute aggregate for each group. (Can improve upon this by combining sorting and aggregate computation.) • Similar approach based on hashing on group-by attributes. • Given tree index whose search key includes all attributes in SELECT, WHERE and GROUP BY clauses, can do index-only scan; if group-by attributes form prefix of search key, can retrieve data entries/tuples in group-by order.
Impact of Buffering • If several operations are executing concurrently, estimating the number of available buffer pages is guesswork. • Repeated access patterns interact with buffer replacement policy. • e.g., Inner relation is scanned repeatedly in Simple Nested Loop Join. With enough buffer pages to hold inner, replacement policy does not matter. Otherwise, MRU is best, LRU is worst (sequential flooding). • Does replacement policy matter for Block Nested Loops? • What about Index Nested Loops? Sort-Merge Join?
Summary • A virtue of relational DBMSs: queries are composed of a few basic operators; the implementation of these operators can be carefully tuned (and it is important to do this!). • Many alternative implementation techniques for each operator; no universally superior technique for most operators. • Must consider available alternatives for each operation in a query and choose best one based on system statistics, etc. This is part of the broader task of optimizing a query composed of several ops.
Query Evaluation • Problem: An SQL query is declarative – does not specify a query execution plan. • A relational algebra expression is procedural – there is an associated query execution plan. • Solution: Convert SQL query to an equivalent relational algebra and evaluate it using the associated query execution plan. • But which equivalent expression is best?
Naive Conversion • SELECT DISTINCTTargetList • FROM R1, R2, …, RN • WHERE Condition • is equivalent to • TargetList(Condition (R1 R2 ... Rn)) • but this may imply a very inefficient query execution plan. • Example: Name(Id=ProfId and CrsCode=‘CS532’ (Professor Teaching)) • Result can be < 100 bytes • But if each relation is 50K then we end up computing an • intermediate result Professor Teaching of size 1G • before shrinking it down to just a few bytes. • Problem: Find an equivalent relational algebra expression that can be evaluated “efficiently”.
Query Optimizer • Uses heuristic algorithms to evaluate relational algebra expressions. This involves: • estimating the cost of a relational algebra expression • transforming one relational algebra expression to an equivalent one • choosing access paths for evaluating the subexpressions • Query optimizers do not “optimize” – just try to find “reasonably good” evaluation strategies. Optimal plan can be very hard to find!
Highlights of System R Optimizer • Impact: • Most widely used currently; works well for < 10 joins. • Cost estimation: Approximate art at best. • Statistics, maintained in system catalogs, used to estimate cost of operations and result sizes. • Considers combination of CPU and I/O costs. • Plan Space: Too large, must be pruned. • Only the space of left-deep plans is considered. • Left-deep plans allow output of each operator to be pipelinedinto the next operator without storing it in a temporary relation. • Cartesian products avoided.
Overview of Query Optimization • Plan:Tree of Relational Algebra ops, with choice of algorithm for each op. • Each operator typically implemented using a `pull’ interface: when an operator is `pulled’ for the next output tuple, it `pulls’ on its inputs and computes them. • Two main issues: • For a given query, what plans are considered? • Algorithm to search plan space for cheapest (estimated) plan. • How is the cost of a plan estimated? • Ideally: Want to find best plan. • Practically: Avoid worst plans!
Equivalence Preserving Transformations • To transform a relational expression into another equivalent expression we need transformation rules that preserve equivalence • Each transformation rule • Is provably correct (i.e., does preserve equivalence) • Has a heuristic associated with it
Selection and Projection Rules • Break complex selection into simpler ones: • Cond1Cond2 (R) Cond1(Cond2 (R) ) • Break projection into stages: • attr (R) attr (attr (R)), if attr attr • Commute projection and selection: • attr (Cond(R)) Cond (attr (R)), if attr all attributes in Cond
Commutativity and Associativity of Join(and Cartesian Product as Special Case) • Join commutativity: R S S R • used to reduce cost of nested loop evaluation strategies (smaller relation should be in outer loop) • Join associativity: R (S T) (R S) T • used to reduce the size of intermediate relations in computation of multi-relational join – first compute the join that yields smaller intermediate result • N-way join has T(N) N! different evaluation plans • T(N) is the number of parenthesized expressions • N! is the number of permutations • Query optimizer cannot look at all plans (might take longer to find an optimal plan than to compute query brute-force). Hence it does not necessarily produce optimal plan
Pushing Selections and Projections • Cond (R S) R Cond S • If Cond relates attributes of both R and S • Reduces size of intermediate relation since rows can be discarded sooner • Cond (R S) Cond (R) S • If Cond involves only the attributes of R • Reduces size of intermediate relation since rows of R are discarded sooner • attr(R S) attr(attr (R) S), if attr attr’ attributes(R) • reduces the size of an operand of product
Equivalence Example • C1 C2 C3 (R S) • C1(C2(C3(R S) ) ) • C1(C2(R) C3(S) ) • C2(R) C1C3(S) assuming C2 involves only attributes of R, C3 involves only attributes of S, and C1 relates attributes of R and S