530 likes | 614 Views
Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications. C. Faloutsos Query Optimization – part 2. General Overview - rel. model. Relational model - SQL Functional Dependencies & Normalization Physical Design Indexing Query optimization Transaction processing.
E N D
Carnegie Mellon Univ.Dept. of Computer Science15-415 - Database Applications C. Faloutsos Query Optimization – part 2
General Overview - rel. model • Relational model - SQL • Functional Dependencies & Normalization • Physical Design • Indexing • Query optimization • Transaction processing 15-415 - C. Faloutsos
Q-opt steps bring query in internal form (eg., parse tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans selections (simple; complex predicates) sorting; projections joins estimate cost; pick best 15-415 - C. Faloutsos
Sr #1 #2 #3 … #nr Reminder – statistics: • for each relation ‘r’ we keep • nr : # tuples; • Sr : size of tuple in bytes • V(A,r): number of distinct values of attr. ‘A’ 15-415 - C. Faloutsos
Derivable statistics • fr: blocking factor = max# records/block (= B/Sr ; B: block size in bytes) • br: # blocks (= nr / fr ) 15-415 - C. Faloutsos
Derivable statistics • SC(A,r) = selection cardinality = avg# of records with A=given (= nr / V(A,r) ) (assumes uniformity...) – eg: 10,000 students, 10 colleges – how many students in SCS? 15-415 - C. Faloutsos
Selections • we saw simple predicates (A=constant; eg., ‘name=Smith’) • how about more complex predicates, like • ‘salary > 10K’ • ‘age = 30 and job-code=“analyst” ’ • what is their selectivity? 15-415 - C. Faloutsos
Selections – complex predicates • selectivity sel(P) of predicate P : == fraction of tuples that qualify sel(P) = SC(P) / nr 15-415 - C. Faloutsos
count A F grade Selections – complex predicates • eg., assume that V(grade, TAKES)=5 distinct values • simple predicate P: A=constant • sel(A=constant) = 1/V(A,r) • eg., sel(grade=‘B’) = 1/5 • (what if V(A,r) is unknown??) 15-415 - C. Faloutsos
count A F grade Selections – complex predicates • range query: sel( grade >= ‘C’) • sel(A>a) = (Amax – a) / (Amax – Amin) 15-415 - C. Faloutsos
count A F grade Selections - complex predicates • negation: sel( grade != ‘C’) • sel( not P) = 1 – sel(P) • (Observation: selectivity =~ probability) ‘P’ 15-415 - C. Faloutsos
P1 P2 Selections – complex predicates conjunction: • sel( grade = ‘C’ and course = ‘415’) • sel(P1 and P2) = sel(P1) * sel(P2) • INDEPENDENCE ASSUMPTION 15-415 - C. Faloutsos
P1 P2 Selections – complex predicates disjunction: • sel( grade = ‘C’ or course = ‘415’) • sel(P1 or P2) = sel(P1) + sel(P2) – sel(P1 and P2) • = sel(P1) + sel(P2) – sel(P1)*sel(P2) • INDEPENDENCE ASSUMPTION, again 15-415 - C. Faloutsos
P1 P2 Selections – complex predicates disjunction: in general sel(P1 or P2 or … Pn) = 1 - (1- sel(P1) ) * (1 - sel(P2) ) * … (1 - sel(Pn)) 15-415 - C. Faloutsos
Selections – summary • sel(A=constant) = 1/V(A,r) • sel( A>a) = (Amax – a) / (Amax – Amin) • sel(not P) = 1 – sel(P) • sel(P1 and P2) = sel(P1) * sel(P2) • sel(P1 or P2) = sel(P1) + sel(P2) – sel(P1)*sel(P2) • UNIFORMITY and INDEPENDENCE ASSUMPTIONS 15-415 - C. Faloutsos
Q-opt steps bring query in internal form (eg., parse tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans selections (simple; complex predicates) sorting; projections joins estimate cost; pick best 15-415 - C. Faloutsos
Sorting Assume br blocks of rel. ‘r’, and only M (<br) buffers in main memory Q1: how to sort (‘external sorting’)? Q2: cost? ‘r’ 1 1 2 ... ... M br 15-415 - C. Faloutsos
Sorting Q1: how to sort (‘external sorting’)? A1: create sorted runs of size M merge ‘r’ 1 1 2 ... ... M br 15-415 - C. Faloutsos
Sorting create sorted runs of size M (how many?) merge them (how?) M ... ... 15-415 - C. Faloutsos
Sorting create sorted runs of size M merge first M-1 runs into a sorted run of (M-1) *M, ... M ….. ... ... 15-415 - C. Faloutsos
Sorting How many steps we need to do? ‘i’, where M*(M-1)^i > br How many reads/writes per step? br+br M ….. ... ... 15-415 - C. Faloutsos
Sorting In short, excluding the final ‘write’, we need ceil(log(br/M) / log(M-1)) * 2 * br + br M ….. ... ... 15-415 - C. Faloutsos
Q-opt steps bring query in internal form (eg., parse tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans selections (simple; complex predicates) sorting; projections, aggregations joins estimate cost; pick best 15-415 - C. Faloutsos
Projection - dup. elimination eg., selectdistinct c-id from TAKES How? Pros and cons? 15-415 - C. Faloutsos
Set operations eg., select* from REGULAR-STUDENT union select * from SPECIAL-STUDENT How? Pros and cons? 15-415 - C. Faloutsos
Aggregations eg., selectssn, avg(grade) from TAKES group by ssn How? 15-415 - C. Faloutsos
Q-opt steps bring query in internal form (eg., parse tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans selections; sorting;projections, aggregations joins 2-way joins n-way joins estimate cost; pick best 15-415 - C. Faloutsos
2-way joins output size estimation: r JOIN s nr, ns tuples each case#1: cartesian product (R, S have no common attribute) #of output tuples=?? 15-415 - C. Faloutsos
2-way joins output size estimation: r JOIN s case#2: r(A,B), s(A,C,D), A is cand. key for ‘r’ #of output tuples=?? <=ns r(A, ...) s(A, ......) nr ns 15-415 - C. Faloutsos
2-way joins output size estimation: r JOIN s case#3: r(A,B), s(A,C,D), A is cand. key for neither (is it possible??) #of output tuples=?? r(A, ...) s(A, ......) nr ns 15-415 - C. Faloutsos
2-way joins #of output tuples~ nr * ns/V(A,s) or ns * nr/V(A,r) (whichever is less) r(A, ...) s(A, ......) nr ns 15-415 - C. Faloutsos
Q-opt steps bring query in internal form (eg., parse tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans selections; sorting;projections, aggregations joins 2-way joins - output size estimation; algorithms n-way joins estimate cost; pick best 15-415 - C. Faloutsos
2-way joins algorithm(s) for r JOIN s? nr, ns tuples each r(A, ...) s(A, ......) nr ns 15-415 - C. Faloutsos
2-way joins Algorithm #0: (naive) nested loop (SLOW!) for each tuple tr of r for each tuple ts of s print, if they match r(A, ...) s(A, ......) nr ns 15-415 - C. Faloutsos
2-way joins Algorithm #0: why is it bad? how many disk accesses (‘br’ and ‘bs’ are the number of blocks for ‘r’ and ‘s’)? r(A, ...) s(A, ......) nr ns br + nr*bs 15-415 - C. Faloutsos
2-way joins Algorithm #1: Blocked nested-loop join read in a block of r read in a block of s print matching tuples cost: br + br * bs r(A, ...) s(A, ......) nr, br ns records, bs blocks 15-415 - C. Faloutsos
2-way joins Arithmetic example: nr = 10,000 tuples, br = 1,000 blocks ns = 1,000 tuples, bs = 200 blocks alg#0: 2,001,000 d.a. alg#1: 201,000 d.a. r(A, ...) s(A, ......) 10,000 1,000 1,000 records, 200 blocks 15-415 - C. Faloutsos
2-way joins Observation1: Algo#1: asymmetric: cost: br + br * bs- reverse roles: cost= bs + bs*br Best choice? smallest relation in outer loop r(A, ...) s(A, ......) nr, br ns records, bs blocks 15-415 - C. Faloutsos
2-way joins Observation2 [NOT IN BOOK]: what if we have ‘k’ buffers available? read in ‘k-1’ blocks of ‘r’ read in a block of ‘s’ print matching tuples r(A, ...) s(A, ......) nr, br ns records, bs blocks 15-415 - C. Faloutsos
2-way joins Cost? read in ‘k-1’ blocks of ‘r’ read in a block of ‘s’ print matching tuples br + br/(k-1) * bs • what if br=k-1? • what if we assign k-1 blocks to inner?) r(A, ...) s(A, ......) nr, br ns records, bs blocks 15-415 - C. Faloutsos
2-way joins Observation3: can we get rid of the ‘br’ term? cost: br + br * bs A: read the inner relation backwards half of the times! Q: cons? r(A, ...) s(A, ......) nr, br ns records, bs blocks 15-415 - C. Faloutsos
2-way joins Other algorithm(s) for r JOIN s? nr, ns tuples each r(A, ...) s(A, ......) nr ns 15-415 - C. Faloutsos
2-way joins - other algo’s sort-merge sort ‘r’; sort ‘s’; merge sorted versions (good, if one or both are already sorted) r(A, ...) s(A, ......) nr ns 15-415 - C. Faloutsos
2-way joins - other algo’s sort-merge - cost: ~ 2* br * log(br) + 2* bs * log(bs) + br + bs needs temporary space (for sorted versions) gives output in sorted order r(A, ...) s(A, ......) nr ns 15-415 - C. Faloutsos
use an existing index, or even build one on the fly cost: br + nr * c (c: look-up cost) 2-way joins - other algo’s s(A, ......) ns r(A, ...) nr 15-415 - C. Faloutsos
hash join: hash ‘r’ into (0, 1, ..., ‘max’) buckets hash ‘s’ into buckets (same hash function) join each pair of matching buckets 2-way joins - other algo’s s(A, ......) r(A, ...) 0 1 max 15-415 - C. Faloutsos
how to join each pair of partitions Hr-i, Hs-i ? A: build another hash table for Hs-i, and probe it with each tuple of Hr-i 2-way joins - hash join details s(A, ......) Hs-0 Hr-0 r(A, ...) 0 1 max 15-415 - C. Faloutsos
what if Hs-i is too large to fit in main-memory? A: recursive partitioning more details (overflows, hybrid hash joins): in book cost of hash join? (under certain assumptions) 3(br + bs) + 2* max 2-way joins - hash join details 15-415 - C. Faloutsos
Q-opt steps bring query in internal form (eg., parse tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans selections; sorting;projections, aggregations joins 2-way joins - output size estimation;algorithms n-way joins estimate cost; pick best 15-415 - C. Faloutsos
r1 JOIN r2 JOIN ... JOIN rn typically, break problem into 2-way joins n-way joins 15-415 - C. Faloutsos