CS 440 Database Management Systems Query Execution
Queries • SELECT … FROM … WHERE … • Describes what data items we want, not how to get them • A query is a logical query plan • No implementation! • The RDBMS converts it to a physical query plan • It describes how to execute the logical query plan.
Logical Query Plan • Logical Operators • Relational Algebra • Selection, Projection, Join, Union, … • Limited number of operators
Physical Query Plan • Physical operators • Algorithms that implement logical operators. • Logical operator: Join • Hash join, index join, nested loop join, … • Several physical operators for each logical operator
Physical Query Plan • Some physical operators implement operations that are only implicit in the logical plan • Table scan • Let's study the physical operators to find the most efficient one for a given setting.
Basic Physical Operator: Scan • Read all tuples of relation R, or only those that satisfy some predicate. • Table-scan: R resides in secondary storage; read its blocks one by one. • Index-scan: if there is an index on R, use it to find the relevant blocks. • More efficient when only tuples satisfying some predicate are needed
Physical Operators: Making Choices • Logical query plan: what we want. • Physical query plan: how to get it done. • More than one physical plan for each logical plan • How to pick the right one? • We study the costs of physical operators.
Cost Statistics • I/O access is the dominant cost of a physical operator. • B(R): number of blocks of R. • T(R): number of tuples in R. • M: number of blocks that fit in main memory • V(R, A): number of distinct values in attribute A. • Table-scan: B(R)
Sort-Scan • SELECT … FROM … WHERE … ORDER BY • SELECT * FROM Beers ORDER BY name • Also used inside other physical operators • R has an index on the sorting attribute? • R fits in main memory? • R is too large and does not have an index? • Two-pass, multi-way merge sort algorithm
Divide and Conquer • Divide the data physically into smaller chunks that fit in memory • Sort each chunk in memory • Merge the sorted chunks
Two Pass, Multi-way Merge Sort • Phase 1: Read R in groups of M blocks, sort each group, and write it to disk as a run of size M. • (Diagram: M blocks of main memory used to sort each group of M blocks read from disk and write it back as a sorted run.)
Two Pass, Multi-way Merge Sort • Phase 2: Merge M – 1 runs at a time and write the results to disk. • Read one block from each run. • Keep one block for the output. • (Diagram: M – 1 input buffers, one per run, plus one output buffer in main memory.)
Two Pass, Multi-way Merge Sort • Requirement: B(R) <= M(M – 1), or simply B(R) <= M²
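Below is a minimal Python sketch of the two phases, not the textbook's code: each disk block is modeled as a list of tuples, "writing a run to disk" is just keeping a sorted list, and heapq.merge stands in for the (M – 1)-way merge that reads one block of each run at a time.

```python
import heapq

def two_pass_merge_sort(blocks, M, key=lambda t: t):
    """Two-pass multi-way merge sort sketch.
    blocks: the relation R as a list of disk blocks (lists of tuples).
    M: number of blocks that fit in main memory."""
    # Phase 1: read R in groups of M blocks, sort each group in memory,
    # and "write" it back as a sorted run.
    runs = []
    for i in range(0, len(blocks), M):
        group = [t for blk in blocks[i:i + M] for t in blk]
        runs.append(sorted(group, key=key))
    assert len(runs) <= M - 1, "two passes need B(R) <= M(M - 1)"

    # Phase 2: merge all runs, conceptually keeping one block per run
    # plus one output block in memory; heapq.merge models the (M - 1)-way merge.
    yield from heapq.merge(*runs, key=key)

# Example: six one-tuple "blocks", M = 3 -> two sorted runs, then one merge pass.
blocks = [[(5,)], [(2,)], [(9,)], [(1,)], [(7,)], [(3,)]]
print(list(two_pass_merge_sort(blocks, M=3)))
```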
Cost of Sort • 2B(R) in the first pass (read R and write the sorted runs) + B(R) in the second pass (read the runs back). • We do not count the cost of writing the final output of a physical operator. • The result may be pipelined and never written to disk. • If the output is written to disk: 4B(R).
Cost of Sort • Review: • Table-scan: B(R) • Sort-scan: 3B(R) (if R does not have an index). • Requirement: the relation is clustered, i.e., R is packed into B(R) blocks. • If R is unclustered, its tuples are mixed with tuples of other relations: • Table-scan: T(R). • Sort-scan: T(R) + 2B(R).
Iterator Model • Each physical operator implements three functions: • Open: starts the operation and initializes its data structures. • GetNext: returns the next tuple of the result. • Close: ends the operation and frees its resources. • It enables pipelining. • The alternative, computing the full result of each operator and storing it on disk or in memory, is inefficient. • Pipelining is not always possible: a sort-scan must consume its whole input before producing output. • A sketch of the interface follows.
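A minimal sketch of the iterator interface, assuming operators exchange Python tuples and signal end-of-stream with None; the class and method names (TableScan, Selection, open/get_next/close) are illustrative, not any specific system's API.

```python
class TableScan:
    """Leaf operator: iterates over the tuples of a stored relation."""
    def __init__(self, blocks):
        self.blocks = blocks
    def open(self):
        self.it = (t for blk in self.blocks for t in blk)
    def get_next(self):
        return next(self.it, None)          # None signals end of stream
    def close(self):
        self.it = None

class Selection:
    """Pipelined selection: pulls tuples from its child one at a time."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def open(self):
        self.child.open()
    def get_next(self):
        while (t := self.child.get_next()) is not None:
            if self.predicate(t):
                return t
        return None
    def close(self):
        self.child.close()

# Usage: select tuples whose second field exceeds 3, without materializing anything.
plan = Selection(TableScan([[(1, 2), (3, 4)], [(5, 6)]]), lambda t: t[1] > 3)
plan.open()
while (t := plan.get_next()) is not None:
    print(t)
plan.close()
```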
Operator Categorization • Based on the access method used: • Sorting-based. • Index-based. • Hash-based. • Based on the number of I/O passes: • One pass: read the blocks only once; used when the entire relation fits in main memory. • Two pass: the relations may not fit in main memory but are not extremely large, e.g., two-pass merge sort. • Multi-pass: extremely large relations.
One-pass Algorithms • Selection σ(R) and projection π(R). • Both can be pipelined. • We need only one block in main memory: M >= 1 • Cost for both is B(R) • No index on the selection attribute • R is clustered on disk. • (Diagram: tuples flow from an input buffer through selection/projection to an output buffer.)
Duplicate Elimination: δ(R) • COUNT(DISTINCT …) • Keep a list of the distinct tuples seen so far in main memory • Main-memory hash table, search tree, … • Requirement: M >= B(δ(R)) • To plan ahead, we must estimate the number of distinct tuples in R effectively. • Efficient (probabilistic) estimation algorithms are available. • Cost: B(R), with a clustered relation and no index.
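A sketch of the one-pass algorithm under the same modeling assumption as before (a relation is a list of blocks, each a list of tuples); the in-memory hash table is simply a Python set.

```python
def one_pass_distinct(blocks):
    """One-pass duplicate elimination: keep every distinct tuple seen so far
    in an in-memory structure. Works only if the distinct tuples fit in
    main memory; the I/O cost is one scan of R, i.e. B(R)."""
    seen = set()
    for blk in blocks:          # read R block by block
        for t in blk:
            if t not in seen:
                seen.add(t)
                yield t         # output can be pipelined to the next operator
```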
Grouping and aggregation: γA, sum(B)(R) • SELECT sum(price) FROM Sell GROUP BY beer • Keep a list of the distinct beers and their sum(price) values in main memory. • The number of distinct beers must not exceed what fits in main memory. • Cost: B(R) • Can we use pipelining?
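The same idea sketched for grouping, with a dictionary keyed by the grouping attribute; the column positions (group_idx, value_idx) are illustrative. Note that nothing can be emitted until the whole input has been read, which answers the pipelining question: the input can be consumed tuple by tuple, but the output cannot be pipelined.

```python
from collections import defaultdict

def one_pass_group_sum(blocks, group_idx, value_idx):
    """One-pass grouping and aggregation: one running sum per group in memory.
    Requires all groups to fit in main memory; I/O cost is one scan, B(R)."""
    sums = defaultdict(int)
    for blk in blocks:
        for t in blk:
            sums[t[group_idx]] += t[value_idx]
    return dict(sums)           # produced only after the entire input is read

# e.g. SELECT sum(price) FROM Sell GROUP BY beer, with tuples (bar, beer, price):
# one_pass_group_sum(sell_blocks, group_idx=1, value_idx=2)
```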
One-pass Algorithms for Set/Bag Operators • R ∩ S, R ∪ S, R – S • Bag union? • Just read and combine the tuples. • We need at least one buffer block in main memory. • Cost: B(R) + B(S)
General One-pass Algorithm for Set Operators • Read the smaller relation (R) into main memory. • Build a search data structure over the tuples of R in main memory: hash table, … • Read the blocks of S and probe each of its tuples against R's tuples. • Requirement: M >= B(R). • Cost: B(R) + B(S).
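A sketch of this build-and-probe scheme for one concrete case, set intersection; R is assumed to be the smaller relation and its in-memory table is a Python set.

```python
def one_pass_intersection(r_blocks, s_blocks):
    """General one-pass scheme for binary set operators, shown for R ∩ S:
    build an in-memory table over the smaller relation R (needs M >= B(R)),
    then stream S past it. I/O cost: B(R) + B(S)."""
    r_tuples = {t for blk in r_blocks for t in blk}   # build phase over R
    for blk in s_blocks:                              # probe phase over S
        for t in blk:
            if t in r_tuples:
                r_tuples.discard(t)    # emit each common tuple exactly once
                yield t
```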
One-pass Algorithms for Set Operators • Set union? • Set intersection? • Set difference? • Join, product, bag difference, bag intersection • See the textbook
Nested Loop Join: One-and-a-half-pass Algorithm • Tuple-based nested loop join R ⋈ S: For each tuple r in R, for each tuple s in S, if r and s match, output (r, s). • R: outer relation, S: inner relation. • We need to fit at least two blocks in main memory. • Cost: T(R)T(S) • Can be pipelined using the iterator model.
Block-based Nested Loop Join • For each group of (M – 1) blocks b of S, for each block a of R, for each tuple s in b, for each tuple r in a: if r and s match, output (r, s). • Cost: • Read B(S) blocks from S. • Read B(S)B(R) / (M – 1) blocks from R. • S should be the smaller relation.
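A sketch of the block-based loop, again with blocks as lists of tuples; r_key and s_key are illustrative functions that extract the join attribute, and joined tuples are simply concatenated.

```python
def block_nested_loop_join(r_blocks, s_blocks, r_key, s_key, M):
    """Block-based nested loop join: load M - 1 blocks of S (ideally the
    smaller relation) at a time, then scan all of R against that chunk.
    I/O cost: B(S) + B(S) * B(R) / (M - 1) block reads."""
    chunk = M - 1
    for i in range(0, len(s_blocks), chunk):
        # M - 1 blocks of S held in main memory.
        s_chunk = [s for blk in s_blocks[i:i + chunk] for s in blk]
        for r_blk in r_blocks:                  # one full pass over R per chunk
            for r in r_blk:
                for s in s_chunk:
                    if r_key(r) == s_key(s):
                        yield r + s             # concatenate matching tuples
```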
Two-pass Algorithms • We study them according to their access method: • Sorting-based algorithms. • Index-based algorithms. • Hash-based algorithms.
Sorting-based Algorithms: Duplicate Elimination δ(R) • Use the two-pass, multi-way merge sort method • Stage 1: Read R in groups of M blocks, sort them, and write the sorted runs to disk. • Stage 2: Merge M – 1 runs at a time and output only the distinct tuples. • Cost: 3B(R) • Requirement: B(R) <= M² • For the one-pass algorithm: cost = B(R), requirement B(R) <= M.
Sorting-based Algorithms: Grouping γA, sum(B)(R) • Similar to duplicate elimination. • Stage 1: Sort runs of M blocks, computing the sum for tuples of the same group within each run. • Stage 2: Merge M – 1 runs at a time and update the sums. • Cost: 3B(R) • Requirement: B(R) <= M²
Sorting-based Algorithms: Set Operators • R ∩ S, R ∪ S, R – S • Stage 1: Divide R and S into runs of size M and sort them. • Stage 2: Merge the n runs of R and the m runs of S and output the results, where n + m <= M • Cost: 3B(R) + 3B(S) • Requirement: B(R) + B(S) <= M²
Two-pass Join Algorithm based on Sorting • One-pass join algorithm: min(B(R), B(S)) < M • Read and keep the smaller relation in main memory, then go over the larger relation block by block. • Cost: B(R) + B(S) • Nested loop join algorithm: min(B(R), B(S)) > M • Read M – 1 blocks of the smaller relation into main memory, then read the other relation block by block. • Cost: B(R)B(S)/M • Too slow.
Two-pass Join Algorithm based on Sorting • Stage 1: Sort R and S on the join attribute. • Cost: 4B(R) + 4B(S), why? • Stage 2: Read R and S back in sorted order and join the matching tuples. • Cost: B(R) + B(S) • What if more than M tuples match on the join attribute? • We have to fall back to the nested loop join algorithm • Total cost: 5B(R) + 5B(S) • Requirements: B(R) <= M², B(S) <= M²
Sort-merge Join Algorithm • We would like to avoid the nested loop join when the number of matching tuples is more than M. • Stage 1: Sort R and S into (overall) at most M runs of size M on disk. • Stage 2: Merge the runs and join the matching tuples. • Cost: 3B(R) + 3B(S) • Requirement: B(R) + B(S) <= M² • More efficient, but with a stricter requirement.
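A sketch of the merge-and-join step only, with R and S given as in-memory sorted lists that stand in for the sorted runs read back from disk; it handles several tuples with the same key on both sides by outputting their cross product, which is where the more-than-M-matching-tuples caveat of the previous slide comes from.

```python
def merge_join(R, S, r_key, s_key):
    """Merge step of sort-merge join; R and S must already be sorted on the key."""
    i = j = 0
    while i < len(R) and j < len(S):
        rk, sk = r_key(R[i]), s_key(S[j])
        if rk < sk:
            i += 1
        elif rk > sk:
            j += 1
        else:
            # Gather all R- and S-tuples with this key and join them pairwise.
            i2, j2 = i, j
            while i2 < len(R) and r_key(R[i2]) == rk:
                i2 += 1
            while j2 < len(S) and s_key(S[j2]) == rk:
                j2 += 1
            for r in R[i:i2]:
                for s in S[j:j2]:
                    yield r + s
            i, j = i2, j2
```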
Selection on equality: σA=a(R) • Clustered index on the selection attribute: • Cost: B(R) / V(R,A), where V(R,A) is the number of distinct values of A. • Unclustered index on the selection attribute: • Cost: T(R) / V(R,A).
Selection on equality: σA=a(R) • B(R) is 5000, T(R) is 200,000, and V(R,A) is 10: • Index-based selection: • Clustered index on A: B(R) / V(R,A) = 500. • Unclustered index on A: T(R) / V(R,A) = 20,000. • Un-indexed selection: • If R is clustered: B(R) = 5000 • If R is unclustered: T(R) = 200,000. • Observation: for small values of V(R,A), an unclustered index is not helpful.
Index-based Join: R ⋈ S • R has an index on the join attribute. • Read S; for each tuple of S, look up the matching tuple(s) in R. • If S is clustered: • Clustered index on R: B(S) + T(S)B(R) / V(R,A). • Unclustered index on R: B(S) + T(S)T(R) / V(R,A). • Not efficient? • If S is small, we do not need to examine all tuples of R: more efficient than the previous algorithms. • If V(R,A) is very large: efficient
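A sketch of the lookup logic only, with the index on R modeled as an in-memory dict from join-attribute value to matching R-tuples; this captures the control flow, while the slide's formulas account for the disk reads that each real index lookup costs.

```python
from collections import defaultdict

def build_r_index(r_blocks, r_key):
    """Model an index on R's join attribute: value -> list of matching tuples."""
    index = defaultdict(list)
    for blk in r_blocks:
        for r in blk:
            index[r_key(r)].append(r)
    return index

def index_join(s_blocks, r_index, s_key):
    """Index-based join: scan S once; for each S-tuple, fetch the matching
    R-tuples through the index instead of scanning R."""
    for blk in s_blocks:                            # B(S) block reads
        for s in blk:
            for r in r_index.get(s_key(s), []):     # index lookup into R
                yield r + s
```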
Index-based Join: R ⋈ S • Both R and S have clustered B+ tree indexes on the join attribute. • Best case! • Read the blocks of R and S and merge them. • Merge join, zig-zag join • Cost: B(R) + B(S)
Hash-based Algorithms: Unary Operators • Partition R into M - 1 buckets of almost equal size using some hashing function. • Store the buckets on disk • All tuples that need to be processed together are in the same bucket. • Duplicate elimination, …. • Load R bucket by bucket in main memory and generate the output.
Hash-based Algorithms: Unary Operators • (Diagram: blocks of R are read into an input buffer; a hash function distributes the tuples over M – 1 bucket buffers, which are written back to disk.) • Size of each bucket is roughly B(R) / (M – 1) • Does each bucket fit in main memory? • B(R) / (M – 1) <= M, roughly B(R) <= M²
Hash-based Algorithm for Grouping γ • Stage 1: Partition R into M – 1 buckets • Use the grouping attribute as the hash key. • Stage 2: Apply grouping to each bucket • Load each bucket into main memory. • All tuples of the same group belong to the same bucket. • Cost: 3B(R). • Requirement: B(R) <= M² • Similar to the sort-based algorithm.
Hash-based Algorithms: Binary Operators • Partition R and S into M – 1 buckets each, of almost equal sizes, using the same hash function. • All tuples that need to be processed together end up in buckets with the same hash value. • Difference, intersection, … • Examine pairs of buckets from R and S • Load the smaller bucket into main memory and scan the larger bucket.
Hash-based Algorithm for Join • Stage 1: Partition R and S into M – 1 buckets each. • The hash key is the join attribute. • Store the buckets on disk. • Stage 2: Join the tuples in each pair of buckets with the same hash value. • Apply a one-pass join algorithm. • Cost: 3B(R) + 3B(S). • The smaller bucket must fit in main memory. • Requirement: min(B(R), B(S)) <= M²
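A sketch of both stages, with the bucket files kept in memory as Python lists; Python's built-in hash() on the join-attribute value stands in for the partitioning hash function, and stage 2 reuses the one-pass build-and-probe idea on each bucket pair.

```python
def hash_join(r_blocks, s_blocks, r_key, s_key, M):
    """Two-pass hash join sketch.
    Stage 1: partition R and S on the join attribute into M - 1 buckets each.
    Stage 2: one-pass join of every pair of buckets with the same hash value."""
    def partition(blocks, key):
        buckets = [[] for _ in range(M - 1)]
        for blk in blocks:
            for t in blk:
                buckets[hash(key(t)) % (M - 1)].append(t)   # "write" to bucket file
        return buckets

    r_buckets = partition(r_blocks, r_key)
    s_buckets = partition(s_blocks, s_key)
    for r_bucket, s_bucket in zip(r_buckets, s_buckets):
        # One-pass join: build on the (hopefully memory-sized) R-bucket, probe with S.
        table = {}
        for r in r_bucket:
            table.setdefault(r_key(r), []).append(r)
        for s in s_bucket:
            for r in table.get(s_key(s), []):
                yield r + s
```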
Comparing Hash-based and Sort-based Algorithms • Hash-based algorithms need less main memory for binary operators • Hash-based: min(B(R), B(S)) <= M² • Sort-based: B(R) + B(S) <= M² • Sort-based algorithms generate sorted results • Useful when the query has an ORDER BY or the following operators need sorted input • It is not easy to generate balanced buckets in hash-based algorithms • More detailed discussion in Section 15.5.7.