Semijoin Reduction in Query Processors
Stocker, Kossmann, Braumandl, Kemper. Integrating Semi-Join-Reducers into State-of-the-Art Query Processors. ICDE 2001.
Introduction
• Semijoin operator: B SJ A keeps the tuples of B that have a join partner in A
• Reduces the size of B, which can reduce the cost of other operations
[Figure: relations labeled A, B, and A’]
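To make the operator concrete, here is a minimal sketch of B SJ A over in-memory rows. The function name `semijoin`, the dictionary row layout, and the sample data are illustrative assumptions, not something taken from the slides or the paper.

```python
def semijoin(b_rows, a_rows, key):
    """Return the rows of b_rows that have at least one join partner in a_rows.

    Only columns of B survive; A acts purely as a filter (a "reducer").
    """
    a_keys = {row[key] for row in a_rows}          # distinct join values of A
    return [row for row in b_rows if row[key] in a_keys]


# Hypothetical sample data
A = [{"id": 1}, {"id": 3}]
B = [
    {"id": 1, "val": "x"},
    {"id": 2, "val": "y"},
    {"id": 3, "val": "z"},
    {"id": 3, "val": "w"},
]

reduced_b = semijoin(B, A, "id")   # drops the row with id == 2
```

The result never has more rows than B, which is why a semijoin can shrink the input of later, more expensive operators.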
Agenda
• Usefulness of semijoin reducers
• Integration of semijoin reducers in a dynamic programming optimizer
• Performance experiments
• Variants for complex queries
• Discussion
Distributed System
• Reducing inter-site communication is the traditional application of semijoins
• In a symmetric system, semijoins are useful when
  • Redundant semijoins are cheap
  • They cause a significant reduction of intermediate results
• In a centralized system, semijoins can help reduce disk I/O
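The communication argument can be checked with a back-of-the-envelope calculation. The sketch below compares shipping all of B to A's site against first shipping A's join keys to B's site and then shipping only the reduced B. The cost formulas are a simplified textbook model, and every number is made up for illustration.

```python
def bytes_ship_whole(b_rows, b_row_bytes):
    """Ship every row of B to the site that holds A."""
    return b_rows * b_row_bytes


def bytes_ship_with_semijoin(a_keys, key_bytes, b_rows, b_row_bytes, selectivity):
    """Ship A's join keys to B's site, then ship only the matching rows of B.

    selectivity is the assumed fraction of B rows that survive B SJ A.
    """
    send_keys = a_keys * key_bytes
    send_reduced_b = round(b_rows * selectivity) * b_row_bytes
    return send_keys + send_reduced_b


# Hypothetical sizes: 100,000 rows of B at 200 bytes each, 1,000 keys of 8 bytes
whole = bytes_ship_whole(100_000, 200)
selective = bytes_ship_with_semijoin(1_000, 8, 100_000, 200, selectivity=0.05)
useless = bytes_ship_with_semijoin(1_000, 8, 100_000, 200, selectivity=1.0)
```

With a selective reducer the semijoin plan ships far less data. When the semijoin filters nothing, it only adds the cost of shipping the keys, which is why the reduction has to be significant for the extra step to pay off.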
Client Server System
• Semijoins lead to load balancing effects
• Clients can communicate with servers, but servers cannot communicate with each other
• If C can be reduced, response time and communication cost both go down
[Figure: two query plans executed at the client, one using a semijoin (SJ); node labels C(2), B(1), A(1) and C(2), A(2), A(1), B(1)]
Search Space
• A join predicate can be applied as
  • A full join
  • A semijoin: a generalized selection on A
• Controlled application of semijoins
  • Avoid plans like ((A SJ B) SJ B)
  • Redundant plans have no new predicates applied at a semijoin
  • Allow only join operations at a node which apply predicates not yet applied in a subtree
• The predicate space is three times as big as with joins alone
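One possible reading of the redundancy rule, restricted to semijoins, is sketched below: each plan remembers which predicates have been applied anywhere in its tree, and a semijoin is generated only if it applies a predicate that is new to both subtrees. The `Plan` class, the string form of predicates, and the decision to leave joins out are assumptions made for this illustration; the paper's bookkeeping may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    expr: str            # printable form of the plan
    tables: frozenset    # tables whose columns the plan produces
    applied: frozenset   # predicates applied anywhere in the plan tree


def scan(table):
    """Base table access plan: no predicate applied yet."""
    return Plan(table, frozenset([table]), frozenset())


def semijoin(left, right, predicate):
    """Build (left SJ right), or return None if the plan would be redundant."""
    if predicate in left.applied | right.applied:
        return None      # no new predicate applied at this semijoin
    applied = left.applied | right.applied | {predicate}
    # A semijoin only filters: the result still has just the columns of left.
    return Plan(f"({left.expr} SJ {right.expr})", left.tables, applied)


a, b = scan("A"), scan("B")
a_reduced = semijoin(a, b, "A.x = B.x")              # (A SJ B) is allowed
redundant = semijoin(a_reduced, b, "A.x = B.x")      # ((A SJ B) SJ B) is rejected
```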
Dynamic Programming
• Used in most commercial optimizers
• Bottom-up optimization technique
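For reference, here is a compact sketch of the textbook bottom-up scheme for join ordering, without semijoins. The cost model (sum of estimated intermediate result sizes), the dictionary inputs, and the function name `optimize` are assumptions for this example, not details from the slides.

```python
from itertools import combinations


def optimize(cards, selectivity):
    """Bottom-up dynamic programming over subsets of tables.

    cards:       dict mapping table name -> cardinality
    selectivity: dict mapping frozenset({a, b}) -> join selectivity
                 (pairs without an entry are treated as cross products)
    Returns (cost, estimated cardinality, plan expression) for all tables.
    """
    tables = sorted(cards)
    # Plans for subsets of size 1: plain table accesses with zero join cost.
    best = {frozenset([t]): (0.0, float(cards[t]), t) for t in tables}

    for size in range(2, len(tables) + 1):
        for subset in combinations(tables, size):
            s = frozenset(subset)
            # Try every split of S into a proper subset O and the rest S - O.
            for k in range(1, size):
                for left in combinations(subset, k):
                    o = frozenset(left)
                    rest = s - o
                    l_cost, l_card, l_expr = best[o]
                    r_cost, r_card, r_expr = best[rest]
                    sel = 1.0
                    for a in o:
                        for b in rest:
                            sel *= selectivity.get(frozenset((a, b)), 1.0)
                    card = l_card * r_card * sel
                    cost = l_cost + r_cost + card
                    # Pruning: keep only the cheapest plan per subset.
                    if s not in best or cost < best[s][0]:
                        best[s] = (cost, card, f"({l_expr} JOIN {r_expr})")
    return best[frozenset(tables)]


# Hypothetical statistics for a three-table chain query A - B - C
cards = {"A": 1000, "B": 10, "C": 100}
selectivity = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.5}
cost, card, expr = optimize(cards, selectivity)
```

Each subset is optimized once and reused for all larger subsets, which is the property a semijoin-aware optimizer has to preserve when it adds more plan alternatives.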
Access Root Algorithm
• Implements the conventional approach to semijoins
• Semijoins are used to reduce base tables
• Very easily integrated with existing systems
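Reducing a base table before the join does not change the join result, because the semijoin only removes rows that would find no partner anyway. The sketch below checks this on a small example; the helper names and the sample rows are hypothetical.

```python
def semijoin(left, right, key):
    """Rows of left that have a join partner in right."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] in right_keys]


def join(left, right, key):
    """Naive nested-loop equi-join on a shared column name."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


# Hypothetical base tables
A = [{"k": 1, "a": "a1"}, {"k": 2, "a": "a2"}, {"k": 4, "a": "a4"}]
B = [{"k": 1, "b": "b1"}, {"k": 4, "b": "b4"}, {"k": 5, "b": "b5"}]

plain = join(A, B, "k")                        # A JOIN B
reduced_first = join(semijoin(A, B, "k"), B, "k")   # (A SJ B) JOIN B
```

Both plans return the same rows, but the second one feeds a smaller A into the join.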
Extension needed
• With Access Root, every table appears at most once in a reducing plan
• This misses plans like (A SJ B) SJ (C SJ B)
• Intermediate join results are usually large and benefit most from reduction
• This motivates the Join Root Algorithm
Join Root Algorithm
• Complete search space of non-redundant semijoin plans
• Semijoins are applied at all query plan levels
• Plans with multiple occurrences of tables
• Semijoin and join generation integrated into one phase
Join Root Algorithm
• Steps:
  • Generate the initial set of base table access plans and include them in […] for all i
  • As in the dynamic programming algorithm, optimize all subsets of size i and use them for size i+1
  • Consider a subset S of size i
  • Consider a proper subset O of S. In the standard dynamic programming algorithm we perform:
[Figure: join plans for S built from plans for O and S-O]
Join Root Algorithm
• In the join root algorithm we perform:
[Figure: plan trees including a semijoin (SJ); node labels P, S-O, O-P, O]
• Note that these plans are stored as plans for S
Join Root Algorithm
• However, since a semijoin is only a reducer, the plan should be a plan for O
• Thus plans are rearranged into their actual semantic categories
• After that, plans are pruned as before
• After the rearrangement, several plans containing semijoins may be incomplete
• They are completed by applying joins using a fixpoint iteration scheme
Qualitative Aspects
• Join graph topology: a very important factor
  • More predicates make Join Root advantageous
• Allocation schema: useful mainly for distributed systems with replication
• Query complexity: the running time of Join Root suffers with a large number of relations
• Network topology:
  • Lower bandwidth availability and network restrictions (as in a client server system) increase the relative benefits
Quality of Plans
• 5-way join query in a client server environment with two servers
[Figure: plan quality chart; axis labels Avg, SC]
Heuristics
• Best k variants
• Use of base tables only as reducers
• Limit the number of fixpoint iterations
• These heuristics improve running time considerably without affecting plan quality in most cases