
Semijoin Reduction in Query Processors

Stocker, Kossmann, Braumandl, Kemper. Integrating Semi-Join-Reducers into State-of-the-Art Query Processors. ICDE 2001.



  1. Semijoin Reduction in Query Processors
     Stocker, Kossmann, Braumandl, Kemper. Integrating Semi-Join-Reducers into State-of-the-Art Query Processors. ICDE 2001.

  2. Introduction
     • Semijoin operator: B SJ A keeps only the tuples of B that have a join partner in A
     • Reduces the size of B, which can reduce the cost of other operations
     [Figure: labels A, B, A']
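A minimal sketch of the semijoin operator on in-memory rows may make the definition concrete. The function name `semijoin`, the dictionary row layout, and the sample data are assumptions made for illustration; they are not taken from the slides.

```python
def semijoin(b_rows, a_rows, key):
    """Return the rows of b_rows that have at least one join partner in a_rows.

    Only columns of B survive; A acts purely as a filter (a reducer).
    """
    a_keys = {row[key] for row in a_rows}
    return [row for row in b_rows if row[key] in a_keys]


# Hypothetical sample tables.
A = [{"id": 1, "x": "a1"}, {"id": 3, "x": "a3"}]
B = [
    {"id": 1, "y": "b1"},
    {"id": 2, "y": "b2"},
    {"id": 3, "y": "b3"},
    {"id": 3, "y": "b4"},
]

reduced_B = semijoin(B, A, "id")
print(reduced_B)
```

Unlike a full join, the result never has more rows than B and never gains columns from A, which is why it can be treated as a generalized selection.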

  3. Agenda
     • Usefulness of semijoin reducers
     • Integration of semijoin reducers in a dynamic programming optimizer
     • Performance experiments
     • Variants for complex queries
     • Discussion

  4. Distributed System
     • Reducing inter-site communication is the traditional application of semijoins
     • In a symmetric system, semijoins are useful when
         • Redundant semijoins are cheap
         • They cause a significant reduction of intermediate results
     • In a centralized system, they can help reduce disk I/O
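The trade-off behind the traditional distributed use can be illustrated with a toy shipping-cost model. Everything here is an assumption for illustration: the byte sizes, the helper names `cost_ship_whole` and `cost_with_semijoin`, and the simplification that cost is just the number of bytes sent between two sites.

```python
def cost_ship_whole(b_rows, row_bytes):
    """Bytes sent if all of B is shipped to the site holding A."""
    return len(b_rows) * row_bytes


def cost_with_semijoin(a_keys, b_rows, key, key_bytes, row_bytes):
    """Bytes sent if A's distinct join keys are shipped to B's site first,
    B is reduced there, and only the reduced B is shipped back."""
    distinct_keys = set(a_keys)
    reduced = [row for row in b_rows if row[key] in distinct_keys]
    return len(distinct_keys) * key_bytes + len(reduced) * row_bytes


# Hypothetical table B: 1000 rows, 100 distinct key values.
B = [{"k": i % 100, "payload": i} for i in range(1000)]

plain = cost_ship_whole(B, row_bytes=100)
selective = cost_with_semijoin(range(5), B, "k", key_bytes=8, row_bytes=100)
unselective = cost_with_semijoin(range(100), B, "k", key_bytes=8, row_bytes=100)
print(plain, selective, unselective)
```

In this model the reducer pays off only when it removes many rows: with a selective key set far fewer bytes are shipped, while a key set that matches every row of B adds the cost of shipping the keys on top of shipping all of B.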

  5. Client Server System
     • Semijoins can lead to load balancing effects
     • Clients can communicate with servers, but servers cannot communicate with each other
     • If C can be reduced, response time and communication cost are both reduced
     [Figure: two plans rooted at the client; labels SJ, A(1), A(2), B(1), C(2)]

  6. Search Space
     • Join predicates can be applied as:
         • A full join
         • A semijoin: a generalized selection on A
     • Controlled application of semijoins
         • Avoid plans like ((A SJ B) SJ B)
         • Redundant plans have no new predicates applied at a semijoin
         • Allow only join operations at a node which apply predicates not yet applied in a subtree
     • Predicate space is three times as big as with joins alone
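One way to read the bullets above is that each join predicate can be used in three ways (a full join, or a semijoin in either direction), and that an operator is redundant when its predicate was already applied in the subtree below it. The sketch below follows that reading; the tuple encoding of operators and the names `alternatives` and `is_redundant` are assumptions for illustration, not the paper's data structures.

```python
def alternatives(predicate):
    """Enumerate the three ways to apply a binary join predicate (left, right)."""
    left, right = predicate
    return [
        ("JOIN", left, right),  # full join
        ("SJ", left, right),    # left reduced by right
        ("SJ", right, left),    # right reduced by left
    ]


def is_redundant(applied_in_subtree, predicate):
    """A join or semijoin adds nothing if its predicate was already applied below."""
    return frozenset(predicate) in applied_in_subtree


pred_ab = ("A", "B")
pred_bc = ("B", "C")

# The subtree (A SJ B) has already applied the A-B predicate.
inner_applied = {frozenset(pred_ab)}

print(alternatives(pred_ab))
print(is_redundant(inner_applied, pred_ab))  # ((A SJ B) SJ B) would be rejected
print(is_redundant(inner_applied, pred_bc))  # a new predicate is allowed
```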

  7. Dynamic Programming
     • Used in most commercial optimizers
     • Bottom-up optimization technique

  8. Dynamic Programming
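The slide's own listing did not survive in the transcript. As a stand-in, here is a minimal sketch of classic bottom-up dynamic programming for join ordering. The cost model (sum of intermediate result cardinalities), the function name `dp_join_order`, and the sample statistics are assumptions for illustration; cross products are allowed with selectivity 1.

```python
from itertools import combinations


def dp_join_order(cards, selectivity):
    """cards: {table: cardinality}; selectivity: {frozenset({a, b}): factor}.

    Returns (cost, cardinality, plan) for joining all tables.
    """
    tables = sorted(cards)
    # Plans for single tables: no cost, the table itself as the plan.
    best = {frozenset([t]): (0.0, float(cards[t]), t) for t in tables}
    for size in range(2, len(tables) + 1):
        for subset in combinations(tables, size):
            s = frozenset(subset)
            for k in range(1, size):
                for left in combinations(subset, k):
                    o = frozenset(left)
                    rest = s - o
                    cost_o, card_o, plan_o = best[o]
                    cost_r, card_r, plan_r = best[rest]
                    sel = 1.0
                    for a in o:
                        for b in rest:
                            sel *= selectivity.get(frozenset((a, b)), 1.0)
                    card = card_o * card_r * sel
                    cost = cost_o + cost_r + card
                    # Keep only the cheapest plan per table set.
                    if s not in best or cost < best[s][0]:
                        best[s] = (cost, card, (plan_o, plan_r))
    return best[frozenset(tables)]


# Hypothetical statistics for a chain query A - B - C.
cards = {"A": 1000, "B": 100, "C": 10}
selectivity = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}

cost, card, plan = dp_join_order(cards, selectivity)
print(cost, card, plan)
```

The optimal plan for every subset of size i is computed before any subset of size i+1, which is the bottom-up property the Access Root and Join Root algorithms build on.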

  9. Access Root Algorithm
     • Implements the conventional approach to semijoins
     • Semijoins used to reduce base tables
     • Very easily integrated with existing systems

  10. Access Root
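The slide's listing is not in the transcript. The following is only a rough sketch of the idea stated on the previous slide, namely that semijoins are used to reduce base tables before ordinary join enumeration. It considers just one reducer per plan, drawn from the table's neighbors in the join graph; the function name `access_plans` and the adjacency-dict join graph are assumptions, and the actual algorithm may generate richer reducer plans.

```python
def access_plans(table, join_graph):
    """Candidate access plans for one base table: the plain scan plus
    one semijoin-reduced variant per neighbor in the join graph."""
    plans = [table]
    for neighbor in sorted(join_graph.get(table, ())):
        plans.append((table, "SJ", neighbor))
    return plans


# Hypothetical chain query A - B - C.
join_graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}

for t in sorted(join_graph):
    print(t, access_plans(t, join_graph))
```

These enlarged access-plan sets would then be fed to an unchanged dynamic programming join enumerator, which is consistent with the slide's remark that the approach integrates easily with existing systems.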

  11. Extension needed
     • In Access Root, every table appears at most once in a reducing plan
     • It misses plans like (A SJ B) SJ (C SJ B)
     • Intermediate join results are usually large and benefit most from reduction
     • This motivates the Join Root Algorithm

  12. Join Root Algorithm
     • Complete search space of non-redundant semijoin plans
     • Semijoins are applied at all query plan levels
     • Plans with multiple occurrences of tables
     • Semijoin and join generation integrated into one phase

  13. Join Root Algorithm
     • Steps:
         • Generate the initial set of base table access plans and include them in [symbol lost in extraction] for all i
         • As in the dynamic programming algorithm, optimize all subsets of size i and use them for size i+1
         • Consider a subset S of size i
         • Consider a proper subset O of S. In the standard dynamic programming algorithm we perform:
     [Figure: join plans combining O and S-O]

  14. Join Root Algorithm
     • In the Join Root algorithm we perform:
     [Figure: join and semijoin (SJ) plans; labels P, O, O-P, S-O]
     • Note that these plans are stored as plans for S

  15. Join Root Algorithm
     • However, since a semijoin is only a reducer, the plan should be a plan for O
     • Thus plans are rearranged into their actual semantic categories
     • After that, plans are pruned as before
     • After the rearrangement, several plans containing semijoins may be incomplete
     • These are completed by applying joins using a fixpoint iteration scheme
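The rearrangement step can be sketched as follows. For every split of S into O and S-O, both a join and a semijoin candidate are generated; a join produces a plan for S, whereas O SJ (S-O) only filters O and therefore belongs with the plans for O. This is a simplified reading of the slides: the tuple encoding and the names `proper_splits`, `candidates`, and `semantic_category` are assumptions, and pruning and the fixpoint completion are left out.

```python
from itertools import combinations


def proper_splits(s):
    """Yield every ordered split (O, S-O) with O a non-empty proper subset of S."""
    items = sorted(s)
    for k in range(1, len(items)):
        for left in combinations(items, k):
            o = frozenset(left)
            yield o, frozenset(s) - o


def candidates(s):
    """Join and semijoin candidates generated while processing table set S."""
    for o, rest in proper_splits(s):
        yield ("JOIN", o, rest)
        yield ("SJ", o, rest)  # O reduced by S-O


def semantic_category(plan):
    """Table set whose tuples the plan actually produces."""
    op, o, rest = plan
    return o | rest if op == "JOIN" else o


S = frozenset("ABC")
by_category = {}
for plan in candidates(S):
    by_category.setdefault(semantic_category(plan), []).append(plan)

for tables, plans in by_category.items():
    print(sorted(tables), len(plans))
```

After this regrouping, each table set holds both its ordinary plans and its reduced variants, so the usual per-set pruning can compare them directly.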

  16. Qualitative Aspects
     • Join graph topology: very important factor
         • More predicates make Join Root advantageous
     • Allocation schema: useful mainly for distributed systems with replication
     • Query complexity: running time of Join Root suffers with a large number of relations
     • Network topology:
         • Lower bandwidth availability and network restrictions (client server system) increase relative benefits

  17. Running Time
     • Choice of algorithm depends on query graph

  18. Quality of Plans
     • 5-way join query in a client server environment with two servers
     [Figure: results with columns labeled Avg and SC]

  19. Heuristics
     • Best k variants
     • Use of base tables only as reducers
     • Limit the number of fixpoint iterations
     • These heuristics improve running time considerably without affecting plan quality in most cases
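Two of the heuristics above lend themselves to a short sketch: keeping only the k cheapest variants per table set, and capping the number of fixpoint iterations. The plan representation as `(cost, label)` tuples and the helper names `prune_best_k` and `bounded_fixpoint` are assumptions for illustration; the toy `step` function merely stands in for one round of plan completion.

```python
import heapq


def prune_best_k(plans, k):
    """Keep the k cheapest plans; each plan is a (cost, label) tuple."""
    return heapq.nsmallest(k, plans, key=lambda plan: plan[0])


def bounded_fixpoint(step, state, max_iterations):
    """Apply step until the state stops changing or the iteration cap is hit."""
    for _ in range(max_iterations):
        new_state = step(state)
        if new_state == state:
            break
        state = new_state
    return state


# Hypothetical plan variants for one table set.
variants = [(5, "p1"), (2, "p2"), (9, "p3"), (3, "p4")]
kept = prune_best_k(variants, 2)


# Toy step: each round adds the successor of every element below 5.
def step(s):
    return s | {x + 1 for x in s if x < 5}


capped = bounded_fixpoint(step, {0}, max_iterations=3)
converged = bounded_fixpoint(step, {0}, max_iterations=100)
print(kept, capped, converged)
```

Both knobs trade completeness for optimization time: a small k or a low iteration cap may discard the best plan, which matches the slide's claim that quality is preserved in most cases rather than always.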
