Randomized Algorithms for optimizing large join queries

Randomized algorithms for optimizing large join queries: Iterative Improvement, Simulated Annealing, and Two Phase Optimization, all of which search a state space made up of query execution strategies. The slides cover how strategies differ in join order and join method, how each strategy is costed, and how query size and catalog parameters affect the results.

tbatten

Presentation Transcript


  1. Randomized Algorithms for optimizing large join queries

  2. Algorithms
     • Iterative improvement (II)
       • start at a random state
       • move to a random neighbor with lower cost, repeating until a local minimum is reached
     • Simulated annealing (SA)
       • start at a random state
       • start at an initial “temperature”
       • pick a random neighbor
       • if the neighbor is cheaper, move
       • if the neighbor is more expensive, move with some probability (controlled by the temperature and the cost difference)
       • once the algorithm has reached equilibrium, lower the temperature and repeat
     • Two Phase Optimization (2PO)
       • do Iterative Improvement for a while
       • do SA using the output of II as the starting state
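
The three algorithms on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation. Everything concrete in it is an assumption made for the example: the function names, the stopping rule for II (`max_failures`), the fixed `steps_per_temp` standing in for "equilibrium", the geometric cooling factor, the starting temperature chosen in `two_phase`, and the toy cost model (`SIZES`, `toy_cost`), which treats a state as a plain join order rather than a full join processing tree.

```python
import math
import random

def iterative_improvement(start, cost, random_neighbor, max_failures=50, rng=random):
    """Accept only cheaper random neighbors; stop after `max_failures`
    consecutive rejections (an assumed test for 'local minimum')."""
    state, state_cost = start, cost(start)
    failures = 0
    while failures < max_failures:
        cand = random_neighbor(state, rng)
        cand_cost = cost(cand)
        if cand_cost < state_cost:
            state, state_cost, failures = cand, cand_cost, 0
        else:
            failures += 1
    return state, state_cost

def simulated_annealing(start, cost, random_neighbor, temp, cooling=0.9,
                        steps_per_temp=50, min_temp=1e-3, rng=random):
    """Accept cheaper neighbors always, costlier ones with probability
    exp(-delta / temp); return the best state seen."""
    state, state_cost = start, cost(start)
    best, best_cost = state, state_cost
    while temp > min_temp:
        for _ in range(steps_per_temp):  # assumed stand-in for "equilibrium"
            cand = random_neighbor(state, rng)
            cand_cost = cost(cand)
            delta = cand_cost - state_cost
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                state, state_cost = cand, cand_cost
                if state_cost < best_cost:
                    best, best_cost = state, state_cost
        temp *= cooling  # lower the temperature and repeat
    return best, best_cost

def two_phase(random_state, cost, random_neighbor, restarts=5, rng=random):
    """Phase 1: a few II runs from random states. Phase 2: SA from the
    cheapest II result, at a low starting temperature (assumed: 10% of cost)."""
    runs = [iterative_improvement(random_state(rng), cost, random_neighbor, rng=rng)
            for _ in range(restarts)]
    start, start_cost = min(runs, key=lambda run: run[1])
    return simulated_annealing(start, cost, random_neighbor,
                               temp=0.1 * start_cost, rng=rng)

# --- Toy problem (hypothetical): a state is an order in which to join relations.
SIZES = [1000, 10, 500, 50, 200]  # made-up relation cardinalities

def toy_cost(order):
    """Sum of intermediate result sizes with a fixed, made-up selectivity."""
    rows, total = SIZES[order[0]], 0.0
    for rel in order[1:]:
        rows = rows * SIZES[rel] * 0.01
        total += rows
    return total

def random_order(rng):
    order = list(range(len(SIZES)))
    rng.shuffle(order)
    return tuple(order)

def swap_neighbor(order, rng):
    """A neighbor differs by swapping two positions in the join order."""
    i, j = rng.sample(range(len(order)), 2)
    swapped = list(order)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return tuple(swapped)
```

A call such as `two_phase(random_order, toy_cost, swap_neighbor, rng=random.Random(0))` returns a `(join_order, cost)` pair. Because SA here starts from the II result and keeps the best state seen, the second phase can only match or improve on the first.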

  3. State space
     • all possible execution strategies of a query form a state space
     • a strategy is a join processing tree
     • differences between strategies
       • join order
       • join method
     • each point (strategy) is connected to its neighbors
     • neighbors differ by a single change in strategy:
       • join method
       • commutativity
       • associativity
       • left join exchange
       • right join exchange
     • each point in the space has an associated cost: I/O cost only
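
To make the neighbor relation concrete, the sketch below enumerates the neighbors of a join processing tree under the five rules listed on this slide. The slide does not spell the rules out, so the forms used here are the commonly cited ones and should be read as an assumption: left join exchange as (A join B) join C -> (A join C) join B, and right join exchange as A join (B join C) -> B join (A join C). The tree encoding (nested tuples), the two method names in `METHODS`, and the choice of which join keeps which method after a rewrite are also illustrative assumptions. The sketch does not check whether a rewritten tree introduces a cross product.

```python
# A leaf is a relation name (str); a join is (method, left, right).
METHODS = ("nested_loop", "merge")  # assumed set of join methods

def root_moves(tree):
    """Yield every tree reachable by applying one rule at the root join."""
    if isinstance(tree, str):
        return
    method, left, right = tree
    for other in METHODS:
        if other != method:
            yield (other, left, right)          # join method change
    yield (method, right, left)                 # commutativity
    if not isinstance(left, str):
        lm, a, b = left
        yield (method, a, (lm, b, right))       # associativity: (A B) C -> A (B C)
        yield (method, (lm, a, right), b)       # left join exchange: (A B) C -> (A C) B
    if not isinstance(right, str):
        rm, b, c = right
        yield (method, (rm, left, b), c)        # associativity: A (B C) -> (A B) C
        yield (method, b, (rm, left, c))        # right join exchange: A (B C) -> B (A C)

def neighbors(tree):
    """Yield all trees that differ from `tree` by one rule applied at any join."""
    if isinstance(tree, str):
        return
    yield from root_moves(tree)
    method, left, right = tree
    for new_left in neighbors(left):
        yield (method, new_left, right)
    for new_right in neighbors(right):
        yield (method, left, new_right)

def relations(tree):
    """List the base relations of a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, left, right = tree
    return relations(left) + relations(right)
```

For example, `list(neighbors(("merge", ("merge", "A", "B"), "C")))` contains `("merge", "A", ("merge", "B", "C"))` from associativity and `("merge", ("merge", "A", "C"), "B")` from left join exchange. Every neighbor joins the same set of relations; only the order or the method changes.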

  4. Evaluation
     • tree queries and star queries
     • 5-100 joins
     • 3 types of relation catalogs
     • II is okay
     • SA is bad at first, good in the long run
     • 2PO is the best of both worlds

  5. Analysis
     • Query size
       • small queries (5-10 joins) show no difference between the algorithms
     • Variance in catalog parameters
       • differences increase as the size and variety of relations increase
       • relation size has more impact than selectivity

  6. State space analysis
     • the cost landscape of the state space is shaped like a cup (a well)
