Randomized Algorithms for optimizing large join queries

Randomized Algorithms for optimizing large join queries

Algorithms • Iterative improvement • start at a random state • move to random neighbor with less cost until at local minimum • Simulated annealing • start at a random state • start at initial “temperature” • move to a random neighbor • if neighbor is cheaper, move • if neighbor is more expensive move with some probability (controlled by temperature and cost) • once the algorithm has reached equilibrium, lower temperature and repeat • Two Phase Optimization • Do Iterative Improvement for awhile • Do SA using the output of II as a starting state

State space • all possible execution strategies of a query form a state space • a strategy is a join processing tree • differences between strategies • join order • join method • each point is connected to other points • neighbors are a single change in strategy: • join method • commutativity • associativity • left join exchange • right join exchange • each point in the space has an associated cost: I/O cost only

Evaluation • tree queries and star queries • 5-100 joins • 3 types of relation catalogs • II is okay • SA is bad at first, good in the long run • 2PO is best of both worlds

Analysis: • Query size: • small queries (5-10) show no difference • Variance in catalog parameters • differences increase as size and variety of relations increase • relation size has more impact than selectivity

State Space analysis • It’s a cup/ well

Randomized Algorithms for optimizing large join queries

Randomized Algorithms for optimizing large join queries

Presentation Transcript

Spatial Join Queries

Randomized Algorithms

Randomized and De-Randomized Algorithms

Optimizing RDF Chain Queries using Genetic Algorithms

Randomized Algorithms

Randomized Algorithms

Randomized Algorithms

Optimizing Join Statements

Join Queries

Randomized Algorithms

Randomized Algorithms

Randomized Algorithms

Randomized Algorithms

Randomized Algorithms

Randomized Algorithms for optimizing large join queries

Randomized Algorithms

Optimizing RDF Chain Queries using Genetic Algorithms

Randomized Algorithms

Optimizing SQL Queries