An overview of randomized algorithms (Iterative Improvement, Simulated Annealing, and Two Phase Optimization) for optimizing large join queries, using the set of query execution strategies as the state space. Covers how join strategies differ, how they are costed, and how query parameters affect the results.
Algorithms
• Iterative improvement (II)
  • start at a random state
  • move to a random neighbor with lower cost; repeat until at a local minimum
• Simulated annealing (SA)
  • start at a random state
  • start at an initial “temperature”
  • pick a random neighbor
    • if the neighbor is cheaper, move
    • if the neighbor is more expensive, move with some probability (controlled by the temperature and the cost difference)
  • once the algorithm has reached equilibrium, lower the temperature and repeat
• Two Phase Optimization (2PO)
  • run Iterative Improvement for a while
  • run SA using the output of II as the starting state
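The three algorithms above can be sketched over an abstract state space. This is a minimal sketch under stated assumptions, not the implementation evaluated in these slides: the function names, the `max_tries` stopping rule for a local minimum, the fixed `steps_per_temp` stand-in for "equilibrium", the geometric cooling schedule, the `exp(-delta / temp)` acceptance rule, and the use of several II restarts with a low SA starting temperature in the second phase are all illustrative choices. `cost` and `random_neighbor` are hypothetical callbacks supplied by the caller.

```python
import math
import random


def iterative_improvement(start, cost, random_neighbor, rng, max_tries=50):
    """Accept only cheaper neighbors. Assumption: we call it a local minimum
    once max_tries random neighbors in a row fail to improve the cost."""
    state, c = start, cost(start)
    tries = 0
    while tries < max_tries:
        cand = random_neighbor(state, rng)
        cc = cost(cand)
        if cc < c:
            state, c, tries = cand, cc, 0
        else:
            tries += 1
    return state


def simulated_annealing(start, cost, random_neighbor, rng, temp=10.0,
                        cooling=0.9, steps_per_temp=20, min_temp=0.01):
    """Accept cheaper neighbors always, more expensive ones with a
    probability that shrinks as the temperature drops."""
    state, c = start, cost(start)
    best, best_c = state, c
    while temp > min_temp:
        # Assumption: a fixed number of steps approximates "equilibrium".
        for _ in range(steps_per_temp):
            cand = random_neighbor(state, rng)
            cc = cost(cand)
            delta = cc - c
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                state, c = cand, cc
                if c < best_c:
                    best, best_c = state, c
        temp *= cooling  # assumption: geometric cooling schedule
    return best  # cheapest state seen during the walk


def two_phase(random_state, cost, random_neighbor, rng, restarts=5):
    """Phase 1: II from a few random starts. Phase 2: SA from the best
    local minimum, started at a low temperature (an assumption here)."""
    minima = [
        iterative_improvement(random_state(rng), cost, random_neighbor, rng)
        for _ in range(restarts)
    ]
    seed = min(minima, key=cost)
    return simulated_annealing(seed, cost, random_neighbor, rng, temp=1.0)
```

To try it, pass any cost function and neighbor generator, for example integers with `cost = lambda x: (x - 7) ** 2` and a neighbor function that steps by plus or minus one. For query optimization the states would be join processing trees and the cost an estimate of I/O.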
State space
• all possible execution strategies of a query form a state space
• a strategy is a join processing tree
• differences between strategies
  • join order
  • join method
• each point is connected to other points
• neighbors are a single change in strategy:
  • join method
  • commutativity
  • associativity
  • left join exchange
  • right join exchange
• each point in the space has an associated cost: I/O cost only
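The neighbor moves listed above can be illustrated on a small tree representation. This is a sketch under assumptions: the slides only name the five moves, so the rewrite rules in the comments follow the way these transformations are commonly stated, and the tuple encoding `(method, left, right)`, the method names in `METHODS`, and the choice of which join keeps which method after a restructuring are hypothetical. The sketch applies moves at the root only and does not check whether a result introduces a cross product.

```python
# A base relation is a string; a join is a tuple (method, left, right).
METHODS = ("nested_loop", "merge")  # hypothetical join method names


def is_join(t):
    return isinstance(t, tuple)


def change_method(t, method):
    # A join_m1 B  ->  A join_m2 B
    _, left, right = t
    return (method, left, right)


def commute(t):
    # A join B  ->  B join A
    method, left, right = t
    return (method, right, left)


def associate(t):
    # (A join B) join C  ->  A join (B join C)
    outer, inner_join, c = t
    inner, a, b = inner_join
    return (outer, a, (inner, b, c))


def left_exchange(t):
    # (A join B) join C  ->  (A join C) join B
    outer, inner_join, c = t
    inner, a, b = inner_join
    return (outer, (inner, a, c), b)


def right_exchange(t):
    # A join (B join C)  ->  B join (A join C)
    outer, a, inner_join = t
    inner, b, c = inner_join
    return (outer, b, (inner, a, c))


def root_neighbors(t):
    """All strategies one move away from t, applying moves at the root only."""
    out = [change_method(t, m) for m in METHODS if m != t[0]]
    out.append(commute(t))
    if is_join(t[1]):
        out += [associate(t), left_exchange(t)]
    if is_join(t[2]):
        out.append(right_exchange(t))
    return out


plan = ("merge", ("merge", "A", "B"), "C")
for neighbor in root_neighbors(plan):
    print(neighbor)
```

A full neighbor generator would apply the same moves at every join node in the tree, not just the root, and pick one of the resulting strategies at random.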
Evaluation
• tree queries and star queries
• 5-100 joins
• 3 types of relation catalogs
• II is okay
• SA is bad at first, good in the long run
• 2PO is best of both worlds
Analysis:
• Query size:
  • small queries (5-10 joins) show no difference between the algorithms
• Variance in catalog parameters:
  • differences increase as the size and variety of relations increase
  • relation size has more impact than selectivity
State Space analysis
• the cost function over the state space is shaped like a cup (a well)