60 likes | 146 Views
Randomized Algorithms for optimizing large join queries. Algorithms. Iterative improvement start at a random state move to random neighbor with less cost until at local minimum Simulated annealing start at a random state start at initial “temperature” move to a random neighbor
E N D
Algorithms • Iterative improvement • start at a random state • move to random neighbor with less cost until at local minimum • Simulated annealing • start at a random state • start at initial “temperature” • move to a random neighbor • if neighbor is cheaper, move • if neighbor is more expensive move with some probability (controlled by temperature and cost) • once the algorithm has reached equilibrium, lower temperature and repeat • Two Phase Optimization • Do Iterative Improvement for awhile • Do SA using the output of II as a starting state
State space • all possible execution strategies of a query form a state space • a strategy is a join processing tree • differences between strategies • join order • join method • each point is connected to other points • neighbors are a single change in strategy: • join method • commutativity • associativity • left join exchange • right join exchange • each point in the space has an associated cost: I/O cost only
Evaluation • tree queries and star queries • 5-100 joins • 3 types of relation catalogs • II is okay • SA is bad at first, good in the long run • 2PO is best of both worlds
Analysis: • Query size: • small queries (5-10) show no difference • Variance in catalog parameters • differences increase as size and variety of relations increase • relation size has more impact than selectivity
State Space analysis • It’s a cup/ well