280 likes | 288 Views
Learn about the cost-based query transformation techniques in Oracle. Explore examples, performance results, and optimization strategies.
E N D
Cost-Based Query Transformation in Oracle Rafi Ahmed, Allison Lee, Andrew Witkowski, Dinesh Das, Hong Su, Mohamed Zait, Thierry Cruanes Oracle USA VLDB 2006
Outline • Traditional relational optimizer • Heuristic-based transformation • Cost-based transformation (CBT) • Examples of transformations in Oracle • Cost-based transformation framework • Techniques for efficient CBT • State-space search algorithms • Results of performance study
Traditional Relational Optimizer • Logical and physical optimization phases • Query transformation based on heuristics or rules • Scope – a single query block • Access path, join orders, and join method
Heuristic-Based Transformation • Early evaluation of restriction and projection • Pruning of redundant operations • Minimizing/merging of QBs • Must not introduce, replicate, or re-position DISTINCT or GROUP-BY operators
Heuristic-Based Unnesting:Q2 Q3 Q2 SELECT d.dept_name, d.budget FROM departments d WHERE EXISTS (SELECT 1 FROM employees e WHERE d.dept_id = e.dept_id and e.salary > 200000; Q3 SELECT d.dept_name, d.budget FROM departments d, employees e WHERE d.dept_id S= e.dept_id and e.salary > 200000;
Table Elimination:Q5 Q6 Q5 SELECT e.name, e.salary FROM employees e left outer join departments d on e.dept_id = d.dept_id; Q6 SELECT e.name, e.salary FROM employees e;
Cost-Based Transformation • Certain transformations must be cost-based • Requires a systematic framework • Combines physical and logical components into a global optimizer
A Query with Subqueries: Q1 SELECT e1.employee_name, j.job_title FROM employees e1, job_history j WHERE e1.emp_id = j.emp_id and j.start_date > '19980101' and e1.salary > (SELECT AVG (e2.salary) FROM employees e2 WHERE e2.dept_id = e1.dept_id) and e1.dept_id IN (SELECT dept_id FROM departments d, locations l WHERE d.loc_id = l.loc_id and l.country_id = 'US');
Subquery Unnesting: Q10[Q1 Q10] SELECT e1.employee_name, j.job_title FROM employees e1, job_history j, (SELECT AVG(e2.salary) avg_sal, dept_id FROM employees e2 GROUP BY dept_id) V WHERE e1.emp_id = j.emp_id and j.start_date > '19980101' and e1.dept_id = V.dept_id and e1.salary > V.avg_sal and e1.dept_id IN (SELECT dept_id FROM departments d, locations l WHERE d.loc_id = l.loc_id and l.country_id = 'US');
Group-By View Merging:Q11 [Q10 Q11] SELECT e1.employee_name, j.job_title FROM employees e1, job_history j, employees e2 WHERE e1.emp_id = j.emp_id and j.start_date > '19980101' and e2.dept_id = e1.dept_id and e1.dept_id IN (SELECT dept_id FROM departments d, locations l WHERE d.loc_id = l.loc_id and l.country_id = 'US') GROUP BY e2.dept_id, e1.emp_id, j.rowid, e1.employee_name, j.job_title, e1.salary HAVING e1.salary > AVG (e2.salary);
A Query with Distinct View: Q12 SELECT e1.employee_name, j.job_title e2.employee_name as mgr_name FROM employees e1, job_history j, employees e2, (SELECT DISTINCT dept_id FROM departments d, locations l WHERE d.loc_id = l.loc_id and l.country_id IN (‘UK’,'US')) VD WHERE e1.emp_id = j.emp_id and j.start_date > '19980101' and e1.mgr_id = e2.emp_id and e1.dept_id = VD.dept_id;
Join Predicate Pushdown: Q13 [Q12 Q13] SELECT e1.employee_name, j.job_title e2.employee_name as mgr_name FROM employees e1, job_history j, employees e2, (SELECT dept_id FROM departments d, locations l WHERE d.loc_id = l.loc_id and l.country_id IN (‘UK’,'US') and e1.dept_id = d.dept_id) VD WHERE e1.emp_id = j.emp_id and j.start_date > '19980101' and e1.mgr_id = e2.emp_id;
A UNION-ALL Query:Q14 SELECT e.first_name, e.last_name, job_id, d.department_name, l.city FROM employees e, departments d, locations l WHERE e.dept_id = d.dept_id and d.location_id = l.location_id UNION ALL SELECT e.first_name, e.last_name, j.job_id, d.department_name, l.city FROM employees e, job_history j, departments d, locations l WHERE e.emp_id = j.emp_id and j.dept_id = d.dept_id and d.location_id = l.location_id;
Join Factorization: Q15[Q14 Q15] SELECT V.first_name, V.last_name, V.job_id, d.department_name, l.city FROM departments d, locations l, (SELECT e.first_name, e.last_name, e.job_id, e.dept_id FROM employees e UNION ALL SELECT e.first_name, e.last_name, j.job_id, j.dept_id FROM employees e, job_history j WHERE e.emp_id = j.emp_id) V WHERE d.dept_id = V.dept_id and d.location_id = l.location_id;
Group-By Placement SELECT T2.B2, T3.C3, SUM(T1.A1) AS SM, COUNT(T1.D1) AS CN FROM T1, T2, T3, T4 WHERE T1.B1 = T2.B2 and T2.D2 = T3.D3 and T3.C3 = T4.C4 GROUP BY T2.B2, T3.C3; SELECT V1.B2, V1.C3, SUM(V1.S1*V2.CN2) AS SM, SUM (V1.CN1*V2.CN2) AS CN FROM (SELECT SUM (T1.A1) AS S1, COUNT(T1.D1) AS CN1, T2.B2 AS B2, T3.C3 AS C3 FROM T1, T2, T3 WHERE T1.B1 = T2.B2 and T2.D2 = T3.D3 GROUP BY T2.B2, T3.C3) V1, (SELECT T4.C4 AS C4, COUNT(*) AS CN2 FROM T4 GROUP BY T4.C4) V2 WHERE V1.C3 = v2.C4 GROUP BY V1.B2, V1.C3;
Transformation Interaction • Sequential order of transformations(S.O.T.) • The general strategy: Apply a transformation to all QBs of the SQL statement before applying the next transformation • Interleaving – violates S.O.T. • Perform GBVM after SU to decide upon SU • C (Q) = 40; C(SU(Q)) = 50; C(GBVM(SU(Q))) = 30 => do SU • Juxtaposing– violates S.O.T. • Perform GBVM and JPPD separately to decide upon GBVM • C(Q) = 60; C(GBVM(Q)) = 45; C(JPPD(Q)) = 35 implies No GBVM
Oracle Query Processing Architecture Front-end Heuristic-based Transformation Physical Optimizer Cost-Based Transformation Execution
Cost-Based Transformation Framework • Various transformation techniques • State spaces for the transformations • State-space search algorithms • Capability for deep copying QBs, etc. • Physical optimizer: cost model • Transformation directives and cost annotations
Techniques for Efficient Search • Limiting the number of states • Greedy approach • Cost cut-off • Memory management and caching • Selective interleaving/juxtaposing • Re-use of sub-tree cost annotations
State Space Search Algorithms • Exhaustive Search (2**N) • Enumerates all states in the state space of a QB • Guaranteed to provide the best solution for T • Linear Search (N+1) • Next state is generated from the best state by applying T on the next element. • Linear search assumes transformations on different elements are independent of each other. • A state is generated by applying T on one element at a time. Accept that state as the best so far, only if it reduces the overall cost of the QB.
State Space Search Algorithms, Cont’d • Iterative Improvement Search (N+1, 2**N) • Start from an initial state and move to the next neighboring state looking for a local minima by always choosing a downward move • Repeat the search for another local minima starting with different initial state in the next iteration • Stop the search, if there are no more new states to be generated or some terminating condition has been reached. Choose the most optimal of all the local minima as the best state • Two-pass Search (2) • Choose the best state from two states, where in one state all elements are transformed and in the other none of the elements are transformed.
Performance Study • Oracle Application: HR, OE, CRM, Financial, Supply Chain, etc. • Schema: 14,000 tables • Average of 8 tables per query (1 – 159) • Workload: 241,000 queries – mostly of SPJ types • 19,000 – 8% – queries relevant to cost-based transformation
Conclusion • CBQT – a novel contribution • Some new transformations • CBQT essential for complex DSS queries and other applications • Significant performance gain