1 / 28

Cost-Based Query Transformation in Oracle: Optimizing Performance for Efficient Data Processing

Explore cost-based query transformation techniques in Oracle, including state-space search algorithms and results of performance studies. Learn about heuristic and cost-based strategies for query optimization.

jbucklin
Download Presentation

Cost-Based Query Transformation in Oracle: Optimizing Performance for Efficient Data Processing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Cost-Based Query Transformation in Oracle Rafi Ahmed, Allison Lee, Andrew Witkowski, Dinesh Das, Hong Su, Mohamed Zait, Thierry Cruanes Oracle USA VLDB 2006

  2. Outline • Traditional relational optimizer • Heuristic-based transformation • Cost-based transformation (CBT) • Examples of transformations in Oracle • Cost-based transformation framework • Techniques for efficient CBT • State-space search algorithms • Results of performance study

  3. Traditional Relational Optimizer • Logical and physical optimization phases • Query transformation based on heuristics or rules • Scope – a single query block • Access path, join orders, and join method

  4. Heuristic-Based Transformation • Early evaluation of restriction and projection • Pruning of redundant operations • Minimizing/merging of QBs • Must not introduce, replicate, or re-position DISTINCT or GROUP-BY operators

  5. Heuristic-Based Unnesting:Q2 Q3 Q2 SELECT d.dept_name, d.budget FROM departments d WHERE EXISTS (SELECT 1 FROM employees e WHERE d.dept_id = e.dept_id and e.salary > 200000; Q3 SELECT d.dept_name, d.budget FROM departments d, employees e WHERE d.dept_id S= e.dept_id and e.salary > 200000;

  6. Table Elimination:Q5 Q6 Q5 SELECT e.name, e.salary FROM employees e left outer join departments d on e.dept_id = d.dept_id; Q6 SELECT e.name, e.salary FROM employees e;

  7. Cost-Based Transformation • Certain transformations must be cost-based • Requires a systematic framework • Combines physical and logical components into a global optimizer

  8. A Query with Subqueries: Q1 SELECT e1.employee_name, j.job_title FROM employees e1, job_history j WHERE e1.emp_id = j.emp_id and j.start_date > '19980101' and e1.salary > (SELECT AVG (e2.salary) FROM employees e2 WHERE e2.dept_id = e1.dept_id) and e1.dept_id IN (SELECT dept_id FROM departments d, locations l WHERE d.loc_id = l.loc_id and l.country_id = 'US');

  9. Subquery Unnesting: Q10[Q1 Q10] SELECT e1.employee_name, j.job_title FROM employees e1, job_history j, (SELECT AVG(e2.salary) avg_sal, dept_id FROM employees e2 GROUP BY dept_id) V WHERE e1.emp_id = j.emp_id and j.start_date > '19980101' and e1.dept_id = V.dept_id and e1.salary > V.avg_sal and e1.dept_id IN (SELECT dept_id FROM departments d, locations l WHERE d.loc_id = l.loc_id and l.country_id = 'US');

  10. Group-By View Merging:Q11 [Q10 Q11] SELECT e1.employee_name, j.job_title FROM employees e1, job_history j, employees e2 WHERE e1.emp_id = j.emp_id and j.start_date > '19980101' and e2.dept_id = e1.dept_id and e1.dept_id IN (SELECT dept_id FROM departments d, locations l WHERE d.loc_id = l.loc_id and l.country_id = 'US') GROUP BY e2.dept_id, e1.emp_id, j.rowid, e1.employee_name, j.job_title, e1.salary HAVING e1.salary > AVG (e2.salary);

  11. A Query with Distinct View: Q12 SELECT e1.employee_name, j.job_title e2.employee_name as mgr_name FROM employees e1, job_history j, employees e2, (SELECT DISTINCT dept_id FROM departments d, locations l WHERE d.loc_id = l.loc_id and l.country_id IN (‘UK’,'US')) VD WHERE e1.emp_id = j.emp_id and j.start_date > '19980101' and e1.mgr_id = e2.emp_id and e1.dept_id = VD.dept_id;

  12. Join Predicate Pushdown: Q13 [Q12 Q13] SELECT e1.employee_name, j.job_title e2.employee_name as mgr_name FROM employees e1, job_history j, employees e2, (SELECT dept_id FROM departments d, locations l WHERE d.loc_id = l.loc_id and l.country_id IN (‘UK’,'US') and e1.dept_id = d.dept_id) VD WHERE e1.emp_id = j.emp_id and j.start_date > '19980101' and e1.mgr_id = e2.emp_id;

  13. A UNION-ALL Query:Q14 SELECT e.first_name, e.last_name, job_id, d.department_name, l.city FROM employees e, departments d, locations l WHERE e.dept_id = d.dept_id and d.location_id = l.location_id UNION ALL SELECT e.first_name, e.last_name, j.job_id, d.department_name, l.city FROM employees e, job_history j, departments d, locations l WHERE e.emp_id = j.emp_id and j.dept_id = d.dept_id and d.location_id = l.location_id;

  14. Join Factorization: Q15[Q14 Q15] SELECT V.first_name, V.last_name, V.job_id, d.department_name, l.city FROM departments d, locations l, (SELECT e.first_name, e.last_name, e.job_id, e.dept_id FROM employees e UNION ALL SELECT e.first_name, e.last_name, j.job_id, j.dept_id FROM employees e, job_history j WHERE e.emp_id = j.emp_id) V WHERE d.dept_id = V.dept_id and d.location_id = l.location_id;

  15. Group-By Placement SELECT T2.B2, T3.C3, SUM(T1.A1) AS SM, COUNT(T1.D1) AS CN FROM T1, T2, T3, T4 WHERE T1.B1 = T2.B2 and T2.D2 = T3.D3 and T3.C3 = T4.C4 GROUP BY T2.B2, T3.C3; SELECT V1.B2, V1.C3, SUM(V1.S1*V2.CN2) AS SM, SUM (V1.CN1*V2.CN2) AS CN FROM (SELECT SUM (T1.A1) AS S1, COUNT(T1.D1) AS CN1, T2.B2 AS B2, T3.C3 AS C3 FROM T1, T2, T3 WHERE T1.B1 = T2.B2 and T2.D2 = T3.D3 GROUP BY T2.B2, T3.C3) V1, (SELECT T4.C4 AS C4, COUNT(*) AS CN2 FROM T4 GROUP BY T4.C4) V2 WHERE V1.C3 = v2.C4 GROUP BY V1.B2, V1.C3;

  16. Transformation Interaction • Sequential order of transformations(S.O.T.) • The general strategy: Apply a transformation to all QBs of the SQL statement before applying the next transformation • Interleaving – violates S.O.T. • Perform GBVM after SU to decide upon SU • C (Q) = 40; C(SU(Q)) = 50; C(GBVM(SU(Q))) = 30 => do SU • Juxtaposing– violates S.O.T. • Perform GBVM and JPPD separately to decide upon GBVM • C(Q) = 60; C(GBVM(Q)) = 45; C(JPPD(Q)) = 35 implies No GBVM

  17. Oracle Query Processing Architecture Front-end Heuristic-based Transformation Physical Optimizer Cost-Based Transformation Execution

  18. Cost-Based Transformation Framework • Various transformation techniques • State spaces for the transformations • State-space search algorithms • Capability for deep copying QBs, etc. • Physical optimizer: cost model • Transformation directives and cost annotations

  19. Techniques for Efficient Search • Limiting the number of states • Greedy approach • Cost cut-off • Memory management and caching • Selective interleaving/juxtaposing • Re-use of sub-tree cost annotations

  20. State Space Search Algorithms • Exhaustive Search (2**N) • Enumerates all states in the state space of a QB • Guaranteed to provide the best solution for T • Linear Search (N+1) • Next state is generated from the best state by applying T on the next element. • Linear search assumes transformations on different elements are independent of each other. • A state is generated by applying T on one element at a time. Accept that state as the best so far, only if it reduces the overall cost of the QB.

  21. State Space Search Algorithms, Cont’d • Iterative Improvement Search (N+1, 2**N) • Start from an initial state and move to the next neighboring state looking for a local minima by always choosing a downward move • Repeat the search for another local minima starting with different initial state in the next iteration • Stop the search, if there are no more new states to be generated or some terminating condition has been reached. Choose the most optimal of all the local minima as the best state • Two-pass Search (2) • Choose the best state from two states, where in one state all elements are transformed and in the other none of the elements are transformed.

  22. Optimization Times for a Query with Four Subqueries

  23. Performance Study • Oracle Application: HR, OE, CRM, Financial, Supply Chain, etc. • Schema: 14,000 tables • Average of 8 tables per query (1 – 159) • Workload: 241,000 queries – mostly of SPJ types • 19,000 – 8% – queries relevant to cost-based transformation

  24. Improvement with CBQT

  25. Improvement with Unnesting .

  26. Improvement with JPPD

  27. Summary of Performance ResultsTotal non-SPJ Queries: ~ 19 K

  28. Conclusion • CBQT – a novel contribution • Some new transformations • CBQT essential for complex DSS queries and other applications • Significant performance gain

More Related