440 likes | 592 Views
An Insiders View of How the Optimizer Works. Tang Tao Oracle University Principal Instructor .
E N D
An Insiders View of How the Optimizer Works Tang Tao Oracle University Principal Instructor
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Optimizer evolution CBO Databases became more feature rich As environment changes Potential for plan changes RULES Rule are not enough • In the beginningthere were rules • Optimizer evolved to • be cost based Databases become more real-time, ad-hoc environments CBO Reactive tuning with the use of advisors and auto jobs Reactive tuning not enough • Optimizer proactively adapts to become self-learning
Adaptive Query Optimization Overview • Adaptive Query Optimization Adaptive PlansJoin methods Parallel distribution methods Adaptive StatisticsDiscovered at compile time Discovered at run time Adaptive Plans Adaptive Statistics Parallel distribution Methods Join Methods At compile time At run time
Adaptive Execution Plans Query: Find all of the products with a unit price of 15 that we have sold more that 1 of Two possible join methods for this query Nested Loops Hash Join Adapt join methods
Adaptive Execution Plans Rows coming out of order_items table are buffered up to a point If row count is less than the threshold use nested Loops otherwise switch to hash join Alternative sub-plans are pre-computed Sub-plans stored in the cursor Stats collect inserted before join Rows buffered until final decision is made Adapt join methods Table scan Prod_info NESTED LOOPS HASH JOIN Default Plan is a nested loops join Table scan Order _items Index Scan Prod_info_ind Statistics Collector
Adaptive Execution Plans Statistics collector disabled after decision is made and becomes a pass through operation Number of rows seen in statistics collector exceeds threshold Plan switches to hash join Statistics collect disabled Plan resolved on first execution & remains the same for subsequent executions Adapt join methods Statistics Collector Table scan Prod_info NESTED LOOPS HASH JOIN Final Plan is a hash join Table scan Order _items Index Scan Prod_info_ind
Adaptive Execution Plans Explain plan command always shows default plan Example shows a nested loops join as default plan No statistics collector shown in plan Displaying the default plan
Adaptive Execution Plans After the statement has completed use DBMS_XPLAN.DISPLAY_CURSOR tosee the final plan selected Example shows that hash join picked at execution time Again the statistics collector is not visible in the plan Displaying the final plan
Adaptive Execution Plans Full adaptive plan displayed when format parameter ‘+adaptive’ is set Example shows both the nested loops and hash join in the plan Displaying the full adaptive plan
Adaptive Execution Plans Additional information displayed on why operations are inactive can be seen with format parameter ‘+report’ Displaying plan with +adaptive & +report formats
Adaptive Execution Plans New column in V$SQL IS_RESOLVED_ADAPTIVE_PLAN Indicates statement had an adaptive plan which was resolved on first execution Resolved plan is used for subsequent executions Statistics collectors and buffering is disabled Indicator in V$SQL
Adaptive Execution Plans Adaptive plans are enabled by default Can be put in reporting mode OPTIMIZER_ADAPTIVE_REPORTING_ONLY Reporting mode shows what would have happened during execution in the plan Reporting mode
Adaptive Execution Plans Adaptive join methods • Join method decision deferred until runtime • Default plan is computed using available statistics • Alternate sub-plans are pre-computed and stored in the cursor • Statistic collectors are inserted at key points in the plan • Final decision is based on statistics collected during execution • Default plan and sub-plans have a valid range for statistics collected • If statistics prove to be out of range, the sub-plans will be swapped • Requires buffering near the swap point to avoid returning rows to user • Possible sub-plans are nested loop joins or hash joins and vice versa
When user issues a parallel SQL statement theshadowprocess becomes the Query Coordinator User Parallel servers communicate among themselves & the QC using messages that are passed via memory buffers in the shared pool How the data is distributed among the processes is based on the operation being done & number of rows expected QC gets parallel servers from global pool and distributes the work to them Parallel servers - individual sessions that perform work in parallel Adaptive Execution Plans Adaptive parallel data distribution Shadowprocess is spawned User connects to the database
Adaptive Distribution Methods Hybrid-HASH Join • Cardinality based distribution skew common scenario • Crucial for parallel join of very small data sets with very large data sets • Distribution method decision based on expected number of rows • New adaptive distribution method HYBRID-HASH • Statistic collectors inserted in front of PX process on the left hand side of the join • If actual number of rows less than threshold, switch from HASH to Broadcast • Threshold number of total rows < 2x DOP • Enabled by default
Adaptive Execution Plans # rows returned less than threshold so rows distributed via Broadcast Querycoordinator Hybrid hash join between EMP and DEPT Distribution method based on runtime stats Statistics collector inserted in front of PX processes scanning DEPT Adaptive parallel data distribution Statistics Collector threshold 2X DOP P1 P2 P3 P4 DEPT EMP P5 P6 P7 P8
Adaptive Distribution Methods Broadcast/Round Robin Hybrid hash join between EMP and DEPT Distribution method based on runtime stats If DEPT uses Broadcast- EMP uses Round-Robin DOP used is 6 Number of rows returned is 4 Broadcast used because 24 rows distributed (6 X 4)
Adaptive Query Optimization • Adaptive Query Optimization Adaptive PlansJoin methods Parallel distribution methods Adaptive StatisticsDiscovered at compile time Discovered at run time Overview Adaptive Plans Adaptive Statistics Parallel distribution Methods Join Methods At compile time At run time
Dynamic Statistics • During compilation optimizer decides if statistics are sufficient to generate a good plan or not • Dynamic statistics are used to compensate for missing, stale, or incomplete statistics • They can be used for table scans, index access, joins and group bys • One type of dynamic statistics is dynamic sampling
Dynamic Statistics Dynamic sampling has a new level 11(AUTO) Decision to use dynamic sampling depends on the complexity of predicate, existing statistics and total execution time Dynamic statistics shared among queries Dynamic Sampling
Dynamic Statistics SGA SHARED DYNAMIC STATISTICS Select * Fromproduct_information Wherelist_price-min_price=29 And category_id not in (11,22) And prod_name like ‘Smart%’; Table cardinality estimate for prod_info 5 Optimizer determines plan using dynamic statistics 2 SHARED_POOL Cursor 0: Select * from product_information … SQL statement is submitted 1 4 Dynamic sampling occurs on small number of blocks from table Resulting dynamic statistics are stored in cache Optimizer checks for existing statistics in data dictionary DATA DICTIONARY 3 Statistics found but need to be augmented due to complex predicates
Dynamic Statistics SGA SHARED DYNAMIC STATISTICS • Select supplier_id, prod_name Fromproduct_information Wherelist_price-min_price=29 And category_id not in (11,22) And prod_name like ‘Smart%’; Table cardinality estimate for prod_info 10 Optimizer determines plan using dynamic statistics 7 SHARED_POOL Cursor 0: Select supplier_id, prod_name … Different SQL statement is submitted with same predicates 6 9 Necessary Dynamic statistics found in shared cache Optimizer checks for existing statistics in data dictionary DATA DICTIONARY 8 Statistics found but need to be augmented due to complex predicates
Adaptive Statistics Re-optimization • During execution optimizer estimates are compared to execution statistics • If statistics vary significantly then a new plan will be chosen for subsequent executions based on execution statistics • Re-optimization uses statistics gathered from previous executions • First introduced as Cardinality Feedback in 11.2
Adaptive Statistics Initial execution of a query is monitored Example shows initial plan is hash join between sales and customers Cardinality estimates 8X off Cardinality Feedback pre 12c Initial Cardinality estimates are more than 8X off
Adaptive Statistics Execution Plan after the second execution Execution statistics used to reparse the statement on the second execution New plan shows correct cardinality estimates and a new join order Information learnt is stored in the cursor only and is lost if cursor ages out Cardinality Feedback pre 12c Estimates are now correct
Adaptive Statistics Re-optimization in 12c • Join statistics are also monitored • Works with adaptive cursor sharing for statement with binds • New Column in V$SQL IS_REOPTIMIZABLE • Information found at execution time is persisted as SQL Plan Directives
Adaptive Statistics New column in V$SQL IS_REOPTIMIZABLE Indicates that the statement will be re-parsed on the next execution Re-optimization – indicator in V$SQL
Adaptive Statistics Adapt join method example showed the join change from nested loops to hash join But cursor is also marked IS_REOPTIMIZABLE Why? Re-optimization will occur on second execution because cardinality estimates off Adaptive Plans & Re-optimization working together Initial Cardinality estimates are off
Adaptive Statistics On second execution statement is re-parsed using execution statistics New child cursor created with new plan New plan changes the join order Adaptive Plans & Re-optimization working together
Adaptive Statistics SQL Plan Directives • Directives are additional information used during optimization to generate a better plan • For example, when table T1 is joined to T2 use dynamic statistics to get accurate cardinality estimate • Directives are collected on query expressions not at a statement level • Allows for directives to be used for multiple statements • Persisted on disk in the SYSAUX tablespace • Directives will be automatically maintained • Managed using the new package DBMS_SPD
SQL Plan Directives SGA DIRECTIVE CACHE Directive: Use DS for customers table when column city, country are used Select * FROM customers where state =‘CA’ AND country=‘USA’; 2 SHARED_POOL IS_REOPTIMIZABLE = Y SQL statement is submitted 1 During execution cardinality estimate discovered to be wrong and directive created 3 Optimizer determines plan Cursor 0: Select * from customers …….. SysauxTablespace
SQL Plan Directives SGA DIRECTIVE CACHE Directive: Use DS for customer table when column city, country are used Select * FROM customers where state=‘CA’ AND country=‘USA’; 5 Optimizer determines new plan using execution statistics SHARED_POOL Cursor 0: Select * from customers ….. IS_REOPTIMIZABLE = Y Cursor 1: Select * from customers ….. 4 Same SQL statement is submitted again 6 This execution and subsequent execution use child cursor 1 IS_REOPTIMIZABLE = N Directives periodically persisted to disk in the sysauxtablespace SQL Directives SQL Directives SysauxTablespace Sqlid Binds Stats Extension DIRECTIVE CACHE
SQL Plan Directives SGA DIRECTIVE CACHE Select email FROM customers where state=‘MA’ AND country=‘USA’; 9 Optimizer determines plan with help of directive SHARED_POOL Cursor 0: Select email from customers 7 New SQL statement is submitted on customers table 10 Optimizer adds column group creation for state & country columns to next statistics gather on customer table 8 SQL Directives Optimizer checks for directives on customers table and finds one on the columns state and country SysauxTablespace Sqlid Binds Stats Extension DIRECTIVE CACHE
Summary • Optimizer begins to learn from its experiences • Adaptive Plans • Adapt join methods • Adapt parallel distribution methods • Adaptive Statistics • Dynamic statistics • Re-optimization • SQL Plan Directives
“With an ever-expanding portfolio of tools and applications, Oracle University is strongly oriented toward the individual learner.” 2013 leader inIT EducationAccording toIndustry Analyst Firm IDC “Of the vendors evaluated, Oracle is most adept at describing the value of training at multiple parts of the technology deployment life cycle.” “This decentralized control [of the curricula development] helps maximize the relevance of the training offering to the elements that are most useful to end users.”
One of the World’s Largest Training Organizations Worldwide classrooms: 200+ in 89 countries Courses: 2000+ Languages supported: 28 Classes per week: 600+, 90+ Virtual Classes On Demand Courses: 100 and growing Students trained per year:400,000+ Global education partnerships:500+ Oracle certified professionals: 1.6+ million
Training & Certification for the Entire Stack ROLE-BASED TRAINING Administrators Developers Business Users Architects Implementers WHEN AND HOW YOU NEED IT In-Class Training On Demand Private Live Virtual Class VALIDATED BY INDUSTRY-RECOGNIZED CERTIFICATIONS Oracle Certified Professional Oracle Certified Expert Oracle Certified Associate Oracle Certified Master
Contact Oracle University at: Telephone: 800 810 9931 orEmail: OUChina_CN@oracle.com Website: www.oracle.com/cn/education 2013 leader in IT Education education.oracle.com