440 likes | 612 Views
DM213 - The Adaptive Optimizer - Evolution of the Optimizer from 11.9.2 to 12.0 and Beyond. Eric Miner Development Engineer II ESD Optimizer Group eric.miner@sybase.com.
E N D
DM213 - The Adaptive Optimizer - Evolution of the Optimizer from 11.9.2 to 12.0 and Beyond • Eric Miner • Development Engineer II • ESD • Optimizer Group • eric.miner@sybase.com
Review the evolution of the optimizer from ASE 11.9.2 to ASE 12.0, and to introduce planned changes in the “Galaxy” optimizer Provide an opportunity for you to give input on features and functionality you’d like to see included in the optimizer Objectives
General knowledge of what the optimizer is and how it works Experience in tuning the optimizer and/or analyzing optimizer issues Curiosity about future features and functionality of the optimizer Input on changes and features you’d like to see included in future versions of the optimizer Assumptions
The optimizer changes in order to perform new and different task The optimizer will always be evolving It must be flexible enough to deal with different types of work loads Features and functionality important to Customers need to be implemented as quickly at possible Changes in optimization theory in the academic world need to be examined and incorporated if found useful Why Should The Optimizer Evolve?
Major changes for the optimizer Major changes in the statistics New statistics used in costing New costing methods and enhancement of existing costing Tool added to allow reading,writing and simulating of the statistics The11.9.2 Adaptive Optimizer
The Statistics Project Major changes to the accuracy, storage and use of the optimizer statistics Optimizer statistics stored in two system tables sysstatistics - Column level statistics systabstats - table/index level statistics Column Level (distribution) statistics are no longer associated with an index - now associated with a column Table/Index level statistics are centralized in systabstats The11.9.2 Adaptive Optimizer
sysstatistics - The column level statistics The histogram cell boundary values and associated weights.These are the distribution values The weight for each cell makes them more accurate than in previous versions Two cell types - range and frequency count cells The range cell and total density values The date/time of last modification to the column statistics The default selectivity values (formerly the magic numbers) Column level statistics can be written directly The11.9.2 Adaptive Optimizer The Statistics Project cont.
systabstats - The table and Index level statistics Row and page counts of the table and leaf of indexes The number of deleted and forwarded rows The number of empty datapages and index leaf pages height of an index. The Datapage, Index page and Data row cluster ratios Dynamically updated - in memory structure flushed to systabstats by a number of actions - can’t be written The11.9.2 Adaptive Optimizer The Statistics Project cont.
Update statistics has been extended to allow for placement of statistics on columns update statistics table_name (col_name) update index statistics table_name [ind_name] update all statistics table_name Specify the requested number of steps (cells) to use when building the column’s histogram update statistics table_name (col_name) using 200 values Changes to Update Statistics
Utility allows you read, write and simulate statistics Displays all statistics used by the optimizer (sysstatistics and systabstats) Output files can be edited and read in to write the column level statistics or output files can be used to Outputs can be used to ‘go back’ to a previous set of statistics Only column level statistics can be written with optdiag - exception is simulate mode Optdiag
Output files very useful when you need to examine a dataset, table or its indexes Optdiag simulate mode can be used to simulate the effect of changes to any statistical value and/or a number of configuration values on the optimizer Simulate writes the column level statistics but not the table level - table level stats for simulate are stored in special area of sysstatistics and used to optimize query when set statistics simulate is on Optdiag outputs may be requested by TS Optdiag cont.
Cluster ratios are used by the optimizer to cost large I/O and NC access Three cluster ratios - Data Page, Index Page and Data Row - Cluster ratio of 1 = perfect clusting Data Page and Index Page measure the degree of clustering of their respective page type within extents Used in costing large I/O access Data Row measures the clustering of index leaf rows in relation to data rows Used to cost NC/PI access Cluster Ratios and Large I/O Costing
Used by the optimizer to avoid sorts for ORDER BY If index order matches ORDER BY sort can be avoided. The optimizer will do a ‘sort avert’ costing if possible If the sort avert plan is found cheapest it will be used Other Costing - Directional Index Scan
For DOL tables unique index no longer required It is no longer necessary for the optimizer to use only a unique index More indexes available, more likely an efficient plan will be chosen Uses fixed row ids as identifiers Other Costing - Updateable Cursors and Isolation Level 0
Forwarded rows exist only on DOL tables A read of a forwarded row requires two reads - One to read the pointer on the home page One to read the actual row The optimizer takes the presence of forwarded rows in to account when costing. Check optdiag for a count of forwarded rows in the table Use reorg to move forwarded rows back to their home pages - remove fragmentation Other Costing - Forwarded Row Costing
Traceon 302 rewritten for ease of use and to incorporate new statistics Five distinct blocks of information Each block describes different areas of costing Obscure wording rewritten, plain English All new statistics are displayed when used in costing Traceon 302 Rewritten
ASE 12.0 introduces some enhancements and expanded functionality Many involve both optimization and query processing Sort Merge Joins Join Transitive Closure Predicate Factoring and Transformation Like optimization enhancements 50 Table Limit Abstract Query Plans The ASE 12.0 Adaptive Optimizer
New join method designed primarily for DSS queries SMJ can be more efficient in many DSS/DW queries than nested loop. May be less efficient for OLTP Ordered joins provide clustered access to joining rows; result in less logical and physical I/Os. Can exploit indexes that pre-order rows on joining columns. All aspects of performing an SMJ are costed by the optimizer SMJ can be done in parallel The ASE 12.0 Adaptive OptimizerSort Merge Joins (SMJ)
Example select … from part, partsupp, lineitem where p_partkey = ps_partkey and ps_partkey = l_partkey and ps_orderkey = l_orderkey and p_type = ‘CD’ Unsorted Accessto innermost table Part Clustered on p_partkey Partsupp Clustered on ps_partkey Lineitem Clustered on l_orderkey Part Clustered on p_partkey Partsupp Clustered on ps_partkey Sorted Accessto innermost table Lineitem Sorted on l_partkey
Example Table T1 Table T2 where T1.pk = T2.pk R E A D N E X T R E A D N E X T 77 77 78 79 79 81 80 84 81 87 82 90 83 91 84 94
By default SMJ is off in ASE 12 SMJ may not be efficient for all queries, to set it on server wide - sp_configure “enable sort-merge join and JTC”,1 Or, boot with trace 384 At the session level - set sort_merge on The ASE 12.0 Adaptive OptimizerSort Merge Joins (SMJ) cont
Provides the optimizer with more possible join orders More join orders will result in more join plans costed and may result in more efficient plan being chosen select A.a from A,B,C where A.A=B.b and B.b=C.c Adds join and A.a=C.c and four extra orders JTC makes this possible for equi-joins JTC off by default to set it on server wide- sp_configure “enable sort-merge join and JTC”,1 Or, boot with trace 334 At the session level - set JTC on The ASE 12.0 Adaptive Optimizer Join Transitive Closure (JTC)
PFT can add easily optimized AND clauses to queries containing OR clauses PFT aids optimization by extracting optimizable clauses from predicates linked with ORs and substitutes them with clauses linked with ANDs PFT makes more access methods available for optimization - the more possible methods to cost the more likely an efficient plan will be chosen Occurs before entering the optimizer Will not alter the original query semantics The ASE 12.0 Adaptive OptimizerPredicate Factoring & Transformation
Can’t be turned off Traceon 302 will print clauses in the clause info block that don’t appear in the query Example query: select * from lineitem, part where (p_partkey = l_partkey and l_quantity >= 10)or (p_partkey = l_partkey and l_quantity <= 20) Above query is transformed to the following: select * from lineitem, partwhere ((p_partkey = l_partkey and l_quantity >= 10)or (p_partkey = l_partkey and l_quantity <= 20) )and (p_partkey = l_partkey)and (l_quantity >= 10 or l_quantity <= 20) The ASE 12.0 Adaptive OptimizerPFT continued
Costing of LIKE predicates with a leading wildcard has been improved Previously the optimizer could not cost such a query selectivity of 1.0 was all that was possible A pattern match search of histogram boundary values is done. If a match is found the weight(s) of the cells are added to get selectivity If no match is found selectivity is 1/#of steps - If 20 steps selectivity would be 0.05 The ASE 12.0 Adaptive OptimizerLIKE Optimization Enhancement
Number of user tables in a query has increased 50 user tables and 14 worktables Designed to allow for a queries with a large number of of non-flattened subqueries Requires more scan descriptors be configured Short circuiting employed - as the number of tables increases the number of tables costed at a time by join processing decreases 2-25 4, 26-37 3, 38-50 2 - Plans may be overlooked The ASE 12.0 Adaptive Optimizer 50 Table Limit
Not designed for 50 tables in a join- however…. If you use a large number of tables in a join expect optimization to be slower due the the plans that must be costed. If possible set tablecount to 4 if short circuiting likely Consider using an Abstract Plan if possible The ASE 12.0 Adaptive Optimizer 50 Table Limit cont.
A persistent readable, editable and reusable description of a query plan Associated with a query, not syntactically part of the query - no special syntax, can be included in query if desired Can be captured and stored when a query is run Capture mode Can be created manually - new T-SQL extensions Can be used by a query when it is re-running Can be edited The ASE 12.0 Adaptive Optimizer Abstract Query Plans
Designed as a method to temporarily work around optimizer issues with upgrade or bugs while fixes are made - In most cases won’t be needed Stored in sysquerplans When captured will contain the entire query plan Full plan - bypasses the optimizer Possible to specify part of the plan and let the optimizer complete the plan Partial plan - Allows fine control of the optimizer The ASE 12.0 Adaptive Optimizer Abstract Query Plans cont.
Far more granular and flexible control than the force options offer Can be used as a ‘super force’ option The ASE 12.0 Adaptive Optimizer Abstract Query Plans cont.
Caveats Future features and functionality may change before the release of a new ASE In depth details are not presented at this time - this is a high level discussion Your input and suggestions are welcome and important to us The Future
Project code name - Galaxy - DSS/DW oriented The optimizer will be rewritten - some of the planned new features and functionality include - New search engine - more accesses possible, star join support Parallel Optimization - better optimization of PLL queries New join costing method - more accurate join optimization On-Line Statistics - a method to allow column level statistics to be gathered quickly The Future Adaptive Optimizer
New search engine - flexible and configurable The search engine needs to be flexible in order to allow different levels of optimization for different needs - DSS vs. OLTP The search engine’s levels of optimization will be configurable You’ll be able to set the amount of searching (plan costing) you want done Useful when ASE is doing OLTP vs. DSS Exhaustive vs. restrictive search degree of ‘pruning’ of plans The Future Adaptive Optimizer -New Search Engine
Index selection for joins is currently done in the ‘prep phase’- indexes costed before join costing is done. Predicates on columns of a composite index are costed separately to use the index either for the join or the SARG Index selection for joins will now be done while join plans are being costed (permuted) Predicates (joins and SARGs) on the columns of a composite index will be costed together, making join index costing more granular The Future Adaptive Optimizer -New Search Engine cont.
More join plans possible Number of plans costed will be controllable Star joins supported - optimizer will look for and cost star joins when found May want to use a high level of optimization to examine all star join plans The Future Adaptive Optimizer -New Search Engine cont.
New optimization for costing parallel queries Parallel optimization is a critical factor in a DSS/DW system Parallel optimization will be better able to balance response time and resource utilization Will find an efficient PLL plan based on resource utilization Will take into account that I/O can be done in parallel and cost it accordingly - cost resource usage rather than simple I/O Resources include - disks, CPUs, memory and network The Future Adaptive Optimizer - Parallel Optimization
Parallel optimization is configurable Activate or inhibit parallel optimization phases based on operating requirements Levels of parallel optimization can be set Configurable unit costs of each resource type Includes disk speed, CPU, network, memory The Future Adaptive Optimizer - Parallel Optimization cont.
New method will result in more accurate costing of joins By merging histograms on the fly rows that will not participate in the join will not effect costing More accurate estimate of the number of rows that will join SARG values will be used to restrict the portion of original and join histograms to be used in costing The Future Adaptive Optimizer - New Join Costing Method
Boundary values are merged, weights are more granular , those values not qualified by SARG values will be eliminated Data skew (many values with few rows - a few values with many rows) has a direct effect on the total density value and thus on join costing New costing method does not use the Total Density value Unless it participates in the join - a highly duplicated value will not effect the join costing Must have statistics on the joining columns The Future Adaptive Optimizer - New Join Costing Method cont.
Designed to gather column level statistics quicker and when you want Sampling is at the heart of on-line statistics A full read of all rows of the column not required - fewer reads, fewer I/O Can set pages or extents to be read for sampling Reading pages/extents is more efficient than sampling rows Can specify percentage (degree) of sampling to do - 10% - 0.1% expected range - percentage will be persistent for a column The Future Adaptive Optimizer - On-Line Statistics
Manual or automatic? Sampling can be specified at create index or update statistics Degree of sampling will not change unless you specify another value Set on-line statistics to gather stats on its own Based on the DML stream? Based on the number of rows in a column that change? The Future Adaptive Optimizer - On-Line Statistics cont.
The optimizer will always be evolving The optimizer will be flexible enough to handle new demands on it It will better handle DSS/DW tasks Customer input on new features and functionality must be weighed and implemented New theories in optimization must be examined and incorporated if found useful The optimizer can not stand still Conclusion
Portal-Ready Integration Availability • Access & Analysis Corporate Information • Plug’n Play with Standard Internet Technologies • Continuous Database Availability • Realtime Database Maintenance & Tuning Database Requirements Productivity • Portal Productivity Technologies • Web Transaction Management
Come up and get a business card or eminer@sybase.com Questions/Answers and Input