1 / 127

Query Tree Disassembly

Learn how query plans in Sybase IQ outline important execution details, why they are vital for performance analysis, and how to generate and interpret them effectively.

allshouse
Download Presentation

Query Tree Disassembly

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 9 Query Tree Disassembly

  2. Why are Query Plans Important? • Query plans describe every important detail about • how a query was executed, or will be executed. • You suspect (or know) a query runs poorly? • Query plans can help you investigate the issue • You’re not sure you have the best indexes for a query? • Query plans tell you what indexes are used in a query

  3. Query Execution Phases in IQ • Upon submission of a query: • Syntax and Permissions are Checked • Performed by the ASA front-end • Query is Parsed • Broken down into optimizable objects • Query is Optimized • Most efficient execution method that fits within available resource constraints is determined • Query Plan is created • Query is executed • Resources are cleaned up

  4. Sybase IQ Query Execution Server Front End Shared with ASAnywhere (ASA) Handles Connections Parses Incoming Statements Cross-DB Decomposition (CIS) Security Checking Java Support Stored Procedures Optimizer Predicate Inference Predicate Selectivity Estimation Join Optimization Grouping Algorithm Selection Subquery Optimization Index Access Selection Run-Time Engine Prefetch Manager Predicate Execution Tuple (Row) Projection Join Execution Grouping Execution Sorting Subquery Execution

  5. How do I See a Query Plan? • Query Plans are Generated - • In the IQ Message File • As HTML Pages (if requested) in a separate file • Set Option Commands used with Query Plans • Usually set as Temporary Options • Example: Set Temporary Option Query_Detail = ‘On’; • In 12.5 and later, Query_Plan option is ‘ON’ by default • Query Plan appears in the IQ Message File • You may turn this option off, if desired

  6. Database Options for Query Plans • Query_Plan = ‘On’ (default ‘On’) • Writes a basic query plan in the IQ Message File • May not provide enough detail • Query_Plan_As_HTML = ‘On’ (default ‘Off’) • Creates a basic query plan as an external HTML file • Query_Detail = ‘On’ (default ‘Off’) • Adds more detailed information into query plans (Recommended) • Has no effect unless used with Query_Plan or Query_Plan_As_HTML

  7. More Database Options for Query Plans • Query_Plan_After_Run = ‘On’ (default ‘Off’) • Delays creating the query plan until query completes • Query_Timing = ‘On’ (default ‘Off’) • Includes execution times at each stage of the query • Must be used with Query_Plan_After_Run • Query_Name = ‘query_name’ (default ‘’) • Prints name provided in the query plan, and as part of the file name for HTML query plans • NoExec = ‘On’ (default ‘Off’) • Creates query plan but does not execute the query

  8. HTML Query Plans • Since HTML plans are easier to read these will be used • in this presentation • Query_Detail will also be set ‘On’ for all examples • When you specify Query_Plan_As_HTML = ‘On’ • File will be written in the directory with the Database File (.db) unless the Query_Plan_As_HTML_Directory option is set • The File name that is produced - • Prefaced with User Name, Date and Time • Also contains the “Query_Name” if specified • Has a “.html” file extension and can be opened with a browser

  9. Simple Example of HTML Query Plan File • Set temporary option Query_Name =‘stanton’; • Select count(*) from Central_Fact_Table; • HTML file generated in Database File Directory

  10. Query Plan (HTML) Output – Two Parts 1) Query Tree 2) Query Detail

  11. HTML Query Plan – The Query Tree Query_Name Estimated RowsMoving Up the Tree Node Type and Number The Node Number (underlined) is a hot linkto the Nodes below in the Query Detail

  12. Query Tree and Nodes • Query Tree is a visual representation of the Query Plan • Consists of Nodes representing execution plan steps • Nodes are numbered sequentially and identified by type • Some optimizations can eliminate nodes, so there may be gaps in the Node number sequence • Query Tree also includes the estimated number of rows flowing up each link between nodes (or the actual number of rows flowing if Query_Plan_After_Run was ‘On’) • The Query Tree is displayed inverted • In detailed plans, the top node in the Query Tree is the Root Node • Leaf Nodes are typically at the bottom edges of the tree

  13. Leaf Nodes A More Complex Query Tree

  14. Node Detail - General • Header Shows the Node Number and Node Type • Number is a hot link back to this node within the Query Tree • Child Node(s) (if any) • Node(s) that ‘feed’ this node • Hot link to a Child Node (Leaf Nodes rarely have children) • Estimated Result Rows • Number of rows the optimizer thinkswill come out of this Node • Actual Result Rows are shown withQuery_Plan_After_Run = ‘On’

  15. Root Scrolling Cursor Store Filter Group By Sort Hash Order By Store Semi-join Filter Filler Leaf Join Hash (HJ) Hash Pushdown (HJPD) Nested Loop (NL) Nested Loop Pushdown (NLPD) Sort-Merge (SM) Sort-Merge Pushdown (SMPD) Cartesian Subquery Union All Some Node Types (there are others)

  16. Node Details • Some Nodes Contain a Wealth of Information • Detail in other nodes is not as important • Nodes to Concentrate On - • Root • Leaf • Join • Group By • Filter

  17. Root Node (Query_Detail = ‘ON’) • Node Detail • Child Node • Query Name (if any) • User Name (login) • Temp Space Used • Num of Users in Server • Num of CPUs • Database Options Set • Outputs from this node

  18. Examining the Root Node • Effective Number of Users • This is the count of active cursors, but one user can have multiple cursors open simultaneously • Many active users will have an effect on query response • Optimizer may choose a different Query Plan depending upon the number of CPUs, the available IQ cache space, and the effective number of users • Number of CPUs • This is the number of CPUs that IQ believes are available • Because of hyper-threaded CPUs, or the -iqncpus server startup option, this number may not match the actual number of CPUs on the system

  19. Examining the Root Node - 2 • Temp Space Usage • This estimate is the total over the life of the query • The maximum at any point in time may be much lower • Is this estimated usage greater than the Temp Cache Size? • It may indicate Temp Cache is paging to disk • Database Options • A user or DBA may have set options influencing joins, aggregation, or other aspects of optimizer and server behavior • Any options set to non-default values which may affect query performance or behavior are displayed with their current value

  20. Leaf Nodes • A Leaf Node typically represents access to a Table in IQ • Several Types of Leaf Nodes: • (Regular) Leaf • Aggregation Leaf • Grouping Leaf • Distincting Leaf • Ordered Leaf • All run-time access to indexes happen in Leaf Nodes

  21. Leaf Nodes - 2 • All Leaf Nodes have a wealth of information • Table Row Count • Condition(s) - WHERE clause search predicates local to this table • Estimated Selectivity of predicates • Usefulness of the predicate • Index used by the predicate • Columns from this table referenced by this query with: • Data type • Indexes available • Count of NULLs in the column, if any • Count (or estimate) of distinct values in the column • Estimate of distinct value count after all conditions evaluated

  22. Leaf Nodes - 3 • The portion of a table projected by a query • is referred to as the ‘found set’ • The optimizer knows that Table ‘A’ has <N> rows • If there are no predicates on columns in this table, then we are done • We just need the column(s) to be projected from this table • Tuples (Rows) are passed up the Query Tree to the next Node • The found set will be all <N> rows in the table

  23. Leaf Nodes - 4 • The portion of a table projected by a query • is referred to as the ‘found set’ • If there are predicates on this table, then the found set will be the set of rows that satisfy all of those predicates • In this case, the optimizer must determine: • 1) What portion of the table will satisfy each predicate (Selectivity) • 2) Collective effect of all the predicates (Estimated Result Rows) • 3) The best order to execute the predicates (Usefulness) • 4) What effect these predicates will have, both individually and collectively, on the estimate of the number of distinct values for each column projected after all the conditions evaluated

  24. Leaf Nodes - 5 - Selectivity • When NO ‘usable’ index exists on a column in a predicate, • then the optimizer estimates the selectivity based entirely • on the type of predicate (partial list): • Percent of tablePredicate Type (Estimated) Selectivity Equality (=) 20% 0.2000000 Open Range (>) 40% 0.4000000 Between 40% 0.4000000 Like (%) 20% 0.2000000 Inter-column equality (t.a = t.b) 30% 0.3000000 Inter-column comparison (t.a < t.b) 50% 0.5000000 • (You will see these Selectivity estimates in Query Plans)

  25. Leaf Nodes - 6 - Index-based Selectivity • If a ‘useable’ index does exist on a column in a predicate • it may enable more accurate selectivity information • Depends on the type of predicate and the index type(s) • May also depend on the number of values in the column which satisfy that predicate • If an index was usable, the Selectivity will usually look like a precise number without all those trailing zeros: 0.34982376 • Beware: functions used on a column may negate the ability to use an index for execution or for selectivity estimation! • SUBSTRING( t.a, 5, 5 ) = ‘homes’ • May result in a Column Scan • The optimizer will use the default rule for rows returned (20% of the rows for an Equality search)

  26. Leaf Nodes - 7 - Usefulness • Usefulness is used to rank the predicates for execution order • Predicate with the highest value (score) is executed first • Usefulness values run from 0.0 to 10.0 (10.0 is the most useful) • Remaining predicates executed in descending order of Usefulness • What determines a predicate’s Usefulness? • How far the predicate will reduce the found set (Selectivity) • Type of predicate • Index(es) available on the column • Optimized FP indexes on the column • How fast the predicate can be executed • How many resources the predicate execution will require

  27. Leaf Nodes - 8 - Example Query • Single table (147 million rows) - 5 Predicates • What is the best index to use to execute each of them ? • What is the best order to execute them ? • SELECT PAID_AMT • FROM CENTRAL_FACT_TABLE • WHERE ELIG_BEG_DATE >= '2000-01-01' • AND MEMBER_GENDER = 'M' • AND MEMBER_DOB < '1950-01-01' • AND COMPLICATION = 'OUTPATIENT' • AND SERVICE_ROLLUP = 'Neoplasms'

  28. Leaf Nodes - 9 - Predicate Types • Equality and Inequality ( = , != ) • Includes IN and NOT IN lists • Ranges ( <, >, <=, >=, Between) • Including NOT (!) • Like ( % ) or Not Like • Contains • IS [NOT] NULL

  29. Leaf Nodes - 10 - Predicate and Index • Equality and Inequality ( = , != ) • Includes IN and NOT IN lists • Ranges ( <, >, <=, >=, Between) • Like ( % ) or Not Like • Contains • IS [NOT] NULL LF and HG DATE, HNG, LF and HG FP and WD Null Bit

  30. Leaf Nodes - 11 - HG and LF Indexes • These indexes provide exact meta-data to the optimizer: • The number of distinct values for a column • The number of rows for each value • They also provide fast run-time access to the set of cells that contain a particular value

  31. Leaf Nodes - 12 - HG and LF Indexes • When columns with these indexes are used in certain predicates they will provide the optimizer with the exact number of rows that will satisfy the predicate • Equality, inequality, IN, and NOT IN predicates • Range predicates (if the range does not contain too many values) • If a range predicate is too wide to get an exact row count, the LF or HG index will still provide the exact count of distinct values within the range • See the option, HG_SEARCH_RANGE,to change ‘too wide’

  32. Leaf Nodes - 13 - FP(1) and FP(2) • These optimized FP indexes can also provide exact • meta-data to the optimizer: • The number of distinct values for a column • The number of rows for each value • They do not provide fast run-time access to the set of cells • that contain a particular value, as the HG or LF indexes do. • They can dramatically reduce the storage required for a • column, and thereby reduce I/O costs for projection.

  33. Leaf Nodes - 14 - FP(1) and FP(2) • For predicates involving expressions on columns, or for • predicates where index except an FP can be used, these • optimized FP indexes can: • Provide an exact selectivity to the optimizer • Greatly accelerate the execution of such predicates • Examples of predicates affected: • SUBSTR(T.X, 1, 2) = 'TE’ • T.NAME LIKE ‘%B%’

  34. Leaf Nodes - 15 - Minimize_Storage Option • New in 12.5 was a database option, MINIMIZE_STORAGE, • to auto-create optimized FP indexes for all columns, without • requiring IQ UNIQUE be declared on each column. • Late addition to 12.5 Release • Option description did not make first printing of any 12.5 IQ Docs • Look in the Release Bulletin for 12.5 for details • When set ‘ON’, MINIMIZE_STORAGE: • Equivalent to IQ UNIQUE(255) for all columns in tables you create • As data is loaded an FP(1) index will ‘roll over’ first to an FP(2) and then to a flat (unoptimized) FP index as necessary • Highly recommended for all tables < 1000 columns

  35. Leaf Nodes - 16 - FP(1) and FP(2) • NOTE: In 12.6 and later, the optimized FP indexes will be • used to accelerate the evaluation of single column • expressions, even if they are not used in predicates. • See the 12.6 documentation on the option, • FPL_Expression_Memory_KB,for details. • Examples of expressions affected: • SELECT SUBSTR(T.X, 1, 2), (1.0 - T.DISCOUNT)

  36. Leaf Nodes - 17 - Other Indexes • Except for special cases, the other indexes can be used in predicates, but: • They will not provide exact counts of rows to the optimizer • Optimizer will use the default selectivity (discussed earlier) • Later you will see how to help the optimizer avoid bad estimates • Some of the special cases are: • CONTAINS predicate with a WD index • DATEPART function predicate on a column with a DATE index

  37. Leaf Nodes - 18 - Query Example • Back to our example query … • SELECT PAID_AMT • FROM CENTRAL_FACT_TABLE • WHERE ELIG_BEG_DATE >= '2000-01-01' • AND MEMBER_GENDER = 'M' • AND MEMBER_DOB < '1950-01-01' • AND COMPLICATION = 'OUTPATIENT' • AND SERVICE_ROLLUP = 'Neoplasms'

  38. Leaf Nodes - 19 - Example Leaf Node This is a largeLeaf Node ! (about 2 pages) We will examine it in parts

  39. Estimated Result Set Original Table Size Leaf Nodes - 20 - Example Leaf Node: Row Info (Information here is hidden for clarity)

  40. 0.199 % of the table is returned That is very useful!! Leaf Nodes - 21 - Example Leaf Node: Predicate Info • Each predicate for the table has been analyzed: • If an HG, LF, FP(1), orFP(2) index exists it provides Selectivity statistics (number rows returned as a Percentage of the Table) • Usefulness was assigned based partly on the selectivity • The smaller the selectivity, the more useful the predicate

  41. Leaf Nodes - 22 - Example Leaf Node: Predicate Info Selectivity Usefulness Index Used

  42. Leaf Nodes - 23 - Example Predicates • Predicates ranked by Usefulness: • Predicate • ========================================== • SERVICE_ROLLUP = 'Neoplasms' .001 9.99 • MEMBER_GENDER = 'M' .389 9.61 • ELIG_BEG_DATE >= '2000-01-01' .764 6.23 • MEMBER_DOB < '1950-01-01' .389 5.61 • COMPLICATION = 'OUTPATIENT' .200 2.80 Selectivity Usefulness This looks like a ‘Guess’ – 20% for an Equality

  43. Leaf Nodes - 24 - Example Estimated Result Rows • In 12.5 and earlier, the ‘Estimated Result Rows’ from this table • is the product of the Selectivity estimates from each predicate • This implies that all predicates are assumed to be uncorrelated • Condition 1: 149,178,536 * 0.001995 = 297,587 rows • Condition 2: 297,587 * 0.389099 = 115,791 rows • Condition 3: 115,791 * 0.764858 = 88,564 rows • Condition 4: 88,564 * 0.389793 = 34,521 rows • Condition 5: 34,521 * 0.200000 = 6,904 rows

  44. Leaf Nodes - 25 - Example Estimated Result Rows • NOTE: In 12.6 and later, the ‘Estimated Result Rows’ from this table • will be the actual number of rows that satisfy the entire set • of predicates, because all simple predicates will be executed • early in the optimization process. • For most predicates, there will no longer need to be an • assumption that predicates are statistically uncorrelated.

  45. Leaf Nodes - 26 - Column Information • There is more detail on each column referenced • This shows how much IQ knows about the column’s data • 8578 distinct values in column • 1096 values satisfy the >= condition • Column stored as FP(2) • HG and Date Index on column elig_beg_date >= '2000-01-01'

  46. Leaf Nodes - 27 - Back to Our Query Example • Was the Optimizer Estimate correct? • Use Query_Plan_After_Run option • Delays the printing of the Query Plan until Query Completes • New details appear in the Query Plan After Run: • Estimated and Actual Row Counts at each Node • Estimated and Actual Temp Space Used • For predicates with multiple algorithms available, the algorithm actually used

  47. Leaf Nodes - 28 - Example - Query Plan After Run • Row Estimate • Off by a factor or 72.6 • Temp Space Usage • Just slightly below actual • Questions … • Why is row estimate inaccurate? • Is this a problem?

  48. Row Estimates • Several Reasons for Wrong Estimates • Method used to derive estimate is statistical, not perfect • Assumes even distribution of data • Assumes predicates are not correlated • One Predicate was a Guess (no useable index on that column) • Importance of HG, LF, FP(1), and FP(2) indexes cannot be stressed too highly • Inaccurate Estimates May NOT be a Problem • Only a problem if those inaccurate estimates cause the optimizer to select a poorly performing query plan • Nodes using HASH tables are sometimes sensitive to row counts • Can be negative performance implications for underestimates • Sort Nodes are not sensitive to inaccurate estimates

  49. Leaf Nodes - 29 - Corrective Action and Results • Created LF Index on the COMPLICATION column for this 149 million row table • Took 110 Seconds (Sun 64 w/ 900 MHZ cpu) • New index is more Selective (and Useful) • Estimates are now much closer to reality • This is about as close as we can get in 12.5

  50. Leaf Nodes - 30 - Query Plan After Index Added

More Related