1 / 50

Ó IBM Corporation 2003

Tampa Bay Relational Users Group Query diagnosis. IBM Silicon Valley Lab, U.S.A. Ó IBM Corporation 2003. Query analysis and tuning. Format the SQL statement Prepare the statement for human tuning Separate sections for: SELECT list FROM clause WHERE clause … Tools support

ramona
Download Presentation

Ó IBM Corporation 2003

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Tampa Bay Relational Users Group Query diagnosis IBM Silicon Valley Lab, U.S.A. Ó IBM Corporation 2003

  2. Query analysis and tuning • Format the SQL statement • Prepare the statement for human tuning • Separate sections for: • SELECT list • FROM clause • WHERE clause • … • Tools support • Data Studio fixpack 2.2.0.1 includes SQL formatting • Show transformed SQL text

  3. Sample unformatted query EXPLAIN PLAN SET QUERYNO = 1 FOR SELECT DISTINCT ITEM.ITEM_NBR AS ITEM_NBR, ITEM.PRDT_ID, STOREITEM.WK_STRT_DT AS WK_STRT_DT ,STOREITEM.DC_ID AS DC_ID FROM PROD.TIPA004_STITM_PROJ AS STOREITEM , PROD.TITM001_ITEM AS ITEM WHERE ITEM.BUS_UNIT_ID = ‘GS‘ AND ITEM.BUS_UNIT_ID = STOREITEM.BUS_UNIT_ID AND ITEM.MJR_CATG_ID = '00754‘ AND ITEM.INTMD_CATG_ID = '00043‘ AND ITEM.ITEM_NBR = STOREITEM.ITEM_NBR AND ITEM.MJR_CATG_ID = STOREITEM.MJR_CATG_ID AND ITEM.INTMD_CATG_ID = STOREITEM.INTMD_CATG_ID AND STOREITEM.RTL_DEPT_NBR = 1 AND AD_ITEM_FLG = 'Y‘ AND WK_STRT_DT = '2002-02-08'; Unformatted SQL, where to start?

  4. Formatted EXPLAIN PLAN SET QUERYNO = 1 FOR SELECT DISTINCT ITEM.ITEM_NBR AS ITEM_NBR, ITEM.PRDT_ID, STOREITEM.WK_STRT_DT AS WK_STRT_DT ,STOREITEM.DC_ID AS DC_ID FROM PROD.TIPA004_STITM_PROJ AS STOREITEM ,PROD.TITM001_ITEM AS ITEM WHERE ITEM.BUS_UNIT_ID = STOREITEM.BUS_UNIT_ID AND ITEM.MJR_CATG_ID = STOREITEM.MJR_CATG_ID AND ITEM.INTMD_CATG_ID = STOREITEM.INTMD_CATG_ID AND ITEM.ITEM_NBR = STOREITEM.ITEM_NBR AND ITEM.BUS_UNIT_ID = ‘GS‘ AND ITEM.MJR_CATG_ID = '00754‘ AND ITEM.INTMD_CATG_ID = '00043‘ AND STOREITEM.AD_ITEM_FLG = 'Y‘ AND STOREITEM.RTL_DEPT_NBR = 1 AND STOREITEM.WK_STRT_DT = '2002-02-08';

  5. Analyzing query • Observe “interesting predicates” • Optimizer may produce inaccurate filter factor estimate • Range predicates with parameter markers • Predicates using interesting literals • Probable defaults • Complex predicates • Complex OR expressions • Negation predicates • Column expressions • Non-column expressions

  6. Sample query Pat’s diagnosis

  7. Query breakdown SELECT … FROM SETL_TRANS S ,BRANCH CUST ,BRANCH_ADDR A WHERE S.ADV_ABA_R = ? AND S.PROCESS_DT < '9999-12-31‘ AND S.TYPE_CD IN ('A', ‘C’, ‘X’) AND S.CLR_CYCLE_CD IN ('EOD', 'IMD‘, ‘OPN’) AND S.STLMT_DT = ? AND S.ACCT_NUM = CUST.ACCT_NUM AND CUST.CUST_EFCT_DT <= ? AND CUST.CUST_INACTV_DT > ? AND A.ACCT_NUM = CUST.ACCT_NUM AND A.CUST_EFCT_DT <= ? AND A.CUST_INACTV_DT > ? AND A.ADDR_TYP_CD = ' '

  8. Identify peculiar predicates SELECT … FROM SETL_TRANS S ,BRANCH CUST ,BRANCH_ADDR A WHERE S.ADV_ABA_R = ? AND S.PROCESS_DT < ‘9999-12-31’  MAX DATE AND S.TYPE_CD IN ('A', 'C', ‘X‘, ‘Z’) AND S.CLR_CYCLE_CD IN ('EOD', 'IMD‘, ‘OPN’) AND S.STLMT_DT = ? AND S.ACCT_NUM = CUST.ACCT_NUM AND CUST.CUST_EFCT_DT <= ?  Range with marker AND CUST.CUST_INACTV_DT > ?  Range with marker AND A.ACCT_NUM = CUST.ACCT_NUM AND A.CUST_EFCT_DT <= ?  Range with marker AND A.CUST_INACTV_DT > ?  Range with marker AND A.ADDR_TYP_CD = ' ‘  COL = blank

  9. Why are they peculiar? Predicates with typical default often skewed. AND S.PROCESS_DT < ‘9999-12-31’  MAX DATE AND A.ADDR_TYP_CD = ' ‘  COL = blank Range predicates with parameter markers - Impossible to estimate without literal AND CUST.CUST_EFCT_DT <= ?  Range with marker AND CUST.CUST_INACTV_DT > ?  Range with marker AND A.CUST_EFCT_DT <= ?  Range with marker AND A.CUST_INACTV_DT > ?  Range with marker

  10. Range predicate interpolation Table 104. Default filter factors for interpolation Note: Op is one of these operators: <, <=, >, >=. COMMENT: This is DB2’s documented guess for an impossible to estimate Filter factor.

  11. Analyzing query • Embed information within statement • Table information • CARDF • NPAGES • Column information for predicates • Local predicates • Join predicates • Observe where the filtering is • Selectivity of a predicate is relative to table cardinality • Investigate “suspicious” predicates • Determine actual versus estimated filtering • If there is a problem, identify options

  12. Embed statistics SELECT … FROM SETL_TRANS S CARDF 1,600,254 NPAGES 21,627 ,BRANCH CUST CARDF 31,696 NPAGES 1132 ,BRANCH_ADDR A CARDF 58,627 NPAGES 2791 WHERE S.ADV_ABA_R = ? COLCARDF 19,712 AND S.PROCESS_DT < ‘9999-12-31’ COLCARDF 11 LOW2KEY 2004-03-24 HIGH2KEY 2004-04-05 AND S.TYPE_CD IN ('A', 'C', ‘X‘, ‘Z’) COLCARDF 4 AND S.CLR_CYCLE_CD IN ('EOD', 'IMD', ‘OPN') COLCARDF 3 AND S.STLMT_DT = ? COLCARDF 13 AND S.ACCT_NUM = CUST.ACCT_NUM COLCARDF 15360 / 26,527 AND CUST.CUST_EFCT_DT <= ? COLCARDF 2,496 LOW2KEY 1994-09-02 HIGH2KEY 2004-04-06 AND CUST.CUST_INACTV_DT > ? COLCARDF 279 LOW2KEY 2004-03-04 HIGH2KEY 2004-04-07 AND A.ACCT_NUM = CUST.ACCT_NUM COLCARDF 26,527 / 26,527 AND A.CUST_EFCT_DT <= ? COLCARDF 2,496 LOW2KEY 1994-09-02 HIGH2KEY 2004-04-06 AND A.CUST_INACTV_DT > ? COLCARDF 274 LOW2KEY ‘2004-03-04’ HIGH2KEY ‘2004-04-07’ AND A.ADDR_TYP_CD = ‘ ‘ COLCARDF 5

  13. Suspicious predicate analysis 1) The first range predicate, we’re looking for all values less than ‘9999-12-31. So the predicate searches for all values less than a number significantly greater Than the HIGH2KEY – so basically, all of the rows qualify here. (since the optimizer has the literal value, it KNOWS that all rows qualify). 2) For the column = blank predicate, I don’t believe a skew search was ever done. You could look to see how many values are blank. Is it > 20%? 1/5 = 20%. 1) AND S.PROCESS_DT < '9999-12-31‘ COLCARDF 11 LOW2KEY 2004-03-24 HIGH2KEY 2004-04-05 2) AND A.ADDR_TYP_CD = ' ‘ COLCARDF 5 Conclusion: First predicate is should not be causing this SQL statement any Problems.

  14. Suspicious predicate analysis The literal value used for each of the parameter markers in this case happened To be the same, and the value was 2004-04-06. Comparing the literal value to the HIGH2KEY and what range would qualify Is how I determined the ESTIMATED FF WITH LITERAL. The ESTIMATED FF WITH MARKER is from the chart in the Admin guide. The “error” is how different the optimizers DEFAULT estimate is from ACTUAL filtering. 3) AND CUST.CUST_EFCT_DT <= ? COLCARDF 2,496 LOW2KEY 1994-09-02 HIGH2KEY 2004-04-06 ESTIMATED FF WITH LITERAL: = 100% ESTIMATE WITH MARKER: 1/30 = 3% ( 97% error ) 4) AND CUST.CUST_INACTV_DT > ? COLCARDF 279 LOW2KEY 2004-03-04 HIGH2KEY 2004-04-07 ESTIMATED FF WITH LITERAL: = 99% ESTIMATE WITH MARKER: 1/10 = 10% ( 89% error ) 5) AND A.CUST_EFCT_DT <= ? COLCARDF 2,496 LOW2KEY 1994-09-02 HIGH2KEY 2004-04-06 ESTIMATED FF WITH LITERAL: = 100% ESTIMATE WITH MARKER: 1/30 = 3% ( 97% error ) 6) AND A.CUST_INACTV_DT > ? COLCARDF 274 LOW2KEY ‘2004-03-04’ HIGH2KEY ‘2004-04-07’ ESTIMATED FF WITH LITERAL: = 99% ESTIMATE WITH MARKER: 1/10 = 10% ( 89% error )

  15. Suspicious predicate analysis • Conclusion • The range predicates with parameter markers introduce significant filter factor error. So we should recognize that this filter factor error can cause significant cost estimation problems for the optimizer – possibly resulting in poor access path choice.

  16. Where’s the filtering? WHERE S.ADV_ABA_R = ? COLCARDF 19,712 (Very selective predicate) AND S.PROCESS_DT < ‘9999-12-31’ COLCARDF 11 (This predicate doesn’t filter anything, known from suspicious predicate analysis) AND S.TYPE_CD IN ('A', 'C', ‘X', ‘Z') COLCARDF 4 (In-list looking for 4 values, COLCARDF 4 – not filtering) AND S.CLR_CYCLE_CD IN ('EOD', 'IMD', ‘OPN') COLCARDF 3 (In-list looking for 3 values, COLCARDF 3 – not filtering) AND S.STLMT_DT = ? COLCARDF 13 (COL = LIT, COLCARDF 13 – somewhat filtering, but not great selectivity) AND S.ACCT_NUM = CUST.ACCT_NUM COLCARDF 15360 / 26,527 (For the range predicates, we know that optimizer PERCIEVES them to be selective but In reality, they are not. This was determined during suspicious predicate analysis) AND CUST.CUST_EFCT_DT <= ? COLCARDF 2,496 AND CUST.CUST_INACTV_DT > ? COLCARDF 279 AND A.ACCT_NUM = CUST.ACCT_NUM COLCARDF 26,527 / 26,527 AND A.CUST_EFCT_DT <= ? COLCARDF 2,496 AND A.CUST_INACTV_DT > ? COLCARDF 274 AND A.ADDR_TYP_CD = ‘ ‘ COLCARDF 5 (COL = blank. Probably this column is skewed on blank. COLCARDF 5, not typically Very filtering)

  17. Where’s the filtering? SELECT … FROM SETL_TRANS S CARDF 1,600,254 NPAGES 21,627 ,BRANCH CUST CARDF 31,696 NPAGES 1132 ,BRANCH_ADDR A CARDF 58,627 NPAGES 2791 WHERE S.ADV_ABA_R = ? COLCARDF 19,712 AND S.PROCESS_DT < ‘9999-12-31’ COLCARDF 11 LOW2KEY 2004-03-24 HIGH2KEY 2004-04-05 AND S.TYPE_CD IN ('A', 'C', ‘X', ‘Z') COLCARDF 4 AND S.CLR_CYCLE_CD IN ('EOD', 'IMD', ‘OPN') COLCARDF 3 AND S.STLMT_DT = ? COLCARDF 13 AND S.ACCT_NUM = CUST.ACCT_NUM COLCARDF 15360 / 26,527 AND CUST.CUST_EFCT_DT <= ? COLCARDF 2,496 LOW2KEY 1994-09-02 HIGH2KEY 2004-04-06 AND CUST.CUST_INACTV_DT > ? COLCARDF 279 LOW2KEY 2004-03-04 HIGH2KEY 2004-04-07 AND A.ACCT_NUM = CUST.ACCT_NUM COLCARDF 26,527 / 26,527 AND A.CUST_EFCT_DT <= ? COLCARDF 2,496 LOW2KEY 1994-09-02 HIGH2KEY 2004-04-06 AND A.CUST_INACTV_DT > ? COLCARDF 274 LOW2KEY ‘2004-03-04’ HIGH2KEY ‘2004-04-07’ AND A.ADDR_TYP_CD = ‘ ‘ COLCARDF 5 Most selective by far

  18. Index analysis • One significant input to the optimizer is… • Available indexes • What join sequence they encourage • Some index performance considerations • Provide efficient access for local predicates • Encourages table to be outer table • Provide efficient access for join predicates • Encourage access to table as INNER table of join • Provide ordering to avoid sort • Analysis: • Are there appropriate indexes to support this query?

  19. Identify indexes Table: SETL_TRANS INDEX IXSTRN01 (PROCESS_DT, CLR_CYCLE_CD, ADV_ABA_R, TYPE_CD, ACCT_NUM, STLMT_DT) TABLE: BRANCH INDEX: IXBRNC01 (CUST_INACTV_DT, CUST_EFCT_DT) INDEX: IXBRNC02 (ACCT_NUM, CUST_EFCT_DT) TABLE: BRANCH_ADDR INDEX: IXBRAD01 (CUST_INACTV_DT, CUST_EFCT_DT) INDEX: IXBRAD02 (ACCT_NUM, ADDR_TYP_CD, CUST_EFCT_DT)

  20. Index candidate usage Table: AJT_SETL_TRANS INDEX IXSTRN01 (PROCESS_DT, CLR_CYCLE_CD, ADV_ABA_R, TYPE_CD, ACCT_NUM, STLMT_DT) TABLE: BRANCH INDEX: IXBRNC01 (CUST_INACTV_DT, CUST_EFCT_DT) INDEX IXBRNC02 (ACCT_NUM, CUST_EFCT_DT) TABLE: BRANCH_ADDR INDEX: IXBRAD01 (CUST_INACTV_DT, CUST_EFCT_DT) INDEX: IXBRAD02 (ACCT_NUM, ADDR_TYP_CD, CUST_EFCT_DT) Key: RED = Range predicate, stops matching BLUE: Join predicate GREEN: Local equals predicate / in-list

  21. Index design analysis(by table) • BRANCH table (Index design OK!) • Index IXBRNC02 supports local access • CONCERN: Predicate on this column has filter factor grossly overestimated, so optimizer will perceive the access to be more efficient to this table than what really occurs! • Index IXBRNC01 supports join access • BRANCH_ADDR table (Index design OK!) • Index IXBRAD01 leading column on local filtering • Predicate on this column has filter factor grossly over estimated • Allows table to be considered as inner table efficiently • Index IXBRAD02 leading column supports join • Allows table to be an efficient inner table

  22. Index design analysis(by table) • SETL_TRANS table (Not OK!) • INDEX IXSTRN01 has one index. • No efficient for join • join predicate needs to be leading col) • No efficient index for outer access • Leading column of index qualifies ALL rows

  23. Overlay table size Table: SETL_TRANS CARDF 1,600,254 NPAGES 21,627 INDEX IXSTRN01 (PROCESS_DT, CLR_CYCLE_CD, ADV_ABA_R, TYPE_CD, ACCT_NUM, STLMT_DT) TABLE: BRANCH CARDF 31,696 NPAGES 1132 INDEX: IXBRNC02 (CUST_INACTV_DT, CUST_EFCT_DT) INDEX: IXBRNC01 (ACCT_NUM, CUST_EFCT_DT) TABLE: BRANCH_ADDR CARDF 58,627 NPAGES 2791 INDEX: IXBRAD01 (CUST_INACTV_DT, CUST_EFCT_DT) INDEX: IXBRAD02 (ACCT_NUM, ADDR_TYP_CD, CUST_EFCT_DT) Key: RED = Range predicate, stops matching BLUE: Join predicate GREEN: Local equals predicate / in-list Biggest table, worst index Options. Must scan 1.6 million rows!

  24. Possible new indexes • Existing index IXSTRN01 (PROCESS_DT, CLR_CYCLE_CD, ADV_ABA_R, TYPE_CD, ACCT_NUM, STLMT_DT) • Efficient outer table access INDEX opt_1 (ADV_ABA_R, STLMT_DT, ACCT_NUM) • Efficient inner table access: INDEX opt_2 (ACCT_NUM)

  25. Summary of this SQL • Indexes on BRANCH, BRANCH_ADDR look better than they are • Range predicate with parameter marker estimates 3% of rows qualify • In reality, 99% qualify • Inefficient index available on SETL_TRANS table • No efficient outer table index available • No efficient inner table index available • This is the biggest table, with the best filter!!! • Optimizer bad join method due to combination of above factors • Performed full scan of transaction index 26,000 times • Resolution: • Providing new index on SETL_TRANS should provide more stable, faster access than ever before • REOPT, or providing literal values avoids the disaster without new index

  26. SQL 2 SELECT COLS FROM PART A CARDF=17,598 QUALIFIED_ROWS=67.1 NPAGESF=1,467 , CONTRACTOR B CARDF=34,728 QUALIFIED_ROWS=77.8 NPAGESF=724 , CONT_PARTS C CARDF=2,093,750 QUALIFIED_ROWS=38,382 NPAGESF=52,189 , PARTS_PROD_ASMBLY D CARDF=7,058,356 QUALIFIED_ROWS=7,058,356 NPAGESF=68,644 , PARTS_PROD_ASM_DTL E CARDF=21,366,326 QUALIFIED_ROWS=21,366,320 NPAGESF=1,236,490 WHERE A.COUNTRY_CD = ? COLCARDF=208 MAX_FREQ=36.408% FF=0.005 AND A.PART_CD = ? COLCARDF=5 MAX_FREQ=47.199% FF=0.2 AND A.PART_TYPE IN ( 'F', 'I', 'P' ) COLCARDF=8 MAX_FREQ=79.867% FF=0.958 AND B.PART_NUM = ? COLCARDF=260 MAX_FREQ=3.032% FF=0.004 AND B.SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87.402% FF=0.126 LOW2KEY=N HIGH2KEY=Y AND B.SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99.833% FF=0.998 LOW2KEY=N HIGH2KEY=Y AND C.PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76.832% FF=0.018 AND B.CONTRACTOR_ID = C.CONTRACTOR_ID COLCARDF=1,047/316 FF=9.551E-4 AND D.PART_NUM = A.PART_NUM COLCARDF=6,132/17,598 FF=5.682E-5 AND C.PRODUCT_ID = D.PRODUCT_ID COLCARDF=1,391,650/7,058,356 FF=1.417E-7 AND C.PRODUCT_ID = E.PRODUCT_ID COLCARDF=1,391,650/21,366,326 FF=4.68E-8 AND E.PRODUCT_ID = D.PRODUCT_ID COLCARDF=21,366,326/7,058,356 FF=4.68E-8

  27. Local predicate analysis SELECT COLS FROM PART A CARDF=17,598 QUALIFIED_ROWS=67.1 NPAGESF=1,467 , CONTRACTOR B CARDF=34,728 QUALIFIED_ROWS=77.8 NPAGESF=724 , CONT_PARTS C CARDF=2,093,750 QUALIFIED_ROWS=38,382 NPAGESF=52,189 , PARTS_PROD_ASMBLY D CARDF=7,058,356 QUALIFIED_ROWS=7,058,356 NPAGESF=68,644 , PARTS_PROD_ASM_DTL E CARDF=21,366,326 QUALIFIED_ROWS=21,366,320 NPAGESF=1,236,490 WHERE A.COUNTRY_CD = ? COLCARDF=208 MAX_FREQ=36.408% FF=0.005  ??? ‘FR’ = 36.4% ‘GB’ = 17% ‘DE’=10% AND A.PART_CD = ? COLCARDF=5 MAX_FREQ=47.199% FF=0.2  ??? 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1% AND A.PART_TYPE IN ( 'F', 'I', 'P' ) COLCARDF=8 MAX_FREQ=79.867% FF=0.958 Skewed, not selective AND B.PART_NUM = ? COLCARDF=260 MAX_FREQ=3.032% FF=0.004 AND B.SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87.402% FF=0.126  skewed, selective LOW2KEY=N HIGH2KEY=Y AND B.SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99.833% FF=0.998  skewed, not selective LOW2KEY=N HIGH2KEY=Y AND C.PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76.832% FF=0.018  skewed, selective AND B.CONTRACTOR_ID = C.CONTRACTOR_ID COLCARDF=1,047/316 FF=9.551E-4 AND D.PART_NUM = A.PART_NUM COLCARDF=6,132/17,598 FF=5.682E-5 AND C.PRODUCT_ID = D.PRODUCT_ID COLCARDF=1,391,650/7,058,356 FF=1.417E-7 AND C.PRODUCT_ID = E.PRODUCT_ID COLCARDF=1,391,650/21,366,326 FF=4.68E-8 AND E.PRODUCT_ID = D.PRODUCT_ID COLCARDF=21,366,326/7,058,356 FF=4.68E-8 • Both ‘A’ and ‘B’ tables have selective predicates. • COUNTRY_CD and PART_CD predicates – there is skew, optimizer assumes uniform distribution • B.PART_NUM – Slightly skewed. 3% one value. Uniform estimate is 0.4%. • PREFERRED – skewed, query searches for an infrequently occurring value. • Without looking at indexes, seems ‘A’ and ‘B’ will compete to be outer table • Qualified rows of 67.1 and 77.8 pretty close

  28. Local index analysis – ‘A’ SELECT COLS FROM PART A CARDF=17,598 QUALIFIED_ROWS=67.1 NPAGESF=1,467 WHERE A.COUNTRY_CD = ? COLCARDF=208 MAX_FREQ=36.408% FF=0.005  ??? ‘FR’ = 36.4% ‘GB’ = 17% ‘DE’=10% AND A.PART_CD = ? COLCARDF=5 MAX_FREQ=47.199% FF=0.2  ??? 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1% AND A.PART_TYPE IN ( 'F', 'I', 'P' ) COLCARDF=8 MAX_FREQ=79.867% FF=0.958 Skewed, not selective INDEX CLU UR NLEAF NLEVEL CR KEYCOLNAME COLCARDF MCARDF IXPRT01 Y U 151 3 0.999 PART_CD 5 5 COUNTRY_CD 208 251 FILE 2496 3054 DR 46 3176 SECTOR 178 3548 PDV 16830 17598 IXPRT02 N D 128 2 0.794 PART_CD 5 5 PART_TYPE 8 28 PDV 16830 16850 FILE 2496 16905 IXPRT03 N D 26 2 0.998 PART_TYPE 8 8 PART_CD 5 28 COUNTRY_CD 208 579 IXPRT04 N D 99 2 0.782 PART_TYPE 8 8 PART_NUM 17598 17598

  29. Local index analysis – ‘A’ SELECT COLS FROM PART A CARDF=17,598 QUALIFIED_ROWS=67.1 NPAGESF=1,467 WHERE A.COUNTRY_CD = ? COLCARDF=208 MAX_FREQ=36.408% FF=0.005  ??? ‘FR’ = 36.4% ‘GB’ = 17% ‘DE’=10% AND A.PART_CD = ? COLCARDF=5 MAX_FREQ=47.199% FF=0.2  ??? 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1% AND A.PART_TYPE IN ( 'F', 'I', 'P' ) COLCARDF=8 MAX_FREQ=79.867% FF=0.958 Skewed, not selective INDEX CLU UR NLEAF NLEVEL CR KEYCOLNAME COLCARDF MCARDF IXPRT01 Y U 151 3 0.999 PART_CD 5 5 COUNTRY_CD 208 251 FILE 2496 3054 DR 46 3176 SECTOR 178 3548 PDV 16830 17598 IXPRT02 N D 128 2 0.794 PART_CD 5 5 PART_TYPE 8 28 PDV 16830 16850 FILE 2496 16905 IXPRT03 N D 26 2 0.998 PART_TYPE 8 8 PART_CD 5 28 COUNTRY_CD 208 579 IXPRT04 N D 99 2 0.782 PART_TYPE 8 8 PART_NUM 17598 17598

  30. Local index analysis B SELECT COLS FROM CONTRACTOR B CARDF=34,728 QUALIFIED_ROWS=77.8 NPAGESF=724 WHERE B.PART_NUM = ? COLCARDF=260 MAX_FREQ=3.032% FF=0.004 AND B.SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87.402% FF=0.126  skewed, selective LOW2KEY=N HIGH2KEY=Y AND B.SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99.833% FF=0.998  skewed, not selective LOW2KEY=N HIGH2KEY=Y INDEX CLU UR NLEAF NLEVEL CR KEYCOLNAME COLCARDF MCARDF IXCTR01 Y P 210 3 0.962 PART_NUM 260 278 CONTRACTOR_ID 1047 34722 CONT_TYPE 7 34728 IXCTR02 N D 50 2 0.624 PART_NUM 260 278 IXCTR03 N D 56 2 0.348 BEGIN_DT 1015 1015 CONTRACTOR_ID 1047 2555 IXCTR04 N D 316 3 0.927 CONTRACTOR_ID 1047 1047 PART_NUM 260 34722 BEGIN_DT 1015 34722 END_DT 2656 34722 CONT_TYPE 7 34728 IXCTR05 N D 250 3 0.896 CONTRACTOR_ID 1047 1047 BEGIN_DT 1015 2555 PART_NUM 260 34722

  31. Local index analysis B SELECT COLS FROM CONTRACTOR B CARDF=34,728 QUALIFIED_ROWS=77.8 NPAGESF=724 WHERE B.PART_NUM = ? COLCARDF=260 MAX_FREQ=3.032% FF=0.004 AND B.SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87.402% FF=0.126  skewed, selective LOW2KEY=N HIGH2KEY=Y AND B.SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99.833% FF=0.998  skewed, not selective LOW2KEY=N HIGH2KEY=Y • Note: SUB_CONTRACTOR is selective due to search for least frequent value. Is not in any candidate index. • Otherwise, local index support looks good. • May be able to drop IXCTR02 with reverse index scan support. INDEX CLU UR NLEAF NLEVEL CR KEYCOLNAME COLCARDF MCARDF IXCTR01 Y P 210 3 0.962 PART_NUM 260 278 CONTRACTOR_ID 1047 34722 CONT_TYPE 7 34728 IXCTR02 N D 50 2 0.624 PART_NUM 260 278 IXCTR04 N D 316 3 0.927 CONTRACTOR_ID 1047 1047 PART_NUM 260 34722 BEGIN_DT 1015 34722 END_DT 2656 34722 CONT_TYPE 7 34728 IXCTR05 N D 250 3 0.896 CONTRACTOR_ID 1047 1047 BEGIN_DT 1015 2555 PART_NUM 260 34722

  32. Local index analysis C SELECT COLS FROM CONT_PARTS C CARDF=2,093,750 QUALIFIED_ROWS=38,382 NPAGESF=52,189 WHERE C.PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76.832% FF=0.018  skewed, selective INDEX CLU UR NLEAF NLEVEL CR KEYCOLNAME COLCARDF MCARDF IXCPR04 N D 15352 3 0.998 PREFERRED 3 3 CONTRACTOR_ID 316 552 PRODUCT_ID 1391650 1808887 • Table C • There is index support for local filtering. • Trailing join column (good)

  33. Indexes for local summary • Each table with local filtering had efficient indexes to support local filtering • Positives: • Efficient access paths exist. • Negatives: • Each table will compete for the outer • More “apparently efficient” choices, more stress on optimizer, opportunity for incorrect choice

  34. Join graph B C E D A • Two most selective tables ‘A’ and ‘B’ not joined directly • C – D – E each join on same column (PRODUCT_ID) • Shaping up like ‘A’ with 67 outer rows as outer vs ‘B’ with 77 rows as outer

  35. Join considerations • Index support for certain join sequences • Indexes available to support matching index access for different desirable join sequences? • Join reduction / fan-out considerations • Consider expansion / contraction of result size through different join sequences

  36. Join indexes A SELECT COLS FROM PART A CARDF=17,598 QUALIFIED_ROWS=67.1 NPAGESF=1,467 , CONTRACTOR B CARDF=34,728 QUALIFIED_ROWS=77.8 NPAGESF=724 , CONT_PARTS C CARDF=2,093,750 QUALIFIED_ROWS=38,382 NPAGESF=52,189 , PARTS_PROD_ASMBLY D CARDF=7,058,356 QUALIFIED_ROWS=7,058,356 NPAGESF=68,644 , PARTS_PROD_ASM_DTL E CARDF=21,366,326 QUALIFIED_ROWS=21,366,320 NPAGESF=1,236,490 WHERE A.COUNTRY_CD = ? COLCARDF=208 MAX_FREQ=36.408% FF=0.005 ‘FR’ = 36.4% ‘GB’ = 17% ‘DE’=10% AND A.PART_CD = ? COLCARDF=5 MAX_FREQ=47.199% FF=0.2 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1% AND A.PART_TYPE IN ( 'F', 'I', 'P' ) COLCARDF=8 MAX_FREQ=79.867% FF=0.958 AND B.PART_NUM = ? COLCARDF=260 MAX_FREQ=3.032% FF=0.004 AND B.SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87.402% FF=0.126 LOW2KEY=N HIGH2KEY=Y AND B.SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99.833% FF=0.998 LOW2KEY=N HIGH2KEY=Y AND C.PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76.832% FF=0.018 AND B.CONTRACTOR_ID = C.CONTRACTOR_ID COLCARDF=1,047/316 FF=9.551E-4 AND D.PART_NUM = A.PART_NUM COLCARDF=6,132/17,598 FF=5.682E-5 AND C.PRODUCT_ID = D.PRODUCT_ID COLCARDF=1,391,650/7,058,356 FF=1.417E-7 AND C.PRODUCT_ID = E.PRODUCT_ID COLCARDF=1,391,650/21,366,326 FF=4.68E-8 AND E.PRODUCT_ID = D.PRODUCT_ID COLCARDF=21,366,326/7,058,356 FF=4.68E-8 INDEX CLU UR NLEAF NLEVEL CR KEYCOLNAME COLCARDF MCARDF IXPRT01 N P 83 2 0.782 PART_NUM 17598 17598 IXPRT02 N D 112 2 0.782 PART_NUM 17598 17598 PART_TYPE 8 17598 PART_CD 5 17598 IXPRT04 N D 99 2 0.782 PART_TYPE 8 8 PART_NUM 17598 17598 IXPRTxx N D 122 2 0.782 PART_NUM 17598 17603 PART_TYPE 8 17598 PART_CD 5 -1 COUNTRY_CD 208 17603

  37. Join indexes A • Join access available through join the ‘D’ table only • Via PART_NUM if ‘D’ is the outer • There are multiple indexes to support ‘A’ as inner • IXPRT02 and IXPRTxx appear redundant • IXPRTxx is superset of IXPRT02, same column sequence INDEX CLU UR NLEAF NLEVEL CR KEYCOLNAME COLCARDF MCARDF IXPRT01 N P 83 2 0.782 PART_NUM 17598 17598 IXPRT02 N D 112 2 0.782 PART_NUM 17598 17598 PART_TYPE 8 17598 PART_CD 5 17598 IXPRT04 N D 99 2 0.782 PART_TYPE 8 8 PART_NUM 17598 17598 IXPRTxx N D 122 2 0.782 PART_NUM 17598 17603 PART_TYPE 8 17598 PART_CD 5 -1 COUNTRY_CD 208 17603

  38. SELECT COLS FROM PART A CARDF=17,598 QUALIFIED_ROWS=67.1 NPAGESF=1,467 , CONTRACTOR B CARDF=34,728 QUALIFIED_ROWS=77.8 NPAGESF=724 , CONT_PARTS C CARDF=2,093,750 QUALIFIED_ROWS=38,382 NPAGESF=52,189 , PARTS_PROD_ASMBLY D CARDF=7,058,356 QUALIFIED_ROWS=7,058,356 NPAGESF=68,644 , PARTS_PROD_ASM_DTL E CARDF=21,366,326 QUALIFIED_ROWS=21,366,320 NPAGESF=1,236,490 WHERE A.COUNTRY_CD = ? COLCARDF=208 MAX_FREQ=36.408% FF=0.005 ‘FR’ = 36.4% ‘GB’ = 17% ‘DE’=10% AND A.PART_CD = ? COLCARDF=5 MAX_FREQ=47.199% FF=0.2 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1% AND A.PART_TYPE IN ( 'F', 'I', 'P' ) COLCARDF=8 MAX_FREQ=79.867% FF=0.958 AND B.PART_NUM = ? COLCARDF=260 MAX_FREQ=3.032% FF=0.004 AND B.SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87.402% FF=0.126 LOW2KEY=N HIGH2KEY=Y AND B.SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99.833% FF=0.998 LOW2KEY=N HIGH2KEY=Y AND C.PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76.832% FF=0.018 AND B.CONTRACTOR_ID = C.CONTRACTOR_ID COLCARDF=1,047/316 FF=9.551E-4 AND D.PART_NUM = A.PART_NUM COLCARDF=6,132/17,598 FF=5.682E-5 AND C.PRODUCT_ID = D.PRODUCT_ID COLCARDF=1,391,650/7,058,356 FF=1.417E-7 AND C.PRODUCT_ID = E.PRODUCT_ID COLCARDF=1,391,650/21,366,326 FF=4.68E-8 AND E.PRODUCT_ID = D.PRODUCT_ID COLCARDF=21,366,326/7,058,356 FF=4.68E-8 Join indexes B INDEX CLU UR NLEAF NLEVEL CR KEYCOLNAME COLCARDF MCARDF IXCTR01 Y P 210 3 0.962 PART_NUM 260 278 CONTRACTOR_ID 1047 34722 CONT_TYPE 7 34728 IXCTR04 N D 316 3 0.927 CONTRACTOR_ID 1047 1047 PART_NUM 260 34722 BEGIN_DT 1015 34722 END_DT 2656 34722 CONT_TYPE 7 34728 IXCTR05 N D 250 3 0.896 CONTRACTOR_ID 1047 1047 BEGIN_DT 1015 2555 PART_NUM 260 34722

  39. Join indexes B • Join access available through join the ‘C’ table only • Via CONTRACTOR_ID if ‘C’ is the outer • There are multiple indexes to support ‘B’ as inner • IXCTR01 has PART_NUM as leading local • Join from outer will hit far fewer leaf pages due to leading local predicate • Smaller “swath” of leaf pages: NLEAF * 1/PART_NUM COLCARDF • 210 * (1/260) ~= 1 leaf page • Makes this index “outstanding” from inner index access perspective • Also an effective “outer” index since it provides good local filtering and join order for a join to ‘C’ table as inner • IXCTR04, IXCTR05 lead with join predicate • Support the join effectively • Join scattered over all leaf pages INDEX CLU UR NLEAF NLEVEL CR KEYCOLNAME COLCARDF MCARDF IXCTR01 Y P 210 3 0.962 PART_NUM 260 278 CONTRACTOR_ID 1047 34722 CE_TYPE 7 34728 IXCTR04 N D 316 3 0.927 CONTRACTOR_ID 1047 1047 PART_NUM 260 34722 CE_DTDIFFREEL 1015 34722 CE_DTLANCREEL 2656 34722 CE_TYPE 7 34728 IXCTR05 N D 250 3 0.896 CONTRACTOR_ID 1047 1047 CE_DTDIFFREEL 1015 2555 PART_NUM 260 34722

  40. SELECT COLS FROM PART A CARDF=17,598 QUALIFIED_ROWS=67.1 NPAGESF=1,467 , CONTRACTOR B CARDF=34,728 QUALIFIED_ROWS=77.8 NPAGESF=724 , CONT_PARTS C CARDF=2,093,750 QUALIFIED_ROWS=38,382 NPAGESF=52,189 , PARTS_PROD_ASMBLY D CARDF=7,058,356 QUALIFIED_ROWS=7,058,356 NPAGESF=68,644 , PARTS_PROD_ASM_DTL E CARDF=21,366,326 QUALIFIED_ROWS=21,366,320 NPAGESF=1,236,490 WHERE A.COUNTRY_CD = ? COLCARDF=208 MAX_FREQ=36.408% FF=0.005 ‘FR’ = 36.4% ‘GB’ = 17% ‘DE’=10% AND A.PART_CD = ? COLCARDF=5 MAX_FREQ=47.199% FF=0.2 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1% AND A.PART_TYPE IN ( 'F', 'I', 'P' ) COLCARDF=8 MAX_FREQ=79.867% FF=0.958 AND B.PART_NUM = ? COLCARDF=260 MAX_FREQ=3.032% FF=0.004 AND B.SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87.402% FF=0.126 LOW2KEY=N HIGH2KEY=Y AND B.SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99.833% FF=0.998 LOW2KEY=N HIGH2KEY=Y AND C.PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76.832% FF=0.018 AND B.CONTRACTOR_ID = C.CONTRACTOR_ID COLCARDF=1,047/316 FF=9.551E-4 AND D.PART_NUM = A.PART_NUM COLCARDF=6,132/17,598 FF=5.682E-5 AND C.PRODUCT_ID = D.PRODUCT_ID COLCARDF=1,391,650/7,058,356 FF=1.417E-7 AND C.PRODUCT_ID = E.PRODUCT_ID COLCARDF=1,391,650/21,366,326 FF=4.68E-8 AND E.PRODUCT_ID = D.PRODUCT_ID COLCARDF=21,366,326/7,058,356 FF=4.68E-8 Join indexes C INDEX CLU UR NLEAF NLEVEL CR KEYCOLNAME COLCARDF MCARDF IXCPR01 Y U 21367 3 1.0 PRODUCT_ID 1391650 1391650 CONTRACTOR_ID 316 1794093 CO_DTHRCONTACT 1645213 2093750 IXCPR02 N D 14771 3 0.999 CONTRACTOR_ID 316 316 PRODUCT_ID 1391650 1794093 IXCPR03 N D 16188 3 0.998 CONTRACTOR_ID 316 316 CO_PHASECONTACT 4 783 PRODUCT_ID 1391650 1931232 IXCPR04 N D 15352 3 0.998 PREFERRED 3 3 CONTRACTOR_ID 316 552 PRODUCT_ID 1391650 1808887

  41. Join indexes C • Join access available through join the ‘B’, ‘D’, and ‘E’ tables • Via CONTRACTOR_ID if ‘B’ is the outer composite • Via PRODUCT_ID if ‘D’ or ‘E’ are in the outer composite • There is support for either join sequence. • CPNQCC02 has PRODUCT_ID as leading column to support ‘D’ or ‘E’ in outer composite • CPNQXC02 and IXCPR03 have CONTRACTOR_ID as leading join column if ‘B’ is the outer composite • IXCPR03 would also be a candidate if B were cartesianed with D or E. Not that I think that’s likely. • CPMQXCOH would likely be preferred index if ‘B’ were in outer composite • Selective leading local on PREFERRED bounds the leaf pages that would be hit to < 2% of all leaf pages • Makes ‘C’ a possible efficient outer – good local filtering, provides join ordering for join to ‘B’ table INDEX CLU UR NLEAF NLEVEL CR KEYCOLNAME COLCARDF MCARDF IXCPR01 Y U 21367 3 1.0 PRODUCT_ID 1391650 1391650 CONTRACTOR_ID 316 1794093 CO_DTHRCONTACT 1645213 2093750 IXCPR02 N D 14771 3 0.999 CONTRACTOR_ID 316 316 PRODUCT_ID 1391650 1794093 IXCPR03 N D 16188 3 0.998 CONTRACTOR_ID 316 316 CO_PHASECONTACT 4 783 PRODUCT_ID 1391650 1931232 IXCPR04 N D 15352 3 0.998 PREFERRED 3 3 CONTRACTOR_ID 316 552 PRODUCT_ID 1391650 1808887

  42. Join indexes D SELECT COLS FROM PART A CARDF=17,598 QUALIFIED_ROWS=67.1 NPAGESF=1,467 , CONTRACTOR B CARDF=34,728 QUALIFIED_ROWS=77.8 NPAGESF=724 , CONT_PARTS C CARDF=2,093,750 QUALIFIED_ROWS=38,382 NPAGESF=52,189 , PARTS_PROD_ASMBLY D CARDF=7,058,356 QUALIFIED_ROWS=7,058,356 NPAGESF=68,644 , PARTS_PROD_ASM_DTL E CARDF=21,366,326 QUALIFIED_ROWS=21,366,320 NPAGESF=1,236,490 WHERE A.COUNTRY_CD = ? COLCARDF=208 MAX_FREQ=36.408% FF=0.005 ‘FR’ = 36.4% ‘GB’ = 17% ‘DE’=10% AND A.PART_CD = ? COLCARDF=5 MAX_FREQ=47.199% FF=0.2 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1% AND A.PART_TYPE IN ( 'F', 'I', 'P' ) COLCARDF=8 MAX_FREQ=79.867% FF=0.958 AND B.PART_NUM = ? COLCARDF=260 MAX_FREQ=3.032% FF=0.004 AND B.SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87.402% FF=0.126 LOW2KEY=N HIGH2KEY=Y AND B.SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99.833% FF=0.998 LOW2KEY=N HIGH2KEY=Y AND C.PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76.832% FF=0.018 AND B.CONTRACTOR_ID = C.CONTRACTOR_ID COLCARDF=1,047/316 FF=9.551E-4 AND D.PART_NUM = A.PART_NUM COLCARDF=6,132/17,598 FF=5.682E-5 AND C.PRODUCT_ID = D.PRODUCT_ID COLCARDF=1,391,650/7,058,356 FF=1.417E-7 AND C.PRODUCT_ID = E.PRODUCT_ID COLCARDF=1,391,650/21,366,326 FF=4.68E-8 AND E.PRODUCT_ID = D.PRODUCT_ID COLCARDF=21,366,326/7,058,356 FF=4.68E-8 INDEX CLU UR NLEAF NLEVEL CR KEYCOLNAME COLCARDF MCARDF IXPDA05 Y P 44900 4 0.975 PRODUCT_ID 7058356 7058356 IXPDA02 N D 70586 4 0.868 PART_NUM 6132 6132 PRODUCT_ID 7058356 7058356 IXPDA06 N D 66590 4 0.975 PRODUCT_ID 7058356 7058356 PART_NUM 6132 7058356

  43. Join indexes D • ‘D’ is accessed in multiple directions • Via PART_NUM if ‘A’ is the outer • Via PRODUCT_ID if accessed through ‘C’ or ‘E’ • Both join direction supported by matching index access. • RT_ENTID leading column of IXPDA02 • PRODUCT_ID leading column of IXPDA05, IXPDA06 • The non-primary key indexes are defined as allowing duplicates – but they cannot. • PRODUCT_ID is the primary key and is included in a unique index. • Any index which contains PRODUCT_ID therefore is unique. Defining as unique would save some space in the index. Duplicate indexes have slightly larger control structures to allow for duplicate RIDS. • DB2 must allow for duplicates if the index is not explicitly defined as unique since you could drop the unique index.

  44. SELECT COLS FROM PART A CARDF=17,598 QUALIFIED_ROWS=67.1 NPAGESF=1,467 , CONTRACTOR B CARDF=34,728 QUALIFIED_ROWS=77.8 NPAGESF=724 , CONT_PARTS C CARDF=2,093,750 QUALIFIED_ROWS=38,382 NPAGESF=52,189 , PARTS_PROD_ASMBLY D CARDF=7,058,356 QUALIFIED_ROWS=7,058,356 NPAGESF=68,644 , PARTS_PROD_ASM_DTL E CARDF=21,366,326 QUALIFIED_ROWS=21,366,320 NPAGESF=1,236,490 WHERE A.COUNTRY_CD = ? COLCARDF=208 MAX_FREQ=36.408% FF=0.005 ‘FR’ = 36.4% ‘GB’ = 17% ‘DE’=10% AND A.PART_CD = ? COLCARDF=5 MAX_FREQ=47.199% FF=0.2 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1% AND A.PART_TYPE IN ( 'F', 'I', 'P' ) COLCARDF=8 MAX_FREQ=79.867% FF=0.958 AND B.PART_NUM = ? COLCARDF=260 MAX_FREQ=3.032% FF=0.004 AND B.SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87.402% FF=0.126 LOW2KEY=N HIGH2KEY=Y AND B.SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99.833% FF=0.998 LOW2KEY=N HIGH2KEY=Y AND C.PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76.832% FF=0.018 AND B.CONTRACTOR_ID = C.CONTRACTOR_ID COLCARDF=1,047/316 FF=9.551E-4 AND D.PART_NUM = A.PART_NUM COLCARDF=6,132/17,598 FF=5.682E-5 AND C.PRODUCT_ID = D.PRODUCT_ID COLCARDF=1,391,650/7,058,356 FF=1.417E-7 AND C.PRODUCT_ID = E.PRODUCT_ID COLCARDF=1,391,650/21,366,326 FF=4.68E-8 AND E.PRODUCT_ID = D.PRODUCT_ID COLCARDF=21,366,326/7,058,356 FF=4.68E-8 Join indexes E • Join access available through C and E tables • Both tables join on PRODUCT_ID column • Join is supported via IXPDA01 index • PRODUCT_ID only column • Unique index (no fan-out when joining to this table) INDEX CLU UR NLEAF NLEVEL CR KEYCOLNAME COLCARDF MCARDF IXPPA01 N U 141499 4 0.609 PRODUCT_ID 21366326 21366326

  45. Join fan-out SELECT COLS FROM PART A CARDF=17,598 QUALIFIED_ROWS=67.1 NPAGESF=1,467 , CONTRACTOR B CARDF=34,728 QUALIFIED_ROWS=77.8 NPAGESF=724 , CONT_PARTS C CARDF=2,093,750 QUALIFIED_ROWS=38,382 NPAGESF=52,189 , PARTS_PROD_ASMBLY D CARDF=7,058,356 QUALIFIED_ROWS=7,058,356 NPAGESF=68,644 , PARTS_PROD_ASM_DTL E CARDF=21,366,326 QUALIFIED_ROWS=21,366,320 NPAGESF=1,236,490 WHERE A.COUNTRY_CD = ? COLCARDF=208 MAX_FREQ=36.408% FF=0.005 ‘FR’ = 36.4% ‘GB’ = 17% ‘DE’=10% AND A.PART_CD = ? COLCARDF=5 MAX_FREQ=47.199% FF=0.2 4 = 47%, 2 = 27%, 6 = 17%, 1 = 8%, blank = < 1% AND A.PART_TYPE IN ( 'F', 'I', 'P' ) COLCARDF=8 MAX_FREQ=79.867% FF=0.958 AND B.PART_NUM = ? COLCARDF=260 MAX_FREQ=3.032% FF=0.004 AND B.SUB_CONTRACTOR = 'Y' COLCARDF=2 MAX_FREQ=87.402% FF=0.126 LOW2KEY=N HIGH2KEY=Y AND B.SUSPENDED = 'N' COLCARDF=2 MAX_FREQ=99.833% FF=0.998 LOW2KEY=N HIGH2KEY=Y AND C.PREFERRED = 'Y' COLCARDF=3 MAX_FREQ=76.832% FF=0.018 AND B.CONTRACTOR_ID = C.CONTRACTOR_ID COLCARDF=1,047/316 FF=9.551E-4 AND D.PART_NUM = A.PART_NUM COLCARDF=6,132/17,598 FF=5.682E-5 AND C.PRODUCT_ID = D.PRODUCT_ID COLCARDF=1,391,650/7,058,356 FF=1.417E-7 AND C.PRODUCT_ID = E.PRODUCT_ID COLCARDF=1,391,650/21,366,326 FF=4.68E-8 AND E.PRODUCT_ID = D.PRODUCT_ID COLCARDF=21,366,326/7,058,356 FF=4.68E-8 • Look at join fan-out issues • Qualified outer rows * (CARDF of inner / MAX(join colcardf) • A  D • 67.1 rows * (7,058,356 / 17598) ~= 27,000 rows • B  C or C  B • 77.8 rows * (2,093,750 / 1047) ~= 155,500 rows (after local filtering on C, down to 38K) • So B  C expected to fan-out far more.

  46. Explain • Join sequence • Access ‘A’ via index IXPRT01 (PART_CD, COUNTRY_CD, …) ~67 rows • Nested loop join to ‘D’ using index IXPDA02 (RV_ENT_ID, PRODUCT_ID) ~27,000 rows • Sort merge join to C • Sorting composite into PRODUCT_ID sequence • Access ‘C’ via IXCPR04 (PREFERRED, CONTRACTOR_ID) • Sorting new into PRODUCT_ID sequence ~7,900 rows • Nested loop join to B via index IXCTR01 ~7,900 rows • (PART_NUM, CONTRACTOR_ID, CE_TYPE) • Nested loop join to E via index IXPDA01 ~7,900 rows • (PRODUCT_ID) • Blue = local predicate • Green = join predicate

  47. Issues – A as outer? • Is local filtering to ‘A’ table accurate? • There is skew, but use of markers precludes recognition of skew • Qualified rows and fan-out could be much worse than estimated • ‘A’ as outer could be underestimated, depends on what values being used • Sort merge join to ‘C’ to avoid 27K probes • Does not want to probe 27k times matching + fan-out on PRODUCT_ID • Uses efficient local index instead • 1 probe to scan of 38k rows via PREFERRED • 27K probes * 2 rows per inner via index on PRODUCT_ID • Index on PREFERRED, PRODUCT_ID likely would might avert SMJ in this context • Hesitant to recommend index – since A  D  C could be an inefficient sequence.

  48. Issues – B / C as outer? • B as outer • Less skew on B.PART_NUM = ? – less uncertainty in cost estimate • Fan-out to 38K rows is discouraging • B  C supported by efficient local + equals index • (PREFERRED, CONTRACTOR_ID, PRODUCT_ID) • C also a desirable outer • Index on (PREFERRED,CONTRACTOR_ID,PRODUCT_ID) provides good local filter • Could access B via local filtering on B.PART_NUM = ?, materialize 77 rows into workfile for sort merge join

  49. Summary Query 2 • Bottom line: • Uniform distribution estimate on ‘A’ table allows it to compete very favorably. • If ‘FR’, ‘GB’, ‘DE’ values used for COUNTRY_CD – ‘A’ as outer no longer desirable. • Are ‘FR’, ‘GB’, ‘DE’ values frequently used for this query? • If PART_CD = ‘4’ value is used frequently – ‘A’ as outer no longer desirable. • Is ‘4’ used frequently? • Split query, REOPT, OPTHINTS… • Multiple choices • Local filtering spread across several tables • Estimated filtering looks good • Efficient access paths (index to support local, join predicates) exist • More difficult for optimizer to identify the cheapest path • Scenario more regression prone • Optimizer may need more statistics, ability to use more statistics (REOPT) for optimizer identify the cheapest path

  50. Commentary • How to perform SQL analysis • Format query so it’s readable • Annotate with important statistics • Tables: • Table cardinality, NPAGES, qualified number of rows • Predicates • COLCARDF, LOW2KEY, HIGH2KEY, filter factor estimate • Are table level estimates reasonable based on your knowledge? • If you don’t know – perform counts to find out if estimates are accurate • If you don’t know how selective things are, how will you know what the best path should be? • Are predicate level filtering estimates reasonable? • Reference table, index, indexed columns report • Is the best local filtering supported through matching index access? • Any mis-estimated local filtering that’s also matching indexable (may cause one path to look far more efficient than reality) • With trailing join predicates to provide order to next desired table (bonus) • Is there adequate (matching) index support for desired join sequences? • Develop understanding of “plausible” and “desirable” access paths • Examine EXPLAIN output • Does optimizer choose the path you expect? • If not, you should have better understanding of what makes other access paths competitive, tuning can be more targeted • Eg. Certain predicate appears filtering, but is not. • Can use REOPT, or trick – targeted to solve a specific problem. • Skilled targeted tuning is less susceptible to re-regress than blind tuning (where problem is not understood)

More Related