1 / 53

You’re Smarter than a Database

Learn how to overcome optimizer's bad cardinality estimates in SQL queries and improve query performance. Four examples and strategies to fix cardinality problems.

bettya
Download Presentation

You’re Smarter than a Database

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. You’re Smarter than a Database Overcoming the optimizer’s bad cardinality estimates

  2. About me • Bobby Durrett • US Foodservice • bobbyandmarielle@yahoo.com • Scripts in http://www.geocities.com/bobbyandmarielle/sqltuning.zip

  3. What you know

  4. What the database knows

  5. Before SQL • Example - mainframe Datacom/DB COBOL • List index names • Write loops read a from one index i1 where one.c=10 while more table one rows exist get next row read b from two index i2 where two.a = one.a while more table two rows exist get next row print one.a,two.b end while end while

  6. SQL • Tell what you want, not how to get it select one.a,two.b from one,two where one.c=10 and one.a=two.a;

  7. Pre-SQL versus SQL • Pre-SQL code very efficient – runs in megabytes – VSE mainframe COBOL • Labor intensive • SQL can be inefficient – runs in gigabytes (if you are lucky!) • Much more productive – do in minutes what took hours before – create tables

  8. What the database doesn’t know • Optimizer has a limited set of statistics that describe the data • It can miscalculate the number of rows a query will return, its cardinality • A cardinality error can lead optimizer to choose a slow way to run the SQL

  9. Example plan/Cardinality ------------------------------------------------- | Id | Operation | Name | Rows | Cost ------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 3 |* 1 | TABLE ACCESS FULL| TEST1 | 10 | 3 ------------------------------------------------- • Plan = how Oracle will run your query • Rows = how many rows optimizer thinks that step will return • Cost = estimate of time query will take, a function of the number of rows

  10. How to fix cardinality problems • Find out if it really is a cardinality issue • Determine the reason it occurred • Single column • Multiple columns • Choose a strategy • Give the optimizer more information • Override optimizer decision • Change the application

  11. Four examples • Four examples of how the optimizer calculates cardinality • Full scripts and their outputs on portal, pieces on slides – edited for simplicity

  12. Step 1: Find out if it really is a cardinality issue • Example 1 • Data select a,count(*) from test1 group by a; A COUNT(*) ---------- ---------- 1 10 • Query select * from test1 where a=1;

  13. Step 1: Find out if it really is a cardinality issue • Get estimated cardinality from plan ------------------------------------------- | Id | Operation | Name | Rows | ------------------------------------------- | 0 | SELECT STATEMENT | | 10 | |* 1 | TABLE ACCESS FULL| TEST1 | 10 | ------------------------------------------- • Do query for actual number of rows select count(*) from test1 where a=1;

  14. join table join table table Step 1: Find out if it really is a cardinality issue • Plan is a tree – find cardinality and select count(*) on part of query represented by that part of plan.

  15. Step 2: Understand the reason for the wrong cardinality • Unequal distribution of data: • Within a single column • Last name • “Smith” or “Jones” • Among multiple columns – • Address • Zipcode and State

  16. Step 2: Understand the reason for the wrong cardinality • Example 2 - Unequal distribution of values in a single column • 1,000,000 rows with value 1 • 1 row with value 2 select a,count(*) from TEST2 group by a; A COUNT(*) ---------- ---------- 1 1000000 2 1

  17. Step 2: Understand the reason for the wrong cardinality • SQL statement – returns one row select * from TEST2 where a=2;

  18. Step 2: Understand the reason for the wrong cardinality • Plan with wrong number of rows = 500,000 • Full scan instead of range scan – 100 times slower --------------------------------------------- | Operation | Name | Rows | --------------------------------------------- | SELECT STATEMENT | | 500K| | INDEX FAST FULL SCAN| TEST2INDEX | 500K| ---------------------------------------------

  19. Step 2: Understand the reason for the wrong cardinality • Column statistics – two distinct values LOW HIGH NUM_DISTINCT ---------- ---------- ------------ 1 2 2 • Table statistic – total # of rows – 1,000,001 NUM_ROWS ---------- 1000001

  20. Step 2: Understand the reason for the wrong cardinality • Rows in plan = (rows in table)/ (distinct values of column) • 500000=1000001/2 • Optimizer knew that there were only two values – assumed they had equal number of rows

  21. Step 2: Understand the reason for the wrong cardinality • Example 3 - Combinations of column values not equally distributed • 1,000,000 rows with values 1,1 • 1,000,000 rows with values 2,2 • 1 row with value 1,2 • ~ Equal numbers of 1’s and 2’s in each column A B COUNT(*) ---------- ---------- ---------- 1 1 1000000 1 2 1 2 2 1000000

  22. Step 2: Understand the reason for the wrong cardinality • SQL statement – retrieves one row select sum(a+b) from TEST3 where a=1 and b=2;

  23. Step 2: Understand the reason for the wrong cardinality • Plan with wrong number of rows = 500,000 • Inefficient full scan ---------------------------------------------- | Operation | Name | Rows | ---------------------------------------------- | SELECT STATEMENT | | 1 | | SORT AGGREGATE | | 1 | | INDEX FAST FULL SCAN| TEST3INDEX | 500K| ----------------------------------------------

  24. Step 2: Understand the reason for the wrong cardinality • Column statistics C LOW HIGH NUM_DISTINCT - ---------- ---------- ------------ A 1 2 2 B 1 2 2 • Table statistic – total # of rows – 2,000,001 NUM_ROWS ---------- 2000001

  25. Step 2: Understand the reason for the wrong cardinality • Rows in plan = (rows in table)/ (distinct values A * distinct values B) • 500000=2000001/(2 * 2) • Optimizer assumes all four combinations (1,1),(1,2),(2,1),(2,2) equally likely

  26. Step 2: Understand the reason for the wrong cardinality • How to tell which assumption is in play? • Select count(*) each column select a,count(*) from TEST3 group by a; select b,count(*) from TEST3 group by b; • count(*) each column combination select a,b,count(*) from TEST3 group by a,b;

  27. Step 3: Choose the best strategy for fixing the cardinality problem • Giving the optimizer more information • Histograms • SQL Profiles • Overriding optimizer decisions • Hints • Changing the application Try to use optimizer as much as possible to minimize development work

  28. Step 3: Choose the best strategy for fixing the cardinality problem • Giving the optimizer more information – using histograms • Works for unequal distribution within a single column • A histogram records the distribution of values within a column in up to 254 “buckets” • Works best on columns with fewer than 255 distinct values

  29. Step 3: Choose the best strategy for fixing the cardinality problem • Run gather_table_stats command to get histograms on the column – 254 is max number of buckets method_opt=>'FOR ALL COLUMNS SIZE 254'

  30. Step 3: Choose the best strategy for fixing the cardinality problem • Plan for Example 2 with correct number of rows with histogram • Uses range scan ----------------------------------------- | Operation | Name | Rows | ----------------------------------------- | SELECT STATEMENT | | 1 | | INDEX RANGE SCAN| TEST2INDEX | 1 | -----------------------------------------

  31. Step 3: Choose the best strategy for fixing the cardinality problem • Column statistics – two buckets LOW HIGH NUM_DISTINCT NUM_BUCKETS ---------- ---------- ------------ ----------- 1 2 2 2 • Table statistic – unchanged NUM_ROWS ---------- 1000001

  32. Step 3: Choose the best strategy for fixing the cardinality problem • Time without histograms (1 second): Elapsed: 00:00:01.00 • Time with histograms(1/100th second): Elapsed: 00:00:00.01

  33. Step 3: Choose the best strategy for fixing the cardinality problem • Giving the optimizer more information – using SQL Profiles • Works for unequal distribution among multiple columns • Includes information about the relationship between columns in the SQL – correlated columns or predicates

  34. Step 3: Choose the best strategy for fixing the cardinality problem • SQL Tuning Advisor gathers statistics on the columns ...DBMS_SQLTUNE.CREATE_TUNING_TASK(... ...DBMS_SQLTUNE.EXECUTE_TUNING_TASK(... • Accept the SQL Profile it creates to use the new statistics ...DBMS_SQLTUNE.ACCEPT_SQL_PROFILE (...

  35. Step 3: Choose the best strategy for fixing the cardinality problem • Example 3 plan with correct number of rows = 1 using SQL profile -------------------------------------------------- | Operation | Name | Rows | Bytes | -------------------------------------------------- | SELECT STATEMENT | | 1 | 6 | | SORT AGGREGATE | | 1 | 6 | | INDEX RANGE SCAN| TEST3INDEX | 1 | 6 | -------------------------------------------------|

  36. Step 3: Choose the best strategy for fixing the cardinality problem • Time without a profile (1 second): Elapsed: 00:00:01.09 • Time with a profile(1/100th second): Elapsed: 00:00:00.01

  37. Step 3: Choose the best strategy for fixing the cardinality problem • Overriding optimizer decisions – using hints • Example 4 has unequal distribution of column values across two tables – histograms and SQL Profiles don’t work • Hint forces index range scan • Small amount of additional code – not like Cobol on mainframe

  38. Step 3: Choose the best strategy for fixing the cardinality problem • Example 4 - SMALL table • MANY relates to 1 – there are many rows with value 1 • FEW relates to 2 – there are few with value 2 insert into SMALL values ('MANY',1); insert into SMALL values ('FEW',2);

  39. Step 3: Choose the best strategy for fixing the cardinality problem • Example 4 - LARGE table: • 1,000,000 rows with value 1 • 1 row with value 2 NUM COUNT(*) ---------- ---------- 1 1000000 2 1

  40. Step 3: Choose the best strategy for fixing the cardinality problem • SQL statement – returns one row select B.NUM from SMALL A,LARGE B where A.NUM=B.NUM and A.NAME='FEW';

  41. Step 3: Choose the best strategy for fixing the cardinality problem • Plan with wrong number of rows = 125,000 ---------------------------------------------- | Operation | Name | Rows | ---------------------------------------------- | SELECT STATEMENT | | 125K| | HASH JOIN | | 125K| | TABLE ACCESS FULL | SMALL | 1 | | INDEX FAST FULL SCAN| LARGEINDEX | 1000K| ----------------------------------------------

  42. Step 3: Choose the best strategy for fixing the cardinality problem • Column statistics – two buckets on all columns – using histograms LOW HIGH NUM_DISTINCT NUM_BUCKETS ---------- ---------- ------------ ----------- 1 2 2 2 LOW HIGH NUM_DISTINCT NUM_BUCKETS ---- ---- ------------ ----------- FEW MANY 2 2

  43. Step 3: Choose the best strategy for fixing the cardinality problem • Table statistics – SMALL has 2 rows, LARGE 1000001 NUM_ROWS ---------- 2 NUM_ROWS ---------- 1000001

  44. Step 3: Choose the best strategy for fixing the cardinality problem • 125000=1000001/8 • Optimizer appears to assume all eight combinations of the three columns’ values are equally likely • Can’t verify formula – references don’t include formula with histograms • Even worse without histograms – cardinality is 500000

  45. Step 3: Choose the best strategy for fixing the cardinality problem • No SQL profile from SQL Tuning Advisor: There are no recommendations to improve the statement. • Neither histograms nor SQL profiles help example 4

  46. Step 3: Choose the best strategy for fixing the cardinality problem • Statement with hints: • Use index • Don’t do full scan select /*+ INDEX(B LARGEINDEX) NO_INDEX_FFS(B LARGEINDEX) */ B.NUM from SMALL A,LARGE B where a.NUM=B.NUM and A.NAME='FEW';

  47. Step 3: Choose the best strategy for fixing the cardinality problem • Time without a hint (1 second): Elapsed: 00:00:01.03 • Time with a hint (1/100th second): Elapsed: 00:00:00.01

  48. Step 3: Choose the best strategy for fixing the cardinality problem • Changing the application • Change your tables so that the optimizer gets your SQL’s cardinality right • Requires more work designing tables, but keeps productivity benefits of SQL

  49. Step 3: Choose the best strategy for fixing the cardinality problem • Example 4 – moved NAME column to LARGE table and split table in two • One million (‘MANY’,1) rows in LARGEA • One (‘FEW’,2) row in LARGEB • Query: select NUM from (select * from largea union select * from largeb) where NAME='FEW';

  50. Step 3: Choose the best strategy for fixing the cardinality problem • Plan is just as efficient as with hint: • Number of rows = 2 (reality is 1) • Range Scan -------------------------------------------------------------- | Id | Operation | Name | Rows | -------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2 | | 1 | VIEW | | 2 | | 2 | SORT UNIQUE | | 2 | | 3 | UNION-ALL | | | | 4 | TABLE ACCESS BY INDEX ROWID| LARGEA | 1 | |* 5 | INDEX RANGE SCAN | LARGEAINDEX | 1 | | 6 | TABLE ACCESS BY INDEX ROWID| LARGEB | 1 | |* 7 | INDEX RANGE SCAN | LARGEBINDEX | 1 | --------------------------------------------------------------

More Related