1 / 40

SQL Top-N and Pagination Pattern

SQL Top-N and Pagination Pattern. Maxym Kharchenko. What is top-N. Give me the top 10 salaries in the “Sales” dept Give me the top 10 best selling books Give me the 10 latest orders. What is top-N. Setup. SQL> @ desc cities Name Null? Type

cadee
Download Presentation

SQL Top-N and Pagination Pattern

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SQL Top-Nand Pagination Pattern Maxym Kharchenko

  2. What is top-N • Give me the top 10 salaries in the “Sales” dept • Give me the top 10 best selling books • Give me the 10 latest orders

  3. What is top-N

  4. Setup SQL> @desc cities Name Null? Type -------------------------- -------- --------------- NAME NOT NULL VARCHAR2(100) STATE NOT NULL VARCHAR2(100) POPULATION NOT NULL NUMBER PCTFREE 99 PCTUSED 1 http://www.census.gov

  5. Naïve Top-N Give me the top 5 cities by population SELECT name, population FROM cities WHERE rownum <= 5 ORDER BY population DESC; NAME Pop ---------------------- ------ Robertsdale city 5,276 Glen Allen town (pt.) 458 Boligee town 328 Riverview town 184 Altoona town (pt.) 30 Statistics 7 consistent gets

  6. Naïve Top-N explained ----------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Time | ----------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 110 | 00:00:01 | | 1 | SORT ORDER BY | | 5 | 110 | 00:00:01 | |* 2 | COUNT STOPKEY | | | | | | 3 | TABLE ACCESS FULL| CITIES | 10 | 220 | 00:00:01 | -----------------------------------------------------------------

  7. Correct top-N query SELECT * FROM ( SELECT name, population FROM cities ORDER BY population DESC ) WHERE rownum <= 5; SELECT name, population FROM cities ORDER BY population DESC FETCH FIRST 5 ROWS ONLY; <= 11g >= 12c

  8. Correct top-N query: Execution SELECT * FROM ( SELECT name, population FROM cities ORDER BY population DESC ) WHERE rownum <= 5; NAME Pop -------------------- ---------- Los Angeles city 3,792,621 Chicago city (pt.) 2,695,598 Chicago city (pt.) 2,695,598 Chicago city 2,695,598 New York city (pt.) 2,504,700 Statistics 56024 consistent gets

  9. Reading, filtering and sorting --------------------------------------------------------------------- | Id | Operation | Name | Rows |TempSpc| Time | --------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | | 00:01:58 | |* 1 | COUNT STOPKEY | | | | | | 2 | VIEW | | 56072 | | 00:01:58 | |* 3 | SORT ORDER BY STOPKEY| | 56072 | 1768K| 00:01:58 | | 4 | TABLE ACCESS FULL | CITIES | 56072 | | 00:01:54 | ---------------------------------------------------------------------

  10. Proper data structure Ordered By: Population CREATE INDEX i_pop ON cities(population); -------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | -------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 10 | 00:00:01 | | 3 | TABLE ACCESS BY INDEX ROWID | CITIES | 56072 | 00:00:01 | | 4 | INDEX RANGE SCAN DESCENDING| I_POP | 10 | 00:00:01 | -------------------------------------------------------------------- Statistics 12 consistent gets

  11. Why index works Ordered By: Population • Colocation • Can stop after reading N rows • No Sort CREATE INDEX i_pop ON cities(population);

  12. More elaborate top-N Give me the top 5 cities by population in Florida SELECT * FROM ( SELECT name, population FROM cities WHERE state='Florida' ORDER BY population DESC ) WHERE rownum <= 5; NAME Pop -------------------- ---------- Jacksonville city 821,784 Miami city 399,457 Tampa city 335,709 St. Petersburg city 244,769 Orlando city 238,300 Statistics 264 consistent gets

  13. Uncertain nature of filtering Ordered By: Population WHERE state='Florida' ORDER BY population DESC ) WHERE rownum <= 5; WHERE state='Florida' ORDER BY population DESC ) WHERE rownum <= 200; Statistics 264consistent gets Statistics 19747consistent gets

  14. Multi column indexes CREATE INDEX i_state_pop ON cities(state, population); where state=‘FL’ AL AK AZ CO FL MA WA State Population • Ordered By: • State • State+Population • Not Ordered by: Population WHERE state=‘FL’ NOW: Ordered By: Population

  15. Multicolumn indexes ------------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | ------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 11 | 00:00:01 | | 3 | TABLE ACCESS BY INDEX ROWID | CITIES | 1099 | 00:00:01 | |* 4 | INDEX RANGE SCAN DESCENDING| I_STATE_POP | 11 | 00:00:01 | ------------------------------------------------------------------------- Predicate Information (identified by operation id): 1 - filter(ROWNUM<=5) 4 - access("STATE"='Florida') Statistics 12 consistent gets

  16. Trips to the table ------------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | ------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 11 | 00:00:01 | | 3 | TABLE ACCESS BY INDEX ROWID | CITIES | 1099 | 00:00:01 | |* 4 | INDEX RANGE SCAN DESCENDING| I_STATE_POP | 11 | 00:00:01 | ------------------------------------------------------------------------- Predicate Information (identified by operation id): 1 - filter(ROWNUM<=5) 4 - access("STATE"='Florida') Statistics 12 consistent gets

  17. Index range scan: cost math Window: 500 records 4-5 logical reads ~ 5-10 logical reads ~ 10-500 logical reads

  18. Covering index CREATE INDEX i_state_pop ON cities(state, population); CREATE INDEX i_state_pop_c ON cities(state, population, name); Statistics 12 consistent gets Statistics 7 consistent gets -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 10 | 00:00:01 | |* 3 | INDEX RANGE SCAN DESCENDING| I_STATE_POP_C | 506 | 00:00:01 | --------------------------------------------------------------------------

  19. Ideal top-N • Use the index • Make the best index • And read only from the index

  20. Less than ideal top-N • Effect of query conditions • Effect of deletes and updates • Technicalities

  21. Condition better! WHERE active != 'N' ORDER BY order_date DESC ) WHERE rownum <= 10; CREATE TABLE orders ( … active char(1) NOT NULL CHECK (active IN ('Y', 'N')) WHERE active = 'Y' ORDER BY order_date DESC ) WHERE rownum <= 10; Statistics 12345 consistent gets Statistics 10 consistent gets

  22. Trade WHERE for ORDER BY CREATE INDEX t_idx ON t(a, b, c); SELECT * FROM (SELECT * FROM t WHERE a=12 ORDER BY c) ) WHERE rownum <= 10; Statistics 1200 consistent gets WHERE a=12 ORDER BY c Statistics 12 consistent gets WHERE a=12 ORDER BY b, c Statistics 12 consistent gets WHERE a=12 AND b=0 ORDER BY c

  23. Tolerate filtering SELECT * FROM ( SELECT name, population FROM cities WHERE state != 'Florida' ORDER BY population DESC ) WHERE rownum <= 10; Statistics 28 consistent gets

  24. Tolerate filtering -------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | -------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 11 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 11 | 00:00:01 | |* 3 | TABLE ACCESS BY INDEX ROWID | CITIES | 55566 | 00:00:01 | | 4 | INDEX RANGE SCAN DESCENDING| I_POP | 12 | 00:00:01 | ------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(ROWNUM<=10) 3 - filter("STATE"<>'Florida')

  25. Updates and Deletes SQL> @desc cities2 Name Null? Type ---------------------- -------- ---------------- NAME NOT NULL VARCHAR2(100) STATE NOT NULL VARCHAR2(100) POPULATION NOT NULL NUMBER BUDGET_SURPLUS NOT NULL VARCHAR2(1) CREATE INDEX i2_pop ON cities2(budget_surplus, population, name);

  26. Updates and Deletes SELECT * FROM ( SELECT name, population FROM cities2 WHERE budget_surplus='Y' ORDER BY population DESC ) WHERE rownum <= 5; ------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | ------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 12 | 00:00:01 | |* 3 | INDEX RANGE SCAN DESCENDING| I2_POP | 56067 | 00:00:01 | ------------------------------------------------------------------- Statistics 7 consistent gets

  27. Updates and Deletes UPDATE cities2 SET budget_surplus='N' WHERE rowid IN ( SELECT * FROM ( SELECT rowid FROM cities2 ORDER BY population DESC ) WHERE rownum <= 200); ------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | ------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 12 | 00:00:01 | |* 3 | INDEX RANGE SCAN DESCENDING| I2_POP | 56067 | 00:00:01 | ------------------------------------------------------------------- Statistics 207 consistent gets

  28. Updates and Deletes

  29. Updates and Deletes ALTER TABLE cities2 ADD (version number default 0 NOT NULL); CREATE INDEX i2_vpop ON cities2(budget_surplus, version, population); UPDATE cities2 SET version=1WHERE budget_surplus='Y' AND version=0; Budget_surplus Y Y Budget_surplus Version 0 Y 1 Population

  30. Updates and Deletes SELECT * FROM ( SELECT name, population FROM cities2 WHERE budget_surplus='Y' AND version=1 ORDER BY population DESC ) WHERE rownum <= 5; -------------------------------------------------------------------- | Id | Operation | Name | Rows | Time | -------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 00:00:01 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 1 | 00:00:01 | |* 3 | INDEX RANGE SCAN DESCENDING| I2_VPOP | 1 | 00:00:01 | -------------------------------------------------------------------- Statistics 9 consistent gets

  31. Pagination SELECT * FROM ( SELECT * FROM ( SELECT name, population, rownum AS rn FROM cities WHERE state='Florida' ORDER BY population DESC ) WHERE rownum <= 20 ) WHERE rn > 10; SELECT * FROM ( SELECT name, population FROM cities WHERE state='Florida' ORDER BY population DESC ) WHERE rownum <= 10;

  32. Dumb Pagination ) WHERE rownum <= 20 ) WHERE rn > 10; Statistics 22 consistent gets ) WHERE rownum <= 30 ) WHERE rn > 20; Statistics 32 consistent gets

  33. Smart pagination SELECT * FROM ( SELECT name, population FROM cities WHERE state='Florida' AND population < 154750 ORDER BY population DESC ) WHERE rownum <= 10; SELECT * FROM ( SELECT * FROM ( SELECT name, population, rownum AS rn FROM cities WHERE state='Florida' ORDER BY population DESC ) WHERE rownum <= 20 ) WHERE rn > 10; Statistics 22 consistent gets Statistics 12 consistent gets

  34. Top-N with joins SELECT * FROM ( SELECT c.name as city, c.population, s.capital FROM cities c, states s WHERE c.state_id = s.id AND c.state='Florida' ORDER BY c.population DESC ) WHERE rownum <= 5 / Driving table: Filter state Order By population Join state_id Select name Joined to table: Join id Select capital Use Nested Loops! Build indexes like this!

  35. Top-N with joins: Good ------------------------------------------------------- | Id | Operation | Name | Rows | Time | ------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:13 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 10 | 00:00:13 | | 3 | NESTED LOOPS | | 10 | 00:00:13 | |* 4 | INDEX RANGE SCAN| I_C | 506 | 00:00:07 | |* 5 | INDEX RANGE SCAN| I_S | 1 | 00:00:01 | -------------------------------------------------------

  36. Top-N with joins: Bad ----------------------------------------------------------- | Id | Operation | Name | Rows | Time | ----------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 00:00:07 | |* 1 | COUNT STOPKEY | | | | | 2 | VIEW | | 10 | 00:00:07 | |* 3 | SORT ORDER BY STOPKEY| | 10 | 00:00:07 | |* 4 | HASH JOIN | | 10 | 00:00:07 | |* 5 | INDEX RANGE SCAN | I_C | 506 | 00:00:07 | |* 6 | INDEX RANGE SCAN | I_S | 1 | 00:00:01 | -----------------------------------------------------------

  37. Gotchas? TMI “Too many indexes”

  38. Thank you!

  39. Query conditions WHERE state = 'Florida' WHERE state != 'Florida' where state != ‘FL’ where state=‘FL’ AL AK AZ CO FL MA WA State Population

  40. Watch out for DESC/ASC CREATE INDEX i_s_pop ON cities(state, population); WHERE state >= 'Florida' ORDER BY state, population DESC ) WHERE rownum <= 10 WHERE state >= 'Florida' ORDER BY state, population ) WHERE rownum <= 10 Statistics 107408 consistent gets Statistics 12consistent gets AL AK FL GA HI MA WA … … + SORT NO SORT

More Related