580 likes | 899 Views
Indexing and Fragmentation Strategies Informix Chat with the Labs December 8, 2005. Mark Scranton Worldwide Informix Technical Strategist mark.scranton@us.ibm.com www.markscranton.com. ANNOUNCING :
E N D
Indexing and Fragmentation StrategiesInformix Chat with the LabsDecember 8, 2005 Mark Scranton Worldwide Informix Technical Strategist mark.scranton@us.ibm.com www.markscranton.com
ANNOUNCING: IDUG / IIUG 2006 North America ConferenceMay 7th-11th – Tampa Convention Center Tampa, Florida, USA • Attend in-depth Informix-specific educational seminars • Hear technical presentations by Informix R&D staff and fellow Informix users • Take advantage of networking opportunities • Visit products & services exhibitions For more information, go to www.iiug.org/confTo register, go to http://conferences.idug.org/namerica/2006/index.cfm * Note: All registration is handled by IDUG – The International DB2 User Group (IDUG). page 2
Who Am I? PUBLICATIONS • Contributor: “The Informix Handbook” • Author: “Bringing IDS Internals to the Surface • IBM Redbook(s) WEBSITE - www.markscranton.com • Tips, tricks, monthly updates. • Presentations, white papers, scripts MISCELLANEOUS • Advocacy Director - International Informix Users Group (IIUG – www.iiug.org) • Recipient – IIUG “Directors Award” for 2003 2005 INFORMIX ACTIVITY • Infobahns in 10 countries; ~ 30 cities • User Groups in 40 US cities WORK HISTORY • 1995-Oct 2004Informix & IBM Education Group • focused exclusively on IDS and XPS • education and consulting • user conferences & user groups • Oct 2004 – presentWorldwide Informix Technical Strategist • User groups/conferences • Technical proofs/benchmarks • Product futures & direction • Management “convincer” • Customer Visits • Competitive situations
management convincing local Informix user groups technical roundtables client visits I am available for…
includes: • late breaking news! • new technical content! • IFMX-related events! • current wind conditions wherever I am! Get On The List! informix-flash@iiug.org send email to informix-flash@iiug.org with SUBSCRIBE in the BODY (not SUBJECT)
Announcing: “The IDS 10.0 Cities Tour 2006” • targeting 10 large regional US cities • all-day in-depth technical presentations • including well-known names in select cities • best practices; IDS internals; performance considerations • Informix education discounts at each city and giveaways
Preface • “Indexing” and “Fragmentation” each could take days to cover exhaustively • each are critical to achieving top IDS performance for medium-to-large sites • a strong IDS foundational knowledge is necessary to cover these topics in-depth
Preface • in the next hour, I will cover: • some fundamentals of fragmentation and indexing • I will handle the topics separately • significant changes that have occurred in IDS v9.4 and v10.0 NOTE: any reference to a “fragment” or “structure” could mean a data fragment (table data), or and index fragment
OLTP characteristics: high volume of short transactions each transaction accesses a few rows index access method is used. For this environment: Fragment the data by round robin or expression. For large tables that receive a large percentage of transaction activity fragment the indexes using an expression based fragmentation strategy. IDS Fragmentation Review : OLTP
DW characteristics: low volume of long running queries queries access most of the rows in each table very few indexes are generally required preferred access method is sequential scan preferred join method is the hash join For this environment: fragment elimination parallel scan the needed fragments IDS Fragmentation Review : Data Warehousing
IDS Fragmentation • Tips: • <database>:sysfragments system catalog has a ton of information on the fragments. • a “detached index” becomes an entry in sysfragments versus sysindexes. • round robin fragmentation is terribly easy to implement, but has very few benefits. • expression-based fragmentation is much more difficult to implement, but the benefits can be superb.
Fragmentation Fact or Fiction True or False The primary consideration for when you should fragment a table is when it reaches X rows. (answer on next slide…)
Fragmentation Fact or Fiction Answer: False While table size is important, the first two considerations must be: query behavior & characteristics: fixed/canned or ad-hoc. knowledge of the data – well-known or always unknown. these two together will determine the fragmentation scheme, ie: round-robin or expression.
The First Requirement Do you know thy queries? Do you know thy data? (ok – the First Two requirements!) • do not “wander into” fragmentation lightly. • the more complex the environment, the more homework required to setup effective fragmentation.
Fragmentation and Extents dbspaces tab_adbs1 tab_adbs2 tab_adbs3 CREATE TABLE table_a (x INTEGER, y INTEGER, z CHAR (25)) FRAGMENT BY EXPRESSION x <= 10 and x >= 1 in tab_adbs1,x <= 20 and x > 10 in tab_adbs2,x <= 30 and x > 20 in tab_adbs3EXTENT SIZE120000 NEXT SIZE 60000; initial extent for each fragment is 12M
Fragmentation and Tablespaces application view: one logical table tab_adbs1 tab_adbs2 tab_adbs3 tablespace = fragment = partition tblsnum fragid partnum engine view: 3 structures
Fragmentation and Tablespaces application view: one logical table Fun Facts • each fragment has it’s own partition page in the tblspace tblspace for that dbspace. • each fragment can hit max extents or max table size. • the PARTNUM in <database>:systables will be “0” (zero) for the fragments • partnum or fragid stored in <database>:sysfragments tab_adbs1 tab_adbs2 tab_adbs3 tablespace = fragment = partition tblsnum fragid partnum engine view: 3 structures
pre-10.0: only one fragment per table in a single dbspace. 10.0+: multiple fragments per table in a single dbspace. an automatic partitioning* feature is being considered for vNext+ – this will allow it. IDS v10.0 Enhancement * will allow automatic partitioning when a structure reaches max size or pages.
Fragmentation Objectives ParallelismFragments are accessed in parallel, decreasing scan or insert time. scan threads * fragments Fragment EliminationUnneeded fragments are eliminated, decreasing scan or insert time, and reducing disk contention. scan threads * X X X X fragments Fragment Elimination & ParallelismBoth goals are achieved. scan threads * X X fragments * INSERTs, UPDATEs, SELECTs can also be done in parallel
cannot be done in: !=, IS NULL, IS NOT NULL can be done in: the fetch portion of INSERT, UPDATE, SELECT or DELETE - when the SQL statements are optimized nested-loop joins – after key value from outer table is retrieved, elimination can occur when searching the inner table IN, =, <, >, =>, <=, AND, OR, NOT, MATCH, LIKE range expressions combined with !=, IS NULL, IS NOT NULL Fragmentation Objectives : Fragment Elimination
Fragmentation Objectives : Fragment Elimination • Below shows conditions when the optimizer can or cannot eliminate fragments NOTE: there are more slides on this topic in the “Reference Material” section.
Fragmentation Objectives: Parallelism Parallelism in the Workplace • X number of fragments accessed in parallel. • can cause device contention • but the completion speed of the operation could outweigh that concern • default access scheme for “round robin” fragmentation • fragments cannot be eliminated with round robin 1 toy; 2 grandchildren - being entertained in parallel*. * elimination is not appropriate.
Fragmentation Fact or Fiction True or False If you fragment your table data, and create an index on that table, it becomes fragmented by default. (answer on next slide…)
Fragmentation Fact or Fiction Answer: True If you issue a CREATE INDEX… without specifying a storage clause/fragmentation scheme, the index is fragmented to follow the data into the respective dbspaces. Note that the index pages are not interleaved with the data pages in the table extents – they have their own extents within the appropriate dbspace. This will be covered in the next section.
Indexing: Fundamentals – Attached & Detached • 7.3 and before: default was “attached”. • data pages are interleaved with index pages within an extent. • index fragment(s) will always be in the same dbspace(s) as the table fragment(s). • an index fragment will only point to data in the table fragment occupying the same dbspace • 9.2+: default is “detached” • index pages are in their own extent(s) • index fragments can be in the same or different dbspace than the data • there is some confusion about the meaning of “detached”
Detached/Attached Indexes create table …; onpload…; create index …; v7 v9.2+ default behavior dbspace1 dbspace2 BM data BM data BM index index data index data data index index index data data data data index index index index index data data data data index index index index extent extent dataextent “attached” “attached or detached”???
Historical View of Detached create table …; onpload…; create index …in <dbspace>; create table …; onpload…; create index …fragment by…; OR v7 AND v9.2+ dbspace1 dbspace2 BM data BM index index data data index index index data data index index index data data index index index index extent extent “attached” “detached”
Index Fragmentation create table …fragment by…; onpload…; create index …; v7 AND v9 dbspace1 BM data BM index index data data index index index data data index index index data data index index index index extent data extent “attached or detached”???
Attached Index on a Fragmented Table • Large table DSS or OLTP environment. • Attractive index parallel scans. • Attractive index fragment elimination and smaller btrees. • Attractive scans on data pages in parallel. • Balanced I/O for indexes and data pages.
Detached Fragmented Index on a Non-fragmented Table • OLTP environment with high index hits vs. data page hits (key only reads). • Attractive index scans in parallel • Attractive index lookups with fragment elimination and smaller btrees. • Unattractive scans on data pages in series.
Detached Index on a Fragmented Table • DSS environment with some selective queries. • Attractive scans on data pages in parallel. • Unattractive index read in series.
Detached Fragmented Index on a Fragmented Table • Mixed OLTP and DSS environments with data fragmented for DSS and index fragmented of OLTP or Selective queries and non-selective queries on different columns in a DSS environment. • Attractive index parallel scans. • Attractive index fragment elimination and smaller btrees. • Attractive scans on data pages in parallel. • Balanced I/O for indexes and data pages.
Indexing: Historical Issues w/ Btree Cleaner • v7.x – 9.3 • pages that were freed and reused could confuse the B-tree cleaner • complex code required to invalidate requests • single list caused contention • single B-tree cleaner can get overwhelmed with large workloads • no priority in cleaner requests • long lists of committed deleted items left a bloated index • a single btree cleaner would cause bloated indexes RESULT: frequent rebuilds were necessary for efficiency
Enter Btree Scanners (9.4) • The workload for cleaning indexes will be prioritized • the index which causes the server to do the most work will be the next index cleaned • An index will have its leaf level examined looking for deleted items • Dynamic configuration of threads to allow for configurable workloads • can be added/dropped on-the-fly and tuned.
Indexing: v10.0 enhancements • Configurable Page Sizes • allows wider indexes • index rows cannot be split across pages • page sizes from 2K through 16K • 3000 byte index limit allows: • wider indexes • expanded UNICODE support
IDS v10.0 Enhancement – Online Index Build • The [ CREATE | DROP ] INDEX ... ONLINE statement allows the creation/dropping of an index without having an exclusive lockplaced on the table during the duration of the index build. • You can use the CREATE INDEX … ONLINE statement even when reads or updates are occurring on the table. This means index creation can begin immediately. • If you use this syntax to create an index on a table that other users are accessing, the index is not available until no user is updating the table. • After you issue the new syntax to drop an index, no one can reference the index, but current DML operations can use the index until they terminate. • Dropping the index is deferred until no user is using the index.
The Ever-Changing Engine Two+ Engines – specific to OLTP or DW by engine or shift. Mostly due to limited resources. OLTP Engine DW Engine Old School The Hybrid Engine - many engines now are not exclusively OLTP or DW. Now an abundance of resources are available. OLTP DW New School
What’s New with Queries? • OLTP queries • return more rows than before • sequential scans may be a preferred method due to result set size • more resources required for OLTP • DW queries • many clients are moving their DWing to IDS • IDS can handle many environments that only XPS could before
Warning! Warning! • Some true stories from the road… • watch data growth as disk is more plentiful • once had a client with 66,000+ extents • don’t congest your engine as you add horsepower • had a client that was trying to run 18,000+ reports in what was originally an OLTP engine
IDS 10.0 Enhancement : Configurable Page Sizes Buffer Cache 2K pages Buffer Cache 16K pages Buffer Cache 8K pages MEMORY rootdbs 2K pg dbspace3 2K pg dbspace1 16K pg dbspace2 16K pg dbspace4 8K pg DISK Benefit: will allow appropriate cache sizing and page sizing for large table/indexes.
IDS 10.0 Enhancement : External Optimizer Directives This associates AVOID_INDEX and FULL directives with the specified query. SAVE EXTERNAL DIRECTIVES /*+ AVOID_INDEX (table1 index1)*/ , /*+ FULL(table1) */ ACTIVE FOR SELECT col1, col2 FROM table1, table2 WHERE table1.col1 = table2.col1 The inline INDEX directive is ignored by the optimizer when the external directives are applied to a query that matches the SELECT statement. Benefit: will allow influencing of canned or closed queries – both OLTP or DW.
IDS v10.0 Enhancement : Memory Allocation for non-PDQ Queries • You can specify how much memory is allocated to non-PDQ queries. • The default of 128K can be insufficient for queries that specify ORDER BY, GROUP BY, hash joins, or other memory-intensive options. • Use the new configuration parameter, DS_NONPDQ_QUERY_MEM, to specify more memory than the 128K that is allocated to non-PDQ queries by default. Benefit: will allow DBA to give appropriate memory to OLTP queries (non-PDQ) without setting PDQ or disrupting the PDQ environment.