A Method for Online Reorganization of a Database

A Method for Online Reorganization of a Database Gary Sockut (joint work with Thomas Beavin & Chung-Chia Chang) Work performed at IBM Silicon Valley Laboratory, San Jose, CA

Introduction: reorganization Reorganization: Δ way arranged (logical / physical). Any DBMS & database might need: • Δ logical database definition:generalize 1-many -► many-many, split column. • Δ physical database definition:construct index, split partition. • Restore physical arrangement of instances without Δ definition:* Compact space.* Collect garbage.* Restore clustering (store near each other; criteria). ▼ I/O. Writing degrades; reorganization restores. This method.

Introduction: offline vs. online reorganization Traditional: reorganize offline:  Reorganize online (concurrently with usage or incrementally within users’ transactions). Importance ▲. highly available (24 x 7) (web commerce, armed forces) USAGE BATCH WINDOW (offline OK) USAGE TIME REORGANIZATION very large 2 categories:unacceptable

Introduction: online reorganization Method for restoring clustering online. IBM’s Database 2 (DB2) relational DBMS. Rest of presentation: • Storage structures • Overview (unload, reload, process log); problem; solution • Steps of reorganization & more details • Comparison: previous research

Storage structures • Rows of tables in data records in file pages: • Not unique key.  File pages, indexes, entries in loguse record identifier (RID) to identify records: • Page # & # within page. ID map. • Δonly during reorganization. • Header of page: position in log (log record sequence number (LRSN)) current when most recently written. file page: data record data record LRSN data data * * ID map

Storage structures • Variable-length: grow: regular -► pointer & overflow: regulardata record pointerdata record file page: LRSN data RID * * ID map regulardata record overflowdata record file page: ID map LRSN data data * * bad!

Storage structures: writing Users write rows -► DBMS writes data records (& log entries). Usually, Insert / Update / Delete row -► 1 regular data record. Exceptions on Update: • Has regular & new data too large for regular's page -►I overflow & U regular to pointer. • Has overflow & new data fits on overflow's page -►U overflow to overflow. • Has overflow & new data too large for overflow's page -►* Now room on pointer's page -► U pointer to regular & D overflow.* Still no room on pointer's page -► I overflow, D overflow, & U pointer to pointer. Exception on Delete: has overflow -► D pointer & overflow. Backout (undo if failure) reverses.

Storage structures: reorganization • Database administrator invokes reorganization:* Restores clustering (clustering key: not necessarily unique).* Removes overflow.* Distributes free space. Improves performance. • Area being reorganized: set of 1 or more tablesor set of 1 or more partitions of 1 table (parameter). • Offline: • Read-only (R/O): unload data, sort. • No access: reload. • Online: Read/write (R/W): most. --►

Overview of method for online reorganization An inspiration: Fuzzy dumping: unload data (backup) while letting users write: • Dumping: (1) record current LRSN for log; (2)unload. • Recovery: (1) reload; (2) bring up to date:apply log entries (start: recorded LRSN).Ignore log entry whose LRSN ≤LRSN of indicated page. data-base back-up data-base back-up 1 2 log -► Fuzzy reorganization.

Overview of method for online reorganization UNLOAD,SORT, RELOAD READ A) Record current LRSN; unload, sort, reload (reorganize). (R/W): users’data old copyof area new copyof area WRITE name 0 name 1 log B) Process log (read; apply). (n iterations R/W; 1 R/O): READ users’data old copyof area new copyof area WRITE PROCESS name 0 name 1 log C) Switch users' accessing to new. (brief offline; R/W): READ users’data new copyof area WRITE name 1 name 0 log

Problem & solution Problem: reorganization Δ RIDs: • Log entries: old RIDs. • Apply: identify data record in new copy (new RID). Solution: • Temporary table maps old & new RIDs, stores LRSNs. • Translate log entries before applying. Novelty: interaction: maintain table & process log.

Main steps of reorganization (more detail) Unload data, sort, reload into new (includes indexes), create backup copy. Users R/W old. Table maps: old & new RIDs. Record current LRSN. • Record current LRSN for log. Process log: iterate:* Read subset of log between recorded LRSNs (users' writing): translate; apply to new. R/W.* Record current LRSN. Iterate or next step (criteria).* Last iteration: append changed pages to backup. Quiesce writing (block; wait until finished). Record LRSN. Process log. Append changed pages to backup. R/O. Quiesce all access. Switch future accesses: exchange names of files (data, indexes). Users: no access. Start R/W new. Delete old. --►

Unloading, sorting, & reloading (R/W) 1. Unload: scan sequentially:* Regular or overflow: unload data, old RID, & LRSN of page.* Pointer: add entry to mapping table. 2. Sort by clustering key. 3. Reload: also add entry to mapping table. 1. UNLOAD 3. RELOAD old copyof area new copyof area sort file oldRID newRID 2. SORT mappingtable 1. ADD ENTRY 3. ADD ENTRY

Processing of log (n R/W & 1 R/O) 5 phases 1. COPY pointers. sort: LRSN copy of log (buffer) log 5. APPLY(insertion: actual) 2. SORT BY OLD RID (speed access to mapping table) pointers. sort: old RID copy of log (buffer) 3. TRANSLATE RIDS (insertion: estimate, for sorting) mappingtable pointers. sort: old RID copy of log (buffer) 4. SORT BY NEW RID (speed access to new copy of area) pointers. sort: new RID copy of log (buffer) new copyof area pointers. sort: new RID copy of log (buffer) mapping table 3, 5, control of iterations --►

Phase 3: translation of RIDs DBMS's log application (recovery) ignores log entry if LRSN ≤ page’s. Translation ignores log entry if LRSN ≤ mapping table’s. Insertion: • R or O: estimate new RID; store in log entry (buffer);insert entry in mapping table (old, estimated new). • P: delete log entry; insert entry in mapping table (old, no new). • Deletion: • R or O: store new RID in log entry; delete mapping table’s entry. • P: delete log entry; delete mapping table’s entry. • Update: • R to R or O to O: store new RID in log entry. • P to P: delete log entry. • P to R: ~I: estimate new RID. Log entry: store estimated new RID; U -► I. Mapping table: store estimated new RID. • R to P: ~D: log entry: store new RID; U -► D.

Phase 5: application Scan set of pointers to log entries (sorted by new RID). For each RID value: • Find all pointers (contiguous). ≥ 1 D log entry -► delete certain log entries: * 1st entry is I -► delete last D & all preceding. * 1st entry is D or U -► keep last D; delete all preceding. Omit log entries for which no entries in mapping table; omit I U U D. Apply sequentially: I: * Insert in new copy; obtain actual new RID. * In mapping table & in log entries for current RID, estimated new RID -► actual. U or D: treat like DBMS's handling of user's.

Control of iterations of log processing time Iterate (allowing R/W) until: • Estimated time for next ≤ parameter for maximumR/O -► next is last (R/O). Parameter: trade-off. --► • Estimated completion for next > deadline(parameter) -► cancel reorganization. • Amount of log for next not sufficiently < current(not catching up) -► send message to operator:After delay (parameter), DBMS will continue,quiesce writing, or cancel (parameter).During delay, database administrator canΔ parameters, adjust priorities,quiesce writing, cancel, or let DBMS take action. Display status. HI LO iteration

Scheduling of online reorganization • High tolerance of delay -► R/O & offline tolerable. • Low rate of writing -► easy to catch up to log. • No long-running transactions -► quick quiescing. Trade-off.

Comparison with previous research Calculation of clustering: not novel; Sort by clustering key (index); assign to pages. Compare mapping tables, fuzzy reorganization: 1) Mapping tables: • Omiecinski et al.: reorganization in place;map RIDs to translate entries in leaves of indexes. • Wiener & Naughton: loading data into object database;map surrogate object identifiers (source file) into object IDs. 2) Several authors mention fuzzy reorganization (no detail). Unique identifier does not Δ.  No mapping table. Many customers dislike requirement. Fuzzy reorganization when RIDs Δ --►

Comparison with previous research old copy new copy data RID data O'Toole et al.: fuzzy garbage collection for persistent data. Forwarding field: In database context (not O'Toole), mapping table’s advantages: • Reorganization copies, user deletes, DBMS reuses space (not O'Toole). Mapping table safe; forwarding field gone. • Mapping table entry < data record.  Fewer pages. Less I/O to read/write forwarding information. • Less locking for old (unload old, reload/map, process log): Mapping table: (R, no, no). Forwarding field: (R, W, R/W?). • Avoid extra space (permanent) for forwarding field(O'Toole: space already existed). Forwarding field’s advantages: • One field (already existed: O'Toole), not two (temporary). • Seems simpler.

Possible (but rejected) alternatives reorganization • Reorganize only index online:support, but not enough. Reorganize online in place (not by copying): inaccuracy. --► Complexity and/or slowness, or low-concurrency locking. Δ▬►user’spositionin scan record’sposition record’sposition Reorganize partition (index) (support):fine-grained; reorganize offline:* Correct other indexes.* Slow routing, increase space for partition descriptors.* Less uniform growth/shrinkage -► increase total free space: • coarse: | highgrowthshrinkagelow growth | • fine: | highgrowth | shrinkage | low growth |

Publications Web site: Google GARY SOCKUT (http://alum.mit.edu/www/ghs); click “publications”: -► pdf. • This work: G. H. Sockut, T. A. Beavin, & C.-C. Chang, “A Method for On-line Reorganization of a Database,” IBM Systems Journal, Vol. 36, No. 3, 1997, pp. 411-436; erratum in Vol. 37, No. 1, 1998, p. 152. Web site: slides. • Survey on online reorganization (not just IBM, not just clustering, not just fuzzy): G. H. Sockut & B. R. Iyer, “Online Reorganization of Databases,” Computing Surveys, Vol. 41, No. 3, ACM, Article 14, July 2009, 136 pages. Web site: table of contents.

Summary Reorganize online: • Copy data from old to new in reorganized form; R/W.Table maps between old & new RIDs. • Apply log to new; map. • Switch users' accessing to new.

Appendices

Appendix: effect of fine-grained partitioning Population densities in the area covered by the 50 states: ~100K(demographia.com) urban area with many high-rise apartment buildings (Manhattan Community District 8: ~ Upper East Side) ~1K (census) NewJersey population density(people / square mile) finer ~1 (census) Alaska uninhabited area 0 fine (square mile) coarse (state) granularity of partitioning

A Method for Online Reorganization of a Database