Access Methods for Next-Generation Database Systems

Access Methods for Next-Generation Database Systems Marcel Kornacker UC Berkeley

Overview and Motivation • this talk’s topic: access method (AM) extensibility • how to support novel AMs in extensible ORDBMSs • not: yet another point, spatial, metric, ... AM • outline: • AM extensibility architecture for ORDBMSs • concurrency & recovery • AM performance analysis

Overview and Motivation • why bother with AMs: • we have B-trees • our customers don’t care • well... • new apps need ORDBMS support: GIS, multimedia, genomic sequence databases, etc. • customers do care about fast access to data • B-trees won’t help

Overview and Motivation • AM extensibility: what has been done • slew of papers about novel AMs • slew of papers about extensible DBMSs (in 80s!) • what needs to be done: • storage-level techniques for ORDBMSs:reconciling functionality with performance and reliability • AMs crucial to performance, deserve special attention

Outline • Overview and Motivation • High-Performance Extensible Indexing with Generalized Search Trees • AM Support in Commercial DBMSs • GiST Overview • IUS Implementation Overview • Concurrency and Recovery • Access Method Performance Analysis

AM Support in Commercial DBMSs • OR modeling and extensibility very successful in commercial DBMS, but: • AM extensibility has not received same degree of attention • DBMS vendors now struggling to add/improve novel (spatial) AMs • State of art: IUS virtual index interface/Oracle extensible indexing interface: • iterator interface: open(), getnext(), close(), insert(), delete() • AM handles internally: locking, recovery, page management, … (same as built-in AMs)

AM Support in Commercial DBMSs (2) • What’s wrong with this interface • concurrency and recovery need to be (re-) implemented for each new AM • difficult to implement: AM developer = domain expert, rarely also DBMS internals expert • would prefer to deal purely with AM specifics: Generalized Search Tree

Generalized Search Tree Overview • Generalized Search Tree (GiST) = template index structure • extensible set of data types and queries • customize tree behavior through extension methods • examples: B-trees, R-trees, … • details: Hellerstein, Naughton, Pfeffer, VLDB ‘95 • GiST provides • basic structure: height-balanced tree • template algorithms: search, insert and delete • no assumptions about keys and how they’re arranged • AM developer provides key-specific functions and particular operational properties

Generalized Search Tree Overview GiST AM Parameters Subtree predicate (SP)=User-defined key ... SP1 SP2 SPn Consistent( ) returns true/false Penalty( ) returns insertion cost ….. PickSplit( ) splits page items into two groups . . . Union( ) updates SPs Internal Nodes Leaf Nodes

Generalized Search Tree Overview • More suitable basis for AM extensibility than iterator interface, because: • raises level of abstraction, allows developer to focus on AM specifics/performance-relevant AM properties (clustering, internal predicates) • built-in features (e.g., page handling, tree traversal) need not be re-implemented for specific AMs

Generalized Search Trees Overview • Goal of thesis work: making idea industrial strength: • efficiency: Informix Universal Server impl (VLDB ’99) • concurrency & recovery (SIGMOD ’97) • analysis framework (demo SIGMOD ’98)

IUS GiST • Datatype extensibility: • IUS allows UDTs to be indexed by built-in AMs • require same degree of datatype extensibility for GiST • Good performance: • operations on UDTs implemented via UDFs, which are orders of magnitude slower than regular function calls • GiST needs to avoid large number of UDF calls • Intra-page storage format: • original GiST design assumed R-tree-like page layout • IUS GiST needs to allow customized formats (e.g., B-tree, hB-tree) to support compression or simplified access

IUS GiST - Summary • Extensibility architecture: • GiST core in server • AM extension in DataBlade, implements GiST API • Improved GiST API • page-based, not entry-based: few UDF calls • leave page format to AM extension: very flexible

IUS GiST - Performance • Comparison of GiST-based and built-in R-trees in IUS: • built-in R-tree: datatype extensible • software engineering: 1,400 lines of C for GiST R-tree vs. 10,000 lines of C for built-in R-tree (GiST core: about 10,000 lines of C) • performance: identical # of I/OS, GiST uses 14 to 40% less CPU time • Reasons for GiST performance advantage: • far fewer UDF calls for GiST R-tree: on average 1 per page (built-in: 1 per page entry) • but: higher set-up cost for GiST: needs to set up descriptors for 11 interface UDFs (built-in: only needs 7 UDFs)

Outline • Overview and Motivation • High-Performance Extensible Indexing with Generalized Search Trees • Concurrency and Recovery • Physical CC - concurrent index operations • Logical CC - transaction isolation • Recovery and how it affects concurrency • Access Method Performance Analysis

Concurrent Index Ops • Problem: concurrent structure modifications(example: B-tree) • Lock-coupling strategy with repositioning (ARIES/IM): requires key ordering • B-link tree strategy: compensation during traversal

Concurrent Index Ops • Navigating linked nodes: • how to detect node split? • when to stop going right? • Structural GiST extensions • global sequence counter • sequence number (NSN) for each node

Concurrency Extensions • Remember and compare global counter with NSNs • NSNs allow split compensation independently of key properties • Can be implemented very efficiently in typical WAL environments

A Word on Latch-Coupling • Popular technique for B-trees (ARIES/IM): • hold parent latched while going to child • avoid invalid pointers after node deletion • Won’t work for GiST: • GiST search might traverse multiple subtrees • either: keep parent latched while traversing all children (low conc.) • or: reposition in parent after traversing each child • but: repositioning requires partitioning

Transaction Isolation • SQL Isolation Levels: • most of it: locks on base table items (unrelated to index) • hard part: “serializable” isolation level (i.e., preventing phantoms) • Phantom Problem: • SELECT specifies logical range • need to prevent insertions into that range • can’t lock non-existing items

Transaction Isolation: B-Trees • B-Trees: Example of Key-Range Locking • partition data space into intervals • each leaf item corresponds to interval • scan from 5 to 9: lock data in range, lock next key • insert 9: check interval 8-10 (check item 10)

Transaction Isolation: GiSTs • But: this doesn’t work for GiSTs! • Example: • 2-dim point keysand search rectangle • what are next keys?where do we findthem in tree?

Predicate Locking • Idea: • readers register shared predicate (search qual.), check exclusive predicates • updaters register exclusive predicate (updated item), check shared predicates • Compared to key-range locking • perfectly accurate, maximal concurrency • expensive: evaluate lots of predicates • no gradual expansion:(1) lock entire range, even when cursor stops early (2) nearest-neighbor search locks entire tree

Hybrid Locking • Instead: novel hybrid mechanism • 2-phase locking of retrieved, inserted and deleted data records • restricted pred locking for phantom avoidance (‘‘covering the holes’’) • Restricted predicate locking • search predicate attached to every visited node during traversal • structure mods replicate predicate attachments • no insert/delete predicates • insert only checks target leaf’s predicates

Hybrid Locking • Example

Hybrid Locking • In comparison to predicate locking: • retains perfect accuracy • fewer predicates to check (only those on leaf) • search and delete don’t check predicates • almost gradual expansion • In comparison to key-range locking: • structure mods more expensive (replicating predicate attachments) • high implementation complexity (replicating ...): difficult/expensive to implement

Node Locking • Simplified hybrid locking: • replace pred. attachments with node locks • block structure mods that would need to replicate node locks • Comparison to hybrid locking: • diminished accuracy/concurrency • structure mods cheap • higher implementation complexity

AM Recovery • Purpose: • bring AM structure into working condition • restore AM contents to reflect committed xacts • Reminder: WAL recovery • every update writes log record with redo and undo info • rollback: apply undo portion in reverse chronological order • restart, phase 1: apply redo portion in chronological order • restart, phase 2: undo all uncommitted xacts

GiST Recovery • Keys to high concurrency: • separate updates into contents change and structure modification (operation, SMO) • “commit” SMOs separately and immediately • “logical” undo of contents change:re-locate leaf, compensate update, only lock data, not structure • Logical undo: • cannot re-traverse tree structure, but can follow rightlinks • in Aries/IM, need tree-global latch to avoid this situation

Summary & Conclusion • GiST concurrency and recovery techniques: • Concurrency: adaption of link technique • Isolation: hybrid predicate/2-phase locking • Recovery: WAL-based a la ARIES • invisible to AM developer:can write industrial-strength AMs without knowledge of server internals

Outline • Overview and Motivation • High-Performance Extensible Indexing with Generalized Search Trees • Concurrency and Recovery • Access Method Performance Analysis • Motivation • Goals of Framework • Analysis Framework with Examples • amdb • Conclusion

Motivation • Access method (AM) design and tuning is a black art • Which AM do I use to index my non-traditional data type? • What are contributions of individual design ideas? • How to explain performance differences between AMs? • How to assess AMs on their own? • Current practice of performance analysis little help: • aggregate runtime or I/Os numbers provide no explanations • measuring semantic properties (e.g., spatial overlap) is domain-specific, not useful for general analysis framework

Goals of Analysis Framework (1) • Measure performance in context of specific workload (data and queries) • recognize workload as part of analysis input • compare workloads by running against same AM • allows tuning of AM for specific workload • Performance metrics characterize observed performance (I/Os) • independent of data or query semantics • reflects purpose of performance tuning

Goals of Analysis Framework (2) • Metrics express performance loss • loss = difference between observed and optimal performance • shows potential for performance improvement • optimal performance obtained by executing workload in optimal tree • fixed point of reference allows AM to be assessed on its own • Loss metrics for each query, node of input tree and structure-shaping aspects of implementation • breakdown allows performance flaws to be traced back, facilitates assessment of individual design aspects

Analysis Framework: Overview (1) • Performance-relevant aspects of tree: • Clustering: determines amount of data that query accesses beyond result set • Page Utilization: determines number of pages that data occupy • SPs: excess coverage (covers more data space than is present in subtree) leads to extra traversals • subdivide loss along those factors • break-down provides more detailed clues about cause of performance loss

Analysis Framework: Overview (2) • Query Metrics: • run each query in actual and optimal tree to obtain performance loss • workload metrics: sum over all queries • Node metrics: • node’s contribution to aggregate loss • obtained by computing per-node metrics for each query and aggregating over workload • Implementation metrics: • measure how much pickSplit() and penalty() deteriorate tree • obtained by running sample splits and insertions

amdb • Analysis, visualization and debugging tool that implements analysis framework • Accepts AMs written for libgist • Available at http://gist.cs.berkeley.edu

amdb Features • Tracing of insertions, deletions, and searches • Debugging operations: breakpoints on node splits, updates, traversals, and other events • Global and structural views of tree allow navigation and provide visual summary of statistics • Graphical and textual view of node contents • Analysis of workloads and tree structure

Node Visualization Node View Displays bounding predicates (SPs) and items within nodes. Highlights BPs on current traversal path. Split Visualization Shows how SPs or data items are divided with PickSplit( ) Node Contents Provides textual description of node

Leaf-Level Statistics Global View Provides summary of node statistics for entire tree Tree View Also displays node stats Total or per query breakdown I/O counts and corresponding overheads under various scenarios { Breakdown of losses against optimal clustering {

Construction of Optimal Tree • Optimal tree: optimal clustering, optimal page utilization, no excess coverage • Optimal leaf level: • partition items to minimize number of page accesses • partition size = target page utilization • workload can be modeled as hypergraph, approximate clustering with hypergraph partitioning algorithm • Optimal internal levels: • cannot be constructed analogously • but: we assume target page utilization and no excess coverage

- - - - X X X X Performance Metrics - Sample Query Optimal Clustering: clustering loss exc. cov. loss utilization loss Actual Tree: X X X X X X X - - - - - - - - X X - X - - X

Example 1: Unindexability Test • Unindexable workload: aggregate leaf accesses in optimal tree take longer than sequential scan for each query • typical ratio of sequential/random I/Os: 14:1 • Conclusion: uniformly-distributed multi-dimensional point sets mostly unindexable

Example 2: Comparison of SP designs • Goal: assess effect of SP on nearest-neighbor AMs • R*-trees: bounding rectangles, SS-trees: bounding spheres, SR-trees: combination of BRs and BSs • Experiment: • bulk-load 3 trees with identical data (only internal levels differ) • data: 8-dim points, arranged into sphere-shaped clusters • measure excess coverage loss • Results: • R*- and SR-tree identical, SS-tree order of magnitude worse at leaf level • SR-tree paper came to contrary conclusion, because it compared insertion-loaded trees

Summary (1) • Analysis framework • workload (= data and queries) is part of input • comparison of observed with optimal performance • metrics express performance loss, subdivided into clustering, utilization, excess coverage loss • Advantages over current practice: • fixed point of reference allows AM to be assessed on its own • metrics more meaningful than aggregate I/O numbers, facilitate evaluation of individual design ideas • metrics are independent of data or query semantics

Summary (2) • amdb • implements framework, along with visualization and debugging features • utilizes hypergraph partitioning to approximate opt. clustering • user experience verifies usefulness of framework and tool

Conclusion • GiST-based AM extensibility is effective: • reduces implementation complexity w/o performance/robustness penalty • Focus on template data structure leads to general solutions: • generally applicable concurrency & recovery protocols, performance analysis framework

Backup slides

Generalized Search Tree Overview • Search: • traverse all subtrees for which Consistent(SP, qual) is true; return all leaf items for which Consistent(item, qual) is true • Insert: • find leaf by following single path from root, guided by Penalty() • if leaf is full, call PickSplit() to determine split info, then perform split (recursively) • if SP needs to be updated, call Union(old SP, new item) to determine new SP

Generalized Search Tree Overview • GiST = simple abstraction of search tree • simple algorithms for basic functionality (search, insert, delete); easy to comprehend and extend • full control over performance-relevant properties of tree:clustering and page util (PickSplit() and Penalty()), SPs • captures essence of indexing:organize data into clusters, build directory structure to locate clusters

Access Methods for Next-Generation Database Systems

Access Methods for Next-Generation Database Systems

Presentation Transcript

Access: The Next Generation The Next Generation Access Pilot. Let s do something great together.

Next Generation Information Systems

Resource Database Assembly: The Next Generation

Requirements for Next Generation mmWave Systems

Next Generation Response Systems for SDSU

Next Generation Simulation Systems:

Next Generation Type Systems

Next Generation Geoscience Visualization Systems

Next-Generation Optical Access Networks

MI-Access Assessments: The Next Generation

Access Authentication to IMS Systems in Next Generation Networks

Next Generation Intelligent Systems

Next-generation modeling for high latitude systems

Access to Next Generation Broadband

LLRF Systems for Next Generation Light Sources

Next Generation Broadband Access

Next-Generation Bioinformatics Systems

Next Generation Access

Next Generation Simulation Systems:

Review: ACCESS Database Systems

Next-Generation Bioinformatics Systems