230 likes | 325 Views
Data Management and Exploration Prof. Dr. Thomas Seidl. Efficiently Processing Queries on Interval-and-Value Tuples in Relational Databases. Jost Enderle , Nicole Schneider, Thomas Seidl RWTH Aachen University, Germany VLDB 2005, Trondheim. Outline.
E N D
Data Management and Exploration Prof. Dr. Thomas Seidl Efficiently Processing Querieson Interval-and-Value Tuplesin Relational Databases Jost Enderle, Nicole Schneider, Thomas Seidl RWTH Aachen University, Germany VLDB 2005, Trondheim
Outline • Interval-and-Value (IaV) Data and Applications • Relational Interval Tree (RI-tree) • Managing Interval-and-Value Tuples Using RI-tree • Experimental Results Queries on Interval-and-Value Tuples in RDBs
Interval-and-Value Data: Example Contracts table: storing period and budget of contracts CREATE TABLE contracts (// key:c_no VARCHAR(10),// simple-valued attribute:c_budget DECIMAL(10,2),// interval:c_period ROW ( c_start DATE, c_end DATE)) Queries on Interval-and-Value Tuples in RDBs
Interval-and-Value Data: Query • Sample query on contracts table // Find all contractsSELECT c_no FROM contracts// within certain budget rangeWHERE c_budget BETWEEN 500 AND 2000 // running during certain time interval AND c_period OVERLAPS (DATE ‘2003-03-01’, DATE ‘2004-01-31’) • Special Cases of this general Range-Interval query: • Value-Interval Query // value range is a single point • Range-Stabbing Query // query interval is a single point • Value-Stabbing Query // both restrictions hold Queries on Interval-and-Value Tuples in RDBs
No DB Motivation of Relational Indexing • Main Memory Structures • no persistency, no disk block structure • Secondary Storage Structures + persistency, high block-oriented efficiency • integration into DBMS kernel typically not supported (GiST?) • Relational Storage Structures + basic idea: don‘t extend, just use RDBMS (virtual storage machine) + sound formal fundament, little implementation effort + immediate industrial strength (availability, robustness, ACID, …) + high efficiency by exploiting built-in indexing structures (B+-tree) Disk SQL Queries on Interval-and-Value Tuples in RDBs
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 C4 C3 C2 root = 2h-1 8 C1 4 12 34 52 132 2 6 10 14 52 154 13 73 101 151 1 3 5 7 9 11 13 15 4, 1, C3 8, 3, C4 8, 5, C2 12, 10, C1 4, 1, C3 8, 5, C2 12, 10, C1 8, 5, C2 12, 10, C1 4, 7, C3 8, 13, C2 8, 15, C4 12, 15, C1 12, 10, C1 4, 7, C3 8, 13, C2 12, 15, C1 8, 13, C2 12, 15, C1 12, 15, C1 Relational Interval Tree [Kriegel, Pötke, Seidl: VLDB 2000] based on [Edelsbrunner 1980] • Two relational indexes (B+-trees) store the interval bounds lowerIndex (node,start,id): upperIndex (node,end,id): • Supported by any RDBMS: No modification of built-in B+-trees • Optimal complexities for space, updates, and intersection queries Queries on Interval-and-Value Tuples in RDBs
Single Interval Query Processing Two steps to process an interval query 1. Transform interval query into a set of range queries • The generated queries are collected in transient tables (no I/Os) 2. Perform a single SQL query • Join the transient query tables with the relational indexes start end Queries on Interval-and-Value Tuples in RDBs
48 16 8 40 24 56 32 4 36 12 44 20 52 60 28 54 22 58 26 2 34 6 38 42 10 14 46 18 50 62 30 1 33 35 3 37 5 39 7 9 41 11 43 45 13 47 15 49 17 51 19 21 53 23 55 57 25 27 59 29 61 31 63 Preprocessing: Generate Query Ranges start end • Generate a set of range queries for lowerIndex and upperIndex • At nodes left of start:report entries i with i.end ³ start (32,48,52) • At nodes right of end: report entries i with i.start £ end (56) • For nodes between start and end: report all entries (54 - 55) upperIndex 32 48 52 lowerIndex 54 to 55 56 Queries on Interval-and-Value Tuples in RDBs
Processing by a Single SQL Query • Join transient query tables with B+-tree indexes SELECT idFROM upperIndex AS i JOIN :leftQueries USING (node)WHERE i.end >= :start UNION ALL SELECT idFROM lowerIndex AS i JOIN :rightQueries USING (node)WHERE i.start <= :end UNION ALL SELECT idFROM lowerIndex // or upperIndexWHERE node BETWEEN :start AND :end • No duplicates are produced → UNION ALL • Blocked output of index range scans is guaranteed Queries on Interval-and-Value Tuples in RDBs
Extending the RI-tree for IaV Support (1) • Add value predicate to RI-tree query SELECT id // lower subqueryFROM upperIndex AS i JOIN :leftQueries USING (node)WHERE i.end >= :startAND i.value BETWEEN :Value1 and :Value2 UNION ALL ... // upper subquery UNION ALL SELECT id // inner subqueryFROM lowerIndex // or upperIndexWHERE node BETWEEN :start AND :endAND value BETWEEN :Value1 and :Value2 • Integrate simple value attribute into lower-/upperIndex • old schema: (node, bound, id) • new schema: ? → depends on type of query to support Queries on Interval-and-Value Tuples in RDBs
Extending the RI-tree for IaV Support (2) • Viable schemas for new lower-/upperIndexes • (value, node, bound, id) • (node, value, bound, id) estimate access cost for each query type • (node, bound, value, id) • Observations (see paper for details): • Value queries best supported by (value, node, bound, id) index • simple attribute predicates = point queries • evaluation requires same number of disk accesses as original proceeding • Range Queries: choice of index not obvious • inner subquery of Range-Stabbing Queries best supported by(node, value, bound, id) • otherwise: depends on stored data and values of query variables • Question: Can Range Queries be further enhanced? Queries on Interval-and-Value Tuples in RDBs
Improving Range Query Processing (1) • Problem of composite indexes for multiple attributes • queries may contain range predicates on two or more of the indexed attributes • tuples satisfying first predicate lie in contiguous disk area • tuples satisfying both/all predicates are scattered within this area • Common solution: using space-filling curves • mapping multi-dimensional data to one-dimensional values • similar values of original data are mapped on similar index data • ranges of indexed attributes will be found in adjacent disk areas • Application on RI-tree scenario • combining some attributes of lower-/upperIndex • depends on type of query to support Queries on Interval-and-Value Tuples in RDBs
Improving Range Query Processing (2) Identifying viable schemas for new lower-/upperIndexes • find subqueries containing several range predicates • for Range Queries: lower and upper subqueries (bound, value) • for Range-Interval Queries:inner subquery (node, value) • combine respective attributes (x,y)within space-filling curve {x,y} • useful combinations forlower-/upperIndex: • (node, {value, bound}) • ({node, value}, bound) Queries on Interval-and-Value Tuples in RDBs
Improving Range Query Processing (3) • Observations: • lower and upper subqueries of Range Queries will profit by a(node, {value, bound}) index • inner subquery of Range-Interval Queries will profit by a({node, value}, bound) index • Value Queries will not profit by “space-filling indexes” • Intermediate result • space-filling indexes can reduce disk accesses in certain cases • there is no “universal” index supporting all queries to the same extent • different subqueries will profit by different indexes Queries on Interval-and-Value Tuples in RDBs
Employing index mixes Identifying best indexes for each query type • Value Queries: best supported by (value, node, bound, id) index • Range Queries: depends on data and space-filling curve (if used) • different subqueries best supported by different indexes • subqueries may be evaluated separately using best index • drawback: higher cost for index updates and storage requirements Queries on Interval-and-Value Tuples in RDBs
Adapting the RI-tree Algorithms (1) Example: Evaluate a contracts query using „space-filling index“ Contracts table: • Node and Z-order value calculated for each tuple • B-tree index on (node, Z(budget, start), no) Queries on Interval-and-Value Tuples in RDBs
Adapting the RI-tree Algorithms (2) Range-Interval Query: value range (1,12); interval (3,6) Evaluation ofupper subquerywith Z-order index Queries on Interval-and-Value Tuples in RDBs
Access Cost with Varying Table Sizes Value-Stabbing Queries Value-Interval Queries Queries on Interval-and-Value Tuples in RDBs
Access Cost with Varying Table Sizes Range-Stabbing Queries Range-Interval Queries Queries on Interval-and-Value Tuples in RDBs
Access cost for varying length of ranges Stabbing Queries Interval Queries Queries on Interval-and-Value Tuples in RDBs
Access cost for varying length of ranges Range Queries Queries on Interval-and-Value Tuples in RDBs
Comparison with competing techniques Queries on Interval-and-Value Tuples in RDBs
Conclusions • Processing Interval-and-Value Tuples in SQL databases • Extensions of the Relational Interval Tree • Various types of queries • Range vs. Value Queries • Interval vs. Stabbing Queries • Experiments demonstrate high performance • Future work: • Extend proposed techniques to more complex queries (joins) • Cost models to predict benefits for evolving query workload Queries on Interval-and-Value Tuples in RDBs