A Cost Model for Interval Intersection Queries on RI-Trees
Hans-Peter Kriegel, Martin Pfeifle, Marco Pötke, Thomas Seidl
Database Group, Institute for Computer Science, University of Munich, Germany
SSDBM 2002, Edinburgh
Outline of the Talk
1. Introduction
2. RI-Tree
3. Cost Model
4. Evaluation
5. Conclusions and Future Work
Extended Objects in Databases
• 3D objects (box query): CAD documents, digital mockup, haptic rendering, …
• 2D objects (window query): geographic data, VLSI design, bitemporal data, …
• 1D objects (interval query): temporal data, approximate values, interval constraints, …
Integration of Access Methods
• Declarative Embedding: object-relational DML and DDL
• Extensible Indexing Framework:
– maintenance: index_create(), index_drop(), index_insert(), index_delete(), index_update()
– query processing: index_open(), index_fetch(), index_close()
Integration of Access Methods
• Declarative Embedding: object-relational DML and DDL
• Extensible Indexing Framework: object-relational interface for index maintenance and querying functions
– User-defined Index Structure
• Relational Implementation: mapping to built-in indexes (B+-trees); SQL-based query processing
• Physical Implementation: block manager, caches, locking, logging, …
Integration of Access Methods
• Declarative Embedding: object-relational DML and DDL
• Extensible Indexing Framework: object-relational interface for index maintenance and querying functions
– User-defined Index Structure
• Extensible Optimization Framework:
– optimization: stats_collect(), stats_delete(), predicate_sel(), index_io_cost()
• Relational Implementation: mapping to built-in indexes (B+-trees); SQL-based query processing
• Physical Implementation: block manager, caches, locking, logging, …
Integration of Access Methods
• Declarative Embedding: object-relational DML and DDL
• Extensible Indexing Framework: object-relational interface for index maintenance and querying functions
– User-defined Index Structure
– Relational Implementation: mapping to built-in indexes (B+-trees); SQL-based query processing
• Extensible Optimization Framework: object-relational interface for selectivity estimation and cost prediction functions
– User-defined Cost Model
– Relational Implementation: mapping to built-in statistics facilities; SQL-based evaluation of cost model
• Physical Implementation: block manager, caches, locking, logging, …
Relational Interval Tree (RI-Tree) [Kriegel, Pötke, Seidl VLDB 2000]
• Foundation: Interval Tree [Edelsbrunner 1980]
• primary structure: binary search tree on possible endpoints
• secondary structure: sorted lists of stored endpoints
each interval is registered at exactly one node
[Figure: intervals alice, bob, chris and dave over the data space 1 to 15; binary tree with root 8 and nodes 1 to 15; endpoint lists 3a, 5c and 12c, 15a at node 8; 1b and 7b at node 4; 13d and 13d at node 13]
RI-Tree: Virtual Primary Structure
first step: virtualize the primary structure
• no materialization of the binary tree
• storage cost O(1): parameter root
• fixed data space: root = 2^(h-1) covers [1 .. 2^h - 1]
[Figure: binary tree over the data space 1 .. 2^h - 1 with root = 2^(h-1); example with root 8 and nodes 1 to 15]
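Because the backbone is virtual, the node at which an interval is registered can be found with arithmetic alone. The following is a minimal Python sketch of such a descent, assuming the usual halving walk from the root; the function name `fork_node` is hypothetical and the loop is an illustration, not code taken from the talk. It uses the root = 8 tree from the figure.

```python
def fork_node(lower, upper, root):
    """Descend the virtual binary tree from root and return the first node
    that lies inside [lower, upper] (hypothetical helper for illustration)."""
    node = root
    step = root // 2
    while step >= 1:
        if upper < node:        # interval lies entirely to the left
            node -= step
        elif node < lower:      # interval lies entirely to the right
            node += step
        else:                   # lower <= node <= upper: registration node
            break
        step //= 2
    return node


# Intervals as read off the example figure (data space 1..15, root = 8).
print(fork_node(3, 15, 8))   # 8
print(fork_node(1, 7, 8))    # 4
print(fork_node(5, 12, 8))   # 8
print(fork_node(13, 13, 8))  # 13
```

Only the parameter root has to be stored, which is where the O(1) storage cost of the primary structure comes from.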
RI-Tree: Relational Secondary Structure
second step: manage secondary structure by two B+-trees
• storage of n intervals: O(n/b) disk blocks of size b
• insert and delete: O(log_b n) disk block accesses in the indexes

lowerIndex (node, lower, id)
node  lower  id
4     1      b
8     3      a
8     5      c
13    13     d

upperIndex (node, upper, id)
node  upper  id
4     7      b
8     12     c
8     15     a
13    13     d
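The two relations can be tried out in any relational engine. Below is a small sketch using Python's built-in sqlite3 module as a stand-in for the ORDBMS; the choice of SQLite, the index names `lowerIdx` and `upperIdx`, and the helper `insert_interval` are assumptions made for this illustration. The rows are the ones shown in the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "lower" and "upper" are quoted because they are also SQL function names.
conn.executescript("""
CREATE TABLE lowerIndex (node INTEGER, "lower" INTEGER, id TEXT);
CREATE TABLE upperIndex (node INTEGER, "upper" INTEGER, id TEXT);
CREATE INDEX lowerIdx ON lowerIndex (node, "lower", id);
CREATE INDEX upperIdx ON upperIndex (node, "upper", id);
""")


def insert_interval(conn, node, lower, upper, interval_id):
    """Register one interval at its node: one row per index (hypothetical helper)."""
    conn.execute("INSERT INTO lowerIndex VALUES (?, ?, ?)", (node, lower, interval_id))
    conn.execute("INSERT INTO upperIndex VALUES (?, ?, ?)", (node, upper, interval_id))


# (node, lower, upper, id) as in the example tables.
for row in [(8, 3, 15, "a"), (4, 1, 7, "b"), (8, 5, 12, "c"), (13, 13, 13, "d")]:
    insert_interval(conn, *row)

lower_rows = conn.execute(
    'SELECT node, "lower", id FROM lowerIndex ORDER BY node, "lower"').fetchall()
upper_rows = conn.execute(
    'SELECT node, "upper", id FROM upperIndex ORDER BY node, "upper"').fetchall()
print(lower_rows)
print(upper_rows)
```

Each interval contributes exactly one row to each relation, so inserting or deleting an interval touches two composite indexes and nothing else.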
RI-Tree: Interval Intersection Query
[Figure: virtual backbone with h = 5 and root = 16; query interval t with lower = 22 and upper = 25; fork = 24; visited nodes 16, 20, 22, 23, 24, 25, 26, 28]
RI-Tree: Interval Intersection Query
select id from upperIndex i, leftNodes left
where i.node = left.node and i.upper >= t.lower
[Figure: backbone with h = 5 and query interval t with lower = 22 and upper = 25; leftNodes 16 (root) and 20]
RI-Tree: Interval Intersection Query
select id from upperIndex i, leftNodes left
where i.node = left.node and i.upper >= t.lower
union all
select id from upperIndex i
where i.node between t.lower and t.upper
[Figure: backbone with h = 5 and query interval t with lower = 22 and upper = 25; nodes 22, 23, 24 (fork) and 25 covered by the query interval]
RI-Tree: Interval Intersection Query
select id from upperIndex i, leftNodes left
where i.node = left.node and i.upper >= t.lower
union all
select id from upperIndex i
where i.node between t.lower and t.upper
union all
select id from lowerIndex i, rightNodes right
where i.node = right.node and i.lower <= t.upper
[Figure: backbone with h = 5 and query interval t with lower = 22 and upper = 25; rightNodes 28 and 26]
RI-Tree: Interval Intersection Query
select id from upperIndex i, leftNodes left
where i.node = left.node and i.upper >= t.lower
union all
select id from upperIndex i
where i.node between t.lower and t.upper
union all
select id from lowerIndex i, rightNodes right
where i.node = right.node and i.lower <= t.upper
I/O complexity: O(h · log_b n + r/b)
[Figure: backbone with h = 5 and root = 16; query interval t with lower = 22 and upper = 25; fork = 24; leftNodes 16 and 20; nodes 22 to 25 covered by the query; rightNodes 28 and 26]
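The three branches of the query can be mirrored in a few lines of Python. This is a sketch under stated assumptions: the transient node lists are collected by walking the virtual backbone from the root towards the lower and the upper query bound, the helper names (`fork_node`, `search_path`, `intersection_query`) are hypothetical, and the six stored intervals are made-up test data rather than data from the talk. The query interval [22, 25] and root = 16 follow the figure.

```python
def fork_node(lower, upper, root):
    """Registration node of an interval in the virtual backbone (illustrative)."""
    node, step = root, root // 2
    while step >= 1:
        if upper < node:
            node -= step
        elif node < lower:
            node += step
        else:
            break
        step //= 2
    return node


def search_path(target, root):
    """Nodes visited when descending the virtual backbone from root to target."""
    node, step, path = root, root // 2, []
    while True:
        path.append(node)
        if node == target or step < 1:
            return path
        node = node + step if node < target else node - step
        step //= 2


def intersection_query(upper_index, lower_index, q_lower, q_upper, root):
    # Transient node lists, playing the role of leftNodes and rightNodes.
    left_nodes = {n for n in search_path(q_lower, root) if n < q_lower}
    right_nodes = {n for n in search_path(q_upper, root) if n > q_upper}
    # Branch 1: nodes left of the query, filtered on the upper bound.
    hits = [i for node, upper, i in upper_index
            if node in left_nodes and upper >= q_lower]
    # Branch 2: nodes covered by the query interval always intersect.
    hits += [i for node, upper, i in upper_index if q_lower <= node <= q_upper]
    # Branch 3: nodes right of the query, filtered on the lower bound.
    hits += [i for node, lower, i in lower_index
             if node in right_nodes and lower <= q_upper]
    return hits


ROOT = 16  # h = 5, data space [1 .. 31]
# Hypothetical test intervals: id -> (lower, upper).
intervals = {"u": (1, 5), "v": (25, 27), "w": (26, 30),
             "x": (10, 23), "y": (17, 21), "z": (23, 23)}
upper_index = [(fork_node(lo, up, ROOT), up, i) for i, (lo, up) in intervals.items()]
lower_index = [(fork_node(lo, up, ROOT), lo, i) for i, (lo, up) in intervals.items()]

result = sorted(intersection_query(upper_index, lower_index, 22, 25, ROOT))
print(result)  # ['v', 'x', 'z']
```

The three node sets are disjoint and every interval is registered at exactly one node, so no id is reported twice and `union all` needs no duplicate elimination.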
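I/O Cost Model for Interval Intersections
• I/O complexity: O(h · log_b n + r/b)
• join_I/O(T, t) = [formula missing in source]
• output_I/O(T, t) = s(T, t) · B
[Figure: three copies of the virtual backbone (h = 5) of tree T with root and query interval t; Gaps_left(t) on upperIndex(node, upper, id) and Gaps_right(t) on lowerIndex(node, lower, id)]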
Selectivity Estimation
• Histogram-based (equi-width histogram):
– replication of intervals intersecting multiple buckets
– statistics management requires user-defined code
• Quantile-based (equi-count histogram), analogously to r_left:
+ better adaptation to the data distribution
+ exploits built-in statistics of the ORDBMS
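To make the quantile-based idea concrete, here is a hedged Python sketch. It is not the estimator from the talk: it relies only on the fact that an interval misses the query [l, u] exactly when its upper bound is below l or its lower bound is above u, and it approximates both fractions from equi-count bucket boundaries of the lower and upper bounds. All function names and the synthetic data are assumptions made for this illustration.

```python
import bisect


def equi_count_boundaries(values, k):
    """Return k + 1 boundaries; each bucket holds about len(values) / k values."""
    s = sorted(values)
    n = len(s)
    return [s[(i * n) // k] for i in range(k)] + [s[-1]]


def cdf(boundaries, x):
    """Estimated fraction of values <= x, interpolating linearly inside a bucket."""
    k = len(boundaries) - 1
    if x < boundaries[0]:
        return 0.0
    if x >= boundaries[-1]:
        return 1.0
    i = bisect.bisect_right(boundaries, x) - 1  # boundaries[i] <= x < boundaries[i + 1]
    return (i + (x - boundaries[i]) / (boundaries[i + 1] - boundaries[i])) / k


def estimate_selectivity(lower_q, upper_q, q_lower, q_upper):
    """Fraction with lower <= q_upper minus fraction with upper < q_lower.
    The second term is approximated by the fraction with upper <= q_lower."""
    sel = cdf(lower_q, q_upper) - cdf(upper_q, q_lower)
    return max(0.0, min(1.0, sel))


# Synthetic data (assumption): 1000 intervals of length 10.
intervals = [(i, i + 10) for i in range(1000)]
lower_q = equi_count_boundaries([lo for lo, _ in intervals], 10)
upper_q = equi_count_boundaries([up for _, up in intervals], 10)

estimate = estimate_selectivity(lower_q, upper_q, 200, 300)
exact = sum(1 for lo, up in intervals if lo <= 300 and up >= 200) / len(intervals)
print(estimate, exact)
```

Because only the quantiles of the two bound columns are needed, no interval has to be replicated across buckets, which is the drawback listed for the equi-width histogram.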
Experimental Evaluation: Datasets
• UNI
• REAL
Conclusions and Future Work
Conclusions:
• Relational access methods:
– employ an ORDBMS as virtual machine
– extensible indexing and optimizing framework
• Indexing extended objects:
– Relational Interval Tree
• Development of cost models:
– estimation of selectivity and I/O cost
Future Work:
• Cost models:
– general interval relationships
– interval sequences
Any questions?