Indexing and Range Queries in Spatio-Temporal Databases

Indexing and Range Queries in Spatio-Temporal Databases Danzhou Liu, Wei Cui, Yun Fan School of Computer Science University of Central Florida

Outline • Introduction • The R*-tree • The TPR-tree • The TPR*-tree • Experiments • Conclusions

Introduction • Spatio-temporal databases • record moving objects’ geographical locations (sometimes also shapes) at various timestamps. • support queries that explore their historical and future (predictive) behaviors. • Applications: flight control systems, weather forecasting, and mobile computing. • The database stores the motion functions of moving objects. • For each object o, its motion function gives its location o(t) at any future time t. • A predictive window query • specifies a query region qR and a future time interval qT • retrieves the set of all objects that will fall in qR during qT. • Our goal: index moving objects so that a predictive window query can be answered with as few disk I/Os as possible. • Examples • Find all airplanes that will be over Florida in the next 10 minutes. • Report all vessels that will enter the United States in the next hour.

Motion Function • We consider linear motion. • For each object, the database stores • Its minimum bounding rectangle (MBR) at the reference time 0 • Its current velocity bounding rectangle (VBR) • Examples: MBR(a)={2,4,3,4}, VBR(a)={1,1,1,1}; MBR(c)={8,9,3,4}, VBR(c)={-2,0,0,2}; • An update is necessary only when an object’s VBR changes.
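The linear motion model on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming each rectangle is listed as {x-, x+, y-, y+} and the VBR gives the velocity of the corresponding edge in the same order; the slide does not spell out that ordering, and the name `mbr_at` is hypothetical.

```python
def mbr_at(mbr, vbr, t):
    """Return the rectangle at time t, moving each edge linearly from time 0.

    Assumption: mbr = (x_lo, x_hi, y_lo, y_hi) and vbr holds the velocity
    of the corresponding edge in the same order.
    """
    return tuple(edge + velocity * t for edge, velocity in zip(mbr, vbr))


# Objects a and c from the slide.
a_mbr, a_vbr = (2, 4, 3, 4), (1, 1, 1, 1)
c_mbr, c_vbr = (8, 9, 3, 4), (-2, 0, 0, 2)

print(mbr_at(a_mbr, a_vbr, 1))  # (3, 5, 4, 5)
print(mbr_at(c_mbr, c_vbr, 1))  # (6, 9, 3, 6)
```

Under this reading, object a keeps its shape because all four edges move at the same velocity, while c grows because its edges move apart.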

R*-tree • The R*-tree aims at minimizing: • The area of each MBR • The perimeter of each MBR • The overlap between two MBRs (e.g., N1, N2) in the same node • The distance between the centroid of an MBR and that of the node containing it

R*-tree Insertion

The Time Parameterized R-Tree (TPR-Tree) • Extends the R-tree by introducing the velocity bounding rectangle (VBR) in all entries. • Queries are compared with conservative MBRs of non-leaf entries. • Example: N1v={-2,1,-2,1} and N2v={-2,0,-1,2}
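A conservative non-leaf entry and the query test against it might look like the following sketch. It assumes rectangles are ordered {x-, x+, y-, y+}, groups objects a and c from the motion function slide into one node purely for illustration (the slides do not say which node holds them), and uses a made-up query rectangle. The names `bound`, `_clip`, and `may_intersect` are hypothetical.

```python
def bound(entries):
    """Bound child (mbr, vbr) pairs by one parent MBR and VBR at time 0.

    Assumption: every rectangle is (x_lo, x_hi, y_lo, y_hi).
    """
    mbrs = [m for m, _ in entries]
    vbrs = [v for _, v in entries]
    pick = (min, max, min, max)  # lower edges take the min, upper edges the max
    mbr = tuple(f(m[i] for m in mbrs) for i, f in enumerate(pick))
    vbr = tuple(f(v[i] for v in vbrs) for i, f in enumerate(pick))
    return mbr, vbr


def _clip(a, b, t0, t1):
    """Times in [t0, t1] where a + b*t <= 0, as (start, end), or None."""
    if b == 0:
        return (t0, t1) if a <= 0 else None
    root = -a / b
    start, end = (t0, min(t1, root)) if b > 0 else (max(t0, root), t1)
    return (start, end) if start <= end else None


def may_intersect(mbr, vbr, q_rect, t_start, t_end):
    """True if the moving rectangle meets the static q_rect in [t_start, t_end]."""
    lo, hi = t_start, t_end
    for d in (0, 2):  # x edges, then y edges
        constraints = (
            (mbr[d] - q_rect[d + 1], vbr[d]),       # lower edge <= query upper edge
            (q_rect[d] - mbr[d + 1], -vbr[d + 1]),  # upper edge >= query lower edge
        )
        for a, b in constraints:
            window = _clip(a, b, lo, hi)
            if window is None:
                return False
            lo, hi = window
    return True


a = ((2, 4, 3, 4), (1, 1, 1, 1))
c = ((8, 9, 3, 4), (-2, 0, 0, 2))
node = bound([a, c])            # hypothetical node holding a and c
query = (10, 12, 8, 12)         # hypothetical query region qR

print(node)                               # ((2, 9, 3, 4), (-2, 1, 0, 2))
print(may_intersect(*node, query, 0, 5))  # True
print(may_intersect(*a, query, 0, 5))     # False
print(may_intersect(*c, query, 0, 5))     # False
```

The node is reported as a candidate even though neither child qualifies. That is the price of a conservative bound: it never misses an object, but it can cause extra node accesses.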

TPR*-Tree • Our goal • index moving objects so that a predictive window query can be answered with as few disk I/Os as possible. • A mathematical model that estimates the cost of answering a predictive window query using TPR-like structures. • Number of node accesses. • Application of the model to derive the optimal performance. • The TPR-tree is much worse than the optimal structure. • Examine the algorithms of the TPR-tree, identify their deficiencies, and propose new ones. • The TPR*-tree.

TPR deficiency 1: Choosing sub-tree to insert • To insert an entry, the TPR-tree picks the sub-tree incurring the minimum penalty (smallest MBR/VBR enlargement). • May result in inserting an entry into a bad sub-tree; this problem is increasingly serious as time evolves.

TPR* solution: Choose path • Aims at finding the best insertion path globally, namely, among all possible paths. • Observation: We can find this path by accessing only a few more nodes (than the TPR-tree algorithm). • Maintain a heap: [(g),0], [(h),0], [(i),20]. Each heap entry holds the path expanded so far and the accumulated penalty so far.

TPR* solution: Choose path (Cont.) • Visit node g: the heap becomes [(h),0], [(a,g),3], [(i),20], [(b,g),32]. • (a,g) and (b,g) are already complete paths, although nodes a and b are not visited.

TPR* solution: Choose path (Cont.) • Visit node h: the heap becomes [(a,g),3], [(d,h),9], [(c,h),17], [(i),20], [(b,g),32]. • The algorithm stops now: the entry at the top of the heap, (a,g), is a complete path, and no other path can have a smaller penalty.
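The heap-driven search traced on these slides can be sketched as a best-first traversal. This is an illustration rather than the paper's exact algorithm: the penalties for g, h, i, a, b, c, and d are the ones shown on the slides, while the children of i (named e and f here) and their penalties are invented, since the slides never expand that node. The function name `choose_path` and the dictionary layout are hypothetical.

```python
import heapq
from itertools import count


def choose_path(children, leaves, root="root"):
    """Best-first search for the insertion path with the smallest total penalty.

    children maps a node name to a list of (child_name, penalty) pairs.
    Returns (path, penalty, visited_nodes).
    """
    tie = count()  # insertion order breaks ties between equal penalties
    heap = [(pen, next(tie), (name,)) for name, pen in children[root]]
    heapq.heapify(heap)
    visited = []
    while heap:
        penalty, _, path = heapq.heappop(heap)
        node = path[-1]
        if node in leaves:
            # A complete path at the top of the heap is globally best.
            return path, penalty, visited
        visited.append(node)  # reading this node costs one access
        for child, extra in children[node]:
            heapq.heappush(heap, (penalty + extra, next(tie), path + (child,)))
    raise ValueError("tree has no leaf")


children = {
    "root": [("g", 0), ("h", 0), ("i", 20)],
    "g": [("a", 3), ("b", 32)],
    "h": [("c", 17), ("d", 9)],
    "i": [("e", 1), ("f", 5)],  # hypothetical: never expanded on the slides
}
leaves = {"a", "b", "c", "d", "e", "f"}

print(choose_path(children, leaves))  # (('g', 'a'), 3, ['g', 'h'])
```

Node i is never read because its accumulated penalty of 20 already exceeds the penalty of the complete path through g to a. The search is exact only if penalties are non-negative, which this sketch assumes.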

TPR deficiency 2: Which entries to re-insert • When a node overflows, some of its entries (the ones that diverge most from the node centroid) are re-inserted to defer the node split. • The entries chosen by the TPR-tree are very likely to be re-inserted back into the same node, so that a node split is still necessary.

TPR* solution: Pick worst • Aims at selecting entries that can most effectively “shrink” the MBR or VBR of the node for re-insertion. • The first step picks an appropriate dimension (either spatial or velocity) based purely on estimation using our cost model (see the paper for details). • The second step performs sorting on this dimension and decides the entries to be removed. • Example: If the axis chosen in the first step is the x-axis, then the sorted list is {b,d,a,c}. Either b or c is removed.
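The second step can be illustrated with a deliberately simplified sketch. It is not the cost-model-driven procedure from the paper: it assumes the axis has already been chosen, sorts entries by their lower bound on that axis, and removes whichever end of the sorted list shrinks the node's extent more. The x-intervals below are invented so that the sorted order matches the slide's {b,d,a,c}; the name `pick_worst_on_axis` is hypothetical.

```python
def pick_worst_on_axis(entries):
    """Sort entries along one axis and pick an extreme entry to re-insert.

    entries maps a name to its (lo, hi) interval on the chosen axis.
    Returns (sorted_names, name_to_remove).
    """
    order = sorted(entries, key=lambda name: entries[name][0])

    def extent_without(name):
        # Extent of the node on this axis once `name` is taken out.
        rest = [entries[n] for n in order if n != name]
        return max(hi for _, hi in rest) - min(lo for lo, _ in rest)

    candidates = (order[0], order[-1])  # only the two ends can shrink the node
    return order, min(candidates, key=extent_without)


# Hypothetical x-intervals for the four entries named on the slide.
x_intervals = {"a": (4, 6), "b": (0, 2), "c": (7, 12), "d": (3, 5)}

print(pick_worst_on_axis(x_intervals))  # (['b', 'd', 'a', 'c'], 'c')
```

With these made-up intervals, removing c leaves an extent of 6 while removing b leaves 9, so c is the entry chosen for re-insertion.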

TPR deficiency 3: Tightening MBR in deletion • Entry deletion requires first finding the entry, which accesses many nodes of the tree. The TPR-tree uses this fact to tighten the MBR of non-leaf entries. • Assume nodes h and i are accessed before e is found; then the TPR-tree will tighten the MBR of i only (enclosing g and f).

TPR* solution: Active tightening • Tightening more entries for free. • Assume nodes h and i are accessed before e is found; then the TPR*-tree will tighten the MBR of both h and i.

TPR* solution: Active tightening (Cont.) • Another example: Assume the shaded nodes are accessed to find e. • The active tightening can tighten the MBR of n5, n6, n3, and n4. • But not n1 and n2.

Challenge of Migration • 3 Operating Systems: • Microsoft Windows • Sun Solaris • Red Hat Fedora Core 1 • 2 Compilers: CL, GCC (2.9.5, 3.3.2) • Differences in code conversion • How close are the compilers to the standard? • Compatibility of libraries

Experiments: Settings (query and tree) • Dataset • 50,000 sampled objects’ MBRs are taken from a real spatial dataset NJ [Tiger] • each object is associated with a VBR such that on each dimension • the velocity extent is zero (i.e., the object does not change its spatial extents during its movement) • the velocity value is random in the range [0,8] • the velocity can be positive or negative with equal probability. • We compare TPR*-trees with TPR-trees. • Disk page size=1k bytes (node capacity=27 for both trees). • For each object update, perform a deletion followed by an insertion on each tree. • Each predictive query is a moving rectangle, and has these parameters: • qRlen: The length of the query’s MBR • qVlen: The length of the query’s VBR • qTlen: The number of timestamps covered.
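The velocity assignment described above could be reproduced along these lines. The sketch assumes the speed is drawn uniformly from [0,8] (the slide only says the value is random in that range), two dimensions, and a VBR ordered as {vx-, vx+, vy-, vy+}; `random_vbr` is a hypothetical helper name.

```python
import random


def random_vbr(rng, max_speed=8.0, dims=2):
    """Draw one VBR with zero velocity extent on every dimension.

    Assumption: speed is uniform in [0, max_speed]; the sign is a fair coin.
    """
    vbr = []
    for _ in range(dims):
        speed = rng.uniform(0.0, max_speed)
        velocity = speed if rng.random() < 0.5 else -speed
        # Both edges share one velocity, so the object never changes size.
        vbr.extend((velocity, velocity))
    return tuple(vbr)


rng = random.Random(0)  # fixed seed so runs are repeatable
for _ in range(3):
    print(random_vbr(rng))
```

Passing the generator in explicitly keeps the helper deterministic under a fixed seed, which is convenient when regenerating the same workload for both trees.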

TPR-tree

TPR*-tree

Conclusions • The TPR-tree combines the idea of conservative MBRs directly with the tree construction algorithms of R*-trees. • The TPR*-tree improves on it with algorithms that take into account the special features of moving objects. • Cost model for performance analysis • The optimal performance of a “hypothetically best structure” • Reduced disk I/Os for predictive queries

Q&A

Thanks!
