CSIS7101 – Advanced Database Technologies
Spatio-Temporal Data (Part 3)
The MV3R-Tree: A Spatio-Temporal Access Method for Timestamp and Interval Queries
Kwong Chi Ho Leo; Wong Chi Kwong Simon; Lui, Tak Sing Arthur
Introduction
• Spatio-Temporal Database Management Systems (STDBMS) support applications such as:
  • Mobile phone systems
    • Track users efficiently
    • Provide better communication services
  • Traffic supervision systems
    • Monitor vehicle locations
    • Analyze motion patterns
  • Urban planning
    • Record the development of landscapes over the years
    • Retrieve the urban situation at any given time in the past
Spatio-Temporal Queries
• Traditional STDBMS
  • Focus on static objects
  • Updating the database every time an object changes its position would make the STDBMS spend most of its time just handling updates
  • It would also result in huge space requirements
• Objects with dynamic behavior require new query languages, modeling methods, attribute representations and specialized access methods
• STR-trees and TB-trees
  • Focus on efficient trajectory retrieval
• TPR-trees
  • Focus on predicting objects' future locations by storing their current positions and velocities
Spatio-Temporal Queries (Continue)
• For historical information retrieval, window queries about objects that move in discrete time are commonly used:
  • Timeslice (timestamp) queries
    • Retrieve all objects that intersect a window at a specific timestamp
  • Interval queries
    • Retrieve all objects that intersect a window at any of several consecutive timestamps
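To make the two query types concrete, here is a brute-force Python sketch that scans every stored object version instead of using an index. The record layout (`Version`, holding a rectangle and a half-open lifespan [tstart, tend)) and all function names are illustrative assumptions; the access methods in these slides exist precisely to avoid such a full scan.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Window = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass(frozen=True)
class Version:
    """One version of an object: a rectangle that is valid during [tstart, tend)."""
    oid: str
    rect: Window
    tstart: int
    tend: Optional[int] = None  # None stands for "still valid now"


def overlaps(r: Window, w: Window) -> bool:
    # Closed-rectangle intersection test in the x-y plane.
    return r[0] <= w[2] and w[0] <= r[2] and r[1] <= w[3] and w[1] <= r[3]


def valid_during(v: Version, t1: int, t2: int) -> bool:
    # The lifespan [tstart, tend) intersects the closed query interval [t1, t2].
    end = float("inf") if v.tend is None else v.tend
    return v.tstart <= t2 and t1 < end


def interval_query(data: List[Version], w: Window, t1: int, t2: int) -> List[str]:
    return sorted({v.oid for v in data if overlaps(v.rect, w) and valid_during(v, t1, t2)})


def timestamp_query(data: List[Version], w: Window, t: int) -> List[str]:
    # A timestamp query is an interval query over a single timestamp.
    return interval_query(data, w, t, t)


history = [
    Version("p", (0, 0, 1, 1), 0, 5),  # p stays near the origin during [0, 5) ...
    Version("p", (8, 8, 9, 9), 5),     # ... then moves away
    Version("q", (2, 2, 3, 3), 3),     # q appears at timestamp 3
]
window = (0, 0, 4, 4)
```

With this data, `timestamp_query(history, window, 1)` returns `["p"]`, the same query at timestamp 6 returns `["q"]`, and `interval_query(history, window, 4, 6)` returns both objects because each is inside the window at some point of the interval.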
Historical Information Retrieval
• Types of indexing
• MR-trees and HR-trees
  • Maintain a separate R-tree for each timestamp, but allow consecutive trees to share branches
  • Advantages
    • Efficient for timestamp queries, as the search degenerates into a static spatial window query, for which R-trees are very efficient
  • Disadvantages
    • Extensive duplication of objects (even those that do not move), which leads to huge space requirements in most typical applications
    • Poor performance on interval queries
Historical Information Retrieval (Continue)
• 3-dimensional R-trees (the 3rd dimension corresponds to time)
  • An object that does not change its position during a certain period of time is modeled as a 3D box bounding both its spatial and temporal attributes
  • A moving object is modeled by multiple boxes, each corresponding to a different version
  • Advantages
    • The temporal attribute is tightly integrated with the spatial attributes, so interval queries can be answered efficiently
    • Redundant duplication is avoided, so space usage is reduced
Historical Information Retrieval (Continue)
• Disadvantages
  • Poor performance on timestamp queries, as the query time depends on the total number of entries in the history rather than on the number of entries alive at the queried timestamp
[Figure: two (x, y, time) diagrams. Left: an object that stays at the same location from t1 to t2 is a single box. Right: an object that moves from location A to location B at time t2 is two boxes, one at A from t1 to t2 and one at B from t2 to t3.]
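The figure's idea (one box per period during which an object does not move) can be reproduced in a few lines. This is an illustrative sketch, not code from the paper: the input format, a time-ordered list of (timestamp, rectangle) position reports for one object, and the use of `None` for a box that is still open are assumptions.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]                       # (x1, y1, x2, y2)
Box3D = Tuple[float, float, float, float, int, Optional[int]]  # rect + (tstart, tend)


def updates_to_boxes(updates: List[Tuple[int, Rect]]) -> List[Box3D]:
    """Convert one object's time-ordered (timestamp, position) reports into 3D boxes.

    A report at the same position as the previous one does not start a new box,
    so a static object stays a single box. A box is closed when the object moves;
    the last box keeps tend = None ("still there").
    """
    boxes: List[Box3D] = []
    for t, rect in updates:
        if boxes and boxes[-1][:4] == tuple(rect):
            continue                              # no movement: extend the current box
        if boxes:
            boxes[-1] = boxes[-1][:5] + (t,)      # close the previous box at time t
        boxes.append((*rect, t, None))
    return boxes


A = (0, 0, 1, 1)
B = (5, 5, 6, 6)
moving = updates_to_boxes([(1, A), (2, B)])  # moves from A to B at t = 2
static = updates_to_boxes([(1, A), (2, A)])  # stays at A
```

`moving` yields two boxes, `(0, 0, 1, 1, 1, 2)` and `(5, 5, 6, 6, 2, None)`, while `static` stays a single open box, matching the two halves of the figure.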
Introduction to the MV3R-tree
• The MV3R-tree combines a multi-version R-tree (MVR-tree) with a small auxiliary 3D R-tree
• The auxiliary 3D R-tree is built on the leaves of the MVR-tree
• Targets timestamp and interval window queries
  • For retrieving the past locations of discretely moving objects
• Improves the performance of the multi-version framework for multi-dimensional access methods
Introduction to the MV3R-tree (Continue)
• MVR-tree
  • Uses several heuristics that exploit the features of R-trees to improve performance significantly
• The auxiliary 3D R-tree
  • Outperforms a traditional 3D R-tree for most queries
• MV3R-tree space consumption
  • Up to an order of magnitude smaller than that of an HR-tree, while timestamp query performance remains comparable
Overview of MVB-trees, HR-trees & 3D R-trees
• Multi-version B-trees (MVB-trees)
  • Extensions of B-trees
  • Index the evolution of one-dimensional data in transaction-time temporal databases
  • Insertions and deletions can only happen at the current time
  • Each entry has the form <key, tstart, tend, pointer>
    • For root and intermediate entries, the pointer points to a node at the next level
    • For leaf entries, the pointer points to the actual record with the corresponding key value
  • An entry is alive at time t if tstart ≤ t < tend
Overview of MVB-trees (Continue)
• The temporal attributes tstart and tend denote the times at which the record was inserted into and deleted from the database, respectively
• Deletions are logical, i.e. records are not physically removed from the database
• For currently live entries, tend is denoted by *, meaning "NOW"
• There can be multiple roots, each with a jurisdiction interval
• Weak version condition: for each timestamp t, every node except the roots must contain either no entries or at least b·Pversion entries alive at t
  • Pversion = tree parameter
  • b = node capacity
Overview of MVB-trees (Continue)
• Example: Pversion = 1/3, b = 6, so the minimum number of live entries per node is 1/3 × 6 = 2
  • Root: <5, 1, *, A> <43, 1, *, B> <72, 1, *, C>
    • The three root entries point to leaf nodes A, B and C respectively; all were created at time 1 and are still alive
  • Leaf A: <5, 1, *> <8, 1, *> <13, 1, *> <25, 1, 3> <27, 1, 3> <29, 1, 3>
  • Leaf B: <43, 1, *> <48, 1, *> <52, 1, 2> <59, 1, 3> <68, 1, 3>
    • Key 43 in leaf node B was created at time 1 and is still alive
  • Leaf C: <72, 1, *> <78, 1, *> <83, 1, *> <95, 1, 3> <99, 1, *> <102, 1, *>
    • Key 95 in leaf node C was created at time 1 and was deleted at time 3
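The liveness rule and the weak version condition can be checked mechanically on the example's leaf nodes. The sketch below is illustrative only: `Entry`, `NOW` and the helper names are assumptions, and pointers are omitted because only keys and lifespans matter here.

```python
from fractions import Fraction
from typing import List, NamedTuple, Optional

NOW = None  # stands for "*": the entry has not been deleted yet


class Entry(NamedTuple):
    key: int
    tstart: int
    tend: Optional[int]  # NOW (None) == "*"


def alive(e: Entry, t: int) -> bool:
    # An entry is alive at t iff tstart <= t < tend; "*" is unbounded.
    return e.tstart <= t and (e.tend is None or t < e.tend)


def live_count(node: List[Entry], t: int) -> int:
    return sum(alive(e, t) for e in node)


def weak_version_ok(node: List[Entry], t: int, b: int, p_version: Fraction) -> bool:
    # Weak version condition: no live entries at all, or at least b * Pversion.
    n = live_count(node, t)
    return n == 0 or n >= b * p_version


# Leaf nodes of the example (b = 6, Pversion = 1/3).
A = [Entry(5, 1, NOW), Entry(8, 1, NOW), Entry(13, 1, NOW),
     Entry(25, 1, 3), Entry(27, 1, 3), Entry(29, 1, 3)]
B = [Entry(43, 1, NOW), Entry(48, 1, NOW), Entry(52, 1, 2),
     Entry(59, 1, 3), Entry(68, 1, 3)]
C = [Entry(72, 1, NOW), Entry(78, 1, NOW), Entry(83, 1, NOW),
     Entry(95, 1, 3), Entry(99, 1, NOW), Entry(102, 1, NOW)]
```

At timestamp 3, for instance, leaf B has only keys 43 and 48 alive, exactly the minimum of 2, so `weak_version_ok(B, 3, 6, Fraction(1, 3))` still holds. `Fraction` keeps b·Pversion exact instead of relying on floating-point arithmetic.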
Overview of MVB-trees (Continue)
• Insertions
  • Inserting <28, 4, *> at timestamp 4 causes node A to overflow
  • A "dies", meaning that it will not be modified in the future: its root entry becomes <5, 1, 4, A>, and its live entries become <5, 1, 4> <8, 1, 4> <13, 1, 4> (the dead entries <25, 1, 3> <27, 1, 3> <29, 1, 3> are unchanged)
  • A new node D is created to store the live entries of A together with the new entry: <5, 4, *> <8, 4, *> <13, 4, *> <28, 4, *>
  • A new root entry <5, 4, *, D> points to D; the root now holds <5, 1, 4, A> <43, 1, *, B> <72, 1, *, C> <5, 4, *, D>
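The version split in this example can be sketched as follows. This is a simplified illustration under stated assumptions: `version_split` is a hypothetical helper, the parent-level update (<5, 1, 4, A> and <5, 4, *, D>) is not modeled, and the strong version checks that a full MVB-tree performs after a version split (which may force a further key split or a merge) are omitted.

```python
from typing import List, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    key: int
    tstart: int
    tend: Optional[int]  # None == "*"


def alive(e: Entry, t: int) -> bool:
    return e.tstart <= t and (e.tend is None or t < e.tend)


def version_split(node: List[Entry], new: Entry, t: int) -> Tuple[List[Entry], List[Entry]]:
    """Simplified MVB-tree version split at time t.

    The old node dies: every entry still alive at t gets tend = t. A new node
    receives copies of those live entries (with tstart = t) plus the new entry.
    """
    dead_node = [e._replace(tend=t) if alive(e, t) else e for e in node]
    new_node = [Entry(e.key, t, None) for e in node if alive(e, t)] + [new]
    new_node.sort(key=lambda e: e.key)
    return dead_node, new_node


b = 6  # node capacity
A = [Entry(5, 1, None), Entry(8, 1, None), Entry(13, 1, None),
     Entry(25, 1, 3), Entry(27, 1, 3), Entry(29, 1, 3)]
incoming = Entry(28, 4, None)

if len(A) + 1 > b:  # block overflow: A is already full
    A, D = version_split(A, incoming, t=4)
```

After the split, D holds keys 5, 8, 13 and 28, all with lifespan [4, *), and A keeps its history: the formerly live entries now end at 4, and the already-dead entries keep tend = 3.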
Overview of MVB-trees (Continue)
• Deletion (starting again from the initial tree, with node A unchanged)
  • Deleting <48, 1, *> at timestamp 4 causes node B to underflow: only key 43 would remain alive
  • A sibling is chosen, say node C, and the live entries of B and C are copied to new nodes; a key split is performed, producing nodes D and E
  • Nodes B and C die: their root entries become <43, 1, 4, B> and <72, 1, 4, C>, and their live entries get tend = 4
    • B: <43, 1, 4> <48, 1, 4> <52, 1, 2> <59, 1, 3> <68, 1, 3>
    • C: <72, 1, 4> <78, 1, 4> <83, 1, 4> <95, 1, 3> <99, 1, 4> <102, 1, 4>
  • New nodes D and E are created, with root entries <43, 4, *, D> and <83, 4, *, E>
    • D: <43, 4, *> <72, 4, *> <78, 4, *>
    • E: <83, 4, *> <99, 4, *> <102, 4, *>
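A hedged sketch of this underflow handling follows. The helper names are hypothetical, and `strong_max` is an assumed stand-in for the MVB-tree's strong version condition: the real tree derives its limit from the tree parameters, and the value 5 is chosen here only so that the sketch reproduces the slide's key split.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    key: int
    tstart: int
    tend: Optional[int]  # None == "*"


def alive(e: Entry, t: int) -> bool:
    return e.tstart <= t and (e.tend is None or t < e.tend)


def kill(node: List[Entry], t: int) -> List[Entry]:
    # The node dies at t: entries that are still alive get tend = t.
    return [e._replace(tend=t) if alive(e, t) else e for e in node]


def merge_after_underflow(node, sibling, t, strong_max):
    """Copy the live entries of an underflowing node and one sibling into new
    node(s). strong_max is a HYPOTHETICAL upper bound standing in for the
    strong version condition; above it the copies are key-split in two."""
    live = sorted((Entry(e.key, t, None) for e in node + sibling if alive(e, t)),
                  key=lambda e: e.key)
    if len(live) <= strong_max:
        groups = [live]
    else:  # strong version overflow: split by key into two halves
        mid = len(live) // 2
        groups = [live[:mid], live[mid:]]
    return kill(node, t), kill(sibling, t), groups


t, min_live = 4, 2  # min_live = b * Pversion = 6 * 1/3
B = [Entry(43, 1, None), Entry(48, 1, None), Entry(52, 1, 2),
     Entry(59, 1, 3), Entry(68, 1, 3)]
C = [Entry(72, 1, None), Entry(78, 1, None), Entry(83, 1, None),
     Entry(95, 1, 3), Entry(99, 1, None), Entry(102, 1, None)]

# Logical deletion of key 48 at time 4.
B = [e._replace(tend=t) if e.key == 48 else e for e in B]

new_nodes: List[List[Entry]] = []
if 0 < sum(alive(e, t) for e in B) < min_live:  # weak version underflow
    B, C, new_nodes = merge_after_underflow(B, C, t, strong_max=5)
```

The two resulting groups are keys 43, 72, 78 and keys 83, 99, 102, i.e. the contents of D and E on the slide, while B and C keep their full history with tend = 4 on the formerly live entries.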
Overview of MVB-trees (Continue)
• Summary
  • Insertions and deletions may cause overflows and underflows respectively, which trigger version splits
  • Version splits create redundancy for the duplicated entries. This redundancy harms interval query performance, as both the original and the duplicated versions may need to be retrieved
  • Underflow: strong and weak versions
    • Strong version underflow happens after a version split, when the new node holds too few live entries
    • Weak version underflow occurs when the weak version condition is violated
Overview of MVB-trees (Continue)
• MVB-trees require O(N/b) space, where N is the number of updates ever made to the database and b is the block capacity
• Answering a timestamp range query requires O(log_b M + r/b) I/Os, where M is the number of live objects at the queried timestamp and r is the number of output objects
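As a rough order-of-magnitude reading of these bounds, with made-up values and ignoring the constant factors hidden by the O-notation:

```latex
\begin{aligned}
N = 10^6,\ b = 100 &:\quad \frac{N}{b} = 10^4 \text{ blocks} \\
M = 10^5,\ r = 1000 &:\quad \log_{100} 10^5 + \frac{1000}{100} = 2.5 + 10 = 12.5 \text{ I/Os}
\end{aligned}
```

The query term depends on M, the number of objects alive at the queried timestamp, rather than on N, the total number of updates in the history.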
Overview of MVB-trees, HR-trees & 3D R-trees
• Historical R-trees (HR-trees)
  • Based on the overlapping technique, another framework for transforming a single-version data structure into a transaction-time access method
  • The structure maintains an R-tree for each timestamp, but common branches of consecutive trees are stored only once to save space
  • A timestamp query is directed to the corresponding R-tree and the search is performed inside that tree. Thus, the query degenerates into an ordinary window query and is handled very efficiently
Overview of Historical R-trees (Continue)
• An interval query must search the trees of all the timestamps involved
• Example: object e changes position at timestamp 1
  • Object e is deleted from node E and added to node D at timestamp 1
  • No object in node A changes from timestamp 0 to 1, so A0 can be shared by both trees
[Figure: HR-tree with roots R0 (timestamp 0) and R1 (timestamp 1). Nodes B, C, D and E have new versions B1, C1, D1 and E1 at timestamp 1, while node A0 is shared by both trees.]
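The branch sharing in this example is a form of path copying. The sketch below shows only that idea, under simplifying assumptions: the tree shape (R → A, B and B → D, E) is hypothetical rather than the figure's exact layout, `move_object` is an illustrative helper, and MBRs are not maintained, so this is not a full HR-tree.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple


@dataclass(frozen=True)
class Node:
    label: str                             # logical node name, e.g. "A"
    version: int                           # timestamp at which this copy was made
    children: Tuple["Node", ...] = ()
    objects: FrozenSet[str] = frozenset()  # only used by leaves


def touches(n: Node, labels: Set[str]) -> bool:
    # True if n is, or has a descendant that is, one of the given leaves.
    return n.label in labels or any(touches(c, labels) for c in n.children)


def move_object(root: Node, obj: str, src: str, dst: str, ts: int) -> Node:
    """Build the tree for timestamp ts after obj moves from leaf src to leaf dst.
    Nodes on the paths to src and dst are copied; other subtrees are shared."""
    def rebuild(n: Node) -> Node:
        if not touches(n, {src, dst}):
            return n                       # unchanged: share the old node
        if not n.children:                 # one of the two changed leaves
            objs = n.objects - {obj} if n.label == src else n.objects | {obj}
            return Node(n.label, ts, (), objs)
        return Node(n.label, ts, tuple(rebuild(c) for c in n.children))
    return rebuild(root)


# Hypothetical tree at timestamp 0: R -> (A, B), B -> (D, E).
r0 = Node("R", 0, (
    Node("A", 0, objects=frozenset({"a", "c"})),
    Node("B", 0, (Node("D", 0, objects=frozenset({"b", "d"})),
                  Node("E", 0, objects=frozenset({"e"})))),
))
r1 = move_object(r0, "e", "E", "D", ts=1)
```

Node A is the very same object in both trees, while R, B, D and E get new copies for timestamp 1; the timestamp-0 tree is left untouched, so both versions stay queryable.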
Overview of MVB-trees, HR-trees & 3D R-trees
• 3D R-trees
  • View time as just another dimension and integrate it into the tree construction
  • The movements of 2D objects can be modeled as distinct boxes in three-dimensional space
    • The temporal projection of a box denotes the period during which the corresponding object remains static
    • The spatial projection of the box corresponds to the object's position and extent during that period
Overview of 3D R-trees (Continue)
• Poor performance on timestamp and short-interval queries
  • No mechanism ensures that each node has a minimum number of live entries at a given timestamp
  • A single tree indexes the whole history
  • At a timestamp t, a node may contain a lot of dead space, so there is a high chance that the query window intersects its bounding box but no object inside it
  • The problem becomes more serious when there are many objects with long lifespans, because these objects force the nodes that contain them to have long lifespans as well
  • The query cost depends on the total number of records rather than on the number of records alive at the queried timestamp
Overview of 3D R-trees (Continue)
• Good performance on long-interval queries
  • No redundancy
  • R-trees optimize queries with similar extents along all dimensions
[Figure: (x, y, time) view of a 3D R-tree node bounding different objects at different locations during different time intervals. At a timestamp t only one live entry may be retrieved, and there is a high chance that the query window intersects the node's bounding box but no object inside it.]
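The dead-space problem can be shown with three boxes. The coordinates below are made up for illustration, and intervals are treated as closed to keep the sketch short; the point is that a timestamp query must descend into the node because its 3D MBR is hit, yet finds nothing inside.

```python
from typing import List, Tuple

# A 3D box is (x1, y1, t1, x2, y2, t2); intervals are treated as closed here.
Box = Tuple[float, float, float, float, float, float]


def mbr(boxes: List[Box]) -> Box:
    lo = tuple(min(b[i] for b in boxes) for i in range(3))
    hi = tuple(max(b[i] for b in boxes) for i in range(3, 6))
    return lo + hi


def intersects(a: Box, b: Box) -> bool:
    return all(a[i] <= b[i + 3] and b[i] <= a[i + 3] for i in range(3))


# Three hypothetical objects at different places during different periods.
objects = [
    (0, 0, 0, 1, 1, 2),   # near (0, 0) during [0, 2]
    (8, 8, 4, 9, 9, 6),   # near (8, 8) during [4, 6]
    (0, 8, 8, 1, 9, 10),  # near (0, 8) during [8, 10]
]
node = mbr(objects)

# Timestamp query: window (4, 4)-(5, 5) at t = 5, i.e. a box that is flat in time.
query = (4, 4, 5, 5, 5, 5)
```

Here `intersects(node, query)` is true while the query intersects none of the three objects, so the node access is wasted.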
MV3R-Trees
• Why MV3R-trees are needed
  • No existing structure handles both timestamp and interval queries effectively
• Why MV3R-trees are good
  • Reduce the structure size while improving query performance
  • The approach is applicable to other multi-dimensional access methods when they are converted into the corresponding multi-version structures
MV3R-Trees (Continue)
• How MV3R-trees work
  • Consist of a multi-version R-tree (MVR-tree) and a small auxiliary 3D R-tree built on the leaf nodes of the MVR-tree
  • An MVR-tree can contain multiple R-trees, referred to as "logical trees". As in MVB-trees, each entry has the form <S, tstart, tend, pointer>, where S denotes the spatial minimum bounding rectangle (MBR) as defined in R-trees
  • The MVR-tree inherits the concept of the weak version condition
MV3R-Trees (Insertion and overflow)
• Intermediate node insertion
  • Allows redundancy in order to maintain good performance for timestamp and short-interval queries
  • Process: insertion → block overflow → version split → (on strong version overflow) key split
• Example: insertion of object <C6, 10, *> at timestamp 10
  • Before: node A contains entry <A1, 1, *, C>, which points to node C holding <C1, 1, 3> <C2, 1, 3> <C3, 2, 8> <C4, 2, 8> <C5, 5, *> …
  • A version split at timestamp 10 ends A1, which becomes <A1, 1, 10, C>, and creates node B with entry <B1, 10, *, C>; both entries point to node C
  • The MBR of B1 is smaller than that of A1, because at timestamp 10 only C5 and C6 are bounded by B1
  • Since the lifespan of A1 does not include timestamp 10, its MBR does not need to cover C6; the MBR of A1 may even be tightened
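Why B1 gets a smaller MBR than A1 can be checked numerically: a parent entry only has to bound the child entries whose lifespans overlap its own lifespan. The coordinates of C1 to C6 below are invented for illustration (the slide gives only their lifespans), and `entry_mbr` is a hypothetical helper.

```python
from typing import Dict, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
INF = float("inf")


def overlaps(ts: int, te: Optional[int], lo: int, hi: Optional[int]) -> bool:
    # Half-open lifespans [ts, te) and [lo, hi) overlap; None stands for "*".
    return ts < (INF if hi is None else hi) and lo < (INF if te is None else te)


def mbr(rects) -> Rect:
    x1, y1, x2, y2 = zip(*rects)
    return (min(x1), min(y1), max(x2), max(y2))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def entry_mbr(child: Dict[str, Tuple[Rect, int, Optional[int]]],
              lo: int, hi: Optional[int]) -> Rect:
    """MBR that a parent entry with lifespan [lo, hi) must cover: only child
    entries whose lifespans overlap [lo, hi) are bounded."""
    return mbr([r for r, ts, te in child.values() if overlaps(ts, te, lo, hi)])


# Entries of node C with made-up coordinates: name -> (rect, tstart, tend).
C = {
    "C1": ((0, 0, 1, 1), 1, 3),
    "C2": ((9, 0, 10, 1), 1, 3),
    "C3": ((0, 9, 1, 10), 2, 8),
    "C4": ((9, 9, 10, 10), 2, 8),
    "C5": ((4, 4, 5, 5), 5, None),
    "C6": ((5, 5, 6, 6), 10, None),  # inserted at timestamp 10
}
a1 = entry_mbr(C, 1, 10)     # dead entry <A1, 1, 10, C>
b1 = entry_mbr(C, 10, None)  # new entry  <B1, 10, *, C>
```

A1 still spans the whole node, `(0, 0, 10, 10)`, whereas B1 only needs `(4, 4, 6, 6)` around C5 and C6, so timestamp queries after 10 that follow B1 use a much tighter rectangle.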
MV3R-Trees (Insertion and overflow)
• Strong version underflows are not considered at all, because
  • Underflows in MVR-trees happen much less frequently than overflows
  • Handling underflows by entry reinsertion may trigger block overflows in several other nodes
• Version splits need to take into account the spatial extents of the nodes
MV3R-Trees (Insertion and overflow)
• Leaf node insertion
  • Tries to avert version splits, reducing redundancy and thus storage space while maintaining timestamp query performance
  • A small number of leaf nodes facilitates interval query processing with the auxiliary 3D R-tree
  • To avoid version splits, the following alternatives are tried in order:
    • General key split
    • Insert into the node after reinserting one of its entries
    • Insert into another node
    • Version split
MV3R-Trees (Insertion and overflow)
• General key split
  • When a new entry is to be inserted, try to distribute the entries over two nodes so that, for each timestamp in the relevant time range, each node has at least b·Pversion live entries. This avoids the version split, which would generate version redundancy, although such a distribution may be difficult or impossible to find
  • The two new nodes should have small overlap
  • A general key split differs from an ordinary key split, which is applied only when all the entries in the node to be split are alive and their tstart equals the current time
• Reinsert an existing entry of the node
  • Any leaf node can store a reinserted entry provided that:
    • Its lifespan covers that of the entry
    • It is dead if the entry is dead
    • Its area is not enlarged much, to preserve good performance for timestamp queries
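The three conditions for a node that receives a reinserted entry translate into a small predicate. This is a sketch under assumptions: `can_host` is a hypothetical name, lifespans are half-open with `None` for "*", and `max_growth` is an invented relative-area threshold, since the slide only says the area "should not be enlarged much".

```python
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
INF = float("inf")


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def enlarge(r: Rect, s: Rect) -> Rect:
    return (min(r[0], s[0]), min(r[1], s[1]), max(r[2], s[2]), max(r[3], s[3]))


def can_host(host_rect: Rect, host_start: int, host_end: Optional[int],
             e_rect: Rect, e_start: int, e_end: Optional[int],
             max_growth: float = 0.1) -> bool:
    """Check whether a leaf node may receive a reinserted entry.
    Lifespans are [start, end) with None meaning "*" (alive).
    max_growth is a HYPOTHETICAL limit on the relative area increase."""
    # 1. The node's lifespan must cover the entry's lifespan.
    if not (host_start <= e_start and
            (INF if e_end is None else e_end) <= (INF if host_end is None else host_end)):
        return False
    # 2. A dead entry may only go into a dead node.
    if e_end is not None and host_end is None:
        return False
    # 3. The node's MBR must not grow much (protects timestamp queries).
    return area(enlarge(host_rect, e_rect)) <= area(host_rect) * (1 + max_growth)
```

For example, a dead entry with lifespan [2, 5) fits into a dead node that lived during [1, 8) if it lies inside the node's rectangle, but not into a live node, and not into a node whose lifespan starts at 3.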
MV3R-Trees (Insertion and overflow)
• Reinsert an existing entry of the node (continued)
  • Dead entries always come before live ones, because live entries can be reinserted only into live nodes, which may also overflow and induce the same problem in the future
  • Within the dead entries and within the live entries, candidates are sorted by the decrease in the node's MBR area caused by removing the entry
  • Only a single entry is reinserted, even if more could be, because
    • Reinsertion saves space but does not improve the structure
    • Reinserting a single entry already achieves the objective of averting the version split
• Insert into another node
  • Try to insert the new entry into another node that is not full
  • Backtrack to the upper level and try to insert the entry into another branch
  • Only branches that incur small area enlargement are considered
    • The area enlargement of a candidate branch may exceed that of the best branch only by a certain percentage
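Both orderings described above can be sketched in Python. The entry layout, helper names and the 20% value of `pct` are assumptions made for illustration; the slide does not give the actual percentage.

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Entry = Tuple[str, Rect, Optional[int]]   # (id, rect, tend); tend None = alive


def mbr(rects: List[Rect]) -> Rect:
    x1, y1, x2, y2 = zip(*rects)
    return (min(x1), min(y1), max(x2), max(y2))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def reinsertion_order(entries: List[Entry]) -> List[str]:
    """Candidates for reinsertion: dead entries before live ones; within each
    group, the entry whose removal shrinks the node MBR the most comes first."""
    full = area(mbr([r for _, r, _ in entries]))

    def decrease(i: int) -> float:
        rest = [r for j, (_, r, _) in enumerate(entries) if j != i]
        return full - area(mbr(rest))

    order = sorted(range(len(entries)), key=lambda i: (entries[i][2] is None, -decrease(i)))
    return [entries[i][0] for i in order]


def candidate_branches(enlargement: Dict[str, float], pct: float = 0.2) -> List[str]:
    """Branches whose area enlargement exceeds the best branch's by at most pct
    (a HYPOTHETICAL percentage threshold)."""
    best = min(enlargement.values())
    return sorted(k for k, v in enlargement.items() if v <= best * (1 + pct))


node = [
    ("e1", (0, 0, 1, 1), 3),       # dead
    ("e2", (9, 9, 10, 10), None),  # alive
    ("e3", (4, 4, 5, 5), 6),       # dead
    ("e4", (0, 9, 1, 10), None),   # alive
]
```

`reinsertion_order(node)` gives `["e1", "e3", "e2", "e4"]`: removing e2 would shrink the MBR the most, but it is alive, so both dead entries are tried first.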
MV3R-Trees (Insertion and overflow)
• Conclusion
  • General key split
    • Does not require reading any additional pages, so it is the most efficient method in terms of update cost
    • It reduces the number of entries in the new nodes, so they will not overflow again in the near future
  • Reinsert an existing entry of the node
    • May need to search multiple branches
    • The update cost is compensated by the space savings
  • Insert into another node
    • Requires backtracking up to level 2
    • The update cost is compensated by the space savings
MV3R-Trees (Deletion and Underflow)
• If a deletion does not incur structural changes, it is handled in a way similar to R*-trees
• An entry is physically deleted only if its tstart equals the current time (multiple updates may happen at the same timestamp); otherwise it is deleted logically by setting its tend
• Intermediate and leaf node deletions are handled similarly to insertions
• Intermediate node deletion
  • Suppose an underflow occurs at the current timestamp t: the tend of the live entries is set to t, and these entries are then reinserted into the most recent logical R-tree with tstart = t
  • The R*-tree algorithms are applied
MV3R-Trees (Deletion and Underflow)
• Leaf node deletion
  • To avoid the redundancy caused by reinsertion, a live entry is borrowed from a sibling node
  • With A the underflowing node and B the sibling it borrows from, the borrowed (moved) entry must have the following properties:
    • It must be alive, and its lifespan must be covered by that of node A
    • After its removal, node B must still satisfy the weak version condition
    • Inserting it into node A must not enlarge A's MBR beyond a threshold
MV3R-Trees (Deletion and Underflow)
• Example: the root S contains … <S1, 1, *, A> <S2, 1, *, B> …
  • Timestamp 1: A holds <A1, 1, *> <A2, 1, *> <A3, 1, *>; B holds <B1, 1, *> <B2, 1, *> <B3, 1, *>
  • Timestamp 2: B1 ends and objects B4, B5 and B6 are inserted, so B holds <B1, 1, 2> <B2, 1, *> <B3, 1, *> <B4, 2, *> <B5, 2, *> <B6, 2, *>
  • Deleting object A1 at timestamp 2 causes node A to underflow: <A1, 1, 2> <A2, 1, *> <A3, 1, *>
  • Objects B2 and B3 cannot be moved, because removing them from B would cause a weak version underflow at timestamp 1
  • Object B4 is borrowed by A, which then holds <A1, 1, 2> <A2, 1, *> <A3, 1, *> <B4, 2, *>, while B keeps <B1, 1, 2> <B2, 1, *> <B3, 1, *> <B5, 2, *> <B6, 2, *>
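The first two borrowing conditions can be checked on this example's node B. Two assumptions: the minimum of 3 live entries is inferred from the example (removing B2 leaves 2 live entries at timestamp 1, which the slide calls an underflow) rather than stated on it, and the MBR-enlargement threshold is left out because the slide gives no coordinates.

```python
from typing import List, Optional, Tuple

Entry = Tuple[str, int, Optional[int]]  # (id, tstart, tend); tend None = "*"


def alive(e: Entry, t: int) -> bool:
    return e[1] <= t and (e[2] is None or t < e[2])


def weak_ok(node: List[Entry], t: int, min_live: int) -> bool:
    n = sum(alive(e, t) for e in node)
    return n == 0 or n >= min_live


def borrowable(entry: Entry, sibling: List[Entry], host_start: int,
               host_end: Optional[int], now: int, min_live: int) -> bool:
    """First two borrowing conditions from the slide (MBR threshold omitted)."""
    _, ts, te = entry
    # 1a. The entry must be alive now.
    if not alive(entry, now):
        return False
    # 1b. Its lifespan must be covered by the lifespan of the receiving node.
    if ts < host_start or (host_end is not None and (te is None or te > host_end)):
        return False
    # 2. The sibling must still satisfy the weak version condition at every timestamp.
    rest = [e for e in sibling if e[0] != entry[0]]
    first = min(e[1] for e in sibling)
    return all(weak_ok(rest, t, min_live) for t in range(first, now + 1))


B = [("B1", 1, 2), ("B2", 1, None), ("B3", 1, None),
     ("B4", 2, None), ("B5", 2, None), ("B6", 2, None)]
# Node A lives during [1, *); min_live = 3 is inferred from the example, not given.
movable = [e[0] for e in B if borrowable(e, B, 1, None, now=2, min_live=3)]
```

Only B4, B5 and B6 qualify (B1 is dead, and B2 or B3 would leave B with two live entries at timestamp 1); the slide's choice of B4 is one of these candidates.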
MV3R-Trees (Auxiliary 3D R-tree)
• Built on the leaves of the MVR-tree in order to process interval queries
  • For a moderate node capacity, the number of leaf nodes in an MVR-tree is much lower than the number of objects
  • Hence it is much smaller than a complete 3D R-tree
• Adding the auxiliary 3D R-tree not only improves interval query performance, but may also provide flexibility in other scenarios
• Construction of the auxiliary 3D R-tree
  • Whenever a leaf node of the MVR-tree is updated, the change is propagated to its entry in the 3D R-tree
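The update propagation can be illustrated with a stand-in index. This is a sketch, not the paper's structure: `AuxIndex` is a hypothetical class that keeps one 3D box per leaf in a dict and scans it linearly instead of using a real 3D R-tree, and each leaf's box is derived from its own entries (spatial MBR and span of lifespans), whereas the actual MVR-tree would take a leaf's lifespan from its parent entry.

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]     # (x1, y1, x2, y2)
LeafEntry = Tuple[Rect, int, Optional[int]]  # (rect, tstart, tend); None = "*"
INF = float("inf")


def leaf_box(entries: List[LeafEntry]) -> tuple:
    """3D box (x1, y1, x2, y2, t1, t2) of one MVR-tree leaf: the spatial MBR of its
    entries and the span of their lifespans (t2 is None while any entry is alive)."""
    x1 = min(r[0] for r, _, _ in entries)
    y1 = min(r[1] for r, _, _ in entries)
    x2 = max(r[2] for r, _, _ in entries)
    y2 = max(r[3] for r, _, _ in entries)
    t1 = min(ts for _, ts, _ in entries)
    ends = [te for _, _, te in entries]
    t2 = None if None in ends else max(ends)
    return (x1, y1, x2, y2, t1, t2)


class AuxIndex:
    """Stand-in for the auxiliary 3D R-tree: a dict leaf_id -> 3D box."""

    def __init__(self) -> None:
        self.boxes: Dict[str, tuple] = {}

    def on_leaf_update(self, leaf_id: str, entries: List[LeafEntry]) -> None:
        # Called whenever the MVR-tree modifies a leaf: refresh its 3D box.
        self.boxes[leaf_id] = leaf_box(entries)

    def interval_query(self, w: Rect, t1: int, t2: int) -> List[str]:
        # Leaves whose box meets window w during the closed interval [t1, t2].
        hits = []
        for leaf_id, (x1, y1, x2, y2, s, e) in self.boxes.items():
            spatial = x1 <= w[2] and w[0] <= x2 and y1 <= w[3] and w[1] <= y2
            temporal = s <= t2 and t1 < (INF if e is None else e)
            if spatial and temporal:
                hits.append(leaf_id)
        return sorted(hits)


aux = AuxIndex()
aux.on_leaf_update("L1", [((0, 0, 1, 1), 1, 5), ((2, 2, 3, 3), 2, None)])
aux.on_leaf_update("L2", [((10, 10, 11, 11), 1, 4)])
```

An interval query then returns leaf identifiers, whose entries would be filtered individually in a second step; calling `on_leaf_update` again after any change to a leaf keeps its box current.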
Query Processing with MV3R-trees
• MVR-tree
  • A timestamp query retrieves the root whose jurisdiction interval covers the queried timestamp; the search then proceeds as in an ordinary R-tree
  • For interval queries, multiple logical trees may need to be searched
    • Duplicate visits to the same node via different parents should be avoided; otherwise they result in severe I/O cost
    • Duplicate pointers to a node are created by version splits or entry reinsertions
    • In both cases, the two entries pointing to the same node have disjoint lifespans
• 3D R-tree
  • Used for interval queries whenever the temporal query length exceeds a certain threshold; shorter intervals are handled by the MVR-tree
  • Its performance deteriorates gradually as the tree grows
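Picking the root for a timestamp query is a search over jurisdiction intervals. The sketch below assumes the roots are kept as a list of (start, end, root id) tuples sorted by start, with disjoint half-open intervals and `None` marking the current root; `find_root` is a hypothetical helper built on Python's `bisect` module.

```python
import bisect
from typing import List, Optional, Tuple


def find_root(roots: List[Tuple[int, Optional[int], str]], t: int) -> Optional[str]:
    """roots: (start, end, root_id) sorted by start, with disjoint jurisdiction
    intervals [start, end); end None marks the current root."""
    starts = [s for s, _, _ in roots]
    i = bisect.bisect_right(starts, t) - 1  # last root starting at or before t
    if i < 0:
        return None                          # t precedes the whole history
    _, end, root_id = roots[i]
    return root_id if end is None or t < end else None


roots = [(1, 4, "R1"), (4, 9, "R2"), (9, None, "R3")]
```

For instance, timestamp 3 maps to R1, timestamp 4 to R2 and any timestamp from 9 on to the current root R3; the chosen logical tree is then searched like an ordinary R-tree.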
Query Processing with MV3R-trees
• Example: node A contains <A1, 1, 10, C> and node B contains <B1, 10, 20, C>; both point to node C, which contains <C1, 1, 8, D> <C2, 1, 12, E> <C3, 10, 20, F> <C4, 10, 20, G>
  1. A1 and B1 are temporally adjacent
  2. A1 spatially covers C1 and C2
  3. B1 spatially covers C2, C3 and C4
  4. C2 and C3 intersect the query box, so their subtrees (nodes E and F) should be searched
  5. Since node C may be reached twice (by following A1 and by following B1), we may attempt redundant visits to E and F
[Figure: (x, y, time) view of the query cube together with the cubes of A1, B1 and C1 to C4.]
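The redundant visits in this example can be reproduced with a small traversal. Two assumptions: the slides do not say how duplicates are avoided, so this sketch swaps in a plain visited set (the paper may instead exploit the disjoint lifespans of the duplicate entries); and the query interval [9, 11] plus the `spatial_hit` flags, which stand in for the MBR tests, are made up to match the figure's outcome.

```python
from collections import Counter
from typing import Dict, List, Tuple

# node -> [(child, tstart, tend, spatial_hit)]; spatial_hit is a stand-in
# for the MBR-versus-query-window test, chosen to match the slide's example.
TREE: Dict[str, List[Tuple[str, int, int, bool]]] = {
    "A": [("C", 1, 10, True)],
    "B": [("C", 10, 20, True)],
    "C": [("D", 1, 8, True), ("E", 1, 12, True),
          ("F", 10, 20, True), ("G", 10, 20, False)],
    "D": [], "E": [], "F": [], "G": [],
}


def overlaps(ts: int, te: int, q1: int, q2: int) -> bool:
    # Entry lifespan [ts, te) intersects the closed query interval [q1, q2].
    return ts <= q2 and q1 < te


def search(roots: List[str], q1: int, q2: int, dedup: bool) -> Counter:
    """Depth-first interval search; returns how often each node is visited."""
    visits: Counter = Counter()
    seen = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if dedup and node in seen:
            continue                      # already visited via another parent
        seen.add(node)
        visits[node] += 1
        for child, ts, te, hit in TREE[node]:
            if hit and overlaps(ts, te, q1, q2):
                stack.append(child)
    return visits


naive = search(["A", "B"], 9, 11, dedup=False)
deduped = search(["A", "B"], 9, 11, dedup=True)
```

Without deduplication C, E and F are each visited twice, once via A1 and once via B1; with the visited set every node is read once. D and G are never visited, because C1 misses the query interval and C4 misses the query window.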
Conclusion of MV3R-trees
• A structure that combines the concepts of MVB-trees and 3D R-trees
• MV3R-trees handle timestamp and interval queries efficiently with relatively small space requirements
• MV3R-trees could be further improved by:
  • Analytical cost models for determining the optimal tree for answering short-interval queries
  • Overflow and underflow handling heuristics that are more efficient in terms of update cost and can avert more version splits
References
• Tao, Y., Papadias, D. The MV3R-Tree: A Spatio-Temporal Access Method for Timestamp and Interval Queries. 2000.