Project Proposals
Simonas Šaltenis, Aalborg University
• Nykredit Center for Database Research
• Department of Computer Science, Aalborg University
Outline
• An overview of the R-tree and the TPR-tree
• Project proposals:
  • Update-Efficient TPR-tree
  • Time-parameterized SS-tree
WIM workshop, Gl. Vrå Slot, December 6-8, 2001
Spatial Indexing With the R-Tree
• Example
[Figure: example R-tree with points p1–p13, bounding rectangles R1–R7, a query rectangle, and leaf entries holding pointers to data tuples]
R-tree Properties
• Leaf entry = <n-dimensional point, rid>
• Non-leaf entry = <n-dimensional MBR, ptr to a child node>
• MBR: the Minimum Bounding Rectangle of all points in the subtree pointed to by ptr
• The R-tree is a balanced tree: all leaves are at the same depth from the root
• Through the insertion and deletion algorithms, nodes are kept at least m% full (except the root)
  • m is usually chosen to be 40%.
  • m is the minimum fill factor; depending on the workload, the average fill factor is usually about 70%.
R-Tree: a Grow-Post Tree
• Bounding predicate (BP) = something that describes the entries in a subtree
• Building blocks of the algorithms:
  • Consistent(BP, Q): returns true if results of query Q can be under BP (in the R-tree, the MBR intersects Q)
  • PickSplit(node): splits a page of entries into two groups
  • Penalty(BP, E): returns an estimate of how much “worse” BP becomes if E is inserted under it
  • Union( ): computes a BP of a collection of entries (in the R-tree, computes an MBR: the minimum and maximum in all dimensions)
[Figure: a Grow-Post tree, with internal nodes holding bounding predicates BP1, BP2, ..., BPn above the leaf nodes]
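
As an illustration of two of these building blocks in the R-tree case, here is a minimal Python sketch. The representation of an MBR as a pair of tuples (lows, highs) and the function names `union` and `consistent` are assumptions made for this example; they are not taken from any particular R-tree library.

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]
# Assumed representation: an MBR is (lows, highs), the minimum and
# maximum coordinate in every dimension.
MBR = Tuple[Tuple[float, ...], Tuple[float, ...]]


def union(points: Sequence[Point]) -> MBR:
    """Union for the R-tree: the MBR of a collection of points."""
    lows = tuple(min(coords) for coords in zip(*points))
    highs = tuple(max(coords) for coords in zip(*points))
    return lows, highs


def consistent(bp: MBR, q: MBR) -> bool:
    """Consistent for the R-tree: true if the MBR intersects query Q.

    Two rectangles intersect when their extents overlap in every
    dimension (touching boundaries count as overlapping here).
    """
    return all(
        b_lo <= q_hi and q_lo <= b_hi
        for b_lo, b_hi, q_lo, q_hi in zip(bp[0], bp[1], q[0], q[1])
    )
```

The same two signatures could in principle be reimplemented for other bounding predicates, which is what makes the Grow-Post framing reusable.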
Range Query in R-trees
• Answering range query Q in R-trees:
  • Start at the root.
  • If the current node is a non-leaf, for each entry <MBR, ptr>, if Consistent(MBR, Q), search the subtree identified by ptr.
  • If the current node is a leaf, for each entry <E, rid>, if E overlaps Q, rid identifies a point that overlaps Q.
• Note: We may have to search several subtrees at each node! (In contrast, a B-tree equality search goes to just one leaf.)
• Worst-case performance is O(n)!
• But in practice, R-trees exhibit good query performance for various data sets.
• What about insertion and deletion?
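
The search procedure can be sketched in a few lines of Python. This is only an illustration under assumed data structures: the `Node` class, the 2-D rectangle layout (xmin, ymin, xmax, ymax), and the helper names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    is_leaf: bool
    # Leaf entries are (point, rid); non-leaf entries are (mbr, child Node).
    entries: list = field(default_factory=list)


def intersects(a: Rect, b: Rect) -> bool:
    """Consistent(MBR, Q) for rectangles: overlap in both dimensions."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(q: Rect, p: Tuple[float, float]) -> bool:
    """True if point p lies inside (or on the border of) rectangle q."""
    return q[0] <= p[0] <= q[2] and q[1] <= p[1] <= q[3]


def range_query(node: Node, q: Rect) -> List[object]:
    """Collect the rids of all points under `node` that fall inside q."""
    result = []
    if node.is_leaf:
        for point, rid in node.entries:
            if contains(q, point):
                result.append(rid)
    else:
        # Several children may qualify, so several subtrees may be searched.
        for mbr, child in node.entries:
            if intersects(mbr, q):
                result.extend(range_query(child, q))
    return result
```

Because the MBRs of sibling entries may overlap, the loop over non-leaf entries can descend into more than one child, which is where the O(n) worst case comes from.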
Insert Entry E<point, ptr>
• Insertion algorithm
  1. cn = root
  2. If cn is a leaf, go to step 5.
  3. From all entries in cn, choose the entry e with the smallest Penalty(e.BP, E). (In R-trees, choose an entry whose MBR needs the least enlargement to cover E; resolve ties by going to the smallest-area child.)
  4. cn = e.ptr, go to step 2.
  5. Insert E into cn. Call PropogateUp(cn).
• PropogateUp(cn)
  • If cn is overfull, call PickSplit(cn) to produce cn1 and cn2, replace cn’s old entry in its parent by e1 = Union(cn1) and e2 = Union(cn2), and call PropogateUp on cn’s parent.
  • Otherwise, if e = Union(cn) is different from cn’s old entry in its parent, replace the old entry with e and call PropogateUp on cn’s parent.
  • Create a new root with two entries whenever a root is split.
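
The descent phase of the insertion algorithm might look as follows in Python. This is a sketch only: nodes are plain dictionaries with assumed keys `"leaf"`, `"entries"`, `"bp"`, and `"ptr"`, the `penalty` function is passed in by the caller, and splitting and PropogateUp are left out.

```python
from typing import Callable


def choose_leaf(root: dict, new_entry, penalty: Callable) -> dict:
    """Walk from the root to the leaf that should receive new_entry.

    Assumed node layout: {"leaf": bool, "entries": [...]}; each non-leaf
    entry is {"bp": bounding predicate, "ptr": child node}.
    """
    cn = root
    while not cn["leaf"]:
        # Pick the entry whose bounding predicate gets "least worse".
        best = min(cn["entries"], key=lambda e: penalty(e["bp"], new_entry))
        cn = best["ptr"]
    # The caller would now insert new_entry into cn and propagate upwards.
    return cn
```

Passing `penalty` as a parameter mirrors the Grow-Post idea that the tree skeleton stays the same while the bounding-predicate methods are swapped.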
Heuristics for Penalty
• Heuristics of least area enlargement and smallest area are used in the R-tree’s Penalty.
[Figure: the example R-tree (points p1–p13, bounding rectangles R1–R7, leaf entries holding pointers to data tuples), shown before and after a new point p14 is inserted]
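
A small Python sketch of these two heuristics, assuming 2-D rectangles stored as (xmin, ymin, xmax, ymax) and a point to be inserted. Returning a tuple so that the area acts as a tie-breaker is a choice made for this example, not something prescribed by the slides.

```python
from typing import Sequence, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def enlarged(r: Rect, p: Tuple[float, float]) -> Rect:
    """Smallest rectangle that covers both r and the point p."""
    return (min(r[0], p[0]), min(r[1], p[1]), max(r[2], p[0]), max(r[3], p[1]))


def penalty(r: Rect, p: Tuple[float, float]) -> Tuple[float, float]:
    """(area enlargement, current area): tuples compare element by element,
    so the smaller area only matters when the enlargements are equal."""
    return (area(enlarged(r, p)) - area(r), area(r))


def choose_entry(mbrs: Sequence[Rect], p: Tuple[float, float]) -> int:
    """Index of the MBR with the smallest penalty for point p."""
    return min(range(len(mbrs)), key=lambda i: penalty(mbrs[i], p))
```

For example, with MBRs (0, 0, 4, 4) and (5, 0, 7, 2) and a new point (6, 3), the first rectangle would grow by 8 area units and the second by 2, so the second one is chosen.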
Deletion in R-trees
• Delete entry E
  • Using the search procedure, find a leaf cn where entry E is located.
  • Remove E from cn. Call PropogateUp(cn).
• PropogateUp(cn)
  • If cn is underfull, deallocate the node cn, remove cn’s entry in its parent, call PropogateUp on cn’s parent, and reinsert all of cn’s entries or merge them into some other node.
  • Otherwise, if e = Union(cn) is different from cn’s old entry in its parent, replace the old entry with e and call PropogateUp on cn’s parent.
• No additional heuristics are involved in Delete; underfull nodes are handled using Insert as a subroutine.
Modeling Continuous Movement
• In conventional databases, data is assumed constant unless explicitly modified.
• With continuous movement, this is problematic:
  • Too frequent updates
  • Outdated, inaccurate data
• Instead of storing position values, we store positions as functions of time, yielding time-parameterized positions.
• We use linear functions to capture the present and future positions.
• Updates are necessary only when the parameters of the functions change.
• For example, given a reference time, the current and anticipated future position of a two-dimensional point can be described by four parameters: a reference position and a velocity in each dimension.
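
A time-parameterized position with linear movement can be sketched as a small Python class. The class name `MovingPoint` and its field names are hypothetical; the four stored parameters are the reference position and the velocity in each of the two dimensions, with the reference time kept alongside them.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class MovingPoint:
    """A 2-D point whose position is a linear function of time."""
    x_ref: float        # x position at the reference time
    y_ref: float        # y position at the reference time
    vx: float           # velocity in x
    vy: float           # velocity in y
    t_ref: float = 0.0  # reference time (assumed 0 unless given)

    def position(self, t: float) -> Tuple[float, float]:
        """Predicted position at time t: reference position + velocity * elapsed time."""
        dt = t - self.t_ref
        return (self.x_ref + self.vx * dt, self.y_ref + self.vy * dt)
```

The stored record only needs to be rewritten when the velocity or the reference position changes, not every time the object moves.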
Queries
[Figure: positions of objects o1–o4 plotted as x against time t]
• Type 1: objects that intersect a given rectangle at time t
• Type 2: objects that intersect a given rectangle sometime from t1 to t2
• Type 3: objects that intersect a given moving rectangle sometime between t1 and t2
• We can expect that most queries will be concentrated in the sliding window [CT, CT+W], i.e., CT <= t, t1, t2 <= CT + W
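
For a single linearly moving point, Type 1 and Type 2 queries can be evaluated by intersecting time intervals, one per dimension. The sketch below assumes a point given as (x_ref, y_ref, vx, vy) with reference time 0 and a static rectangle (xmin, ymin, xmax, ymax); the function names are made up for the example, and Type 3 (a moving rectangle) is not covered.

```python
from typing import Optional, Tuple


def time_inside(x_ref: float, v: float, lo: float, hi: float,
                t1: float, t2: float) -> Optional[Tuple[float, float]]:
    """Sub-interval of [t1, t2] during which lo <= x_ref + v*t <= hi, or None."""
    if v == 0:
        return (t1, t2) if lo <= x_ref <= hi else None
    ta, tb = (lo - x_ref) / v, (hi - x_ref) / v   # times of crossing lo and hi
    start, end = max(t1, min(ta, tb)), min(t2, max(ta, tb))
    return (start, end) if start <= end else None


def intersects_type2(point, rect, t1: float, t2: float) -> bool:
    """True if the moving point is inside rect at some time in [t1, t2]."""
    x_ref, y_ref, vx, vy = point
    xmin, ymin, xmax, ymax = rect
    ix = time_inside(x_ref, vx, xmin, xmax, t1, t2)
    iy = time_inside(y_ref, vy, ymin, ymax, t1, t2)
    if ix is None or iy is None:
        return False
    # The point is inside the rectangle when both dimensions agree in time.
    return max(ix[0], iy[0]) <= min(ix[1], iy[1])


def intersects_type1(point, rect, t: float) -> bool:
    """A Type 1 (timeslice) query is a Type 2 query with t1 = t2 = t."""
    return intersects_type2(point, rect, t, t)
```

Treating the timeslice query as a degenerate interval keeps the two cases on one code path.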
Time-Parameterized Rectangles
• The TPR-tree is based on the R-tree.
• Moving points are bounded with time-parameterized rectangles.
  • The rectangles are bounding from now on, i.e., at the current time and at all later times.
• The R-tree allows overlap.
• The tree employs conservative bounding rectangles.
• At any t > tc we can get a valid R-tree: TPBR-tree(t) = R-tree
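
One common way to build a conservative time-parameterized bound, shown here in one dimension, is to let the lower edge start at the smallest current position and move with the smallest velocity, and the upper edge start at the largest position and move with the largest velocity. The slides do not spell out the construction, so this definition, the tuple layouts, and the function names are assumptions of the sketch.

```python
from typing import Sequence, Tuple

MovingPoint1D = Tuple[float, float]          # (position at time 0, velocity)
TPBR = Tuple[float, float, float, float]     # (lo, hi, v_lo, v_hi) at time 0


def conservative_tpbr(points: Sequence[MovingPoint1D]) -> TPBR:
    """Bounding interval whose edges move with the extreme velocities."""
    lo = min(x for x, _ in points)
    hi = max(x for x, _ in points)
    v_lo = min(v for _, v in points)
    v_hi = max(v for _, v in points)
    return lo, hi, v_lo, v_hi


def bounds_at(tpbr: TPBR, t: float) -> Tuple[float, float]:
    """Evaluate the bound at time t >= 0, giving an ordinary interval."""
    lo, hi, v_lo, v_hi = tpbr
    return lo + v_lo * t, hi + v_hi * t
```

Evaluating every bound at a fixed t >= 0 yields ordinary rectangles, which is the sense in which the tree is a valid R-tree at any later time. The bound is never too small but is usually not minimal after the reference time.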
Insertion: Grouping Points
• How to group moving points (Penalty and PickSplit)?
• The R-tree’s algorithms minimize characteristics of MBRs such as area, overlap, and margin.
• How does that work for moving points?
[Figure: seven moving points, numbered 1–7, shown in several panels]
Insertion in the TPR-Tree
• The bounding rectangle characteristics (area, overlap, and margin) are functions of time.
• The goal is to minimize these for all time points from now to now+H.
• Minimizing the characteristics for time now + H/2 does not work (e.g., the area of a conservative bounding rectangle is not linear).
• We use the regular R*-tree algorithms, but all bounding rectangle characteristics are replaced by their integrals, $\int_{now}^{now+H} A(t)\,dt$, where A(t) is, e.g., the area of an MBR.
• What H to use?
  • H depends on the update rate and on how far queries may reach into the future (W).
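
To see why the midpoint is not a substitute for the integral, consider a 2-D conservative bounding rectangle whose side lengths grow linearly. The sketch below assumes side lengths `ex`, `ey` at time now (taken as t = 0) and growth rates `dvx`, `dvy` (the difference between the upper and lower edge velocities); these parameter names are invented for the example.

```python
def area_integral(ex: float, ey: float, dvx: float, dvy: float, H: float) -> float:
    """Exact integral over [0, H] of A(t) = (ex + dvx*t) * (ey + dvy*t).

    Expanding the product gives a quadratic in t, which integrates to
    ex*ey*H + (ex*dvy + ey*dvx)*H^2/2 + dvx*dvy*H^3/3.
    """
    return ex * ey * H + (ex * dvy + ey * dvx) * H**2 / 2 + dvx * dvy * H**3 / 3


def midpoint_estimate(ex: float, ey: float, dvx: float, dvy: float, H: float) -> float:
    """H times the area at time H/2, i.e. what a midpoint shortcut would use."""
    t = H / 2
    return H * (ex + dvx * t) * (ey + dvy * t)
```

For ex = 2, ey = 3, dvx = 1, dvy = 2, and H = 6, the exact integral is 306 while the midpoint estimate is 270. The gap is dvx·dvy·H³/12, so it vanishes only when one of the sides does not grow.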
Outline
• An overview of the R-tree and the TPR-tree
• Project proposals:
  • Update-Efficient TPR-tree
  • Time-parameterized SS-tree
Update-Efficient TPR-tree
• Handling hyper-dynamic data
  • 500,000 objects; on average, each object updates its positional information three times per hour
  • => ~400 updates per second
• Update: a deletion followed by an insertion
• Observations:
  • Usually an object’s positional information does not change too drastically in between updates.
  • Most of the update cost is due to the search phase of a deletion (several paths down the tree may be followed).
    • We assume that the object reports its previous positional information, so that we know what to delete.
  • We need to spend I/Os on making bounding predicates as “tight” as possible, although we may be willing to sacrifice query performance.
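
The update-rate estimate is a one-line calculation; the variable names below are only for illustration.

```python
objects = 500_000
updates_per_object_per_hour = 3
seconds_per_hour = 3600

# Total updates per hour, spread evenly over the seconds of an hour.
updates_per_second = objects * updates_per_object_per_hour / seconds_per_hour
print(round(updates_per_second))  # 417, i.e. roughly 400 updates per second
```

Since each update is a deletion followed by an insertion, the index has to sustain about twice that many tree operations per second.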
In-place Updates
• Lazy Update R-tree (LUR-tree):
  • A hash table (on object ids) is used to access leaf pages directly (without the search phase of deletion).
  • An update is one operation:
    • Go to the hash table with an object’s id and get the pointer to the leaf page.
    • Update the object’s information in this page or, if the object’s information changed too “drastically”, insert it from the top of the tree using the normal insertion procedure.
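
The update path can be sketched as follows. Everything here is an assumption made for illustration: the class names, the use of "new position still inside the leaf's MBR" as the test for a change that is not too drastic, and the `insert_from_root` callback that stands in for the normal top-down insertion. Handling of underfull leaves and of ancestor bounding rectangles is omitted.

```python
from typing import Callable, Dict, Tuple

Pos = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def inside(mbr: Rect, p: Pos) -> bool:
    return mbr[0] <= p[0] <= mbr[2] and mbr[1] <= p[1] <= mbr[3]


class Leaf:
    def __init__(self, mbr: Rect):
        self.mbr = mbr
        self.entries: Dict[int, Pos] = {}  # object id -> position


class LazyUpdater:
    def __init__(self, insert_from_root: Callable[[int, Pos], Leaf]):
        self.leaf_of: Dict[int, Leaf] = {}        # hash table: object id -> leaf
        self.insert_from_root = insert_from_root  # hypothetical normal insertion

    def update(self, oid: int, new_pos: Pos) -> str:
        leaf = self.leaf_of.get(oid)
        if leaf is not None and inside(leaf.mbr, new_pos):
            leaf.entries[oid] = new_pos           # cheap in-place update
            return "in-place"
        if leaf is not None:
            del leaf.entries[oid]                 # no search phase needed
        self.leaf_of[oid] = self.insert_from_root(oid, new_pos)
        return "reinserted"
```

The hash table removes the search phase of the deletion in both branches; only the second branch pays for a full top-down insertion.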
Problems to Solve
• Problems (that you have to try to solve, refining and applying these ideas to the TPR-tree):
  • How do we update bounding rectangles in ancestor nodes?
    • Possible solution: a hash table storing the full path from the root to the leaf
  • When do we do a real insertion and when an update in place?
  • What do we do when nodes are split/merged? (Can we spend so many I/Os maintaining our hash table?)
    • Possible solution: lazy updating of the hash table and use of pointers to split-off nodes, as in R-link trees.
Time-Parameterized SS-trees
• SS-tree: a Grow-Post tree where bounding predicates are spheres
  • Good for nearest-neighbor (NN) queries
  • Compact description of a bounding predicate (independent of dimensionality)
• Project: explore time-parameterized SS-trees. Issues to be addressed:
  • Writing the Consistent method
  • Writing the Penalty method
  • Experimentally comparing with the TPR-tree for range queries and NN queries
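
As a possible starting point for the Consistent method, here is the static (not time-parameterized) test of whether a bounding sphere intersects a rectangular range query. The function name and argument layout are assumptions of this sketch; extending it to moving spheres is the open part of the project.

```python
from typing import Sequence


def sphere_intersects_rect(center: Sequence[float], radius: float,
                           lows: Sequence[float], highs: Sequence[float]) -> bool:
    """True if the sphere (center, radius) overlaps the box [lows, highs].

    The closest point of the box to the center is found by clamping each
    coordinate into the box; the sphere overlaps the box exactly when that
    closest point is within `radius` of the center.
    """
    dist_sq = 0.0
    for c, lo, hi in zip(center, lows, highs):
        nearest = min(max(c, lo), hi)
        dist_sq += (c - nearest) ** 2
    return dist_sq <= radius ** 2
```

The test works in any number of dimensions because it only iterates over coordinates.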