High Concurrent R-tree Operations when Tracking Continuous Movement

Cezar Chitac, Robertas Kerpys, Raluca Marcuta

Motivation • Need for tracking moving objects in real time: • concurrency • Organize and access positional information • Queries: • search query • range query

Overview • Problem Formulation • Queries vs. Updates • First Approach: Split-Supporting Index • Second Approach: Split-Free Index • Related Work • Project Status • Conclusions • Future Work

Problem Formulation • Frequent updates • Efficient R-tree index structure • Concurrency between queries and updates • Objectives • Query performance • Update performance • Data freshness • Challenges • Structural modifications during concurrent tree operations: queries and updates avoid locks

Range Queries • Objects send updates when moving δ units: • reported position • real position [Figure: range query expanded by δ. Labels: range, expanded range, last reported position, real position at current time, "Current time in [ts, te)", "time for which the results are returned"]

Semantics: an object may have been in range at time ts • Updates used to construct the answer to a range query (freshest possible): • all that finish before ts • some that finish after ts • Constructing the result set: • roll time back or forward to ts • area where the object may have been at ts: • circle with radius min(vmax · |ts - tu|, δ) and center (x, y) • intersect with the original, unexpanded range
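The candidate-area test above can be sketched as follows. This is a minimal illustration under stated assumptions, not the presenters' implementation: the function names, the argument order, and the representation of the range as an axis-aligned rectangle (xmin, ymin, xmax, ymax) are all hypothetical.

```python
import math

def uncertainty_radius(v_max, t_s, t_u, delta):
    """Radius of the circle where the object may have been at time t_s."""
    return min(v_max * abs(t_s - t_u), delta)

def circle_intersects_rect(cx, cy, r, xmin, ymin, xmax, ymax):
    # Clamp the circle center to the rectangle to get the closest point,
    # then compare the distance to that point with the radius.
    nx = min(max(cx, xmin), xmax)
    ny = min(max(cy, ymin), ymax)
    return math.hypot(cx - nx, cy - ny) <= r

def may_have_been_in_range(x, y, t_u, t_s, v_max, delta, rect):
    """True if the object reported at (x, y) at time t_u may have been
    inside the original, unexpanded range `rect` at time t_s."""
    r = uncertainty_radius(v_max, t_s, t_u, delta)
    return circle_intersects_rect(x, y, r, *rect)
```

For an update one time unit away from ts with vmax = 2 and δ = 5, the radius is 2, so an object reported 2 units outside the range still qualifies while one reported 3 units outside does not.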

Background: Bottom-up Updates • Efficient updates: no top-down traversal • Secondary index on oid

Types of Update: Local & Non-local [Figure: range query over an R-tree with nodes R1 to R7 and points p1 to p13; p10 is shown more than once]

Types of Update: Local & Non-local • Local updates • modify position coordinates • no structural modifications • Non-local updates • move object to another leaf node => delete + insert • problem: concurrent query • Solution • insert + logical delete (negative tu)
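The distinction between local and non-local updates can be sketched as below. This is a simplified, single-threaded illustration: the `Entry` and `Leaf` classes, the `update` function, and the linear scan for the target leaf are assumptions made for the example, and timestamps are assumed to be strictly positive so that a negative tu can mark a logical delete. Physical removal of the marked entry by a later update is omitted.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    oid: int
    x: float
    y: float
    tu: float  # update timestamp; a negative value marks a logical delete

@dataclass
class Leaf:
    mbr: tuple  # (xmin, ymin, xmax, ymax)
    entries: list = field(default_factory=list)

    def contains(self, x, y):
        xmin, ymin, xmax, ymax = self.mbr
        return xmin <= x <= xmax and ymin <= y <= ymax

def update(secondary, leaves, oid, x, y, now):
    """Bottom-up update through the secondary index on oid.
    Returns 'local' or 'non-local'."""
    leaf, entry = secondary[oid]
    if leaf.contains(x, y):
        # Local update: only the coordinates and timestamp change.
        entry.x, entry.y, entry.tu = x, y, now
        return "local"
    # Non-local update: insert the new entry first ...
    target = next(l for l in leaves if l.contains(x, y))
    new_entry = Entry(oid, x, y, now)
    target.entries.append(new_entry)
    # ... then mark the old entry as logically deleted (negative tu),
    # so a concurrent query never finds the object in neither leaf.
    entry.tu = -abs(entry.tu)
    secondary[oid] = (target, new_entry)
    return "non-local"
```

Inserting before marking the old entry means a concurrent query may see the object twice but never zero times, which is why the query side needs a rule for picking one of the two.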

Query vs. Update • Query retrieves: • the old position, if the new one is inserted into an already scanned leaf node • both positions => query chooses the freshest • Pold in hash table: used by the next update • delete logically marked entry [Figure: query with p10 shown twice]
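The "query chooses freshest" rule can be sketched as a de-duplication step over the scanned entries. The function name and the tuple layout (oid, x, y, tu) are hypothetical, and the sketch assumes that the magnitude of tu is the update time while a negative sign only marks a logical delete.

```python
def freshest_per_object(scanned):
    """Keep one entry per oid: the one with the latest update time.

    scanned: iterable of (oid, x, y, tu) tuples collected from leaf nodes.
    Logically deleted entries (negative tu) are still candidates, because
    the replacement may sit in a leaf the query has already passed.
    """
    best = {}
    for oid, x, y, tu in scanned:
        current = best.get(oid)
        if current is None or abs(tu) > abs(current[3]):
            best[oid] = (oid, x, y, tu)
    return best
```

If only the logically deleted entry for an object was scanned, it is returned as is; if both the old and the new entry were scanned, the newer one wins.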

General Index Structure

Overview of Update Process

Split-Supporting Index • Algorithm is based on atomic operations and versioning of the items • Latching is minimal

Node Split • Exclusive latch between non-local updates • Marks logically deleted items [Figure sequence, three steps: node split involving p14; labels include nodes R3, R4, R5, R9, R10 and points p1 to p7, p14]

Local Updates • Are allowed during splits and merges [Figure: secondary index entries pointing to nodes N1, N2, N3 and their copies N1', N2', N3', under R3 and R4, with points p1, p2, p3, p4, p6, p7]

Non-Local Updates • Are not allowed to change items that are involved in a split • Such updates are put into a priority queue and retried later
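A deferred-update queue of this kind might look like the sketch below. The slides do not say what the priority is; ordering by update timestamp is an assumption, as are the `RetryQueue` class and the `try_apply` callback, which is expected to return False while the target item is still involved in a split.

```python
import heapq
import itertools

class RetryQueue:
    """Holds non-local updates that hit a node involved in a split."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker for equal timestamps

    def defer(self, tu, update):
        heapq.heappush(self._heap, (tu, next(self._seq), update))

    def retry(self, try_apply):
        """Retry every deferred update once, oldest timestamp first.
        Updates that are still blocked go back into the queue."""
        pending, self._heap = self._heap, []
        applied = []
        while pending:
            tu, _, update = heapq.heappop(pending)
            if try_apply(update):
                applied.append(update)
            else:
                self.defer(tu, update)
        return applied

    def __len__(self):
        return len(self._heap)
```

A worker would call `retry` periodically or after a split completes; each call makes exactly one attempt per deferred update, so a blocked update cannot stall the others.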

Merge • Merge the underflow node with one of its sibling nodes. Two cases: • Sibling node has space for all entries • Sibling node would overflow after insertion

Sibling node has space for all entries • Sibling and underflow nodes are latched • New empty node is created • Entries from sibling and underflow nodes are copied into the new node • New node is introduced into structure by atomic swap of the pointers
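The steps of this merge case can be sketched as follows. This is an illustration only: the `Node` and `Parent` classes and the `capacity` parameter are hypothetical, and a single rebinding of `parent.children` stands in for the atomic pointer swap that a real implementation would perform with a hardware atomic operation.

```python
import threading

class Node:
    def __init__(self, entries=()):
        self.entries = list(entries)
        self.latch = threading.Lock()

class Parent:
    def __init__(self, children):
        # Immutable snapshot: readers always see either the old or the new tuple.
        self.children = tuple(children)

def merge_into_new_node(parent, underflow, sibling, capacity):
    """Copy both nodes into a fresh node and publish it in one step.
    Returns the new node, or None if the sibling lacks space."""
    # Latch both nodes in a fixed order to avoid deadlock between mergers.
    first, second = sorted((underflow, sibling), key=id)
    with first.latch, second.latch:
        if len(underflow.entries) + len(sibling.entries) > capacity:
            return None  # the other merge case applies
        merged = Node(sibling.entries + underflow.entries)
        # One reference rebinding introduces the new node into the structure.
        parent.children = tuple(
            c for c in parent.children if c is not underflow and c is not sibling
        ) + (merged,)
        return merged
```

Because the old nodes are never modified, a query that already holds a reference to them keeps reading a consistent, if slightly stale, copy.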

Sibling node would overflow • A split of the sibling node is performed • The split function accepts all entries from the underflow node instead of a single entry • Entries are distributed between two new nodes • The two new nodes are introduced into the structure in two atomic operations

Summary • Advantages: • Local updates are permitted during node splits and merges • Queries can execute concurrently • Disadvantages: • High complexity due to avoidance of locks • Creation of artificial updates

Second Approach – Main Idea • Splits and merges: • Time consuming • Increase complexity • Artificial updates • Goals: • Objects update only when they move • No splits and no merges

Parameters

Logically Overfull Node • Node is logically overfull: LO = 6 • Create new node • Algorithm: choose cut value • Change nodes' states • Store pointer to new node [Figure: nodes R1 and R2 with points p1 to p7; cut_val divides the node into a persistent part and an evacuating part]

Node Structure • split_ptr: pointer to the newly created node • state: the node's state (Normal, Evacuating, Populating or New) • cut: the axis by which the node was "divided" • cutval: the value on that axis • ev_part: indicates the part that is evacuating • iNeed: indicates the node's desire to attract or repel objects
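The fields listed above could be laid out as in the sketch below. The field names and the four states come from the slide; the encodings are assumptions made for the example: `cut` as an axis index (0 for x, 1 for y), `ev_part` as "low" or "high" relative to `cutval`, and `iNeed` as a signed integer. The helper `in_evacuating_part` is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class State(Enum):
    NORMAL = "Normal"
    EVACUATING = "Evacuating"
    POPULATING = "Populating"
    NEW = "New"

@dataclass
class Node:
    state: State = State.NEW
    split_ptr: Optional["Node"] = None  # pointer to the newly created node
    cut: Optional[int] = None           # assumed encoding: 0 = x axis, 1 = y axis
    cutval: Optional[float] = None      # value on the cut axis
    ev_part: Optional[str] = None       # assumed encoding: "low" or "high" side
    iNeed: int = 0                      # assumed: > 0 attracts, < 0 repels objects
    entries: list = field(default_factory=list)

    def in_evacuating_part(self, point):
        """True if `point` lies on the side of cutval that is being evacuated."""
        if self.state is not State.EVACUATING or self.cut is None:
            return False
        below = point[self.cut] < self.cutval
        return below if self.ev_part == "low" else not below
```

An update routine could use a check like `in_evacuating_part` to decide whether an object should stay in the node or move to the node referenced by `split_ptr`.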

State Diagram [Figure: state diagram from Creation to Deletion over the node states New, Populating, Normal and Evacuating. State annotations: NR = 0; NR ≤ LU; PU ≤ NR ≤ LU; NR ≤ PU; NR ≥ LO/2; LU+1 ≤ NR ≤ LO; Total Evacuation. Transition labels: Insert(obj); Insert(obj) & NR = LU; Insert(obj) & NR = LO; Delete(obj) & NR = PU+1; Delete(obj) & NR = LU+1; Delete(obj) & NR = LO/2; Delete(obj) & NR = 1]

Find Node Heuristics • Search parent node first • Sibling node in need of objects • Top-down tree traversal based on: • MBR area enlargement • iNeed values
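The top-down step of these heuristics can be sketched as below. MBR area enlargement is the standard R-tree insertion criterion; how it is combined with iNeed is not stated on the slide, so using iNeed only as a tie-breaker (higher value preferred) is an assumption, as are the function names and the (mbr, i_need) tuple layout.

```python
def area(mbr):
    xmin, ymin, xmax, ymax = mbr
    return (xmax - xmin) * (ymax - ymin)

def enlarge(mbr, x, y):
    """Smallest rectangle covering both `mbr` and the point (x, y)."""
    xmin, ymin, xmax, ymax = mbr
    return (min(xmin, x), min(ymin, y), max(xmax, x), max(ymax, y))

def enlargement(mbr, x, y):
    """Extra area needed for `mbr` to cover the point (x, y)."""
    return area(enlarge(mbr, x, y)) - area(mbr)

def choose_child(children, x, y):
    """Return the index of the child to descend into.

    children: list of (mbr, i_need) pairs. Least area enlargement wins;
    ties go to the child with the higher iNeed value.
    """
    return min(
        range(len(children)),
        key=lambda i: (enlargement(children[i][0], x, y), -children[i][1]),
    )
```

Applied at each level from the root down, `choose_child` yields a leaf for the object when neither the parent node nor a sibling in need of objects can take it.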

Local and Non-local • p3: local update • p5: non-local update to R3 [Figure: tree with nodes R1 to R12 and points p1 to p5; p3 and p4 are shown more than once]

Summary • Advantages: • Algorithmic simplicity • No artificial updates • Novelty • Disadvantages: • Setting heuristic parameters • Logical complexity

Related Work • Logical and Physical Versioning in Main Memory Databases [Rastogi et al. 1997] • Trees or Grids: Main-memory Indexing [Šidlauskas et al. 2009] • High-Concurrency Locking in R-trees: R-link [Kornacker & Banks 1995] • Existing concurrent approaches: • An Enhanced Concurrency Control Scheme for Multi-dimensional Index Structures [Song et al. 2004] • CGiST: Concurrency and Recovery in Generalized Search Trees [Kornacker et al. 1997]

Status of the Project • Semantics of application domain • Concurrent queries and updates • An approach based on copying on demand: • Create minimal structure on the side • Integrate using atomic operations • A new approach: • A tree structure with no splits or merges • Necessary heuristics to compensate

Conclusion • Addresses concurrency issues when minimizing locking/latching • Two approaches debated (one novel) • Focus on concurrency while maintaining structure integrity

Future Work • Next semester: • Implementation of second approach • Comparison with relevant existing approaches • Additional work: • Implementation of the first approach • Comparison between the two

Feedback • Which parts of the presentation: • needed more focus? • were unnecessary? • were too detailed? • Was the flow of the presentation natural? • Any thoughts about our two presented methods?
