SD-Rtree: A Scalable Distributed Rtree Witold Litwin & Cédric du Mouza & Philippe Rigaux
Plan • Introduction • SDDS • R-tree • SD-Rtree Evolution • Balancing • Spatial Rotations • Overlapping • Redundant Coverage • Queries • Performance • Conclusion
SDDS Principles (1993) • Data are at server nodes • Nodes communicate through point-to-point messaging • Overloaded servers split over new servers • Queries are issued at client nodes, which use local images of the SDDS • No central addressing component • A node can be both client and server (peer)
SDDS Principles (1993) • An outdated image may send a query to an incorrect server • Servers forward such a query to the correct server • Image gets adjusted • Image Adjustment Message (IAM) comes back • Client does not repeat the same error twice • Data are basically in the RAM of the servers
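The forwarding and image-adjustment cycle above can be sketched with a toy model. This is a minimal illustration, not the SD-Rtree protocol: it assumes one-dimensional integer keys and half-open ranges, and the names `Server`, `Client`, `handed_off`, `split` and `handle` are all hypothetical.

```python
class Server:
    """Toy server owning the half-open key range [lo, hi) (an assumption)."""

    def __init__(self, sid, lo, hi):
        self.sid, self.lo, self.hi = sid, lo, hi
        self.handed_off = []  # (lo, hi, server) ranges given away by splits

    def split(self, new_sid):
        """Overloaded server: hand the upper half of the range to a new server."""
        mid = (self.lo + self.hi) // 2
        new = Server(new_sid, mid, self.hi)
        self.handed_off.append((mid, self.hi, new))
        self.hi = mid
        return new

    def handle(self, key, hops=0):
        """Answer locally, or forward; the reply carries an IAM-like tuple."""
        if self.lo <= key < self.hi:
            return self.sid, hops, (self.lo, self.hi, self)
        for lo, hi, server in self.handed_off:
            if lo <= key < hi:
                return server.handle(key, hops + 1)
        raise KeyError(key)


class Client:
    def __init__(self, contact):
        # The image starts with the contact server only.
        self.image = [(contact.lo, contact.hi, contact)]

    def query(self, key):
        # Use the narrowest image entry that claims to cover the key.
        candidates = [e for e in self.image if e[0] <= key < e[1]]
        _, _, server = min(candidates, key=lambda e: e[1] - e[0])
        sid, hops, iam = server.handle(key)
        if hops > 0:          # the image was outdated: apply the adjustment
            self.image.append(iam)
        return sid, hops


s0 = Server(0, 0, 100)
client = Client(s0)
s0.split(1)                   # the client's image is now outdated
first = client.query(75)      # forwarded once, image adjusted
second = client.query(75)     # goes straight to the right server
```

The second query reaches the correct server without forwarding, which is the "does not repeat the same error twice" property in miniature.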
SD-Rtree : a Spatial SDDS Distributed Spatial Data
SD-Rtree : a Spatial SDDS • Distributed Index • No central component
SD-Rtree : a Spatial SDDS • Point & Window Queries • kNN queries (future)
SD-Rtree : Generalizes R-tree • R-tree: • Nodes are minimal bounding boxes • Leaf nodes point to data • Internal nodes bound subtrees • May overlap • Split on overflow • Generates a balanced m-ary tree
SD-Rtree : Generalizes R-tree • R-tree: • An insert may go through multiple paths • Ends up in the smallest bounding box • If there is any • One of the boxes gets enlarged • Box may split
SD-Rtree : Generalizes R-tree • R-tree: • Search may go through multiple paths • All paths may bring relevant objects
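A minimal sketch of why an R-tree search may follow several paths: when the bounding boxes of two internal nodes overlap, a window falling in the shared region must be checked under both. The `Node` class, the `(xmin, ymin, xmax, ymax)` tuple layout and the function names are assumptions made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    box: tuple                                     # (xmin, ymin, xmax, ymax)
    children: list = field(default_factory=list)   # empty for a leaf entry
    obj: object = None                             # payload of a leaf entry


def union(a, b):
    """Minimal bounding box of two rectangles."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node, window, out):
    """Collect every leaf object whose box intersects the window."""
    if not intersects(node.box, window):
        return out
    if not node.children:
        out.append(node.obj)
    for child in node.children:    # several children may qualify
        search(child, window, out)
    return out


left = Node((0, 0, 6, 6), [Node((1, 1, 2, 2), obj="a"), Node((5, 5, 6, 6), obj="b")])
right = Node((4, 4, 10, 10), [Node((4, 4, 5, 5), obj="c"), Node((9, 9, 10, 10), obj="d")])
root = Node(union(left.box, right.box), [left, right])

found = search(root, (4.5, 4.5, 5.5, 5.5), [])
```

Here the window lies in the region shared by `left` and `right`, so both subtrees are visited and each contributes one object.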
SD-Rtree: a Balanced Binary Tree • The SD-Rtree is a balanced binary tree, distributed on a set of servers, such that: • Each internal node (or routing node) has exactly two sons • Each leaf node stores a subset of the indexed dataset • At each node, the heights of the two subtrees differ by at most one • Each server stores one data node and one routing node
SD-Rtree: Binary Tree Structure • di = data node (leaf) • ri = routing node (internal node)
SD-Rtree Balancing • The binary tree should be height-balanced • The heights of the two subtrees rooted at any node should not differ by more than 1 (cf. AVL trees) • The tree height is then logarithmic in the number of leaves
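The logarithmic height follows from the standard AVL argument, which can be checked numerically. The sketch below assumes a leaf has height 0 and every internal node has exactly two sons; the function name `min_leaves` is hypothetical.

```python
import math

PHI = (1 + math.sqrt(5)) / 2   # golden ratio


def min_leaves(h):
    """Fewest leaves in a height-balanced binary tree of height h.

    The sparsest tree of height h has one subtree of height h - 1 and one
    of height h - 2, which gives a Fibonacci-like recurrence.
    """
    a, b = 1, 2                # heights 0 and 1
    for _ in range(h):
        a, b = b, a + b
    return a


# A tree of height h has at least PHI**h leaves, hence
# h <= log_PHI(n), roughly 1.44 * log2(n), for n leaves.
for h in range(30):
    assert h <= math.log(min_leaves(h), PHI) + 1e-9
```

In other words, even the most lopsided tree that still satisfies the balance condition has a height within a constant factor of log2 of the number of leaves.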
SD-Rtree Balancing • SD-Rtree balancing occurs during splits • Messages are sent bottom-up to adjust the height of the ancestor nodes • Rotation occurs if an ancestor is imbalanced • SD-Rtree rotations are spatial • They change the rectangles of internal nodes • The best rotation minimizes rectangle overlapping • Tie breaking minimizes the "dead space"
Properties • The sons of a node are not ordered => more freedom for reorganizing the tree • Any imbalanced node matches a rotation pattern • A rotation pattern is a subtree a(b(e(f,g),d),c) such that: • h(c) = h(d) = h(f) = n − 1 (n > 0) • h(g) = max(0, n − 2) • [Figure: Rotation Pattern]
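The choice of a spatial rotation can be sketched as follows. The slides do not list the candidate rotations, so this sketch assumes that a rotation regroups the four subtrees f, g, d, c into two pairs, with the shorter subtree g paired with one of f, d or c (which restores balance given the heights in the pattern). Dead space is approximated by the total area of the two new bounding boxes. All function names are hypothetical.

```python
def mbb(a, b):
    """Minimal bounding box of rectangles (xmin, ymin, xmax, ymax)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def overlap_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def best_partner_for_g(f, g, d, c):
    """Return which of 'f', 'd', 'c' should be paired with g (assumed model)."""
    boxes = {"f": f, "d": d, "c": c}
    scored = []
    for name, partner in boxes.items():
        others = [r for n, r in boxes.items() if n != name]
        box1 = mbb(g, partner)          # new node holding g and its partner
        box2 = mbb(others[0], others[1])  # new node holding the other two
        # Primary criterion: overlap; tie breaker: total area, used here as
        # a proxy for dead space (an assumption of this sketch).
        scored.append((overlap_area(box1, box2), area(box1) + area(box2), name))
    return min(scored)[2]


choice = best_partner_for_g(
    f=(0, 0, 1, 1), g=(1, 0, 2, 1), d=(10, 10, 11, 11), c=(11, 10, 12, 11)
)
```

With these rectangles, g sits next to f, so pairing them yields two disjoint boxes and is preferred over the other two regroupings.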
Rotation Cost • Constant number of messages (3 or 6, depending on the choice) • Few rotations in practice • In particular when the dataset is uniformly distributed • See our experiments
SD-Rtree : Images • Each image defines the addressing structure • Resides as cache on a client or on a peer • Starts with the address of the contact server • IAMs make it a subtree • Splits make images outdated • IAMs adjust it incrementally
Image Adjustment • Client contacts a server with a query • Each incorrect server initiates a traversal of the tree • During the traversal, the description of the nodes is collected • The correct server sends the up-to-date tree structure • The client updates its image
Overlapping management • The directory rectangles in an Rtree may overlap • The local subtree does not suffice for locating all the nodes that contain the point (point query) or intersect the window (window query) searched for • SD-Rtree servers maintain data on node overlapping • Redundant Coverage • It avoids systematically accessing the root node
Redundant Coverage • Example • The region common to A and B is stored on both nodes • If a point query sent to A falls in the region shared with B, A sends a point query message to B • For D, we must keep the intersection with C or B: here it is empty
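The idea of the example can be sketched in a few lines: each node keeps the rectangle it shares with every overlapping node, and uses it to decide whether a point query must also be sent elsewhere. The class `DataNode`, the `coverage` dictionary and the helper names are assumptions of this sketch, and the actual message exchange is replaced by a returned list of node names.

```python
def intersection(a, b):
    """Shared rectangle of a and b, or None when they do not overlap."""
    r = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return r if r[0] <= r[2] and r[1] <= r[3] else None


def contains(rect, p):
    return rect[0] <= p[0] <= rect[2] and rect[1] <= p[1] <= rect[3]


class DataNode:
    def __init__(self, name, box):
        self.name, self.box = name, box
        self.coverage = {}   # other node -> rectangle shared with it


def register_overlap(a, b):
    """Store the common region on both nodes (nothing if it is empty)."""
    shared = intersection(a.box, b.box)
    if shared is not None:
        a.coverage[b] = shared
        b.coverage[a] = shared


def nodes_to_visit(node, p):
    """Names of the nodes that must process a point query received by node."""
    hits = [node.name] if contains(node.box, p) else []
    for other, shared in node.coverage.items():
        if contains(shared, p):   # the point may also be stored on other
            hits.append(other.name)
    return hits


A = DataNode("A", (0, 0, 5, 5))
B = DataNode("B", (3, 3, 8, 8))
D = DataNode("D", (9, 9, 10, 10))
register_overlap(A, B)
register_overlap(D, B)           # empty intersection: nothing is stored

in_shared = nodes_to_visit(A, (4, 4))
in_a_only = nodes_to_visit(A, (1, 1))
```

A point in the shared region triggers a visit to B as well, while a point covered by A alone is answered locally without going up to the root.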
Queries • Point queries and window queries. The technique is similar to the insertion algorithm: • Search the client image for a server whose mbb (minimal bounding box) contains the point or intersects the window • Send the query to this server • If the server actually covers the point or the window, it answers the client; otherwise it sends the query to its parent node • A server uses the overlapping information to transmit the query
Experiments • Synthetic data (points and rectangles) generated with GSTD • 50,000 to 500,000 objects • 0 to 3,000 queries • Server capacity: 3,000 objects • Comparison of three SD-Rtree variants: • BASIC: no image; every query is processed top-down from the root • IMSERVER: no IAMs among the servers • IMCLIENT: client images
Conclusion • SD-Rtree is an efficient scalable distributed Rtree • For very large spatial data collections • Can be processed in distributed RAM • Access time much faster than to disk data • Load balancing • Spatial rotations • Overlapping management • Redundant coverage • O(log n) worst-case insert cost • Future work • kNN queries • Balancing the distribution of objects over servers
SD-Rtree Thank You for Your Attention Questions: First.Last@dauphine.fr