I/O-efficient Point Location using Persistent B-Trees Lars Arge, Andrew Danner, and Sha-Mayn Teh Department of Computer Science, Duke University (2003). The Planar Point Location Problem.
I/O-efficient Point Location using Persistent B-Trees
Lars Arge, Andrew Danner, and Sha-Mayn Teh
Department of Computer Science, Duke University (2003)
Michal Balas
The Planar Point Location Problem
Store a planar subdivision defined by N line segments such that the region containing a query point p can be computed efficiently.
Planar Point Location Applications
• Geographic information systems (GIS)
• Spatial databases
• Graphics
Usually the datasets are larger than physical memory and must reside on disk.
Previous Work
• So far, a few theoretically I/O-efficient structures have been developed, but all are relatively complicated and none has been implemented
• Vahrenhold and Hinrichs (2001) suggested a heuristic structure that is simple and efficient but theoretically non-optimal
Goal
Find a planar point location structure that minimizes the number of I/Os needed to answer a query and is efficient both in theory and in practice.
Lecture’s Road Map
• Motivation
• The Vertical Ray Shooting problem and the need for persistent data structures
• Review:
  • B-trees, B+-trees, and the I/O model
  • Persistent B-trees
• The modified Persistent B-tree
• Experimental results
• Open problems
Vertical Ray Shooting
• A generalized version of the Planar Point Location problem
• Given a set of N non-intersecting segments in the plane, construct a data structure such that the segment directly above a query point p can be found efficiently
• We will consider this problem
Example [figure]
Vertical Ray Shooting
• Based on the persistent search tree idea of Sarnak and Tarjan (1986)
• Any vertical line l in the plane induces an “above-below” order on the segments it intersects
• We will “sweep” the plane from left to right with a vertical line
• Our “critical” points on the x-axis are the endpoints of all segments
Vertical Ray Shooting & Persistent Search Trees
• Sort the critical points by x-value
• For each critical point $p_i = (x_i, y_i)$ we can build a search tree on the segments intersecting the vertical line at $x_i$, ordered by their y-values at $x_i$
• Until the next critical point $p_{i+1}$ the tree is static: it changes only at the next left or right endpoint of a segment
Vertical Ray Shooting & Persistent Search Trees
• Worst-case analysis:
• Hold a separate search tree for each critical point
• Space: $O(n^2)$, since each of the $O(n)$ trees can hold up to n segments
Vertical Ray Shooting & Persistent Search Trees
• We should use the fact that two consecutive trees (versions) differ by only one insertion or deletion (assuming distinct x-values for all endpoints)
Vertical Ray Shooting & Persistent Search Trees
• Persistent data structure
• Preserves versions. In an ordinary (ephemeral) data structure only the latest version exists: every update changes the structure, so its state before the update can no longer be accessed
• Each update creates a new version
• The current version of the structure can be modified, and all versions of the structure, past and present, can be accessed
Vertical Ray Shooting & Persistent Search Trees
• We would like to save a version of the search tree for each critical point. Since we want to be space efficient, we will use a persistent search tree
• A persistent search tree differs from an ordinary search tree in that after an insertion or deletion, the old version of the tree can still be accessed
• Here the persistent search tree should support insertions and deletions in the present and queries in the past (partial persistence)
Vertical Ray Shooting & Persistent Search Trees
• We insert a segment into the persistent search tree when its left endpoint is encountered
• We persistently delete a segment from the tree when its right endpoint is encountered
• Two consecutive versions of the tree differ only by a certain number of deletions and insertions (exactly one when all x-values are distinct)
Vertical Ray Shooting & Persistent Search Trees
• Given a query point p = (x, y), we search for the position of y in the version of the search tree that was current when the sweep line was at x
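The sweep-and-query idea can be sketched without any persistence by storing one sorted list per slab between consecutive critical x-values. This is the naive $O(n^2)$-space variant, not the persistent structure of the lecture. The function names and the segment representation `((x1, y1), (x2, y2))` with the left endpoint first are assumptions made for this illustration, and segments are assumed non-vertical and non-intersecting.

```python
from bisect import bisect_right

def y_at(seg, x):
    """y-coordinate of a non-vertical segment ((x1, y1), (x2, y2)) at abscissa x."""
    (x1, y1), (x2, y2) = seg
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

def build_slabs(segments):
    """Store one above-below sorted list per slab (naive, O(n^2) space)."""
    xs = sorted({p[0] for seg in segments for p in seg})  # critical x-values
    slabs = []
    for left, right in zip(xs, xs[1:]):
        mid = (left + right) / 2
        # Segments spanning the whole slab are exactly those cut by the sweep line.
        alive = [s for s in segments if s[0][0] <= left and s[1][0] >= right]
        alive.sort(key=lambda s: y_at(s, mid))
        slabs.append(alive)
    return xs, slabs

def segment_above(xs, slabs, p):
    """Return the lowest segment at or above p, or None if there is none."""
    x, y = p
    i = bisect_right(xs, x) - 1          # slab ("version") containing x
    if i < 0 or i >= len(slabs):
        return None
    alive = slabs[i]
    lo, hi = 0, len(alive)
    while lo < hi:                       # binary search in the above-below order
        mid = (lo + hi) // 2
        if y_at(alive[mid], x) < y:
            lo = mid + 1
        else:
            hi = mid
    return alive[lo] if lo < len(alive) else None

segments = [((0, 0), (10, 0)), ((2, 5), (8, 5)), ((4, 2), (6, 3))]
xs, slabs = build_slabs(segments)
```

A query exactly on a critical x-value is resolved to the slab on its right in this sketch. Replacing the per-slab lists with versions of one persistent tree is what brings the space down.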
Vertical Ray Shooting & Persistent Search Trees
• Path copying:
• Start from a balanced search tree
• When x is inserted, the changes are confined to the path from the root to x
• Instead of copying the whole tree, we copy only the updated path
• The roots are kept ordered by version
Vertical Ray Shooting & Persistent Search Trees
• Path copying:
• Space: $O(n \log n)$, which is better but not good enough
[Figure: two roots r1 and r2 sharing subtrees after inserting x]
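Path copying can be sketched in a few lines. The version below uses an unbalanced binary search tree for brevity, so it only illustrates the sharing of untouched subtrees and not the $O(\log n)$ path length of a balanced tree; the names `Node`, `insert`, `contains`, and `roots` are chosen for this example.

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    """Immutable tree node, so old versions can never be modified."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(root, key):
    """Return the root of a new version; only nodes on the search path are copied."""
    if root is None:
        return Node(key)
    if key < root.key:
        return Node(root.key, insert(root.left, key), root.right)
    if key > root.key:
        return Node(root.key, root.left, insert(root.right, key))
    return root  # key already present: the version is unchanged

def contains(root, key):
    """Ordinary search, started from the root of whichever version is queried."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

# roots[v] is the root of version v; version 0 is the empty tree.
roots = [None]
for key in [5, 2, 8, 1]:
    roots.append(insert(roots[-1], key))
```

Here `roots[3]` and `roots[4]` share the subtree holding 8, because inserting 1 only copies the path through 5 and 2.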
Vertical Ray Shooting & Persistent Search Trees
• Extra pointers:
• Instead of copying the path, each node stores several pointers (a list of left children and a list of right children, even though it is a binary tree), each labeled with the version in which it was set
[Figure: a node whose left and right pointers are stamped with versions t1 and t2]
Vertical Ray Shooting & Persistent Search Trees
• Extra pointers:
• Here there is no limit on the number of pointers per node
• In the worst case it takes $O(\log n)$ time per node to find the pointer for the relevant version (the pointers are kept in a binary search tree), which is not optimal
• We need constant time per node
Vertical Ray Shooting & Persistent Search Trees
• Sarnak & Tarjan solution:
• Limited node copying with k extra pointers per node
• k should be a small positive number (k = 1 will do)
• When a pointer is added to a node that has no empty slot for it, we copy the node, setting the initial left and right pointers of the copy to their latest values
• The parent is then updated to point to the new copy; if the parent has no free slot, the process is repeated
Vertical Ray Shooting & Persistent Search Trees
• Sarnak & Tarjan solution: space analysis
• Amortized analysis: we will see that every sequence of m operations takes $O(m)$ space
• The potential of the structure is defined as $\Phi = (\#\,\text{live nodes}) - \frac{1}{k}\,(\#\,\text{free slots in live nodes})$
• Amortized space cost of an update = (actual # of nodes it creates) $+\ \Delta\Phi$
Vertical Ray Shooting & Persistent Search Trees
• Sarnak & Tarjan solution: space analysis
• We will show that the amortized space cost is $O(1)$ per update
• If a free slot in node v is used, the actual # of new nodes created is 0 and the # of free slots in live nodes decreases by 1, so $\Delta\Phi = +1/k$ and the amortized space cost is $0 + 1/k = 1/k$
• If node copying occurs, the actual # of new nodes created is 1; the full old node dies and its live copy has k free slots, so the # of live nodes is unchanged and the # of free slots increases by k. Thus $\Delta\Phi = -1$ and the amortized space cost is $1 - 1 = 0$
Vertical Ray Shooting & Persistent Search Trees
• Sarnak & Tarjan solution: space analysis
• During an update, node copying continues along the path from the node toward the root until the root is copied or a node with a free slot is reached
• The amortized space cost of each node copy is 0 and of occupying a free slot is $1/k$
Vertical Ray Shooting & Persistent Search Trees
• Sarnak & Tarjan solution: space analysis
• The total amortized space cost of one cascade of node copies is constant (0 or $1/k$)
• The space for rebalancing information per node is constant
• In red-black trees, rebalancing after a deletion or insertion takes $O(1)$ rotations and $O(1)$ amortized color changes per update
• Since an insertion or deletion requires $O(1)$ new pointers, not counting node copying, the amortized space cost of an update is $O(1)$
Vertical Ray Shooting & Persistent Search Trees
• Sarnak & Tarjan solution: space analysis
• Summing over all n updates: total amortized space cost $=$ (# nodes created) $+\ (\Phi_{end} - \Phi_{start}) \le cn$ for some constant c
• $\Phi_{start} = 0$ (we start with an empty data structure)
• $\Phi_{end} \ge 0$, because each live node has at most k free slots
• # nodes created $\le cn - \Phi_{end} \le cn = O(n)$, which bounds the required space
Vertical Ray Shooting & Persistent Search Trees
• Sarnak & Tarjan solution: complexity
• $O(\log m)$ query time (m is the total # of updates)
• $O(\log n)$ update time (n is the current size of the set)
• $O(1)$ amortized space per update
• $O(n \log n)$ preprocessing time
Where are we going?
1) The use of persistent data structures (a persistent structure always preserves its previous version when it is modified)
2) The use of B-trees in the I/O model (the B-tree is the I/O-model equivalent of a search tree)
3) I/O-efficient persistent B-tree (works well with totally ordered elements)
4) Modified I/O-efficient persistent B-tree (only elements present in the same version of the structure need to be comparable)
Vertical Ray Shooting & Persistent Search Trees
• Two segments that are not both intersected by some common vertical line are not comparable in the “above-below” order
• Corollary: not all segments stored in the persistent structure over its lifespan are comparable
An I/O-efficient structure cannot be obtained directly from a persistent B-tree, because standard persistent B-trees require a total order on all elements.
Vertical Ray Shooting & Persistent Search Trees
• To make the structure I/O-efficient, we need to modify the tree so that it only requires elements present in the same version of the structure to be comparable
Lecture’s Road Map
• Motivation
• The Vertical Ray Shooting problem and the need for persistent data structures
• Review:
  • B-trees, B+-trees, and the I/O model
  • Persistent B-trees
• The modified Persistent B-tree
• Experimental results
• Open problems
Review: The I/O Model
• Infinite disk size
• M: main memory size
• B: block size
• N: number of elements in the structure
[Figure: disk D and main memory M exchanging data in blocks via block I/O]
Review: The I/O Model - Cont
• Computation can only occur on data stored in main memory
• We are interested in the number of I/Os used to answer a query
• The B-tree is the external-memory equivalent of the balanced search tree in internal memory
Review: B-tree
• A balanced search tree
• All leaves are on the same level
• All internal nodes (except the root) have between B/2 and B children, i.e. $\Theta(B)$
• A node or leaf can be stored in $O(1)$ blocks
Review: B-tree - Cont
• Space complexity of the tree: $O(N/B)$ blocks (where N is the number of elements), i.e. linear
• Tree height: $O(\log_B N)$
• Insert/Delete can be done with $O(\log_B N)$ I/Os
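To get a feel for the $O(\log_B N)$ height bound, the small helper below counts how many levels are needed to reach N leaves' worth of elements at a given fan-out. The function name `btree_levels` and the sample values of N and B are chosen for this illustration and do not come from the lecture.

```python
def btree_levels(n, fanout):
    """Smallest h with fanout**h >= n, computed with integers only."""
    levels, capacity = 0, 1
    while capacity < n:
        capacity *= fanout
        levels += 1
    return levels

n = 10**9
binary_height = btree_levels(n, 2)        # balanced binary tree
best_case = btree_levels(n, 1000)         # every node has B = 1000 children
worst_case = btree_levels(n, 500)         # every node has only B/2 children
```

With these sample values a binary tree needs 30 levels while the B-tree needs 3 to 4, which is why one node per block keeps the number of I/Os per search so small.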
Review: B+-tree
• A B-tree in which all elements are stored in the leaves
• The internal nodes contain “routing elements”
B-tree Example (B+-tree)
[Figure: an internal node with routing elements 3 and 5 above leaves holding the keys 1 to 7 with their data items d1 to d7]
Where are we going?
1) The use of persistent data structures (a persistent structure always preserves its previous version when it is modified)
2) The use of B-trees in the I/O model (the B-tree is the I/O-model equivalent of a search tree)
3) I/O-efficient persistent B-tree (works well with totally ordered elements)
4) Modified I/O-efficient persistent B-tree (only elements present in the same version of the structure need to be comparable)
Review: Persistent B-tree
• A directed acyclic graph
• The elements are in the sinks (leaves)
• “Routing elements” are in the internal nodes
• Elements (and nodes) are augmented with an “existence interval”
• During this interval the element is “alive”
• An element is “alive” between its insert version and its delete version
Review: Persistent B-tree - Cont
• Nodes “alive” at time t form an $(aB, B)$ B-tree, $0 < a < 1/2$
• We will work with $a = 1/4$
• Additional invariant:
  • A new node must contain between $(a+g)B$ and $(1-g)B$ alive elements ($a > g$)
  • For $g = 1/8$, $a = 1/4$, a new node contains between $(3/8)B$ and $(7/8)B$ alive elements
• We require that $g > 2/B$, $a - g \ge 1/B$, and $2a + 3g \le 1 - 3/B$
Review: Persistent B-tree - Cont
• In order to find the appropriate root at time t, the roots are stored in a standard B-tree
• The lookup takes $O(\log_B N)$ I/Os
• A node or leaf contains $O(B)$ elements = $O(1)$ blocks
• # blocks needed to hold the structure: $O(N/B)$
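The two version-related lookups described above, testing whether an element is alive at time t and finding the root that was current at time t, can be sketched as follows. A sorted in-memory list searched with `bisect` stands in for the B-tree over the roots, and the half-open interval convention `[insert version, delete version)` is an assumption of this sketch; `RootIndex`, `is_alive`, and `INF` are names invented here.

```python
from bisect import bisect_right

INF = float("inf")  # delete version of an element that is still alive

def is_alive(interval, t):
    """True if version t falls inside the existence interval [start, end)."""
    start, end = interval
    return start <= t < end

class RootIndex:
    """Roots ordered by the version in which they became the root."""

    def __init__(self):
        self.versions = []
        self.roots = []

    def add_root(self, version, root):
        # Versions only grow, so appending keeps the list sorted.
        self.versions.append(version)
        self.roots.append(root)

    def root_at(self, t):
        """Root that was current at version t, or None before the first root."""
        i = bisect_right(self.versions, t) - 1
        return self.roots[i] if i >= 0 else None

index = RootIndex()
for version, root in [(1, "r1"), (5, "r2"), (9, "r3")]:
    index.add_root(version, root)
```

In this toy index, `index.root_at(7)` returns `"r2"`, the root created at version 5 and still current at version 7.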
Persistent B-tree Insert
• x is the element to insert into the current version of the tree
• Search for the leaf l and insert x ($O(\log_B N)$ I/Os)
• If l now contains > B elements -> block overflow
• Version-split: copy all k alive elements from l to a new node v and mark l as dead
  • If k is in [(3/8)B, (7/8)B]: simple
  • If k > (7/8)B: strong overflow
  • If k < (3/8)B: strong underflow
Strong overflow and strong underflow violate the additional invariant defined earlier.
Persistent B-tree Insert
• If k is in [(3/8)B, (7/8)B]: recursively update parent(l) by persistently deleting the reference to l and inserting a reference to v
Persistent B-tree Insert - Cont
• If k > (7/8)B: strong overflow
• Split: create nodes v1 and v2, each with k/2 elements; k/2 lies in ((3/8)B, (7/8)B) (this range is not tight)
• Update parent(l) recursively: persistently delete the reference to l and insert two references, to v1 and v2
Persistent B-tree Insert - Cont
• If k < (3/8)B: strong underflow
• Version-split a sibling l' of l to obtain k' other alive elements (k' is in [aB, B])
• $k + k' \ge 2aB$ and $a > g$, thus $k + k' > (a+g)B$ (the lower bound of the invariant holds)
1) if $k + k' \le (1-g)B$: merge, i.e. create one new leaf with k + k' elements
2) if $k + k' > (1-g)B$: share, i.e. split to create two new leaves
• Update parent(l) recursively: persistently delete two references and insert one or two
Persistent B-tree Delete
• x is the element to delete from the current version of the tree
• Search for the leaf l that contains x and mark x as dead ($O(\log_B N)$ I/Os)
• If l contains < (1/4)B alive elements -> block underflow (this is also a strong underflow, since k < (3/8)B)
• Version-split a sibling node to obtain k + k' elements; $k + k' \ge 2aB - 1$ and $a - g \ge 1/B$, thus $k + k' \ge (a+g)B$ (the lower bound of the invariant holds)
• Mark l dead and create a new node v with k + k' elements (merge)
• If there is a strong overflow in v: share (as in insert)
• Update parent(l) recursively: persistently delete two references and insert one or two references
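Persistent B-tree: Rebalance Operations
[Flowchart, flattened here. Consistent with the insert and delete slides, each pair after "done" reads as (# references deleted from the parent, # references inserted into the parent).]
• Insert: no block overflow -> done (0, 0); block overflow -> version-split
• Delete: no block underflow -> done (0, 0); block underflow -> version-split
• After a version-split with no strong overflow or underflow -> done (-1, +1)
• Strong overflow -> split -> done (-1, +2)
• Strong underflow -> merge -> done (-2, +1)
• Strong overflow after the merge -> split -> done (-2, +2)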
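The case analysis after a version-split can be written down as a small decision function. This is a sketch of the decision logic only, using the lecture's parameters a = 1/4 and g = 1/8; the function name, its return format (operation, references deleted from the parent, references inserted into the parent), and the `k_sibling` argument are assumptions made for the illustration.

```python
from fractions import Fraction

A = Fraction(1, 4)  # a: alive nodes form an (aB, B) B-tree
G = Fraction(1, 8)  # g: slack required in newly created nodes

def rebalance_after_version_split(k, B, k_sibling=None):
    """Classify a version-split that copied k alive elements out of a leaf.

    Returns (operation, refs_deleted_in_parent, refs_inserted_in_parent).
    k_sibling is the number of alive elements obtained by version-splitting
    a sibling; it is only needed in the strong-underflow case.
    """
    low, high = (A + G) * B, (1 - G) * B   # [(3/8)B, (7/8)B]
    if low <= k <= high:
        return ("version-split", 1, 1)     # the simple case
    if k > high:
        return ("split", 1, 2)             # strong overflow
    if k_sibling is None:
        raise ValueError("strong underflow needs the sibling's alive count")
    if k + k_sibling <= high:
        return ("merge", 2, 1)             # one new leaf
    return ("share", 2, 2)                 # merge overflowed: two new leaves

B = 64  # sample block size, so the invariant window is [24, 56]
```

For example, with `B = 64` a leaf with 20 alive elements and a sibling contributing 30 is merged into one new leaf of 50 elements, while a sibling contributing 50 forces a share.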
Persistent B-tree - Complexity
• Updates: $O(\log_B N)$ I/Os
• Search and rebalance along one path from root to leaf
• What about the required space?
Persistent B-tree - Complexity
• A few observations:
  • A rebalance operation on a leaf creates $\le 2$ new nodes
  • Once a leaf is created, at least $gB$ updates have to be performed on it before another rebalance operation occurs
  • Two version-splits might create only one new leaf
  • Each time a leaf is created or a leaf version-split is performed, a corresponding insertion or deletion is performed recursively one level up the tree
• During N updates:
  • # leaves created $\le 2N/(gB) = O(N/B)$
  • # leaf version-splits $\le 2N/(gB)$
  • # nodes created one level up the tree $\le 2^2 N/(gB)^2$
  • By induction: # nodes created i levels up the tree $\le 2^{i+1} N/(gB)^{i+1}$
• Total # nodes created $\le \sum_{i \ge 0} 2^{i+1} N/(gB)^{i+1} = O(N/B)$, a geometric series with ratio $2/(gB) < 1$ because $g > 2/B$ (this is also the # of blocks used after N updates)
• Space: $O(N/B)$ blocks
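The per-level bounds above form a geometric series, which can be checked numerically. The sketch below sums the bounds $2^{i+1} N/(gB)^{i+1}$ with exact fractions and compares them with the closed form $N \cdot r/(1-r)$ for $r = 2/(gB)$; the function names and the sample values of N and B are assumptions made for this illustration.

```python
from fractions import Fraction

def level_bound(n_updates, B, i, g=Fraction(1, 8)):
    """Bound on the # of nodes created i levels above the leaves."""
    r = Fraction(2) / (g * B)
    return n_updates * r ** (i + 1)

def total_bound(n_updates, B, g=Fraction(1, 8)):
    """Closed form of the infinite geometric series over all levels."""
    r = Fraction(2) / (g * B)
    if r >= 1:
        raise ValueError("the series only converges when g > 2/B")
    return n_updates * r / (1 - r)

N, B = 3_000_000, 64                       # sample values; gB = 8, r = 1/4
partial = sum(level_bound(N, B, i) for i in range(50))
closed = total_bound(N, B)
```

With these sample values the leaf level alone accounts for `N/4 = 750000` of the bound and all higher levels together add only a third more, which is why the total stays within a constant factor of the leaf-level term.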
Lecture’s Road Map
• Motivation
• The Vertical Ray Shooting problem and the need for persistent data structures
• Review:
  • B-trees, B+-trees, and the I/O model
  • Persistent B-trees
• The modified Persistent B-tree
• Experimental results
• Open problems