A Self-adjusting Data Structure for Multi-dimensional Point Sets. Eunhui Park & David M. Mount, University of Maryland, Sep. 2012. Motivation. Sleator & Tarjan introduced the splay tree almost 30 years ago. Self-adjusts to access distribution
A Self-adjusting Data Structure for Multi-dimensional Point Sets. Eunhui Park & David M. Mount, University of Maryland, Sep. 2012
Motivation • Sleator & Tarjan introduced the splay tree almost 30 years ago. • Self-adjusts to access distribution • Supports insertion and deletion in O(log n) amortized time • Efficient access: • Balance property – m accesses in O((m+n) log n) time • Scanning property [Elmasry 2004] – access all items in O(n) time • Working set property – … on temporal locality • Static optimality property – Efficient access based on frequency • Static & dynamic finger [Cole 2000] properties – … on spatial locality • Is there a multi-dimensional generalization?
Background • Compressed Quadtree • Hierarchical partition of space • O(n) space • O(log n) access time if augmented: • Topology tree [Frederickson 1985, Har-Peled 2005] • Skip quadtree [Eppstein, Goodrich, Sun 2005] • Quadtreap [Mount, Park 2010] based on treap [Seidel, Aragon 1996] • Efficient approximate proximity queries • Approximate nearest neighbor search • Approximate range search
Objective • Like quadtrees: • A versatile geometric partition tree • Supports efficient approximate proximity queries • Like splay trees: • Adjusts to access distribution • Supports insertion/deletion in O(log n) amortized time • Supports splay tree access properties: balance, static optimality, working set, static finger • Quadtree + Splay tree → Splay Quadtree
Overview • BD-tree: BD-tree, Rotation • Splaying operation: Basic splaying, Splaying • Efficiency: Insertion/deletion, Search and access efficiency
BD-tree • Box Decomposition tree (BD-tree): A geometric data structure based on a hierarchical decomposition of space into d-dimensional axis-aligned rectangles • Each node is associated with a region of space called a cell. • Each cell is defined by an outer box and an optional inner box. • Partition operations: split and shrink. • Internal nodes: split nodes and shrink nodes. • Each leaf has a single point or a single inner box. [Figure labels: box, cell, leaves]
BD-tree: Partitioning Operations • Split: Partitions a cell by an axis-orthogonal hyperplane that bisects the cell's longest side. • Shrink: Partitions a cell by a shrinking box, which lies within the cell. [Figure: a split of cell C into left and right children (D, E); a shrink of cell C into an inner cell F and an outer cell C\F]
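The two partitioning operations can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `Box`, `Cell`, `split`, and `shrink` are hypothetical, boxes are treated as closed, ties for the longest side go to the lowest axis, and `split` ignores any inner box for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box given by its lower and upper corners (hypothetical type)."""
    lo: Tuple[float, ...]
    hi: Tuple[float, ...]

    def contains_point(self, p: Sequence[float]) -> bool:
        return all(l <= x <= h for l, x, h in zip(self.lo, p, self.hi))

    def contains_box(self, other: "Box") -> bool:
        return (all(a <= b for a, b in zip(self.lo, other.lo))
                and all(b <= a for a, b in zip(self.hi, other.hi)))


@dataclass(frozen=True)
class Cell:
    """A cell is an outer box minus an optional inner box."""
    outer: Box
    inner: Optional[Box] = None

    def contains_point(self, p: Sequence[float]) -> bool:
        if not self.outer.contains_point(p):
            return False
        return self.inner is None or not self.inner.contains_point(p)


def split(box: Box) -> Tuple[Box, Box]:
    """Bisect the box with a hyperplane orthogonal to its longest side."""
    sides = [h - l for l, h in zip(box.lo, box.hi)]
    axis = sides.index(max(sides))  # assumption: lowest axis wins ties
    mid = (box.lo[axis] + box.hi[axis]) / 2
    left = Box(box.lo, box.hi[:axis] + (mid,) + box.hi[axis + 1:])
    right = Box(box.lo[:axis] + (mid,) + box.lo[axis + 1:], box.hi)
    return left, right


def shrink(cell: Cell, box: Box) -> Tuple[Cell, Cell]:
    """Partition a cell by a shrinking box; returns (inner cell, outer cell)."""
    if not cell.outer.contains_box(box):
        raise ValueError("shrinking box must lie within the cell's outer box")
    if cell.inner is not None and not box.contains_box(cell.inner):
        raise ValueError("shrinking box must enclose the existing inner box")
    # The inner cell inherits any existing inner box; the outer cell's
    # inner box becomes the shrinking box, so each cell has at most one.
    return Cell(box, cell.inner), Cell(cell.outer, box)
```

For example, splitting the box with corners (0, 0) and (4, 2) bisects its first coordinate at 2, and shrinking by a box F yields the cells F and C\F from the figure.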
BD-tree: Promotion • By construction, nodes are generated in shrink-split pairs. We merge each pair into a single ternary node, called a pseudo-node. • The tree can be restructured through a local operation, called promotion. [Figure: a shrink node (outer, inner) and a split node (left, right) merged into a pseudo-node (left, right, outer); a promotion involving nodes x and y and subtrees A–E]
Splay Quadtree • Given an internal node x, splay(x) uses promotions to move x to the root of the tree • This makes future accesses to x more efficient [Figure: a tree with nodes b, c, d, e, f, g, x before and after splay(x)]
Basic Splaying • As in Sleator & Tarjan, splaying is based on primitive operations: • Zig-zag • Zig-zig [Figure: zig-zag and zig-zig on nodes x, y, z with subtrees A–G]
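For orientation, here is a minimal sketch of the classical one-dimensional zig-zig and zig-zag cases on an ordinary binary search tree, in the style of Sleator & Tarjan. It is not the splay quadtree operation from the slides, which works with promotions on ternary pseudo-nodes; the names `Node`, `bst_insert`, `splay`, and `inorder` are illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def rotate_right(x):
    y = x.left
    x.left = y.right
    y.right = x
    return y


def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    return y


def splay(root, key):
    """Bring the node with `key` (or the last node on its search path) to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:            # zig-zig: rotate grandparent first
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:          # zig-zag: rotate parent first
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)
    else:
        if root.right is None:
            return root
        if key > root.right.key:           # zig-zig (mirror case)
            root.right.right = splay(root.right.right, key)
            root = rotate_left(root)
        elif key < root.right.key:         # zig-zag (mirror case)
            root.right.left = splay(root.right.left, key)
            if root.right.left is not None:
                root.right = rotate_right(root.right)
        return root if root.right is None else rotate_left(root)


def bst_insert(root, key):
    """Plain (non-splaying) BST insertion, used only to build test trees."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    elif key > root.key:
        root.right = bst_insert(root.right, key)
    return root


def inorder(root):
    return [] if root is None else inorder(root.left) + [root.key] + inorder(root.right)
```

Splaying changes only the shape of the tree: the accessed key ends at the root while the in-order sequence of keys is unchanged.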
The Problem of Right Promotion • Inner-left convention: • If an internal node's cell has an inner box, it resides in its left child • If necessary, left and right children are relabeled to satisfy this • This guarantees that each cell has constant complexity • Right promotion may violate this convention [Figure: a right promotion on nodes x and y with subtrees A–E. If the marked cell has an inner box u, then after the promotion y's cell has two inner boxes, u and v.]
Splaying in 3 Phases • Promotions must be carefully structured to avoid this problem • 3-phase approach (3 passes from bottom to top) • As in Sleator & Tarjan, amortized efficiency is established by a potential-based analysis. [Figure: the three passes on a path of nodes a–g, with children labeled L, R, and O]
Insertion and deletion • Insert(q): locate the leaf x containing q, add q as a new leaf, then splay(x) • Insertion can be performed in O(log n) amortized time. • Deletion can be performed in O(log n) amortized time. [Figure: inserting point q into leaf x]
Analogous to Splay Trees • Balance Theorem: Total access for q_1, q_2, …, q_m takes O((m+n) log n) time. • Working Set Theorem: For each access q_j, let t_j be the number of different queries since the last access of q_j, or since the beginning if this is q_j's first access. Total m access queries take O(). • Static Optimality Theorem: Given a quadtree subdivision Z, where each cell z ∈ Z has an access probability p_z, the entropy of Z is defined as H(Z) = Σ_{z ∈ Z} p_z log(1/p_z). Total m access queries take O().
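The two quantities defined in these theorems can be computed directly. The sketch below rests on assumptions not fixed by the slide: t_j counts distinct queries strictly between the previous access of q_j and the current one (so q_j itself is excluded), the entropy is the standard Shannon entropy Σ p_z log(1/p_z), and logarithms are base 2. The function names are illustrative.

```python
import math
from typing import Hashable, List, Sequence


def working_set_counts(queries: Sequence[Hashable]) -> List[int]:
    """t_j = number of distinct queries since the last access of q_j,
    or since the beginning if this is the first access of q_j."""
    last_seen = {}
    counts = []
    for j, q in enumerate(queries):
        start = last_seen.get(q, -1) + 1   # first index after the previous access
        counts.append(len(set(queries[start:j])))
        last_seen[q] = j
    return counts


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability cells contribute nothing."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)
```

A query sequence with strong temporal locality gives small t_j values, and a skewed access distribution over the cells gives low entropy; these are the regimes in which the corresponding theorems promise faster access than the balance bound.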
Static Finger Theorem • 1-dim (Sleator & Tarjan 83): Total access for i_1, i_2, …, i_m takes O(m). • d-dim: • For a single point … • But most geometric queries involve regions, not points • For technical reasons, need to expand … • Consider an expanded ball; let … • Define the working set to be the set of points within distance … from … • Total access for approximate range queries: (1/ε)^(d-1) • ANN queries • Box queries [Figure: the set of points in the expanded ball]
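For reference, the classical one-dimensional static finger theorem of Sleator & Tarjan is usually stated in full as follows. Here n is the number of items in the tree and f is any fixed item (the finger); both symbols are named for this statement only and are not taken from the slide. The total time for the access sequence i_1, …, i_m is:

```latex
O\!\left( n \log n \;+\; m \;+\; \sum_{j=1}^{m} \log\bigl( \lvert i_j - f \rvert + 1 \bigr) \right)
```

The sum charges each access the logarithm of its distance in key order from the finger, so sequences that stay near f are cheap. The d-dimensional version on this slide replaces that distance with a working set defined through an expanded ball.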
Conclusions • Splay Quadtree: • Self-adjusting geometric data structure • Supports insertion/deletion in O(log n) amortized time • Supports efficient approximate proximity queries • Open problems: • Other properties of standard splay trees? • Dynamic finger theorem • Scanning theorem • Better notions of distance (or generally locality) in a geometric setting?