I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries
Second Year Project Presentation
Ke Yi
Advisor: Lars Arge
Committee: Pankaj K. Agarwal and Jun Yang
Problem Definition: Range Max Queries
• Range-aggregate queries: range-count, range-sum, range-max
• $N$ points in $\mathbb{R}^d$
• Each point $p$ is associated with a weight $w(p)$
• Query rectangle $Q$
• Compute $\max\{w(p) \mid p \in Q\}$
• Static and dynamic
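To make the problem statement concrete, here is a minimal brute-force sketch of a range max query. It is only a linear-scan baseline for checking answers, not the I/O-efficient structure from these slides; the function name `range_max` and the representation of points as `(coordinates, weight)` pairs are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation: a point is (coordinates, weight).
Point = Tuple[Sequence[float], float]


def range_max(points: Sequence[Point],
              lo: Sequence[float],
              hi: Sequence[float]) -> Optional[float]:
    """Return max{w(p) | p in Q} for the box Q = [lo, hi], or None if Q is empty.

    Linear scan over all N points; works in any dimension d.
    """
    best = None
    for coords, weight in points:
        inside = all(l <= c <= h for l, c, h in zip(lo, coords, hi))
        if inside and (best is None or weight > best):
            best = weight
    return best


points = [((1, 1), 5), ((2, 3), 9), ((4, 4), 7)]
print(range_max(points, (0, 0), (2, 3)))
```

A scan like this touches every block of the input, which is the cost the structures in the following slides are designed to avoid.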
Problem Definition: Stabbing Max Queries
• $N$ hyper-rectangles in $\mathbb{R}^d$
• Each rectangle $\gamma$ is associated with a weight $w(\gamma)$
• Query point $q$
• Compute $\max\{w(\gamma) \mid q \in \gamma\}$
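The stabbing max definition can be illustrated the same way. The sketch below is a brute-force baseline under assumed names (`stabbing_max`, rectangles given as `(lo, hi, weight)` triples with closed boundaries); it is not the structure developed in the slides.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation: a rectangle is (lo corner, hi corner, weight).
Rect = Tuple[Sequence[float], Sequence[float], float]


def stabbing_max(rects: Sequence[Rect],
                 q: Sequence[float]) -> Optional[float]:
    """Return max{w(r) | q in r}, or None if no rectangle contains q."""
    best = None
    for lo, hi, weight in rects:
        stabbed = all(l <= x <= h for l, x, h in zip(lo, q, hi))
        if stabbed and (best is None or weight > best):
            best = weight
    return best


rects = [((0, 0), (4, 4), 3), ((1, 1), (2, 2), 8), ((3, 0), (5, 1), 6)]
print(stabbing_max(rects, (1.5, 1.5)))
```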
Model
• I/O Model
  • $N$: elements in structure
  • $B$: elements per block
  • $M$: elements in main memory
  • $n = N/B$
• Assumptions
  • $M > B^2$
  • Each word holds $\log_2 N$ bits
  • Any coordinate or weight can be stored in one word
[Figure: processor P, main memory M, and disk D, connected by block I/O]
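As a rough illustration of how the model parameters relate, the following sketch computes $n = N/B$ (rounded up to whole blocks), the quantity $\log_B n$ that appears in the bounds, and checks the assumption $M > B^2$. The function name `io_params` and the sample values are hypothetical and not taken from the slides.

```python
import math


def io_params(N: int, B: int, M: int) -> dict:
    """Derive the quantities used in the I/O-model bounds.

    N: number of elements, B: elements per block, M: elements in main memory.
    """
    n = -(-N // B)  # number of blocks, i.e. ceil(N / B) in integer arithmetic
    return {
        "n": n,
        "log_B_n": math.log(n, B),       # height scale of a B-tree on n blocks
        "memory_assumption": M > B * B,  # the M > B^2 assumption
    }


# Hypothetical example: 2^20 elements, 2^10 elements per block, 2^21 in memory.
print(io_params(N=2**20, B=2**10, M=2**21))
```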
Related Work & Our Results: Range Queries
• 1D range queries are easy: B-tree
  • $O(n)$ space, $O(\log_B n)$ query & update
• 2D range queries:
  • Poly-logarithmic query: CRB-tree [AAG03]
    • $O(n \log_B n)$ space, $O(\log_B^2 n)$ query
  • Linear space: kdB-tree, cross-tree, O-tree
    • query, $O(\log_B n)$ update
• Our results:
Related Work & Our Results: Stabbing Queries
• 1D stabbing queries
  • SB-tree [YW01]
    • $O(n)$ space, $O(\log_B n)$ query & insert
    • Does not allow deletions!
• 2D stabbing queries
  • No structures with worst-case guarantee
• Our results:
2D Range Max Queries
• The external version of Chazelle’s structure [C88]
  • Linear space
  • Static: $O(\log^{1+\varepsilon} N)$ query
  • Dynamic: $O(\log^3 N \log\log N)$ query & update
• Overall structure
  • A normal B-tree $\Phi$ on the y-coordinates of all the points
  • A fan-out base B-tree $T$ on the x-coordinates
  • $P_v$: all points stored in the subtree of $v$
  • Each internal node $v$ stores two secondary structures $C_v$ and $M_v$, which store information about $P_v$ in a compressed manner
  • $C_v$ and $M_v$ are each of size $O(|P_v| / \log_B n)$ → linear size in total
  • Weights of points are stored explicitly at the leaves
2D Range Max Queries
• $C_v$ is borrowed from the CRB-tree
  • Compute the ranks of the points one level down in $O(1)$ I/Os
  • Identify the weight of a point explicitly in $O(\log_B n)$ I/Os
• $M_v$ computes the maximum weight in a multislab in $O(\log_B n)$ I/Os
• Answering a query:
  • Use $\Phi$ to compute the ranks in the root of $T$
  • Use $M_v$ to compute the maximum at each level
  • For a total of $O(\log_B^2 n)$ I/Os
[Figure: node $v$ with children $v_1, \dots, v_6$]
2D Range Max Queries: $M_v$
• Divide $P_v$ into chunks of size $B \log_B N$
• Divide each chunk into minichunks of size $B$
• Three-level structure
  • $M_v = (\Psi_1, \Psi_2, \Psi_3)$
  • each of size $O(|P_v| / \log_B n)$
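The two-level partition into chunks and minichunks can be sketched as follows. This is only an illustration of the grouping: it assumes the points of $P_v$ are already in whatever order the structure uses, rounds $\log_B N$ up to an integer, and uses the hypothetical name `partition_into_chunks`.

```python
from typing import List, Sequence


def ceil_log(N: int, B: int) -> int:
    """Smallest integer k with B**k >= N (integer arithmetic, no float error)."""
    k, power = 0, 1
    while power < N:
        power *= B
        k += 1
    return max(k, 1)


def partition_into_chunks(points: Sequence, B: int, N: int) -> List[List[list]]:
    """Group points into minichunks of size B, then chunks of about log_B N minichunks.

    Each chunk therefore holds about B * log_B N points.
    """
    per_chunk = ceil_log(N, B)  # minichunks per chunk
    minichunks = [list(points[i:i + B]) for i in range(0, len(points), B)]
    return [minichunks[i:i + per_chunk]
            for i in range(0, len(minichunks), per_chunk)]


chunks = partition_into_chunks(list(range(64)), B=4, N=64)
print(len(chunks), [len(c) for c in chunks])
```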
2D Range Max Queries: $M_v$
• Basic idea: encode the range max information in a compressed manner, and identify the maximum point using $C_v$ once its rank is found
• $\Psi_3[l]$: for each minichunk, stores a (slab index, weight rank) pair for each point inside the minichunk
  • Find the rank of the maximum-weight point in $O(1)$ I/Os
  • Identify it in $O(\log_B N)$ I/Os
• $\Psi_2[k]$: for each chunk, encode a Cartesian tree on the $O(\log_B N)$ minichunks for each of the $O(B)$ multislabs
  • Find the minichunk containing the maximum-weight point in $O(1)$ I/Os
  • Use $\Psi_3$ to find the exact point in $O(\log_B N)$ I/Os
• $\Psi_1$: a fanout B-tree on the $O(|P_v| / (B \log_B n))$ chunks
  • Find the maximum-weight point in $O(\log_B N)$ I/Os
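The role of the Cartesian tree in $\Psi_2$ can be illustrated with a small in-memory sketch. It builds a max-Cartesian tree over an array of values (think of one value per minichunk for a fixed multislab) and answers a range max query by descending from the root. The slides describe a compact encoding of such trees; this sketch uses plain arrays instead, and the names `build_cartesian_tree` and `range_max_index` are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_cartesian_tree(values: Sequence[float]) -> Tuple[int, List[int], List[int]]:
    """Build a max-Cartesian tree in O(len(values)) time with a stack.

    Returns (root, left, right), where left[i] / right[i] are child indices or -1.
    """
    n = len(values)
    left, right = [-1] * n, [-1] * n
    stack: List[int] = []  # indices on the rightmost path, values non-increasing
    for i in range(n):
        last = -1
        while stack and values[stack[-1]] < values[i]:
            last = stack.pop()
        left[i] = last              # popped subtree becomes the left child of i
        if stack:
            right[stack[-1]] = i    # i becomes the right child of the stack top
        stack.append(i)
    root = stack[0] if stack else -1
    return root, left, right


def range_max_index(root: int, left: List[int], right: List[int],
                    i: int, j: int) -> int:
    """Index of a maximum value in positions i..j (inclusive, 0 <= i <= j < n).

    The first node on the path from the root whose index lies in [i, j]
    is the maximum of that range.
    """
    node = root
    while not (i <= node <= j):
        node = left[node] if j < node else right[node]
    return node


values = [3, 1, 4, 1, 5]
root, left, right = build_cartesian_tree(values)
print(range_max_index(root, left, right, 1, 3))
```

Because the answer depends only on the shape of the tree and not on the values themselves, the tree can be stored without the weights, which is what makes a compressed encoding possible.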
2D Range Max Queries
• Static structures
  • $O(n)$ size, $O(\log_B^2 N)$ query, $O(n \log_B N)$ construction
  • $O(n)$ size, $O(\log_B^{1+\varepsilon} N)$ query, $O(N \log_B N)$ construction
• Dynamization:
  • Throw away $\Psi_2$ and expand $\Psi_3$
  • $O(n \log_B \log_B N)$ size
  • $O(\log_B^3 N)$ query, worst case
  • $O(\log_B^2 N \cdot \log_{M/B} \log_B N)$ insert, amortized
  • $O(\log_B^2 N)$ delete, amortized
• Extending to $d$ dimensions
  • Standard technique
  • Pay an extra $O(\log_B^{d-2} N)$ factor on all these bounds
1D Stabbing Max Queries
• Modify the external interval tree [AV96] to support max
  • Fan-out base B-tree on x-coordinates
  • Each interval is stored in the highest node $v$ where it contains a slab boundary
  • It is stored in one left (right) slab structure and in the multislab structure
• Answering a query
  • Search down the tree and visit $O(\log_B N)$ nodes
  • Compute the maximum weight in the left (right) slab structure and the multislab structure
1D Stabbing Max Queries
• Slab structures are implemented using B-trees
  • Query and update: $O(\log_B N)$ I/Os
• Multislab structure: fan-out B-tree
  • At each internal node, we store the maximum weight for each of the slabs and for each of the children
  • Query: $O(1)$ I/Os (only look at the root)
  • Update: $O(\log_B N)$ I/Os
• Rebalancing the base tree: $O(\log_B N)$ I/Os
  • Weight-balanced B-trees
• Overall cost: size $O(n)$, query $O(\log_B^2 N)$, update $O(\log_B N)$
1D Stabbing Max Queries
• Space-time tradeoff:
  • $O(n \log_B^{\varepsilon} N)$ size
  • $O(\log_B^{2-\varepsilon} N)$ query
• Can handle general semigroup queries
  • A semigroup $(S, +)$
  • Each weight $w(\gamma) \in S$
  • Want to compute $\sum_{q \in \gamma} w(\gamma)$
• Ideas can also be used to improve the internal memory algorithm
  • Linear size, $O(\log^2 N / \log\log N)$ query and update
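To show what the semigroup generalization means, here is a brute-force sketch in which the aggregation operator is a parameter: passing `max` gives stabbing max, and passing addition gives a stabbing sum. It is a linear-scan illustration only; the name `stabbing_semigroup` and the `(left, right, weight)` interval format are assumptions. Since a semigroup need not have an identity element, the sketch returns `None` when no interval is stabbed.

```python
from functools import reduce
from operator import add
from typing import Callable, Optional, Sequence, Tuple, TypeVar

S = TypeVar("S")

# Hypothetical representation: a closed interval [left, right] with a weight in S.
Interval = Tuple[float, float, S]


def stabbing_semigroup(intervals: Sequence[Interval],
                       q: float,
                       op: Callable[[S, S], S]) -> Optional[S]:
    """Combine, with the associative operator `op`, the weights of all intervals containing q."""
    weights = [w for left, right, w in intervals if left <= q <= right]
    if not weights:
        return None  # no identity element is assumed for the semigroup
    return reduce(op, weights)


intervals = [(0, 10, 3), (2, 5, 8), (4, 9, 6)]
print(stabbing_semigroup(intervals, 4.5, max))
print(stabbing_semigroup(intervals, 4.5, add))
```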
2D Stabbing Max Queries
• Extend our 1D stabbing query structure
• Use our 2D range query structure as a building block
• Extending to $d$ dimensions
  • Standard technique
  • Pay an extra $O(\log_B^{d-2} N)$ factor on all these bounds
Conclusions and Open Problems
• In this project, we developed I/O-efficient
  • linear-space structures with poly-logarithmic query cost for static 2D range max queries
  • near-linear-space structures with poly-logarithmic query & update cost for dynamic 2D range max queries
  • linear-space structures with poly-logarithmic query cost for dynamic 1D stabbing max queries
  • near-linear-space structures with poly-logarithmic query & update cost for dynamic 2D stabbing max queries
• Open problems
  • Linear-size dynamic structures for 2D range & stabbing max queries?
  • General semigroup queries?
THE END Thank you!