Computational Geometry: Chapter 12
Motivation: Range Trees • e.g., Database • Database record: employee, age, salary, start date, city, address • Suppose you want to know all employees with Salary ∈ [40K, 50K] • Scan records & pick out those in range: *slow*
Motivation • e.g., Database • Database record: employee, age, salary, start date, city, address • Suppose you want to know all employees with Salary ∈ [40K, 50K] AND Age ∈ [25, 40] • Scan records & check each one: *slow*
Motivation (cont.) • Alternative to a scan: each employee is a point in space • age range [15, 75] • salary range [0K, 500K] • start date [1/1/1900, today] • city/address [College Station, Bryan, Austin, …] • 4D • may just keep as data w/ records • could encode • could use categories (of a row)
Motivation (cont.): Orthogonal (Rectangular) Range Query • Want all points in the orthogonal range • Faster than a linear scan if good data structures are used • Want time O(f(n) + k), where k = # of points reported [Figure: Query #2 (age 25-40, salary 40K-50K) drawn as a rectangle; axes: age vs. salary]
1-D Range Searching • Data: points P = {p1, p2, …, pn} in 1-D space (a set of real numbers) • Query: which points lie in the 1-D query rectangle (the interval [x, x’])? Data structure 1: Sorted Array • A = [3, 9, 27, 28, 29, 98, 141, 187, 200, 201, 202, 999] • Query: search for x & x’ in A by binary search, O(log n); output all points between them, O(k); total O(log n + k) • Update: hard to insert points. To add a point p’, locate it in A by binary search, then shift elements in A to make room: O(n) on average • Storage cost: O(n) • Construction cost: O(n log n)
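A minimal Python sketch of Data structure 1, assuming the points are kept in a plain sorted list: the standard-library `bisect` functions perform the two O(log n) binary searches, and the slice copies the k reported points. The name `range_query_sorted` is illustrative, not the book's.

```python
from bisect import bisect_left, bisect_right
from typing import List


def range_query_sorted(a: List[float], x_lo: float, x_hi: float) -> List[float]:
    """Return all values v in the sorted list a with x_lo <= v <= x_hi."""
    lo = bisect_left(a, x_lo)    # first index with a[i] >= x_lo  (O(log n))
    hi = bisect_right(a, x_hi)   # first index with a[i] >  x_hi  (O(log n))
    return a[lo:hi]              # O(k) to copy the k reported points


A = [3, 9, 27, 28, 29, 98, 141, 187, 200, 201, 202, 999]
print(range_query_sorted(A, 25, 150))   # [27, 28, 29, 98, 141]
```

Inserting with `bisect.insort` would find the position in O(log n) but still shift up to n elements, which is the O(n) update cost of the sorted array.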
Evaluation of sorted array (cont.) • Update: hard to insert. We find the point of insertion in O(log n) (via binary search), but then have to shift elements. • Storage cost: O(n) • Construction cost: just sorting, O(n log n)
1-D Range Searching (cont.) Data structure 2: Balanced Binary Search Tree • Leaves store the points in P (in order, left to right) • Internal nodes store splitting values; v.x is used to guide the search • Left subtree of v contains all values ≤ v.x • Right subtree of v contains all values > v.x • Query: [x, x’] • Locate x & x’ in T (the searches end at leaves u & u’) • The points we want are located in leaves • in between u & u’ • possibly in u (if x = u.x) • possibly in u’ (if x’ = u’.x) • i.e., in the leaves of the subtrees rooted at nodes v that lie between the two search paths and whose parent is on the search path from the root to u (or from the root to u’)
1-D Range Searching (cont.) [Figure: balanced 1-D range tree with root 49; its leaves store the points 3 to 105] • Look for the node v_split where the search paths for x & x’ split • Below v_split, on the search path for x: wherever the path goes left, report all values in the right subtree • Below v_split, on the search path for x’: wherever the path goes right, report all values in the left subtree • Example query: [18:77]
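A sketch of the split-node query in Python, assuming distinct points stored only at the leaves and each internal node keyed by the largest value in its left subtree (so left ≤ key < right, as in Data structure 2). `Node`, `build`, `report_subtree`, and `range_query` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: float                          # point (leaf) or splitting value (internal)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def build(points: List[float]) -> Node:
    """Balanced tree whose leaves hold the sorted, distinct points."""
    if len(points) == 1:
        return Node(points[0])
    mid = (len(points) + 1) // 2
    # Splitting value = largest point in the left half: left <= key < right.
    return Node(points[mid - 1], build(points[:mid]), build(points[mid:]))


def report_subtree(v: Node, out: List[float]) -> None:
    if v.is_leaf():
        out.append(v.key)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)


def range_query(root: Node, lo: float, hi: float) -> List[float]:
    out: List[float] = []
    v = root
    # Find v_split, the node where the search paths for lo and hi diverge.
    while not v.is_leaf() and (hi <= v.key or lo > v.key):
        v = v.left if hi <= v.key else v.right
    split = v
    if split.is_leaf():
        if lo <= split.key <= hi:
            out.append(split.key)
        return out
    # Path to lo: wherever it goes left, the right subtree lies inside [lo, hi].
    v = split.left
    while not v.is_leaf():
        if lo <= v.key:
            report_subtree(v.right, out)
            v = v.left
        else:
            v = v.right
    if lo <= v.key <= hi:
        out.append(v.key)
    # Path to hi: wherever it goes right, the left subtree lies inside [lo, hi].
    v = split.right
    while not v.is_leaf():
        if hi >= v.key:
            report_subtree(v.left, out)
            v = v.right
        else:
            v = v.left
    if lo <= v.key <= hi:
        out.append(v.key)
    return out


pts = [3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 100, 105]
root = build(pts)
print(sorted(range_query(root, 18, 77)))   # [19, 23, 30, 37, 49, 59, 62, 70]
```

The points come out grouped by subtree rather than in sorted order, hence the `sorted` in the usage line.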
1-D Range Searching (cont.) • Update cost: O(log n) • Storage cost: O(n) • Construction cost: O(n log n) [Figure: 1-D range tree with the search paths to leaves u and u’]
Algorithm: 1DRangeQuery(x, x’, v) • Input: range tree rooted at v (every node stores a point v.x), and x ≤ x’ • Output: all points in the range [x, x’] • if v is null, return {} • if (x ≤ v.x ≤ x’) • L = 1DRangeQuery(x, x’, v.left) • R = 1DRangeQuery(x, x’, v.right) • return (L + v + R) • if v.x < x, return 1DRangeQuery(x, x’, v.right) • return 1DRangeQuery(x, x’, v.left)
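The recursive algorithm translates almost line for line into Python. This sketch assumes a BST in which every node stores one point (which is what `return (L + v + R)` implies) and uses inclusive comparisons so that points equal to x or x’ are reported. Collecting into a single list, rather than concatenating, keeps the extra work proportional to the output. The names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BSTNode:
    x: float                            # every node stores one point
    left: Optional["BSTNode"] = None
    right: Optional["BSTNode"] = None


def build_balanced(xs: List[float]) -> Optional[BSTNode]:
    """Balanced BST from a sorted list (middle element becomes the root)."""
    if not xs:
        return None
    mid = len(xs) // 2
    return BSTNode(xs[mid], build_balanced(xs[:mid]), build_balanced(xs[mid + 1:]))


def range_query_1d(x_lo: float, x_hi: float, v: Optional[BSTNode],
                   out: Optional[List[float]] = None) -> List[float]:
    """All points in [x_lo, x_hi] below v, in sorted order (x_lo <= x_hi)."""
    if out is None:
        out = []
    if v is None:
        return out
    if x_lo <= v.x <= x_hi:             # inclusive, so endpoints are reported
        range_query_1d(x_lo, x_hi, v.left, out)
        out.append(v.x)
        range_query_1d(x_lo, x_hi, v.right, out)
    elif v.x < x_lo:                     # everything on the left is too small
        range_query_1d(x_lo, x_hi, v.right, out)
    else:                                # v.x > x_hi: everything on the right is too big
        range_query_1d(x_lo, x_hi, v.left, out)
    return out


root = build_balanced([3, 9, 27, 28, 29, 98, 141, 187, 200, 201, 202, 999])
print(range_query_1d(28, 200, root))   # [28, 29, 98, 141, 187, 200]
```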
Time Complexity of 1D Range Query • Balanced binary search tree • O(n) storage • O(n log n) construction time • Query time • Time spent traversing the root-to-u and root-to-u’ paths: O(log n) • Time spent in ReportSubtree • Worst case Θ(n), since we may need to report all points if they all fall in the query range • A finer analysis shows that the total time spent in ReportSubtree is proportional to the number of points reported: O(s) if s points are reported • Total time: O(s + log n), where s is the number of points reported
Range Trees: 2D Queries [Figure: first-level tree Tx, a node v with its point set P(v), and the second-level tree Ty(v)] • For each internal node v ∈ Tx, let P(v) be the set of points stored in the leaves of the subtree rooted at v. The set P(v) is stored with v as another balanced binary search tree Ty(v) (a second-level tree) on the y-coordinate (with a pointer from v to Ty(v)).
Range Trees: Build2DRangeTree(P) • Input: a set P of points in the plane • Output: the root of a 2D range tree • 1. Construct the second-level tree Ty for P (store entire points at the leaves) • 2. if (|P| = 1) • 3. then create a leaf v; Ty(v) := Ty • 4. else split P into Pleft and Pright, parts of equal size, by x-coordinate around xmid • 5. vleft := Build2DRangeTree(Pleft) • 6. vright := Build2DRangeTree(Pright) • 7. create a new node v such that xv := xmid, LeftChild(v) := vleft, RightChild(v) := vright, Ty(v) := Ty • 8. return v • end /* Build2DRangeTree */
Range Trees (revised): Build2DRangeTree(P) • Input: a set P of points in the plane, sorted by x • Output: the root of a 2D range tree, and the set P sorted by y • 1. if (|P| = 1) • 2. then create a leaf v; Yall := P; Ty(v) := buildTree(Yall) • 3. else split P into Pleft and Pright, parts of equal size, by x-coordinate around xmid • 4. (vleft, Yleft) := Build2DRangeTree(Pleft) • 5. (vright, Yright) := Build2DRangeTree(Pright) • 6. create a new node v such that xv := xmid, LeftChild(v) := vleft, RightChild(v) := vright; Yall := merge(Yleft, Yright); Ty(v) := buildTree(Yall) • 7. return (v, Yall) • end /* Build2DRangeTree */
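A Python sketch of the revised construction, assuming distinct x-coordinates and using a y-sorted list as a stand-in for each second-level tree Ty(v) (a sorted array answers the same 1D queries). `heapq.merge` performs the linear-time merge of Yleft and Yright; `Node2D`, `build_2d`, and `storage` are hypothetical names.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node2D:
    xmid: float
    left: Optional["Node2D"]
    right: Optional["Node2D"]
    ty: List[Point]                     # stand-in for Ty(v): P(v) sorted by y


def build_2d(p_by_x: List[Point]) -> Tuple[Node2D, List[Point]]:
    """Revised Build2DRangeTree: input sorted by x, returns (v, Yall)."""
    if len(p_by_x) == 1:
        leaf = Node2D(p_by_x[0][0], None, None, list(p_by_x))
        return leaf, leaf.ty
    mid = (len(p_by_x) + 1) // 2
    v_left, y_left = build_2d(p_by_x[:mid])
    v_right, y_right = build_2d(p_by_x[mid:])
    # Linear-time merge of two y-sorted lists (no re-sorting at this node).
    y_all = list(heapq.merge(y_left, y_right, key=lambda p: p[1]))
    v = Node2D(p_by_x[mid - 1][0], v_left, v_right, y_all)
    return v, y_all


def storage(v: Optional[Node2D]) -> int:
    """Total number of point copies kept in all second-level structures."""
    return 0 if v is None else len(v.ty) + storage(v.left) + storage(v.right)


pts = [(2, 19), (7, 10), (12, 3), (17, 62), (21, 49), (41, 95), (58, 59), (93, 70)]
root, ys = build_2d(sorted(pts))
print(ys == sorted(pts, key=lambda p: p[1]), storage(root))   # True 32
```

For these 8 points every point is copied into log2 8 + 1 = 4 second-level lists, so `storage` returns 32, in line with the O(n log n) storage bound.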
Range Trees • A 2D range tree on n points uses O(n log n) storage. • Proof: • Consider a point p ∈ P. • p is stored in Ty(v) for every node v ∈ Tx such that p is a leaf of the subtree of Tx rooted at v. • There are O(log n) such subtrees: those rooted at the nodes on the path from the root of Tx to p (Tx has height O(log n)). • So each point is stored O(log n) times. • n points require O(n log n) storage in total.
Construction Time for a 2D Range Tree • A naive implementation of step 1 takes O(n log n) time (points unsorted) • But if the points are already sorted by y-coordinate, we can build a 1D binary search tree in O(n) time (bottom up) • Pre-sort the points by x- and y-coordinate (two lists) • Build the trees bottom up and merge the sorted lists • Construction of tree Ty takes O(n’) time (n’ = number of points in Ty) • Total construction time = O(n log n) (the sizes n’ sum to n on each of the O(log n) levels of Tx)
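To illustrate the O(n) bottom-up step, here is a sketch that builds a leaf-based search tree from keys that are already sorted, pairing neighbouring subtrees level by level. The names are illustrative; each internal node stores the largest key of its left subtree as its splitting value, and the tree has height ⌈log2 n⌉.

```python
from typing import List, Optional


class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def build_bottom_up(sorted_keys: List[float]) -> Optional[Node]:
    """Leaf-based search tree from keys that are already sorted.

    Each round pairs up neighbouring subtrees, so the work is
    n + n/2 + n/4 + ... = O(n).  An internal node's key is the largest key
    in its left subtree, i.e. left <= key < right for distinct keys.
    """
    if not sorted_keys:
        return None
    level = [(Node(k), k) for k in sorted_keys]   # (subtree root, max key in it)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            (left_t, left_max), (right_t, right_max) = level[i], level[i + 1]
            nxt.append((Node(left_max, left_t, right_t), right_max))
        if len(level) % 2 == 1:
            nxt.append(level[-1])                 # odd subtree moves up unchanged
        level = nxt
    return level[0][0]


def leaves(v: Node) -> List[float]:
    """Keys at the leaves, left to right (used only to check the result)."""
    if v.left is None:
        return [v.key]
    return leaves(v.left) + leaves(v.right)


keys = [3, 10, 19, 23, 30]
root = build_bottom_up(keys)
print(leaves(root), root.key)   # [3, 10, 19, 23, 30] 23
```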
Queries in 2D Range Trees • First determine the O(log n) subtrees to search (those whose points all have x-coordinates in the range; don’t visit the children of such a node) • Search each such subtree’s Ty for the points in the y-coordinate range. • Both steps use the 1D search algorithm, so the algorithm is identical to 1D Range Query (on x-coordinates) except that calls to ReportSubtree are replaced by 1D Range Query (on y-coordinates). • Lemma: A query with an axis-parallel rectangle in a range tree storing n points takes O(log² n + k) time, where k = # reported points.
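A self-contained Python sketch of the query, under simplifying assumptions: distinct x-coordinates, and each Ty(v) represented by a y-sorted list searched with `bisect` instead of a balanced tree. The split-node walk is the 1D query on x; each would-be ReportSubtree call becomes `query_y`. The constructor sorts at every node for brevity (O(n log² n) construction); merging presorted lists gives O(n log n). All names are illustrative.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Point = Tuple[float, float]


class RTNode:
    def __init__(self, xkey, left, right, pts):
        self.xkey, self.left, self.right = xkey, left, right
        self.by_y = sorted(pts, key=lambda p: p[1])   # stand-in for Ty(v)
        self.ys = [p[1] for p in self.by_y]           # search keys for Ty(v)

    def is_leaf(self):
        return self.left is None


def build(p_by_x: List[Point]) -> RTNode:
    """Points sorted by x, distinct x-coordinates; xkey = max x in left half."""
    if len(p_by_x) == 1:
        return RTNode(p_by_x[0][0], None, None, p_by_x)
    mid = (len(p_by_x) + 1) // 2
    return RTNode(p_by_x[mid - 1][0], build(p_by_x[:mid]), build(p_by_x[mid:]), p_by_x)


def query_y(v: RTNode, y1, y2, out: List[Point]) -> None:
    """1D range query on y in Ty(v): O(log n + k_v)."""
    out.extend(v.by_y[bisect_left(v.ys, y1):bisect_right(v.ys, y2)])


def query_2d(root: RTNode, x1, x2, y1, y2) -> List[Point]:
    out: List[Point] = []

    def check_leaf(v):
        x, y = v.by_y[0]
        if x1 <= x <= x2 and y1 <= y <= y2:
            out.append((x, y))

    v = root
    while not v.is_leaf() and (x2 <= v.xkey or x1 > v.xkey):   # find v_split
        v = v.left if x2 <= v.xkey else v.right
    if v.is_leaf():
        check_leaf(v)
        return out
    split = v
    v = split.left                          # path to x1
    while not v.is_leaf():
        if x1 <= v.xkey:
            query_y(v.right, y1, y2, out)   # all x in this subtree are in [x1, x2]
            v = v.left
        else:
            v = v.right
    check_leaf(v)
    v = split.right                         # path to x2
    while not v.is_leaf():
        if x2 >= v.xkey:
            query_y(v.left, y1, y2, out)
            v = v.right
        else:
            v = v.left
    check_leaf(v)
    return out


pts = [(2, 19), (7, 10), (12, 3), (17, 62), (21, 49), (41, 95), (58, 59), (93, 70)]
root = build(sorted(pts))
print(sorted(query_2d(root, 5, 60, 0, 60)))   # [(7, 10), (12, 3), (21, 49), (58, 59)]
```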
2dRangeSearch(x1, x2, y1, y2, root, type) (page 555)
if root == null return {};
M = L = R = {};
if x1 ≤ root.x ≤ x2 {
  if y1 ≤ root.y ≤ y2 then M = {root}
  if type == left {
    L = 2dRangeSearch(x1, x2, y1, y2, root.left, left)
    R = 1dRangeSearch(y1, y2, root.right)
  } else if type == right {
    L = 1dRangeSearch(y1, y2, root.left)
    R = 2dRangeSearch(x1, x2, y1, y2, root.right, right)
  } else {
    L = 2dRangeSearch(x1, x2, y1, y2, root.left, left)
    R = 2dRangeSearch(x1, x2, y1, y2, root.right, right)
  }
} else if root.x < x1 then R = 2dRangeSearch(x1, x2, y1, y2, root.right, type)
else L = 2dRangeSearch(x1, x2, y1, y2, root.left, type)
return L + M + R
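Proof • Spend O(log n) time searching the first-level tree Tx • For each 1D range query in a second-level tree Ty(v), spend O(log n + k_v) time, where k_v is the number of points reported from Ty(v) (searching Ty(v) costs O(log n) since log |P(v)| = O(log n)) • The total time is Σ_v O(log n + k_v), where the summation is over all nodes v whose Ty(v) is searched • Σ_v k_v = k (the total # of points reported) • O(log n) such nodes v are visited when searching Tx
Proof (cont.) • Total = Σ_v O(log n + k_v) = O(log n) · O(log n) + O(k) = O(log² n + k)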
Priority Search Trees • These act as balanced binary search trees for the x coordinate and as max heaps for the y coordinates. • In such a tree, the root node stores the item with the largest y value, the median x coordinate of its tree (note that this isn’t the x coordinate of the item), and the left and right subtrees represent those items with x coordinate less than or greater than that median. • See Figure 12.5 (page 556) in which nodes are placed to represent their (x,y) values
[Figure: Priority search tree; the placement of nodes represents their (x, y) coordinates. Dotted curves represent the median x coordinate of each subtree.]
Such a tree is built in O(n log n) time, as we would expect for a balanced tree. • Searching involves a standard range search on x, except that we terminate early when the y value at a node is less than the minimum y we are interested in. • Searches are O(log n + s). • Alternative trees for downward, leftward, or rightward infinite regions are very similar
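A sketch of a priority search tree for queries of the form x1 ≤ x ≤ x2, y ≥ y0 (a region unbounded upward). Assumptions: distinct x-coordinates, and the x-split stored at a node is the largest x in the left half of the points remaining after the top item is removed, a slight variation on the book's median of the whole subtree. The early return on `v.item[1] < y0` is the termination rule described above; `PSTNode`, `build_pst`, and `query_pst` are illustrative names.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class PSTNode:
    def __init__(self, item, xmed, left, right):
        self.item = item                # point with the largest y in this subtree
        self.xmed = xmed                # x-split for the remaining points
        self.left, self.right = left, right


def build_pst(p_by_x: List[Point]) -> Optional[PSTNode]:
    """Heap on y, balanced BST on x.  Input sorted by x, distinct x values."""
    if not p_by_x:
        return None
    i = max(range(len(p_by_x)), key=lambda j: p_by_x[j][1])
    rest = p_by_x[:i] + p_by_x[i + 1:]             # still sorted by x
    mid = (len(rest) + 1) // 2
    # Left subtree gets x <= xmed, right subtree gets x > xmed.
    xmed = rest[mid - 1][0] if rest else p_by_x[i][0]
    return PSTNode(p_by_x[i], xmed, build_pst(rest[:mid]), build_pst(rest[mid:]))


def query_pst(v: Optional[PSTNode], x1, x2, y0, out=None) -> List[Point]:
    """Report points with x1 <= x <= x2 and y >= y0."""
    if out is None:
        out = []
    if v is None or v.item[1] < y0:
        return out                      # heap order: nothing below v has y >= y0
    if x1 <= v.item[0] <= x2:
        out.append(v.item)
    if x1 <= v.xmed:                    # left subtree holds x <= xmed
        query_pst(v.left, x1, x2, y0, out)
    if x2 > v.xmed:                     # right subtree holds x > xmed
        query_pst(v.right, x1, x2, y0, out)
    return out


pts = [(2, 19), (7, 10), (12, 3), (17, 62), (21, 49), (41, 95), (58, 59), (93, 70)]
root = build_pst(sorted(pts))
print(sorted(query_pst(root, 10, 60, 50)))   # [(17, 62), (41, 95), (58, 59)]
```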
Priority Range Trees • What if we have four-sided range queries? • We can convert a binary search tree keyed on x-coordinates into a priority range tree by associating a priority search tree (3-sided) with each node. • Right children have associated priority search trees that are unbounded to the left; left children have associated priority search trees that are unbounded to the right. In each case the parent node serves as the missing bound, so the three-sided search is sufficient. • Requires O(n log n) space and construction time. • This structure can answer two-dimensional queries in O(s + log n) time, by noticing that a search target in a left child won’t go beyond the values in the right child, so it doesn’t matter that we seem to be searching an infinite range; and vice versa.
Quad and kD Trees • Quadtrees are used to store points in a plane in a way that regional searching becomes easy. Generally, non-rectangular regions are difficult to deal with, so we use the bounding box as a first approximation to the region. • use a 4-ary tree to represent quadrants, sub-quadrants, etc. The quadrants are ordered as in geometry • Since the depth of a quadtree is governed by the closeness of points, it can be very deep. Usually as a practical safeguard we give a maximum depth, D. The book doesn’t explain how to store multiple points that occur in the same maximally refined sub-sub-…quadrant.
The major use of a quadtree is range searching, where the range is now a region of the plane, R. The algorithm recursively examines each sub-quadrant which intersects R, stopping when it arrives at external nodes. When R totally contains a sub-quadrant, we don’t employ any more logic on the sub-quadrant’s tree, but simply enumerate it. • When the quadtree is of bounded depth <= D, then both construction and range searching are O(Dn).
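A Python sketch of a bounded-depth point quadtree and its range search. Assumptions: R is an axis-parallel rectangle (the bounding-box approximation), boxes are closed, points on a dividing line go to the upper or right quadrant, and a node at the maximum depth D simply keeps all of its points in a list (one possible answer to the multiple-points question the book leaves open). All names are illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]    # (xmin, ymin, xmax, ymax), closed


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def in_rect(p: Point, r: Rect) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


class QuadNode:
    def __init__(self, box: Rect, pts: List[Point], depth: int, max_depth: int):
        self.box, self.pts, self.children = box, pts, []
        if len(pts) > 1 and depth < max_depth:      # otherwise: external node
            x0, y0, x1, y1 = box
            xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
            # Quadrants I, II, III, IV (counter-clockwise from the upper right).
            quads = [(xm, ym, x1, y1), (x0, ym, xm, y1),
                     (x0, y0, xm, ym), (xm, y0, x1, ym)]
            buckets: List[List[Point]] = [[], [], [], []]
            for p in pts:
                east, north = p[0] >= xm, p[1] >= ym
                buckets[(0 if east else 1) if north else (3 if east else 2)].append(p)
            self.children = [QuadNode(q, b, depth + 1, max_depth)
                             for q, b in zip(quads, buckets)]
            self.pts = []


def enumerate_all(v: QuadNode, out: List[Point]) -> None:
    out.extend(v.pts)
    for c in v.children:
        enumerate_all(c, out)


def range_search(v: QuadNode, r: Rect, out: List[Point]) -> List[Point]:
    if not intersects(v.box, r):
        return out
    if contains(r, v.box):
        enumerate_all(v, out)               # no more geometric tests below here
    elif v.children:
        for c in v.children:
            range_search(c, r, out)
    else:                                   # external node: test its points
        out.extend(p for p in v.pts if in_rect(p, r))
    return out


pts = [(10, 10), (12, 14), (80, 75), (55, 20), (30, 90), (31, 91)]
root = QuadNode((0, 0, 100, 100), pts, depth=0, max_depth=6)
print(sorted(range_search(root, (0, 0, 40, 40), [])))   # [(10, 10), (12, 14)]
```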
KD-Trees (higher-dimensional generalization of the 1D range tree) • Idea: first split on the x-coordinate (even levels), next split on the y-coordinate (odd levels), repeat • Leaves: store points • Internal nodes: splitting lines (as opposed to values)
Algorithm: BuildKDTree(P, depth) • Input: a set of points P and the current depth • Output: the root of a KD-tree storing P • if (|P| = 1) • then return a leaf storing P • else if (depth is even) // the book’s method does not change direction • split P into two equal-sized sets P1 and P2 with a vertical line l • else • split P into two equal-sized sets P1 and P2 with a horizontal line l • endif • vleft := BuildKDTree(P1, depth + 1) • vright := BuildKDTree(P2, depth + 1) • create a new node v storing l, with lc(v) := vleft and rc(v) := vright • return v • lc(v) = v’s left child, rc(v) = v’s right child
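A Python sketch of BuildKDTree, assuming a non-empty point set with distinct coordinates. Points are stored only at leaves; each internal node records its splitting line as an axis plus a value, with points on the line going left. Sorting at every level stands in for median finding, which costs O(n log² n) rather than O(n log n). `KDNode`, `build_kdtree`, and `leaves` are illustrative names.

```python
from typing import List, Tuple

Point = Tuple[float, float]


class KDNode:
    def __init__(self, point=None, axis=None, split=None, left=None, right=None):
        self.point = point      # set only at leaves
        self.axis = axis        # 0: vertical line x = split, 1: horizontal line y = split
        self.split = split
        self.left, self.right = left, right


def build_kdtree(pts: List[Point], depth: int = 0) -> KDNode:
    """BuildKDTree for a non-empty point set with distinct coordinates."""
    if len(pts) == 1:
        return KDNode(point=pts[0])
    axis = depth % 2                               # even depth: x, odd depth: y
    pts = sorted(pts, key=lambda p: p[axis])       # simple median step
    mid = (len(pts) + 1) // 2
    split = pts[mid - 1][axis]                     # line through the median point
    return KDNode(axis=axis, split=split,
                  left=build_kdtree(pts[:mid], depth + 1),    # coordinate <= split
                  right=build_kdtree(pts[mid:], depth + 1))   # coordinate > split


def leaves(v: KDNode) -> List[Point]:
    return [v.point] if v.point is not None else leaves(v.left) + leaves(v.right)


pts = [(2, 19), (7, 10), (12, 3), (17, 62), (21, 49), (41, 95), (58, 59), (93, 70)]
root = build_kdtree(pts)
print(root.axis, root.split, root.left.axis, root.left.split)   # 0 17 1 10
```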
Complexity: Construction Time • Expensive operation: determining the splitting line (median finding) • Can use a linear-time median-finding algorithm (e.g., Quickselect, which is linear in expectation) • Then the total time is T(n) = O(1) if n = 1, and T(n) = O(n) + 2T(⌈n/2⌉) otherwise, which solves to T(n) = O(n log n) • But we can obtain this time without fancy median finding: presort the points by x-coordinate and by y-coordinate (O(n log n)); each time, find the median in O(1), then partition the lists and update the x and y orderings by a scan in O(n) time
Complexity • Storage: number of leaves = n (one per point); still a binary tree, so O(n) storage in total • Queries • Each node corresponds to a region in the plane • Need only search nodes whose region intersects the query region • Report all points in subtrees whose regions are contained in the query range • When we reach a leaf, check whether its point is in the query region
Algorithm: SearchKDTree(v, R) • Input: the root v of a subtree of a KD-tree, and a range R • Output: all points at leaves below v that lie in the range • if (v is a leaf) • then report v’s point if it lies in R • else if (region(lc(v)) is fully contained in R) • then ReportSubtree(lc(v)) • else if (region(lc(v)) intersects R) • then SearchKDTree(lc(v), R) • if (region(rc(v)) is fully contained in R) • then ReportSubtree(rc(v)) • else if (region(rc(v)) intersects R) • then SearchKDTree(rc(v), R) • endif • Note: we need to know region(v) • it can be precomputed and stored • or computed during the recursive calls, e.g., region(lc(v)) = region(v) ∩ l(v)^left, where l(v) is v’s splitting line and l(v)^left is the left half-plane of l(v)
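A self-contained sketch of SearchKDTree that computes region(lc(v)) and region(rc(v)) during the recursion, starting from the whole plane. It repeats a compact version of the construction (distinct coordinates assumed, points on a splitting line go left) so it runs on its own; rectangles are `(xmin, ymin, xmax, ymax)` tuples and all names are illustrative.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]    # (xmin, ymin, xmax, ymax)


class KDNode:
    def __init__(self, point=None, axis=None, split=None, left=None, right=None):
        self.point, self.axis, self.split = point, axis, split
        self.left, self.right = left, right


def build_kdtree(pts: List[Point], depth: int = 0) -> KDNode:
    if len(pts) == 1:
        return KDNode(point=pts[0])
    axis = depth % 2
    pts = sorted(pts, key=lambda p: p[axis])
    mid = (len(pts) + 1) // 2
    return KDNode(None, axis, pts[mid - 1][axis],
                  build_kdtree(pts[:mid], depth + 1), build_kdtree(pts[mid:], depth + 1))


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def report_subtree(v: KDNode, out: List[Point]) -> None:
    if v.point is not None:
        out.append(v.point)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)


def search_kdtree(v: KDNode, r: Rect,
                  region: Rect = (-math.inf, -math.inf, math.inf, math.inf),
                  out: Optional[List[Point]] = None) -> List[Point]:
    """SearchKDTree, computing the children's regions on the way down."""
    if out is None:
        out = []
    if v.point is not None:                 # leaf: test its single point
        x, y = v.point
        if r[0] <= x <= r[2] and r[1] <= y <= r[3]:
            out.append(v.point)
        return out
    x0, y0, x1, y1 = region
    if v.axis == 0:                         # vertical splitting line x = v.split
        regions = ((x0, y0, v.split, y1), (v.split, y0, x1, y1))
    else:                                   # horizontal splitting line y = v.split
        regions = ((x0, y0, x1, v.split), (x0, v.split, x1, y1))
    for child, reg in zip((v.left, v.right), regions):
        if contains(r, reg):
            report_subtree(child, out)
        elif intersects(reg, r):
            search_kdtree(child, r, reg, out)
    return out


pts = [(2, 19), (7, 10), (12, 3), (17, 62), (21, 49), (41, 95), (58, 59), (93, 70)]
root = build_kdtree(pts)
print(sorted(search_kdtree(root, (5, 0, 60, 60))))   # [(7, 10), (12, 3), (21, 49), (58, 59)]
```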
Query Time • Lemma 5.4: A query with an axis-parallel rectangle in a kd-tree storing n points can be performed in O(√n + k) time, where k is the number of reported points. • Proof: • The total time for reporting points in ReportSubtree is O(k), so we need to bound the number of nodes visited by the query algorithm that are not in a traversed subtree. • For each such node v, region(v) intersects R but is not contained in R.
Query Time • To bound the number of such nodes, we bound the number of regions intersected by a vertical line (this bounds the number of regions intersected by the left and right edges of R; the bound for the number intersected by the top and bottom edges of R is similar) • Let l be a vertical line, and • let l(root(T)) be the root’s splitting line.
Query Time • l intersects the region to the right or to the left of l(root(T)), but not both. • Let Q(n) be the number of intersected regions in an n-point kd-tree whose root contains a vertical splitting line. This matters because if we also counted from nodes with horizontal splitting lines, we would not get the reduction. • Go down 2 levels before counting • l intersects 2 of the 4 regions at this level • Each contains n/4 points • Q(1) = O(1), Q(n) = 2 + 2·Q(n/4) = O(√n).
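Unrolling the recurrence shows where the √n comes from (a short derivation, assuming for simplicity that n is a power of 4):

```latex
\begin{aligned}
Q(n) &= 2 + 2\,Q(n/4) = 2 + 4 + 4\,Q(n/16) = \cdots
      = \sum_{i=1}^{t} 2^{i} + 2^{t}\,Q\!\left(n/4^{t}\right)\\
\text{with } t = \log_4 n:\quad
Q(n) &= \left(2^{t+1} - 2\right) + 2^{t}\,Q(1),
\qquad 2^{t} = 2^{\log_4 n} = n^{\log_4 2} = \sqrt{n}\\
\Rightarrow\; Q(n) &= 2\sqrt{n} - 2 + \sqrt{n}\cdot O(1) = O(\sqrt{n})
\end{aligned}
```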
Query Time • Note: the analysis is probably pessimistic. We bounded the number of regions intersecting an edge of the query rectangle by the number of regions intersecting the line through that edge. If the range is small, so is the edge, and it won’t intersect this many regions.
Nearest Neighbor • We can use KD trees to find the nearest neighbor (see page 564)
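The page 564 method is not reproduced here; the following is a common kd-tree nearest-neighbour search, offered as a hedged sketch: descend toward the query point q first, then visit the other side of a splitting line only if the circle around q with the current best distance crosses that line. The compact construction (distinct coordinates assumed) is repeated so the block runs on its own; `nearest` is an illustrative name and `math.dist` needs Python 3.8+.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class KDNode:
    def __init__(self, point=None, axis=None, split=None, left=None, right=None):
        self.point, self.axis, self.split = point, axis, split
        self.left, self.right = left, right


def build_kdtree(pts: List[Point], depth: int = 0) -> KDNode:
    if len(pts) == 1:
        return KDNode(point=pts[0])
    axis = depth % 2
    pts = sorted(pts, key=lambda p: p[axis])
    mid = (len(pts) + 1) // 2
    return KDNode(None, axis, pts[mid - 1][axis],
                  build_kdtree(pts[:mid], depth + 1), build_kdtree(pts[mid:], depth + 1))


def nearest(v: KDNode, q: Point,
            best: Optional[Tuple[float, Point]] = None) -> Tuple[float, Point]:
    """Return (distance, point) for the stored point closest to q."""
    if v.point is not None:
        d = math.dist(v.point, q)
        return (d, v.point) if best is None or d < best[0] else best
    diff = q[v.axis] - v.split          # signed distance from q to the splitting line
    near, far = (v.left, v.right) if diff <= 0 else (v.right, v.left)
    best = nearest(near, q, best)       # descend toward q first
    if abs(diff) < best[0]:             # circle around q crosses the splitting line
        best = nearest(far, q, best)
    return best


pts = [(2, 19), (7, 10), (12, 3), (17, 62), (21, 49), (41, 95), (58, 59), (93, 70)]
root = build_kdtree(pts)
print(nearest(root, (20, 55))[1])   # (21, 49)
```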
Orthogonal Segment Intersection: Plane Sweep • Imagine a set of horizontal and vertical line segments scattered in the plane; we may want to find out which segments intersect each other. A brute-force method checks all pairs in O(n²) time. But it would be nice to have an algorithm whose running time is proportional to the actual number of intersections, since most often these segments won’t intersect anything. • While we imagine the vertical sweep line moving continuously, in fact we only need to jump from encountering one object to encountering the next. We can think of these encounters as events that interrupt the continuous sweep for some event processing.
There are three kinds of objects and event-processing steps: • Left endpoint of a horizontal segment: add the segment to the range-search dictionary • Right endpoint of a horizontal segment: remove the segment from the range-search dictionary • Vertical segment: search the dictionary for the horizontal segments whose y-coordinate lies within the vertical segment’s y-range (these are exactly the segments it intersects) • We first have to sort the objects from left to right: O(n log n)
For each object, we either: • Add something to a range tree – O(log n) • Remove something from a range tree O(log n) • Search a range tree – O(log n + s’) • So overall, this is an O(n log n + s) operation.
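A Python sketch of the sweep, assuming horizontals are `(x1, x2, y)` and verticals are `(x, y1, y2)` tuples. Events at the same x are ordered insert, then query, then remove, so segments that merely touch count as intersecting. A sorted Python list stands in for the range-search dictionary; its `insort` and `pop` shift elements, so this version is O(n) per update rather than the O(log n) of a balanced tree or range tree. `orthogonal_intersections` is an illustrative name.

```python
from bisect import bisect_left, bisect_right, insort
from typing import List, Tuple


def orthogonal_intersections(horizontals: List[Tuple[float, float, float]],
                             verticals: List[Tuple[float, float, float]]) -> List[Tuple[int, int]]:
    """Return (horizontal index, vertical index) for every intersecting pair."""
    events = []
    for i, (x1, x2, _) in enumerate(horizontals):
        events.append((x1, 0, i))       # left endpoint: insert
        events.append((x2, 2, i))       # right endpoint: remove
    for j, (x, _, _) in enumerate(verticals):
        events.append((x, 1, j))        # vertical segment: range query on y
    events.sort()                       # O(n log n); ties: insert < query < remove
    active: List[Tuple[float, int]] = []   # (y, horizontal index), sorted by y
    result = []
    for _, kind, idx in events:
        if kind == 0:
            insort(active, (horizontals[idx][2], idx))
        elif kind == 2:
            active.pop(bisect_left(active, (horizontals[idx][2], idx)))
        else:
            _, y1, y2 = verticals[idx]
            lo = bisect_left(active, (y1, -1))              # first y >= y1
            hi = bisect_right(active, (y2, float("inf")))   # past the last y <= y2
            result.extend((h, idx) for _, h in active[lo:hi])
    return result


H = [(0, 10, 5), (2, 4, 8), (6, 12, 1)]
V = [(3, 0, 9), (7, 4, 6), (11, 0, 2)]
print(sorted(orthogonal_intersections(H, V)))   # [(0, 0), (0, 1), (1, 0), (2, 2)]
```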
Closest Pairs • Make sure that no two objects are too close to each other. We may have a lot of points scattered over a plane and want to find the two points that are closest to one another. For instance, a point might be the locus of a moving part in a machine that shouldn’t touch any other, or parts of a circuit that might “leak” voltage if they are too close. • Problem: Given a set of n points in the plane, find a pair of closest points. • Brute force: compute the distance between every pair: O(n²). • How can we improve? Can we determine which pairs are interesting?
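For reference, the O(n²) brute-force baseline fits in a few lines of Python (`math.dist` needs Python 3.8+); it is the yardstick any improved method has to beat. The function name is illustrative.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]


def closest_pair_brute(points: List[Point]) -> Tuple[Point, Point]:
    """Brute force: examine all n(n-1)/2 pairs, O(n^2). Needs >= 2 points."""
    return min(combinations(points, 2), key=lambda pair: math.dist(*pair))


print(closest_pair_brute([(0, 0), (5, 5), (1, 1), (9, 0)]))   # ((0, 0), (1, 1))
```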