140 likes | 465 Views
Proximity Closest pair, divide-and-conquer. Closest pair CLOSEST PAIR INSTANCE: Set S = { p 1 , p 2 , ..., p N } of N points in the plane. QUESTION: Determine the two points of S whose mutual distance is smallest. We’ve seen a proof that CLOSEST PAIR has a lower bound for
E N D
ProximityClosest pair, divide-and-conquer Closest pair CLOSEST PAIR INSTANCE: Set S = {p1, p2, ..., pN} of N points in the plane. QUESTION: Determine the two points of S whose mutual distance is smallest. We’ve seen a proof that CLOSEST PAIR has a lower bound for time (N log N). We seek an algorithm with upper bound O(N log N). If found, these together imply that CLOSEST PAIR (N log N). Two algorithm paradigms come to mind for O(N log N): 1. Sorting 2. Divide-and-conquer To use sorting, a total ordering of the points is needed, but none seem useful. For example, projecting the points of S onto the y axis give a total ordering on y coordinate but destroys useful information: points p1 and p2 are closest in Euclidean distance but farthest in y distance.
ProximityClosest pair, divide-and-conquer Divide-and-conquer, concept Using the divide-and-conquer paradigm, time in O(N log N) can be achieved by: 1. Dividing the problem into two equal-sized subproblems 2. Solving those subproblems recursively 3. Merging the subproblem solutions into an overall solution in linear O(N) time. Unfortunately, it is not immediately obvious how to perform the merge in linear time. Suppose the problem has been solved for subproblem sets S1 and S2, where S1S2 = S, S1S2 = , |S1| |S2| N/2; giving a closest pair of points for S1 and another for S2. How can the closest pair for S be found? It may consist of one point from S1 and one from S2. Testing all possible pairs of points from S1 and S2 requires time in O(N/2) ·O(N/2) O(N2), which is unsatisfactory. S1 S2
S1 S2 p1 p2 p3 q3 q1 q2 m ProximityClosest pair, divide-and-conquer Divide-and-conquer for d = 1, 1 We consider a divide-and-conquer algorithm for CLOSEST PAIR in 1 dimension (d = 1). Partition S, a set of points on a line, into two sets S1 and S2 at some point m such that for every point pS1 and qS2, p < q. Solving CLOSEST PAIR recursively on S1 and S2 separately produces {p1, p2}, the closest pair in S1, and {q1, q2}, the closest pair in S2. Let be the smallest distance found so far: = min(|p2 - p1|, |q2 - q1|) The closest pair in S is either {p1, p2} or {q1, q2} or some {p3, q3} with p3S1 and q3S2.
ProximityClosest pair, divide-and-conquer Divide-and-conquer for d = 1, 2 To check for such a point {p3, q3}, is it necessary to test every possible pair of points in S1 and S2? Note that if {p3, q3} is to be closer than (i.e., |q3 - p3| < ), then both p3 and q3 must be within of m. How many points of S1 can lie within of m, i.e., within the interval (m - , m]? Because is the distance between the closest pair in either S1 or S2, a semi-closed interval of length can contain at most 1 point. For the same reason, there can be at most 1 point of S2 within of m, i.e., in the interval [m, m + ). So, the number of distance computations needed to check for a closest pair {p3, q3} with p3S1 and q3S2 is 1, not O(N2). Finding any points in the intervals (m - , m] and [m, m + ) to do the computation can take at most O(N) time, giving a linear merge step. Thus a divide-and-conquer algorithm can solve 1-dimensional CLOSEST PAIR in O(N log N) time. S1 S2 p1 p2 p3 q3 q1 q2 m
ProximityClosest pair, divide-and-conquer Divide-and-conquer for d = 1, 3 procedure CPAIR1(S) Input: X[1:N], N points of S in one dimension. Output: , the distance between the two closest points. 1 begin 2 if (|S| = 2) then 3 = |X[2] - X[1]| 4 else if (|S| = 1) then 5 = 6 else 7 begin 8 Construct(S1, S2) /* S1 = {p: p m}, S2 = {p: p > m} */ 9 1 = CPAIR1(S1) 10 2 = CPAIR1(S2) 11 p = max(S1) 12 q = min(S2) 13 = min(1, 2, q - p) 14 end 15 endif 16 return 17 end Caveat Note that p (= p3) and q (= q3) are found not by looking within intervals (m - , m] and [m, m + ), but by identifying the two points closest to m. This works for d = 1 but not for d 2. For that reason the idea of p3 and q3 within of m was introduced.
ProximityClosest pair, divide-and-conquer Generalizing to d = 2 Partition two dimensional set S into subsets S1 and S2 such that every point of S1 lies to the left of every point of S2. To do so, cut S by vertical line l at the median x coordinate of S. Solve the problem recursively on S1 and S2. Let {p1, p2} be the closest pair in S1 and {q1, q2} in S2. 1 = distance(p1,p2) and 2 = distance(q1,q2) are the minimum separations in S1 and S2 respectively. = min(1, 2) S1 l S2 p1 1 p2 q1 2 q2
ProximityClosest pair, divide-and-conquer Where is the closest pair? To perform the merge it must be determined if there is a pair of points {p, q} such that pS1 and qS2 and distance(p, q) < . If such a pair exists, then p and q must both be within of l. Let P1 and P2 be vertical strips (regions) of the plane of width on either side of l. If {p, q} exists, p must be within P1 and q within P2. For d = 1, there was at most one candidate point for p and one for q. For d = 2, every point in S1 and S2 may be a candidate, as long as each is within of l. This seems to imply that O(N/2) O(N/2) O(N2) distance computations may be needed; unsatisfactory. P1 l P2 p1 q1 S2 S1 1 p2 2 q2
ProximityClosest pair, divide-and-conquer Closing in on the closest pair But is it really necessary for some pS1 and within P1, to determine the distance to every point qS2 and within P2? No. We really only need to do so for those points that are within of p. Thus we can bound the portion of P2 to consider by that distance. The points to consider for a point p must lie within the 2 rectangle R. P1 l P2 S2 S1 p R
ProximityClosest pair, divide-and-conquer Distance computations required How many points can there be in rectangle R? Recall that no two points in P2 are closer than . Because rectangle R has size 2, there can be at most 6 points within it; any more and they would be closer than , a contradiction. This means that for each of the O(N/2) points p S1 within P1, only 6 points must be checked, not O(N/2) for each, so 6 O(N/2) = O(3N) O(N) distance comparisons are needed in the merge step, not O(N2). An O(N log N) algorithm for CLOSEST PAIR is possible. P1 l P2 S2 S1 p R
ProximityClosest pair, divide-and-conquer Which points much be checked? Though an O(N log N) algorithm is possible, we do not have it yet. Though we know that for a point p only 6 points within P2 must be checked, we do not yet know which six. Project p and all the points of S2 within P2 onto l. Only the points within of p in that projection (y coordinate) need be considered; as seen, there will be 6. By sorting the points of S in order on y coordinate, the nearest neighbor candidates for a point p can be found by scanning the sorted list. P1 l P2 S2 S1 p R
ProximityClosest pair, divide-and-conquer Closest pair algorithm Algorithm from Preparata, p. 198: 1. Partition S into two subsets, S1 and S2, about the vertical median line l. 2. Find the closest pair separations 1 and 2 recursively. 3. = min(1, 2) 4. Let P1 be the set of points of S1 that are within of the dividing line l and let P2 be the corresponding subset of S2. Project P1 and P2 onto l and sort by y coordinate. Let P1* and P2* be the two sorted sequences, respectively. 5. The “merge” may be carried out by scanning P1* and for each point in P1* inspecting the points of P2* within distance . While a pointer advances on P1*, the pointer of P2* may oscillate within an interval of width 2. Let 1 be the smallest distance of any pair thus examined. 6. S = min(,1) Step 4 appears to put an O(N log N) sort into each merge. It seems to have been written that way for clarity only. To achieve an O(N) merge, all the points of S are sorted at the start of the algorithm (“presorting”). The sorted sequences P1* and P2* are obtained in O(N) time from the presorted list.
ProximityClosest pair, divide-and-conquer Merge process algorithm Preparata p. 199 says: “… when executing Step 4, extract the points from the list in sorted order in only O(N) time.” One way to assemble P1*, for example, follows. Let Y be the presorted points of S, i.e., an array of size N storing the points of S in order by ascending y coordinate; each entry has three fields: x, y, and active. Let P1* be an array of size N with two fields: index and y. 1 for i = 1 to N /* Initialize P1*. */ 2 P1*[i].index = 0 3 P1*[i].y = 0 4 endfor 5 for every point piS 6 Y[i].active = FALSE 7 endfor 8 for every point piS1 /* Mark the points of P1* in Y. */ 9 Y[i].active = TRUE 10 endfor 11 j = 1 12 for i = 1 to N /* Retrieve the points in sorted order. */ 13 if Y[i].active = TRUE 14 P1*[j].index = i /* Points to Y for coordinate retrieval. */ 15 j = j + 1 16 endif 17 endfor
ProximityClosest pair, divide-and-conquer Analysis Let T(N) be the running time of the algorithm on N points. The steps of CPAIR2 require: 1. O(N) 2. 2T(N/2) 3. O(1) 4. O(N) 5. O(N) 6. O(1) Thus, T(N) = 2T(N/2) + O(N) O(N log N). The presort of S on y coordinate also requires O(N log N) time. Overall time for the CPAIR2 algorithm to solve the CLOSEST PAIR problem for d = 2 is O(N log N). Because CLOSEST PAIR problem for d = 2 has a lower bound in (N log N) for any algorithm, this complexity is optimal, i.e., (N log N).
ProximityClosest pair, divide-and-conquer Closest pair for d > 2 Preparata p. 199-204 discusses solving CLOSEST PAIR using a divide-and-conquer algorithm for d > 3. The approach depends on the notion of sparsity, a measure of how many points can be in a d-dimensional hypercube with sides that measure 2. We made use of sparsity in the d = 1 and d = 2 case. The result is that the closest pair of points in a set of N points in Edcan be found in (N log N) time.