UMass Lowell Computer Science 91.504 Advanced Algorithms: Computational Geometry. Prof. Karen Daniels, Spring 2001. Lecture 8: Approximate Nearest Neighbor Searching; Derandomization for Efficient Geometric Partitioning. Monday, 4/30/01.
Part 2: Advanced Topics
• Applications: Manufacturing, Modeling/Graphics, Wireless Networks, Visualization
• Techniques: (de)Randomization, Approximation, Robustness
• Representations: Epsilon-net, Decomposition tree
Approximate Nearest Neighbor Searching “An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions” Arya, Mount, Netanyahu, Silverman, Wu
Goals
• Fast nearest neighbor query in a d-dimensional set of n points:
  • approximate nearest neighbor: distance within a factor of (1+ε) of the distance to the true closest neighbor
  • preprocess using O(dn log n) time and O(dn) space
    • Balanced-Box Decomposition (BBD) tree
    • note that space and time are independent of ε
  • query in O(c_{d,ε} log n) time, where the factor c_{d,ε} depends only on d and ε
C++ code for a simplified version is at http://www.cs.umd.edu/~mount/ANN
Approach: Distance Assumptions
• Use the L_p (also called Minkowski) metric
  • assume it can be computed in O(d) time
  • the pth root need not be computed when comparing distances
• Approximate nearest neighbor
  • distance within a factor of (1+ε) of the distance to the true closest neighbor p*
• Can change ε or the metric without rebuilding the data structure
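The remark about skipping the pth root can be sketched in a few lines of Python. The function names below are illustrative and do not come from the paper or the ANN library; the sketch only assumes that x -> x^p is monotone for non-negative x, so comparing p-th powers orders distances the same way as comparing the distances themselves.

```python
def lp_power_sum(a, b, p):
    """Return sum_i |a_i - b_i|^p, i.e. the L_p distance raised to the power p."""
    return sum(abs(x - y) ** p for x, y in zip(a, b))


def is_within_factor(q, candidate, true_nn, p, eps):
    """True if dist(q, candidate) <= (1 + eps) * dist(q, true_nn) in the L_p metric.

    Both sides are compared as p-th powers, so no root is ever taken.
    """
    return lp_power_sum(q, candidate, p) <= (1 + eps) ** p * lp_power_sum(q, true_nn, p)


q = (0.0, 0.0)
near = (3.0, 4.0)   # Euclidean distance 5
far = (6.0, 8.0)    # Euclidean distance 10

# Ordering by power sums agrees with ordering by true distance.
near_is_closer = lp_power_sum(q, near, 2) < lp_power_sum(q, far, 2)
```

Here `far` is an acceptable answer for ε = 1 (distance 10 <= 2 * 5) but not for ε = 0.5.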
Approach: Overview
• Preprocess points to create:
  • Balanced-Box Decomposition (BBD) tree
• Query algorithm: for query point q
  • Locate the leaf cell containing q in O(log n) time
  • Priority search: enumerate leaf cells in increasing order of distance from q
    • For each leaf cell, calculate the distance from q to the cell's point
    • Keep track of the closest point p seen so far
    • Stop when the distance from q to the current leaf cell > dist(q,p)/(1+ε)
  • Return p as the approximate nearest neighbor to q
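The stopping rule can be illustrated without any tree at all. The sketch below assumes the leaf cells are given as a flat list of hypothetical `(lo, hi, point)` triples and uses a sort as a stand-in for the priority search; it is not the BBD-tree query itself, only the enumeration-with-early-exit logic, with Euclidean distance assumed.

```python
import math


def dist_point_to_box(q, lo, hi):
    """Euclidean distance from q to the axis-aligned box [lo, hi] (0 if q is inside)."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))


def approx_nn(q, leaves, eps):
    """leaves: list of (lo, hi, point) triples. Returns (point, cells_visited)."""
    best, best_d, visited = None, math.inf, 0
    # Sorting stands in for the priority search that enumerates cells by distance.
    for lo, hi, pt in sorted(leaves, key=lambda c: dist_point_to_box(q, c[0], c[1])):
        if dist_point_to_box(q, lo, hi) > best_d / (1 + eps):
            break  # no remaining cell can beat best by more than a factor (1 + eps)
        visited += 1
        d = math.dist(q, pt)
        if d < best_d:
            best, best_d = pt, d
    return best, visited


leaves = [((0, 0), (4, 4), (3, 4)),
          ((4, 0), (8, 4), (4, 0)),
          ((8, 0), (12, 4), (9, 0))]
q = (0, 0)
exact = approx_nn(q, leaves, eps=0.0)   # visits two cells, finds (4, 0) at distance 4
rough = approx_nn(q, leaves, eps=0.5)   # stops after one cell with (3, 4) at distance 5
```

With ε = 0.5 the search returns a point at distance 5, which is within a factor 1.5 of the true nearest distance 4, after inspecting fewer cells.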
Balanced Box Decomposition (BBD) Tree
• Similar to kd-tree [Samet handout]
  • Binary tree
  • Tree structure stored in main memory
  • Cutting planes orthogonal to axes
  • "Alternating" dimensions
  • O(log n) height
  • Subdivides space into regions of O(d) complexity using d-dimensional rectangles
  • Can be built in O(dn log n) time
[Figure: 9 points separated by cutting lines x1-x4 and y1-y3, with one possible kd-like tree for those points (not a BBD tree, though)]
Balanced Box Decomposition (BBD) Tree (continued)
• Distinguishing features of BBD tree:
  • Cell is either
    • a d-dimensional rectangle, or
    • the difference of 2 nested d-dimensional rectangles
  • In this sense, the BBD tree is like:
    • Optimized kd-tree: partitions points into roughly equal-sized sets [inner box shrink]
      • While descending in the tree, the number of points on the path decreases exponentially
    • Specialized quadtree: aspect ratio of each box is bounded by a constant [hyperplane split]
      • While descending in the tree, the size of the region on the path decreases exponentially
  • A leaf may be associated with more than 1 point in/on its cell: O(n) nodes
  • Inner boxes are "sticky": an inner box that is close to an edge of the outer box "sticks" to that edge
[Figure: a subdivision and its tree, showing split and shrink nodes]
Midpoint Algorithm for Splitting/Shrinking
• Split box b using a hyperplane through the center of b and orthogonal to the ith coordinate axis (longest dimension)
  • Bounds aspect ratio
  • What's wrong with this approach?
• Centroid shrink: produce O(1) subcells, each with <= 2n_c/3 points [n_c = # points in current cell]
  • single-stage (simplified) shrink
  • 3-stage: shrink, split, shrink
[Figures: single-stage simplified shrink; 3-stage shrink, split, shrink]
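A minimal sketch of the midpoint split alone, assuming boxes are stored as hypothetical `(lo, hi)` tuples of coordinates. It shows why cutting the longest side at its midpoint keeps the aspect ratio bounded; it does not implement the centroid shrink.

```python
def midpoint_split(lo, hi):
    """Split box [lo, hi] by a hyperplane through its center, orthogonal to its longest side."""
    sides = [h - l for l, h in zip(lo, hi)]
    i = max(range(len(sides)), key=sides.__getitem__)   # index of the longest dimension
    mid = (lo[i] + hi[i]) / 2
    left_hi = hi[:i] + (mid,) + hi[i + 1:]
    right_lo = lo[:i] + (mid,) + lo[i + 1:]
    return i, (lo, left_hi), (right_lo, hi)


def aspect_ratio(lo, hi):
    """Ratio of the longest side to the shortest side of the box."""
    sides = [h - l for l, h in zip(lo, hi)]
    return max(sides) / min(sides)


# Repeatedly split a cube, always keeping the low-side child.
lo, hi = (0.0, 0.0, 0.0), (8.0, 8.0, 8.0)
ratios = []
for _ in range(6):
    axis, (lo, hi), _right = midpoint_split(lo, hi)
    ratios.append(aspect_ratio(lo, hi))
```

Starting from a cube, the ratio never exceeds 2 in this run. Note that the split looks only at the box, not at the points inside it, so it says nothing about how many points land on each side; that is the gap the shrink step is meant to cover.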
Middle-Interval Algorithm for Splitting/ Shrinking • Flexibility for splitting plane choice • Choose plane from a central strip of current outer box
Packing Constraint
• Each subdivision cell satisfies this packing constraint: given a BBD-tree for a set of data points in R^d, the number of leaf cells of size at least s > 0 intersecting an open (Minkowski L_m) ball of radius r > 0 is at most ⌈1 + 6r/s⌉^d
• Proof has 2 cases:
  • Overlapping boxes
  • Disjoint boxes:
    • Box of side 2r encloses ball of radius r
    • Aspect ratio 3:1 implies smallest side length >= s/3
    • Densest packing given by regular grid of boxes of side length s/3
    • Interval of length 2r can intersect no more than ⌈1 + 6r/s⌉ intervals of length s/3
    • Account for all dimensions by raising to the power d
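The one-dimensional counting step in the disjoint-boxes case can be checked by brute force. The sketch below takes the bound to be ⌈1 + 6r/s⌉^d, as in the Arya et al. paper, and counts how many grid intervals of length x = s/3 meet an open interval of length 2r; the function names are illustrative, and exact rational arithmetic is used to avoid rounding at interval endpoints.

```python
from fractions import Fraction
from math import ceil, floor


def grid_cells_hit_1d(center, r, x, offset=Fraction(0)):
    """Count closed grid intervals [offset + k*x, offset + (k+1)*x]
    that meet the open interval (center - r, center + r)."""
    # Interval k qualifies iff offset + k*x < center + r and offset + (k+1)*x > center - r.
    k_max = ceil((center + r - offset) / x) - 1   # largest k with offset + k*x < center + r
    k_min = floor((center - r - offset) / x)      # smallest k with offset + (k+1)*x > center - r
    return k_max - k_min + 1


def packing_bound(r, s, d):
    """Bound on the number of leaf cells of size >= s meeting a ball of radius r in R^d."""
    return ceil(1 + 6 * r / s) ** d


r, s = Fraction(1), Fraction(3)
x = s / 3                                         # smallest side allowed by the 3:1 aspect ratio
worst = max(grid_cells_hit_1d(Fraction(c, 8), r, x) for c in range(-16, 17))
bound_1d = ceil(1 + 2 * r / x)                    # equals ceil(1 + 6r/s)
```

For r = 1 and s = 3 the worst case over the sampled centers meets the one-dimensional bound of 3 exactly, and `packing_bound(r, s, 2)` gives 9 for the plane.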
Priority Search from Query Point
• Visit boxes in increasing order of distance from q
  • Similar to kd-tree priority search
• Maintain priority queue of tree nodes
  • Node priority inversely related to dist(q,cell)
• Search repeats:
  • Extract highest-priority node
  • Descend subtree
    • visit leaf closest to q
    • add siblings to queue
[Figure: at start, root + v1, v2, v3, v4 are in the priority queue; the node closest to the query point is marked]
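The loop above (extract the closest node, descend to the nearest leaf, queue the siblings) can be sketched on a small kd-style tree. This is an illustration under stated assumptions, not the BBD-tree code: the tree uses median splits only (no shrink nodes), nodes are plain dictionaries, and the names `build`, `box_dist` and `priority_search` are made up for this example.

```python
import heapq
import math


def build(points, lo, hi, depth=0):
    """Tiny kd-style tree: leaves hold at most one point; internal nodes split at the median."""
    if len(points) <= 1:
        return {"lo": lo, "hi": hi, "point": points[0] if points else None}
    axis = depth % len(lo)
    pts = sorted(points, key=lambda p: p[axis])
    m = len(pts) // 2
    cut = pts[m][axis]
    left_hi = hi[:axis] + (cut,) + hi[axis + 1:]
    right_lo = lo[:axis] + (cut,) + lo[axis + 1:]
    return {"lo": lo, "hi": hi,
            "kids": (build(pts[:m], lo, left_hi, depth + 1),
                     build(pts[m:], right_lo, hi, depth + 1))}


def box_dist(q, node):
    """Euclidean distance from q to the node's box (0 if q is inside)."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, node["lo"], node["hi"])))


def priority_search(root, q, eps=0.0):
    best, best_d, leaves_visited = None, math.inf, 0
    heap = [(box_dist(q, root), 0, root)]        # (distance, tie-breaker, node)
    counter = 1
    while heap:
        d, _, node = heapq.heappop(heap)
        if d > best_d / (1 + eps):
            break                                # every queued cell is at least this far away
        while "kids" in node:                    # descend toward q, queueing the sibling
            near, far = sorted(node["kids"], key=lambda k: box_dist(q, k))
            heapq.heappush(heap, (box_dist(q, far), counter, far))
            counter += 1
            node = near
        leaves_visited += 1
        if node["point"] is not None:
            dp = math.dist(q, node["point"])
            if dp < best_d:
                best, best_d = node["point"], dp
    return best, leaves_visited


points = [((7 * i) % 11, (5 * i) % 13) for i in range(11)]
root = build(points, (0, 0), (12, 12))
nearest, visited = priority_search(root, (6, 6))
```

The integer tie-breaker in each heap entry keeps `heapq` from ever comparing two node dictionaries. With `eps=0.0` the search is exact; a larger `eps` lets the loop stop earlier at the cost of the (1+ε) guarantee.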
Incremental, Relative Distance [Arya, Mount 93]
• Maintain sum of appropriate powers of coordinate differences between query point and nearest point of outer box
• Incrementally update distance from parent box to each child when a split is performed:
  • Closer child has same distance as parent
  • Farther child's distance needs only a 1-coordinate update (along the splitting dimension)
• Can make a difference in higher dimensions!
[Figure: L1 distance example showing a box with sides xL, xR, yB, yT before and after a split at xM]
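A minimal sketch of the incremental update, assuming the quantity maintained is the sum of p-th powers of per-axis offsets from q to the box. The function names are illustrative. The point is that after a split only the term for the splitting axis changes, so each child costs O(1) instead of O(d).

```python
def axis_offset(x, lo, hi):
    """Distance from coordinate x to the interval [lo, hi] (0 if x is inside)."""
    return max(lo - x, 0, x - hi)


def box_power_sum(q, lo, hi, p):
    """Full O(d) computation: sum of p-th powers of per-axis offsets from q to the box."""
    return sum(axis_offset(x, l, h) ** p for x, l, h in zip(q, lo, hi))


def child_power_sums(q, lo, hi, parent_sum, axis, cut, p):
    """O(1) update for both children when [lo, hi] is split at `cut` along `axis`."""
    old = axis_offset(q[axis], lo[axis], hi[axis]) ** p
    low = parent_sum - old + axis_offset(q[axis], lo[axis], cut) ** p
    high = parent_sum - old + axis_offset(q[axis], cut, hi[axis]) ** p
    return low, high


q = (10, 1, 7)
lo, hi = (0, 0, 0), (8, 4, 4)
parent = box_power_sum(q, lo, hi, 2)                        # 2^2 + 0 + 3^2 = 13
low, high = child_power_sums(q, lo, hi, parent, axis=0, cut=4, p=2)
```

The high-side child is the closer one and keeps the parent's value 13; the low-side child changes only its x term, from 2^2 to 6^2, giving 45.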
Experiments
Experiments generated points from a variety of probability distributions: Uniform, Gaussian, Laplace, Correlated Gaussian, Correlated Laplacian, Clustered Gaussian, Clustered Segments
Conclusions • Algorithm is not necessarily practical for large dimensions • But, for dimensions <= ~20, does well • Shrinking helps with highly clustered datasets, but was not often needed in their experiments • Only needed for 5-20% of tree nodes • BBD tree (in paper’s form) is primarily for static point set • But, auxiliary data structure could maintain changes
Derandomization for Efficient Geometric Partitioning “Bounded-Independence Derandomization of Geometric Partitioning with Applications to Parallel Fixed-Dimensional Linear Programming” Goodrich, Ramos
Overview
• Paper concerns geometric partitioning:
  • Given:
    • a collection X of n hyperplanes in R^d
    • a parameter r
  • Partitioning goal:
    • partition R^d into O(r^d) constant-sized cells
    • so that each cell intersects few hyperplanes
• Previous work:
  • Random sampling -> partition in which each cell intersects at most εn hyperplanes, where ε = (log r)/r
  • Derandomization can be used for deterministic construction
• Current work:
  • Assume set is a special space with a special property
  • For such a set, construct (efficiently, deterministically, and in parallel) a (small-sized) approximation for the space
  • Apply to efficiently & deterministically solve parallel fixed-dimensional linear programming
For other Goodrich papers, see http://www.cs.jhu.edu/~goodrich/cgc/pubs/
Background: Derandomization • Common approach for randomized geometric algorithms: • use small-sized random samples • Derandomize: • quantify combinatorial properties of the random samples • show that sets with these properties can be constructed efficiently without randomization • Combinatorial properties often characterized by what the next long series of slides is about….
Background: Configuration
• Given an abstract set (universe) N of geometric objects
• A configuration s over N is a pair (D,L) = (D(s),L(s)), where D, L are disjoint subsets of N
• Objects in D are:
  • triggers associated with s
  • objects that define s
  • d(s) = cardinality of D(s) = degree
• Objects in L are:
  • stoppers associated with s
  • objects that conflict with s
  • l(s) = cardinality of L(s) = level = (absolute) conflict size
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
Background: Configuration Example
• N = {h1, h2, h3, h4} = set of line segments in the plane
• s is feasible if s occurs in the trapezoidal decomposition H(R) for some subset R of N
  • trapezoids arising in incremental computation of H(N)
• Here, R = {h3, h4}
[Figure: the segments of N, and H(R) with trapezoid s, distinguishing segments in R from segments in N \ R]
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
Background: Configuration Example
• For a feasible trapezoid s define its:
  • trigger set D(s) = segments of N adjacent to boundary of s
  • conflict set L(s) = segments of N \ D(s) intersecting s
• Example: configuration (D(s), L(s)) where D(s) = {h3, h4} and L(s) = {h1, h2}
[Figure: H(R) with trapezoid s; h3, h4 are the segments in R and h1, h2 the segments in N \ R]
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
Background: Configuration Space
• A configuration space P(N) over N is a (multi)set of configurations with the
  • Bounded Degree Property: the degree of each configuration in P(N) is bounded (by a constant, i.e. something independent of N)
Note: The term configuration space is also used in motion planning. In that context, it refers to the motion planning search space.
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
Background: Configuration Example
• Associate with each feasible s a configuration (D(s), L(s))
• If N is in general position, d(s) = cardinality of D(s) <= 4
  • since s is a trapezoid
• Because the degree d(s) is bounded, the resulting P(N) is a configuration space of all feasible trapezoids over N
• Configurations for feasible trapezoids s1 and s2:
  • D(s1) = {h3, h4}, L(s1) = {h1, h2}
  • D(s2) = {h3, h4}, L(s2) = {} (empty)
[Figure: H(R) with trapezoids s1 and s2; h3, h4 are the segments in R and h1, h2 the segments in N \ R]
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
Background: Configuration Example
• If we restrict N to be {h3, h4}, then
  • s1, s2 are 2 feasible trapezoids
  • D(s1) = D(s2) = {h3, h4}
  • L(s1) = L(s2) = {} (empty)
• 2 "distinct" configurations: (D(s1), L(s1)) = (D(s2), L(s2))
  • Size of P(N) includes such "duplicate" configurations
  • Reduced size of P(N) excludes "duplicates"
[Figure: 2 feasible trapezoids s1, s2 for N = {h3, h4}]
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
Background: Configuration Example
• Note that not every arrangement of line segments (before overlaying a trapezoidal decomposition on it) has the bounded degree property.
• In general, a region s can have d(s) = O(n), where n is the size of N
• Can you think of another type of decomposition that has bounded degree?
[Figure: region s bounded by segments h1-h5]
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
Background: Configuration Example
• Definition: P^i(N) is the set of configurations in P(N) with level i
  • [recall level is size of L(s), the conflict set]
• Configurations in P^0(N) are active over N.
• Example: P^0(N) = {(D(s1), L(s1)), (D(s2), L(s2))}
[Figure: 2 feasible trapezoids s1, s2 for N = {h3, h4}]
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
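Triggers, stoppers and levels are easier to experiment with in one dimension than with trapezoids. The sketch below swaps in a simpler analogue that is an assumption of this example, not the slides' trapezoid space: N is a set of points on a line, each configuration is the interval spanned by two points (its triggers), and its stoppers are the points of N strictly inside.

```python
from itertools import combinations


def configurations(N):
    """All (triggers, stoppers) pairs for intervals spanned by two points of N."""
    configs = []
    for a, b in combinations(sorted(N), 2):
        triggers = frozenset({a, b})                       # D(s): points that define the interval
        stoppers = frozenset(x for x in N if a < x < b)    # L(s): points in conflict with it
        configs.append((triggers, stoppers))
    return configs


def level_sets(N):
    """Group configurations by level = |L(s)|; level 0 holds the active configurations."""
    by_level = {}
    for D, L in configurations(N):
        by_level.setdefault(len(L), []).append((D, L))
    return by_level


N = {1, 4, 6, 9}
levels = level_sets(N)
active = levels[0]      # intervals between consecutive points: (1,4), (4,6), (6,9)
```

Every configuration here has degree 2, so the bounded degree property holds, and the active set is exactly the partition of the line between consecutive points of N.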
Background: Configuration Example
• Definition: A configuration space P(N) has bounded valence if the number of configurations in P(N) sharing the same trigger set is bounded (by a constant).
• Example: our configuration space P(N) of all feasible trapezoids over N has bounded valence
  • all feasible trapezoids with the same trigger set can be identified with trapezoids in the trapezoidal decomposition formed by that trigger set
  • size of that trigger set is bounded by a constant, so the number of such trapezoids is also bounded by a constant
[Figure: trapezoidal decomposition induced by trigger set {h3, h4}, with trapezoids s1 and s2]
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
Background: Configuration Example
• Theorem:
  • Let:
    • P(N) be a configuration space of bounded valence
    • n = size of N
    • d = maximum degree of a configuration in P(N)
    • R = a random sample of N of size r
  • Then:
    • With probability > 1/2, for each active configuration s in P^0(R), the conflict size of s relative to N is <= c(n/r) log r, for a large enough constant c
    • Expected reduced size: E[reduced size of P(R)] is in O(r^d)
• Example: For any random sample R of N of size r
  • each trapezoid in the trapezoidal decomposition H(R) has O([n/r] log r) conflict size with high probability
  • size of P(R) is in O(r^d)
    • for bounded valence, size and reduced size differ only by a constant factor
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
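The flavor of the theorem can be seen in a small experiment. The sketch below again uses a one-dimensional stand-in chosen for this example (points on a line cut into intervals by the sample R) rather than trapezoids, and the values of n, r and the seed are arbitrary. It measures how the largest conflict size compares with (n/r) log r; it does not prove anything about the constant c.

```python
import bisect
import math
import random


def conflict_sizes(N, R):
    """Conflict size of each interval of the line cut by sample R:
    the number of points of N outside R that fall inside that interval."""
    cuts = sorted(R)
    sizes = [0] * (len(cuts) + 1)
    chosen = set(R)
    for x in N:
        if x not in chosen:
            sizes[bisect.bisect_left(cuts, x)] += 1
    return sizes


random.seed(0)
n, r = 10_000, 100
N = list(range(n))
R = random.sample(N, r)
sizes = conflict_sizes(N, R)

# Any constant c that works for this sample must be at least this ratio.
ratio = max(sizes) / ((n / r) * math.log(r))
```

Each of the n - r unsampled points lands in exactly one interval, so the conflict sizes always sum to n - r; the average is about n/r, and the theorem bounds how much larger the maximum can be.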
Background: Range Space
• Definition:
  • Let:
    • P(N) be a configuration space
    • n = size of N
    • p'(r) be the maximum reduced size function of P(N) for r <= n, i.e. the maximum reduced size of P(R) over subsets R of N of size r
  • P(N) has bounded dimension if there is a constant d such that p'(r) is in O(r^d) for all r <= n
  • In this case, d is the dimension of P(N)
• Bounded valence -> bounded dimension
• Some important types of configuration spaces don't have bounded valence but have bounded dimension
• Range space: configuration space for which the trigger set of every configuration is empty. In this case, a configuration is a range.
• Half-space range: Range = points in halfspace. P(N) = set of distinct ranges induced by (upper) halfspaces. Dualize -> line arrangement [it has bounded dimension]
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
Background: ε-net of a Range Space
• Theorem:
  • If:
    • P(N) is a configuration space of bounded dimension d
    • ε > 0
    • R is a random subset of N
      • formed via r independent draws from N with replacement
      • r >= 8/ε
  • then:
    • the conflict size (relative to N) of every configuration in P^0(R) is <= εn
    • with probability at least 1 - 2 p'(2r) 2^(-εr/2)
• For a range space P(N)
  • such an R is an ε-net of the range space P(N)
  • for large enough r, a random sample of size r is an ε-net with high probability
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
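A small checker makes the ε-net property concrete. The sketch assumes the usual range-space formulation (R is an ε-net if it meets every range containing more than ε|N| points of N) and uses ranges induced by rightward half-lines on 1D points as the example; the helper names are illustrative.

```python
def is_eps_net(N, R, eps, ranges):
    """R is an eps-net if it meets every range that holds more than eps * |N| points of N."""
    R = set(R)
    return all(rng & R for rng in ranges if len(rng) > eps * len(N))


def rightward_ranges(N):
    """Ranges induced on N by half-lines [t, +inf): the suffixes of N in sorted order."""
    pts = sorted(N)
    return [frozenset(pts[i:]) for i in range(len(pts))]


N = list(range(8))
ranges = rightward_ranges(N)
eps = 0.25                                   # heavy ranges hold more than 2 points

good = is_eps_net(N, [5], eps, ranges)       # 5 lies in every suffix of size >= 3
bad = is_eps_net(N, [4], eps, ranges)        # the suffix {5, 6, 7} misses 4
```

For this particular range space a single well-placed point is already an ε-net, which is consistent with its very small VC-dimension.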
Background: VC-Dimension of a Range Space
• Use to bound dimension when direct argument fails
• Let P(N) be a range space
• A subset M of N is shattered by N if every subset of M occurs as a range in P(M)
  • Reduced size of P(M) is then 2^m, where m = size of M
• VC-dimension of P(N) is the maximum size of a shattered subset of N.
• Example: N = set of 1D points; P(N) = space of ranges induced by rightward half-spaces. What is the VC-dimension of P(N)?
[Figure: 1D points p1-p7 and half-spaces h1-h8]
Source: “Computational Geometry: An Introduction Through Randomized Algorithms” by Ketan Mulmuley
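The shattering definition can be tested by brute force on small inputs. The sketch below assumes the ranges are those cut out of a 1D point set by rightward half-lines [t, +inf), and tries one threshold per point plus one beyond the maximum, which is enough to produce every distinct range. The function names are illustrative, and the search is exponential, so it is only meant for tiny sets.

```python
from itertools import combinations


def induced_ranges(M, thresholds):
    """Distinct subsets of M cut out by rightward half-lines [t, +inf)."""
    return {frozenset(x for x in M if x >= t) for t in thresholds}


def is_shattered(M, thresholds):
    """M is shattered if all 2^|M| of its subsets occur as ranges."""
    return len(induced_ranges(M, thresholds)) == 2 ** len(M)


def vc_dimension(N):
    pts = sorted(N)
    thresholds = pts + [pts[-1] + 1]     # the extra threshold yields the empty range
    best = 0
    for m in range(1, len(pts) + 1):
        if any(is_shattered(M, thresholds) for M in combinations(pts, m)):
            best = m
        else:
            break                        # subsets of a shattered set are shattered, so stop here
    return best


dim = vc_dimension([3, 1, 4, 15, 9, 2, 6])
```

For two points a < b a rightward half-line can never select a without b, so no pair is shattered and the computed dimension for this family is 1.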
What does all this have to do with the paper?
• Paper concerns geometric partitioning:
  • Given:
    • a collection X of n hyperplanes in R^d
    • a parameter r
  • Goal:
    • partition R^d into O(r^d) constant-sized cells
    • so that each cell intersects few hyperplanes
• Previous work:
  • Random sampling -> partition in which each cell intersects at most εn hyperplanes, where ε = (log r)/r
  • Derandomization can be used for deterministic construction
• Current work:
  • Assume set is a range space with bounded VC-exponent
    • VC-exponent is a more general concept than VC-dimension
  • For such a set, construct (efficiently, deterministically, and in parallel) a (small-sized) approximation for the range space that is a variation on the ε-net concept.
Additional Handouts • Parallel programming • PRAM CREW, EREW models • Parallel geometric algorithms
Project
Deliverable          Due Date        Grade %
Proposal             Monday, 4/9     2%
Interim Report       Monday, 4/23    5%
Final Presentation   Monday, 5/7     8%
Final Submission     Monday, 5/14    10%
Total: 25% of course grade
Guidelines: Presentation • 1/2 hour class presentation • Explain to the class what you did • Structure it any way you like! • Some ideas: • slides (electronic or transparency) • demo • handouts
Guidelines: Final Submission • Abstract: Concise overview (at most 1 page) • Introduction: • Motivation: Why did you choose this project? • Related Work: Context with respect to CG literature • Summary of Results • Main Body of Paper: (one or more sections) • Conclusion: • Summary: What did you accomplish? • Future Work: What would you do if you had more time? • References: Bibliography (papers, books that you used) Well-written final submissions with research content may be eligible for publishing as UMass Lowell CS technical reports.
Guidelines: Final Submission • Main Body of Paper: • If your project involves Theory/ Algorithm: • Informal algorithm description (& example) • Pseudocode • Analysis: • Correctness • Solutions generated by algorithm are correct • account for degenerate/boundary/special cases • If a correct solution exists, algorithm finds it • Control structures (loops, recursions,...) terminate correctly • Asymptotic Running Time and/or Space Usage
Guidelines: Final Submission • Main Body of Paper: • If your project involves Implementation: • Informal description • Resources & Environment: • what language did you code in? • what existing code did you use? (software libraries, etc.) • what equipment did you use? (machine, OS, compiler) • Assumptions • parameter values • Test cases • tables, figures • representative examples
Final Exam: Date, Format
• Format:
  • in class
  • open book, notes
  • similar to midterm:
    • 50% calculate/manipulate
    • 50% design, analyze
• Date choices:
  • Friday, 18 May at
    • 1:00-4:00 pm or
    • 5:30-8:30 pm
  • Wednesday, 23 May at
    • 9:00 am-12:00 noon or
    • 1:00-4:00 pm or
    • 5:30-8:30 pm
Final Exam: Part I Material • O’Rourke CH 1-8: emphasis on chapters omitted from midterm (CH 7-8) • Some key themes • Common geometric/combinatorial structures: • Decomposition/Partition: • Triangulation • Trapezoidalization • Delaunay Triangulation • Voronoi Diagram • Arrangement (level, zone) • Enclosure: • Convex Hull • Nested Polytope Hierarchy • Visibility Polygon & Kernel of Star Polygon
Final Exam: Part I Material • Some key themes (continued) • Algorithmic Paradigms • Sweep: sort, then sweep a line, parabolic front • Divide-and-Conquer • Incremental • Randomized • Output-Sensitive • Preprocessing for fast queries • Representations: • Quad-edge • O’Rourke • Geometric Primitives
Final Exam: Part I Material • Some key themes (continued) • Math: • Convexity • Monotonicity • Distance Metrics • Visibility/ Star-shapedness • Euler’s Formula • Duality • Graphs • Point <-> Line • Parabolic • Minkowski Sum • Randomness • Graph Theory: Independent Set
Final Exam: Part II Material • Part II • Translational Polygon Containment • Connected Dominating Sets for Wireless Networks • Mesh Generation using Delaunay Triangulation • Approximate Nearest Neighbor Searching • Derandomization for Efficient Geometric Partitioning