Random Partition via Shifting Tomer Margalit, 21/5/2012
Table of Contents • Introduction and fundamental properties of randomly shifted grids. • Application – minimal disk covering of points. We improve the running time of the trivial approach using randomly shifted grids. • Shifting Quadtrees – we will see how using random shifts when constructing Quadtrees can be beneficial. • Approximation of an n-point Metric Space – We will show how we approximate such a metric using a new data structure called an HST.
Background • In this lecture we introduce randomness into partitions of a domain. • We do this by starting the partition from a random shift. • For instance, suppose that we have $\mathbb{R}^d$ as the domain that we would like to partition, using a grid of side length $\Delta$. • Then we randomly choose a uniformly distributed vector $b$ in $[0, \Delta)^d$, and use it to shift the origin of the grid to $b$. • Similarly, we can shift a quadtree over a region of interest. • This yields some simple clustering and nearest neighbor algorithms.
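Partition via Shifting 1 • We initially focus on partition shifting of the real line. • First, we define the partition function, for a shift $b$ and a cell width $\Delta > 0$: $h_{b,\Delta}(x) = \lfloor (x - b)/\Delta \rfloor$. • This function partitions the real line according to its preimages, i.e. for any integer $i$, the set $h_{b,\Delta}^{-1}(i) = [b + i\Delta,\ b + (i+1)\Delta)$ is a distinct cell.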
Partition via Shifting 2 • Remark: The values of the function do not affect the partition, only the partitioned sets do. That is, the partition is determined by the equivalence classes induced by $h$. • Simply put, if two functions $h$ and $h'$ satisfy $h(x) = h(y)$ exactly when $h'(x) = h'(y)$, for every two points $x$ and $y$, then they are treated as the same partition. • For example, if we have $b$ and $b'$ such that $b' = b + k\Delta$ for some integer $k$, then the partitions are identical, even though the functions may differ. • In particular, we can pick shifts uniformly from $[0, \Delta)$ without losing any partitions. • We could also pick them from any interval $[a, a')$, as long as $a' - a = \Delta$.
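The remark above can be checked on a small example. The following is a minimal Python sketch, assuming the partition function is $h_{b,\Delta}(x) = \lfloor (x-b)/\Delta \rfloor$; the names `cell_id` and `partition` and the sample points are ours, chosen only for illustration.

```python
import math

def cell_id(x, b, delta):
    """Index of the cell of the shifted partition that contains x."""
    return math.floor((x - b) / delta)

def partition(points, b, delta):
    """Group the points by cell id; only the groups matter, not the ids."""
    cells = {}
    for x in points:
        cells.setdefault(cell_id(x, b, delta), []).append(x)
    return sorted(cells.values())

points = [0.2, 0.9, 1.4, 2.6, 3.1]
# Shifting b by a whole multiple of delta relabels the cells
# but leaves the groups themselves unchanged.
same = partition(points, b=0.3, delta=1.0) == partition(points, b=2.3, delta=1.0)
print(same)
```

The ids of the cells differ between the two shifts, but the groups of points are the same, which is all that the partition depends on.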
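Good Probability for a Nice Distribution • Lemma: For any $x, y \in \mathbb{R}$, we have that $\Pr[h_{b,\Delta}(x) \ne h_{b,\Delta}(y)] = \min(|x - y|/\Delta,\ 1)$. • In other words, if two points are close to each other relative to the cell width $\Delta$ (that is, the partition is coarse), then there is only a small chance that we will classify them into different cells. • For example, for given $x$, $y$ and $\Delta$, what is the chance that we choose a $b$ such that $x$ and $y$ have different ids?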
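Proof • Claim: For any $x, y \in \mathbb{R}$, we have that $\Pr[h_{b,\Delta}(x) \ne h_{b,\Delta}(y)] = \min(|x - y|/\Delta,\ 1)$. • The claim is obviously true if $|x - y| \ge \Delta$. Indeed, then $x$ and $y$ must be in separate cells because we partition into stripes of width $\Delta$. • If not, assume that $x < y$ and pick some shift $b$ uniformly from $[x, x + \Delta)$. • As said before, we still get all partitions this way, so we are in the same probability space. • But now, $h_{b,\Delta}(x) \ne h_{b,\Delta}(y)$ if and only if $b \in (x, y]$, and this happens with probability $(y - x)/\Delta$.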
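The lemma can also be sanity-checked numerically. The sketch below estimates the separation probability by sampling shifts uniformly from $[0, \Delta]$ and compares it with $\min(|x-y|/\Delta, 1)$. The helper names, the sample values and the seed are illustrative assumptions, not part of the lecture.

```python
import math
import random

def separated(x, y, b, delta):
    """True if x and y fall into different cells of the shifted partition."""
    return math.floor((x - b) / delta) != math.floor((y - b) / delta)

def estimate_separation(x, y, delta, trials=100_000, seed=0):
    """Monte Carlo estimate of Pr[h(x) != h(y)] over a uniform shift b."""
    rng = random.Random(seed)
    hits = sum(separated(x, y, rng.uniform(0.0, delta), delta)
               for _ in range(trials))
    return hits / trials

x, y, delta = 0.3, 1.0, 2.0
estimate = estimate_separation(x, y, delta)
predicted = min(abs(x - y) / delta, 1.0)
print(estimate, predicted)
```

With these sample values the estimate should land close to the predicted $0.35$, up to sampling noise.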
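Generalized Partition Shift • Now we generalize the notion of shifting partitions to multidimensional spaces. • We have a set of points $P \subseteq \mathbb{R}^d$. • Consider a shift $b = (b_1, \ldots, b_d)$ chosen uniformly at random from $[0, \Delta)^d$. • We partition the points according to a grid that has its origin at $b$, and a side length of $\Delta$. • We define the id of a point $p = (p_1, \ldots, p_d)$ to be $\mathrm{id}(p) = (h_{b_1,\Delta}(p_1), \ldots, h_{b_d,\Delta}(p_d))$.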
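As a rough illustration of the multidimensional id, here is a short Python sketch that draws a random shift and groups points by their cell id in a dictionary. It assumes the id is the tuple of per-coordinate values $\lfloor (p_i - b_i)/\Delta \rfloor$; the function names `point_id` and `bucket_points` are hypothetical.

```python
import math
import random
from collections import defaultdict

def point_id(p, b, delta):
    """Tuple of per-coordinate cell indices of p in the grid shifted by b."""
    return tuple(math.floor((pi - bi) / delta) for pi, bi in zip(p, b))

def bucket_points(points, delta, rng):
    """Draw a uniform shift and group the points by the cell containing them."""
    d = len(points[0])
    b = tuple(rng.uniform(0.0, delta) for _ in range(d))
    cells = defaultdict(list)
    for p in points:
        cells[point_id(p, b, delta)].append(p)
    return b, dict(cells)

points = [(0.1, 0.2), (0.4, 0.3), (3.7, 2.2), (9.0, 9.5)]
b, cells = bucket_points(points, delta=2.0, rng=random.Random(42))
for cid, members in sorted(cells.items()):
    print(cid, members)
```

Since every point is hashed once, the grouping takes expected linear time in the number of points.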
Good Probability for a Nice Distribution in Multidimensional Spaces • Given a ball $B$ of radius $r$ in $\mathbb{R}^d$, the probability that $B$ is not contained in a single cell of a randomly shifted grid $G(b, \Delta)$ is bounded by $\min(2dr/\Delta,\ 1)$. • Proof: • Project $B$ onto the $i$th coordinate axis. • We get an interval $B_i$ of length $2r$ in the one-dimensional shifted grid with shift $b_i$ and cell width $\Delta$. • Now, $B$ is contained in a single cell if and only if every $B_i$ is contained in a single cell of its one-dimensional grid.
Proof, Continued • Now, denote by $E_i$ the event in which $B_i$ is not contained in a single cell of the one-dimensional grid. By the one-dimensional lemma, $\Pr[E_i] \le 2r/\Delta$. • It suffices that even one of these events occurs for the ball not to be contained in a single cell. • In other words, if we denote by $E$ the event that the ball is not contained in a single cell, then we have that $\Pr[E] = \Pr\left[\bigcup_{i=1}^{d} E_i\right] \le \sum_{i=1}^{d} \Pr[E_i] \le 2dr/\Delta$. • If this bound is larger than one, then we use the trivial bound of one, which proves our claim.
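The bound can be compared against a simulation. The sketch below assumes the ball is contained in a cell exactly when each of its coordinate projections $[c_i - r, c_i + r]$ lies in one cell, as in the proof. The function names, the sample ball and the seed are our own choices.

```python
import math
import random

def ball_in_single_cell(center, r, b, delta):
    """True if every coordinate projection of the ball lies in one 1-D cell."""
    return all(
        math.floor((c - r - bi) / delta) == math.floor((c + r - bi) / delta)
        for c, bi in zip(center, b)
    )

def estimate_cut_probability(center, r, delta, trials=20_000, seed=1):
    """Monte Carlo estimate of Pr[ball is not contained in a single cell]."""
    rng = random.Random(seed)
    d = len(center)
    cut = 0
    for _ in range(trials):
        b = [rng.uniform(0.0, delta) for _ in range(d)]
        if not ball_in_single_cell(center, r, b, delta):
            cut += 1
    return cut / trials

d, r, delta = 2, 1.0, 8.0
estimate = estimate_cut_probability(center=(3.0, -1.5), r=r, delta=delta)
bound = min(2 * d * r / delta, 1.0)
print(estimate, bound)
```

Because the coordinates of the shift are independent, the exact probability here is $1 - (1 - 2r/\Delta)^d = 0.4375$, slightly below the union bound of $0.5$; the estimate should be close to the former.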
Remarks about Randomly Shifted Grids • We have shown that if we choose the shift of the grid randomly, then two points (or a small ball) are unlikely to be separated by the grid, provided the partition is coarse enough relative to their distance (or radius). • We will use this property to give randomized algorithms that perform well in expectation or succeed with high probability.
Application – Covering by Disks • Now we show an application for these simple facts about random partitions. • Given a set of $n$ points in the plane, we would like to cover them by the minimum number of unit disks. • It turns out that randomly shifted grids can improve the trivial algorithm by quite a bit. • First we show the trivial algorithm, and then the improvement gained by using randomly shifted grids.
Trivial Cover • We show the trivial algorithm first. We will also use it later. • We can cover trivially, that is by traversing all candidate covers, in $O(k\,n^{2k+1})$ time, where $k$ is the size of the minimal cover. • Before we can show that, we must establish some fundamental insights about equivalence of disk covers:
From Intuition to Proof • The intuition is that two covers are equivalent if the disks of both covers cover the same subsets of points. • This intuition will direct us in providing the trivial algorithm. • Instead of considering any cover, we consider only equivalence classes of covers. • Naively, we can consider every partition of the points into (not necessarily disjoint) groups, and check whether each group can be covered by a unit disk. • However, this approach gives us a lot of invalid covers. • We can use some simple observations to cut that number.
I. Every pair of input points (at distance under 2) defines two possible disks
II. Given a cover, we can move every disk to have at most 2 points on the edge
Proof • The first insight is obvious. • The second insight is not that obvious though. • To get the second claim, given a disk (that is part of a cover), we can move the disk downward until it “hits” a point. • And then rotate it (around the point) until it “hits” another one.
Trivial Cover – Feasibility • In case the disk contains only one point, we get one point on the edge. • Using these facts, every arbitrary disk cover is equivalent to a cover by disks that have either 1 or 2 input points on their edge. • Given this insight, we can just traverse all possible covers composed of these disks to find the minimal one. • So how many such disks are there? At most $2\binom{n}{2} + n = n^2$: two for every pair of points and one for every single point.
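To make the two insights concrete, the following Python sketch enumerates the candidate unit disks by their centers: two disks for every pair of points at distance at most 2, and one disk per point. Placing the single-point disk directly below its point is our own convention (any disk with the point on its edge would do), and the function names are hypothetical.

```python
import math
from itertools import combinations

def disks_through(p, q):
    """Centers of the unit disks whose boundary passes through both p and q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    dist = math.hypot(dx, dy)
    if dist == 0 or dist > 2:
        return []                      # no unit circle passes through both
    mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
    # Distance from the midpoint of pq to each center (Pythagoras).
    h = math.sqrt(max(0.0, 1 - (dist / 2) ** 2))
    ux, uy = -dy / dist, dx / dist     # unit vector perpendicular to pq
    return [(mx + h * ux, my + h * uy), (mx - h * ux, my - h * uy)]

def candidate_centers(points):
    """One disk per point (point on the top of the edge), two per close pair."""
    centers = [(x, y - 1.0) for x, y in points]
    for p, q in combinations(points, 2):
        centers.extend(disks_through(p, q))
    return centers

points = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.5), (7.0, 7.0)]
centers = candidate_centers(points)
print(len(centers), "candidates, at most", len(points) ** 2)
```

Pairs that are farther than 2 apart contribute nothing, so the count is usually well below $n^2$.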
Trivial Cover – Given the Minimal Cover Size • If we knew the minimal cover has $m$ disks, how much time would it take to find the cover? • With at most $n^2$ candidate disks, we have at most $n^{2m}$ covers, and checking if a cover is valid takes $O(nm)$ time (check each point against each disk). • So in total, it takes $O(m\,n^{2m+1})$ time to find the cover.
Trivial Cover – Full Algorithm • To complete the algorithm, given a set of points, we exhaustively check for each $m$ from 1 to $n$, in increasing order, whether we have a cover with $m$ disks. • According to the insights we saw, the first cover we find must also be a minimal cover. • If the minimal cover has $k$ disks, the time it takes to find it is $\sum_{m=1}^{k} O(m\,n^{2m+1}) = O(k\,n^{2k+1})$.
Trivial Cover – Final Statement • In conclusion, we get the following claim: • Claim: Given a set $P$ of $n$ points in the plane, one can compute a minimal cover in $O(k\,n^{2k+1})$ time, where $k$ is the size of the minimal cover. • The problem with this approach is that $k$ may be large (say $n/4$), in which case the running time is exponential in $n$.
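The trivial algorithm can be sketched in a few lines of Python. This is an illustrative sketch under our own conventions (disks are represented by their centers, single-point disks sit directly below their point, and a small tolerance absorbs rounding), not a tuned implementation; it is only practical for very small inputs.

```python
import math
from itertools import combinations

def candidate_centers(points):
    """Centers of the canonical unit disks: one per point, two per close pair."""
    centers = [(x, y - 1.0) for x, y in points]
    for (px, py), (qx, qy) in combinations(points, 2):
        dx, dy = qx - px, qy - py
        dist = math.hypot(dx, dy)
        if 0 < dist <= 2:
            h = math.sqrt(max(0.0, 1 - (dist / 2) ** 2))
            mx, my = (px + qx) / 2, (py + qy) / 2
            ux, uy = -dy / dist, dx / dist
            centers += [(mx + h * ux, my + h * uy), (mx - h * ux, my - h * uy)]
    return centers

def is_cover(centers, points, tol=1e-9):
    """Check every point against every disk: O(n * m) time."""
    return all(any(math.hypot(cx - x, cy - y) <= 1 + tol for cx, cy in centers)
               for x, y in points)

def min_disk_cover(points):
    """Try cover sizes m = 1, 2, ... and return the first cover found."""
    cands = candidate_centers(points)
    for m in range(1, len(points) + 1):
        for centers in combinations(cands, m):
            if is_cover(centers, points):
                return list(centers)
    return []  # only reached for an empty input

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.5, 0.5)]
cover = min_disk_cover(points)
print(len(cover), cover)
```

Because the sizes are tried in increasing order, the first cover returned is a minimal one.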
Minimal Disk Cover - Improvement • Now, we compromise some accuracy: we will try to calculate an approximation of the minimal disk cover. • To do that, we will use the randomly shifted grids. • As mentioned before, this will be a randomized algorithm that gives a good approximation of the minimal cover size in expectation. • Specifically, like algorithms we have seen in the past, we choose some small parameter (denote it by $\varepsilon$), and then both our running time and the expectation of the cover size depend on it.
Minimal Disk Cover – Randomized Algorithm • Claim: Given a set $P$ of $n$ points in the plane and a parameter $\varepsilon > 0$, one can compute using a randomized algorithm, in $n^{O(1/\varepsilon^2)}$ time, a cover of $P$ by $X$ unit disks, where $E[X] \le (1 + \varepsilon)\,\mathrm{opt}$, where opt is the minimum number of unit disks required to cover $P$.
Algorithm • First, let $\Delta = 12/\varepsilon$. • Now, for a shift $b$ chosen uniformly at random from $[0, \Delta)^2$, consider the grid $G(b, \Delta)$. • Compute all the grid cells that contain points of $P$. • That is done by computing the id of every point, and then storing it in a hash table. • This clearly can be done in linear time. • Now, for each grid cell $c$, we denote by $P_c$ the points of $P$ in the cell. • Observe that any cell of $G(b, \Delta)$ can be covered by $M = (\Delta + 1)^2$ unit disks.
Algorithm 2 • Note that the bound $M$ is very loose. • We can do much better: instead of using this cover, we use the trivial algorithm to compute the minimal cover of $P_c$ for each cell $c$. • Since the minimum is at most $M$, we can compute it in $O(M\,|P_c|^{2M+1})$ time, according to what we showed about the trivial algorithm. • As for the running time, there are at most $n$ non-empty grid cells, which means it will take us $O(n \cdot M\,n^{2M+1}) = n^{O(M)}$ time. Now since $M = O(1/\varepsilon^2)$, we get that the time is $n^{O(1/\varepsilon^2)}$. • Note that $|P_c| \le n$ for every cell $c$ and that $\sum_c |P_c| = n$.
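Putting the pieces together, the randomized algorithm might be sketched as follows. The sketch assumes $\Delta = 12/\varepsilon$ and reuses a small exhaustive solver per cell; all function names are ours, disks are represented by their centers, and the exhaustive step is only feasible when each cell holds a handful of points.

```python
import math
import random
from collections import defaultdict
from itertools import combinations

def candidate_centers(points):
    """Centers of the canonical unit disks: one per point, two per close pair."""
    centers = [(x, y - 1.0) for x, y in points]
    for (px, py), (qx, qy) in combinations(points, 2):
        dx, dy = qx - px, qy - py
        dist = math.hypot(dx, dy)
        if 0 < dist <= 2:
            h = math.sqrt(max(0.0, 1 - (dist / 2) ** 2))
            mx, my = (px + qx) / 2, (py + qy) / 2
            ux, uy = -dy / dist, dx / dist
            centers += [(mx + h * ux, my + h * uy), (mx - h * ux, my - h * uy)]
    return centers

def is_cover(centers, points, tol=1e-9):
    return all(any(math.hypot(cx - x, cy - y) <= 1 + tol for cx, cy in centers)
               for x, y in points)

def exact_cover(points):
    """Trivial algorithm: smallest set of canonical disks covering the points."""
    cands = candidate_centers(points)
    for m in range(1, len(points) + 1):
        for centers in combinations(cands, m):
            if is_cover(centers, points):
                return list(centers)
    return []

def grid_cover(points, eps, rng):
    """Shift a grid of side 12/eps at random and cover each cell separately."""
    delta = 12.0 / eps
    bx, by = rng.uniform(0.0, delta), rng.uniform(0.0, delta)
    cells = defaultdict(list)          # hash table: cell id -> points in cell
    for x, y in points:
        cid = (math.floor((x - bx) / delta), math.floor((y - by) / delta))
        cells[cid].append((x, y))
    cover = []
    for cell_points in cells.values():
        cover += exact_cover(cell_points)
    return cover

points = [(0.0, 0.0), (0.5, 0.2), (100.0, 100.0), (100.3, 99.8), (50.0, -30.0)]
cover = grid_cover(points, eps=1.0, rng=random.Random(7))
print(len(cover))
```

For this sample input the optimum is 3 disks; the returned cover has between 3 and 5 disks, depending on whether a grid line happens to split one of the two close pairs.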
Good Expectation of the Cover Size • Now that we have reached our goal time, we turn to the expectation. • Reminder: We denoted by $X$ the size of the returned cover, and we claimed that $E[X] \le (1 + \varepsilon)\,\mathrm{opt}$, a property we will now prove. • Plan: Given an optimal cover, we will show another cover that is considered by the algorithm, and that the expectation of its size (with regard to all the possible shifts) is what we want.
Proof 1 • Consider the optimal solution $F_{\mathrm{opt}}$. • We generate from $F_{\mathrm{opt}}$ a solution $F$ that is checked by the algorithm. • For each grid cell $c$, denote by $F_c$ the set of disks of the optimal solution that intersect $c$, and tag each such disk with $c$. • Now, define $F = \bigcup_c F_c$. • Remember, this set may contain the same disk more than once, for instance in cases where the same disk intersects two cells.