What I learned about cut, expansion and density problems Guy Kortsarz
The character flaw of Michael Myers, Freddy Krueger and Jason Voorhees: Cut problems. Input: an edge-weighted graph G(V,E) and a collection of pairs {s_i,t_i}. Output: a minimum-cost collection of edges whose removal disconnects all pairs. We look at the fractional relaxation, which says: put lengths on edges so that the distance between s_i and t_i is at least 1 for every i.
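Written out, the fractional relaxation described on this slide is the standard path-based linear program. The symbols c_e (edge cost) and x_e (edge length) are notation chosen here for illustration:

```latex
\begin{align*}
\min\quad & \sum_{e \in E} c_e\, x_e \\
\text{s.t.}\quad & \sum_{e \in P} x_e \ge 1
  && \text{for every } i \text{ and every path } P \text{ from } s_i \text{ to } t_i, \\
& x_e \ge 0 && \text{for every } e \in E.
\end{align*}
```

An integral solution (x_e in {0,1}) is exactly a set of edges that hits every s_i to t_i path, so the LP value is a lower bound on the cost of the best cut.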
The “good cut point” lemma [Figure: a graph with fractional edge lengths, containing s and t.] There is a radius r < 0.5 so that if you cut at r, the cut's value is only an O(log n) factor larger than the LP value inside the sphere.
Remarks • The fact that r < 0.5 is important: it means that no s,t pair can both be in the sphere. • The total cost is roughly O(log n) times the LP value. • The sphere needs some nonzero value at radius 0 for this to work.
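The search for a good radius can be pictured with the following sketch. It assumes the LP lengths x_e and costs c_e are already given, grows a ball around a single source, and for each ball compares the cut cost with the LP volume inside, measured just before the next vertex would enter (where the ratio is smallest). The names `grow_region`, `seed` and `max_radius` are illustrative and not from the talk; `seed` plays the role of the nonzero value at radius 0 and must be positive.

```python
import heapq

def dijkstra(n, adj, src):
    """Shortest-path distances from src under the LP edge lengths."""
    dist = [float("inf")] * n
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, length in adj[u]:
            nd = d + length
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def grow_region(n, edges, src, seed, max_radius=0.5):
    """edges: list of (u, v, x_e, c_e). Returns (ball, cut_cost).

    Hypothetical helper: picks the ball around src, of radius below
    max_radius, that minimizes cut_cost / volume.
    """
    adj = [[] for _ in range(n)]
    for u, v, x, _c in edges:
        adj[u].append((v, x))
        adj[v].append((u, x))
    dist = dijkstra(n, adj, src)
    levels = sorted({d for d in dist if d < max_radius})
    best = None
    for j, lo in enumerate(levels):
        # The ball is unchanged for radii in [lo, hi); evaluate just below hi.
        hi = levels[j + 1] if j + 1 < len(levels) else max_radius
        cut, volume = 0.0, seed
        for u, v, x, c in edges:
            a, b = (u, v) if dist[u] <= dist[v] else (v, u)
            if dist[b] <= lo:        # edge fully inside the ball
                volume += c * x
            elif dist[a] <= lo:      # edge crosses the boundary
                cut += c
                volume += c * (hi - dist[a])
        ratio = cut / volume
        if best is None or ratio < best[0]:
            ball = {w for w in range(n) if dist[w] <= lo}
            best = (ratio, ball, cut)
    return best[1], best[2]
```

For example, on the path 0-1-2-3 with lengths 0.1, 0.6, 0.3 and costs 10, 1, 10, calling `grow_region(4, edges, 0, seed=1.0)` cuts the cheap middle edge and keeps vertex 3 outside the ball.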
Vertex separators in undirected graphs • Say that we have a collection of {s,t} pairs so that the distance between every pair is at least d. • Say that you want to delete vertices and cut all such pairs. • A fractional solution of value n/d: give every vertex value 1/d.
Small vertex separator • There is a collection of at most O(log n)·n/d vertices that separates all the pairs at distance at least d. Not that far from the fractional solution. • This is not written anywhere (but not hard to prove). • Credit: the upper bound is due to Garg, Vazirani and Yannakakis.
Separating pairs of distance at least d: directed graphs • How many edges do you need to remove if we have a collection of pairs {s,t} so that dist(s,t) ≥ d? • We worked on this problem without knowing that it is closely related to the Maximum Number of Disjoint Paths in directed graphs. • Thus some results were known: • Chekuri and Khanna: O(n^4/d^4) edges. • Hajiaghayi and Leighton: O(n^3/d^3) edges.
What we proved • We proved that there exists a separator with O(n^2/d^2) edges (K, Nutov). • Good achievement. • Then I found out: the above easily implies an O(n^{2/3}) approximation to the “Maximum number of disjoint paths problem on directed graphs”. • Thus I started to suspect this lemma was proved before.
It was done 3 months before in a SODA paper • The authors: Varadarajan and Venkataraman. Hence the O(n^{2/3}) was known. • The algorithm: among the pairs not yet joined that are still reachable, take the one with the shortest path. • Still, we proved an O(n^{2/3}/opt^{1/3}) ratio for unit capacities. • Also O(opt) is known (see later, Anupam Gupta).
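The greedy rule on this slide can be sketched as follows, for unit capacities on a directed graph. This is a minimal illustration under those assumptions, not the authors' implementation; the function names and the data layout (edges and pairs as tuples) are chosen here for the example.

```python
from collections import deque

def shortest_path(adj, s, t, used):
    """BFS shortest s-t path (as a list of edges) avoiding used edges."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            path = []
            while parent[u] is not None:
                path.append((parent[u], u))
                u = parent[u]
            return path[::-1]
        for v in adj.get(u, ()):
            if v not in parent and (u, v) not in used:
                parent[v] = u
                queue.append(v)
    return None

def greedy_disjoint_paths(edges, pairs):
    """Repeatedly route the unjoined pair with the shortest remaining path."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    used = set()                 # edges consumed by routed paths
    remaining = list(pairs)
    routed = {}
    while remaining:
        candidates = []
        for pair in remaining:
            path = shortest_path(adj, pair[0], pair[1], used)
            if path is not None:
                candidates.append((len(path), pair, path))
        if not candidates:       # no unjoined pair is reachable any more
            break
        _, pair, path = min(candidates, key=lambda c: c[0])
        routed[pair] = path
        used.update(path)
        remaining.remove(pair)
    return routed
```

On the directed path a→b→c→d with pairs (a,d) and (b,c), the rule routes (b,c) first because its path is shorter, after which (a,d) is no longer reachable.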
Clearly the sqrt{n} ratio arises only if opt = sqrt{n} • Best ratio known is by Amit Agarwal, Noga Alon and Moses Charikar: an O(n^{11/23}) ratio. • The paper seems very complex to me. • For the unit capacity case, since we have sqrt{n} only for opt = sqrt{n}, maybe there is a simpler algorithm that returns o(n) edges? • Hardness of approximation: Label-Cover hard, by Chuzhoy and Khanna. • Both the breaking of the sqrt{n} barrier and the lower bound are huge achievements.
The elegant and simple paper by Anupam Gupta • The ratio is only O(sqrt{n}) but very elegant. • The paper of Agarwal et al. used ideas from the paper of Gupta. • Also from my paper with Nutov: reducing reachability among pairs. • And now to another less known but very simple cut lemma. • In this case we may charge all edges.
The ideas of Gupta • Solve the LP. Add all edges with x_e ≥ 1/sqrt{n} to the solution. • Consider a non-separated pair s_j,t_j. • We now show that there is a cut at distance 1/3 ≤ r ≤ 2/3 from s_j of value at most 3 times the fractional opt. • For this, break the LP distances into multiples of a small step. • We do get at most the sum of x_e c_e if we go from 1/3 to 2/3 and multiply the costs by the step.
The i-th fractional value adds the cost crossing from level i to level i+1, times the step • Summing over the levels i at distance between 1/3 and 2/3: the sum of Cut(i) times the step is at most opt_f. • The steps between 1/3 and 2/3 add up to 1/3, so it cannot be that for every i: Cut(i) > 3·opt_f. • This means that there is a cut whose value is at most 3·opt_f. • Note that here every (relevant) edge for s_j,t_j is charged.
Loss of reachability Let (u,v) be an edge on a path from s_j to t_j. Note that there is a path of length 1/3 in the LP distances that u was able to reach and now cannot. Every remaining edge has x_e < 1/sqrt{n}, so to make length 1/3 you need Ω(sqrt{n}) vertices. Thus (u,v) is charged at most O(sqrt{n}) times, hence the sqrt{n} ratio. In addition, as each charge is for length 1/3 and the charged 1/3 segments are disjoint, an O(opt_f) ratio follows.
A very interesting alternative objective function After the cut is chosen, give length 1 to cut edges and length 0 to non-cut edges. Minimize the max distance between s_j and t_j. Called the Checkpoint Problem. A paper by M. Hajiaghayi, R. Khandekar, K. and J. Mestre. On trees a tight ratio of 2. If all paths go up, there is a gem of a combinatorial, polynomial-time optimal solution. Only polynomial ratios for the case of general G.
Conductance • Conductance: e(S,V-S)/deg(S) (a.k.a. sparsest cut) • Approximation algorithms: • O(log n) [Bourgain], [Leighton, Rao], [Linial, London, Rabinovich] • Uses a (quite elementary) embedding of a metric into L1 with O(log n) stretch factor.
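As a concrete reading of the definition e(S,V-S)/deg(S), here is a small sketch for an unweighted, undirected graph given as an edge list. The function name and input format are assumptions made for the example.

```python
def conductance(edges, S):
    """e(S, V-S) / deg(S) for an undirected edge list and a vertex set S."""
    S = set(S)
    # Edges with exactly one endpoint in S.
    crossing = sum(1 for u, v in edges if (u in S) != (v in S))
    # Sum of degrees of the vertices of S (each endpoint in S counts once).
    degree_sum = sum((u in S) + (v in S) for u, v in edges)
    return crossing / degree_sum
```

For two triangles joined by a single edge, taking S to be one triangle gives one crossing edge and degree sum 7, so the conductance is 1/7.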
The best known ratio By Arora, Rao and Vazirani. Approximation ratio O(sqrt{log n}). Uses the so-called negative type metrics. Uses what the authors call expander flows. We now focus on the small set expansion conjecture.
The small set expansion conjecture Let δ ≤ 0.5 be a constant and let ε be an arbitrarily small constant. Let Φ(S) = e(S,V-S)/deg(S) be the expansion factor of S. Let X be the collection of all subsets of V of size δn. Consider Φ(δ) = min_{S ∈ X} Φ(S).
The conjecture It is hard to distinguish between the following two cases: (a) Φ(δ) ≤ ε and (b) Φ(δ) ≥ 1-ε. This is a stronger assumption than the Unique Games Conjecture of Khot: if we prove the SSEC we prove the UGC.
A closely related problem There is an alternative way to disprove the conjecture (due to Gandhi and K): describe a very simple problem that has ratio 2 even in the weighted case. We will show that under the SSEC this is the best ratio. Put differently, a 2-ε ratio for the problem, for any constant ε > 0, disproves the SSEC.
The problem Given: a graph G(V,E) and a number k. Required: find a set U of k vertices with the minimum number of touching edges. A touching edge: at least one endpoint in U. Remark: our main results were for the weighted case. We improved a result by Shmoys et al. and a different one by Hochbaum et al.
A trivial ratio of 2 • Let OPT, |OPT|=k, be the best solution. • Let U be the k vertices of least degree, thus deg(OPT) ≥ deg(U). • Clearly t(OPT) ≥ deg(OPT)/2, since each touching edge is counted at most twice in deg(OPT). • Hence: t(U) ≤ deg(U) ≤ deg(OPT) ≤ 2t(OPT).
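The argument on this slide translates directly into a few lines of code for the unweighted case. This is a minimal sketch; the function names and the edge-list input are assumptions for the example, and ties between equal degrees are broken by vertex number.

```python
def touching_edges(edges, U):
    """t(U): number of edges with at least one endpoint in U."""
    U = set(U)
    return sum(1 for u, v in edges if u in U or v in U)

def k_lowest_degree(n, edges, k):
    """Return the k vertices of least degree (vertices are 0..n-1)."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    order = sorted(range(n), key=lambda w: (degree[w], w))
    return set(order[:k])
```

On a star with center 0 and leaves 1..4, plus the extra edge (1,2), choosing k=2 returns the two degree-one leaves {3,4}, which touch 2 edges.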
The weighted case Vertices have weights (costs). Minimum edges under cost at least k: find a set U of cost at least k and minimize the number of edges touching U. We explain the ratio 2 for a related question: given a bound M on the number of touching edges, maximize the vertex cost.
Some ideas of how to give ratio 2 for Maximum cost with at most M edges We use dynamic programming. We guess the number P of edges between the optimum set OPT and V-OPT. We guess the sum of degrees of OPT, which may be as large as 2M. A serious technical problem: we are only able to compute A[n, P, M].
The reason for that This is the only way, it seems, to ensure feasibility. Indeed if deg(U) ≤ M then t(U) ≤ M. The question is: do we lose a lot by bounding the sum of degrees by M while the sum may be 2M? One more detail: we need to guess the highest cost vertex in OPT and add it to our solution.
If deg(U)≤M, how much cost do we lose? Let OPT = A ∪ {x} ∪ B so that deg(A+x) is the first to be above M. Thus A is a feasible solution for M. Clearly B too is a feasible solution for M, because deg(A+x) > M and the total is at most 2M. One of A or B has at least ½ of their combined weight. The fact that we guess the highest cost vertex in OPT compensates for x. Thus ratio 2.
What are the properties of a good solution? Let us check the case of d-regular graphs. The question is whether the edges are internal or external.
What are the properties of a good solution? In this example most of the edges of S stay inside, which means that t(S) is close to kd/2.
What are the properties of a good solution? But S can behave badly: most edges go to V-S. In this case t(S) is close to dk.
Is the Small Set Expansion Conjecture reliable? Opinions vary. I think: VERY RELIABLE. We tried to disprove the SSEC and failed. Namely we tried to give a ratio of 2-ε for the problem described before. It seems that the SSEC is related to the Dense k-subgraph problem.
Partition via Sparsest cut • Motivation: • Natural Social Communities [MSST08, ABL10, …] • Better clusters (AGM) • Easier to compute (GLMY) • Useful for Distributed Computation (AGM) • Good clusters: low conductance? • Inside: well-connected, • Toward outside: not so well-connected.
Overlapping Clustering • Find a set of (at most K) overlapping clusters, each cluster with degree sum ≤ B, covering all nodes, and minimize: • Maximum conductance of clusters (Min-Max) • Sum of the conductance of clusters (Min-Sum) • Overlapping vs. non-overlapping variants?
Summary of Results [Khandekar, K, Mirrokni.] Overlap vs. no-overlap: • Min-Sum: Within a factor 2 using Uncrossing. • Min-Max: Might be arbitrarily different.
Arora et al: Finding overlapping communities in social networks: Toward a rigorous approach. A lot of follow up work. M. Balcan et al: Again, rigorous study of overlapping clusters And much more. Now last topic: Density
The densest subgraph problem • Let e(S) be the number of edges in the graph induced by S. • This problem requires finding a subset S of V that maximizes e(S)/|S|. • A faster algorithm approximates the best density within a factor of 2 but takes O(n) time, which is much faster than flow. • Was done by K and Peleg in 1992. Also Charikar 1998. • Very extensively cited for social networks. The result is almost always attributed to Charikar.
A quick approximation for densest subgraph • Let ρ be e(OPT)/|OPT|. • We show that all vertices in the optimum have degree at least ρ inside OPT. • Otherwise, removing a vertex with degree less than ρ increases the density, contradicting optimality. • Therefore we can iteratively remove vertices of degree strictly less than ρ. We never remove a vertex of OPT and so the remaining graph is not empty.
The 2 approximation continued • The degree of all vertices in the final graph is at least ρ. • Say that there are i vertices. The density is the sum of the degrees, over 2, divided by i. • In other words the density is at least i·ρ/(2i) = ρ/2. Thus ratio 2. With a bit of data structure, O(n) running time.
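The removal argument above needs the optimum density as a threshold, which is not known in advance. A common way around this, swapped in here as an assumption rather than taken from the slides, is greedy peeling: repeatedly delete a minimum-degree vertex and remember the densest intermediate graph. The sketch below assumes a simple graph (no loops or repeated edges) and favors clarity over speed, so it is quadratic rather than linear time.

```python
def greedy_peel(n, edges):
    """Return (vertex set, density) of the densest graph seen while peeling.

    Vertices are 0..n-1; density is e(S)/|S|.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    alive = set(range(n))
    m = len(edges)                       # edges among alive vertices
    best_set, best_density = set(alive), m / n
    while len(alive) > 1:
        u = min(alive, key=lambda w: len(adj[w]))   # a minimum-degree vertex
        for v in adj[u]:
            adj[v].discard(u)
        m -= len(adj[u])
        adj[u].clear()
        alive.remove(u)
        density = m / len(alive)
        if density > best_density:
            best_set, best_density = set(alive), density
    return best_set, best_density
```

On a K4 with a two-edge path hanging off one of its vertices, peeling strips the path first and reports the K4 with density 6/4.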
The dense k-subgraph problem Input: A graph G(V,E) and a number k Required: a set U of size k with maximum e(U) I started working on this circa 1993. What did I learn?
Unfortunately, not much It may be fair to say that this is a central problem. Amazing number of applications (too long to list). A big disappointment: under P≠NP there is still no hardness. No PTAS unless NP has subexponential-time algorithms: Khot. Hardness under assumptions on random 3-SAT: Feige.
A fact about walks (and a one line proof) If the minimum degree is δ then clearly the number of walks of length d is at least n·δ^d. What about the average degree? It turns out that this is also true for the average degree.
The number of walks with respect to the average degree It turns out that the number of walks is also at least n·δ^d when δ is the average degree. I do not know the proof of this. But I discovered years later that it was known from the 1920's. A problem is hard: bypass the proof.
Do not solve problems: bypass them By the claim above the number of walks of length d is at least n·δ^d. We bypass this hard proof. We give a one line or so proof that there are always u,v so that Walks(u,v) ≥ δ^d/n. Uses a known fact: the largest eigenvalue of the adjacency matrix of G is at least the average degree in G.
[Feige, K, Peleg]: n^c approximation for some c < 1/3 Consider the adjacency matrix A of the graph. Let λ_1 ≥ λ_2 ≥ λ_3 ≥ … be the (real) eigenvalues of A. Well known: trace(A) = Σ_i λ_i. Well known: the eigenvalues of A^k are λ_i^k. Well known: A^k[i,j] is the number of walks from i to j of length k.
The proof Σ_{i,j} A^k[i,j]·A^k[i,j] = trace(A^{2k}) = Σ_i λ_i^{2k} ≥ λ_1^{2k} ≥ δ^{2k}, where δ is the average degree. This implies that there are an i and a j so that (A^k[i,j])^2 ≥ δ^{2k}/n^2. The claim A^k[i,j] ≥ δ^k/n follows by taking the square root of both sides. Why did we count walks? Walks are an example of trees that exist if the graph is dense enough but not if it is random.
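The facts used in this proof are easy to check numerically on a small graph. The sketch below uses plain integer matrix powers of a 0/1 adjacency matrix; it reports the total number of walks of length k, the largest single entry of A^k, and the average degree. The function names are illustrative.

```python
def mat_mul(A, B):
    """Product of two square integer matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(A, k):
    """A raised to the k-th power (k >= 0) by repeated multiplication."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, A)
    return result

def walk_counts(A, k):
    """Return (total walks of length k, largest entry of A^k, average degree)."""
    n = len(A)
    avg_degree = sum(map(sum, A)) / n
    Ak = mat_pow(A, k)
    total = sum(map(sum, Ak))
    largest = max(max(row) for row in Ak)
    return total, largest, avg_degree
```

For the path on three vertices and k = 2, the average degree is 4/3, there are 6 walks in total (against the bound 3·(4/3)^2 ≈ 5.33), and the largest entry is 2 (against the bound (4/3)^2/3 ≈ 0.59).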
The additional insight Bhaskara, Charikar, Chlamtac, Feige, Vijayaraghavan. Almost 20 years later! The intuition comes from comparing a random graph to a random graph with a dense subgraph planted in it. It also turns out that walks are not the best thing to count.
Counting local trees It turns out that in a dense graph some trees appear more than in G(n,p). For example walks. Ad hoc → systematic. The state of the art: we cannot tell apart a graph chosen from G(n,1/sqrt{n}) and the same graph with sqrt{n} vertices changed to G(sqrt{n},1/n^{1/4}).
Note the random gap The average degree of the planted graph is n^{1/4}. In the random graph any set of sqrt{n} vertices will have O(sqrt{n}) edges. The authors complement this with an essentially O(n^{1/4}) approximation. This was done by two groups, one using LP lift-and-project and the other using combinatorial methods.
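The numbers on this slide can be checked with a short calculation in expectation, writing k = sqrt{n} for the size of the planted set. This is only the expected-value arithmetic for one fixed set, not the high-probability statement over all sets:

```latex
\begin{align*}
\text{planted average degree} &\approx k \cdot n^{-1/4} = n^{1/2} \cdot n^{-1/4} = n^{1/4}, \\
\text{planted edges} &\approx \binom{k}{2} n^{-1/4} \approx \tfrac{1}{2}\, n \cdot n^{-1/4} = \tfrac{1}{2}\, n^{3/4}, \\
\text{edges in a fixed } k\text{-set of } G(n, n^{-1/2}) &\approx \binom{k}{2} n^{-1/2} \approx \tfrac{1}{2}\, n \cdot n^{-1/2} = \tfrac{1}{2}\, n^{1/2}.
\end{align*}
```

The ratio between the two edge counts is about n^{1/4}, which matches the approximation ratio mentioned above.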
Important remarks In this example, the average degree equals the square root of the number of vertices both in the random planted graph and the original graph. As far as I understand, if this is not the case this paper can get better than trivial ratio. Technically the improvement is done by counting caterpillars and not walks.
Something is missing in our lower bound understanding We keep adding assumptions: the SSEC, the 2-to-1 unique games conjecture, the projection games conjecture. Countless papers show hardness just by saying the problem is Dense k-subgraph hard to approximate.
Technically, there is no value to such hardness Still, I would not try to give a polylog ratio to problems that are dense k-subgraph hard. An example of a simple such problem: k-Steiner Forest, connect k of the pairs. At best we can expect poly ratios. It seems OK to assume the exponential time hypothesis, namely that 3-SAT cannot be solved in time 2^{o(n)}.