Lecture 24 Coping with NPC and Unsolvable problems
• When a problem is unsolvable, that's generally very bad news: it means there is no general algorithm that is guaranteed to solve the problem in all cases.
• However, it is rare that we actually need the capability of solving a problem in all possible cases. We can:
• Specialize for particular applications.
• Try heuristics.
• For example, we know that optimally compressing a string is unsolvable (we proved it), and even approximate compression is unsolvable. But everybody compresses data anyway, and some people make money out of it.
• Theory and practice are often very far apart!
When a problem is NP-hard …
• Similarly, when we prove a problem is NP-complete, it means that no one currently has a polynomial-time algorithm for it. But that is absolutely not a reason to give up.
• Worst-case hardness proofs can be deceiving: typical instances are often much easier. For optimization problems, we are often willing to settle for solutions that are not the best possible, but come pretty close.
• For example, for the travelling salesman problem, finding the tour of least cost is nice, but in real life we would often be content with a tour that is close to optimal. This leads to the idea of approximation algorithms.
Exhaustive search
• Although exhaustive search is too slow for large instances of NP-complete problems, since the solution space can grow exponentially, there are tricks that speed up the computation in many cases.
• For example, although the travelling salesman problem is NP-complete, we can find optimal travelling salesman tours for real-world instances with hundreds or even thousands of cities by using search techniques.
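For concreteness, here is a minimal sketch of the baseline that these search techniques improve on: brute-force enumeration of all tours. The function name and the cost-matrix representation are my own choices, not from the lecture; the O(n!) running time is exactly why pruning tricks matter.

```python
# Exhaustive search for TSP over a complete graph, given as a
# symmetric cost matrix. Runs in O(n!) time, so it is only feasible
# for very small n -- the point of branch-and-bound is to prune
# most of this search space.
from itertools import permutations

def tsp_brute_force(cost):
    n = len(cost)
    best_tour, best_cost = None, float("inf")
    # Fix city 0 as the start so rotations of a tour are not recounted.
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        c = sum(cost[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if c < best_cost:
            best_tour, best_cost = tour, c
    return best_tour, best_cost
```

On a 4-city instance this checks only 3! = 6 tours; at 20 cities it would already be hopeless without pruning.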
Backtracking
• Backtracking and exhaustive search are things we have "avoided" at all costs in this course.
• But are they really that bad?
Example. This often works. For input (x1 OR ~x2) AND (~x2 OR x4) AND (x1 OR x2 OR x3), branching on the two values of x1 reduces the formula to two simpler formulas:
• x1 = 0: (~x2) AND (~x2 OR x4) AND (x2 OR x3)
• x1 = 1: (~x2 OR x4)
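The branching idea above can be sketched as a tiny backtracking SAT solver. This is a sketch under my own conventions (not code from the lecture): a clause is a frozenset of integer literals, with -x standing for ~x.

```python
# Backtracking for SAT: pick a variable, try both values, and
# simplify the formula after each assignment, as in the example above.
def simplify(clauses, lit):
    """Assign lit = true and simplify; None signals a contradiction."""
    out = []
    for c in clauses:
        if lit in c:
            continue            # clause satisfied, drop it
        c = c - {-lit}          # remove the falsified literal
        if not c:
            return None         # empty clause: dead branch
        out.append(c)
    return out

def sat(clauses):
    if not clauses:
        return True             # every clause satisfied
    var = abs(next(iter(clauses[0])))
    for lit in (var, -var):     # branch: var = 1, then var = 0
        reduced = simplify(clauses, lit)
        if reduced is not None and sat(reduced):
            return True
    return False
```

On the slide's formula, `sat([frozenset({1, -2}), frozenset({-2, 4}), frozenset({1, 2, 3})])` succeeds, e.g. via x1 = 1, x2 = 0.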
Branch and Bound
• Branch-and-bound is a natural idea applied to optimization problems.
• We keep track of the cost of the best solution found so far, and reject a partial solution as soon as a lower bound on its cost exceeds that best cost, since we can often estimate such a bound from the partial solution.
Example: travelling salesman
• Suppose we have a partial solution given by a simple path from a to b passing through the vertices of S; denote it by [a, S, b].
• We extend this to a full tour by finding a path [b, V − (S ∪ {a,b}), a]. We do this extension edge by edge: if there is an edge (b, c) in the graph, then [a, S, b] gets replaced by [a, S ∪ {b}, c].
• How do we estimate the cost of a partial solution? Given a partial tour [a, S, b], the remainder of the tour is a path through V − S − {a,b}, plus edges joining a and b to this part. Therefore its cost is at least the sum of the least-weight edge from a to V − S − {a,b}, the least-weight edge from b to V − S − {a,b}, and the weight of a minimum spanning tree of V − S − {a,b}, all of which can be computed quickly.
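The bound just described can be sketched as follows. This is my own illustrative code, assuming the graph is complete and stored as a dict of dicts of edge costs; `path_cost` is the cost of the partial path from a to b.

```python
# Lower bound for a partial TSP tour [a, S, b]: any completion must
# connect b back to a through the remaining vertices R, so it costs
# at least (cheapest a-R edge) + (cheapest b-R edge) + (MST of R).
def mst_weight(cost, verts):
    """Weight of a minimum spanning tree of verts (Prim's algorithm)."""
    verts = list(verts)
    if len(verts) <= 1:
        return 0
    in_tree, rest = {verts[0]}, set(verts[1:])
    total = 0
    while rest:
        w, v = min((cost[u][x], x) for u in in_tree for x in rest)
        total += w
        in_tree.add(v)
        rest.remove(v)
    return total

def lower_bound(cost, a, b, S, path_cost):
    R = set(cost) - S - {a, b}
    if not R:
        return path_cost + cost[b][a]   # only the closing edge remains
    return (path_cost
            + min(cost[a][x] for x in R)
            + min(cost[b][x] for x in R)
            + mst_weight(cost, R))
```

In branch-and-bound, a partial tour is discarded the moment this value exceeds the cost of the best complete tour found so far.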
Approximation Algorithms
• If we cannot solve a problem exactly, we can look for approximate solutions.
• For example, for the travelling salesman problem, we might settle for a tour whose cost is within some constant factor of the best.
• For a minimization problem, the approximation ratio of an algorithm A is defined to be (cost of A's solution) / (cost of an optimal solution).
Vertex Cover: approx. ratio 2
• A matching in a graph is a subset of the edges such that no vertex appears in two or more of them.
• A matching is maximal if no edge can be added to it while preserving the matching property.
• Maximal matching algorithm: examine the edges one by one, adding an edge to the matching if it is disjoint from the edges already chosen. This runs in polynomial time.
Ratio-2 Vertex Cover continues … Clearly:
• (1) The number of vertices in any vertex cover of G is at least as large as the number of edges in any maximal matching, since each matching edge needs its own cover vertex.
• (2) The set of all endpoints of a maximal matching is a vertex cover, since any edge missed by this set could be added to the matching.
• So let M be the set of edges in a maximal matching and C a smallest vertex cover. Our algorithm outputs the 2|M| endpoints of M, which form a cover by (2), and |C| ≥ |M| by (1). Hence the output has size 2|M| ≤ 2|C|, so the approximation ratio is bounded by 2.
• Dinur and Safra proved that you can't do better than 1.3606 unless P = NP.
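The whole algorithm fits in a few lines. This is a sketch under my own naming, taking the graph as a list of edge pairs:

```python
# Ratio-2 vertex cover: greedily build a maximal matching, then
# output all matched endpoints as the cover.
def vertex_cover_2approx(edges):
    cover = set()
    for u, v in edges:
        # The edge joins the matching only if it touches no chosen vertex.
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover
```

The ratio 2 is tight for this algorithm: on the path 1–2–3–4 it may output all four vertices, while {2, 3} already covers every edge.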
Shortest Common Superstring
• Approximation algorithms are usually simple, but proofs of their approximation guarantees are often hard. Here is one example.
• Given n strings s1, s2, …, sn, find the shortest common superstring s (i.e., each si is a substring of s, and s is the shortest such string).
• The problem is NP-complete.
• Greedy Algorithm: keep merging the pair of strings with maximum overlap, until one string is left. Theorem: this Greedy algorithm is at most 4 × optimal.
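The greedy merging step above can be sketched as follows. This is my own illustrative implementation, assuming the input strings are distinct; `overlap(a, b)` is the length of the longest suffix of a that is a prefix of b.

```python
# GREEDY for shortest common superstring: repeatedly merge the pair
# with maximum overlap until one string remains.
def overlap(a, b):
    """Length of the longest suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if b.startswith(a[-k:]):
            return k
    return 0

def greedy_scs(strings):
    # Drop strings already contained in another (assumes distinct inputs).
    strings = [s for s in strings
               if not any(s != t and s in t for t in strings)]
    while len(strings) > 1:
        k, a, b = max(((overlap(a, b), a, b)
                       for a in strings for b in strings if a != b),
                      key=lambda t: t[0])
        strings.remove(a)
        strings.remove(b)
        strings.append(a + b[k:])   # merge a and b, sharing the overlap
    return strings[0]
```

For example, `greedy_scs(["CAT", "ATT", "TTG"])` merges CAT and ATT (overlap 2), then the result with TTG, giving a superstring of length 5.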
Widely used in DNA sequencing
• This is widely used in DNA shotgun sequencing (especially with the new generation of sequencers, which promise to sequence fragments 40k bp long):
• Make many copies (single strand).
• Cut them into fragments of length ~500.
• Sequence each of the fragments.
• Then assemble all fragments into the shortest common superstring by GREEDY: repeatedly merge the pair with maximum overlap until one string remains.
• Dec. 2001 release of the mouse genome: 33 million reads, covering 2.5G bases (10x coverage).
Many have worked on this:
• Many people (well-known scientists, including one author of our textbook) have worked on this problem and improved the constant from 4 to:
• 3
• 2.89
• 2.81
• 2.79
• 2.75
• 2.66
• 2.5
Theorem. GREEDY achieves 4n, where n = opt. Proof by picture: Given S = {s1, …, sm}, construct a graph G:
• Nodes are s1, …, sm.
• Edges: if a suffix of si overlaps a prefix of sj, add an edge from si to sj with weight pref(si, sj), the length of si's non-overlapping prefix; i.e., |si| = pref(si, sj) + (overlap length of si with sj).
• |SCS(S)| = length of the shortest Hamiltonian cycle in G.
• Greedy Modified (GREEDY'): find all cycles of minimum weight in G, then open each cycle and concatenate the results to obtain the final superstring. (Note: regular GREEDY creates no cycles.)
This minimum cycle exists
• Assume the initial Hamiltonian cycle C has w(C) = n.
• Then merging si with sj is equivalent to breaking C into two cycles C1 and C2, and we have: w(C1) + w(C2) ≤ n.
• Proof: we merged (si, sj) because they have maximum overlap. The neighboring strings s' and s'' on C overlap at least that much, so the sum of the new edge weights is no more than the sum of the old ones: d(si, sj) + d(s'', s') ≤ d(si, s') + d(s'', sj).
• Continuing this process, we end with self-cycles C1, C2, C3, C4, …, with Σ w(Ci) ≤ n.
Then we open cycles & concatenate
• Let wi = w(Ci), and Li = |longest string in Ci|.
• |open Ci| ≤ wi + Li.
• We know n ≥ Σ wi.
• Lemma. Strings S1 and S2 from different cycles overlap by at most w1 + w2.
• By the Lemma, Σ (Li − 2wi) ≤ n, since the Li's must all appear in the final SCS.
• |GREEDY'(S)| ≤ Σ (Li + wi) = Σ (Li − 2wi) + Σ 3wi ≤ n + 3n = 4n. QED
Open Question • Show Greedy achieves approximation ratio 2.