Topics in Algorithms Introduction to Computational Complexity Theory
Quiz
• A set S of strings is given as below.
• Find the shortest string s (called a superstring) of S that contains every element of S as a substring.
• This quiz mimics DNA sequencing.
(example) S = {ate, half, lethal, alpha, alfalfa}
  • s’ = atehalflethalphalfalfa (a superstring, but not the shortest)
  • s = lethalphalfalfate (a shortest superstring)
  • This example is from [Blum 94].
[Quiz] S = {TCTCTA, CAGTCT, CTCCAAA, GGCAA, TAAGCTCC, TTCTCTC, TCCAAATTCTA, CTTTCT, AACACCTT, CTCCGACC, TTCTATC, TCTATCTC, CTCTGTAACA, CAACAG}
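The small example above can be checked by brute force. The following is a minimal sketch, not the method of [Blum 94]: the helper names `overlap`, `merge`, and `shortest_superstring` are illustrative assumptions. It tries every ordering of the strings and merges neighbours with their longest overlap, so it is only feasible for a handful of strings.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge(s: str, t: str) -> str:
    """Append t to s, reusing the overlap; keep s if it already contains t."""
    if t in s:
        return s
    return s + t[overlap(s, t):]


def shortest_superstring(strings):
    """Try every ordering and keep the shortest merged string."""
    best = None
    for order in permutations(strings):
        candidate = ""
        for piece in order:
            candidate = merge(candidate, piece)
        if best is None or len(candidate) < len(best):
            best = candidate
    return best


example = ["ate", "half", "lethal", "alpha", "alfalfa"]
result = shortest_superstring(example)
print(result, len(result))
```

For the five-word example this visits only 5! = 120 orderings; the 14-string quiz would need 14! orderings, which is why the brute-force approach does not scale.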
Issues in computational complexity theory
• Showing upper/lower bounds on the computational resources required for solving a problem L.
  • Upper/lower bounds are described as functions of the length of an input.
  • Such bounds are given for
    • time,
    • (memory) space,
    • …
• Structural complexity
  • relationships among classes of problems
  • Example) P ⊆ NP ⊆ EXP, E ⊆ EXP, P ≠ E
This talk’s main issues (1/2)
• How to deal with hard (time-consuming) problems
  • What to do when we find a problem that looks hard.
• Sometimes, we cannot find any efficient (polynomial-time) algorithm to solve the problem.
  • (1) If the problem is not hard, someone else can find an efficient algorithm.
  • (2) If the problem is really hard, other smart people cannot find one either.
This talk’s main issues (2/2)
• The previous quiz looks intractable.
  • The number of possible solutions (orderings of the 14 strings) is 14! = 14 · 13 ⋯ 1 = 87,178,291,200.
• However, it is not easy to say that the problem is hard.
  • It is hard to find a needle in a haystack.
    • needle = efficient algorithm
  • It seems harder to say that there is no needle in a haystack.
    • You just might have missed a needle in the haystack.
• Computational complexity theory provides an answer.
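The count of orderings can be reproduced with a few lines of Python. This is only an arithmetic check; the function name `count_orderings` and the extra values of n are illustrative assumptions.

```python
import math


def count_orderings(num_strings: int) -> int:
    """Number of ways to order num_strings distinct strings."""
    return math.factorial(num_strings)


print(f"{count_orderings(14):,}")  # 87,178,291,200

# The count grows very quickly with the number of strings.
for n in (5, 10, 14, 20):
    print(n, count_orderings(n))
```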
Key idea
• We have two problems A and B.
  • Given input x, we would like to know if x ∈ A (x ∈ B).
• Suppose A is efficiently transformed with f into B
  • such that a ∈ A iff f(a) ∈ B.
  • a: input of A, f: transformation (reduction), f(a): input of B.
• This shows that B is harder than (or as hard as) A.
  • A is solvable if there is a way to solve B.
[Figure: an algorithm for B answers ‘yes’ for x1 ∈ B and ‘no’ for x2 ∉ B. Through f, the same algorithm answers ‘yes’ for x3 ∈ A (since f(x3) ∈ B) and ‘no’ for x4 ∉ A (since f(x4) ∉ B).]
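A toy sketch of this idea, under assumed problems that are not from the slides: A is the set of positive odd integers, B is the set of positive even integers, and the transformation is f(a) = a + 1. A decider for A is obtained by composing f with a decider for B.

```python
def in_B(x: int) -> bool:
    """Decider for B: positive even integers (assumed toy problem)."""
    return x > 0 and x % 2 == 0


def f(a: int) -> int:
    """Transformation (reduction): a is in A iff f(a) is in B."""
    return a + 1


def in_A(a: int) -> bool:
    """Decider for A (positive odd integers), built only from f and in_B."""
    return in_B(f(a))


print(in_A(3), in_A(4), in_A(-1))
```

The design point is that `in_A` never inspects its input directly: any way of solving B immediately yields a way of solving A.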
Overview
• Intuitive explanation of hard (time-consuming) problems
• Decision problems/Optimization problems
• Polynomial time
• Class P, Class NP
• Reductions
• NP-complete and NP-hard
• Examples
  • Superstring problem
  • Reduction from Traveling salesman problem
Types of problems (1/2)
• Computational problems roughly fall into two categories:
  • Decision problem (output: yes/no),
  • Optimization problem (output: solution with max./min. cost).
• Decision problem L
  • input:
    • string x
  • output:
    • ‘yes’ if x ∈ L,
    • ‘no’ otherwise.
• Example) L: positive odd numbers.
  • L = {1, 3, 5, …}
  • x = 3: ‘yes’ since x ∈ L,
  • x = 4: ‘no’ since x ∉ L.
Types of problems (2/2)
• Computational problems roughly fall into two categories:
  • Decision problem (output: yes/no),
  • Optimization problem (output: solution with max./min. cost).
• Optimization problem M
  • input:
    • string x
    • cost function f
  • output:
    • y such that f(y) is the maximum (or the minimum)
• Example) maximize $f(x, y) = 2x^2 y - x y^2 + 3$.
  • x = 1, y = 1: f(1, 1) = 4.
Examples of problems (1/6)
• Euler cycle problem (ECP)
  • Decision problem
  • Input (instance):
    • An undirected graph G = (V, E).
  • Output:
    • ‘yes’ if there is a cycle in the graph that uses each edge of G exactly once,
    • ‘no’ otherwise.
[Figure: one graph for which the answer is ‘yes’ and one for which it is ‘no’.]
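A minimal sketch of a polynomial-time decider for ECP. It relies on Euler's classical criterion, which the slides do not state: a graph has such a cycle iff every vertex has even degree and all edges lie in one connected component. The function name and the edge-list input format are assumptions, and a graph with no edges is treated as a ‘yes’ instance by convention.

```python
from collections import defaultdict


def has_euler_cycle(edges) -> bool:
    """edges: list of (u, v) pairs of an undirected graph."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    if not adj:
        return True  # no edges: the empty cycle uses every edge
    # Every vertex must have even degree.
    if any(len(neighbours) % 2 for neighbours in adj.values()):
        return False
    # All vertices that touch an edge must be connected (depth-first search).
    start = next(iter(adj))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)


triangle = [("a", "b"), ("b", "c"), ("c", "a")]
path = [("a", "b"), ("b", "c")]
print(has_euler_cycle(triangle), has_euler_cycle(path))
```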
Examples of problems (2/6)
• Shortest superstring problem (SSP)
  • Decision problem
  • Input (instance):
    • A set of sequences S = {s1, …, sn} and an integer (threshold) l.
  • Output:
    • ‘yes’ if there is a string s such that, for all i, si is a substring of s and the length of s is at most l,
    • ‘no’ otherwise.
• Example) s1 = TACGA, s2 = ACCC, s3 = CTAAAG, s4 = GAGC
  • Threshold 18: ‘yes’, since the string TACGACCCTAAAGAGC contains every sequence and its length (16) is less than 18.
  • Threshold 10: ‘no’.
Examples of problems (3/6)
• Shortest superstring problem (Min-SSP)
  • Optimization problem
  • Input (instance):
    • A set of sequences S = {s1, …, sn}.
  • Output:
    • The shortest string s such that, for all i, si is a substring of s.
• Example) s1 = TACGA, s2 = ACCC, s3 = CTAAAG, s4 = GAGC
  • Output: TACGACCCTAAAGAGC
Examples of problems (4/6)
• Traveling salesman problem (TSP)
  • Decision problem
  • Input (instance):
    • n cities (nodes) with the cost of travel between each pair of them, and an integer (threshold) t.
  • Output:
    • ‘yes’ if there is a tour visiting all the cities and returning to the starting point with cost at most t,
    • ‘no’ otherwise.
[Figure: four cities a, b, c, d with a travel cost on each pair. With max. cost 14 the answer is ‘yes’, since the cost of the tour a b d c a is less than 14; with max. cost 10 the answer is ‘no’.]
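A brute-force sketch of the TSP decision problem on four cities. The figure's layout is not recoverable here, so the assignment of costs to city pairs in `COST` is an assumption chosen to agree with the answers on the slide (‘yes’ for threshold 14, ‘no’ for threshold 10); the function names are illustrative.

```python
from itertools import permutations

# Assumed symmetric travel costs (hypothetical assignment to city pairs).
COST = {
    frozenset("ab"): 4, frozenset("ac"): 2, frozenset("ad"): 5,
    frozenset("bc"): 4, frozenset("bd"): 3, frozenset("cd"): 3,
}


def tour_cost(tour) -> int:
    """Cost of visiting the cities in order and returning to the first one."""
    return sum(COST[frozenset((u, v))] for u, v in zip(tour, tour[1:] + tour[:1]))


def tsp_decision(cities, threshold) -> bool:
    """'yes' (True) iff some tour costs at most threshold; tries all tours."""
    first, rest = cities[0], cities[1:]
    return any(tour_cost((first,) + p) <= threshold for p in permutations(rest))


cities = ["a", "b", "c", "d"]
print(tsp_decision(cities, 14), tsp_decision(cities, 10))
```

Fixing the first city avoids counting rotations of the same tour, but the search still examines (n − 1)! orderings.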
Examples of problems (5/6)
• Traveling salesman problem (Min-TSP)
  • Optimization problem
  • Input (instance):
    • n cities (nodes) with the cost of travel between each pair of them.
  • Output:
    • A tour visiting all the cities and returning to the starting point with the smallest cost.
[Figure: the four cities a, b, c, d with their travel costs and the tour a b d c a.]
Examples of problems (6/6)
• Satisfiability problem (SAT)
  • Decision problem
  • Input (instance):
    • A Boolean function f over variables x1, …, xn.
    • Each variable takes either true (1) or false (0).
  • Output:
    • ‘yes’ if there is a truth assignment of x1, …, xn that satisfies f,
    • ‘no’ otherwise.
• Example) ‘yes’ since f = T (1) where
  • f = x1 (x1x2 x3 ) (x1x2x3 x4 ) (x2 x3 x4) (x1x3)
  • x1 = F (0), x2 = T (1), x3 = F (0), x4 = F (0).
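A brute-force sketch of SAT for formulas in conjunctive normal form. The clause encoding (a positive integer i for xi, a negative integer −i for its negation) and the sample formula are assumptions made for illustration; the formula is not the one shown on the slide.

```python
from itertools import product


def satisfies(clauses, assignment) -> bool:
    """True iff every clause contains at least one true literal."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def brute_force_sat(clauses, num_vars):
    """Return a satisfying assignment {variable: bool}, or None if none exists."""
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: bits[i] for i in range(num_vars)}
        if satisfies(clauses, assignment):
            return assignment
    return None


# Hypothetical formula: (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
formula = [[1, -2], [2, 3], [-1, -3]]
print(brute_force_sat(formula, 3))
print(brute_force_sat([[1], [-1]], 1))  # unsatisfiable, so None
```

Checking one assignment takes polynomial time, but the loop runs over all 2^n assignments.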
Polynomial time
• To simplify the notion of ‘hardness’, we use polynomial time as the cut-off for efficiency.
• Polynomial p(n)
  • A function of the following form for some $k \ge 1$ and $a_k, \dots, a_0$:
  • $p(n) = a_k n^k + a_{k-1} n^{k-1} + \cdots + a_0$.
• Key property of polynomials
  • Let p(n) and q(n) be polynomials.
  • The sum p(n) + q(n) is also a polynomial.
  • The composite function q(p(n)) is also a polynomial in n.
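The two closure properties can be checked on a small example. In this sketch a polynomial is represented as a list of coefficients, lowest degree first; this representation, the function names, and the sample polynomials are assumptions for illustration.

```python
def poly_add(p, q):
    """Coefficients of p(n) + q(n)."""
    size = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(size)]


def poly_mul(p, q):
    """Coefficients of p(n) * q(n)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_compose(q, p):
    """Coefficients of q(p(n)), computed with Horner's rule."""
    result = [q[-1]]
    for coeff in reversed(q[:-1]):
        result = poly_add(poly_mul(result, p), [coeff])
    return result


def degree(p) -> int:
    """Degree of the polynomial, ignoring trailing zero coefficients."""
    d = len(p) - 1
    while d > 0 and p[d] == 0:
        d -= 1
    return d


p = [1, 0, 2]      # p(n) = 2n^2 + 1
q = [0, 3, 0, 1]   # q(n) = n^3 + 3n
print(poly_add(p, q), degree(poly_add(p, q)))
print(poly_compose(q, p), degree(poly_compose(q, p)))
```

Here the sum has degree 3 and the composition has degree 3 · 2 = 6, so both are again polynomials. This is why chaining polynomial-time steps stays within polynomial time.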
Turing machine
• An abstract model of computers.
• At each step,
  • based on
    • its current state and
    • the symbol under the head,
  • the Turing machine changes
    • its internal state,
    • the symbol under the head, and
    • the position of the head.
[Figure: one step of a Turing machine. The tape B B 1 0 0 1 1 B B in state s1 becomes the tape B B 1 1 0 1 1 B B in state s2.]
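A minimal sketch of a single step, assuming a tape stored as a dictionary from positions to symbols and a transition table `delta` that maps (state, symbol) to (new state, new symbol, move). The head position, the move direction, and the single rule below are assumptions; the figure only shows the tape and the states before and after the step.

```python
def step(tape, head, state, delta, blank="B"):
    """Apply one transition; returns the new head position and state."""
    symbol = tape.get(head, blank)
    new_state, new_symbol, move = delta[(state, symbol)]
    tape[head] = new_symbol                 # rewrite the symbol under the head
    head += 1 if move == "R" else -1        # move the head one cell
    return head, new_state


# Tape B B 1 0 0 1 1 B B with the head assumed to be on the first 0.
tape = dict(enumerate("BB10011BB"))
delta = {("s1", "0"): ("s2", "1", "R")}     # hypothetical rule

head, state = step(tape, 3, "s1", delta)
print("".join(tape[i] for i in sorted(tape)), head, state)
```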
Hierarchy in the Computational Theory
[Figure: hierarchy of problems by running time as a function of the input size n.
• undecidable: halting problem of Turing machines
• decidable:
  • EXP (intractable = exponential time, e.g. 2^n)
  • NP: traveling salesman, graph isomorphism
  • P (tractable = polynomial time): sorting (n log n), median (n)]
Based on a figure in http://www-imai.is.s.u-tokyo.ac.jp/~imai/lecture/quantum_complexity.pdf
Well-known classes of decision problems
• P: the set of decision problems solvable by a deterministic Turing machine in polynomial time.
  • ECP ∈ P.
• NP: the set of decision problems solvable by a non-deterministic Turing machine in polynomial time.
  • ECP, TSP, SSP, SAT ∈ NP.
[Figure: P drawn inside NP.]
Example of class NP
• TSP ∈ NP since
  • TSP is solvable in polynomial time by a non-deterministic Turing machine.
  • At each branch, one node is chosen non-deterministically.
  • We suppose that the non-deterministic Turing machine can select the best choice at each branch.
[Figure: computation tree over time for the cities a, b, c, d with threshold 14. Starting from a, each branch chooses the next city; the six leaves (complete tours back to a) have costs 16, 16, 12, 14, 12, 16.]
Alternate definition of class NP
• TSP ∈ NP since
  • TSP is a decision problem defined with a verifier A(x, y) over strings such that
    • x is a ‘yes’ instance iff A(x, y) outputs ‘yes’ for some string y,
    • the string y has length smaller than |x|^c, where c is a constant,
    • A(x, y) is computable by a deterministic Turing machine in time polynomial in |x| + |y|.
  • A(x, y) is therefore also computable by a deterministic Turing machine in time polynomial in |x|.
  • Such a y is usually called a certificate for x.
[Figure: the verifier A(x, y), running in polynomial time, receives the four-city instance with threshold 14 and the certificate a b d c a, and outputs ‘yes’.]
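A sketch of such a verifier for TSP, with the tour as the certificate. The cost assignment in `COST` is an assumption (the figure's layout is not recoverable), and the function name is illustrative. The verifier only checks one proposed tour, so it runs in time polynomial in the input size.

```python
# Assumed symmetric travel costs (hypothetical assignment to city pairs).
COST = {
    frozenset("ab"): 4, frozenset("ac"): 2, frozenset("ad"): 5,
    frozenset("bc"): 4, frozenset("bd"): 3, frozenset("cd"): 3,
}


def verify_tsp(cities, cost, threshold, certificate) -> bool:
    """A(x, y): x = (cities, cost, threshold), y = certificate such as 'abdca'."""
    # The tour must return to its starting city.
    if len(certificate) != len(cities) + 1 or certificate[0] != certificate[-1]:
        return False
    # Every city must be visited exactly once.
    if sorted(certificate[:-1]) != sorted(cities):
        return False
    total = sum(cost[frozenset((u, v))]
                for u, v in zip(certificate, certificate[1:]))
    return total <= threshold


print(verify_tsp("abcd", COST, 14, "abdca"))
print(verify_tsp("abcd", COST, 14, "abcda"))
```

A ‘no’ from the verifier only rejects that particular certificate; the instance is a ‘yes’ instance as soon as one certificate is accepted.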
Features of problems in NP (1/2)
• The number of possible solutions grows exponentially with the size of inputs.
• Example) SSP
  • Threshold: 12, S = {half, alpha, alfalfa}
  • Candidate superstrings, one per ordering of S:
    • half, alpha, alfalfa: halfalphalfalfa (length 15)
    • alpha, half, alfalfa: alphalfalfa (length 11)
    • alfalfa, half, alpha: alfalfahalfalpha (length 16)
    • half, alfalfa, alpha: halfalfalpha (length 12)
    • alpha, alfalfa, half: alphalfalfahalf (length 15)
    • alfalfa, alpha, half: alfalfalphalf (length 13)
Features of problems in NP (2/2)
• We can verify any instance in polynomial time when we have its certificate (a superstring).
• Example) SSP
  • Threshold: 12, S = {half, alpha, alfalfa}
  • Certificate: alphalfalfa, which contains half, alpha, and alfalfa.
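A sketch of the verification step for SSP, assuming the certificate is the superstring itself; the function name `verify_ssp` is illustrative. It performs one length comparison and one substring test per input string.

```python
def verify_ssp(strings, threshold, certificate) -> bool:
    """True iff certificate is short enough and contains every input string."""
    return len(certificate) <= threshold and all(s in certificate for s in strings)


S = ["half", "alpha", "alfalfa"]
print(verify_ssp(S, 12, "alphalfalfa"))      # length 11, contains all three
print(verify_ssp(S, 12, "halfalphalfalfa"))  # length 15 exceeds the threshold
```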
Harder problems (1/3)
• Suppose that
  • problems L1 and L2 are in NP,
  • C(x) denotes a certificate for x.
[Figure: verifier A1 outputs ‘yes’ on (x1 ∈ L1, C(x1)) and ‘no’ on (x2 ∉ L1, y); verifier A2 outputs ‘yes’ on (x3 ∈ L2, C(x3)) and ‘no’ on (x4 ∉ L2, y).]
Harder problems (2/3)
• Suppose that
  • problems L1 and L2 are in NP,
  • C(x) denotes a certificate for x,
  • we construct a transformation f, called a reduction, that runs in polynomial time.
[Figure: verifier A1 outputs ‘yes’ on (x1 ∈ L1, C(x1)) and ‘no’ on (x2 ∉ L1, y). The reduction maps x3 ∈ L2 to f(x3) ∈ L1 and x4 ∉ L2 to f(x4) ∉ L1, so verifier A1 outputs ‘yes’ on (f(x3), C(f(x3))) and ‘no’ on (f(x4), y).]
Harder problems (3/3)
• Under these assumptions, verifier A1 for L1 is able to say ‘yes’ or ‘no’ correctly for any instance of L2.
• We say L2 is (polynomial-time) reducible to L1.
  • We denote this by L2 ≤p L1.
• L1 then has to be harder than or as hard as L2 if we can construct this reduction.
  • When a polynomial-time algorithm for L1 is available, the algorithm also provides a solution in polynomial time for any instance of L2.
[Figure: verifier A2 is replaced by the reduction f followed by verifier A1, which outputs ‘yes’ on (f(x3) ∈ L1, C(f(x3))) for x3 ∈ L2 and ‘no’ on (f(x4) ∉ L1, y) for x4 ∉ L2.]
Cook-Levin Theorem
• [Theorem] Any decision problem Q in NP is reducible to SAT.
• SAT is one of the hardest problems in NP.
• Such a problem is called an NP-complete problem.
[Figure: a reduction f maps x1 ∈ Q1 to f(x1) ∈ SAT and x2 ∉ Q1 to f(x2) ∉ SAT; a reduction f’ maps x3 ∈ Q2 to f’(x3) ∈ SAT and x4 ∉ Q2 to f’(x4) ∉ SAT. A single verifier A for SAT then outputs ‘yes’ or ‘no’.]
Good property of reductions
• A reduction can consist of multiple transformations applied one after another.
[Figure: instances x3 ∈ L2 and x4 ∉ L2 are decided by verifier A2 directly, by verifier A1 after the transformation f, or by verifier A3 after a further transformation; the ‘yes’/‘no’ answers agree in every case.]
NP-complete
• A problem L in NP is NP-complete
  • if Q is reducible to L for any problem Q in NP,
  • if SAT is reducible to L,
    • since Q ≤p SAT ≤p L for any Q in NP,
  • or if an NP-complete problem L’ is reducible to L,
    • since Q ≤p L’ ≤p L for any Q in NP.
• SAT is reducible to other problems in NP:
  • 3-SAT,
  • Clique,
  • 3-Color,
  • Hamilton path problem,
  • Traveling salesman problem, …
• These problems are also the most intractable problems in NP.
[Figure: diagram of reductions among SAT, 3-SAT, Clique, Indep. set, Vertex Cover, 3-Color, HamPath, and TSP.]
How to show that a problem L is NP-complete
• The proof consists of two steps:
  • Show that the decision problem L is in NP.
  • Show that there is a reduction from an NP-complete problem Q to L.
    • L is (as hard as or) harder than Q.
    • From the definition of NP-complete, any problem Q’ in NP is reducible to Q, so there is a reduction from Q’ to L.
• For an optimization problem Max(Min)-L, we can say Max(Min)-L is NP-hard
  • if there is a reduction from an NP-complete problem Q to L.
Example of reductions (1/9)
• We will see that TSP is reducible to SSP.
  • SSP is as hard as or harder than TSP.
  • SSP is NP-complete since TSP is NP-complete and TSP ≤p SSP.
• Let x be an instance of TSP, where threshold = n.
• Let f(x) be a transformed instance of SSP, where threshold = 3n + 2m + 1.
[Figure: x (TSP) has n vertices and m edges with cost 1, threshold n, and optimal cost n + k. f(x) (SSP) has n + m strings (a#A, b#B, c#C, d#D, …), threshold 3n + 2m + 1, and optimal cost 3n + 2m + k + 1.]
Example of reductions (2/9)
• Reduction from TSP to SSP
• Input x of TSP
  • Graph with costs between two nodes (with an arc: 1, without an arc: 2)
  • nodes: a, b, c, d, e
  • arcs with cost 1: ab, ac, ae, ba, bc, cd, ce, db, de, eb, ec
• Input f(x) of SSP
  • Created from the input x of TSP.
  • strings for nodes: a#A, b#B, c#C, d#D, e#E
  • strings for arcs: AbAc, AcAe, AeAb, BaBc, BcBa, CdCe, CeCd, DbDe, DeDb, EbEc, EcEb
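The construction of f(x) can be sketched as follows. The pairing rule used here (each node's outgoing arcs are sorted and paired cyclically, so arcs ab, ac, ae give AbAc, AcAe, AeAb) is inferred from the strings listed on the slide and should be read as an assumption, as should the function name and the handling of a node with a single outgoing arc, which the slide does not show.

```python
def tsp_to_ssp(nodes, arcs):
    """Build the SSP instance (strings, threshold) from a TSP graph.

    nodes: lowercase one-letter names; arcs: two-letter strings such as 'ab'.
    """
    # One string per node, e.g. 'a#A'.
    strings = [v + "#" + v.upper() for v in nodes]
    # One string per arc, pairing each target with the next one cyclically.
    for v in nodes:
        targets = sorted(w for (u, w) in arcs if u == v)
        for i, w in enumerate(targets):
            nxt = targets[(i + 1) % len(targets)]
            strings.append(v.upper() + w + v.upper() + nxt)
    threshold = 3 * len(nodes) + 2 * len(arcs) + 1
    return strings, threshold


nodes = ["a", "b", "c", "d", "e"]
arcs = ["ab", "ac", "ae", "ba", "bc", "cd", "ce", "db", "de", "eb", "ec"]
strings, threshold = tsp_to_ssp(nodes, arcs)
print(len(strings), threshold)
print(strings)
```

The instance has n + m = 16 strings, and the construction clearly runs in time polynomial in the size of the graph.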
Example of reductions (3/9)
• x ∈ TSP ⇒ f(x) ∈ SSP
• TSP
  • the optimal cost is 5 with the tour (aecdba).
  • n = 5, m = 11, k = 0.
• SSP
  • the shortest superstring is 38 long.
  • 3n + 2m + k + 1 = 3·5 + 2·11 + 0 + 1 = 38.
  • a#AeAbAcAe#EcEbEc#CdCeCd#DbDeDb#BaBcBa
Example of reductions (4/9)
• x ∈ TSP ⇒ f(x) ∈ SSP
• Distance graph
  • A weight on an arc is the number of characters of the prefix before a match.
  • thin line = cost 2, thick line = cost 3, no line = more than 3.
  • Example) BcBa followed by BaBc gives BcBaBc (cost 2); CeCd followed by d#D gives CeCd#D (cost 3).
[Figure: distance graph over the n + m strings of f(x).]
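The arc weight can be written as the length of the first string minus its longest overlap with the second. The following sketch uses illustrative function names (`overlap`, `distance`) that are not from the slides and reproduces the two examples above.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def distance(a: str, b: str) -> int:
    """Number of characters of a that precede the match with b."""
    return len(a) - overlap(a, b)


print(distance("BcBa", "BaBc"))  # 2: BcBa + BaBc -> BcBaBc
print(distance("CeCd", "d#D"))   # 3: CeCd + d#D  -> CeCd#D
print(distance("b#B", "BaBc"))   # 2: b#B + BaBc  -> b#BaBc
```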
Example of reductions (5/9)
• x ∈ TSP ⇒ f(x) ∈ SSP
• Distance graph with cost-2 arcs
  • Example) b#B followed by BaBc gives b#BaBc; b#B followed by BcBa gives b#BcBa.
[Figure: distance graph over the n + m strings of f(x), showing only the cost-2 arcs.]
Example of reductions (6/9)
• x ∈ TSP ⇒ f(x) ∈ SSP
• Distance graph with cost-2 arcs
  • The sum of costs of arcs: 2m.
  • Example) b#B, BaBc, BcBa in this order give b#BaBcBa.
[Figure: distance graph over the n + m strings of f(x), showing only the cost-2 arcs.]
Example of reductions (7/9)
• x ∈ TSP ⇒ f(x) ∈ SSP
• Distance graph with cost-2 arcs
  • 3n + 2m + k + 1 = 3·5 + 2·11 + 0 + 1 = 38.
  • Tour aecdba
[Figure: distance graph over the n + m strings of f(x), with the path corresponding to the tour aecdba.]
Example of reductions (8/9)
• x ∉ TSP ⇒ f(x) ∉ SSP
• TSP
  • the optimal cost is 6 with the tour (aecdba).
  • n = 5, m = 10, k = 1.
  • nodes: a, b, c, d, e
  • arcs: ab, ac, ba, bc, cd, ce, db, de, eb, ec
• SSP
  • the shortest superstring is 37 long, where the threshold is 36.
  • 3n + 2m + k + 1 = 3·5 + 2·10 + 1 + 1 = 37.
  • strings: a#A, b#B, c#C, d#D, e#E, AbAc, AcAb, BaBc, BcBa, CdCe, CeCd, DbDe, DeDb, EbEc, EcEb
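The arithmetic of the two worked examples can be checked directly. This sketch simply evaluates the formulas 3n + 2m + k + 1 and 3n + 2m + 1 from the slides; the function names are illustrative, and the second case uses m = 10, the number of arcs that matches the calculation on this slide.

```python
def superstring_length(n: int, m: int, k: int) -> int:
    """Length of the shortest superstring of f(x): 3n + 2m + k + 1."""
    return 3 * n + 2 * m + k + 1


def ssp_threshold(n: int, m: int) -> int:
    """Threshold of the SSP instance f(x): 3n + 2m + 1."""
    return 3 * n + 2 * m + 1


for n, m, k in [(5, 11, 0), (5, 10, 1)]:
    length = superstring_length(n, m, k)
    limit = ssp_threshold(n, m)
    answer = "yes" if length <= limit else "no"
    print(f"n={n} m={m} k={k}: length {length}, threshold {limit}, {answer}")
```

The answer is ‘yes’ exactly when k = 0, that is, when the TSP instance has a tour of cost n.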
Example of reductions (9/9)
• x ∉ TSP ⇒ f(x) ∉ SSP
• Distance graph
  • Tour a–ecdba (there is no arc from a to e in x)
  • Additional cost from the edge from “AbAc” to “e#E”.
[Figure: distance graph over the strings a#A, b#B, c#C, d#D, e#E, AbAc, AcAb, BaBc, BcBa, CdCe, CeCd, DbDe, DeDb, EbEc, EcEb.]
Results on approximation
• Min-SSP is MAX SNP-hard [Blum 94],
  • that is, there is no polynomial-time algorithm for Min-SSP that finds an approximate solution with an arbitrarily small error ratio if P ≠ NP [Arora 98].
  • It is hard to efficiently find an arbitrarily good approximate solution for a given instance of Min-SSP.
• On the other hand, several constant-factor (4-, 3-, or 2.5-) approximation algorithms have been developed.
Summary
• NP-complete problems are the most intractable decision problems in NP.
  • No one knows any polynomial-time algorithm that finds a solution of an NP-complete problem.
• A decision problem L is NP-complete if
  • L is in NP and
  • there is a polynomial-time reduction from Q to L, where Q is an NP-complete problem.
• An optimization problem Max-(Min-)L is NP-hard if
  • there is a polynomial-time reduction from Q to L, where Q is an NP-complete problem.
Reference (1/2)
• Issues on the computational complexity theory
  • Textbooks
    • M. R. Garey and D. S. Johnson (1979): Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman.
    • O. Watanabe (1992): Introduction to computability and complexity theory, Kindai-Kagaku-sha (in Japanese).
    • M. Sipser (1996): Introduction to the Theory of Computation, PWS Publishing Company.
    • M. T. Goodrich and R. Tamassia (2002): Algorithm Design: Foundations, Analysis, and Internet Examples, John Wiley and Sons, Inc.
      • Slides of ‘NP-completeness’ (http://www.algorithmdesign.net/handouts/NPComplete.pdf)
  • Article
    • S. Arora, C. Lund, R. Motwani, M. Sudan, and M. Szegedy (1998): “Proof verification and the hardness of approximation problems”, Journal of the ACM, 45(3), pp. 501–555.
Reference (2/2)
• Shortest superstring problem
  • Textbook
    • D. Gusfield (1997): Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Chapter 16, Cambridge University Press.
  • Article
    • A. Blum, T. Jiang, M. Li, J. Tromp, and M. Yannakakis (1994): “Linear approximation of shortest superstrings”, Journal of the ACM, 41(4), pp. 630–647.