Chapter 1: Introduction

Chapter 1: Introduction • What is the course all about? • Problems, instances and algorithms • Running time vs. computational complexity • General description of the theory of NP-completeness • Problem samples

What is this Course About? • Generally: • Computational complexity • Intractability “The inherent computational complexity of problems” • Particular Topics: • Turing machines, deterministic & non-deterministic • Complexity classes P, NP, co-NP, #P, and PSPACE • NP-completeness, NP-hardness, #P-completeness, PSPACE-completeness • Special cases and subproblems • Approximation algorithms, e.g., heuristics, & performance bounds • The polynomial hierarchy, etc.

What is this Course About? • What does it mean to say that a problem is “intractable”? • undecidable • decidable, but requires exponential time merely to output a solution (the solution itself is exponentially long) • decidable, but requires exponential time to compute a solution • In this course we focus on the last category.

The Traveling Salesman Optimization Problem • A problem is a general question to be answered that consists of: • Some number of parameters (a generic instance) • A statement of what properties a solution possesses (page 4) • An example of a problem: TRAVELING SALESMAN OPTIMIZATION INSTANCE: Set C of m cities, distance d(ci, cj) ∈ Z+ for each pair of cities ci, cj ∈ C. GOAL: Find a tour of C (i.e., a permutation <c_π(1), c_π(2), …, c_π(m)> of C) having minimum total length. • Note the format!

TSP Optimization Instance • A problem instance is a collection of specific values for all of a problem’s parameters. • A TSP instance: C = {c1, c2, c3, c4}, d(c1,c2) = 10, d(c1,c3) = 5, d(c1,c4) = 9, d(c2,c3) = 6, d(c2,c4) = 9, d(c3,c4) = 3 • [Figure: complete graph on c1, c2, c3, c4 with the edge weights listed above]
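To make the instance concrete, here is a minimal exhaustive-search sketch for the four-city instance above. The names `DIST`, `d`, `tour_length`, and `brute_force_tsp` are illustrative assumptions, not notation from the course or the book, and the approach tries all (m-1)! tours, so it only works for tiny instances.

```python
from itertools import permutations

# Symmetric distances for the four-city instance on this slide.
DIST = {
    ("c1", "c2"): 10, ("c1", "c3"): 5, ("c1", "c4"): 9,
    ("c2", "c3"): 6, ("c2", "c4"): 9, ("c3", "c4"): 3,
}

def d(a, b):
    """Look up the distance between two cities in either order."""
    return DIST[(a, b)] if (a, b) in DIST else DIST[(b, a)]

def tour_length(tour):
    """Length of the closed tour, including the edge back to the start."""
    return sum(d(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))

def brute_force_tsp(cities):
    """Try every tour that starts at cities[0]; return (best_length, best_tour)."""
    first, rest = cities[0], cities[1:]
    best = None
    for perm in permutations(rest):
        tour = (first,) + perm
        length = tour_length(tour)
        if best is None or length < best[0]:
            best = (length, tour)
    return best

best_length, best_tour = brute_force_tsp(["c1", "c2", "c3", "c4"])
print(best_length, best_tour)  # 27 ('c1', 'c2', 'c4', 'c3')
```

The minimum tour length for this instance is 27, for example c1, c2, c4, c3 and back to c1 (10 + 9 + 3 + 5).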

Problems, Instances and Algorithms • Let Π denote a problem. Then the parameters for Π define a multi-dimensional “data space” (or collection) of instances referred to as D_Π. • Each point in this space represents one specific instance.

Problems, Instances and Algorithms • Our definition of a problem is very general, and contains many useless problems: SILLY INTEGER COMPUTATION INSTANCE: Positive integer B. GOAL: Compute the largest prime number less than 1000. • The question to be asked is usually in terms of the instance parameters.

Optimization vs. Decision Problems • Many (natural) problems of interest are optimization problems. • Minimization, maximization • Although not as natural on the surface, the theory will focus on decision problems, which are problems that have yes or no answers. • A decision problem consists of two parts: • A list of parameters (i.e., a generic instance); defines a set D of instances. • A yes/no question asked in terms of the parameters; specifies a subset of yes instances Y which is a subset of D.

The Traveling Salesman Decision Problem (TSP) TRAVELING SALESMAN INSTANCE: Set C of m cities, distance d(ci, cj) ∈ Z+ for each pair of cities ci, cj ∈ C, positive integer B. QUESTION: Is there a tour of C having length B or less, i.e., a permutation <c_π(1), c_π(2), …, c_π(m)> of C such that $\sum_{i=1}^{m-1} d(c_{\pi(i)}, c_{\pi(i+1)}) + d(c_{\pi(m)}, c_{\pi(1)}) \le B$? • See the book’s appendix for a list of over 300 well-known, well-studied problems.

TSP Instance • A TSP instance (decision version): C = {c1, c2, c3, c4}, d(c1,c2) = 10, d(c1,c3) = 5, d(c1,c4) = 9, d(c2,c3) = 6, d(c2,c4) = 9, d(c3,c4) = 3, B = 27

Optimization vs. Decision Problems • Why decision problems? • Simple formal counterpart: a formal language • As a matter of convenience: • it is easier to transform/reduce decision problems than optimization problems • an unreasonably large output does not affect running time or complexity • No loss of generality; results extend to optimization problems

Optimization vs. Decision Problems • More generally, all the optimization problems we will deal with can be converted to decision problems by adding an additional parameter B. • A decision problem can be no harder than the corresponding optimization problem. • Why? • Observation: An algorithm for an optimization problem can typically be used to solve the corresponding decision problem, e.g., TSP: compute the optimal value and compare it with B. • A couple of scenarios: • There is an efficient algorithm for the TSP optimization problem. • There is a proof that the TSP decision problem is very hard. • These two scenarios cannot both hold: an efficient optimization algorithm would immediately give an efficient decision algorithm.
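The observation that an optimization algorithm answers the decision question can be sketched as follows. This is only an illustration under stated assumptions: `optimal_tour_length` is a hypothetical brute-force stand-in for whatever optimization algorithm is available, and the distance table is the four-city instance from the earlier slides.

```python
from itertools import permutations

def optimal_tour_length(cities, dist):
    """Brute-force stand-in for an optimization algorithm: minimum tour length."""
    first, rest = cities[0], cities[1:]

    def length(tour):
        return sum(dist[frozenset((tour[i], tour[(i + 1) % len(tour)]))]
                   for i in range(len(tour)))

    return min(length((first,) + p) for p in permutations(rest))

def tsp_decision(cities, dist, bound):
    """Answer the yes/no question with a single call to the optimizer."""
    return optimal_tour_length(cities, dist) <= bound

cities = ["c1", "c2", "c3", "c4"]
raw = {("c1", "c2"): 10, ("c1", "c3"): 5, ("c1", "c4"): 9,
       ("c2", "c3"): 6, ("c2", "c4"): 9, ("c3", "c4"): 3}
# Key by unordered pair so that d(a, b) == d(b, a).
dist = {frozenset(pair): w for pair, w in raw.items()}

print(tsp_decision(cities, dist, 27))  # True
print(tsp_decision(cities, dist, 26))  # False
```

The decision routine adds only one comparison on top of the optimizer, which is why the decision problem can be no harder than the optimization problem.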

Optimization vs. Decision Problems • One scenario we have not ruled out: the decision problem is easy, but the optimization problem is hard. • This scenario is not very common; a decision problem can frequently be shown to be no easier than the optimization problem (though this direction is not quite as obvious).
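One common way to argue the less obvious direction is binary search on the bound B. The sketch below is an assumption-laden illustration, not the course's proof: `tsp_decision` is a hypothetical decision oracle (implemented here by brute force), and the search recovers only the optimal tour length, not the tour itself.

```python
from itertools import permutations

def tsp_decision(cities, dist, bound):
    """Decision oracle (brute force here): is there a tour of length <= bound?"""
    first, rest = cities[0], cities[1:]
    for p in permutations(rest):
        tour = (first,) + p
        total = sum(dist[frozenset((tour[i], tour[(i + 1) % len(tour)]))]
                    for i in range(len(tour)))
        if total <= bound:
            return True
    return False

def optimal_length_via_decision(cities, dist):
    """Binary search on B using O(log(m * max distance)) oracle calls."""
    lo, hi = 0, len(cities) * max(dist.values())  # every tour has length <= hi
    while lo < hi:
        mid = (lo + hi) // 2
        if tsp_decision(cities, dist, mid):
            hi = mid        # a tour of length <= mid exists; try smaller bounds
        else:
            lo = mid + 1    # no such tour; the optimum is larger than mid
    return lo

cities = ["c1", "c2", "c3", "c4"]
raw = {("c1", "c2"): 10, ("c1", "c3"): 5, ("c1", "c4"): 9,
       ("c2", "c3"): 6, ("c2", "c4"): 9, ("c3", "c4"): 3}
dist = {frozenset(pair): w for pair, w in raw.items()}

print(optimal_length_via_decision(cities, dist))  # 27
```

The number of oracle calls is logarithmic in the largest possible tour length, so it is polynomial in the size of the instance; a polynomial-time decision algorithm would therefore give the optimal value in polynomial time as well.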

Running Time vs. Complexity • We will distinguish between the running time of a specific algorithm vs. the computational complexity of a particular problem. • Example: MATRIX MULTIPLICATION INSTANCE: Two n x n matrices A and B SOLUTION: One n x n matrix C = A x B • Running times of specific algorithms: • Simple row/column algorithm - O(n^3) • Strassen’s algorithm - O(n^2.807) • Coppersmith-Winograd family of algorithms (Le Gall’s 2014 refinement) - O(n^2.3728639)
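As a small illustration of the running time of one specific algorithm, the sketch below implements the simple row/column method and counts scalar multiplications. The function name `multiply_naive` and the counter are illustrative assumptions; the count is exactly n^3 for n x n inputs.

```python
def multiply_naive(a, b):
    """Row/column product of two n x n matrices; also counts scalar multiplications."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    mults = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
                mults += 1
    return c, mults

product, mults = multiply_naive([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)  # [[19, 22], [43, 50]]
print(mults)    # 8, i.e., 2^3
```

This count describes this one algorithm only; it says nothing about how fast the best possible algorithm for the problem could be.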

Running Time vs. Complexity • Statement on the inherent computational complexity of matrix multiplication: • Any algorithm for matrix multiplication requires Ω(n^2) operations in the worst case, i.e., O(n^2) is the best any algorithm could possibly do (this is an information-theoretic argument: all n^2 entries of C must be produced).

Running Time vs. Complexity • Example: INTEGER SORTING INSTANCE: List of n integers. SOLUTION: The list of integers in non-decreasing order. • Running times of specific algorithms: • Real dumb algorithm - O(n^3) • Bubble sort - O(n^2) • Merge sort - O(n log n) • Statement on the inherent computational complexity of sorting: • Any comparison-based sorting algorithm requires Ω(n log n) comparisons in the worst case, i.e., O(n log n) is the best any comparison-based algorithm could possibly do.
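The gap between an algorithm's running time and a problem's lower bound can be explored numerically. The sketch below is illustrative only: it counts the comparisons made by a plain merge sort and computes the standard decision-tree bound ceil(log2(n!)) for comparison sorts. The names `merge_sort` and `comparison_lower_bound` are assumptions introduced here.

```python
import math

def merge_sort(items, counter):
    """Sort a list, incrementing counter[0] once per element comparison."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], counter)
    right = merge_sort(items[mid:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

def comparison_lower_bound(n):
    """ceil(log2(n!)): worst-case comparisons needed by any comparison sort."""
    return math.ceil(math.log2(math.factorial(n)))

data = [5, 3, 8, 1, 7, 2, 6, 4]
counter = [0]
result = merge_sort(data, counter)
print(result, counter[0], comparison_lower_bound(len(data)))
```

The lower bound is a worst-case statement, so a particular input may need fewer comparisons than ceil(log2(n!)); what the bound rules out is an algorithm that beats it on every input.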

The Satisfiability Problem (SAT) • A very important problem in the theory of NP-completeness is the satisfiability problem. SATISFIABILITY INSTANCE: Set U of variables and a collection C of clauses over U. QUESTION: Is there a satisfying truth assignment for C? • Example #1: U = {u1, u2} C = {{u1, ¬u2}, {¬u1, u2}} Answer is “yes”: satisfiable by setting both variables to T

The Satisfiability Problem (SAT) • Example #2: U = {u1, u2} C = {{u1, u2}, {u1, ¬u2}, {¬u1}} Answer is “no”: the clause {¬u1} forces u1 = F, then {u1, u2} forces u2 = T, which falsifies {u1, ¬u2}

Satisfiability, Cont. • What would be a simple algorithm for SAT? • Build a truth table • Running time would be O(n·2^m), with all 2^m rows examined in the worst case • m is the number of variables • n is the length of the expression • See pages 7 and 8 from the book • Is a more efficient algorithm possible? • probably… • How about one with polynomial running time? • Come see me if you find one! • A live white turkey and a Stanford job await…
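The truth-table idea can be sketched in a few lines. This is a minimal illustration under assumptions: literals are encoded as hypothetical (variable, polarity) pairs, and the two clause sets are small example instances written with explicit negations, (u1 ∨ ¬u2)(¬u1 ∨ u2) and (u1 ∨ u2)(u1 ∨ ¬u2)(¬u1).

```python
from itertools import product

def satisfiable(variables, clauses):
    """Truth-table search: try all 2^m assignments; return one that works or None.

    A literal is a pair (variable, polarity); polarity False means negated.
    """
    for values in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        # Every clause must contain at least one literal made true.
        if all(any(assignment[v] == pol for v, pol in clause) for clause in clauses):
            return assignment
    return None

U = ["u1", "u2"]
example_1 = [[("u1", True), ("u2", False)], [("u1", False), ("u2", True)]]
example_2 = [[("u1", True), ("u2", True)], [("u1", True), ("u2", False)],
             [("u1", False)]]

print(satisfiable(U, example_1))  # {'u1': True, 'u2': True}
print(satisfiable(U, example_2))  # None
```

The loop visits up to 2^m assignments and does work proportional to the total clause length for each, which is where the n·2^m estimate comes from.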

General Points • We are interested in the “border” between exponential and polynomial - given a problem, is there a polynomial time algorithm for it, or are all algorithms for it exponential in running time? • We are not interested in what the specific polynomial or exponential is, “per se,” although the theory can be modified/refined to consider these. => Simplistically and inaccurately speaking, saying that a problem is “NP-complete” or “NP-hard” is essentially saying that there is no (deterministic) polynomial time algorithm for that problem.

General Points, Cont. • Polynomial time does not necessarily imply practical. • O(n^1000) • O(n^2) could be 10,000,000n^2 • NP-complete/NP-hard/intractable does not necessarily imply that there aren’t useful, practical algorithms. • Our measures are worst-case, and average case may not be all that bad, e.g., quicksort is O(n^2) worst case, but O(n log n) on average. • In theory, an algorithm could have worst-case running time O(2^n) because of one case, and O(n^2) on average. • Algorithms with exponential worst cases whose typical behavior isn’t all that bad: • Simplex algorithm for linear programming • Branch-and-bound algorithm for the knapsack problem.

General Points, Cont. • Proving a problem is NP-complete or NP-hard is just the beginning: • Heuristic development and analysis (the problem doesn’t go away) • Special cases of the problem may be solvable in polynomial time • Sub-exponential time algorithms may exist.

General Description of the Theory • We will describe a class of (decision) problems called NP. • NP consists of those decision problems that can be solved in Non-deterministic Polynomial time • Holy cow! What is that, and how could it possibly be important? • This class contains many/most commonly encountered problems.

General Description of the Theory • We will define a subset of NP called P. • P consists of those problems from NP that can (also) be solved in (deterministic) polynomial time • Why is deterministic in parentheses? • A very big, important question: is P = NP? • i.e., can all problems in NP be solved in (deterministic) polynomial time? • The answer to this question appears to be no, i.e., there exist problems in NP for which there is no known (deterministic) polynomial time algorithm.

General Description of the Theory • This last point will lead us to define another subset of problems in NP called NP-complete. • [Figure: region NP containing two disjoint regions, P and NP-complete] • The diagram implies several relationships: • P and NP-complete are subsets of NP (fact) • P and NP-complete are proper subsets of NP (unproven, widely believed) • P and NP-complete do not intersect (unproven, widely believed) • Why is this set NP-complete important?

Facts about NP-complete Problems • Some basic facts about NP-complete problems that we will prove. • Suppose Π is an NP-complete problem. • Fact #1: There are no known polynomial time algorithms for Π; all known algorithms require exponential time, e.g., exhaustive search • Fact #2: It is not known for certain whether Π requires exponential time or not. • All NP-complete problems appear to require exponential time, but only because no polynomial time algorithm has been found for any of them. • Fact #3: If Π ∈ P then P = NP • No such NP-complete problem has ever been identified.

Facts about NP-complete Problems • Because of #3, it is frequently said that NP-complete problems are the hardest problems in NP. • Given a problem Π, we would like to know whether Π ∈ P or Π is NP-complete. • Since NP contains many very practical problems for which people have tried (and failed) to find polynomial time algorithms, it is highly unlikely that any NP-complete problem can be solved in polynomial time.

More Sample Problems • DIVISIBILITY BY 2 INSTANCE: Integer k. QUESTION: Is k even? • CLIQUE INSTANCE: A graph G = (V, E) and a positive integer J <= |V|. QUESTION: Does G contain a clique of size J or more? • GRAPH K-COLORABILITY INSTANCE: A graph G = (V, E) and a positive integer K <= |V|. QUESTION: Is the graph G K-colorable?
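To contrast an easy sample problem with a hard-looking one, here is a minimal sketch of DIVISIBILITY BY 2 and an exhaustive-search answer to CLIQUE. The function names and the small test graph are illustrative assumptions. Checking subsets of size exactly J suffices, because any larger clique contains a clique of size J.

```python
from itertools import combinations

def is_even(k):
    """DIVISIBILITY BY 2: a single remainder check."""
    return k % 2 == 0

def has_clique(vertices, edges, j):
    """CLIQUE by exhaustive search: are there j mutually adjacent vertices?"""
    adjacent = {frozenset(e) for e in edges}
    for group in combinations(vertices, j):
        # A group is a clique when every pair inside it is joined by an edge.
        if all(frozenset(pair) in adjacent for pair in combinations(group, 2)):
            return True
    return False

# Hypothetical graph: a triangle on 1, 2, 3 plus a pendant vertex 4.
V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (1, 3), (3, 4)]

print(is_even(10))          # True
print(has_clique(V, E, 3))  # True
print(has_clique(V, E, 4))  # False
```

The evenness check takes constant work on a machine integer, while the clique search may examine every J-element subset of V, a number that grows exponentially when J is proportional to |V|.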

Problems, Instances and Algorithms • And, by the way… • An algorithm is a general, step-by-step procedure for solving a specific problem, e.g., a computer program. • An algorithm is said to solve a problem if that algorithm can be applied to any instance of the problem and is guaranteed to always produce a solution for that instance.
