The Traveling Salesman Problem in Theory & Practice

Lecture 3: Polynomial-Time Solvable Special Cases. 4 February 2014. David S. Johnson, dstiflerj@gmail.com, http://davidsjohnson.net. Seeley Mudd 523, Tuesdays and Fridays.

Outline • Rectangular Graphs • Weighted Planar Graphs • Gilmore-Gomory Instances But first, corrections to, and elaborations on, Lecture 2. (Already incorporated in last week’s lecture notes at davidsjohnson.net)

Getting to the Grid. Lemma: Any 2-connected planar graph with f faces and n vertices (all of degree 2 or 3) has an embedding in the 2D grid that can be contained in a square of size 2f+n (that is, it has “extent” no more than 2f+n).

Proof of Lemma • Note that, if a graph is planar, then for each of its faces there is a planar representation in which that face is the external face (that is, the one that contains all the other vertices). • We proceed by induction on the number of faces f, with the hypothesis that, for any graph G as above and a designated face, there is a grid embedding of extent 2f+n or less, such that • the designated face is the external face of the embedding, and • no degree-2 vertex of the external face has any vertex or edge of the embedding to its right on the gridline containing it.

Base Case: f = 1. [Figure omitted.] 2f + n = 4 + 6 = 10 > 8

Inductive step: Assume true for all f’ < f. [Figure: planar 2-connected graph G with all vertex degrees equal to 2 or 3, with the chosen face F marked.]

Pick a face F’ that shares an edge with our chosen face. Delete all shared edges and degree-2 vertices, leaving n’ vertices. Now we have an (f-1)-face graph and a chosen face, for which the induction hypothesis holds. [Figure: the chosen face F and the neighboring face F’ merge into the combined face F’’.]

Embedding with F’’ as external face. Extent no more than 2(f-1)+n’. [Figure: the embedding, with the degree-2 vertices on the boundary of our original face marked.] Two cases for the boundary of F’’ that is not shared with F: interior or exterior.

First Case: Internal. [Figure: the deleted edges and degree-2 vertices from F are added back next to the degree-2 vertices on the boundary of our original face.] The extent grows from no more than 2(f-1)+n’ to no more than 2(f-1)+n’+1 < 2f+n.

First Case: Internal. But what if there were many deleted degree-2 vertices? [Figure omitted.] Extent no more than 2(f-1) + n’ before reinsertion, then no more than 2(f-1) + n’ + (n-n’) = 2(f-1) + n, and finally no more than 2(f-1) + n + 1 < 2f + n.

2nd Case: External. [Figure omitted.] Extent no more than 2(f-1) + n’ before reinsertion, then no more than 2(f-1) + n’ + (n-n’) = 2(f-1) + n, and finally no more than 2(f-1) + n + 2 = 2f + n.

Note that this argument assumes we can always find a neighboring face that shares only a single path with our chosen face. [Figure: neighbors of the face F, labeled Yes (shares a single path with F) or No.] Exercise: Show that this is true.

Color-Preserving Embeddings • If a graph is bipartite, we can two-color its vertices (say black and white), so that no two adjacent vertices get the same color. • Similarly, we can 2-color the vertices of the 2D grid, where the vertices whose coordinates sum to an even number get white, and those with odd sum get black. • Lemma: If our original planar graph is bipartite, we can obtain an embedding into the 2D grid that is color preserving.

Embed as before. • Multiply the scale by two so that all vertices go to grid points with even coordinate sum (white grid points). • Move black vertices one cell to the right, so that they land on black grid points.

Rectangular Graphs: rectangular subgraphs of the infinite grid. [Figure: a 14 x 8 example.] Trivial Theorem: A rectangular graph (with both dimensions at least 2) has a Hamilton circuit if and only if at least one of its dimensions is even.
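The "if" direction of the Trivial Theorem can be made concrete with a small sketch. The function names below are illustrative and not from the lecture; the construction assumed here snakes through all columns except column 0 and then returns along column 0, which closes up exactly when the number of rows is even (transposing first if needed).

```python
def grid_hamilton_circuit(rows, cols):
    """Return a Hamilton circuit of the rows x cols grid as a list of
    (row, col) vertices, or None when the theorem says none exists."""
    if rows < 2 or cols < 2 or (rows % 2 == 1 and cols % 2 == 1):
        return None
    if rows % 2 == 1:
        # cols must be even here, so solve the transposed grid and swap back.
        return [(r, c) for (c, r) in grid_hamilton_circuit(cols, rows)]
    circuit = [(0, 0)]
    for r in range(rows):
        # Snake through columns 1..cols-1, alternating direction by row.
        cs = range(1, cols) if r % 2 == 0 else range(cols - 1, 0, -1)
        circuit.extend((r, c) for c in cs)
    # The last row is odd-numbered, so the snake ends at column 1;
    # return to the start along column 0.
    circuit.extend((r, 0) for r in range(rows - 1, 0, -1))
    return circuit


def is_hamilton_circuit(circuit, rows, cols):
    """Check that circuit visits every grid vertex once via grid edges."""
    if circuit is None or len(circuit) != rows * cols:
        return False
    if set(circuit) != {(r, c) for r in range(rows) for c in range(cols)}:
        return False
    return all(
        abs(r1 - r2) + abs(c1 - c2) == 1
        for (r1, c1), (r2, c2) in zip(circuit, circuit[1:] + circuit[:1])
    )
```

For the "only if" direction, note that the grid is bipartite, so any circuit has even length, while a grid with both dimensions odd has an odd number of vertices.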

Hamilton Paths in Rectangular Graphs • No specified endpoints: Always exists. • One specified endpoint: Depends on the endpoint. • Two specified endpoints: Depends on the endpoints. • For the last two cases, algorithms exist for determining whether the desired Hamilton path exists [Itai, Papadimitriou, & Szwarcfiter, 1982].

Grid graphs without holes (Solid grid graphs). Theorem [Umans & Lenhart, “Hamiltonian cycles in solid grid graphs,” FOCS 97, 496-507]: HAMILTON CYCLE is polynomial-time solvable for solid grid graphs.

Rectangular Graphs with Edge Weights • Goal: Find Hamilton cycle with smallest total edge weight. • Conjecture: NP-hard for general rectangular graphs. • Idea of potential proof: Pick up vertices in holes using forced Hamilton paths, setting weights to do the forcing. • Theorem: For any fixed k, solvable in polynomial time for rectangular graphs of height no more than k.

Proof by Dynamic Programming (for Even k)

States $(j, E_j, P_j)$ of the Dynamic Program. [Figure: vertical gridlines 1, 2, 3, 4, …, n, with gridline j marked.] Pick a vertical gridline j, 1 ≤ j ≤ n [n possibilities]. Pick a subset $E_j$ of the grid segments on gridline j [$2^{k-1}$ possibilities]. Pick a set $P_j$ of disjoint pairs of gridpoints on gridline j, where each pair must be connected by a path lying entirely to the right of the gridline. Total number of possibilities is hence $O(n 2^{3k})$, which is linear in n for fixed k (although exponential in k).

Bounding the Possibilities for $P_j$ • For each vertex v on the chosen jth vertical gridline, let $\deg_j(v)$ be the number of edges in $E_j$ incident on v. • If $\deg_j(v) = 0$, then tour edges leave v both to the left and to the right, and v must be in a pair. • If $\deg_j(v) = 1$, then an edge must go left or right, a total of at most $2^k$ choices. • If $\deg_j(v) = 2$, then v cannot be in any pair. • Thus there are at most $2^k$ choices, given $E_j$, each yielding a set of k’ ≤ k pair members and hence k’/2 ≤ k/2 pairs (if k’ is odd, then the state is infeasible). • Because our grid is planar, the paths connecting the pairs cannot intersect, and so the pairs can be viewed as a set of k’/2 correctly-matched parentheses. For three pairs: ((())) ()(()) ()()() (())() (()())

Catalan Numbers • The number of ways one can correctly nest n pairs of parentheses is called the Catalan number $C_n$. • Wikipedia currently has 5 different proofs that $C_n = \frac{1}{n+1}\binom{2n}{n}$, which is less than $4^n$. • This means there are at most $O(2^k C_{k/2}) = O(2^{2k})$ possibilities for the set $P_j$.
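As a quick sanity check on the counting, the following sketch enumerates correctly nested parenthesizations and compares the count with the standard closed form for the Catalan numbers. The function names are illustrative only.

```python
from math import comb


def catalan(n):
    """Catalan number via the closed form C_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def nestings(n):
    """All correctly nested strings made of n pairs of parentheses."""
    if n == 0:
        return [""]
    result = []
    # The first "(" is matched by a ")" enclosing i pairs; the remaining
    # n - 1 - i pairs follow it.
    for i in range(n):
        for inner in nestings(i):
            for rest in nestings(n - 1 - i):
                result.append("(" + inner + ")" + rest)
    return result
```

With n = 3 this yields the five strings listed for three pairs, and `catalan(m) < 4**m` is the inequality that turns $2^k C_{k/2}$ into $O(2^{2k})$.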

States $(j, E_j, P_j)$ of the Dynamic Program. [Figure: vertical gridlines 1, 2, 3, 4, …, n, with gridline j marked.] Pick a vertical gridline j, 1 ≤ j ≤ n [n possibilities]. Pick a subset $E_j$ of the grid segments on the chosen gridline [$2^{k-1}$ possibilities]. Pick a set of properly nested pairs $P_j$ [$O(2^{2k})$ possibilities]. Total number of possibilities is hence $O(n 2^{3k})$, which is linear in n for fixed k (although exponential in k).

Value $f(j, E_j, P_j)$ of State $(j, E_j, P_j)$. [Figure: vertical gridlines 1, 2, 3, 4, …, n, with gridline j marked.] • Let $G_j$ be the grid graph from column 1 through column j, augmented by artificial edges connecting the pairs in $P_j$ and having cost 0. • The value $f(j, E_j, P_j)$ is the minimum cost of a Hamilton circuit H for $G_j$ subject to the constraint that H contains all the segments in $E_j$ and all the artificial edges connecting pairs in $P_j$. • The value is ∞ if no such H exists.

Basis for the Dynamic Program Only finite-cost states for j = 1.

Induction • Suppose we wish to compute $f(j, E_j, P_j)$ and we know the values $f(j', E_{j'}, P_{j'})$ for all states with j’ = j-1. • Consider all “consistent” subsets of the k horizontal grid segments and k-1 vertical grid segments on the left (at most $2^{2k-1}$ possibilities). • Take the best sum of • the value of a state $(j-1, E_{j-1}, P_{j-1})$ induced in this way, plus • the total weight of the horizontal grid segments in the consistent subset inducing that state, plus • the total weight of the vertical segments in $E_j$. [Figure: a choice of segments between gridlines j-1 and j that is not consistent.]

Induction • Suppose we wish to compute $f(j, E_j, P_j)$ and we know the values $f(j', E_{j'}, P_{j'})$ for all states with j’ = j-1. • Consider all “consistent” subsets of the k horizontal grid segments and k-1 vertical grid segments on the left (at most $2^{2k-1}$ possibilities). • Take the best sum of • the value of a state $(j-1, E_{j-1}, P_{j-1})$ induced in this way, plus • the total weight of the horizontal grid segments in the consistent subset inducing that state, plus • the total weight of the vertical segments in $E_j$. [Figure: several choices of segments between gridlines j-1 and j that are potentially consistent.]

Computing the Optimal Value • Compute the values of $f(n, E_n, \emptyset)$ for all potentially consistent states $(n, E_n, \emptyset)$. • Return the smallest value found. • Total running time = $O(n 2^{3k} \cdot 2^{2k-1} \cdot k) = O(n k 2^{5k})$. • Probably much better in practice, although still exponential. [Figure omitted.]

Proof for Odd k • Answer is ∞ if n is odd. • Otherwise, we only have valid states for j = 2i, 1 ≤ i ≤ n/2. • Hence we must proceed in steps of two grid columns rather than just one. • A bit more complicated, but do-able.

TSP for Planar Weighted Graphs • Solvable in time $2^{O(\sqrt{N})}$ by Divide-and-Conquer. • (Recall our best general TSP algorithm takes time $O(n^2 2^n)$.) • Exploits planar separator theorem [Lipton & Tarjan, 1979]: • In any planar graph G with N vertices, the vertices of G can be partitioned into 3 sets, S, A, and B, such that $|S| \le 2\sqrt{2N}$, both |A| and |B| are bounded by 2N/3, and there is no edge between a vertex in A and one in B. This partition can be found in linear time. [Figure: planar graphs split into parts A and B by separators S.]
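To make the three conditions of the separator theorem concrete, here is a small sketch on one easy family of planar graphs, the m x m grid, where the middle column can serve as S. This only illustrates the statement; it is not the Lipton & Tarjan linear-time algorithm, and the function names are made up for the example.

```python
from math import sqrt


def grid_graph(m):
    """Vertices and edges of the m x m grid graph."""
    vertices = [(r, c) for r in range(m) for c in range(m)]
    edges = []
    for r, c in vertices:
        if r + 1 < m:
            edges.append(((r, c), (r + 1, c)))
        if c + 1 < m:
            edges.append(((r, c), (r, c + 1)))
    return vertices, edges


def middle_column_separator(m):
    """Use the middle column as S; A is everything left of it, B right."""
    mid = m // 2
    vertices, _ = grid_graph(m)
    S = {v for v in vertices if v[1] == mid}
    A = {v for v in vertices if v[1] < mid}
    B = {v for v in vertices if v[1] > mid}
    return S, A, B


def satisfies_separator_bounds(m):
    """Check the three conditions of the separator theorem for this split."""
    vertices, edges = grid_graph(m)
    S, A, B = middle_column_separator(m)
    N = len(vertices)
    no_edge_between_a_and_b = all(
        not ((u in A and v in B) or (u in B and v in A)) for u, v in edges
    )
    return (
        no_edge_between_a_and_b
        and len(S) <= 2 * sqrt(2 * N)
        and max(len(A), len(B)) <= 2 * N / 3
    )
```

For the 5 x 5 grid this gives |S| = 5 and |A| = |B| = 10 with N = 25.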

How an Optimal Tour Might Look. [Figure: a tour crossing between A and B through the separator S, with the forced edges marked.]

Cases $(E_S, P_S)$ for Separator and its State. [Figure omitted.] • $E_S$ = subset of the edges linking vertices of the separator S to each other such that no vertex in S has more than two incident edges from $E_S$, and no cycle is present in $E_S$. For this case, the edges in $E_S$ must be in our tour. • Note that $|E_S| \le 3|S| - 6$ by Euler’s Formula. Thus there are fewer than $2^{3|S|} \le 2^{6\sqrt{2N}}$ possibilities for $E_S$ (probably far fewer because of the degree and cycle restrictions). • $P_S$ is a set of pairs of vertices in S which must be connected by paths, all of whose vertices are in A∪S, or else all are in B∪S.

An upper bound on the possibilities for $P_S$ • For each v in S, let $\deg_E(v)$ be the number of edges in $E_S$ that are incident on v and note that we must have $\deg_E(v)$ in {0,1,2}. • If $\deg_E(v) = 2$, then v cannot be the endpoint of any path in A or B. • If $\deg_E(v) = 1$, then v can be the endpoint of one path, in either A or B. [2 possibilities] • If $\deg_E(v) = 0$, then v can be the endpoint of one path in A and one path in B, or can be in the middle of a path on the A side, or can be in the middle of a path on the B side. [3 possibilities] • Note that in a Hamilton circuit for G, the paths through any connected component X of A between distinct vertices in S must correspond to a parenthesization with at most $\lfloor |S_X|/2 \rfloor$ pairs, where $S_X$ is the subset of S consisting of those members that are adjacent to a node in component X, and similarly for B. • The worst-case count will occur when there is just one connected component.

The Possibilities for $P_S$ • For each choice of $E_S$, there are at most $3^{|S|}$ possibilities for the roles of the members of S. • In choosing endpoints of paths, we can never have more than $\lfloor |S|/2 \rfloor$ possible pairs of members of S. • Using our previous Catalan number analysis, this gives at most $2^{|S|}$ possibilities for nested pairs. • Thus, given $E_S$, we have no more than $(3^{|S|})(2^{|S|})$ possibilities, or no more than $2^{(1+\log 3)\sqrt{2N}}$. • Multiplying by the number of possibilities for $E_S$, we get a total number of possibilities of $2^{6\sqrt{2N}} \cdot 2^{(1+\log 3)\sqrt{2N}} = 2^{(7+\log 3)\sqrt{2N}} < 2^{13\sqrt{N}}$.

Running Time Induction • Claim: We can solve the planar weighted TSP (with some edges fixed) in time $O(2^{\alpha\sqrt{N}})$ for a constant α to be determined later. • Proof: We will show that if the bound holds for N’ < N, then it holds for N as well. As a basis for the induction, if N ≤ 225, we solve the problem by exhaustive search (in large but constant time).

So, suppose N > 225. Find a planar separator S with max(|A|,|B|) ≤ 2N/3 and $|S| \le 2\sqrt{2N} < 3\sqrt{N}$, which is less than N/5 since N > 225. Thus neither |A∪S| nor |B∪S| exceeds (0.87)N. So for each of our fewer than $2^{13\sqrt{N}}$ choices, we can solve the relevant A and B problems in time no more than $2^{\alpha\sqrt{0.87N}} < 2^{(0.94)\alpha\sqrt{N}}$, for a total of at most $2^{13\sqrt{N}} \cdot 2^{(0.94)\alpha\sqrt{N}} = 2^{(13+(0.94)\alpha)\sqrt{N}}$. This will be less than $2^{\alpha\sqrt{N}}$ if α > 13 + (0.94)α, or (0.06)α > 13, so α = 217 will suffice and our overall running time is $O(2^{217\sqrt{N}})$. QED?
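The numerical constants in this step are easy to check mechanically. The sketch below only re-does the arithmetic stated above (the threshold N > 225, the fraction 0.87, the factor 0.94, and the resulting α); the function names and default arguments are illustrative.

```python
from math import sqrt


def smallest_integer_alpha(overhead=13.0, shrink=0.94):
    """Smallest integer alpha satisfying alpha > overhead + shrink * alpha."""
    alpha = 1
    while not alpha > overhead + shrink * alpha:
        alpha += 1
    return alpha


def induction_step_checks(n_min=226):
    """Evaluate the inequalities used in the induction step at N = n_min."""
    return {
        # 3*sqrt(N) < N/5 holds exactly when sqrt(N) > 15, i.e. N > 225.
        "separator_below_N_over_5": 3 * sqrt(n_min) < n_min / 5,
        # |A ∪ S| <= 2N/3 + N/5, so the fraction must be at most 0.87.
        "subproblem_fraction_ok": 2 / 3 + 1 / 5 <= 0.87,
        # sqrt(0.87 N) < 0.94 sqrt(N).
        "sqrt_shrink_ok": sqrt(0.87) < 0.94,
    }
```

Calling `smallest_integer_alpha()` reproduces the constant 217, since 13/0.06 is about 216.7.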

Gilmore-Gomory Distances • Cities are pairs of rational numbers $v_i = (a_i, b_i)$, 1 ≤ i ≤ N. • Distances determined by two real integrable functions f, g: R → R, subject to the constraint that for any x, f(x) + g(x) ≥ 0. • Cost of going from city $v_i$ to city $v_j$ is $c_{ij} = \int_{b_i}^{a_j} f(x)\,dx$ if $b_i \le a_j$, and $c_{ij} = \int_{a_j}^{b_i} g(x)\,dx$ if $a_j < b_i$. • (The area between $b_i$ and $a_j$ under the relevant curve f or g.) • Note that this is not necessarily symmetric. • The constraint guarantees that the total cost of any cycle is non-negative.

Example 1: f(x) = 1, g(x) = -1 • If $b_i \le a_j$, $c_{ij} = a_j - b_i$. • Otherwise $c_{ij} = -|b_i - a_j|$. Example 2 (Scheduling Application): • Each city i represents a job that must be started at temperature $a_i$ and will end at temperature $b_i$. • Distance from city i to city j then represents the cost of raising (or lowering) the temperature from the ending temperature for job i to the starting temperature for job j. • A tour can be viewed as a schedule that starts with job 1 at temperature $a_1$, executes all the jobs, and then returns the system to temperature $a_1$ so it can go round again. • The constraint guarantees that one cannot gain money by going through a cycle.
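Here is a minimal sketch of the distance function for the special case of constant f and g, as in Example 1. It assumes the cost from city i to city j integrates f from $b_i$ up to $a_j$ when $b_i \le a_j$ and g from $a_j$ up to $b_i$ otherwise, which is what Example 1 implies; the function names and the sample cities are hypothetical.

```python
def gg_cost(b_i, a_j, f=1, g=-1):
    """Cost of going from a job ending at b_i to a job starting at a_j,
    for constant functions f and g with f + g >= 0."""
    if f + g < 0:
        raise ValueError("the constraint f(x) + g(x) >= 0 is violated")
    if b_i <= a_j:
        return f * (a_j - b_i)   # warm up: area under f from b_i to a_j
    return g * (b_i - a_j)       # cool down: area under g from a_j to b_i


def tour_cost(cities, order, f=1, g=-1):
    """Total cost of visiting cities (a, b) in the given cyclic order."""
    total = 0
    for k, i in enumerate(order):
        j = order[(k + 1) % len(order)]
        total += gg_cost(cities[i][1], cities[j][0], f, g)
    return total
```

With f = 1 and g = -1 both branches reduce to $a_j - b_i$, so every tour has the same cost; the problem only becomes interesting once f and -g differ, for example f = 2 and g = -1.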

The Algorithm • Standard Heuristic: • Find a minimum-cost directed matching on the bipartite graph with copies of the cities representing the out-ports of the cities on the left and copies representing the in-ports on the right, with the cost of the edge from $v_i$(out) to $v_j$(in) being $c_{ij}$. • A matching thus yields a set of N directed arcs, with each city having in- and out-degree 1. If the resulting graph is a single cycle, then it is an optimal tour and we are done. • Otherwise, we have a union of directed cycles. • Patch the cycles together repeatedly by • picking a pair of cycles, • removing one arc from each, and • replacing the removed arcs by a pair of arcs joining the two cycles. [Figure omitted.]
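The patching step can be sketched as follows, assuming the matching is stored as a successor list `succ` with `succ[i] = j` meaning the arc from city i to city j. This representation and the function names are choices made for the example; the sketch shows a single patch, not the rule for choosing which patches give the optimal tour.

```python
def cycles(succ):
    """Decompose a successor list (a permutation) into its directed cycles."""
    seen = [False] * len(succ)
    result = []
    for start in range(len(succ)):
        if seen[start]:
            continue
        cycle = []
        v = start
        while not seen[v]:
            seen[v] = True
            cycle.append(v)
            v = succ[v]
        result.append(cycle)
    return result


def patch(succ, i, j):
    """Remove arcs i -> succ[i] and j -> succ[j], then add i -> succ[j]
    and j -> succ[i]. If i and j lie on different cycles, this merges
    those two cycles into one."""
    patched = list(succ)
    patched[i], patched[j] = succ[j], succ[i]
    return patched
```

For example, `succ = [2, 3, 0, 1]` consists of the cycles 0 → 2 → 0 and 1 → 3 → 1, and `patch(succ, 0, 1)` gives the single cycle 0 → 3 → 1 → 2 → 0.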

Gilmore & Gomory’s Novel Ideas • Simple, problem-specific method for finding the initial matching in time $O(N \log N)$, as compared to the $O(N^3)$ of our best algorithm for the general problem. • Clever algorithm, exploiting minimum spanning trees and some extra, problem-specific magic, that finds a series of patches that yields the optimal tour.

Finding the Matching • Index the cities so that $b_1 \le b_2 \le \dots \le b_N$. • By sorting, find a permutation π satisfying $a_{\pi(1)} \le a_{\pi(2)} \le \dots \le a_{\pi(N)}$. • Our matching can then be viewed as consisting of the pairs $(v_i, v_{\pi(i)})$, 1 ≤ i ≤ N. • Proof by an uncrossing argument.
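The sorting rule is short enough to test against brute force on a small instance. The sketch below assumes constant functions f = 2 and g = -1 (so that f + g ≥ 0 and different matchings really do have different costs) and allows a city to be matched to itself; these choices, the function names, and the sample cities are all assumptions made for illustration.

```python
from itertools import permutations


def cost(b_i, a_j, f=2, g=-1):
    """Gilmore-Gomory cost for constant f and g (assumed for this sketch)."""
    return f * (a_j - b_i) if b_i <= a_j else g * (b_i - a_j)


def sorted_matching(cities):
    """Match the city with the k-th smallest b to the city with the
    k-th smallest a. Returns succ with succ[i] = j for the arc i -> j."""
    n = len(cities)
    by_b = sorted(range(n), key=lambda i: cities[i][1])
    by_a = sorted(range(n), key=lambda i: cities[i][0])
    succ = [None] * n
    for i, j in zip(by_b, by_a):
        succ[i] = j
    return succ


def matching_cost(cities, succ):
    """Total cost of the arcs i -> succ[i]."""
    return sum(cost(cities[i][1], cities[succ[i]][0]) for i in range(len(succ)))


def brute_force_min(cities):
    """Minimum matching cost over all permutations (tiny instances only)."""
    n = len(cities)
    return min(matching_cost(cities, list(p)) for p in permutations(range(n)))
```

On the cities (1,4), (6,2), (3,3), (0,5), given as (a, b) pairs, the sorted matching has cost -3, which matches the brute-force minimum. The sort dominates the work, which is where the O(N log N) bound comes from.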

Classical Uncrossing Argument. Theorem: For a TSP in the plane under a metric that obeys the triangle inequality, there is always an optimal tour in which no two edges intersect each other in their interiors. [Figure: two tour edges crossing at a point X are replaced by the edges A–B and C–D.] By the triangle inequality, d(A,B) ≤ d(A,X) + d(X,B), and similarly, d(C,D) ≤ d(C,X) + d(X,D). Crossing gone at no increase in cost!
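For the Euclidean metric, the uncrossing step can be sketched as repeatedly reversing the stretch of the tour between two crossing edges (the familiar 2-opt move). The helper names below are illustrative, and the crossing test assumes points in general position, detecting only proper interior crossings.

```python
from math import dist


def tour_length(tour):
    """Euclidean length of the closed tour through the given points."""
    return sum(dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))


def orient(p, q, r):
    """Sign tells whether r lies left (+) or right (-) of the line p -> q."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def properly_cross(p, q, r, s):
    """True if segments pq and rs intersect in their interiors."""
    return (orient(p, q, r) * orient(p, q, s) < 0
            and orient(r, s, p) * orient(r, s, q) < 0)


def uncross_all(tour):
    """Remove crossings one at a time; each removal shortens the tour,
    so the loop terminates."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                p, q = tour[i], tour[i + 1]
                r, s = tour[j], tour[(j + 1) % n]
                if properly_cross(p, q, r, s):
                    # Replace edges (p,q),(r,s) by (p,r),(q,s).
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour
```

For instance, the tour (0,0), (1,1), (0,1), (1,0) uses both diagonals of the unit square; uncrossing turns it into the perimeter, reducing the length from 2 + 2√2 to 4.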

Uncrossing Argument for our Matching. [Figure: a claimed optimal matching between $b_1, b_2, \dots, b_i, \dots, b_j$ and $a_{\pi(1)}, a_{\pi(2)}, \dots, a_{\pi(i)}, \dots, a_{\pi(j)}$ that contains a crossing.]

Case 1: Both Jobs Warming. [Figure: the points $b_i$, $b_j$, $a_{\pi(i)}$, $a_{\pi(j)}$ on a line, showing the integral ranges for f in the crossed and uncrossed matchings.] Same cost whether crossed or uncrossed!

Case 2: One Job Warming, One Cooling. [Figure: the points $a_{\pi(i)}$, $b_i$, $b_j$, $a_{\pi(j)}$ on a line, showing the integral ranges for f and g in the crossed and uncrossed matchings.] The original has an extra region of f+g, which is non-negative, so removing it cannot make things worse. Uncrossed is at least as good!

Additional Cases • There are additional cases, depending on whether $b_i$ is below $a_{\pi(i)}$ and so forth, but they all work more-or-less the same way. • Consequently, the no-crossing matching is optimal, as claimed.
