A Parallel GPU Version of the Traveling Salesman Problem

Molly A. O’Neil, Dan Tamir, and Martin Burtscher*
Department of Computer Science

The Traveling Salesman Problem
• Common combinatorial optimization problem
  • Wire routing, logistics, robot arm movement, etc.
• Given n cities, find the shortest Hamiltonian tour
  • Must visit every city exactly once and end in the first city
• Usually expressed as a graph problem
  • We use a complete, undirected, planar, Euclidean graph
  • Vertices represent cities
  • Edge weights reflect distances
July 2011

TSP Algorithm
• Finding an optimal solution is NP-hard
• Heuristic algorithms are used to approximate the solution
• We use an iterative hill-climbing search algorithm
  • Generate k random initial tours (k climbers)
  • Iteratively refine them until a local minimum is reached
  • In each iteration, apply the best opt-2 move
    • Find the best pair of edges (a,b) and (c,d) such that replacing them with (a,d) and (b,c) minimizes the tour length

GPU Requirements
• Lots of data parallelism
  • Need tens of thousands of ‘independent’ threads
• Sufficient memory access regularity
  • Sets of 32 threads (warps) should have ‘nice’ access patterns
• Sufficient code regularity
  • Sets of 32 threads should follow the same control flow
• Plenty of data reuse
  • At least O(n²) operations on O(n) data

TSP_GPU Implementation
• Assuming 100-city problems and 100,000 climbers
• Climbers are independent and can run in parallel
  • Plenty of data parallelism
  • Potential load imbalance: climbers need different numbers of steps to reach a local minimum
• Every step determines the best of 4851 opt-2 moves
  • Same control flow (but different data)
  • Coalesced memory access patterns
  • O(n²) operations on O(n) data
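The count of 4851 moves per step matches one natural indexing, which we assume here. Keep the first city fixed, pick two positions 1 ≤ i < j ≤ n − 1 from the remaining n − 1 tour positions, and reverse the segment between them. For n = 100:

```latex
\#\text{moves} = \binom{n-1}{2} = \frac{(n-1)(n-2)}{2}
  = \frac{99 \cdot 98}{2} = 4851 \qquad (n = 100)
```

Each move costs a constant amount of work. A step therefore performs O(n²) operations, while each climber's own data is a tour of only O(n) entries.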

Code Optimizations
• Key code section: finding the best opt-2 move
  • Doubly nested loop
  • Only computes the difference in tour length, not the absolute length
  • Highly optimized to minimize memory accesses
    • “Caches” the rest of the data in registers
  • Requires only 6 clock cycles per move on a Xeon CPU core
• Each local minimum is compared to the best solution so far
  • The best solution is updated if needed; otherwise the tour is discarded
• Other small optimizations (see paper)

GPU Optimizations
• Random tours are generated in parallel on the GPU
  • Minimizes data transfer to the GPU
  • (The CPU only generates the distance matrix and prints the result)
• 2D distance matrix resides in shared memory
  • Ensures hits in the software-controlled fast data cache
• Tours are copied to local memory in chunks of 1024
  • Enables accessing them with coalesced loads and stores

Evaluation Method
• Systems
  • NVIDIA Tesla C2050 GPU (1.15 GHz, 14 SMs with 32 PEs each)
  • Nautilus supercomputer (2.0 GHz 8-core X7550 Xeons)
• Datasets
  • Five 100-city inputs from TSPLIB
• Implementations
  • CUDA (GPU), Pthreads (CPU), serial C (CPU)
  • Use almost identical code for finding the best opt-2 move

Runtime Comparison (kroE100 Input)
• GPU is 7.8x faster than a CPU with 8 cores
• One GPU chip is as fast as 16 or 32 CPU chips

Speedup over Serial (kroE100 Input)
• Pthreads code scales well to 32 threads (4 CPUs)
• CPU performance fluctuates (NUMA); GPU performance is stable

Solution Quality
• Optimal tour found in 4 of 5 cases with 100,000 climbers
  • 200,000 climbers find the best solution in the fifth case
• Runtime is independent of the input and linear in the number of climbers

Summary
• TSP_GPU source code is freely available at http://www.cs.txstate.edu/~burtscher/research/TSP_GPU/
• TSP_GPU algorithm
  • Highly optimized implementation for GPUs
  • Evaluates almost 20 billion tour modifications per second on a single GPU (as fast as 32 8-core Xeons)
  • Produces high-quality results
  • May be better suited to GPUs than ant colony optimization (ACO) and genetic algorithm (GA) approaches
• Acknowledgments
  • NSF TeraGrid (NICS), NVIDIA Corp., and Intel Corp.
