A GPU algorithm design for the Resource Constrained Project Scheduling Problem

Utilizing GPU power to solve complex combinatorial problems, focusing on the RCPSP, with parallelization on the Nvidia CUDA framework. A Tabu Search algorithm is implemented to optimize the schedule.

jmcallister

Presentation Transcript


  1. A GPU algorithm design for the Resource Constrained Project Scheduling Problem
    Libor Bukata and Přemysl Šůcha, {bukatlib,suchap}@fel.cvut.cz
    The Czech Technical University in Prague

  2. Motivation
    • Our motivation is to use the power of the GPU to solve combinatorial problems.
    • Existing works:
      • [1] M. Czapinski and S. Barnes, “Tabu Search with two approaches to parallel flowshop evaluation on CUDA platform,” Journal of Parallel and Distributed Computing, vol. 71, pp. 802–811, June 2011.
      • [2] V. Boyer, D. El-Baz, and M. Elkihel, “Solving knapsack problems on GPU,” Computers & Operations Research, vol. 39, no. 1, pp. 42–47, 2012.
    • We tackle a more complex combinatorial problem than [1, 2].
    • We focus on the homogeneous model.

  3. Outline
    • Problem Statement (RCPSP)
    • Sequential Solution (Tabu Search Algorithm)
    • Parallelization
    • Parallelization on the Nvidia CUDA Framework
    • Experimental Results
    • Conclusions

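  4. Problem Statement
    • The Resource Constrained Project Scheduling Problem (RCPSP) is a general scheduling problem.
    • It is one of the most important problems in project management, manufacturing and production optimization.
    • The problem is NP-hard, since its special case P2||Cmax (a single resource of capacity 2, every activity requiring one unit, no precedences) is already NP-hard (it contains the two-way partition problem).
    [Figure: example precedence graph with activities 0 to 7]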

  5. Problem Statement
    • A set of N activities V = {0, …, N-1} with durations D = (d_0, …, d_{N-1}), d_i ∈ ℤ+. Activity 0 is the first activity of the project and N-1 is the last one.
    • Precedences among activities are given by a directed acyclic graph G(V, E), where an edge (i, j) ∈ E means that activity j cannot start before activity i finishes.
    [Figure: example precedence graph with activities 0 to 7]
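The Tabu Search later in this transcript starts from a topological order of this graph. One standard way to obtain such an order is Kahn's algorithm; the C++17 sketch below assumes the precedences are given as an edge list, and the name `topologicalOrder` is hypothetical, not from the authors' code.

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: topological order of activities 0..n-1 for the
// precedence DAG given as edges (i, j), meaning i must precede j.
// Returns an empty vector if the graph contains a cycle.
std::vector<int> topologicalOrder(int n, const std::vector<std::pair<int, int>>& edges) {
    std::vector<std::vector<int>> successors(n);
    std::vector<int> inDegree(n, 0);
    for (const auto& [i, j] : edges) {
        successors[i].push_back(j);
        ++inDegree[j];
    }
    std::queue<int> ready;  // activities whose predecessors are all placed
    for (int v = 0; v < n; ++v)
        if (inDegree[v] == 0) ready.push(v);
    std::vector<int> order;
    order.reserve(n);
    while (!ready.empty()) {
        int v = ready.front();
        ready.pop();
        order.push_back(v);
        for (int w : successors[v])
            if (--inDegree[w] == 0) ready.push(w);  // last predecessor of w placed
    }
    if (static_cast<int>(order.size()) != n) order.clear();  // cycle detected
    return order;
}
```

Any topological order is a valid starting solution; which one Kahn's algorithm yields depends on the order in which ready activities are dequeued.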

  6. Problem Statement
    • A set of M renewable resources with capacities R = {R_0, …, R_{M-1}}, where R_k ∈ ℤ+.
    • Activity i requires r_{i,k} ∈ ℤ+ units of resource k.
    [Figure: resource profiles of Resource 1 (capacity R1) and Resource 2 (capacity R2) over time t for an example schedule, with makespan Cmax]

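  7. Problem Statement
    • A schedule S is a vector (s_0, …, s_{N-1}) of activity start times s_i ∈ ℤ+ that satisfies the constraints of the mathematical model:
      objective function: $\min C_{\max} = s_{N-1} + d_{N-1}$
      precedence constraints: $s_j - s_i \ge d_i, \quad \forall (i, j) \in E$
      resource constraints: $\sum_{i \in A(t)} r_{i,k} \le R_k, \quad \forall t \ge 0, \; \forall k \in \{0, \dots, M-1\}$, where $A(t) = \{ i \in V : s_i \le t < s_i + d_i \}$ is the set of activities in progress at time t.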
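To make the model concrete, here is a small C++17 checker that tests a vector of start times against the precedence and resource constraints and computes C_max as the latest finish time max_i (s_i + d_i), which coincides with the finish of activity N-1 when that activity succeeds all others. The `Instance` layout and the function names are illustrative assumptions; the time-by-time resource check is deliberately brute force for readability.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical instance layout (an assumption, not the authors' data structure).
struct Instance {
    std::vector<int> duration;                  // d_i
    std::vector<std::pair<int, int>> edges;     // (i, j): j cannot start before i finishes
    std::vector<int> capacity;                  // R_k
    std::vector<std::vector<int>> requirement;  // requirement[i][k] = r_{i,k}
};

// C_max = max_i (s_i + d_i).
int makespan(const Instance& inst, const std::vector<int>& start) {
    int cmax = 0;
    for (std::size_t i = 0; i < start.size(); ++i)
        cmax = std::max(cmax, start[i] + inst.duration[i]);
    return cmax;
}

// Brute-force check of the model's constraints for the start times s_i.
bool isFeasible(const Instance& inst, const std::vector<int>& start) {
    for (int s : start)
        if (s < 0) return false;  // s_i must be non-negative
    for (const auto& [i, j] : inst.edges)
        if (start[j] - start[i] < inst.duration[i]) return false;  // s_j - s_i >= d_i
    const int horizon = makespan(inst, start);
    for (std::size_t k = 0; k < inst.capacity.size(); ++k)
        for (int t = 0; t < horizon; ++t) {
            int used = 0;  // total r_{i,k} of activities in progress at time t
            for (std::size_t i = 0; i < start.size(); ++i)
                if (start[i] <= t && t < start[i] + inst.duration[i])
                    used += inst.requirement[i][k];
            if (used > inst.capacity[k]) return false;
        }
    return true;
}
```

For example, with two real activities of durations 2 and 3 that each need 2 units of a resource with capacity 3, they cannot overlap, so start times (0, 0, 2, 5) for activities 0..3 are feasible with C_max = 5, while starting both at time 0 violates the resource constraint.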

  8. The Tabu Search Algorithm for the RCPSP
    • The RCPSP can be solved with the Tabu Search (TS) meta-heuristic:
      l = 0; find an initial solution W_l ∈ 𝒲 (a topological order); W_best = W_l.
      While (l < L):
        Determine the neighborhood 𝒲(W_l) of W_l.
        Eliminate infeasible solutions: 𝒲(W_l) → 𝒲'(W_l).
        Compute C_max(W_next) of every solution W_next ∈ 𝒲'(W_l).
        Assign W_{l+1} = argmin C_max(W_next) over all W_next ∉ TL.
        TL = TL ∪ {W_{l+1}}.
        If C_max(W_best) > C_max(W_{l+1}), then W_best = W_{l+1}.
        If the solution has not improved during the given number of iterations, perform diversification of W_{l+1}.
        l++.
      Return W_best.
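The loop above can be sketched sequentially in C++17. This is an illustrative version under several assumptions not stated on the slide: the tabu list stores swapped positions (i, j) in a FIFO of fixed length, tabu moves are simply skipped (no aspiration criterion), and diversification is modelled as three random swaps that keep the order feasible. The `cost` and `feasible` callbacks, the parameter names and the fixed seed are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <limits>
#include <random>
#include <utility>
#include <vector>

using Order = std::vector<int>;

// Sketch of TS over activity orders: cost evaluates C_max of an order,
// feasible checks the precedence constraints of an order.
Order tabuSearch(Order current, const std::function<int(const Order&)>& cost,
                 const std::function<bool(const Order&)>& feasible,
                 int maxIter, std::size_t tabuLength, int maxNoImprove) {
    Order best = current;
    int bestCost = cost(best);
    std::deque<std::pair<int, int>> tabu;  // recently applied swap moves (positions)
    std::mt19937 rng(42);                  // fixed seed: assumption for reproducibility
    int sinceImprove = 0;
    const int n = static_cast<int>(current.size());
    for (int l = 0; l < maxIter; ++l) {
        int bestMoveCost = std::numeric_limits<int>::max();
        std::pair<int, int> bestMove{-1, -1};
        for (int i = 0; i < n; ++i)
            for (int j = i + 1; j < n; ++j) {
                if (std::find(tabu.begin(), tabu.end(), std::make_pair(i, j)) != tabu.end())
                    continue;  // move is tabu
                std::swap(current[i], current[j]);
                if (feasible(current)) {  // eliminate infeasible neighbors
                    int c = cost(current);
                    if (c < bestMoveCost) { bestMoveCost = c; bestMove = {i, j}; }
                }
                std::swap(current[i], current[j]);  // undo the trial swap
            }
        if (bestMove.first < 0) break;  // no admissible neighbor left
        std::swap(current[bestMove.first], current[bestMove.second]);
        tabu.push_back(bestMove);
        if (tabu.size() > tabuLength) tabu.pop_front();
        if (bestMoveCost < bestCost) {
            best = current;
            bestCost = bestMoveCost;
            sinceImprove = 0;
        } else if (++sinceImprove >= maxNoImprove) {
            // Diversification (assumed form): random swaps that keep feasibility.
            std::uniform_int_distribution<int> pick(0, n - 1);
            for (int d = 0; d < 3; ++d) {
                int a = pick(rng), b = pick(rng);
                std::swap(current[a], current[b]);
                if (!feasible(current)) std::swap(current[a], current[b]);
            }
            sinceImprove = 0;
        }
    }
    return best;
}
```

In an RCPSP setting, `cost` would decode the order into a schedule and return its C_max, and `feasible` would reject orders that break a precedence edge.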

  9. Representation of the Solution
    • Representing the solution by the vector of start times (s_0, …, s_{N-1}) results in a huge solution space.
    • That is why we selected the order of activities W = (w_0, …, w_{N-1}) as the solution representation, e.g. (1,5,6,3,4,2).
    [Figure: the schedule of the order (1,5,6,3,4,2) on resources R1 and R2 over time t, with makespan Cmax]

  10. The Neighborhood of the Solution
    • The neighborhood 𝒲(W_l) is the set of solutions obtained by applying every possible swap operator to W_l.
    • A swap operator exchanges two activities in W_l.
    • For example, swap(3,7): (1,5,2,3,4,6) ↔ (1,5,6,3,4,2).
    [Figure: the resource profiles of R2 over time t for both orders, with their makespans Cmax]
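A sketch of the swap neighborhood with elimination of infeasible orders, assuming that swap(a, b) exchanges the activities at positions a and b of the order (the slide's swap(3,7) notation is not defined further). An order is kept only if every precedence edge (i, j) still places i before j. The names `isPrecedenceFeasible` and `feasibleSwaps` are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Order = std::vector<int>;
using Edges = std::vector<std::pair<int, int>>;

// An order is precedence-feasible if every edge (i, j) has i placed before j.
bool isPrecedenceFeasible(const Order& w, const Edges& edges) {
    std::vector<int> pos(w.size());
    for (int p = 0; p < static_cast<int>(w.size()); ++p) pos[w[p]] = p;
    for (const auto& [i, j] : edges)
        if (pos[i] > pos[j]) return false;
    return true;
}

// Enumerates all swap(a, b) moves (a < b, by position) and keeps the feasible ones.
std::vector<std::pair<int, int>> feasibleSwaps(Order w, const Edges& edges) {
    std::vector<std::pair<int, int>> moves;
    const int n = static_cast<int>(w.size());
    for (int a = 0; a < n; ++a)
        for (int b = a + 1; b < n; ++b) {
            std::swap(w[a], w[b]);
            if (isPrecedenceFeasible(w, edges)) moves.emplace_back(a, b);
            std::swap(w[a], w[b]);  // restore W_l
        }
    return moves;
}
```

Re-checking the whole order costs O(N + |E|) per move. Starting from a feasible order, only edges incident to the two swapped activities can become violated, so a faster test would inspect just their direct predecessors and successors lying between positions a and b.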

  11. TS Parallelization on the GPU
    • The parallelization was inspired by [3].
    • There is a set of independent solutions.
    • Each CPU thread tries to improve an assigned solution until the given number of iterations is reached.
    • Each thread processes solutions one by one; access to them is controlled via atomic operations.
    • [3] T. James, C. Rego, and F. Glover, “A cooperative parallel tabu search algorithm for the quadratic assignment problem,” European Journal of Operational Research, vol. 195, no. 3, pp. 810–826, 2009.
    [Figure: the shared set of solutions; each entry stores a solution, its makespan and its Tabu List]
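The "one by one, controlled via atomic operations" scheme can be sketched with C++ threads. In this hypothetical version, a shared atomic counter bounds the total number of iterations, and each solution carries an atomic busy flag so that no two threads work on the same solution at once. The `SharedPool` struct, the flag-based locking and the integer counter standing in for "one TS iteration" are all assumptions for illustration, not the scheme of [3] in detail.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

struct SharedPool {
    explicit SharedPool(std::size_t n) : busy(n), processed(n, 0) {
        for (auto& flag : busy) flag.store(false);
    }
    std::vector<std::atomic<bool>> busy;  // true while some thread owns the solution
    std::vector<int> processed;           // stand-in for the solutions themselves
    std::atomic<long> iterationsDone{0};  // global iteration budget counter
};

// Each worker claims free solutions round-robin until the budget is used up.
void worker(SharedPool& pool, long maxIterations) {
    const std::size_t n = pool.busy.size();
    std::size_t i = 0;
    while (pool.iterationsDone.fetch_add(1) < maxIterations) {
        // exchange(true) returns the previous value: false means we claimed it.
        while (pool.busy[i].exchange(true)) i = (i + 1) % n;
        ++pool.processed[i];        // placeholder for one TS iteration on solution i
        pool.busy[i].store(false);  // release the solution for other threads
        i = (i + 1) % n;
    }
}
```

Launch it with, e.g., `std::thread(worker, std::ref(pool), budget)` per thread (compile with `-pthread`). Because `fetch_add` hands out each iteration ticket exactly once, the threads together perform exactly `maxIterations` iterations.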

  12. CUDA Mapping
    • Each CUDA block executes an independent TS algorithm.
    • A thread processes one or more solutions in the neighborhood of the block's current solution (elimination of infeasible solutions and computation of C_max(W_next)).
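In CUDA, a loop of the form `for (k = threadIdx.x; k < total; k += blockDim.x)` is a common way to let each thread of a block handle one or more work items. The host-side C++ sketch below emulates that mapping for the swap neighborhood: it numbers the N(N-1)/2 swaps row by row and lists the swaps that thread `tid` of a block with `blockSize` threads would evaluate. The row-major indexing and the function names are assumptions, not taken from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Maps k in [0, n(n-1)/2) to the k-th pair (i, j), i < j, in row-major order.
std::pair<int, int> indexToSwap(long k, int n) {
    int i = 0;
    long rowLength = n - 1;  // number of pairs (i, j) with this i
    while (k >= rowLength) {
        k -= rowLength;
        ++i;
        --rowLength;
    }
    return {i, i + 1 + static_cast<int>(k)};
}

// Swaps evaluated by thread `tid` in a block of `blockSize` threads (stride loop).
std::vector<std::pair<int, int>> swapsForThread(int tid, int blockSize, int n) {
    std::vector<std::pair<int, int>> mine;
    const long total = static_cast<long>(n) * (n - 1) / 2;
    for (long k = tid; k < total; k += blockSize) mine.push_back(indexToSwap(k, n));
    return mine;
}
```

The stride loop keeps neighboring threads on neighboring indices and automatically gives some threads more than one swap when the neighborhood is larger than the block.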

  13. CUDA Mapping
    [Figure: memory layout for CUDA blocks 0 to 27. Each block keeps the current solution W, the precedence constraints and the durations of activities D in shared memory, and helper variables in registers. Global, local and texture memory hold the arrays for evaluation of resources, the TL of each block (TL of Block 0 … TL of Block 27), the required resources r_{i,k}, the activities' predecessors and the activities' start time values.]

  14. Implementation of the Tabu List
    • The TL is stored in global memory, so access to it needs to be accelerated.
    • The TLC (Tabu List Cache) is a 2D array of Boolean values.
    • Whether a move is in the TL can be tested with a single read operation.
    Adding a new move (i, j) to the TL:
      (i_old, j_old) = TL[index]
      TLC[i_old, j_old] = false
      TL[index] = (i, j)
      TLC[i, j] = true
      index = (index + 1) % |TL|
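The update steps above translate directly into a small class: a ring buffer of moves (the TL) plus an N × N Boolean matrix (the TLC), so that the tabu test is one read. This C++17 sketch assumes moves are pairs (i, j) with i < j, a TL length of at least 1, and that a move is never added while it is already tabu (otherwise evicting one copy would clear the cache entry of the other). Class and member names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Circular tabu list of moves plus an N x N Boolean cache (TLC).
class TabuList {
public:
    TabuList(int numActivities, std::size_t length)  // length must be >= 1
        : n_(numActivities), moves_(length, {-1, -1}),
          cache_(static_cast<std::size_t>(numActivities) * numActivities, false) {}

    // Membership test: a single read of the cache.
    bool isTabu(int i, int j) const { return cache_[i * n_ + j]; }

    void add(int i, int j) {
        auto [iOld, jOld] = moves_[index_];
        if (iOld >= 0) cache_[iOld * n_ + jOld] = false;  // evict the oldest move
        moves_[index_] = {i, j};
        cache_[i * n_ + j] = true;
        index_ = (index_ + 1) % moves_.size();
    }

private:
    int n_;
    std::vector<std::pair<int, int>> moves_;  // TL, used as a ring buffer
    std::vector<bool> cache_;                 // TLC, row-major N x N
    std::size_t index_ = 0;                   // next slot to overwrite
};
```

Both `add` and `isTabu` run in constant time regardless of the TL length, at the price of N² bits of cache memory per TL.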

  15. Computation of C_max
    • The goal is to minimize memory consumption.
    • Activities are added to the schedule one by one in the order given by W_l, taking the precedence constraints and the resource constraints into account.
    [Figure: usage profile of resource k (capacity R_k) over time t; activity i with d_i = 3 and r_{i,k} = 3 is placed at the earliest start time s_i at which enough capacity is free during the interval from s_i to s_i + d_i]
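The procedure above is a serial schedule generation scheme. The C++17 sketch below decodes an order into start times and returns C_max. For simplicity it keeps a dense time-indexed usage table per resource, which is an assumption: the slides emphasize low memory consumption, so the authors' data structure is likely more compact. It also assumes the order is a topological order and that r_{i,k} ≤ R_k for every activity, which guarantees that the sum of all durations is a large enough horizon. The function name `evaluateOrder` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Places activities in the order W at their earliest precedence- and
// resource-feasible start times; fills `start` and returns C_max.
int evaluateOrder(const std::vector<int>& order,
                  const std::vector<int>& duration,
                  const std::vector<std::vector<int>>& predecessors,
                  const std::vector<int>& capacity,
                  const std::vector<std::vector<int>>& requirement,  // requirement[i][k]
                  std::vector<int>& start) {
    int horizon = 0;
    for (int d : duration) horizon += d;  // upper bound: all activities in sequence
    const std::size_t m = capacity.size();
    std::vector<std::vector<int>> used(m, std::vector<int>(horizon + 1, 0));
    start.assign(duration.size(), 0);
    int cmax = 0;
    for (int i : order) {
        int t = 0;  // earliest start allowed by the predecessors
        for (int p : predecessors[i]) t = std::max(t, start[p] + duration[p]);
        // Shift right until every resource has room during [t, t + d_i).
        for (bool fits = false; !fits;) {
            fits = true;
            for (std::size_t k = 0; k < m && fits; ++k)
                for (int tau = t; tau < t + duration[i]; ++tau)
                    if (used[k][tau] + requirement[i][k] > capacity[k]) {
                        fits = false;
                        ++t;
                        break;
                    }
        }
        for (std::size_t k = 0; k < m; ++k)  // reserve the capacity
            for (int tau = t; tau < t + duration[i]; ++tau) used[k][tau] += requirement[i][k];
        start[i] = t;
        cmax = std::max(cmax, t + duration[i]);
    }
    return cmax;
}
```

With two real activities of durations 2 and 3 that each need 2 units of a capacity-3 resource, the order (0, 1, 2, 3) yields start times (0, 0, 2, 5) and C_max = 5, because the second activity must wait until the first releases the resource.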

  16. Experimental Results
    • Experiments were performed on an Intel Xeon 2.66 GHz server with an Nvidia Tesla C2050 graphics card (448 CUDA cores, 14 multiprocessors).
    • The J120 benchmark instances (600 projects with 120 activities each) were used for the performance measurements.
    • The GPU algorithm evaluates 1.8 × 10^6 solutions per second on average.
    • The GPU performs the same number of iterations 55 times faster than the CPU.

  17. Conclusions
    • The first known GPU algorithm for solving the RCPSP.
    • Compared to [1], we propose a more efficient TL (the Tabu List Cache).
    • The schedule evaluation algorithm is well suited to the GPU (low memory requirements).
    • The homogeneous model reduces the required communication bandwidth between the CPU and the GPU.
    • [1] M. Czapinski and S. Barnes, “Tabu Search with two approaches to parallel flowshop evaluation on CUDA platform,” Journal of Parallel and Distributed Computing, vol. 71, pp. 802–811, June 2011.
