A Fully Saturated OpenCL Particle Swarm Optimizer Donald Pupecki
Why GPUs? • GPUs are scaling faster than CPUs [1] • Intel recently published a paper claiming a two-year-old GPU was “only 10x faster” than their top-end CPU [2]
Why OpenCL? • Portable: runs on most parallel platforms • CPU • GPU • FPGA • Cell B.E. • Industry-wide support • Bindings for many languages
Why PSO? • Particle Swarm Optimization • Introduced in 1995 by Kennedy and Eberhart, with a modified version by Shi and Eberhart in 1998 [5][6] • Inspired by BOIDS, a bird-flocking simulation by Craig Reynolds [3][4] • Provides good exploration; exploitation sometimes requires parameter tuning [7] • A subset of Swarm Intelligence, which is itself a subset of Evolutionary Computation • Video: http://vimeo.com/17407010 [8] • The update rule is shown below
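For reference, the canonical per-dimension update from [5][6], with inertia weight $w$, acceleration coefficients $c_1, c_2$, and $r_1, r_2$ drawn uniformly from $[0,1]$:

$$v_{i,d} \leftarrow w\,v_{i,d} + c_1 r_1 \,(p_{i,d} - x_{i,d}) + c_2 r_2 \,(g_d - x_{i,d}), \qquad x_{i,d} \leftarrow x_{i,d} + v_{i,d}$$

where $x_i$ is particle $i$'s position, $p_i$ its personal best, and $g$ the global best.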
Is it parallelizable? • The PSO, like most evolutionary computations, is iterative • General process: each iteration evaluates every particle's fitness, updates personal and global bests, then updates every particle's velocity and position, so within each iteration there is a lot to do in parallel (see the kernel sketch below)
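A minimal OpenCL sketch of where that parallelism lives, assuming flattened arrays of size numParticles × numDims and pre-generated random numbers; all names here are illustrative, not the author's actual code:

```c
// One work-item per (particle, dimension) pair: every coordinate of
// every particle updates independently within an iteration.
__kernel void update_particles(__global float *pos,          // numParticles * numDims
                               __global float *vel,
                               __global const float *pbest,  // per-particle best positions
                               __global const float *gbest,  // global best position (numDims)
                               __global const float *r1,     // uniform randoms in [0,1)
                               __global const float *r2,
                               const float w, const float c1, const float c2,
                               const int numDims)
{
    int gid = get_global_id(0);          // = particle * numDims + dim
    int d   = gid % numDims;

    float v = w * vel[gid]
            + c1 * r1[gid] * (pbest[gid] - pos[gid])
            + c2 * r2[gid] * (gbest[d]   - pos[gid]);

    vel[gid] = v;
    pos[gid] += v;
}
```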
Parallel Gotchas • Bank conflicts [18] • Divergent warps (illustrated below) • Struct sizes • Memory coalescing • Debugging • SIMD execution • No branch prediction
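To illustrate one of these, divergent warps: work-items in the same SIMD group that take different branches execute both paths serially. A hedged sketch of the usual fix, replacing the branch with a uniform instruction (illustrative, not from the author's kernels):

```c
// Clamp particle positions to the search-space limit.
__kernel void clamp_positions(__global float *x, const float limit)
{
    int gid = get_global_id(0);

    // Branchy form: work-items in one SIMD group may diverge and serialize.
    //   if (x[gid] > limit) x[gid] = limit;

    // Branch-free form: every work-item executes the same instruction.
    x[gid] = fmin(x[gid], limit);
}
```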
Goals: Implement More Test Functions • The previous version of this work implemented only 4 test functions • More non-parabolic test functions could help confirm the soundness of the algorithm
Test Functions Implemented [14][15]
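As one representative non-parabolic function from these suites [14][15], a hedged sketch of Rastrigin evaluated one particle per work-item; the array layout is an assumption:

```c
// Rastrigin: f(x) = 10*n + sum_i( x_i^2 - 10*cos(2*pi*x_i) ),
// highly multimodal, global minimum 0 at x = 0.
__kernel void rastrigin(__global const float *pos,   // numParticles * numDims, flattened
                        __global float *fitness,     // one value per particle
                        const int numDims)
{
    int p = get_global_id(0);                        // one work-item per particle
    float sum = 10.0f * (float)numDims;
    for (int d = 0; d < numDims; ++d) {
        float x = pos[p * numDims + d];
        sum += x * x - 10.0f * cos(2.0f * M_PI_F * x);
    }
    fitness[p] = sum;
}
```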
Goals: Shift Test Functions • PSOs have a tendency to flock to the middle of the search space [10] • Remedied by the “region scaling” [10][11][12] or “center offset” approach • Used center offset with an offset of 2.5 (sketch below)
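A minimal sketch of the center-offset idea, assuming the offset is subtracted from every coordinate before evaluation; the sphere function here is just for illustration:

```c
// Center offset: shift the optimum away from the origin so a swarm that
// merely drifts to the middle of the search space is not rewarded [10].
__kernel void sphere_shifted(__global const float *pos,
                             __global float *fitness,
                             const int numDims,
                             const float offset)     // 2.5 in this work
{
    int p = get_global_id(0);
    float sum = 0.0f;
    for (int d = 0; d < numDims; ++d) {
        float x = pos[p * numDims + d] - offset;     // optimum now at x = offset
        sum += x * x;
    }
    fitness[p] = sum;
}
```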
Goals: More Parallelism • Version 1.0 parallelizes fitness calculations only across the number of particles • Almost all per-dimension calculations are products or sums over all dimensions; these should be computed in parallel and combined with a reduction (see the sketch below)
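A standard local-memory tree reduction sketch, not the author's kernel: one work-group per particle, one work-item per dimension, assuming numDims equals the work-group size and is a power of two:

```c
// Sphere fitness with per-dimension parallelism plus a reduction.
__kernel void sphere_reduce(__global const float *pos,
                            __global float *fitness,
                            __local float *scratch)  // one float per work-item
{
    int p = get_group_id(0);                         // particle index
    int d = get_local_id(0);                         // dimension index
    int n = get_local_size(0);                       // = numDims (power of two)

    float x = pos[p * n + d];
    scratch[d] = x * x;                              // per-dimension term in parallel
    barrier(CLK_LOCAL_MEM_FENCE);

    // Tree reduction in local memory: log2(n) steps instead of n.
    for (int stride = n / 2; stride > 0; stride >>= 1) {
        if (d < stride)
            scratch[d] += scratch[d + stride];
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    if (d == 0)
        fitness[p] = scratch[0];
}
```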
Goals: Move Everything to Card • Less than 1/10th of the runtime of the 1.0 version was spent doing calculation, so the goal became keeping all data and work on the card
OCLPSO 1.1 • Get everything to the card! • Needed to implement an LCRNG (linear congruential random number generator) on the card (sketch below) • Eliminated large data transfers • Added a hill climber to fall back on • Got inconsistent results • A note on synchronization: barrier() and memory fences only synchronize within a work-group, floating-point atomics require compute capability 2.x hardware, and OpenCL offers no global synchronization across work-groups
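A minimal sketch of an on-card LCRNG of this kind, using the Numerical Recipes constants; the per-work-item state layout is an assumption, not the author's code:

```c
// Per-work-item linear congruential RNG: each work-item owns one 32-bit state.
// Constants a = 1664525, c = 1013904223 are the Numerical Recipes LCG.
inline float lcg_uniform(__global uint *state, int gid)
{
    uint s = state[gid];
    s = 1664525u * s + 1013904223u;     // LCG step, wraps mod 2^32
    state[gid] = s;
    return (float)s / 4294967296.0f;    // map to [0, 1)
}

__kernel void fill_randoms(__global uint *state, __global float *out)
{
    int gid = get_global_id(0);
    out[gid] = lcg_uniform(state, gid);
}
```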
OCLPSO 2.0 • Rewrite • Introduced the fully saturated concept • Compute units are now self-contained • All dimensions of all particles of all swarms are evaluated in parallel (index mapping sketched below)
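A sketch of the index mapping this implies (assumed, not taken from the source): one work-item per (swarm, particle, dimension) triple, so a single launch saturates the device:

```c
// Global size is numSwarms * numParticles * numDims.
// gbest holds one best position per swarm (numSwarms * numDims).
__kernel void nudge_toward_best(__global float *pos,
                                __global const float *gbest,
                                const int numParticles, const int numDims)
{
    int gid   = get_global_id(0);
    int dim   = gid % numDims;
    int swarm = gid / (numDims * numParticles);

    // Every dimension of every particle of every swarm moves in parallel.
    float target = gbest[swarm * numDims + dim];
    pos[gid] += 0.1f * (target - pos[gid]);   // illustrative step only
}
```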
OCLPSO 2.0 vs. CUDA PSO 2.0 • The CUDA PSO [16][17] was submitted for the GECCO competition on gpgpgpu.com • It features a ring topology for its best-position calculation
OCLPSO 2.0 vs. SPSO • The 2006 Standard PSO (SPSO) was written by Clerc and Kennedy • Relatively frills-free compared to the newer (2010) Standard PSO code or TRIBES [19] • Non-parallel but extremely efficient
Conclusions • OCLPSO 2.0 performs competitively with CUDA PSO and clearly outperforms SPSO • This is despite its lack of neighborhood bests (it uses only a global best) • This is also without the hill-climb component, as I needed to strip out as much complexity as I could in the final rewrite
Future work • Re-implement the hill climber; it was a great help on peaked functions • Re-add neighborhoods • Implement on the newest generation of compute capability 2.x cards, which support global floating-point atomics (emulation sketch below) • Implement TRIBES [19] on the GPU
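For context on that third item, a common OpenCL 1.x sketch that emulates a global floating-point atomic add with atomic_cmpxchg; native float atomics on compute capability 2.x hardware would replace this loop (illustrative, not from the source):

```c
// Emulated global float atomic add via compare-and-swap on the bit pattern.
inline void atomic_add_float(volatile __global float *addr, float val)
{
    union { uint u; float f; } prev, next;
    do {
        prev.f = *addr;
        next.f = prev.f + val;
    } while (atomic_cmpxchg((volatile __global uint *)addr,
                            prev.u, next.u) != prev.u);
}

__kernel void accumulate(__global float *sum, __global const float *terms)
{
    atomic_add_float(sum, terms[get_global_id(0)]);
}
```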
References [1] NVIDIA, “Tesla GPU Computing: Supercomputing at 1/10th the Cost,” http://www.nvidia.com/docs/IO/96958/Tesla_Master_Deck.pdf, 2011. [2] NVIDIA, “Why Choose Tesla,” http://www.nvidia.com/object/why-choose-tesla.html, 2011. [3] C. Reynolds, “Boids” (background and resources), http://www.red3d.com/cwr/boids/. [4] C. Reynolds, “Flocks, Herds and Schools: A Distributed Behavioral Model,” SIGGRAPH '87: Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, ACM, 1987, pp. 25–34. [5] J. Kennedy and R. Eberhart, “Particle Swarm Optimization,” Proceedings of the IEEE International Conference on Neural Networks, vol. IV, 1995, pp. 1942–1948. doi:10.1109/ICNN.1995.488968. http://www.engr.iupui.edu/~shi/Coference/psopap4.html. [6] Y. Shi and R. C. Eberhart, “A Modified Particle Swarm Optimizer,” Proceedings of the IEEE International Conference on Evolutionary Computation, 1998, pp. 69–73. [7] M. Clerc and J. Kennedy, “The Particle Swarm: Explosion, Stability, and Convergence in a Multidimensional Complex Space,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 1, 2002. [8] M. Nobile, “Particle Swarm Optimization,” http://vimeo.com/17407010, 2011.
References (cont.) [9] M. Clerc, “Confinements and Biases in Particle Swarm Optimization,” http://clerc.maurice.free.fr/pso/, 2006. [10] C. K. Monson and K. D. Seppi, “Exposing Origin-Seeking Bias in PSO,” presented at GECCO '05, Washington, DC, USA, 2005, pp. 241–248. [11] P. J. Angeline, “Using Selection to Improve Particle Swarm Optimization,” Proceedings of the IEEE Congress on Evolutionary Computation (CEC 1998), Anchorage, Alaska, USA, 1998. [12] D. K. Gehlhaar and D. B. Fogel, “Tuning Evolutionary Programming for Conformationally Flexible Molecular Docking,” Evolutionary Programming, 1996, pp. 419–429. [13] D. Pupecki, “OpenCL PSO: Development, Benchmarking, Lessons, Future Work,” 2011. http://web.cs.sunyit.edu/~pupeckd/oclpsopres.pdf [14] M. Molga and C. Smutnicki, “Test Functions for Optimization Needs,” 2005. http://www.zsd.ict.pwr.wroc.pl/files/docs/functions.pdf [15] H. Pohlheim, “Genetic and Evolutionary Algorithm Toolbox for Use with MATLAB,” Technical Report, Technical University Ilmenau, 1998. [16] L. Mussi, F. Daolio, and S. Cagnoni, “Evaluation of Parallel Particle Swarm Optimization Algorithms within the CUDA Architecture,” Information Sciences, 2011, in press.
References (cont.) [17] L. Mussi, Y. S. G. Nashed, and S. Cagnoni, “GPU-Based Asynchronous Particle Swarm Optimization,” Proc. GECCO 2011, 2011. [18] NVIDIA, “OpenCL Programming Guide for the CUDA Architecture,” http://developer.download.nvidia.com/compute/cuda/3_1/toolkit/docs/NVIDIA_OpenCL_ProgrammingGuide.pdf, 2011. [19] M. Clerc, “TRIBES - un exemple d'optimisation par essaim particulaire sans paramètres de contrôle,” Optimisation par Essaim Particulaire (OEP 2003), Paris, France, 2003.