Large-Scale Evolutionary Optimization on the Grid: Multiple-Deme Genetic Algorithm in the Globus-Based Environment
Adam Padee (apadee@ire.pw.edu.pl), Wojciech Padee (wpadee@ire.pw.edu.pl), Krzysztof Zaremba (zaremba@ire.pw.edu.pl)
Cracow Grid Workshop, 15-18 October 2006
Goals of the project
• Create a tool for numerical optimization of complex problems that are:
  • Computationally very expensive
  • Impossible to solve using classical, gradient-based methods (too many local optima)
• Use evolutionary algorithms, as they do not rely directly on the gradient vector
• The objective function calls an external program and parses its output
  • Easy adaptation to new tasks
• Example application: track reconstruction optimization in HEP experiments (shown at the end of this presentation)
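A minimal sketch of an objective function that calls an external program and parses its output. The command, the convention that parameters are passed as command-line arguments, and the rule that the fitness is the last token on stdout are all assumptions made for illustration; the slides do not specify the interface. A Python one-liner stands in for the external program.

```python
import subprocess
import sys


def evaluate(params, command):
    """Run an external program with the parameters as arguments and parse its output."""
    args = list(command) + [repr(float(p)) for p in params]
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    # Assumption: the fitness is the last whitespace-separated token on stdout.
    return float(result.stdout.split()[-1])


# Hypothetical stand-in for the external program: prints the sum of squares
# of its command-line arguments.
demo_command = [
    sys.executable,
    "-c",
    "import sys; print('fitness', sum(float(a) ** 2 for a in sys.argv[1:]))",
]

fitness = evaluate([1.0, 2.0, 3.0], demo_command)
```

Adapting to a new task then only means changing the command and the parsing rule, which is one way to read the "easy adaptation" goal.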
Common architectures of parallel evolutionary algorithms (1)
• Master-slave
  • One population is stored on a server (master node); calculation of the fitness function values is distributed among the worker nodes (slaves)
  • Synchronous
  • Asynchronous (split population or SSGA, Steady State Genetic Algorithm)
• Multiple-population algorithms (also called coarse-grained)
  • They consist of multiple independent populations that exchange only selected individuals. The frequency of the exchanges, the migration channels and the operators applied to the individuals depend on the model, e.g.:
    • Fully connected topology (especially suitable for parallel supercomputers)
    • Island Model (arbitrary topology; simple but less frequent migrations)
    • Pollen transmission
    • Social
    • ...
Common architectures of parallel evolutionary algorithms (2)
• Cellular (also called fine-grained)
  • One population is divided spatially among neighboring processors. Each of them can process one or more individuals. Selection and crossing-over take place only among neighbors. Most popular implementations:
    • Hardware (dedicated integrated circuits)
    • Software: usually on SIMD processors, although there are also very efficient implementations on the ccNUMA architecture (Cache Coherent Non-Uniform Memory Access)
• Hierarchical
  • Coarse-grained algorithms consisting of multiple cellular or master-slave algorithms. This is the most advanced and also the most flexible architecture.
Asynchronous master-slave SSGA on a single cluster
1. Create an empty population.
2. Create an empty execution queue (an internal object mapped to one of the physical queues in the batch system).
3. If there are fewer than two free places in the execution queue, go to step 5.
4. Check whether there are free places in the population.
  • If there are, create two random individuals, place them in the execution queue and return to step 3.
  • If not, select two individuals using the reproduction operator, apply crossing-over and mutation, place the offspring in the execution queue and return to step 3.
5. Wait until one of the clients finishes its work. Collect the results.
6. If there are free places in the population, place the newcomer in one of them. If not, select the individual to replace using the reverse reproduction operator (tournament, proportional or random).
7. Check whether the stop criterion has been met. If yes, terminate the program; otherwise return to step 3.
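The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: a thread pool stands in for the batch system queue, the sphere function stands in for the expensive objective, and one-point crossover, Gaussian mutation with sigma 0.1 and two-way tournaments are arbitrary operator choices. All names are hypothetical.

```python
import random
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def sphere(genome):
    """Toy objective (assumption): sum of squares, minimum 0 at the origin."""
    return sum(g * g for g in genome)


def tournament(population, rng, pick_best=True):
    """Compare two distinct random individuals; return the index of the winner
    (pick_best=True) or of the loser (reverse reproduction operator)."""
    i, j = rng.sample(range(len(population)), 2)
    better, worse = (i, j) if population[i][0] <= population[j][0] else (j, i)
    return better if pick_best else worse


def ssga(fitness, dim=5, pop_size=20, slots=4, max_evals=400, seed=1):
    rng = random.Random(seed)
    population = []   # step 1: empty population of (fitness, genome) pairs
    running = {}      # step 2: execution queue, future -> genome
    submitted = evaluated = 0
    with ThreadPoolExecutor(max_workers=slots) as pool:
        while evaluated < max_evals:          # step 7: stop criterion
            # step 3: submit while at least two places in the queue are free
            while slots - len(running) >= 2 and submitted < max_evals:
                if len(population) < pop_size:
                    # step 4, first branch: create two random individuals
                    pair = [[rng.uniform(-5.0, 5.0) for _ in range(dim)]
                            for _ in range(2)]
                else:
                    # step 4, second branch: reproduction, crossover, mutation
                    a = population[tournament(population, rng)][1]
                    b = population[tournament(population, rng)][1]
                    cut = rng.randrange(1, dim)
                    pair = [a[:cut] + b[cut:], b[:cut] + a[cut:]]
                    pair = [[g + rng.gauss(0.0, 0.1) for g in child]
                            for child in pair]
                for genome in pair:
                    running[pool.submit(fitness, genome)] = genome
                    submitted += 1
            # step 5: wait until at least one worker finishes
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                entry = (future.result(), running.pop(future))
                evaluated += 1
                # step 6: fill a free place or replace a tournament loser
                if len(population) < pop_size:
                    population.append(entry)
                else:
                    loser = tournament(population, rng, pick_best=False)
                    population[loser] = entry
    return min(population, key=lambda e: e[0])


best_fitness, best_genome = ssga(sphere)
```

Because results are collected in completion order, the exact trajectory varies from run to run even with a fixed seed; that is inherent in the asynchronous scheme.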
Implementation details (LSF and OpenPBS)
• The master process runs on the batch system server (or, for LSF, on the designated UI machine):
  • Creates new individuals and applies the genetic operators
  • Registers input data in MSS
  • Runs and monitors the slave processes
  • Collects the results using batch system mechanisms and assigns the fitness values to the individuals in the execution queue (the execution queue is the program's internal object; it is mapped to the batch system queue via the appropriate API)
• The batch system introduces a delay of a couple of seconds:
  • Registration in the queue
  • Selection of a free CPU
  • Transfer of the parameters
  • Monitoring
  • Gathering results
• With a job flow of around 50-100 jobs/sec, the failure rate does not exceed 10% (in a real-life application, RECON 2000)
Flat master-slave on the Grid
• At first glance, implementation is relatively easy
  • Convenient API/CLI functions for job submission
  • Single sign-on allows the master process to operate autonomously
  • Global file systems (e.g. LFC) facilitate data access
But ...
• Approximately 100 times more processors (so 100 times more slaves; network bandwidth requirements are very high)
• Complicated task monitoring and error analysis
• Job submission overhead can reach the order of minutes for a single job
• RB + L&B is not prepared for massive submission of short jobs (frequent failures, disturbance for other users)
Island Model GA: basic concept
Growth phase: the population on each island develops independently.
[Diagram: six islands, GA 1 to GA 6]
Island Model GA: basic concept
Migration phase: each population selects one or more individuals (usually the best ones) and sends them to a neighboring island, where the immigrants are introduced into the local population.
[Diagram: six islands, GA 1 to GA 6, connected by migration channels]
Island model: parameters
• Size of the member populations
• Migration topology (directed graph)
• Frequency of the migrations
• Selection of the migrants and adoption of the immigrants
These parameters have a big influence on the convergence speed, but the optimal choice of their values depends strongly on the optimized function and on the infrastructure used (type of computer, cost of CPU cycles vs. communication). There are models based on Markov chains that allow these values to be calculated for a given probability of reaching the global optimum, but their applicability is limited to very simple cases.
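A minimal sketch of one migration step over a directed topology, assuming minimization, migrants chosen as the best individuals of the source island, and immigrants replacing the worst residents of the destination. These selection and adoption policies are only one possible choice of the parameters listed above, and the function and variable names are hypothetical.

```python
def migrate(islands, topology, n_migrants=1):
    """Perform one migration step in place.

    islands:  list of populations, each a list of (fitness, genome) pairs
              (lower fitness is better).
    topology: dict mapping a source island index to its destination indices,
              i.e. the edges of a directed graph.
    """
    # Select all emigrants first so every island sends from the same snapshot.
    emigrants = {
        src: [(f, list(g)) for f, g in sorted(pop, key=lambda e: e[0])[:n_migrants]]
        for src, pop in enumerate(islands)
    }
    for src, destinations in topology.items():
        for dst in destinations:
            pop = islands[dst]
            pop.sort(key=lambda e: e[0])
            # Adoption policy (assumption): immigrants replace the worst residents.
            pop[-n_migrants:] = [(f, list(g)) for f, g in emigrants[src]]
    return islands


# Three islands connected in a directed ring: 0 -> 1 -> 2 -> 0.
islands = [
    [(3.0, [0.3]), (1.0, [0.1])],
    [(5.0, [0.5]), (4.0, [0.4])],
    [(2.0, [0.2]), (9.0, [0.9])],
]
ring = {0: [1], 1: [2], 2: [0]}
migrate(islands, ring)
```

With several incoming channels per island, later immigrants could displace earlier ones in this simple version; a real implementation would have to decide how to handle that.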
Flat Island Model GA on the Grid
• One deme per CPU requires high migration rates (bandwidth problems)
• A flat model of communication is hard to implement across sites:
  • Grid-wide MPI is not available in big production grids like EGEE
  • Introducing a dedicated service is not flexible
  • Exchanging information via replicas in LFC is possible, but very slow and inefficient
Hybrid algorithm: islands with master-slave SSGA populations
• One island is formed on each cluster.
  • Thanks to fast internal communication, the master-slave algorithm for clusters can be used with only slight modifications
  • The master runs on the gatekeeper (CE), which is usually a batch system server or at least has the rights to run batch jobs directly (via qsub or bsub)
  • This machine has outbound and inbound IP connectivity with other sites (at least the GLOBUS_TCP_PORT_RANGE ports are open)
• Communication with other islands is possible in any topology
  • The relatively big population size on each island allows lower migration rates
  • Migrants can also be exchanged via files in LFC
Hybrid algorithm (LFC variant)
[Diagram sequence. Components: UI, Resource Broker, Logical File Catalog, and three CEs, each with its Worker Nodes and a GA master process]
1. Start of the master processes (diagram labels: Resource Broker, JDL with task, CondorG, GA, passwordless SSH, PBS)
2. Calculations: running slave processes via PBS/LSF
3. Migration: registration of the migrants
4. Migration: readout of the immigrants
5. Calculations: running slave processes via PBS/LSF
6. Registration of the results: saving the best individuals
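The register/readout cycle of the LFC variant can be illustrated with a file-based exchange. The sketch below deliberately does not use any LFC API: a plain shared directory stands in for the catalog, and the file naming scheme, the JSON format and both function names are assumptions made for illustration only.

```python
import json
import os
from pathlib import Path


def register_migrants(exchange_dir, island, epoch, migrants):
    """Publish the migrants of one island as a file in the shared directory."""
    final = Path(exchange_dir) / f"island{island}_epoch{epoch}.json"
    tmp = final.with_suffix(".tmp")
    tmp.write_text(json.dumps(migrants))
    # Rename atomically so that readers never see a partially written file.
    os.replace(tmp, final)
    return final


def read_immigrants(exchange_dir, own_island, seen):
    """Collect migrants published by other islands that were not read before.

    seen is a set of file names already processed; it is updated in place.
    """
    immigrants = []
    for path in sorted(Path(exchange_dir).glob("island*_epoch*.json")):
        if path.name in seen or path.name.startswith(f"island{own_island}_"):
            continue
        immigrants.extend(json.loads(path.read_text()))
        seen.add(path.name)
    return immigrants
```

Each island can call `register_migrants` at the end of its own epoch and `read_immigrants` whenever convenient, so no island has to wait for another. This fits asynchronous migration, at the cost of the latency of the file-based channel.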
Hybrid algorithm (MPI variant)
[Diagram sequence. Components: UI, MPI-enabled Resource Broker, Logical File Catalog, and three CEs, each with its Worker Nodes and a GA master process]
1. Start of the master processes (diagram labels: MPI-enabled Resource Broker, JDL with task, CondorG, PBS, GA)
2. Calculations: running slave processes via PBS/LSF
3. Migration: communication through MPI
4. Calculations: running slave processes via PBS/LSF
5. Registration of the results: saving the best individuals
Hybrid algorithm (TCP variant)
[Diagram sequence. Components: UI, Resource Broker, Logical File Catalog, and three CEs, each with its Worker Nodes and a GA master process]
1. Start of the master processes: registration of IP / port (diagram labels: Resource Broker, JDL with task, CondorG, GA, passwordless SSH, PBS)
2. Start of the master processes: readout of other machines' addresses
3. Calculations: running slave processes via PBS/LSF
4. Migration: communication through TCP/IP
5. Calculations: running slave processes via PBS/LSF
6. Registration of the results: saving the best individuals
Hybrid algorithm on the Grid: conclusions and problems
• The size of each deme should reflect the number of CPUs available at a site (so demes have different sizes)
  • To avoid differences in convergence speed, the intensities and ranges of the genetic operators have to be differentiated. For example, smaller demes get a lower Gaussian mutation range and thus search more locally.
• The length of the epoch at each deme should be variable and adapted to the development of the local population
  • Migrations that come too early may lead all the islands to a suboptimal solution
• Migrations have to be done asynchronously
  • This is a problem especially for the MPI and plain TCP versions. To overcome it, two independent processes are needed (one for migration control, one for population development and batch system management)
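As an illustration of the first point, deme sizes and mutation ranges could be derived from the CPU counts as sketched below. The linear scaling of the Gaussian mutation range with the CPU count, the default values and all names are assumptions; the slides state only that smaller demes should use a lower mutation range.

```python
def configure_demes(cpus_per_site, individuals_per_cpu=2, base_sigma=1.0):
    """Derive a deme size and a Gaussian mutation range for every site."""
    largest = max(cpus_per_site.values())
    config = {}
    for site, cpus in cpus_per_site.items():
        config[site] = {
            # Deme size follows the number of CPUs available at the site.
            "deme_size": cpus * individuals_per_cpu,
            # Assumption: sigma shrinks linearly for smaller sites, so that
            # small demes search more locally.
            "mutation_sigma": base_sigma * cpus / largest,
        }
    return config


# Hypothetical sites with 10 and 40 available CPUs.
demes = configure_demes({"site_a": 10, "site_b": 40})
```

Since the number of job slots really available to a VO is hard to determine, the CPU counts fed into such a function are themselves only estimates.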
Hybrid algorithm on the Grid: conclusions and problems (continued)
• The MPI and TCP variants are not yet implemented
  • For the most demanding application, particle track reconstruction optimization in HEP, LFC seems to be enough
  • Inter-cluster MPI is not available (at least not in EGEE)
  • Manual TCP/IP communication is troublesome
• It is hard to guess how many job slots are really available for a given VO (different batch system configurations, not always reflected in the information index)
  • This does not affect the SSGA directly, but may eventually lead to improper adaptation of the operator ranges and intensities. As a result, some islands may lag behind the others because of slower convergence
Test results: simulated behavior on well-known deceptive functions
[Plots: Griewank function and Rosenbrock function]
[Plots: Griewank function with 1x100, 2x100 and 10x100 individuals, fully connected topologies]
[Plots: Rosenbrock function with 1x50, 2x50 and 4x50 individuals, fully connected topologies]
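For reference, the two test functions in their standard textbook form are sketched below. The slides do not state the dimensionality or the exact variants used in the tests, so these definitions are an assumption.

```python
import math


def griewank(x):
    """Standard Griewank function; global minimum 0 at x = (0, ..., 0)."""
    quadratic = sum(v * v for v in x) / 4000.0
    oscillation = math.prod(math.cos(v / math.sqrt(i)) for i, v in enumerate(x, start=1))
    return 1.0 + quadratic - oscillation


def rosenbrock(x):
    """Standard Rosenbrock function; global minimum 0 at x = (1, ..., 1)."""
    return sum(
        100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
        for i in range(len(x) - 1)
    )
```

Griewank has many regularly spaced local optima produced by the cosine product, while Rosenbrock has a long, curved, nearly flat valley; both are hard for purely local search.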
"Real-life" application: optimization of particle track reconstruction
[Diagram: target, bending magnet and detector planes]
• Input data: set of hits from the detector planes
• Output data: momenta of the charged particles
• To get the momentum, the whole track has to be reconstructed first.
Problems
• Particles don't leave traces in all the detector planes.
• Many hits originate from the background noise.
• Mathematical models used in reconstruction are simplified.
• In one trigger there are tracks from many particles.
Optimized parameters
• Geometrical tolerances on the straight parts of the tracks (areas 1 and 3)
• Number of missing planes allowed in each track
• Precision of the crossing point in the area of the magnet
• Precision of the primary interaction vertex in the target
• ...
And
• Everything in 3 dimensions, plus angles where applicable
• Each step consists of several iterations controlled by different parameters
• In total, about 70 parameters have to be optimized simultaneously
• Evaluation of one parameter set takes about 10 minutes
Results
[Plot: mean number of properly and improperly reconstructed tracks (synchronous, total population size 40, 100 physical events used for fitness calculation)]
[Plot: mean number of properly and improperly reconstructed tracks (asynchronous, total population size 100, 500 physical events used for fitness calculation)]