Reasons for parallelization

Can we make GAs faster? One of the most promising options is a parallel implementation.

The reasons for parallelization:
1) The reason from nature
2) The reason from the GA itself

A classification of parallel GAs
Basic idea: divide and conquer
1) Global parallelization: there is only one population, the behavior of the algorithm remains unchanged, and it is easy to implement
2) Coarse-grained parallel GA: the population is divided into multiple subpopulations; each subpopulation evolves in isolation and exchanges individuals occasionally
3) Fine-grained parallel GA: the ideal case is to have just one individual per processing element
4) Hybrid parallel GA

Global parallelization
1) Initialization
2) Repeat the following steps
   2.1) Selection
   2.2) Crossover
   2.3) Mutation
   2.4) Calculate the fitness

Is there any difference?
1) Initialization
2) Repeat the following steps
   2.1) Selection
   2.2) Crossover
   2.3) Mutation
   2.4) Calculate the fitness
        for i = 1 to N par-do
            calculate the fitness of the ith individual
        endfor
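The parallel fitness loop in step 2.4 can be sketched as follows. This is a minimal illustration, not the slides' own implementation: the `onemax` fitness, the bit-list encoding, and the `workers` parameter are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def onemax(individual):
    """Toy fitness (assumed for illustration): the number of 1-bits."""
    return sum(individual)

def evaluate_population(population, fitness, workers=4):
    """Evaluate all individuals in parallel; results keep the population order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))

population = [[1, 0, 1, 1], [0, 0, 0, 1], [1, 1, 1, 1]]
fitnesses = evaluate_population(population, onemax)
print(fitnesses)  # [3, 1, 4]
```

Only the evaluation is parallel; selection, crossover and mutation still see the whole population. For CPU-bound fitness functions in CPython, `ProcessPoolExecutor` offers the same `map` interface and is usually the better fit.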

The basic characteristics
This method maintains a single population, and the evaluation of the individuals is done in parallel. Each individual competes with all the other chromosomes and also has a chance to mate with any other individual. The genetic operations are still global. Communication occurs only when each processor receives its subset of individuals to evaluate and when the processors return the fitness values.

Implementation
The model does not assume anything about the underlying computer architecture.
On a shared memory multiprocessor, the population can be stored in shared memory, and each processor can read the individuals assigned to it and write the evaluation results back without any conflicts. It may be necessary to balance the computational load among the processors.
On a distributed memory computer, the population can be stored in one processor. This “master” processor is responsible for sending the individuals to the other processors (the slaves) for evaluation, collecting the results, and applying the genetic operators to produce the next generation.
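One simple way to assign individuals to processors is a static, nearly equal partition of the population indices. The sketch below assumes every evaluation costs about the same; the function name `partition` is hypothetical.

```python
def partition(n_individuals, n_workers):
    """Split indices 0..n-1 into n_workers contiguous, nearly equal ranges."""
    base, extra = divmod(n_individuals, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        # The first `extra` workers take one additional individual.
        size = base + (1 if w < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges

chunks = partition(10, 4)
print([len(c) for c in chunks])  # [3, 3, 2, 2]
```

When evaluation times vary a lot between individuals, a static split like this can leave some processors idle, and handing out individuals on demand may balance the load better.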

The genetic operators
Crossover and mutation can be parallelized using the same idea of partitioning the population and distributing the work among multiple processors. However, these operators are so simple that the time required to send individuals back and forth will very likely offset any performance gains. The communication overhead is also a problem when selection is parallelized, because most forms of selection need information about the entire population and thus require some communication.

Conclusion
The global parallel GA is easy to implement, and it can be a very efficient method of parallelization when the evaluation needs considerable computation. This method also has the advantage of preserving the search behavior of the GA, so all the theory for the simple GA applies directly. One drawback is that the load may be unbalanced across processors.

Coarse-grained parallel GA
1) Initialize and divide all the individuals into p subpopulations
2) for i = 1 to p par-do
   2.1) for j = 1 to n do
        selection, crossover, mutation
        calculate the fitness
   2.2) select some individuals as the migrants
   2.3) send emigrants and receive immigrants
3) Go to 2)
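The steps above can be simulated on a single processor, as a rough sketch. Everything specific here is an assumption for illustration: the `onemax` fitness, binary tournament selection, one-point crossover, a ring topology, and the policy "best emigrates, worst is replaced".

```python
import random

def onemax(ind):
    """Toy fitness (assumed): the number of 1-bits."""
    return sum(ind)

def evolve(pop, rng, p_mut=0.02):
    """One generation: binary tournament, one-point crossover, bit-flip mutation."""
    def pick():
        a, b = rng.sample(pop, 2)
        return a if onemax(a) >= onemax(b) else b
    nxt = []
    while len(nxt) < len(pop):
        p1, p2 = pick(), pick()
        cut = rng.randrange(1, len(p1))
        child = p1[:cut] + p2[cut:]
        nxt.append([1 - g if rng.random() < p_mut else g for g in child])
    return nxt

def migrate(islands):
    """Ring migration: a copy of each island's best replaces the next island's worst."""
    emigrants = [max(pop, key=onemax)[:] for pop in islands]
    for i, migrant in enumerate(emigrants):
        target = islands[(i + 1) % len(islands)]
        worst = min(range(len(target)), key=lambda k: onemax(target[k]))
        target[worst] = migrant

def island_ga(p=4, size=20, length=30, epochs=10, n=5, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(length)]
                for _ in range(size)] for _ in range(p)]
    for _epoch in range(epochs):
        for i in range(p):              # step 2: conceptually a par-do
            for _gen in range(n):       # step 2.1: n isolated generations
                islands[i] = evolve(islands[i], rng)
        migrate(islands)                # steps 2.2 and 2.3
    return islands

islands = island_ga()
print(max(onemax(ind) for pop in islands for ind in pop))
```

On a real parallel machine the loop over `i` would run one subpopulation per node, and `migrate` would become message passing between nodes.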

The basic characteristics
The coarse-grained GA seems like a simple extension of the serial GA. The recipe is simple: take a few conventional (serial) GAs, run each of them on a node of a parallel computer, and at some predetermined times exchange a few individuals.
Coarse-grained parallel computers are easily available, and even if no parallel computer is available, it is easy to simulate one with a network of workstations or even on a single-processor machine.
Relatively little extra effort is needed to convert a serial GA into a coarse-grained parallel GA. Most of the program of the serial GA remains the same, and only a few subroutines need to be added to implement migration.

The basic characteristics (continued)
Strong capability for avoiding premature convergence while exploiting good individuals, provided the migration rates and patterns are well chosen.

Migrant Selection Policy
Who should migrate?
Best guy?
One random guy?
Best and some random guys?
Guy very different from the best of the receiving subpopulation? (“similarity reduction”)
If a large percentage of the population migrates each generation, the system acts like one big population, but with extra replacements; this could actually SPEED premature convergence.
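The four selection questions above can be written as small functions, sketched here for bit-string individuals. The function names, the use of Hamming distance to measure "very different", and the `k` parameters are illustrative assumptions.

```python
import random

def hamming(a, b):
    """Number of positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def select_best(pop, fitness, k=1):
    """The k fittest individuals."""
    return sorted(pop, key=fitness, reverse=True)[:k]

def select_random(pop, rng, k=1):
    """k individuals chosen uniformly at random, without repetition."""
    return rng.sample(pop, k)

def select_best_plus_random(pop, fitness, rng, k=2):
    """The best individual plus k-1 random ones from the rest."""
    ranked = sorted(pop, key=fitness, reverse=True)
    return ranked[:1] + rng.sample(ranked[1:], k - 1)

def select_most_different(pop, receiver_best):
    """Similarity reduction: the individual farthest from the receiver's best."""
    return max(pop, key=lambda ind: hamming(ind, receiver_best))

pop = [[1, 1, 1, 0], [0, 0, 0, 0], [1, 0, 0, 0]]
print(select_best(pop, sum))                     # [[1, 1, 1, 0]]
print(select_most_different(pop, [1, 1, 1, 1]))  # [0, 0, 0, 0]
```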

Migrant Replacement Policy
Who should a migrant replace?
Random individual?
Worst individual?
Most similar individual (in the Hamming sense)?
Similar individual via crowding?
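A sketch of these replacement policies for bit-string individuals follows. The function names are hypothetical, and the crowding variant assumes the common form in which the migrant replaces the most similar member of a small random sample (the `crowd` parameter).

```python
import random

def hamming(a, b):
    """Number of positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def replace_random(pop, migrant, rng):
    idx = rng.randrange(len(pop))
    pop[idx] = migrant
    return idx

def replace_worst(pop, migrant, fitness):
    idx = min(range(len(pop)), key=lambda i: fitness(pop[i]))
    pop[idx] = migrant
    return idx

def replace_most_similar(pop, migrant):
    idx = min(range(len(pop)), key=lambda i: hamming(pop[i], migrant))
    pop[idx] = migrant
    return idx

def replace_by_crowding(pop, migrant, rng, crowd=3):
    """Replace the most similar individual within a random sample of size `crowd`."""
    sample = rng.sample(range(len(pop)), min(crowd, len(pop)))
    idx = min(sample, key=lambda i: hamming(pop[i], migrant))
    pop[idx] = migrant
    return idx

pop = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 0, 0]]
print(replace_worst(pop, [1, 0, 1, 0], sum))  # 1
```

Each function modifies the subpopulation in place and returns the index it overwrote, so the subpopulation size never changes.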

How Many Subpopulations?
How many total evaluations can you afford?
Total population size, number of generations, and “generation gap” determine run time.
What should the minimum subpopulation size be?
Smaller than 40-50 USUALLY spells trouble (rapid convergence of the subpopulation); 100-200+ is better for some problems.
Divide to get how many subpopulations you can afford.
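The budgeting questions above amount to a short calculation, sketched below. It assumes that the generation gap is the fraction of the population replaced (and therefore re-evaluated) each generation, so total evaluations are roughly N + generations × gap × N. The function name and the default minimum of 50 are illustrative.

```python
def subpopulation_plan(eval_budget, generations, min_subpop=50, generation_gap=1.0):
    """Return (total population, number of subpopulations, size of each)."""
    # Initial evaluation of N individuals, then gap * N evaluations per generation.
    total_pop = int(eval_budget / (1 + generations * generation_gap))
    n_subpops = max(1, total_pop // min_subpop)
    return total_pop, n_subpops, total_pop // n_subpops

print(subpopulation_plan(100_000, 100))  # (990, 19, 52)
```

With a budget of 100,000 evaluations and 100 generations, about 990 individuals are affordable, which is enough for 19 subpopulations of 52 individuals each.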

Fine-grained parallel GA
1) Partition the initial population of N individuals among N processors
2) for i = 1 to N par-do
   2.1) Each processor selects one individual from among itself and its neighbours
   2.2) Crossover with one individual from its neighbours, keeping one offspring
   2.3) Mutation
   2.4) Calculate the fitness
3) Go to 2)

The basic characteristics
This model offers the highest degree of parallelism. There is intensive communication between the processors. It is common to place the individuals in a 2-D grid, because in many massively parallel computers the processing elements are connected using this topology.
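A fine-grained GA on a 2-D grid can be simulated sequentially, as a rough sketch. The assumptions here are a wrap-around (torus) grid, a four-cell neighbourhood, a synchronous update, the `onemax` fitness, and picking the local best as the first parent; none of these are specified in the slides.

```python
import random

def onemax(ind):
    """Toy fitness (assumed): the number of 1-bits."""
    return sum(ind)

def neighbours(r, c, rows, cols):
    """Up, down, left and right cells on a wrap-around grid."""
    return [((r - 1) % rows, c), ((r + 1) % rows, c),
            (r, (c - 1) % cols), (r, (c + 1) % cols)]

def step(grid, rng, p_mut=0.02):
    """One synchronous generation; each cell plays the role of one processor."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nbrs = [grid[i][j] for i, j in neighbours(r, c, rows, cols)]
            parent = max([grid[r][c]] + nbrs, key=onemax)  # select locally
            mate = rng.choice(nbrs)                        # mate from a neighbour
            cut = rng.randrange(1, len(parent))
            child = parent[:cut] + mate[cut:]              # keep one offspring
            new_grid[r][c] = [1 - g if rng.random() < p_mut else g for g in child]
    return new_grid

rng = random.Random(1)
grid = [[[rng.randint(0, 1) for _ in range(16)] for _ in range(4)] for _ in range(4)]
for _ in range(20):
    grid = step(grid, rng)
print(max(onemax(ind) for row in grid for ind in row))
```

Each cell reads only its four neighbours, which is why the communication is intensive but strictly local.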

Hybrid parallel algorithms
Combining the methods for parallelizing GAs yields hybrid parallel GAs.

Some examples
1) This hybrid GA combines a coarse-grained GA (at the high level) and a fine-grained GA (at the low level).
2) This hybrid GA uses a coarse-grained GA at the high level, where each node is a global parallel GA.
3) This hybrid uses coarse-grained GAs at both the high and low levels. At the low level the migration rate is higher and the communication topology is much denser than at the high level.

Network model
Here, k independent GAs run with independent memories, operators and function evaluations. At each generation, the best individuals discovered are broadcast to all the subpopulations.
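The broadcast step can be sketched as follows. The source does not say which individuals the incoming ones replace, so overwriting the worst members of each receiving subpopulation is an assumption, as are the function name and the requirement that each subpopulation has at least k members.

```python
def broadcast_best(subpops, fitness):
    """Copy each subpopulation's best individual into every other subpopulation."""
    bests = [max(pop, key=fitness)[:] for pop in subpops]
    for i, pop in enumerate(subpops):
        incoming = [b[:] for j, b in enumerate(bests) if j != i]
        pop.sort(key=fitness)            # worst individuals first
        pop[:len(incoming)] = incoming   # assumed policy: overwrite the worst

subpops = [[[1, 1, 1], [0, 0, 0], [1, 0, 0]],
           [[0, 0, 0], [0, 1, 0], [0, 0, 1]]]
broadcast_best(subpops, sum)
print(subpops[1])  # [[1, 1, 1], [0, 1, 0], [0, 0, 1]]
```

With k subpopulations, each one receives k - 1 individuals per generation, so this model communicates far more than occasional migration between neighbours.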

Community model
Here, the GA is mapped to a set of interconnected communities, each consisting of a set of homes connected to a centralised town. Reproduction and function evaluations take place at home. Offspring are sent to town to find mates. After mating, "new couples" are assigned a home either in their existing community or in another community.

Why introduce PGAs?
They allow a more extensive coverage of the search space and an increased probability of finding the global optimum.
They could also be used for multi-objective optimisation, with each subpopulation responsible for a specific objective, and for co-evolution, with each subpopulation responsible for a specific trait.
