Genetic Algorithms Brandon Andrews
Topics
• What are genetic algorithms?
• 3 steps
• Applications to Bioinformatics
What are genetic algorithms?
• Invented and published in 1975 by John Holland
• Cells have DNA, which defines their properties
• Reproduction crosses DNA from both parents, merging properties from both
  • During this step random mutations can occur
• A test of the fitness of the organism is performed
  • Scores the organism against others based on criteria for survival
• Essentially evolution
3 Steps
• Selection step
  • Based on the calculated fitness
• Reproduction step
  • Mutations
  • Strategies for crossing
• Termination step
  • When the goal is met
Steps Expanded
• 1) Generate random properties (chromosomes) for N entities
• 2) Calculate their fitness and discard the ones that fall below the threshold
  • Fitness can be determined through a simulation
• 3) Randomly cross over pairs that survive the selection step
  • Also randomly choose properties and mutate them. This could be as simple as jittering them
• 4) Go to step 2 until a goal is reached
  • Return the best set of properties
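The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the presenter's implementation: the bit-string chromosomes, the `fitness` function (count of 1 bits), the population size, and "keep the fitter half" as the selection threshold are all hypothetical choices made to keep the example small.

```python
import random

def fitness(chromosome):
    # Hypothetical fitness: number of 1 bits (to be maximized).
    return sum(chromosome)

def run_ga(n_entities=20, n_genes=16, mutation_rate=0.05,
           max_generations=200, seed=0):
    rng = random.Random(seed)
    # Step 1: random chromosomes for N entities.
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(n_entities)]
    for _ in range(max_generations):
        # Step 2: score everyone and keep the fitter half (assumed threshold).
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == n_genes:  # goal reached
            break
        survivors = population[:n_entities // 2]
        # Step 3: cross random pairs of survivors, then mutate the children.
        children = []
        while len(survivors) + len(children) < n_entities:
            mom, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_genes)
            child = mom[:cut] + dad[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        # Step 4: go back to step 2 with the new generation.
        population = survivors + children
    # Return the best set of properties found.
    return max(population, key=fitness)
```

Because the survivors are carried over unchanged, the best fitness in this sketch never gets worse from one generation to the next; other designs replace the whole population each generation.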
Fitness Function
• Could be anything
• The goal is to minimize or maximize the fitness function, normally improving it with each step
Crossover Probability
• How often crossovers happen
• 0% means no crossover occurs and both parents are simply moved to the next step
• 100% means all of the parents are crossed and only their children are moved to the next step
• The idea is that, hopefully, the good properties of both parents are merged, or a good parent is preserved completely if it has no flaws that a crossing could fix
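One way to read the crossover probability in code is shown below. The function name `next_pair` and the choice of a single random cut point are assumptions made for illustration; the slide does not prescribe a particular crossing strategy here.

```python
import random

def next_pair(parent_a, parent_b, crossover_prob, rng=random):
    """Return the two individuals that move on to the next step."""
    if rng.random() < crossover_prob:
        # Crossover happens: swap tails at a random cut point
        # and pass on only the children.
        cut = rng.randrange(1, len(parent_a))
        child_1 = parent_a[:cut] + parent_b[cut:]
        child_2 = parent_b[:cut] + parent_a[cut:]
        return child_1, child_2
    # No crossover: both parents move on unchanged.
    return parent_a, parent_b
```

With `crossover_prob=0.0` the parents always pass through unchanged, and with `crossover_prob=1.0` only children are returned, matching the two extremes described above.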
Mutation Probability
• The probability that part of the chromosome is changed after a crossing
• 0% means none of it is changed
  • Not useful, since variety is needed to approach the best solution; otherwise you’re stuck with the first generated properties
• 100% means all of it is changed
  • Not useful, since it negates the point of crossing at all and essentially turns the algorithm into a random search
• The concept is to stop the algorithm from halting at a local maximum. Mutations have a chance to produce small improvements
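A small sketch of per-gene mutation, assuming real-valued genes that are mutated by jittering. The `mutate` name, the `jitter` size, and applying the probability independently to each gene are assumptions, not details given in the slides.

```python
import random

def mutate(chromosome, mutation_prob, jitter=0.1, rng=random):
    """Jitter each gene independently with probability mutation_prob."""
    return [g + rng.uniform(-jitter, jitter) if rng.random() < mutation_prob else g
            for g in chromosome]
```

At `mutation_prob=0.0` the chromosome comes back unchanged, and at `mutation_prob=1.0` every gene is perturbed, which are the two unhelpful extremes listed above. Useful values usually sit close to zero.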
Termination
• When the expected error is low
  • Sometimes it’s hard to calculate an error since the solution isn’t known
• Or when the results stop improving for a few iterations (stop decreasing or stop increasing, depending on the problem)
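Both stopping rules can be sketched as a single check, assuming a minimization problem. The names `should_stop`, `patience`, `tolerance`, and `target` are hypothetical, and "a few iterations" is arbitrarily taken to be five.

```python
def should_stop(best_history, patience=5, tolerance=1e-9, target=None):
    """best_history holds the best (lowest) fitness seen in each generation."""
    if not best_history:
        return False
    # Rule 1: the error is low enough (only usable when a target is known).
    if target is not None and best_history[-1] <= target:
        return True
    # Rule 2: no improvement over the last `patience` generations.
    if len(best_history) <= patience:
        return False
    previous_best = min(best_history[:-patience])
    recent_best = min(best_history[-patience:])
    return previous_best - recent_best <= tolerance
```

For a maximization problem the comparisons would be reversed, for example by negating the fitness values before calling the function.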
Approximate Solutions
• Might be obvious, but genetic algorithms are approximate by design, since they iteratively optimize toward a solution
• The result is only as good as the fitness function, the number of iterations, and the crossing and mutation probabilities allow
Applications
• Multiple Sequence Alignment
  • Initial generation: random generation of candidate alignments of the given sequences
    • Authors do not agree on the initial size of the population
  • Selection via tournament-style pairing of the possible alignments for crossing
  • The fitness function
    • “Sum of pairs” objective function (everyone uses a different one)
  • The survival rate is different for each alignment
    • Sum all alignment scores together and take each alignment's percentage of the total
    • Basically, better alignments have a higher chance of surviving
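The scoring and survival percentages can be illustrated as follows. Since every author uses a different objective function, the match, mismatch, and gap values here are assumptions, as is the shift applied to keep the percentages valid when scores are negative.

```python
from itertools import combinations

def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Score an alignment given as equal-length strings with '-' for gaps."""
    score = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == '-' and b == '-':
                continue  # gap against gap scores nothing in this sketch
            elif a == '-' or b == '-':
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score

def survival_shares(scores):
    """Each alignment's share of the summed scores."""
    # Assumed fix-up: shift so the worst score becomes 1, because raw
    # sum-of-pairs scores can be negative and would break the percentages.
    shift = 1 - min(scores)
    shifted = [s + shift for s in scores]
    total = sum(shifted)
    return [s / total for s in shifted]
```

For example, `sum_of_pairs(["AC-T", "ACGT", "AG-T"])` evaluates to 1 under these assumed weights, and `survival_shares([-2, 0])` gives `[0.25, 0.75]`, so the better alignment is three times as likely to survive.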
Reproduction
• Crossing uses a “one-point crossover”
  • Takes the first half of the first parent and crosses it with the second half of the second parent
  • ABCD and EFGH -> ABGH
• Or “point-to-point crossover”
  • A random index is chosen
  • ABCD and EFGH -> ABCH
• Mutation
  • Remove or insert a gap in the alignment
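The two crossovers and the gap mutation can be sketched on plain strings. The function names are hypothetical, and `mutate_gap` handles a single row only; a full implementation would also have to re-pad the other rows so that all rows of the alignment stay the same length.

```python
import random

def one_point_crossover(first, second):
    # First half of the first parent joined to the second half of the second.
    half = len(first) // 2
    return first[:half] + second[half:]

def point_to_point_crossover(first, second, index=None, rng=random):
    # Same idea, but the cut index is chosen at random unless one is given.
    if index is None:
        index = rng.randrange(1, len(first))
    return first[:index] + second[index:]

def mutate_gap(row, rng=random):
    # Either remove an existing gap or insert a new one at a random position.
    gaps = [i for i, ch in enumerate(row) if ch == '-']
    if gaps and rng.random() < 0.5:
        i = rng.choice(gaps)
        return row[:i] + row[i + 1:]
    i = rng.randrange(len(row) + 1)
    return row[:i] + '-' + row[i:]
```

`one_point_crossover("ABCD", "EFGH")` returns `"ABGH"`, and `point_to_point_crossover("ABCD", "EFGH", index=3)` returns `"ABCH"`, reproducing the two examples from the slide.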
References
• Obitko M. (1998). Genetic Algorithms. Retrieved from http://www.obitko.com/tutorials/genetic-algorithms/
• Radenbaugh A. (2008). Applications of genetic algorithms in bioinformatics. Retrieved from http://scholarworks.sjsu.edu/cgi/viewcontent.cgi?article=4491&context=etd_theses