Genetic Algorithms
Overview • Genetic Algorithms: a gentle introduction • What are GAs? • How do they work, and why? • Critical issues • Use in Data Mining • GAs and statistics • decile performance maximization • multi-objective models
Natural Genetics to AI • Computational models inspired by biological evolution • survival of the fittest • reproduction through cross-breeding
Genetic Algorithms • Population-based search (parallel) • simultaneous search from multiple points in the search space • useful in complex, unstructured search spaces (less prone to getting stuck at local optima) • Population members: potential solutions • Population of solutions evolves from one generation to the next
Genetic Algorithms • Search objective • Fitness score for population members (fitness function) • Survival of the fittest • selection • Generating new solutions • “Mating” and reproduction of individuals (crossover, mutation)
Basic Operation [Figure: Generation t → Selection → Recombination (Crossover, Mutation) → Generation t+1]
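The generation-to-generation cycle of selection, crossover, and mutation can be sketched as a short loop. This is a minimal illustration rather than the presenter's implementation: the function name `evolve`, the bit-string encoding, one-point crossover, and all default parameter values are assumptions chosen for brevity, and the fitness function is assumed to return non-negative scores.

```python
import random

def evolve(fitness, n_genes, pop_size=20, generations=50,
           crossover_rate=0.7, mutation_rate=0.01, seed=0):
    """Minimal generational GA over bit strings (illustrative sketch).

    Assumes `fitness` returns a non-negative number and n_genes >= 2.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        # Selection: fitness-proportionate sampling with replacement.
        # The tiny constant keeps the weights valid if every score is zero.
        weights = [s + 1e-9 for s in scores]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        next_pop = []
        for i in range(0, pop_size, 2):
            a, b = parents[i][:], parents[(i + 1) % pop_size][:]
            # Crossover: swap tails at a random cut point.
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes)
                a[cut:], b[cut:] = b[cut:], a[cut:]
            next_pop.extend([a, b])
        # Mutation: flip each bit with a small probability.
        pop = [[1 - g if rng.random() < mutation_rate else g for g in ind]
               for ind in next_pop[:pop_size]]
    return max(pop, key=fitness)

# Toy problem (hypothetical): maximize the number of 1-bits.
best = evolve(sum, n_genes=16)
```

Fixing `seed` makes a run repeatable, which helps when comparing parameter settings.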
GAs: Parallel Search [Figure: fitness landscape contrasting a single hill climber with several GA search points (marked X)]
GAs: Basic Principles • Representation of individuals • String of parameters (genes): chromosome • e.g., to optimize a function F(p,q,r,s,t), each population member is a string p q r s t • genotype and phenotype
Binary representation? • Population members as bit strings • F(p,q,r,s,t) as: 1 0 0 1 1 0 1 0 1 1 0 1 1 0 0 1 1 0 1 0 (encoding p q r s t) • early theory in terms of binary strings (schema theorem) • unnecessary perversity?
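To make the genotype/phenotype distinction concrete, the 20-bit string on this slide can be decoded into five parameter values. The sketch below assumes equal-width fields (4 bits per parameter) read as unsigned integers; the slide does not state the field widths or the encoding, so both are illustrative assumptions, as is the helper name `decode`.

```python
def decode(bits, n_params=5):
    """Split a bit list into n_params equal-width unsigned integers.

    Assumption: every parameter uses the same number of bits.
    """
    width = len(bits) // n_params
    values = []
    for k in range(n_params):
        field = bits[k * width:(k + 1) * width]
        values.append(int("".join(str(b) for b in field), 2))
    return values

# Genotype: the bit string from the slide.
chromosome = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0]
# Phenotype: the parameter values passed to F(p, q, r, s, t).
p, q, r, s, t = decode(chromosome)
```

In practice each integer would usually be rescaled to the parameter's real range before evaluating F.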
GAs: Basic Principles • Survival of the fittest (Fitness function) • numerical “figure of merit”/utility measure of an individual • tradeoff among multiple evaluation criteria • efficient evaluation
GAs: Basic Principles • Iterative search • population evolves over generations • Convergence • progression towards uniformity in population • premature convergence? (local optima)
Typical GA Run [Figure: best and average fitness plotted against generations]
Operators: Selection • Fitness-proportionate selection (f_i / f̄: an individual's fitness relative to the population average) • gives the expected number of reproductive trials for each individual
Selection • Roulette-wheel selection (stochastic sampling with replacement) • wheel spaced in proportion to fitness values • N (pop size) spins of the wheel • Stochastic universal sampling • N equally spaced pins on wheel • single turn of the wheel
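The two sampling schemes differ only in how the wheel is read: N independent spins versus one spin with N equally spaced pointers. A minimal sketch follows, assuming non-negative fitness values with a positive sum; the function names and the `rng` parameter are illustrative choices, not part of the slides.

```python
import random

def roulette_wheel(fitnesses, n, rng):
    """Stochastic sampling with replacement: n independent spins."""
    total = sum(fitnesses)
    picks = []
    for _ in range(n):
        r = rng.uniform(0, total)
        acc = 0.0
        for i, f in enumerate(fitnesses):
            acc += f
            if r < acc:
                picks.append(i)
                break
        else:
            # Guard against floating-point round-off at the wheel's edge.
            picks.append(len(fitnesses) - 1)
    return picks

def stochastic_universal_sampling(fitnesses, n, rng):
    """One spin of a wheel carrying n equally spaced pointers."""
    step = sum(fitnesses) / n
    start = rng.uniform(0, step)
    picks, i, acc = [], 0, fitnesses[0]
    for k in range(n):
        pointer = start + k * step
        # Advance to the slot that contains this pointer.
        while pointer >= acc and i < len(fitnesses) - 1:
            i += 1
            acc += fitnesses[i]
        picks.append(i)
    return picks

rng = random.Random(0)
spins = roulette_wheel([0.0, 1.0, 3.0], 100, rng)
sus = stochastic_universal_sampling([1.0, 1.0, 2.0], 4, rng)
```

With fitnesses 1, 1, 2 and four pointers, stochastic universal sampling always gives the individuals 1, 1, and 2 copies respectively, whereas roulette-wheel counts vary from run to run. That lower sampling variance is the usual reason to prefer it.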
Selection • Premature convergence • Fitness scaling: f' = f - (2*avg - max) • Ranked fitness • Elitism • Steady-state selection • Demetic grouping
Operators: Crossover
Parent 1:    axpsqvqbtpihd
Parent 2:    qzxxaycgbtphw
Offspring 1: azpsavcbtpphd
Offspring 2: qxxxqyqgbtihw
(Uniform crossover: genes are exchanged at the crossover sites) • combining good building blocks
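Uniform crossover decides independently at each position whether the two offspring exchange genes. A minimal sketch on the parent strings from this slide follows; the swap probability of 0.5 and the function name are assumptions, and because the swap sites are random, a run will generally produce different offspring from those shown on the slide.

```python
import random

def uniform_crossover(p1, p2, rng=random, swap_prob=0.5):
    """Exchange genes position by position with probability swap_prob."""
    c1, c2 = [], []
    for a, b in zip(p1, p2):
        if rng.random() < swap_prob:
            a, b = b, a  # this position is a crossover site
        c1.append(a)
        c2.append(b)
    return "".join(c1), "".join(c2)

parent1 = "axpsqvqbtpihd"
parent2 = "qzxxaycgbtphw"
child1, child2 = uniform_crossover(parent1, parent2, random.Random(1))
```

At every position the two children together still carry exactly the two parental genes, so no genetic material is created or lost by crossover alone.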
Operators: Mutation • alters each gene with small probability
Before: x 1 y x 0 y 0 y y 0 x y x y
After:  x 1 y x 0 y 1 y y 0 x x x y
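For a purely binary chromosome, per-gene mutation reduces to flipping each bit independently with a small probability. The sketch below assumes that binary case (the slide's example mixes symbols, which would need a per-gene alphabet); the function name and the rate of 0.05 are illustrative.

```python
import random

def mutate(bits, rate, rng=random):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

original = [1, 0, 0, 1, 1, 0, 1, 0]
mutant = mutate(original, rate=0.05, rng=random.Random(3))
```

The input list is left unchanged; a new chromosome is returned.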
Non-Binary Representations • Integer, real-number, order-based, rules, ... • Binary or Real-valued? real representations give faster, more consistent, more accurate results • High-level representation • intuitive, can utilize specialized operators • effective search over complex spaces
Real-valued representation
Parent1:    3.45 0.56 6.78 0.976 2.5
Parent2:    0.98 1.06 4.20 0.34  1.8
Offspring1: 3.22 0.56 6.78 0.65  2.12
Offspring2: 1.43 1.06 4.20 0.41  1.93
(Arithmetic crossover)
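One common form of arithmetic crossover blends each pair of parent genes with a weight alpha, producing two complementary children. The sketch below uses a single alpha for every gene, which is an assumption: the slide's example blends only some genes, copies others unchanged, and does not state its weights, so this code is not expected to reproduce those numbers.

```python
def arithmetic_crossover(p1, p2, alpha):
    """Whole-vector arithmetic crossover with blending weight alpha.

    child1 = alpha * p1 + (1 - alpha) * p2
    child2 = (1 - alpha) * p1 + alpha * p2
    """
    c1 = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
    c2 = [(1 - alpha) * a + alpha * b for a, b in zip(p1, p2)]
    return c1, c2

child1, child2 = arithmetic_crossover([4.0, 0.0], [0.0, 8.0], alpha=0.25)
```

Each child gene lies between the two parent genes, so this operator never leaves the range spanned by the parents; mutation is what introduces values outside it.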
High-level representation [Figure: Parent1 and Parent2 recombined into Offspring1 and Offspring2]
High-level representation • Generalize/Specialize
Tree-structured representation (GP) • Automated learning of programs (originally) • parse tree expressions • Non-linear interaction terms • Function set: internal nodes • {+,-,*,/,log} • Terminal set: leaf nodes • {constants, variables} • Example: (x * log(y)) / 5 [Figure: parse tree with root /, whose children are the * subtree (x, log(y)) and the constant 5]
Tree-structured representation • Representing complex patterns • If (y < 7) and (x > 2) then 0 else 2x + y [Figure: parse tree with root if, whose branches are the AND condition, the constant 0, and the + subtree]
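Both tree examples, the arithmetic expression (x * log(y)) / 5 and the if-then-else rule, can be represented as nested tuples and evaluated recursively. This is a sketch under assumptions: the tuple encoding, the `FUNCTIONS` table, and the `evaluate` helper are illustrative and do not come from any particular GP system.

```python
import math
import operator

# Function set: internal nodes. Terminals are variable names or constants.
FUNCTIONS = {
    "+": operator.add, "-": operator.sub,
    "*": operator.mul, "/": operator.truediv,
    "log": math.log,
    "<": operator.lt, ">": operator.gt,
    "and": lambda a, b: a and b,
}

def evaluate(node, env):
    """Evaluate a parse tree given variable bindings in `env`."""
    if isinstance(node, tuple):
        op, *args = node
        if op == "if":
            cond, then_branch, else_branch = args
            chosen = then_branch if evaluate(cond, env) else else_branch
            return evaluate(chosen, env)
        return FUNCTIONS[op](*(evaluate(a, env) for a in args))
    if isinstance(node, str):
        return env[node]   # variable terminal
    return node            # constant terminal

# (x * log(y)) / 5
expr = ("/", ("*", "x", ("log", "y")), 5)
# if (y < 7) and (x > 2) then 0 else 2x + y
rule = ("if", ("and", ("<", "y", 7), (">", "x", 2)),
        0,
        ("+", ("*", 2, "x"), "y"))

value = evaluate(expr, {"x": 5, "y": 1})
fired = evaluate(rule, {"x": 3, "y": 5})
fallback = evaluate(rule, {"x": 1, "y": 5})
```

Because subtrees are ordinary tuples, GP-style crossover amounts to swapping a subtree of one parent with a subtree of the other.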
Genetic search: Issues • Coding scheme, fitness function critical • the “art” in GA design! • General mechanism so robust that, within reasonable margins, parameter settings are not critical. • Representation to match problem, domain • utilizing domain knowledge • problem-specific crossover, mutation, selection • Flexibility in fitness function formulation • modeling business objectives
Genetic search: Issues • Stochastic search • initial populations, probabilistic operators • multiple runs with different random streams • Initializing population with known solutions • seeding initial population with solutions from multiple, independent runs
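Because results depend on the random stream, it is usual to repeat a run with several seeds and report the spread as well as the best result. The sketch below shows that pattern; `toy_search` is a hypothetical stand-in (plain random search on a one-dimensional function) for a real GA run, and `repeat_runs` is an illustrative helper name.

```python
import random
import statistics

def toy_search(seed, trials=200):
    """Stand-in for one GA run: random search maximizing -(x - 3)^2 on [0, 10]."""
    rng = random.Random(seed)
    best = max((rng.uniform(0, 10) for _ in range(trials)),
               key=lambda x: -(x - 3) ** 2)
    return best, -(best - 3) ** 2

def repeat_runs(search, seeds):
    """Run `search` once per seed and summarize the scores."""
    results = [search(s) for s in seeds]
    scores = [score for _, score in results]
    best_solution, best_score = max(results, key=lambda r: r[1])
    return {
        "best": best_solution,
        "best_score": best_score,
        "mean_score": statistics.mean(scores),
        "stdev_score": statistics.stdev(scores),
    }

summary = repeat_runs(toy_search, seeds=range(5))
```

The best solutions collected this way can also be used to seed the initial population of a further run.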
Genetic search: Issues • Guarantees optimality? • But... • GAs and traditional techniques • especially useful where traditional approaches fail • in conjunction with traditional techniques • Parallelizable for large data • multi-processor, networked machines
Using GAs? • When to use a GA? • GA and traditional techniques • How long does it take? • Will it perform better?
Using GAs • population size • mutation, crossover rates • how many generations • multiple runs
Huh? Is it a “black box”? • Data characteristics • Fitness function • GA parameters
GA Application Examples • Function optimizers • difficult, discontinuous, multi-modal, noisy functions • Combinatorial optimization • layout of VLSI circuits, factory scheduling, traveling salesman problem • Design and Control • bridge structures, neural networks, communication networks design; control of chemical plants, pipelines
GA Application Examples • Machine learning • classification rules, economic modeling, scheduling strategies • Portfolio design, optimized trading models, direct marketing models, sequencing of TV advertisements, adaptive agents, data mining, etc.