220 likes | 360 Views
CENG 562 MACHINE LEARNING. GENETIC ALGORITHMS Presented by Abdurrahman Çarkacıoğlu. What is GA? Biological Basis Terminology Operations Fitness Function Reproduction Crossover Mutation. Genetic Programming Traditional Methods vs. GA. Applications and Research Topics
E N D
CENG 562MACHINE LEARNING GENETIC ALGORITHMS Presented by Abdurrahman Çarkacıoğlu
What is GA? Biological Basis Terminology Operations Fitness Function Reproduction Crossover Mutation Genetic Programming Traditional Methods vs. GA. Applications and Research Topics Summary & Conclusion OUTLINE
WHAT IS GA ?. • Search algorithms based on • mechanics of natural selection • natural genetics • Robust, flexible, efficient in vast complex spaces. • Use the idea of randomness search but not directionless.
Chromosomes Heredity aspects of the individual Genes Encodes a specific feature of the individual. Actual value allele. Mating Both parent pass their chromosomes onto their offsprings. The weaker ones tend to die Stronger ones survive and produce new offsprings. Rarely mutation occurs BIOLOGICAL BASIS OF GA
GA’s TERMINOLOGY • String Structures Chromosomes (Hypothesis) • Elements of the String Gene's (Allele) • Hypothesis/String Structures • Composed of features, or detectors. A Hypothesis gene1 gene2 gene-n
REPRESENTATION of HYPOTHESIS • Bit Strings (most of the time) • Examples: Outlook={Sunny,Overcast,Rain} 001 encodes Outlook=Rain 011 encodes Outlook=Overcast or Rain 111 encodes Outlook=Sunny or Overcast or Rain
Rule Representation Example • Rule : Whatever the outlook, if wind=strong then PlayTennis. Given: Wind {Strong,Weak} PlayTennis {Yes,No} Solution: OutlookWindPlayTennis 1111010
BASIC GA OPERATIONS • Inner workings of GA is simple. • Based on • simple string copying • substracting concetation • nothing more,nothing less.
FITNESS FUNCTION • Returns a single numerical fitness value. • Problem dependent • For a function optimization search, it is simply the value of the function. • For learning classification rules, it is classification accuracy over a set of training examples
REPRODUCTION • Allows individual strings to be copied for possible inclusion in the next generation. • Copying chance is based on the fitness value. StringFtnPerc. 01001 5 19% 10000 12 46% 01110 9 35% • 46% of the time 10000 is selected.
CROSSOVER • Produce 2 new offspring from 2 parent string. Initial Strings Offsprings Single Point Crossover Two-point Crossover
MUTATION • Applied after crossover. • Randomly alters each gene with a small probability, typically 0,001. Point Mutation Example Initial StringAfter Mutation 11101101 11111101
Pseudo Code for GA • choose initial population at random • evaluate each individual's fitness • repeat select individuals to reproduce mate pairs at random apply crossover operator apply mutation operator evaluate each individual's fitness • until terminating condition
RECENT WORKS • Using real numbers instead of bit strings • Use of knowledge-augmented genetic operators for GAs. • Strings may be variable lengths.
GENETIC PROGRAMMING • Branch of genetic algorithms • Difference is the representation of the solution • Creates computer programs in the Lisp or Scheme computer languages as the solution.
Pseudo Code for GP 1)Create an initial population of random composition of the functions and terminals of the problem. 2) Execute each program, and assign it a fitness value according to the how well it solves a problem. 3)Create new population of computer programs. • )Copy the best existing programs. • )Create new programs by mutation • )Create new computer programs by crossover 4) Announce the best computer program when predefined conditions are satisfied.
GA vs. GP • GP is much more powerfull than GA. • The output of the GA is a quantity, while output of GP is a another computer program. • Computer programs that program themselves. • Where there is no ideal solution, GP works best (ex. A program that drives a car).
Traditional Methods vs. GAs • .GA a coded form of the function values, rather than with the actual values themselves. • .GA use a set, or population not just a single point on the problem space. • .GA use only payoff information to guide themselves. • .Probabilistic, not deterministic. • .Inherently parallel.
A Brief Survey of GA Technology • .Criminal suspect recognition • .Music Composition • .Constructing and Training Neural Networks • .Scheduling Problems • .Games Playing • .Structural Optimization • .Function Optimization • .Database Query Optimization • Aircraft Design
GABIL by DeJong (1993) • Uses GA to learn boolean concepts, represented by disjunctive set of propositional rules. • Roughly comparable performance among C4.5, ID5R and AQ14 systems. • Over 12 synthetic problems Average generalization accuracy for GABIL 92.1% Others 91.6% to 96.6%
Summary & Conclusions • conduct a randomized, parallel, hill-climbing search for hypothesis that optimize a predefined fitness function. • operate on populations of strings/hypothesis/chromosomes. • string represents some underlying parameter set. • reproduction, crossover and mutation are used to create new populations.