Coevolutionary Automated Software Correction • Josh Wilkerson • PhD Candidate in Computer Science • Missouri S&T
Technical Background • Evolutionary Algorithms (EAs) • Subfield of evolutionary computation (in artificial intelligence) • Based on biological evolution • Uses mutation, reproduction, and selection • Population composed of candidate solutions • Needed: • Solution representation • Fitness function • Applicable to a wide variety of fields • Makes no assumptions about the problem space (ideally)
Technical Background • EA Operation • Start with an initial population • Each generation • Create new individuals and evaluate them • Population competition (survival of the fittest) • Mutation and reproduction • Explore the problem space • Bring in new genetic material • Selection • Applies pressure to individuals • More fit individuals are selected for mutation and reproduction more often
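The generational loop above can be sketched in a few lines of Python. This is a minimal illustration, not the CASC implementation: the function names (`evolve`, `onemax_demo`), the binary tournament used for selection, the parameter defaults, and the bit-string toy problem are all assumptions chosen for brevity.

```python
import random

def evolve(fitness, init, mutate, crossover,
           pop_size=20, offspring=20, generations=50, rng=None):
    """Generic EA loop (hypothetical helper, not part of CASC)."""
    rng = rng or random.Random()
    population = [init(rng) for _ in range(pop_size)]

    def pick():
        # Selection pressure: a binary tournament favors the fitter individual.
        a, b = rng.choice(population), rng.choice(population)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        # Reproduction and mutation create and explore new candidates.
        children = [mutate(crossover(pick(), pick(), rng), rng)
                    for _ in range(offspring)]
        # Competition: only the fittest pop_size individuals survive.
        population = sorted(population + children,
                            key=fitness, reverse=True)[:pop_size]
    return population[0]

def onemax_demo(seed=0, n=16):
    """Toy problem (assumed for illustration): maximize the number of 1 bits."""
    return evolve(
        fitness=sum,
        init=lambda r: [r.randint(0, 1) for _ in range(n)],
        mutate=lambda g, r: [bit ^ (r.random() < 1 / n) for bit in g],
        crossover=lambda a, b, r: [r.choice(pair) for pair in zip(a, b)],
        rng=random.Random(seed),
    )
```

Calling `onemax_demo()` returns the best bit string found. Only the representation (`init`, `mutate`, `crossover`) and the fitness function are problem-specific, which matches the two items listed as "needed" earlier.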
Technical Background • Genetic Programming • Type of EA • Evolves tree representations • E.g., computer program parse trees • Coevolution • Extension of standard EA • Fitness dependency between individuals • Dependency can be either cooperative or competitive • CASC system uses competitive coevolution • Evolutionary arms race
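One way to picture the competitive fitness dependency is to score programs and test cases against each other. The sketch below is an assumption, not the scoring CASC actually uses: `coevaluate` and `oracle` are hypothetical names, a program earns credit for every test case it passes, and a test case earns credit for every program it breaks.

```python
def coevaluate(programs, test_cases, oracle):
    """Score two competing populations against each other.

    programs   : list of callables taking one input (stand-ins for evolved programs)
    test_cases : list of inputs
    oracle     : callable giving the expected output for an input (assumed available)
    """
    passes = [0] * len(programs)    # tests each program passed
    breaks = [0] * len(test_cases)  # programs each test case broke
    for i, program in enumerate(programs):
        for j, case in enumerate(test_cases):
            if program(case) == oracle(case):
                passes[i] += 1
            else:
                breaks[j] += 1
    # Normalize so both fitness values fall in [0, 1].
    program_fitness = [p / len(test_cases) for p in passes]
    case_fitness = [b / len(programs) for b in breaks]
    return program_fitness, case_fitness
```

For example, `coevaluate([abs, lambda x: x], [-2, 3], abs)` gives program fitness `[1.0, 0.5]` and test-case fitness `[0.5, 0.0]`: the input `-2` exposes the faulty program, so it is the more valuable test case. As programs improve, only harder test cases keep scoring, which is the arms-race dynamic.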
Reproduction Phase: Programs • Randomly select a genetic operation to perform • Probability of operation selection is configurable • Perform operation, generate new program(s) • Add new individuals to population • Repeat until specified number of individuals has been created
Reproduction Phase: Programs • Genetic Operations • Reset • Copy • Crossover • Two individuals are randomly selected based on fitness • Randomly select and exchange compatible sub-trees • Generates two new programs • Mutation • Randomly select an individual based on fitness • Randomly select and change a mutable node • Generate a new sub-tree (if necessary) • Architecture Altering Operations • Reselection is allowed for all operators
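The crossover operation can be sketched on a toy tree encoding. The nested-list representation (`[operator, child, child, ...]` with plain values as leaves) and all function names here are assumptions for illustration. The sketch also treats every sub-tree as compatible with every other, whereas a real system would have to check types before swapping.

```python
import copy
import random

def subtree_paths(tree, path=()):
    """Yield the index path to every node; lists are [operator, child, ...]."""
    yield path
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtree_paths(child, path + (i,))

def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    """Return the tree with the node at path replaced by new."""
    if not path:
        return new  # replacing the root
    get_subtree(tree, path[:-1])[path[-1]] = new
    return tree

def node_count(tree):
    return sum(1 for _ in subtree_paths(tree))

def crossover(parent_a, parent_b, rng):
    """Swap one randomly chosen sub-tree between copies of the two parents."""
    a, b = copy.deepcopy(parent_a), copy.deepcopy(parent_b)
    path_a = rng.choice(list(subtree_paths(a)))
    path_b = rng.choice(list(subtree_paths(b)))
    sub_a, sub_b = get_subtree(a, path_a), get_subtree(b, path_b)
    return replace_subtree(a, path_a, sub_b), replace_subtree(b, path_b, sub_a)
```

With `a = ["+", ["*", "x", 2], 1]` and `b = ["-", "y", ["+", 3, "z"]]`, `crossover(a, b, random.Random(1))` returns two new trees and leaves the parents unchanged, so a parent can be selected again later.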
Reproduction Phase: Test Cases • Reproduction employs uniform crossover • Each offspring has a chance to mutate • Genes to mutate are selected randomly • Each mutated gene is randomly adjusted • The size of the adjustment is drawn from a Gaussian distribution
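A minimal sketch of these two steps, assuming a test case is encoded as a list of numeric genes. The function names and the parameters `mutation_chance`, `gene_rate`, and `sigma` are hypothetical; the slides do not give the actual encoding or rates.

```python
import random

def uniform_crossover(parent_a, parent_b, rng):
    """Each gene of the offspring comes from either parent with equal chance."""
    return [rng.choice(pair) for pair in zip(parent_a, parent_b)]

def gaussian_mutate(genes, rng, mutation_chance=0.2, gene_rate=0.5, sigma=1.0):
    """With probability mutation_chance, nudge randomly chosen genes.

    Each selected gene is shifted by an amount drawn from a zero-mean
    Gaussian with standard deviation sigma. Returns a new list.
    """
    if rng.random() >= mutation_chance:
        return list(genes)  # this offspring is not mutated
    return [g + rng.gauss(0.0, sigma) if rng.random() < gene_rate else g
            for g in genes]

rng = random.Random(42)
child = uniform_crossover([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], rng)
child = gaussian_mutate(child, rng)
```

A small `sigma` keeps most adjustments close to the original gene value while still allowing the occasional large jump.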
CASC Implementation Details • Adaptive parameter control • EAs typically have many control parameters • Difficult to find optimal settings for these parameters • In CASC, genetic operator probabilities are adaptive parameters • Rewarded/punished based on performance • If one operator generates improved individuals more often than the others, it becomes more likely to be used • Allows the system to adapt to the different phases in the search
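The reward/punish idea can be sketched as follows. The update rule is an assumption, since the slides do not specify one: each operator's probability moves toward its recent improvement rate relative to the average, is clamped to a small floor so no operator dies out, and the result is renormalized. The names `update_operator_probs`, `choose_operator`, `learning_rate`, and `floor` are all hypothetical.

```python
import random

def update_operator_probs(probs, improved, used, learning_rate=0.1, floor=0.05):
    """Reward operators that produced improved offspring more often than average.

    probs    : dict operator name -> current selection probability
    improved : dict operator name -> offspring that beat their parent(s)
    used     : dict operator name -> times the operator was applied
    """
    rates = {op: (improved.get(op, 0) / used[op] if used.get(op) else 0.0)
             for op in probs}
    mean_rate = sum(rates.values()) / len(rates)
    # Above-average operators gain probability, below-average ones lose it.
    # The floor is applied before renormalizing.
    raw = {op: max(floor, p + learning_rate * (rates[op] - mean_rate))
           for op, p in probs.items()}
    total = sum(raw.values())
    return {op: value / total for op, value in raw.items()}

def choose_operator(probs, rng):
    """Pick the next genetic operation according to the current probabilities."""
    return rng.choices(list(probs), weights=list(probs.values()))[0]
```

Starting from `{"crossover": 0.5, "mutation": 0.5}`, if crossover improved 8 of 10 offspring and mutation 2 of 10, the update shifts the probabilities to roughly 0.53 and 0.47.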
CASC Implementation Details • Parallel Computation • Computational complexity is generally a problem for EAs • CASC writes, compiles, and executes hundreds (or even thousands) of C++ programs in a given run • To reduce run times, this is done in parallel (on the NIC cluster here on campus) • Main node: responsible for generating and writing programs • Worker nodes: responsible for compiling and executing programs • Dramatically speeds up execution • Investigating new options for this (discussed later)
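The main/worker split can be illustrated on a single machine with a thread pool standing in for the cluster. This is only a sketch of the division of labor, not how CASC distributes work: `evaluate_population` is a hypothetical name, and `compile_and_run` is a caller-supplied function assumed to compile one program source, execute it, and return its result.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_population(sources, compile_and_run, workers=4):
    """Main side: hand each generated program to a pool of workers.

    sources         : list of program sources produced by the main node
    compile_and_run : callable(source) -> result, executed on the workers
    Results come back in the same order as the sources.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compile_and_run, sources))

# Stand-in evaluation for demonstration: "score" a program by its source length.
scores = evaluate_population(["int main(){}", "int main(){return 0;}"], len)
```

Threads are a reasonable stand-in when each evaluation spends its time in an external compiler and the compiled executable; on a real cluster the same pattern would be expressed with a job scheduler or message passing.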
Current and Future Work • Fitness Function Design • For each new problem, CASC needs a new fitness function • Fitness function design can often be difficult • Developing a guide for fitness function design • Starts from the program specifications • Walks through the thought process for designing a fitness function for the problem • Long-term goal: automate fitness function creation
Current and Future Work • File system slowdown • CASC writes and compiles a very large number of programs each run • I.e., a very large number of files in the file system each run • File system access is bottlenecking the speed of the CASC system • Currently reworking the system to store program files and executables in RAM • Uses a virtually mounted hard disk that stores data in RAM • Expecting a dramatic speedup (fingers crossed…) • Other option: distributed computing (like BOINC, Folding@home, etc.)
Current and Future Work • Scalability • As program size increases so does the problem space • Many more modifications possible • More genetic material • Investigating options to allow CASC to scale with problem size • Current idea: break the program up into pieces • Multiple program populations • Each population is based on a piece of the original program • Each population has its own objective • Cooperative coevolution