Competent Program Evolution Dissertation Defense Moshe Looks December 11th, 2006
Synopsis • Competent optimization requires adaptive decomposition • This is problematic in program spaces • Thesis: we can do it by exploiting semantics • Results: it works!
General Optimization • Find a solution s in S • Maximize/minimize f(s), where f: S → ℝ • To solve this faster than O(|S|), make assumptions about f
Near-Decomposability • Complete separability would be nice… • Near-decomposability (Simon, 1969) is more realistic • [Figure: weaker interactions between subsystems vs. stronger interactions within them]
Exploiting Separability • Separability = independence assumptions • Given a prior over the solution space • represented as a probability vector • Sample solutions from the model • Update model toward higher-scoring points • Iterate... • Works well when interactions are weak
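The model–sample–update loop above can be illustrated with a minimal univariate EDA sketch; the onemax objective and all parameter values below are illustrative assumptions, not settings from the slides.

```python
import random

def onemax(bits):
    # Illustrative separable objective: count of ones
    return sum(bits)

def univariate_eda(n=20, pop=50, iters=100, lr=0.1, score=onemax):
    p = [0.5] * n                                    # prior: independent bits
    for _ in range(iters):
        samples = [[int(random.random() < p[i]) for i in range(n)]
                   for _ in range(pop)]              # sample from the model
        samples.sort(key=score, reverse=True)
        winners = samples[:pop // 10]                # keep the top decile
        for i in range(n):                           # shift the model toward winners
            freq = sum(s[i] for s in winners) / len(winners)
            p[i] = (1 - lr) * p[i] + lr * freq
    return p
```

Because every bit is modeled independently, this works well exactly when interactions are weak, which is the limitation the next slide addresses.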
Exploiting Near-Decomposability • Bayesian optimization algorithm (BOA) • represent problem decomposition as a Bayesian Network • learned greedily, via a network scoring metric • Hierarchical BOA • uses Bayesian networks with local structure • allows smaller model-building steps • leads to more accurate models • restricted tournament replacement • promotes diversity • Solves the linkage problem • Competence: solving hard problems quickly, accurately, and reliably
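Restricted tournament replacement, mentioned above as hBOA's diversity mechanism, can be sketched roughly as follows; the Hamming similarity measure and window size are assumptions for illustration, not hBOA's exact settings.

```python
import random

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def rtr_insert(population, fitnesses, offspring, offspring_fitness, window=20):
    """The offspring competes only against the most similar of `window`
    randomly chosen members, so distinct niches are preserved."""
    idxs = random.sample(range(len(population)), min(window, len(population)))
    closest = min(idxs, key=lambda i: hamming(population[i], offspring))
    if offspring_fitness >= fitnesses[closest]:
        population[closest] = offspring
        fitnesses[closest] = offspring_fitness
```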
Program Learning • Solutions encode executable programs • execution maps programs to behaviors • exec: P → B • find a program p in P • maximize/minimize f(exec(p)) • f: B → ℝ • To be useful, make assumptions about exec, P, and B
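As a toy instance of this framing, one can take P to be Boolean formula trees, exec to be truth-table evaluation, and f to be agreement with a target behavior; the tuple encoding below is an illustrative assumption, not the dissertation's representation.

```python
from itertools import product

# Toy program space P: nested tuples such as ('and', 'x0', ('not', 'x1'))
def run(prog, env):
    if isinstance(prog, str):
        return env[prog]
    op, *args = prog
    vals = [run(a, env) for a in args]
    return {'and': all(vals), 'or': any(vals), 'not': not vals[0]}[op]

def execute(prog, arity):
    # exec: P -> B, where a behavior is the program's truth table
    return tuple(run(prog, {f'x{i}': b for i, b in enumerate(bits)})
                 for bits in product([False, True], repeat=arity))

def score(prog, target_behavior, arity):
    # f: B -> R, here agreement with a target truth table
    return sum(a == b for a, b in zip(execute(prog, arity), target_behavior))
```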
Properties of Program Spaces • Open-endedness • Over-representation • many programs map to the same behavior • Compositional hierarchy • intrinsically organized into subprograms • Chaotic Execution • similar programs may have very different behaviors
Properties of Program Spaces • Simplicity prior • simpler programs are more likely • Simplicity preference • smaller programs are preferable • Behavioral decomposability • f: B → ℝ is separable / nearly decomposable • White box execution • execution function is known and constant
Thesis • Program spaces are not directly decomposable • Leverage properties of program spaces as inductive bias • Leading to competent program evolution
Representation-Building • Organize programs in terms of commonalities • Ignore semantically meaningless variation • Explore plausible variations
Representation-Building • Common regions must be aligned • Redundancy must be identified • Create knobs for plausible variations
Representation-Building • What about… • changing the phase? • averaging two inputs instead of picking one? • … • [Figure: mapping between behavior (semantic) space and program (syntactic) space]
Statics & Dynamics • Representations span a limited subspace of programs • Conceptual steps in representation-building: • reduction to normal form (x, x + 0 → x) • neighborhood enumeration (generate knobs) • neighborhood reduction (get rid of some knobs) • Create demes to maintain a sample of many representations • deme: a sample of programs living in a common representation • intra-deme optimization: use the hBOA • inter-deme: • based on dominance relationships
Meta-Optimizing Semantic Evolutionary Search (MOSES) • Create an initial deme based on a small set of knobs (i.e., empty program) and random sampling in knob-space • Select a deme and run hBOA on it • Select programs from the final hBOA population meeting the deme-creation criterion (possibly displacing existing demes) • For each such program: • create a new representation centered around the program • create a new random sample within this representation • add as a deme • Repeat from step 2
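The loop can be summarized structurally as below; every argument (build_representation, sample_deme, run_hboa, meets_criterion, budget) and the deme's score attribute are caller-supplied stand-ins for the steps named on the slide, not an actual MOSES API.

```python
def moses(build_representation, sample_deme, run_hboa, meets_criterion, budget):
    """Structural skeleton of the loop above; demes are assumed to expose .score."""
    demes = [sample_deme(build_representation(None))]   # 1. seed deme from the empty program
    while budget():                                     # repeat from step 2
        deme = max(demes, key=lambda d: d.score)        # 2. select a deme, run hBOA on it
        for prog in run_hboa(deme):                     # 3. programs meeting the criterion
            if meets_criterion(prog, demes):
                rep = build_representation(prog)        # 4a. new representation around prog
                demes.append(sample_deme(rep))          # 4b. random sample added as a deme
    return demes
```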
Artificial Ant • [Figure: the ant's food trail grid] • Eat all food pellets within 600 steps • Existing evolutionary methods do not perform significantly better than random search • Space contains many regularities • To apply MOSES: • three reduction rules for normal form • e.g., left, left, left → right • separate knobs for rotation, movement, & conditionals • no neighborhood reduction needed
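The rotation reduction rule can be sketched as collapsing runs of turns to their net heading change mod 4, so that left, left, left becomes right; this is one illustrative rule, not the ant domain's full normal form.

```python
def normalize_turns(actions):
    """Collapse consecutive turns to a net rotation mod 4,
       so ['left', 'left', 'left'] becomes ['right']."""
    out, net = [], 0

    def flush():
        nonlocal net
        out.extend({0: [], 1: ['left'], 2: ['left', 'left'], 3: ['right']}[net % 4])
        net = 0

    for a in actions:
        if a == 'left':
            net += 1
        elif a == 'right':
            net -= 1
        else:
            flush()
            out.append(a)          # moves and conditionals pass through unchanged
    flush()
    return out

# normalize_turns(['left', 'left', 'left', 'move']) == ['right', 'move']
```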
Artificial Ant • How does MOSES do it? • Searches a greatly reduced space • Exploits key dependencies: • “[t]hese symmetries lead to essentially the same solutions appearing to be the opposite of each other. E.g. either a pair of Right or pair of Left terminals at a particular location may be important.” – Langdon & Poli, “Why ants are hard” • hBOA modeling learns linkage between rotation knobs • Eliminate modeling and the problem still gets solved • but with much higher variance • computational effort rises to 36,000
Elegant Normal Form (Holman, '90) • Hierarchical normal form for Boolean formulae • Reduction process takes time linear in formula size • 99% of random 500-literal formulae are reduced in size by over 98%
Syntactic vs. Behavioral Distance • Is there a correlation between syntactic and behavioral distance? • 5000 unique random formulae of arity 10 with 30 literals each • qualitatively similar results for arity 5 • Computed the set of pairwise • behavioral distances (truth-table Hamming distance) • syntactic distances (tree edit distance, normalized by tree size) • The same computation on the same formulae reduced to ENF
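Behavioral distance here is the Hamming distance between truth tables; a self-contained sketch, assuming formulae are given as Python boolean expressions over x[0] … x[arity-1] (an illustrative encoding, not the dissertation's).

```python
from itertools import product

def truth_table(formula, arity):
    """formula is a Python boolean expression over x[0] .. x[arity-1]."""
    return tuple(bool(eval(formula, {'x': bits}))
                 for bits in product([False, True], repeat=arity))

def behavioral_distance(f, g, arity):
    """Hamming distance between the two formulae's truth tables."""
    return sum(a != b for a, b in zip(truth_table(f, arity), truth_table(g, arity)))

# behavioral_distance('x[0] and x[1]', 'x[0] or x[1]', 2) == 2
```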
Syntactic vs. Behavioral Distance • Is there a correlation between syntactic and behavioral distance? • [Scatter plots: random formulae vs. formulae reduced to ENF]
Neighborhoods & Knobs • What do neighborhoods look like, behaviorally? • 1000 unique random formulae, arity 5, 100 literals each • qualitatively similar results for arity 10 • Enumerate all neighbors (edit distances <2) • compute behavioral distance from source • Neighborhoods in MOSES defined based on ENF • neighbors are converted to ENF, compared to original • used to heuristically reduce total neighborhood size
Neighborhoods & Knobs • What do neighborhoods look like, behaviorally? Random formulae Reduced to ENF
Hierarchical Parity-Multiplexer • Study decomposition in a Boolean domain • Multiplexer function of arity k1 computed from k1 parity functions of arity k2 • total arity is k1·k2 • Hypothesis: • parity subfunctions will exhibit tighter linkages
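A sketch of the hierarchical function, assuming adjacent non-overlapping groups of k2 raw inputs feed the parity layer and the leading multiplexer inputs act as address bits; both conventions are assumptions for illustration.

```python
def parity(bits):
    return sum(bits) % 2

def multiplexer(bits):
    """k-multiplexer: the first a bits (a + 2**a == len(bits)) address
       one of the remaining 2**a data bits."""
    a = 0
    while a + 2 ** a < len(bits):
        a += 1
    addr = 0
    for b in bits[:a]:
        addr = 2 * addr + int(b)
    return bits[a + addr]

def parity_multiplexer(bits, k2):
    """Adjacent groups of k2 raw inputs feed parity; parities feed the multiplexer.
       Expects len(bits) == k1 * k2 for a k1-multiplexer."""
    groups = [bits[i:i + k2] for i in range(0, len(bits), k2)]
    return multiplexer([parity(g) for g in groups])

# 2-parity-3-multiplexer: total arity 2 * 3 = 6
# parity_multiplexer([1, 0,  1, 1,  0, 1], k2=2) == 1
```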
Hierarchical Parity-Multiplexer • Computational effort decreases by 42% with model-building (on 2-parity-3-multiplexer) • Parity subfunctions (adjacent pairs) have the tightest linkages • Hypothesis validated
Program Growth • 5-parity, minimal program size ~ 53
Program Growth • 11-multiplexer, minimal program size ~ 27
Where do the Cycles Go? • N is population size, O(n^1.05) • l is program size • a is the arity of the space • n is representation size, O(a · program size) • c is number of test cases
Supervised Classification • Goals: • accuracies comparable to SVM • superior accuracy vs. GP • simpler classifiers vs. SVM and GP
Supervised Classification • How much simpler? • Consider average-sized formulae learned for the 6-multiplexer • MOSES: 21 nodes, max depth 4 • GP (after reduction to ENF!): 50 nodes, max depth 7

and(or(not(x2) and(or(x1 x4) or(and(not(x1) x4) x6))) or(and(or(x1 x4) or(and(or(x5 x6) or(x2 and(x1 x5))) and(not(x1) x3))) and(or(not(x1) and(x2 x6)) or(not(x1) x3 x6) or(and(not(x1) x2) and(x2 x4) and(not(x1) x3))))) or(and(not(x1) not(x2) x3) and(or(not(x2) and(x3 x6)) x1 x4) and(not(x1) x2 x5) and(x1 x2 x6))
Supervised Classification • Datasets taken from recent comp. bio. papers • Chronic fatigue syndrome (101 cases) • based on 26 SNPs • genes either in homozygosis, in heterozygosis, or not expressed • 56 binary features • Lymphoma (77 cases) & aging brains (19 cases) • based on gene expression levels (continuous) • 50 most-differentiating genes selected • preprocessed into binary features based on medians • All experiments based on 10 independent runs of 10-fold cross-validation
Quantitative Results • Classification average test accuracy: [Table: average test accuracy on each dataset]
Quantitative Results • Benchmark performance: • artificial ant • 6x less computational effort vs. EP, 20x less vs. GP • parity problems • 1.33x less vs. EP, 4x less vs. GP on 5-parity • found solutions to 6-parity (none found by EP or GP) • multiplexer problems • 9x less vs. GP on 11-multiplexer
Qualitative Results • Requirements for competent program evolution • all requirements for competent optimization • + exploit semantics • + recombine programs only within bounded subspaces • Bipartite conception of problem difficulty • program-level: adapted from the optimization case • deme-level: theory based on global properties of the space (deme-level neutrality, deceptiveness, etc.)
Qualitative Results • Representation-building for programs: • parameterization based on semantics • transforms program space properties • to facilitate program evolution • probabilistic modeling over sets of program transformations • models compactly represent problem structure
Competent Program Evolution • Competent: not just good performance • explainability of good results • robustness • Vision: representations are important • program learning is unique • representations must be specialized • based on semantics • MOSES: meta-optimizing semantic evolutionary search • exploiting semantics and managing demes
Committee • Dr. Ron Loui (WashU, chair) • Dr. Guy Genin (WashU) • Dr. Ben Goertzel (Virginia Tech, Novamente LLC) • Dr. David E. Goldberg (UIUC) • Dr. John Lockwood (WashU) • Dr. Martin Pelikan (UMSL) • Dr. Robert Pless (WashU) • Dr. William Smart (WashU)