Mutation Operator Evolution for EA-Based Neural Networks By Ryan Meuth
Reinforcement Learning [Figure: the agent-environment loop. The Agent selects an Action via its Action Policy; the Environment returns a new State and a Reward; the agent maintains a State Value Estimate.]
Reinforcement Learning • Good for on-line learning where little is known about the environment • Easy to implement in discrete environments: a value estimate can be stored for each state (see the value-update sketch after this slide) • Given infinite time, convergence to the optimal policy is guaranteed • Hard to implement in continuous environments: infinitely many states, so the value function must be approximated • Neural networks can be used for this function approximation
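Below is a minimal sketch, not taken from the slides, of the discrete case described above: one value estimate stored per state, updated toward a bootstrapped target. The state names, learning rate, and discount factor are illustrative assumptions.

```python
# Tabular value estimation for a discrete environment: one stored
# estimate per state. alpha (learning rate) and gamma (discount)
# are illustrative assumptions, not values from the presentation.
values = {}            # V(s): one value estimate per discrete state
alpha, gamma = 0.1, 0.9

def td_update(state, reward, next_state):
    """One TD(0) step: move V(state) toward reward + gamma * V(next_state)."""
    v = values.get(state, 0.0)
    v_next = values.get(next_state, 0.0)
    values[state] = v + alpha * (reward + gamma * v_next - v)

td_update("s0", reward=1.0, next_state="s1")   # hypothetical transition
```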
Neural Network Overview • Feed Forward Neural Network • Based on biological theories of neuron operation
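As a concrete illustration of the feed-forward architecture, here is a minimal sketch of a one-hidden-layer forward pass; the layer sizes and sigmoid activation are assumptions, not specified in the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w_hidden, w_out):
    """Propagate input x through one hidden layer to the output."""
    h = sigmoid(w_hidden @ x)    # hidden-layer activations
    return sigmoid(w_out @ h)    # network output

rng = np.random.default_rng(0)
x = rng.normal(size=3)                                        # 3 inputs
y = forward(x, rng.normal(size=(5, 3)), rng.normal(size=(1, 5)))
```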
Neural Network Overview • Traditionally trained with Error Back-Propagation • BP uses labeled samples to generalize to the problem (one update step is sketched below) • Few "unsupervised" training methods exist • On-line learning is problematic when no samples are available • One example: Conjugate Reinforcement Back-Propagation
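For reference, a minimal sketch of one back-propagation update for a single sigmoid unit (the delta rule); the learning rate and squared-error loss are assumptions, and this is plain BP, not the CRBP variant named above.

```python
import numpy as np

def bp_step(w, x, target, lr=0.1):
    """One gradient-descent step on squared error for a sigmoid unit."""
    y = 1.0 / (1.0 + np.exp(-w @ x))      # forward pass
    delta = (y - target) * y * (1.0 - y)  # dE/dnet for squared error
    return w - lr * delta * x             # weight update
```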
EA-NN • Works as both a supervised and an unsupervised learning method • Uses the network's weight set as an individual's genome • The fitness function is mean-squared error over the target function • The mutation operator perturbs weights with samples from a Gaussian distribution (see the sketch below) • It is possible that this mutation operator is not the best choice
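A minimal sketch of the baseline EA-NN pieces listed above: a weight-vector genome, a Gaussian mutation operator, and an MSE fitness. The step size sigma is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def mutate(genome, sigma=0.1):
    """Perturb the weight genome with samples from a Gaussian distribution."""
    return genome + rng.normal(0.0, sigma, size=genome.shape)

def fitness(predictions, targets):
    """Mean-squared error over the target function (lower is better)."""
    return float(np.mean((predictions - targets) ** 2))
```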
Uh… Why? • Could improve EA-NN efficiency • Faster on-line learning • A revamped tool for Reinforcement Learning • Smarter robots • Why use an EA? • It is knowledge-independent: no domain-specific assumptions required
Experimental Implementation • First tier: Genetic Programming • Each individual is a parse tree representing a mutation operator • Fitness is the inverse of the sum of MSEs from the EA testbed • Second tier: EA testbed • 4 EAs spanning 2 classes of problems • 2 feed-forward non-linear approximations (1 high-order, 1 low-order) • 2 recurrent time-series predictions (1 time-delayed, 1 not time-delayed)
GP Implementation • Functional set: {+, -, *, /} • Terminal set: the weight to be modified, a random constant, a uniform random variable (parse-tree evaluation is sketched below) • Over-selection: 80% of parents drawn from the top 32% • Rank-based survival • Initialized by the grow method (max depth of 8) • Fitness: 1000/(AvgMSE) - num_nodes • P(recombination) = 0.5; P(mutation) = 0.5 • Repair function • 5 runs of 100 generations each • Steady-state: population of 1000 individuals, 20 children per generation
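A minimal sketch, assuming a tuple-based tree encoding rather than the author's actual representation, of evaluating a GP parse tree built from the functional set {+, -, *, /} and the terminal set above. Protected division stands in here for the repair function, which is an assumption.

```python
import random

def evaluate(node, w):
    """Recursively evaluate a parse tree: tuples are (op, left, right)."""
    if node == 'w':
        return w                          # the weight to be modified
    if node == 'r':
        return random.uniform(-1.0, 1.0)  # uniform random variable
    if isinstance(node, (int, float)):
        return node                       # random constant
    op, left, right = node
    a, b = evaluate(left, w), evaluate(right, w)
    if op == '+': return a + b
    if op == '-': return a - b
    if op == '*': return a * b
    return a / b if b != 0 else 1.0       # protected division (repair)

# Example: the tree w + 0.5 * r as a candidate mutation operator
tree = ('+', 'w', ('*', 0.5, 'r'))
new_weight = evaluate(tree, w=0.3)
```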
EA-NN Implementation • Recombination: multi-point crossover (sketched below) • Mutation: provided by the GP • Fitness: MSE over the test function (minimized) • P(recombination) = 0.5; P(mutation) = 0.5 • Non-generational: population of 10 individuals, 10 children per generation • 50 runs of 50 generations
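A minimal sketch of multi-point crossover on two weight-vector genomes; the number of cut points is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def multipoint_crossover(a, b, points=2):
    """Copy parent a, then swap in alternating segments from parent b."""
    cuts = sorted(rng.choice(np.arange(1, len(a)), size=points, replace=False))
    child, swap, prev = a.copy(), False, 0
    for cut in cuts + [len(a)]:
        if swap:
            child[prev:cut] = b[prev:cut]
        swap, prev = not swap, cut
    return child

parent_a = np.arange(10.0)
parent_b = -np.arange(10.0)
child = multipoint_crossover(parent_a, parent_b)
```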
Results (preliminary) • Full results are still pending • Baseline mutation operator (a single uniform random variable): ~380 • Best observed evolved individuals: ~600 • An improvement, but we will have to wait and see
Conclusions • I don’t know anything yet.
Questions? Thank You!