Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping Jacob Schrum and Risto Miikkulainen University of Texas at Austin Department of Computer Science
Typical Uses of MOEAs • Where have MOEAs proven themselves? • Wireless sensor networks (Woehrle et al., 2010) • Groundwater management (Siegfried et al., 2009) • Hydrologic model calibration (Tang et al., 2006) • Epoxy polymerization (Deb et al., 2004) • Voltage-controlled oscillator design (Chu et al., 2004) • Multi-spindle gear-box design (Deb & Jain, 2003) • Foundry casting scheduling (Deb & Reddy, 2001) • Multipoint airfoil design (Poloni & Pediroda, 1997) • Design of aerodynamic compressor blades (Obayashi, 1997) • Electromagnetic system design (Michielssen & Weile, 1995) • Microprocessor design (Stanley & Mudge, 1995) • Design of laminated ceramic composites (Belegundu et al., 1994) • Many engineering/design problems!
New Domains for MOEAs • Simulated agents often face multiple objectives • Automatic discovery of intelligent behavior • Video game opponents in Unreal Tournament (van Hoorn, 2009) • Predator/prey scenarios (Schrum & Miikkulainen, 2009) • Race car driving in TORCS (Agapitos et al., 2008) • Comparatively little work in these domains so far • Direct application of an MOEA is seldom successful • Success often depends on “shaping”
What is Shaping? • Term from Behavioral Psychology • Identified by B. F. Skinner (1938) • Task-Based Example: Train rat to press lever • First reward proximity • Then any interaction with lever • Then actual pressing of lever
Evolutionary Shaping • Environment changes, making task harder • Evolution shapes behavior across generations • Example: Migration given continental drift [1] • Animals become accustomed to short migration • Continental drift increases distance of migration • Ability to travel increasing distances required • EC models this with incremental evolution (e.g., [2]) [Images: Arctic Tern; Atlantic Salmon] [1] B. F. Skinner. The shaping of phylogenic behavior. Experimental Analysis of Behavior. 1975. [2] Schrum and Miikkulainen. Constructing Complex NPC Behavior via Multiobjective Neuroevolution. 2008.
Fitness-Based Shaping • Not extensively used • Little/no domain knowledge needed • Multiobjective approach a good fit • Selection criteria change • Exploiting ignored objectives (TUG) • Exploiting unfilled niches (BD) [Figure: objective space with crowded and uncrowded niches, including a point that is dominated but exploits a mostly ignored objective; behavior space with uncrowded niches]
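Multiobjective Optimization • Pareto dominance: $\vec{v} \succ \vec{u}$ iff $\forall i \in \{1,\dots,n\}: v_i \ge u_i$ and $\exists i \in \{1,\dots,n\}: v_i > u_i$ • Assumes maximization • Want nondominated points • NSGA-II used in this work • What to evolve? • NNs as control policies [Figure: nondominated points highlighted in objective space]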
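The dominance test on this slide can be sketched in a few lines of Python. This is only an illustration of the standard definition under maximization, not the NSGA-II implementation used in the work; the function names and the sample scores are made up for the example.

```python
from typing import List, Sequence


def dominates(v: Sequence[float], u: Sequence[float]) -> bool:
    """True if v Pareto-dominates u, assuming every objective is maximized."""
    no_worse = all(a >= b for a, b in zip(v, u))
    strictly_better = any(a > b for a, b in zip(v, u))
    return no_worse and strictly_better


def nondominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective scores for four individuals.
scores = [(3.0, 1.0), (2.0, 2.0), (1.0, 1.0), (2.0, 1.0)]
front = nondominated(scores)
```

Here `front` keeps `(3.0, 1.0)` and `(2.0, 2.0)`, since each of the other two points is matched or beaten in every objective by one of them.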
Constructive Neuroevolution • Genetic Algorithms + Neural Networks • Build structure incrementally (complexification) • Good at generating control policies • Three basic mutations (no crossover used): perturb weight, add connection, add node
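A rough sketch of the three mutations follows. The genome layout (a node count plus a list of weighted links), the Gaussian perturbation, and the choice to add a node by splicing it into an existing link are all assumptions made for illustration; the slides do not specify these details.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple

Link = Tuple[int, int, float]  # (source node, target node, weight)


@dataclass
class Genome:
    """Hypothetical network encoding: nodes are numbered 0..num_nodes-1."""
    num_nodes: int
    links: List[Link] = field(default_factory=list)


def perturb_weight(g: Genome, rng: random.Random, sigma: float = 0.1) -> None:
    """Add Gaussian noise to the weight of one randomly chosen link."""
    i = rng.randrange(len(g.links))
    src, dst, w = g.links[i]
    g.links[i] = (src, dst, w + rng.gauss(0.0, sigma))


def add_connection(g: Genome, rng: random.Random) -> None:
    """Add a link between two random nodes (duplicates are not filtered)."""
    src, dst = rng.randrange(g.num_nodes), rng.randrange(g.num_nodes)
    g.links.append((src, dst, rng.uniform(-1.0, 1.0)))


def add_node(g: Genome, rng: random.Random) -> None:
    """Assumed variant: splice a new node into a randomly chosen link."""
    src, dst, w = g.links.pop(rng.randrange(len(g.links)))
    new = g.num_nodes
    g.num_nodes += 1
    g.links += [(src, new, 1.0), (new, dst, w)]


rng = random.Random(0)
child = Genome(num_nodes=3, links=[(0, 2, 0.5)])
add_node(child, rng)        # structure grows by one node
add_connection(child, rng)  # and by one link
perturb_weight(child, rng)  # weights drift without changing structure
```

Because there is no crossover, each offspring would be a mutated copy of a single parent, so structure only grows through the last two operators.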
Targeting Unachieved Goals • Main ideas: • Temporarily deactivate “easy” objectives • Focus on “hard” objectives • “Hard” and “easy” defined in terms of goal values • Easy: average fitness “persists” above goal (achieved) • Hard: goal not yet achieved • Objectives reactivated when no longer achieved • Increase goal values when all achieved [Figure labels: Evolution; Hard Objectives]
TUG Example [Figure: plot annotated with: Noisy evaluations; Goal achieved; Other goals also achieved → Goals increase; Reset recency-weighted average]
Behavioral Diversity • Originally developed for single-objective tasks [3] • Add behavioral diversity objective • Encourage exploration of new behaviors • Domain-specific behavior measure required • Extensions in this work: • Multiobjective task • Domain independent method • Only requires policy mapping ℝ^N to ℝ^M (N senses to M actions), e.g. NNs [3] J.-B. Mouret and S. Doncieux. Using behavioral exploration objectives to solve deceptive problems in neuro-evolution. 2009.
Behavioral Diversity Details • Behavior vector: • Given input vectors, concatenate outputs • Behavioral diversity objective: • AVG distance from other behavior vectors [Figure: network outputs for several input vectors concatenated into one behavior vector; a point with high average distance from the other points]
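The two steps above can be sketched as follows. Euclidean distance and the toy policies are assumptions for the example; the slides say only "distance", and in the experiments the input vectors are random (BD) or taken from evaluations (BD-Obs) rather than fixed by hand.

```python
import math
from typing import Callable, List, Sequence

Policy = Callable[[Sequence[float]], List[float]]  # maps N senses to M actions


def behavior_vector(policy: Policy, inputs: List[Sequence[float]]) -> List[float]:
    """Concatenate the policy's outputs over a shared list of input vectors."""
    vec: List[float] = []
    for x in inputs:
        vec.extend(policy(x))
    return vec


def diversity_scores(vectors: List[List[float]]) -> List[float]:
    """Average distance from each behavior vector to all of the others."""
    scores = []
    for i, v in enumerate(vectors):
        dists = [math.dist(v, w) for j, w in enumerate(vectors) if j != i]
        scores.append(sum(dists) / len(dists) if dists else 0.0)
    return scores


# Toy population: two identical policies and one that behaves differently.
population: List[Policy] = [
    lambda x: [sum(x)],
    lambda x: [sum(x)],
    lambda x: [-sum(x)],
]
inputs = [(1.0, 0.0), (0.0, 2.0)]  # every policy sees the same inputs
vectors = [behavior_vector(p, inputs) for p in population]
bd = diversity_scores(vectors)
```

The third policy receives the highest score because its behavior vector lies far from the other two. That score would then be added as one more objective to maximize alongside the domain objectives.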
Battle Domain • Evolved monsters (blue) • Monsters can hurt fighter • Scripted fighter (green) • Bat can hurt monsters • Three objectives • Deal damage • Avoid damage • Stay alive • Previous work required incremental evolution to solve
Experimental Comparison • NN copied to 4 monsters • Homogeneous teams • In paper • Control: Plain NSGA-II • TUG: NSGA-II with TUG using expert initial goals • BD: NSGA-II with BD using random input vectors • Additional methods since publication • TUG-Low: NSGA-II with TUG using minimal initial goals • BD-Obs: NSGA-II with BD using inputs from evaluations • Each repeated 30 times
Attainment Surfaces [4] • Result attainment surface • Shows space dominated by single Pareto front • Summary attainment surface s • Union of space dominated in at least s out of n runs • Surface s weakly dominates s+1, etc. [Figure: three panels titled Pareto Fronts (Approximation Sets), Result Attainment Surfaces, and Summary Attainment Surfaces; individual surfaces intersect; summary surfaces labeled Surface 1, Surface 2, Surface 3] [4] J. Knowles. A summary-attainment surface plotting method for visualizing the performance of stochastic multiobjective optimizers. 2005.
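One way to read the definition of summary attainment surface s is as a membership test: a point belongs to the region if at least s of the n runs produced a solution that weakly dominates it. The sketch below assumes maximization, and the fronts are made-up data. It is not the plotting method of [4], only the counting rule behind it.

```python
from typing import List, Sequence

Front = List[Sequence[float]]  # approximation set from one run


def attained(front: Front, p: Sequence[float]) -> bool:
    """True if some solution in the front weakly dominates point p."""
    return any(all(a >= b for a, b in zip(q, p)) for q in front)


def attainment_count(fronts: List[Front], p: Sequence[float]) -> int:
    """Number of runs whose front attains point p."""
    return sum(attained(f, p) for f in fronts)


def within_summary_surface(fronts: List[Front], p: Sequence[float], s: int) -> bool:
    """True if p is attained in at least s of the runs."""
    return attainment_count(fronts, p) >= s


# Hypothetical fronts from n = 3 runs with two objectives.
runs = [[(3.0, 1.0), (1.0, 3.0)], [(2.0, 2.0)], [(1.0, 1.0)]]
count_low = attainment_count(runs, (1.0, 1.0))   # attained by every run
count_mid = attainment_count(runs, (2.0, 2.0))   # attained only by the second run
```

Since any point attained in at least s+1 runs is also attained in at least s runs, the regions are nested, which matches the statement that surface s weakly dominates surface s+1.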
Final Summary Attainment Surfaces [Animation: worst to best summary attainment surface; panels: Control, TUG, BD, TUG-Low, BD-Obs]
Hypervolume Metric [5] • Hypervolume of result attainment surface • Simply “volume” for 3 domain objectives • Measured with respect to a reference point • Reference point slightly less than minimum scores • Pareto-compliant metric [Figure: dominated region split into rectangles A, B, C, D; Hypervolume = A + B + C + D] [5] E. Zitzler and L. Thiele. Multiobjective optimization using evolutionary algorithms – a comparative case study. 1998.
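For a small approximation set, the hypervolume can be computed directly as the volume of a union of boxes, each spanning from the reference point to one solution. The sketch below uses inclusion-exclusion, which is exact but exponential in the number of points, so it is meant only to illustrate the metric. It is not the procedure used in the work, and the sample points are invented.

```python
from itertools import combinations
from math import prod
from typing import List, Sequence


def hypervolume(points: List[Sequence[float]], ref: Sequence[float]) -> float:
    """Volume dominated by the points relative to ref, assuming maximization."""
    total = 0.0
    for k in range(1, len(points) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(points, k):
            # Boxes [ref, p] intersect in the box up to the per-objective minimum.
            corner = [min(coords) for coords in zip(*subset)]
            sides = [max(0.0, c - r) for c, r in zip(corner, ref)]
            total += sign * prod(sides)
    return total


# Two hypothetical solutions with three objectives, reference point at the origin.
hv = hypervolume([(2.0, 1.0, 1.0), (1.0, 2.0, 1.0)], ref=(0.0, 0.0, 0.0))
```

Each box here has volume 2 and they overlap in a unit cube, so `hv` is 3. Adding a dominated point leaves the value unchanged, while adding a point that dominates part of the uncovered space raises it.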
Successful Behaviors [Panels: BD, TUG, BD-Obs, TUG-Low]
Discussion • Control: more extreme trade-offs • BD: more precise timing • BD-Obs and BD similar • “Real” inputs give no advantage • TUG: more teamwork • Particular initial objectives • TUG-Low more like BD than TUG • ALL are better than Control
Future Work • How to combine TUG and BD • Naïve combination doesn’t work • Scaling up • Many objectives • More complex domains • Current work in Unreal Tournament promising
Conclusion • BD and TUG improve MO evolution • Domain independence! • Contrast to task-based shaping • Expand MOEAs to a new range of domains
Questions? Email: schrum2@cs.utexas.edu See movies at: http://nn.cs.utexas.edu/?fitness-shaping
TUG Details • Persistence: • Recency-weighted average surpasses goal • Goals: • Initial values based on domain knowledge • Or simply the minimal values for objectives • Increase each goal when all are achieved • Objectives reactivated when no longer achieved [Figure label: Goal achieved]
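The bookkeeping described above can be sketched as a per-generation update. The step size `alpha`, the fixed goal increment `step`, and resetting the average to a `floor` value when goals rise are assumptions chosen for illustration; the slides state only that goals increase and that the recency-weighted average is reset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Objective:
    goal: float   # current goal value
    floor: float  # assumed minimal score, used as the starting and reset average
    step: float   # hypothetical amount added to the goal when all are achieved
    average: float = field(init=False)

    def __post_init__(self) -> None:
        self.average = self.floor

    def observe(self, mean_fitness: float, alpha: float) -> None:
        """Recency-weighted average of the population's mean fitness."""
        self.average += alpha * (mean_fitness - self.average)

    @property
    def achieved(self) -> bool:
        return self.average > self.goal


def tug_update(objs: List[Objective], means: List[float], alpha: float = 0.5) -> List[bool]:
    """Update averages, raise goals if all are achieved, return the active mask."""
    for obj, m in zip(objs, means):
        obj.observe(m, alpha)
    if all(o.achieved for o in objs):
        for o in objs:
            o.goal += o.step
            o.average = o.floor  # reset so the new goal must be earned again
    return [not o.achieved for o in objs]


objs = [Objective(goal=5.0, floor=0.0, step=5.0), Objective(goal=5.0, floor=0.0, step=5.0)]
mask_gen1 = tug_update(objs, [20.0, 2.0])   # first objective is "easy" and is switched off
mask_gen2 = tug_update(objs, [20.0, 12.0])  # both achieved, so goals rise and both reactivate
```

Only objectives marked `True` in the mask would be passed to selection in that generation. Because the mask is recomputed each generation, an objective whose average falls back below its goal becomes active again.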