Evolving Agent Behavior in Multiobjective Domains Using Fitness-Based Shaping

Jacob Schrum and Risto Miikkulainen • University of Texas at Austin • Department of Computer Science

Typical Uses of MOEAs • Where have MOEAs proven themselves? • Wireless Sensor Networks (Woehrle et al., 2010) • Groundwater Management (Siegfried et al., 2009) • Hydrologic model calibration (Tang et al., 2006) • Epoxy polymerization (Deb et al., 2004) • Voltage-controlled oscillator design (Chu et al., 2004) • Multi-spindle gear-box design (Deb & Jain, 2003) • Foundry casting scheduling (Deb & Reddy, 2001) • Multipoint airfoil design (Poloni & Pediroda, 1997) • Design of aerodynamic compressor blades (Obayashi, 1997) • Electromagnetic system design (Michielssen & Weile, 1995) • Microprocessor design (Stanley & Mudge, 1995) • Design of laminated ceramic composites (Belegundu et al., 1994) • Many engineering/design problems!

New Domains for MOEAs • Simulated agents often face multiple objectives • Automatic discovery of intelligent behavior • Video game opponents in Unreal Tournament (van Hoorn, 2009) • Predator/prey scenarios (Schrum & Miikkulainen, 2009) • Race car driving in TORCS (Agapitos et al., 2008) • Comparatively little so far • Direct application of MOEA seldom successful • Success often depends on “shaping”

What is Shaping? • Term from Behavioral Psychology • Identified by B. F. Skinner (1938) • Task-Based Example: Train rat to press lever • First reward proximity • Then any interaction with lever • Then actual pressing of lever

Evolutionary Shaping • Environment changes, making task harder • Evolution shapes behavior across generations • Example: Migration given continental drift [1] • Animals become accustomed to short migration • Continental drift increases distance of migration • Ability to travel increasing distances required • EC models with incremental evolution (e.g. [2]) (Figure: Arctic tern; Atlantic salmon) [1] B. F. Skinner. The shaping of phylogenic behavior. Experimental Analysis of Behavior. 1975. [2] Schrum and Miikkulainen. Constructing Complex NPC Behavior via Multiobjective Neuroevolution. 2008.

Fitness-Based Shaping • Not extensively used • Little/no domain knowledge needed • Multiobjective approach a good fit • Selection criteria change • Exploiting ignored objectives (TUG) • Exploiting unfilled niches (BD) (Figure, objective space: a dominated point that exploits a mostly ignored objective) (Figure, behavior space: crowded vs. uncrowded niches)

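Multiobjective Optimization • Pareto dominance: $\vec{v}$ dominates $\vec{u}$ iff $\forall i: v_i \geq u_i$ and $\exists i: v_i > u_i$ • Assumes maximization • Want nondominated points • NSGA-II used in this work • What to evolve? • NNs as control policies (Figure: nondominated points)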
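As a minimal sketch of the dominance test under maximization: one point dominates another if it is at least as good in every objective and strictly better in at least one. The function names `dominates` and `nondominated` are illustrative, and the quadratic-time filter below is not NSGA-II's sorting procedure, only the definition applied directly.

```python
from typing import List, Sequence


def dominates(v: Sequence[float], u: Sequence[float]) -> bool:
    """True if v Pareto-dominates u, assuming every objective is maximized."""
    at_least_as_good = all(a >= b for a, b in zip(v, u))
    strictly_better = any(a > b for a, b in zip(v, u))
    return at_least_as_good and strictly_better


def nondominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


scores = [(1, 3), (2, 2), (1, 1), (3, 1)]
front = nondominated(scores)  # (1, 1) is dominated by (2, 2) and drops out
```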

Constructive Neuroevolution • Genetic Algorithms + Neural Networks • Build structure incrementally (complexification) • Good at generating control policies • Three basic mutations (no crossover used): perturb weight, add connection, add node
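The three mutations can be sketched on a toy genome. Everything here is an assumption made for illustration: the `Genome` layout, the Gaussian weight noise, and especially the way `add_node` splits an existing link (a convention borrowed from NEAT-style systems); the slides name the mutations but do not specify how they are implemented.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple

Link = Tuple[int, int, float]  # (source node, target node, weight)


@dataclass
class Genome:
    """Hypothetical network encoding: a node count plus weighted links."""
    num_nodes: int
    links: List[Link] = field(default_factory=list)


def perturb_weight(g: Genome, rng: random.Random, sigma: float = 0.5) -> None:
    """Add Gaussian noise to the weight of one randomly chosen link."""
    if g.links:
        i = rng.randrange(len(g.links))
        s, t, w = g.links[i]
        g.links[i] = (s, t, w + rng.gauss(0.0, sigma))


def add_connection(g: Genome, rng: random.Random) -> None:
    """Link two randomly chosen nodes (duplicates and loops are not filtered)."""
    s, t = rng.randrange(g.num_nodes), rng.randrange(g.num_nodes)
    g.links.append((s, t, rng.gauss(0.0, 1.0)))


def add_node(g: Genome, rng: random.Random) -> None:
    """Split a random link s->t into s->new->t (assumed NEAT-style convention)."""
    if g.links:
        s, t, w = g.links.pop(rng.randrange(len(g.links)))
        new = g.num_nodes
        g.num_nodes += 1
        g.links += [(s, new, 1.0), (new, t, w)]
```

Each call grows or adjusts the network in place, so repeated mutation across generations gives the incremental complexification the slide describes.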

Targeting Unachieved Goals • Main ideas: • Temporarily deactivate “easy” objectives • Focus on “hard” objectives • “Hard” and “easy” defined in terms of goal values • Easy: average fitness “persists” above goal (achieved) • Hard: goal not yet achieved • Objectives reactivated when no longer achieved • Increase goal values when all achieved (Figure labels: Evolution; Hard Objectives)

TUG Example (Figure annotations: noisy evaluations; goal achieved; reset recency-weighted average; other goals also achieved → goals increase)

Behavioral Diversity • Originally developed for single-objective tasks [3] • Add behavioral diversity objective • Encourage exploration of new behaviors • Domain-specific behavior measure required • Extensions in this work: • Multiobjective task • Domain independent method • Only requires policy mapping $\mathbb{R}^N$ to $\mathbb{R}^M$ (N senses to M actions), e.g. NNs [3] J.-B. Mouret and S. Doncieux. Using behavioral exploration objectives to solve deceptive problems in neuro-evolution. 2009.

Behavioral Diversity Details • Behavior vector: • Given input vectors, concatenate outputs • Behavioral diversity objective: • AVG distance from other behavior vectors (Figure: example behavior vectors of output values; a point with high average distance from the other points)
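A minimal sketch of the two steps above: build each policy's behavior vector by concatenating its outputs over a shared set of input vectors, then score each policy by its average distance to the others. The Euclidean metric and the names `behavior_vector` and `diversity_scores` are assumptions; the slides say only "AVG distance".

```python
import math
from typing import Callable, List, Sequence

Policy = Callable[[Sequence[float]], Sequence[float]]  # maps N senses to M actions


def behavior_vector(policy: Policy, inputs: List[Sequence[float]]) -> List[float]:
    """Concatenate the policy's outputs for every input vector, in order."""
    vec: List[float] = []
    for x in inputs:
        vec.extend(policy(x))
    return vec


def diversity_scores(vectors: List[Sequence[float]]) -> List[float]:
    """Average distance (Euclidean, by assumption) from each vector to all others."""
    scores = []
    for i, v in enumerate(vectors):
        dists = [math.dist(v, w) for j, w in enumerate(vectors) if j != i]
        scores.append(sum(dists) / len(dists) if dists else 0.0)
    return scores


# Two toy "policies" evaluated on the same inputs (stand-ins for evolved NNs).
inputs = [(1.0, 2.0), (3.0, 1.0)]
policies = [lambda x: [x[0] + x[1]], lambda x: [x[0] - x[1]]]
vectors = [behavior_vector(p, inputs) for p in policies]
scores = diversity_scores(vectors)
```

The resulting scores would be added as one extra objective alongside the domain objectives, so a policy that behaves unlike the rest of the population gains a selection advantage.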

Battle Domain • Evolved monsters (blue) • Monsters can hurt fighter • Scripted fighter (green) • Bat can hurt monsters • Three objectives • Deal damage • Avoid damage • Stay alive • Previous work required incremental evolution to solve

Experimental Comparison • NN copied to 4 monsters • Homogeneous teams • In paper • Control: Plain NSGA-II • TUG: NSGA-II with TUG using expert initial goals • BD: NSGA-II with BD using random input vectors • Additional methods since publication • TUG-Low: NSGA-II with TUG using minimal initial goals • BD-Obs: NSGA-II with BD using inputs from evaluations • Each repeated 30 times

Attainment Surfaces [4] • Result attainment surface • Shows space dominated by single Pareto front • Summary attainment surface s • Union of space dominated in at least s out of n runs • Surface s weakly dominates s+1, etc. (Figure panels: Pareto fronts (approximation sets); result attainment surfaces; summary attainment surfaces. Labels: Surface 1, Surface 2, Surface 3; individual surfaces intersect) [4] J. Knowles. A summary-attainment surface plotting method for visualizing the performance of stochastic multiobjective optimizers. 2005.
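The definition above can be checked pointwise. In this sketch, assuming maximization, a run attains a point when some member of its Pareto front is at least as good in every objective, and the point lies inside summary attainment surface s when at least s of the n runs attain it. The function names are illustrative, and this is a membership test only, not the plotting method of [4].

```python
from typing import List, Sequence

Front = List[Sequence[float]]


def attained_by(front: Front, z: Sequence[float]) -> bool:
    """True if some point of the front weakly dominates z (maximization)."""
    return any(all(a >= b for a, b in zip(p, z)) for p in front)


def in_summary_region(fronts: List[Front], z: Sequence[float], s: int) -> bool:
    """True if z is attained in at least s of the runs."""
    return sum(attained_by(f, z) for f in fronts) >= s


# Three hypothetical runs, each reduced to its final Pareto front.
runs = [[(3, 1), (1, 3)], [(2, 2)], [(1, 1)]]
inside_best = in_summary_region(runs, (2, 2), 1)   # attained by one run
inside_worst = in_summary_region(runs, (2, 2), 3)  # not attained by all runs
```

Because any point attained by at least s+1 runs is also attained by at least s runs, the regions are nested, which is why surface s weakly dominates surface s+1.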

Final Summary Attainment Surfaces • Animation: worst to best summary attainment surface (Panels: Control, TUG, BD, TUG-Low, BD-Obs)

Hypervolume Metric [5] • Hypervolume of result attainment surface • Simply “volume” for 3 domain objectives • WRT reference point • Slightly less than minimum scores • Pareto-compliant metric (Figure: Hypervolume = A + B + C + D) [5] E. Zitzler and L. Thiele. Multiobjective optimization using evolutionary algorithms – a comparative case study. 1998.
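For intuition, here is a minimal sketch of the metric in two objectives, where the hypervolume is the area dominated by the front and bounded below by the reference point. The restriction to two objectives is an assumption made to keep the sweep short (the Battle Domain has three), and `hypervolume_2d` is an illustrative name.

```python
from typing import List, Sequence, Tuple


def hypervolume_2d(points: List[Sequence[float]], ref: Tuple[float, float]) -> float:
    """Area dominated by the points and bounded by ref (both objectives maximized)."""
    # Only points strictly better than the reference point contribute area.
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep from the largest first objective to the smallest.
    pts.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    best_y = ref[1]
    for x, y in pts:
        if y > best_y:  # dominated points add no new area
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


front = [(3, 1), (2, 2), (1, 3), (1, 1)]
area = hypervolume_2d(front, (0, 0))  # horizontal strips of area 3, 2 and 1
```

Each strip added in the loop corresponds to one of the rectangles summed in the slide's figure, and adding a dominated point such as (1, 1) leaves the total unchanged.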

Hypervolume

Successful Behaviors (Panels: BD, TUG, BD-Obs, TUG-Low)

Discussion • Control: more extreme trade-offs • BD: more precise timing • BD-Obs and BD similar • “Real” inputs give no advantage • TUG: more teamwork • Particular initial objectives • TUG-Low more like BD than TUG • ALL are better than Control

Future Work • How to combine TUG and BD • Naïve combination doesn’t work • Scaling up • Many objectives • More complex domains • Current work in Unreal Tournament promising

Conclusion • BD and TUG improve MO evolution • Domain independence! • Contrast to task-based shaping • Expand MOEAs to a new range of domains

Questions? Email: schrum2@cs.utexas.edu See movies at: http://nn.cs.utexas.edu/?fitness-shaping

TUG Details • Persistence: • Recency-weighted average surpasses goal • Goals: • Initial values based on domain knowledge • Or simply the minimal values for objectives • Increase each goal when all are achieved • Objectives reactivated when no longer achieved (Figure label: Goal achieved)
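One generation of this bookkeeping might look like the sketch below. The slides give the rules but not the constants, so the update rate `alpha`, the additive `goal_step`, and resetting each average to the objective's minimum are all assumptions, as are the names `Objective` and `tug_step`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Objective:
    goal: float          # current goal value
    minimum: float       # minimal possible score (assumed reset value)
    avg: float           # recency-weighted average of population performance
    active: bool = True  # only active objectives are used for selection


def tug_step(objs: List[Objective], pop_avg: List[float],
             alpha: float = 0.15, goal_step: float = 1.0) -> None:
    """Update averages, switch objectives on or off, and raise goals if all are met."""
    for obj, score in zip(objs, pop_avg):
        obj.avg += alpha * (score - obj.avg)  # recency-weighted average
        obj.active = obj.avg <= obj.goal      # achieved goals are deactivated
    if all(not obj.active for obj in objs):
        # Every goal achieved: raise goals, reset averages, reactivate everything.
        for obj in objs:
            obj.goal += goal_step
            obj.avg = obj.minimum
            obj.active = True


objs = [Objective(goal=1.0, minimum=0.0, avg=0.0),
        Objective(goal=5.0, minimum=0.0, avg=0.0)]
tug_step(objs, [10.0, 0.0], alpha=0.5)  # first goal achieved, second still hard
```

The selection step would then rank individuals using only the objectives whose `active` flag is set, which is how the easy objectives are temporarily ignored.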

TUG Cycles
