This PhD proposal explores the challenge of automatically discovering and evolving multimodal behavior in noisy, complex domains such as simulations, video games, and robotics. Previous approaches (design, value-function based, and evolutionary) each have limitations that need to be addressed. The proposal suggests evolving multimodal behavior with multiobjective neuroevolution, specifically the NSGA-II algorithm. A battle domain is proposed as a testbed for experimenting with evolving multimodal teamwork. Research questions include comparing NSGA-II with the weighted-sum method and investigating the evolution of homogeneous and heterogeneous teams.
Evolving Multimodal Behavior PhD Proposal Jacob Schrum 11/4/09
Introduction • Challenge: Discover behavior automatically • Simulations, video games, robotics • Why challenging? • Noisy sensors • Complex domains • Continuous states/actions • Multiple agents, teamwork • Multiple objectives • Multimodal behavior required (focus)
What is Multimodal Behavior? • Working definition: • Agent exhibits distinct kinds of actions under different circumstances • Examples: • Offensive & defensive modes in soccer • Search for weapons or opponents in video game • Animal with foraging & fleeing modes • Very important for teams • Roles correspond to modes • Example domains will involve teamwork
Previous Approaches • Design Approaches • Hand code in a structured manner • Value-Function Based Approaches • Learn the utility of actions (RL) • Evolutionary Approaches • Selectively search based on performance
Design • Subsumption Architecture (Brooks 1986) • Hierarchical design • Lower levels independent of higher levels • Built incrementally • Common in robotics • Hand coded
Value-Function Based (1/2) • MAXQ (Dietterich 1998) • Hand-designed hierarchy • TD learning at multiple levels • Reduce state space • Taxi domain • Still just a grid world • Discrete state & action
Value-Function Based (2/2) • Basis Behaviors (Matarić 1997) • Low-level behaviors pre-defined • Learn high-level control • Discrete state space • High-level features (conditions) • Reward shaping necessary • Applied to real robots • Too much expert knowledge
Evolutionary (1/2) • Layered Evolution (Togelius 2004) • Evolve components of subsumption architecture • Applied to: • EvoTanks (Thompson and Levine 2008) • Unreal Tournament 2004 (van Hoorn et al. 2009) • Must specify: • Hierarchy • Training tasks • Similar to Layered Learning (Stone 2000)
Evolutionary (2/2) • Neuro-Evolving Robotic Operatives (Stanley et al. 2005) • ML game • Train robot army • Many objectives • Weighted sum: z-scores method • User changes weights during training • Dynamic objective management • Leads to multimodal behavior
Multiple Objectives • Multimodal problems are typically multiobjective • Modes associated with objectives • Traditional: weighted sum (Cohon 1978) • Must tune the weights • Only one solution • Bad for non-convex surfaces • Need a better formalism (each weighted-sum solution corresponds to one specific set of weights, so non-convex regions of the trade-off surface cannot be captured)
Greatest Mass Sarsa (Sprague and Ballard 2003) • Multiple MDPs with shared action space • Learn via Sarsa(0) update rule: Q_i(s,a) ← Q_i(s,a) + α[r_i + γ Q_i(s′,a′) - Q_i(s,a)] • Best action maximizes the sum of component values: a* = argmax_a Σ_i Q_i(s,a) • Used in sidewalk navigation task • Like weighted sum
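A minimal tabular sketch of the Greatest Mass idea, assuming a toy discrete action set: each component MDP keeps its own Q-table over the shared actions, is updated with Sarsa(0), and the agent acts on the summed component values. The action names and learning constants here are illustrative, not from the original work.

```python
from collections import defaultdict

# Hypothetical shared action space for the sketch.
ACTIONS = ["left", "right", "forward"]

def make_q():
    """A Q-table mapping (state, action) pairs to values, default 0."""
    return defaultdict(float)

def greatest_mass_action(q_tables, state):
    """Pick argmax_a sum_i Q_i(state, a) across the component Q-tables."""
    return max(ACTIONS, key=lambda a: sum(q[(state, a)] for q in q_tables))

def sarsa_update(q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """Tabular Sarsa(0): Q(s,a) += alpha * (r + gamma*Q(s',a') - Q(s,a))."""
    q[(s, a)] += alpha * (r + gamma * q[(s2, a2)] - q[(s, a)])
```

Summing component values before the argmax is what makes this behave like a weighted sum with equal weights, which is the limitation the slide points out.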
Convex Hull Iteration (Barrett and Narayanan 2008) • Changes MDP formalism: vector reward r = (r_1, …, r_n) • Find solutions for all possible weightings w where w_i ≥ 0 and Σ_i w_i = 1 • Maximize: the weighted value w · V^π • Results in compact set of solutions • Different trade-offs • Cannot capture non-convex surfaces • Discrete states/actions only • Need a way to capture non-convex surfaces!
Pareto-based Multiobjective Optimization (Pareto 1890) • Imagine game with two objectives: Damage Dealt and Health Remaining • v dominates u iff v_i ≥ u_i in every objective i and v_i > u_i in at least one • The points not dominated by any other are the best: the Pareto front (it contains points with high health but little damage dealt, points trading off both objectives, and points that dealt a lot of damage but lost a lot of health)
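The dominance definition above translates directly into code. A minimal sketch for maximization objectives (e.g. vectors of [damage dealt, health remaining]):

```python
def dominates(v, u):
    """v dominates u iff v >= u in every objective and > in at least one."""
    return all(a >= b for a, b in zip(v, u)) and any(a > b for a, b in zip(v, u))

def pareto_front(points):
    """Keep exactly the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Note that dominance is a partial order: two points can each be better in a different objective, in which case both survive on the front.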
Non-dominated Sorting Genetic Algorithm II (Deb et al. 2000) • Population P with size N; Evaluate P • Use mutation to get P′ of size N; Evaluate P′ • Calculate non-dominated fronts of {P ∪ P′}, size 2N • New population of size N from highest fronts of {P ∪ P′}
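The slide's four steps can be sketched as one NSGA-II generation over fitness vectors: merge parents and mutants, sort into non-dominated fronts, and refill the population from the best fronts. This is a simplified sketch; the crowding-distance tie-break NSGA-II uses on the last, partially admitted front is omitted for brevity.

```python
def dominates(v, u):
    """Pareto dominance for maximization objectives."""
    return all(a >= b for a, b in zip(v, u)) and any(a > b for a, b in zip(v, u))

def nondominated_fronts(scores):
    """Partition indices into successive non-dominated fronts."""
    remaining = list(range(len(scores)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(scores[j], scores[i]) for j in remaining)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

def next_population(merged_scores, n):
    """Select n indices from the merged 2N scores, best fronts first."""
    chosen = []
    for front in nondominated_fronts(merged_scores):
        for i in front:
            if len(chosen) < n:
                chosen.append(i)
    return chosen
```

The repeated scan makes this O(n^2) per front, which is fine for a sketch; Deb et al.'s fast non-dominated sort achieves the same result more efficiently.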
Constructive Neuroevolution • Genetic Algorithms + Neural Networks • Build structure incrementally • Good at generating control policies • Three basic mutations (no crossover used): Perturb Weight, Add Connection, Add Node • Other structural mutations possible (more later)
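The three basic mutations can be sketched on a hypothetical genome representation (a node list plus a dict mapping (src, dst) links to weights); the representation and constants below are illustrative, not the proposal's actual encoding.

```python
import random

def perturb_weight(genome, sigma=0.5):
    """Add Gaussian noise to one existing link weight."""
    link = random.choice(list(genome["links"]))
    genome["links"][link] += random.gauss(0.0, sigma)

def add_connection(genome):
    """Create a new link between two (possibly equal) nodes."""
    src = random.choice(genome["nodes"])
    dst = random.choice(genome["nodes"])
    genome["links"].setdefault((src, dst), random.uniform(-1.0, 1.0))

def add_node(genome):
    """Splice a new node into an existing link, NEAT-style: the incoming
    link gets weight 1 and the outgoing link keeps the old weight, so the
    mutated network initially behaves much like its parent."""
    src, dst = random.choice(list(genome["links"]))
    w = genome["links"].pop((src, dst))
    new = max(genome["nodes"]) + 1
    genome["nodes"].append(new)
    genome["links"][(src, new)] = 1.0
    genome["links"][(new, dst)] = w
```

Starting from minimal networks and only adding structure when mutation favors it is what "build structure incrementally" refers to.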
Evolution of Teamwork • Homogeneous • Shared policy • Individuals know how teammates act • Individuals fill roles as needed: multimodal • Heterogeneous • Different roles • Cooperation harder to evolve • Team-level multimodal behavior
Completed Work • Benefits of Multiobjective Neuroevolution • Pareto-based leads to multimodal behavior • Targeting Unachieved Goals (TUG) • Speed up evolution with objective management • Evolving Multiple Output Modes • Allow networks to have multiple policies/modes • Need a domain to experiment in …
Battle Domain • Evolved monsters (yellow) • Scripted fighter (green) • Approach nearest monster • Swing bat repeatedly • Monsters can hurt fighter • Bat can hurt monsters • Multiple objectives • Deal damage • Avoid damage • Stay alive • Can multimodal teamwork evolve?
Benefits of Multiobjective Neuroevolution • Research Questions: • NSGA-II better than z-scores (weighted sum)? • Homogeneous or heterogeneous teams better? • 30 trials for each combination • Three evaluations per individual • Average scores to overcome noisy evals
Incremental Evolution • Hard to evolve against scripted strategy • Could easily fail to evolve interesting behavior • Incremental evolution against increasing speeds • 0%, 40%, 80%, 90%, 95%, 100% • Increase speed when all goals are met • End when goals met at 100%
Goals • Average population performance high enough? • Then increase speed • Each objective has a goal: • At least 50 damage to bot (1 kill) • Less than 20 damage per monster on average (2 hits) • Survive at least 540 time steps (90% of trial) • When the average population score meets the goal value in every objective, the goal is achieved
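The goal check driving incremental evolution can be sketched as follows, using the speed schedule and goal values from the slides. The dictionary keys are hypothetical names, and damage received is stored negated so that every objective is maximized.

```python
# Goal values from the slides; damage_received is negated (maximize -damage).
GOALS = {"damage_dealt": 50, "damage_received": -20, "time_alive": 540}
SPEEDS = [0.0, 0.4, 0.8, 0.9, 0.95, 1.0]

def goals_met(avg_scores):
    """True when the population average meets every objective's goal."""
    return all(avg_scores[k] >= v for k, v in GOALS.items())

def next_speed(current, avg_scores):
    """Advance to the next fighter speed only when all goals are met."""
    i = SPEEDS.index(current)
    if goals_met(avg_scores) and i + 1 < len(SPEEDS):
        return SPEEDS[i + 1]
    return current
```

Evolution ends once the goals are met at full (100%) fighter speed, the last entry of the schedule.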
Evolved Behaviors • Baiting + Side-Swiping • Lure fighter • Turns allow team to catch up • Attacks on left side of fighter • Taking Turns • Hit and run • Next counter-clockwise monster rushes in • Fighter hit on left side • Multimodal behaviors!
Multiobjective Conclusions • NSGA-II faster than z-scores • NSGA-II more likely to generate multimodal behavior • Many runs were slow or did not finish • Several “successful” runs did not exhibit multimodal behavior
Targeting Unachieved Goals • Research Question: • How can evolution be made faster and more reliable? • When an objective’s goal is met, stop using it • Restore the objective if scores drop below its goal • Focuses evolution on the toughest objectives • Combine NSGA-II with TUG
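The TUG toggling rule above amounts to filtering the objective set before each selection step. A minimal sketch, with hypothetical objective names:

```python
def active_objectives(objectives, avg_scores, goals):
    """Objectives whose goals are unmet stay active; achieved objectives
    are dropped, and automatically return if average scores fall back
    below their goal values."""
    active = {o for o in objectives if avg_scores[o] < goals[o]}
    # If every goal is achieved, keep all objectives so selection
    # still has something to work with.
    return active or set(objectives)
```

Selection (e.g. NSGA-II) then runs only on the active objectives, which is why effort is not wasted on goals that are already achieved.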
Evolved Behaviors • Alternating Baiting • Bait until another monster hits • Then baiting monster attacks • Fighter knocked back and forth • Synchronized Formation • Move as a group • Fighter chases one bait • Other monster rushes in with side swipe attacks • More multimodal behaviors!
TUG Conclusions • TUG results in huge speed-up • No wasted effort on achieved goals • TUG runs finish more reliably • Heterogeneous runs have more multimodal behavior than homogeneous • Some runs still did not finish • Some “successful” runs still did not have multimodal behavior
Fight or Flight • Separate Fight and Flight trials • Fight = Battle Domain • Flight: • Scripted prey (red) instead of fighter • Has no bat; has to escape • Monsters confine and damage • New objective: Deal damage in Flight • Flight task requires teamwork • Requires multimodal behavior
New-Mode Mutation • Encourage multimodal behavior • New mode with inputs from preexisting mode • Initially very similar • Maximum preference node determines mode
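Both parts of the slide can be sketched in a few lines, assuming a hypothetical representation where each mode is a parameter dict: arbitration picks the mode whose preference-node activation is largest, and the mutation copies an existing mode so the new one starts out behaviorally similar.

```python
import copy
import random

def choose_mode(preference_outputs):
    """Index of the mode whose preference node is most active."""
    return max(range(len(preference_outputs)),
               key=preference_outputs.__getitem__)

def new_mode_mutation(modes):
    """Append a new mode cloned from a random preexisting mode, so it is
    initially very similar and can then diverge under further mutation."""
    modes.append(copy.deepcopy(random.choice(modes)))
```

The cloning step is this sketch's stand-in for "inputs from preexisting mode"; the proposal's actual mutation wires the new mode's connections from an existing one rather than copying parameters wholesale.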
Evolving Multiple Output Modes • Research Question: • How to evolve teams that do well in both tasks • Compare 1Mode to ModeMutation • Three evals in Fight and three in Flight • Same networks for two different tasks
1Mode Behaviors • Aggressive + Corralling • Aggressive in Fight task • Take lots of damage • Deal lots of damage • Corralling in Flight task • Run/Rush + Crowding • Run/Rush in Fight task • Good timing on attack • Kill fighter w/o taking too much damage • Crowding in Flight task • Get too close to prey • Knock prey out and it escapes • Networks can’t handle both tasks!
ModeMutation Behaviors • Alternating Baiting + Corralling • Alternating baiting in Fight task • Corralling in Flight task • Spread out to prevent escape • Individuals rush in to attack • Hit into Crowd + Crowding • Hitting into Crowd in Fight task • One attacker knocks fighter into others • Crowding in Flight task • Rush prey, ricochet back and forth • Sometimes knocks prey free • Networks succeed at both tasks!
Mode Mutation Conclusions • ModeMutation slower than 1Mode • ModeMutation better at producing multimodal behaviors • Harder task resulted in more failed runs • Many unused output modes created • Slows down execution • Bloats output layer
Proposed Work • Extensions • Avoiding Stagnation by Promoting Diversity • Extending Evolution of Multiple Output Modes • Heterogeneous Teams Using Subpopulations • Open-Ended Evolution + TUG • Evaluate in new tasks • Killer App: Unreal Tournament 2004
1. Avoiding Stagnation by Promoting Diversity • Behavioral diversity avoids stagnation • Add a diversity objective (Mouret et al. 2009) • Behavior vector: • Given a set of input vectors, concatenate the resulting outputs • Diversity objective: • AVG distance from other behavior vectors in pop.
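The diversity objective above can be sketched directly: each individual's behavior vector is its concatenated outputs on a fixed set of probe inputs, and its diversity score is the average distance to every other behavior vector in the population. Euclidean distance is an assumption here; any behavior metric would do.

```python
def behavior_distance(v, u):
    """Euclidean distance between two behavior vectors."""
    return sum((a - b) ** 2 for a, b in zip(v, u)) ** 0.5

def diversity_scores(behavior_vectors):
    """Each individual's average distance to the rest of the population;
    higher means more behaviorally novel."""
    n = len(behavior_vectors)
    return [sum(behavior_distance(v, u)
                for u in behavior_vectors if u is not v) / (n - 1)
            for v in behavior_vectors]
```

Treating this score as one more objective in NSGA-II rewards novel behavior even when the task objectives have plateaued, which is how it counters stagnation.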
2. Extending Evolution of Multiple Output Modes • Encourage mode differences • Random input sources • Probabilistic arbitration • Bad modes less likely to persist • Like softmax action selection • Restrict New-Mode Mutation • New objective: punish unused modes, reward used modes • Delete similar modes • Based on behavior metric • Limit modes: make best use of limited resources • Dynamically increase the limit?
3. Heterogeneous Teams Using Subpopulations • Each team member comes from a different subpopulation (Yong 2007) • Encourages division of labor across teammates • Different roles lead to multimodal team behavior
4. Open-Ended Evolution + TUG • Keep increasing goals • Evolution has something to strive towards • Preserves benefits of TUG • Does not settle early • When to increase goals? • When all goals are achieved • As individual goals are achieved
New Tasks • More tasks require more modes • Investigate single-agent tasks • Only teams so far • Investigate complementary objectives • Does TUG help only with contradictory objectives? • Complementary objectives are hard when combined with others • Tasks: • Predator • Opposite of Flight • Partial observability • Sink the Ball • Very different from previous tasks • Needs more distinct modes? • Less mode sharing?
Unreal Tournament 2004 • Commercial First-Person Shooter (FPS) • Challenging domain • Continuous state and action • Multiobjective • Partial information • Multimodal behaviors required • Programming API: Pogamut • Competitions: • Botprize • Deathmatch
Unreal Deathmatch • Packaged bots are hand-coded • Previous Botprize winners were hand-coded • Learning attempts: • Simplified version of the game (van Hoorn et al. 2009) • Limited to certain behaviors (Kadlec 2008) • Multimodal behavior in the full game: not done yet
Unreal Teams • Team Deathmatch • Largely ignored? • Capture the Flag • Teams protect own flag • Bring enemy flag to base • GP approach could not beat UT bots (Kadlec 2008) • Domination • King of the hill • Teams defend key locations • RL approach learned group strategy of hand-coded bots (Smith et al. 2007)
Review • System for developing multimodal behavior • Multiobjective Evolution • Targeting Unachieved Goals • New-Mode Mutation • Behavioral Diversity • Extending Mode Mutation • Subpopulations • Open-Ended Evolution • Final evaluation in Unreal Tournament 2004
Conclusion • Create a system that: • Automatically discovers multimodal behavior • Needs no high-level hierarchy • Needs no low-level behaviors • Works in continuous, noisy environments • Discovers team behavior as well • Agents with an array of different useful behaviors • Leads to better agents and behaviors in simulations, games, and robotics