Our contribution: online learning of sub-symbolic game AI for team-based first-person shooter games.
1. RETALIATE: Learning Winning Policies in First-Person Shooter Games Dept. of Computer Science & Engineering
Lehigh University
2. Outline Introduction
Adaptive Game AI
Domination games in Unreal Tournament
Reinforcement Learning
Adaptive Game AI with Reinforcement Learning
RETALIATE architecture and algorithm
Empirical Evaluation
Final Remarks Main Lessons
3. Introduction Adaptive Game AI, Unreal Tournament, Reinforcement Learning
4. Adaptive AI in Games Symbolic: compositional syntax where atoms have meaning in themselves (e.g., FOL or another interpreted logical theory)
Sub-symbolic: lacks atomic elements that are themselves meaningful representations (e.g., pixels)
GAs (genetic algorithms) are a search technique, not a learning technique
5. Adaptive Game AI and Learning Motivation for learning:
Combinatorial explosion of possible situations
Tactics (e.g., competing teams tactics)
Game worlds (e.g., map where the game is played)
Game modes (e.g., domination, capture the flag)
Little time for development
Cons of learning:
Difficult to control and predict Game AI
Difficult to test
6. Reinforcement Learning Agents learn policies through rewards and punishments
Policy - Determines what action to take from a given state (or situation)
The agent's goal is to maximize some reward
Tabular vs. Generalization Techniques
We maintain a Q-Table:
Q-table: State x Action -> value
Supervised learning -> labeled training data; access to the correct output
Curse of dimensionality
Tabular methods are limited to small numbers of states and actions; the constraint is not just memory, but the time and data needed to fill the table accurately
Generalization uses a limited subset of the state space to produce a good approximation over a much larger subset
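As an illustration of the tabular representation, here is a minimal Python sketch of a Q-table mapping (state, action) pairs to values, with epsilon-greedy action selection; the state encoding (owner of each domination location), the reduced action list, and the epsilon value are illustrative assumptions rather than the exact RETALIATE settings:

```python
import random
from collections import defaultdict

# Minimal Q-table sketch: State x Action -> value, stored in a dictionary.
# State encoding and actions below are illustrative assumptions, not
# RETALIATE's exact settings.
ACTIONS = [
    ("L1", "L1", "L1"),  # send all three bots to location L1
    ("L1", "L2", "L3"),  # spread the bots across the three locations
    ("L2", "L2", "L3"),  # ... (the full joint-action set has 27 entries)
]
q_table = defaultdict(float)  # unseen (state, action) pairs default to 0.0

def choose_action(state, epsilon=0.1):
    """Epsilon-greedy selection over the tabular Q-values."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

state = ("E", "F", "N")   # owner of each location: Enemy, Friendly, Neutral
print(choose_action(state))
```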
7. Unreal Tournament (UT) Online FPS developed by Epic Games Inc. 1999
Six gameplay modes including team deathmatch and domination games
Gamebots: a client-server architecture for controlling bots, started by the University of Southern California's Information Sciences Institute (ISI)
UT was a direct competitor to Quake III Arena
8. UT Domination Games A number of fixed domination locations.
Ownership: a location belongs to the team of the last player to step into it
Scoring: a team is awarded one point for every five seconds a location remains under its control
Winning: the first team to reach a pre-determined score (50)
(Figure: top-down view of the map)
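To make the scoring rules concrete, here is a minimal Python sketch of how domination points accumulate under the rules above (five-second ticks, target score of 50); the data layout and function name are illustrative assumptions:

```python
# Sketch of UT domination scoring, assuming the rules above: each controlled
# location earns its owning team 1 point every 5 seconds, and the first team
# to reach 50 points wins.
TICK_SECONDS = 5
TARGET_SCORE = 50

def run_domination(ownership_log):
    """ownership_log: one dict per 5-second tick, mapping location -> owning
    team ('A', 'B', or None). Returns (winner, elapsed seconds, scores)."""
    scores = {"A": 0, "B": 0}
    for tick, owners in enumerate(ownership_log):
        for location, team in owners.items():
            if team is not None:
                scores[team] += 1  # one point per controlled location per tick
        for team, score in scores.items():
            if score >= TARGET_SCORE:
                return team, (tick + 1) * TICK_SECONDS, scores
    return None, len(ownership_log) * TICK_SECONDS, scores

# Example: team A holds two of the three locations on every tick.
log = [{"L1": "A", "L2": "A", "L3": "B"}] * 30
print(run_domination(log))  # ('A', 125, {'A': 50, 'B': 25})
```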
9. Adaptive Game AI with RL RETALIATE (Reinforced Tactic Learning in Agent-Team Environments)
Tactic versus strategy?
10. The RETALIATE Team Controls two or more UT bots
Commands bots to execute actions through the GameBots API
The UT server provides sensory (state and event) information about the UT world and controls all gameplay
Gamebots acts as middleware between the UT server and the game AI
Emphasize that the bots are plug-ins: we learn team strategies, not individual bot tactics.
11. The RETALIATE Algorithm
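The algorithm itself is presented as a figure on this slide. As a rough stand-in, here is a minimal Python sketch of a tabular Q-learning control loop of the kind the surrounding slides describe (team state in, joint bot-to-location action out, non-discounted update). The env wrapper, its method names, the step size, and the exploration rate are hypothetical placeholders, not the GameBots API or the published RETALIATE pseudocode:

```python
import random

def retaliate_like_loop(env, q_table, actions, alpha=0.2, epsilon=0.1):
    """Sketch of a Q-learning control loop over team states and joint actions.
    `env` is a hypothetical wrapper around the game interface; `q_table` is a
    mapping with default value 0.0 (e.g., collections.defaultdict(float))."""
    state = env.observe()                          # e.g., owner of each domination location
    while not env.game_over():
        if random.random() < epsilon:              # epsilon-greedy joint-action selection
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: q_table[(state, a)])
        env.send_team_orders(action)               # assign each bot a location
        reward, next_state = env.step()            # e.g., change in the score difference
        td_target = reward + max(q_table[(next_state, a)] for a in actions)  # gamma = 1
        q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
        state = next_state
```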
12. Initialization
13. Rewards and Utilities
14. Rewards and Utilities
But this will not necessarily converge to an optimal policy
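For reference, the standard tabular Q-learning update written with a non-discounted return (gamma = 1, which the final remarks say works well in this domain) is shown below; the step size alpha is left symbolic, since the slides do not state its value:

\[ Q(s,a) \leftarrow Q(s,a) + \alpha \left[ R + \max_{a'} Q(s',a') - Q(s,a) \right] \]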
15. State Information and Actions Curse of dimensionality
16. Managing (State x Action) Growth Our table:
States: ({E,F,N}, {E,F,N}, {E,F,N}) = 3^3 = 27
Actions: ({L1, L2, L3} per bot) = 3^3 = 27
27 x 27 = 729
Generally, 3^#loc x #loc^#bot
Adding health, discretized (high, med, low):
States: ({E,F,N}, {E,F,N}, {E,F,N}, {h,m,l}) = 27 x 3 = 81
Actions: ({L1, L2, L3, Health} per bot) = 4^3 = 64
81 x 64 = 5184
Generally, 3^(#loc+1) x (#loc+1)^#bot
The number of locations and the size of the team frequently vary.
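As a quick sanity check on these counts, here is a small Python sketch of the growth formulas above (the function name is illustrative):

```python
# State/action table size for the domination task, per the formulas on this slide.
def table_size(num_locations, num_bots, with_health=False):
    """States: 3^#loc (times 3 with a discretized health variable).
    Actions: (#loc [+1 for the Health action])^#bot joint assignments."""
    loc_choices = num_locations + (1 if with_health else 0)
    states = 3 ** num_locations * (3 if with_health else 1)
    actions = loc_choices ** num_bots
    return states, actions, states * actions

print(table_size(3, 3))                    # (27, 27, 729)
print(table_size(3, 3, with_health=True))  # (81, 64, 5184)
```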
17. Empirical Evaluation Opponents, Performance Curves, Videos
18. The Competitors
HTNBots is previous work that beat the other three opponents. We did not modify the HTN bots, and their knowledge base was successful. Therefore HTNBots is not a straw man for proving RETALIATE; it is a good opponent that is currently freely available (since 2005).
19. Summary of Results
Against the opportunistic, possessive, and greedy control strategies, RETALIATE won all 3 games in the tournament.
Within the first half of the first game, RETALIATE developed a competitive strategy.
Possibly get rid of OLD and replace it with the graphs depicting performance against the static opponents.
Make a point to say that HTNBots won all games against the non-RL opponents.
Maybe include the graph that cycles through opponents with a retained Q-table, because it motivates that the strategy needs to remain dynamic!
20. Summary of Results: HTNBots vs. RETALIATE (Round 1)
Epsilon isn't changing, so we can't talk about exploring vs. exploiting; what I really mean is how well the exploitation is working.
21. Summary of Results: HTNBots vs. RETALIATE (Round 2)
Same caveat about exploit vs. explore; be careful.
22. Video: Initial Policy
Add a caption on the left: what do red and blue mean? Draw a wedge of a circle; explain that the apex is the center and the rest is the angle of view.
23. Video: Learned Policy
Keep the same key on this slide.
24. Final Remarks Lessons Learned, Future Work
25. Final Remarks (1) From our work with RETALIATE we learned the following lessons, beneficial to any real-world application of RL for these kinds of games:
Separate individual bot behavior from team strategies.
Model the problem of learning team tactics through a simple state formulation.
The use of non-discounted rewards works well in this domain.
Need to say that we originally tried much more state information, but learned through experiments to keep the state information small.
Explain how we learned we should separate the strategy from the plug-in bots (i.e., state information again).
26. Final Remarks (2) It is very hard to predict all strategies beforehand
As a result, RETALIATE was able to find a weakness and exploit it to produce a winning strategy that HTNBots could not counter
On the other hand, HTNBots produced winning strategies against the other opponents from the beginning, while it took RETALIATE half a game in some situations.
Because tactics emerging from RETALIATE might be difficult to predict, a game developer will have a hard time maintaining the game AI.
Future Work: this suggests that a combination of HTN planning to lay down initial strategies and reinforcement learning to tune those strategies should address the individual weaknesses of both approaches.
27. Thank you! Questions?
28. REMEMBER Emphasis should be given to the main lessons: simple domain representation, gamma = 1, etc. I think we outlined a response to John and included some of this in the book chapter, so we should include it in the presentation.
There might be questions about other RL approaches and whether we tried them; we should think of an answer to that. Also a question about whether we tried gamma less than one (my recollection is that we did, and Megan reported that it converged too slowly toward a competitive policy).