UCT for Tactical Assault Battles in Real-Time Strategy Games Radha-Krishna Balla 19 February, 2009
Overview Introduction Related Work Method Experiments & Results Conclusion
Introduction Related Work Method Experiments & Results Conclusion
Domain • RTS games • Resource Production • Tactical Planning • Tactical Assault battles
RTS game - Wargus Screenshot of a typical battle scenario in Wargus
Planning problem • Large state space • Temporal actions • Spatial reasoning • Concurrency • Stochastic actions • Changing goals
Related Work • Classic games – bridge, poker, Go, etc. • Monte Carlo simulations • RTS games • Resource Production • Means-ends analysis – Chan et al. • Tactical Planning • Monte Carlo simulations – Chung et al. • Nash strategies – Sailer et al. • Reinforcement learning – Wilson et al. • Bandit-based problems, Go • UCT – Kocsis et al., Gelly et al.
Our Approach • Monte Carlo simulations • UCT algorithm Advantages • Complex plans from simple abstract actions • Exploration/Exploitation tradeoff • Changing goals
Method • Planning architecture • UCT Algorithm • Search space formulation • Monte Carlo simulations • Challenges
Planning Architecture • Online Planner • State space abstraction • Grouping of units • Abstract actions • Join(G) • Attack(f,e)
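A minimal sketch of how the two abstract actions could be represented in code. The integer group ids and field names are illustrative assumptions, not the planner's actual data structures.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Join:
    """Join(G): merge the friendly unit groups in G into one group."""
    groups: FrozenSet[int]   # group ids (hypothetical)


@dataclass(frozen=True)
class Attack:
    """Attack(f, e): friendly group f attacks enemy group e."""
    friendly: int            # friendly group id (hypothetical)
    enemy: int               # enemy group id (hypothetical)


# Example abstract action set the planner could search over:
actions = [Join(frozenset({1, 2})), Attack(friendly=1, enemy=3)]
```

Frozen dataclasses make the actions hashable and comparable by value, which is convenient when they serve as keys (edges) in a search tree.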
UCT Algorithm • Exploration/Exploitation tradeoff • Monte Carlo simulation – get subsequent states • Search tree • Root node – current state • Edges – available actions • Intermediate nodes – subsequent states • Leaf nodes – terminal states • Rollout-based construction • Value estimates
UCT Algorithm – Pseudo Code 1
At each interesting time point in the game:
  build_UCT_tree(current state);
  choose the argmax action(s) based on the UCT policy;
  execute the aggregated actions in the actual game;
  wait until one of the actions gets executed;
build_UCT_tree(state):
  for each UCT pass do
    run UCT_rollout(state);
(continued on next slide)
UCT Algorithm – Pseudo Code 2
UCT_rollout(state): recursive algorithm
  if leaf node reached then
    estimate final reward;
    propagate reward up the tree and update value functions;
    return;
  populate possible actions;
  if all actions explored at least once then
    choose the action with the best value function;
  else
    choose an unexplored action by random sampling;
  run Monte-Carlo simulation to get the next state from the current state and action;
  call UCT_rollout(next state);
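The two pseudocode slides can be sketched as runnable Python. The toy MarchGame domain, the exploration constant C, and the rollout horizon below are illustrative stand-ins for the Wargus battle simulator and the planner's actual parameters.

```python
import math
import random

C = 1.4  # exploration constant in the UCB term (illustrative value)


class Node:
    """One search-tree node: visit counts and per-action value estimates."""
    def __init__(self, state):
        self.state = state
        self.n = 0        # n(s): visits to this node
        self.q = {}       # Q(s,a): mean rollout reward per action
        self.n_a = {}     # n(s,a): visits per action
        self.child = {}   # action -> child Node


def uct_rollout(node, game, depth=0):
    """One rollout of the pseudocode above: descend, expand, evaluate, update."""
    if game.terminal(node.state) or depth >= game.horizon:
        return game.reward(node.state)          # leaf: estimate final reward
    actions = game.actions(node.state)
    unexplored = [a for a in actions if a not in node.n_a]
    if unexplored:
        a = random.choice(unexplored)           # random sampling of an unexplored action
        node.q[a], node.n_a[a] = 0.0, 0
    else:
        # UCB1 selection: argmax of Q(s,a) + C * sqrt(ln n(s) / n(s,a))
        a = max(actions, key=lambda b: node.q[b] +
                C * math.sqrt(math.log(node.n) / node.n_a[b]))
    # Monte-Carlo simulation gives the next state (deterministic toy domain here)
    nxt = node.child.setdefault(a, Node(game.simulate(node.state, a)))
    r = uct_rollout(nxt, game, depth + 1)
    node.n += 1                                 # propagate reward up the tree,
    node.n_a[a] += 1                            # updating the running averages
    node.q[a] += (r - node.q[a]) / node.n_a[a]
    return r


class MarchGame:
    """Hypothetical toy stand-in for the battle simulator: advance (+1) or
    hold (0); reward 1 only if position 2 is reached within the horizon."""
    horizon = 2
    def actions(self, s): return [0, 1]
    def simulate(self, s, a): return s + a
    def terminal(self, s): return s >= 2
    def reward(self, s): return 1.0 if s >= 2 else 0.0


random.seed(0)
root, game = Node(0), MarchGame()
for _ in range(500):
    uct_rollout(root, game)
best = max(root.q, key=root.q.get)   # the planner would execute this action
```

Only advancing on both steps reaches position 2 within the horizon, so the rollouts drive Q(s, advance) above Q(s, hold) and the argmax picks action 1.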
UCT Algorithm – Formulae
Value update (running average of rollout rewards):
  Q(s,a) ← Q(s,a) + [R − Q(s,a)] / n(s,a)
Action selection (UCB1 rule):
  π(s) = argmax_a [ Q(s,a) + c · √( ln n(s) / n(s,a) ) ]
where n(s) and n(s,a) are visit counts and c is the exploration constant.
Domain-specific Challenges • State space abstraction – grouping of units (proximity-based) • Concurrency of actions – aggregation of actions • Join actions – simple • Attack actions – complex (partial simulations)
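The proximity-based grouping step above can be sketched as a single-linkage flood fill over unit positions. The 2-D coordinates and the distance threshold are illustrative assumptions, not the planner's actual values.

```python
import math

GROUP_RADIUS = 8.0  # map-distance threshold for "proximity" (illustrative)


def group_units(positions, radius=GROUP_RADIUS):
    """Partition unit positions into proximity groups (lists of unit indices).

    Two units share a group if they are linked by a chain of units in which
    each consecutive pair is within `radius` of each other (single linkage).
    """
    unassigned = set(range(len(positions)))
    groups = []
    while unassigned:
        seed = unassigned.pop()
        group, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unassigned
                    if math.dist(positions[i], positions[j]) <= radius]
            for j in near:
                unassigned.discard(j)   # claim neighbors for this group
            group.extend(near)
            frontier.extend(near)       # expand the group transitively
        groups.append(sorted(group))
    return groups


# Two tight clusters and one straggler far from everything:
pts = [(0, 0), (1, 0), (20, 20), (21, 20), (100, 0)]
gs = group_units(pts)
```

Each resulting group would then become one abstract unit for the Join/Attack actions, shrinking the search space dramatically compared to per-unit planning.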
Planning problem – revisited • Large state space – abstraction • Temporal actions – Monte Carlo simulations • Spatial reasoning – Monte Carlo simulations • Concurrency – aggregation of actions • Stochastic actions – UCT (online planning) • Changing goals – UCT (different objective functions)
Experiments Table 1: Details of the different game scenarios
Planners • UCT Planners • UCT(t) • UCT(hp) Number of rollouts – 5000 Averaged over – 5 runs
Planners • Baseline Planners • Random • Attack-Closest • Attack-Weakest • Stratagus-AI • Human
Video – Planning in action Simple scenario <add video> Complex scenario <add video>
Results Figure 1: Time results for UCT(t) and baselines.
Results Figure 2: Hit point results for UCT(t) and baselines.
Results Figure 3: Time results for UCT(hp) and baselines.
Results Figure 4: Hit point results for UCT(hp) and baselines.
Results – Comparison Time results and hit point results for UCT(t) vs. UCT(hp) Figures 1, 2, 3 & 4: Comparison between UCT(t) and UCT(hp) metrics
Results Figure 5: Time results for UCT(t) with varying rollouts.
Conclusion • Conclusion • Future Work • Engineering aspects • Machine Learning techniques • Beyond Tactical Assault