Testbed for Integrating and Evaluating Learning Techniques. TIELT. David W. Aha 1 & Matthew Molineaux 2 1 Intelligent Decision Aids Group Navy Center for Applied Research in AI Naval Research Laboratory; Washington, DC 2 ITT Industries; AES Division; Alexandria, VA
Testbed for Integrating and Evaluating Learning Techniques (TIELT)
David W. Aha (1) & Matthew Molineaux (2)
(1) Intelligent Decision Aids Group, Navy Center for Applied Research in AI, Naval Research Laboratory; Washington, DC
(2) ITT Industries; AES Division; Alexandria, VA
first.surname@nrl.navy.mil
17 November 2004
Outline
• Motivation: Learning in cognitive systems
• Objectives:
  • Encourage machine learning research on complex tasks that require knowledge-intensive approaches
  • Provide industry & military with access to the results
• Design: TIELT functionality & components
• Example: Knowledge base content
• Status:
  • Implementation & documentation
  • Collaborations & events
  • Task list
• Summary
Thanks to our sponsor: [sponsor logo]
DARPA: Defense Advanced Research Projects Agency (~$2.3B/yr)
Offices: IPTO, IXO, MTO, …
IPTO: Information Processing Technology Office
• Selected previous achievements
  • Timesharing, Internet, Email, Speech Understanding, LISP, …
• Current focus: Cognitive Systems
Cognitive Systems
“Systems that know what they’re doing”
A cognitive system is one that
• can reason, using substantial amounts of appropriately represented knowledge
• can learn from its experience so that it performs better tomorrow than it did today
• can explain itself and be told what to do
• can be aware of its own capabilities and reflect on its own behavior
• can respond robustly to surprise
Anatomy of a Cognitive Agent (Brachman, 2003)
[Diagram: a cognitive agent with Reflective, Deliberative, and Reactive Processes. Labeled parts: LTM, STM, Concepts, Sentences, Learning, Other reasoning, Prediction/planning, Perception, Action, Communication (language, gesture, image), Attention, Affect. Sensors and Effectors connect the agent to the External Environment.]
Learning in Cognitive Systems (Langley & Laird, 2002)
Many opportunities exist for learning in cognitive systems
Status of Learning in Cognitive Systems
Problem: Few deployed cognitive systems integrate techniques that exhibit rapid & enduring learning behavior on complex tasks
Complication: Machine learning (ML) researchers tend to investigate:
• ¬Rapid: Knowledge-poor algorithms
• ¬Enduring: Learning over a short time period
• ¬Embedded: Stand-alone evaluations
It’s costly to integrate & evaluate embedded learning techniques
TIELT Motivation
• We want Cognitive Agents that Learn
  • rapidly,
  • in context, and
  • over the long term.
• We have few (if any) of them
TIELT Objective
Encourage research on learning in cognitive systems, with subsequent transition goals
[Diagram labels: ML Researchers, Learning Modules, Cognitive Agents, Cognitive Agents That Learn, Military, Industry]
Current ML Research Focus
Benchmark studies of multiple algorithms on simple (e.g., supervised) learning tasks from many static datasets
[Diagram: an ML Researcher applies ML System 1 … ML System n to Database 1 … Database m, producing m results per system; the results feed an Analysis / Benchmark Analysis step.]
This was encouraged (in part) by the availability of datasets in a standard (interface) format
Previous API for ML Investigations
Inspiration
• UC Irvine Repository of Machine Learning (ML) Databases
• An interface for empirical benchmarking studies on supervised learning
• 1525 citations (and many publications use it w/o citing) since 1986
Limitation
• Only useful for isolated ML studies
• Has not encouraged studies of ML in cognitive systems
[Diagram: Supervised Learning. Labels: ML System j, Reasoning System / Decision System k, Interface (standard format), Database i (e.g., UCI Repository).]
Accomplishing TIELT’s Objective
One approach: Shift ML research focus from static datasets to dynamic simulators of rich environments
[Diagram: two panels. Supervised Learning: ML System j, Decision System k, Interface (standard format), Database i (e.g., UCI Repository of ML Databases). Cognitive Learning: Decision System k containing Reasoning Modules and ML Modules (ML Module j), Sensors, Effectors, Interface (standard API) (e.g., TIELT), World i (Simulated/Real).]
Refining TIELT’s Objective
Objective
Develop a tool for evaluating decision systems in simulators
• Specific support for evaluating learning techniques
• Demonstrate research utility prior to approaching industry/military
Benefits
• Reduces system-simulator integration costs from m*n to m+n (see next)
• Permits benchmark studies on selected simulator tasks
• Encourages study of ML for knowledge-intensive problems
• Provides support for DARPA Challenge Problems on Cognitive Learning
Reducing Integration Costs
Integrating a simulator & cognitive system: It’s expensive! (time, $)
Problem: Prohibitive integration costs retard research progress
• m*n integrations
[Diagram: each of Simulator 1 … Simulator m is connected directly to each of Cognitive System 1 … Cognitive System n.]
Proposed Solution: Standardize integrations to reduce costs
• m+n integrations
[Diagram: Simulator 1 … Simulator m and Cognitive System 1 … Cognitive System n each connect once to TIELT.]
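The cost argument on this slide is simple arithmetic, and a short sketch makes the scaling visible. The function names below are illustrative only (they are not part of TIELT), and "one integration" is assumed to mean one hand-built adapter between two components.

```python
def pairwise_integrations(num_simulators: int, num_systems: int) -> int:
    """Every simulator is wired directly to every cognitive system: m*n adapters."""
    return num_simulators * num_systems


def mediated_integrations(num_simulators: int, num_systems: int) -> int:
    """Every component is wired once to a shared testbed: m+n adapters."""
    return num_simulators + num_systems


if __name__ == "__main__":
    for m, n in [(2, 2), (5, 4), (10, 10)]:
        print(f"m={m}, n={n}: direct={pairwise_integrations(m, n)}, "
              f"via testbed={mediated_integrations(m, n)}")
```

Under these assumptions the two counts tie at m = n = 2, and a single simulator paired with a single system is cheaper to wire directly (1 versus 2). The savings appear as the libraries grow: 5 simulators and 4 systems need 20 direct adapters but only 9 through a shared testbed.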
What Domain?
Desiderata
• Available implementations (cheap to acquire & run)
• Challenging problems for CogSys/ML research
• Significant interest (academia, military, industry, funding, public)
Simulation Games?
Gaming Genres of Interest (modified from (Laird & van Lent, 2001))
• Action
  • Example: Quake, Unreal
  • Description: Control a character
  • Sub-Genres: 1st vs. 3rd person, solo vs. team play
  • AI Roles: Control enemies
• Role-Playing
  • Example: Temple of Elemental Evil
  • Description: Be a character (includes puzzle solving, etc.)
  • Sub-Genres: Solo vs. (massively) multi-player
  • AI Roles: Control enemies, partners, and supporting characters
• Strategy (real-time, discrete)
  • Example: Empire Earth 2, AoE, Civilization
  • Description: Controlling at multiple levels (e.g., strategic, tactical warfare)
  • Sub-Genres: God, first-person perspectives
  • AI Roles: Control all units and strategic enemies
• Team Sports
  • Example: Madden NFL Football
  • Description: Act as coach and a key player
  • Sub-Genres: 1st vs. 3rd person
  • AI Roles: Control units and strategic enemy (i.e., other coach), commentator
• Individual Sports
  • Example: Many (e.g., driving games)
  • Description: Individual competition
  • AI Roles: Control enemy
Some Game Environment Challenges
• Significant background knowledge available
  • e.g., Processes, tasks, objects, actions
  • Use: Provide opportunities for rapid learning
• Adversarial
• Collaborative
• Multiple reasoning levels (e.g., strategic, tactical)
• Real-time
• Uncertainty (“Fog of War”)
• Noise (e.g., imprecision)
• Relational (e.g., social networks)
• Temporal
• Spatial
Academia: Learning in Simulation Games
Focus: Broad interests
• Game engines (e.g., GameBots, ORTS, RoboCup Soccer Server)
• Use (other) open source engines (e.g., FreeCiv, Stratagus)
• Representation (e.g., Forbus et al., 2001; Houk, 2004; Munoz-Avila & Fisher, 2004)
• Learning opponent unit models (e.g., Laird, 2001; Hill et al., 2002)
• … (see table)
Evidence of commitment
• Interactive Computer Games: Human-Level AI’s Killer Application (Laird & van Lent, AAAI’00 Invited Talk)
• Meetings
  • AAAI symposia (several in recent years)
  • International Conference on Computers and Games
  • AAAI’04 Workshop on Challenges in Game AI
  • AI in Interactive Digital Entertainment Conference (2005-)
  • …
• New journals focusing on (e.g., real-time) simulation games
  • J. of Game Development
  • Int. J. of Intelligent Games and Simulation
Survey: Selected Previous Work on Learning & Gaming Simulators
Industry: Learning in Simulation Games
Focus: Increase sales via enhanced gaming experience
• USA: $7B in sales in 2003 (ESA, 2004)
  • Strategy games: $0.3B
• Simulators: Many! (e.g., SimCity, Quake, SoF, UT)
• Target: Control avatars, unit behaviors
Evidence of commitment
• Developers: “keenly interested in building AIs that might learn, both from the player & environment around them.” (GDC’03 Roundtable Report)
• Middleware products that support learning (e.g., MASA, SHAI, LearningMachine)
• Long-term investments in learning (e.g., iKuni, Inc.)
• Conferences:
  • Game Developer’s Conference
  • Computer Game Technology Conference
Industry: Learning in Simulation Games
Status
• Few deployed systems have used learning (Kirby, 2004): e.g.,
  • Black & White: on-line, explicit (player immediately reinforces behavior)
  • C&C Renegade: on-line, implicit (agent updates set of legal paths)
  • Re-volt: off-line, implicit (GA tunes racecar behaviors prior to shipping)
• Problems: Performance, constraints (preventing learning “something dumb”), trust in learning system
Some Promising Techniques (Rabin, 2004)
• Belief networks for probabilistic inference
• Decision tree learning
• Genetic algorithms (e.g., for offline parameter tuning)
• Statistical prediction (e.g., using N-grams to predict future events)
• Neural networks (e.g., for offline applications)
• Player modeling (e.g., to regulate game difficulty, model reputation)
• Reinforcement learning
• Weakness modification learning (e.g., don’t repeat failed strategies)
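To make the "statistical prediction" item concrete, here is a minimal sketch of an N-gram predictor over a stream of game events. It is a generic illustration of the technique, not code from any of the cited games or from Rabin (2004); the class name and the event labels are invented for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Optional, Sequence, Tuple


class NGramPredictor:
    """Predicts the next event from the previous n-1 events using observed counts."""

    def __init__(self, n: int = 3) -> None:
        if n < 2:
            raise ValueError("n must be at least 2")
        self.n = n
        # Maps a context of n-1 events to a counter of the events that followed it.
        self.counts: Dict[Tuple[Hashable, ...], Counter] = defaultdict(Counter)

    def train(self, events: Sequence[Hashable]) -> None:
        """Count every window of n consecutive events in the history."""
        for i in range(len(events) - self.n + 1):
            context = tuple(events[i:i + self.n - 1])
            self.counts[context][events[i + self.n - 1]] += 1

    def predict(self, recent: Sequence[Hashable]) -> Optional[Hashable]:
        """Return the most frequent follower of the last n-1 events, or None if unseen."""
        context = tuple(recent[-(self.n - 1):])
        followers = self.counts.get(context)
        if not followers:
            return None
        return followers.most_common(1)[0][0]


if __name__ == "__main__":
    history = ["punch", "punch", "kick", "punch", "punch", "kick",
               "punch", "punch", "block"]
    predictor = NGramPredictor(n=3)
    predictor.train(history)
    print(predictor.predict(["punch", "punch"]))
```

In this toy history, "punch, punch" was followed by "kick" twice and "block" once, so the predictor returns "kick". A context it has never seen yields None, which a game AI would need to handle with some default behavior.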
Military: Learning in Simulation Games
Focus: Training, analysis, & experimentation
• Learning: Acquisition of new knowledge or behaviors
• Simulators: JWARS, OneSAF, Full Spectrum Command, etc.
• Target: Control strategic opponent or own units
Evidence of commitment
• “Learning is an essential ability of intelligent systems” (NRC, 1998)
• “To realize the full benefit of a human behavior model within an intelligent simulator,…the model should incorporate learning” (Hunter et al., CCGBR’00)
• “Successful employment of human behavior models…requires that [they] possess the ability to integrate learning” (Banks & Stytz, CCGBR’00)
• Conferences: BRIMS, I/ITSEC
Status: No CGF simulator has been deployed with learning (D. Reece, 2003)
• Some problems (Petty, CGFBR’01):
  • Cost of training phase
  • Loss of training control
  • Learning non-doctrinal behaviors
  • Learning unpredictable behaviors
Analysis: Conclusions
State-of-the-art
• Research on learning in complex gaming simulators is in its infancy
  • Knowledge-poor approaches are limited to simple performance tasks
  • Knowledge-intensive approaches require huge knowledge bases, which to date have been manually encoded
• Existing approaches have many simplifying assumptions
  • Scenario limitations (e.g., on number and/or capabilities of adversaries)
  • Learning is (usually) performed only off-line
  • Learned knowledge is not transferred (e.g., to playing other games)
Significant advances would include:
• Fast acquisition approaches for a large amount of domain knowledge
  • This would enable rapid learning without requiring manual encoding
• Demonstrations of on-line learning (i.e., within a single simulation run)
• Increasing knowledge transfer among tasks & simulators over time
  • e.g., knowledge of processes, strategies, tasks, roles, objects, & actions
TIELT Specification
• Simplifies integration & evaluation!
  • Learning-embedded decision systems & gaming simulators
  • Supports communications, game model, perf. task, evaluation
  • Free & available
• Learning foci
  • Task (e.g., learn how to execute, or advise on, a task)
  • Player (e.g., accept advice, predict a player’s strategies)
  • Game (e.g., learn/refine its objects, their relations, & behaviors)
• Learning methods
  • Supervised/unsupervised, immediate/delayed feedback, analytic, active/passive, online/offline, direct/indirect, automated/interactive
  • Learning results should be available for inspection
• Gaming simulators: Those with challenging learning tasks
• Reuse:
  • Communications are separated from the game model & perf. task
  • Provide access to libraries of simulators & decision systems
Distinguishing TIELT • Provides an interface for message-passing interfaces • Supports composable system-level interfaces
TIELT: Integration Architecture
[Diagram: The TIELT User works through TIELT’s User Interface (Advice Interface, Evaluation Interface, Prediction Interface, Coordination Interface) and TIELT’s KB Editors. TIELT’s Internal Communication Modules sit between the Selected Game Engine (chosen from the Game Engine Library, e.g., Stratagus, …, Full Spectrum Command; with Game Player(s)) and the Selected Decision System (chosen from the Decision System Library of reasoning/decision systems with learning modules), which produces Learned Knowledge (inspectable). The Selected/Developed Knowledge Bases come from the Knowledge Base Libraries: Game Model (GM), Game Interface Model (GIM), Decision System Interface Model (DSIM), Agent Description (AD), Experiment Methodology (EM).]
TIELT’s Knowledge Bases
• Game Interface Model: Defines communication processes with the game engine
• Decision System Interface Model: Defines communication processes with the decision system
• Game Model: Defines interpretation of the game
  • e.g., initial state, classes, operators, behaviors (rules)
  • Behaviors could be used to provide constraints on learning
• Agent Description: Defines what decision tasks (if any) TIELT must support
• Experiment Methodology: Defines selected performance tasks (taken from Game Model Description) and the experiment to conduct
TIELT: Supported Performance Tasks
Performance vs. learning tasks
• Performance: Application of the learned knowledge (e.g., classification)
• Learning: Activity of learning system (e.g., update weights in a neural net)
TIELT users will define complex, user-configurable performance tasks
[Diagram: Types of Problem Solving Tasks. Labels: Analysis, Synthesis, Decision Support, Classification, Diagnosis, Planning, Scheduling, Design (Parametric, Structural).]
An Example Complex Learning Task
Task description: Win a real-time strategy game
This involves several challenging learning tasks
Subtasks and supporting operations
• Diagnosis: Identify (computer and/or human) opponent strategies & goals
  • Classification: Opponent recognition
  • Recording: Actions of opponents and their effects
    • This repeatedly involves classification
  • Diagnosis: Identify goal(s) being solved by these effects
  • Classification: Identify goal(s), if solved, that prevents opponent goals
• Planning: Select/adapt or create plan to achieve goals and win the game
  • Classification: Select top-level actions to achieve goals
  • Iteratively identify necessary sub-goals and, finally, primitive actions
  • Design (parametric): Identify good initial layout of controllable assets
• Execute plan
  • Recording: Collect measures of effectiveness, to provide feedback
  • Planning: If needed, re-plan, based on feedback, at Step 2
Use: Controlling a Game Character
[Diagram: the TIELT integration architecture (user interface, KB editors, internal communication modules, game engine and decision system libraries, knowledge bases GM, GIM, DSIM, AD, EM), annotated with flows labeled Raw State, Processed State, Decision, and Action among the Selected Game Engine, TIELT, and the Selected Decision System.]
UT Example: Game Model
State Description
  Players: Array[ ] of Player
  Self: Player
  Score: Integer
  …
Classes
  Player
    Team: String
    Number: Integer
    Position: Location
  Location
    x: Integer
    y: Integer
    z: Integer
Operators
  Shoot(Player)
    Preconditions: Player.isVisible
    Effects: Player.Health -= rand(10)
  MoveTo(Location)
    Preconditions: Location.isReachable()
    Effects: Self.position == Location
  …
Rules
  GetShotBy(Player)
    Preconditions: Player.hasLineOfSight(Self)
    Effects: Self.Health -= rand(10)
  EnemyMovements(Enemy, Location1, Location2)
    Preconditions: Location2.isReachableFrom(Location1), Enemy.position == Location1
    Effects: Enemy.position == Location2
  …
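The operator notation on this slide (a name, a precondition, an effect) can be sketched in Python as follows. This is an illustration under stated assumptions, not TIELT’s actual Game Model representation: the `Operator` class, the starting health of 100, the `is_visible` flag, and reading `rand(10)` as a uniform integer in 0..9 are all choices made for the example.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Location:
    x: int
    y: int
    z: int


@dataclass
class Player:
    team: str
    number: int
    position: Location
    health: int = 100        # assumed starting value; the slide gives none
    is_visible: bool = True  # stands in for the slide's Player.isVisible


@dataclass
class Operator:
    """An action that may be applied only when its precondition holds."""
    name: str
    precondition: Callable[[Player], bool]
    effect: Callable[[Player], None]

    def apply(self, target: Player) -> bool:
        """Run the effect if the precondition holds; report whether it ran."""
        if not self.precondition(target):
            return False
        self.effect(target)
        return True


def shoot_effect(target: Player) -> None:
    # Slide: Player.Health -= rand(10); the 0..9 range is an assumption.
    target.health -= random.randrange(10)


shoot = Operator("Shoot", lambda p: p.is_visible, shoot_effect)

if __name__ == "__main__":
    enemy = Player(team="red", number=2, position=Location(0, 0, 0))
    print(shoot.apply(enemy), enemy.health)
```

Keeping the precondition separate from the effect is what would let a testbed reject illegal actions from a decision system, or constrain what a learner is allowed to try, as the Knowledge Bases slide suggests for behaviors.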
UT Example: Game Interface Model
Communication Medium: TCP/IP, Port 3000
Message Format: <name> {<attr1> <value1>} {<attr2> <value2>} …
Action Templates
  TURN(Pitch: real, Yaw: real, Roll: real)
  SETWALK(Walk: boolean) //Start walking or running
  RUNTO(Target: integer) //ID of object in world
  …
Sensor Templates
  CWP(Weapon: integer) //Change Weapon to Weapon with this Id
  FLG(Id: integer, Reachable: boolean, State: Symbol <held, dropped, home> …
• Example interface messages from the GameBots API
• http://www.planetunreal.com/gamebots/docapi.html
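The message format on this slide, `<name> {<attr1> <value1>} {<attr2> <value2>} …`, is simple enough to parse in a few lines. The sketch below follows only the format string shown on the slide; the function names are hypothetical, the sample values are made up, and real GameBots messages may have details (quoting, multi-word values, line terminators) that this does not handle.

```python
import re
from typing import Dict, Tuple


def parse_gamebots_message(line: str) -> Tuple[str, Dict[str, str]]:
    """Split '<name> {<attr> <value>} ...' into the name and an attribute dict."""
    name, _, rest = line.strip().partition(" ")
    attrs: Dict[str, str] = {}
    for block in re.findall(r"\{([^{}]*)\}", rest):
        key, _, value = block.strip().partition(" ")
        attrs[key] = value.strip()  # values are kept as raw strings
    return name, attrs


def format_gamebots_message(name: str, attrs: Dict[str, object]) -> str:
    """Build a message in the same '<name> {<attr> <value>} ...' layout."""
    return " ".join([name] + [f"{{{key} {value}}}" for key, value in attrs.items()])


if __name__ == "__main__":
    # Attribute names come from the FLG and RUNTO templates; values are invented.
    print(parse_gamebots_message("FLG {Id 7} {Reachable True} {State held}"))
    print(format_gamebots_message("RUNTO", {"Target": 42}))
```

Converting the raw strings to the types named in a template (integer, boolean, symbol) would be the job of whatever interprets the template, which is omitted here.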
UT Example: Decision System Interface Model
Communication Medium: Standard I/O
Message Format: (<name> <value1> <value2> <value3> … )
Example Template Messages Sent By TIELT
  InitializeGameRules(ruleSet: Array [ ] of Rule)
  SendStateUpdates(CurrentState: Array [ ] of Object)
  LoadScenario(SavedGameFilename: String)
  …
Template Messages Received By TIELT
  GiveAdvice(AdviceMessage: String)
  PerformAction(OperatorName: String, Parameters: Array [ ] of String)
  AskForValue(AttributeName: String)
  …
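For the parenthesized format `(<name> <value1> <value2> …)` used over standard I/O, a minimal encoder and decoder might look like the sketch below. It assumes values contain no whitespace or nested parentheses, which the slide does not specify; the function names and the sample parameter `Player3` are hypothetical.

```python
from typing import List, Tuple


def format_ds_message(name: str, *values: object) -> str:
    """Render '(<name> <value1> <value2> ...)' from a name and positional values."""
    return "(" + " ".join([name, *(str(v) for v in values)]) + ")"


def parse_ds_message(text: str) -> Tuple[str, List[str]]:
    """Split '(<name> <value1> ...)' into the name and a list of string values."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError(f"not a parenthesized message: {text!r}")
    tokens = text[1:-1].split()
    if not tokens:
        raise ValueError("message has no name")
    return tokens[0], tokens[1:]


if __name__ == "__main__":
    wire = format_ds_message("PerformAction", "Shoot", "Player3")
    print(wire)
    print(parse_ds_message(wire))
```

A decision system that reads such lines from standard input and writes replies to standard output could be written in any language, which fits the slide’s choice of Standard I/O as the medium.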
UT Example: Agent Description
[Diagram: a Think-Act Cycle with the tasks Shoot Something, Go Somewhere Else, and Pick up a Healthpack. Attached steps: Call Shoot Operator; Ask Decision System: Where Do I Go?; Call Pickup Operator; Ask Decision System: Where Do I Go?]
UT Example: Experiment Methodology
Initialization
  Game Model: Unreal Tournament.xml
  Game Interface: GameBots.xml
  Decision System: MyUTBot.xml
  Runs: 100
  Call slowdown(0.5)
Metrics
  FragCount: Self.kills
  FragsPerSecond: Self.kills/LengthOfGame
  AverageHealth: Self.health x
Plot
  FragCount vs. Runs
  AverageHealth vs. # of players
  FragsPerSecond vs. Outdegree of net nodes
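The three metrics on this slide can be computed from a per-run record, as in the sketch below. The `RunRecord` structure and its field names are assumptions made for the example, and AverageHealth is read here as the mean of health values sampled during a run, which the slide does not spell out.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class RunRecord:
    """Hypothetical summary of one game run, as logged by an experiment harness."""
    kills: int                   # Self.kills at the end of the run
    length_of_game: float        # run duration in seconds (LengthOfGame)
    health_samples: List[float]  # Self.health sampled during the run


def frag_count(run: RunRecord) -> int:
    return run.kills


def frags_per_second(run: RunRecord) -> float:
    return run.kills / run.length_of_game


def average_health(run: RunRecord) -> float:
    # Assumed interpretation: arithmetic mean over the sampled health values.
    return mean(run.health_samples)


if __name__ == "__main__":
    run = RunRecord(kills=12, length_of_game=600.0, health_samples=[100, 80, 60])
    print(frag_count(run), frags_per_second(run), average_health(run))
```

Plotting FragCount against the run index over the 100 configured runs would then show whether the decision system improves with experience.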
Use: Predicting Opponent Actions
[Diagram: the TIELT integration architecture (user interface, KB editors, internal communication modules, game engine and decision system libraries, knowledge bases GM, GIM, DSIM, AD, EM), annotated with flows labeled Raw State, Processed State, and Prediction among the Selected Game Engine, TIELT, the Selected Decision System, and the Prediction Interface.]
Use: Updating a Game Model
[Diagram: the TIELT integration architecture (user interface, KB editors, internal communication modules, game engine and decision system libraries, knowledge bases GM, GIM, DSIM, AD, EM), annotated with flows labeled Raw State, Processed State, and Edit among the Selected Game Engine, TIELT, and the Selected Decision System.]
TIELT: A Researcher Use Case
• Define/store decision system interface model
• Select game simulator & interface
• Select game model
• Select/define performance task(s)
• Define/select expt. methodology
• Run experiments
• Analyze displayed results
[Diagram: the Selected Game Engine (from the Game Engine Library, e.g., Stratagus, Full Spectrum Command) and the Selected Decision System (from the Decision System Library), linked through the Selected/Developed Knowledge Bases drawn from the Knowledge Base Libraries: Game Interface Model (GIM), Decision System Interface Model (DSIM), Game Model (GM), Agent Description (AD), Experiment Methodology (EM).]
TIELT: A Game Developer Use Case
• Define/store game interface model
• Define/store game model
• Select decision system/interface
• Define performance task(s)
• Define/select expt. methodology
• Run experiments
• Analyze displayed results
[Diagram: the Selected Game Engine (from the Game Engine Library, e.g., Stratagus, Full Spectrum Command) and the Selected Decision System (from the Decision System Library), linked through the Selected/Developed Knowledge Bases drawn from the Knowledge Base Libraries: Game Interface Model (GIM), Decision System Interface Model (DSIM), Game Model (GM), Agent Description (AD), Experiment Methodology (EM).]
TIELT’s Internal Communication Modules
[Diagram: internal components Controller, Model Updater, Current State, Stored State, Database, Database Engine, State Evaluator, Learning Translator (Mapper), Action / Control Translator (Mapper), Translated Model (Subset), Evaluation Interface, and Advice Interface. The Selected Game Engine exchanges Percepts and Actions with TIELT; the Selected Decision System receives the Learning Task and Perf. Task and returns Learning Outputs. The User edits the Game Model, Game Interface Model, Decision System Interface Model, Agent Description, and Experiment Methodology through the corresponding editors.]
Sensing the Game State (City placement example, inspired by Alpha Centauri, etc.)
1 In the Game Engine, the game begins; a colony pod is created and placed.
2 The Game Engine sends a “See” sensor message identifying the pod’s location.
3 The Model Updater receives the sensor message and finds the corresponding message template in the Game Interface Model.
4 This message template provides updates (instructions) to the Current State, telling it that there is a pod at the location See describes.
5 The Model Updater notifies the Controller that the See action event has occurred.
[Diagram: numbered flow among Game Engine (Sensors, Actions), Model Updater, Current State, Controller, Action Translator, Game Model, and Game Interface Model; the User reaches the models through the Game Model Editor and Game Interface Model Editor.]
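The five sensing steps can be sketched as a small dispatch loop: look up the template for the incoming sensor message, apply its updates to the current state, then notify the controller. Everything below is a hypothetical illustration of that flow, not TIELT code; the class names mirror the slide’s component names, while the `pod_location` key and the `x`/`y` attributes are invented.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Update = Callable[[State, Dict[str, int]], None]


class Controller:
    """Records which sensor events it has been told about (step 5)."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def notify(self, event_name: str) -> None:
        self.events.append(event_name)


class ModelUpdater:
    """Applies the update attached to a message template (steps 3 and 4)."""

    def __init__(self, templates: Dict[str, Update], state: State,
                 controller: Controller) -> None:
        self.templates = templates  # stands in for the Game Interface Model
        self.state = state          # stands in for the Current State
        self.controller = controller

    def receive(self, name: str, attrs: Dict[str, int]) -> None:
        update = self.templates[name]   # raises KeyError for an unknown message
        update(self.state, attrs)       # template instructions edit the state
        self.controller.notify(name)    # tell the controller the event occurred


def see_update(state: State, attrs: Dict[str, int]) -> None:
    # Hypothetical update for the "See" template: remember where the pod is.
    state["pod_location"] = (attrs["x"], attrs["y"])


if __name__ == "__main__":
    state: State = {}
    controller = Controller()
    updater = ModelUpdater({"See": see_update}, state, controller)
    updater.receive("See", {"x": 3, "y": 4})  # step 2: message from the game engine
    print(state, controller.events)
```

Because the update logic lives in the template table rather than in the updater, swapping in a different game would mean supplying a different table, which matches the slide’s separation of the Game Interface Model from TIELT’s internal modules.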
Fetching Decisions from the Decision System (City placement example)
1 The Controller notifies the Learning Translator that it has received a See message.
2 The Learning Translator finds a city location task, which is triggered by the See message. It queries the Controller for the learning mode, then creates a TestInput message to send to the reasoning system with information on the pod’s location and the map from the Current State.
3 The Learning Translator transmits the TestInput message to the Decision System.
4 The Decision System transmits output to the Action Translator.
[Diagram: numbered flow among Controller, Learning Translator, Current State, Translated Model (Subset), Selected Decision System (Learning Module #1 … Learning Module #n), Learning Outputs, Action Translator, Decision System Interface Model, and Agent Description; the User reaches the models through the Decision System Interface Model Editor and Agent Desc. Editor.]
Acting in the Game World (City placement example)
1 The Action Translator receives a TestOutput message from the Decision System.
2 The Action Translator finds the TestOutput message template, determines it is associated with the city location task, and builds a MovePod operator (defined by the Current State) with the parameters of TestOutput.
3 The Action Translator determines that the Move action from the Game Interface Model is triggered by the MovePod operator and binds Move using information from MovePod.
4.a The Game Engine receives Move and updates the game to move the pod toward its destination, or
4.b, c The Advice Interface receives Move and displays advice to a human player on what to do next, or makes a Prediction.
[Diagram: numbered flow among Decision System, Action Translator, Current State, Actions, Game Engine, Advice Interface, Prediction Interface, Game Interface Model, and Decision System Interface Model; the User reaches the models through the Game Interface Model Editor and Decision System Interface Model Editor.]
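The acting steps amount to two table lookups followed by a routing decision. The sketch below illustrates that shape only: the two mapping tables stand in for the Decision System Interface Model and the Game Interface Model, and the mode names ("control", "advise", "predict") and the `destination` parameter are invented labels for the slide’s 4.a, 4.b, and 4.c branches.

```python
from typing import Dict, Tuple

# Hypothetical stand-ins for knowledge-base content named on the slide.
OUTPUT_TO_OPERATOR: Dict[str, str] = {"TestOutput": "MovePod"}  # steps 1 and 2
OPERATOR_TO_ACTION: Dict[str, str] = {"MovePod": "Move"}        # step 3

# Assumed mode names for the three possible recipients (steps 4.a, 4.b, 4.c).
RECIPIENTS: Dict[str, str] = {
    "control": "Game Engine",
    "advise": "Advice Interface",
    "predict": "Prediction Interface",
}


def translate(message_name: str, params: Dict[str, object]) -> Dict[str, object]:
    """Map a decision-system output to an operator, then to a bound game action."""
    operator = OUTPUT_TO_OPERATOR[message_name]
    action = OPERATOR_TO_ACTION[operator]
    return {"action": action, "operator": operator, "params": dict(params)}


def dispatch(bound_action: Dict[str, object],
             mode: str) -> Tuple[str, Dict[str, object]]:
    """Choose which component receives the bound action."""
    return RECIPIENTS[mode], bound_action


if __name__ == "__main__":
    bound = translate("TestOutput", {"destination": (3, 4)})
    print(dispatch(bound, "control"))
    print(dispatch(bound, "advise"))
```

The same bound Move can thus drive the game directly or be shown to a human player as advice, without the decision system knowing which of the two is happening.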
TIELT Status (November 2004)
Implementation
• TIELT (v0.5) available
• Features
  • Message protocols
    • Current: Console I/O, TCP/IP, UDP
    • Future: Library calls, HLA interface, RMI (possibly)
  • Message content: Configurable
    • Instantiated templates tell it how to communicate with other modules
  • Initialization messages: Start, Stop, Load Scenario, Set Speed
  • Game Model representations (w/ Lehigh University)
    • Simple programs
    • TMK process models
    • PDDL (language used in planning competitions)
TIELT Status (November 2004)
Documentation
• TIELT User’s Manual (82 pages)
  • TIELT Overview
  • The TIELT User Interface
  • Scripting in TIELT
  • Theory of the Game Model
  • Communications
  • TMK Models
  • Experiments
• TIELT Tutorial (45 pages)
  • The Game Model
  • The Game Interface Model
  • Decision System Interface Model
  • Agent Description
  • Experiment Methodology
TIELT Status (November 2004)
Access
• TIELT www site (new)
• Selected Components
  • Documents: Documentation, publications, XML Spec
  • Status
  • Forum: A full-featured web forum/bulletin board
  • Bug Tracker: TIELT bug/feature tracking facility
  • FAQ-o-Matic: Questions and problem solutions; user-driven
  • Download
TIELT Issues (November 2004)
1. Communication
TIELT is a multilingual application, which lets it interface with many different games.
[Diagram: TIELT connected to games via TCP/IP and via Library Calls (SWIG), with a “You Are Here” marker.]
2. Resources for learning to use TIELT
• TIELT Scripting syntax highlighting
• Map of TIELT Component Interactions
  • Thanks, Megan
• Typed script interface
TIELT Issues (November 2004)
3. Formatting Game Model (“We’re working on it”)
• To no one’s surprise, everyone agrees that TIELT’s Game Model representation is inadequate.
• Requests have been made for:
  • 3D Maps (Quake)
  • A different programming language
  • A relational operator representation
  • Standardized events
TIELT Collaborations (2004-05)
[Diagram: the TIELT architecture (User Interface with Advice, Evaluation, Prediction, and Coordination Interfaces; Internal Communication Modules; KB Editors; Selected/Developed Knowledge Bases; Knowledge Base Libraries) annotated with collaborators. Game/Platform Library labels: ToEE, Stratagus, Urban Terror, FSC/R, RoboCup, EE2, FreeCiv. Decision System Library labels: Soar: U.Mich; ICARUS: ISLE; DCA: UT Arlington; Neuroevolution: UT Austin; Others: Many. Organization labels: U. Minn-D., Lehigh U. (LU), UT Arl., USC/ICT, U. Mich., ISLE, NWU, Troika, Mad Doc. Knowledge base labels: Game Interface Model, Decision System Interface Model, Game Model, Task Descriptions, Experiment Methodology.]