Advanced Learning Strategies for Complex Dynamic Domains

Information Processing Technology Office Learning Workshop April 12, 2004 Seedling Overview Learning in the Large MIT CSAIL PIs: Leslie Pack Kaelbling, Tomás Lozano-Pérez, Tommi Jaakkola

Three Subprojects • Learning to behave in huge domains • Transfer of learned knowledge across problems and domains • Learning to recognize objects and interpret scenes

Learning Objective • Learn to act effectively in highly complex dynamic domains • Learn models of complex world dynamics involving objects, properties, and relations • Learn “meta-cognition” strategies for deciding how to focus computational attention for action selection • Learning is crucial for both problems because human designers are unable to build appropriate models by hand

What Is Being Learned? • Learning probabilistic dynamic rules
pickup(X): on(X,Y), clear(X), table(Z), inhand-nil
    0.8: inhand(X), ¬on(X,Y), clear(Y), ¬clear(X), ¬inhand-nil
    0.2: ¬on(X,Y), clear(Y), on(X,Z)
• Important goal is to learn partial models: some aspects will be easy to learn to predict, others will take longer • Take advantage of partial models as soon as they’re learned
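To make the pickup rule concrete, here is a minimal sketch of how one grounded instance of it could be stored and sampled. The class names (`Rule`, `Outcome`), the set-of-literals state encoding, the closed-world reading (a literal absent from the state is false), and the grounding X=a, Y=b, Z=t are all assumptions made for illustration; the slides do not say how the rules are implemented.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    prob: float
    add: frozenset      # literals that become true
    delete: frozenset   # literals that become false


@dataclass(frozen=True)
class Rule:
    action: str
    preconditions: frozenset
    outcomes: tuple


def applicable(rule, state):
    """A rule applies when every precondition literal holds in the state."""
    return rule.preconditions <= state


def sample_next_state(rule, state, rng):
    """Sample one outcome by its probability and apply its effects."""
    if not applicable(rule, state):
        return state  # assumption: an inapplicable rule leaves the state unchanged
    r = rng.random()
    cumulative = 0.0
    for outcome in rule.outcomes:
        cumulative += outcome.prob
        if r < cumulative:
            return (state - outcome.delete) | outcome.add
    return state  # only reached if probabilities sum to slightly less than 1


# Hypothetical grounding of pickup(X) with X=a, Y=b, Z=t.
pickup_a = Rule(
    action="pickup(a)",
    preconditions=frozenset({"on(a,b)", "clear(a)", "table(t)", "inhand-nil"}),
    outcomes=(
        # 0.8: the block ends up in the hand.
        Outcome(0.8,
                add=frozenset({"inhand(a)", "clear(b)"}),
                delete=frozenset({"on(a,b)", "clear(a)", "inhand-nil"})),
        # 0.2: the block falls onto the table.
        Outcome(0.2,
                add=frozenset({"clear(b)", "on(a,t)"}),
                delete=frozenset({"on(a,b)"})),
    ),
)

state = frozenset({"on(a,b)", "clear(a)", "table(t)", "inhand-nil"})
next_state = sample_next_state(pickup_a, state, random.Random(0))
```

A partial model in this encoding would simply be a rule set that covers only some actions or some literals; states the rules do not mention pass through unchanged.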

How is it Being Learned? • Search in rule space • logic-based methods for learning structure • convex optimization for probabilities • Effectiveness of learned models tested using planner to select actions • Learning is automatic • Amount of data needed depends on the frequency and reliability of phenomenon being modeled
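The slide says only that outcome probabilities are fit by convex optimization; it does not name a solver. As one possible illustration, the sketch below fixes a rule's outcomes and maximizes the log-likelihood of observed transitions over the outcome probabilities, which is a concave problem. It uses EM-style updates as a stand-in solver, not necessarily the method used in the project. The function name `fit_outcome_probs`, the boolean `consistency` matrix, and the toy data are hypothetical.

```python
def fit_outcome_probs(consistency, iters=200):
    """Estimate outcome probabilities for one rule.

    consistency[i][k] is True when outcome k would explain observed
    transition i. Maximizes sum_i log(sum_k p_k * consistency[i][k])
    over the probability simplex using EM-style updates.
    """
    n_outcomes = len(consistency[0])
    probs = [1.0 / n_outcomes] * n_outcomes
    for _ in range(iters):
        counts = [0.0] * n_outcomes
        for row in consistency:
            weights = [p if ok else 0.0 for p, ok in zip(probs, row)]
            total = sum(weights)
            if total == 0.0:
                continue  # no outcome explains this example; it is skipped
            for k, w in enumerate(weights):
                counts[k] += w / total  # share of this example credited to outcome k
        norm = sum(counts)
        if norm == 0.0:
            return probs
        probs = [c / norm for c in counts]
    return probs


# Made-up data: 8 transitions explained only by outcome 0, 2 only by outcome 1.
observations = [[True, False]] * 8 + [[False, True]] * 2
probs = fit_outcome_probs(observations)
```

When every transition is explained by exactly one outcome, as in the toy data, the estimate reduces to relative frequencies. The iterative form matters only when outcomes overlap and a transition is consistent with more than one of them.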

How is the Knowledge Represented? • Probabilistic dynamics rules • No background knowledge currently, but it would be easy to build in some rules • Knowledge is task-independent (though we may use utility to focus learning) • Models can account for only parts of the state evolution; and they’re probabilistic • Currently, no

What is the Domain? • Currently: physics simulator of blocks world • Would like simulation of more complex environment, e.g., • battlefield • disaster relief • making breakfast

How is Progress Being Measured? • First, by human inspection of rules for plausibility • Second, by performance of an agent using the rules for planning • Nothing changes in the experimental set-up except the learned rules • Metrics: • utility gained by the agent • computation speed • Easily done overnight on a workstation

What are the Technical Milestones? • Defined by model sophistication rather than overt performance in the task • Learn rules with quantifiers • Learn to ground symbolic predicates in perception • Learn rules in partially observable environments • Postulate hidden causes • Focus rule-learning based on utility

What is Being Learned? • Learning to formulate small planning problem, from a huge state space and competing goals • what are useful subgoals? • when is it appropriate to ignore certain aspects of the domain?
[Diagram labels: learning, inference, planning, perception, action]

How is it Being Learned? • Learning parameters in abstract models • partial observability makes it hard • gradient descent works, but may be weak • take advantage of Russell’s methods? • Compare speed and utility of resulting action-selection system • Learning is automatic • Amount of data needed depends on the frequency and reliability of phenomenon being modeled

How is the Knowledge Represented? • Parameters in strategies for building abstractions • Currently most of the abstraction structure is hand-coded • The knowledge depends on the distribution of problems an agent has to solve, but not on particular low-level tasks • Uncertainty isn’t represented explicitly, but is handled implicitly in statistical learning • We are learning at multiple levels of abstraction

What is the Domain? • Nethack • Would like more complex simulated domain

What are the Technical Milestones? • Meta-learning • Learn parameters in hand-built abstractions for MDPs • Learn new abstractions for MDPs • Learn to compose abstractions • Do it all for POMDPs
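As a rough illustration of what a hand-built abstraction for an MDP can look like, the sketch below uses state aggregation: a mapping `phi` groups ground states into abstract states, an abstract MDP is built by averaging over each group, and the abstract MDP is solved by value iteration. State aggregation is one standard abstraction technique chosen here for illustration; the slides do not say which abstractions the project uses. The function names, the uniform weighting inside each group, and the four-state chain are all assumptions.

```python
from collections import defaultdict


def aggregate_mdp(transitions, rewards, phi):
    """Build an abstract MDP from a ground MDP and a state mapping phi.

    transitions[s][a] = {next_state: prob}, rewards[s][a] = float.
    Ground states in the same group are weighted uniformly (an assumption).
    """
    groups = defaultdict(list)
    for s in transitions:
        groups[phi[s]].append(s)
    abs_T, abs_R = {}, {}
    for z, members in groups.items():
        abs_T[z], abs_R[z] = {}, {}
        weight = 1.0 / len(members)
        for a in transitions[members[0]]:
            dist = defaultdict(float)
            reward = 0.0
            for s in members:
                reward += weight * rewards[s][a]
                for s2, p in transitions[s][a].items():
                    dist[phi[s2]] += weight * p
            abs_T[z][a] = dict(dist)
            abs_R[z][a] = reward
    return abs_T, abs_R


def value_iteration(T, R, gamma=0.9, iters=200):
    """Plain synchronous value iteration on a tabular MDP."""
    V = {s: 0.0 for s in T}
    for _ in range(iters):
        V = {
            s: max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a].items())
                for a in T[s]
            )
            for s in T
        }
    return V


# Hypothetical 4-state chain; state 3 is an absorbing goal paying reward 1.
T, R = {}, {}
for s in range(4):
    if s == 3:
        T[s] = {"left": {3: 1.0}, "right": {3: 1.0}}
        R[s] = {"left": 1.0, "right": 1.0}
    else:
        T[s] = {"left": {max(s - 1, 0): 1.0}, "right": {s + 1: 1.0}}
        R[s] = {"left": 0.0, "right": 0.0}

phi = {0: "far", 1: "far", 2: "near", 3: "goal"}  # hand-built abstraction
abs_T, abs_R = aggregate_mdp(T, R, phi)
abs_V = value_iteration(abs_T, abs_R)
```

In this framing, learning parameters in a hand-built abstraction could mean tuning quantities such as the within-group weights, while learning new abstractions would mean learning `phi` itself.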
