Information Processing Technology Office Learning Workshop, April 12, 2004 • Seedling Overview: Learning in the Large • MIT CSAIL • PIs: Leslie Pack Kaelbling, Tomás Lozano-Pérez, Tommi Jaakkola
Three Subprojects • Learning to behave in huge domains • Transfer of learned knowledge across problems and domains • Learning to recognize objects and interpret scenes
Learning Objective • Learn to act effectively in highly complex dynamic domains • Learn models of complex world dynamics involving objects, properties, and relations • Learn “meta-cognition” strategies for deciding how to focus computational attention for action selection • Learning is crucial for both problems because human designers are unable to build appropriate models by hand
What Is Being Learned? • Learning probabilistic dynamic rules, for example:
pickup(X): on(X,Y), clear(X), table(Z), inhand-nil
  0.8: inhand(X), ¬on(X,Y), clear(Y), ¬clear(X), ¬inhand-nil
  0.2: ¬on(X,Y), clear(Y), on(X,Z)
• An important goal is to learn partial models: some aspects will be easy to learn to predict, others will take longer • Take advantage of partial models as soon as they’re learned
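To make the rule format concrete, here is a minimal Python sketch of how the pickup rule on this slide could be stored and sampled. The data layout (literals as tuples, outcomes as add/remove lists) and the names `PICKUP`, `ground`, `applicable`, and `apply_rule` are illustrative assumptions, not the authors' implementation.

```python
import random

# Hypothetical encoding of the slide's pickup(X) rule.
# Literals are tuples: (predicate, arg1, arg2, ...).
# Each outcome is (probability, literals_added, literals_removed).
PICKUP = {
    "action": "pickup",
    "preconditions": [("on", "X", "Y"), ("clear", "X"),
                      ("table", "Z"), ("inhand-nil",)],
    "outcomes": [
        (0.8,
         [("inhand", "X"), ("clear", "Y")],
         [("on", "X", "Y"), ("clear", "X"), ("inhand-nil",)]),
        (0.2,
         [("clear", "Y"), ("on", "X", "Z")],
         [("on", "X", "Y")]),
    ],
}


def ground(literal, binding):
    """Substitute variables with objects, e.g. on(X,Y) -> on(a,b)."""
    pred, *args = literal
    return (pred, *[binding.get(a, a) for a in args])


def applicable(rule, state, binding):
    """A rule applies when every grounded precondition holds in the state."""
    return all(ground(l, binding) in state for l in rule["preconditions"])


def apply_rule(rule, state, binding, rng):
    """Sample one outcome and return the successor state.

    If the rule does not apply, the state is returned unchanged
    (an assumption of this sketch).
    """
    if not applicable(rule, state, binding):
        return set(state)
    r = rng.random()
    cumulative = 0.0
    for prob, add, remove in rule["outcomes"]:
        cumulative += prob
        if r < cumulative:
            new_state = set(state)
            new_state -= {ground(l, binding) for l in remove}
            new_state |= {ground(l, binding) for l in add}
            return new_state
    return set(state)


if __name__ == "__main__":
    state = {("on", "a", "b"), ("clear", "a"), ("table", "t"), ("inhand-nil",)}
    binding = {"X": "a", "Y": "b", "Z": "t"}
    print(sorted(apply_rule(PICKUP, state, binding, random.Random(0))))
```

Because each outcome only lists the literals it changes, everything else in the state carries over untouched, which is one simple way a rule can describe only part of the state evolution.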
How is it Being Learned? • Search in rule space • logic-based methods for learning structure • convex optimization for probabilities • Effectiveness of learned models is tested by using a planner to select actions • Learning is automatic • Amount of data needed depends on the frequency and reliability of the phenomenon being modeled
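The slide does not say which optimizer is used for the probabilities. As one possible illustration, assume the rule structure is fixed and each observed transition is labeled with the set of outcomes that could explain it. Maximizing the log-likelihood over the outcome probabilities is then a convex problem, and a simple EM-style iteration solves it. The function name `estimate_outcome_probs` and the input format are hypothetical.

```python
def estimate_outcome_probs(consistent, n_outcomes, iters=200):
    """Maximum-likelihood outcome probabilities for one rule.

    consistent: list of sets; each set holds the indices of the
    outcomes that could explain one observed transition.
    Sketch only: assumes the rule structure is already fixed.
    """
    p = [1.0 / n_outcomes] * n_outcomes
    for _ in range(iters):
        counts = [0.0] * n_outcomes
        for s in consistent:
            total = sum(p[i] for i in s)
            if total == 0.0:
                continue  # transition not explained by any live outcome
            # Split this observation's credit in proportion to current p.
            for i in s:
                counts[i] += p[i] / total
        n = sum(counts)
        if n == 0.0:
            break
        p = [c / n for c in counts]
    return p


if __name__ == "__main__":
    # Toy data: 6 transitions explained only by outcome 0, 2 only by
    # outcome 1, and 2 ambiguous transitions that either could explain.
    data = [{0}] * 6 + [{1}] * 2 + [{0, 1}] * 2
    print(estimate_outcome_probs(data, 2))
```

When every transition matches exactly one outcome, this reduces to counting relative frequencies; the iteration only matters when outcomes overlap.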
How is the Knowledge Represented? • Probabilistic dynamics rules • No background knowledge currently, but it would be easy to build in some rules • Knowledge is task-independent (though we may use utility to focus learning) • Models can account for only parts of the state evolution; and they’re probabilistic • Currently, no
What is the Domain? • Currently: physics simulator of blocks world • Would like simulation of more complex environment, e.g., • battlefield • disaster relief • making breakfast
How is Progress Being Measured? • First, by human inspection of rules for plausibility • Second, by performance of an agent using the rules for planning • Nothing changes in the experimental set-up except the learned rules • Metrics: • utility gained by the agent • computation speed • Easily done overnight on a workstation
What are the Technical Milestones? • Defined by model sophistication rather than overt performance in the task • Learn rules with quantifiers • Learn to ground symbolic predicates in perception • Learn rules in partially observable environments • Postulate hidden causes • Focus rule-learning based on utility
What is Being Learned? • Learning to formulate a small planning problem from a huge state space and competing goals • what are useful subgoals? • when is it appropriate to ignore certain aspects of the domain? [Figure labels: learning, inference, planning, perception, action]
How is it Being Learned? • Learning parameters in abstract models • partial observability makes it hard • gradient descent works, but may be weak • take advantage of Russell’s methods? • Compare speed and utility of resulting action-selection system • Learning is automatic • Amount of data needed depends on the frequency and reliability of the phenomenon being modeled
How is the Knowledge Represented? • Parameters in strategies for building abstractions • Currently most of the abstraction structure is hand-coded • The knowledge depends on the distribution of problems an agent has to solve, but not on particular low-level tasks • Uncertainty isn’t represented explicitly, but is handled implicitly in statistical learning • We are learning at multiple levels of abstraction
What is the Domain? • Nethack • Would like more complex simulated domain
What are the Technical Milestones? • Meta-learning • Learn parameters in hand-built abstractions for MDPs • Learn new abstractions for MDPs • Learn to compose abstractions • Do it all for POMDPs
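The slides do not specify what form the MDP abstractions take. As a stand-in, the sketch below uses plain state aggregation: a hand-built map `phi` groups ground states into abstract states, the ground dynamics are averaged uniformly within each group, and value iteration is run on the smaller model. The names `aggregate` and `value_iteration`, the chain MDP, and the uniform weighting are all assumptions made for illustration.

```python
def aggregate(P, R, phi, n_abs):
    """Build an abstract MDP by state aggregation.

    P[s][a] is a dict {next_state: prob}; R[s][a] is a reward.
    phi[s] is the abstract state of ground state s.
    Assumes every abstract state has at least one member.
    """
    n_actions = len(P[0])
    members = [[s for s in range(len(P)) if phi[s] == z] for z in range(n_abs)]
    P_abs = [[{} for _ in range(n_actions)] for _ in range(n_abs)]
    R_abs = [[0.0] * n_actions for _ in range(n_abs)]
    for z, group in enumerate(members):
        w = 1.0 / len(group)  # uniform weight over ground states in the group
        for s in group:
            for a in range(n_actions):
                R_abs[z][a] += w * R[s][a]
                for s2, p in P[s][a].items():
                    z2 = phi[s2]
                    P_abs[z][a][z2] = P_abs[z][a].get(z2, 0.0) + w * p
    return P_abs, R_abs


def value_iteration(P, R, gamma=0.9, iters=200):
    """Standard value iteration on a tabular MDP."""
    V = [0.0] * len(P)
    for _ in range(iters):
        V = [max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
                 for a in range(len(P[s])))
             for s in range(len(P))]
    return V


# Toy 6-state chain: action 0 moves left, action 1 moves right.
# Pushing right in the last state yields reward 1.
N = 6
P = [[{max(s - 1, 0): 1.0}, {min(s + 1, N - 1): 1.0}] for s in range(N)]
R = [[0.0, 1.0 if s == N - 1 else 0.0] for s in range(N)]
PHI = [0, 0, 1, 1, 2, 2]  # hand-built abstraction: pair up neighbours

P_ABS, R_ABS = aggregate(P, R, PHI, 3)
V_ABS = value_iteration(P_ABS, R_ABS)

if __name__ == "__main__":
    print(V_ABS)
```

Planning over 3 abstract states instead of 6 ground states is a trivial saving here, but the same construction shows where learnable parameters could enter: the grouping `phi` and the weights used to average within each group.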