Learning in Worlds with Objects Leslie Pack Kaelbling MIT Artificial Intelligence Laboratory With Tim Oates, Natalia Hernandez, Sarah Finney
What is an Agent? • A system that has an ongoing interaction with an external environment • household robot • factory controller • web agent • Mars explorer • pizza delivery robot [Figure: agent-environment loop; the Environment sends Observations to the agent, and the agent sends Actions back]
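The interaction loop in the figure above can be read as a minimal programming interface; the sketch below is illustrative only, and the names Environment, observe, act, and run_agent are assumptions rather than anything from the talk.

```python
# Minimal sketch of the ongoing agent-environment interaction loop.
class Environment:
    def observe(self):
        """Return the agent's current observation of the world."""
        raise NotImplementedError

    def act(self, action):
        """Apply an action to the world."""
        raise NotImplementedError


def run_agent(policy, env, steps=100):
    """Ongoing interaction: observe, choose an action, act, repeat."""
    for _ in range(steps):
        observation = env.observe()
        action = policy(observation)
        env.act(action)
```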
Agents Must Learn • Learning is a crucial aspect of intelligent behavior • human programmers lack required knowledge • agents should work in a variety of environments • agents should work in changing environments • What to learn? • World dynamics: What happens when I take a particular action? • Reward: What world states are good?
Crisis • Current state-of-the-art learning methods will not work in domains with multiple objects • These are crucial domains for the robots of the future.
Representation • Learning requires some sort of representation of states of the world. • The choice of representation affects • what information can be represented • what kinds of generalizations the agent can make
Attribute Vector • State-of-the-art representation for learning • Example state: temperature = 48.2, pressure = 57.9 mB, valve1 = open, valve2 = closed, time = 10:48AM, backlog = 78, volume = 32.2, production = 45.5, …
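As a concrete illustration, a state in this representation is just a fixed set of named values; the dictionary encoding below is an assumption, with the values copied from the slide.

```python
# One world state as a flat attribute vector: a fixed set of named values.
state = {
    "temperature": 48.2,
    "pressure_mB": 57.9,
    "valve1": "open",
    "valve2": "closed",
    "time": "10:48AM",
    "backlog": 78,
    "volume": 32.2,
    "production": 45.5,
}
```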
Generalization over Attribute Vectors [Figure: a decision tree over attributes; internal tests such as temp > 22, pressure < 3, and time < 10AM lead to leaf actions: close valve, open valve, add reagent, increase temp]
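A minimal sketch of the kind of policy such a tree encodes, using the tests and action names from the figure; the exact tree structure and the time_hours field are illustrative assumptions.

```python
# Sketch of a policy expressed as a decision tree over attribute thresholds.
def tree_policy(state):
    if state["temperature"] > 22:
        if state["pressure_mB"] < 3:
            return "close valve"
        return "open valve"
    if state["time_hours"] < 10:        # hypothetical numeric time field
        return "add reagent"
    return "increase temp"
```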
Complex Everyday Domains Attribute vector is impossibly big • book1-on-book2: true • book2-on-book1: false • pen-is-yellow: true • pen-is-blue: false • lamp-on: true • lamp-off: false • ink-bottle-level: 50% • lamp-in-bottle: false • bottle-on-lamp: false • paper1-color: gray • paper2-color: white • fabric-behind-lamp: true • book2-is-clear: false • book4-is-clear: false • book1-is-clear: true • block1-on-block2: false • block3-unstable: true • block2-on-table: false • block1-in-front-of-lamp: true • …
Generalization over Objects • If book1 is on book2 and I move book2, then book1 will move • If the cup is on the table and I move the table, then the cup will move • If the pen is on the paper and I move the paper, then the pen will move • If the coat is on the chair and I move the chair, then the coat will move • For all objects A and B: • If A is on B and I move B, then A will move
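The single quantified rule at the end of this slide can be applied to any pair of objects by matching against the current relations. A minimal sketch, assuming the "on" relation is stored as a set of pairs; the function name is hypothetical.

```python
# One relational rule covers every (A, B) pair; no per-object rules needed.
def objects_that_move_with(on_relations, moved_object):
    """on_relations: set of (A, B) pairs meaning 'A is on B'.
    Returns the objects that also move when moved_object moves."""
    return {a for (a, b) in on_relations if b == moved_object}

# Example: the cup is on the table, book1 is on book2.
relations = {("cup", "table"), ("book1", "book2")}
assert objects_that_move_with(relations, "table") == {"cup"}
assert objects_that_move_with(relations, "book2") == {"book1"}
```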
Referring to Objects • Traditional symbolic AI has the problem of “symbol grounding”: given an expression like on(book1, book2), how do I know what object is named by book1?
Deictic Expressions • “Deixis” is Greek for “pointing” • Examples: “now” (ima), “here” (koko), “the box I am holding” (watashi-ga motteiru hako), “the box I am looking at” (watashi-ga miteiru hako)
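One way to read a deictic expression computationally is as a function of the agent's current percept rather than a fixed global name; the sketch below is an assumption, and the in_hand and fixated fields are hypothetical.

```python
# Deictic expressions name objects relative to the agent, not by global symbols.
def the_object_in_my_hand(percept):
    return percept.get("in_hand")      # whatever is currently being held

def the_object_i_am_looking_at(percept):
    return percept.get("fixated")      # whatever is at the center of gaze
```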
Automatic Generalization • If I have an object in my hand and I open my hand, then the object that was in my hand is now on the table • This is true no matter what object is in my hand.
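A minimal sketch of how such a rule can be written once with a deictic slot instead of an object name; the predicate names below are hypothetical.

```python
# One deictic rule, true for whatever object happens to be in the hand.
def open_hand_effect(percept):
    held = percept.get("in_hand")
    if held is None:
        return {}
    return {("on_table", held): True, ("in_hand", held): False}
```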
Communicating with Humans • Natural language communication • speaks of the world in terms of objects and their relationships • uses deictic expressions • Our robots of the future will have to be able to understand and generate human descriptions of the world
Long-Term Research Goal • A robotic system with hand and cameras that can • learn to achieve tasks efficiently through trial and error • acquire natural language descriptions of the objects and their properties through “conversation” with humans
Short-Term Research Plan • Explore deictic, object-based representation for learning algorithms • build simulated hand-eye robot system that manipulates blocks (with real physics) • have simulated robot learn to carry out tasks from trial and error • Demonstrate empirically and theoretically that deictic representation is crucial for efficient learning
First Example Domain • Unreliable block stacking: • robot is rewarded for making tall piles of blocks • the taller a pile is, the more likely it is to fall over when another block is added • a pile can be made more stable by building piles to its sides • Once the robot learns to do this task, keep the physics of the domain the same, but reward a more complex behavior.
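A minimal sketch of the unreliable-stacking dynamics described above, assuming the chance of collapse grows with pile height and shrinks with the number of supporting side piles; the specific probabilities are illustrative, not the simulator's actual physics.

```python
import random

# Illustrative dynamics: taller piles fall more often; side piles stabilize them.
def add_block(pile_height, num_side_piles, base_risk=0.1, support=0.02):
    """Return the new pile height after trying to add one block."""
    p_fall = max(0.0, base_risk * pile_height - support * num_side_piles)
    if random.random() < min(1.0, p_fall):
        return 0                       # the pile topples
    return pile_height + 1             # the block stays on top
```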
Learning by Doing • Having an initial task to perform focuses the robot’s attention on aspects of the environment • Use extension of Utree learning algorithm to select important aspects of the environment • Generate new deictic expressions dynamically: the-block-on-top-of(the-block-I-am-looking-at) • Extend reinforcement learning methods to apply to object-based representations
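New deictic expressions such as the-block-on-top-of(the-block-I-am-looking-at) can be built by composing primitive pointers; a minimal sketch under the assumption that the world model exposes an on-top-of lookup.

```python
# Compose primitive pointers into new deictic expressions, e.g.
# the-block-on-top-of(the-block-I-am-looking-at).
def the_block_i_am_looking_at(percept, world):
    return percept.get("fixated")

def the_block_on_top_of(inner):
    def expression(percept, world):
        base = inner(percept, world)
        return world.get("on_top_of", {}).get(base)   # hypothetical lookup
    return expression

new_expression = the_block_on_top_of(the_block_i_am_looking_at)
```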
Extracting General Rules • There are too many facts that are true in any interesting environment. • Solving tasks focuses attention on • particular objects (named with deictic expressions) • particular properties of those objects • These objects and properties are likely of general importance: use them as input to an association-rule learning algorithm to learn facts like: • The thing that is on the thing that I am holding will probably fall off if I move
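A minimal stand-in for the association-rule step, assuming experience is logged as (facts before, action, facts after) triples; a real learner would filter these counts by support and confidence.

```python
from collections import Counter

# Count how often a fact that held before an action is followed by an outcome,
# as raw material for rules like "if X holds and I do A, then Y follows".
def rule_counts(experience):
    counts = Counter()
    for facts_before, action, facts_after in experience:
        for fact in facts_before:
            for outcome in facts_after:
                counts[(fact, action, outcome)] += 1
    return counts
```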
Enabling Planning • Given general rules, the agent can “think” about the consequences of its actions and decide what to do, rather than learn through trial and error.
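With learned rules acting as a predictive model, the agent can evaluate actions by simulating them; a minimal one-step lookahead sketch, where model and reward are assumed interfaces.

```python
# One-step lookahead: simulate each action with the learned model and pick
# the one whose predicted outcome scores best.
def plan_one_step(state, actions, model, reward):
    """model(state, action) -> predicted next state; reward(state) -> float."""
    return max(actions, key=lambda a: reward(model(state, a)))
```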
In the Future • An ambitious research project • vision algorithms for learning segmentation and object recognition • learning good properties and relations for characterizing the domain (“concept learning”) • connecting with natural language learning for word meanings
Don’t miss any dirt!