Experience-Oriented Artificial Intelligence. Rich Sutton with special thanks to Michael Littman, Doina Precup, Satinder Singh, David McAllester, Peter Stone, Lawrence Saul, and Harry Browne. Experience matters!
Experience-Oriented Artificial Intelligence Rich Sutton with special thanks to Michael Littman, Doina Precup, Satinder Singh, David McAllester, Peter Stone, Lawrence Saul, and Harry Browne
Experience matters! • Not in the obvious sense - that you have to do a thing many times to get good at it • But just in the sense that you do things, • that you live a life • that you take actions, receive sensations • that you pass through a trajectory of states over time • This is so obvious that it passes unnoticed • Like air, gravity
Experience is • the actions taken and the sensations received, • by the agent from its world • a continuing time sequence over the life of the agent • Experience is the minimal ontology [Figure: Agent and World, connected by experience]
Experience matters, and must be respected • Experience matters because • It is what life is all about. • Experience is the final common path, • the only result of all that goes on • in the agent and world
Experience matters computationally • Experience is the most prominent feature of the computational problem we call AI • It’s the central data structure, revealed and chosen over time • It has a definite temporal structure • Order is important • Speed of decision is important • There is a continuous flow of long duration (a lifetime!) • not a sequence of isolated interactions, whose order is irrelevant
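As a minimal sketch of experience as "the central data structure", the loop below records one continuing stream of (action, sensation) pairs. The `Corridor` world, the `live` function, and the always-go-right agent are hypothetical illustrations, not something taken from the talk.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, int]  # (action taken, sensation received)


class Corridor:
    """Hypothetical toy world: positions 0..4; sensation is 1 at the right end."""

    def __init__(self) -> None:
        self.pos = 0

    def step(self, action: str) -> int:
        move = 1 if action == "right" else -1
        self.pos = min(4, max(0, self.pos + move))
        return 1 if self.pos == 4 else 0


def live(choose_action: Callable[[List[Step]], str],
         world: Corridor, n_steps: int) -> List[Step]:
    """Return one ordered experience stream; nothing else is observable."""
    experience: List[Step] = []
    for _ in range(n_steps):
        action = choose_action(experience)      # chosen by the agent
        sensation = world.step(action)          # revealed by the world
        experience.append((action, sensation))  # temporal order is kept
    return experience


# An agent that always moves right, run for six time steps.
stream = live(lambda history: "right", Corridor(), 6)
```

The point of the sketch is only that the agent sees its own history of actions and sensations, in order, and never the world's internal position.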
Experience in AI • Many, many AI systems have no experience • They don't have a life! • Expert systems • Knowledge bases like CYC • Question-answering systems • Puzzle solvers, or any planner that is designed to receive problem descriptions and emit solutions • Part of the new popularity of agent-oriented AI is that it highlights experience • Other AI systems have experience, but don’t “respect” it
Orienting around experience suggests radical changes in AI • Knowledge of the world should be • knowledge of possible experiences • Planning should be about • foreseeing and controlling experience • The state of the world should be • a summary of past experience, • relevant to future experience • Yet we rarely see these basic AI issues discussed in terms of experience • Is it possible or plausible that they could be? Yes! • Would it matter if they were? Yes!
I am not claiming that knowledge comes from experience. (I take no position on the nature/nurture controversy.) But only that knowledge is about experience. And that, given that, it should be predictive.
Key Points • Computational Theory vs. just making it work • What to compute and why • Experience is central to AI • Knowledge should be about experience • The minimal ontology • Grounding in experience from the bottom up • A computational theory of knowledge must support • Abstraction • Composition • Decomposition - Explicitness, verifiability • Such Modularity is the whole point of knowledge
Outline • Experience as central to AI • Predictive knowledge in general • Generalized Transition Predictions (GTPs, or option models) • Planning with GTPs (rooms-world example) • State as predictions (PSRs) • Prospects and conclusion
The I/O View of the World • We are used to taking an I/O view of the mind, of the agent • It does not matter what it is physically made of • What matters is what it does • So we should be willing to consider the same I/O view of the world • It does not matter what it is physically made of • What matters is what it does • The only thing that matters about the world is the experience it generates
Then the only thing to know or say about the world is what experience it generates Thus, world knowledge must really be about future experience. In other words, it must be a prediction
AI could be about Predictions • Hypothesis: Knowledge is predictive • About what-leads-to-what, under what ways of behaving • What will I see if I go around the corner? • Objects: What will I see if I turn this over? • Active vision: What will I see if I look at my hand? • Value functions: What is the most reward I know how to get? • Such knowledge is learnable, chainable, verifiable • Hypothesis: Mental activity is working with predictions • Learning them • Combining them to produce new predictions (reasoning) • Converting them to action (planning, reinforcement learning) • Figuring out which are most useful
Philosophical and Psychological Roots • Like classical British empiricism (1650–1800) • Knowledge is about experience • Experience is central • But not anti-nativist (evolutionary experience) • Emphasizing sequential rather than simultaneous events • Replace association/contiguity with prediction/contingency • Close to Tolman’s “Expectancy Theory” (1932–1950) • Cognitive maps, vicarious trial and error • Psychology struggled to make it a science (1890–1950) • Introspection • Behaviorism, operational definitions • Objectivity
Tolman & Honzik, 1930, “Reasoning in Rats” [Figure: maze with a start box, a food box, three paths (Path 1, Path 2, Path 3), and blocks at A and B]
An old, simple, appealing idea • Mind as prediction engine! • Predictions are learnable, combinable • They represent cause and effect, and can be pieced together to yield plans • Perhaps this old idea is essentially correct. • Just needs • Development, revitalization in modern forms • Greater precision, formalization, mathematics • The computational perspective to make it respectable • Imagination, determination, patience • Not rushing to performance
Outline • Experience as central to AI • Predictive knowledge in general • Generalized Transition Predictions (GTPs, or option models) • Planning with GTPs (rooms-world example) • State as predictions (PSRs) • Prospects and conclusion
Machinery for General Transition Predictions • In steps of increasing expressiveness: • Simple state-transition predictions • Mixtures of predictions • Closed-loop termination • Closed-loop action conditioning
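The Simplest Transition Predictions • 1-step prediction: from state A, action a leads to state B • k-step prediction: from state A, p leads to state B after k steps [Figure: experience drawn as a sequence of states and actions]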
Mixtures of k-step Predictions: Terminating over a period of time • Where will I be in 10–20 steps? (the time steps of interest run from k=10 to k=20) • Where will I be in roughly k steps? • Arbitrary termination profiles are possible: short term, medium term, long term • But sometimes anything like this is too loose and sloppy...
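The following sketch shows one way to read k-step predictions and their mixtures, assuming a small Markov chain with a known transition matrix. The matrix `P`, the function names, and the uniform 10 to 20 step profile are illustrative assumptions; the talk does not specify an implementation.

```python
from typing import Dict, List

Matrix = List[List[float]]


def k_step_prediction(start: int, P: Matrix, k: int) -> List[float]:
    """Distribution over states exactly k steps after leaving `start`."""
    n = len(P)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(k):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist


def mixture_prediction(start: int, P: Matrix,
                       profile: Dict[int, float]) -> List[float]:
    """Mix k-step predictions; profile[k] is the probability of stopping at k."""
    if abs(sum(profile.values()) - 1.0) > 1e-9:
        raise ValueError("termination profile must sum to 1")
    mixed = [0.0] * len(P)
    for k, weight in profile.items():
        for j, p in enumerate(k_step_prediction(start, P, k)):
            mixed[j] += weight * p
    return mixed


# Hypothetical 3-state chain that drifts toward the absorbing state 2.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]

# "Where will I be in 10-20 steps?" as a uniform profile over k = 10..20.
profile = {k: 1.0 / 11 for k in range(10, 21)}
where_in_10_to_20 = mixture_prediction(0, P, profile)
```

Other termination profiles (short, medium, or long term) only change the weights in `profile`.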
Closed-loop Termination • Terminate depending on what happens • E.g., instead of “Will I finish this report soon?”, which uses a soft termination profile (probably in about an hour): • Use “Will I be done when my boss gets here?” • Only one precise but uncertain time matters: the time the boss arrives [Figure: termination probability over time for the two cases]
Closed-loop termination allows time specification to be both flexible and precise • Instead of “what will I see at t+100?” • Can say “what will I see when I open the box?” • Will we elect a black or a woman president first? • Where will the tennis ball be when it reaches me? • What time will it be when the talk starts? or “when John arrives?” “when the bus comes?” “when I get to the store?” • A substantial increase in expressiveness
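A hedged sketch of closed-loop termination: rather than asking about a fixed time t+100, the experiment runs until an event occurs and the measurement is taken at that moment. The random-walk corridor, the function names, and the Monte Carlo estimate are assumptions made for illustration only.

```python
import random
from typing import Callable, Tuple

LEFT_END, RIGHT_END, START = 0, 6, 3  # hypothetical corridor


def step(state: int, rng: random.Random) -> int:
    """One step of an unbiased random walk."""
    return state + rng.choice((-1, 1))


def at_an_end(state: int) -> bool:
    """Termination condition: it depends on what happens, not on the clock."""
    return state in (LEFT_END, RIGHT_END)


def run_until(start: int, terminate: Callable[[int], bool],
              rng: random.Random) -> Tuple[int, int]:
    """Run until `terminate` fires; return (final state, elapsed steps)."""
    state, elapsed = start, 0
    while not terminate(state):
        state = step(state, rng)
        elapsed += 1
    return state, elapsed


def estimate(n_runs: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Estimate 'which end will I be at when I hit an end?' and how long it takes."""
    rng = random.Random(seed)
    results = [run_until(START, at_an_end, rng) for _ in range(n_runs)]
    p_right = sum(s == RIGHT_END for s, _ in results) / n_runs
    mean_time = sum(t for _, t in results) / n_runs
    return p_right, mean_time


p_right, mean_time = estimate()
```

The termination time differs from run to run, yet the question being asked ("where am I when I reach an end?") stays precise.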
Closed-loop Action Conditioning • Each prediction has a closed-loop policy Policy: States --> Actions (or Probs.) • If you follow the policy, then you predict and verify • Otherwise not • If partly followed, temporal-difference methods can be used
General Transition Predictions (GTPs) • Closed-loop terminations and policies • Correspond to arbitrary experiments and the results of those experiments • What will I see if I go into the next room? • What time will it be when the talk is over? • Is there a dollar in the wallet in my pocket? • Where is my car parked? • Can I throw the ball into the basket? • Is this a chair situation? • What will I see if I turn this object around?
Anatomy of a General Transition Prediction • 1. Predictor: recognizes the conditions, makes the prediction • 2. Experiment: policy, termination condition, measurement function(s) [Figure labels: States, Measurement space, Actions, knowledge, verifier]
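The two-part anatomy above can be sketched as a small data structure: a predictor, plus an experiment made of a policy, a termination condition, and a measurement function. All class and function names here are hypothetical, and the deterministic corridor world is an assumption chosen to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Experiment:
    policy: Callable[[int], int]          # state -> action
    terminate: Callable[[int], bool]      # state -> stop?
    measure: Callable[[int, int], float]  # (final state, elapsed) -> measurement


@dataclass
class GTP:
    predictor: Callable[[int], float]     # state -> predicted measurement
    experiment: Experiment


def run(experiment: Experiment, state: int,
        world_step: Callable[[int, int], int]) -> float:
    """Carry out the experiment from `state` and return what was measured."""
    elapsed = 0
    while not experiment.terminate(state):
        state = world_step(state, experiment.policy(state))
        elapsed += 1
    return experiment.measure(state, elapsed)


def verify(gtp: GTP, state: int, world_step: Callable[[int, int], int],
           tol: float = 1e-9) -> bool:
    """Check the prediction against the outcome of actually running the experiment."""
    return abs(gtp.predictor(state) - run(gtp.experiment, state, world_step)) <= tol


WIDTH = 5  # hypothetical corridor with cells 0..4


def world_step(state: int, action: int) -> int:
    return min(WIDTH - 1, max(0, state + action))


# "How many steps until I reach the right wall if I keep moving right?"
steps_to_right_wall = GTP(
    predictor=lambda s: float(WIDTH - 1 - s),
    experiment=Experiment(
        policy=lambda s: 1,
        terminate=lambda s: s == WIDTH - 1,
        measure=lambda s, elapsed: float(elapsed),
    ),
)
```

Because the experiment is explicit, the prediction can be checked by the machine itself, for example with `verify(steps_to_right_wall, 1, world_step)`.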
Room-to-Room GTPs (General Transition Predictions) Sutton, Precup, & Singh, 1999 • 4 stochastic primitive actions: up, down, left, right; fail 33% of the time • 8 multi-step GTPs (to each room's 2 hallways), also called “options” (Precup 2000; Sutton, Precup, & Singh 1999) • Termination: hallways • Predict: probability of reaching each terminal hallway • Goal: minimize # steps + values for target and other outcome hallway [Figure: rooms gridworld showing one GTP's policy, its target (goal) hallway, and its termination hallways]
Example: Open-the-door • Predictor: use visual input to estimate • Probabilities of succeeding in opening the door, and of other outcomes (door locked, no handle, no real door) • expected cumulative cost (sub-par reward) in trying • Experiment • Policy for walking up to the door, shaping grasp of handle, turning, pulling, and opening the door • Terminate on successful opening or various failure conditions • Measure outcome and cumulative cost
Example: RoboCup Soccer Pass • Predictor uses perceived positions of ball, opponents, etc. to estimate probabilities of • Successful pass, openness of receiver • Interception • Reception failure • Aborted pass, in trouble • Aborted pass, something better to do • Loss of time • Experiment • Policy for maneuvering ball, or around ball, to set up and pass • Termination strategy for aborting, recognizing completion • Measurement of outcome, time
Outline • Experience as central to AI • Predictive knowledge in general • Generalized Transition Predictions (GTPs, or option models) • Planning with GTPs (rooms-world example) • State as predictions (PSRs) • Prospects and conclusion
Combining Predictions • If the mind is about predictions, • Then thinking is combining predictions to produce new ones • Predictions obviously compose • If A->B and B->C, then A->C • GTPs are designed to do this generally • Fit into “Bellman equations” of semi-Markov extensions of dynamic programming • Can also be used for simulation-based planning
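For reference, the semi-Markov "Bellman equation" mentioned above can be written as follows, using the notation of the options framework of Sutton, Precup, & Singh (1999). The symbols are not on the slide and are introduced here as assumptions: $\mathcal{O}_s$ is the set of GTPs (options) available in state $s$, $r^o_s$ is the expected discounted reward accumulated while following $o$ from $s$ (a transient measurement), and $p^o_{ss'}$ is the discounted probability that $o$ terminates in $s'$ (a final measurement).

```latex
\[
  V^*_{\mathcal{O}}(s) \;=\; \max_{o \in \mathcal{O}_s}
    \Big[\, r^o_s + \sum_{s'} p^o_{ss'} \, V^*_{\mathcal{O}}(s') \Big],
  \qquad
  p^o_{ss'} \;=\; \sum_{k=1}^{\infty} \gamma^{k}\,
    \Pr\{\, o \text{ terminates in } s' \text{ after } k \text{ steps} \mid s \,\}
\]
```

This has the same form as the one-step Bellman optimality equation, with the model of a whole GTP taking the place of the model of a primitive action.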
Composing Predictions • A->B composed with B->C gives A->C • Final measurement (e.g., partial distribution of outcome states) • Transient measurement (e.g., elapsed time, cumulative reward)
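Composing Predictions [Figure: the first prediction leads from A to B with probability .8, and to B’ and B’’ with probability .1 each, with transient measurement T1; the second prediction leads from B to C, with transient measurement T2. If the first is followed by the second, the composed prediction from A toward C has transient measurement T1 + .8 T2, and B’ and B’’ remain as outcomes with probability .1 each]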
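A small sketch of this composition, assuming each prediction is stored as a transient measurement plus a distribution over final outcomes, with no discounting. The `compose` function and the numbers 3.0 and 5.0 for the transient measurements are hypothetical; only the probabilities .8, .1, .1 come from the slide.

```python
from typing import Dict, Tuple

# (expected transient measurement, probability of each final outcome)
Model = Tuple[float, Dict[str, float]]


def compose(first: Model, second: Model, via: str) -> Model:
    """Follow `first`; whenever it ends in `via`, continue with `second`."""
    t1, outcomes1 = first
    t2, outcomes2 = second
    p_via = outcomes1.get(via, 0.0)
    # The second transient is only incurred when the first ends in `via`.
    transient = t1 + p_via * t2
    # Outcomes other than `via` are passed through unchanged.
    outcomes = {s: p for s, p in outcomes1.items() if s != via}
    for s, p in outcomes2.items():
        outcomes[s] = outcomes.get(s, 0.0) + p_via * p
    return transient, outcomes


a_to_b: Model = (3.0, {"B": 0.8, "B'": 0.1, "B''": 0.1})
b_to_c: Model = (5.0, {"C": 1.0})

# Transient is T1 + .8 * T2; outcomes are C with .8, B' and B'' with .1 each.
a_to_c = compose(a_to_b, b_to_c, via="B")
```

The composed model has the same shape as its parts, so it can itself be composed again, which is what makes chaining possible.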
Planning with GTPs
Learning Path-to-Goal with and without GTPs [Figure: steps per episode versus episodes, both on logarithmic axes, with curves for Primitives, GTPs & primitives, and GTPs]
Rooms Example: Simultaneous Learning of all 8 GTPs from their Goals [Figure: two panels plotted against time steps (0 to 100,000). Recoverable labels: RMS error in two subgoal state values; upper hallway subgoal; lower hallway subgoal; ideal values; learned values; goal prediction] All 8 hallway GTPs were learned accurately and efficiently while actions were selected totally at random
Outline • Experience as central to AI • Predictive knowledge in general • Generalized Transition Predictions (GTPs, or option models) • Planning with GTPs (rooms-world example) • State as predictions (PSRs) • Prospects and conclusion
Predictive State Representations • Problem: So far we have assumed states, but the world really just gives information, “observations” • Hypothesis: What we normally think of as state is a set of predictions about outcomes of experiments • Wallet’s contents, John’s location, presence of objects… • Prior work: • Learning deterministic FSAs - Rivest & Schapire, 1987 • Adding stochasticity: An alternative to HMMs - Herbert Jaeger, 1999 • Adding action: An alternative to POMDPs - Littman, Sutton, & Singh 2001
Summary of Results for Predictive State Rep’ns (PSRs) • Compact, linear PSRs exist • # tests ≤ # states in minimal POMDP • # tests ≤ Rivest & Schapire’s Diversity • # tests can be exponentially fewer than diversity and POMDP • Compact simulation/update process • Construction algorithm from POMDP • Learning/discovery algorithms of Rivest and Schapire, and of Jaeger, do not immediately extend to PSRs • There are natural EM-like algorithms (current work)
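The "compact simulation/update process" for a linear PSR can be written out as below, following the formulation in Littman, Sutton, & Singh (2001). The notation is an assumption added here, not something shown on the slide: $Q = \{q_1, \dots, q_n\}$ is the set of core tests, $p(Q \mid h)$ is the vector of their predictions after history $h$, and $m_t$ is the weight vector associated with test $t$.

```latex
\[
  p(t \mid h) \;=\; p(Q \mid h)^{\top} m_t
  \qquad \text{(any test is a linear function of the core predictions)}
\]
\[
  p(q_i \mid h a o) \;=\; \frac{p(a o q_i \mid h)}{p(a o \mid h)}
  \;=\; \frac{p(Q \mid h)^{\top} m_{a o q_i}}{p(Q \mid h)^{\top} m_{a o}},
  \qquad i = 1, \dots, n
\]
```

After taking action $a$ and observing $o$, each core prediction is updated from the old prediction vector alone, so the vector $p(Q \mid h)$ plays the role of state.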
Empty Gridworld with Local Sensing • Four actions: Up, Down, Right, Left • And four sensory bits
Distance to Wall Predictions • Example predictions: 0R, 0RR, 1RRR, 1RRRR, …; 0D, 1DD, 1DDD, … (the “meaning” of predictions) • 4 GTPs suffice to identify each state • More needed to update PSR • Many more are computed from PSR [Figure: gridworld annotated with its Predictive State Representation (PSR)]
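One possible reading of these predictions, sketched in code: a test such as RRR means "move right three times, then report the sensory bit for a wall on the right". The 5x5 grid, the starting cell, the assumption that each sensory bit signals an adjacent wall, and all function names are hypothetical choices that happen to reproduce the pattern on the slide.

```python
from typing import Tuple

SIZE = 5  # hypothetical empty gridworld, SIZE x SIZE
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}
Pos = Tuple[int, int]  # (row, column), row 0 at the top


def move(pos: Pos, action: str) -> Pos:
    """Deterministic move; bumping into a wall leaves the position unchanged."""
    dr, dc = MOVES[action]
    return (min(SIZE - 1, max(0, pos[0] + dr)),
            min(SIZE - 1, max(0, pos[1] + dc)))


def sense(pos: Pos, direction: str) -> int:
    """Sensory bit: 1 if a wall is immediately adjacent in `direction`."""
    return int(move(pos, direction) == pos)


def predict(pos: Pos, test: str) -> int:
    """Outcome of a test such as 'RRR': act, then read the bit for the last action."""
    for action in test:
        pos = move(pos, action)
    return sense(pos, test[-1])


def distance_to_wall(pos: Pos, direction: str) -> int:
    """Number of moves in `direction` before the wall bit turns on."""
    k = 0
    while not sense(pos, direction):
        pos = move(pos, direction)
        k += 1
    return k


def signature(pos: Pos) -> Tuple[int, ...]:
    """Four distance-to-wall predictions, one per direction."""
    return tuple(distance_to_wall(pos, d) for d in "UDLR")


start = (2, 1)
right_tests = [predict(start, t) for t in ("R", "RR", "RRR", "RRRR")]
down_tests = [predict(start, t) for t in ("D", "DD", "DDD")]
```

In this toy setting every cell has a distinct four-number `signature`, which is one way to see why four such predictions can be enough to identify the state.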
Suppose we add one non-uniformity • Example predictions as before: 0R, 0RR, 1RRR, 1RRRR, …; 0D, 1DD, 1DDD, … • Now there is much more to know • It would be challenging to program it all correctly
Other Extension Ideas • Stochasticity • Egocentric motion • Multiple Rooms • Second agent • Moveable objects • Transient goals It’s easy to make such problems arbitrarily challenging
Outline • Experience as central to AI • Predictive knowledge in general • Generalized Transition Predictions (GTPs, or option models) • Planning with GTPs (rooms-world example) • State as predictions (PSRs) • Prospects and conclusion
How Could These Ideas Proceed? • Build systems! Build Gridworlds! • A performance orientation would be problematic • The “Knowledge Representation” guys may not be impressed • But others I think will be very interested and appreciative - throughout modern probabilistic AI
The Experience Manifesto • Experience is the input and output of AI • An AI must have experience; it must have a life! • Knowledge is about experience • Not about objects, or people, or space, or time…except in so far as these things can be restated in terms of experience. • Knowledge is well expressed as predictions of experience • Predictions of experience have a much clearer meaning than any previously proposed kind of knowledge • Predictions of experience can be autonomously verified • Predictive knowledge is completely in the machine, not in a person! • Planning is about composing predictions to search • through the space of attainable experiences • World-state rep’ns are also predictions of experience
Key Points • We should not try to fake intelligence or understanding • Computational Theory vs. just making it work • What to compute and why • Experience is central to AI • Knowledge should be about experience • The minimal ontology • Grounding in experience from the bottom up • A computational theory of knowledge must support • Abstraction • Composition • Decomposition - Explicitness, verifiability • Such Modularity is the whole point of knowledge
Summary of the Predictive View of AI • Knowledge is Predictions • About what-leads-to-what, under what ways of behaving • Such knowledge is learnable, chainable • Mental activity is working with predictions • Learning them • Combining them to produce new predictions (reasoning) • Converting them to action (planning, reinforcement learning) • Figuring out which are most useful • Predictions are verifiable • A natural way to self-maintain knowledge, which is essential for scaling AI beyond programming • Most of the machinery is simple but potentially powerful • Is it powerful enough?