Delve into Rich Sutton's proposal on AI focusing on predictions, self-maintenance, and the mind's capacity for foreseeing outcomes. Explore the future prospects and the crucial role of forecasting in AI advancement. Discover the philosophical roots and modern computational views shaping the predictive AI paradigm.
Artificial Intelligence Should Be About Predictions Rich Sutton AT&T Labs with special thanks to Michael Littman, Doina Precup, Satinder Singh, David McAllester, Peter Stone
Outline • AI at an Impasse • A Predictive Proposal • Some of the Machinery • Prospects and Conclusion
It’s Hard to Build Large AI Systems • Brittleness • Unforeseen interactions • Scaling • Requires too much manual complexity management • people must understand, intervene, patch and tune • like programming • Need more self-maintenance • learning, verification • internal coherence of knowledge and experience
AI at an Impasse • We can’t go beyond ourselves • We can’t make AI systems more complex than we can understand • All the representations • All the possible meanings • All the interactions • Beyond that, we get bogged down • Brittleness • Continual manual tuning • Teams of people diverge on representations and meanings • No big return for our efforts
What keeps the knowledge in an AI system correct? • People do! • But eventually this is a dead end. • The key to a successful AI is that it can tell for itself if it is working correctly.
The Verification Principle • An AI system can successfully maintain knowledge only to the extent that it can verify that knowledge itself
Two Strategies for Self-maintenance • Logical self-consistency • Check statements for consistency with each other • Establishes an internal coherence within the AI • But tells us nothing about the external world • Consistency with data • Make predictions, see if they happen • Establishes a coherence between the AI and its world
Outline • AI at an Impasse • A Predictive Proposal • Some of the Machinery • Prospects and Conclusion
Mind is About Predictions • Hypothesis: Knowledge is predictive About what-leads-to-what, under what ways of behaving What will I see if I go around the corner? Objects: What will I see if I turn this over? Active vision: What will I see if I look at my hand? Value functions: What is the most reward I know how to get? Such knowledge is learnable, chainable • Hypothesis: Mental activity is working with predictions Learning them Combining them to produce new predictions (reasoning) Converting them to action (planning, reinforcement learning) Figuring out which are most useful
Philosophical and Psychological Roots • Like classical British empiricism (1650–1800) • Knowledge is about experience • Experience is central • But not anti-nativist (evolutionary experience) • Emphasizing sequential rather than simultaneous events • Replace association/contiguity with prediction/contingency • Close to Tolman’s “Expectancy Theory” (1932–1950) • Cognitive maps, vicarious trial and error • Psychology struggled to make it a science (1890–1950) • Introspection • Behaviorism, operational definitions • Objectivity
Modern Computational View of Mind • OK to talk about insides of minds • OK to talk about the function and purpose of a design • We talk about Why • Why a system works • Why it should compute X and in manner Y • Why such a system should achieve purpose Z • This is new, and resolves classical struggles • Servo-mechanisms, state-transition probabilities • Utility and decision theory • Information as signal – subjective (private) yet clear • Purpose defines and constrains mental constructs
Informational View of Mind • Mind does information processing • Mind exchanges information with the world • Only experience is known for sure • Anything more public or “objective” is suspect • World is an I-O entity, a black box • Although we often seem to talk about what is inside, all we can sensibly talk about is I-O behavior • (Diagram: Mind and World, linked by experience)
Is Mind about Predictions? OR Is Mind about Action (or Policies)? • Of course it is ultimately about action • But action generation methods are relatively clear • Value functions and decision theory • Pick action that maximizes expected cumulative reward • OR • Policy gradient RL methods • Execution-time search • Reflexes and behavior-based robotics • Learning-extended reflexes and conditioning • Flexible cognition requires more than action generation • Most mental activity is working with predictions
An old, simple, appealing idea • Mind as prediction engine! • Predictions are learnable, combinable • They represent cause and effect, and can be pieced together to yield plans • Perhaps this old idea is essentially correct. • Just needs • Development, revitalization in modern forms • Greater precision, formalization, mathematics • The computational perspective to make it respectable • Imagination, determination, patience • Not rushing to performance
Requisites of Prediction Proposal • The AI has to have a life • Predictions must be very flexible, expressive • To capture a wide variety of world knowledge • Mixtures of transition predictions • Closed-loop action conditioning • Closed-loop termination • And yet be grounded, directly comparable to data • Predictions must be combinable, compositional • Support varieties of planning • Projection and anticipation of futures
Outline • AI at an Impasse • A Predictive Proposal • Some of the Machinery • Prospects and Conclusion
Machinery for General Transition Predictions • In steps of increasing expressiveness • Simple state-transition predictions • Mixtures of predictions • Closed-loop termination • Closed-loop action conditioning • While staying grounded in data • Predictions and State
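The Simplest Transition Predictions • Experience: an alternating sequence of states and actions • 1-step prediction: from state X, taking action a, predict the next state Y • k-step prediction: from state X, following policy p for k steps, predict the resulting state Y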
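A minimal sketch of the 1-step and k-step predictions above, assuming a small finite state space and a transition matrix `P` with the action or policy already folded in. The three-state chain and the names `step` and `k_step_prediction` are hypothetical, chosen only for illustration.

```python
def step(dist, P):
    """Push a distribution over states one step forward through P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def k_step_prediction(P, start, k):
    """Distribution over states k steps after starting in `start`."""
    dist = [0.0] * len(P)
    dist[start] = 1.0
    for _ in range(k):
        dist = step(dist, P)
    return dist


# Hypothetical chain: P[i][j] is the probability of moving from i to j.
P = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]

one_step = k_step_prediction(P, start=0, k=1)  # the 1-step prediction
two_step = k_step_prediction(P, start=0, k=2)  # a k-step prediction, k=2
```

Both predictions are grounded in the sense the slides ask for: each can be checked directly against the state actually observed 1 or k steps later.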
Mixtures of k-step Predictions: Terminating over a period of time • Where will I be in 10–20 steps? (the time steps of interest run from k=10 to k=20 steps from now) • Where will I be in roughly k steps? • Arbitrary termination profiles are possible: short term, medium term, long term • But sometimes anything like this is too loose and sloppy...
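One way to read "Where will I be in 10–20 steps?" is as a weighted average of k-step predictions, with the weights given by a termination profile. The sketch below assumes a finite chain and a profile expressed as a dictionary from k to weight; the chain, the uniform profile, and the function names are illustrative assumptions, not part of the talk.

```python
def step(dist, P):
    """Push a distribution over states one step forward through P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def mixture_prediction(P, start, profile):
    """Average the k-step predictions, weighting step k by profile[k].

    `profile` maps k (k >= 1) to the probability of terminating at step k;
    the weights are assumed to sum to 1.
    """
    dist = [0.0] * len(P)
    dist[start] = 1.0
    mix = [0.0] * len(P)
    for k in range(1, max(profile) + 1):
        dist = step(dist, P)
        weight = profile.get(k, 0.0)
        mix = [m + weight * d for m, d in zip(mix, dist)]
    return mix


# Hypothetical chain and a uniform profile over steps 10..20.
P = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
profile = {k: 1.0 / 11 for k in range(10, 21)}
mix = mixture_prediction(P, start=0, profile=profile)
```

Other profiles (short, medium, or long term) only change the weights, not the computation.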
Closed-loop Termination • Terminate depending on what happens • E.g., instead of “Will I finish this report soon?”, which uses a soft termination profile (probably in about an hour) • Use “Will I be done when my boss gets here?” (only one precise but uncertain time matters: the time the boss arrives)
Closed-loop termination allows time specification to be both flexible and precise • Instead of “what will I see at t+100?” • Can say “what will I see when I open the box?” • Will we elect a black or a woman president first? • Where will the tennis ball be when it reaches me? • What time will it be when the talk starts? or “when John arrives?” “when the bus comes?” “when I get to the store?” A substantial increase in expressiveness
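Closed-loop termination can be sketched as a prediction whose stopping time is set by an event rather than by a step count. The example below is a toy under stated assumptions: a symmetric random walk on five states that terminates when either end is reached. The walk, the `terminates` predicate, and the function names are hypothetical.

```python
def random_walk(n):
    """Symmetric random walk on states 0..n-1; the two ends self-loop."""
    P = [[0.0] * n for _ in range(n)]
    for s in range(n):
        if s in (0, n - 1):
            P[s][s] = 1.0
        else:
            P[s][s - 1] = 0.5
            P[s][s + 1] = 0.5
    return P


def terminal_outcome(P, start, terminates, max_steps=10_000, tol=1e-12):
    """Where, and after how many expected steps, does the process stop?

    Returns (outcome, expected_steps): outcome[s] is the probability of
    terminating in state s. Any mass still alive after max_steps is dropped.
    """
    n = len(P)
    alive = [0.0] * n
    alive[start] = 1.0
    outcome = [0.0] * n
    expected_steps = 0.0
    for t in range(max_steps):
        for s in range(n):
            if terminates(s):  # the event happened: stop and record
                outcome[s] += alive[s]
                expected_steps += t * alive[s]
                alive[s] = 0.0
        if sum(alive) < tol:
            break
        alive = [sum(alive[i] * P[i][j] for i in range(n)) for j in range(n)]
    return outcome, expected_steps


P = random_walk(5)
outcome, expected_steps = terminal_outcome(P, start=2, terminates=lambda s: s in (0, 4))
```

The termination time here is precise (the first moment an end is reached) yet uncertain, which is the combination the slide describes.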
Closed-loop Action Conditioning • What happens depends on what you do • What you do depends on what happens • Each prediction has a closed-loop policy Policy: States --> Actions (or Probs.) • If you follow the policy, then you predict and verify • Otherwise not • If partly followed, temporal-difference methods can be used
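The rule "if you follow the policy, then you predict and verify; otherwise not" can be sketched as a filter on logged experience. This assumes a deterministic policy and a log of (state, action) pairs; the function name, the corridor states, and the action labels are hypothetical. The partly-followed case, which the slide assigns to temporal-difference methods, is not covered here.

```python
def follows_policy(trajectory, policy, terminates):
    """True if the logged behavior matches `policy` until termination.

    `trajectory` is a list of (state, action) pairs. Only trajectories for
    which this returns True count as runs of the prediction's experiment.
    """
    for state, action in trajectory:
        if terminates(state):
            return True           # experiment completed under the policy
        if action != policy(state):
            return False          # behavior diverged: no verification
    return False                  # log ended before termination


# Hypothetical corridor: the policy always moves right, stopping at state 3.
policy = lambda s: "right"
terminates = lambda s: s == 3

completed = [(0, "right"), (1, "right"), (2, "right"), (3, None)]
diverged = [(0, "right"), (1, "left"), (0, "right")]

usable = follows_policy(completed, policy, terminates)
unusable = follows_policy(diverged, policy, terminates)
```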
General Transition Predictions (GTPs) Closed-loop terminations And closed-loop policies Correspond to arbitrary experiments and the results of those experiments What will I see if I go into the next room? What time will it be when the talk is over? Is there a dollar in the wallet in my pocket? Where is my car parked? Can I throw the ball into the basket? Is this a chair situation? What will I see if I turn this object around?
Anatomy of a General Transition Prediction • 1. Predictor (the knowledge): recognizes the conditions, makes the prediction; maps states to the measurement space • 2. Experiment (the verifier): policy, termination condition, measurement function(s); maps states to actions
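The two-part anatomy above maps naturally onto a small data structure. The following is a sketch only: the class and field names, the deterministic `env_step` function, and the toy corridor are assumptions made for illustration, and real predictions would be probabilistic.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Experiment:
    policy: Callable[[Any], Any]        # state -> action
    terminates: Callable[[Any], bool]   # state -> stop here?
    measure: Callable[[Any], float]     # terminal state -> measurement


@dataclass
class GTP:
    predictor: Callable[[Any], float]   # state -> predicted measurement
    experiment: Experiment


def run_experiment(exp, state, env_step, max_steps=1000):
    """Follow the policy until termination, then take the measurement."""
    for _ in range(max_steps):
        if exp.terminates(state):
            break
        state = env_step(state, exp.policy(state))
    return exp.measure(state)


def verification_error(gtp, state, env_step):
    """Prediction minus what the experiment actually measured."""
    return gtp.predictor(state) - run_experiment(gtp.experiment, state, env_step)


# Hypothetical corridor with positions 0..5 and a door at position 5.
env_step = lambda pos, action: min(pos + action, 5)
reach_door = GTP(
    predictor=lambda pos: 1.0,  # "I will end up at the door"
    experiment=Experiment(
        policy=lambda pos: 1,
        terminates=lambda pos: pos == 5,
        measure=lambda pos: 1.0 if pos == 5 else 0.0,
    ),
)
error = verification_error(reach_door, 0, env_step)
```

Keeping the experiment alongside the predictor is what lets the system check its own knowledge without a person in the loop.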
Example: Open-the-door • Predictor • Use visual input to estimate • Probabilities of succeeding in opening the door, and of other outcomes (door locked, no handle, no real door) • expected cumulative cost (sub-par reward) in trying • Experiment • Policy for walking up to the door, shaping grasp of handle, turning, pulling, and opening the door • Terminate on successful opening or various failure conditions • Measure outcome and cumulative cost
RoboCup-Soccer Example Safe to pass? Predict the outcome of choosing to pass • The pass will take several steps to set up • – choosing to pass involves a whole action policy • You may choose not to pass halfway through • Terminations and outcomes: • – pass is aborted • – opponents touch the ball before teammate • – teammate touches first, appears to control ball • – ball goes out of bounds
Example: Pass-to-Teammate • Predictor uses perceived positions of ball, opponents, etc. to estimate probabilities of • Successful pass, openness of receiver • Interception • Reception failure • Aborted pass, in trouble • Aborted pass, something better to do • Loss of time • Experiment • Policy for maneuvering ball, or around ball, to set up and pass • Termination strategy for aborting, recognizing completion • Measurement of outcome, time
More Predictive Knowledge • John is in the coffee room • My car is in the South parking lot • What we know about geography, navigation • What we know about how an object looks, rotates • What we know about how objects can be used • Recognition strategies for objects and letters • The portrait of Washington on the dollar in the wallet in my other pants in the laundry has a mustache on it • Composing experiments creates a productive representation language
Relational, Propositional, and Deictic • ∀ objects X: If I drop X, then X will be on the floor • Holding object X means predicting certain sensations if, for example, one directs one’s eyes toward one’s hand • Thus, on dropping, the predicted sensations are merely transferred from the looking-at-hand prediction to the looking-at-floor prediction • Such transfer of existing predictions should be a common part of visual knowledge - updated every time the eyes move • ∃ X, Y such that Red(X), Blue(Y), and Above(X,Y) • There is some place I can foveate and see Red • There is some place I can foveate and see Blue • If I foveate first the Red place, “mark” it, then the Blue place, the mark will be Above the fovea (may need to search) • These are typical ideas of modern, active, deictic vision
Combining Predictions • If the mind is about predictions, • Then thinking is combining predictions to produce new ones • Predictions obviously compose • If A->B and B->C, then A->C • GTPs are designed to do this generally • Fit into “Bellman equations” of semi-Markov extensions of dynamic programming • Can also be used for simulation-based planning
Composing Predictions • A prediction from X to Y and a prediction from Y to Z compose into a prediction from X to Z • Final measurement (e.g., partial distribution of outcome states) • Transient measurement (e.g., elapsed time, cumulative reward)
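Composing Predictions • First prediction (policy p1, termination b1): from X, reach Y with probability .8, Y’ with .1, Y’’ with .1; transient measurement T1 • Second prediction (policy p2, termination b2): from Y, reach Z; transient measurement T2 • Composed prediction (if p1, b1 then p2, b2): from X, reach Z with probability .8, Y’ with .1, Y’’ with .1; transient measurement T1 + .8 T2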
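The composition rule on this slide can be sketched in a few lines, assuming each prediction is a pair of an outcome distribution (the final measurement) and an expected transient measurement, with no discounting. The `compose` function, the state labels, and the values T1 = 5 and T2 = 3 are hypothetical.

```python
def compose(first, followups):
    """Chain a prediction with follow-up predictions from its outcomes.

    `first` is (outcomes, transient): outcomes maps state -> probability and
    transient is the expected elapsed time (or cumulative reward).
    `followups` maps an intermediate state to its own (outcomes, transient).
    Outcomes without a follow-up are passed through unchanged.
    """
    outcomes1, transient = first
    composed = {}
    for mid, p_mid in outcomes1.items():
        if mid in followups:
            outcomes2, t2 = followups[mid]
            transient += p_mid * t2  # second leg only happens with p_mid
            for end, p_end in outcomes2.items():
                composed[end] = composed.get(end, 0.0) + p_mid * p_end
        else:
            composed[mid] = composed.get(mid, 0.0) + p_mid
    return composed, transient


# X reaches Y with probability .8 (T1 = 5); from Y, Z is reached (T2 = 3).
x_to_y = ({"Y": 0.8, "Y'": 0.1, "Y''": 0.1}, 5.0)
y_to_z = {"Y": ({"Z": 1.0}, 3.0)}

x_to_z, total_transient = compose(x_to_y, y_to_z)
```

With these numbers the composed transient is T1 + .8 T2, matching the expression on the slide.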
Sutton, Precup, & Singh, 1999 Room-to-Room GTPs (General Transition Predictions) • “Options”: Precup 2000; Sutton, Precup, & Singh 1999 • 4 stochastic primitive actions (up, down, left, right), which fail 33% of the time • 8 multi-step GTPs (to each room's 2 hallways), each with a policy and a termination at the hallways • Target (goal) hallway • Predict: Probability of reaching each terminal hallway • Goal: minimize # steps + values for target and other outcome hallway
Planning with GTPs
Learning Path-to-Goal with and without GTPs (Figure: steps per episode, with ticks at 10, 100, and 1000, plotted against episodes, with ticks at 1, 10, 100, 1000, and 10,000; curves for Primitives, GTPs & primitives, and GTPs)
Rooms Example: Simultaneous Learning of all 8 GTPs from their Goals (Figure: two plots against time steps, 0 to 100,000; one shows RMS error in subgoal values for the upper and lower hallway subgoals, the other compares learned and ideal values for two subgoal state values and the goal prediction) • All 8 hallway GTPs were learned accurately and efficiently while actions are selected totally at random
Machinery for General Predictions • In steps of increasing expressiveness • Simple state-transition predictions • Mixtures of predictions • Closed-loop termination • Closed-loop action conditioning • While staying grounded in data • Predictions and State
Predictive State Representations • Problem: So far we have assumed states, but the world really just gives information, “observations” • Hypothesis: What we normally think of as state is a set of predictions about outcomes of experiments • Wallet’s contents, John’s location, presence of objects… • Prior work: • Learning deterministic FSAs - Rivest & Schapire, 1987 • Adding stochasticity: An alternative to HMMs - Herbert Jaeger, 1999 • Adding action: An alternative to POMDPs - Littman, Sutton, & Singh 2001
Empty Gridworld with Local Sensing Four actions: Up, Down, Right, Left And four sensory bits
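Distance to Wall Predictions • Predictive State Representation (PSR): the predicted sensory bit after each action sequence • R: 0, RR: 0, RRR: 1, RRRR: 1, ... • D: 0, DD: 1, DDD: 1, ... • The “meaning” of these predictions is the distance to each wall • 4 GTPs suffice to identify each state • More needed to update PSR • Many more are computed from PSR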
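Suppose we add one non-uniformity • (Figure: the same gridworld and prediction list, R: 0, RR: 0, RRR: 1, RRRR: 1, ... and D: 0, DD: 1, DDD: 1, ..., with one non-uniformity added) • Now there is much more to know • It would be challenging to program it all correctly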
Other Extension Ideas • Stochasticity • Egocentric motion • Multiple Rooms • Second agent • Moveable objects • Transient goals It’s easy to make such problems arbitrarily challenging
Outline • AI at an Impasse • A Predictive Proposal • Some of the Machinery • Prospects and Conclusion
How Could These Ideas Proceed? • Build systems! Build Gridworlds! • A performance orientation would be problematic • The “Knowledge Representation” guys may not be impressed • But others I think will be very interested and appreciative - throughout modern probabilistic AI
Conclusion: Predictions are the Coin of the Mental Realm • Knowledge is Predictions About what-leads-to-what, under what ways of behaving Such knowledge is learnable, chainable • Mental activity is working with predictions Learning them Combining them to produce new predictions (reasoning) Converting them to action (planning, reinforcement learning) Figuring out which are most useful • Predictions are verifiable A natural way to self-maintain knowledge, which is essential for scaling AI beyond programming • Most of the machinery is simple but potentially powerful
Reliable Knowledge Requires Verification • We can distinguish • 1. Having knowledge • 2. Having the ability to verify knowledge • I.e., there is something beyond having knowledge, which we might call understanding its meaning, and which is key in practice to building powerful AIs
Summary of Results for Predictive State Representations (PSRs) • Compact, linear PSRs exist • # tests ≤ # states in minimal POMDP • # tests ≤ Rivest & Schapire’s Diversity • # tests can be exponentially fewer than diversity and POMDP • Compact simulation/update process • Construction algorithm from POMDP • Learning/discovery algorithms of Rivest and Schapire, and of Jaeger, do not immediately extend to PSRs • There are natural EM-like algorithms (current work)
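The "compact simulation/update process" for a linear PSR can be written out as a worked equation. The notation below is an assumption, following the standard linear-PSR formulation associated with the Littman, Sutton, & Singh 2001 work cited earlier in the deck: $p(h)$ is the vector of predictions for the core tests $q_1, \dots, q_n$ after history $h$, and $m_t$ is a weight vector specific to test $t$.

```latex
% Linear PSR: every test's prediction is a linear function of the core tests.
p(t \mid h) = p(h)^{\top} m_t

% Update after taking action a and observing o: each core test q_i is
% re-predicted by conditioning on the new action-observation pair.
p(q_i \mid h a o)
  = \frac{p(a o q_i \mid h)}{p(a o \mid h)}
  = \frac{p(h)^{\top} m_{a o q_i}}{p(h)^{\top} m_{a o}}
```

Under this formulation the update needs only the current prediction vector and a fixed set of weight vectors, which is one sense in which the process is compact.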