Intelligence • Turing’s definition of intelligence as human mimicry is nonconstructive, informal • AI needs a formal, constructive, broad definition of intelligence – call it Int – that supports analysis and synthesis • “Look, my system is Int !!!” • Is this claim interesting? • Is this claim ever true? • What research do we do on Int?
What things might be Int? Agents! • Agents receive percepts O and perform actions A in an environment E • An agent function f : O∗ → A specifies an action for any percept sequence • A global measure V(f, E) evaluates f in E
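A minimal sketch of this setup in Python; the agent, the percepts, and the scoring rule are all illustrative stand-ins, not anything from the talk:

```python
from typing import Callable, Sequence

# An agent function maps a percept sequence to an action: f : O* -> A.
AgentFunction = Callable[[Sequence[str]], str]

def last_percept_agent(percepts: Sequence[str]) -> str:
    """Toy agent: echo the most recent percept as its action."""
    return percepts[-1] if percepts else "noop"

def value(f: AgentFunction, episode: Sequence[str]) -> float:
    """Toy global measure V(f, E): +1 for each step where the agent's
    action matches that step's percept."""
    score = 0.0
    for t in range(1, len(episode) + 1):
        if f(episode[:t]) == episode[t - 1]:
            score += 1.0
    return score
```

The point is only that V scores the whole behaviour, not individual actions.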
Int1: perfect rationality • Agent f_opt is perfectly rational: f_opt = argmax_f V(f, E), i.e., the best possible behaviour • Logical AI: always act to achieve the goal • Economics, modern AI, stochastic optimal control, OR: act to maximize expected utility • Interesting? Yes, I’d like one of those • True? Almost never – requires instant computation!! • Research? Global→local constraints; definability under uncertainty and in multiagent settings; evolution, etc.
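In a small enough world the argmax over agent functions can be taken literally. A toy sketch, with made-up percepts, actions, and utilities:

```python
from itertools import product

percepts = ["hot", "cold"]
actions = ["fan", "heater"]

def utility(percept: str, action: str) -> float:
    # Illustrative utilities: the right appliance for the weather.
    return 1.0 if (percept, action) in {("hot", "fan"), ("cold", "heater")} else 0.0

def V(f: dict) -> float:
    # Expected utility under a uniform distribution over percepts.
    return sum(utility(o, f[o]) for o in percepts) / len(percepts)

# f_opt = argmax_f V(f, E): enumerate every one-step agent function.
all_fs = [dict(zip(percepts, acts))
          for acts in product(actions, repeat=len(percepts))]
f_opt = max(all_fs, key=V)
```

With two percepts and two actions there are only four agent functions; in any realistic E the enumeration is hopeless, which is the slide's point.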
Machines and programs • Agent is a machine M running a program p • This defines an agent function f = Agent(p, M)
Int2: calculative rationality • Agent program p calculates f_opt(o_1, …, o_t), which would have been the perfectly rational action had it been done immediately • Agent(p, M) = f_opt when M is infinitely fast • Interesting? Not as such! Hopeless in the real world • True? Yes – algorithms are known for many settings; this covers much of classical and modern AI • Research? Algorithms for planning, MDPs, POMDPs, etc.; KR to encode suitable models
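Value iteration on a tiny MDP is a concrete instance of calculative rationality: given unbounded time it computes the perfectly rational action. The three-state chain and rewards below are illustrative only:

```python
# Toy MDP: states 0..2 on a line, reward for reaching state 2.
states = [0, 1, 2]
actions = ["left", "right"]
gamma = 0.9

def step(s, a):
    """Deterministic toy dynamics: move along the line."""
    s2 = min(s + 1, 2) if a == "right" else max(s - 1, 0)
    reward = 1.0 if s2 == 2 else 0.0
    return s2, reward

# Value iteration: iterate the Bellman optimality backup to convergence.
V = {s: 0.0 for s in states}
for _ in range(100):
    V = {s: max(step(s, a)[1] + gamma * V[step(s, a)[0]] for a in actions)
         for s in states}

def f_opt(s):
    """The perfectly rational action, computed rather than assumed."""
    return max(actions, key=lambda a: step(s, a)[1] + gamma * V[step(s, a)[0]])
```

The catch, per the slide: the computation is only correct "had it been done immediately", and 100 sweeps over a real state space take real time.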
Life: play and win in 20,000,000,000,000 moves • 100 years x 365 days x 24 hrs x 3600 seconds x 640 muscles x 10/second = 20 trillion actions • (Not to mention choosing brain activities!) • And the world has a very large state space • Continuous variables, partially observable, unknown
Int3 : metalevel rationality • Agent(p,M) is metalevelly rational if it controls its computations optimally (I. J. Good’s Type II) • View computations as actions; agent chooses the ones providing maximal utility • Interesting? Yes – super-efficient anytime approximation algorithms!! • True? Never! Metalevel decisions are even harder • Also somewhat ill-defined: • Program can only “control” some aspects of its overall activity • Metalevel chooses a computation that picks the optimal action! • Research? Halfway decent metalevel control
Nearly rational metareasoning • Do the Right Thinking: • Computations are actions • Cost = time; Benefit = better decisions • Value ≈ benefit minus cost • General agent program: • Repeat until no computation has positive value: • Do the best computation • Do the current best action • Myopic approximation fairly effective in practice • 10–100 times more efficient than alpha-beta
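The loop on this slide can be sketched directly. The benefit numbers below are illustrative placeholders, not a real value-of-computation estimate:

```python
import heapq

def metareason(computations, cost_per_step=0.1):
    """Myopic metareasoning sketch: each candidate computation has an
    estimated benefit (improvement in decision quality). Repeatedly run
    the best computation while its net value (benefit - time cost) is
    positive, then act."""
    done = []
    # Max-heap keyed on estimated net value of each computation.
    heap = [(-(benefit - cost_per_step), name) for name, benefit in computations]
    heapq.heapify(heap)
    while heap:
        neg_value, name = heapq.heappop(heap)
        if -neg_value <= 0:   # no computation has positive value:
            break             # stop deliberating...
        done.append(name)
    return done               # ...then do the current best action

chosen = metareason([("expand-node-A", 0.5), ("expand-node-B", 0.05),
                     ("refine-estimate", 0.3)])
```

A real metareasoner would re-estimate benefits after each computation; this myopic, fixed-estimate version is the simplest approximation of the loop.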
The role of algorithms • Metareasoning should replace devious algorithms • The brain must have built-in mechanisms for this • We still need a good “architecture” so that high-value computation steps are available to be selected
Int4: bounded optimality • Agent(p_opt, M) is bounded-optimal iff p_opt = argmax_p V(Agent(p, M), E) • i.e., the best program given M • Interesting? Yes, I’d like one of those • True? Yes, exists by definition! May be hard to find • Research? Towards a theory of agent architectures • Bounded optimality is a reasonable substitute for the informal notion of intelligence in the definition of AI
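A parametric sketch of the argmax over programs: here the "programs" are search depths, the machine M charges time per node expanded, and V trades decision quality against elapsed time. All the numbers are illustrative:

```python
def decision_quality(depth: int) -> float:
    # Deeper search gives better decisions, with diminishing returns.
    return 1.0 - 0.5 ** depth

def time_cost(depth: int, ms_per_node: float = 1.0) -> float:
    # Exponential cost on machine M (branching factor 2).
    return ms_per_node * (2 ** depth)

def V(depth: int) -> float:
    # Quality penalized by real time spent deliberating.
    return decision_quality(depth) - 0.001 * time_cost(depth)

programs = range(1, 15)
p_opt = max(programs, key=V)   # argmax_p V(Agent(p, M), E)
```

The optimum is interior: too little search loses quality, too much loses time, and the best program depends on M's speed, exactly the slide's point.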
Bounded optimality contd. • Imposes nonlocal constraints on action: Optimize over programs at design time, not actions at runtime • Actions are not rational or irrational by themselves • BO agents make mistakes; fixing those mistakes can only make things worse • BO agents may not “know” they are BO, so may be in constant flux • Related ideas appear in other literatures • Rule utilitarianism vs. act utilitarianism • Dennett’s “Moral First-Aid Manual” • Game-theoretic results on bounded-memory equilibria
A simple parametric example • Tetris agent • Depth-1 search with value estimate Û • Û has fixed runtime (e.g., a neural network) • 1. No time limit ⇒ standard RL converges* to the BO agent • 2. Time limit for the whole game (as in chess): same “cycle time” but a different BO agent; RL in an SMDP converges* to the BO agent • Feedback mechanism design is crucial
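Depth-1 search with a fixed-runtime estimate Û is just "one-step lookahead on Û". A sketch with stand-in dynamics and a stand-in Û (not the actual Tetris features):

```python
# Illustrative fixed-runtime value estimate: prefer states near 10.
def u_hat(state: int) -> float:
    return -abs(state - 10)

def successors(state: int):
    """Toy one-step dynamics on an integer state."""
    return {"left": state - 1, "stay": state, "right": state + 1}

def depth1_action(state: int) -> str:
    """Depth-1 search: pick the action whose successor state has the
    best estimated value under u_hat."""
    succ = successors(state)
    return max(succ, key=lambda a: u_hat(succ[a]))
```

With Û's runtime fixed, the only learnable part is Û itself, which is what makes standard RL on this architecture converge to the bounded-optimal agent for that cycle time.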
Composite systems I • E: Letters arrive at random times • M: Runs one or more neural networks • Can design p_opt: a sequence of networks • (δ, ε)-learned networks ⇒ (δ′, ε′)-BO agent
Asymptotic bounded optimality • Strict bounded optimality is too fragile • p is asymptotically bounded-optimal (ABO) iff ∃k: V(Agent(p, kM), E) ≥ V(Agent(p_opt, M), E), i.e., speeding up M by a factor of k compensates for p’s inefficiency • ABO generalizes standard O(·) complexity • Here the algorithms choose how long to run and what to “return”
Composite Systems II • Suppose programs can be constructed easily for fixed deadlines • Let p_i be ABO for a fixed deadline at t = 2^i ε • Construct the following universal program p_U • p_U is ABO for any deadline distribution; as good as knowing the deadline in advance.
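The slide does not spell the construction out; one standard scheme runs the fixed-deadline programs p_0, p_1, … in sequence with doubling time budgets, always keeping the most recently completed answer. A hedged sketch, with toy stand-in programs:

```python
def p_universal(make_p, budget: float, eps: float = 1.0):
    """Sketch of a universal program p_U: run p_0, p_1, ... in turn,
    giving p_i a time slot of eps * 2**i, and keep the last completed
    result. Whenever the (unknown) deadline strikes, a recently
    finished fixed-deadline program has supplied the answer."""
    best, spent, i = None, 0.0, 0
    while spent + eps * 2 ** i <= budget:
        best = make_p(i)()          # run p_i to completion in its slot
        spent += eps * 2 ** i
        i += 1
    return best

# Illustrative fixed-deadline programs: p_i just reports its index.
result = p_universal(lambda i: (lambda: f"answer-from-p{i}"), budget=10.0)
```

Because the slots double, the total time spent is at most about twice the last slot, which is what buys the constant-factor (ABO) guarantee.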
Composite systems III • Use universal program + internal scheduling to build complex anytime systems with function composition, anytime conditionals, loops, subroutines, etc. • Need a more “organic” notion of composition • We must explore the space of agent architectures, proving ABO dominance theorems
Good old-fashioned modern AI • Standard methodology: Start with calculative rationality, then cut corners • But much of human cognitive structure is about coping with boundedness: • compilation • hybrid decision methods (e.g., deliberative + reactive) • anticipatory computation • focus of attention • forming specific goals in context • hierarchical decisions (Go to PTAI = 3,000,000,000 actions)
Starting from the other end • Several “forces” affect agent design: • Need optimal decisions • Need instantaneous decisions • Need continual adaptation • Complex architectures have several adaptation mechanisms: • Reinforcement learning (object- and meta-level) • Model learning • Compilation
Recent developments • Rigorous formal framework for metareasoning • Joint computational/physical decision process • Spectacular results from bandit-theoretic metalevel heuristics in Monte Carlo tree search for Go • Currently replacing bandit theory by selection theory • General RL in arbitrary agent architectures • Partial program defines machine M; completion is p • Distributed RL process converges* to p_opt • In principle, this approach can do metalevel RL and construct highly nontrivial BO agents
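The bandit-theoretic metalevel heuristic behind early Monte Carlo tree search is the UCB1 rule (as in UCT): treat "which node to expand next" as an arm choice, balancing average value against an exploration bonus. A minimal sketch of the selection rule only, not a full MCTS:

```python
import math

def ucb1_select(counts, values, c=math.sqrt(2)):
    """UCB1: pick the arm maximizing mean value plus an exploration
    bonus that shrinks as the arm is tried more often."""
    total = sum(counts)
    def score(i):
        if counts[i] == 0:
            return float("inf")   # try every arm at least once
        return values[i] / counts[i] + c * math.sqrt(math.log(total) / counts[i])
    return max(range(len(counts)), key=score)
```

Note this is a heuristic metalevel policy, not optimal metareasoning; the slide's "replacing bandit theory by selection theory" is exactly the observation that choosing a computation is a selection problem, not a reward-accumulation problem.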
Metalevel RL in partial programs • An Alisp partial program defines a space of agent programs (the “fixed architecture”); RL converges to the bounded-optimal solution • This gets interesting when the agent program is allowed to do some deliberating:

    ...
    (defun select-by-tree-search (nodes n)
      (for i from 1 to n do
        ...
        (setq leaf-to-expand (choose nodes))
        ...))
Recent developments contd. • Formal angelic semantics for high-level actions • Effects defined by (outer envelope of) reachable set • Provably sound high-level plans for long timescales • Real-time agents using hierarchical lookahead • Potential for metalevel control: • Deliberating over high-level choices is very valuable • Focusing on urgent detailed steps is also a good idea
Conclusions • Brains cause minds • Effective metareasoning is feasible, in the brain too • Away with algorithms (eventually) • Bounded optimality: • Fits intuitive idea of Intelligence • A bridge between theory and practice • Interesting prospects for AI
Thank you! The speaker is supported by, and this talk is given under the auspices of, the Chaire Internationale de Recherche Blaise Pascal, funded by the State and the Région Île-de-France and administered by the Fondation de l’École Normale Supérieure. Questions?