Intelligent Adaptive Mobile Robots
Georgios Theocharous, MIT AI Laboratory
with Terran Lane and Leslie Pack Kaelbling (PI)
DARPA MARS PI Meeting
Project Overview
• Hierarchical POMDP Models
  • Parameter learning; heuristic action selection (Theocharous)
  • Near-optimal action selection; hierarchical structure learning (Theocharous, Murphy)
  • Near-deterministic abstractions for MDPs (Lane)
  • Near-deterministic abstractions for POMDPs (Theocharous, Lane)
• Visual map building
  • Optic-flow-based navigation (Temizer, Steinkraus)
  • Automatic spatial decomposition (Temizer)
  • Selecting visual landmarks (Temizer)
• Enormous simulated robotic domain (Steinkraus)
• Learning low-level behaviors with human training (Smart)
Today’s talk
• Hierarchical POMDP Models
  • Parameter learning; heuristic action selection (Theocharous)
  • Near-optimal action selection; hierarchical structure learning (Theocharous, Murphy)
  • Near-deterministic abstractions for MDPs (Lane)
  • Near-deterministic abstractions for POMDPs (Theocharous, Lane)
• Visual map building
  • Optic-flow-based navigation (Temizer, Steinkraus)
  • Automatic spatial decomposition (Temizer)
  • Selecting visual landmarks (Temizer)
• Enormous simulated robotic domain (Steinkraus)
• Learning low-level behaviors with human training (Smart)
The Problem
• How do we sequentially select actions in a very large, uncertain domain?
• Markov decision processes (MDPs) are a good formalization of planning under uncertainty
• Optimization algorithms for MDPs are polynomial in the size of the state space
  • which is exponential in the number of state variables!
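To make the blow-up concrete, here is a back-of-the-envelope sketch (the variable counts are hypothetical, chosen only to illustrate the scaling):

```python
# Illustration: the joint state space grows exponentially with the
# number of state variables, so algorithms polynomial in |S| blow up.
n_binary_vars = 50               # hypothetical domain description
n_states = 2 ** n_binary_vars    # |S| = 2^n for binary variables
n_actions = 11

# One sweep of value iteration touches every (s, a, s') triple:
ops_per_sweep = n_states * n_actions * n_states
print(f"|S| = {n_states:.3e}, ops/sweep ~ {ops_per_sweep:.3e}")
```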
Abstraction and Decomposition
• Our only hope is to divide and conquer:
  • state abstraction: treat sets of states as if they were the same (see the sketch below)
  • state decomposition: solve restricted problems in sub-parts of the state space
  • action abstraction: treat sequences of actions as if they were atomic
  • teleological abstraction: goal-based abstraction
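A minimal sketch of what state abstraction means operationally (the room layout and names are our own, purely illustrative):

```python
# State abstraction: many concrete states map to one abstract state.
# Here exact (x, y) coordinates are collapsed to a coarse room label,
# so a planner over abstract states sees far fewer of them.

def which_room(x, y):
    # Hypothetical layout: rooms are 10x10 blocks of the grid.
    return (x // 10, y // 10)

def abstract_state(x, y, has_package):
    return (which_room(x, y), has_package)

# 100x100 grid * 2 package bits = 20,000 concrete states,
# but only 10*10*2 = 200 abstract states.
```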
Hierarchical POMDPs
[Figure: an abstract state containing entry states, exit states, and product states (which generate observations); vertical transitions move between levels of the hierarchy, horizontal transitions plus actions move within a level.]
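As a data structure, the pieces named in the figure might be organized as follows (a sketch; the field names are ours, not from the original implementation):

```python
from dataclasses import dataclass, field

# Minimal containers for the HPOMDP structure sketched above.

@dataclass
class AbstractState:
    name: str
    entry_states: list = field(default_factory=list)  # where vertical transitions enter
    exit_states: list = field(default_factory=list)   # where control returns upward
    children: list = field(default_factory=list)      # nested abstract or product states
    vertical: dict = field(default_factory=dict)      # entry -> dist. over children
    horizontal: dict = field(default_factory=dict)    # (child, action) -> dist. over children/exits

@dataclass
class ProductState:
    name: str
    obs_model: dict = field(default_factory=dict)     # observation -> probability
```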
Representing Spatial Environments as HPOMDPs
The approach was easy to transfer from the MSU Engineering Building to the 7th floor of the AI Lab.
[Figure: corridor environments annotated with states, vertical transitions, horizontal transitions, and observation models.]
Hierarchical Learning and Planning in Partially Observable Markov Decision Processes for Robot Navigation (Theocharous, Ph.D. thesis, MSU, 2002)
• Derived the HPOMDP model and learning/planning algorithms
• Learning HPOMDPs for robot navigation
  • Exact and approximate algorithms provide faster convergence, less time per iteration, better learned models, better robot localization, and the ability to infer structure
• Planning with HPOMDPs for robot navigation
  • Spatial and temporal abstraction
  • The robot computes plans faster, can reach locations in the environment starting from an unknown location, and its performance scales gracefully to large-scale environments
POMDP macro-actions can be formulated as policy graphs (Theocharous, Lane)
• POMDP macro-actions can be represented as policy graphs with a termination condition (sketched below)
[Figure: a policy graph with a termination condition.]
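In code, such a macro is a finite-state controller: each memory state picks an action, observations drive memory transitions, and designated observations end the macro. This is a sketch under those assumptions; all names are illustrative:

```python
# A policy-graph macro-action as a finite-state controller.

class PolicyGraphMacro:
    def __init__(self, action_of, next_memory, terminal_obs):
        self.action_of = action_of        # x -> action (e.g. x1 -> "forward")
        self.next_memory = next_memory    # (x, observation) -> x'
        self.terminal_obs = terminal_obs  # observations that end the macro

    def step(self, x, obs):
        if obs in self.terminal_obs:
            return None                   # macro terminates
        return self.next_memory[(x, obs)]
```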
Reward and transition models of macro-actions
• Expected sum of discounted reward for executing controller C from hidden state s and memory state x until termination
• In a similar manner we can compute the transition models
[Figure: a two-memory-state policy graph (x1, x2; action F; observation labels "corr"/"else") executed over hidden states S1–S7.]
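Written out as equations (our reconstruction of the quantities the slide names; a_x, δ(x, o), β(x, o), and O(o | s', a) are our notation for the controller's action choice, memory transition, termination condition, and observation model):

```latex
R^{C}(s,x) \;=\; r(s,a_x) \;+\; \gamma \sum_{s'} P(s'\mid s,a_x)
    \sum_{o} O(o\mid s',a_x)\,\bigl(1-\beta(x,o)\bigr)\,
    R^{C}\bigl(s',\delta(x,o)\bigr)

T^{C}(s'\mid s,x) \;=\; \sum_{t=1}^{\infty} \gamma^{t}\,
    \Pr\bigl(s_t = s',\ \text{terminate at } t \;\big|\; s_0 = s,\ x_0 = x,\ C\bigr)
```

Note that the discount γ^t is folded into the transition model T, which is what lets the planner treat a whole macro as a single SMDP step.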
Planning and Execution
• Planning (solve it as an MDP):
  • States: the hidden states
  • Actions: (controller, memory state) pairs (C, x)
  • Reward and transition models as defined above
• Execution (see the sketch below):
  • 1-step lookahead (the transition models allow us to predict the belief state far into the future)
  • Take the action that maximizes the immediate reward from the current belief state plus the discounted value of the next belief state
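A minimal sketch of the execution rule, assuming R[m] and T[m] are the macro reward vector and (discounted) transition matrix from the previous slide, and V is the value function computed by the MDP planner:

```python
import numpy as np

# One-step lookahead over macros m = (C, x). Because the discount is
# folded into T[m], scoring is just reward plus predicted-belief value.

def select_macro(belief, macros, R, T, V):
    """Maximize immediate macro reward plus the value of the predicted
    (discounted, unnormalized) next belief state."""
    def score(m):
        return belief @ R[m] + (belief @ T[m]) @ V
    return max(macros, key=score)
```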
Robot navigation experiments
• The robot successfully navigates from uniform initial beliefs to goal locations
[Figure: macro policy graphs — go forward (F), turn left (L), turn right (R) under all observations ("All O"), with "corr"/"else" transitions — and navigation runs on the test environment.]
Structure Learning & DBN Representations (Theocharous, Murphy)
• We are investigating methods for structure learning of HHMMs/HPOMDPs (hierarchy depth, number of states, reusable substructures):
  • Bayesian model-selection methods
  • Approaches for learning compositional hierarchies (e.g., recurrent neural networks, sparse hierarchical n-grams)
  • Other language-acquisition approaches (e.g., Carl de Marcken)
• DBN representation of HPOMDPs
  • Linear-time training
  • Extensions to online training
  • Factorial HPOMDPs
  • Sampling techniques for fast inference
Near Determinism (Lane)
• Some common action abstractions:
  • put it in the bag
  • go to the conference room
  • take out the trash
• What’s important about them?
  • Even if the world is highly stochastic, you can very nearly guarantee their success
• Encapsulate uncertainty at the lower level of abstraction
Example domain
• 10 floors
• ~1800 locations per floor
• 45 mail drops per floor
• Limited battery
• 11 actions
• Total: |S| > 2^500 states
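A rough sense of where a count like this comes from (our arithmetic from the bullets above; battery level is omitted because its discretization isn't given):

```python
import math

# The 450 per-drop delivery bits alone give 2^450 joint configurations;
# multiplying in robot position makes the full state space astronomical.
mail_bits = 10 * 45              # one delivered/undelivered bit per drop
locations = 10 * 1800            # floor x position
n_states = locations * 2 ** mail_bits
print(f"|S| ~ 2^{math.log2(n_states):.0f}")   # roughly 2^464 before battery
```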
A simple example
• State space: x, y, b (reached goal)
• Actions: N, S, E, W
• Transition function: noise, walls
• Rewards: -1/step until b = 1; 0 thereafter
[Figure: a gridworld with x/y axes.]
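The toy domain as code (a sketch; the noise level is illustrative):

```python
import random

# Gridworld dynamics for the simple example above.
ACTIONS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

def step(state, action, walls, goal, noise=0.1):
    """State is (x, y, b). Moves slip to a random direction with
    probability `noise`; walls block movement; reward is -1 per step
    until the goal bit b is set, 0 thereafter."""
    x, y, b = state
    if b == 1:
        return state, 0.0                      # absorbing: goal reached
    if random.random() < noise:
        action = random.choice(list(ACTIONS))  # slip
    dx, dy = ACTIONS[action]
    nx, ny = (x, y) if (x + dx, y + dy) in walls else (x + dx, y + dy)
    b = 1 if (nx, ny) == goal else 0
    return (nx, ny, b), -1.0
```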
Macros deliver single packages
• A macro is a plan over a restricted state space
• Defines how to achieve one goal from any <x, y> location
• Terminates at any goal
• Can be found quickly
• Encapsulates uncertainty
[Figure: the gridworld with a macro's policy flowing toward goal b2.]
Combining Macros
• Formally: solve a semi-MDP over {b}^k
  • Gets all macro interactions and probabilities right
  • Still exponential, though…
• These macros are close to deterministic
  • Low probability of delivering the wrong package
• Macros form a graph over {b1, …, bk}
  • Reduce the SMDP to a graph optimization problem (sketched below)
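One way to picture the reduction: once macros are near-deterministic, each delivery goal becomes a graph node and each edge carries the expected cost of the connecting macro; planning becomes choosing a visiting order. The exact problem is TSP-like, so the greedy nearest-neighbor pass below is purely an illustration, not the paper's algorithm:

```python
# Sketch: order delivery goals over the macro graph.

def plan_order(start, goals, macro_cost):
    """macro_cost[(a, b)]: expected cost of the macro driving a -> b."""
    order, here, todo = [], start, set(goals)
    while todo:
        nxt = min(todo, key=lambda g: macro_cost[(here, g)])
        order.append(nxt)
        todo.remove(nxt)
        here = nxt
    return order
```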
Solve deterministic graph problem
But does it work?
• Yes! (Well, in simulation, anyway…)
• Small, randomly generated scenarios
  • Up to ~60k states (6 packages)
  • Optimal solution computed directly
  • 5.8% error on average
• Larger scenarios, based on a one-floor building model
  • Up to ~2^55 states (~45 packages)
  • Can’t compute the optimal solution
  • 600 trajectories; no macro failures
Near Determinism in POMDPs (Lane, Theocharous)
• Current research project: near-deterministic abstraction in POMDPs
  • Macros map belief states to actions
  • Choose macros that reliably achieve subsets of belief states
  • “Dovetailing”
Really Big Domain (Steinkraus)
Working in Huge Domains
• Continually remap the huge problem to smaller subproblems of current import
• Decompose along the lines of the utility function; recombine the solutions
• Track the belief state approximately; devote more bits to the currently important aspects of the problem, and to those predicted to be important in the future
• Adjust the belief representation dynamically (see the sketch below)
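A minimal sketch of what dynamically adjusting the belief representation could look like, assuming a factored belief where currently relevant variables keep full distributions and the rest are collapsed to point estimates (all names here are illustrative, not from the project's system):

```python
# Adaptive factored belief: spend representation bits where they matter.

class AdaptiveBelief:
    def __init__(self, full_dists):
        self.full = dict(full_dists)   # var -> distribution (value -> prob)
        self.collapsed = {}            # var -> single most-likely value

    def demote(self, var):
        """Free bits: replace a variable's distribution with its mode."""
        dist = self.full.pop(var)
        self.collapsed[var] = max(dist, key=dist.get)

    def promote(self, var, prior):
        """Re-expand a variable predicted to matter soon."""
        self.collapsed.pop(var, None)
        self.full[var] = dict(prior)
```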