Adaptive Intelligent Mobile Robots

Presentation Transcript


  1. Adaptive Intelligent Mobile Robots
     Leslie Pack Kaelbling
     Artificial Intelligence Laboratory, MIT

  2. Two projects
     • Making reinforcement learning work on real robots
     • Solving huge problems
        • dynamic problem reformulation
        • explicit uncertainty management

  3. Reinforcement learning
     • given a connection to the environment
     • find a behavior that maximizes long-run reinforcement
     [Diagram: the agent sends actions to the Environment and receives observations and reinforcement ("Reinf") in return.]

  4. Why reinforcement learning?
     • Unknown or changing environments
     • Easier for a human to provide a reinforcement function than a whole behavior

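  5. Q-Learning
     • Learn to choose actions because of their long-term consequences
     • Given experience ⟨s, a, r, s'⟩, update (learning rate α, discount γ):
       $Q(s,a) \leftarrow Q(s,a) + \alpha\,[\,r + \gamma \max_{a'} Q(s',a') - Q(s,a)\,]$
     • Given a state s, take the action a that maximizes $Q(s,a)$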
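
The update and greedy choice on slide 5 can be sketched in tabular form. This is a minimal Python sketch, assuming a small discrete state and action set stored in a dictionary; the learning rate `alpha`, discount `gamma`, and the two-state toy problem are illustrative choices, not values from the talk.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning backup for the experience tuple (s, a, r, s_next)."""
    best_next = max(Q[(s_next, b)] for b in actions)  # max over a' of Q(s', a')
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

def greedy_action(Q, s, actions):
    """Return the action a that maximizes Q(s, a); ties go to the first listed action."""
    return max(actions, key=lambda a: Q[(s, a)])

# Hypothetical toy problem: in state 0, "go" earns reward 1 and leads to state 1,
# while "stay" earns nothing and loops back to state 0.
Q = defaultdict(float)          # unseen (state, action) pairs start at 0
actions = ["stay", "go"]
for _ in range(50):
    q_update(Q, 0, "go", 1.0, 1, actions)
    q_update(Q, 0, "stay", 0.0, 0, actions)
print(greedy_action(Q, 0, actions))  # -> go
```

Note that the agent learns from the long-term consequence (the discounted value of what follows), not just the immediate reward.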

  6. Does it work?
     • Yes and no.
     • Successes in simulated domains: backgammon, elevator scheduling
     • Successes in manufacturing and juggling with strong constraints
     • No strong successes in more general online robotic learning

  7. Why is RL on robots hard?
     • Need fast, robust supervised learning
     • Continuous input and action spaces
     • Q-learning slow to propagate values
     • Need strong exploration bias

  8. Making RL on robots easier
     • Need fast, robust supervised learning
        • locally weighted regression
     • Continuous input and action spaces
        • search and caching of optimal action
     • Q-learning slow to propagate values
        • model-based acceleration
     • Need strong exploration bias
        • start with human-supplied policy
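
Slide 8 names locally weighted regression as the fast supervised learner. A minimal one-dimensional sketch follows, assuming a Gaussian kernel and a hand-picked `bandwidth`; the slides do not specify the kernel, the bandwidth, or the exact local model, so a weighted linear fit is used here as an illustration.

```python
import math

def lwr_predict(xs, ys, x_query, bandwidth=1.0):
    """Locally weighted linear regression at one query point (1-D inputs)."""
    # Gaussian kernel (assumed): nearby training points get weight near 1, distant ones near 0.
    w = [math.exp(-((x - x_query) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    sw = sum(w)
    # Weighted means of inputs and outputs.
    mx = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    my = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    # Weighted least-squares slope; fall back to the weighted mean if x has no spread.
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, xs))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    return my + slope * (x_query - mx)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]               # exactly linear toy data
print(round(lwr_predict(xs, ys, 2.5), 6))  # -> 6.0
```

Because every prediction is a fresh weighted fit around the query, new data points change the model immediately, with no global retraining; that is what makes the method attractive for online robot learning.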

  9. Start with human-provided policy
     [Diagram: the Human Policy maps the environment state to an action.]

  10. Do supervised policy learning
     [Diagram: while the Human Policy acts on the Environment, its (s, a) pairs are used to train a learned Policy.]
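
The loop on slides 9 and 10 amounts to logging (s, a) pairs while the human policy drives and fitting a learned policy to them. The sketch below uses a 1-nearest-neighbor classifier as a stand-in learner and a hypothetical two-sensor wall-avoidance `human_policy`; neither comes from the talk.

```python
def human_policy(state):
    """Hypothetical hand-written controller: turn away from the nearer wall.
    state = (distance to left wall, distance to right wall)."""
    left_dist, right_dist = state
    return "turn_right" if left_dist < right_dist else "turn_left"

class NearestNeighborPolicy:
    """Stand-in learned policy: copy the action taken in the closest recorded state."""

    def __init__(self):
        self.examples = []  # recorded (state, action) pairs

    def train(self, state, action):
        self.examples.append((state, action))

    def __call__(self, state):
        def sq_dist(example):
            recorded_state, _ = example
            return sum((u - v) ** 2 for u, v in zip(recorded_state, state))
        return min(self.examples, key=sq_dist)[1]

learned = NearestNeighborPolicy()
# The human policy drives; the learner only records what it does.
for state in [(0.5, 2.0), (2.0, 0.5), (0.8, 1.5), (1.7, 0.6)]:
    action = human_policy(state)
    learned.train(state, action)

# Once trained, the learned policy can take over.
print(learned((0.6, 1.9)))  # -> turn_right
```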

  11. When the policy is learned, let it drive
     [Diagram: the learned Policy, rather than the Human Policy, now sends the action to the Environment.]

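  12. Q-Learning
     [Diagram: while the trained Policy still chooses the action, an RL block trains a Q-Value function that maps (s, a) to a value v; a block labeled D and the Environment/state loop are also shown.]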

  13. Acting based on Q values
     [Diagram: state s is fed to one Q-Value unit per action a1, a2, ..., an; a max-index block selects the action a.]

  14. Letting the Q-learner drive
     [Diagram: a max over the Q-Value outputs now selects the action sent to the Environment; the Policy, Train, RL, and D blocks remain.]

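  15. Train policy with max Q values
     [Diagram: the action selected by the max over Q-Values is used to train the Policy; the RL block uses the next state s'; D and the Environment/state loop are also shown.]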

  16. Add model learning
     [Diagram: a Model is trained on observed transitions to predict the next state and the reward r from (s, a), alongside the Policy, RL, and Q-Value blocks.]

  17. When model is good, train Q with it
     [Diagram: the learned Model, queried with simulated states and actions (s', a'), supplies experience for training the Q-Value function.]
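
One standard way to realize "add model learning" and "train Q with the model" is Dyna-style planning (Sutton's Dyna-Q), named here as a stand-in because the slides do not say which algorithm was used. In this sketch, assuming deterministic transitions, the model simply remembers the last outcome of each (s, a) pair, and each real step is followed by a few sweeps of simulated backups over all remembered transitions (classic Dyna-Q samples them at random; sweeping keeps the example deterministic).

```python
from collections import defaultdict

def dyna_q_step(Q, model, s, a, r, s_next, actions,
                alpha=0.5, gamma=0.9, planning_sweeps=10):
    """One real Q-learning backup, then simulated backups from a learned model."""

    def backup(bs, ba, br, bs_next):
        target = br + gamma * max(Q[(bs_next, b)] for b in actions)
        Q[(bs, ba)] += alpha * (target - Q[(bs, ba)])

    backup(s, a, r, s_next)        # learn from the real transition
    model[(s, a)] = (r, s_next)    # deterministic model: remember the latest outcome
    for _ in range(planning_sweeps):
        # Replay every remembered transition as if it had just happened again.
        for (ms, ma), (mr, ms_next) in model.items():
            backup(ms, ma, mr, ms_next)

# Hypothetical chain 0 -> 1 -> 2 with reward 1 on entering state 2.
# State 2 is never left, so its Q-values stay at 0 (it acts as a terminal state).
Q, model = defaultdict(float), {}
actions = ["right"]
for s in [0, 1]:   # a single pass of real experience
    dyna_q_step(Q, model, s, "right", 1.0 if s == 1 else 0.0, s + 1, actions)
print(Q[(0, "right")] > 0.8)  # -> True
```

After one pass of real experience, the reward has already propagated back to state 0; plain one-step Q-learning would leave Q(0, right) at 0 until a second pass. This is the "slow to propagate values" problem from slide 7 that model-based acceleration addresses.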

  18. Other forms of human knowledge
     • hard safety constraints on action choices
     • partial models or constraints on models
     • value estimates or value orderings on states
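
Hard safety constraints can be imposed by filtering the action set before the greedy choice, so the learner never even considers a forbidden action. The predicate `is_safe`, the 0.3 m obstacle threshold, and the Q-values below are hypothetical examples of human-supplied knowledge, not details from the talk.

```python
def safe_greedy_action(q_values, is_safe, state):
    """Greedy action choice restricted to actions a human has declared safe.
    q_values: dict mapping action -> Q(state, action)
    is_safe:  human-supplied predicate is_safe(state, action)"""
    allowed = [a for a in q_values if is_safe(state, a)]
    if not allowed:
        raise ValueError("no safe action available in this state")
    return max(allowed, key=q_values.get)

def is_safe(state, action):
    # Hypothetical constraint: never drive forward when an obstacle is closer than 0.3 m.
    return not (action == "forward" and state["obstacle_dist"] < 0.3)

q = {"forward": 2.0, "turn_left": 1.2, "stop": 0.1}
print(safe_greedy_action(q, is_safe, {"obstacle_dist": 0.2}))  # -> turn_left
print(safe_greedy_action(q, is_safe, {"obstacle_dist": 1.0}))  # -> forward
```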

  19. We will have succeeded if
     • It takes less human effort and total development time to
        • provide prior knowledge
        • run and tune the learning algorithm
     • than to
        • write and debug the program without learning

  20. Test domain
     • Indoor mobile-robot navigation and delivery tasks
     • quick adaptation to new buildings
     • quick adaptation to sensor change or failure
     • quick incorporation of human information

  21. Solving huge problems
     • We have lots of good techniques for small-to-medium-sized problems
        • reinforcement learning
        • probabilistic planning
        • Bayesian inference
     • Rather than scale them to tackle huge problems directly, formulate right-sized problems on the fly

  22. Dynamic problem reformulation
     [Diagram: perception feeds a working memory, which drives action.]

  23. Reformulation strategy
     • Dynamically swap variables in and out of working memory
        • a constant-sized problem is always tractable
        • adapt to changing situations, goals, etc.
     • Given more time pressure, decrease problem size
     • Given less time pressure, increase problem size
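
The strategy above can be illustrated with a toy selection rule: rank candidate variables by relevance to the current goal and keep only as many as the time budget allows. Everything here is a hypothetical assumption for illustration: the relevance scores, the variable names, and the linear cost model that maps a time budget to a working-memory size.

```python
def working_memory_size(time_budget_ms, cost_per_var_ms=10, min_vars=2, max_vars=50):
    """Hypothetical sizing rule: less time means fewer variables in working memory.
    Assumes planning cost grows linearly with the number of variables."""
    k = time_budget_ms // cost_per_var_ms
    return max(min_vars, min(max_vars, k))

def reformulate(relevance, time_budget_ms):
    """Swap in the most relevant variables that fit the current time budget.
    relevance maps variable name -> hypothetical relevance score for the current goal."""
    k = working_memory_size(time_budget_ms)
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    return set(ranked[:k])

relevance = {"door_3_open": 0.9, "battery_level": 0.8,
             "hallway_b_clear": 0.6, "weather": 0.1}
print(sorted(reformulate(relevance, 20)))   # -> ['battery_level', 'door_3_open']
print(len(reformulate(relevance, 1000)))    # -> 4
```

Calling `reformulate` again whenever the goal or the time pressure changes is what makes the reformulation dynamic: the same small planner always runs, but on a different slice of the world.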

  24. Multiple-resolution plans
     • Fine view of near-term, high-probability events
     • Coarse view of distant, low-probability events

  25. Information gathering
     • Explicit models of the robot’s uncertainty allow information-gathering actions
        • drive to the top of a hill for a better view
        • open a door to see what’s inside
        • ask a human for guidance
     [Illustration: the robot asks "Where is the supply depot?" and a human replies "Two miles up this road."]

  26. Explicit uncertainty modeling
     • POMDP work gives us theoretical understanding
     • Derive practical solutions from
        • learning explicit memorization policies
        • approximating optimal control
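
Under a POMDP model the robot tracks a belief b over states, updated after action a and observation o by Bayes' rule: $b'(s') = \eta\, O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)$, where η normalizes the result. A minimal discrete sketch follows; the two-state door example and its 0.8 sensor accuracy are hypothetical. The "look" action leaves the world unchanged yet sharpens the belief, which is the sense in which the information-gathering actions on slide 25 have value under an explicit uncertainty model.

```python
def belief_update(belief, action, observation, T, O):
    """Discrete Bayes filter: b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s).
    belief: dict state -> probability
    T: dict (s, a) -> dict s' -> probability   (transition model)
    O: dict (s', a) -> dict o -> probability   (observation model)"""
    predicted = {s2: 0.0 for s2 in belief}
    for s, p in belief.items():                     # prediction step
        for s2, t in T[(s, action)].items():
            predicted[s2] += t * p
    unnorm = {s2: O[(s2, action)].get(observation, 0.0) * p
              for s2, p in predicted.items()}       # correction step
    eta = sum(unnorm.values())
    if eta == 0:
        raise ValueError("observation has zero probability under the current belief")
    return {s2: v / eta for s2, v in unnorm.items()}

# Hypothetical example: is the door open? Looking does not change the door.
states = ["open", "closed"]
T = {(s, "look"): {s: 1.0} for s in states}
O = {("open", "look"): {"see_open": 0.8, "see_closed": 0.2},
     ("closed", "look"): {"see_open": 0.2, "see_closed": 0.8}}
b = belief_update({"open": 0.5, "closed": 0.5}, "look", "see_open", T, O)
print({s: round(p, 3) for s, p in b.items()})  # -> {'open': 0.8, 'closed': 0.2}
```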

  27. Huge-domain experiments
     • Simulation of a very complex task environment
        • large number of buildings and other geographical structures
        • concurrent, competing tasks such as
           • surveillance
           • supply delivery
           • self-preservation
        • other agents from whom information can be gathered
